A Hybrid Model for Predicting Heart Disease Using DNN and ML Algorithms

Ayad Salamah, Tharaa; Abuzir, Prof. Yousef

dc.contributor.author	Ayad Salamah, Tharaa
dc.contributor.author	Abuzir, Prof. Yousef
dc.date.accessioned	2026-04-27T09:24:39Z
dc.date.available	2026-04-27T09:24:39Z
dc.date.issued	2026-02-06
dc.identifier.uri	https://dspace.qou.edu/handle/194/3069
dc.description.abstract	Cardiovascular diseases are among the most prominent health challenges facing medical systems worldwide, with high associated mortality recorded every year. This is largely due to the difficulty of detecting these diseases early, as diagnosis often relies on clinical findings that appear only in advanced stages. In this context, artificial intelligence (AI) techniques, particularly machine learning and deep learning, are needed to develop accurate predictive models that help identify patients at risk of heart disease early. This study aims to develop a hybrid model that combines traditional machine learning algorithms, namely Support Vector Machines (SVMs), Random Forest, XGBoost, LightGBM, and Logistic Regression, with Deep Neural Networks (DNNs), which are used to extract high-level representative features from the data. The study uses a reliable medical dataset obtained from Kaggle that combines four databases: Cleveland, Hungary, Switzerland, and Long Beach. It contains 76 attributes, including the predicted attribute, but all published experiments report using a subset of 14 of them. The target field indicates the presence of heart disease in the patient and is an integer where 0 = no disease and 1 = disease. This study introduces an innovative multi-stage methodology for heart disease prediction that uses feature augmentation to improve performance on a moderately sized dataset of 1,025 patient records. The process begins with preprocessing the clinical data, which comprise 13 features (age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate, exercise-induced angina, oldpeak, slope, number of major vessels, and thal), followed by initial training of the traditional machine learning models (SVM, Random Forest, XGBoost, LightGBM, and Logistic Regression) on these features. 
A Deep Neural Network (DNN) then extracts 32 high-level features from its penultimate layer, capturing complex, nonlinear patterns not evident in the original data. These DNN features are combined with the 13 clinical features to form a 45-feature set, significantly enriching the input space. A comprehensive set of evaluation metrics was used, including accuracy, the confusion matrix, precision, recall, F1 score, and the area under the ROC curve (AUC), to assess the models before and after feature combination. On the original features, the Random Forest model achieved the highest performance among the classifiers, with an accuracy of 97.80%, a recall of 99.31%, and a precision of 96.64%. In a previous study, Smith et al. (2018) employed traditional machine learning algorithms, and the experimental results showed that the Support Vector Machine achieved an accuracy of 85.2%. The results also showed a significant improvement in the performance of all classification algorithms after combining the original features with those extracted by the DNN, with gains across all key metrics. The SVM algorithm achieved the highest AUC value, 99.90%, demonstrating a strong ability to distinguish between the classes. The Random Forest, XGBoost, and LightGBM algorithms achieved identical results in overall accuracy (99.63%) and in the other metrics. Combining the DNN-extracted features with the original features thus improved the prediction accuracy of every machine learning algorithm used; this improvement denotes consistency across models, not the numerical accuracy alone, and reflects the effectiveness of the hybrid approach in enhancing predictive performance, especially given the challenges associated with class imbalance and small dataset sizes. 
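The feature-fusion step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: it assumes scikit-learn's `MLPClassifier` as the DNN, a hypothetical first hidden layer of 64 units (only the 32-unit penultimate layer comes from the abstract), and synthetic data from `make_classification` standing in for the 1,025-record clinical table. The helper names `penultimate_features` and `fuse` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 13-feature clinical data (assumption, not the real dataset).
X, y = make_classification(n_samples=1025, n_features=13, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training split only, so no test information leaks in.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Hidden sizes (64, 32) are an assumption; the last hidden layer has 32 units.
dnn = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                    max_iter=500, random_state=0)
dnn.fit(X_train_s, y_train)


def penultimate_features(model, X):
    """Run the fitted MLP through its hidden layers and stop before the output layer."""
    h = X
    for W, b in zip(model.coefs_[:-1], model.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)  # ReLU, matching activation="relu"
    return h


def fuse(model, X):
    """Concatenate the 13 input features with the 32 learned features (45 columns)."""
    return np.hstack([X, penultimate_features(model, X)])


F_train, F_test = fuse(dnn, X_train_s), fuse(dnn, X_test_s)

# Any of the classical models could be trained here; Random Forest is one example.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(F_train, y_train)
test_accuracy = clf.score(F_test, y_test)
```

In this sketch the network is trained only on the training split and then applied unchanged to the test split, so the 32 learned features for held-out patients never see their labels. Scores on synthetic data say nothing about the figures reported in the abstract.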
This study confirms that combining machine learning and deep learning techniques provides a promising path for developing intelligent diagnostic tools capable of supporting medical decision-making, reducing false alarm rates, and contributing to improving early treatment opportunities, especially in resource-constrained medical settings.	en
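The evaluation metrics named in the abstract all derive from the confusion matrix or from the ranking of predicted scores. The sketch below shows one way to compute them in plain Python. The function names and the toy labels and scores are illustrative assumptions, not data from the thesis. "False alarms" correspond to false positives, which is what precision penalizes.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = disease, 0 = no disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): threshold the scores at 0.5 to get hard predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC here is quadratic in the number of samples, which is fine for a dataset of about a thousand records; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice in practice.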
dc.publisher	qou	en
dc.subject	heart disease prediction	en
dc.subject	hybrid predictive model	en
dc.subject	deep neural networks	en
dc.subject	machine learning classifiers	en
dc.subject	feature fusion	en
dc.subject	cardiovascular risk assessment	en
dc.subject	evaluation metrics	en
dc.title	A Hybrid Model for Predicting Heart Disease Using DNN and ML Algorithms	en
dc.type	Thesis	en
dc.description.arAbstract	تُعد أمراض القلب والأوعية الدموية من أبرز التحديات الصحية التي تواجه النظم الطبية في جميع أنحاء العالم، حيث تُسجل معدلات عالية من الوفيات المرتبطة بهذا المرض سنويًا. ويعود ذلك إلى حد كبير إلى صعوبة الكشف المبكر عن هذه الأمراض، إذ يعتمد التشخيص غالبًا على النتائج السريرية التي تظهر في مراحل متقدمة من المرض. في هذا السياق، تبرز الحاجة إلى توظيف تقنيات الذكاء الاصطناعي، وخاصةً التعلم الآلي والتعلم العميق، لتطوير نماذج تنبؤية دقيقة تُسهم في التحديد المبكر للمرضى المعرضين لخطر الإصابة بأمراض القلب. تهدف هذه الدراسة إلى تطوير نموذج هجين يجمع بين خوارزميات التعلم الآلي التقليدية - آلات المتجهات الداعمة (SVMs)، وغابات القرار العشوائية، وXGBoost، وLightGBM، والانحدار اللوجستي - مع الشبكات العصبية العميقة (DNNs)، والتي تُستخدم لاستخراج السمات التمثيلية المتقدمة من البيانات. تستخدم هذه الدراسة مجموعة بيانات طبية موثوقة تم الحصول عليها من Kaggle وتتكون من أربع قواعد بيانات: Cleveland, Hungary, Switzerland, and Long Beach. وتحتوي على 76 سمة، بما في ذلك السمة المتوقعة، ولكن جميع التجارب المنشورة تُفيد باستخدام مجموعة فرعية من 14 منها. يشير حقل "الهدف" إلى وجود مرض قلبي لدى المريض. وهو قيمة صحيحة 0 = لا مرض و1 = مرض. تُقدم هذه الدراسة منهجية مبتكرة متعددة المراحل للتنبؤ بأمراض القلب، مستفيدةً من تعزيز الميزات لتحسين الأداء على مجموعة بيانات متوسطة الحجم تضم 1025 سجلًا للمرضى. تبدأ العملية بالمعالجة المسبقة للبيانات السريرية، والتي تشمل 13 ميزة (مثل: العمر، الجنس، نوع ألم الصدر، ضغط الدم أثناء الراحة، نسبة الكوليسترول في الدم، نسبة السكر في الدم أثناء الصيام، نتائج تخطيط القلب أثناء الراحة، معدل ضربات القلب الأقصى، الذبحة الصدرية الناتجة عن التمرين، ذروة العمر، المنحدر، عدد الأوعية الدموية الرئيسية، thal)، يليها تدريب أولي لنماذج التعلم الآلي التقليدية (مثل: SVM، الغابة العشوائية، XGBoost، LightGBM، الانحدار اللوجستي) باستخدام هذه الميزات. بعد ذلك، تستخرج شبكة عصبية عميقة (DNN) 32 ميزة عالية المستوى من طبقتها قبل الأخيرة، ملتقطةً أنماطًا معقدة وغير خطية غير واضحة في البيانات الأصلية. 
تم دمج ميزات DNN هذه مع الميزات السريرية الـ 13 لتشكيل مجموعة مكونة من 45 ميزة، مما أدى إلى إثراء مساحة الإدخال بشكل كبير. تم استخدام مجموعة من مؤشرات التقييم الشاملة، بما في ذلك الدقة ومصفوفة الارتباك والدقة والتذكر ومعامل F1 والمساحة الواقعة أسفل منحنى ROC-AUC، لتوفير تقييم شامل للنماذج قبل وبعد دمج الميزات. حقق نموذج الغابة العشوائية أعلى أداء بين نماذج التصنيف على الميزات الأصلية، بمعدل دقة 97.80٪ ومعدل تذكر مرتفع 99.31٪ ودقة تنبؤية 96.64٪. أظهرت النتائج أيضًا تحسنًا كبيرًا في أداء جميع خوارزميات التصنيف بعد دمج الميزات الأصلية مع تلك المستخرجة بواسطة الشبكات العصبية العميقة (DNNs). أدى هذا المزيج إلى زيادة دقة التصنيف عبر جميع المؤشرات الرئيسية. حققت خوارزمية SVM أعلى قيمة AUC بنسبة 99.90٪، مما يدل على قدرتها العالية على التمييز بدقة بين الفئات. حققت خوارزميات الغابة العشوائية، وXGBoost، وLightGBM نتائج متطابقة في الدقة الإجمالية (99.63%) ومؤشرات أخرى. أظهرت النتائج أن دمج السمات المستخرجة من الشبكة العصبية العميقة مع السمات الأصلية أدى إلى تحسن كبير في دقة التنبؤ لجميع خوارزميات التعلم الآلي المستخدمة، مما يعكس فعالية النهج الهجين في تحسين الأداء التنبؤي، لا سيما في ضوء التحديات المرتبطة باختلال توازن الفئات وصغر حجم مجموعات البيانات. تؤكد هذه الدراسة أن دمج تقنيات التعلم الآلي والتعلم العميق يوفر مسارًا واعدًا لتطوير أدوات تشخيص ذكية قادرة على دعم اتخاذ القرارات الطبية، وتقليل معدلات الإنذارات الكاذبة، والمساهمة في تحسين فرص العلاج المبكر، لا سيما في البيئات الطبية محدودة الموارد.	en
dc.contributor.arAuthor	ثراء, سلامة
dc.contributor.arAuthor	أ.د. يوسف, أبو زر
dc.arTitle	نموذج هجين للتنبؤ بأمراض القلب باستخدام الشبكات العصبية العميقة وخوارزميات تعلم الالة	en