Document Details

Document Type : Thesis 
Document Title : A HYBRID APPROACH TO DEVELOP A MODEL FOR HANDLING MISSING DATA IN ELECTRONIC HEALTH RECORDS
Document Title (Arabic) : نهج مهجن لتطوير نموذج لمعالجة البيانات المفقودة في السجلات الصحية الالكترونية
Subject : Faculty of Computing and Information Technology 
Document Language : Arabic 
Abstract : Electronic Health Record (EHR) systems are crucial to the operation of medical practices. Physicians and decision-makers use them to track all aspects of patient care. Patients with chronic diseases usually have extensive records that store large amounts of varied data on their medical history. Missing values in EHR systems are therefore one of the most substantial obstacles facing researchers and medical staff. Machine learning (ML) has recently come to play an important role in many sectors beyond computing, improving the services provided and enhancing data quality, and ML techniques have accordingly been used to impute missing clinical data. However, the imputation methods used so far each address only particular issues. This thesis proposes a hybrid recursive imputation method for missing data in the EHRs of chronic disease patients, with the EHRs of diabetes patients as a case study. The method is referred to as C2I, which stands for Clustering, Imitation, Imputation. C2I was examined using a retrospective diabetes dataset from King Abdulaziz University Hospital and the variables most related to diabetes patients. The missingness patterns and percentages of the diabetes dataset were imitated in the complete records, and the imitated missing values were then imputed using C2I. The performance of C2I was assessed in three ways: (i) calculating imputation error measures, (ii) comparing its performance with the commonly used Multiple Imputation (MI) method, and (iii) applying C2I to the complete dataset. C2I uses both supervised and unsupervised ML methods. C2I outperformed MI, with error rates better by 19%. Furthermore, C2I decreased the dispersion in the diabetes dataset. The results indicate that a high percentage of missing values in a related variable causes a high error rate. Further studies that take the correlated variables into account will need to be undertaken with newer methods such as deep learning and neural networks.
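The evaluation idea described in the abstract (imitate the observed missingness in complete records, impute, then measure the error on the cells that were hidden) can be illustrated with a minimal sketch. This is not the C2I method: a plain column-mean imputer stands in for the imputation step, the data are synthetic, and all function and variable names are hypothetical choices made for illustration only.

```python
import numpy as np

def imitate_missingness(complete, incomplete, rng):
    """Copy missingness patterns from rows of `incomplete` onto rows of `complete`."""
    patterns = np.isnan(incomplete)                  # True where a value is missing
    picks = rng.integers(0, patterns.shape[0], size=complete.shape[0])
    mask = patterns[picks]                           # one sampled pattern per complete row
    masked = complete.astype(float)                  # astype returns a copy
    masked[mask] = np.nan                            # hide the selected cells
    return masked, mask

def mean_impute(data):
    """Placeholder imputer (NOT C2I): fill gaps with each column's observed mean."""
    filled = data.copy()
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    filled[rows, cols] = col_means[cols]
    return filled

def rmse_on_mask(truth, imputed, mask):
    """Root mean squared error, computed only on the cells that were hidden."""
    diff = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-ins for the complete and incomplete parts of a dataset.
rng = np.random.default_rng(0)
complete = rng.normal(loc=100.0, scale=15.0, size=(200, 3))
incomplete = rng.normal(loc=100.0, scale=15.0, size=(50, 3))
incomplete[rng.random(incomplete.shape) < 0.2] = np.nan   # assumed 20% missing rate

masked, mask = imitate_missingness(complete, incomplete, rng)
imputed = mean_impute(masked)
error = rmse_on_mask(complete, imputed, mask)
print(f"imitated missing rate: {mask.mean():.2f}, RMSE: {error:.2f}")
```

Because the hidden values are known, any imputer can be swapped in for `mean_impute` and compared on the same mask, which is the kind of comparison the abstract reports between C2I and MI.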
Supervisor : Dr. Arwa Abdullah Jamjoom 
Thesis Type : Master Thesis 
Publishing Year : 1442 AH (2020 AD)
Added Date : Monday, August 31, 2020 

Researchers

Researcher Name (Arabic) : مرام عبدالله باعطية
Researcher Name (English) : Baatya, Maram Abdullah
Researcher Type : Researcher
Dr Grade : Master

Files

File Name : 46712.pdf
Type : pdf
