Diagnosis of Retinal Disease Using Deep Learning Models

Deep learning approaches have proven useful in assisting physicians with decisions about cancer, heart disease, degenerative brain disorders, and eye disease. This work proposes a deep learning approach for diagnosing retinal disease from optical coherence tomography (OCT) images, identifying four states of the retina. Three convolutional neural network (CNN) models are built and their results compared with one another. The models are named 1FE1C, 2FE2C, and 3FE3C, in order of increasing design complexity. Each uses a deep CNN to learn a feature hierarchy from raw pixels up to the layers that classify retinal disease. On the test set, the classifier accuracy is 65.60% for the 1FE1C model, 86.81% for the 2FE2C model, 96.00% for the 3FE3C model, and 88.62% for the pre-trained VGG16 model. The third model (3FE3C) achieves the best accuracy, followed by the VGG16 model. This model also improves on the results of previous works and paves the way for applying state-of-the-art neural network technology to retinal disease diagnosis. The suggested strategy may contribute to the development of a tool for automatically identifying retinal disease.


I. INTRODUCTION
Around 250 million individuals worldwide suffer from moderate to severe visual impairment or blindness. Due to population growth and aging, the figure is anticipated to double by 2040, which puts substantial pressure on the healthcare system [1], [2], [3]. The retina is a layered tissue lining the inside surface of the eye. It converts incoming light into action potentials (neural signals) that are further processed in the brain's visual centres. The retina is distinctive in that its blood vessels can be observed directly and noninvasively in vivo [2]. Fundus fluorescein angiography (FFA) is an imaging technique that can reveal details about the retinal vasculature. These data assist ophthalmologists in better understanding fundus lesions, microangiomas, and capillary non-perfusion regions, which are essential for diagnosing and treating age-related macular degeneration (AMD) and pathologic myopia (PM) [3], [4]. Automatic classification of ophthalmological and cardiovascular disorders by retinal image processing has become a proven telemedicine procedure. Manual segmentation was used in the years under analysis, but it was tedious, time-consuming, inconvenient, labour-intensive, observer-dependent, and required technical expertise. Computer-aided detection of retinal anomalies, by contrast, is cost-effective, feasible, and objective, and does not require highly skilled clinicians to grade the images [4], [5], [13], [14], [15].

Al-Rafidain Journal of Computer Sciences and Mathematics (RJCM)
www.csmj.mosuljournals.com

Deep Learning has made AI extremely useful in ophthalmic diagnostics. To handle grid-structured data such as images, Deep Learning uses Convolutional Neural Networks (CNN). A Deep Learning system mimics the way humans recognize the visual characteristics that separate normal from abnormal groups [6]-[12]; more details about CNN design for medical image analysis can be found in [16]-[34]. Among the most important related works in this field, in 2020 [8] proposed a deep convolutional neural network (CNN) structure for diagnosis and classification into Normal, DMD, and DME. In the same year, [3] proposed a self-supervised feature learning method that exploits multi-modal data for retinal disease diagnosis, and [9] proposed a model based on a deep learning (DL) architecture consisting of a densely connected neural network (DenseNet) and a trainable end-to-end recurrent neural network (RNN). To obtain a lower-dimensional OCT representation, that method starts by sampling multiple 2D images from an OCT volume. In 2019, [10] proposed a deep learning approach for detecting retinal diseases from optical coherence tomography (OCT) images that can identify three conditions of the retina. In [11], color fundus photography, OCT, and OCT-A scans were acquired from seventy-five participants, who were separated into three cohorts: young healthy (YH), old healthy (OH), and patients with middle dry AMD. In [12], deep neural networks based on the pre-trained VGG16 network model were proposed. Before that, in 2018, [14] proposed a patch-based semi-supervised learning approach and assessed its performance on the classification of diabetic retinopathy from funduscopic images, while [15] used a U-Net architecture for vessel segmentation followed by a GoogLeNet for disease classification.
The previous works compared with this work are [3], [6], [8], [9], and [13]. The use of Deep Learning models that automatically learn the important features for a specific task, rather than relying on manually designed visual characteristics, has gradually improved the performance of automated medical image analysis. Providing insight into, and interpretation of, the model's predictions nevertheless remains a challenge [5,6,9-15, 29, and 31]. This work describes a Deep Learning model that is able to detect medically interpretable information in the relevant images of a volume and to classify diabetes-related retinal diseases using OCT images. The aim of this work is to detect and classify the retinal disease represented in an image. The system includes three CNN structures, tried with different optimizers and learning rates, and one pre-trained CNN (VGG16 [21]). The models are evaluated in terms of accuracy, loss function, and confusion matrix measurements. Finally, the trained model with the best accuracy is used in the suggested medical assistance tool for the diagnosis of retinal diseases, to ensure the quality of the diagnosis system.

II. PROPOSED METHOD
This section presents the proposed retinal disease detection and classification method, which is based on image processing and Deep Learning. Figure 1 shows the basic block diagram of the retinal disease detection and categorization system. The system architecture and the algorithms recommended for the various phases of the proposed approach are then described.

II. I. Dataset
Finding an appropriate dataset is a crucial part of algorithm design. In this case study, the dataset was collected from adult patients at the Shiley Eye Institute of the University of California San Diego, the California Retinal Research Foundation, Medical Center Ophthalmology Associates, and the Shanghai First People's Hospital. It comes from a retrospective study of OCT images captured at these four institutions between July 1, 2013, and March 1, 2017. The dataset contains 84,495 OCT images (JPEG), organized into train, test, and validation folders, each with a subfolder for every image category (NORMAL, CNV, DME, DRUSEN), as shown in Figure 2. OCT is an imaging technique used to collect cross-sectional images of the retinas of living patients. OCT scans are performed on about 30 million people each year, and the procedure takes time [35].
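The folder layout described above can be inspected with a few lines of Python. The following is a minimal sketch rather than the authors' code: the split folder names (`train`, `test`, `val`), the file extensions, and the function name `count_images` are assumptions made for illustration; only the four category names come from the text.

```python
from pathlib import Path

CLASSES = ("NORMAL", "CNV", "DME", "DRUSEN")   # category names from the text
SPLITS = ("train", "test", "val")              # assumed folder names
EXTENSIONS = {".jpeg", ".jpg"}                 # assumed file extensions


def count_images(root):
    """Return {split: {class_name: number_of_image_files}} for a dataset root."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for cls in CLASSES:
            folder = root / split / cls
            if folder.is_dir():
                # Count only files whose extension looks like a JPEG image.
                n = sum(1 for p in folder.iterdir()
                        if p.is_file() and p.suffix.lower() in EXTENSIONS)
            else:
                n = 0  # a missing folder counts as zero images
            counts[split][cls] = n
    return counts
```

Calling `count_images` on the dataset root gives the per-category totals, which makes any class imbalance visible before training.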

Figure 2. Representative (OCT) Images Of The Different Types Of Retinal Disease
In its present state, the dataset is imbalanced in terms of the number of samples available for each category. For this reason, 49,088 samples were used so that every category contributes the same number of samples, which reduces the training difficulties caused by class imbalance. OCT uses the principles of low-coherence interferometry to gather data from the various layers of the retina. Figure 3 shows a 3D volume with a cross-sectional area and a number of scans. When drawing medical conclusions, the physical examination findings must also be taken into consideration.
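One common way to obtain equal class sizes is random undersampling, sketched below. This is an illustration under stated assumptions: the text does not say how the 49,088 samples were selected, and the function name `balance_by_undersampling` and its parameters are hypothetical. If the 49,088 samples are spread evenly over the four categories, each category contributes 49,088 / 4 = 12,272 images.

```python
import random


def balance_by_undersampling(paths_by_class, per_class=None, seed=0):
    """Randomly keep the same number of samples from every class.

    paths_by_class: dict mapping a class name to a list of sample identifiers.
    per_class: samples to keep per class; defaults to the smallest class size.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    if per_class is None:
        per_class = min(len(paths) for paths in paths_by_class.values())
    balanced = {}
    for cls, paths in paths_by_class.items():
        if len(paths) < per_class:
            raise ValueError(f"class {cls!r} has fewer than {per_class} samples")
        # rng.sample draws without replacement, so no image is duplicated.
        balanced[cls] = rng.sample(list(paths), per_class)
    return balanced
```

Undersampling discards data from the larger categories; oversampling or class weighting are alternatives when the smallest category is very small.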

Figure 3. A Healthy Subject's EFI And (OCT) Volume Including Cross-Sectional B-scans

II. II. CNN Architectures
Three CNN models are proposed in this work for classifying retinal disease from OCT images. They are referred to as Model 1 (One Feature Extraction Layer and One Classifier Layer, 1FE1C), Model 2 (Two Feature Extraction Layers and Two Classifier Layers, 2FE2C), and Model 3 (Three Feature Extraction Layers and Three Classifier Layers, 3FE3C). Each design uses a deep CNN to build a feature hierarchy from the pixels up to the classifier layers. Figures 4, 5, and 6 illustrate the architectures used to detect the presence of retinal disease. All designs include an input layer and an output layer. The input layer is made up of N neurons, where N is the number of pixels in an input image. Because the original images vary in height and width, all images were downsampled to 128x128 pixels before being fed to the network, so N = 128*128 = 16384. Since the architectures are designed for a 4-class classification problem, all designs have a 4-neuron output layer. In addition, the pre-trained VGG16 network was set up under identical conditions and compared with the models presented in this work in terms of accuracy, speed, and complexity.
In terms of the number of computational layers, the first model (1FE1C) is less complex than the other two proposed models. It has only two layers, a feature extraction layer and a classification layer, as shown in Figure 4. The feature extraction layer is a single convolutional layer with just 32 feature maps, a ReLU activation function, and 3x3 max-pooling. The input images are grayscale with a resolution of 128x128 pixels. The classification layer has only 50 nodes with a ReLU activation function. The second model (2FE2C) is of medium complexity: more complex than the first but less complex than the third. This model emphasizes feature extraction by adding a second feature extraction layer. As indicated in Figure 5, the model has four layers: the first two for feature extraction and the final two for classification. The first feature extraction layer is a single convolutional layer with just 30 feature maps and a 3x3 filter, using a ReLU activation function and 2x2 max-pooling. The second feature extraction layer is a convolutional layer with 44 feature maps and a 3x3 filter, again using a ReLU activation function and 2x2 max-pooling. The input images are grayscale with a resolution of 128x128 pixels. Finally, the two fully connected layers comprise 128 and 50 neurons, respectively, with a ReLU activation function. In the third model (3FE3C), both the feature extraction stage and the classification stage were extended by adding one new layer to each.
It includes six layers: the first three for feature extraction and the final three for classification, as shown in Figure 6. The first feature extraction layer is a single convolutional layer with just 32 feature maps and a 3x3 filter, followed by 2x2 max-pooling. The second feature extraction layer is a convolutional layer with 64 feature maps, a ReLU activation function, and 2x2 max-pooling. The last feature extraction layer is a convolutional layer with 128 feature maps, a ReLU activation function, and 2x2 max-pooling. The input images are grayscale with a size of 128x128 pixels. Finally, the three fully connected layers comprise 128, 64, and 32 neurons with a ReLU activation function.
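To make the three designs easier to compare, the sketch below traces the feature-map size and counts the trainable parameters of each architecture. It is an illustration, not the authors' implementation, and it rests on assumptions that the text does not state: convolutions use "valid" padding with stride 1, pooling windows do not overlap, every convolution uses a 3x3 filter (the text gives the filter size only for some layers), and a 4-neuron output layer follows the listed fully connected layers. The names `trace_cnn` and `MODELS` are hypothetical.

```python
def trace_cnn(conv_blocks, dense_units, input_hw=(128, 128), channels=1, n_classes=4):
    """Trace output shapes and count parameters for a simple CNN.

    conv_blocks: list of (filters, kernel_size, pool_size) tuples.
    dense_units: sizes of the fully connected layers before the output layer.
    """
    h, w = input_hw
    c = channels
    params = 0
    for filters, kernel, pool in conv_blocks:
        params += kernel * kernel * c * filters + filters  # weights + biases
        h, w = h - kernel + 1, w - kernel + 1              # valid padding, stride 1
        h, w = h // pool, w // pool                        # non-overlapping max-pooling
        c = filters
    flattened = h * w * c
    fan_in = flattened
    for units in list(dense_units) + [n_classes]:
        params += fan_in * units + units                   # weights + biases
        fan_in = units
    return {"feature_map": (h, w, c), "flattened": flattened, "params": params}


# Layer sizes taken from the text; 3x3 kernels are assumed where not stated.
MODELS = {
    "1FE1C": ([(32, 3, 3)], [50]),
    "2FE2C": ([(30, 3, 2), (44, 3, 2)], [128, 50]),
    "3FE3C": ([(32, 3, 2), (64, 3, 2), (128, 3, 2)], [128, 64, 32]),
}

if __name__ == "__main__":
    for name, (conv_blocks, dense_units) in MODELS.items():
        print(name, trace_cnn(conv_blocks, dense_units))
```

Under these assumptions most parameters sit in the first fully connected layer, so the amount of pooling before flattening affects model size more than the number of convolutional layers does.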

II. III. CNN Train
Each of the three models goes through several phases, from data preparation through training, before reaching its final architecture. The suggested approach requires preprocessing of the original images to account for retinal layers that may have been rotated, deformed, or shifted vertically during acquisition, as in Table 1. The preparation of the training and test data is described below, and Table 2 contains all the training parameters used to train the models in this research. The models were tested in several configurations to find the best design. First, the data were split using four different training/testing ratios, for example 70% training and 30% testing. The Adam optimization algorithm ('Adam') was used with a learning rate of 0.001. All models were trained for 100 epochs, except the pre-trained model, which was trained for 10 epochs, and the batch size was 800. Two general metrics of network performance, accuracy and the loss function (MSE), were used to assess the networks. A confusion matrix was also used to assess network performance on each kind of retinal disease in the dataset, as detailed in section (2.1). Algorithm 1, which has 16 steps, illustrates the training procedure for the models.
... = (128, 128)) imgs, labels = next(batches)
Step 2: Divide the data into training and test sets according to the train rate.
Step 7: Set the number of epochs to 100 for the suggested models and 10 for the pre-trained model.
Step 9: Set the batch size to 800.
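The effect of the split ratio, batch size, and epoch count on the length of a training run can be worked out with simple arithmetic. The sketch below is a hedged illustration: it assumes that the 49,088 balanced samples are the ones being split, which the text does not state, and `training_schedule` is a hypothetical helper name.

```python
import math


def training_schedule(n_samples, train_rate, batch_size, epochs):
    """Return split sizes and the number of weight updates for a training run."""
    n_train = int(n_samples * train_rate)          # samples used for training
    n_test = n_samples - n_train                   # remaining samples for testing
    steps = math.ceil(n_train / batch_size)        # batches per epoch (last may be partial)
    return {
        "n_train": n_train,
        "n_test": n_test,
        "steps_per_epoch": steps,
        "total_updates": steps * epochs,
    }


if __name__ == "__main__":
    # 70/30 split, batch size 800, 100 epochs for the proposed models.
    print(training_schedule(49088, 0.70, 800, 100))
    # Same split and batch size, 10 epochs for the pre-trained model.
    print(training_schedule(49088, 0.70, 800, 10))
```

With a batch size of 800 each epoch consists of only a few dozen weight updates, which is one reason the proposed models are trained for many more epochs than the pre-trained network.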

III. RESULTS AND DISCUSSIONS
The process of preparing the training and test data is described in section (2.3), and Table 2 contains all the training parameters used to train the models in this work.
For the 1FE1C model, the experimental results show a training accuracy of up to 74.41% with a training loss of 0.6527. In testing, the accuracy was 65.48% and the loss was 1.616, which is a weak result compared with the other experimental findings. The loss and accuracy curves are shown in Figure 7.
For the 2FE2C model, the training accuracy was 83.91% with a training loss of 0.6527. In testing, the accuracy was 86.81% and the loss was 0.3629, which is considered acceptable compared with the other experimental results. Figure 8 illustrates the loss and accuracy curves.
The third model (3FE3C) gives better results than the other proposed models, as shown in Figure 8. Its training accuracy was 91.10% with a training loss of 0.2682, and in testing the accuracy was 96.00% and the loss was 0.1230, which is a good result compared with the preceding experiments.
The fourth model in this research is the VGG16 convolutional neural network. It was trained with the same parameters as in Table 2, except that the number of epochs was limited to ten. It achieved a training accuracy of 74.75% with a training loss of 0.6580, and a testing accuracy of 88.62% with a testing loss of 0.3664, which is considered satisfactory compared with the prior experimental findings. Figure 9 illustrates the loss and accuracy curves. Table 3 shows that the 3FE3C model was superior to the other models in terms of both accuracy and loss.
To clarify the results, the models were also compared in terms of training time, as shown in Figure 10, and in terms of loss and accuracy, as shown in Figure 11. Finally, the confusion matrix was used to assess the CNN's performance in identifying each kind of retinal disease separately. Figure 12 displays the confusion matrix for the experiment on the 3FE3C model, whose overall accuracy is 96%. The confusion matrix reveals that the CNN with three convolutional layers and three classifier layers had slight trouble categorizing DRUSEN samples and misclassified some of them as other retinal diseases. DME samples produced the largest number of misclassifications, with only 89% of them classified correctly. The model, on the other hand, classified the Normal and CNV categories correctly 100% of the time. To place the results of the 3FE3C model in the context of previous work, it was compared with other models that used the Retinal Diseases dataset [35], as illustrated in Table 4.
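Per-class figures such as those quoted above are read off the rows of a confusion matrix. The following minimal sketch shows the computation; the function names are hypothetical, and the labels in the usage example are made-up toy values, not the results reported in Figure 12.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    labels = ["NORMAL", "CNV", "DME", "DRUSEN"]
    # Toy labels for illustration only; these are not the paper's data.
    y_true = ["NORMAL", "NORMAL", "CNV", "CNV", "DME", "DME", "DRUSEN", "DRUSEN"]
    y_pred = ["NORMAL", "NORMAL", "CNV", "CNV", "DME", "CNV", "DRUSEN", "DRUSEN"]
    m = confusion_matrix(y_true, y_pred, labels)
    print(per_class_recall(m), overall_accuracy(m))
```

When all classes have the same number of test samples, the overall accuracy equals the mean of the per-class values, which gives a quick consistency check between a confusion matrix and a reported accuracy.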

IV. CONCLUSION
The suggested Deep Learning architecture was used to classify OCT images and achieved acceptable results. The following conclusions were drawn from the experiments. OCT systems need novel models to represent and extract the characteristics that assist the prognosis, diagnosis, and follow-up of ocular disease. Making use of multimodal information such as clinical reports, physiological data, and other medical images therefore remains a significant challenge. The proposed Model 3 (3FE3C) offers an improvement over previous works and paves the way for applying state-of-the-art CNN techniques to retinal diseases, as shown in Table 4. The suggested network outperformed the pre-trained model considered here, and it can be used to assist the diagnosis of retinal disease. However, its training time was longer than that of the other models.
Future work in this area will involve processing larger datasets and identifying new approaches to properly summarize the high-dimensional OCT volumes. Together, end-to-end DL methods and attribution visualization tools may help uncover new predictive imaging patterns, which may require additional clinical study. In addition, no further image processing methods were employed in this work. Applying such methods might improve the quality and performance of the system. These are problems that we will address in future efforts.