The Discrimination of Red Blood Cells Infected by Hereditary Hemolytic Anemia

This paper presents a medical application based on digital image processing and Artificial Neural Networks (ANNs) that recognizes three types of Hereditary Hemolytic Anemia (HHA) that affect Red Blood Cells (RBCs) and change their shape. Three Feed Forward Back Propagation Learning (FFBBL) neural networks are used in a hierarchical approach to achieve this goal. The essence of this research is to segment each red blood cell into a separate image and then extract a set of features from each image to present to the neural networks. The networks, in turn, decide whether the RBC is infected or not. The results showed a recognition rate of 92.38%.


Introduction
Many complex real-world problems, such as medical, commercial, industrial, and agricultural problems, can be solved through Artificial Intelligence (AI). One of the largest areas of AI is the ANN, which is widely used to solve complex problems in pattern recognition, data mining, data security, time series prediction, and many other fields. This paper proposes a preprocessing stage that segments each RBC into a separate image and then presents each segmented RBC to three ANNs arranged in a hierarchical way. The ANNs determine whether the RBC is infected or not and, if so, the type of infection.
The ANN is a computational simulation of a biological neural network. ANNs successfully simulate functions of the human brain in terms of learning, classification, optimization, prediction, clustering, and generalization [1]. An ANN consists of multiple, highly connected neurons that are arranged in single or multiple layers, depending on the type of network. These neurons connect with each other via weights; the type of application determines these connections and their weights [2].
Three feed forward back propagation learning neural networks arranged in a hierarchical way are implemented to produce a system that is able to recognize three types of hemolytic anemia that affect RBCs and change their shape.

Related Work
In the medical field, there are many related works that use medical image processing to discriminate hematological diseases.
In 2008, Basim Alhadidi and Hussam Nawwaf Fakhouri proposed a system for calculating blue and red cells in iron deficiency anemia. They implemented an algorithm that automates the analysis of images taken of intestinal villi. The algorithm counts the blue-stained and red-stained cells that contain iron in each villus separately, and also calculates the percentage of blue cells and red cells in the image [3].
In 2009, Hirimutugoda and Wijayarathna presented research on using artificial intelligence to determine hematologic diseases, namely malaria, thalassemia, and possibly other red cell abnormalities; they reached 86.54% successful recognition [4].
Also in 2009, Makkapati and Rao presented a scheme based on the HSV color space to segment RBCs and malaria parasites by detecting the dominant hue range and calculating optimal saturation thresholds; they reached 83.54% sensitivity and 98% specificity [5].
Furthermore, in 2011 Kondo and Ueno presented a medical image diagnosis system for lung cancer detection based on a revised GMDH-type neural network that uses heuristic self-organization [6].

Background
Anemia can be defined as a low hemoglobin concentration, and it appears in several types. One of these types is Hereditary Hemolytic Anemia, which occurs when the body destroys abnormal RBCs faster than the bone marrow can create new normal RBCs [7]. Hereditary Hemolytic Anemia occurs as a result of one of three causes [8]:
• Enzymatic Defect: It occurs when one of the enzymes inside the RBC is defective. It results in two hemolytic anemias: Pyruvate Kinase deficiency anemia and Glucose-6-Phosphate Dehydrogenase (G6PD) deficiency anemia. The former does not change the outer shape of the RBC; it requires biological tests, and the infection does not appear in images of blood smears, so it is outside the scope of this research. The latter results from a deficiency of the cellular enzyme G6PD. In this case, when free radicals and oxygen enter the RBC, they oxidize the DNA of the RBC and cause the formation of bodies that stick to the wall of the RBC and then cause the RBC membrane to break at that side [9] (see Fig. 1a and Fig. 1b).
• Membrane Defect: It occurs when one of the membrane proteins is defective. It results in two hemolytic anemias: Spherocytosis anemia and Elliptocytosis anemia. The former changes the color of the RBC, not its shape, so it is outside the scope of this research. The latter, usually called elliptocytosis or oval cell anemia, changes the shape of the RBC to an ellipse [9] (see Fig. 1c and Fig. 1d).
• Hemoglobin Defect: It occurs when the hemoglobin is defective. It results in two hemolytic anemias: Thalassemia and Sickle Cell Anemia. The former is divided into four types, none of which changes the shape of the RBC. The latter results from the substitution of a single amino acid in the globin chain of the RBC's hemoglobin with another, abnormal one. When RBCs of this type are exposed to a decrease in oxygen, they undergo sickling. The sickle shape makes the RBC fragile and sticky, which leads to decomposition of the cell [9] (see Fig. 1e).
This paper proposes discriminating three types of hereditary hemolytic anemia: G6PD deficiency anemia, hereditary elliptocytosis anemia, and Sickle cell anemia.

Proposed Scheme
Each RBC is segmented into a separate image; then a set of features is extracted from each image and presented to three Feed Forward Back Propagation Learning neural networks arranged in a hierarchical approach. The preprocessing for segmenting each RBC is presented first, followed by the proposed neural networks.

Pre-Processing:
Although it differs from one study to another, every pattern recognition study must contain this step. The pre-processing in this research consists of six steps, where the output of each step is the input to the next step. These steps are:
1. Converting the input image from the Red Green Blue (RGB) color space to the YCbCr color space. The new color space has three components, Y, Cb, and Cr, which are defined in [10]. The image is converted to this color space to take advantage of the luminance component Y, which is the input to the second step.
2. Applying the k-means clustering algorithm to the input image (i.e. the Y component) with two cluster centers. This makes sense because two centers place the RBCs, White Blood Cells (WBCs), platelets, and artifacts in one class and the background in a second class (see Fig. 2). The output of this step (i.e. the k-means result) is the input to the third step [7].
3. Applying the Canny edge detector [11] to the gray image obtained by converting the colored k-means result (Fig. 2b) to gray. The edge detector retains only the edges of the objects inside the image. The result is a black and white (BW) image that contains the edges of the RBCs, WBCs, platelets, and artifacts.
4. This step performs two operations. The first eliminates the clipped cells located on the perimeter of the image, since these cells are sometimes hard to analyze even for a pathological analyzer; this is done by eliminating each open object. The second fills all the enclosed objects inside the image [12], that is, the RBCs, WBCs, platelets, and artifacts (see Fig. 3a). The Canny edge detector is then applied to the filled image. This step is necessary because a normal RBC may contain a hole in the clustered image (see Fig. 2b); this hole is eliminated in the filled image (see Fig. 3a). It is worth noting that only the RBC objects are needed, so the other elements are eliminated in the next step.
5. Ignoring every enclosed object that has 150 or fewer boundary pixels eliminates the platelets and the artifacts, while ignoring every enclosed object that has 1000 or more boundary pixels eliminates the WBCs, since these objects are enclosed but are not RBCs. The result of this step is the segmented black and white RBCs, which are presented to the last step of pre-processing (see Fig. 4).
6. Although most of the segmented RBCs are now ready to be presented to the neural network, this step is very important because the segmented rectangle of an RBC sometimes contains noise from neighboring cells, which occurs when parts of a neighboring cell fall inside the segmented RBC rectangle. The noise is circled in red in Fig. 5a and Fig. 5b. This step eliminates that noise by deleting every object that has fewer than 100 pixels inside the segmented RBC image (see Fig. 5c). After the noise is deleted from all the segmented RBC images, they are ready to be presented to the ANN.
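The luminance conversion, the two-center clustering, and the boundary-length filter of the pre-processing can be illustrated with a minimal sketch. It is not the authors' implementation. The BT.601 luma weights are an assumption (the coefficients of [10] are not reproduced in this text), as is the idea that cells are darker than the background. The function names `rgb_to_luma`, `two_means`, and `keep_rbc_sized` are hypothetical.

```python
import numpy as np

def rgb_to_luma(rgb):
    """Y component of an (H, W, 3) RGB image, using BT.601 weights (an assumption)."""
    rgb = np.asarray(rgb, dtype=float)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def two_means(values, iters=50):
    """k-means with two centers on intensities.

    Returns a boolean mask that is True for the cluster with the lower
    (darker) center, assumed here to hold the cells and other objects.
    """
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()  # initial centers: darkest and brightest value
    for _ in range(iters):
        dark = np.abs(v - lo) <= np.abs(v - hi)
        if dark.all() or (~dark).all():
            break  # only one cluster is populated; nothing left to update
        new_lo, new_hi = v[dark].mean(), v[~dark].mean()
        if new_lo == lo and new_hi == hi:
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return dark

def keep_rbc_sized(boundary_lengths, low=150, high=1000):
    """Indices of objects with more than `low` and fewer than `high` boundary pixels."""
    return [i for i, n in enumerate(boundary_lengths) if low < n < high]
```

In this sketch an object with exactly 150 or exactly 1000 boundary pixels is discarded, which matches the thresholds stated in step 5.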

Artificial Neural Networks:
An ANN is a set of simple units called Processing Elements (PEs). These processing elements are arranged in layers and connected to other processing elements through weights to form the neural network. The processing element itself performs a simple computation: it computes the weighted sum of its inputs and then applies an activation function to produce the output, which in turn is passed to the next layer [13].
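As an illustration of a single processing element, the following minimal sketch computes a weighted sum and passes it through an activation function. The logistic (sigmoid) activation and the bias term are assumptions; the text does not state which activation function the networks use.

```python
import math

def processing_element(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a logistic activation (assumed)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))
```

The output lies strictly between 0 and 1 and would be passed on as an input to the processing elements of the next layer.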
There are two types of ANN: supervised and unsupervised. This paper proposes using three supervised ANNs. In a supervised ANN, the input is presented to the input layer, and the weights are adjusted depending on a comparison of the network output and the target (see Fig. 6) until the network output matches the target [14]. The essence of this work is to extract descriptors from each cell and present them to the ANN. As mentioned previously, three types of hereditary hemolytic anemia are discriminated by hierarchical neural networks. It might be thought that the networks need to be trained on only four classes: G6PD deficiency anemia, Elliptocytosis anemia, Sickle cell anemia, and the normal (not infected) RBC. In practice, however, there are some complications. For example, a cell infected with G6PD deficiency may appear in two shapes: in the first, a large part (perhaps half) of the RBC is cut off (see Fig. 1a or Fig. 2a), while in the second, a small part of the RBC is cut off (see Fig. 1b or Fig. 2b).
Likewise, an RBC infected by Elliptocytosis anemia may appear in two shapes: the first is a full ellipse (see Fig. 1c or Fig. 2c), while the second is an ellipse that is thinner at one of its ends (see Fig. 1d or Fig. 2d).
Each of the two forms of G6PD deficiency is handled by a separate neural network, while the two forms of elliptocytosis anemia are treated as separate types during training, i.e. they are given different output codes. The codes are then merged when the infection result is reported.
One more complication occurs when two cells appear attached in the image (see Fig. 7). Such cells are regarded as normal. Although one of these cells may be infected, this possibility is low.

Figure (7). Connected cells appear in the samples
As mentioned previously, three feed forward back propagation learning neural networks are used in a hierarchical way (see Fig. 8) to achieve the recognition. Each segmented RBC is presented to the first neural network (NET 1) to check whether the RBC has a circular shape or not. Depending on the result of NET 1, the segmented RBC is sent either to the second neural network (NET 2) if it is circular, or to the third neural network (NET 3) if it is not circular.
For a better understanding of the networks and their functions, each network is presented separately:
1. NET 1: This network is trained on three features:
• Compactness: The compactness of a 2D shape relates its perimeter to its area. It is equal to (perimeter)^2/area [15].
• Eccentricity: To find the eccentricity of an object, two lines must be drawn: the first connects the two farthest points on the border of the object, and the second is perpendicular to the first and also connects the farthest points on the border of the object [16]. These lines can be used as descriptors. The eccentricity is defined as the ratio of the longest line to the shortest one. In Fig. 9 the eccentricity is the ratio between the lines A and B, i.e. Eccentricity = A/B.

Figure (9). Eccentricity
• Elongatedness: For each object a bounding rectangle can be drawn. The minimum bounding rectangle (the dotted rectangle in Fig. 10) can be used as a descriptor. The elongatedness is the ratio of the longest side of the minimum bounding rectangle to the shortest one [13]. In Fig. 10 the elongatedness is the ratio between A and B, i.e. Elongatedness = A/B.
The network NET 1 has three layers; its topology is illustrated in Fig. 11. The input layer has three neurons, the hidden layer has five neurons, and the output layer has a single neuron. Training takes 1 second and reaches stability after 112 epochs, when the mean squared error was 1*10E-6 (see Fig. 12).
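The three shape descriptors (compactness, eccentricity, and elongatedness) can be sketched as plain ratios. The sketch assumes that the perimeter, the area, the two perpendicular lines, and the sides of the minimum bounding rectangle have already been measured on the segmented cell. The function names and the ordering of the feature vector are hypothetical.

```python
import math

def compactness(perimeter, area):
    """Squared perimeter divided by area."""
    return perimeter ** 2 / area

def eccentricity(line_a, line_b):
    """Ratio of the longer of the two perpendicular lines to the shorter one."""
    return max(line_a, line_b) / min(line_a, line_b)

def elongatedness(side_a, side_b):
    """Ratio of the longer side of the minimum bounding rectangle to the shorter one."""
    return max(side_a, side_b) / min(side_a, side_b)

def net1_features(perimeter, area, line_a, line_b, side_a, side_b):
    """Three-element feature vector for a network with three input neurons."""
    return [
        compactness(perimeter, area),
        eccentricity(line_a, line_b),
        elongatedness(side_a, side_b),
    ]
```

For an ideal circle of radius r the compactness is (2πr)²/(πr²) = 4π, and both ratios equal 1, so circular cells cluster near the point (4π, 1, 1) while elongated cells move away from it along all three axes.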
Each segmented RBC is presented to NET 1: the three features are calculated for the RBC and then presented to the network to determine whether the RBC has a circular shape. If NET 1 responds with output 1, i.e. NET 1 decides that the RBC has a circular shape, then the RBC is either infected by the form of G6PD deficiency anemia that cuts off a small part of the RBC (Fig. 1b), or it is a normal RBC. It is noteworthy that in both cases the RBC has a circular shape. The RBC is then sent to NET 2 to determine the correct type.
2. NET 2: Before the signature is used, it is made scale invariant by dividing all of its values by its maximum value. This places all the values of the signature in the interval (0,1] [13]. Finally, to reduce the dimensionality of the scale-invariant signature, only 57 points of it are taken as a descriptor, starting at theta = 1, 7, 13, ..., 343. The network NET 2 has four layers; its topology is illustrated in Fig. 14. The input layer has 57 neurons, the first hidden layer has 57 neurons, the second hidden layer has 30 neurons, and the output layer has a single neuron. It takes 96 epochs to reach stability after training for 16 minutes and 57 seconds, when the mean squared error was 1*10E-6 (see Fig. 15). If the RBC is presented to this network and the network responds with 1 at the output layer, then the presented RBC is infected by the form of G6PD deficiency that cuts off a small part of the RBC; otherwise the RBC is normal.
3. NET 3: This network is trained on one feature, the signature. The network NET 3 has four layers; its topology is illustrated in Fig. 16. The input layer has 57 neurons, the first hidden layer has 57 neurons, the second hidden layer has 30 neurons, and the output layer has 6 neurons. It takes 120 epochs to reach stability after training for 24 minutes, when the mean squared error was 1*10E-5 (see Fig. 17).
[Figure: Signature (Ө); labels Ө and (Xc, Yc)]
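The signature descriptor can be illustrated with a minimal sketch. The definition of the signature is not reproduced in this text, so the sketch assumes the common one: the distance from a center point (Xc, Yc) to the boundary as a function of the angle theta. Taking the center as the mean of the boundary points, using one value per degree, and picking the nearest boundary point for each angle are all assumptions. The normalization by the maximum and the 57-point sampling every 6 degrees starting at theta = 1 follow the description above.

```python
import numpy as np

def signature(boundary_xy, n_angles=360):
    """Distance from the center to the boundary, one value per angle step.

    boundary_xy: array of shape (N, 2) with the (x, y) boundary pixels.
    The center is taken as the mean of the boundary points (an assumption).
    """
    pts = np.asarray(boundary_xy, dtype=float)
    xc, yc = pts.mean(axis=0)
    dx, dy = pts[:, 0] - xc, pts[:, 1] - yc
    dist = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0
    thetas = np.arange(n_angles) * (360.0 / n_angles)
    # Circular angular distance between every requested angle and every point.
    diff = np.abs(thetas[:, None] - ang[None, :])
    diff = np.minimum(diff, 360.0 - diff)
    # For each angle, use the boundary point whose direction is closest.
    return dist[diff.argmin(axis=1)]

def scale_invariant(sig):
    """Divide by the maximum so that the largest value becomes 1."""
    sig = np.asarray(sig, dtype=float)
    return sig / sig.max()

def subsample(sig, start=1, step=6, count=57):
    """Take `count` samples, every `step` degrees, beginning at `start`."""
    return sig[start:start + step * count:step]
```

Because every value is divided by the maximum, the same cell imaged at two magnifications yields the same 57-element descriptor, which is what allows one network to handle cells of different sizes.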

Figure (16). NET 3 Topology
When the RBC is presented to NET 1 and NET 1 responds with 0 at the output layer, the RBC does not have a circular shape. In this case, the RBC is sent to NET 3, which in turn checks the type of the RBC: Sickle cell anemia, the form of G6PD anemia that cuts off a large part of the RBC, one of the two forms of Elliptocytosis anemia (Fig. 1c and Fig. 1d), or two connected cells (see Fig. 7). Again, the last type is regarded as normal; the other four types are infections.
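The routing between the three networks can be sketched as follows. The networks are represented by arbitrary callables that stand in for the trained models. The label strings returned by `net3` are hypothetical, since the output codes of NET 3 are not listed in the text. The sketch only illustrates the decision flow, including the merging of the two elliptocytosis forms and the treatment of connected cells as normal.

```python
# Hypothetical NET 3 labels mapped to the reported result.
NET3_TO_RESULT = {
    "sickle": "Sickle cell anemia",
    "g6pd_large_cut": "G6PD deficiency anemia",
    "elliptocytosis_full": "Elliptocytosis anemia",
    "elliptocytosis_thin_end": "Elliptocytosis anemia",
    "connected_cells": "normal",
}

def classify_rbc(net1, net2, net3, shape_features, signature_points):
    """Route one segmented RBC through the hierarchy and return the result.

    net1 returns 1 for a circular cell and 0 otherwise; net2 returns 1 for
    the small-cut form of G6PD deficiency; net3 returns one of the keys of
    NET3_TO_RESULT.
    """
    if net1(shape_features) == 1:
        # Circular cell: either the small-cut G6PD form or a normal RBC.
        if net2(signature_points) == 1:
            return "G6PD deficiency anemia"
        return "normal"
    # Non-circular cell: NET 3 decides among the remaining types.
    return NET3_TO_RESULT[net3(signature_points)]
```

A call such as `classify_rbc(net1, net2, net3, features, points)` evaluates at most two networks per cell, because NET 2 and NET 3 are never both consulted.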

Results
The data set used in this paper consists of 407 cells taken from ten images, and it is presented as shown in the following table:
For a better understanding of the networks' results, the results of each network are discussed separately, followed by the results of the whole structure:
1. NET 1: NET 1 was trained on 90 cells, i.e. 22.11% of the data, and tested on all 407 cells. The training database contains 40 non-circular cells and 50 circular ones. Fig. 18, Fig. 19, and Fig. 20 show the compactness, eccentricity, and elongatedness training data, respectively.
The final results show that 92.38% of the cells were recognized correctly.
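The reported percentages can be checked with a few lines of arithmetic. The count of correctly recognized cells is not stated in the text; the value computed below is an inference that assumes the 92.38% rate was measured over all 407 cells.

```python
total_cells = 407
training_cells = 40 + 50  # non-circular + circular training cells for NET 1

# Share of the data set used to train NET 1.
training_share = 100 * training_cells / total_cells
print(f"{training_share:.2f}")  # 22.11

# Inferred number of correctly recognized cells (assumption: rate over all cells).
correct_cells = round(0.9238 * total_cells)
print(correct_cells, f"{100 * correct_cells / total_cells:.2f}")  # 376 92.38
```

Under that assumption, 376 of the 407 cells would have been recognized correctly and 31 misclassified.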