Prediction of COVID-19 Effect on Patients during Six Month After Recovery, by Using AI Algorithm

Abstract


INTRODUCTION
The Coronavirus (COVID-19) outbreak in late 2019 created a significant global problem [1]. Within a few months of the start of its spread, the WHO declared the outbreak a pandemic because of its severity. The spread of the virus has shocked the global economy by severely disrupting many industries and public functions, including the supply chain, payments, public transportation, and the financial system [2]. Artificial intelligence (AI) mimics human intelligence. Automatic driving, fraud detection, robotics, computer vision, and internet advertising all use AI. With its effectiveness in diagnosis, care, patient monitoring, medication research, epidemiology, and related areas, AI may become a vibrant field of study for addressing humanity's concerns [3].
Coronavirus infection can lead to long-term effects, although these appear in only some survivors, which suggests the need to study and follow up on the long-term effects that may arise in those recovering from COVID-19 [4]. Determining the prevalence of these long-term outcomes is necessary to facilitate timely preparations for the management of survivors.

Related Works
Claudia et al. [5] described clinical progression and predicted symptom persistence during a 2-month follow-up of adults with non-critical COVID-19. 150 patients were followed up at Tours Hospital, France, from 17/3 to 3/4. Hong et al. [6] studied the long-term pulmonary function and related physiological characteristics of COVID-19 survivors, who were recruited to undergo chest and lung HRCT and IgG antibody tests 3 months after discharge. Huang et al. [7] described the long-term health consequences of hospitalized patients with COVID-19 and their associated risk factors. The study covered COVID-19 patients discharged from Jin Yin-tan Hospital (Wuhan) between 7/1 and 29/5; all patients were interviewed through a series of questionnaires to assess symptoms and health-related quality of life, and they underwent physical examinations, a 6-minute walking test, blood and lung examinations, and a high-resolution CT scan of the chest.

New AI concepts, applications, and technologies emerge every day, and AI has been useful in identifying COVID-19 outbreaks, diagnosing patients, and disinfecting areas [8]. "Prediction" refers to statistical and probabilistic estimates derived from past observations. In recent years, AI algorithms have been used to predict disease spread [9], inventory valuation [10], weather [11], and sales [12]. Prediction methods are also useful in healthcare. Sarkar et al. in [13] showed in [16] how they used general machine learning algorithms to build a model that predicts the severity of the condition and the likely outcome, using RF with the AdaBoost algorithm and data from COVID-19 patients including their health, travel, and demographics. The model was evaluated using the F1 score, accuracy, precision, and recall, and was found to be 94% accurate; the data also showed a link between the sex of the patients and their deaths.
This paper aims to proactively identify the effects that can occur in people who have recovered from COVID-19, in order to support them during the recovery phase, help restore their health, and detect effects early so that they can be followed up quickly. This includes warning infected individuals and medical staff of the risk of persistent symptoms, regardless of the severity of the illness, and performing pulmonary rehabilitation for patients with chronic respiratory disorders in order to increase their exercise capacity and relieve shortness of breath. Community Respiratory Teams will play a key role in the early and long-term care of discharged patients by identifying recovery needs, controlling breathing, and examining physical health and physical activity. This goal was pursued with a proposed model that uses intelligent algorithms to predict the effects that may appear in people recovering from COVID-19, with the accuracy of the prediction tested using different measures.

Data Acquisition
Because clinical information and laboratory data for following up on patients after recovery were lacking, a Google-hosted questionnaire form was developed and distributed via social media platforms including WhatsApp, Messenger, Facebook, and others. The form was prepared on the basis of published medical studies, and a specialist doctor was consulted to confirm that the questions were useful (Juli Evangelou Strait, 2021; World Health Organization, 2020). The questionnaire asks about the person's age and gender; chronic illnesses such as diabetes, asthma, kidney disease, and other conditions; whether or not they smoke; and the person's health at the time of infection, including the date of infection, the length of the infection, oxygen saturation, whether admission to an intensive care unit was needed, and the symptoms of the infection. The general framework of the proposed model is shown in Figure 1.

Data pre-processing
The collected data were prepared for the proposed model through the following steps:

A. Data Cleaning
Responses covering the follow-up period, the first six months after infection, were collected from 457 people. The data were cleaned by discarding responses that lacked age and responses submitted less than a month after the infection. Some of the responses were processed with the aid of a medical practitioner.

B. Data Aggregation
To infer the effect of the coronavirus on each organ or system in the human body separately, we aggregate the answers related to a particular organ or system into a single answer placed under the title of that organ or system, as follows: Gastrointestinal system: constipation, diarrhea and acid reflux.

The feature values are then rescaled with min-max normalization, $x' = (x - x_{min}) / (x_{max} - x_{min})$, where $x_{max}$ denotes the feature's greatest value and $x_{min}$ denotes its smallest value. To do this, the MinMaxScaler function was used.
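The rescaling between a feature's smallest and greatest value can be illustrated in plain Python. The study reports using the MinMaxScaler function; the sketch below writes out the same arithmetic by hand rather than calling it. The function name `min_max_scale`, the handling of constant features, and the example ages are assumptions made for illustration only.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: there is no spread to rescale (assumed handling),
        # so every value is mapped to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical example values, not taken from the study's data.
ages = [20, 35, 50, 80]
scaled_ages = min_max_scale(ages)
print(scaled_ages)  # [0.0, 0.25, 0.5, 1.0]
```

After scaling, every feature lies in the same range, so features with large raw values (such as age) do not dominate features with small ones.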

Solve Multi-label Classification Problem
After pre-processing the data of people who recovered from COVID-19 (covering the 6 months after recovery), there are 11 effects (outputs) for each person (record), which makes this a multi-label classification problem. To solve it, we transform the problem into a sequence of single-label problems using the classifier chains method, because it preserves the correlation between labels [20]. The process is repeated 11 times: each time a target output is produced, it is added as an additional input feature for the next step, and so on until the process is complete. The last step uses the original features together with the 10 outputs from the previous steps (previous outputs + features) to produce the final prediction.
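The chaining mechanism, where each output becomes an extra input for the next classifier, can be sketched as follows. This is a minimal illustration and not the authors' implementation: `SimpleClassifierChain` and `MajorityBaseline` are hypothetical names, the base learner is a trivial placeholder standing in for a real classifier such as RF, and the example uses 3 labels rather than 11.

```python
from collections import Counter


class MajorityBaseline:
    """Placeholder base learner: always predicts the most common training label.
    A real model (for example a random forest) would be plugged in instead."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class SimpleClassifierChain:
    """One classifier per label; classifier k receives the original features
    plus the values of labels 0..k-1 as additional inputs."""

    def __init__(self, make_learner, n_labels):
        self.make_learner = make_learner
        self.n_labels = n_labels

    def fit(self, X, Y):
        self.learners_ = []
        self.input_widths_ = []  # number of inputs seen by each link
        X_aug = [list(row) for row in X]
        for k in range(self.n_labels):
            y_k = [labels[k] for labels in Y]
            self.input_widths_.append(len(X_aug[0]))
            self.learners_.append(self.make_learner().fit(X_aug, y_k))
            # During training, the true label is appended for later links.
            X_aug = [row + [y] for row, y in zip(X_aug, y_k)]
        return self

    def predict(self, X):
        n_features = len(X[0])
        X_aug = [list(row) for row in X]
        for learner in self.learners_:
            preds = learner.predict(X_aug)
            # During prediction, the predicted label is appended instead.
            X_aug = [row + [p] for row, p in zip(X_aug, preds)]
        return [row[n_features:] for row in X_aug]


# Hypothetical toy data: 3 records, 2 features, 3 labels.
X = [[0.1, 1.0], [0.4, 0.0], [0.9, 1.0]]
Y = [[1, 0, 1], [1, 0, 0], [0, 0, 1]]
chain = SimpleClassifierChain(MajorityBaseline, n_labels=3).fit(X, Y)
print(chain.input_widths_)          # [2, 3, 4]
print(chain.predict([[0.5, 0.5]]))  # [[1, 0, 1]]
```

The growing input widths show the design choice: later classifiers can exploit dependencies between effects because earlier outputs are visible to them.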

Data Splitting
The data generated by the previous processes is randomly separated into two groups, the training group (which contains 80% of the data) and the testing group (20% of the data).
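A random 80/20 split of this kind can be sketched in a few lines. The function name `train_test_split_80_20`, the fixed seed, and the toy records are assumptions for illustration; the source does not state how the random split was implemented.

```python
import random


def train_test_split_80_20(records, seed=0):
    """Shuffle a copy of the records and split it into 80% / 20% parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic for the 80% boundary
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: ten placeholder IDs.
records = list(range(10))
train, test = train_test_split_80_20(records)
print(len(train), len(test))  # 8 2
```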

Resampling Training Data
In an unbalanced data set, one class has considerably more samples than another. Classifiers may perform well on the majority class but badly on the minority class because the majority class has more influence on training. Unbalanced data sets are often resampled to produce a more balanced class distribution. Random undersampling and random oversampling are two resampling methods.
Random undersampling removes majority-class samples to balance the data set. Random oversampling balances the data set by duplicating minority-class examples [21]. Because our data were unbalanced, we balanced them by either undersampling or oversampling in order to improve the accuracy of the model and reduce its bias [22]. Figure 3 depicts the data after resampling techniques were applied for the Gastrointestinal system output.
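Both resampling strategies can be sketched in plain Python. The function names, the fixed seed, and the toy data are assumptions for illustration; the source does not say which implementation was used.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical unbalanced training data: four negatives, one positive.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 4 samples
print(Counter(y_under))  # both classes now have 1 sample
```

Resampling is applied to the training portion only, so the test data keep their original class distribution.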

Features Selection (FS) Operation
Algorithms based on swarm intelligence are a natural choice for improving feature subset selection in the wrapper methodology. Wrapper models evaluate the quality of the features using a predetermined ML approach, so the FS process avoids the algorithm's representational biases [23]. We used the Glowworm Swarm Optimization (GSO) approach, a swarm optimization method based on the Ant Colony Optimization (ACO) family of algorithms, originally created by Krishnanand et al. [24]. The GSO algorithm was adapted by changing the fitness function to a classification function so that it can be used for feature selection, which speeds up training and improves accuracy. Here, GSO chooses the feature subset. The fitness value of a glowworm is then calculated using K-fold cross-validation: the training data are divided into K subsets, K-1 subsets are used as training inputs, and the remaining subset is used as the test subset to determine the fitness of each glowworm. The glowworm's fitness value is the mean of the K classification accuracies. The test dataset is not used in this GSO feature selection procedure; it is used only in the final evaluation, when the classification accuracy of the best feature subsets is determined. The RF algorithm determines the classification accuracy of each glowworm model. Figure 4 depicts the proposed method. Algorithm 1 lists the steps of the feature selection operation.
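The K-fold fitness computation, the mean of K accuracies with each subset serving once as the test subset, can be sketched as follows. The names `k_fold_indices`, `cross_val_fitness`, and `majority_accuracy` are hypothetical, and the scoring function is a trivial stand-in for training and scoring the RF classifier.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_fitness(X, y, k, fit_and_score):
    """Mean of the k fold scores; fit_and_score is supplied by the caller."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_te = [X[i] for i in test_idx]
        y_te = [y[i] for i in test_idx]
        scores.append(fit_and_score(X_tr, y_tr, X_te, y_te))
    return sum(scores) / len(scores)


def majority_accuracy(X_tr, y_tr, X_te, y_te):
    """Stand-in scorer: predict the majority training label everywhere."""
    majority = max(set(y_tr), key=y_tr.count)
    return sum(label == majority for label in y_te) / len(y_te)


# Hypothetical toy data: six records, one feature each.
X = [[i] for i in range(6)]
y = [1, 1, 1, 1, 1, 0]
fitness_value = cross_val_fitness(X, y, k=3, fit_and_score=majority_accuracy)
print(round(fitness_value, 3))
```

Because only the training data pass through these folds, the held-out test set stays untouched until the final evaluation.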

Output:
Best feature subset with best fitness.

Method:
Step1: Randomly generate a vector of floating-point elements between 0 and 1, $X = (x_1, x_2, \ldots, x_d)$, where d is the total number of features in the dataset.
Step2: Convert X to binary using the threshold value 0.5 to obtain the selected features.
If no feature was selected:
If the value of the optimization function satisfies the condition: save the best feature subset and best fitness.
Step3: Run GSO to obtain a new vector of floating-point elements between 0 and 1.
Step5: Return the best feature subset with the best fitness.
Step6: End algorithm.

When the feature selection technique was used to predict the effects, the results varied: it gave better results for some outputs (effects) but did not produce satisfactory results for others. For this reason, the algorithm was not used for all effects.
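The first two steps of Algorithm 1, together with the fitness expression F(x) that accompanies it, can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the default `alpha=0.9` is illustrative (the source does not give a value), a feature is taken as selected when its element is strictly greater than 0.5, and `evaluation_metric` is supplied by the caller in place of the cross-validated RF score. Under this expression, lower values correspond to a higher metric and fewer selected features.

```python
import random


def random_position(d, rng):
    """Step 1: a vector of d floats drawn uniformly from [0, 1)."""
    return [rng.random() for _ in range(d)]


def to_feature_mask(position, threshold=0.5):
    """Step 2: feature i is selected when its element exceeds the threshold
    (strict comparison is an assumption)."""
    return [value > threshold for value in position]


def fitness(mask, evaluation_metric, alpha=0.9):
    """F(x) = alpha * (1 - metric) + (1 - alpha) * selected / all features."""
    if not any(mask):
        # No feature selected: return 1.0 as fitness, as in Algorithm 1.
        return 1.0
    selected_ratio = sum(mask) / len(mask)
    return alpha * (1.0 - evaluation_metric) + (1.0 - alpha) * selected_ratio


rng = random.Random(0)
position = random_position(d=4, rng=rng)

# Hypothetical glowworm position and metric value, chosen for illustration.
mask = to_feature_mask([0.2, 0.7, 0.9, 0.4])
print(mask)  # [False, True, True, False]
score = fitness(mask, evaluation_metric=0.8, alpha=0.5)
print(round(score, 2))  # 0.35
```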

Tuning operation for classifier model
Choosing the best hyper-parameters for classification algorithms is typically a challenging problem, and the right hyper-parameters can significantly affect how well a prediction model performs, leading to a more optimal solution and higher model accuracy [25]. To tune the hyper-parameters, swarm intelligence optimization with the GSO algorithm is proposed. This approach is used to adjust the hyper-parameters of the RF algorithm. The tuning procedure for the classifier model is given in Algorithm 2.

Output:
Best hyper-parameter with best fitness.

Method:
Step1: Randomly generate the values of the hyper-parameters from their ranges.
Step2: Compute the mean score of 5-fold cross-validation on the training set using the classifier/regressor with these hyper-parameters; this mean is the optimization function used to obtain the best fitness.
If the value of the optimization function satisfies the condition: save the best hyper-parameters and best fitness.
Step3: Run GSO to obtain new hyper-parameter values from their ranges.
Step5: Return the best hyper-parameters with the best fitness.
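To make the search loop concrete, the following is a simplified sketch of the GSO update rules as they are commonly described (luciferin update, probabilistic movement toward a brighter neighbour, and decision-radius update). It is not the authors' implementation. The function name `gso_maximize`, all parameter values, and the toy objective `toy_cv_score`, which stands in for the 5-fold cross-validation mean of the RF classifier, are assumptions for illustration. Integer hyper-parameters would additionally need rounding, which is omitted here.

```python
import math
import random


def gso_maximize(objective, bounds, n_worms=20, n_iter=50, seed=0,
                 rho=0.4, gamma=0.6, beta=0.08, n_desired=5, step=0.03,
                 sensor_range=1.0, luciferin0=5.0):
    """Return the best position and objective value seen during the search."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_worms)]
    luciferin = [luciferin0] * n_worms
    radius = [sensor_range] * n_worms
    best_pos, best_fit = None, -math.inf

    for _ in range(n_iter):
        # Luciferin update: decay plus a reward proportional to the objective.
        for i in range(n_worms):
            fit = objective(pos[i])
            luciferin[i] = (1 - rho) * luciferin[i] + gamma * fit
            if fit > best_fit:
                best_pos, best_fit = list(pos[i]), fit

        new_pos = [list(p) for p in pos]
        for i in range(n_worms):
            # Neighbours: brighter glowworms inside the decision radius.
            nbrs = [j for j in range(n_worms)
                    if luciferin[j] > luciferin[i]
                    and math.dist(pos[i], pos[j]) < radius[i]]
            if nbrs:
                # Brighter neighbours are more likely to be followed.
                weights = [luciferin[j] - luciferin[i] for j in nbrs]
                j = rng.choices(nbrs, weights=weights)[0]
                d = math.dist(pos[i], pos[j])
                if d > 0:
                    # Fixed-size step toward the neighbour, clipped to bounds.
                    new_pos[i] = [
                        min(hi, max(lo, a + step * (b - a) / d))
                        for a, b, (lo, hi) in zip(pos[i], pos[j], bounds)
                    ]
            # Radius update steers each glowworm toward n_desired neighbours.
            radius[i] = min(sensor_range,
                            max(0.0, radius[i] + beta * (n_desired - len(nbrs))))
        pos = new_pos
    return best_pos, best_fit


def toy_cv_score(params):
    """Toy stand-in for a cross-validation score, peaking at (0.3, 0.7)."""
    x, y = params
    return 1.0 / (1.0 + (x - 0.3) ** 2 + (y - 0.7) ** 2)


bounds = [(0.0, 1.0), (0.0, 1.0)]  # hypothetical hyper-parameter ranges
best_params, best_score = gso_maximize(toy_cv_score, bounds)
print(best_params, best_score)
```

In an actual tuning run, the objective would train the classifier with the candidate hyper-parameters and return the cross-validated score, which makes each evaluation far more expensive than this toy function.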

Prediction using RF algorithm as classifier model
After feature selection with GSO, the selected features are fed, along with the outcome, into the RF algorithm to predict the effects on people recovering from COVID-19.

Result performance of algorithms
The proposed model's performance was evaluated using training and testing data. The model was trained on the training data, and the evaluation was performed on the testing data using the following metrics: AUC, F1-score, accuracy, and Hamming loss, as shown in Table 1.

[...] persons six months after recovery from it. The second purpose of this study is to build an intelligent model that can predict these impacts. The third is to determine the needs of those who have contracted the virus as soon as possible, which ensures early detection of potential effects on those who are recovering, and to determine the need for pulmonary rehabilitation for those who have chronic respiratory conditions, which is intended to enhance their capacity for exercise and breathing.
An RF algorithm was used to predict the impact on recovered people, and the model was evaluated using various metrics after multiple processing operations were performed on the data. The GSO algorithm was used both to tune the hyper-parameters of the RF algorithm and to perform feature selection, in order to select the useful and influential features and obtain better accuracy in less time. Good prediction results were obtained for all effects, in different proportions, as shown in Table 1.

Conclusion
Using data from recovered individuals, including age, sex, medical history, symptoms of infection, and various details about the effects that occurred after recovery, this paper proposes an intelligent model based on the RF classifier to predict the impacts within six months after recovery from COVID-19. The model reveals the effects that COVID-19 had on people after they recovered. Different degrees of effects on the body's organs are observed in recovering patients. The model will help proactively determine how much care and follow-up patients need while they are ill, and help identify and treat any residual or newly emerging long-term sequelae in affected and recovered individuals, for whom follow-up, comprehensive assessment, and early rehabilitation activities are required.
Reference


Respiratory system: persistent cough, shortness of breath.
Nervous system: headaches, memory problems and problems with senses of taste and smell.
Mental health: anxiety, depression, sleep problems and substance abuse.
Metabolism: new onset of diabetes, obesity and high cholesterol.
Cardiovascular system & Coagulation regulation: heart failure, blood clots in the legs and lungs.
Kidney: acute kidney infection and chronic kidney disease that can, in severe cases, require dialysis.
General health: anemia.
Skin: rash and hair loss.
Musculoskeletal system: joint pain and muscle weakness.
Physical activity: exercise, walking, running, doing daily chores, climbing stairs, carrying heavy things.
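Collapsing several questionnaire answers into one label per organ or system could look like the sketch below. The source does not state the aggregation rule, so this assumes that a system is marked as affected (1) when any of its symptoms is reported. The dictionary keys and the function name `aggregate_effects` are hypothetical, and only three of the systems are shown.

```python
# Hypothetical column names for a subset of the systems listed above.
SYSTEM_SYMPTOMS = {
    "gastrointestinal": ["constipation", "diarrhea", "acid_reflux"],
    "respiratory": ["persistent_cough", "shortness_of_breath"],
    "skin": ["rash", "hair_loss"],
}


def aggregate_effects(answers, system_symptoms=SYSTEM_SYMPTOMS):
    """Return one 0/1 label per system: 1 if any of its symptoms was
    reported (assumed rule), 0 otherwise. Missing answers count as 0."""
    return {
        system: int(any(answers.get(symptom, 0) for symptom in symptoms))
        for system, symptoms in system_symptoms.items()
    }


# Hypothetical response from one person.
answers = {"diarrhea": 1, "rash": 0, "hair_loss": 0, "persistent_cough": 0}
labels = aggregate_effects(answers)
print(labels)  # {'gastrointestinal': 1, 'respiratory': 0, 'skin': 0}
```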

Figure 3. Resampling data from first group.

Return 1.0 as fitness. Else: compute the mean of running 2-fold cross-validation on the training set using the evaluation metric of the RF classifier/regressor. Compute the value of the optimization function to get the best fitness of the selected features using: $F(x) = \alpha \cdot (1 - \text{value of evaluation metric}) + (1 - \alpha) \cdot \frac{\text{No. of selected features}}{\text{No. of all features}}$.

Figure 4. Framework of the proposed feature selection model.

Table 1. Performance results of algorithms
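For reference, three of the reported metrics (accuracy, F1-score, and Hamming loss) can be computed from their standard definitions as sketched below; AUC is omitted for brevity. The function names and the toy labels are assumptions for illustration and do not reproduce any value from Table 1.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if 2 * tp + fp + fn == 0:
        return 0.0  # no positives in either the truth or the predictions
    return 2 * tp / (2 * tp + fp + fn)


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label slots that were predicted incorrectly,
    counted over all records and all labels."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(t != p
                for row_t, row_p in zip(Y_true, Y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / total


# Hypothetical single-label predictions for one effect.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.75
print(f1_score(y_true, y_pred))  # 0.8

# Hypothetical multi-label predictions: 2 records, 3 effects each.
Y_true = [[1, 0, 1], [0, 1, 0]]
Y_pred = [[1, 1, 1], [0, 1, 1]]
print(round(hamming_loss(Y_true, Y_pred), 3))  # 0.333
```

Accuracy and F1-score are computed per effect, whereas Hamming loss summarizes errors across all effects of all records at once.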