Principal Component Analysis and Multi-Layer Perceptron Based Intrusion Detection System

Security has become an important issue for networks, and intrusion detection technology is an effective approach to dealing with network security problems. In this paper, we present an intrusion detection model based on PCA and MLP. The key idea is to take advantage of the different features of the NSL-KDD dataset, choose the best features, and use a neural network to classify connections for intrusion detection. The new model is able to distinguish attacks from normal connections. Training and testing data were obtained from the complete NSL-KDD intrusion detection evaluation dataset.


Introduction
The past few years have witnessed a growing recognition of intelligent techniques for building efficient and reliable Intrusion Detection Systems (IDS). Due to the increasing number of cyber-attacks, building effective intrusion detection systems is essential for protecting information system security, yet it remains an elusive goal and a great challenge.
In general, the techniques for Intrusion Detection (ID) fall into two major categories depending on the modeling methods used: misuse detection and anomaly detection. Misuse detection is based on knowledge of system vulnerabilities and known attack patterns, while anomaly detection assumes that an intrusion will always reflect some deviation from normal patterns. Many AI techniques have been applied to both misuse detection and anomaly detection. Pattern matching systems such as rule-based expert systems, state transition analysis, and genetic algorithms are direct and efficient ways to implement misuse detection. On the other hand, inductive sequential patterns, artificial neural networks, statistical analysis, and data mining methods have been used in anomaly detection [1].
Architecturally, an intrusion detection system can be categorized into three types: host-based IDS, network-based IDS, and hybrid IDS [2] [3]. A host-based intrusion detection system uses the audit trails of the operating system as its primary data source. A network-based intrusion detection system, on the other hand, uses network traffic information as its main data source. A hybrid intrusion detection system combines both approaches [4]. However, most available commercial IDSs use only misuse detection, because most anomaly detectors developed so far still cannot overcome their limitations (a high false-positive rate, difficulty handling gradual misbehavior, and expensive computation [5]). This situation motivates many research efforts to build anomaly detectors for ID [6].
We organize this paper as follows: Section 2 gives a brief introduction to PCA and neural networks, Section 3 presents previous work, Section 4 explains the model design, and Section 5 discusses the experimental results, followed by the conclusion.

PCA and Neural Network
Principal Component Analysis (PCA) is an effective statistical technique for reducing the dimensionality of an unlabeled high-dimensional dataset while preserving as much of its variance as possible, which it does by analyzing the covariance between features. It is therefore used in many fields that deal with high-dimensional data, such as image compression, pattern recognition (face recognition in particular), gene expression analysis, data clustering, and traffic-flow analysis for intrusion detection. One of the main advantages of PCA is that the data can be compressed, i.e. the number of dimensions reduced, without much loss of information. Today it is mostly used as a tool in exploratory data analysis and for building predictive models. PCA can be performed by eigenvalue decomposition of the data covariance matrix or by singular value decomposition of the (mean-centered) data matrix. PCA is also known as the discrete Karhunen-Loève transform or the Hotelling transform [7].
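The two routes to PCA mentioned above can be compared directly. The following minimal NumPy sketch runs on synthetic correlated data (an assumption; no NSL-KDD data is loaded), and the helper names `pca_eig` and `pca_svd` are hypothetical. It computes the leading components both by eigendecomposition of the covariance matrix and by SVD of the centered data matrix, then checks that the component variances agree and that the projections agree up to the sign of each component.

```python
import numpy as np

def pca_eig(X, k):
    # PCA via eigendecomposition of the sample covariance matrix.
    Xc = X - X.mean(axis=0)                   # center each feature
    C = np.cov(Xc, rowvar=False)              # N x N covariance, divides by M-1
    vals, vecs = np.linalg.eigh(C)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]            # re-sort by decreasing variance
    vals, vecs = vals[order], vecs[:, order]
    return vals[:k], Xc @ vecs[:, :k]

def pca_svd(X, k):
    # PCA via singular value decomposition of the centered data matrix.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    vals = S**2 / (X.shape[0] - 1)            # singular values -> covariance eigenvalues
    return vals[:k], Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
v1, Z1 = pca_eig(X, 2)
v2, Z2 = pca_svd(X, 2)
print(np.allclose(v1, v2), np.allclose(np.abs(Z1), np.abs(Z2)))
```

The SVD route avoids forming the covariance matrix explicitly, which is numerically gentler; the eigendecomposition route matches the covariance-based procedure described in this paper.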
An increasing amount of research in the last few years has investigated the application of neural networks to intrusion detection. If properly designed and implemented, neural networks have the potential to address many of the problems encountered by rule-based approaches. Neural networks were specifically proposed to learn the typical characteristics of a system's users and to identify statistically significant variations from their established behavior. To apply this approach to intrusion detection, data representing attacks and non-attacks must be presented to the neural network so that its coefficients (weights) are adjusted automatically during the training phase. In other words, it is necessary to collect data representing normal and abnormal behavior and to train the neural network on those data. After training is complete, a number of performance tests with real network traffic and attacks should be conducted. Instead of processing program instructions sequentially, neural-network-based models explore several hypotheses simultaneously using many interconnected computational elements (neurons); this parallel processing may save time in the analysis of malicious traffic [8].

Previous Work
Mrutyunjaya Panda et al. [9] use discriminative multinomial Naïve Bayes with various filtering analyses to build a network intrusion detection system; they perform two-class classification with 10-fold cross-validation to build the model. In [10], Shilpa Lakhina et al. propose a new hybrid algorithm, PCANNA (principal component analysis neural network algorithm), to reduce the computing resources, both memory and CPU time, required to detect attacks. The PCA transform is used to reduce the features, and a trained neural network is used to identify any kind of new attack. The model gives a better and more robust representation of the data: it reduces the features, resulting in an 80.4% data reduction, an approximately 40% reduction in training time, and a 70% reduction in testing time. In [11], Syed Muhammad Aqil develops an intrusion detection system using principal component analysis and neural networks; the system uses four multi-layer perceptrons (MLPs) working in parallel, one for each attack category paired with the normal data: normal vs. probe, normal vs. DoS, normal vs. U2R, and normal vs. R2L.

Experiment Design
The block diagram of the hybrid model is shown in Figure 1.

A. NSL-KDD Data Set
The KDD Cup 1999 intrusion detection benchmark dataset has been used by many researchers to build efficient network intrusion detection systems [12]. However, recent studies show that the KDD Cup 1999 dataset has some inherent problems. The most important limitation is the huge number of redundant records: almost 78% of the training records and 75% of the testing records are duplicates, as shown in Tables 1 and 2 [13]. This redundancy biases learning algorithms towards the most frequent records and thus prevents them from recognizing rare attack records, such as those in the U2R and R2L categories. It also biases evaluation results in favor of methods that have better detection rates on the frequent records. The NSL-KDD dataset, which removes these redundant records, is used for our experiments and is publicly available for research in intrusion detection. Although the NSL-KDD dataset still suffers from some of the problems discussed in [14] and may not be a perfect representative of existing real networks, it can be used as an effective benchmark dataset for detecting network intrusions. In the NSL-KDD dataset, each simulated attack falls into one of the following four categories [15]:
• DoS (Denial of Service): an attacker tries to prevent legitimate users from using a service, e.g. TCP SYN flood, Smurf.
• Probe: an attacker tries to gather information about the target host, for example by scanning a victim to learn which services are available or which operating system it runs.
• U2R (User to Root): an attacker who has a local account on the victim's host tries to gain root privileges.
• R2L (Remote to Local): an attacker who does not have a local account on the victim's host tries to obtain one.
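Because the model performs two-class classification (normal vs. attack), each label string in the dataset has to be turned into a 0/1 target. The minimal sketch below also records the category. The attack names listed are those of the KDD Cup 1999 training data, which NSL-KDD inherits; the test set contains additional attack types, so any name not in the map falls back to "unknown" while still counting as an attack. The helper `to_targets` is hypothetical.

```python
# Attack names of the KDD Cup 1999 training data (inherited by NSL-KDD),
# grouped into the four categories described above.
CATEGORY = {
    **dict.fromkeys(["back", "land", "neptune", "pod", "smurf", "teardrop"], "DoS"),
    **dict.fromkeys(["ipsweep", "nmap", "portsweep", "satan"], "Probe"),
    **dict.fromkeys(["buffer_overflow", "loadmodule", "perl", "rootkit"], "U2R"),
    **dict.fromkeys(["ftp_write", "guess_passwd", "imap", "multihop",
                     "phf", "spy", "warezclient", "warezmaster"], "R2L"),
}

def to_targets(label):
    """Hypothetical helper: map a label string to (binary target, category)."""
    if label == "normal":
        return 0, "normal"
    # Any name missing from CATEGORY is still an attack, category unknown.
    return 1, CATEGORY.get(label, "unknown")

for label in ["normal", "smurf", "satan", "rootkit", "guess_passwd", "mailbomb"]:
    print(label, to_targets(label))
```

Only the binary target is needed for the normal-vs-attack MLP; the category is kept because per-category results are how U2R and R2L weaknesses usually show up.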

B. Data Preprocessing
Some features have symbolic form (e.g. protocol type, service, and flag). These were converted into numerical ones by assigning each distinct value of a feature a unique integer in the range [1, k], where k is the number of distinct values of that feature: the first value in the ordering receives 1 and the last receives k.
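A minimal sketch of this encoding follows, using a hypothetical helper `encode_symbolic`. The source does not say in which order the values are numbered; this sketch assumes sorted (alphabetical) order.

```python
def encode_symbolic(column):
    """Map each distinct value of a symbolic feature to an integer in 1..k.

    Assumption: values are numbered in sorted order, so the first value
    receives 1 and the last receives k (k = number of distinct values).
    """
    values = sorted(set(column))
    code = {v: i for i, v in enumerate(values, start=1)}
    return [code[v] for v in column], code

protocol = ["tcp", "udp", "icmp", "tcp", "udp"]
encoded, mapping = encode_symbolic(protocol)
print(mapping)   # {'icmp': 1, 'tcp': 2, 'udp': 3}
print(encoded)   # [2, 3, 1, 2, 3]
```

The mapping should be built once from the training data and then applied unchanged to the test records (e.g. `[mapping[v] for v in test_column]`); a value never seen in training then raises a `KeyError` instead of being silently mis-encoded.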

C. Principal Component Analysis (PCA)
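The first step of PCA is to compute the covariance matrix of the features in the training set. Let $M$ be the number of records in the training set and $N$ the number of features in each record, and let $x_{k,i}$ be the value of feature $i$ ($1 \le i \le N$) in record $k$ ($1 \le k \le M$). The covariance between features $i$ and $j$ is

$$C_{ij} = \frac{1}{M-1} \sum_{k=1}^{M} \left(x_{k,i} - \mu_i\right)\left(x_{k,j} - \mu_j\right), \qquad 1 \le i, j \le N,$$

where $\mu_i$ and $\mu_j$ are the means of features $i$ and $j$; normalizing by $1/M$ instead scales all eigenvalues by the same factor and leaves the eigenvectors unchanged. The mean of feature $i$ is defined by

$$\mu_i = \frac{1}{M} \sum_{k=1}^{M} x_{k,i}.$$

The eigenvalues of the covariance matrix are found with Jacobi's method, using the following steps.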
1. Find the largest element of the square matrix that is not on the main diagonal.
2. Find the rotation angle θ.
3. Perform the rotation: find the value α and the other elements of the rotation matrix.
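The rotation steps above can be sketched as the classical Jacobi eigenvalue method. This is a minimal NumPy sketch, not the authors' code: the angle formula and the sign convention of the rotation matrix are a standard choice assumed here, the parameter names `tol` and `max_rotations` are hypothetical, and "near zero" is taken relative to the norm of the input matrix.

```python
import numpy as np

def jacobi_eigen(A, tol=1e-12, max_rotations=10_000):
    # Classical Jacobi method for a real symmetric matrix A.
    # Returns (eigenvalues, eigenvectors); eigenvector i is column i.
    A = np.array(A, dtype=float)                  # work on a copy
    n = A.shape[0]
    V = np.eye(n)                                 # product of all rotations
    threshold = tol * np.linalg.norm(A)           # "near zero", relative to A's size
    for _ in range(max_rotations):
        # Step 1: largest off-diagonal element (by magnitude), at (p, q)
        off = np.abs(A - np.diag(np.diag(A)))
        p, q = np.unravel_index(np.argmax(off), off.shape)
        if off[p, q] <= threshold:                # off-diagonals ~ 0: stop
            break
        # Step 2: rotation angle chosen so that the rotated A[p, q] becomes 0
        theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
        c, s = np.cos(theta), np.sin(theta)
        # Step 3: build the plane rotation J and apply A <- J^T A J
        J = np.eye(n)
        J[p, p] = J[q, q] = c
        J[p, q], J[q, p] = s, -s
        A = J.T @ A @ J
        V = V @ J
    return np.diag(A).copy(), V

C = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.5],
              [2.0, 0.5, 5.0]])
vals, vecs = jacobi_eigen(C)
print(np.sort(vals))
print(np.linalg.eigvalsh(C))                      # reference values, ascending
```

The eigenvalues end up on the diagonal of the rotated matrix and the accumulated rotations `V` hold the eigenvectors. Each rotation costs O(N^3) here because `J` is formed densely; a production version would update only rows and columns p and q.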

D. MLP Algorithm
Anomaly detection aims to recognize the behavior of authorized system users and to identify intruders from that knowledge; intruders can thus be recognized from deviations from normal behavior. A multi-layer feed-forward network (MLP) is used in this work. The number of hidden layers and the number of nodes in each hidden layer were determined by trial and error. Initial values for the network weights and biases are generally chosen to be small random values. The network was trained on training data containing both normal and attack records. Whenever the generated output does not match the target output, the weights are adjusted according to the error between the two, and training either continues or stops depending on this error value. Once training is complete, the weights are stored for use in the recall stage. In the training stage, we tried different network architectures with different training algorithms to find the architecture that gives the best results; resilient backpropagation and Levenberg-Marquardt with two hidden layers performed best. After many experiments with the features produced by the PCA algorithm, we retained 16 of the 41 features. The resulting multilayer feed-forward network, illustrated in Figure 2, consists of 16 nodes in the input layer, 10 nodes in the first hidden layer, 5 nodes in the second hidden layer, and 1 node in the output layer.
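The training loop described above can be sketched in NumPy for the 16-10-5-1 architecture with sigmoid units, trained by resilient backpropagation. This is a minimal sketch, not the authors' implementation: it uses the iRprop- variant of Rprop with commonly used step-size factors (1.2 and 0.5), mean squared error as the training error, and small uniform random initial weights, all of which are assumptions. The stopping rule mirrors the error goal of 0.001 and limit of 1000 epochs reported for the experiments. All function names are hypothetical, the data are synthetic, and Levenberg-Marquardt is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(sizes, rng):
    # Small random weights and zero biases for each layer (assumed ranges).
    return [(rng.uniform(-0.5, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    # Activations of every layer, input first; sigmoid units throughout.
    acts = [X]
    for W, b in params:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def loss_and_grads(params, X, y):
    # Mean squared error and its gradient, obtained by backpropagation.
    acts = forward(params, X)
    out = acts[-1]
    loss = np.mean((out - y) ** 2)
    delta = 2.0 * (out - y) / out.size * out * (1.0 - out)  # dLoss/dz at output
    grads = []
    for i in range(len(params) - 1, -1, -1):
        W, _ = params[i]
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:  # propagate through the sigmoid of the layer below
            delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
    return loss, grads[::-1]

def train_rprop(X, y, sizes=(16, 10, 5, 1), goal=1e-3, epochs=1000,
                eta_plus=1.2, eta_minus=0.5, step0=0.1,
                step_min=1e-6, step_max=50.0, seed=0):
    # iRprop-: per-weight step sizes adapted from the sign of the gradient.
    rng = np.random.default_rng(seed)
    params = init_params(sizes, rng)
    flat = [p for layer in params for p in layer]       # W1, b1, W2, b2, ...
    steps = [np.full_like(p, step0) for p in flat]
    prev = [np.zeros_like(p) for p in flat]
    for epoch in range(1, epochs + 1):
        loss, grads = loss_and_grads(params, X, y)
        if loss <= goal:                                 # error goal reached
            break
        flat_g = [g for layer in grads for g in layer]
        for p, g, s, pg in zip(flat, flat_g, steps, prev):
            same = g * pg > 0                            # sign kept: grow step
            flip = g * pg < 0                            # sign flipped: shrink step
            s[same] = np.minimum(s[same] * eta_plus, step_max)
            s[flip] = np.maximum(s[flip] * eta_minus, step_min)
            g = np.where(flip, 0.0, g)                   # iRprop-: skip this update
            p -= np.sign(g) * s                          # in-place weight update
            pg[...] = g
    return params, loss, epoch                           # loss of last evaluated epoch

rng = np.random.default_rng(42)
X = rng.normal(size=(400, 16))                           # stand-in for 16 PCA features
y = (X[:, :3].sum(axis=1) > 0).astype(float)[:, None]    # synthetic 0/1 "attack" label
params, loss, n_epochs = train_rprop(X, y)
accuracy = np.mean((forward(params, X)[-1] > 0.5) == (y > 0.5))
print(n_epochs, round(float(loss), 4), accuracy)
```

Rprop uses only the sign of each gradient component, so its step sizes do not depend on the gradient's magnitude; this is one reason it tolerates the widely varying scales of intrusion data and trains quickly compared with second-order methods such as Levenberg-Marquardt.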

Conclusions
The main contribution of the present work is a classification model with a high intrusion detection rate and a low false-negative rate, obtained by designing a classification model that uses PCA and an MLP neural network to detect attacks. The first stage of the model is PCA, which finds the best features of the NSL-KDD dataset; we chose 16 of the 41 features. The second stage is an MLP neural network, which classifies connections as normal or attack. After many experiments on the neural network with different training algorithms and activation functions, we observed that resilient backpropagation with the sigmoid function gives the best classification. We used two hidden layers, with 10 nodes in the first hidden layer and 5 nodes in the second. We used the complete NSL-KDD dataset: 125,973 records for the training stage and 22,544 records for the testing stage.

4. Repeat steps 1-3 until the off-diagonal elements are close to zero [16].

Steps for executing the PCA algorithm:
1. Read the NSL-KDD training set.
2. Preprocess the data as described in Section B.
3. Calculate the variance/covariance matrix of the features over all records of the training data.
4. Calculate the eigenvectors of the variance/covariance matrix as follows:
   A. Find the largest off-diagonal element in the matrix.
   B. Find the angle of rotation.
   C. Find the elements of the rotation matrix.
   D. Repeat steps A-C until the off-diagonal elements are close to zero.
5. Calculate the eigenvectors from the resulting matrix, place them in the eigen matrix, and arrange the eigen matrix by decreasing eigenvalue.
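Steps 3 to 5 of the PCA procedure can be sketched with NumPy as follows. The sketch assumes the records are already numeric (step 2), uses `numpy.linalg.eigh` in place of a hand-written Jacobi routine, and reads "choosing 16 features" as projecting onto the 16 leading principal components; the source's wording could also mean selecting 16 of the original columns, so this is one possible reading. The helper names `fit_pca` and `transform` are hypothetical, and random numbers stand in for the NSL-KDD data.

```python
import numpy as np

def fit_pca(X_train, k):
    # Steps 3-5: covariance matrix, eigendecomposition, sorted eigen matrix.
    mu = X_train.mean(axis=0)                      # per-feature mean
    C = np.cov(X_train, rowvar=False)              # N x N covariance, 1/(M-1)
    vals, vecs = np.linalg.eigh(C)                 # stand-in for the Jacobi step
    order = np.argsort(vals)[::-1]                 # arrange by decreasing eigenvalue
    return mu, vals[order], vecs[:, order[:k]]

def transform(X, mu, W):
    # Project records onto the k retained principal directions.
    return (X - mu) @ W

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 41))               # placeholder for encoded training rows
X_test = rng.normal(size=(100, 41))                # placeholder for encoded test rows
mu, eigvals, W = fit_pca(X_train, k=16)
Z_train, Z_test = transform(X_train, mu, W), transform(X_test, mu, W)
print(Z_train.shape, Z_test.shape)                 # (500, 16) (100, 16)
print(eigvals[:16].sum() / eigvals.sum())          # fraction of variance retained
```

The mean and eigenvectors are fitted on the training set only and reused unchanged for the test set. Because NSL-KDD features such as byte counts span very different ranges, standardizing each feature before PCA is a common choice, though the source does not say whether it was done.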

Figure 2. The architecture of the MLP

The error goal used in the algorithm was 0.001, and the number of epochs was 1000. Training took 50 seconds with resilient backpropagation and 12 minutes with Levenberg-Marquardt, while testing took 17.939403 seconds with resilient backpropagation and 17.293176 seconds with Levenberg-Marquardt. The results of the recall stage for the two algorithms and for the previous works are shown in Table 3.

Table 3. The results of the recall stage of the two algorithms