Application of Polyalphabetic Substitution Cipher Using Genetic Algorithm

Abstract
Several Genetic Algorithms have been developed for cryptography problems; what they have in common is that the G.A. is used for the decryption problem, to obtain the plain text. In this paper a new approach is proposed that uses a Genetic Algorithm with cryptography: the G.A. is used to obtain the best secret key for a polyalphabetic substitution cipher. This key is then used for encryption and decryption with a high level of security. The program is written in Matlab (version 6.5).

Introduction:
Many problems that computer scientists encounter are very hard to solve. Some of these problems, commonly called NP-hard problems, have no known efficient solution process (i.e., no algorithm that returns a solution in a time that is polynomial with respect to the size of the input). While no efficient process is known for generating an optimal solution to an NP-hard problem all of the time, many NP-hard problems can be solved efficiently to near optimality much of the time using a heuristic. A heuristic is a solution-generating rule that gives near-optimal solutions a high percentage of the time. However, there is no guarantee that a heuristic will ever give a near-optimal solution at all. (11)

Several Genetic Algorithms have been developed for the decryption of many types of encryption methods. For example, Spillman R. (12) has shown in his paper that a genetic algorithm can be used to easily compromise even high-density knapsack ciphers. Gester J. (6) used a simple genetic algorithm to search the key space of cryptograms in an attempt to create a general solver for such problems. In their paper, Spillman R., Janssen M., Nelson B., and Kepner M. (13) consider a new approach to cryptanalysis based on the application of a genetic algorithm. They showed that such an algorithm can be used to discover the key for a simple substitution cipher. Dimovski A. and Gligoroski D. (4) presented three optimization heuristics that can be used in attacks on the transposition cipher: simulated annealing, genetic algorithm and tabu search. In this paper, our goal in using a G.A. with encryption is to generate the best secret key for the polyalphabetic substitution cipher, one that satisfies a high level of security.

Cryptography:
Cryptography is the science of writing in secret code. In data and telecommunications, cryptography is necessary when communicating over any untrusted medium, which includes just about any network, particularly the Internet.
There are, in general, three types of cryptographic schemes: secret key (or symmetric) cryptography, public-key (or asymmetric) cryptography, and hash functions. With secret key cryptography (which is used in this paper), a single key is used for both encryption and decryption. The sender uses the key (or some set of rules) to encrypt the plaintext and sends the cipher text to the receiver. The receiver applies the same key (or rule set) to decrypt the message and recover the plaintext. Because a single key is used for both functions, secret key cryptography is also called symmetric encryption. The initial unencrypted data is referred to as plaintext. It is encrypted into cipher text, which will in turn (usually) be decrypted into usable plaintext. (5) Figure (1) shows the block diagram for the cipher system.

Polyalphabetic Substitution cipher:
The polyalphabetic substitution cipher is a simple extension of the monoalphabetic one. The message is broken into blocks of equal length, say B, and then each position in the block is enciphered with its own substitution.

Figure (2): An example explaining the polyalphabetic substitution cipher.

Genetic Algorithm:
Genetic algorithms (GAs) are general, domain-independent search and optimization techniques developed in the 1970s. These algorithms borrow from nature the concept of natural selection, according to which the stronger survive a competition while the weaker die, so that the genes contained within the chromosomes of dominant individuals spread within the next generation. As the number of generations increases, an individual able to withstand the external environmental pressure is likely to be contained in the population. Similarly, GAs are based on an initial population of individuals, each of which represents a possible solution to the problem at hand (9).
Over time, the genetic algorithm will produce a few exceptionally fit individual solutions within the population. These individual solutions will be highly, but not completely, optimized. (11) A G.A. is composed of three main components, which are problem dependent: the encoding of the problem (Chromosome Representation), the evaluation function (Fitness Function) and the operators.

A. Encoding Problem (Chromosome Representation):
The chromosome representation selected for this problem is a simple one: each chromosome is a key of 26 genes, and each gene is an integer value. The position of a gene in the chromosome identifies a character position in a block of the plain text file, and the value of the gene is the amount by which the letter at that position is shifted to construct the encrypted text file. For example, the chromosome in figure (3) shows that char 1 in the plain text file should be shifted by 4, char 2 by 6, and so on.
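The shifting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper's program is written in Matlab, and the function name `shift_block`, the lowercase 26-letter alphabet, the wrap-around modulo 26, and the choice to leave non-letters unchanged are not taken from the paper.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed 26-letter lowercase alphabet

def shift_block(block, key):
    """Shift each letter of a block by the gene at the same position."""
    out = []
    for ch, shift in zip(block, key):
        if ch in ALPHABET:
            # Wrap-around modulo 26 is an assumption, not stated in the paper.
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # spaces and punctuation left as-is (assumption)
    return "".join(out)

# Hypothetical chromosome: only the first two genes (4 and 6) come from the text.
key = [4, 6] + [0] * 24
print(shift_block("ab", key))  # 'a' shifted by 4 is 'e', 'b' shifted by 6 is 'h'
```

Because each position has its own shift, the same plain text letter can map to different cipher letters depending on where it falls in the block.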

B. Evaluation Function (Fitness Function):
The evaluation of a chromosome is a critical part of a genetic algorithm. For the encryption problem, we used a simple evaluation function in which the fitness value of a chromosome is the maximum difference between the letter frequencies in the plain text file and those in the encrypted file.
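A minimal sketch of such a frequency-based fitness, written in Python for illustration. The names `letter_frequencies` and `fitness`, the use of relative (rather than absolute) frequencies, and the `combine` parameter are assumptions. The parameter is included because the text describes the fitness both as a maximum difference and, later, as a sum of differences.

```python
import string
from collections import Counter

LETTERS = string.ascii_lowercase

def letter_frequencies(text):
    """Relative frequency of each letter a-z (relative scale is an assumption)."""
    counts = Counter(ch for ch in text.lower() if ch in LETTERS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return {ch: counts[ch] / total for ch in LETTERS}

def fitness(plain, cipher, combine=max):
    """Combine the per-letter frequency differences; pass combine=sum for the sum."""
    fp = letter_frequencies(plain)
    fc = letter_frequencies(cipher)
    return combine(abs(fp[ch] - fc[ch]) for ch in LETTERS)

print(fitness("aa", "bb"))               # maximum difference
print(fitness("aa", "bb", combine=sum))  # sum of differences
```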

C. Genetic Operators:
Crossover: The main genetic operator. It randomly selects two chromosomes from the population and swaps the second part of each chromosome after a randomly selected point. This is equivalent to assigning a subset of one key to the other.

Mutation: An operator which produces random changes in various chromosomes. Mutation serves the critical role of either replacing the chromosomes lost from the population during the selection process or introducing new chromosomes that were not present in the initial population. The mutation rate controls the rate at which new chromosomes are introduced into the population. (7)

Selection: The process of keeping and eliminating chromosomes in the population based on their relative quality or fitness. In most practices, a roulette wheel approach is adopted as the selection procedure. A value-based selection scheme assigns roulette wheel sectors proportional to the fitness value of the chromosomes. (7)

Proposed Algorithm:
In this paper, a new algorithm is proposed that uses a G.A. for the encryption and decryption of a text file.

A. Encryption:
1. Input the plain text file.
2. Measure the letter frequencies of the plain text file.
3. Divide the plain text file into fixed-size blocks (each block consists of 26 chars).
4. Create an initial generation of 10 individuals (each individual representing a secret key), each consisting of 26 random integer values.
5. Encrypt the plain text file as follows: using each generated key, shift each letter in each block of the plain text file by the corresponding value in that key. For each key, we obtain a new encrypted file with different letter frequencies.
6. Measure the letter frequencies in each encrypted file (10 encrypted files).
7. Determine the fitness of each individual (each key) as the maximum difference between the letter frequencies in the plain text file and in its encrypted file.
8. Repeat
9. Create a new generation by selection, crossover and mutation.
10. Compute the fitness for the new generation.
11. Until no fitness improvement is achieved.
12. Determine the best key as the key that corresponds to the maximum fitness value.
13. Convert the best key from integer form to character form and add it to the end of the encrypted file.
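The steps above can be sketched end to end as follows. This is an illustrative Python outline, not the authors' Matlab program. All function names, the modulo-26 wrap-around, the relative frequency scale, the fixed random seed, and the simplification that children replace their parents are assumptions made to keep the sketch short.

```python
import random
import string
from collections import Counter

LETTERS = string.ascii_lowercase

def frequencies(text):
    counts = Counter(ch for ch in text if ch in LETTERS)
    total = sum(counts.values()) or 1
    return [counts[ch] / total for ch in LETTERS]

def encrypt(text, key):
    """Shift the letter at position i by key[i mod 26] (blocks of 26 chars)."""
    out = []
    for i, ch in enumerate(text):
        if ch in LETTERS:
            out.append(LETTERS[(LETTERS.index(ch) + key[i % len(key)]) % 26])
        else:
            out.append(ch)  # non-letters kept unchanged (assumption)
    return "".join(out)

def fitness(plain, key):
    """Step 7: maximum difference between plain and cipher letter frequencies."""
    fp, fc = frequencies(plain), frequencies(encrypt(plain, key))
    return max(abs(p - c) for p, c in zip(fp, fc))

def roulette(population, scores, rng):
    """Pick one key with probability proportional to its fitness."""
    total = sum(scores)
    if total == 0:
        return rng.choice(population)
    pick, running = rng.random() * total, 0.0
    for key, score in zip(population, scores):
        running += score
        if pick < running:
            return key
    return population[-1]

def evolve(plain, pop_size=10, generations=15, rate=0.01, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(1, 26) for _ in range(26)] for _ in range(pop_size)]
    best_key, best_score = None, -1.0
    for _ in range(generations):
        scores = [fitness(plain, key) for key in population]
        top = max(range(len(population)), key=scores.__getitem__)
        if scores[top] <= best_score:
            break  # step 11: stop when fitness no longer improves
        best_key, best_score = population[top], scores[top]
        children = []
        while len(children) < pop_size:  # step 9: selection, crossover, mutation
            a = roulette(population, scores, rng)
            b = roulette(population, scores, rng)
            point = rng.randint(1, 25)
            for child in (a[:point] + b[point:], b[:point] + a[point:]):
                children.append([rng.randint(1, 26) if rng.random() < rate else g
                                 for g in child])
        population = children[:pop_size]
    return best_key, best_score

best_key, best_score = evolve("does increased security provide comfort to paranoid people")
print(best_key, best_score)
```

The sketch returns the best key seen so far together with its fitness; appending the key to the encrypted file (step 13) is left out here.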

B. Decryption:
In the second step of the algorithm, the encrypted file is decrypted. After the receiver has the encrypted file, the following is done:
1. Separate the key from the encrypted file and convert it from character form to integer form.
2. Subtract each value in the key from the corresponding letter values in the encrypted file.
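The two decryption steps, together with the key-appending step of encryption, might look like this in Python. The mapping between integer and character form (1 becomes 'a', 26 becomes 'z'), the wrap-around modulo 26, and all function names are assumptions; the paper does not specify them.

```python
import string

LETTERS = string.ascii_lowercase

def key_to_chars(key):
    """Integer form to character form; 1 -> 'a' ... 26 -> 'z' is an assumed mapping."""
    return "".join(LETTERS[(k - 1) % 26] for k in key)

def chars_to_key(chars):
    """Character form back to integer form (inverse of the assumed mapping)."""
    return [LETTERS.index(c) + 1 for c in chars]

def shift(text, key, sign):
    """Add (sign=+1) or subtract (sign=-1) the key values, block by block."""
    out = []
    for i, ch in enumerate(text):
        if ch in LETTERS:
            out.append(LETTERS[(LETTERS.index(ch) + sign * key[i % len(key)]) % 26])
        else:
            out.append(ch)  # non-letters kept unchanged (assumption)
    return "".join(out)

def encrypt_with_key(plain, key):
    """Encrypt, then append the key in character form to the end of the message."""
    return shift(plain, key, +1) + key_to_chars(key)

def decrypt_with_key(message):
    """Step 1: separate the last 26 characters as the key. Step 2: subtract it."""
    body, tail = message[:-26], message[-26:]
    return shift(body, chars_to_key(tail), -1)

key = list(range(1, 27))  # hypothetical key, for illustration only
message = encrypt_with_key("hello world", key)
print(decrypt_with_key(message))
```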

Practical Representation:
A. Initializing Population (pop): Using the rand command in Matlab, we built the function pop, which generates an array of 10 keys; each key consists of 26 columns (random values between 1 and 26).
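An equivalent initialization can be sketched in Python (the paper uses Matlab's rand). The function name `init_population` and the optional `seed` parameter are assumptions added for reproducibility.

```python
import random

def init_population(size=10, key_length=26, seed=None):
    """Return `size` keys, each a list of `key_length` random integers in 1..26."""
    rng = random.Random(seed)
    return [[rng.randint(1, 26) for _ in range(key_length)] for _ in range(size)]

pop = init_population(seed=0)
print(len(pop), len(pop[0]))  # 10 rows, 26 columns
```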

B. Frequency Function (FRE):
We built the function FRE to calculate the frequency of each letter in the plain text file and in the encrypted file.

C. Dividing the plain text file:
After determining the frequencies of the letters in the plain text file, we divide it into blocks; each block consists of 26 characters.

D. Encrypting the plain text file:
To encrypt the plain text file, we used each key in the pop array (10 rows * 26 columns) to shift each letter in each block of the plain text file (length of key = length of each block in the plain text file = 26 characters). We used the key to shift the letters in the first block, the second block, etc., until we reached the end of the file, as shown below:

While not end of file
    Encrypted block = position of each letter in the plain text file + corresponding value in the key.
End

After encryption we obtain 10 encrypted files.

E. Fitness Function (Evaluation Function):
We measure the difference between the frequency of each letter in the plain text file and in the encrypted files. The summation of these differences over all letters represents the fitness function; the largest fitness value represents the best solution for the problem.

F. Selection:
We use roulette-wheel selection, which is considered the simplest selection scheme and involves the following technique:
1. The individuals are mapped to contiguous segments of a line; each individual's segment is equal in size to its fitness.
2. A random number is generated and the individual whose segment spans the random number is selected.
3. The process is repeated until the desired number of individuals is obtained (called the mating population).
Table (1) shows the selection probability for 11 individuals. Individual 1 is the most fit individual and occupies the largest interval, whereas individual 10, as the second least fit individual, has the smallest interval on the line (see figure). Individual 11, the least fit individual, has a fitness value of 0 and gets no chance for reproduction. (12)

Table 1: Selection probability and fitness value
For selecting the mating population, the appropriate number of uniformly distributed random numbers (uniformly distributed between 0.0 and 1.0) is independently generated. A sample of 6 random numbers: 0.81, 0.32, 0.96, 0.01, 0.65, 0.42.
Figure (5) shows the selection process of the individuals for the example in table (1) together with the above sample trials. A drawback of this selection method is that if the differences in fitness among the chromosomes are small compared to the absolute fitness values, all chromosomes have a roughly equal chance of being selected to survive.
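The roulette-wheel procedure can be illustrated with a short Python sketch. The fitness values used here are hypothetical and are not the values of table (1); only the six sample random numbers come from the text. The function name `roulette_select` is also an assumption.

```python
from bisect import bisect_right
from itertools import accumulate

def roulette_select(fitness, trials):
    """Map each trial number in [0, 1) to the individual whose segment spans it."""
    total = sum(fitness)
    # Upper boundary of each individual's segment on the line from 0 to 1.
    bounds = [c / total for c in accumulate(fitness)]
    picks = []
    for r in trials:
        index = min(bisect_right(bounds, r), len(fitness) - 1)
        picks.append(index + 1)  # individuals are numbered from 1, as in the text
    return picks

hypothetical_fitness = [4, 3, 2, 1, 0]  # illustration only, not table (1)
sample_trials = [0.81, 0.32, 0.96, 0.01, 0.65, 0.42]
print(roulette_select(hypothetical_fitness, sample_trials))  # [3, 1, 4, 1, 2, 2]
```

With these hypothetical values, individual 5 has fitness 0, so its segment has zero width and it is never selected, which mirrors the remark about the least fit individual.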

G. Crossover:
We use a simple crossover to generate the next generation. We do not discard the original parents: the new population consists of the original parents together with the children generated from them. This doubles the population size (the new generation consists of 20 keys in each iteration). As an example, if the crossover point = 10, the result of crossover is shown in figure (6).

H. Mutation:
We chose a mutation rate equal to 0.01.

I. Stop (Terminating conditions):
After executing the above steps, a new generation of keys is created, and the steps are repeated until the stop condition is reached. Our algorithm stops when the fitness function reaches a high value or when the algorithm has generated 15 populations.
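A minimal Python sketch of one crossover-and-mutation step in which the parents are retained, so that the population doubles. The function name `next_generation`, the pairing of adjacent parents, and the choice to mutate only the children are assumptions; the crossover point 10 and the mutation rate 0.01 are the values given in the text.

```python
import random

def next_generation(parents, point, rate, rng):
    """Return the parents followed by their crossover children."""
    children = []
    # Pair adjacent parents (assumed pairing) and swap tails after `point`.
    for a, b in zip(parents[0::2], parents[1::2]):
        children.append(a[:point] + b[point:])
        children.append(b[:point] + a[point:])
    # Each child gene is replaced by a random value in 1..26 with probability `rate`.
    mutated = [[rng.randint(1, 26) if rng.random() < rate else g for g in child]
               for child in children]
    return parents + mutated

rng = random.Random(0)
parents = [[rng.randint(1, 26) for _ in range(26)] for _ in range(10)]
generation = next_generation(parents, point=10, rate=0.01, rng=rng)
print(len(generation))  # 10 parents plus 10 children
```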

Results and Discussion:
The program was tested on the plain text from paper (5), which consists of (319) characters. The experimental results for the genetic algorithm were generated with 15 runs using Matlab. Roulette wheel selection is used to exploit past results to direct the search for an efficient secret key. The probabilities of crossover and mutation are fixed. Mutation acts as a safety net to recover good genetic material that may be lost through selection and crossover. The genetic algorithm stops when the fitness function is unchanged after some predefined number of generations.
The plain text file: [does increased security provide comfort to paranoid people or does security provide some very basic protections that we are naive to believe that we don't need during this time when the Internet provides essential communication between tens of millions of people and is being increasingly used as a tool for commerce security becomes a tremendously important issue to deal with.]
From figure (7), and depending on the fitness function value, which is the sum of all differences for all letters between the encrypted file and the plain text file, we can note that the best key is generated after fifteen iterations. This key is used to encrypt the plain text file and is added to the end of the encrypted file before it is sent to the receiver. When the receiver has the encrypted file, he separates the key from the end of the file (the last 26 letters), converts it from character form to integer form, and then subtracts each key value from the encrypted file to obtain the plain text file.

Conclusion:
In this work, we have developed a novel method for using a Genetic Algorithm with cryptography. The method uses a genetic algorithm to evaluate candidate secret keys for encryption. The experimental results show that this kind of approach is very suitable for the encryption problem.

Future Work:
Several future directions are identified for further investigation. One of them is applying the Genetic Algorithm to other types of ciphers, such as the monoalphabetic cipher.

Figure (1): The block diagram for the cipher system. The sender sends the cipher text to the receiver. The receiver applies the same key (or rule set) to decrypt the message and recover the plaintext. Because a single key is used for both functions, secret key cryptography is also called symmetric encryption. The initial unencrypted data is referred to as plaintext.

Figure (3): Chromosome representation. The value of each gene in the chromosome represents the shifting value for each letter in the plain text file.

Figure (4): The basic steps of a typical genetic algorithm.

Figure (6): Child construction from two parents. Note that the first two children are the parents themselves, and the second two children result from crossing over the parents at position 10 (10 selected randomly).

Figure (7): An overview of the results for the encrypted file for a variety of differing keys over several iterations.