Parallel Programming for Solving Linear BVPs by Linear Superposition using CS-Tools

The objective of this paper is to speed up the solution of linear boundary value problems (LBVPs) by parallel programming using CS-Tools, where the solution method is the linear superposition algorithm.

College of Education, University of Mosul, Mosul, Iraq; College of Medicine, University of Mosul, Mosul, Iraq.
Received: 17/09/2012; Accepted: 30/01/2013.


Introduction
Theoretical techniques require practical implementation to confirm their efficiency and validity; the latter is affected by the computer system available, especially when considering parallel algorithms. Practical problems of development or implementation, e.g. the reliability and accessibility of the parallel systems, can obviously offset the gain in execution time. Here, we have not attempted a comprehensive implementation of wide-ranging applications, but a single straightforward problem for each of the three principal techniques presented. The Meiko equipment is described briefly in Appendix B [2].
In this paper, we discuss the programming and running of these parallel algorithms for the test problems on the Meiko Computing Surface using CS-Tools, with C as the programming language. We have only considered the parallel solution of ODEs. This well-known area of differential equations consists of two main categories [1, 5 and 6]: 1) initial value problems, and 2) boundary value problems.
For the first category, there is no essential difference between the numerical solution of linear and non-linear cases, so we consider only the best of our parallel techniques for this category on the Meiko transputer system. We noted in reference [2] that the newly developed PBS technique offers several advantages over the other parallel techniques: 1) it is completely parallel; 2) it is more effective in controlling the numerical stability of the numerical algorithms for initial value problems, by virtue of integrating over small subintervals only. This is, accordingly, the technique applied, and it is considered in [2]. In contrast, various techniques are available for solving either linear or non-linear boundary value problems. Accordingly [2], we consider the parallel execution of the preferred parallel method for linear BVPs, namely the parallel linear superposition algorithm, whereas in [2] we consider the parallel execution of the PS techniques for non-linear BVPs. The advantages of the PS techniques are: 1) they are completely parallel; 2) they are particularly effective in controlling the numerical instability of the solutions of the BVPs.
Before considering the parallel execution of these parallel techniques, it is essential to understand the implementation environment; we therefore give a brief overview of the Programmer's Guide to Sun CS-Tools [5].

Introduction to Sun-CS-Tools [2, 3]
CS-Tools is a parallel program development system which runs on Sun workstations and supports parallel programming of In-Sun and Sun-hosted Computing Surface hardware, using standard C and FORTRAN only. It has four components: 1. A set of compilers, 2. A library of communications routines for C and FORTRAN, 3. A loader for distributed programs, 4. A runtime support environment.

Parallel Programming using CS-Tools [2, 3 and 5]
Here, we need to specify clearly the communicating-processes model of parallel programming used in CS-Tools (further literature is available from Meiko which illustrates its use in a number of high-computation application areas).
To make use of the communicating-processes approach, the programmer must first structure an application as a number of separate processes, each process being a conventional independent C program. The set of programs is constructed to work cooperatively on a single overall task. A key feature of the model is that processes communicate and synchronize only by means of message-passing system calls. The Computing Surface hardware offers parallel processing in the form of a number of independent high-performance microprocessors. A programmer makes use of this parallelism by arranging for different processes to execute on different processors. Interprocess communication and synchronization is provided entirely by system library calls. The programmer specifies which processes run on which processor by writing a simple text file which is read by mrun, the parallel network loader.
The Run Time Executive (RTE) provides operating system facilities to application programs running on any processor; where services cannot be provided locally they are referred to the Sun host machine.
The Computing Surface Network (CSN) provides a mechanism through which any application process is able to pass messages, apparently directly, to any other. It hides details of physical connectivity from the programmer: it supports the abstraction that messages are sent to and received from system-global message ports. The CSN handles the transfer of messages between these ports transparently to the user.

Figure (1). A model for sending messages through a port
A port may be referenced and used identically from any point in the network. Where it is necessary to transport messages between processors, this is done invisibly by the CSN. A port is created at run time by an application code fragment. On creation, each port is associated with a user-defined name. Other code fragments may use this name to identify a destination address for messages. A port is a one-way communication path only. Only one process creates each port, from which it then receives messages. Any number of processes may send messages to it. Messages sent to ports can be automatically buffered and queued by the CSN. A block-until-received mode of operation is also supported.
A number of routines is provided to enable application programmers to use ports from C programs. They include 'status reporting' and 'network control' as well as message-passing routines for various data types. Four of the basic message-passing routines, which exemplify the use of message ports, are: 1) cs_createport("name") creates a port and associates it with the user-defined character string name. For example: port1 = cs_createport("port1"); creates the port port1. 2) cs_findport("name", block) must be used by message senders before communication can commence with a port. For example: port1 = cs_findport("port1", 1); can be used by a message sender to send messages to the port port1 previously created with the name "port1". The block argument determines behaviour in the event that the port does not exist. 3) (void) cs_send(name, data, data_size, block) is used to send message data, held in a character buffer pointed to by data, to the port represented by the string name; data_size holds the number of bytes to be sent, and block determines blocking behaviour (in a way similar to cs_findport). 4) (void) cs_recv(name, data, data_size) is used to receive a message from the port referenced by the string name. The message is copied into the buffer area pointed to by data; data_size specifies the maximum size of a message.

Running Parallel Linear Superposition on the Meiko Computing Surface using CS-Tools [2, 5 and 6]
The parallel linear superposition method requires at most (n+1) independent partial solutions of a system which consists of n first-order linear differential equations, followed by the solution of a linear system of at most n algebraic equations. The various cases of the distribution of the (n+1) partial solutions over the p independent processors of the computing system have been considered in detail in [2]. In this section, we consider the running of the parallel linear superposition algorithm on the School's transputer system, namely the Meiko Computing Surface, using CS-Tools; we discuss an example which is executed on the Computing Surface. The determination of the general solution of this particular example requires 5 partial solutions of the system in [1]. The general solution of the example is: y(x) = y*(x) + A1 y1(x) + A2 y2(x) + A3 y3(x) + A4 y4(x), where y*(x) is the particular integral (PI).
Computing each of the partial solutions can be assigned to a single processor. A conventional C code is constructed for computing each partial solution by the Runge-Kutta-Merson algorithm. The respective codes which compute y1(x), y2(x), y3(x) and y4(x) are included in the files "superpy1.c", "superpy2.c", "superpy3.c" and "superpy4.c", and the code which computes y*(x) is included in the file "superppi.c". The last code also solves the linear algebraic system which determines A3 and A4 only, since the values of A1 and A2 can be estimated directly. The "superposition.par" file specifies the distribution of the executable files amongst the processors. The "superposition.par" file is composed of the following statements:

    par
      processor 1 superpy1
      processor 2 superpy2
      processor 3 superpy3
      processor 4 superpy4
      processor 5 superppi
    endpar

Appendix D Parallel Codes For Parallel Linear Superposition
This Appendix lists the parallel codes of the parallel linear superposition for solving Example (5.5.1).