A New Restarting Criterion for FR-CG Method with Exact and Inexact Line Searches

Maha S. Younis
College of Education, University of Mosul, Iraq
Received on: 11/09/2007; Accepted on: 04/03/2008

ABSTRACT
A new restarting criterion for the FR-CG method is derived and investigated in this paper. The method with this criterion is globally convergent whenever the line search fulfills the Wolfe conditions. Numerical tests and comparisons with the standard FR-CG method on large-scale unconstrained optimization problems are given, showing significant improvements.


INTRODUCTION:
The classical conjugate gradient method to minimize a nonlinear function f(x) of the vector variable $x = (x_1, x_2, \ldots, x_n)^T$ is an iterative method defined by

$$x_{i+1} = x_i + \alpha_i d_i, \qquad (1)$$

$$d_1 = -g_1, \qquad (2)$$

and

$$d_{i+1} = -g_{i+1} + \beta_i d_i, \qquad (3)$$

where $g_i = \nabla f(x_i)$, $\alpha_i$ is a line search parameter, and

$$\beta_i = \frac{y_i^T g_{i+1}}{y_i^T d_i}, \qquad (4)$$

with $y_i = g_{i+1} - g_i$. The method was originally proposed by Hestenes and Stiefel [Hestenes and Stiefel, 1952] to solve systems of linear equations, and was first applied to nonlinear optimization problems by Fletcher and Reeves [Fletcher and Reeves, 1964].
In the original Fletcher-Reeves paper, the parameter $\beta_i$ defined by (4) is replaced by

$$\beta_i = \frac{g_{i+1}^T g_{i+1}}{g_i^T g_i}. \qquad (5)$$

The definitions (4) and (5) are identical if $\alpha_i$ is chosen to minimize f(x) along $d_i$ and f(x) is quadratic.
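As a minimal sketch of iteration (1)-(4), and of the claim that (4) and (5) coincide for a quadratic f(x) under exact line searches, the Python below minimizes a hypothetical test quadratic $f(x) = \tfrac{1}{2}x^T A x - b^T x$ (A and b are made-up data; A is symmetric positive definite). For such an f the exact line search has the closed form $\alpha_i = -g_i^T d_i / d_i^T A d_i$. Plain lists are used so that nothing beyond the standard library is assumed.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def axpy(a, x, y):
    """Return the list a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [dot(row, x) for row in A]


def beta_hs(g_new, g_old, d):
    """Choice (4): y_i^T g_{i+1} / y_i^T d_i with y_i = g_{i+1} - g_i."""
    y = axpy(-1.0, g_old, g_new)
    return dot(y, g_new) / dot(y, d)


def beta_fr(g_new, g_old, d):
    """Fletcher-Reeves choice (5): g_{i+1}^T g_{i+1} / g_i^T g_i."""
    return dot(g_new, g_new) / dot(g_old, g_old)


def cg_quadratic(A, b, x, tol=1e-12):
    """Run (1)-(4) with exact line searches on f(x) = 0.5 x^T A x - b^T x.

    Returns the final iterate and the list of (beta_HS, beta_FR) pairs.
    """
    g = axpy(-1.0, b, matvec(A, x))          # g_1 = A x_1 - b
    d = [-gi for gi in g]                    # (2): d_1 = -g_1
    betas = []
    for _ in range(len(b)):                  # at most n steps on a quadratic
        if math.sqrt(dot(g, g)) < tol:
            break
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)      # exact minimiser of f along d
        x = axpy(alpha, d, x)                # (1)
        g_new = axpy(alpha, Ad, g)           # g(x + alpha d) = g + alpha A d
        b_hs = beta_hs(g_new, g, d)
        betas.append((b_hs, beta_fr(g_new, g, d)))
        d = axpy(b_hs, d, [-gi for gi in g_new])   # (3)
        g = g_new
    return x, betas


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical data
b = [1.0, 2.0, 3.0]
x_star, betas = cg_quadratic(A, b, [0.0, 0.0, 0.0])
for b_hs, b_fr in betas:
    print(f"beta_HS = {b_hs:.12f}   beta_FR = {b_fr:.12f}")
```

On this example the two printed values agree to rounding error at every step and the final iterate solves $Ax = b$; with a nonquadratic f or an inexact line search the two choices of $\beta_i$ generally differ.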
Polak and Ribiere [Polak and Ribiere, 1969] suggested

$$\beta_i = \frac{y_i^T g_{i+1}}{g_i^T g_i}, \qquad (6)$$

which is identical to (4) whenever $\alpha_i$ is chosen to minimize f(x) along $d_i$, independent of any assumption on the form of f(x).
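The claim that (6) coincides with (4) under exact line searches, with no assumption that f is quadratic, uses only the two orthogonality conditions $g_i^T d_{i-1} = 0$ and $g_{i+1}^T d_i = 0$. The sketch below picks hypothetical vectors that satisfy exactly those conditions (there is no quadratic model behind them) and compares the three choices of $\beta_i$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def beta_hs(g_new, g_old, d):
    y = sub(g_new, g_old)
    return dot(y, g_new) / dot(y, d)              # (4)


def beta_pr(g_new, g_old, d):
    y = sub(g_new, g_old)
    return dot(y, g_new) / dot(g_old, g_old)      # (6)


def beta_fr(g_new, g_old, d):
    return dot(g_new, g_new) / dot(g_old, g_old)  # (5)


# Hypothetical data: only the exact-line-search orthogonality conditions hold.
d_prev = [1.0, 0.0, 0.0]
g_old = [0.0, 2.0, 1.0]                                   # g_i^T d_{i-1} = 0
d = [-gi + 0.5 * di for gi, di in zip(g_old, d_prev)]     # (3) with beta_{i-1} = 0.5
g_new = [2.0, 1.0, -1.0]                                  # g_{i+1}^T d_i = 0

assert dot(g_old, d_prev) == 0.0 and dot(g_new, d) == 0.0
print(beta_hs(g_new, g_old, d), beta_pr(g_new, g_old, d), beta_fr(g_new, g_old, d))
# 1.0 1.0 1.2
```

The Hestenes-Stiefel and Polak-Ribiere values agree, while the Fletcher-Reeves value does not, because nothing here forces $g_{i+1}^T g_i = 0$.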
Shanno noted that the search direction (3) is equivalent to

$$d_{i+1} = -g_{i+1} - \left[\left(1 + \frac{y_i^T y_i}{s_i^T y_i}\right)\frac{s_i^T g_{i+1}}{s_i^T y_i} - \frac{y_i^T g_{i+1}}{s_i^T y_i}\right] s_i + \frac{s_i^T g_{i+1}}{s_i^T y_i}\, y_i, \qquad (7)$$

where $s_i = x_{i+1} - x_i = \alpha_i d_i$, whenever $s_i^T g_{i+1} = 0$. The last condition is simply the condition that $\alpha_i$ minimizes f(x) along $d_i$. An advantage of (7) over (3) is that, under much looser line search criteria than exact line minimization, (7) is a descent direction, while all of the above algorithms reduce to the same algorithm under the assumption of exact line minimization and a quadratic f(x). A more elaborate algorithm based on (7), using self-scaling, Beale restarts [Beale, 1972] and Powell's restart criterion [Powell, 1977], has been implemented [Shanno and Phua, 1980] and shown to be generally far more efficient numerically than any of the standard algorithms using (3) with various choices of $\beta_i$. Further, the algorithm has been shown to converge to a stationary point of f(x) under loose line search criteria for convex functions, but it has not been shown to converge for general functions satisfying the conditions that

f(x) has continuous second partial derivatives, (8)

and that the level set

$$L = \{x : f(x) \le f(x_1)\} \text{ is bounded}. \qquad (9)$$

Zoutendijk (1970) showed convergence of the Fletcher-Reeves conjugate gradient method, corresponding to the choice of $\beta_i$ defined by (5), for such functions, while Powell (1983) has recently shown that the Polak-Ribiere method (6) can cycle indefinitely without converging to a stationary point.
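Direction (7) is the memoryless BFGS update of the identity applied to $-g_{i+1}$. The sketch below (hypothetical vectors; the helper names are our own) checks that it reproduces direction (3) with $\beta_i$ from (4) when $s_i^T g_{i+1} = 0$, and that it still points downhill for a gradient violating that condition, provided $s_i^T y_i > 0$. In the second check $y_i$ is simply held fixed, since only the sign of $s_i^T y_i$ matters for the descent property.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def shanno_direction(g_new, s, y):
    """Direction (7): memoryless BFGS update of the identity applied to -g_{i+1}.

    Assumes s^T y > 0 (which a Wolfe line search guarantees).
    """
    sy = dot(s, y)
    sg = dot(s, g_new)
    yg = dot(y, g_new)
    coef_s = (1.0 + dot(y, y) / sy) * sg / sy - yg / sy
    coef_y = sg / sy
    return [-gi - coef_s * si + coef_y * yi for gi, si, yi in zip(g_new, s, y)]


def hs_direction(g_new, g_old, d):
    """Direction (3) with beta_i from (4)."""
    y = [a - b for a, b in zip(g_new, g_old)]
    beta = dot(y, g_new) / dot(y, d)
    return [-gi + beta * di for gi, di in zip(g_new, d)]


# Hypothetical exact-line-search data: s = alpha d and s^T g_{i+1} = 0.
g_old, d, g_new = [0.0, 2.0, 1.0], [0.5, -2.0, -1.0], [2.0, 1.0, -1.0]
y = [a - b for a, b in zip(g_new, g_old)]
s = [0.3 * di for di in d]
print(shanno_direction(g_new, s, y))
print(hs_direction(g_new, g_old, d))

# Inexact case: s^T g != 0, yet s^T y > 0 keeps the direction downhill.
g_inexact = [1.0, 1.0, 1.0]
d_inexact = shanno_direction(g_inexact, s, y)
print(dot(d_inexact, g_inexact) < 0.0)
```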
Furthermore, on the sequence of points for which cycling occurs, g(x) is bounded away from zero.
Shanno [Shanno, 1985] showed that the convergence proof for the Fletcher-Reeves method may be used to guarantee convergence to a stationary point for any conjugate gradient method. His numerical results, testing the proposed modification on the algorithm of Shanno and Phua, show that the efficiency of the modified algorithm is no worse than that of the original algorithm, and is sometimes better.
Further, his test results indicate that a real improvement over the original algorithm may be achieved for at least some large problems. As large problems are the problems for which conjugate gradient methods were devised, the test appears to have computational as well as theoretical utility [Shanno, 1985].
The work of Hestenes and Stiefel (1952) presents a choice for $\beta_i$ closely related to the Polak and Ribiere scheme:

$$\beta_i^{HS} = \frac{g_{i+1}^T y_i}{d_i^T y_i}. \qquad (10)$$

If $\alpha_i$ is obtained by an exact line search, then by (3) and the orthogonality conditions $g_{i+1}^T d_i = 0$ and $g_i^T d_{i-1} = 0$ we have

$$d_i^T y_i = (g_{i+1} - g_i)^T d_i = -g_i^T d_i = g_i^T g_i. \qquad (11)$$

Hence $\beta_i^{HS} = \beta_i^{PR}$ when $\alpha_i$ is obtained by an exact line search. More recent nonlinear conjugate gradient algorithms include the conjugate descent algorithm of Fletcher (1987), the scheme of Liu and Storey (1991), and the scheme of Dai and Yuan (1999) (see also the survey article of Hager and Zhang (2006)). The scheme of Dai and Yuan corresponds to the following choice for the update parameter [Hager and Zhang, 2006]:

$$\beta_i^{DY} = \frac{g_{i+1}^T g_{i+1}}{d_i^T y_i}. \qquad (12)$$

Restarting Criteria for a CG-Algorithm:
In the implementation of many CG-algorithms, one may often meet the difficulty that the search direction at some iteration is very poor. For example, the Newton direction is not well defined if the Hessian of the objective function is singular, and if the Hessian is not positive definite, the Newton direction is not necessarily a descent direction. Similarly, PR-CG is now believed to be one of the most efficient CG-methods, even for strictly convex quadratic functions; however, with the strong Wolfe conditions it may produce an uphill search direction. When the search direction is poor, a simple remedy is to restart, and restarting with the steepest descent direction $-g_i$ is also a way to guarantee the global convergence of the method. In this section we investigate and derive a new restarting criterion for FR-CG that still yields the global convergence property.
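The simplest form of the restart remedy just described can be sketched as follows. This is a generic safeguard, not the new criterion of this paper: a hypothetical helper forms $d_{i+1}$ from (3) for any given $\beta_i$ and falls back to $-g_{i+1}$ whenever the result is not a descent direction, that is, whenever $g_{i+1}^T d_{i+1} \ge 0$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def next_direction(g_new, d, beta):
    """Return (d_{i+1}, restarted) with d_{i+1} = -g_{i+1} + beta * d_i from (3).

    Hypothetical safeguard: if d_{i+1} is not a descent direction
    (g_{i+1}^T d_{i+1} >= 0), restart with the steepest descent direction.
    """
    d_new = [-gi + beta * di for gi, di in zip(g_new, d)]
    if dot(g_new, d_new) >= 0.0:
        return [-gi for gi in g_new], True
    return d_new, False


g_new, d = [1.0, 0.0], [1.0, 1.0]
print(next_direction(g_new, d, beta=2.0))   # uphill candidate [1.0, 2.0]: restart
print(next_direction(g_new, d, beta=0.5))   # downhill candidate [-0.5, 0.5]: kept
```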
CG-methods are usually implemented with restarts after n iterations, to match the quadratic model and to avoid the effects of an accumulation of errors. Cohen (1972) showed that several restarted CG-methods have n-step quadratic convergence. Crowder and Wolfe (1972) established that if restarting is not employed for general functions, the convergence of CG-methods will only be linear; they also concluded that convergence is not better than linear for quadratic functions. Again, Powell (1976) showed that for a convex quadratic function the convergence rate is linear. Fletcher and Reeves (1964) suggested restarting their algorithm every n iterations, where n is the number of variables. Their standard reset was

$$d_i = -g_i \quad \text{for } i = 1, n, 2n, \ldots \qquad (13)$$

The following remarks show that the Fletcher-Reeves algorithm may be inefficient for several iterations if a search direction $d_i$ occurs that is almost orthogonal to the steepest descent direction $-g_i$. We let $\theta_i$ be the angle between $d_i$ and $-g_i$, and we use the definition

$$d_i = -g_i + \beta_{i-1} d_{i-1} \qquad (14)$$

and the orthogonality of $g_i$ to $d_{i-1}$. This is useful because $-g_i$ and $\beta_{i-1} d_{i-1}$ are then the legs of a right triangle with hypotenuse $d_i$, which gives the equation

$$\|d_i\| = \sec\theta_i \, \|g_i\|. \qquad (15)$$

Further, if i is replaced by (i + 1) in the figure, we find the identity

$$\beta_i \|d_i\| = \tan\theta_{i+1} \, \|g_{i+1}\|. \qquad (16)$$

Fig. (1): The definition of $\theta_i$.

We may eliminate $\|d_i\|$ from equations (15) and (16) and substitute the definition (5) of $\beta_i$ to deduce the inequality

$$\tan\theta_{i+1} = \sec\theta_i \, \frac{\|g_{i+1}\|}{\|g_i\|} > \tan\theta_i \, \frac{\|g_{i+1}\|}{\|g_i\|}. \qquad (17)$$

Now if $\theta_i$ is close to $\pi/2$, the iteration may take a very small step, in which case the change $(g_{i+1} - g_i)$ is small also. Thus the ratio $\|g_{i+1}\|/\|g_i\|$ is close to one. It follows from inequality (17) that $\theta_{i+1}$ is close to $\pi/2$ as well, so slow progress may occur again on the next iteration.
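Relations (15)-(17) use only the FR formula (5) and the orthogonality $g_{i+1}^T d_i = 0$ produced by an exact line search. A minimal numerical check, on a hypothetical test quadratic with closed-form exact steps, compares $\tan\theta_{i+1}$ with $\sec\theta_i \|g_{i+1}\|/\|g_i\|$ along the FR-CG trajectory; $\theta_i$ is computed with `math.atan2` to avoid cancellation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def angle_to_steepest_descent(g, d):
    """theta_i: the angle between d_i and -g_i."""
    along = -dot(g, d) / norm(g)                          # component of d along -g
    across = math.sqrt(max(dot(d, d) - along * along, 0.0))
    return math.atan2(across, along)


def fr_trajectory(A, b, x, steps):
    """FR-CG (5) with exact line searches on 0.5 x^T A x - b^T x.

    Returns theta_i and ||g_i|| for i = 1, ..., steps + 1.
    """
    g = [dot(row, x) - bi for row, bi in zip(A, b)]
    d = [-gi for gi in g]
    thetas, gnorms = [angle_to_steepest_descent(g, d)], [norm(g)]
    for _ in range(steps):
        Ad = [dot(row, d) for row in A]
        alpha = -dot(g, d) / dot(d, Ad)                   # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / dot(g, g)              # (5)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]  # (14)
        g = g_new
        thetas.append(angle_to_steepest_descent(g, d))
        gnorms.append(norm(g))
    return thetas, gnorms


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical data
b = [1.0, 2.0, 3.0]
thetas, gnorms = fr_trajectory(A, b, [0.0, 0.0, 0.0], steps=2)
for i in range(len(thetas) - 1):
    lhs = math.tan(thetas[i + 1])
    rhs = gnorms[i + 1] / (gnorms[i] * math.cos(thetas[i]))   # equality in (17)
    print(i + 1, lhs, rhs)
```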
Numerical calculations show that this inefficient behavior can continue for several iterations when $\beta_i$ is defined by equation (5). Suppose that the early iterations of the algorithm have made $\theta_i$ positive, but that a region in the space of the variables has been reached where f(x) is the quadratic function

$$f(x) = \tfrac{1}{2} x^T x. \qquad (18)$$

In this case the line search along $d_i$ makes the ratio $\|g_{i+1}\|/\|g_i\|$ equal to $\sin\theta_i$. Therefore the equality in (17) gives $\tan\theta_{i+1} = \sec\theta_i \sin\theta_i = \tan\theta_i$, so $\theta_{i+1}$ is equal to $\theta_i$. Thus the angle between the search direction and the steepest descent direction remains constant for all iterations, which is highly inefficient if $\theta_i$ is close to $\pi/2$. Note that this inefficient behavior is corrected by a steepest descent restart.
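The stalling described above can be reproduced with a short sketch, assuming $f(x) = \tfrac{1}{2} x^T x$ (so $g(x) = x$ and the exact step along $d$ is $-g^T d / d^T d$) and a hypothetical starting direction with $\theta_1 = 80^\circ$, chosen as $-g_1$ plus a vector orthogonal to $g_1$. Under FR-CG the angle stays at $80^\circ$ and $\|g_i\|$ shrinks only by the factor $\sin 80^\circ$ per iteration, whereas one steepest descent step with an exact line search lands on the minimizer.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def angle_to_steepest_descent(g, d):
    along = -dot(g, d) / norm(g)
    across = math.sqrt(max(dot(d, d) - along * along, 0.0))
    return math.atan2(across, along)


def fr_on_identity_quadratic(x, d, iters):
    """FR-CG with exact line searches on f(x) = 0.5 x^T x, where g(x) = x."""
    g = list(x)
    thetas = [angle_to_steepest_descent(g, d)]
    for _ in range(iters):
        alpha = -dot(g, d) / dot(d, d)               # exact step, Hessian = I
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = list(x)
        beta = dot(g_new, g_new) / dot(g, g)         # (5)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
        thetas.append(angle_to_steepest_descent(g, d))
    return x, d, thetas


theta1 = math.radians(80.0)
x1 = [1.0, 0.0]
d1 = [-1.0, math.tan(theta1)]       # -g_1 plus a component orthogonal to g_1
x, d, thetas = fr_on_identity_quadratic(x1, d1, iters=10)
print([round(math.degrees(t), 6) for t in thetas])   # every angle stays at 80 degrees
print(norm(x), math.sin(theta1) ** 10)

# Steepest descent restart: d = -g with an exact step reaches the minimiser.
g = list(x)
d_sd = [-gi for gi in g]
alpha = -dot(g, d_sd) / dot(d_sd, d_sd)
x_restart = [xi + alpha * di for xi, di in zip(x, d_sd)]
print(norm(x_restart))
```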
Alternatively, if expression (6) is used to define $\beta_i$, then the iterations of the conjugate gradient method have never seemed to be less efficient than those of the steepest descent method. Equations (15) and (16) can be used to show that the behavior described in the last two paragraphs does not occur. Now the definition (6) of $\beta_i$ provides the bound

$$|\beta_i| \le \frac{\|g_{i+1}\| \, \|g_{i+1} - g_i\|}{\|g_i\|^2}, \qquad (19)$$

so the elimination of $\|d_i\|$ from the two equations gives the inequality

$$\tan\theta_{i+1} \le \sec\theta_i \, \frac{\|g_{i+1} - g_i\|}{\|g_i\|}. \qquad (20)$$

It follows that, if $\theta_i$ is close to $\pi/2$ and if this causes the step from $x_i$ to $x_{i+1}$ to be so small that the change $(g_{i+1} - g_i)$ is much less than $\|g_i\|$, then $\tan\theta_{i+1}$ is much less than $\sec\theta_i$. Thus the search direction $d_{i+1}$ is turned towards the steepest descent direction. Inequality (20) is sufficiently powerful to prove the following convergence theorem which, in contrast to a similar theorem given by Polak (1971), does not require f(x) to satisfy any convexity conditions.
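For contrast, the same hypothetical setup ($f(x) = \tfrac{1}{2} x^T x$, $\theta_1 = 80^\circ$) can be run with the Polak-Ribiere choice (6). Because $y_i$ is parallel to $d_i$ here and $g_{i+1}$ is orthogonal to $d_i$, $\beta_i$ is zero up to rounding, so $d_2$ is essentially the steepest descent direction. The sketch also checks inequality (20) in the form $\tan\theta_{i+1} \le \sec\theta_i \|g_{i+1} - g_i\|/\|g_i\|$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def angle_to_steepest_descent(g, d):
    along = -dot(g, d) / norm(g)
    across = math.sqrt(max(dot(d, d) - along * along, 0.0))
    return math.atan2(across, along)


def pr_step(x, d):
    """One PR-CG step (6) with an exact line search on f(x) = 0.5 x^T x (g(x) = x)."""
    g = list(x)
    alpha = -dot(g, d) / dot(d, d)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = list(x_new)
    y = [a - b for a, b in zip(g_new, g)]
    beta = dot(y, g_new) / dot(g, g)                     # (6)
    d_new = [-gi + beta * di for gi, di in zip(g_new, d)]
    return x_new, d_new, y


theta1 = math.radians(80.0)
x1, d1 = [1.0, 0.0], [-1.0, math.tan(theta1)]
x2, d2, y1 = pr_step(x1, d1)
theta2 = angle_to_steepest_descent(x2, d2)               # g_2 = x_2
bound = norm(y1) / (norm(x1) * math.cos(theta1))         # right side of (20)
print(math.degrees(theta2), math.tan(theta2) <= bound)
x3, _, _ = pr_step(x2, d2)
print(norm(x3))                                          # essentially at the minimiser
```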

A new restarting criterion for FR-CG method
In this section we introduce a new descent condition for the FR-CG method as follows:

Numerical Results:
The numerical performance of CG-methods is greatly improved by using restarts. The disadvantage of restarting according to (13) is that the immediate reduction in the objective function is usually less than it would be without restarts. Moreover, it is inefficient once an accumulation of errors has already affected the conjugacy property.
A restart direction different from (13) was proposed by Beale (1972), which can be used to derive a sophisticated restart procedure. The merit of Beale's restarting direction is that it allows an increase in the immediate reduction of the function value when a CG-method is used to minimize a nonquadratic function. Powell (1977) also developed a new procedure for restarting CG-methods. He suggested restarting whenever

$$|g_{i+1}^T g_i| \ge 0.2 \, \|g_{i+1}\|^2. \qquad (26)$$

The rationale behind this check is that successive gradients should be close to orthogonal, as they are for a quadratic function with exact line searches. He also checked that the new search direction $d_{i+1}$ is sufficiently downhill, using the test

$$-1.2 \, \|g_{i+1}\|^2 \le d_{i+1}^T g_{i+1} \le -0.8 \, \|g_{i+1}\|^2.$$

... obtained by a special nonlinear scaling of a quadratic function has been considered by Tassopoulos and Storey (1984), with an arbitrary search direction other than the steepest descent, with evident success (Al-Bayati, 1993).
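Powell's two tests can be written as small predicates. This is a sketch only: the thresholds 0.2, -1.2 and -0.8 are used as defaults following Powell's 1977 criterion, the helper names are hypothetical, and how the tests are combined with Beale's restart direction in a full implementation is not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def powell_restart_needed(g_new, g_old, ratio=0.2):
    """Test (26): restart when successive gradients are far from orthogonal."""
    return abs(dot(g_new, g_old)) >= ratio * dot(g_new, g_new)


def sufficiently_downhill(d_new, g_new, lower=-1.2, upper=-0.8):
    """Check -1.2 ||g||^2 <= d^T g <= -0.8 ||g||^2 for the new direction."""
    gg = dot(g_new, g_new)
    return lower * gg <= dot(d_new, g_new) <= upper * gg


print(powell_restart_needed([0.1, 1.0], [1.0, 0.0]))    # False: nearly orthogonal
print(powell_restart_needed([0.5, 1.0], [1.0, 0.0]))    # True: 0.5 >= 0.25
print(sufficiently_downhill([-1.0, -1.0], [1.0, 1.0]))  # True: d = -g
print(sufficiently_downhill([0.0, -0.1], [1.0, 1.0]))   # False: barely downhill
```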
The following symbols are used in the tables:
NOI = the number of iterations.
NOF = the number of function evaluations.
ELS = exact line searches.
ILS = inexact line searches.
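For the ILS runs, a step is typically accepted when it satisfies the Wolfe conditions mentioned in the abstract, and NOF counts calls to f. The sketch below is a hypothetical harness: the class and function names, the weak (rather than strong) form of the Wolfe conditions, and the constants $c_1 = 10^{-4}$, $c_2 = 0.9$ are assumptions, not the settings used for the tables.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class CountingObjective:
    """Wraps f and its gradient and counts function evaluations (NOF).

    Assumption of this sketch: only f-values are counted, not gradients.
    """

    def __init__(self, f, grad):
        self._f = f
        self._grad = grad
        self.nof = 0

    def value(self, x):
        self.nof += 1
        return self._f(x)

    def gradient(self, x):
        return self._grad(x)


def satisfies_wolfe(obj, x, d, alpha, c1=1e-4, c2=0.9):
    """Check the weak Wolfe conditions for the trial point x + alpha d."""
    fx = obj.value(x)
    slope = dot(obj.gradient(x), d)                  # g^T d, negative for descent
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = obj.value(x_new) <= fx + c1 * alpha * slope
    curvature = dot(obj.gradient(x_new), d) >= c2 * slope
    return sufficient_decrease and curvature


obj = CountingObjective(lambda v: 0.5 * dot(v, v), lambda v: list(v))
x, d = [1.0, 2.0], [-1.0, -2.0]
for alpha in (1.0, 2.0, 1e-3):
    print(alpha, satisfies_wolfe(obj, x, d, alpha))
# 1.0 True / 2.0 False (no decrease) / 0.001 False (curvature fails)
print("NOF =", obj.nof)   # NOF = 6
```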