Automatic Super-linear Self-Scaling VM-algorithm

In this paper, a new self-scaling VM-algorithm for unconstrained non-linear optimization is investigated. Some theoretical and experimental results are given on the scaling technique, which guarantee the superlinear convergence of the new proposed algorithm.


1. Introduction
Conjugate Gradient (CG) methods were first used to solve the general unconstrained problem by Fletcher and Reeves [14]. Their algorithm (or simple variants) is still frequently used, especially for problems with a large number of variables, since it requires only a few vectors of length n to be stored.
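The low storage requirement mentioned above can be illustrated with a minimal sketch of the Fletcher-Reeves iteration on a strictly convex quadratic. The matrix `G`, the vector `b` and the function name `fletcher_reeves_quadratic` are hypothetical choices made for this illustration; an exact line search is assumed, which is only available in closed form because the model is quadratic.

```python
import numpy as np

def fletcher_reeves_quadratic(G, b, x0, tol=1e-10):
    """Minimise 0.5 x^T G x - b^T x; only x, g, d (and G @ d) are stored."""
    x = np.asarray(x0, dtype=float)
    g = G @ x - b                          # gradient of the quadratic
    d = -g                                 # first direction: steepest descent
    directions = []
    for _ in range(len(b)):                # at most n steps in exact arithmetic
        if np.linalg.norm(g) <= tol:
            break
        Gd = G @ d
        alpha = -(g @ d) / (d @ Gd)        # exact line search (quadratic only)
        x = x + alpha * d
        g_new = g + alpha * Gd
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves formula
        directions.append(d)
        d = -g_new + beta * d
        g = g_new
    return x, directions

# Hypothetical 3x3 symmetric positive definite test problem.
G = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_star, directions = fletcher_reeves_quadratic(G, b, np.zeros(3))
```

On a quadratic, the directions produced this way are mutually conjugate with respect to `G`, which is why the iteration stops after at most n steps.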
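Given a symmetric positive definite matrix G, the finite set of nonnull vectors {d1, d2, …, dk} is said to form a conjugate set if
$$d_i^T G d_j = 0 \quad \text{for all } i \neq j.$$
An important class of quasi-Newton methods for solving the unconstrained optimization problem [13]
$$\min_{x \in R^n} f(x) \qquad (1)$$
was proposed by [7]. It consists of iterations of the form
$$x_{k+1} = x_k + \alpha_k d_k, \qquad d_k = -B_k^{-1} g_k,$$
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k} + \phi_k \, (s_k^T B_k s_k) \, v_k v_k^T.$$
Here $\alpha_k$ is a step length parameter chosen by a line search that satisfies the Wolfe conditions, i.e.
$$f(x_k + \alpha_k d_k) \le f(x_k) + c_1 \alpha_k g_k^T d_k, \qquad (4)$$
$$g(x_k + \alpha_k d_k)^T d_k \ge c_2 \, g_k^T d_k, \qquad (5)$$
with $0 < c_1 < c_2 < 1$. In the update, $\phi_k$ is a scalar, $y_k = g_{k+1} - g_k$, $s_k = x_{k+1} - x_k$ and
$$v_k = \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k}.$$
The choice of the parameter $\phi_k$ is important, since it can greatly affect the performance of the method. The BFGS method corresponds to $\phi_k = 0$.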
Variable metric (VM) methods were originally proposed by Davidon [11]. Subsequently, many authors have extended the theory and practice; see [12] for a survey. The search direction in a VM-method is the solution of the system of equations
$$d_k = -H_k g_k,$$
where the matrix $H_k$ is an approximation to $G_k^{-1}$, the inverse Hessian of the function f(x). See [1] for more details and properties of this algorithm.
Algorithm 1.1 [6]:
(1) For a starting point x1 and a nonsingular matrix V1, set k = 1.
… (10) and the update is performed directly on Vk.
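To see why the matrix Hk is meant to approximate the inverse Hessian, consider the following small sketch. It assumes the usual VM search direction d = -H g and a hypothetical strictly convex quadratic; when H equals the exact inverse Hessian, a single unit step reaches the minimizer.

```python
import numpy as np

# Hypothetical strictly convex quadratic f(x) = 0.5 x^T G x - b^T x.
G = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return G @ x - b

x = np.array([5.0, -4.0])
H = np.linalg.inv(G)      # idealised inverse-Hessian approximation
d = -H @ grad(x)          # assumed VM search direction d = -H g
x_new = x + d             # unit step along d

# With the crude choice H = I the direction is steepest descent,
# which is still a descent direction but no longer reaches the minimizer.
d_sd = -grad(x)
```

A VM-method never forms the inverse explicitly as done here; the point of the update formulas is to build up this information from gradient differences alone.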

2. Basic Results for Superlinear Convergence
First we define the following quantities to be used in this section: where G* is the Hessian of f at the minimizer x* .
The limiting behavior of qk and cos θk is enough to characterize the asymptotic rate of convergence of a sequence of iterates {xk} generated by a quasi-Newton algorithm. This result, which can be seen as a restatement of the characterization in [12], is reproduced in the following lemma.
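The following sketch shows how two such quantities can be evaluated numerically. The formulas used are an assumption: they follow the definitions common in the quasi-Newton convergence literature, where the step and the Hessian approximation are first scaled by the Hessian G* at the minimizer. The function name `scaled_measures` and the test data are hypothetical.

```python
import numpy as np

def scaled_measures(B, s, G_star):
    """Return (q, cos_theta) for the step s and approximation B.

    Assumed definitions: s~ = G*^(1/2) s, B~ = G*^(-1/2) B G*^(-1/2),
    q = s~^T B~ s~ / s~^T s~, cos_theta = s~^T B~ s~ / (|s~| |B~ s~|).
    """
    w, U = np.linalg.eigh(G_star)                  # G* symmetric positive definite
    G_half = U @ np.diag(np.sqrt(w)) @ U.T
    G_half_inv = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    s_t = G_half @ s
    B_t = G_half_inv @ B @ G_half_inv
    Bs = B_t @ s_t
    sBs = s_t @ Bs
    q = sBs / (s_t @ s_t)
    cos_theta = sBs / (np.linalg.norm(s_t) * np.linalg.norm(Bs))
    return q, cos_theta

G_star = np.array([[4.0, 1.0], [1.0, 3.0]])
s = np.array([1.0, -2.0])
q_exact, cos_exact = scaled_measures(G_star, s, G_star)   # B equals G*
q_ident, cos_ident = scaled_measures(np.eye(2), s, G_star)  # B = I
```

Under these assumed definitions both quantities equal 1 when B coincides with G* along the step, so their distance from 1 measures how far the approximation is from the true curvature in the direction actually taken.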

Lemma 2.1:
Suppose that the sequence of iterates {xk} is generated by Algorithm 1.1 using some positive definite sequence {Bk}, and that αk = 1 whenever this value satisfies the Wolfe conditions (4)-(5). If xk → x*, then the following two conditions are equivalent: (i) The steplength αk = 1 satisfies conditions (4)-(5) for all large k and the rate of convergence is superlinear.
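The admissibility test for the unit steplength used in the lemma can be sketched as follows, assuming the usual form of the two Wolfe conditions (sufficient decrease and curvature). The constants `c1` and `c2`, the function name `satisfies_wolfe` and the quadratic test function are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
    """Check the (assumed standard) Wolfe conditions for the step alpha * d."""
    slope0 = grad(x) @ d                       # directional derivative at x
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0
    curvature = grad(x_new) @ d >= c2 * slope0
    return bool(sufficient_decrease and curvature)

# Hypothetical strictly convex quadratic.
G = np.array([[4.0, 1.0], [1.0, 3.0]])

def f(x):
    return 0.5 * x @ G @ x

def grad(x):
    return G @ x

x = np.array([1.0, -2.0])
newton_dir = -np.linalg.solve(G, grad(x))
unit_step_ok = satisfies_wolfe(f, grad, x, newton_dir, alpha=1.0)
long_step_ok = satisfies_wolfe(f, grad, x, -grad(x), alpha=10.0)
```

With an accurate curvature model the unit step is accepted, whereas an overly long steepest-descent step fails the sufficient decrease test. This mirrors the situation in the lemma, where acceptance of the unit step for all large k goes together with superlinear convergence.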
Proof: The proof of this lemma can be found in [9].
The next theorem specifies conditions on the two scaling parameters that allow qk and cos θk, produced by Algorithm 1.1, to exhibit the desirable limiting behavior of Lemma 2.1. Such conditions involve the following quantities: and whether they sum finitely or not. Note that they need not be positive. Recall that the sets Ik and Jk contain the indices of the columns that are scaled down at iteration k. We are now ready to state the theorem.

Theorem 2.1:
Suppose that, for the quadratic function f, x1, B1 and the two scaling parameters satisfy the assumptions of Theorem 1.1. In addition, assume that G is Lipschitz continuous at x*. Let {xk} → x* be generated by Algorithm 1.1. Then, under the summability conditions on the quantities introduced above, the iterates converge superlinearly (for the case of non-quadratic functions, see [2] and [3]).
Proof: We use the definition (9) of ci. We know that the iterates converge to x* r-linearly. Using this and the Lipschitz continuity of G at x*, the result is not difficult to show; see [9].
Now we describe a specific, modified implementation of Algorithm 1.1 and make use of the theory developed so far to show that it is superlinearly convergent for strictly convex objective functions.

New Algorithm:
Step (0) Choose x1 and a nonsingular lower triangular matrix V1; set k = 1.
Step (1) Terminate if a stopping criterion is satisfied.
Step (2) Find an orthogonal matrix Qk such that Lk = Vk Qk is lower triangular. Compute the new iterate xk+1, where αk is a steplength that satisfies the Wolfe conditions (the stepsize αk = 1 is always tried first and is accepted if admissible). Compute:
sk = xk+1 - xk, yk = gk+1 - gk.
Step (3) …
Step (5) Set k = k + 1 and go to Step (1).
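The triangularization in Step (2) can be sketched with a QR factorization. This is one possible way to obtain such an orthogonal matrix, not necessarily the one used by the authors: if the transpose of V is factored as Q R with R upper triangular, then V Q equals the transpose of R, which is lower triangular. The function name `triangularize` and the matrix `V` are hypothetical.

```python
import numpy as np

def triangularize(V):
    """Return (L, Q) with Q orthogonal and L = V @ Q lower triangular."""
    Q, R = np.linalg.qr(V.T)     # V^T = Q R, with R upper triangular
    L = V @ Q                    # equals R^T up to rounding
    return L, Q

# Hypothetical nonsingular matrix standing in for V_k.
V = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
L, Q = triangularize(V)
```

Since Q is orthogonal, the product of L with its transpose equals the product of V with its transpose, so the matrix represented by the factor is unchanged; only the factor itself is brought into triangular form.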

3. Numerical Results
In order to assess the value of this new technique, numerical tests on twenty test functions were carried out for unconstrained optimization problems. As a standard for the purpose of comparison, the test functions (from the general literature) were solved using two different VM-algorithms:
(i) The standard BFGS algorithm.
(ii) The new proposed algorithm (which has been proved to be superlinearly convergent).
All the numerical results are presented in Tables (1)-(2). Both algorithms terminate on the same stopping criterion and use exactly the same line search strategy, namely the cubic fitting technique directly adapted from that published in [8].
Analysis of the two tables shows that the new proposed VM-algorithm is superior to the standard BFGS algorithm. The superiority of the new algorithm is clearest for high-dimensional test problems, which we attribute to the automatic scaling strategy.
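For readers who wish to set up a comparable baseline, the following is a minimal sketch of a standard BFGS run that reports iteration and function-evaluation counts. It is not the authors' code: a simple backtracking (Armijo) line search stands in for the cubic fitting technique of [8], the inverse form of the BFGS update is used, and the tolerance, the test function and the name `bfgs_baseline` are hypothetical.

```python
import numpy as np

def bfgs_baseline(f, grad, x0, tol=1e-8, max_iter=200):
    """Inverse-form BFGS; returns (x, iterations, function evaluations)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                          # initial inverse-Hessian approximation
    g, fx, n_f = grad(x), f(x), 1
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:       # assumed stopping criterion
            return x, k, n_f
        d = -H @ g
        alpha = 1.0                        # the unit step is tried first
        while True:                        # backtracking stand-in for [8]
            x_new = x + alpha * d
            f_new = f(x_new)
            n_f += 1
            if f_new <= fx + 1e-4 * alpha * (g @ d) or alpha < 1e-12:
                break
            alpha *= 0.5
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        ys = y @ s
        if ys > 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
            rho = 1.0 / ys                 # update only with positive curvature
            I = np.eye(n)
            H = ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
                 + rho * np.outer(s, s))
        x, g, fx = x_new, g_new, f_new
    return x, max_iter, n_f

# Hypothetical test function: a strictly convex quadratic in 5 variables.
D = np.diag(np.arange(1.0, 6.0))
x_min, n_iter, n_fun = bfgs_baseline(lambda x: 0.5 * x @ D @ x,
                                     lambda x: D @ x,
                                     np.ones(5))
```

Running the same driver with a second update rule and the same line search would give iteration and function-evaluation counts that can be compared problem by problem.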

Final Remarks and Conclusions
We have described in this paper the conditions under which new automatic self-scaling algorithms based on the direct form of the VM-update of [1] can be proved to be superlinearly convergent. Numerical experiments have also been carried out to assess the effectiveness of the new proposed algorithm.
It is also possible to describe another similar algorithm based on the inverse scaled-BFGS algorithm. A column scaling algorithm which was proposed by [15] may be modified and implemented with this family of algorithms.
However, the values of the two scaling parameters selected in the new algorithm are arbitrary. It might occasionally be better to increase the first and to decrease the second. In any case, the theory developed in this paper should prove useful for analyzing the superlinear convergence of this algorithm.
Finally, this idea may be extended to constrained optimization problems (see [5] for more details) and to non-quadratic models (see [4]).