Idea


The conjugate gradient method efficiently avoids computing the inverse Hessian required by Newton's method by iteratively descending along conjugate directions. At each step, we seek a search direction that is conjugate to the previous line-search direction; that is, one that will not undo the progress already made along that direction (as can happen in steepest descent).

Working


At training iteration \(t\), the next search direction \(d_t\) takes the form \(d_t = \nabla_{\theta} J(\theta) + \beta_t d_{t-1}\), where \(\beta_t\) is a coefficient whose magnitude controls how much of the previous direction \(d_{t-1}\) to add back into the current search direction. Two directions \(d_t\) and \(d_{t-1}\) are conjugate if \(d_t^T H d_{t-1} = 0\), where \(H\) is the Hessian matrix. In practice, \(\beta_t\) is computed from gradients alone (e.g., via the Fletcher-Reeves or Polak-Ribière formulas), so the Hessian never needs to be formed explicitly.
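To make the update rule concrete, here is a minimal sketch of conjugate gradient applied to a quadratic objective \(f(x) = \tfrac{1}{2} x^T A x - b^T x\), where the matrix \(A\) plays the role of the Hessian \(H\). The function name `linear_cg` and the small test problem are illustrative choices, not from the original text; the Fletcher-Reeves coefficient serves as \(\beta_t\):

```python
import numpy as np

def linear_cg(A, b, x0, tol=1e-10, max_iters=100):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive
    definite A. Each new direction d is conjugate to the previous one
    (d_new^T A d_old = 0), so progress along d_old is preserved."""
    x = x0.astype(float)
    r = b - A @ x            # residual = negative gradient of f at x
    d = r.copy()             # first search direction: steepest descent
    for _ in range(max_iters):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)        # exact line search along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves coefficient
        d = r_new + beta * d              # conjugate to the previous d
        r = r_new
    return x

# Hypothetical 2x2 example: for an n-dimensional quadratic, CG converges
# in at most n iterations with exact line search.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = linear_cg(A, b, np.zeros(2))
```

Note that minimizing this quadratic is equivalent to solving \(Ax = b\), which is why the residual \(r\) doubles as the negative gradient.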