Roots II

Minimization

1 Minimization

In this lecture, instead of finding function zeros, we are now going to be locating function minima. Note that we will be studying the problem of unconstrained minimization, meaning that we will not be imposing any further constraints on our variables; we are simply looking for the variable values that minimize a scalar function.

1.1 One-Dimensional Minimization

For simplicity, let’s start from the case of a function of a single variable, \phi(x). As you may recall from elementary calculus, a stationary point (which, for a differentiable function, is also known as a critical point) is a point at which the derivative vanishes, namely \phi'(x^*) = 0, where we are now using x^* to denote the stationary point. If \phi''(x^*) > 0 we are dealing with a local minimum, whereas if \phi''(x^*) < 0 we are dealing with a local maximum. Minima and maxima together are known as extrema.

A simple example is our function \phi(x) = e^{x - \sqrt{x}} - x. In the previous lecture we saw that it has two zeros, at x \simeq 1 and x \simeq 2.5. We are now interested in its minimum, which is located at x^* \simeq 1.8, as can be seen in Figure 1.

Figure 1: The function \phi(x) = e^{x - \sqrt{x}} - x

It is easy to see that \phi''(x^*) > 0, so this is a (single) minimum. To locate this minimum, we can in principle apply a root-finding method from the previous lecture to the function f(x) = \phi'(x).
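
To make this concrete, here is a minimal sketch (not the code from the previous lecture) that feeds a finite-difference approximation of \phi'(x) into a simple secant iteration; the names phi, phiprime, and secant are illustrative.

import numpy as np

def phi(x):
    return np.exp(x - np.sqrt(x)) - x

def phiprime(x, h=1.e-6):
    # central-difference approximation of phi'(x)
    return (phi(x+h) - phi(x-h))/(2*h)

def secant(f, x0, x1, kmax=100, tol=1.e-10):
    # secant iteration for f(x) = 0
    for k in range(kmax):
        x2 = x1 - f(x1)*(x1 - x0)/(f(x1) - f(x0))
        if abs((x2 - x1)/x2) < tol:
            return x2
        x0, x1 = x1, x2
    return None

if __name__ == '__main__':
    print(secant(phiprime, 1.5, 2.))  # should land near x* ~ 1.8, consistent with Figure 1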

1.2 Multidimensional Minimization

The problem of multidimensional minimization is, in general, much harder to solve; as the dimensionality grows one cannot even visualize what’s going on very effectively. We start with some mathematical aspects of the problem, then turn to a two-dimensional example, and after that discuss specific minimization methods.

General Features

Consider a scalar function of many variables, i.e., \phi(\boldsymbol{x}), where \boldsymbol{x} bundles together the variables x_0, x_1,\dots, x_{n−1} but \phi produces scalar values. We will now employ a multidimensional Taylor expansion.

We assume \phi(\boldsymbol{x}) has bounded first, second, and third derivatives. Then \phi(\boldsymbol{x} + \boldsymbol{q}) = \phi(\boldsymbol{x}) + (\nabla\phi(\boldsymbol{x}))^T\boldsymbol{q} + \frac{1}{2}\boldsymbol{q}^T\boldsymbol{H}(\boldsymbol{x})\boldsymbol{q} + O(\|\boldsymbol{q} \|^3).

Here, the first-order term involves \nabla\phi(\boldsymbol{x}), the gradient vector of \phi at \boldsymbol{x}. This is \nabla \phi(\boldsymbol{x}) = \left(\frac{\partial\phi}{\partial x_0} \quad \frac{\partial\phi}{\partial x_1} \quad \dots \quad \frac{\partial\phi}{\partial x_{n-1}} \right)^T. The first order term is simply (\nabla\phi(\boldsymbol{x}))^T\boldsymbol{q} = \sum_{j=0}^{n-1} \frac{\partial \phi}{\partial x_j} q_j = \nabla\phi(\boldsymbol{x})\cdot \boldsymbol{q}.

Note that \nabla \phi(\boldsymbol{x}) is the direction of steepest ascent, to which we will come back later. To see this, note that for small \boldsymbol{q} the term linear in \boldsymbol{q} is the dominant contribution, since \|\boldsymbol{q} \|^2 \ll \| \boldsymbol{q} \|. If, for a fixed step length \|\boldsymbol{q}\|, we choose \boldsymbol{q} to be aligned with \nabla \phi(\boldsymbol{x}), then the dot product \nabla\phi(\boldsymbol{x})\cdot \boldsymbol{q} is maximized.

Assuming \boldsymbol{x}^* is a local minimum of \phi and ignoring higher-order terms, we have \phi(\boldsymbol{x}^* + \boldsymbol{q}) \simeq \phi(\boldsymbol{x}^*) + \nabla\phi(\boldsymbol{x}^*)\cdot \boldsymbol{q}. On the other hand, if we choose \boldsymbol{q} to be aligned with -\nabla\phi(\boldsymbol{x}^*), then \nabla\phi(\boldsymbol{x}^*)\cdot \boldsymbol{q} < 0 whenever \nabla\phi(\boldsymbol{x}^*) \neq \boldsymbol{0}. This would mean \phi(\boldsymbol{x}^* + \boldsymbol{q}) < \phi(\boldsymbol{x}^*), in contradiction to the fact that \phi(\boldsymbol{x}^*) is a local minimum. Because of this, we must have \nabla \phi(\boldsymbol{x}^*) = \boldsymbol{0} at local minima (and, more generally, at extrema or critical points).

Having established that the gradient vector vanishes at a critical point, we now turn to the second-order term in the Taylor expansion, which involves the Hessian matrix, \boldsymbol{H}(\boldsymbol{x}). To see what this is, we expand the quadratic form as follows \frac{1}{2}\boldsymbol{q}^T \boldsymbol{H}(\boldsymbol{x})\boldsymbol{q} =\frac{1}{2} \sum_{i,j = 0}^{n-1} \frac{\partial^2\phi}{\partial x_i\partial x_j} q_i q_j. This means the matrix element H_{ij}(\boldsymbol{x}) \equiv (\boldsymbol{H}(\boldsymbol{x}))_{ij} =\frac{\partial^2\phi}{\partial x_i\partial x_j}.

Since the gradient vanishes at \boldsymbol{x}^*, the lowest-order correction in the Taylor expansion is the second-order term, so \phi(\boldsymbol{x}^* + \boldsymbol{q}) \simeq \phi(\boldsymbol{x}^*) + \frac{1}{2}\boldsymbol{q}^T \boldsymbol{H}(\boldsymbol{x}^*)\boldsymbol{q} + O(\|\boldsymbol{q} \|^3).

If we now further assume that \boldsymbol{H}(\boldsymbol{x}^∗) is positive definite (meaning \boldsymbol{v}^T \boldsymbol{H}\boldsymbol{v}>0, \forall \boldsymbol{v} \neq \boldsymbol{0}), then we can see that, indeed, \phi(\boldsymbol{x}^* + \boldsymbol{q}) > \phi(\boldsymbol{x}^*), as it should, since \boldsymbol{x}^* is a minimum.

To summarize:

  1. a necessary condition for x^∗ being a local minimum is that it be a critical point, i.e., that its gradient vector vanish, and

  2. a sufficient condition for the critical point x^∗ being a local minimum is that its Hessian matrix be positive definite.

A Two-Dimensional Example

Let us consider \phi(x_0, x_1) = x_0^2 − 2x_0 + x_1^4 - 2x_1^2 + x_1. This function is shown in Figure 2.

Figure 2: Example of a scalar function of two variables

The figure attempts both to visualize the third dimension and to draw equipotential curves (also known as contour lines).

We find that we are dealing with two local minima. The one on the “right” leads to smaller/more negative function values, so it appears to be the global minimum. If you place a marble somewhere near these two wells, it will roll down to one of the minima; which of the two minima you end up in depends on where you start.
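
If you want to reproduce something like Figure 2 yourself, the following short sketch (assuming matplotlib is available; the grid ranges are illustrative) draws the contour lines of \phi:

import numpy as np
import matplotlib.pyplot as plt

def phi(x0, x1):
    return x0**2 - 2*x0 + x1**4 - 2*x1**2 + x1

# evaluate phi on a grid and draw equipotential (contour) lines
X0, X1 = np.meshgrid(np.linspace(-0.5, 2.5, 201), np.linspace(-2., 2., 201))
plt.contour(X0, X1, phi(X0, X1), levels=30)
plt.xlabel('$x_0$'); plt.ylabel('$x_1$')
plt.show()

Both minima should be visible near x_0 = 1: a deeper well at negative x_1 and a shallower one at positive x_1.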

1.3 Gradient Descent

We now turn to a simple and intuitively clear approach to multidimensional minimization. This method, known as gradient descent, does not exhibit great convergence properties and can get into trouble for non-differentiable functions. Even so, it is a pedagogical, straightforward approach.

Algorithm and Interpretation

Recall that \nabla\phi(\boldsymbol{x}) points in the direction of steepest ascent. This leads to the conclusion that -\nabla\phi(\boldsymbol{x}) is the direction of steepest descent. Locally, choosing \boldsymbol{q} to point along the negative gradient gives the fastest decrease of the function value. The method we are about to introduce, which employs -\nabla \phi(\boldsymbol{x}), is known as gradient descent.

Qualitatively, this approach makes use of local information: if you’re exploring a mountainous region (with your eyes closed), you can take a small step downhill at that point; this doesn’t mean that you’re always actually moving in the direction that will most quickly bring you to a (possibly distant) local minimum, simply that you are moving in a downward direction.

Implicit in our discussion above is the fact that the steps we make will be small: while -\nabla \phi(\boldsymbol{x}) helps you pick the direction, it doesn’t tell you how far in that direction you should go, i.e., how large a \|\boldsymbol{q} \| you should employ. The simplest possible choice, analogously to our closed-eyes example, is to make small fixed steps, quantified by a parameter \gamma. This leads to the following prescription: \boldsymbol{x}^{(k)} = \boldsymbol{x}^{(k-1)} -\gamma \nabla \phi(\boldsymbol{x}^{(k-1)}). At each step, this method picks the direction that is perpendicular to the contour line, as illustrated in Figure 3.

Figure 3: Illustration of gradient descent

In practice, we may not know the gradient analytically, and we typically approximate it using a forward-difference scheme. In equation form, this means that \nabla\phi(\boldsymbol{x}) \approx \begin{pmatrix} [\phi(\boldsymbol{x} + \boldsymbol{e}_0 h) - \phi(\boldsymbol{x})]/h\\ [\phi(\boldsymbol{x} + \boldsymbol{e}_1 h) - \phi(\boldsymbol{x})]/h\\ \vdots \\ [\phi(\boldsymbol{x} + \boldsymbol{e}_{n-1} h) - \phi(\boldsymbol{x})]/h\\ \end{pmatrix} for a given spacing h. This involves “only” n+1 function evaluations.

Implementation

Note that in the function gradient(), we used (xs*np.ones((n,n))).T. The meaning of this expression can be understood in the following way.

  • Let’s say xs=np.array([x0,x1])
  • Then xs*np.ones((2,2)) = np.array([[x0,x1],[x0,x1]])
  • Thus,(xs*np.ones((2,2))).T = np.array([[x0,x0],[x1,x1]])

Then, for n = 2, Xph = (xs*np.ones((n,n))).T + np.identity(n)*h = np.array([[x0+h,x0],[x1,x1+h]]).

When Xph is inserted into phi(), the unpacking x0, x1 = Xph gives x0 = np.array([x0+h,x0]) and x1 = np.array([x1,x1+h]) (the rows of Xph). The call phi(Xph) then returns the vector \begin{pmatrix} \phi(\boldsymbol{x}+\boldsymbol{e}_0 h) \\ \phi(\boldsymbol{x}+\boldsymbol{e}_1 h) \end{pmatrix}.

import numpy as np

def phi(xs):
    x0, x1 = xs
    return x0**2 - 2*x0 + x1**4 - 2*x1**2 + x1

def gradient(phi,xs,h=1.e-6):
    # forward-difference approximation of the gradient of phi at xs
    n = xs.size
    phi0 = phi(xs)
    # each row of Xph shifts one coordinate of xs by h
    Xph = (xs*np.ones((n,n))).T + np.identity(n)*h
    grad = (phi(Xph) - phi0)/h
    return grad

def descent(phi,gradient,xolds,gamma=0.15,kmax=200,tol=1.e-8):
    for k in range(1,kmax):
        # take a fixed-size step along the negative gradient
        xnews = xolds - gamma*gradient(phi,xolds)

        err = termcrit(xolds,xnews)
        print(k, xnews, err, phi(xnews))
        if err < tol:
            break

        xolds = np.copy(xnews)
    else:
        xnews = None   # did not converge within kmax iterations
    return xnews

def termcrit(xolds,xnews):
    errs = np.abs((xnews - xolds)/xnews)
    return np.sum(errs)
    
if __name__ == '__main__':
    xolds = np.array([2.,0.25])
    xnews = descent(phi, gradient, xolds)
    print(xnews)
1 [1.69999985 0.24062524] 0.21543067857184484 -0.38182351336797915
2 [1.48999974 0.22664124] 0.20264073065897387 -0.6333530209502826
3 [1.34299967 0.20564122] 0.21157625945177916 -0.7594983275463574
4 [1.24009962 0.17380848] 0.2661256195506861 -0.8280498618293368
5 [1.16806958 0.12494345] 0.45276305522029103 -0.8777871985954815
6 [1.11764856 0.04873952] 1.608607274900684 -0.9421647380026318
7 [ 1.08235384 -0.07208595] 1.7087398591446958 -1.0756695552415236
8 [ 1.05764754 -0.26511247] 0.7514526354432409 -1.3974185387806914
9 [ 1.04035313 -0.56299971] 0.5457308766418241 -2.0948395460096236
10 [ 1.02824704 -0.94372756] 0.41520335042339795 -2.9309660540915905
11 [ 1.01977278 -1.15566205] 0.19169789499758141 -3.0426740826911614
12 [ 1.01384079 -1.0729902 ] 0.08289908836262229 -3.0499045321707055
13 [ 1.00968841 -1.12557975] 0.05083473301209256 -3.054234380060766
14 [ 1.00678173 -1.09531014] 0.03052274820652257 -3.05538233943229
15 [ 1.00474706 -1.11406803] 0.018862352410894102 -3.05589334211737
16 [ 1.00332279 -1.10287596] 0.011567627294141984 -3.056063921191618
17 [ 1.00232581 -1.1097221 ] 0.007163907055812726 -3.056132246972263
18 [ 1.00162791 -1.10559374] 0.004430823076617533 -3.056157117969501
19 [ 1.00113939 -1.10810556] 0.0027547414719851855 -3.0561667943226265
20 [ 1.00079742 -1.10658539] 0.0017154484765218943 -3.0561704829239886
21 [ 1.00055805 -1.10750841] 0.0010726689437897882 -3.0561719231405475
22 [ 1.00039048 -1.10694906] 0.0006728037904015246 -3.056172494841105
23 [ 1.00027319 -1.10728843] 0.0004237463185425745 -3.056172722103985
24 [ 1.00019108 -1.10708268] 0.00026793975693115976 -3.0561728168266145
25 [ 1.00013361 -1.10720748] 0.00017018001506541433 -3.0561728552582403
26 [ 1.00009337 -1.1071318 ] 0.00010858056296563575 -3.0561728723063135
27 [ 1.00006521 -1.1071777 ] 6.961331368515191e-05 -3.0561728792903975
28 [ 1.0000455  -1.10714986] 4.485147212742855e-05 -3.0561728826380756
29 [ 1.0000317  -1.10716674] 2.9044432169820975e-05 -3.056172883986762
30 [ 1.00002204 -1.10715651] 1.8904903190473045e-05 -3.0561728846981224
31 [ 1.00001528 -1.10716272] 1.236866979687839e-05 -3.056172884967764
32 [ 1.00001054 -1.10715895] 8.133575043645464e-06 -3.0561728851287384
33 [ 1.00000723 -1.10716123] 5.37538599036186e-06 -3.056172885182245
34 [ 1.00000491 -1.10715985] 3.5698772250060277e-06 -3.0561728852203363
35 [ 1.00000329 -1.10716069] 2.3819480385739737e-06 -3.056172885230078
36 [ 1.00000215 -1.10716018] 1.5963972365386816e-06 -3.056172885239327
37 [ 1.00000136 -1.10716049] 1.0743651801166516e-06 -3.0561728852405894
38 [ 1.0000008 -1.1071603] 7.259538200711832e-07 -3.0561728852428383
39 [ 1.00000041 -1.10716041] 4.923441634847997e-07 -3.0561728852427184
40 [ 1.00000014 -1.10716035] 3.3505999120764203e-07 -3.0561728852432424
41 [ 0.99999995 -1.10716039] 2.287712532568859e-07 -3.0561728852430368
42 [ 0.99999981 -1.10716036] 1.564959858869905e-07 -3.0561728852431402
43 [ 0.99999972 -1.10716038] 1.0724305579788708e-07 -3.056172885243015
44 [ 0.99999965 -1.10716037] 7.378388107643756e-08 -3.0561728852430234
45 [ 0.99999961 -1.10716037] 5.101074560430894e-08 -3.056172885242959
46 [ 0.99999957 -1.10716037] 3.5236294141767044e-08 -3.0561728852429515
47 [ 0.99999955 -1.10716037] 2.431386477902596e-08 -3.056172885242921
48 [ 0.99999954 -1.10716037] 1.6863918806819815e-08 -3.0561728852429133
49 [ 0.99999953 -1.10716037] 1.1653038863784987e-08 -3.056172885242899
50 [ 0.99999952 -1.10716037] 8.088151816259404e-09 -3.0561728852428938
[ 0.99999952 -1.10716037]

1.4 Newton’s Method

Gradient descent is nice and simple, but if \gamma is chosen too small you will waste many iterations, whereas if \gamma is chosen too large the iteration may fail to converge.
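
As a quick, hedged illustration of the step-size sensitivity, one can rerun descent() from the listing above with a smaller \gamma and compare how many iterations are needed to reach the same tolerance (nothing beyond the earlier phi, gradient, and descent is assumed):

import numpy as np

# reuses phi, gradient, descent from the gradient-descent listing above
for gamma in (0.05, 0.15):
    xs = descent(phi, gradient, np.array([2., 0.25]), gamma=gamma, kmax=2000)
    # a smaller gamma should reach the same minimum, but only after many more steps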

A distinct approach goes as follows: instead of using only the value of the gradient at a given point, perhaps we should be building in more information. Specifically, we know that the gradient vanishes at a critical point, \nabla\phi(\boldsymbol{x}^*) = \boldsymbol{0}.

Combine that with the fact that the gradient of a scalar function is a column vector, and you can recast that equation as a set of n coupled nonlinear equations \boldsymbol{f}(\boldsymbol{x}) = \nabla\phi(\boldsymbol{x}) = \boldsymbol{0}.

In other words, finding a critical point is simply a special case of solving a nonlinear system of equations. We shall apply Newton’s method in the following to find the critical point.

Mapping to Root Finding for Multiple Nonlinear Equations

We assume that \boldsymbol{f} has bounded first and second derivatives; the actual solution of our problem \boldsymbol{f}(\boldsymbol{x})=\boldsymbol{0} is \boldsymbol{x}^* and we will be trying to approximate it using iterates, which this time are themselves vectors, \boldsymbol{x}^{(k)}. In order to make the transition to the general problem as simple as possible, let’s start from a Taylor expansion of a single function component f_i around our latest iterate \boldsymbol{x}^{(k-1)}: f_i(\boldsymbol{x}) = f_i(\boldsymbol{x}^{(k-1)}) + (\nabla f_i(\boldsymbol{x}^{(k-1)}))^T(\boldsymbol{x} - \boldsymbol{x}^{(k-1)}) + O(\|\boldsymbol{x} - \boldsymbol{x}^{(k-1)} \|^2), where i = 0, 1, \dots, n-1 is the index for each component of \boldsymbol{f}. We can rewrite the second term on the right-hand side in terms of vector components (\nabla f_i(\boldsymbol{x}^{(k-1)}))^T(\boldsymbol{x} - \boldsymbol{x}^{(k-1)}) = \sum_{j=0}^{n-1}\left. \frac{\partial f_i}{\partial x_j}\right|_{\boldsymbol{x}^{(k-1)}}(x_j - x_j^{(k-1)}).

With a view to collecting the n function components together, we now introduce the Jacobian matrix: \boldsymbol{J}(\boldsymbol{x}) = \left\{\frac{\partial f_i}{\partial x_j} \right\} = \begin{pmatrix} \frac{\partial f_0}{\partial x_0} &\frac{\partial f_0}{\partial x_1} &\cdots &\frac{\partial f_0}{\partial x_{n-1}} \\ \frac{\partial f_1}{\partial x_0} &\frac{\partial f_1}{\partial x_1} &\cdots &\frac{\partial f_1}{\partial x_{n-1}} \\ \vdots &\vdots & \ddots & \vdots \\ \frac{\partial f_{n-1}}{\partial x_0} &\frac{\partial f_{n-1}}{\partial x_1} &\cdots &\frac{\partial f_{n-1}}{\partial x_{n-1}} \end{pmatrix}.

We can thus write \boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{f}(\boldsymbol{x}^{(k-1)}) + \boldsymbol{J}(\boldsymbol{x}^{(k-1)})(\boldsymbol{x} - \boldsymbol{x}^{(k-1)}) + O(\|\boldsymbol{x} - \boldsymbol{x}^{(k-1)} \|^2).

The solution corresponding to \boldsymbol{f}(\boldsymbol{x}^*)=\boldsymbol{0} satisfies \boldsymbol{0} = \boldsymbol{f}(\boldsymbol{x}^{(k-1)}) + \boldsymbol{J}(\boldsymbol{x}^{(k-1)})(\boldsymbol{x}^* - \boldsymbol{x}^{(k-1)}), where we have neglected the second-order terms.

In practice, one iteration will not be enough to find the solution so, instead, we use our latest formula to introduce the prescription of Newton’s method for the next iterate, x^{(k)}: \boldsymbol{J}(\boldsymbol{x}^{(k-1)})(\boldsymbol{x}^{(k)}-\boldsymbol{x}^{(k-1)}) = -\boldsymbol{f}(\boldsymbol{x}^{(k-1)}).

When we are minimizing a function \phi(\boldsymbol{x}), we take \boldsymbol{f}(\boldsymbol{x}) = \nabla\phi(\boldsymbol{x}). Thus, the Jacobian is \boldsymbol{J}(\boldsymbol{x}) = \boldsymbol{H}(\boldsymbol{x}), where \boldsymbol{H}(\boldsymbol{x}) is the Hessian matrix at the point \boldsymbol{x}. We hence have \boldsymbol{H}(\boldsymbol{x}^{(k-1)})(\boldsymbol{x}^{(k)}-\boldsymbol{x}^{(k-1)}) = -\nabla\phi(\boldsymbol{x}^{(k-1)}).

Since all quantities at the location of our previous iterate, \boldsymbol{x}^{(k-1)}, are known, this equation has the form \boldsymbol{Ax} = \boldsymbol{b}, i.e., it is a linear system of n equations in n unknowns. Assuming \boldsymbol{J}(\boldsymbol{x}^{(k-1)}) is non-singular, we can solve this system for \boldsymbol{x}^{(k)}. This process is repeated until we satisfy a termination criterion, which could be taken to be \sum_{j=0}^{n-1} \left|\frac{x_j^{(k)}-x_j^{(k-1)}}{x_j^{(k)}} \right| \leq \epsilon.
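
Before turning to the physics problem, here is a minimal sketch of this prescription applied to the two-dimensional example from earlier, using the analytic gradient and Hessian and numpy.linalg.solve for the linear system at each step (the names grad_phi, hess_phi, and newton_min are illustrative, not part of the earlier listings):

import numpy as np

def grad_phi(xs):
    # analytic gradient of phi(x0,x1) = x0**2 - 2*x0 + x1**4 - 2*x1**2 + x1
    x0, x1 = xs
    return np.array([2*x0 - 2, 4*x1**3 - 4*x1 + 1])

def hess_phi(xs):
    # analytic (diagonal) Hessian of the same function
    x0, x1 = xs
    return np.array([[2., 0.], [0., 12*x1**2 - 4.]])

def newton_min(grad, hess, xolds, kmax=100, tol=1.e-8):
    for k in range(1, kmax):
        # solve H(x) q = -grad(x) for the Newton step q
        qs = np.linalg.solve(hess(xolds), -grad(xolds))
        xnews = xolds + qs
        if np.sum(np.abs((xnews - xolds)/xnews)) < tol:
            return xnews
        xolds = xnews
    return None

if __name__ == '__main__':
    print(newton_min(grad_phi, hess_phi, np.array([2., -0.75])))

Note that Newton’s method converges to whichever critical point is nearby, so a starting guess close to the shallow well or the saddle region can land you on a different critical point than the global minimum.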

In the following, we shall apply Newton’s method to a physics problem.

2 Project: Extremizing the Action in Classical Mechanics

2.1 Defining and Extremizing the Action

Let us study a single particle in one dimension. We can denote the particle’s location by x(t), where we’re explicitly showing that the position is a function of time.

The kinetic energy of the particle will be a function of only \dot{x}(t), i.e., of the time derivative of the position: K = K(\dot{x}(t)). Specifically, since we are dealing with a single particle, we know that: K = \frac{1}{2} m\dot{x}^2 where m is the mass of the particle. Similarly, in the absence of time-dependent external fields, the potential energy is a function of only x(t): V = V(x(t)). The difference of these two quantities is defined as the Lagrangian: L(x(t),\dot{x}(t)) \equiv K(\dot{x}(t)) - V(x(t)), where, for our case, there is no explicit dependence of the Lagrangian on time.

We are interested in studying the particle from time t = 0 to time t = T. Then, one can define the action functional as the integral of the Lagrangian over time: S[x(t)] = \int_0^T dt \, L(x(t),\dot{x}(t)) = \int_0^T dt\, \left(\frac{1}{2}m\dot{x}^2 - V(x)\right).

Notice that we called the action a functional and used square brackets on the left-hand side. Roughly speaking, a functional is a function of a function. A reminder: an ordinary function \phi takes us from one number t to another number \phi(t). A functional is an entity that takes in an entire function and gives back a number. In other words, a functional is a mapping from a space of functions into the real (or complex) numbers.

In case you haven’t encountered functionals before, let us start with a simple case: a functional F of \phi(t) (where t is a regular variable – this is a one-dimensional problem): F[\phi(t)]. Being a functional, F depends simultaneously on the values of \phi at all points t but it does not depend on t itself: we provide it with the entire function and it provides us with one number as a result. A trivial example: F[\phi] = \int_0^1 dt\, \phi(t) gives us 1/2 for \phi(t) = t and 1/3 for \phi(t) = t^2. In both cases, however, the answer is an ordinary number that does not depend on t.

Applied to our mechanics problem, we see that the action S depends on the position of the particle x(t) at all times from 0 to T, but not on t directly, since t has been “integrated out”. For a given trajectory x(t) from t = 0 to t = T, the action produces a single number, S. The question then arises: which trajectory x(t) from t = 0 to time t = T does the particle actually “choose”? The answer comes from Hamilton’s principle: of all possible paths, the path that is actually followed is that which minimizes the action. As it so happens, the action only needs to be stationary, i.e., we are extremizing and not necessarily minimizing, but we will usually be dealing with a minimum. This extremization is taking place with the endpoints kept fixed, i.e., x(0) and x(T) are not free to vary.

2.2 Discretizing the Action

We will assume the positions x(t) from t = 0 to t = T can be accessed only at a discrete set of n points: t_k = k\Delta t = k\frac{T}{n-1} where k = 0, 1, \dots, n-1. We will use x_k \equiv x(t_k) to denote the possible position of the particle at t = t_k. After we discretize time, the action functional can be approximated as S_n \equiv \sum_{k=0}^{n-2} \Delta t\left[ \frac{1}{2}m \left(\frac{x_{k+1} - x_{k}}{\Delta t}\right)^2 - V(x_k)\right], in which the integral is replaced by a sum over small rectangles.

Since we want to extremize the action with end points fixed, we should fix x_0 and x_{n-1} in our case. Thus, we shall consider the following function S_n = S_n(x_1, x_2, \dots, x_{n-2}), and we will be finding a minimum of S_n in the (n-2)-dimensional space.
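
As a small, self-contained check of this discretization, one can evaluate S_n directly for a trial trajectory. The function name and the defaults below are illustrative; they happen to match the quartic-oscillator parameters used later (m = 1, T = 1 split into 100 intervals, V = x^4/4).

import numpy as np

def action(xs, m=1., dt=0.01, V=lambda x: 0.25*x**4):
    # discretized action: sum over k of dt*[ (m/2)*((x_{k+1}-x_k)/dt)**2 - V(x_k) ]
    kin = 0.5*m*((xs[1:] - xs[:-1])/dt)**2
    return np.sum(dt*(kin - V(xs[:-1])))

# e.g., the action of a straight-line trial trajectory from x = 2 to x = 0 over T = 1
print(action(np.linspace(2., 0., 101)))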

2.3 Newton’s Method for the Discrete Action

We shall employ Newton’s method for this multidimensional minimization problem. To apply this method, we require the gradient vector as well as the Hessian.

We can compute the gradient \begin{align*} \frac{\partial S_n}{\partial x_i} &= \sum_{k=0}^{n-2}\Delta t\left[ \frac{m}{\Delta t^2}(x_{k+1} - x_k)(\delta_{i,k+1} - \delta_{i,k}) - \frac{\partial V(x_k)}{\partial x_k}\delta_{i,k} \right] \\ &= \frac{m}{\Delta t} (2 x_i - x_{i-1} - x_{i+1}) - \Delta t\frac{\partial V(x_i)}{\partial x_i} \end{align*}.

Similarly, the Hessian is H_{ji} = \frac{\partial^2 S_n}{\partial x_j \partial x_i} = \frac{m}{\Delta t} (2\delta_{j,i} - \delta_{j,i-1} - \delta_{j, i+1}) - \Delta t \frac{\partial^2 V(x_i)}{\partial x_i^2} \delta_{j,i}.

2.4 Implementation

Let us pick a specific form for the potential energy V = \frac{1}{4}x^4 which describes a quartic oscillator.

With this potential, we can compute its first and second order derivatives analytically: F(x) = -\frac{\partial V}{\partial x} = -x^3, \quad F'(x) = -\frac{\partial^2 V}{\partial x^2} = -3x^2.

With this, we have the gradient vector \frac{\partial S_n}{\partial x_i} = \frac{m}{\Delta t} (2 x_i - x_{i-1} - x_{i+1}) - \Delta t x_i^3 or in a vector form \nabla S_n = \begin{pmatrix} \frac{m}{\Delta t} (2 x_1 - x_{0} - x_{2}) - \Delta t x_1^3 \\ \frac{m}{\Delta t} (2 x_2 - x_{1} - x_{3}) - \Delta t x_2^3 \\ \vdots \\ \frac{m}{\Delta t} (2 x_{n-2} - x_{n-3} - x_{n-1}) - \Delta t x_{n-2}^3 \\ \end{pmatrix}. We also have the Hessian H_{ji} = \frac{\partial^2 S_n}{\partial x_j \partial x_i} = \frac{m}{\Delta t} (2\delta_{j,i} - \delta_{j,i-1} - \delta_{j, i+1}) - 3\Delta t x_i^2 \delta_{j,i} or in matrix form \boldsymbol{H} = \begin{pmatrix} \frac{2m}{\Delta t} - 3\Delta t x_1^2 &-\frac{m}{\Delta t} & 0 &\cdots &0\\ -\frac{m}{\Delta t} &\frac{2m}{\Delta t} - 3\Delta t x_2^2 &-\frac{m}{\Delta t} & \cdots & 0\\ 0 &-\frac{m}{\Delta t} &\frac{2m}{\Delta t} - 3\Delta t x_3^2 & \ddots & 0\\ \vdots &\vdots &\ddots &\ddots & \vdots\\ 0 & 0 & \cdots &-\frac{m}{\Delta t} &\frac{2m}{\Delta t} - 3\Delta t x_{n-2}^2 \end{pmatrix}. We see that the Hessian matrix is a tri-diagonal matrix, in which the main diagonal consists of \frac{2m}{\Delta t} - 3\Delta t x_i^2 with i=1,2,\dots,n-2. The two neighboring subdiagonals are both made of -\frac{m}{\Delta t}.

import numpy as np
import matplotlib.pyplot as plt
def params():
    nvar = 99; m = 1. # We have 99 middle points, 1,2,...,99
    xini, xfin = 2., 0. # end points are fixed at 2 and 0
    tt = 1.; dt = tt/(nvar+1) # total time 1
    return nvar, m, xini, xfin, dt

def fod(der,x):
    # force F(x) = -V'(x) for der==0, and F'(x) = -V''(x) for der==1 (quartic oscillator)
    return -x**3 if der==0 else -3*x**2

def gradient(xs):
    # gradient of the discrete action S_n with respect to the middle points
    nvar, m, xini, xfin, dt = params()
    arr = np.zeros(nvar)
    arr[0] = (m/dt)*(2*xs[0]-xini-xs[1]) + dt*fod(0,xs[0])
    arr[1:-1] = (m/dt)*(2*xs[1:-1] - xs[:-2] - xs[2:]) + dt*fod(0,xs[1:-1])
    arr[-1] = (m/dt)*(2*xs[-1]-xs[-2]-xfin) + dt*fod(0,xs[-1])
    return arr

def hessian(xs):
    # tridiagonal Hessian of the discrete action
    nvar, m, xini, xfin, dt = params()
    he = np.diag(2*m/dt + dt*fod(1,xs))
    np.fill_diagonal(he[1:,:], -m/dt)
    np.fill_diagonal(he[:,1:], -m/dt)
    return he

def multi_newton(gradient,hessian,xolds,kmax=200,tol=1.e-8):
    for k in range(1,kmax):
        grad_xolds = gradient(xolds)
        he_xolds = hessian(xolds)
        # solve H q = -grad for the Newton step q, then update the iterate
        xnews = xolds + gauelim_pivot(he_xolds, -grad_xolds)
        err = termcrit(xolds,xnews)
        print(k, xnews, err)
        if err < tol:
            break
        xolds = np.copy(xnews)
    else:
        xnews = None
    return xnews


def gauelim_pivot(inA,inbs):
    # Gaussian elimination with partial pivoting, followed by back substitution
    A = np.copy(inA)
    bs = np.copy(inbs)
    n = bs.size

    for j in range(n-1):
        k = np.argmax(np.abs(A[j:,j])) + j
        if k != j:
            A[j,:], A[k,:] = A[k,:], A[j,:].copy()
            bs[j], bs[k] = bs[k], bs[j]

        for i in range(j+1,n):
            coeff = A[i,j]/A[j,j]
            A[i,j:] -= coeff*A[j,j:]
            bs[i] -= coeff*bs[j]

    xs = backsub(A,bs)
    return xs

def backsub(U,bs):
    n = bs.size
    xs = np.zeros(n)
    for i in reversed(range(n)):
        xs[i] = (bs[i] - U[i,i+1:]@xs[i+1:])/U[i,i]
    return xs

def termcrit(xolds,xnews):
    errs = np.abs((xnews - xolds)/xnews)
    return np.sum(errs)


if __name__ == '__main__':
    nvar, m, xini, xfin, dt = params()
    xolds = np.linspace(2,0,nvar+2)[1:-1]
    xnews = multi_newton(gradient, hessian, xolds); print(xnews)
    tlist = np.linspace(0,1,nvar+2)
    plt.plot(tlist[1:-1],xolds,label='initial guess')
    plt.plot(tlist[1:-1],xnews, label='minimized')
    plt.legend()
    plt.xlabel('t')
    plt.ylabel('x(t)')
    plt.show()
1 [2.00243064 2.00405866 2.00488295 2.00490384 2.00412305 2.00254359
 2.00016973 1.9970069  1.99306166 1.98834158 1.98285524 1.9766121
 1.96962249 1.96189752 1.95344901 1.94428947 1.93443198 1.92389019
 1.91267825 1.90081072 1.88830256 1.87516909 1.86142587 1.84708874
 1.83217371 1.81669697 1.8006748  1.78412356 1.76705965 1.74949946
 1.73145938 1.71295568 1.6940046  1.67462221 1.65482445 1.6346271
 1.61404573 1.59309569 1.57179212 1.55014988 1.52818357 1.50590752
 1.48333574 1.46048195 1.43735953 1.41398156 1.39036074 1.36650947
 1.34243976 1.31816329 1.29369138 1.26903496 1.24420463 1.2192106
 1.19406273 1.1687705  1.14334304 1.1177891  1.09211709 1.06633505
 1.04045067 1.0144713  0.98840394 0.96225525 0.93603156 0.90973887
 0.88338287 0.85696893 0.83050211 0.80398719 0.77742863 0.75083064
 0.72419714 0.69753177 0.67083794 0.6441188  0.61737725 0.59061599
 0.56383745 0.5370439  0.51023736 0.4834197  0.45659257 0.42975747
 0.40291572 0.3760685  0.34921681 0.32236157 0.29550351 0.26864329
 0.24178145 0.21491843 0.18805458 0.16119016 0.1343254  0.10746044
 0.08059537 0.05373025 0.02686513] 20.58576209557278
2 [2.00641498 2.01202226 2.01681505 2.02078758 2.02393506 2.02625371
 2.02774078 2.02839456 2.02821437 2.02720058 2.02535461 2.02267892
 2.01917696 2.01485324 2.00971323 2.00376337 1.99701108 1.98946466
 1.98113333 1.97202715 1.962157   1.95153453 1.94017215 1.92808295
 1.91528067 1.90177966 1.88759483 1.8727416  1.85723584 1.84109384
 1.82433226 1.80696806 1.78901849 1.77050101 1.75143324 1.73183293
 1.71171793 1.69110612 1.67001536 1.64846349 1.62646824 1.60404726
 1.581218   1.55799776 1.53440359 1.51045232 1.48616048 1.46154432
 1.43661973 1.41140231 1.38590724 1.36014936 1.33414308 1.30790243
 1.28144099 1.25477193 1.22790797 1.20086138 1.173644   1.14626718
 1.11874184 1.09107845 1.063287   1.03537704 1.00735768 0.97923755
 0.95102487 0.92272743 0.89435256 0.86590719 0.83739785 0.80883065
 0.78021131 0.75154518 0.72283722 0.69409206 0.66531395 0.63650682
 0.6076743  0.57881966 0.54994593 0.5210558  0.49215175 0.46323595
 0.43431035 0.40537669 0.37643646 0.34749098 0.31854137 0.28958857
 0.26063339 0.23167645 0.20271829 0.17375932 0.14479982 0.11584003
 0.08688008 0.05792007 0.02896003] 5.506719009111121
3 [2.00682081 2.01283341 2.01803051 2.02240578 2.02595386 2.0286704
 2.03055203 2.03159645 2.03180236 2.0311695  2.02969866 2.02739166
 2.02425135 2.0202816  2.01548729 2.00987426 2.00344935 1.99622032
 1.98819584 1.97938548 1.96979963 1.95944951 1.94834711 1.93650514
 1.92393701 1.91065678 1.89667909 1.88201913 1.86669261 1.85071568
 1.83410491 1.8168772  1.7990498  1.78064016 1.761666   1.74214517
 1.72209564 1.70153546 1.6804827  1.65895542 1.63697163 1.61454924
 1.59170603 1.56845961 1.54482739 1.52082654 1.49647399 1.47178636
 1.44677997 1.42147078 1.39587442 1.37000612 1.34388072 1.31751265
 1.29091592 1.2641041  1.23709031 1.20988722 1.18250707 1.15496158
 1.12726206 1.09941932 1.07144371 1.04334512 1.01513298 0.98681624
 0.95840343 0.9299026  0.90132137 0.87266694 0.84394606 0.81516508
 0.78632994 0.75744619 0.72851899 0.69955314 0.67055305 0.64152282
 0.6124662  0.5833866  0.55428715 0.52517068 0.49603972 0.46689656
 0.43774322 0.4085815  0.37941296 0.35023896 0.32106066 0.29187905
 0.26269496 0.23350905 0.20432188 0.17513384 0.14594528 0.1167564
 0.08756736 0.05837825 0.02918913] 0.5892695101103118
4 [2.00682568 2.01284315 2.0180451  2.0224252  2.02597809 2.0286994
 2.03058578 2.03163488 2.03184542 2.03121714 2.02975081 2.02744824
 2.02431228 2.02034679 2.01555663 2.00994766 2.00352669 1.99630149
 1.98828071 1.97947392 1.9698915  1.95954467 1.94844542 1.93660645
 1.92404116 1.91076361 1.89678844 1.88213083 1.8668065  1.85083159
 1.83422267 1.81699664 1.79917074 1.78076244 1.76178945 1.74226961
 1.7222209  1.70166138 1.68060911 1.65908216 1.63709855 1.61467617
 1.59183283 1.56858612 1.54495346 1.52095205 1.49659879 1.47191033
 1.44690297 1.4215927  1.39599513 1.37012552 1.34399869 1.3176291
 1.29103075 1.26421721 1.23720162 1.20999666 1.18261454 1.15506702
 1.1273654  1.09952049 1.07154266 1.04344179 1.01522732 0.98690821
 0.95849297 0.92998968 0.90140595 0.87274898 0.84402554 0.81524197
 0.78640421 0.75751782 0.72858797 0.69961943 0.67061666 0.64158372
 0.61252437 0.58344205 0.55433986 0.52522064 0.49608693 0.46694101
 0.43778491 0.40862042 0.3794491  0.35027232 0.32109125 0.29190686
 0.26271999 0.23353131 0.20434135 0.17515053 0.14595918 0.11676752
 0.0875757  0.05838382 0.02919191] 0.007134034897490792
5 [2.00682568 2.01284315 2.0180451  2.02242521 2.0259781  2.02869941
 2.03058578 2.03163489 2.03184543 2.03121715 2.02975081 2.02744825
 2.02431229 2.0203468  2.01555664 2.00994767 2.0035267  1.9963015
 1.98828072 1.97947393 1.96989151 1.95954469 1.94844543 1.93660646
 1.92404118 1.91076363 1.89678845 1.88213085 1.86680652 1.85083161
 1.83422269 1.81699666 1.79917076 1.78076246 1.76178946 1.74226962
 1.72222092 1.70166139 1.68060913 1.65908218 1.63709857 1.61467619
 1.59183284 1.56858614 1.54495348 1.52095207 1.49659881 1.47191034
 1.44690299 1.42159272 1.39599515 1.37012553 1.34399871 1.31762912
 1.29103076 1.26421723 1.23720164 1.20999667 1.18261455 1.15506704
 1.12736541 1.09952051 1.07154267 1.04344181 1.01522733 0.98690822
 0.95849298 0.92998969 0.90140596 0.872749   0.84402555 0.81524198
 0.78640422 0.75751784 0.72858798 0.69961944 0.67061667 0.64158373
 0.61252438 0.58344206 0.55433987 0.52522064 0.49608693 0.46694101
 0.43778491 0.40862042 0.37944911 0.35027233 0.32109125 0.29190687
 0.26272    0.23353131 0.20434135 0.17515054 0.14595919 0.11676753
 0.08757571 0.05838382 0.02919191] 1.045648689538472e-06
6 [2.00682568 2.01284315 2.0180451  2.02242521 2.0259781  2.02869941
 2.03058578 2.03163489 2.03184543 2.03121715 2.02975081 2.02744825
 2.02431229 2.0203468  2.01555664 2.00994767 2.0035267  1.9963015
 1.98828072 1.97947393 1.96989151 1.95954469 1.94844543 1.93660646
 1.92404118 1.91076363 1.89678845 1.88213085 1.86680652 1.85083161
 1.83422269 1.81699666 1.79917076 1.78076246 1.76178946 1.74226962
 1.72222092 1.70166139 1.68060913 1.65908218 1.63709857 1.61467619
 1.59183284 1.56858614 1.54495348 1.52095207 1.49659881 1.47191034
 1.44690299 1.42159272 1.39599515 1.37012553 1.34399871 1.31762912
 1.29103076 1.26421723 1.23720164 1.20999667 1.18261455 1.15506704
 1.12736541 1.09952051 1.07154267 1.04344181 1.01522733 0.98690822
 0.95849298 0.92998969 0.90140596 0.872749   0.84402555 0.81524198
 0.78640422 0.75751784 0.72858798 0.69961944 0.67061667 0.64158373
 0.61252438 0.58344206 0.55433987 0.52522064 0.49608693 0.46694101
 0.43778491 0.40862042 0.37944911 0.35027233 0.32109125 0.29190687
 0.26272    0.23353131 0.20434135 0.17515054 0.14595919 0.11676753
 0.08757571 0.05838382 0.02919191] 2.2573685220323112e-14
[2.00682568 2.01284315 2.0180451  2.02242521 2.0259781  2.02869941
 2.03058578 2.03163489 2.03184543 2.03121715 2.02975081 2.02744825
 2.02431229 2.0203468  2.01555664 2.00994767 2.0035267  1.9963015
 1.98828072 1.97947393 1.96989151 1.95954469 1.94844543 1.93660646
 1.92404118 1.91076363 1.89678845 1.88213085 1.86680652 1.85083161
 1.83422269 1.81699666 1.79917076 1.78076246 1.76178946 1.74226962
 1.72222092 1.70166139 1.68060913 1.65908218 1.63709857 1.61467619
 1.59183284 1.56858614 1.54495348 1.52095207 1.49659881 1.47191034
 1.44690299 1.42159272 1.39599515 1.37012553 1.34399871 1.31762912
 1.29103076 1.26421723 1.23720164 1.20999667 1.18261455 1.15506704
 1.12736541 1.09952051 1.07154267 1.04344181 1.01522733 0.98690822
 0.95849298 0.92998969 0.90140596 0.872749   0.84402555 0.81524198
 0.78640422 0.75751784 0.72858798 0.69961944 0.67061667 0.64158373
 0.61252438 0.58344206 0.55433987 0.52522064 0.49608693 0.46694101
 0.43778491 0.40862042 0.37944911 0.35027233 0.32109125 0.29190687
 0.26272    0.23353131 0.20434135 0.17515054 0.14595919 0.11676753
 0.08757571 0.05838382 0.02919191]

3 Homework

Modify the above code for minimizing the action to address the physical context of the harmonic oscillator, namely V(x) = \frac{1}{2}x^2. Take m = 1, x_0 = 0, x_{n-1} = 1, and T = 1. Plot the coordinate x(t) of the particle as a function of time; also show the (analytically known) answer.