An Algorithm for Finding Potential Minimizing Configurations of Points on A Sphere

Neubauer, Schilling, Watkins & Zeitlin, CSUN, 1998.

Every pair of electrons repels each other as one over the distance between them—the Coulomb potential or energy. For distinct particles at points P1, …, Pn set P={ P1, … , Pn} and

C(P) = C(P1,…,Pn) = Sum for i<j [ 1/ |Pi-Pj| ]

Some formulations sum all the repulsions for each point, which counts each summand twice and gives twice this value. A configuration is considered optimal if it minimizes the Coulomb potential. If two points are close then the reciprocal of their distance will be large and so minimizing the energy is a way of uniformly spreading out points.

Several authors have remarked on possible comparisons to regular polyhedra, stable molecules (e.g. buckyballs), interlocking pollen grains and the positioning of strands of DNA. There are many variations of this problem of uniformly distributing points on the surface of a sphere. These include considering different potential functions, different notions of uniform distribution (including packing problems), different dimensions of spheres and asymptotic comparisons. These are described in [Saff and Kuijlaars] and a bibliography is given. For the particular problem of the Coulomb potential Erber and Hockney have calculated many apparent optima for different n and indicate that there are many local minimal that are close to but not equal to the global minimum. They also discuss expected values of random configurations of points.

For S2, the 2 dimensional surface of a sphere in R3 , the optimal configurations are known for small numbers of points. For 2 points the optimal configuration is 2 antipodal points, such as the North and South poles. For 3 points they must lie on a plane and on the sphere and so they are on the circle of intersection where the optimum is given by evenly spaced points—and potential is minimized when the three points are evenly spaced on a great circle. For any 4 points moving various points so that it is equidistant from the others lowers the potential, and so a regular tetrahedron is the only stable figure. However, these geometric arguments don’t seem to generalize. Optima for larger numbers of points are found by iterative calculations which we describe here and which produce best known but not provably optimal results.

ANALYTIC FORMULATION—COMPUTING THE GRADIENT.

The potential function C(P1, …, Pn ) where Pj = (xj,yj,zj) is a function of 3n variables. The constraints are that the particles all lie on the unit sphere-- |Pj|= 1, which is equivalent to gj(P)= 0 for

gj(P1, … , Pn )= xj2 +yj2 + zj2 – 1 . To minimize the potential function we employ the method of Lagrange multipliers--at any minimum the gradient of C will be a linear combination of the gradients of the constraint functions.

We compute the gradient of C. Taking j=1 and considering the partial derivative with respect to x1 provides an example which indicates each of the partial derivatives of C. The terms in C which contain x1 are given by Sum for 1<j [ 1 / |P1-Pj| ]. The derivative of the jth summand with respect to x1 is given by

 

(1) D11 [ ( (x1-xj )2 + (y1-yj)2 + (z1-zj)2 )-1/2 ] =-1/2 ( (x1-xj )2 + (y1-yj)2 + (z1-zj)2 )-3/2 ( 2 (x1-xj ) )

.

A similar formula arises for y1 and z1 and these can be grouped together in vector form as follows (where D1 represents the derivative with respect to P1 ) .

 

(2) D1 [ Sum for 1<j [ 1 / |P1-Pj| ] ] = Sum for 1<j [ (Pj-P1) / |P1-Pj|3 ].

Differentiation with respect to a point represents a vector with 3 derivatives as coordinates--the partials for the first three variables are the coordinates of the right hand side. The gradient of C has first three coordinates given by the right hand side above.

The right hand side of this expression can be seen as -1 times the unit vector from P1 to Pj divided by the square of the distance from P1 to Pj . The other coordinates of the gradient vector arise in the same fashion so that each successive group of three coordinates is given by the sum of multiples of vectors from each point to the other points.

Interpretation: The gradient vector gives the direction of the Coulomb force exerted on the configuration. As usual the gradient represents the direction of most rapid change.

Following the method of Lagrange multipliers we equate the gradient of C with a linear combination of the gradients of the constraints. This is equivalent to equating the gradient for the partial energy involving Ck with a multiple of the gradient of gk, which is Pk itself, for each k.

(3) Sum for 1<j [ (Pj-Pk) / |Pj-Pk|3 ] = lk Pk

Geometrically this says that the sum of the unit displacements times 1 over the distance squared in the direction of each other particle Pj is perpendicular to the sphere. Any figure with appropriate symmetry, such as the regular solids, will satisfy this condition. Any minimal configuration will thus be stable as this condition indicates.

 

AN ALGORITHM FOR FINDING A CONFIGURATION OF N POINTS ON A SPHERE WITH MINIMAL COULOMB POTENTIAL

The gradient is the direction of most rapid change and so this last calculation suggests the following procedure for finding a configuration with minimal Coulomb potential. First generate, in some random way, n points P1, … , Pn on the sphere. Now perturb each point Pk in this configuration by adding some multiple of the gradient of the partial energy at Pk to Pk to form a new configuration. Now perturb the new configuration in the same way and repeat until the resulting configurations remain essentially unchanged. Perturbing each configuration by 1/n times the gradient gives good convergence. We have found that running time is significantly improved by doing a binary search for the minimum potential in the direction of the gradient. We have written a program in Mathematica implementing this algorithm.

We have no proofs of the optimality of any of these configurations but expect that they give local minima including the absolute minimum. There is no known condition which guarantees a configuration is minimal. For emphasis one might refer to these examples as gradient optimal or locally optimal. It is easy to call them optimal and this has become common practice.

We can verify that the gradient-optimal examples satisfy (3), the Lagrange condition. For example the g-optimal configurations for 5, 6 and 7 points are given by points at the North and South poles and remaining points evenly spaced about the equator. The geometric symmetry indicates that these configurations satisfy the Lagrange condition.

To verify the Lagrange condition in general let vk represent the left-hand side of equation (3). Then we can compute the cosine of the angle between vk and Pk by taking the dot product of vk / | vk | and Pk . 1 minus this cosine then measures the kth deviation from condition (3) . This calculation is made easier by noticing that if P is the matrix with rows giving the points P1, … , Pn , then the gram matrix PPT has ijth entry equal to the dot product Pi· Pj . The square of the distance between Pi and Pj is then given by 2 ( 1 - Pi· Pj ). The total deviation, got by summing the kth deviations, then provides a measure of how much a configuration deviates from being "well balanced".

SEVERAL OBSERVATIONS MAY BE MADE.

  1. The regular solids do not always produce minimal configurations—for 8 points the cube has a higher potential than a twisted cube (rotate the top by 45 degrees). The cube does however satisfy the Lagrange condition. The potential function has a local minimum at the cube, but this is not a global minimum. Accordingly, the gradient algorithm applied exactly will not produce any change if we start from the cube. However, if we perturb the cube slightly, then the gradient algorithm converges to the twisted cube. It seems that the cube is an "unstable" minimum.
  2. The previous comment leads one to consider a surface for the potential function with various valleys and passes. [Wille] has used simulated annealing to search the surface for minima. This involves random perturbations followed according to some probability. The gradient algorithm has the advantage of choosing the best direction in which to change the configuration when it is not minimum and so seems to converge significantly faster.
  3. We have also used a brute force algorithm to find the faces of the convex hulls of configurations of points and can then find the volume of the convex hulls by summing the volumes of the simplices formed by the origin and triangulations of the faces. Apparently there are more efficient algorithms for finding the faces. We were unable to find any way of calculating the volume of the convex hull of a set of points in R3 without computing the faces. Given that in the plane one needs to sort points according to angle (which is comparable to computing the convex hull) this may not be so surprising.
  4. Following the area preserving projection scheme used to exhibit the optimal configurations we can generate points uniformly on the surface of the sphere. Using this notion of randomness, the expected potential of n points uniformly distributed on the sphere is n choose 2 . The expected distance between 2 uniformly distributed points on the sphere is 4/3.

How should one generate random points on the sphere? The most obvious ways to pick points "randomly" on the surface of the sphere have some problems. One natural approach is to pick spherical coordinates q (longitude) and j (latitude) uniformly and use these to generate rectangular coordinates from which distance can be easily calculated. Unfortunately this suffers from the property that the points will not be uniformly distributed. The likelihood of choosing points within the polar mini-cap 0£ j £ 0.1 is the same as choosing points in the equatorial mini-belt p / 2 £ j £ p / 2 + . 1 , but the areas of these two regions are different. Likewise, picking x, y and z uniformly between –1 and 1 and then dividing by length fails to be uniform since we are picking uniformly in the cube and projecting onto the surface of the sphere—the corners of the sphere are over represented. (Actually, you can throw away any points with length greater than 1 but there are analytic advantages to finding a "uniform" parametrization of S2 ).

 

THE EXPECTED DISTANCE BETWEEN ANY 2 POINTS ON S2 IS 4/3.

Without loss of generality take the first point to be the North Pole (0,0,1) and the second point is generated by the uniform parametrization. The distance between these 2 points is then

d( {0,0,1}, { Sqrt[ 1-z2] Cos[th] , Sqrt[ 1-z2] Sin[th] , z } ) = Sqrt[ 2-2 z ].

To find the expected potential energy we note that this distance only depends on z and so the average is given (with a substitution u=1-z ) by

1/ ( 1 – (-1)) Integral[Sqrt[ 2-2 z ] ,{z,-1,1}] = -1/Sqrt[2] Integral[ Sqrt[u],{u,2,0}] = 4/3.

 

THE AVERAGE POTENTIAL ENERGY BETWEEN ANY 2 POINTS IS 1 AND THUS THE EXPECTED TOTAL POTENTIAL BETWEEN ALL PAIRS IS THE NUMBER OF PAIRS TIMES 1.

 

As before take the first point to be the North Pole and the second from the form of the uniform parametrization. The average of 1 over the distance is then

˝ Integral[ 1/Sqrt[2(1-z)], {z, -1,1}] = 1/Sqrt[8] Integral[ u-1/2,{u,0,2}] = 1.