How to Use Nelder-Mead Optimization in Python

Author: Jason Brownlee

The Nelder-Mead optimization algorithm is a widely used approach for non-differentiable objective functions.

As such, it is generally referred to as a pattern search algorithm and is used as a local or global search procedure for challenging nonlinear and potentially noisy and multimodal function optimization problems.

In this tutorial, you will discover the Nelder-Mead optimization algorithm.

After completing this tutorial, you will know:

  • The Nelder-Mead optimization algorithm is a type of pattern search that does not use function gradients.
  • How to apply the Nelder-Mead algorithm for function optimization in Python.
  • How to interpret the results of the Nelder-Mead algorithm on noisy and multimodal objective functions.

Let’s get started.

How to Use the Nelder-Mead Optimization in Python. Photo by Don Graham, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Nelder-Mead Algorithm
  2. Nelder-Mead Example in Python
  3. Nelder-Mead on Challenging Functions
    1. Noisy Optimization Problem
    2. Multimodal Optimization Problem

Nelder-Mead Algorithm

Nelder-Mead is an optimization algorithm named after the developers of the technique, John Nelder and Roger Mead.

The algorithm was described in their 1965 paper titled “A Simplex Method For Function Minimization” and has become a standard and widely used technique for function optimization.

It is appropriate for one-dimensional or multidimensional functions with numerical inputs.

Nelder-Mead is a pattern search optimization algorithm, which means it does not require or use function gradient information, making it appropriate for optimization problems where the gradient of the function is unknown or cannot be reasonably computed.

It is often used for multidimensional nonlinear function optimization problems, although it can get stuck in local optima.

Practical performance of the Nelder–Mead algorithm is often reasonable, though stagnation has been observed to occur at nonoptimal points. Restarting can be used when stagnation is detected.

— Page 239, Numerical Optimization, 2006.

A starting point must be provided to the algorithm, which may be the endpoint of another global optimization algorithm or a random point drawn from the domain.

Given that the algorithm may get stuck, it may benefit from multiple restarts with different starting points.

The Nelder-Mead simplex method uses a simplex to traverse the space in search of a minimum.

— Page 105, Algorithms for Optimization, 2019.

The algorithm works by using a shape structure (called a simplex) composed of n + 1 points (vertices), where n is the number of input dimensions to the function.

For example, on a two-dimensional problem that may be plotted as a surface, the shape structure would be composed of three points represented as a triangle.

The Nelder-Mead method uses a series of rules that dictate how the simplex is updated based on evaluations of the objective function at its vertices.

— Page 106, Algorithms for Optimization, 2019.

The points of the shape structure are evaluated and simple rules are used to decide how to move the points of the shape based on their relative evaluation. This includes operations such as “reflection,” “expansion,” “contraction,” and “shrinkage” of the simplex shape on the surface of the objective function.

In a single iteration of the Nelder–Mead algorithm, we seek to remove the vertex with the worst function value and replace it with another point with a better value. The new point is obtained by reflecting, expanding, or contracting the simplex along the line joining the worst vertex with the centroid of the remaining vertices. If we cannot find a better point in this manner, we retain only the vertex with the best function value, and we shrink the simplex by moving all other vertices toward this value.

— Page 238, Numerical Optimization, 2006.

The search stops when the points converge on an optimum, when a minimum difference between evaluations is observed, or when a maximum number of function evaluations is reached.
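To make these operations concrete, the snippet below sketches a single iteration of the method for minimization, using the conventional coefficients for reflection, expansion, contraction, and shrinkage. This is a minimal illustrative sketch, not the SciPy implementation; the nelder_mead_step() function and the example simplex values are hypothetical.

# sketch of a single nelder-mead iteration (illustrative, not the scipy implementation)
from numpy import argsort
from numpy import asarray
from numpy import mean

def nelder_mead_step(simplex, f, alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
	# order vertices from best (lowest) to worst (highest) function value
	simplex = simplex[argsort([f(v) for v in simplex])]
	best, worst = simplex[0], simplex[-1]
	# centroid of all vertices except the worst
	centroid = mean(simplex[:-1], axis=0)
	# reflection: reflect the worst vertex through the centroid
	reflected = centroid + alpha * (centroid - worst)
	if f(best) <= f(reflected) < f(simplex[-2]):
		simplex[-1] = reflected
	elif f(reflected) < f(best):
		# expansion: try moving further along the same direction
		expanded = centroid + gamma * (reflected - centroid)
		simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
	else:
		# contraction: pull the worst vertex toward the centroid
		contracted = centroid + rho * (worst - centroid)
		if f(contracted) < f(worst):
			simplex[-1] = contracted
		else:
			# shrinkage: move all other vertices toward the best vertex
			simplex[1:] = best + sigma * (simplex[1:] - best)
	return simplex

# objective function
def objective(v):
	return v[0]**2.0 + v[1]**2.0

# example: one step on a triangle of three vertices for a two-dimensional problem
simplex = asarray([[1.0, 1.0], [1.5, 1.0], [1.0, 1.5]])
print(nelder_mead_step(simplex, objective))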

Now that we have a high-level idea of how the algorithm works, let’s look at how we might use it in practice.

Nelder-Mead Example in Python

The Nelder-Mead optimization algorithm can be used in Python via the scipy.optimize.minimize() function.

This function requires that the “method” argument be set to “nelder-mead” to use the Nelder-Mead algorithm. It takes the objective function to be minimized and an initial point for the search.

...
# perform the search
result = minimize(objective, pt, method='nelder-mead')

The result is an OptimizeResult object that contains information about the result of the optimization accessible via keys.

For example, the “success” boolean indicates whether the search was completed successfully or not, the “message” provides a human-readable message about the success or failure of the search, and the “nfev” key indicates the number of function evaluations that were performed.

Importantly, the “x” key specifies the input values that indicate the optimum found by the search, if successful.

...
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
print('Solution: %s' % result['x'])

We can demonstrate the Nelder-Mead optimization algorithm on a well-behaved function to show that it can quickly and efficiently find the optimum without using any derivative information from the function.

In this case, we will use the x^2 function in two dimensions, defined on the range -5.0 to 5.0, with a known optimum at [0.0, 0.0].

We can define the objective() function below.

# objective function
def objective(x):
	return x[0]**2.0 + x[1]**2.0

We will use a random point in the defined domain as a starting point for the search.

...
# define range for input
r_min, r_max = -5.0, 5.0
# define the starting point as a random sample from the domain
pt = r_min + rand(2) * (r_max - r_min)

The search can then be performed. We use the default maximum number of iterations and function evaluations, set via the “maxiter” and “maxfev” options and defaulting to N*200, where N is the number of input variables, which is two in this case, i.e. 400 evaluations.

...
# perform the search
result = minimize(objective, pt, method='nelder-mead')
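If a larger budget or tighter convergence tolerances are needed, they can be set via the “options” argument. The snippet below is a sketch with example values; “maxiter”, “maxfev”, “xatol”, and “fatol” are the relevant Nelder-Mead options in SciPy.

...
# perform the search with a larger evaluation budget and tighter tolerances (example values)
result = minimize(objective, pt, method='nelder-mead', options={'maxiter': 1000, 'maxfev': 1000, 'xatol': 1e-8, 'fatol': 1e-8})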

After the search is finished, we will report the total function evaluations used to find the optimum and the success message of the search, which we expect to be positive in this case.

...
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])

Finally, we will retrieve the input values for the located optimum, evaluate them using the objective function, and report both in a human-readable manner.

...
# evaluate solution
solution = result['x']
evaluation = objective(solution)
print('Solution: f(%s) = %.5f' % (solution, evaluation))

Tying this together, the complete example of using the Nelder-Mead optimization algorithm on a simple convex objective function is listed below.

# nelder-mead optimization of a convex function
from scipy.optimize import minimize
from numpy.random import rand

# objective function
def objective(x):
	return x[0]**2.0 + x[1]**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# define the starting point as a random sample from the domain
pt = r_min + rand(2) * (r_max - r_min)
# perform the search
result = minimize(objective, pt, method='nelder-mead')
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
# evaluate solution
solution = result['x']
evaluation = objective(solution)
print('Solution: f(%s) = %.5f' % (solution, evaluation))

Running the example executes the optimization, then reports the results.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the search was successful, as we expected, and was completed after 88 function evaluations.

We can see that the optimum was located with inputs very close to [0,0], which evaluate to the minimum objective value of 0.0.

Status: Optimization terminated successfully.
Total Evaluations: 88
Solution: f([ 2.25680716e-05 -3.87021351e-05]) = 0.00000

Now that we have seen how to use the Nelder-Mead optimization algorithm successfully, let’s look at some examples where it does not perform so well.

Nelder-Mead on Challenging Functions

The Nelder-Mead optimization algorithm works well for a range of challenging nonlinear and non-differentiable objective functions.

Nevertheless, it can get stuck on multimodal optimization problems and noisy problems.

To make this concrete, let’s look at an example of each.

Noisy Optimization Problem

A noisy objective function is a function that gives different answers each time the same input is evaluated.

We can make an objective function artificially noisy by adding small Gaussian random numbers to the inputs prior to their evaluation.

For example, we can define a one-dimensional version of the x^2 function and use the randn() function to add small Gaussian random numbers to the input with a mean of 0.0 and a standard deviation of 0.3.

# objective function
def objective(x):
	return (x + randn(len(x))*0.3)**2.0

The noise will make the function challenging for the algorithm to optimize, and it will very likely not locate the optimum at x=0.0.

The complete example of using Nelder-Mead to optimize the noisy objective function is listed below.

# nelder-mead optimization of noisy one-dimensional convex function
from scipy.optimize import minimize
from numpy.random import rand
from numpy.random import randn

# objective function
def objective(x):
	return (x + randn(len(x))*0.3)**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# define the starting point as a random sample from the domain
pt = r_min + rand(1) * (r_max - r_min)
# perform the search
result = minimize(objective, pt, method='nelder-mead')
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
# evaluate solution
solution = result['x']
evaluation = objective(solution)
print('Solution: f(%s) = %.5f' % (solution, evaluation))

Running the example executes the optimization, then reports the results.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, the algorithm does not converge and instead uses the maximum number of function evaluations, which is 200.

Status: Maximum number of function evaluations has been exceeded.
Total Evaluations: 200
Solution: f([-0.6918238]) = 0.79431

The algorithm may converge on some runs of the code but will arrive at a point away from the optimum.
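One common mitigation, not part of the example above, is to smooth the objective by averaging several evaluations of each candidate point, at the cost of many more function evaluations. The averaged_objective() function below is a hypothetical sketch of this idea.

...
# smooth the noisy objective by averaging repeated evaluations (illustrative sketch)
from numpy import mean

def averaged_objective(x, repeats=30):
	# average many noisy evaluations of the same point
	return mean([objective(x) for _ in range(repeats)])

# perform the search on the smoothed objective
result = minimize(averaged_objective, pt, method='nelder-mead')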

Multimodal Optimization Problem

Many nonlinear objective functions may have multiple optima, referred to as multimodal problems.

The problem may be structured such that it has multiple global optima with equivalent function evaluations, or a single global optimum and multiple local optima in which algorithms like Nelder-Mead can get stuck.

The Ackley function is an example of the latter. It is a two-dimensional objective function that has a global optimum at [0,0] and many local optima.

The example below implements the Ackley function and creates a three-dimensional plot showing the global optimum and multiple local optima.

# ackley multimodal function
from numpy import arange
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi
from numpy import meshgrid
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

# objective function
def objective(x, y):
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# define range for input
r_min, r_max = -5.0, 5.0
# sample input range uniformly at 0.1 increments
xaxis = arange(r_min, r_max, 0.1)
yaxis = arange(r_min, r_max, 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a surface plot with the jet color scheme
figure = pyplot.figure()
axis = figure.add_subplot(projection='3d')
axis.plot_surface(x, y, results, cmap='jet')
# show the plot
pyplot.show()

Running the example creates the surface plot of the Ackley function showing the vast number of local optima.

3D Surface Plot of the Ackley Multimodal Function

We would expect the Nelder-Mead algorithm to get stuck in one of the local optima while searching for the global optimum.

Initially, when the simplex is large, the algorithm may jump over many local optima, but as it contracts, it will get stuck.
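One way to explore this behavior is to control the starting simplex directly; SciPy's Nelder-Mead implementation accepts an “initial_simplex” option that overrides the starting point. The vertex values in the sketch below are arbitrary examples chosen to span the domain.

...
# supply a deliberately large starting simplex that spans the domain (example values)
from numpy import asarray
initial = asarray([[-4.0, -4.0], [4.0, -4.0], [0.0, 4.0]])
result = minimize(objective, pt, method='nelder-mead', options={'initial_simplex': initial})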

We can explore this with the example below that demonstrates the Nelder-Mead algorithm on the Ackley function.

# nelder-mead for multimodal function optimization
from scipy.optimize import minimize
from numpy.random import rand
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi

# objective function
def objective(v):
	x, y = v
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# define range for input
r_min, r_max = -5.0, 5.0
# define the starting point as a random sample from the domain
pt = r_min + rand(2) * (r_max - r_min)
# perform the search
result = minimize(objective, pt, method='nelder-mead')
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
# evaluate solution
solution = result['x']
evaluation = objective(solution)
print('Solution: f(%s) = %.5f' % (solution, evaluation))

Running the example executes the optimization, then reports the results.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the search completed successfully but did not locate the global optimum. Instead, it got stuck in a local optimum.

Each time we run the example, we will find a different local optimum, given the different random starting point for the search.

Status: Optimization terminated successfully.
Total Evaluations: 62
Solution: f([-4.9831427 -3.98656015]) = 11.90126
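Given that each run finds a different local optimum, a simple mitigation is the restart strategy mentioned earlier: run the search from several random starting points and keep the best solution found. The example below is a sketch of this idea; the number of restarts (30) is an arbitrary choice.

# nelder-mead with random restarts on the ackley multimodal function (illustrative)
from scipy.optimize import minimize
from numpy.random import rand
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi

# objective function
def objective(v):
	x, y = v
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# define range for input
r_min, r_max = -5.0, 5.0
best_solution, best_evaluation = None, float('inf')
# restart the search from several random points and keep the best result
for i in range(30):
	pt = r_min + rand(2) * (r_max - r_min)
	result = minimize(objective, pt, method='nelder-mead')
	evaluation = objective(result['x'])
	if evaluation < best_evaluation:
		best_solution, best_evaluation = result['x'], evaluation
# report the best solution found across all restarts
print('Best: f(%s) = %.5f' % (best_solution, best_evaluation))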

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

  • A Simplex Method For Function Minimization, 1965.

Books

  • Numerical Optimization, 2006.
  • Algorithms for Optimization, 2019.

APIs

  • scipy.optimize.minimize API.

Articles

  • Nelder–Mead method, Wikipedia.

Summary

In this tutorial, you discovered the Nelder-Mead optimization algorithm.

Specifically, you learned:

  • The Nelder-Mead optimization algorithm is a type of pattern search that does not use function gradients.
  • How to apply the Nelder-Mead algorithm for function optimization in Python.
  • How to interpret the results of the Nelder-Mead algorithm on noisy and multimodal objective functions.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
