The Runge-Kutta and Runge-Kutta-Fehlberg methods¶

This notebook covers roughly the same ground as this video.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

The classical Runge-Kutta method (for solving $dx/dt=f(t,x)$) is as shown below. The basic step is as follows: we have an estimate $x_i$ for the value of $x$ at time $t_i$, and we want an estimate $x_{i+1}$ for the value at $t_{i+1}=t_i+h$. This will be of the form $x_{i+1}=x_i+mh$ for some $m$. For the basic Euler method we would just take $m=f(t_i,x_i)$ which is a bit crude; we really want some kind of average value for $dx/dt$ as $t$ increases from $t_i$ to $t_{i+1}$, not just the value at $t=t_i$. For the Runge-Kutta method, $m$ will be close to $f(t_i,x_i)$, but not exactly the same. In fact we have four different candidates for $m$, called $m_0,\dotsc,m_3$, and we take $m=(m_0+2m_1+2m_2+m_3)/6$, which is a kind of weighted average of $m_0,\dotsc,m_3$.

  • We start with $m_0=f(t_i,x_i)$, as in the Euler method.
  • Now $x_i+hm_0/2$ is an estimate for the value of $x$ at time $t_i+h/2$, and we put $m_1=f(t_i+h/2,x_i+hm_0/2)$, which is an estimate of $dx/dt$ at time $t_i+h/2$. Note that we have used $m_0$ to calculate $m_1$.
  • Now $x_i+hm_1/2$ is another estimate for the value of $x$ at time $t_i+h/2$, perhaps slightly different from the previous one. We put $m_2=f(t_i+h/2,x_i+hm_1/2)$, which is another estimate of $dx/dt$ at time $t_i+h/2$. Note that we have used $m_1$ to calculate $m_2$.
  • Now $x_i+hm_2$ is an estimate for the value of $x$ at time $t_{i+1}=t_i+h$, and we put $m_3=f(t_i+h,x_i+hm_2)$, which is an estimate of $dx/dt$ at time $t_{i+1}$. Note that we have used $m_2$ to calculate $m_3$.

(We can also define $k_j=h\,m_j$, so $m_j=k_j/h$. In many places you will see the formulae written in terms of the variables $k_j$ instead of $m_j$.)

In [2]:
def runge_kutta(f, a, b, x0, n = 100):
    h = (b - a) / n
    x = x0
    ts = np.linspace(a, b, n + 1)
    xs = np.zeros(n + 1)
    for i in range(n):
        t = ts[i]
        m0 = f(t, x)
        m1 = f(t + h / 2, x + h * m0 / 2)
        m2 = f(t + h / 2, x + h * m1 / 2)
        m3 = f(t + h, x + h * m2)
        m4 = (m0 + 2 * m1 + 2 * m2 + m3) / 6
        x = x + h * m4 
        xs[i] = x
    return (ts, xs)

After the original work of Runge and Kutta, many people produced further methods extending their ideas, which are collectively known as Runge-Kutta methods. There is a lot of mathematical theory about how to choose the precise coefficients to make the method as effective as possible. If you are interested and have a taste for hard work, you can find some details in Chapter 3 of the book A First Course in the Numerical Analysis of Differential Equations by Iserles.

For the methods that we have seen so far, we start by choosing a number $n$, and dividing our time interval into equal steps of size $h=(b-a)/n$. This is not always a good plan. Often there are parts of the solution with a lot of action where we need a very small step size to maintain accuracy, and other parts where things are quiet and a larger step size is acceptable. It is therefore desirable to have a method in which we can change the step size, and we have a way of knowing when that would be a good idea. Below we will implement the Runge-Kutta-Fehlberg (RKF) method which is of this type.

This method was published as a NASA technical report in October 1968, the same month as the launch of Apollo 7. This was the first crewed flight in the Apollo programme; it orbited the Moon without landing. It would be interesting to know what NASA used the RKF method for, but I have not found information about that.

The RKF method depends on various coefficients which we can conveniently arrange into a matrix $A$ and three vectors $B4$, $B5$ and $C$ as shown below.

In [3]:
RKFA = np.array([[        0,          0,          0,         0,      0, 0],
                 [      1/4,          0,          0,         0,      0, 0],
                 [     3/32,       9/32,          0,         0,      0, 0],
                 [1932/2197, -7200/2197,  7296/2197,         0,      0, 0],
                 [  439/216,         -8,   3680/513, -845/4104,      0, 0],
                 [    -8/27,          2, -3544/2565, 1859/4104, -11/40, 0]])
RKFB4 = np.array([25/216, 0,  1408/2565,   2197/4104,  -1/5,    0])
RKFB5 = np.array([16/135, 0, 6656/12825, 28561/56430, -9/50, 2/55])
RKFC = np.array([0,1/4,3/8,12/13,1,1/2])

It is conventional to combine these coefficients into a single table called the Butcher tableau, as follows: $$ \begin{array}{|c|c|}\hline C & \hphantom{WWW} \stackrel{\stackrel{\vphantom{X}}{\displaystyle A}}{\vphantom{X}} \hphantom{WWW} \\ \hline 0 & B5 \\ \hline 0 & B4 \\ \hline \end{array} $$ We can do this in Python as follows:

In [4]:
butcher_tableau = np.zeros((8, 7 ))
butcher_tableau[0:6, 1:7] = RKFA
butcher_tableau[0:6,   0] = RKFC
butcher_tableau[  6, 1:7] = RKFB5
butcher_tableau[  7, 1:7] = RKFB4

print(np.round(butcher_tableau,6))
[[ 0.        0.        0.        0.        0.        0.        0.      ]
 [ 0.25      0.25      0.        0.        0.        0.        0.      ]
 [ 0.375     0.09375   0.28125   0.        0.        0.        0.      ]
 [ 0.923077  0.879381 -3.277196  3.320892  0.        0.        0.      ]
 [ 1.        2.032407 -8.        7.173489 -0.205897  0.        0.      ]
 [ 0.5      -0.296296  2.       -1.381676  0.452973 -0.275     0.      ]
 [ 0.        0.118519  0.        0.518986  0.506131 -0.18      0.036364]
 [ 0.        0.115741  0.        0.548928  0.535331 -0.2       0.      ]]

Part of the mathematical justification for the above numbers is that they have the following properties:

  • The sum of the entries in $B4$ is equal to one, and the same is true for $B5$
  • The sums of the rows in $A$ are the same as the entries in $C$
  • For each $j\lt 4$ we have $\sum_iB4_i C_i^j=1/(j+1)$
  • For each $j\lt 5$ we have $\sum_iB5_i C_i^j=1/(j+1)$

(There are also further properties that are much harder to explain.) We can check the above properties in Python as follows. All the calculations below should give zero, and we see that that is true with some tiny errors of size less than $10^{-15}$

In [5]:
print(RKFB4.sum() - 1)
print(RKFB5.sum() - 1)
print(RKFA . sum ( axis = 1 ) - RKFC)
print([(j+1) * RKFB4.dot(RKFC ** j) - 1 for j in range(4)])
print([(j+1) * RKFB5.dot(RKFC ** j) - 1 for j in range(5)])
0.0
0.0
[ 0.00000000e+00  0.00000000e+00  0.00000000e+00 -2.22044605e-16
 -3.33066907e-16 -1.11022302e-16]
[0.0, 0.0, 2.220446049250313e-16, 0.0]
[0.0, 0.0, 4.440892098500626e-16, 2.220446049250313e-16, 4.440892098500626e-16]
In [6]:
def runge_kutta_fehlberg(f, a, b, x0, h = None, epsilon = 1e-6):
    if h is None:
        h = (b - a) / 10
    x = x0
    ts = [a]
    xs = [x0]
    while (t := ts[-1]) < b:
        m = np.zeros((6,) + np.shape(x))
        for j in range(6):
            m[j] = f(t + RKFC[j] * h, x + np.tensordot(RKFA[j], m, (0,0)) * h)
        dx = np.dot(RKFB4, m) * h
        # estimate the error
        err = np.linalg.norm(np.tensordot(RKFB5 - RKFB4, m, (0,0)) * h)
        # step size ratio
        s = 0.9 * (epsilon / err) ** 0.2
        # If the estimated error is within the tolerance, we assume that 
        # our estimate of dx is OK so we add t + h to our list of t values
        # and x + dx to our list of x values, and use these values for
        # the next iteration. If the estimated error is too big then we
        # do not change t or x, and we do not add anything to our lists.
        # Instead we just proceed to the next line, which reduces the step
        # size, and then we go back to the start of the while loop,
        # recalculating m and dx with the smaller step size.
        if err < epsilon:
            t += h
            x += dx
            ts.append(t)
            xs.append(np.copy(x))

        # We now modify the step size.  If the estimated error is small then 
        # s > 1 and we increase the step size, but we do not increase it by
        # more than a factor of 2, to be on the safe side. If the estimated
        # error is large then s < 1 and we decrease the step size.  We also
        # include b - t in the min() function to ensure that the step size
        # does not take us past t = b, which is where we want to stop.
        h = min(s * h, 2 * h, b - t)
    return (np.array(ts), np.array(xs))

We now consider the example $dx/dt=-2x+u(t)$, where $u(t)=\exp(-2(t-5)^2)\sin(20t)$. This function $u(t)$ is a Gaussian wave packet; it is small for most of the time, but becomes reasonably large with rapid oscillations close to $t=5$.

In [7]:
def u(t):
    return np.exp(-2*(t-5)**2)*np.sin(20*t)

def f(t,x):
    return -2*x+u(t)

ts = np.linspace(0,10,1000)
plt.plot(ts, u(ts), label = 'u(t)')
plt.legend()
None
No description has been provided for this image

We now use the RKF method to solve the above equation for $0\leq t\leq 10$, starting with $x=1$ at $t=0$. The left hand plot shows the solution and the right hand plot shows the step sizes. You can see that the step sizes become smaller near $t=5$ when the oscillations come into play.

In [8]:
ts, xs = runge_kutta_fehlberg(f, 0, 10, 1)
fig, ax = plt.subplots(1,2)
fig.set_size_inches(10, 5)
ax[0].plot(ts, xs, label = 'x(t)')
ax[0].legend()
ax[1].plot(ts[1:], ts[1:]-ts[:-1],'.', label = 'step sizes')
ax[1].set_ylim(0)
ax[1].legend()
Out[8]:
<matplotlib.legend.Legend at 0x20e1f0f5190>
No description has been provided for this image

Now consider the second-order equation $\ddot{x}=-x(1+\dot{x})^3$. As usual we can reformulate this as a first-order equation for the pair $u=(x,v)$ where $v=\dot{x}$, namely $$ \dot{u} = (\dot{x},\dot{v})=(v,\ddot{x}) = (v,-x(1+v)^3) = (u_1,-u_0(1+u_1)^3). $$ At $t=0$ we assume that $dx/dt=0$ and $x=a$ for some parameter $a$ which can vary between $0$ and $1$. When $a$ is small the solution is close to a circle of radius $a$ and the step size does not vary very much. When $a$ is close to $1$ we find that there are periodic sharp spikes in the velocity $v$, and at those times the step size gets very small.

In [9]:
def g(t, u):
    return np.array([u[1], -(1 + u[1]) ** 3 * u[0]])

a = 0.95 # Try varying this from 0.1 to 0.95
ts, us = runge_kutta_fehlberg(g, 0, 4*np.pi, np.array([a, 0]), 0.01, 1e-9)
xs = us[:,0]
vs = us[:,1]
fig, ax = plt.subplots(1,3)
fig.set_size_inches(15, 5)
ax[0].plot(ts, xs, label = r"$x$")
ax[0].plot(ts, vs, label = r"$\dot{x}$")
ax[0].legend()
ax[1].plot(xs, vs, label = r"$(x(t),\dot{x}(t))$")
ax[1].legend()
ax[2].plot(ts[1:], ts[1:]-ts[:-1],'.' ,label = 'step sizes')
ax[2].set_ylim(0)
ax[2].legend()
Out[9]:
<matplotlib.legend.Legend at 0x20e1f17d490>
No description has been provided for this image

By default, the standard library function scipy.integrate.solve_ivp() uses a method quite similar to the one that we have discussed here. You can find the code for that implementation at https://github.com/scipy/scipy/blob/main/scipy/integrate/_ivp/rk.py

In [ ]: