Trajectory of the International Space Station¶

In this notebook we do some signal processing on a dataset showing the trajectory of the International Space Station (which was obtained from https://spotthestation.nasa.gov/trajectory_data.cfm)

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import datetime
import scipy

The data file starts with lots of header lines. The lines with real data all start with '2024', so we skip any lines that do not start like that. Each data line consists of a time (like 2024-04-19T12:28:00.000) followed by $x$, $y$ and $z$ components of position (with respect to a certain coordinate frame), and the $x$, $y$ and $z$ components of velocity. We use datetime.datetime.strptime() to parse the time, using the format string %Y-%m-%dT%H:%M:%S to specify how it is formatted: a 4-digit year, a dash, a 2-digit month, a dash, a 2-digit day, the letter 'T', and so on. We then use the timestamp() method to get the number of seconds from the beginning of 1970. We subtract the timestamp of the beginning of the dataset to get the number of seconds since the beginning of the dataset. We also parse the $x$, $y$ and $z$ component as floating point numbers, and ignore the components of velocity.

In [132]:
data_dir = '../data/astronomy/'
with open(data_dir + 'ISS.txt') as f:
    lines = f.readlines()

data = []
t_start = None
for line in lines:
    if not(line.startswith('202')):
        continue
    x = line.split()
    t = int(datetime.datetime.strptime(line[0:19], '%Y-%m-%dT%H:%M:%S').timestamp())
    if t_start is None:
        t_start = t
    data.append([t - t_start, float(x[1]), float(x[2]), float(x[3])])

data = np.array(data)
t0 = data[:,0].astype(int)
xyz0 = data[:,1:]

Most of the records in the file are at taken at intervals of 4 minutes (which is 240 seconds), but there are some deviations from that pattern. This would mess up any attempt to do frequency analysis, so we need to tidy things up. We make a new array t which contains times increasing by 240 seconds at every step, then we make a new array xyz with the corresponding x, y and z values. Many of the times in t will also occur in t0 and in those cases we just copy the relevant value of xyz0 into xyz. For times in t that do not appear in t0, we use linear interpolation to get the relevant value of xyz.

In [ ]:
n = int(t0[-1]/240)
t = [0]
xyz = [xyz0[0]]
for i in range(1,n):
    j = np.argmax(t0 >= i*240)
    s = (t0[j] - i*240)/(t0[j] - t0[j-1])
    t.append(i*240)
    xyz.append(xyz0[j] + s*(xyz0[j-1]-xyz0[j]))

t = np.array(t) / 3600
xyz = np.array(xyz)
x = xyz[:,0]
y = xyz[:,1]
z = xyz[:,2]

Lines 34 to 41 of the data file list three events during the relevant time period which could affect the trajectory (boosters firing, other spacecraft docking and so on). The time of the first of these is given as 117:02:50:00.000. Here the 117 refers to the 117th day of 2024, which is April 26 or 2024-04-26. After performing this kind translation we get the following list of three times, listed in hours from the beginning of the dataset for compatibility with our other time data.

In [133]:
events = ['2024-04-26T02:50:00', '2024-04-26T17:00:00', '2024-04-30T12:00:00']
events = [(datetime.datetime.strptime(e, '%Y-%m-%dT%H:%M:%S').timestamp() - t_start) / (60 * 60) for e in events]
events = np.array(events)
events
Out[133]:
array([158.83333333, 173.        , 264.        ])

The dataset has 5400 entries (one every 4 minutes for 15 days). If we try to plot them all, the result will be very crowded. We therefore define a function show_orbit(tmin,tmax) which shows the records for times between tmin and tmax. The top part of the figure shows $x$, $y$ and $z$ plotted against time. Times of events are shown as black dashed vertical lines. The bottom half shows plots of $x$ against $y$, $x$ against $z$ and $y$ against $z$.

In [163]:
def show_orbit(tmin = 0, tmax = 30):
    n = np.floor(tmin * 15).astype(int)
    m = np.ceil(tmax * 15).astype(int)
    t0 = t[n:m]
    x0 = x[n:m]
    y0 = y[n:m]
    z0 = z[n:m]
    plt.figure(figsize=(12,4))
    plt.plot(t0, x0,label='x')
    plt.plot(t0, y0,label='y')
    plt.plot(t0, z0,label='z')
    for e in events:
        if tmin <= e <= tmax:
            plt.axvline(e, color='k', linestyle='--')
    plt.legend()
    plt.show()
    fig, ax = plt.subplots(1, 3, figsize=(12,4))
    ax[0].plot(x0, y0)
    ax[0].set_title(r'$x$ against $y$')
    ax[1].plot(x0, z0)
    ax[1].set_title(r'$x$ against $z$')
    ax[2].plot(y0, z0)
    ax[2].set_title(r'$y$ against $z$')
    plt.show()
    
In [162]:
show_orbit(150,180)
No description has been provided for this image
No description has been provided for this image

We now use scipy.signal.periodogram() to calculate the power spectrum of $x$, $y$ and $z$. This tells us how much of the variation is at different frequencies. As is clear from the above, almost all of the variation is at a single frequency, with only minor perturbations from a purely periodic function.

When calling scipy.signal.periodogram(), it is useful to use the optional argument fs to specify the sampling frequency. In our dataset, time is measured in hours, and we have one sample every four minutes, which means 15 samples per hour, so fs = 15. The horizontal axis of the plot below is then in cycles per hour. The spike is at about 2/3 cycles per hour, corresponding to a period of about 3/2 hours or 90 minutes.

In [137]:
fx, px = scipy.signal.periodogram(x, fs = 15)
fy, py = scipy.signal.periodogram(y, fs = 15)
fz, pz = scipy.signal.periodogram(z, fs = 15)
plt.plot(fx, px)
plt.plot(fy, py)
plt.plot(fz, pz)
Out[137]:
[<matplotlib.lines.Line2D at 0x1b98cea1490>]
No description has been provided for this image

Because the variation is so strongly dominated by a single frequency, it is hard to see anything else. We can fix this by plotting the log of the power instead of the power. When the log is negative, the power is very low, and it is not very interesting to see exactly how low. We therefore plot max(0,log(p)), which cuts off the negative values. Also, the signals for $x$, $y$ and $z$ are not interestingly different, so we use the square root of the sum of the squares as a measure of the overall power. There is a small but noticeable spike at about 1.3: we zoom in on that in the right hand picture.

In [148]:
pp = np.sqrt(px * px + py * py + pz * pz)
fig, ax = plt.subplots(1, 2, figsize=(12,6))
ax[0].plot(fx[2:-1], np.maximum(0,np.log(pp))[2:-1])
ax[1].plot(fx[460:480], np.maximum(0,np.log(pp))[460:480])
Out[148]:
[<matplotlib.lines.Line2D at 0x1b98ddfad10>]
No description has been provided for this image

We now find the exact value of the primary period and also the spike near 1.3 which we call the secondary period. We find that the primary period is almost exactly twice as large as the secondary one.

In [150]:
primary_frequency = fx[np.argmax(pp)]
secondary_frequency = fx[460:480][np.argmax(pp[460:480])]
primary_period_hours = 1 / primary_frequency
secondary_period_hours = 1 / secondary_frequency
primary_period_minutes = primary_period_hours * 60
secondary_period_minutes = secondary_period_hours * 60
print(f"The primary period is {primary_period_minutes:.3f} minutes")
print(f"The secondary period is {secondary_period_minutes:.3f} minutes")
print(f"The ratio is {primary_period_minutes / secondary_period_minutes:.3f}")
The primary period is 92.704 minutes
The secondary period is 46.452 minutes
The ratio is 1.996

We next find the times when $x$ changes from positive to negative or vice-versa. Suppose that $x$ changes sign between $t_i$ and $t_{i+1}=t_i+\delta$, so $x_ix_{i+1}<0$. If we assume that $x$ changes at constant speed between these times, then the time when $x=0$ will be $t_i+x_i\delta/(x_i+x_{i+1})$. In our case $\delta$ is $4$ minutes or $1/15$ hours. Thus, the code below gives an array zx of times when $x=0$. Half of these are times when $x$ changes from negative to positive, and the other half are times when $x$ changes from positive to negative. We separate these cases into zx_even and zx_odd. We then put zxd_even=np.diff(zx_even), which means that zxd_even[i]=zx_even[i+1]-zx_even[i], and similarly for zxd_odd. We expect that the values in zxd_even and zxd_odd should be close to the primary period. The plot below shows that they ae actually slightly larger than the primary period, by about 0.2 minutes, and that they decrease gradually over the course of the dataset. There is also a noticeable jump at the time of the first event, which is a reboost operation.

In [166]:
zx = []
for i in range(x.shape[0] - 1):
    if x[i] * x[i + 1] < 0:
        zx.append((i + x[i]/(x[i] - x[i + 1])) / 15)
zx = np.array(zx)
zx_even = zx[::2]
zx_odd = zx[1::2]
zxd_even = np.diff(zx_even)
zxd_odd = np.diff(zx_odd)
plt.plot(zx_even[1:], 60 * zxd_even,'r.')
plt.plot(zx_odd[1:], 60 * zxd_odd,'g.')
plt.axhline(primary_period_minutes, color='b', linestyle='--')
for e in events:
    plt.axvline(e, color='k', linestyle='--')
No description has been provided for this image
In [ ]: