Signal processing

Trajectory of the International Space Station | View | Download | |

Musical instruments | View | Download |

To try the above notebooks yourself, you will need to download the following data files:

ISS.txt | instruments.zip |

There is an online test covering parts of Tasks 1 and 2.

Define Python functions

`f0(x),...,f4(x)`

that give these graphs.
For example, the sawtooth wave can be done as follows:
```
def f1(x):
return x - np.floor(x)
```

`np.abs()`

, `np.mod()`

, `np.floor()`

,
`np.round()`

and `np.sin()`

in appropriate ways.
Make sure that your functions work correctly on arrays as well as scalars.
Plot your functions to check that they are correct. Put
`functions=[f0,f1,f2,f3,f4]`

, to
make it easier to use the functions in a loop etc.
Now put

`N=64`

. Write your code for the subtasks below
in terms of `N`

, so that you can easily change it if necessary.
Put `ts=np.arange(0,1,1/N)`

. (Note that this gives an array
whose last entry is $1-1/N$, not $1$. This is appropriate if we want
exact formulae for Fourier transforms and so on.) For each of the above
functions $f()$, calculate the values on the array `ts`

then use
`scipy.fft.fft()`

to calculate the discrete Fourier transform.
(One could also use
`np.fft.fft()`

,
but we choose to use `scipy`

functions throughout this lab.)
This gives a one-dimensional array `ft`

of `N`

complex numbers. Because the values of the original function are real,
it works out that `ft[N-1-i]`

is the complex conjugate of
`ft[i]`

, so only the first `N/2`

values are
useful. Plot the absolute values of `ft[i]/N`

for
`i<N/2`

. (In the past
we have called `plt.plot`

`ax.plot`

with two
arguments `x`

and `y`

, but it you can also do
`ax.plot(y)`

and it will plot `y`

against the
numbers $0$ to $n-1$, where $n$ is the length of `y`

.)
It is best to plot the values as isolated points, not as a connected
curve. In many cases, you will see that the pattern is different
of odd-numbered and even-numbered values. In one case the behaviour
depends on the value of the index mod $4$.
When

`N`

is large and `k`

is much smaller than `N`

,
it can be shown that `ft[k]/N`

is approximately as follows:
Function | $k=0$ | $k>0$ even | $k>0$ odd | |

0 | Square wave | $1/2$ | $0$ | $i/(\pi k)$ |

1 | Sawtooth wave | $1/2$ | $i/(2\pi k)$ | |

2 | Triangle wave | $1/4$ | $0$ | $-1/(\pi^2k^2)$ |

3 | Pulse wave | $1/4$ | $-i(1-(-i)^k)/(2\pi k)$ | |

4 | Rectified sine wave | $2/(\pi(1-4k^2))$ |

`ft[k]/N`

obtained previously. Here
you should let `k`

go all the way up to `N-1`

,
to see the different pattern of behaviour for small `k`

and
large `k`

. Remember that Python notation for the
square root of $-1$ is `1j`

.
`types.txt`

and a collection of files
`AFIB.mat`

, `AFL.mat`

and so on. Each of the
`.mat`

files contains electrocardiogram data (a record of
the electrical activity of the heart) for a patient with a particular
condition. (This is a small subset of the data used in the paper
Novel genetic ensembles of classifiers applied to
myocardium dysfunction recognition based on ECG signals by
Paweł Pławiak.)
One of the lines in the file

`types.txt`

is
`04,AFIB,Atrial fibrillation`

, indicating that the file
`AFIB.mat`

contains data for a patient with atrial
fibrillation; the other lines give corresponding information for the
other files.
The

`.mat`

files are in Matlab format, so they can be read
using
`scipy.io.loadmat()`

.
If you inspect the value returned by `scipy.io.loadmat()`

,
you will find that it contains a one-dimensional `numpy`

array
wrapped up in some useless packaging. You will need to work out how to
extract the one-dimensional array itself. The array contains 10 seconds of
data, with 360 measurements per second.
Write a function

`make_ecg_dict()`

that reads the contents
of `types.txt`

and returns a dict `ecg_dict`

.
This should be set up in such a way that
`ecg_dict[4]=ecg_dict['AFIB']=(4,'AFIB','Atrial fibrillation')`

.
Then run `ecg_dict = make_ecg_dict()`

so that the
dict is available in the following subtasks.
Now write a function

`get_ecg_data(n)`

. This should allow
`n`

to either be a number or a code like `'AFIB'`

.
It should return a tuple `(index,code,full_name,data)`

where
`index`

is the number of the relevant line in
`types.txt`

, `code`

is the code like
`'AFIB'`

, `full_name`

is the full name like
`'Atrial fibrillation'`

, and `data`

is the
`numpy`

array of data for the patient.
Now get the data from one of the files. Normalise it by subtracting the mean and dividing by the standard deviation. Plot the normalised data. The $x$-axis should be time in seconds (remembering that there are 360 measurements per second). For most of the files, you should see that there are regularly spaced sharp peaks. Use

`scipy.signal.find_peaks()`

to find the location of the peaks.
You may need to experiment with the parameters of
`scipy.signal.find_peaks()`

to pick out the primary peaks
and ignore spurious ones closer to zero. Add red dots marking the
peaks on the plot. Calculate the average spacing between the peaks.
Now apply a high-pass filter to the data, to screen out the low frequency components. This can be done as follows:

```
filter = scipy.signal.butter(4, 0.2, 'high', output='sos')
x_filtered = scipy.signal.sosfilt(filter, x)
```

Now use

`scipy.signal.periodogram()`

to calculate the power spectrum of the data. Note that there
is an optional argument that you can use to specify that the
sampling frequency is 360 Hz. This will return a pair
`(fs, ps)`

, where `fs`

is an array of
frequencies and `ps`

is an array of power values.
Discard all frequencies greater than 50 Hz. (For example, you
could use boolean indexing to do this.)
Plot the power spectrum.
Now write a function

`show_ecg(n)`

to do
all the above steps. It should accept an argument `n`

as in `get_ecg_data()`

. It should plot a figure
of size $12\times 4$ with three subplots. The first subplot
should show the normalised data with red dots marking the peaks,
the middle subplot should show the filtered data, and the last
subplot should show the power spectrum. The title of the
first subplot should be the full name of the condition,
and the titles of the other two should be 'High-pass filtered'
and 'Power spectrum' respectively.
`blackbird.wav`

containing a short recording of a blackbird, and four similar
files for other birds.
Write a function

`play_sound(n)`

which plays the
recording of the bird named `n`

. You may need to use
`pip`

to install the package `simpleaudio`

first. (There are also other packages that do the same job,
you could investigate the possibilities if you like.)
Choose a bird, and use

`scipy.io.wavfile.read()`

to get the data from the corresponding file. This will return a pair
`(rate, data)`

, where `rate`

is the sampling
rate and `data`

is the array of data. For a mono
recording, `data`

will be a one-dimensional array.
For a stereo recording, `data`

will be two-dimensional
(so `data.ndim`

will be $2$). In this case, you should
do `data = data.mean(axis=1)`

to take the average of
the two channels. Then normalise the data by subtracting the mean
and dividing by the standard deviation. If `n`

is the
length of the data, then the corresponding array of times is
`t = np.arange(n)/rate`

. Plot the normalised data
against the times.
We will now make spectrograms of the recordings. (If you search Google Images for 'birdsong spectrogram' you will see many examples of the kind of thing we are trying to do.) With

`data`

and `rate`

as above, you can use
```
spf, spt, spp = scipy.signal.spectrogram(data, fs=rate)
```

`spf`

to
an array of frequencies, `spt`

to an array of times, and
`spp`

to a two-dimensional array of power values.
The value `spp[i,j]`

is the power at frequency
`spf[i]`

and time `spt[j]`

. To plot these,
we could do `plt.pcolormesh(spt, spf, spp)`

. However,
this gives a picture which is basically just an evenly coloured
rectangle. The main problem is that the range of values in the data
is too large. We can fix this by replacing `spp`

with
`np.maximum(0,np.log(spp)+15)`

. Also, it is better
to make the colour white in places where the power is very low.
For this we need to use the `cmap`

argument to change
the colour map. There is a list of possibilities at
matplotlib.org.
The key point is to choose one that has white at the left hand end.
Now wrap up the above steps in a function

`show_spectrogram(n)`

.
This should generate a figure of size $10\times 5$ with two subplots.
The upper half should plot the normalised data against the times, and
the lower half should plot the spectrogram. To make the time axes
of the two subplots match up, you should set the limits of $x$
in the upper plot to `t[0]`

and `t[-1]`

.
You should also set the title of the upper plot to the name of the bird.
Run the function for all five birds.