# MAS2008 Scientific Computing: Lab 9 Signal processing

## Instructions

The following notebooks and videos are relevant for this lab:

To try the above notebooks yourself, you will need to download the following data files:
There is an online test covering parts of Tasks 1 and 2.

Consider the following pictures:

Define Python functions f0(x),...,f4(x) that give these graphs. For example, the sawtooth wave can be done as follows:

def f1(x):
return x - np.floor(x)

The others can also be defined by combining np.abs(), np.mod(), np.floor(), np.round() and np.sin() in appropriate ways. Make sure that your functions work correctly on arrays as well as scalars. Plot your functions to check that they are correct. Put functions=[f0,f1,f2,f3,f4], to make it easier to use the functions in a loop etc.

Now put N=64. Write your code for the subtasks below in terms of N, so that you can easily change it if necessary. Put ts=np.arange(0,1,1/N). (Note that this gives an array whose last entry is $1-1/N$, not $1$. This is appropriate if we want exact formulae for Fourier transforms and so on.) For each of the above functions $f()$, calculate the values on the array ts then use scipy.fft.fft() to calculate the discrete Fourier transform. (One could also use np.fft.fft(), but we choose to use scipy functions throughout this lab.) This gives a one-dimensional array ft of N complex numbers. Because the values of the original function are real, it works out that ft[N-1-i] is the complex conjugate of ft[i], so only the first N/2 values are useful. Plot the absolute values of ft[i]/N for i<N/2. (In the past we have called plt.plot ax.plot with two arguments x and y, but it you can also do ax.plot(y) and it will plot y against the numbers $0$ to $n-1$, where $n$ is the length of y.) It is best to plot the values as isolated points, not as a connected curve. In many cases, you will see that the pattern is different of odd-numbered and even-numbered values. In one case the behaviour depends on the value of the index mod $4$.

When N is large and k is much smaller than N, it can be shown that ft[k]/N is approximately as follows:
 Function $k=0$ $k>0$ even $k>0$ odd 0 Square wave $1/2$ $0$ $i/(\pi k)$ 1 Sawtooth wave $1/2$ $i/(2\pi k)$ 2 Triangle wave $1/4$ $0$ $-1/(\pi^2k^2)$ 3 Pulse wave $1/4$ $-i(1-(-i)^k)/(2\pi k)$ 4 Rectified sine wave $2/(\pi(1-4k^2))$
Write some code to calculate the values in the above table, and plot them (either using the absolute values, real parts or imaginary parts) alongside the values of ft[k]/N obtained previously. Here you should let k go all the way up to N-1, to see the different pattern of behaviour for small k and large k. Remember that Python notation for the square root of $-1$ is 1j.

Download and unpack the file ecg.zip. You will find that there is a file types.txt and a collection of files AFIB.mat, AFL.mat and so on. Each of the .mat files contains electrocardiogram data (a record of the electrical activity of the heart) for a patient with a particular condition. (This is a small subset of the data used in the paper Novel genetic ensembles of classifiers applied to myocardium dysfunction recognition based on ECG signals by Paweł Pławiak.)

One of the lines in the file types.txt is 04,AFIB,Atrial fibrillation, indicating that the file AFIB.mat contains data for a patient with atrial fibrillation; the other lines give corresponding information for the other files.

The .mat files are in Matlab format, so they can be read using scipy.io.loadmat(). If you inspect the value returned by scipy.io.loadmat(), you will find that it contains a one-dimensional numpy array wrapped up in some useless packaging. You will need to work out how to extract the one-dimensional array itself. The array contains 10 seconds of data, with 360 measurements per second.

Write a function make_ecg_dict() that reads the contents of types.txt and returns a dict ecg_dict. This should be set up in such a way that ecg_dict[4]=ecg_dict['AFIB']=(4,'AFIB','Atrial fibrillation'). Then run ecg_dict = make_ecg_dict() so that the dict is available in the following subtasks.

Now write a function get_ecg_data(n). This should allow n to either be a number or a code like 'AFIB'. It should return a tuple (index,code,full_name,data) where index is the number of the relevant line in types.txt, code is the code like 'AFIB', full_name is the full name like 'Atrial fibrillation', and data is the numpy array of data for the patient.

Now get the data from one of the files. Normalise it by subtracting the mean and dividing by the standard deviation. Plot the normalised data. The $x$-axis should be time in seconds (remembering that there are 360 measurements per second). For most of the files, you should see that there are regularly spaced sharp peaks. Use scipy.signal.find_peaks() to find the location of the peaks. You may need to experiment with the parameters of scipy.signal.find_peaks() to pick out the primary peaks and ignore spurious ones closer to zero. Add red dots marking the peaks on the plot. Calculate the average spacing between the peaks.

Now apply a high-pass filter to the data, to screen out the low frequency components. This can be done as follows:

filter = scipy.signal.butter(4, 0.2, 'high', output='sos')
x_filtered = scipy.signal.sosfilt(filter, x)

The first couple of seconds of the result are not reliable, so you should discard them. Plot the rest of the filtered data.

Now use scipy.signal.periodogram() to calculate the power spectrum of the data. Note that there is an optional argument that you can use to specify that the sampling frequency is 360 Hz. This will return a pair (fs, ps), where fs is an array of frequencies and ps is an array of power values. Discard all frequencies greater than 50 Hz. (For example, you could use boolean indexing to do this.) Plot the power spectrum.

Now write a function show_ecg(n) to do all the above steps. It should accept an argument n as in get_ecg_data(). It should plot a figure of size $12\times 4$ with three subplots. The first subplot should show the normalised data with red dots marking the peaks, the middle subplot should show the filtered data, and the last subplot should show the power spectrum. The title of the first subplot should be the full name of the condition, and the titles of the other two should be 'High-pass filtered' and 'Power spectrum' respectively.

Download and unpack the file birds.zip. You will find that there is a file blackbird.wav containing a short recording of a blackbird, and four similar files for other birds.

Write a function play_sound(n) which plays the recording of the bird named n. You may need to use pip to install the package simpleaudio first. (There are also other packages that do the same job, you could investigate the possibilities if you like.)

Choose a bird, and use scipy.io.wavfile.read() to get the data from the corresponding file. This will return a pair (rate, data), where rate is the sampling rate and data is the array of data. For a mono recording, data will be a one-dimensional array. For a stereo recording, data will be two-dimensional (so data.ndim will be $2$). In this case, you should do data = data.mean(axis=1) to take the average of the two channels. Then normalise the data by subtracting the mean and dividing by the standard deviation. If n is the length of the data, then the corresponding array of times is t = np.arange(n)/rate. Plot the normalised data against the times.

We will now make spectrograms of the recordings. (If you search Google Images for 'birdsong spectrogram' you will see many examples of the kind of thing we are trying to do.) With data and rate as above, you can use

spf, spt, spp = scipy.signal.spectrogram(data, fs=rate)

to calculate the spectrogram. This will set spf to an array of frequencies, spt to an array of times, and spp to a two-dimensional array of power values. The value spp[i,j] is the power at frequency spf[i] and time spt[j]. To plot these, we could do plt.pcolormesh(spt, spf, spp). However, this gives a picture which is basically just an evenly coloured rectangle. The main problem is that the range of values in the data is too large. We can fix this by replacing spp with np.maximum(0,np.log(spp)+15). Also, it is better to make the colour white in places where the power is very low. For this we need to use the cmap argument to change the colour map. There is a list of possibilities at matplotlib.org. The key point is to choose one that has white at the left hand end.

Now wrap up the above steps in a function show_spectrogram(n). This should generate a figure of size $10\times 5$ with two subplots. The upper half should plot the normalised data against the times, and the lower half should plot the spectrogram. To make the time axes of the two subplots match up, you should set the limits of $x$ in the upper plot to t[0] and t[-1]. You should also set the title of the upper plot to the name of the bird. Run the function for all five birds.