Universal Functions

This notebook covers roughly the same ground as this video.

In [ ]:

import numpy as np

Here is a one-dimensional array:

In [ ]:

a = np.linspace(0,2*np.pi,101)
a

The function np.sin is a universal function or ufunc. It can be applied directly to an array, to calculate sin of every element in the array. NumPy performs the loop over the elements in compiled code rather than in the Python interpreter, so this is much faster than using a Python loop to do the same job. We can also apply np.round to the result to round each entry to three decimal places.

In [ ]:

np.round(np.sin(a),3)
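
As a quick sanity check, the sketch below compares np.sin applied to the whole array with an explicit loop that calls math.sin on one element at a time. The names via_ufunc and via_loop are illustrative only and are not used elsewhere in this notebook.

```python
import math
import numpy as np

a = np.linspace(0, 2 * np.pi, 101)

# One call: the ufunc visits every element in compiled code.
via_ufunc = np.sin(a)

# The same calculation as an explicit Python loop, one element at a time.
via_loop = np.array([math.sin(x) for x in a])

# The two results agree up to floating-point rounding.
print(np.allclose(via_ufunc, via_loop))

# np.sin really is an instance of the ufunc type.
print(isinstance(np.sin, np.ufunc))
```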

We can also add or multiply arrays of the same shape, or raise an array to a power. This is again much faster than using loops to do the same job.

In [ ]:

a = np.arange(5)
b = 10 ** a
print(f'a    = {a}')
print(f'b    = {b}')
print(f'a+b  = {a+b}')
print(f'a*b  = {a*b}')
print(f'a**2 = {a ** 2}')
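
The arithmetic operators on arrays are themselves backed by ufuncs: a + b is np.add(a, b), a * b is np.multiply(a, b) and a ** 2 is np.power(a, 2). The sketch below checks this, and also compares a + b with a loop-based version; the name sum_loop is illustrative only.

```python
import numpy as np

a = np.arange(5)
b = 10 ** a

# The operators and the named ufuncs give identical results.
print(np.array_equal(a + b, np.add(a, b)))
print(np.array_equal(a * b, np.multiply(a, b)))
print(np.array_equal(a ** 2, np.power(a, 2)))

# Loop-based equivalent of a + b, combining corresponding entries.
sum_loop = np.array([a[i] + b[i] for i in range(len(a))])
print(np.array_equal(sum_loop, a + b))
```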

Note that when forming a * b we just multiply the corresponding entries in a and b, which is not the same as standard matrix multiplication. For arrays of appropriate shape, we can enter a @ b for the usual matrix product.

In [ ]:

a = np.array([[1, 2],[3, 4]])
b = np.array([[100, 1000],[100, 1000]])
print(f'a = \n{a}')
print(f'b = \n{b}')
print(f'a * b = \n{a*b}')
print(f'a @ b = \n{a@b}')
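
To make the difference concrete, the sketch below builds the matrix product by hand, using the rule that entry (i, j) is the sum over k of a[i, k] * b[k, j], and compares it with a @ b. The name manual is illustrative only.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[100, 1000], [100, 1000]])

# Entry (i, j) of the matrix product is row i of a dotted with column j of b.
manual = np.zeros((2, 2), dtype=a.dtype)
for i in range(2):
    for j in range(2):
        manual[i, j] = (a[i, :] * b[:, j]).sum()

print(np.array_equal(manual, a @ b))

# The elementwise product is a different array.
print(np.array_equal(a * b, a @ b))
```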

We can use %timeit to see how long a piece of code takes. The code will be run a large number of times, and the mean and standard deviation of the time taken will be reported. Note that %timeit is an IPython magic command rather than part of Python itself, so it works in a notebook but not in an ordinary Python script. The example below illustrates the fact that universal functions are much faster than loops.

In [ ]:

u = np.arange(1000)
print('Using universal function')
%timeit np.sin(u)
print('Using a loop')
%timeit [np.sin(x) for x in u]
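
Outside a notebook, a similar measurement can be made with the standard library timeit module. The following is a minimal sketch; the choice of 5 groups of 20 runs is an arbitrary assumption made to keep it quick, and the names t_ufunc and t_loop are illustrative.

```python
import timeit
import numpy as np

u = np.arange(1000)

# timeit.repeat returns one total time (in seconds) per group of `number` runs.
t_ufunc = timeit.repeat(lambda: np.sin(u), repeat=5, number=20)
t_loop = timeit.repeat(lambda: [np.sin(x) for x in u], repeat=5, number=20)

# Report the best group, converted to microseconds per single run.
print(f'ufunc: {min(t_ufunc) / 20 * 1e6:.1f} microseconds per run')
print(f'loop:  {min(t_loop) / 20 * 1e6:.1f} microseconds per run')
```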

We now define three different functions to generate a list of random numbers and calculate their variance. The standard formula is as follows: given a list $x_0,\dotsc,x_{n-1}$, we put $\overline{x}=\frac{1}{n}\sum_ix_i$ and $\overline{x^2}=\frac{1}{n}\sum_ix_i^2$, then the variance is $\overline{x^2}-\overline{x}^2$.
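
For reference, this agrees with the definition of the variance as the mean squared deviation from $\overline{x}$. Expanding the square and using $\frac{1}{n}\sum_ix_i=\overline{x}$ gives

$$\frac{1}{n}\sum_i(x_i-\overline{x})^2 = \frac{1}{n}\sum_ix_i^2 - 2\,\overline{x}\cdot\frac{1}{n}\sum_ix_i + \overline{x}^2 = \overline{x^2} - 2\,\overline{x}^2 + \overline{x}^2 = \overline{x^2}-\overline{x}^2.$$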

The function var0() uses rnd.random() to generate random numbers one at a time, then uses a loop to calculate $\sum_ix_i$ and $\sum_ix_i^2$. This is slow and inefficient.
The function var1() uses np.random.random() to generate a long list of random numbers as a single vectorised operation, then uses further vectorised operations to calculate $\sum_ix_i$ and $\sum_ix_i^2$. This is much faster, and the code is also shorter and easier to understand.
The function var2() again uses np.random.random() to generate random numbers, and then uses the built-in numpy method var() to calculate the variance. A very large number of standard mathematical operations are available in numpy, scipy and other libraries; you should avoid reinventing the wheel where possible.

We again use %timeit to check the performance of these three alternatives. This time, we write %timeit -n100, which fixes the number of runs in each group at 100; with the default of 7 groups, each function is run 700 times. If we omit the -n then IPython will decide for itself how many runs there should be in each group, and will usually use a much larger number than 100. That is sensible if you want an accurate answer, but if you just want a crude indication then it is overkill and slows things down. We find that var1() is a bit quicker than var2(); I am not sure why that is the case. However, both var1() and var2() are much faster than var0().

In [ ]:

import random as rnd

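def var0(n = 1000):
    # Generate n random numbers one at a time
    l = []
    for i in range(n):
        l.append(rnd.random())
    # Accumulate the sum of the x_i and the sum of the x_i^2 in a loop
    sum_x = 0
    sum_x2 = 0
    for x in l:
        sum_x += x
        sum_x2 += x**2
    # Variance = (mean of the squares) - (square of the mean)
    mean_x = sum_x / n
    mean_x2 = sum_x2 / n
    var = mean_x2 - mean_x**2
    return var

def var1(n = 1000):
    # Generate all n random numbers in one vectorised call
    l = np.random.random(n)
    # Same formula as var0, using vectorised sums
    return (l ** 2).sum() / n - l.sum() ** 2 / n ** 2

def var2(n = 1000):
    # Let numpy calculate the variance directly
    return np.random.random(n).var()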

print('Version 0')
%timeit -n100 var0(10000)
print('Version 1')
%timeit -n100 var1(10000)
print('Version 2')
%timeit -n100 var2(10000)
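
The three functions above each generate their own random numbers, so their return values cannot be compared directly. The sketch below applies the three calculations to a single shared array to check that they agree. The fixed seed and the names rng, var_loop, var_vec and var_np are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here for reproducibility
x = rng.random(10000)
n = len(x)

# Loop-based calculation, as in var0
sum_x = 0.0
sum_x2 = 0.0
for v in x:
    sum_x += v
    sum_x2 += v ** 2
var_loop = sum_x2 / n - (sum_x / n) ** 2

# Vectorised formula, as in var1
var_vec = (x ** 2).sum() / n - x.sum() ** 2 / n ** 2

# Built-in method, as in var2 (divides by n by default, i.e. ddof=0)
var_np = x.var()

# All three agree up to floating-point rounding.
print(np.isclose(var_loop, var_vec), np.isclose(var_vec, var_np))
```

For numbers drawn uniformly from the interval from 0 to 1 the exact variance is 1/12, so all three values should be close to 0.0833.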
