Listing subsets¶

In this notebook we a function list_subsets(m,n) that returns the list of subsets of size $m$ in the set $\{0,1,\dotsc,n-1\}$. For example, list_subsets(2,4) returns the list

[[0,1],[0,2],[0,3],[1,2],[1,3],[2,3]]

This is the list of subsets of size $2$ in the set $\{0,1,2,3\}$, with each subset represented as a strictly increasing list.

We also run various checks to verify that the function list_subsets() is behaving correctly. In fact, the main purpose of this notebook is to introduce the framework of Python unit tests which we use to run these checks.

In [1]:
import math
import unittest

By a ramp of length $m$ bound $n$ we mean a list $r=[r_0,\dotsc,r_{m-1}]$ of nonnegative integers such that $r_0\leq r_1\leq\dotsb\leq r_{m-1}\leq n$. Given such a ramp, we see that the sequence $$\text{spread}(r)=[r_0,1+r_1,2+r_2,\dotsc,m-1+r_{m-1}]$$ is strictly increasing and so gives a subset of size $m$ in the set $\{0,1,\dotsc,n+m-1\}$. It is not hard to see that every subset of size $m$ in $\{0,1,\dotsc,n+m-1\}$ arises in this way from a unique ramp. Thus, we can list subsets by first listing the corresponding ramps and then applying the spread() function.

To list the ramps of length $m$ and bound $n$, we start with $r=[0,\dotsc,0]$, and then we repeatedly apply the function next_ramp() to find the next entry in the list. To see how this works, consider the ramp $r=[10,30,30,50,80,80,80,80,80]$ of length $9$ and bound $80$. We want to find the next ramp in "alphabetical order", so we want to keep the beginning of the list unchanged and make a change as far to the right as possible. We cannot increase any of the $80$'s because the bound is $80$, so we have to find the last entry which is strictly less than $80$, which is the $50$ in position $3$ (counting from position $0$ as usual). We increase that $50$ to $51$. Then, to make the sequence as early in alphabetical order as possible while still being a ramp, we replace all the $80$'s after the $50$ by $51$ as well, giving the sequence $r'=[10,30,30,51,51,51,51,51,51]$. As another example, consider the case where $m=4$ and $n=2$. If we start with $[0,0,0,0]$ and apply the above algorithm repeatedly we get the following list: $$ \begin{array}{ccccc} [0,0,0,0] & [0,0,0,1] & [0,0,0,2] & [0,0,1,1] & [0,0,1,2] \\ [0,0,2,2] & [0,1,1,1] & [0,1,1,2] & [0,1,2,2] & [0,2,2,2] \\ [1,1,1,1] & [1,1,1,2] & [1,1,2,2] & [1,2,2,2] & [2,2,2,2]. \end{array} $$ At the end we have $[2,2,2,2]$ and we see that there are no entries that can be increased so the process stops.

The function list_ramps_a() implements the method above. The functions list_ramps_b(), list_ramps_c() and list_ramps_d() implement other methods. Each of them is almost correct but has a subtle error. After defining these four functions we put list_ramps = list_ramps_a and then define list_subsets() in terms of list_ramps.

In [ ]:
def next_ramp(r, n):
    """ 
    This function expects r to be a ramp of some length m with upper bound n,
    and returns the next ramp (in lexicographic order) of the same length 
    and upper bound.  If r is already the last ramp, the function returns None.
    """
    i = len(r) - 1
    while i >= 0 and r[i] >= n:
        i -= 1
    if i < 0:
        return None
    return r[:i] + [r[i] + 1] * (len(r) - i)

def list_ramps_a(m, n):
    """
    This function returns the list of all ramps of length m with upper bound n,
    in lexicographic order.
    """
    if n < 0:
        return []
    r = [0] * m
    L = []
    while r is not None:
        L.append(r)
        r = next_ramp(r, n)
    return L

def list_ramps_b(m, n):
    """
    This function is supposed to return the list of all ramps of length m with 
    upper bound n, in lexicographic order.  However, it does not always do 
    this correctly.  
    """
    L = [[i] for i in range(n+1)]
    for k in range(m-1):
        L = [L[i] + [j] for i in range(len(L)) for j in range(L[i][-1], n+1)]
    return L

def list_ramps_c(m, n):
    """
    This function is supposed to return the list of all ramps of length m with 
    upper bound n, in lexicographic order.  However, it does not always do 
    this correctly.  
    """
    L = [[]]
    for k in range(m):
        L = [L[i] + [j] for i in range(len(L)) for j in range(max([0]+L[i]), n)]
    return L

def list_ramps_d(m, n):
    """
    This function is supposed to return the list of all ramps of length m with 
    upper bound n, in lexicographic order.  However, it does not always do 
    this correctly.  
    """
    if m == 0:
        return [[]]
    L = [[j] for j in range(n+1)]
    for k in range(m-1):
        L = [[j] + s for s in L for j in range(s[0]+1)]
    return L

list_ramps = list_ramps_a

def spread(r):
    """
    Given a ramp [r0, r1, r2, ...], this function returns the list
    [r0, r1+1, r2+2, ...] (which is strictly increasing and so can be 
    thought of as representing a subset of the integers).
    """
    return [i + x for i, x in enumerate(r)]

def list_subsets(m, n):
    """
    This function returns the list of all subsets of {0, 1, ..., n-1} of size m
    (with each subset represented as a strictly increasing list of integers, 
    and the list of subsets being in lexicographic order).
    If list_ramps is defined to be list_ramps_a, then this function is correct.
    If list_ramps is defined to be one of the other functions, then this function
    will not always give the right answer.
    """
    L = list_ramps(m, n-m)
    return [spread(r) for r in L]

We now set up some unit tests to check that our function list_subsets(m, n) works correctly. Specifically, we will check the following properties:

  • The return value L = list_subsets(m ,n) should be a list of lists, say L = [s[0],...,s[N-1]].
  • Each of the individual lists s[i] should itself be a list of length m.
  • The entries in each list s[i]=[a[0],...,a[m-1]] should be integers with 0 ≤ a[j] < n, and a[j] < a[j+1] if j < m-1.
  • The list L itself should be sorted in lexicographic/"alphabetical" order.
  • The length N of the list L should be equal to the binomial coefficient $\binom{n}{m}$.

To do this, we define a class SubsetTest, which is a subclass of the class unittest.TestCase documented here. This has some methods whose names start with test_ and some others whose names start with check_. The ones that start with check_ each check one of the expected properties for a particular value of $n$ and $m$. The test_ methods run the check_ methods for all $n$ and $m$ in $\{1,2,3,4,5\}$.

Later we will execute unittest.main(). This looks for all subclasses of unittest.TestCase that have been defined, and for all methods of such classes whose names start with test_. It executes all those methods and reports any problems that arise. Specifically, if self.assertTrue(t) is ever called in a context where t is not true then a problem will be reported, and if self.assertEqual(a, b) is ever called in a context where $a\neq b$ then a problem will again be reported. (There are also a few other assertion methods that we are not using here.)

Note that all of our test_ methods include the line with self.subTest(m=m, n=n):. We will not explain the full background here, but the effect is to ensure that if the test fails then the relevant values of m and n will be reported.

In [ ]:
class SubsetTest(unittest.TestCase):
    def setUp(self):
       self.max_n = 5
       self.max_m = 5

    def check_type(self, m, n):
        """Check that list_subsets(m,n) returns a list of lists, each of length m."""
        L = list_subsets(m,n)
        self.assertTrue(isinstance(L, list))
        self.assertTrue(all([(isinstance(s, list) & (len(s) == m)) for s in L]))

    def check_all_int(self, m, n):
        """Check that list_subsets(m,n) returns a list of lists of integers."""
        L = list_subsets(m,n)
        self.assertTrue(all([all([isinstance(s[i],int) for i in range(m)]) for s in L]))

    def check_all_in_range(self, m, n):
        """Check that list_subsets(m,n) returns a list of lists of integers in [0,n)."""
        L = list_subsets(m,n)
        self.assertTrue(all([all([0 <= s[i] < n for i in range(m)]) for s in L]))

    def check_all_sorted(self, m, n):
        """Check that the lists returned by list_subsets(m,n) are each strictly increasing."""
        L = list_subsets(m,n)
        self.assertTrue(all([all([s[i] < s[i+1] for i in range(m-1)]) for s in L]))

    def check_sorted(self, m, n):
        """
        Check that list_subsets(m,n) is itself strictly increasing wrt lexicographic order.
        Python automatically compares lists lexicographically, so we do not need to write
        any extra code for that.
        """
        L = list_subsets(m,n)
        self.assertTrue(all([L[i] < L[i+1] for i in range(len(L)-1)]))

    def check_length(self, m, n):
        """Check that list_subsets(m,n) returns exactly (n choose m) subsets."""
        self.assertEqual(len(list_subsets(m,n)), math.comb(n, m))

    def test_0_type(self):
        for m in range(self.max_m+1):
            for n in range(self.max_n+1):
                # The line 'with self.subTest(m=m, n=n):' ensures that if 
                # self.check_type(n, m) fails, the error message will include
                # the values of n and m that caused the failure.
                with self.subTest(m=m, n=n):
                    self.check_type(m, n)

    def test_1_all_sorted(self):
        for m in range(self.max_m+1):
            for n in range(self.max_n+1):
                with self.subTest(m=m, n=n):
                    self.check_all_sorted(m, n)

    def test_2_all_in_range(self):
        for m in range(self.max_m+1):
            for n in range(self.max_n+1):
                with self.subTest(m=m, n=n):
                    self.check_all_in_range(m, n)

    def test_3_sorted(self):
        for m in range(self.max_m+1):
            for n in range(self.max_n+1):
                with self.subTest(m=m, n=n):
                    self.check_sorted(m, n)

    def test_4_length(self):
        for m in range(self.max_m+1):
            for n in range(self.max_n+1):
                with self.subTest(m=m, n=n):
                    self.check_length(m, n)

We now run all the tests by calling unittest.main(). The argument argv=[''] does not really do anything but is required for technical reasons. The argument verbosity=2 causes some extra information to be printed. The argument exit=False means that if one test fails, the system will still continue and run the other tests instead of just giving up.

If you change the line list_ramps = list_ramps_a to list_ramps_b or list_ramps_c or list_ramps_d then you will see that one or more tests fail. If you read the failure messages carefully and explore with the debugger then you should be able to work out what is wrong with these functions and fix them.

In [13]:
list_ramps = list_ramps_a

unittest.main(argv=[''], verbosity=2, exit=False)
test_0_type (__main__.SubsetTest.test_0_type) ... ok
test_1_all_sorted (__main__.SubsetTest.test_1_all_sorted) ... ok
test_2_all_in_range (__main__.SubsetTest.test_2_all_in_range) ... ok
test_3_sorted (__main__.SubsetTest.test_3_sorted) ... ok
test_4_length (__main__.SubsetTest.test_4_length) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.008s

OK
Out[13]:
<unittest.main.TestProgram at 0x23711470450>
In [ ]: