Strings

This notebook reviews some Python functions for dealing with strings. It covers roughly the same ground as this video.

Here are two strings which we will use as examples.

In [1]:
question = 'What is the air-speed velocity of an unladen swallow?'
fox = 'The quick brown fox jumps over the lazy dog'

The len() function can be used to find the length of a list, a set, a string or any of various other Python objects.

In [2]:
len(question)
Out[2]:
53
  • The notation question[10] refers to character number 10 in the string question. Here it is important to remember that the count starts at zero, so question[0]='W' and question[1]='h' and so on.
  • Similarly, question[5:7] refers to the substring consisting of the characters numbered 5 and 6. This follows the standard Python convention for ranges: the notation n:m refers to the indices $i$ with $n\leq i\lt m$, so the case $i=n$ is included but the case $i=m$ is excluded.
  • It is also possible to use negative indices. The last character in the string counts as position -1, the character before that counts as position -2 and so on. The notation question[-8:] refers to the substring starting at position -8 with no explicit endpoint, so it runs all the way to the end of the string.
  • The substring question[::10] goes from the beginning to the end with steps of length 10, so we get every tenth character of the question. The length of the steps is called the "stride".
  • The substring question[::-1] goes from the end to the beginning with steps of length 1 backwards, so we get the reversed string.
In [13]:
print(f"{question[10]=}")
print(f"{question[5:7]=}")
print(f"{question[-8:]=}")
print(f"{question[::10]=}")
print(f"{question[::-1]=}")
question[10]='e'
question[5:7]='is'
question[-8:]='swallow?'
question[::10]='Wed ao'
question[::-1]='?wollaws nedalnu na fo yticolev deeps-ria eht si tahW'
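
The half-open convention for ranges has a convenient consequence: the slice n:m always has m-n characters, and cutting a string at any position k gives two pieces that join back to the original. Here is a minimal sketch of this, reusing the question string from above (the names k, head and tail are chosen just for illustration).

```python
question = 'What is the air-speed velocity of an unladen swallow?'

# A slice n:m includes index n but excludes index m, so it has m - n characters.
print(len(question[5:7]) == 7 - 5)  # True

# Cutting at position k gives two pieces with no overlap and no gap.
k = 12
head, tail = question[:k], question[k:]
print(repr(head))               # 'What is the '
print(repr(tail))               # 'air-speed velocity of an unladen swallow?'
print(head + tail == question)  # True
```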

We next want to find the positions of the spaces in question. Firstly, range(len(question)) generates the possible positions $0,1,2,\dotsc,52$. The code below selects the values $i$ for which question[i] is a space, and returns the list of all those values $i$.

In [10]:
print([i for i in range(len(question)) if question[i] == ' ']) # Positions of spaces
[4, 7, 11, 21, 30, 33, 36, 44]

Here is an alternative approach that may be considered more "Pythonic". The function enumerate(question) yields the pairs (0,'W'), (1,'h'), (2,'a'), (3,'t'), (4,' ') and so on, each consisting of a position and the character at that position.

In [12]:
print([i for i,c in enumerate(question) if c == ' ']) # Positions of spaces
[4, 7, 11, 21, 30, 33, 36, 44]
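
To see the pairs that enumerate() produces, we can collect them with list(). The following small sketch does this for just the first five characters of the question (the name pairs is only for illustration).

```python
question = 'What is the air-speed velocity of an unladen swallow?'

# enumerate() produces its pairs one at a time, so we use list() to collect them.
pairs = list(enumerate(question[:5]))
print(pairs)  # [(0, 'W'), (1, 'h'), (2, 'a'), (3, 't'), (4, ' ')]
```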

We can use the keyword in to check whether one string occurs as a substring of another string.

In [ ]:
print('city' in question)          # 'city' appears as part of the word 'velocity'
print('velociraptor' in question)  # the word 'velociraptor' is not in the question
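
The keyword in only tells us whether the substring occurs. If we also want to know where it occurs, one option is the find() method, which returns the starting position of the first occurrence, or -1 if there is none. The sketch below shows both approaches together (the name position is only for illustration).

```python
question = 'What is the air-speed velocity of an unladen swallow?'

print('city' in question)          # True
print('velociraptor' in question)  # False

# find() returns the index where the substring starts, or -1 if it is absent.
position = question.find('city')
print(position)                          # 26
print(question[position:position + 4])   # city
print(question.find('velociraptor'))     # -1
```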

Another common operation is to split a string using a separator character. Below we have a string with the names of seven cities separated by commas (with variable numbers of spaces after the commas). Using the split() method we get a list of seven strings, each of which is the name of one city, possibly with some attached spaces. We use the strip() method to remove the spaces. We can then join the names back together using the method str.join().

In [15]:
cities_string = 'Sheffield,New York,  Paris, Hong Kong, Chicago,Los Angeles, Tokyo'
cities = [city.strip() for city in cities_string.split(',')] # Split on commas
print(cities)
print(str.join(' and ', cities)) # Join with ' and '
print(' and '.join(cities)) # Same thing
['Sheffield', 'New York', 'Paris', 'Hong Kong', 'Chicago', 'Los Angeles', 'Tokyo']
Sheffield and New York and Paris and Hong Kong and Chicago and Los Angeles and Tokyo
Sheffield and New York and Paris and Hong Kong and Chicago and Los Angeles and Tokyo
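
As a related sketch, split() can also be called with no argument, in which case it splits on runs of whitespace and discards any empty pieces, so no strip() is needed. Here we apply this to the fox string defined at the start (the name words is only for illustration).

```python
fox = 'The quick brown fox jumps over the lazy dog'

# With no argument, split() separates the string at runs of whitespace.
words = fox.split()
print(words)
print(len(words))              # 9
print(' '.join(words) == fox)  # True: joining with single spaces recovers fox
```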

There are various functions for converting between lower case and upper case. We can also replace a substring with a different one. All of these functions create a new string and leave the original string unchanged.

In [16]:
request = 'Please can I have a cookie'
print(f'Original:         {request}')
print(f'Lower case:       {request.lower()}')
print(f'Upper case:       {request.upper()}')
print(f'Capitalize words: {request.title()}')
print(f'Improved:         {request.replace("cookie","chocolate cake")}')
print(f'Original:         {request}') # The original string is unchanged
Original:         Please can I have a cookie
Lower case:       please can i have a cookie
Upper case:       PLEASE CAN I HAVE A COOKIE
Capitalize words: Please Can I Have A Cookie
Improved:         Please can I have a chocolate cake
Original:         Please can I have a cookie
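
The reason that these methods return new strings is that Python strings are immutable: they cannot be changed in place. The sketch below shows that assigning to a single character fails, and that to keep a modified version we need to give the new string a name (here polite, chosen just for illustration).

```python
request = 'Please can I have a cookie'

# Strings are immutable, so assigning to a character raises a TypeError.
try:
    request[0] = 'p'
except TypeError as error:
    print(f'TypeError: {error}')

# Methods can be chained; each one returns a new string.
polite = request.replace('cookie', 'chocolate cake').upper()
print(polite)   # PLEASE CAN I HAVE A CHOCOLATE CAKE
print(request)  # Please can I have a cookie
```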