String formatting

This notebook covers roughly the same ground as this video.

Often we need to create or print a string, in which some parts of the string depend on various Python variables. There are several different methods for this, but we will always use the newest one, based on f-strings. The idea is that if we write a string with an f before the initial quotation mark, then anything in curly brackets will be replaced by the corresponding value. A basic example is as follows:

In [ ]:

ans = 42
print(f'The answer to the ultimate question of life, the universe and everything is {ans}.')

We can also include various directives inside the curly brackets, after a colon, to control how things are displayed. For example:

The directive :.6f specifies that a floating point number should be given to six decimal places.
The directive :010d specifies that an integer should be printed with extra zeros on the left to ensure that there are ten digits altogether.
The directive :.2e specifies that a number should be printed in scientific notation with two decimal places, e.g. 1.23e+08 for $123456789\approx 1.23\times 10^8$.

The full details can be found at docs.python.org, but this is a good example of a situation where it is often more efficient to ask Google Gemini rather than digesting the documentation.

The last example below illustrates the fact that we can have an expression like 2 ** 100 inside the curly brackets, not just a variable name.

In [ ]:

x = 1/7
print(f'The value of x is {x:.6f} to six decimal places') # The ':.6f' specifies 6 decimal places
serial_number = 87639
print(f'The ten-digit serial number is {serial_number:010d}')
print(f'2 ^ 100 is approximately {2**100:.2e} (in scientific notation)')
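A few other directives come up often enough to be worth seeing once. The sketch below is only illustrative: the variable names and values are made up for this example and are not part of the original notebook. It shows a thousands separator, a percentage, and padding to a fixed width.

```python
# Illustrative values only; these names are assumptions, not from the notebook.
population = 67596281
fraction = 0.4567
name = 'pi'

with_commas = f'{population:,}'    # insert commas as thousands separators
as_percent = f'{fraction:.1%}'     # multiply by 100, keep one decimal place, append %
right_aligned = f'{name:>6}'       # pad with spaces on the left to width 6
left_aligned = f'{name:<6}|'       # pad with spaces on the right to width 6

print(with_commas)
print(as_percent)
print(right_aligned)
print(left_aligned)
```

These can be combined with the directives above, for example :>12.3f for a number with three decimal places, right-aligned in a field of width 12.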

The example below illustrates an older method that is still needed occasionally. The functions repr() and str() can be used to convert any Python object to a string, and then we can join these strings to other strings using +. These two functions will often give the same result, but not always. In cases where they differ, the idea is roughly that the result of str() is intended to be read by humans, whereas the result of repr() is intended to be understood by Python itself.

In [ ]:

l = [9,8,7,6]
print('The list ' + repr(l) + ' contains ' + str(len(l)) + ' elements')
print(f'The list {l} still contains {len(l)} elements')
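For a list of integers, str() and repr() give the same result, so the cell above does not show the difference between them. A string is the simplest case where they differ. The following is a minimal sketch (the variable s is just an example value); it also uses the !r conversion, which asks an f-string to use repr() instead of str().

```python
s = 'hello'            # example value, chosen only for illustration

print(str(s))          # the text itself, as a human would read it
print(repr(s))         # the text with quotation marks, as Python would write it
print(f'{s} vs {s!r}') # inside an f-string, !r applies repr()
```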

There is a standard correspondence between numbers and characters called Unicode. (If you know enough to complain that I am not distinguishing properly between Unicode and UTF-8, then you know enough to resolve that issue for yourself.) The first 128 characters of Unicode agree with a much older system called ASCII. The main part of ASCII consists of characters 32 to 126, which correspond to lower and upper case letters A-Z, digits 0-9 and various punctuation marks. Much higher numbers are used to encode letters with accents, Chinese characters, mathematical symbols, emojis, runes and so on. The function ord() converts characters to numbers.

In [ ]:

print(ord('A'), ord('B'), ord('Z')) # ASCII/Unicode codes for A, B, Z
print(ord('a'), ord('b'), ord('z')) # ASCII/Unicode codes for a, b, z
print(ord('?'), ord('~'), ord('&'), ord(' ')) # ASCII/Unicode codes for punctuation

In [ ]:

print([ord(c) for c in '蟒蛇'])    # Unicode for 'python' in Chinese
print([ord(c) for c in 'mãng xà']) # Unicode for 'python' in Vietnamese
print([ord(c) for c in 'بيثون'])   # Unicode for 'python' in Arabic
print([ord(c) for c in '🐍'])      # Unicode for 'python' in Emoji
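For readers who are curious about the difference between Unicode and UTF-8, here is a rough sketch. The number returned by ord() is the Unicode code point, whereas UTF-8 is one particular way of storing that code point as a sequence of bytes, which we can see using the string method encode(). The helper function describe() is a name invented for this example.

```python
def describe(c):
    """Return the Unicode code point of c and the list of its UTF-8 bytes."""
    return ord(c), list(c.encode('utf-8'))

# '\u00e9' is e with an acute accent; '\U0001F40D' is the snake emoji.
for c in ['A', '\u00e9', '\U0001F40D']:
    code_point, utf8_bytes = describe(c)
    print(c, code_point, utf8_bytes)
```

ASCII characters need only one byte, whereas characters with larger code points need two, three or four bytes.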

The function chr() converts numbers to characters; it is the inverse of the function ord(). Lowercase 'a' is character 97, so the codes for all lowercase letters are $97+i$ for $0\leq i\lt 26$, corresponding to the list comprehension [chr(97+i) for i in range(26)].

In [ ]:

print(chr(65), chr(66), chr(67), chr(68)) # chr() is the inverse of ord()
print(''.join([chr(97+i) for i in range(26)])) # full alphabet
print(' '.join([chr(i) for i in range(8704, 8900, 12)])) # Miscellaneous mathematical symbols
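As a quick check that chr() and ord() really are inverse to each other, we can convert a string to a list of numbers and back again. This is only a sketch with an arbitrary example string. The last two lines build the uppercase alphabet in the same way as the lowercase one, using the fact that 'A' is character 65.

```python
text = 'Python \U0001F40D'                   # arbitrary example string ending with the snake emoji
codes = [ord(c) for c in text]               # characters -> numbers
rebuilt = ''.join(chr(n) for n in codes)     # numbers -> characters

print(codes)
print(rebuilt == text)                       # chr() undoes ord()

upper = ''.join(chr(65 + i) for i in range(26))
print(upper)
```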
