Periodic table of elements
In this notebook we load a CSV file containing information about chemical elements, and use pandas to analyse the resulting dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
The following line sets df to be an object of class pandas.DataFrame containing data about elements.
data_dir = '../data/chemistry/'
df = pd.read_csv(data_dir + 'elements.csv')
print(type(df))
display(df)
<class 'pandas.core.frame.DataFrame'>
| | Atomic Number | Element | Symbol | Atomic Weight | Period | Group | Phase | Most Stable Crystal | Type | Ionic Radius | ... | Density | Melting Point (K) | Boiling Point (K) | Isotopes | Discoverer | Year of Discovery | Specific Heat Capacity | Electron Configuration | Display Row | Display Column |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Hydrogen | H | 1.007940 | 1 | 1 | gas | NaN | Nonmetal | 0.012 | ... | 0.000090 | 14.175 | 20.28 | 3.0 | Cavendish | 1766.0 | 14.304 | 1s1 | 1 | 1 |
| 1 | 2 | Helium | He | 4.002602 | 1 | 18 | gas | NaN | Noble Gas | NaN | ... | 0.000179 | NaN | 4.22 | 5.0 | Janssen | 1868.0 | 5.193 | 1s2 | 1 | 18 |
| 2 | 3 | Lithium | Li | 6.941000 | 2 | 1 | solid | bcc | Alkali Metal | 0.760 | ... | 0.534000 | 453.850 | 1615.00 | 5.0 | Arfvedson | 1817.0 | 3.582 | [He] 2s1 | 2 | 1 |
| 3 | 4 | Beryllium | Be | 9.012182 | 2 | 2 | solid | hex | Alkaline Earth Metal | 0.350 | ... | 1.850000 | 1560.150 | 2742.00 | 6.0 | Vaulquelin | 1798.0 | 1.825 | [He] 2s2 | 2 | 2 |
| 4 | 5 | Boron | B | 10.811000 | 2 | 13 | solid | rho | Metalloid | 0.230 | ... | 2.340000 | 2573.150 | 4200.00 | 6.0 | Gay-Lussac | 1808.0 | 1.026 | [He] 2s2 2p1 | 2 | 13 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 113 | 114 | Flerovium | Fl | 289.000000 | 7 | 14 | artificial | NaN | Transactinide | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 | 14 |
| 114 | 115 | Moscovium | Mc | 288.000000 | 7 | 15 | artificial | NaN | Transactinide | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 | 15 |
| 115 | 116 | Livermorium | Lv | 292.000000 | 7 | 16 | artificial | NaN | Transactinide | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 | 16 |
| 116 | 117 | Tennessine | Ts | 295.000000 | 7 | 17 | artificial | NaN | Transactinide | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 | 17 |
| 117 | 118 | Oganesson | Og | 294.000000 | 7 | 18 | artificial | NaN | Noble Gas | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 | 18 |
118 rows × 23 columns
We now make a list of all the possible element types. The code df["Type"] gives the column of types, represented as an object of class pandas.Series, which contains many repetitions. The unique() method returns a numpy array of strings in which each type appears only once, and the built-in sorted() function returns these as a Python list in alphabetical order.
sorted(df['Type'].unique())
['Actinide', 'Alkali Metal', 'Alkaline Earth Metal', 'Halogen', 'Lanthanide', 'Metal', 'Metalloid', 'Noble Gas', 'Nonmetal', 'Transactinide', 'Transition Metal']
We can count the number of elements of each type as follows:
def count_elements_by_type(df):
    # Group the rows by type, count each group, and list the largest groups first
    return df.groupby('Type').size().sort_values(ascending=False)
x = count_elements_by_type(df)
x
Type
Transition Metal        29
Lanthanide              15
Actinide                15
Transactinide           14
Metalloid                7
Nonmetal                 7
Metal                    7
Noble Gas                7
Alkali Metal             6
Alkaline Earth Metal     6
Halogen                  5
dtype: int64
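As a cross-check on the groupby approach above, the following minimal sketch counts types in two ways. The small dataframe toy is a hypothetical stand-in for the elements table, not part of the original dataset. The method value_counts() is a standard pandas shortcut that counts and sorts in a single call.

```python
import pandas as pd

# Hypothetical miniature dataset standing in for the elements table
toy = pd.DataFrame({
    'Element': ['Hydrogen', 'Helium', 'Lithium', 'Neon', 'Sodium'],
    'Type': ['Nonmetal', 'Noble Gas', 'Alkali Metal', 'Noble Gas', 'Alkali Metal'],
})

# Approach used in this notebook: group, count, then sort
by_groupby = toy.groupby('Type').size().sort_values(ascending=False)

# Shortcut: value_counts() counts each distinct value, largest count first
by_value_counts = toy['Type'].value_counts()

print(by_value_counts)
```

Both approaches give the same counts. The order of types with equal counts may differ between them.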
The cell above defines an object x. Here are several ways to explore what kind of thing x is.
print(str(type(x)))
print(type(x).__module__)
print(type(x).__qualname__)
print(isinstance(x,pd.core.series.Series))
print(type(x.index))
print(list(x))
<class 'pandas.core.series.Series'>
pandas.core.series
Series
True
<class 'pandas.core.indexes.base.Index'>
[29, 15, 15, 14, 7, 7, 7, 7, 6, 6, 5]
We next extract the names and symbols of the elements that are gases.
def gasses_only(df):
    # Keep only the rows whose phase is 'gas', then keep two columns and renumber the rows
    return df[df['Phase'] == 'gas'][['Element','Symbol']].reset_index(drop=True)
gasses_only(df)
| | Element | Symbol |
|---|---|---|
| 0 | Hydrogen | H |
| 1 | Helium | He |
| 2 | Nitrogen | N |
| 3 | Oxygen | O |
| 4 | Fluorine | F |
| 5 | Neon | Ne |
| 6 | Chlorine | Cl |
| 7 | Argon | Ar |
| 8 | Krypton | Kr |
| 9 | Xenon | Xe |
| 10 | Radon | Rn |
Here is an example: the highly unstable artificial element Darmstadtium, for which many properties have not been measured experimentally. Missing data in pandas is usually represented by the floating-point value NaN (short for "not a number"). We can use the dropna() method to remove entries for which the information is missing.
df[df['Element'] == 'Darmstadtium'][['Atomic Number','Element','Symbol']]
| | Atomic Number | Element | Symbol |
|---|---|---|---|
| 109 | 110 | Darmstadtium | Ds |
print("Full row:")
display(df.loc[109])
print("")
print("With missing information suppressed:")
display(df.iloc[109].dropna())
Full row:
Atomic Number               110
Element                     Darmstadtium
Symbol                      Ds
Atomic Weight               271.0
Period                      7
Group                       10
Phase                       artificial
Most Stable Crystal         NaN
Type                        Transactinide
Ionic Radius                NaN
Atomic Radius               NaN
Electronegativity           NaN
First Ionization Potential  NaN
Density                     NaN
Melting Point (K)           NaN
Boiling Point (K)           NaN
Isotopes                    NaN
Discoverer                  NaN
Year of Discovery           NaN
Specific Heat Capacity      NaN
Electron Configuration      NaN
Display Row                 7
Display Column              10
Name: 109, dtype: object
With missing information suppressed:
Atomic Number   110
Element         Darmstadtium
Symbol          Ds
Atomic Weight   271.0
Period          7
Group           10
Phase           artificial
Type            Transactinide
Display Row     7
Display Column  10
Name: 109, dtype: object
def Ds_stripped(df):
    # Select the row for Darmstadtium by its symbol and drop the missing entries
    return df[df['Symbol'] == 'Ds'].iloc[0].dropna()
Ds_stripped(df)
Atomic Number   110
Element         Darmstadtium
Symbol          Ds
Atomic Weight   271.0
Period          7
Group           10
Phase           artificial
Type            Transactinide
Display Row     7
Display Column  10
Name: 109, dtype: object
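To see how missing values behave in isolation, here is a minimal sketch using a hypothetical four-entry record (the variable row is invented for illustration and is not read from the CSV file). It uses isna() to locate the missing entries and dropna() to remove them.

```python
import numpy as np
import pandas as pd

# Hypothetical record with two unmeasured properties
row = pd.Series({
    'Symbol': 'Ds',
    'Atomic Number': 110,
    'Density': np.nan,
    'Isotopes': np.nan,
})

# isna() marks each missing entry with True
missing = row.isna()
n_missing = int(missing.sum())

# dropna() keeps only the entries that are present
stripped = row.dropna()

print(n_missing)
print(list(stripped.index))

# NaN is not equal to itself, so test for it with isna() rather than ==
print(np.nan == np.nan)
```

The last line prints False, which is why a comparison such as row == np.nan cannot be used to find missing entries.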
We now want to extract information about the element with the highest boiling point. If we just wanted to find the boiling point, we could use df["Boiling Point (K)"].max(), but that would not tell us which element has that boiling point. If we instead use df["Boiling Point (K)"].argmax(), we get the position of the row with the highest boiling point, which is 74. To get all the data about the corresponding element, we can enter df.iloc[74] or df.iloc[df["Boiling Point (K)"].argmax()]. Note that df[74] would not work, because indexing a dataframe with plain square brackets selects columns rather than rows.
print("The number of the row with the highest boiling point is " + str(df["Boiling Point (K)"].argmax()))
df.iloc[df["Boiling Point (K)"].argmax()]
The number of the row with the highest boiling point is 74
Atomic Number               75
Element                     Rhenium
Symbol                      Re
Atomic Weight               186.207
Period                      6
Group                       7
Phase                       solid
Most Stable Crystal         hex
Type                        Transition Metal
Ionic Radius                0.56
Atomic Radius               2.0
Electronegativity           1.9
First Ionization Potential  7.8335
Density                     21.02
Melting Point (K)           3453.15
Boiling Point (K)           5869.0
Isotopes                    21.0
Discoverer                  Noddack, Berg, and Tacke
Year of Discovery           1925.0
Specific Heat Capacity      0.137
Electron Configuration      [Xe] 4f14 5d5 6s2
Display Row                 6
Display Column              7
Name: 74, dtype: object
def highest_bp_element(df):
    # argmax() gives the position of the maximum, so we look the row up with iloc
    return df.iloc[df["Boiling Point (K)"].argmax()]["Element"]
highest_bp_element(df)
'Rhenium'
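The distinction between positions and index labels matters whenever the index is not simply 0, 1, 2, and so on. The sketch below illustrates this with a hypothetical three-row dataframe whose index labels 10, 20, 30 are invented for illustration. The boiling points are those shown for the first three elements in the table at the top of this notebook. The method argmax() returns a position (for use with iloc), whereas idxmax() returns an index label (for use with loc).

```python
import pandas as pd

# Hypothetical dataframe with a non-default index (labels 10, 20, 30)
toy = pd.DataFrame(
    {
        'Element': ['Hydrogen', 'Helium', 'Lithium'],
        'Boiling Point (K)': [20.28, 4.22, 1615.00],
    },
    index=[10, 20, 30],
)

col = toy['Boiling Point (K)']

pos = col.argmax()    # position of the maximum, counting from 0
label = col.idxmax()  # index label of the maximum

print(pos, label)
print(toy.iloc[pos]['Element'])     # look up by position
print(toy.loc[label, 'Element'])    # look up by label
```

In the main dataframe df the index labels coincide with the positions, so both methods return 74 there.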
Later we will make a picture of the periodic table, with different types of elements displayed in different colours. Here we make a small dataframe which records the colours to be used for the different element types.
type_colours = pd.DataFrame([
['Alkali Metal','purple'],
['Alkaline Earth Metal','blue'],
['Halogen','cyan'],
['Transition Metal','green'],
['Metal','yellow'],
['Metalloid','orange'],
['Nonmetal','red'],
['Noble Gas','grey'],
['Actinide','brown'],
['Transactinide','purple'],
['Lanthanide','pink']
],columns=['Type','Colour'])
display(type_colours)
| | Type | Colour |
|---|---|---|
| 0 | Alkali Metal | purple |
| 1 | Alkaline Earth Metal | blue |
| 2 | Halogen | cyan |
| 3 | Transition Metal | green |
| 4 | Metal | yellow |
| 5 | Metalloid | orange |
| 6 | Nonmetal | red |
| 7 | Noble Gas | grey |
| 8 | Actinide | brown |
| 9 | Transactinide | purple |
| 10 | Lanthanide | pink |
We now use the method df.merge() to add colour information to the original dataframe. The argument on='Type' indicates that the Type column in the original dataframe should be matched with the Type column in the type_colours dataframe. The argument how='left' indicates that we should keep all the rows in the original dataframe, even if their types do not appear in the type_colours dataframe (in which case the colour will be set to NaN).
df = df.merge(type_colours, how='left', on='Type')
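The effect of how='left' is easiest to see when one type has no matching colour. The following minimal sketch uses two hypothetical dataframes, elements and colours, invented for illustration. The type 'Alkali Metal' is deliberately left out of colours.

```python
import pandas as pd

# Hypothetical miniature versions of the two dataframes
elements = pd.DataFrame({
    'Symbol': ['H', 'He', 'Li'],
    'Type': ['Nonmetal', 'Noble Gas', 'Alkali Metal'],
})
colours = pd.DataFrame({
    'Type': ['Nonmetal', 'Noble Gas'],   # 'Alkali Metal' deliberately omitted
    'Colour': ['red', 'grey'],
})

# A left merge keeps every row of elements, matched or not
merged = elements.merge(colours, how='left', on='Type')

# Rows with no matching type have a missing Colour
unmatched = merged[merged['Colour'].isna()]

print(merged)
print(unmatched['Symbol'].tolist())
```

Note that the cell above overwrites df, so running it a second time merges again into a dataframe that already has a Colour column. In that case pandas keeps both copies and renames them Colour_x and Colour_y.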
We now want to extract the names of all elements that end with "ium".
- The code df["Element"] gives the list of all element names, represented as an object of class pandas.Series.
- The code lambda x: x[-3:] == "ium" defines a function which accepts a string x and returns True if the string ends with "ium" and False otherwise.
- The code df["Element"].apply(lambda x: x[-3:] == "ium") gives a list of True or False values, represented as an object of class pandas.Series. The $k$th entry is True if and only if the name of the $k$th element ends with "ium".
- The code df[df["Element"].apply(lambda x: x[-3:] == "ium")] is an example of boolean indexing. It returns a new dataframe consisting of those rows from the original dataframe where the element name ends with "ium".
- Finally, the code df[df["Element"].apply(lambda x: x[-3:] == "ium")]["Element"] just gives the column of element names in this reduced dataframe.
df[df["Element"].apply(lambda x: x[-3:] == "ium")]["Element"]
1 Helium
2 Lithium
3 Beryllium
10 Sodium
11 Magnesium
...
111 Copernicium
112 Nihonium
113 Flerovium
114 Moscovium
115 Livermorium
Name: Element, Length: 78, dtype: object
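The same boolean mask can also be built without a lambda, using the string methods that pandas provides under the .str accessor. The sketch below compares the two approaches on a hypothetical four-name series (the variable names is invented for illustration).

```python
import pandas as pd

# Hypothetical short list of element names
names = pd.Series(['Helium', 'Neon', 'Lithium', 'Iron'], name='Element')

# Approach used in this notebook: apply a function to every entry
mask_apply = names.apply(lambda x: x[-3:] == 'ium')

# Alternative: the vectorised string method endswith()
mask_str = names.str.endswith('ium')

# Boolean indexing keeps the entries where the mask is True
print(names[mask_str].tolist())
```

Both masks select the same entries. The .str version is often easier to read, and it also copes with missing names, for which the lambda would raise an error.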
We now make a picture of the periodic table. The coordinates where each symbol should be displayed are stored in the columns df["Display Row"] and df["Display Column"]. Because we want row 1 to be at the top with row 2 beneath it, we take the $y$-coordinate to be the negative of the display row number.
To loop through the elements, we use the method df.iterrows(). (If you want to understand in more detail how this works, you should search for information about Python iterators and iterables.) In the body of the loop, i will be the index label of the row (here the same as the row number, and not used) and x will be an object of class pandas.Series containing information about a single element. The code x[["Display Column","Display Row","Symbol","Colour"]] also gives an object of class pandas.Series, but the code c, r, s, h = x[["Display Column","Display Row","Symbol","Colour"]] unpacks it, setting c to be the display column number, r to be the display row number, s to be the chemical symbol of the relevant element, and h to be the colour associated with the element type.
fig, ax = plt.subplots(figsize=(12, 5))
ax.axis('off')
ax.set_aspect('equal')
ax.plot([0, 19, 19, 0, 0], [0, 0, -10, -10, 0], 'k-')
for i, x in df.iterrows():
    # Unpack the four entries needed to place and colour this symbol
    c, r, s, h = x[["Display Column","Display Row","Symbol","Colour"]]
    ax.text(c, -r, s,ha='center', va='center', fontsize=10, color=h)
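The unpacking step in the loop can be tried out without any plotting. This minimal sketch runs the same loop over a hypothetical two-row dataframe toy and collects the arguments that would be passed to ax.text(). Both toy and placements are names invented for illustration.

```python
import pandas as pd

# Hypothetical two-element dataframe with the columns used by the plotting loop
toy = pd.DataFrame({
    'Symbol': ['H', 'He'],
    'Display Row': [1, 1],
    'Display Column': [1, 18],
    'Colour': ['red', 'grey'],
})

placements = []
for i, x in toy.iterrows():
    # x is a Series, and selecting four labels gives a four-entry Series to unpack
    c, r, s, h = x[['Display Column', 'Display Row', 'Symbol', 'Colour']]
    # Record the x-coordinate, the negated row as y-coordinate, the text and the colour
    placements.append((c, -r, s, h))

print(placements)
```

The order of the labels in the selection determines the order of the unpacked values, so it must match the order of the names on the left-hand side.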