Periodic table of elements

In this notebook we load a CSV file containing information about chemical elements, and use pandas to analyse the resulting dataset.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

The following cell reads the CSV file and sets df to be an object of class pandas.DataFrame containing data about elements.

In [2]:
data_dir = '../data/chemistry/'
df = pd.read_csv(data_dir + 'elements.csv')
print(type(df))
display(df)
<class 'pandas.core.frame.DataFrame'>
Atomic Number Element Symbol Atomic Weight Period Group Phase Most Stable Crystal Type Ionic Radius ... Density Melting Point (K) Boiling Point (K) Isotopes Discoverer Year of Discovery Specific Heat Capacity Electron Configuration Display Row Display Column
0 1 Hydrogen H 1.007940 1 1 gas NaN Nonmetal 0.012 ... 0.000090 14.175 20.28 3.0 Cavendish 1766.0 14.304 1s1 1 1
1 2 Helium He 4.002602 1 18 gas NaN Noble Gas NaN ... 0.000179 NaN 4.22 5.0 Janssen 1868.0 5.193 1s2 1 18
2 3 Lithium Li 6.941000 2 1 solid bcc Alkali Metal 0.760 ... 0.534000 453.850 1615.00 5.0 Arfvedson 1817.0 3.582 [He] 2s1 2 1
3 4 Beryllium Be 9.012182 2 2 solid hex Alkaline Earth Metal 0.350 ... 1.850000 1560.150 2742.00 6.0 Vaulquelin 1798.0 1.825 [He] 2s2 2 2
4 5 Boron B 10.811000 2 13 solid rho Metalloid 0.230 ... 2.340000 2573.150 4200.00 6.0 Gay-Lussac 1808.0 1.026 [He] 2s2 2p1 2 13
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
113 114 Flerovium Fl 289.000000 7 14 artificial NaN Transactinide NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 7 14
114 115 Moscovium Mc 288.000000 7 15 artificial NaN Transactinide NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 7 15
115 116 Livermorium Lv 292.000000 7 16 artificial NaN Transactinide NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 7 16
116 117 Tennessine Ts 295.000000 7 17 artificial NaN Transactinide NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 7 17
117 118 Oganesson Og 294.000000 7 18 artificial NaN Noble Gas NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 7 18

118 rows × 23 columns

We now make a list of all the possible element types. The code df["Type"] gives the column of types, represented as an object of class pandas.Series. This contains many repetitions. The unique() method returns a numpy array of strings in which each type appears only once, and the built-in sorted() function returns a Python list of those types in alphabetical order.

In [3]:
sorted(df['Type'].unique())
Out[3]:
['Actinide',
 'Alkali Metal',
 'Alkaline Earth Metal',
 'Halogen',
 'Lanthanide',
 'Metal',
 'Metalloid',
 'Noble Gas',
 'Nonmetal',
 'Transactinide',
 'Transition Metal']
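
The same three steps can be sketched on a toy dataframe. The name toy and its contents are made up for illustration and are not taken from elements.csv.

```python
import pandas as pd

# Hypothetical toy data, not taken from elements.csv
toy = pd.DataFrame({'Type': ['Metal', 'Nonmetal', 'Metal', 'Halogen']})

types = toy['Type']          # a pandas.Series, with 'Metal' repeated
distinct = types.unique()    # each type once, in order of first appearance
ordered = sorted(distinct)   # a plain Python list, in alphabetical order

print(ordered)
```

Note that unique() does not sort: it keeps the order in which values first appear, which is why the call to sorted() is needed.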

We can count the number of elements of each type as follows:

In [4]:
def count_elements_by_type(df):
    return df.groupby('Type').size().sort_values(ascending=False)

x = count_elements_by_type(df)
x
Out[4]:
Type
Transition Metal        29
Lanthanide              15
Actinide                15
Transactinide           14
Metalloid                7
Nonmetal                 7
Metal                    7
Noble Gas                7
Alkali Metal             6
Alkaline Earth Metal     6
Halogen                  5
dtype: int64
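
As a hedged aside, the value_counts() method of a Series should give the same counts in a single call, already sorted from most to least frequent. The sketch below compares the two approaches on a made-up toy dataframe (not the elements data).

```python
import pandas as pd

# Hypothetical toy data: three metals, two nonmetals, one halogen
toy = pd.DataFrame(
    {'Type': ['Metal', 'Nonmetal', 'Metal', 'Halogen', 'Metal', 'Nonmetal']}
)

# The approach used in the cell above
by_groupby = toy.groupby('Type').size().sort_values(ascending=False)

# An alternative: value_counts() counts and sorts in one step
by_value_counts = toy['Type'].value_counts()

print(by_groupby.to_dict())
print(by_value_counts.to_dict())
```

When several types have the same count, the two methods may list the tied types in a different order.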

The cell above defines an object x. Here are several ways to explore what kind of thing x is.

In [5]:
print(str(type(x)))
print(type(x).__module__)
print(type(x).__qualname__)
print(isinstance(x,pd.core.series.Series))
print(type(x.index))
print(list(x))
<class 'pandas.core.series.Series'>
pandas.core.series
Series
True
<class 'pandas.core.indexes.base.Index'>
[29, 15, 15, 14, 7, 7, 7, 7, 6, 6, 5]

We next extract a subset of the information (just the name and symbol) for the elements which are gases.

In [6]:
def gasses_only(df):
    return df[df['Phase'] == 'gas'][['Element','Symbol']].reset_index(drop=True)

gasses_only(df)
Out[6]:
Element Symbol
0 Hydrogen H
1 Helium He
2 Nitrogen N
3 Oxygen O
4 Fluorine F
5 Neon Ne
6 Chlorine Cl
7 Argon Ar
8 Krypton Kr
9 Xenon Xe
10 Radon Rn

As an example, consider the highly unstable artificial element Darmstadtium, for which many properties have not been measured experimentally. Missing data in pandas is usually represented by the floating-point value NaN (short for "not a number"). We can use the dropna() method to remove entries for which the information is missing.

In [5]:
df[df['Element'] == 'Darmstadtium'][['Atomic Number','Element','Symbol']]
Out[5]:
Atomic Number Element Symbol
109 110 Darmstadtium Ds
In [6]:
print("Full row:")
display(df.loc[109])
print("")
print("With missing information suppressed:")
display(df.iloc[109].dropna())
Full row:
Atomic Number                           110
Element                        Darmstadtium
Symbol                                   Ds
Atomic Weight                         271.0
Period                                    7
Group                                    10
Phase                            artificial
Most Stable Crystal                     NaN
Type                          Transactinide
Ionic Radius                            NaN
Atomic Radius                           NaN
Electronegativity                       NaN
First Ionization Potential              NaN
Density                                 NaN
Melting Point (K)                       NaN
Boiling Point (K)                       NaN
Isotopes                                NaN
Discoverer                              NaN
Year of Discovery                       NaN
Specific Heat Capacity                  NaN
Electron Configuration                  NaN
Display Row                               7
Display Column                           10
Name: 109, dtype: object
With missing information suppressed:
Atomic Number               110
Element            Darmstadtium
Symbol                       Ds
Atomic Weight             271.0
Period                        7
Group                        10
Phase                artificial
Type              Transactinide
Display Row                   7
Display Column               10
Name: 109, dtype: object
In [9]:
def Ds_stripped(df):
    return df[df['Symbol'] == 'Ds'].iloc[0].dropna()

Ds_stripped(df)
Out[9]:
Atomic Number               110
Element            Darmstadtium
Symbol                       Ds
Atomic Weight             271.0
Period                        7
Group                        10
Phase                artificial
Type              Transactinide
Display Row                   7
Display Column               10
Name: 109, dtype: object
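
The following sketch illustrates how NaN behaves, using a small made-up Series rather than a real row of the dataframe. Because NaN does not compare equal to anything, including itself, missing entries should be detected with isna() rather than with ==.

```python
import numpy as np
import pandas as pd

# Hypothetical row with one missing entry
row = pd.Series({'Symbol': 'Ds', 'Atomic Weight': 271.0, 'Density': np.nan})

print(np.nan == np.nan)       # False: NaN never compares equal, even to itself
print(row.isna().to_dict())   # which entries are missing
print(row.dropna().to_dict()) # the row with missing entries removed
```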

We now want to extract information about the element with the highest boiling point. If we just wanted to find the boiling point, we could do df["Boiling Point (K)"].max(), but that would not tell us which element has that boiling point. If we instead do df["Boiling Point (K)"].argmax(), we get the position of the row with the highest boiling point, which is 74. To get all the data about the corresponding element, we can enter df.iloc[74] or df.iloc[df["Boiling Point (K)"].argmax()]. Note that we need df.iloc here: plain square brackets, as in df[74], select a column by name rather than a row by position, so they would raise a KeyError.

In [10]:
print("The number of the row with the highest boiling point is " + str(df["Boiling Point (K)"].argmax()))
df.iloc[df["Boiling Point (K)"].argmax()]
The number of the row with the highest boiling point is 74
Out[10]:
Atomic Number                                       75
Element                                        Rhenium
Symbol                                              Re
Atomic Weight                                  186.207
Period                                               6
Group                                                7
Phase                                            solid
Most Stable Crystal                                hex
Type                                  Transition Metal
Ionic Radius                                      0.56
Atomic Radius                                      2.0
Electronegativity                                  1.9
First Ionization Potential                      7.8335
Density                                          21.02
Melting Point (K)                              3453.15
Boiling Point (K)                               5869.0
Isotopes                                          21.0
Discoverer                    Noddack, Berg, and Tacke
Year of Discovery                               1925.0
Specific Heat Capacity                           0.137
Electron Configuration               [Xe] 4f14 5d5 6s2
Display Row                                          6
Display Column                                       7
Name: 74, dtype: object
In [11]:
def highest_bp_element(df):
    return df.iloc[df["Boiling Point (K)"].argmax()]["Element"]

highest_bp_element(df)
Out[11]:
'Rhenium'
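
In the elements dataframe the row labels happen to coincide with the row positions, so the distinction between the two is easy to miss. The sketch below uses a made-up dataframe with labels 10, 20, 30 to show the difference: argmax() returns a position, for use with iloc, whereas the related method idxmax() returns a label, for use with loc. The element names 'A', 'B' and 'C' are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the index labels deliberately differ from the positions
toy = pd.DataFrame(
    {'Element': ['A', 'B', 'C'], 'Boiling Point (K)': [300.0, np.nan, 900.0]},
    index=[10, 20, 30],
)

pos = toy['Boiling Point (K)'].argmax()    # integer position of the maximum
label = toy['Boiling Point (K)'].idxmax()  # index label of the maximum

print(pos, label)
print(toy.iloc[pos]['Element'])    # look up by position
print(toy.loc[label, 'Element'])   # look up by label
```

Both methods skip missing values by default, so the NaN entry does not affect the result.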

Later we will make a picture of the periodic table, with different types of elements displayed in different colours. Here we make a small dataframe which records the colours to be used for the different element types.

In [7]:
type_colours = pd.DataFrame([
 ['Alkali Metal','purple'],
 ['Alkaline Earth Metal','blue'],
 ['Halogen','cyan'],
 ['Transition Metal','green'],
 ['Metal','yellow'],
 ['Metalloid','orange'],
 ['Nonmetal','red'],
 ['Noble Gas','grey'],
 ['Actinide','brown'],
 ['Transactinide','purple'],
 ['Lanthanide','pink']
],columns=['Type','Colour'])

display(type_colours)
Type Colour
0 Alkali Metal purple
1 Alkaline Earth Metal blue
2 Halogen cyan
3 Transition Metal green
4 Metal yellow
5 Metalloid orange
6 Nonmetal red
7 Noble Gas grey
8 Actinide brown
9 Transactinide purple
10 Lanthanide pink

We now use the method df.merge() to add colour information to the original dataframe. The argument on='Type' indicates that the Type column in the original dataframe should be matched with the Type column in the type_colours dataframe. The argument how='left' indicates that we should keep all the rows in the original dataframe, even if their types do not appear in the type_colours dataframe (in which case the colour will be set to NaN).

In [8]:
df = df.merge(type_colours, how='left', on='Type')
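
The effect of how='left' is easiest to see when one of the types has no colour assigned. The sketch below uses two made-up dataframes (the names left and colours, and the type 'Unknown', are assumptions for illustration only) and compares a left merge with an inner merge.

```python
import pandas as pd

# Hypothetical data: the type 'Unknown' has no entry in the colour table
left = pd.DataFrame({
    'Symbol': ['H', 'He', 'X'],
    'Type': ['Nonmetal', 'Noble Gas', 'Unknown'],
})
colours = pd.DataFrame({
    'Type': ['Nonmetal', 'Noble Gas'],
    'Colour': ['red', 'grey'],
})

merged = left.merge(colours, how='left', on='Type')   # keeps all three rows
inner = left.merge(colours, how='inner', on='Type')   # drops the unmatched row

print(merged)
print(inner)
```

In merged the row for 'X' is kept with a missing colour, whereas in inner it disappears altogether.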

We now want to extract the names of all elements that end with "ium".

  • The code df["Element"] gives the list of all element names, represented as an object of class pandas.Series.
  • The code lambda x: x[-3:] == "ium" defines a function which accepts a string x and returns True if the string ends with "ium" and False otherwise.
  • The code df["Element"].apply(lambda x: x[-3:] == "ium") gives a list of True or False values, represented as an object of class pandas.Series. The $k$th entry is True if and only if the name of the $k$th element ends with "ium".
  • The code df[df["Element"].apply(lambda x: x[-3:] == "ium")] is an example of boolean indexing. It returns a new dataframe consisting of those rows from the original dataframe where the element name ends with "ium".
  • Finally, the code df[df["Element"].apply(lambda x: x[-3:] == "ium")]["Element"] just gives the column of element names in this reduced dataframe.
In [9]:
df[df["Element"].apply(lambda x: x[-3:] == "ium")]["Element"]
Out[9]:
1           Helium
2          Lithium
3        Beryllium
10          Sodium
11       Magnesium
          ...     
111    Copernicium
112       Nihonium
113      Flerovium
114      Moscovium
115    Livermorium
Name: Element, Length: 78, dtype: object
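
As a possible alternative to apply() with a lambda, pandas provides vectorised string methods through the .str accessor of a Series. The sketch below, on a made-up Series of four names, checks that str.endswith() gives the same boolean values as the lambda used above.

```python
import pandas as pd

# Hypothetical short list of element names
names = pd.Series(['Helium', 'Carbon', 'Sodium', 'Iron'])

with_apply = names.apply(lambda x: x[-3:] == 'ium')  # the approach used above
with_str = names.str.endswith('ium')                 # vectorised alternative

print(with_apply.tolist())
print(names[with_str].tolist())
```

One practical difference is that the lambda would fail with an error if a name were missing (NaN), whereas the .str methods pass missing values through.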

We now make a picture of the periodic table. The coordinates where each symbol should be displayed are stored in the columns df["Display Row"] and df["Display Column"]. Because we want row 1 to be at the top with row 2 beneath it, we take the $y$-coordinate to be the negative of the display row number.

To loop through the elements, we use the method df.iterrows(). (If you want to understand in more detail how this works, you should search for information about Python iterators and iterables.) In the body of the loop, i will be the index label of the row (which we do not use) and x will be an object of class pandas.Series containing information about a single element. The code x[["Display Column","Display Row","Symbol","Colour"]] also gives an object of class pandas.Series, but the code c, r, s, h = x[["Display Column","Display Row","Symbol","Colour"]] unpacks it, setting c to be the display column number, r to be the display row number, s to be the chemical symbol of the relevant element, and h to be the colour associated with the element type.

In [23]:
fig, ax = plt.subplots(figsize=(12, 5))
ax.axis('off')
ax.set_aspect('equal')
ax.plot([0, 19, 19, 0, 0], [0, 0, -10, -10, 0], 'k-')
for i, x in df.iterrows():
    c, r, s, h = x[["Display Column","Display Row","Symbol","Colour"]]
    ax.text(c, -r, s,ha='center', va='center', fontsize=10, color=h)
(Figure: the periodic table, with each chemical symbol drawn at its display column and negated display row, coloured according to element type.)
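
The loop and the unpacking step can be tried out without any plotting. The following is a minimal sketch on a made-up two-row dataframe; instead of calling ax.text(), it just collects the values that would be passed to it. The list name placements is an assumption for illustration.

```python
import pandas as pd

# Hypothetical two-row version of the elements dataframe
toy = pd.DataFrame({
    'Symbol': ['H', 'He'],
    'Display Row': [1, 1],
    'Display Column': [1, 18],
    'Colour': ['red', 'grey'],
})

placements = []
for i, x in toy.iterrows():
    # x is a Series for one row; selecting four entries and unpacking them
    c, r, s, h = x[['Display Column', 'Display Row', 'Symbol', 'Colour']]
    # Record (x-coordinate, y-coordinate, symbol, colour); y is the negated row
    placements.append((c, -r, s, h))

print(placements)
```

The order of the names on the left of the assignment must match the order of the column names in the list, since unpacking goes by position and not by name.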