MAS2008 Scientific Computing: Lab 6
The Python ecosystem

Because there is an assignment due this week, there is only one question (corresponding to Task 1) on the online test.

Instructions

The following notebooks and videos are relevant for this lab:
       
Map of China
List of links
List notebooks
Read a Microsoft Word file

For the map of China notebook, you will also need the file china_provinces.json. For the Microsoft Word notebook, you will need a sample Word file. Here are two small files that you can use for testing: mouse.docx and abc.docx. In both cases, you will need to adjust the notebook slightly to reflect where you have saved the files.

Task 1: Spreadsheets

Write a function create_spreadsheet(filename, strings). As an example, calling

    create_spreadsheet(
        'cheese.xlsx',
        ['Cheddar', 'Wensleydale', 'Stilton']
    )
   
should create and save a spreadsheet file called cheese.xlsx looking like this:

         A              B
    1    String         Length
    2    Cheddar        7
    3    Wensleydale    11
    4    Stilton        7
    5    Total          25
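
As a starting point, the cell contents for the cheese example can be computed before any spreadsheet library is involved. The sketch below is only an illustration: the helper name spreadsheet_rows is hypothetical and not part of the task statement, and writing the rows to an .xlsx file is left to whichever spreadsheet library you choose.

```python
def spreadsheet_rows(strings):
    """Return the table rows: a header, one row per string, then the total."""
    rows = [["String", "Length"]]
    for s in strings:
        # Column A holds the string, column B its length.
        rows.append([s, len(s)])
    # The final row sums the lengths in column B.
    rows.append(["Total", sum(len(s) for s in strings)])
    return rows


for row in spreadsheet_rows(["Cheddar", "Wensleydale", "Stilton"]):
    print(row)
```

Each inner list corresponds to one spreadsheet row, so the remaining work is to write these values into columns A and B and save the file.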

Task 2: Image manipulation

Download and save the image python_facing_left.jpg. Create a mirror image and save it as python_facing_right.jpg. Display both images in your Jupyter notebook.

You will need to find a suitable Python library to perform this task. Options include Pillow and skimage and possibly others. You could read the documentation or try asking Google Gemini for instructions. I found that I had to rephrase the request a few times before I got a useful answer.
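
To make the idea of a mirror image concrete, the sketch below first flips a tiny grid of pixel values by reversing each row, then shows one possible way to do the same to a real file. It assumes you chose Pillow (imported as PIL), which must be installed separately; the function names mirror_rows and mirror_image_file are hypothetical, and other libraries would work just as well.

```python
def mirror_rows(pixels):
    """Flip a grid of pixel values left to right by reversing every row."""
    return [list(reversed(row)) for row in pixels]


def mirror_image_file(src_path, dst_path):
    """Save a left-right mirrored copy of an image (assumes Pillow is installed)."""
    # Imported here so that the rest of the sketch runs without Pillow.
    from PIL import Image, ImageOps

    with Image.open(src_path) as img:
        ImageOps.mirror(img).save(dst_path)


# A 2 x 3 "image": each number stands for one pixel.
tiny = [[1, 2, 3],
        [4, 5, 6]]
print(mirror_rows(tiny))

# With the downloaded file in the current directory, one could then call:
# mirror_image_file('python_facing_left.jpg', 'python_facing_right.jpg')
```

Displaying the two images in the notebook is a separate step and is not covered by this sketch.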

Task 3: Butterflies

At https://huggingface.co/datasets/ceyda/smithsonian_butterflies you will find information about a dataset of about 9000 images of butterflies. You will see a box with vertical and horizontal scroll bars; you will need to scroll all the way to the right to see the actual images. (Hugging Face also hosts a very large collection of other datasets of all kinds.)

Now visit the following URL:

https://datasets-server.huggingface.co/rows?dataset=ceyda/smithsonian_butterflies&config=default&split=train&offset=0&length=5

This will show you information about the first 5 images in the dataset, in JSON format, which is a standard and convenient format for data that is intended to be parsed by software rather than read by humans. Note that this data contains URLs for the individual images, but does not contain the images themselves.
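
The URL above is just a base address followed by a list of query parameters, so it can be assembled in Python rather than typed by hand. The following is a minimal sketch using only the standard library; the function name rows_url is hypothetical, and the reading of offset and length as "where to start" and "how many rows" is an assumption based on the parameter names.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co/rows"


def rows_url(offset=0, length=5):
    """Build the query URL for a block of rows of the butterflies dataset."""
    query = {
        "dataset": "ceyda/smithsonian_butterflies",
        "config": "default",
        "split": "train",
        "offset": offset,
        "length": length,
    }
    # safe="/" keeps the slash in the dataset name unescaped, as in the URL above.
    return BASE_URL + "?" + urlencode(query, safe="/")


print(rows_url())                      # the URL given above
print(rows_url(offset=5, length=5))    # presumably the next five rows
```

Changing length to 100 would give the corresponding URL for the first 100 rows.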

We can import the first 100 rows into Python and analyse them as follows.