Week 6 (3/7-3/13)


Weekly digest



1. Sample data for concatenating and merging DataFrames

import pandas as pd
import numpy as np

rng = np.random.default_rng(10)

names = ["Ava", "Benjamin", "Charlotte", "Daniel", "Emma", "Fredric", "Gianna"]
courses = ["MTH 141", "MTH 142", "MTH 241", "MTH 306", "MTH 309", "MTH 311"]
rooms = ["NSC 216", "Capen 110", "Park 440"]

# data for concatenating rows (stacking DataFrames vertically)
scores1 = rng.integers(0, 100, 12).reshape(4, 3)
scores2 = rng.integers(0, 100, 9).reshape(3, 3)
columns = ["problem_1", "problem_2", "problem_3"]
sec1 = pd.DataFrame(scores1, index=names[:4], columns=columns)
sec2 = pd.DataFrame(scores2, index=names[4:7], columns=columns)

# data for concatenating columns (placing DataFrames side by side)
scores1 = rng.integers(0, 100, 8).reshape(4, 2)
scores2 = rng.integers(0, 100, 9).reshape(3, 3)
part1 = pd.DataFrame(scores1,
                     columns=["problem_1", "problem_2"])
part2 = pd.DataFrame(scores2,
                     columns=["problem_3", "problem_4", "problem_5"])

# data for merging (joining tables on a common key)
office_nums = rng.integers(100, 150, len(names[:-1]))
courses = pd.DataFrame({"course": courses,
                        "instructor": rng.choice(names[1:], len(courses))})
instructors = pd.DataFrame({"name": names[:-1], "office": office_nums}, dtype="object")
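
A minimal sketch of how the three sample data sets might be combined with `pd.concat` and `pd.merge`. The setup lines are repeated so the snippet runs on its own. The result names `rows`, `cols`, and `merged`, and the choice of a left merge, are illustrative assumptions rather than part of the original material.

```python
import numpy as np
import pandas as pd

# Same setup as above, repeated so this block is self-contained
rng = np.random.default_rng(10)
names = ["Ava", "Benjamin", "Charlotte", "Daniel", "Emma", "Fredric", "Gianna"]
courses = ["MTH 141", "MTH 142", "MTH 241", "MTH 306", "MTH 309", "MTH 311"]

scores1 = rng.integers(0, 100, 12).reshape(4, 3)
scores2 = rng.integers(0, 100, 9).reshape(3, 3)
columns = ["problem_1", "problem_2", "problem_3"]
sec1 = pd.DataFrame(scores1, index=names[:4], columns=columns)
sec2 = pd.DataFrame(scores2, index=names[4:7], columns=columns)

scores1 = rng.integers(0, 100, 8).reshape(4, 2)
scores2 = rng.integers(0, 100, 9).reshape(3, 3)
part1 = pd.DataFrame(scores1, columns=["problem_1", "problem_2"])
part2 = pd.DataFrame(scores2, columns=["problem_3", "problem_4", "problem_5"])

office_nums = rng.integers(100, 150, len(names[:-1]))
courses = pd.DataFrame({"course": courses,
                        "instructor": rng.choice(names[1:], len(courses))})
instructors = pd.DataFrame({"name": names[:-1], "office": office_nums}, dtype="object")

# Stack sec1 and sec2 vertically: same columns, different row labels
rows = pd.concat([sec1, sec2])

# Place part1 and part2 side by side; rows are aligned on their labels,
# so the missing fourth row of part2 shows up as NaN
cols = pd.concat([part1, part2], axis=1)

# Attach each instructor's office to the course table; how="left" keeps
# every course even if its instructor has no row in `instructors`
merged = pd.merge(courses, instructors, how="left",
                  left_on="instructor", right_on="name")

print(rows, cols, merged, sep="\n\n")
```

With the default `how="inner"`, any course assigned to Gianna, who is not listed in `instructors`, would be dropped from the result instead of getting a missing office number.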

2. Plotly installation

%pip install plotly

3. Data for choropleth maps

url = "https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv"


Note: All exercises except Exercise 1 use the data contained in the names.zip file.
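
A possible way to build the DataFrame that Exercises 2 to 6 start from, sketched under the assumption that names.zip holds one file per year named like `yob1880.txt`, each a comma-separated file with no header and fields in the order name, sex, count (check the readme included in the archive). The function name `read_national_names` is hypothetical; the year is taken from the file name because it is not stored inside the files.

```python
import re
import zipfile

import pandas as pd


def read_national_names(path):
    """Combine all yobYYYY.txt files in a zip archive into one DataFrame."""
    frames = []
    with zipfile.ZipFile(path) as zf:
        for member in zf.namelist():
            match = re.search(r"yob(\d{4})\.txt$", member)
            if match is None:
                continue  # skip the readme and any other non-data files
            with zf.open(member) as f:
                # keep_default_na=False so that no name is read as a missing value
                df = pd.read_csv(f, header=None, names=["name", "sex", "count"],
                                 keep_default_na=False)
            df["year"] = int(match.group(1))
            frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

Usage: `babies = read_national_names("names.zip")`, assuming the archive sits in the working directory.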

Exercise 1

Construct a DataFrame with the baby names data from the namesbystate.zip file. The DataFrame columns should be named “state”, “sex”, “year”, “name”, and “count”. You don’t need to save this data to a single CSV file; if you do, the file will be about 130 MB.

Check: The DataFrame should have 6,215,834 rows.
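
One possible approach, sketched under the assumption that each state file in namesbystate.zip (for example `AK.TXT`) is a comma-separated text file with no header row and fields in the order state, sex, year, name, count; check the readme in the archive. The function name `read_state_names` is hypothetical.

```python
import zipfile

import pandas as pd

COLUMNS = ["state", "sex", "year", "name", "count"]


def read_state_names(path):
    """Combine all per-state .TXT files in a zip archive into one DataFrame."""
    frames = []
    with zipfile.ZipFile(path) as zf:
        for member in zf.namelist():
            if not member.upper().endswith(".TXT"):
                continue  # skip the readme and any other non-data files
            with zf.open(member) as f:
                # keep_default_na=False so that no name is read as a missing value
                frames.append(pd.read_csv(f, header=None, names=COLUMNS,
                                          keep_default_na=False))
    return pd.concat(frames, ignore_index=True)
```

Usage: `states = read_state_names("namesbystate.zip")`; `len(states)` can then be compared with the row count given in the check above.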

Exercise 2

Compute a DataFrame that lists the total number of babies recorded each year.

Check: There were 201,484 babies recorded in 1880 and 3,305,259 in 2020.
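
Assuming the names.zip data has been loaded into a DataFrame with columns `name`, `sex`, `count`, and `year`, the yearly totals come from a single `groupby`. The small DataFrame `babies` below is made-up data for illustration only.

```python
import pandas as pd

# Made-up toy data with the same columns as the names.zip DataFrame
babies = pd.DataFrame({
    "name": ["Mary", "Anna", "John", "William", "John",
             "Mary", "Emma", "John", "William"],
    "sex": ["F", "F", "M", "M", "F", "F", "F", "M", "M"],
    "count": [70, 26, 96, 90, 4, 69, 20, 87, 95],
    "year": [1880] * 5 + [1881] * 4,
})

# Sum the counts within each year; to_frame turns the Series into a DataFrame
totals = babies.groupby("year")["count"].sum().to_frame("total")
print(totals)
```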

Exercise 3

Compute a DataFrame that lists the number of male babies named “John” for each year.

Check: There were 9,655 such babies recorded in 1880 and 8,180 in 2020.
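
A sketch of one approach on made-up toy data (the `babies` DataFrame is hypothetical): filter on both the name and the sex, then aggregate by year. The toy data includes a female “John” row, which the sex filter excludes.

```python
import pandas as pd

# Made-up toy data with the same columns as the names.zip DataFrame
babies = pd.DataFrame({
    "name": ["Mary", "Anna", "John", "William", "John",
             "Mary", "Emma", "John", "William"],
    "sex": ["F", "F", "M", "M", "F", "F", "F", "M", "M"],
    "count": [70, 26, 96, 90, 4, 69, 20, 87, 95],
    "year": [1880] * 5 + [1881] * 4,
})

# Keep only male Johns, then sum their counts per year
is_john = (babies["name"] == "John") & (babies["sex"] == "M")
johns = babies[is_john].groupby("year")["count"].sum().to_frame("count")
print(johns)
```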

Exercise 4

Compute a DataFrame that lists how many different names were used each year for males and how many for females.

Check: In 1880 there were 942 different names used for females and 1,058 for males. In 2020 these numbers were 17,360 for females and 13,911 for males.
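
One way to count distinct names, shown on hypothetical toy data: group by year and sex, count unique names with `nunique`, and move the sex level into columns with `unstack`.

```python
import pandas as pd

# Made-up toy data with the same columns as the names.zip DataFrame
babies = pd.DataFrame({
    "name": ["Mary", "Anna", "John", "William", "John",
             "Mary", "Emma", "John", "William"],
    "sex": ["F", "F", "M", "M", "F", "F", "F", "M", "M"],
    "count": [70, 26, 96, 90, 4, 69, 20, 87, 95],
    "year": [1880] * 5 + [1881] * 4,
})

# One row per year, one column per sex ("F" and "M")
distinct = babies.groupby(["year", "sex"])["name"].nunique().unstack("sex")
print(distinct)
```

A name used for both sexes, such as “John” in the toy data, is counted once among the female names and once among the male names.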

Exercise 5

Compute a DataFrame that shows, for each name, the year in which the name first appeared in the records.

Check: Here are the first recorded years for a few names: Aaban 2007, Aabha 2011, Aabid 2003, Aabidah 2018.
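
A sketch on made-up toy data: the first recorded year of a name is the minimum of `year` over all rows with that name, regardless of sex.

```python
import pandas as pd

# Made-up toy data with the same columns as the names.zip DataFrame
babies = pd.DataFrame({
    "name": ["Mary", "Anna", "John", "William", "John",
             "Mary", "Emma", "John", "William"],
    "sex": ["F", "F", "M", "M", "F", "F", "F", "M", "M"],
    "count": [70, 26, 96, 90, 4, 69, 20, 87, 95],
    "year": [1880] * 5 + [1881] * 4,
})

# Earliest year per name; the result is indexed by name in sorted order
first_year = babies.groupby("name")["year"].min().to_frame("first_year")
print(first_year)
```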

Exercise 6

Compute a DataFrame that shows the most popular male name and the most popular female name for each year.

Check: The most popular names in 1880 were John and Mary, and in 2020 Liam and Olivia.
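
A possible approach, sketched on made-up toy data: `idxmax` finds the row label of the largest count within each (year, sex) group, and `pivot` lays the winning names out with one column per sex. If two names tie, `idxmax` keeps the first one it encounters, which is a choice this sketch makes rather than something the exercise specifies.

```python
import pandas as pd

# Made-up toy data with the same columns as the names.zip DataFrame
babies = pd.DataFrame({
    "name": ["Mary", "Anna", "John", "William", "John",
             "Mary", "Emma", "John", "William"],
    "sex": ["F", "F", "M", "M", "F", "F", "F", "M", "M"],
    "count": [70, 26, 96, 90, 4, 69, 20, 87, 95],
    "year": [1880] * 5 + [1881] * 4,
})

# Row label of the most frequent name in each (year, sex) group
idx = babies.groupby(["year", "sex"])["count"].idxmax()

# Select those rows and reshape: one row per year, columns "F" and "M"
top = babies.loc[idx].pivot(index="year", columns="sex", values="name")
print(top)
```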