Week 6 (3/7-3/13)¶
Notebook¶
Download the notebook file: week_6_class.ipynb.
Weekly digest¶
Project¶
Resources¶
1. Sample DataFrame merging data¶
[32]:
import pandas as pd
import numpy as np
rng = np.random.default_rng(10)
names = ["Ava", "Benjamin", "Charlotte", "Daniel", "Emma", "Fredric", "Gianna"]
courses = ["MTH 141", "MTH 142", "MTH 241", "MTH 306", "MTH 309", "MTH 311"]
rooms = ["NSC 216", "Capen 110", "Park 440"]
# data for concatenating rows
scores1 = rng.integers(0, 100, 12).reshape(4, 3)
scores2 = rng.integers(0, 100, 9).reshape(3, 3)
columns = ["problem_1", "problem_2", "problem_3"]
sec1 = pd.DataFrame(scores1, index=names[:4], columns=columns)
sec2 = pd.DataFrame(scores2, index=names[4:7], columns=columns)
# data for concatenating columns
scores1 = rng.integers(0, 100, 8).reshape(4, 2)
scores2 = rng.integers(0, 100, 9).reshape(3, 3)
part1 = pd.DataFrame(scores1,
                     index=names[:4],
                     columns=["problem_1", "problem_2"])
part2 = pd.DataFrame(scores2,
                     index=names[:3],
                     columns=["problem_3", "problem_4", "problem_5"])
# merging data
office_nums = rng.integers(100, 150, len(names[:-1]))
courses = pd.DataFrame({"course": courses,
                        "instructor": rng.choice(names[1:], len(courses))})
instructors = pd.DataFrame({"name": names[:-1], "office": office_nums}, dtype="object")
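The frames above are set up for three operations: stacking rows, placing columns side by side, and merging on a key. The following is a minimal sketch of those operations, using small stand-in frames with made-up values rather than the random data above; the choice of a left join is an assumption made for illustration.

```python
import pandas as pd

# Stand-ins for the section frames (values are made up for illustration).
sec1 = pd.DataFrame({"problem_1": [90, 75], "problem_2": [60, 85]},
                    index=["Ava", "Benjamin"])
sec2 = pd.DataFrame({"problem_1": [70], "problem_2": [95]}, index=["Emma"])

# Stack rows: both frames share the same columns.
all_sections = pd.concat([sec1, sec2])

# Place columns side by side: rows are aligned on the index,
# and entries missing from one frame become NaN.
part2 = pd.DataFrame({"problem_3": [50]}, index=["Ava"])
all_problems = pd.concat([sec1, part2], axis=1)

# Merge two tables whose key columns have different names.
courses = pd.DataFrame({"course": ["MTH 141", "MTH 306"],
                        "instructor": ["Benjamin", "Gianna"]})
instructors = pd.DataFrame({"name": ["Ava", "Benjamin"], "office": [101, 120]})
merged = pd.merge(courses, instructors,
                  left_on="instructor", right_on="name", how="left")
```

With `how="left"` every course is kept, and a course whose instructor has no entry in `instructors` gets NaN in the `name` and `office` columns.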
2. Plotly installation¶
[ ]:
%pip install plotly
3. Data for choropleth maps¶
[2]:
url = "https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv"
Exercises¶
Note. All exercises, except for the first one, use data contained in the names.zip file.
Exercise 1¶
Construct a DataFrame with the baby names data from the namesbystate.zip file. The DataFrame columns should be named “state”, “sex”, “year”, “name”, and “count”. You don’t need to save this data to a single csv file; if you do, the file size will be about 130 MB.
Check: The DataFrame should have 6,215,834 rows.
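One possible approach is to read every data file inside the archive and concatenate the results. The sketch below builds a tiny in-memory archive with made-up rows so that it runs without the real file. It assumes that each data file is a headerless CSV with fields in the order state, sex, year, name, count, and that the archive may also contain non-data files; check both assumptions against the actual archive.

```python
import io
import zipfile
import pandas as pd

# Tiny in-memory archive standing in for the real one (rows are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("AK.TXT", "AK,F,1910,Mary,14\nAK,F,1910,Annie,12\n")
    zf.writestr("NY.TXT", "NY,M,1910,John,1500\n")
    zf.writestr("readme.pdf", "not a data file")
buffer.seek(0)

columns = ["state", "sex", "year", "name", "count"]
frames = []
with zipfile.ZipFile(buffer) as zf:
    for member in zf.namelist():
        if not member.upper().endswith(".TXT"):
            continue  # skip anything that is not a data file
        with zf.open(member) as f:
            # Assumption: no header row, so column names are supplied here.
            frames.append(pd.read_csv(f, header=None, names=columns))

# One DataFrame with a fresh 0..n-1 index.
babies = pd.concat(frames, ignore_index=True)
```

For the real data, `zipfile.ZipFile(buffer)` would be replaced by `zipfile.ZipFile("namesbystate.zip")`, assuming the file sits in the working directory.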
Exercise 2¶
Compute a DataFrame that lists the total number of babies recorded each year.
Check: There were 201,484 babies recorded in 1880 and 3,305,259 in 2020.
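Totals per year can be obtained by grouping on the year and summing the counts. Here is a minimal sketch on made-up data; the column names "year", "name", "sex", and "count" are assumptions about how the names.zip data has been loaded.

```python
import pandas as pd

# Made-up records; the real data would come from names.zip.
babies = pd.DataFrame({
    "year": [1880, 1880, 1881, 1881],
    "name": ["John", "Mary", "John", "Mary"],
    "sex": ["M", "F", "M", "F"],
    "count": [10, 7, 12, 9],
})

# Sum the counts within each year and keep the result as a DataFrame.
totals = babies.groupby("year")["count"].sum().to_frame("total")
```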
Exercise 3¶
Compute a DataFrame that lists the number of male babies named “John” for each year.
Check: There were 9,655 such babies recorded in 1880 and 8,180 in 2020.
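A boolean mask combined with a group-by is one way to approach this. The sketch below uses made-up data and assumed column names ("year", "name", "sex", "count"); note that the name alone is not enough, since the same name can be recorded for both sexes.

```python
import pandas as pd

# Made-up records, including a female "John" that must be excluded.
babies = pd.DataFrame({
    "year": [1880, 1880, 1880, 1881],
    "name": ["John", "John", "Mary", "John"],
    "sex": ["M", "F", "F", "M"],
    "count": [10, 1, 7, 12],
})

# Keep only rows that match both conditions, then total them per year.
is_male_john = (babies["name"] == "John") & (babies["sex"] == "M")
johns = babies[is_male_john].groupby("year")["count"].sum().to_frame("count")
```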
Exercise 4¶
Compute a DataFrame that lists how many different names were used each year for males and how many for females.
Check: In 1880 there were 942 different names used for females and 1,058 for males. In 2020 these numbers were 17,360 for females and 13,911 for males.
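Counting distinct values within groups can be done with `nunique`. The sketch below, on made-up data with assumed column names, groups by year and sex and then moves the sex level into the columns so that each year occupies one row.

```python
import pandas as pd

# Made-up records; the real data would come from names.zip.
babies = pd.DataFrame({
    "year": [1880, 1880, 1880, 1881, 1881],
    "name": ["John", "Mary", "Anna", "John", "Mary"],
    "sex": ["M", "F", "F", "M", "F"],
    "count": [10, 7, 3, 12, 9],
})

# Number of distinct names per (year, sex), reshaped to one column per sex.
distinct = babies.groupby(["year", "sex"])["name"].nunique().unstack("sex")
```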
Exercise 5¶
Compute a DataFrame that shows, for each name, the year in which it first appeared in the records.
Check: Here are the first recorded years for a few names: Aaban 2007, Aabha 2011, Aabid 2003, Aabidah 2018.
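The first appearance of a name is the smallest year in which it occurs, which suggests grouping by name and taking the minimum. A minimal sketch on made-up data, with assumed column names:

```python
import pandas as pd

# Made-up records; the real data would come from names.zip.
babies = pd.DataFrame({
    "year": [1880, 1881, 1881, 1882],
    "name": ["John", "John", "Mary", "Mary"],
    "sex": ["M", "M", "F", "F"],
    "count": [10, 12, 9, 8],
})

# Earliest year recorded for each name.
first_year = babies.groupby("name")["year"].min().to_frame("first_year")
```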
Exercise 6¶
Compute a DataFrame that shows the most popular male name and the most popular female name for each year.
Check: The most popular names in 1880 were John and Mary, and in 2020 Liam and Olivia.
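One way to find the top name in each group is to locate the row with the largest count using `idxmax`. The sketch below uses made-up data with assumed column names, and assumes each (year, sex, name) combination appears in a single row; data split across several rows per name would need to be summed first.

```python
import pandas as pd

# Made-up records; the real data would come from names.zip.
babies = pd.DataFrame({
    "year": [1880, 1880, 1880, 1880, 1881, 1881],
    "name": ["John", "William", "Mary", "Anna", "John", "Mary"],
    "sex": ["M", "M", "F", "F", "M", "F"],
    "count": [10, 8, 7, 3, 12, 9],
})

# Index label of the row with the largest count in each (year, sex) group.
top_index = babies.groupby(["year", "sex"])["count"].idxmax()

# Select those rows and reshape to one row per year, one column per sex.
most_popular = babies.loc[top_index].pivot(index="year", columns="sex", values="name")
```

If two names tie for the largest count, `idxmax` returns the first of them.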