# Week 6 (3/7-3/13)

## Notebook

Download the notebook file: week_6_class.ipynb.

## Weekly digest

## Project

## Resources

### 1. Sample DataFrame merging data

```
import pandas as pd
import numpy as np

rng = np.random.default_rng(10)  # fixed seed so the sample data is reproducible
names = ["Ava", "Benjamin", "Charlotte", "Daniel", "Emma", "Fredric", "Gianna"]
courses = ["MTH 141", "MTH 142", "MTH 241", "MTH 306", "MTH 309", "MTH 311"]
rooms = ["NSC 216", "Capen 110", "Park 440"]

# Data for concatenating rows: two sections with the same columns
scores1 = rng.integers(0, 100, 12).reshape(4, 3)
scores2 = rng.integers(0, 100, 9).reshape(3, 3)
columns = ["problem_1", "problem_2", "problem_3"]
sec1 = pd.DataFrame(scores1, index=names[:4], columns=columns)
sec2 = pd.DataFrame(scores2, index=names[4:7], columns=columns)

# Data for concatenating columns: overlapping students, different problems
scores1 = rng.integers(0, 100, 8).reshape(4, 2)
scores2 = rng.integers(0, 100, 9).reshape(3, 3)
part1 = pd.DataFrame(scores1,
                     index=names[:4],
                     columns=["problem_1", "problem_2"])
part2 = pd.DataFrame(scores2,
                     index=names[:3],
                     columns=["problem_3", "problem_4", "problem_5"])

# Data for merging: courses with instructors, and instructors with offices
office_nums = rng.integers(100, 150, len(names[:-1]))
# Note: `courses` is rebound here from a list to a DataFrame
courses = pd.DataFrame({"course": courses,
                        "instructor": rng.choice(names[1:], len(courses))})
instructors = pd.DataFrame({"name": names[:-1], "office": office_nums},
                           dtype="object")
```
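The sample data above is set up for three operations: stacking rows, joining columns, and merging on a key. The following is a minimal sketch of those operations on small hand-written frames. The values and the variable names `all_sections`, `all_parts`, and `merged` are illustrative assumptions, not part of the class notebook.

```python
import pandas as pd

cols = ["problem_1", "problem_2", "problem_3"]
# Made-up scores, shaped like sec1 and sec2 above
sec1 = pd.DataFrame([[70, 85, 90], [60, 75, 80]],
                    index=["Ava", "Benjamin"], columns=cols)
sec2 = pd.DataFrame([[88, 92, 79]], index=["Emma"], columns=cols)

# Stack the sections vertically; columns are aligned by name
all_sections = pd.concat([sec1, sec2])

part1 = pd.DataFrame({"problem_1": [70, 60, 50]},
                     index=["Ava", "Benjamin", "Charlotte"])
part2 = pd.DataFrame({"problem_3": [90, 80]}, index=["Ava", "Benjamin"])

# Place the parts side by side; rows are aligned by index,
# and students missing from one part get NaN
all_parts = pd.concat([part1, part2], axis=1)

courses = pd.DataFrame({"course": ["MTH 141", "MTH 142"],
                        "instructor": ["Benjamin", "Emma"]})
instructors = pd.DataFrame({"name": ["Benjamin", "Charlotte"],
                            "office": [101, 102]})

# Left merge keeps every course; instructors without an office row get NaN
merged = courses.merge(instructors, left_on="instructor",
                       right_on="name", how="left")
```

A left merge is used here so that courses whose instructor has no office record are kept rather than dropped; other values of `how` give different behavior.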

### 2. Plotly installation

```
%pip install plotly
```

### 3. Data for choropleth maps

```
url = "https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv"
```

## Exercises

**Note.** All exercises, except for the first one, use data contained in the `names.zip` file.

### Exercise 1

Construct a DataFrame with baby names data coming from the `namesbystate.zip` file. The DataFrame columns should be named “state”, “sex”, “year”, “name”, and “count”. You don’t need to save this data to a single csv file; if you do, the file size will be about 130 MB.

**Check:** The DataFrame should have 6,215,834 rows.
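One possible approach is to read each text file directly from the archive and concatenate the pieces. The sketch below assumes that the archive contains one headerless `.TXT` file per state with comma-separated rows in the order state, sex, year, name, count. The helper `read_names_by_state` and the toy archive are hypothetical and only illustrate the pattern.

```python
import io
import zipfile
import pandas as pd

def read_names_by_state(zip_source):
    """Concatenate every .TXT member of the archive into one DataFrame."""
    columns = ["state", "sex", "year", "name", "count"]
    frames = []
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            # Skip anything that is not a data file (e.g. a readme)
            if member.upper().endswith(".TXT"):
                with archive.open(member) as f:
                    frames.append(pd.read_csv(f, header=None, names=columns))
    return pd.concat(frames, ignore_index=True)

# Toy in-memory archive with made-up rows, standing in for the real file
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("AK.TXT", "AK,F,1910,Mary,14\nAK,F,1910,Annie,12\n")
    archive.writestr("NY.TXT", "NY,M,1910,John,100\n")
    archive.writestr("readme.pdf", "not data")
buffer.seek(0)

toy = read_names_by_state(buffer)
```

With the real data, the same function could be called with the path to the zip file instead of the in-memory buffer.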

### Exercise 2

Compute a DataFrame that lists the total number of babies recorded each year.

**Check:** There were 201,484 babies recorded in 1880 and 3,305,259 in 2020.
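Yearly totals of this kind are a typical use of `groupby` followed by `sum`. A minimal sketch on made-up data, assuming a DataFrame with `year`, `name`, `sex`, and `count` columns (the frame `toy` and its values are illustrative):

```python
import pandas as pd

# Made-up records; the real data would come from names.zip
toy = pd.DataFrame({
    "year": [1880, 1880, 1880, 1881, 1881],
    "name": ["John", "Mary", "Anna", "John", "Mary"],
    "sex": ["M", "F", "F", "M", "F"],
    "count": [10, 7, 3, 12, 9],
})

# One row per year with the sum of all counts in that year
totals = toy.groupby("year", as_index=False)["count"].sum()
```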

### Exercise 3

Compute a DataFrame that lists the number of male babies named “John” for each year.

**Check:** There were 9,655 such babies recorded in 1880 and 8,180 in 2020.
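Selecting rows by two conditions at once can be done with a boolean mask. The sketch below uses made-up data with assumed `year`, `name`, `sex`, and `count` columns; it includes a female "John" row to show why the condition on `sex` matters.

```python
import pandas as pd

# Made-up records for illustration
toy = pd.DataFrame({
    "year": [1880, 1880, 1880, 1881, 1881],
    "name": ["John", "John", "Mary", "John", "Mary"],
    "sex": ["M", "F", "F", "M", "F"],
    "count": [10, 1, 7, 12, 9],
})

# Combine conditions with & and wrap each one in parentheses
mask = (toy["name"] == "John") & (toy["sex"] == "M")

# Summing per year also works if a year happens to have several matching rows
johns = toy[mask].groupby("year", as_index=False)["count"].sum()
```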

### Exercise 4

Compute a DataFrame that lists how many different names were used each year for males and how many for females.

**Check:** In 1880 there were 942 different names used for females and 1,058 for males. In 2020 these numbers were 17,360 for females and 13,911 for males.
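Counting distinct values within groups can be sketched with `groupby` and `nunique`, followed by `unstack` to get one column per sex. The data and column names below are assumptions made for illustration.

```python
import pandas as pd

# Made-up records for illustration
toy = pd.DataFrame({
    "year": [1880, 1880, 1880, 1881, 1881, 1881],
    "name": ["Mary", "Anna", "John", "Mary", "John", "Will"],
    "sex": ["F", "F", "M", "F", "M", "M"],
    "count": [7, 3, 10, 9, 12, 5],
})

# Number of distinct names for each (year, sex) pair,
# reshaped so that rows are years and columns are sexes
distinct = toy.groupby(["year", "sex"])["name"].nunique().unstack("sex")
```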

### Exercise 5

Compute a DataFrame that shows, for each name, the first year in which it appeared in the records.

**Check:** Here are the first recorded years for a few names: Aaban 2007, Aabha 2011, Aabid 2003, Aabidah 2018.
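The first year of appearance is the minimum of the `year` column within each name group. A minimal sketch on made-up data, assuming `year`, `name`, `sex`, and `count` columns:

```python
import pandas as pd

# Made-up records for illustration
toy = pd.DataFrame({
    "year": [1881, 1880, 1960, 1950, 1950],
    "name": ["Mary", "Mary", "Liam", "Liam", "Mary"],
    "sex": ["F", "F", "M", "M", "F"],
    "count": [9, 7, 30, 20, 50],
})

# Earliest year recorded for each name; the result is sorted by name
first_year = toy.groupby("name", as_index=False)["year"].min()
```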

### Exercise 6

Compute a DataFrame that shows the most popular male name and the most popular female name for each year.

**Check:** The most popular names in 1880 were John and Mary, and in 2020 Liam and Olivia.
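One way to pick the top row in each group is `idxmax`, which returns the index label of the largest value per group. The sketch below uses made-up data with assumed `year`, `name`, `sex`, and `count` columns and relies on the default unique integer index.

```python
import pandas as pd

# Made-up records for illustration
toy = pd.DataFrame({
    "year": [1880, 1880, 1880, 1880, 1881, 1881, 1881],
    "name": ["John", "William", "Mary", "Anna", "John", "William", "Mary"],
    "sex": ["M", "M", "F", "F", "M", "M", "F"],
    "count": [10, 8, 7, 3, 5, 9, 6],
})

# Index label of the row with the largest count in each (year, sex) group
idx = toy.groupby(["year", "sex"])["count"].idxmax()

# Look up those rows; the result is ordered by year, then sex
top = toy.loc[idx, ["year", "sex", "name", "count"]].reset_index(drop=True)
```

If two names tie for the largest count, `idxmax` keeps only the first one it encounters, so ties would need separate handling.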