Week 13 (5/2-5/8)


Weekly digest



%pip install ipython-sql

Exercise 1

The file newsgroups.zip is a zipped text file containing several thousand posts sent to various online newsgroups. Each post starts with three lines that describe the name of the newsgroup to which the post was sent, the author of the post, and the subject. These lines are followed by the text of the post. For example:

  Newsgroup: sci.med
  From: kcarver@dante.nmsu.edu (Kenneth Carver)
  Subject: Isolation amplifiers for EEG/ECG *cheap*

  I have several isolation amplifier boards that are the ideal interface
  for EEG and ECG.  Isolation is essential for safety when connecting
  line-powered equipment to electrodes on the body.  These boards
  incorporate the Burr-Brown 3656 isolation module that currently sells
  for $133, plus other op amps to produce an overall voltage gain of
  350-400.  They are like new and guaranteed good.  $20 postpaid,
  schematic included.  Please email me for more data.

  --Ken Carver

Create a dataframe in which every row corresponds to one post. The columns newsgroup, from, subject, and body should hold the name of the newsgroup, the post author, the post subject, and the text of the post. Here is a sample:

     newsgroup           from                                          subject                                            body
  0  rec.autos           gwm@spl1.spl.loral.com (Gary W. Mahan)        Re: Are BMW's worth the price?                     >sure sounds like they got a ringer. the 325i...
  1  sci.med             davec@ecst.csuchico.edu (Dave Childs)         Dental Fillings question                           I have been hearing bad thing about amalgam de...
  2  alt.atheism         "Robert Knowles" <p00261@psilink.com>         Re: Islamic marriage?                              >DATE: Tue, 6 Apr 1993 00:11:49 GMT\n>FROM: ...
  3  rec.sport.baseball  sepinwal@mail.sas.upenn.edu (Alan Sepinwall)  Re: WFAN                                           In article <1993Apr16.174843.28111@cabell.vcu....
  4  talk.religion.misc  rwd4f@poe.acc.Virginia.EDU (Rob Dobson)       Re: A Message for you Mr. President: How do yo...  In article <visser.735284180@convex.convex.com...
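
One possible way to approach this exercise is sketched below. It is not a reference solution, and it rests on a few assumptions that the exercise text does not confirm: that the archive holds a single text file, that the file decodes as UTF-8, and that a line starting with "Newsgroup: " appears only at the start of a post. The helper names read_zipped_text and parse_posts are made up for this illustration.

```python
import re
import zipfile

import pandas as pd


def read_zipped_text(zip_path):
    """Return the text of the first file in the archive (assumed to be the only one)."""
    with zipfile.ZipFile(zip_path) as archive:
        first_member = archive.namelist()[0]
        # The encoding is an assumption; bytes that fail to decode are replaced.
        return archive.read(first_member).decode("utf-8", errors="replace")


def parse_posts(text):
    """Split raw text into posts and return a dataframe with one row per post."""
    # Assumption: a line starting with "Newsgroup: " always opens a new post.
    # The lookahead splits in front of that line without consuming it.
    chunks = re.split(r"^(?=Newsgroup: )", text, flags=re.MULTILINE)
    rows = []
    for chunk in chunks:
        # Three header lines, then everything else is the body.
        parts = chunk.split("\n", 3)
        if len(parts) < 3 or not parts[0].startswith("Newsgroup:"):
            continue  # skip empty or malformed chunks
        rows.append(
            {
                # Split on the first colon only, since subjects may contain colons.
                "newsgroup": parts[0].partition(":")[2].strip(),
                "from": parts[1].partition(":")[2].strip(),
                "subject": parts[2].partition(":")[2].strip(),
                "body": parts[3].strip() if len(parts) > 3 else "",
            }
        )
    return pd.DataFrame(rows, columns=["newsgroup", "from", "subject", "body"])
```

With these helpers, parse_posts(read_zipped_text("newsgroups.zip")) would build the dataframe. If a post body happens to contain a line that begins with "Newsgroup: ", this sketch would split it into two posts, so the resulting row count is worth checking against the file.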