Archive

Using Python to tackle the CPS (Part 3)

In part 2 of this series, we set the stage to parse the data files themselves.

As a reminder, we have a dictionary that looks like this:

         id  length  start  end
0    HRHHID      15      1   15
1   HRMONTH       2     16   17
2   HRYEAR4       4     18   21
3  HURESPLI       2     22   23 …
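
To make the role of this dictionary concrete, here is a minimal sketch of how its start and end columns could be used to slice one line of a fixed-width file. It assumes the positions are 1-based and inclusive, which is consistent with the lengths shown. The `parse_line` helper and the sample record are made up for illustration and are not taken from the original post.

```python
# Field layout copied from the dictionary excerpt above: (id, start, end).
# Assumption: positions are 1-based and inclusive (e.g. 1..15 has length 15).
FIELDS = [
    ("HRHHID", 1, 15),
    ("HRMONTH", 16, 17),
    ("HRYEAR4", 18, 21),
    ("HURESPLI", 22, 23),
]


def parse_line(line, fields=FIELDS):
    """Slice one fixed-width record into a dict of stripped strings.

    Hypothetical helper for illustration only.
    """
    # A 1-based inclusive (start, end) pair maps to the Python slice
    # [start - 1:end], because slices are 0-based and exclude the end.
    return {name: line[start - 1:end].strip() for name, start, end in fields}


# Made-up record, not real CPS data: 15 + 2 + 4 + 2 = 23 characters.
sample = "000123456789012" + "05" + "2013" + " 1"
record = parse_line(sample)
```

Values are kept as strings here so that leading zeros in identifiers such as HRHHID survive; converting selected fields to integers would be a separate step.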

Tidy Data in Action

Hadley Wickham wrote a famous paper (for a certain definition of famous) about the importance of tidy data when doing data analysis. I want to talk a bit about that, using an example from a StackOverflow post, with a solution using pandas. The principles of tidy data aren't language-specific …


Organizing Papers

As a graduate student, you read a lot of journal articles... a lot. With the material in the articles being as difficult as it is, I didn't want to worry about organizing everything as well. That's why I wrote this script to help (I may have also been procrastinating from …


Using Python to tackle the CPS (Part 2)

Last time, we used Python to fetch some data from the Current Population Survey. Today, we'll work on parsing the files we just downloaded.


We downloaded two types of files last time:

  • CPS monthly tables: a fixed-width format text file with the actual data
  • Data Dictionaries: a text file describing …

Using Python to tackle the CPS

The Current Population Survey is an important source of data for economists. Its modern form took shape in the 1970s, and unfortunately the data format and distribution show their age. Some centers like IPUMS have attempted to put a nicer face on accessing the data, but they haven't done everything …