Modern Pandas (Part 3): Indexes

This is part 3 in my series on writing modern idiomatic pandas.

Indexes can be a difficult concept to grasp at first. I suspect this is partly because they're somewhat peculiar to pandas. These aren't like the indexes …

Modern Pandas (Part 4): Performance

This is part 4 in my series on writing modern idiomatic pandas.

Wes McKinney, the creator of pandas, is kind of obsessed with performance. From micro-optimizations for element access, to embedding a fast hash table inside pandas, we …

Modern Pandas (Part 2): Method Chaining

This is part 2 in my series on writing modern idiomatic pandas.

Method Chaining

Method chaining, where you call methods on an object one after another, is in vogue at the moment. It's always been a style of …
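As a rough illustration of the style, here is a minimal sketch of a chained pandas expression. The `rider` and `miles` columns and their values are made up for the example and do not come from the post.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "rider": ["a", "b", "a", "b"],
    "miles": [3.0, 5.0, 4.0, 1.0],
})

# Each method returns a new object, so the calls read top to bottom.
km_by_rider = (
    df
    .assign(km=lambda d: d["miles"] * 1.609344)  # add a derived column
    .query("miles > 2")                          # keep the longer rides
    .groupby("rider")["km"]                      # group, then select a column
    .sum()                                       # total kilometres per rider
)
print(km_by_rider)
```

Wrapping the expression in parentheses lets each step sit on its own line without backslashes.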

Modern Pandas (Part 1)

This is part 1 in my series on writing modern idiomatic pandas.

Effective Pandas


This series is about how to make effective use of pandas, a data analysis library for the Python programming language. It's targeted at …

dplyr and pandas

This notebook compares pandas and dplyr. The comparison is just on syntax (verbiage), not performance. Whether you're an R user looking to switch to pandas (or the other way around), I hope this guide will help ease the transition.

We'll work through the introductory dplyr vignette to analyze some flight …

Practical Pandas Part 3 - Exploratory Data Analysis

Welcome back. As a reminder:

  • In part 1 we got a dataset with my cycling data from last year merged and stored in an HDF5 store
  • In part 2 we did some cleaning and augmented the cycling data with data from

You can find the full source code …

Practical Pandas Part 2 - More Tidying, More Data, and Merging

This is Part 2 in the Practical Pandas Series, where I work through a data analysis problem from start to finish.

It's a misconception that we can cleanly separate the data analysis pipeline into a linear sequence of steps from

  1. data acquisition
  2. data tidying
  3. exploratory analysis
  4. model building
  5. production

As …

Practical Pandas Part 1 - Reading the Data

This is the first post in a series where I'll show how I use pandas on real-world datasets.

For this post, we'll look at data I collected with Cyclemeter on my daily bike ride to and from school last year. I had to manually start and stop the tracking at …

Using Python to tackle the CPS (Part 4)

Last time, we got to where we'd like to have started: One file per month, with each month laid out the same.

As a reminder, the CPS interviews households 8 times over the course of 16 months. They're interviewed for 4 months, take 8 months off, and are interviewed four …
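The interview schedule can be sketched as month offsets from a household's first interview. This is only an illustration: the function name and parameters are hypothetical, and it assumes the second block of interviews also lasts four months, which is what 8 interviews over 16 months implies.

```python
def interview_offsets(on=4, off=8):
    """Month offsets (0-based) at which a household is interviewed.

    Assumes two interview blocks of `on` months each, separated
    by `off` months without interviews.
    """
    first_block = list(range(on))
    second_block = [month + on + off for month in first_block]
    return first_block + second_block


offsets = interview_offsets()
print(offsets)           # [0, 1, 2, 3, 12, 13, 14, 15]
print(len(offsets))      # 8 interviews
print(offsets[-1] + 1)   # spread over 16 months
```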

Using Python to tackle the CPS (Part 3)

In part 2 of this series, we set the stage to parse the data files themselves.

As a reminder, we have a dictionary that looks like

         id  length  start  end
0    HRHHID      15      1   15
1   HRMONTH       2     16   17
2   HRYEAR4       4     18   21
3  HURESPLI       2     22   23 …
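One way such a dictionary might drive the parsing is with `pandas.read_fwf`, sketched below. This is not necessarily the approach the post takes. It assumes `start` and `end` are 1-based and inclusive, as the `length` column suggests, and the sample record is made up.

```python
import io

import pandas as pd

# The four dictionary rows shown above.
dd = pd.DataFrame({
    "id": ["HRHHID", "HRMONTH", "HRYEAR4", "HURESPLI"],
    "length": [15, 2, 4, 2],
    "start": [1, 16, 18, 22],
    "end": [15, 17, 21, 23],
})

# read_fwf expects 0-based, half-open (start, stop) pairs,
# so shift each 1-based inclusive start down by one.
colspecs = [(int(s) - 1, int(e)) for s, e in zip(dd["start"], dd["end"])]

# A single fabricated record, 23 characters wide.
sample = "123456789012345" + "01" + "2014" + " 1" + "\n"

df = pd.read_fwf(
    io.StringIO(sample),
    colspecs=colspecs,
    names=dd["id"].tolist(),
    header=None,
    dtype=str,  # keep identifiers as strings so leading zeros survive
)
print(df)
```

Reading everything as strings first is a conservative choice; numeric columns can be converted once the layout has been checked.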