Modern Pandas (Part 6): Visualization

This is part 6 in my series on writing modern idiomatic pandas.

Visualization and Exploratory Analysis

A few weeks ago, the R community went through some hand-wringing about plotting packages. For outsiders (like me) the details aren't …

Modern Pandas (Part 5): Tidy Data

This is part 5 in my series on writing modern idiomatic pandas.

Reshaping & Tidy Data

Structuring datasets to facilitate analysis (Wickham 2014)

So, you've sat down to analyze a new dataset. What do you do first?

In …

Modern Pandas (Part 3): Indexes

This is part 3 in my series on writing modern idiomatic pandas.

Indexes can be a difficult concept to grasp at first. I suspect this is partly because they're somewhat peculiar to pandas. These aren't like the …

Modern Pandas (Part 4): Performance

This is part 4 in my series on writing modern idiomatic pandas.

Wes McKinney, the creator of pandas, is kind of obsessed with performance. From micro-optimizations for element access, to embedding a fast hash table inside pandas …

Modern Pandas (Part 2): Method Chaining

This is part 2 in my series on writing modern idiomatic pandas.

Method Chaining

Method chaining, where you call methods on an object one after another, is in vogue at the moment. It's always been a style …

Modern Pandas (Part 1)

This is part 1 in my series on writing modern idiomatic pandas.

Effective Pandas


This series is about how to make effective use of pandas, a data analysis library for the Python programming language. It's targeted …

dplyr and pandas

This notebook compares pandas and dplyr. The comparison is just on syntax (verbiage), not performance. Whether you're an R user looking to switch to pandas (or the other way around), I hope this guide will help ease the transition.

We'll work through the introductory dplyr vignette to analyze some flight …

Practical Pandas Part 3 - Exploratory Data Analysis

Welcome back. As a reminder:

  • In part 1 we got a dataset with my cycling data from last year merged and stored in an HDF5 store
  • In part 2 we did some cleaning and augmented the cycling data with data from

You can find the full source code …

Practical Pandas Part 2 - More Tidying, More Data, and Merging

This is Part 2 in the Practical Pandas Series, where I work through a data analysis problem from start to finish.

It's a misconception that we can cleanly separate the data analysis pipeline into a linear sequence of steps from

  1. data acquisition
  2. data tidying
  3. exploratory analysis
  4. model building
  5. production

As …

Practical Pandas Part 1 - Reading the Data

This is the first post in a series where I'll show how I use pandas on real-world datasets.

For this post, we'll look at data I collected with Cyclemeter on my daily bike ride to and from school last year. I had to manually start and stop the tracking at …