Maintaining Performance

As pandas' documentation claims: pandas provides high-performance data structures. But how do we verify that the claim is correct? And how do we ensure that it stays correct over many releases? This post describes

  1. pandas' current setup for monitoring performance
  2. My personal debugging strategy for understanding and fixing performance regressions …

pandas + binder

This post describes the start of a journey to get pandas' documentation running on Binder. The end result is this nice button:


For a while now I've been jealous of Dask's examples repository. That's a repository containing a collection of Jupyter notebooks demonstrating Dask in action. It stitches together some …

Moral Philosophy for pandas or: What is .values?

The other day, I put up a Twitter poll asking a simple question: What's the type of series.values?

I was a bit limited for space, so I'll expand on …

Modern Pandas (Part 8): Scaling

This is part 8 in my series on writing modern idiomatic pandas.

As I sit down to write this, the third-most popular pandas question on StackOverflow covers how to use pandas for large datasets. This is in …

Extension Arrays for Pandas

This is a status update on some enhancements for pandas. The goal of the work is to store things that are sufficiently array-like in a pandas DataFrame, even if they aren't a regular NumPy array. Pandas already does this in a few places for some blessed types (like Categorical); we'd …

Modern Pandas (Part 7): Timeseries

This is part 7 in my series on writing modern idiomatic pandas.


Pandas started out in the financial world, so naturally it has strong timeseries support.

The first half of this post will look at pandas' …

Modern Pandas (Part 6): Visualization

This is part 6 in my series on writing modern idiomatic pandas.

Visualization and Exploratory Analysis

A few weeks ago, the R community went through some hand-wringing about plotting packages. For outsiders (like me) the details aren't …

Modern Pandas (Part 5): Tidy Data

This is part 5 in my series on writing modern idiomatic pandas.

Reshaping & Tidy Data

Structuring datasets to facilitate analysis (Wickham 2014)

So, you've sat down to analyze a new dataset. What do you do first?

In …

Modern Pandas (Part 3): Indexes

This is part 3 in my series on writing modern idiomatic pandas.

Indexes can be a difficult concept to grasp at first. I suspect this is partly because they're somewhat peculiar to pandas. These aren't like the …

Modern Pandas (Part 4): Performance

This is part 4 in my series on writing modern idiomatic pandas.

Wes McKinney, the creator of pandas, is kind of obsessed with performance. From micro-optimizations for element access, to embedding a fast hash table inside pandas …