Archive

Introducing Stitch

Today I released stitch into the wild. If you haven't yet, check out the examples page to see what stitch does, and the GitHub repo for how to install it. I'm using this post to explain why I wrote stitch and which issues it tries to solve.

Why …


Modern Pandas (Part 7): Timeseries


This is part 7 in my series on writing modern idiomatic pandas.


Timeseries

Pandas started out in the financial world, so naturally it has strong timeseries support.

The first half of this post will look at pandas' …


Modern Pandas (Part 6): Visualization


This is part 6 in my series on writing modern idiomatic pandas.


Visualization and Exploratory Analysis

A few weeks ago, the R community went through some hand-wringing about plotting packages. For outsiders (like me) the details aren't …


Modern Pandas (Part 5): Tidy Data


This is part 5 in my series on writing modern idiomatic pandas.


Reshaping & Tidy Data

Structuring datasets to facilitate analysis (Wickham 2014)

So, you've sat down to analyze a new dataset. What do you do first?

In …


Modern Pandas (Part 3): Indexes


This is part 3 in my series on writing modern idiomatic pandas.


Indexes can be a difficult concept to grasp at first. I suspect this is partly because they're somewhat peculiar to pandas. These aren't like the …


Modern Pandas (Part 4): Performance


This is part 4 in my series on writing modern idiomatic pandas.


Wes McKinney, the creator of pandas, is kind of obsessed with performance. From micro-optimizations for element access, to embedding a fast hash table inside pandas …


Modern Pandas (Part 2): Method Chaining


This is part 2 in my series on writing modern idiomatic pandas.


Method Chaining

Method chaining, where you call methods on an object one after another, is in vogue at the moment. It's always been a style …


Modern Pandas (Part 1)


This is part 1 in my series on writing modern idiomatic pandas.


Effective Pandas

Introduction

This series is about how to make effective use of pandas, a data analysis library for the Python programming language. It's targeted …


dplyr and pandas

This notebook compares pandas and dplyr. The comparison is just on syntax (verbiage), not performance. Whether you're an R user looking to switch to pandas (or the other way around), I hope this guide will help ease the transition.

We'll work through the introductory dplyr vignette to analyze some flight …


Practical Pandas Part 3 - Exploratory Data Analysis

Welcome back. As a reminder:

  • In part 1 we got a dataset with my cycling data from last year merged and stored in an HDF5 store.
  • In part 2 we did some cleaning and augmented the cycling data with data from http://forecast.io.

You can find the full source code …