Archive

dask-ml

Today we released the first version of dask-ml, a library for parallel and distributed machine learning. Read the documentation or install it with:

pip install dask-ml

Packages are currently building for conda-forge, and will be up later today.

conda install -c conda-forge dask-ml

The Goals

dask is, to quote the …


Scalable Machine Learning (Part 2): Partial Fit

This work is supported by Anaconda, Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

This is part two of my series on scalable machine learning.

You can download a notebook of this post here.


Scikit-learn supports out-of-core learning (fitting a …


Scalable Machine Learning (Part 1)

This work is supported by Anaconda, Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

Anaconda is interested in scaling the scientific Python ecosystem. My current focus is on out-of-core, parallel, and distributed machine learning. This series of posts will introduce those concepts, explore what we have available …


Introducing Stitch

Today I released stitch into the wild. If you haven't yet, check out the examples page to see an example of what stitch does, and the GitHub repo for how to install it. I'm using this post to explain why I wrote stitch, and some of the issues it tries to solve.

Why …


Modern Pandas (Part 7): Timeseries


This is part 7 in my series on writing modern idiomatic pandas.


Timeseries

Pandas started out in the financial world, so naturally it has strong timeseries support.

The first half of this post will look at pandas' capabilities …


Modern Pandas (Part 6): Visualization


This is part 6 in my series on writing modern idiomatic pandas.


Visualization and Exploratory Analysis

A few weeks ago, the R community went through some hand-wringing about plotting packages. For outsiders (like me) the details aren't that …


Modern Pandas (Part 5): Tidy Data


This is part 5 in my series on writing modern idiomatic pandas.


Reshaping & Tidy Data

Structuring datasets to facilitate analysis (Wickham 2014)

So, you've sat down to analyze a new dataset. What do you do first?

In episode …


Modern Pandas (Part 3): Indexes


This is part 3 in my series on writing modern idiomatic pandas.


Indexes can be a difficult concept to grasp at first. I suspect this is partly because they're somewhat peculiar to pandas. These aren't like the indexes …


Modern Pandas (Part 4): Performance


This is part 4 in my series on writing modern idiomatic pandas.


Wes McKinney, the creator of pandas, is kind of obsessed with performance. From micro-optimizations for element access, to embedding a fast hash table inside pandas, we …


Modern Pandas (Part 2): Method Chaining


This is part 2 in my series on writing modern idiomatic pandas.


Method Chaining

Method chaining, where you call methods on an object one after another, is in vogue at the moment. It's always been a style of …