Archive

dask-ml 0.4.1 Released

This work is supported by Anaconda, Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

dask-ml 0.4.1 was released today with a few enhancements. See the changelog for all the changes from 0.4.0.

Conda packages are available on conda-forge:

$ conda install -c conda-forge dask-ml …

Extension Arrays for Pandas

This is a status update on some enhancements for pandas. The goal of the work is to store things that are sufficiently array-like in a pandas DataFrame, even if they aren't a regular NumPy array. Pandas already does this in a few places for some blessed types (like Categorical); we'd …


Easy distributed training with Joblib and dask

This work is supported by Anaconda, Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

This past week, I had a chance to visit some of the scikit-learn developers at Inria in Paris. It was a fun and productive week, and I'm thankful to them for hosting me …


dask-ml

Today we released the first version of dask-ml, a library for parallel and distributed machine learning. Read the documentation or install it with:

pip install dask-ml

Packages are currently building for conda-forge, and will be up later today.

conda install -c conda-forge dask-ml

The Goals

dask is, to quote the …


Scalable Machine Learning (Part 2): Partial Fit

This work is supported by Anaconda, Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

This is part two of my series on scalable machine learning.

You can download a notebook of this post here.


Scikit-learn supports out-of-core learning (fitting a …


Scalable Machine Learning (Part 1)

This work is supported by Anaconda, Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

Anaconda is interested in scaling the scientific Python ecosystem. My current focus is on out-of-core, parallel, and distributed machine learning. This series of posts will introduce those concepts, explore what we have available …


Introducing Stitch

Today I released stitch into the wild. If you haven't yet, check out the examples page to see what stitch does, and the GitHub repo for how to install it. I'm using this post to explain why I wrote stitch, and some of the issues it tries to solve.

Why …


Modern Pandas (Part 7): Timeseries


This is part 7 in my series on writing modern idiomatic pandas.


Timeseries

Pandas started out in the financial world, so naturally it has strong timeseries support.

The first half of this post will look at pandas' capabilities …


Modern Pandas (Part 6): Visualization


This is part 6 in my series on writing modern idiomatic pandas.


Visualization and Exploratory Analysis

A few weeks ago, the R community went through some hand-wringing about plotting packages. For outsiders (like me) the details aren't that …


Modern Pandas (Part 5): Tidy Data


This is part 5 in my series on writing modern idiomatic pandas.


Reshaping & Tidy Data

Structuring datasets to facilitate analysis (Wickham 2014)

So, you've sat down to analyze a new dataset. What do you do first?

In episode …