This is a status update on some enhancements for pandas. The goal of the work
is to store things that are sufficiently array-like in a pandas DataFrame,
even if they aren't a regular NumPy array. Pandas already does this in a few
places for some blessed types (like Categorical); we'd …
Today we released the first version of dask-ml, a library for parallel and
distributed machine learning. Read the documentation or install it with:
pip install dask-ml
Packages are currently building for conda-forge, and will be up later today.
conda install -c conda-forge dask-ml
dask is, to quote the …
This is part two of my series on scalable machine learning.
You can download a notebook of this post here.
Scikit-learn supports out-of-core learning (fitting a …
Anaconda is interested in scaling the scientific Python ecosystem. My current focus is on out-of-core, parallel, and distributed machine learning. This series of posts will introduce those concepts, explore what we have available …
This is part 7 in my series on writing modern idiomatic pandas.
Pandas started out in the financial world, so naturally it has strong timeseries support.
The first half of this post will look at pandas' capabilities …
This is part 6 in my series on writing modern idiomatic pandas.
Visualization and Exploratory Analysis
A few weeks ago, the R community went through some hand-wringing about plotting packages. For outsiders (like me) the details aren't that …
This is part 5 in my series on writing modern idiomatic pandas.
Reshaping & Tidy Data
Structuring datasets to facilitate analysis (Wickham 2014)
So, you've sat down to analyze a new dataset. What do you do first?
In episode …