This is part 1 in my series on writing modern idiomatic pandas.
As I sit down to write this, the third-most popular pandas question on Stack Overflow covers how to use pandas for large datasets. This is in …
This is a status update on some enhancements for pandas. The goal of the work
is to store things that are sufficiently array-like in a pandas DataFrame,
even if they aren't a regular NumPy array. Pandas already does this in a few
places for some blessed types (like
Categorical); we'd …
Today we released the first version of
dask-ml, a library for parallel and
distributed machine learning. Read the documentation or install it with
pip install dask-ml
Packages are currently building for conda-forge, and will be up later today.
conda install -c conda-forge dask-ml
dask is, to quote the …
This is part two of my series on scalable machine learning.
You can download a notebook of this post here.
Scikit-learn supports out-of-core learning (fitting a …
Anaconda is interested in scaling the scientific Python ecosystem. My current focus is on out-of-core, parallel, and distributed machine learning. This series of posts will introduce those concepts, explore what we have available …
This is part 7 in my series on writing modern idiomatic pandas.
Pandas started out in the financial world, so naturally it has strong timeseries support.
The first half of this post will look at pandas' …