Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem
Joris Van den Bossche

Connecting and accelerating dataframe libraries across the PyData ecosystem with Apache Arrow. Learn about the recent developments in Arrow and its adoption, and how it can improve your day-to-day data analytics workflows.

Geospatial Data Processing with Python: A Comprehensive Tutorial
Martin Christen

Learn how to use Python to process geospatial data in this comprehensive tutorial! You'll gain hands-on experience with many Geo modules, learning how to read and write spatial data, perform coordinate system transformations, create interactive maps, and more.

Let's contribute to pandas (3 hours) #1
Noa Tamir, Patrick Hoefler

Join our beginner-friendly, mentored workshop on contributing to @pandas_dev at PyData Berlin! 🥳 #opensource #pandas

Let's contribute to pandas (3 hours) #2
Noa Tamir, Patrick Hoefler

Join our beginner-friendly, mentored workshop on contributing to @pandas_dev at PyData Berlin! 🥳 #opensource #pandas

Observability for Distributed Computing with Dask
Hendrik Makait

Debugging is hard. Distributed debugging is hell. Let’s dive into distributed logging, automated metrics, event-based monitoring, and root-causing problems with diagnostic tooling to understand how Dask helps you remain sane while identifying and solving your problems.

Pandas 2.0 and beyond
Joris Van den Bossche, Patrick Hoefler

Pandas reached its 2.0 milestone in 2023. But what does that mean? And what is coming after 2.0? This talk will give an overview of what happened in the latest releases of pandas and highlight some topics and major new features the pandas project is working on.

Shrinking gigabyte sized scikit-learn models for deployment
Pavel Zwerschke, Yasin Tatar

Shrinking gigabyte-sized scikit-learn models for deployment: this talk shows how to deploy machine learning models with up to a 6x reduction in disk space.

The Beauty of Zarr
Sanket Verma

Hi all, I’ll be talking about Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays, along with a hands-on session. If you work with huge datasets in local or cloud storage and are looking for an efficient format, please attend my talk. Thanks!

Unlocking Information - Creating Synthetic Data for Open Access.
Antonia Scherz

A lot of data is private but this talk is not - learn how to synthesize anonymized, reliable data from sensitive, private data.

You've got trust issues, we've got solutions: Differential Privacy
Vikram Waradpande, Sarthika Dhawan

What if I told you I could answer everything about you without knowing you, using Differential Privacy?
