Most of you don't need Spark. Large-scale data management on a budget with Python

Guillem Borrell

Wednesday 11:40 in A03-A04

Type/Track: Tutorial, pydata-data-handling

The Python data ecosystem has matured over the last decade, and there are fewer and fewer reasons to rely only on large batch processes executed in a Spark cluster. But, as with every large ecosystem, putting together the key pieces of technology takes some effort. There are now better storage technologies, streaming execution engines, query planners, and low-level compute libraries, and modern hardware is far more powerful than you would probably expect. In this workshop we will explore some global-warming-reducing techniques to build more efficient data transformation pipelines in Python, with a little bit of Rust.

Level: Domain Expertise: Intermediate, Python Skill Level: Intermediate

Guillem Borrell

Affiliation: BCG X

PhD, MS, Aerospace Engineering. Previously a researcher in turbulence theory and simulation. Now at BCG X, helping clients get the most out of data and AI.

Visit the speaker at: GitHub • Homepage