Most of you don't need Spark. Large-scale data management on a budget with Python
The Python data ecosystem has matured over the last decade, and there are fewer and fewer reasons to rely only on large batch processes executed on a Spark cluster. But, as with every large ecosystem, putting together the key pieces of technology takes some effort. There are now better storage technologies, streaming execution engines, query planners, and low-level compute libraries. And modern hardware is far more powerful than you'd probably expect. In this workshop we will explore some global-warming-reducing techniques to build more efficient data transformation pipelines in Python, with a little bit of Rust.
Affiliation: BCG X
PhD, MS, Aerospace Engineering. Previously a researcher in turbulence theory and simulation. Now at BCG X, helping clients get the most out of data and AI.