Have your cake and eat it too: Rapid model development and stable, high-performance deployments
Christian Bourjau, Jakub Bachurski
At the boundary of model development and MLOps lies a trade-off between the speed of deploying new models and the need to meet operational constraints. These constraints include low-latency prediction, the absence of vulnerabilities in dependencies, and the need for model behavior to stay reproducible for years. The longer the list of constraints, the longer it usually takes to move a model from its development environment into production. In this talk, we present how we seemingly managed to square the circle and have both rapid, highly dynamic model development and a stable, high-performance deployment. We ship sklearn-based models in a real-time service that guarantees 24/7 uptime with low-latency (millisecond) responses. At the same time, we adhere to strict regulatory and security policies, under which every model must remain available for 3-5 years while its dependencies are kept up to date. As the basis, we use ONNX to transform our dynamic Python pipelines into static, low-overhead model definitions. To ensure that the cost of this transformation does not slow down our data scientists, we have developed an open-source library named Spox to streamline these operations as much as possible. Combined with an apt model-serving infrastructure, this lets us satisfy the needs of our data scientists (fast development and deployment) and those of corporate IT (vulnerability-free dependencies, multi-year stability) without compromising efficiency.
With a PhD in experimental particle physics, Christian has a passion for the intersection of cutting-edge data science and modern software engineering. His work at QuantCo is centered around creating efficient tools for data scientists with a clean and maintainable path toward production. That pursuit has led him deep into ONNX and its related ecosystem over the last year.
Jakub is currently studying Computer Science at the University of Cambridge, and his interests are primarily in algorithm design and programming languages. He put those interests to use designing the Spox framework for ONNX as a Software Engineer at QuantCo.