PyData Session List
Accelerating Public Consultations with Large Language Models: A Case Study from the UK Planning Inspectorate
Michele Dallachiesa, Andreas Leed
New study shows Large Language Models can accelerate public consultations by streamlining the analysis process of representations for Local Plans. Results show the potential for 30% faster analysis time and up to 90% classification accuracy #AI #NLP #DataScience #pyconde @PINSgov
Actionable Machine Learning in the Browser with PyScript
Valerio Maggio
Interactive ML apps in the browser with zero installation and no server needed? Come to my talk to find out how.
Advanced Visual Search Engine with Self-Supervised Learning (SSL) Representations and Milvus
Antoine Toubhans, Noé Achache
Building a Visual Search Engine with Milvus and comparing supervised and self-supervised approaches for image representations
Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem
Joris Van den Bossche
Connecting and accelerating dataframe libraries across the PyData ecosystem with Apache Arrow. Learn about the recent developments in Arrow and its adoption, and how it can improve your day-to-day data analytics workflows.
Apache StreamPipes for Pythonistas: IIoT data handling made easy!
Tim Bossenmaier, Sven Oehler
Data enthusiasts love to play with IIoT data. However, the technical challenges remain high (e.g., connecting to devices). @StreamPipes makes this easy by providing a self-service toolbox. In this talk, we introduce a new Python module to work with IIoT data in a Pythonic way.
Ask-A-Question: an FAQ-answering service for when there's little to no data
Suzin You
Doing data science in international development often means dealing with more resource constraints. This talk will walk you through Ask-A-Question, a simple FAQ-answering service for when there's little to no data that we built for WhatsApp helplines for public health.
AutoGluon: AutoML for Tabular, Multimodal and Time Series Data
Caner Turkmen, Oleksandr Shchur
Learn about #AutoML and @AutoGluon, which can handle a range of tasks from regression to image classification and time series forecasting with state-of-the-art performance. #AutoML #datascience
Bayesian Marketing Science: Solving Marketing's 3 Biggest Problems
Dr. Thomas Wiecki
A Bayesian modeling toolkit to solve today's biggest marketing challenges.
BHAD: Explainable unsupervised anomaly detection using Bayesian histograms
Alexander Vosseler
We present a Bayesian histogram anomaly detector (BHAD). BHAD scales linearly with the size of the data and allows a direct explanation of individual anomaly scores due to its simple linear form
Building a Personal Assistant With GPT and Haystack: How to Feed Facts to Large Language Models and Reduce Hallucination.
Mathis Lucka
Building a Personal Assistant With GPT and Haystack: How to Feed Facts to Large Language Models and Reduce Hallucination.
Common issues with Time Series data and how to solve them
Vadim Nelidov
Handling time series data is an important yet difficult task. After this talk you will be able to identify, understand, and resolve time series issues such as divergence, delayed data, time series imputation and the impact of outliers.
Contributing to an open-source content library for NLP
Leonard Püttmann
Learn to build amazing open-source enrichments for natural language processing!
Create interactive Jupyter websites with JupyterLite
Jeremy Tuloup
Do you want to create your own interactive Jupyter website with JupyterLite? Check out this step-by-step tutorial and learn how to configure and customize your website 💡
Driving down the Memray lane - Profiling your data science work
Cheuk Ting Ho
You should profile your data science work. In this talk, we will introduce Memray and its new Jupyter plugin.
evosax: JAX-Based Evolution Strategies
Robert Lange
Tired of having to handle asynchronous processes for neuroevolution? Do you want to leverage high-throughput accelerators for evolution strategies (ES)? evosax allows you to leverage JAX, XLA compilation & auto-vectorization/parallelization to scale ES to accelerators.
Geospatial Data Processing with Python: A Comprehensive Tutorial
Martin Christen
Learn how to use Python to process geospatial data in this comprehensive tutorial! You'll gain hands-on experience with many Geo modules, learning how to read and write spatial data, perform coordinate system transformations, create interactive maps, and more.
Getting started with JAX
Simon Pressler
Getting Started with JAX! Hands-on tips to overcome your first hurdles.
Grokking Anchors: Uncovering What a Machine-Learning Model Relies On
Kilian Kluge
What makes or breaks a machine-learning model's decision? Let's use anchor explanations to find out!
Haystack for climate Q/A
Vibha Vikram Rao
Haystack for climate Q/A - How to build POCs quickly and take them to production
Honey, I broke the PyTorch model >.< - Debugging custom PyTorch models in a structured manner
Clara Hoffmann
Honey, I broke the PyTorch model >.< No problem! In this talk, we'll build a toolbox to debug our models and prevent this from happening again - all by leveraging DL logic, synthetic data and pytest. Let's make our models unbreakable <3
How Chatbots work – We need to talk!
Yuqiong Weng, Katrin Reininger
We need to talk - All about concepts, techniques as well as practical experience with the Rasa framework for building a chatbot
How to baseline in NLP and where to go from there
Tobias Sterbak
Join us for a talk on baselines in NLP! We'll cover common tasks like classification, clustering, search, and NER, and discuss how to establish and improve baselines using weak learning. Don't miss out on this opportunity to gain a deeper understanding of NLP baselines!
How to teach NLP to a newbie & get them started on their first project
Lisa Andreevna Chalaguine
Learn how to teach people to analyse textual data with the help of Python
Hyperparameter optimization for the impatient
Martin Wistuba
HPO does not need to be expensive: see how to speed it up with a couple of simple algorithms.
Improving Machine Learning from Human Feedback
Erin Mikail Staples, Nikolai
While powerful, models built off large datasets like GPT-3 often bring their biases along with them. However, is this the best future for machine learning? Join us to explore Reinforcement Learning from Human Feedback (RLHF) techniques and why they matter more now than ever.
Incorporating GPT-3 into practical NLP workflows
Ines Montani
Large language models like @OpenAI GPT-3 can complement existing machine learning workflows really well. You can get initial annotations from GPT-3, quickly fix them with an annotation tool like https://prodi.gy , and train a cheaper and better model.
Let's contribute to pandas (3 hours) #1
Noa Tamir, Patrick Hoefler
Join our beginner-friendly, mentored workshop on contributing to @pandas_dev at PyData Berlin! 🥳 #opensource #pandas
Let's contribute to pandas (3 hours) #2
Noa Tamir, Patrick Hoefler
Join our beginner-friendly, mentored workshop on contributing to @pandas_dev at PyData Berlin! 🥳 #opensource #pandas
Methods for Text Style Transfer: Text Detoxification Case
Daryna Dementieva
How to detoxify texts? How to collect a parallel corpus for a text style transfer task? How to transfer the knowledge of a style between languages? We answer these questions in this talk.
Most of you don't need Spark. Large-scale data management on a budget with Python
Guillem Borrell
Most of you don't need Spark. Large-scale data management on a budget with Python
Neo4j graph databases for climate policy
Marcus Tedesco
Can Neo4j graph databases and Python help us understand climate policy? Find out!
Observability for Distributed Computing with Dask
Hendrik Makait
Debugging is hard. Distributed debugging is hell. Let’s dive into distributed logging, automated metrics, event-based monitoring, and root-causing problems with diagnostic tooling to understand how Dask helps you remain sane while identifying and solving your problems.
Pandas 2.0 and beyond
Joris Van den Bossche, Patrick Hoefler
Pandas has reached a 2.0 milestone in 2023. But what does that mean? And what is coming after 2.0? This talk will give an overview of what happened in the latest releases of pandas and highlight some topics and major new features the pandas project is working on.
Performing Root Cause Analysis with DoWhy, a Causal Machine-Learning Library
Patrick Blöbaum
Learn how to use the Python DoWhy library to perform root cause analysis using methods of causal machine-learning.
Polars - make the switch to lightning-fast dataframes
Thomas Bierhance
Want to learn about a new Python library that can speed up your data science and analytics work? Join us at the conference to hear about Polars, a lightning-fast dataframe library based on Apache Arrow and written in Rust!
Postmodern Architecture: The Python Powered Modern Data Stack
John Sandall
Learn how to upgrade your pandas pipelines powering DAG workflows to a Python Powered Modern Data Stack, demystify the jargon from ETL to ELT, and see how tools like dbt can integrate with Python to change how data pipelines are built and maintained.
Pragmatic ways of using Rust in your data project
Christopher Prohm
Pragmatic ways of using Rust in your data project - strategies to speed up your data pipelines without rewriting the whole program.
Prompt Engineering 101: Beginner intro to LangChain, the shovel of our ChatGPT gold rush
Lev Konstantinovskiy
"A modern AI start-up is a front-end developer plus a prompt engineer" is a popular joke on Twitter. This talk is about LangChain, a Python open-source tool for prompt engineering.
Raised by Pandas, striving for more: An opinionated introduction to Polars
Nico Kreiling
Were you also raised on #pandas for all kinds of data transformations, and do you wonder if there is more? I did. I searched for better performance and more concise syntax, and I would like to introduce you to #polars.
Shrinking gigabyte sized scikit-learn models for deployment
Pavel Zwerschke, Yasin Tatar
Shrinking gigabyte-sized scikit-learn models for deployment: this talk shows how to deploy machine learning models with up to 6x disk space improvement.
Teaching Neural Networks a Sense of Geometry
Jens Agerberg
By taking neural networks back to the school bench and teaching them some elements of geometry and topology we can build algorithms that can reason about the shape of data. This is the promise of the emerging field of Topological Data Analysis (TDA) which we will introduce!
The Battle of Giants: Causality vs NLP => From Theory to Practice
Aleksander Molak
Join us for a workshop on the latest advances in Causal NLP to see the Causal Transformer in action! All in Python! ❤️
The Beauty of Zarr
Sanket Verma
Hi all, I’ll be talking about Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays, along with a hands-on session. If you work with huge datasets in local/cloud storage and are looking for an efficient format, please attend my talk. Thanks!
The bumps in the road: A retrospective on my data visualisation mistakes
Artem Kislovskiy
Join us for a talk: The bumps in the road: A retrospective on my data visualisation mistakes, on data visualisation and how it's essential for conveying insights from data. We'll discuss best practices with Matplotlib, the limitations of static visualisations, and how CI can stre
The future of the Jupyter Notebook interface
Jeremy Tuloup
Jupyter Notebook 7 is the new version of the popular document-oriented notebook interface. It comes packed with a lot of new features, and its future looks bright!
Unlocking Information - Creating Synthetic Data for Open Access.
Antonia Scherz
A lot of data is private but this talk is not - learn how to synthesize anonymized, reliable data from sensitive, private data.
Using transformers – a drama in 512 tokens
Marianne Stecklina
Nearly all pretrained transformers have an annoying limitation: they can only process short input sequences. Watch me rant about it ;-)
Visualizing your computer vision data is not a luxury, it's a necessity: without it, your models are blind and so are you.
Chazareix Arnault
Visualizing your #ComputerVision data is not a luxury, it's a necessity: without it, your models are blind and so are you! Learn how to elevate your projects and #datasets with #DatasetVisualization.
WALD: A Modern & Sustainable Analytics Stack
Florian Wilhelm
WALD: A modern & sustainable analytics stack consisting of a warehouse like Snowflake or BigQuery, Airbyte, Lightdash and dbt.
When A/B testing isn’t an option: an introduction to quasi-experimental methods
Inga Janczuk
Have you ever wanted to know the causal effect of an action but A/B testing wasn’t an option? Here’s a brief helicopter tour over quasi-experimental methods that can be used instead!
Why GPU Clusters Don't Need to Go Brrr? Leverage Compound Sparsity to Achieve the Fastest Inference Performance on CPUs
Damian Bogunowicz
Fun fact: you can remove 90% of a neural network's weights without losing much accuracy! With model sparsity, you can even run these networks on your CPU with GPU-level performance. Learn about compound sparsity (pruning, quantization, knowledge distillation) for faster inference
You are what you read: Building a personal internet front-page with spaCy and Prodigy
Victoria Slocum
The internet can be overwhelming, so I made a tool to create a personalized summary of it! Through building this internet front-page project, I've learned how the design concepts of tools like spaCy and Prodigy can facilitate the development of both complex and simple software.
You've got trust issues, we've got solutions: Differential Privacy
Vikram Waradpande, Sarthika Dhawan
What if I told you I could answer everything about you without knowing you, using Differential Privacy?
“Who is an NLP expert?” - Lessons Learned from building an in-house QA-system
Nico Kreiling, Alina Bickel
Imagine having something like ChatGPT for your work life! Or at least a bot you could ask about all your internal documents? We tried to build something like that @scieneers and will tell you about our journey. #haystack #weaviate