Accelerating Public Consultations with Large Language Models: A Case Study from the UK Planning Inspectorate
Michele Dallachiesa, Andreas Leed

New study shows Large Language Models can accelerate public consultations by streamlining the analysis process of representations for Local Plans. Results show the potential for 30% faster analysis time and up to 90% classification accuracy #AI #NLP #DataScience #pyconde @PINSgov

Actionable Machine Learning in the Browser with PyScript
Valerio Maggio

Interactive ML apps in the browser with zero installation and no server needed? Come to my talk to find out how.

Advanced Visual Search Engine with Self-Supervised Learning (SSL) Representations and Milvus
Antoine Toubhans, Noé Achache

Building a Visual Search Engine with Milvus and comparing supervised and self-supervised approaches for image representations

Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem
Joris Van den Bossche

Connecting and accelerating dataframe libraries across the PyData ecosystem with Apache Arrow. Learn about the recent developments in Arrow and its adoption, and how it can improve your day-to-day data analytics workflows.

Apache StreamPipes for Pythonistas: IIoT data handling made easy!
Tim Bossenmaier, Sven Oehler

Data enthusiasts love to play with IIoT data. However, the technical challenges remain high (e.g., connecting to devices). @StreamPipes makes this easy by providing a self-service toolbox. In this talk, we introduce a new Python module to work with IIoT data in a pythonic way.

Ask-A-Question: an FAQ-answering service for when there's little to no data
Suzin You

Doing data science in international development often means dealing with more resource constraints. This talk will walk you through Ask-A-Question, a simple FAQ-answering service for when there's little to no data, which we built for WhatsApp helplines for public health.

AutoGluon: AutoML for Tabular, Multimodal and Time Series Data
Caner Turkmen, Oleksandr Shchur

Learn about #AutoML and @AutoGluon, which can handle a range of tasks from regression to image classification and time series forecasting with state-of-the-art performance. #AutoML #datascience

Bayesian Marketing Science: Solving Marketing's 3 Biggest Problems
Dr. Thomas Wiecki

A Bayesian modeling toolkit to solve today's biggest marketing challenges.

BHAD: Explainable unsupervised anomaly detection using Bayesian histograms
Alexander Vosseler

We present a Bayesian histogram anomaly detector (BHAD). BHAD scales linearly with the size of the data and, thanks to its simple linear form, allows a direct explanation of individual anomaly scores.

Building a Personal Assistant With GPT and Haystack: How to Feed Facts to Large Language Models and Reduce Hallucination.
Mathis Lucka

Building a Personal Assistant With GPT and Haystack: How to Feed Facts to Large Language Models and Reduce Hallucination.

Common issues with Time Series data and how to solve them
Vadim Nelidov

Handling time series data is an important but difficult task. After this talk you will be able to identify, understand, and resolve time series issues such as divergence, delayed data, imputation, and the impact of outliers.

Contributing to an open-source content library for NLP
Leonard Püttmann

Learn to build amazing open-source enrichments for natural language processing!

Create interactive Jupyter websites with JupyterLite
Jeremy Tuloup

Do you want to create your own interactive Jupyter website with JupyterLite? Check out this step-by-step tutorial and learn how to configure and customize your website 💡

Driving down the Memray lane - Profiling your data science work
Cheuk Ting Ho

You should profile your data science work. In this talk, we will introduce Memray and its new Jupyter plugin.

evosax: JAX-Based Evolution Strategies
Robert Lange

Tired of having to handle asynchronous processes for neuroevolution? Do you want to leverage high-throughput accelerators for evolution strategies (ES)? evosax allows you to leverage JAX, XLA compilation & auto-vectorization/parallelization to scale ES to accelerators.

Geospatial Data Processing with Python: A Comprehensive Tutorial
Martin Christen

Learn how to use Python to process geospatial data in this comprehensive tutorial! You'll gain hands-on experience with many Geo modules, learning how to read and write spatial data, perform coordinate system transformations, create interactive maps, and more.

Getting started with JAX
Simon Pressler

Getting Started with JAX! Hands-on tips to overcome your first hurdles.

Grokking Anchors: Uncovering What a Machine-Learning Model Relies On
Kilian Kluge

What makes or breaks a machine-learning model's decision? Let's use anchor explanations to find out!

Haystack for climate Q/A
Vibha Vikram Rao

Haystack for climate Q/A - How to build POCs quickly and take them to production

Honey, I broke the PyTorch model >.< - Debugging custom PyTorch models in a structured manner
Clara Hoffmann

Honey, I broke the PyTorch model >.< No problem! In this talk, we'll build a toolbox to debug our models and prevent this from happening again - all by leveraging DL logic, synthetic data and pytest. Let's make our models unbreakable <3

How Chatbots work – We need to talk!
Yuqiong Weng, Katrin Reininger

We need to talk - All about concepts, techniques as well as practical experience with the Rasa framework for building a chatbot

How to baseline in NLP and where to go from there
Tobias Sterbak

Join us for a talk on baselines in NLP! We'll cover common tasks like classification, clustering, search, and NER, and discuss how to establish and improve baselines using weak learning. Don't miss out on this opportunity to gain a deeper understanding of NLP baselines!

How to teach NLP to a newbie & get them started on their first project
Lisa Andreevna Chalaguine

Learn how to teach people to analyse textual data with the help of Python

Hyperparameter optimization for the impatient
Martin Wistuba

HPO does not need to be expensive: see how to speed it up with a couple of simple algorithms.

Improving Machine Learning from Human Feedback
Erin Mikail Staples, Nikolai

While powerful, models like GPT-3 that are built on large datasets often bring their biases along with them. However, is this the best future for machine learning? Join us to explore Reinforcement Learning from Human Feedback (RLHF) techniques and why they matter more now than ever.

Incorporating GPT-3 into practical NLP workflows
Ines Montani

Large language models like @OpenAI GPT-3 can complement existing machine learning workflows really well. You can get initial annotations from GPT-3, quickly fix them with an annotation tool, and train a cheaper and better model.

Let's contribute to pandas (3 hours) #1
Noa Tamir, Patrick Hoefler

Join our beginner-friendly, mentored workshop on contributing to @pandas_dev at PyData Berlin! 🥳 #opensource #pandas

Let's contribute to pandas (3 hours) #2
Noa Tamir, Patrick Hoefler

Join our beginner-friendly, mentored workshop on contributing to @pandas_dev at PyData Berlin! 🥳 #opensource #pandas

Methods for Text Style Transfer: Text Detoxification Case
Daryna Dementieva

How to detoxify texts? How to collect a parallel corpus for a text style transfer task? How to transfer the knowledge of a style between languages? We answer these questions in this talk.

Most of you don't need Spark. Large-scale data management on a budget with Python
Guillem Borrell

Most of you don't need Spark. Large-scale data management on a budget with Python

Neo4j graph databases for climate policy
Marcus Tedesco

Can Neo4j graph databases and Python help us understand climate policy? Find out!

Observability for Distributed Computing with Dask
Hendrik Makait

Debugging is hard. Distributed debugging is hell. Let’s dive into distributed logging, automated metrics, event-based monitoring, and root-causing problems with diagnostic tooling to understand how Dask helps you remain sane while identifying and solving your problems.

Pandas 2.0 and beyond
Joris Van den Bossche, Patrick Hoefler

Pandas has reached a 2.0 milestone in 2023. But what does that mean? And what is coming after 2.0? This talk will give an overview of what happened in the latest releases of pandas and highlight some topics and major new features the pandas project is working on.

Performing Root Cause Analysis with DoWhy, a Causal Machine-Learning Library
Patrick Blöbaum

Learn how to use the Python DoWhy library to perform root cause analysis using methods of causal machine-learning.

Polars - make the switch to lightning-fast dataframes
Thomas Bierhance

Want to learn about a new Python library that can speed up your data science and analytics work? Join us at the conference to hear about Polars, a lightning-fast dataframe library based on Apache Arrow and written in Rust!

Postmodern Architecture: The Python Powered Modern Data Stack
John Sandall

Learn how to upgrade your pandas pipelines powering DAG workflows to a Python Powered Modern Data Stack, demystify the jargon from ETL to ELT, and see how tools like dbt can integrate with Python to change how data pipelines are built and maintained.

Pragmatic ways of using Rust in your data project
Christopher Prohm

Pragmatic ways of using Rust in your data project - strategies to speed up your data pipelines without rewriting the whole program.

Prompt Engineering 101: Beginner intro to LangChain, the shovel of our ChatGPT gold rush
Lev Konstantinovskiy

"A modern AI start-up is a front-end developer plus a prompt engineer" is a popular joke on Twitter. This talk is about LangChain, a Python open-source tool for prompt engineering.

Raised by Pandas, striving for more: An opinionated introduction to Polars
Nico Kreiling

Have you also been raised with #pandas for all kinds of data transformations and wonder if there is more? I did: I went looking for better performance and more concise syntax, and I would like to introduce you to #polars

Shrinking gigabyte sized scikit-learn models for deployment
Pavel Zwerschke, Yasin Tatar

Shrinking gigabyte sized scikit-learn models for deployment: this talk shows how to deploy machine learning models with up to 6x disk space improvement

Teaching Neural Networks a Sense of Geometry
Jens Agerberg

By taking neural networks back to the school bench and teaching them some elements of geometry and topology we can build algorithms that can reason about the shape of data. This is the promise of the emerging field of Topological Data Analysis (TDA) which we will introduce!

The Battle of Giants: Causality vs NLP => From Theory to Practice
Aleksander Molak

Join us for a workshop on the latest advances in Causal NLP to see the Causal Transformer in action! All in Python! ❤️

The Beauty of Zarr
Sanket Verma

Hi all, I’ll be talking about Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays, along with a hands-on session. If you work with huge datasets in local/cloud storage and are looking for an efficient format, please attend my talk. Thanks!

The bumps in the road: A retrospective on my data visualisation mistakes
Artem Kislovskiy

Join us for a talk: The bumps in the road: A retrospective on my data visualisation mistakes, on data visualisation and how it's essential for conveying insights from data. We'll discuss best practices with Matplotlib, the limitations of static visualisations, and how CI can stre

The future of the Jupyter Notebook interface
Jeremy Tuloup

Jupyter Notebook 7 is the new version of the popular document-oriented notebook interface. It comes packed with a lot of new features, and its future looks bright!

Unlocking Information - Creating Synthetic Data for Open Access.
Antonia Scherz

A lot of data is private but this talk is not - learn how to synthesize anonymized, reliable data from sensitive, private data.

Using transformers – a drama in 512 tokens
Marianne Stecklina

Nearly all pretrained transformers have an annoying limitation: they can only process short input sequences. Watch me rant about it ;-)

Visualizing your computer vision data is not a luxury, it's a necessity: without it, your models are blind and so are you.
Chazareix Arnault

Visualizing your #ComputerVision data is not a luxury, it's a necessity: without it, your models are blind and so are you! Learn how to elevate your projects and #datasets with #DatasetVisualization.

WALD: A Modern & Sustainable Analytics Stack
Florian Wilhelm

WALD: A modern & sustainable analytics stack consisting of a warehouse like Snowflake or BigQuery, Airbyte, Lightdash and dbt.

When A/B testing isn’t an option: an introduction to quasi-experimental methods
Inga Janczuk

Have you ever wanted to know the causal effect of an action but A/B testing wasn’t an option? Here’s a brief helicopter tour over quasi-experimental methods that can be used instead!

Why GPU Clusters Don't Need to Go Brrr? Leverage Compound Sparsity to Achieve the Fastest Inference Performance on CPUs
Damian Bogunowicz

Fun fact: you can remove 90% of a neural network's weights without losing much accuracy! With model sparsity, you can even run these networks on your CPU with GPU-level performance. Learn about compound sparsity (pruning, quantization, knowledge distillation) for faster inference

You are what you read: Building a personal internet front-page with spaCy and Prodigy
Victoria Slocum

The internet can be overwhelming, so I made a tool to create a personalized summary of it! Through building this internet front-page project, I've learned how the design concepts of tools like spaCy and Prodigy can facilitate the development of both complex and simple software.

You've got trust issues, we've got solutions: Differential Privacy
Vikram Waradpande, Sarthika Dhawan

What if I told you I could answer everything about you without knowing you, using Differential Privacy?

“Who is an NLP expert?” - Lessons Learned from building an in-house QA-system
Nico Kreiling, Alina Bickel

Imagine having something like ChatGPT for your work life! Or at least a bot you could ask about all your internal documents? We tried to build something like that @scieneers and will tell you about our journey #haystack #weaviate