PyData | PyConDE & PyData Berlin 2023

Talk pydata-natural-language-processing

Accelerating Public Consultations with Large Language Models: A Case Study from the UK Planning Inspectorate

Michele Dallachiesa, Andreas Leed

New study shows Large Language Models can accelerate public consultations by streamlining the analysis process of representations for Local Plans. Results show the potential for 30% faster analysis time and up to 90% classification accuracy #AI #NLP #DataScience #pyconde @PINSgov

Talk pydata-machine-learning-stats

Actionable Machine Learning in the Browser with PyScript

Valerio Maggio

Interactive ML apps in the browser with zero installation and no server needed? Come to my talk to find out how.

Talk pydata-computer-vision

Advanced Visual Search Engine with Self-Supervised Learning (SSL) Representations and Milvus

Antoine Toubhans, Noé Achache

Building a Visual Search Engine with Milvus and comparing supervised and self-supervised approaches for image representations

Talk pydata-pydata-scientific-libraries-stack

Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem

Joris Van den Bossche

Connecting and accelerating dataframe libraries across the PyData ecosystem with Apache Arrow. Learn about the recent developments in Arrow and its adoption, and how it can improve your day-to-day data analytics workflows.

Talk pydata-data-handling

Apache StreamPipes for Pythonistas: IIoT data handling made easy!

Tim Bossenmaier, Sven Oehler

Data enthusiasts love to play with IIoT data. However, the technical challenges remain high (e.g., connecting to devices). @StreamPipes makes this easy by providing a self-service toolbox. In this talk, we introduce a new Python module to work with IIoT data in a Pythonic way.

Talk pydata-natural-language-processing

Ask-A-Question: an FAQ-answering service for when there's little to no data

Suzin You

Doing data science in international development often means dealing with more resource constraints. This talk will walk you through Ask-A-Question, a simple FAQ-answering service for when there's little to no data that we built for WhatsApp helplines for public health.

Talk pydata-machine-learning-stats

AutoGluon: AutoML for Tabular, Multimodal and Time Series Data

Caner Turkmen, Oleksandr Shchur

Learn about #AutoML and @AutoGluon, which can handle a range of tasks from regression to image classification and time series forecasting with state-of-the-art performance. #AutoML #datascience

Talk pydata-machine-learning-stats

Bayesian Marketing Science: Solving Marketing's 3 Biggest Problems

Dr. Thomas Wiecki

A Bayesian modeling toolkit to solve today's biggest marketing challenges.

Talk pydata-machine-learning-stats

BHAD: Explainable unsupervised anomaly detection using Bayesian histograms

Alexander Vosseler

We present a Bayesian histogram anomaly detector (BHAD). BHAD scales linearly with the size of the data and allows a direct explanation of individual anomaly scores due to its simple linear form.

Talk pydata-natural-language-processing

Building a Personal Assistant With GPT and Haystack: How to Feed Facts to Large Language Models and Reduce Hallucination.

Mathis Lucka

Building a Personal Assistant With GPT and Haystack: How to Feed Facts to Large Language Models and Reduce Hallucination.

Talk pydata-data-handling

Common issues with Time Series data and how to solve them

Vadim Nelidov

Handling time series data is important but not easy. After this talk you will be able to identify, understand, and resolve time series issues such as divergence, delayed data, time series imputation, and the impact of outliers.

Tutorial pydata-natural-language-processing

Contributing to an open-source content library for NLP

Leonard Püttmann

Learn to build amazing open-source enrichments for natural language processing!

Tutorial pydata-jupyter

Create interactive Jupyter websites with JupyterLite

Jeremy Tuloup

Do you want to create your own interactive Jupyter website with JupyterLite? Check out this step-by-step tutorial and learn how to configure and customize your website 💡

Talk pydata-jupyter

Driving down the Memray lane - Profiling your data science work

Cheuk Ting Ho

You should profile your data science work. In this talk, we will introduce Memray and its new Jupyter plugin.

Talk pydata-machine-learning-stats

evosax: JAX-Based Evolution Strategies

Robert Lange

Tired of having to handle asynchronous processes for neuroevolution? Do you want to leverage high-throughput accelerators for evolution strategies (ES)? evosax allows you to leverage JAX, XLA compilation & auto-vectorization/parallelization to scale ES to accelerators.

Tutorial pydata-pydata-scientific-libraries-stack

Geospatial Data Processing with Python: A Comprehensive Tutorial

Martin Christen

Learn how to use Python to process geospatial data in this comprehensive tutorial! You'll gain hands-on experience with many Geo modules, learning how to read and write spatial data, perform coordinate system transformations, create interactive maps, and more.

Talk pydata-deep-learning

Getting started with JAX

Simon Pressler

Getting Started with JAX! Hands-on tips to overcome your first hurdles.

Talk pydata-machine-learning-stats

Grokking Anchors: Uncovering What a Machine-Learning Model Relies On

Kilian Kluge

What makes or breaks a machine-learning model's decision? Let's use anchor explanations to find out!

Talk pydata-natural-language-processing

Haystack for climate Q/A

Vibha Vikram Rao

Haystack for climate Q/A - How to build POCs quickly and take them to production

Talk pydata-deep-learning

Honey, I broke the PyTorch model >.< - Debugging custom PyTorch models in a structured manner

Clara Hoffmann

Honey, I broke the PyTorch model >.< No problem! In this talk, we'll build a toolbox to debug our models and prevent this from happening again - all by leveraging DL logic, synthetic data and pytest. Let's make our models unbreakable <3

Talk pydata-natural-language-processing

How Chatbots work – We need to talk!

Yuqiong Weng, Katrin Reininger

We need to talk - All about concepts, techniques as well as practical experience with the Rasa framework for building a chatbot

Talk pydata-natural-language-processing

How to baseline in NLP and where to go from there

Tobias Sterbak

Join us for a talk on baselines in NLP! We'll cover common tasks like classification, clustering, search, and NER, and discuss how to establish and improve baselines using weak learning. Don't miss out on this opportunity to gain a deeper understanding of NLP baselines!

Tutorial pydata-natural-language-processing

How to teach NLP to a newbie & get them started on their first project

Lisa Andreevna Chalaguine

Learn how to teach people to analyse textual data with the help of Python

Talk pydata-machine-learning-stats

Hyperparameter optimization for the impatient

Martin Wistuba

HPO does not need to be expensive: see how to speed it up with a couple of simple algorithms.

Talk pydata-machine-learning-stats

Improving Machine Learning from Human Feedback

Erin Mikail Staples, Nikolai

While powerful, models built off large datasets like GPT-3 often bring their biases along with them. However, is this the best future for machine learning? Join us to explore Reinforcement Learning from Human Feedback (RLHF) techniques and why they matter more now than ever.

Talk pydata-natural-language-processing

Incorporating GPT-3 into practical NLP workflows

Ines Montani

Large language models like @OpenAI GPT-3 can complement existing machine learning workflows really well. You can get initial annotations from GPT-3, quickly fix them with an annotation tool like https://prodi.gy , and train a cheaper and better model.

Tutorial pydata-pydata-scientific-libraries-stack

Let's contribute to pandas (3 hours) #1

Noa Tamir, Patrick Hoefler

Join our beginner-friendly, mentored workshop on contributing to @pandas_dev at PyData Berlin! 🥳 #opensource #pandas

Tutorial pydata-pydata-scientific-libraries-stack

Let's contribute to pandas (3 hours) #2

Noa Tamir, Patrick Hoefler

Join our beginner-friendly, mentored workshop on contributing to @pandas_dev at PyData Berlin! 🥳 #opensource #pandas

Talk pydata-natural-language-processing

Methods for Text Style Transfer: Text Detoxification Case

Daryna Dementieva

How to detoxify texts? How to collect a parallel corpus for a text style transfer task? How to transfer the knowledge of a style between languages? We answer these questions in this talk.

Tutorial pydata-data-handling

Most of you don't need Spark. Large-scale data management on a budget with Python

Guillem Borrell

Most of you don't need Spark. Large-scale data management on a budget with Python

Talk pydata-data-handling

Neo4j graph databases for climate policy

Marcus Tedesco

Can Neo4j graph databases and Python help us understand climate policy? Find out!

Talk pydata-pydata-scientific-libraries-stack

Observability for Distributed Computing with Dask

Hendrik Makait

Debugging is hard. Distributed debugging is hell. Let’s dive into distributed logging, automated metrics, event-based monitoring, and root-causing problems with diagnostic tooling to understand how Dask helps you remain sane while identifying and solving your problems.

Talk pydata-pydata-scientific-libraries-stack

Pandas 2.0 and beyond

Joris Van den Bossche, Patrick Hoefler

Pandas has reached a 2.0 milestone in 2023. But what does that mean? And what is coming after 2.0? This talk will give an overview of what happened in the latest releases of pandas and highlight some topics and major new features the pandas project is working on.

Talk pydata-machine-learning-stats

Performing Root Cause Analysis with DoWhy, a Causal Machine-Learning Library

Patrick Blöbaum

Learn how to use the Python DoWhy library to perform root cause analysis using methods of causal machine-learning.

Talk pydata-data-handling

Polars - make the switch to lightning-fast dataframes

Thomas Bierhance

Want to learn about a new Python library that can speed up your data science and analytics work? Join us at the conference to hear about polars, a lightning-fast dataframe library based on Apache Arrow and written in Rust!

Talk pydata-data-handling

Postmodern Architecture: The Python Powered Modern Data Stack

John Sandall

Learn how to upgrade your pandas pipelines powering DAG workflows to a Python Powered Modern Data Stack, demystify the jargon from ETL to ELT, and see how tools like dbt can integrate with Python to change how data pipelines are built and maintained.

Talk pydata-data-handling

Pragmatic ways of using Rust in your data project

Christopher Prohm

Pragmatic ways of using Rust in your data project - strategies to speed up your data pipelines without rewriting the whole program.

Talk pydata-natural-language-processing

Prompt Engineering 101: Beginner intro to LangChain, the shovel of our ChatGPT gold rush.

Lev Konstantinovskiy

"A modern AI start-up is a front-end developer plus a prompt engineer" is a popular joke on Twitter. This talk is about LangChain, a Python open-source tool for prompt engineering.

Talk pydata-data-handling

Raised by Pandas, striving for more: An opinionated introduction to Polars

Nico Kreiling

Have you also been raised with #pandas for all kinds of data transformations and wonder if there is more? I did. I searched for better performance and more concise syntax, and I would like to introduce you to #polars.

Talk pydata-pydata-scientific-libraries-stack

Shrinking gigabyte sized scikit-learn models for deployment

Pavel Zwerschke, Yasin Tatar

Shrinking gigabyte sized scikit-learn models for deployment: this talk shows how to deploy machine learning models with up to 6x disk space improvement

Talk pydata-deep-learning

Teaching Neural Networks a Sense of Geometry

Jens Agerberg

By taking neural networks back to the school bench and teaching them some elements of geometry and topology we can build algorithms that can reason about the shape of data. This is the promise of the emerging field of Topological Data Analysis (TDA) which we will introduce!

Tutorial pydata-natural-language-processing

The Battle of Giants: Causality vs NLP => From Theory to Practice

Aleksander Molak

Join us for a workshop on the latest advances in Causal NLP to see the Causal Transformer in action! All in Python! ❤️

Talk pydata-pydata-scientific-libraries-stack

The Beauty of Zarr

Sanket Verma

Hi all, I’ll be talking about Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays, along with a hands-on session. If you work with huge datasets in local/cloud storage and are looking for an efficient format, please attend my talk. Thanks!

Talk pydata-visualisation

The bumps in the road: A retrospective on my data visualisation mistakes

Artem Kislovskiy

Join us for a talk: The bumps in the road: A retrospective on my data visualisation mistakes, on data visualisation and how it's essential for conveying insights from data. We'll discuss best practices with Matplotlib, the limitations of static visualisations, and how CI can stre

Talk pydata-jupyter

The future of the Jupyter Notebook interface

Jeremy Tuloup

Jupyter Notebook 7 is the new version of the popular document-oriented notebook interface. It comes packed with a lot of new features, and its future looks bright!

Talk pydata-pydata-scientific-libraries-stack

Unlocking Information - Creating Synthetic Data for Open Access.

Antonia Scherz

A lot of data is private but this talk is not - learn how to synthesize anonymized, reliable data from sensitive, private data.

Talk pydata-natural-language-processing

Using transformers – a drama in 512 tokens

Marianne Stecklina

Nearly all pretrained transformers have an annoying limitation: they can only process short input sequences. Watch me rant about it ;-)

Talk pydata-computer-vision

Visualizing your computer vision data is not a luxury, it's a necessity: without it, your models are blind and so do you.

Chazareix Arnault

Visualizing your #ComputerVision data is not a luxury, it's a necessity: without it, your models are blind and so do you! Learn how to elevate your projects and #datasets with #DatasetVisualization.

Talk pydata-data-handling

WALD: A Modern & Sustainable Analytics Stack

Florian Wilhelm

WALD: A modern & sustainable analytics stack consisting of a warehouse like Snowflake or BigQuery, Airbyte, Lightdash and dbt.

Talk pydata-machine-learning-stats

When A/B testing isn’t an option: an introduction to quasi-experimental methods

Inga Janczuk

Have you ever wanted to know the causal effect of an action but A/B testing wasn’t an option? Here’s a brief helicopter tour over quasi-experimental methods that can be used instead!

Talk pydata-deep-learning

Why GPU Clusters Don't Need to Go Brrr? Leverage Compound Sparsity to Achieve the Fastest Inference Performance on CPUs

Damian Bogunowicz

Fun fact: you can remove 90% of a neural network's weights without losing much accuracy! With model sparsity, you can even run these networks on your CPU with GPU-level performance. Learn about compound sparsity (pruning, quantization, knowledge distillation) for faster inference

Talk pydata-natural-language-processing

You are what you read: Building a personal internet front-page with spaCy and Prodigy

Victoria Slocum

The internet can be overwhelming, so I made a tool to create a personalized summary of it! Through building this internet front-page project, I've learned how the design concepts of tools like spaCy and Prodigy can facilitate the development of both complex and simple software.

Talk pydata-pydata-scientific-libraries-stack

You've got trust issues, we've got solutions: Differential Privacy

Vikram Waradpande, Sarthika Dhawan

What if I told you I could answer everything about you without knowing you, using Differential Privacy?

Talk pydata-natural-language-processing

“Who is an NLP expert?” - Lessons Learned from building an in-house QA-system

Nico Kreiling, Alina Bickel

Imagine having something like ChatGPT for your work life! Or at least a bot you could ask about all your internal documents? We tried to build something like that @scieneers and will tell you about our journey #haystack #weaviate
