Improving Machine Learning from Human Feedback

Erin Mikail Staples, Nikolai

PyConDE & PyDataBerlin 2023 conference

Tuesday 10:30 in B07-B08

Type/Track: Talk, pydata-machine-learning-stats

Large generative models rely on massive data sets that are collected automatically. For example, GPT-3 was trained with data from “Common Crawl” and “Web Text”, among other sources. As the saying goes, bigger isn’t always better. While powerful, these data sets (and the models trained on them) often come at a cost, bringing their “internet-scale biases” along with their “internet-trained models.” This raises the question: is unsupervised learning the best future for machine learning?

ML researchers have developed new model-tuning techniques to address the known biases within existing models and improve their performance (as measured by response preference, truthfulness, toxicity, and result generalization), all at a fraction of the initial training cost. In this talk, we will explore these techniques, known as Reinforcement Learning from Human Feedback (RLHF), and show how open-source machine learning tools like PyTorch and Label Studio can be used to tune off-the-shelf models using direct human feedback.
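To make the idea of learning from human preferences concrete, here is a minimal sketch of the pairwise preference loss that is commonly used to train a reward model in RLHF pipelines. It is an illustration under stated assumptions, not the code presented in the talk. The function names and the example reward scores are hypothetical, and plain Python stands in for PyTorch so the snippet runs without extra dependencies.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(chosen - rejected)).

    The loss is small when the reward model scores the response the
    annotator preferred higher than the one they rejected.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs):
    """Average the loss over (reward_chosen, reward_rejected) pairs."""
    losses = [preference_loss(chosen, rejected) for chosen, rejected in pairs]
    return sum(losses) / len(losses)


# Hypothetical reward-model scores for three labeled comparisons.
pairs = [(2.0, 0.5), (0.1, 0.3), (1.2, 1.2)]

if __name__ == "__main__":
    for chosen, rejected in pairs:
        print(chosen, rejected, round(preference_loss(chosen, rejected), 4))
    print("mean loss:", round(mean_preference_loss(pairs), 4))
```

In a full pipeline the scores would come from a trainable model and the comparisons from human annotators, for example through a labeling tool. The policy model is then tuned against the learned reward, a step this sketch does not cover.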

Level

Domain Expertise: Intermediate
Python Skill Level: Novice

Erin Mikail Staples

Affiliation: Heartex

Erin Mikail Staples is a very online individual passionate about facilitating better connections online and off. She’s forever thinking about how we can communicate, educate and elevate others through collaborative experiences.

Currently, Erin is a Senior Developer Community Advocate at Label Studio, where she empowers the open-source community through education and advocacy efforts. Outside of her day job, Erin is a comedian, graduate technical advisor, content creator, triathlete, avid reader, and dog parent.

Most importantly, she believes in the power of being unabashedly "into things" and works to help friends, strangers, colleagues, community builders, students, and whoever else might cross her path find their thing.

Visit the speaker at: GitHub • Homepage

Nikolai

Affiliation: Heartex

As CTO of Heartex / Label Studio, I specialize in machine learning, data-centric AI, and innovative data labeling techniques. My expertise spans weak supervision, zero-shot and few-shot learning, and reinforcement learning to drive cutting-edge AI solutions.

Visit the speaker at: GitHub