Unlocking Information - Creating Synthetic Data for Open Access.

Wednesday 10:50 in Kuppelsaal

Type: Talk; Track: pydata-pydata-scientific-libraries-stack

Many good project ideas fail before they even start because they require sensitive personal data. The good news: a synthetic version of this data does not need protection. Synthetic data reproduces the structure and statistical properties of the real data without recreating personally identifiable information. The bad news: it is difficult to create synthetic data for open-access use without reproducing exact copies of the real data. This talk gives hands-on insights into synthetic data creation and the challenges along its lifecycle. We will learn how to create and evaluate synthetic data for any use case using the open-source package Synthetic Data Vault, and we will find out why it takes so long to synthesize the huge amount of data lying dormant in public administration. The talk addresses data owners who want to open up access to their private data as well as analysts looking to use synthetic data. After this session, listeners will know which steps to take to generate synthetic data for multi-purpose use and what its limitations are for real-world analyses.
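To make the tension between statistical fidelity and exact copies concrete, here is a minimal sketch in plain Python. It is not the Synthetic Data Vault API and not necessarily the approach shown in the talk: it assumes a naive synthesizer that samples each column independently from its observed frequencies, and all names and records (`REAL_ROWS`, `fit_marginals`, `sample_rows`, `exact_copy_rate`) are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter

# Hypothetical toy records standing in for sensitive data (not from the talk).
REAL_ROWS = [
    ("Berlin", "30-39", "employed"),
    ("Berlin", "40-49", "employed"),
    ("Hamburg", "30-39", "unemployed"),
    ("Munich", "50-59", "retired"),
    ("Berlin", "30-39", "student"),
    ("Hamburg", "40-49", "employed"),
]


def fit_marginals(rows):
    """Count how often each value occurs in each column."""
    return [Counter(column) for column in zip(*rows)]


def sample_rows(marginals, n, seed=0):
    """Draw every column independently from its observed frequencies.

    This keeps per-column statistics but ignores correlations between
    columns, which is an assumption of this naive sketch.
    """
    rng = random.Random(seed)
    columns = []
    for counts in marginals:
        values = list(counts.keys())
        weights = list(counts.values())
        columns.append(rng.choices(values, weights=weights, k=n))
    return list(zip(*columns))


def exact_copy_rate(real_rows, synthetic_rows):
    """Share of synthetic rows that reproduce a real row verbatim."""
    if not synthetic_rows:
        return 0.0
    real = set(real_rows)
    copies = sum(row in real for row in synthetic_rows)
    return copies / len(synthetic_rows)


marginals = fit_marginals(REAL_ROWS)
synthetic = sample_rows(marginals, n=1000, seed=42)
rate = exact_copy_rate(REAL_ROWS, synthetic)
print(f"{rate:.1%} of synthetic rows are exact copies of real rows")
```

Even this simple generator can emit rows identical to real records, which is why a check of this kind is one possible step before releasing synthetic data for open access. Whether an exact match is actually a privacy problem depends on how identifying the columns are.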

Level: Domain Expertise: Intermediate; Python Skill Level: Intermediate

Antonia Scherz

Affiliation: PD - Berater der öffentlichen Hand

Antonia Scherz is a machine learning engineer and consultant at PD - Berater der öffentlichen Hand in Berlin. At PD, she builds proof-of-concept tools and assists in software development for machine learning applications in public administration. She is passionate about making machine learning and open software tools widely used by public administration, and is fascinated by how new tools can be integrated into old structures for the public good.

Visit the speaker at: GitHub