Wednesday 

Room 3 

10:20 - 11:20 

(UTC+02

Talk (60 min)

Structuring the Unstructured: Advanced Document Parsing for AI Workflows

Modern organizations generate vast amounts of data stored in diverse and often unstructured formats, such as PDFs, scanned documents, and proprietary file types.

AI
Big Data
Tools
GenAI

For engineers working with AI, the challenge isn’t simply extracting text; it’s preserving the structure, context, and relationships within the data. Whether you are fine-tuning models or building retrieval-augmented generation (RAG) pipelines, effective document processing is essential to producing actionable insights.

This session dives into the techniques and open source tools needed to transform unstructured documents into structured formats like JSON or Markdown, ready for AI workflows. You’ll learn how to handle challenges like multi-page tables, image-heavy layouts, and scanned documents using context-aware methods. Join us as we explore how to bridge the gap between unstructured data and AI-powered applications efficiently, so you can achieve better results in your AI projects.

Legare Kerrison

Legare Kerrison is an Open Source Engineer and Developer Advocate on Red Hat's AI team. She focuses on open source tools for building and deploying AI and aims to make technical complexity digestible. She’s a tree hugger and an avid matcha fan, based in Boston.

Cedric Clyburn

Cedric Clyburn (@cedricclyburn), Senior Developer Advocate at Red Hat, is an enthusiastic software technologist with a background in Kubernetes, DevOps, and container tools. He has experience speaking at and organizing conferences, including Devoxx, WeAreDevelopers, The Linux Foundation, KCD NYC, and more. Cedric loves all things open source and works to make developers' lives easier! He is based in New York.