Structured Data Extraction with Local Vision AI
Event Details
Presented by: Andrej Baranovskij
This session covers structured data extraction with local AI models. Several local open-source models suited to the task (Mistral, Qwen, dots-ocr) will be discussed and compared. The main focus is the technical architecture, implemented in Python and designed to work efficiently with AI models without depending on cloud APIs. The session is practical and based on a solution available on GitHub, https://github.com/katanaml/sparrow/, which has 5K+ stars. The material draws on the speaker's practical experience.
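To make the idea of structured extraction concrete, here is a minimal sketch of one step such a pipeline typically needs: taking the free-form text a local model returns and turning it into validated fields. This is an illustration only, not the API of the GitHub project above; the schema, the field names, and the parse_model_output function are all hypothetical, and the model call itself is left out.

```python
import json
import re

# Hypothetical schema for illustration: field name -> expected Python type.
INVOICE_SCHEMA = {"invoice_number": str, "total": float, "currency": str}


def parse_model_output(raw: str, schema: dict) -> dict:
    """Pull a JSON object out of model text and check it against a schema."""
    # Models often wrap JSON in extra prose, so search for the outermost braces.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))

    result = {}
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # Accept whole numbers where a float is expected (bool is excluded).
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {field!r} is not of type {expected.__name__}")
        result[field] = value
    return result


# Example with a made-up model response.
raw_response = 'Result: {"invoice_number": "INV-001", "total": 125, "currency": "EUR"} Done.'
fields = parse_model_output(raw_response, INVOICE_SCHEMA)
print(fields)
```

Rejecting missing or mistyped fields at this point keeps malformed model output from reaching downstream systems such as a database.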
Why is this session relevant for ODTUG attendees? Structured data extraction from various documents is a common use case in enterprise software. For data privacy reasons, enterprises often prefer local processing, making the described solution particularly useful. Attendees will also learn about a modern, open-source AI stack and Python integration with Oracle DB.
