
Document AI & OCR

Document parsing, layout understanding, and structured extraction (Reducto, LlamaParse, Unstructured).

6 tools

Hyper Extract
yifanfeng97
Transform unstructured text into structured knowledge with LLMs. Graphs, hypergraphs, and spatio-temporal extractions — with one command.
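Unlike an ordinary graph edge, a hyperedge can link any number of entities in a single relation. The sketch below shows that distinction with a hypothetical in-memory structure. It is not Hyper Extract's actual output schema, and the class and method names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperedge:
    # Hypothetical record: one relation joining an arbitrary set of entities.
    relation: str
    members: frozenset  # any number of entity names, not just a pair


@dataclass
class Hypergraph:
    edges: list = field(default_factory=list)

    def add(self, relation: str, *entities: str) -> None:
        # A two-entity call behaves like an ordinary graph edge;
        # three or more entities form a true hyperedge.
        self.edges.append(Hyperedge(relation, frozenset(entities)))

    def incident(self, entity: str) -> list:
        # Return every relation the entity takes part in.
        return [e for e in self.edges if entity in e.members]


hg = Hypergraph()
hg.add("co-authored", "Alice", "Bob", "Carol")  # one 3-way relation
hg.add("works_at", "Alice", "Acme")             # an ordinary binary edge
```

A plain graph would have to split the three-way "co-authored" fact into pairwise edges and would lose the fact that all three people share one paper. The hyperedge keeps it as a single unit.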
PaddleOCR
PaddlePaddle
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
cad2data Revit IFC DWG DGN
datadrivenconstruction
A workflow for AI agents that automates the conversion of CAD files (such as `.rvt`, `.ifc`, `.dwg`) using command-line converters on a local Windows machine.
langextract
Google DeepMind
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
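"Source grounding" means tying each extracted value back to the exact character span of the input it came from, so a reviewer can verify it. The sketch below shows the idea with plain exact-substring matching. It does not use langextract's API, and the names `ground` and `ground_all` are hypothetical. The library's own alignment is likely more robust than exact matching, for example with repeated or paraphrased spans.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GroundedExtraction:
    label: str
    text: str
    start: Optional[int]  # character offset in the source, None if not found
    end: Optional[int]    # exclusive end offset


def ground(source: str, label: str, text: str, search_from: int = 0) -> GroundedExtraction:
    # Hypothetical helper: locate the extracted string verbatim in the source.
    idx = source.find(text, search_from)
    if idx == -1:
        return GroundedExtraction(label, text, None, None)  # ungrounded: possible hallucination
    return GroundedExtraction(label, text, idx, idx + len(text))


def ground_all(source: str, extractions: List[Tuple[str, str]]) -> List[GroundedExtraction]:
    # Search in document order so repeated strings map to successive occurrences.
    results, cursor = [], 0
    for label, text in extractions:
        g = ground(source, label, text, cursor)
        if g.start is None:
            g = ground(source, label, text, 0)  # fall back to searching the whole text
        if g.end is not None:
            cursor = max(cursor, g.end)
        results.append(g)
    return results


source = "Aspirin 81 mg was prescribed daily for Jane Doe."
results = ground_all(source, [
    ("medication", "Aspirin"),
    ("dosage", "81 mg"),
    ("patient", "Jane Doe"),
    ("diagnosis", "hypertension"),  # not in the source, so it stays ungrounded
])
```

Any extraction whose `start` is `None` has no supporting span in the input. Those are the values worth flagging for review.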
opendataloader-pdf
opendataloader-project
PDF parser for AI-ready data. Automates PDF accessibility. Open source.
webclaw
0xMassi
Fast, local-first web content extraction for LLMs. Scrape, crawl, and extract structured data, all in Rust. Available as a CLI, a REST API, and an MCP server.