MegaRAG: Build Multimodal RAG That Understands Charts & Slides Like a Human

3 hours ago 高效码农

MegaRAG: Teaching RAG to Read Diagrams, Charts, and Slide Layouts Like a Human

What makes MegaRAG different? It treats every page as a mini multimodal graph: text, figures, tables, and even the page screenshot itself become nodes. A two-pass large-language-model pipeline first extracts entities in parallel, then refines cross-modal edges using a global subgraph. The final answer is produced in two stages to prevent modality bias. On four public benchmarks the system outperforms GraphRAG and LightRAG by up to 45 percentage points while running on a single RTX-3090.

§ The Core Question This Article Answers

“How can I build a retrieval-augmented-generation …
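
To make the page-as-graph idea and the two-pass pipeline from the summary more concrete, here is a minimal sketch in Python. It is not MegaRAG's code: every name in it (`Node`, `PageGraph`, `extract_page`, `refine_edges`, `build_graphs`, and the `pages` input layout) is a hypothetical placeholder, and the two LLM calls are replaced by simple stubs. Pass 1 wraps each page's elements as nodes in parallel; pass 2 looks at all nodes together and adds cross-modal edges, here with a naive shared-word rule standing in for the LLM refinement step.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# All names here are hypothetical illustrations, not MegaRAG's actual API.


@dataclass(frozen=True)
class Node:
    node_id: str
    modality: str  # "text", "figure", "table", or "page_image"
    content: str   # raw text, or a caption/path standing in for an image


@dataclass
class PageGraph:
    page_no: int
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)  # (source_id, target_id, relation)


def extract_page(page: dict) -> PageGraph:
    """Pass 1 (stub): turn one page's elements into nodes.

    A real system would prompt an LLM here; this stub only wraps the inputs.
    """
    graph = PageGraph(page_no=page["page_no"])
    for i, (modality, content) in enumerate(page["elements"]):
        graph.nodes.append(Node(f"p{page['page_no']}-{i}", modality, content))
    # The page screenshot itself also becomes a node.
    graph.nodes.append(
        Node(f"p{page['page_no']}-img", "page_image", page["screenshot"])
    )
    return graph


def refine_edges(graphs: list) -> list:
    """Pass 2 (stub): add cross-modal edges using a global view of all nodes.

    Placeholder rule: link two nodes of different modalities when their
    content shares at least one word. An LLM prompted with the global
    subgraph would replace this heuristic.
    """
    all_nodes = [n for g in graphs for n in g.nodes]
    owner = {n.node_id: g for g in graphs for n in g.nodes}
    for a in all_nodes:
        for b in all_nodes:
            # The id comparison visits each unordered pair exactly once.
            if a.node_id < b.node_id and a.modality != b.modality:
                words_a = set(a.content.lower().split())
                words_b = set(b.content.lower().split())
                if words_a & words_b:
                    owner[a.node_id].edges.add(
                        (a.node_id, b.node_id, "mentions_same_term")
                    )
    return graphs


def build_graphs(pages: list) -> list:
    # Pass 1 is independent per page, so it can run in parallel.
    with ThreadPoolExecutor() as pool:
        graphs = list(pool.map(extract_page, pages))
    # Pass 2 needs every page's nodes at once, so it runs afterwards.
    return refine_edges(graphs)
```

Calling `build_graphs` with a list of dicts shaped like `{"page_no": 1, "screenshot": "page1.png", "elements": [("text", "..."), ("figure", "...")]}` returns one `PageGraph` per page. The split into two functions mirrors the description above: entity extraction needs no cross-page context, while edge refinement does.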