MatTools: The Definitive Benchmark for Evaluating LLMs in Materials Science Tools

13 hours ago 高效码农

MatTools: A Comprehensive Benchmark for Evaluating LLMs in Materials Science Tool Usage Figure 1: Computational tools in materials science (Image source: Unsplash) 1. Core Architecture and Design Principles 1.1 System Overview MatTools (Materials Tools Benchmark) is a cutting-edge framework designed to evaluate the capabilities of Large Language Models (LLMs) in handling materials science computational tools. The system introduces a dual-aspect evaluation paradigm: QA Benchmark: 69,225 question-answer pairs (34,621 code-related + 34,604 documentation-related) Real-World Tool Usage Benchmark: 49 practical materials science problems (138 verification tasks) Key technical innovations include: Version-locked dependencies (pymatgen 2024.8.9 + pymatgen-analysis-defects 2024.7.19) Containerized validation environment (Docker image: …