MiniCPM: A Breakthrough in Real-time Multimodal Interaction on End-side Devices Introduction In the rapidly evolving field of artificial intelligence, multimodal large models (MLLM) have become a key focus. These models can process various types of data, such as text, images, and audio, providing a more natural and enriched human-computer interaction experience. However, due to computational resource and performance limitations, most high-performance multimodal models have traditionally been confined to cloud-based operation, making it difficult for general users to utilize them directly on local devices like smartphones or tablets. The MiniCPM series of models, developed jointly by the Tsinghua University Natural Language …