Deep Learning Models archive

DeepSeek MODEL1 Breakdown: How Infinite Memory AI Will Revolutionize Long-Context Processing

2 months ago 高效码农

DeepSeek MODEL1 Revealed: FlashMLA Code Updates Hint at a Next-Gen AI Model. How Will “Infinite Memory” Transform the Way We Use AI? Summary: DeepSeek updated 114 files in its FlashMLA GitHub repository, with 28 references to a new model, MODEL1, developed in parallel with the existing V3.2 series. MODEL1 introduces optimizations in KV cache layout, sparse attention mechanisms, and FP8 decoding, and may incorporate Engram conditional memory technology for stronger long-context processing. These capabilities are expected to debut in the V4 flagship model, launching in mid-February. What Exactly Did DeepSeek Update on GitHub? In January 2026, coinciding with the one-year anniversary of DeepSeek-R1’s release, the DeepSeek …

Crisp Text-to-Image Generation: How Ovis-Image 7B Delivers 20B-Level Performance on One GPU

3 months ago 高效码农

Ovis-Image: A 7-Billion-Parameter Text-to-Image Model That Punches at 20-Billion Scale While Running on One GPU. What makes a compact 7B model able to render crisp, bilingual, layout-heavy text, a task previously dominated by 20B+ giants, and how can you deploy it today? TL;DR (the 30-second take). Architecture: a 2B multimodal Ovis 2.5 encoder frozen for alignment, a 7B MMDiT diffusion decoder trained from scratch, and a frozen FLUX.1-schnell VAE; 10B total, under 24 GB VRAM. Training: a four-stage pipeline (pre-training → instruction fine-tuning → DPO preference → GRPO text-specialist) steadily improves word accuracy from 87% to 92%. Benchmarks: leads CVTG-2K English …

VoxCPM: Revolutionizing Text-to-Speech with Tokenizer-Free AI Technology

6 months ago 高效码农

Author / Team / Institution. Authors: Yixuan Zhou, Guoyang Zeng, Xin Liu, Xiang Li, Renjie Yu, Ziyang Wang, Runchuan Ye, Weiyue Sun, Jiancheng Gui, Kehan Li, Zhiyong Wu, Zhiyong Liu. Team/Institution: Developed by ModelBest and THUHCSI, under the OpenBMB project. Role: Researchers and developers in text-to-speech systems. Authority Backing: The model is open-sourced under the Apache-2.0 license, with acknowledgments to foundational works like DiTAR, MiniCPM-4, CosyVoice, and DAC. No external peer reviews or third-party reports are provided in the input files. Abstract: VoxCPM represents a shift in text-to-speech (TTS) technology by eliminating discrete tokenization and operating directly in continuous speech space. …