Distributed Systems Archive - Efficient Coder

Designing a 100M User Short Video System: TikTok-Scale Architecture Secrets

1 month ago · 高效码农 (Efficient Coder)

How to Design a Short Video Streaming System for 100 Million Users? Decoding High-Concurrency Architecture Through TikTok-Style Feeds

[Figure: Video Streaming Architecture Diagram]

I. Why Rethink Video Streaming Architecture?

With users now spending more than 2 hours a day on short videos, a system serving 100 million users must simultaneously handle:
- 100,000+ video requests per second
- tens of thousands of interactions (likes, comments, shares) per second
- petabyte-scale video data transfer

Traditional content delivery systems face three core challenges:
- Instant response: generate personalized recommendations within 500 ms
- Seamless experience: no perceptible delay during swipe transitions
- Dynamic adaptation: balance cold starts for new users with high-frequency access for active …
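To make the 500 ms "instant response" budget and the cold-start trade-off concrete, here is a minimal sketch assuming an asyncio-based feed service. The recommender `personalized_feed`, the `FALLBACK_FEED` list of trending videos, and the `delay_s` latency knob are hypothetical stand-ins, not the article's design: new users go straight to the fallback pool, and returning users get personalized results only if the recommender answers within the budget.

```python
import asyncio

# Latency budget taken from the "instant response" requirement (500 ms).
RECOMMEND_BUDGET_S = 0.5

# Hypothetical fallback pool, e.g. globally trending videos.
FALLBACK_FEED = ["trending-1", "trending-2", "trending-3"]


async def personalized_feed(user_id: str, delay_s: float) -> list[str]:
    """Hypothetical recommender; delay_s simulates model inference time."""
    await asyncio.sleep(delay_s)
    return [f"{user_id}-rec-{i}" for i in range(3)]


async def get_feed(user_id: str, has_history: bool, delay_s: float) -> list[str]:
    # Cold start: with no watch history there is nothing to personalize on.
    if not has_history:
        return list(FALLBACK_FEED)
    try:
        # wait_for cancels the recommender call if it overruns the budget.
        return await asyncio.wait_for(
            personalized_feed(user_id, delay_s), timeout=RECOMMEND_BUDGET_S
        )
    except asyncio.TimeoutError:
        # Degrade gracefully instead of leaving the viewer with nothing to swipe to.
        return list(FALLBACK_FEED)


if __name__ == "__main__":
    print(asyncio.run(get_feed("u42", has_history=True, delay_s=0.05)))
    print(asyncio.run(get_feed("u42", has_history=True, delay_s=2.0)))
    print(asyncio.run(get_feed("new-user", has_history=False, delay_s=0.05)))
```

The first call returns the three personalized items; the second overruns the budget and returns the fallback after about 0.5 s; the third is a cold start. The fallback path deliberately trades relevance for latency.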

Mastering PyTorch Distributed Training: The Ultimate TorchTitan Guide for LLMs

1 month ago · 高效码农 (Efficient Coder)

TorchTitan: A Comprehensive Guide to PyTorch-Native Distributed Training for Generative AI

Figure 1: Distributed Training Visualization (Image source: Unsplash)

Introduction to TorchTitan: Revolutionizing LLM Pretraining

TorchTitan is PyTorch's official framework for large-scale generative AI model training, designed to simplify distributed training workflows while maximizing hardware utilization. As demand grows for training billion-parameter models such as Llama 3.1 and FLUX diffusion models, TorchTitan provides a native solution that integrates cutting-edge parallelism strategies and optimization techniques.

Key features at a glance:
- Multi-dimensional parallelism (FSDP2, Tensor Parallel, Pipeline Parallel)
- Support for million-token context lengths via Context Parallel
- Float8 precision training with dynamic scaling …
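Multi-dimensional parallelism arranges GPUs into a logical grid in which each rank has one coordinate per parallelism dimension. The pure-Python sketch below is not TorchTitan code. It assumes a row-major layout (last dimension varies fastest), similar in spirit to the mesh PyTorch builds with `torch.distributed.device_mesh.init_device_mesh`, and the dimension names `pp`, `dp_shard`, and `tp` are illustrative. It maps a global rank to its coordinates and lists the peers that rank communicates with along one dimension.

```python
from math import prod


def mesh_coords(rank: int, shape: dict[str, int]) -> dict[str, int]:
    """Return rank's coordinate in each mesh dimension (row-major, last dim fastest)."""
    world_size = prod(shape.values())
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world of size {world_size}")
    coords: dict[str, int] = {}
    # Peel off the fastest-varying (last) dimension first.
    for name, size in reversed(list(shape.items())):
        coords[name] = rank % size
        rank //= size
    # Restore the original dimension order for readability.
    return {name: coords[name] for name in shape}


def flat_rank(coords: dict[str, int], shape: dict[str, int]) -> int:
    """Inverse of mesh_coords: map coordinates back to a global rank."""
    rank = 0
    for name, size in shape.items():
        rank = rank * size + coords[name]
    return rank


def group_members(rank: int, shape: dict[str, int], dim: str) -> list[int]:
    """Ranks sharing every coordinate with `rank` except along `dim`.

    These are the peers `rank` communicates with for that form of
    parallelism, e.g. the all-gather group for sharded data parallel.
    """
    coords = mesh_coords(rank, shape)
    members = []
    for i in range(shape[dim]):
        peer = dict(coords, **{dim: i})
        members.append(flat_rank(peer, shape))
    return members


if __name__ == "__main__":
    # 8 GPUs: 2 pipeline stages x 2 FSDP shards x 2 tensor-parallel ranks.
    shape = {"pp": 2, "dp_shard": 2, "tp": 2}
    print(mesh_coords(5, shape))                # {'pp': 1, 'dp_shard': 0, 'tp': 1}
    print(group_members(5, shape, "tp"))        # [4, 5]
    print(group_members(5, shape, "dp_shard"))  # [5, 7]
    print(group_members(5, shape, "pp"))        # [1, 5]
```

Putting tensor parallelism on the fastest-varying dimension keeps its peers on adjacent ranks, which usually means the same node; that is a common choice because tensor parallelism communicates most often.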

NSQite Message Queue: Simplifying Event-Driven Architecture in Go with SQLite Backend

2 months ago · 高效码农 (Efficient Coder)

What Is NSQite: A Lightweight Message Queue Solution in Go

Message queues play a vital role in building robust, scalable applications: they decouple services, improve system resilience, and enable asynchronous communication between components. Large distributed message queue systems such as NSQ, NATS, and Pulsar are popular, but they can be overkill for early-stage projects. This is where NSQite comes in. A lightweight message queue written in Go, NSQite supports SQLite, PostgreSQL, and ORM-based persistent storage, offering a simple yet reliable solution for basic message queue needs.

Advantages of NSQite

Simplicity …
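The core idea of persisting queue messages in a relational store can be sketched with Python's built-in `sqlite3` module. This is not NSQite's Go API: the `SQLiteQueue` class, its `messages` table, and the `publish`/`fetch`/`ack` method names are hypothetical. Messages stay in the table until acknowledged, so anything unacknowledged is delivered again on the next fetch (at-least-once delivery).

```python
import sqlite3


class SQLiteQueue:
    """Hypothetical minimal durable queue backed by a single SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        with self.conn:  # commits on success
            self.conn.execute(
                """CREATE TABLE IF NOT EXISTS messages (
                       id    INTEGER PRIMARY KEY AUTOINCREMENT,
                       topic TEXT    NOT NULL,
                       body  TEXT    NOT NULL,
                       acked INTEGER NOT NULL DEFAULT 0
                   )"""
            )

    def publish(self, topic: str, body: str) -> int:
        """Persist a message and return its id (durable when path is a file)."""
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO messages (topic, body) VALUES (?, ?)", (topic, body)
            )
        return cur.lastrowid

    def fetch(self, topic: str, limit: int = 10) -> list[tuple[int, str]]:
        """Return up to `limit` unacknowledged messages in publish order."""
        return self.conn.execute(
            "SELECT id, body FROM messages WHERE topic = ? AND acked = 0 "
            "ORDER BY id LIMIT ?",
            (topic, limit),
        ).fetchall()

    def ack(self, msg_id: int) -> None:
        """Mark a message as processed so it is not delivered again."""
        with self.conn:
            self.conn.execute("UPDATE messages SET acked = 1 WHERE id = ?", (msg_id,))


if __name__ == "__main__":
    q = SQLiteQueue()
    q.publish("orders", "order-1")
    q.publish("orders", "order-2")
    print(q.fetch("orders"))  # [(1, 'order-1'), (2, 'order-2')]
    q.ack(1)
    print(q.fetch("orders"))  # [(2, 'order-2')]
```

A real queue would also track in-flight messages (for example with a lease or visibility timeout) so that concurrent consumers do not receive the same message; this sketch omits that to keep the persistence idea visible.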