Large Language Model Plagiarism Detection: A Deep Dive into MDIR Technology

Introduction

The rapid advancement of Large Language Models (LLMs) has brought intellectual property (IP) concerns to the forefront. Developers may copy model weights without authorization and disguise the copy as original work through fine-tuning or continued pretraining. Such practices not only violate IP rights but also expose the copier to legal repercussions.

This article explores Matrix-Driven Instant Review (MDIR), a novel technique for detecting LLM plagiarism through mathematical weight analysis. All content derives from the research paper “Matrix-Driven Instant Review: Confident Detection and Reconstruction of LLM Plagiarism on PC”.


Why Do We Need New Detection Methods?

Limitations of Existing Approaches

Traditional detection methods fall into two categories but suffer critical shortcomings:

Method Type          | Key Issues
---------------------|-----------
Retrieval-based      | Requires vendor-specific keys/prompts; impractical without access to the training data.
Representation-based | Only identifies similarity; lacks statistical significance metrics (e.g., p-values).

MDIR’s Innovations

MDIR leverages matrix analysis and probability theory to:

  • Directly compute weight similarity without vendor data

  • Provide rigorous statistical validation

Core Principles of MDIR

1. Matrix Decomposition Techniques

Singular Value Decomposition (SVD)

Decomposes weight matrices into three components:

A = U * S * V^T

  • U, V: Orthogonal matrices (rotation/reflection)

  • S: Diagonal matrix (contains singular values)
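
As a quick, concrete check of this identity, here is a minimal NumPy sketch (illustrative only, not the paper's code):

```python
# Minimal SVD sketch: decompose a toy "weight matrix" and verify A = U * S * V^T.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))               # stand-in for a weight matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: A = U * S * V^T
print(np.allclose(U.T @ U, np.eye(4)))        # columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(4)))      # rows of V^T are orthonormal
```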

Polar Decomposition

Expresses a matrix as the product of a symmetric positive-definite matrix and an orthogonal matrix:

A = P * W  or  A = W * Q

  • P, Q: Symmetric positive-definite matrices (scaling)

  • W: Orthogonal matrix (rotation/reflection)
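
The polar factors can be computed directly from the SVD. The sketch below realizes the right polar form A = W * Q; treating the Ortho(.) operator used later as this orthogonal factor W is an assumption, since the paper may use a different routine:

```python
# Polar decomposition via SVD: A = W @ Q with W orthogonal, Q symmetric PSD.
import numpy as np

def polar(A):
    U, s, Vt = np.linalg.svd(A)
    W = U @ Vt                        # orthogonal factor (rotation/reflection)
    Q = Vt.T @ np.diag(s) @ Vt        # symmetric positive-semidefinite factor
    return W, Q

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
W, Q = polar(A)
print(np.allclose(A, W @ Q))              # True: A = W * Q
print(np.allclose(W @ W.T, np.eye(4)))    # W is orthogonal
print(np.allclose(Q, Q.T))                # Q is symmetric
```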

2. Key Mathematical Tools

Tool                   | Purpose
-----------------------|--------
Large Deviation Theory | Analyzes extreme-event probabilities in random matrices; estimates p-values.
Random Matrix Theory   | Studies statistical distribution patterns in matrix elements.
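
To build intuition for why these tools yield such extreme p-values: for a Haar-random n×n orthogonal matrix W, Tr(W) is approximately standard normal, so a trace anywhere near its maximum value n is an extreme deviation. The simulation below is illustrative only:

```python
# Simulate traces of random orthogonal matrices: they concentrate near 0,
# so a trace close to n would be a deviation of many sigmas -- exactly the
# kind of extreme event whose probability large deviation theory bounds.
import numpy as np

rng = np.random.default_rng(2)
n, trials = 64, 5_000
traces = np.empty(trials)
for i in range(trials):
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    Q *= np.sign(np.diag(R))          # sign fix so Q is Haar-distributed
    traces[i] = np.trace(Q)

print(f"mean={traces.mean():+.3f}, std={traces.std():.3f}, max possible={n}")
# Typical output: mean ~ 0, std ~ 1, while a perfectly aligned case gives ~64.
```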

MDIR Workflow Explained

Step 1: Embedding Layer Analysis

Objective: Initial similarity assessment through vocabulary embeddings.

Process:

  1. Extract Embedding Matrices

    • Model A: E ∈ ℝ^(Vocabulary Size × Embedding Dimension)

    • Model B: E' ∈ ℝ^(Vocabulary Size × Embedding Dimension)

  2. Identify Shared Vocabulary

    • Collect overlapping tokens (e.g., ASCII characters, common English words).

  3. Compute Orthogonal Transformation Matrix

    • Apply polar decomposition: U = Ortho(E[Shared Tokens]^T * E'[Shared Tokens]).

  4. Validate Permutation Matrix

    • Find the permutation matrix P that maximizes Tr(P * U^T), revealing the vocabulary mapping (see the sketch after the example below).

Example:

High similarity between embedding layers shows up as a clear diagonal pattern in the recovered permutation heatmap (figure).
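
Here is a minimal sketch of the whole Step 1 procedure. The names E, E_prime, shared_ids_a, and shared_ids_b are hypothetical stand-ins for the two models' embedding tables and shared-token indices; this is one plausible realization, not the paper's released code:

```python
# Step 1 sketch: align two embedding tables and recover the permutation.
import numpy as np
from scipy.optimize import linear_sum_assignment

def ortho(M):
    """Orthogonal polar factor -- the Ortho(.) operator used above."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def embedding_alignment(E, E_prime, shared_ids_a, shared_ids_b):
    # Steps 1-2: restrict both embedding matrices to the shared vocabulary.
    Es, Es_p = E[shared_ids_a], E_prime[shared_ids_b]
    # Step 3: U = Ortho(E[Shared Tokens]^T * E'[Shared Tokens]).
    U = ortho(Es.T @ Es_p)
    # Step 4: permutation P maximizing Tr(P * U^T), found with the
    # Hungarian algorithm (one entry selected per row and column of U).
    rows, cols = linear_sum_assignment(U, maximize=True)
    score = U[rows, cols].sum()       # equals Tr(P * U^T) at the optimum
    return U, cols, score
```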


Step 2: Attention Module Analysis

Objective: Verify whether the attention parameters of both models derive from the same source weights.

Key Formula:

Q' ≈ U * Q * W_Q  
K' ≈ U * K * W_K  
V' ≈ U * V * W_V  
O' ≈ W_O^{-1} * O * U^{-1}

  • Q, K, V, O: Query/Key/Value/Output matrices of Model A

  • Q', K', V', O': Corresponding matrices of Model B

  • W_Q, W_K, W_V, W_O: Inner transformation matrices

Detection Method:

  1. Layer-wise Transformation Calculation

    • Compute the orthogonal matrices W_Q, W_K, W_V for each layer's attention parameters.

  2. Statistical Significance Check

    • Use Large Deviation Theory to estimate p-values. A p < 2×10^-23 (the 10σ standard) indicates plagiarism (a sketch follows below).
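
A minimal sketch of the per-layer transformation estimate, using the orthogonal Procrustes solution (an assumption on my part; the paper's exact estimator may differ). U is the hidden-space transform from Step 1, and q, q_prime are hypothetical per-layer query projection weights stored with the hidden dimension first:

```python
# Step 2 sketch: estimate the inner rotation W_Q such that Q' ~= U @ Q @ W_Q.
import numpy as np

def inner_transform(q, q_prime, U):
    """Orthogonal Procrustes: best orthogonal W minimizing ||U q W - q'||_F."""
    Uu, _, Vt = np.linalg.svd((U @ q).T @ q_prime)
    return Uu @ Vt

def relative_residual(q, q_prime, U):
    # A small residual means the two projections differ only by the shared
    # hidden-space rotation U plus an inner rotation -- evidence of common origin.
    W_q = inner_transform(q, q_prime, U)
    return np.linalg.norm(U @ q @ W_q - q_prime) / np.linalg.norm(q_prime)
```

The same computation applies to the K and V projections (and, via the inverted form above, to O); the resulting alignment quality is what feeds the large-deviation p-value estimate.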

Step 3: MLP Module Analysis

Objective: Examine Multi-Layer Perceptron (MLP) parameter similarity.

Key Formula:

U_X = Ortho(X^T * U^T * X')  
P = argmax_{P∈Permutation Group} Tr(P * U_Up^T)

  • X ∈ {Gate, Up, Down}: MLP gate/up-projection/down-projection matrices

  • U_Up: Orthogonal alignment matrix computed from the up-projection weights (see the sketch below)
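
A minimal sketch of the Step 3 formulas, with the same caveats as before: the names X, X_prime, U_up are illustrative, and weights are assumed to be hidden-first NumPy arrays:

```python
# Step 3 sketch: U_X = Ortho(X^T * U^T * X') plus neuron-permutation matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def mlp_alignment(X, X_prime, U):
    """U_X = Ortho(X^T * U^T * X') for X in {Gate, Up, Down}."""
    Uu, _, Vt = np.linalg.svd(X.T @ U.T @ X_prime)
    return Uu @ Vt

def neuron_permutation(U_up):
    # P = argmax over permutations of Tr(P * U_Up^T). For related models,
    # U_Up is close to a permutation of the MLP's hidden neurons.
    rows, cols = linear_sum_assignment(U_up, maximize=True)
    return cols, U_up[rows, cols].sum()
```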

Case Studies & Experimental Results

Case 1: Official Fine-tuned Models

Model Pairs:

  • Qwen2.5-0.5B vs Qwen2.5-0.5B-Instruct

  • Meta-Llama-3.1-8B vs Meta-Llama-3.1-8B-Instruct

Results:

  • Embedding-similarity p-values are astronomically small (on the order of 10^-171,931), confirming a shared origin.

Case 2: Continued Pretraining Models

Model Pairs:

  • Qwen2-7B vs Qwen2.5-7B

  • Llama-3-8B vs Llama-3.1-8B-Instruct

Results:

  • Attention modules show significant similarity, with p-values as small as 10^-1,384,545.

Case 3: Architectural Divergence Verification

Model Pairs:

  • Meta-Llama-3.1-8B vs Qwen3-8B-Base

  • DeepSeek-V3-Base vs Kimi-K2-Instruct

Results:

  • No statistically significant p-values, correctly identifying the models as unrelated.

Frequently Asked Questions (FAQ)

Q1: What types of plagiarism can MDIR detect?

A: MDIR detects copies disguised through fine-tuning, continued pretraining, pruning, architectural transformations, and other obfuscation.

Q2: What computational resources are needed?

A: MDIR runs on a standard PC without a GPU, enabling rapid verification.

Q3: Does it support models with different tokenizers?

A: Yes. Similarity is calculated over the subset of tokens the two vocabularies share.

Q4: How to interpret p-value significance?

A: Use the 10σ standard (p < 2×10^-23) to keep false positives to a minimum.
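
The threshold itself is just the two-sided tail mass of a standard normal distribution at 10σ (assuming that convention; a one-sided tail gives half the value):

```python
# Convert the 10-sigma standard into a p-value threshold.
from scipy.stats import norm

p_two_sided = 2 * norm.sf(10.0)   # sf = survival function = 1 - CDF
print(p_two_sided)                 # ~1.5e-23, i.e. p < 2x10^-23
```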


Technical Limitations

  1. Numerical Precision Issues

    • Matrix decomposition errors may occur, especially with low-precision formats (fp16/bf16); a small demonstration follows below.

  2. Extreme p-value Interpretation

    • Billions of parameters lead to extremely small p-values; the true significance may be somewhat weaker than reported due to computational precision limits.
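
A small illustrative demonstration of the precision caveat: an orthogonal factor that passes the orthogonality check at float64 drifts noticeably once the computation is squeezed into float16 (the numbers are indicative, not from the paper):

```python
# Orthogonality error of W = U @ V^T at full vs. half precision.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((256, 256))
U, _, Vt = np.linalg.svd(A)

W64 = U @ Vt
W16 = (U.astype(np.float16) @ Vt.astype(np.float16)).astype(np.float64)

err64 = np.abs(W64 @ W64.T - np.eye(256)).max()
err16 = np.abs(W16 @ W16.T - np.eye(256)).max()
print(f"float64 error: {err64:.1e}")   # ~1e-14: numerically orthogonal
print(f"float16 error: {err16:.1e}")   # orders of magnitude larger
```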

Future Research Directions

Direction                | Description
-------------------------|------------
Evasion Techniques       | Explore methods, such as training with high learning rates, that might bypass detection.
Semi-Orthogonal p-values | Improve statistical inference for non-square matrices.

Conclusion

MDIR provides a mathematically rigorous framework for efficient LLM plagiarism detection. As models grow larger, such technologies become crucial for maintaining AI ecosystem integrity.