CATransformers: A Framework for Carbon-Aware AI Through Model-Hardware Co-Optimization

Introduction: Addressing AI’s Carbon Footprint Challenge

The rapid advancement of artificial intelligence has come with significant computational costs. A widely cited study estimated that training a single large language model can emit as much carbon as five cars over their entire lifetimes. In this context, balancing model performance with sustainability goals has become a critical challenge for both academia and industry.

Developed by Meta’s research team, CATransformers emerges as a groundbreaking solution—a carbon-aware neural network and hardware co-optimization framework. By simultaneously optimizing model architectures and hardware configurations, it significantly reduces AI systems’ environmental impact while maintaining accuracy. This article provides a comprehensive guide to its core functionalities and practical implementation.


Core Capabilities of CATransformers

1. Multi-Objective Optimization Modes

The framework supports five optimization strategies:

  • Carbon-First: Maximizes accuracy while minimizing total carbon emissions (with latency constraints)
  • Latency-First: Directly optimizes inference speed and model accuracy
  • Energy-First: Focuses on reducing operational carbon emissions
  • Full-Spectrum Optimization: Balances accuracy, carbon footprint, and latency
  • Hardware-Specific Tuning: Optimizes hardware configurations for fixed model architectures
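The chosen mode determines which objectives are optimized directly and which are treated as constraints. As a minimal illustrative sketch (not the framework's actual API; the `Candidate` class, weights, and constraint threshold are assumptions for illustration), a mode switch might score candidate configurations like this:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    accuracy: float    # task accuracy (higher is better)
    carbon_g: float    # total carbon footprint in grams CO2e
    latency_ms: float  # inference latency in milliseconds
    energy_j: float    # operational energy in joules

def score(c: Candidate, metric: str, max_latency_ms: float = 100.0) -> float:
    """Return a scalar to maximize; -inf marks constraint violations.

    Weights are illustrative placeholders, not the framework's values.
    """
    if metric == "carbon":   # carbon-first: latency enforced as a constraint
        if c.latency_ms > max_latency_ms:
            return float("-inf")
        return c.accuracy - 0.01 * c.carbon_g
    if metric == "latency":  # latency-first: trade accuracy against speed
        return c.accuracy - 0.01 * c.latency_ms
    if metric == "energy":   # energy-first: operational emissions only
        return c.accuracy - 0.001 * c.energy_j
    if metric == "all":      # full-spectrum: balance all three objectives
        return c.accuracy - 0.005 * c.carbon_g - 0.005 * c.latency_ms
    raise ValueError(f"unknown metric: {metric}")
```

In practice the framework explores a whole search space of such candidates; the point here is only that "carbon-first" turns latency into a hard constraint, while "full-spectrum" folds every objective into the trade-off.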

2. Supported Model Architectures

Currently compatible with cutting-edge AI models:

  • NLP Models: BERT, Llama 2/3
  • Multimodal Models: CLIP
  • Vision Models: ViT

Extension Requirements for New Models:

  • Availability in HuggingFace Transformers
  • Compatibility with Phaze hardware evaluation framework
  • Adherence to OpenCLIP standards (for CLIP variants)

Installation & Environment Setup Guide

1. Base Environment Configuration

git clone --recurse-submodules https://github.com/facebookresearch/CATransformers.git
conda env create -f env.yaml
conda activate env
./setup.sh

2. Critical Path Configuration

Add to ~/.bashrc:

export THIRD_PARTY_PATH=$(pwd)/phaze/third_party_for_phaze
export WHAM_PATH=$THIRD_PARTY_PATH/wham/
export SUNSTONE_PATH=$THIRD_PARTY_PATH/sunstone/
export ACT_PATH=$THIRD_PARTY_PATH/ACT/
export PYTHONPATH=$THIRD_PARTY_PATH:$WHAM_PATH:$SUNSTONE_PATH:$ACT_PATH:$PYTHONPATH
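A missing or mistyped path here is a common source of import errors later. A quick sanity check (an illustrative helper, not part of the framework) can confirm that each required variable is set and points to a real directory:

```python
import os

def missing_paths(env: dict) -> list:
    """Return the required path variables that are unset or point nowhere."""
    required = ["THIRD_PARTY_PATH", "WHAM_PATH", "SUNSTONE_PATH", "ACT_PATH"]
    return [name for name in required
            if not env.get(name) or not os.path.isdir(env[name])]

# Example: missing_paths(dict(os.environ)) should return [] after setup.
```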

3. HuggingFace Authentication

huggingface-cli login

Quick Start Tutorial

Basic Optimization Command

python main.py --metric=<optimization_mode> --name=<experiment_name> [--hf]

Key Parameters:

  • --metric: Choose from carbon/latency/energy/all/all-hw
  • --hf: Required for models loaded from HuggingFace Transformers (i.e., all supported models except CLIP)

Dataset Preparation Essentials

  • CLIP Models: Require the MSCOCO dataset, formatted per OpenCLIP specifications, placed in /dataset
  • Other Models: Automatic data preprocessing

Advanced Configuration Strategies

1. Customizing Search Spaces

Modify configuration files:

  • configurations.py: CLIP model parameters
  • configurations_hf.py: Parameters for other models

Sample Configuration:

MODEL_ARCH = "vit_base_patch16_224"  # Specify model architecture
TRIALS = 50                          # Optimization iterations
MAX_LATENCY = 100                    # Latency constraint threshold (milliseconds)
CARBON_REGION = "europe-west4"       # Carbon intensity calculation region

2. Hardware Constraints Adjustment

  • Compute Limits: TOPS (Tera Operations Per Second)
  • Physical Size: Chip area constraints
  • Energy Efficiency: Performance-per-watt metrics
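These three axes act as a feasibility filter over the hardware search space. A hedged sketch of how such a filter might look (the class and function names are illustrative, not the framework's actual interfaces):

```python
from dataclasses import dataclass

@dataclass
class HardwareConstraints:
    max_tops: float           # peak compute budget, tera-operations per second
    max_area_mm2: float       # chip area budget in mm^2
    min_tops_per_watt: float  # minimum energy-efficiency target

def is_feasible(tops: float, area_mm2: float, power_w: float,
                limits: HardwareConstraints) -> bool:
    """Check a candidate accelerator config against all three constraint axes."""
    return (
        tops <= limits.max_tops
        and area_mm2 <= limits.max_area_mm2
        and (tops / power_w) >= limits.min_tops_per_watt
    )
```

Tightening the area budget, for instance, steers the search toward smaller chips with lower embodied carbon, at the cost of peak throughput.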

Post-Optimization Workflow

CLIP Model Special Handling

1. Post-Pruning Training

Use customized OpenCLIP for fine-tuning:

# Submit SLURM job
sbatch final_model_training/train_slurm.sh

2. Benchmark Testing

python final_model_training/benchmark_cli.py eval \
  --model ViT-B-32 \
  --pretrained datacomp_xl_s13b_b90k \
  --load-checkpoint pruned_model.pt \
  --vision-layers 10 \
  --vision-embed-dim 768 \
  --text-layers 6

Results Compilation:

clip_benchmark build benchmark_*.json --output summary.csv

General Model Processing

Reference eval/model_eval_hf.py for:

  • Automated accuracy validation
  • Latency measurement modules
  • Carbon emission estimators
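Carbon accounting in this setting typically splits into operational emissions (energy drawn at inference time, scaled by the regional grid's carbon intensity) and embodied emissions (chip manufacturing, amortized over device lifetime), in the spirit of the ACT methodology that the framework builds on. A sketch of that arithmetic, with illustrative numbers rather than the framework's actual estimator:

```python
def total_carbon_g(energy_kwh: float,
                   grid_intensity_g_per_kwh: float,
                   embodied_g: float,
                   usage_hours: float,
                   lifetime_hours: float) -> float:
    """Operational + amortized embodied carbon, in grams CO2e."""
    operational = energy_kwh * grid_intensity_g_per_kwh
    amortized_embodied = embodied_g * (usage_hours / lifetime_hours)
    return operational + amortized_embodied

# e.g. 2 kWh of inference on a 300 gCO2e/kWh grid, plus 1% of a
# 10 kgCO2e chip's embodied footprint: 600 g + 100 g = 700 g CO2e
```

This split is why the energy-first and carbon-first modes differ: energy-first only moves the operational term, while carbon-first also rewards hardware with a smaller embodied footprint.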

Technical Architecture Overview

/ 
├── phaze/            # Hardware evaluation engine
├── optimization/     # Multi-objective optimization algorithms
├── open_clip_custom/ # Customized CLIP training
├── eval/             # Model evaluation modules
└── configurations*   # Optimization parameters

Academic Citation & Licensing

Research Paper Reference

@article{wang2025carbon,
  title={Carbon Aware Transformers Through Joint Model-Hardware Optimization},
  author={Wang, Irene and Ardalani, Newsha and Elhoushi, Mostafa and Jiang, Daniel and Hsia, Samuel and Sumbul, Ekin and Mahajan, Divya and Wu, Carole-Jean and Acun, Bilge},
  journal={arXiv preprint arXiv:2505.01386},
  year={2025}
}

Licensing Information

  • Core Framework: CC-BY-NC
  • Phaze Component: MIT License
  • OpenCLIP Adapter: MIT License

Practical Applications & Future Outlook

  1. Green Cloud Computing: Optimizing energy efficiency in AI data centers
  2. Edge Device Deployment: Balancing performance and power consumption
  3. Sustainable AI Research: Quantifying environmental costs of models
  4. Hardware Design Guidance: Informing custom accelerator development

CATransformers demonstrates practical results, achieving roughly 30% model compression alongside a 40% reduction in carbon footprint, and provides actionable solutions for eco-friendly AI ecosystems. Its out-of-the-box functionality enables researchers to perform cross-layer optimizations without deep hardware expertise.

Implementation Tip: Start with --metric=latency to observe latency-accuracy Pareto frontiers. Advanced users can modify hardware search spaces to explore manufacturing process impacts on carbon footprints.
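The latency-accuracy frontier mentioned in the tip can be extracted from any set of evaluated candidates with a simple dominance check. A generic sketch, independent of the framework's internal search code:

```python
def pareto_frontier(points):
    """Keep (latency, accuracy) pairs not dominated by any other point.

    A point dominates another if it is at least as fast and at least as
    accurate, and strictly better on at least one of the two axes.
    """
    frontier = []
    for lat, acc in points:
        dominated = any(
            (l2 <= lat and a2 >= acc) and (l2 < lat or a2 > acc)
            for l2, a2 in points
        )
        if not dominated:
            frontier.append((lat, acc))
    return sorted(frontier)
```

Plotting the returned pairs makes the trade-off explicit: every point on the frontier is an optimal choice for some latency budget, and everything off it is strictly worse.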