Baby Head Image Segmentation: Building a High-Precision Medical Imaging Tool from Scratch

Where medical imaging technology meets artificial intelligence to revolutionize neonatal health monitoring

In neonatal care and pediatric medicine, accurately measuring head development indicators is critical. Traditional manual measurement methods are not only time-consuming but also prone to subjective errors. This article details how to build a high-precision baby head image segmentation system using deep learning technology, enabling medical professionals to automatically obtain precise head contour data.

Why Baby Head Image Segmentation Matters

Head circumference is a crucial indicator for assessing infant growth and development. Conventional measurement requires nurses to use a measuring tape by hand, which becomes challenging when babies are restless. With image segmentation, a single photo of the baby is enough for the system to:

  1. Automatically identify the baby’s head region
  2. Precisely outline the head contour
  3. Calculate key parameters like head circumference and anterior-posterior diameter (see the sketch after this list)
  4. Track development trends by comparing changes over time
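
For illustration, here is a minimal sketch (not the project's actual measurement code) of how step 3 could work: deriving a pixel-space head circumference and diameters from a binary segmentation mask with OpenCV. Converting pixels to real-world units would additionally require a calibration reference in the photo.

import cv2
import numpy as np

def estimate_head_measurements(mask: np.ndarray):
    """Estimate contour-based measurements from a binary head mask (uint8, values 0/255)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    head = max(contours, key=cv2.contourArea)            # largest region is assumed to be the head
    circumference_px = cv2.arcLength(head, closed=True)  # contour perimeter ~ head circumference
    (_cx, _cy), axes, _angle = cv2.fitEllipse(head)      # fit an ellipse to approximate the diameters
    major_px, minor_px = max(axes), min(axes)
    return circumference_px, major_px, minor_px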

Medical research shows that abnormal head circumference may indicate serious conditions like hydrocephalus or microcephaly. Automated measurement tools help doctors detect issues earlier, enabling timely intervention.


Core Project Capabilities Overview

Feature Category       | Specific Capabilities                           | Practical Value
Model Accuracy         | Supports UNet, UNet++, DeepLabV3+ architectures | >95% segmentation accuracy
Ease of Use            | One-command training, simple configuration      | No deep learning expertise required
Performance Evaluation | Dice coefficient, IoU metrics                   | Comprehensive model assessment
Deployment Support     | ONNX export, model quantization                 | Hardware platform flexibility
Interactive Demo       | Web-based real-time testing                     | Quick clinical validation

10-Minute Quick Start Guide

Step 1: Environment Setup

# Clone project repository
git clone https://github.com/voyax/baby-head-seg.git
cd baby-head-seg

# Install dependencies (recommended using make)
make setup

Step 2: Prepare Training Data

Create this directory structure:

data/
├── source/          # Original images
│   ├── baby1.jpg    # Baby head photo
│   └── baby1.json   # Corresponding annotation file
└── masks/           # Auto-generated masks directory

Annotation Requirements:

  • Use the free Labelme annotation tool: https://github.com/wkentaro/labelme
  • Label category must be named “head”
  • Each image requires a corresponding JSON annotation file
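
The project's scripts/generate_masks.py automates this conversion. As a rough illustration of what it involves (assuming the standard Labelme polygon format), the "head" polygons in a JSON file can be rasterized into a binary mask like this:

import json
import cv2
import numpy as np

def labelme_to_mask(json_path: str) -> np.ndarray:
    """Rasterize all polygons labeled "head" in a Labelme JSON file into a binary mask."""
    with open(json_path, "r") as f:
        ann = json.load(f)
    mask = np.zeros((ann["imageHeight"], ann["imageWidth"]), dtype=np.uint8)
    for shape in ann["shapes"]:
        if shape["label"] == "head":
            points = np.array(shape["points"], dtype=np.int32)
            cv2.fillPoly(mask, [points], color=255)  # head pixels -> 255, background stays 0
    return mask

# Example usage:
# cv2.imwrite("data/masks/baby1.png", labelme_to_mask("data/source/baby1.json"))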

Step 3: Start Model Training

# Run demo mode (recommended for first-time users)
make demo

# Full model training
make train

# Custom parameter training
python src/train.py --config config/train_config.yaml
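
Both make targets ultimately run the training workflow in src/train.py. Conceptually, one training epoch reduces to the standard PyTorch loop below (a simplified sketch, assuming a DataLoader that yields image/mask tensor batches):

import torch

def train_one_epoch(model, loader, loss_fn, optimizer, device="cuda"):
    """One pass over the training set: forward, loss, backward, parameter update."""
    model.train()
    total_loss = 0.0
    for images, masks in loader:          # images: (B, 3, H, W), masks: (B, 1, H, W)
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        logits = model(images)            # raw predictions before sigmoid
        loss = loss_fn(logits, masks)     # e.g. BCE + Dice
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / max(len(loader), 1)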

Step 4: Use Trained Models

# Single image prediction
python src/inference.py --model outputs/model_best.pth --image test.jpg

# Launch web demo (access at http://localhost:8000)
cd web && python -m http.server 8000
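
The prediction logic lives in src/inference.py. Roughly, single-image prediction boils down to the following simplified sketch (assuming the checkpoint stores the full model object and the 512x512 input size from the default config; the actual script may also apply encoder-specific normalization):

import cv2
import numpy as np
import torch

def predict_mask(model_path: str, image_path: str, image_size=(512, 512)) -> np.ndarray:
    """Run a saved segmentation model on one image and return a binary head mask."""
    model = torch.load(model_path, map_location="cpu")   # assumes a full-model checkpoint
    model.eval()

    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    resized = cv2.resize(image, image_size)
    tensor = torch.from_numpy(resized).permute(2, 0, 1).float().unsqueeze(0) / 255.0

    with torch.no_grad():
        probs = torch.sigmoid(model(tensor))[0, 0].numpy()   # per-pixel head probability
    mask = (probs > 0.5).astype(np.uint8) * 255
    return cv2.resize(mask, (image.shape[1], image.shape[0]))  # back to the original resolution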

Project Architecture Deep Dive

baby-head-seg/
├── src/                    # Core source code
│   ├── dataset.py          # Data loader
│   ├── model.py            # Model definition
│   ├── train.py            # Training workflow
│   └── inference.py        # Prediction module
├── config/                 # Configuration files
│   ├── train_config.yaml   # Main training config
│   └── lightweight.yaml    # Lightweight config
├── scripts/                # Data processing scripts
│   ├── generate_masks.py   # Mask generation
│   └── preprocess.py       # Data preprocessing
├── web/                    # Web demo
│   ├── index.html          # Frontend interface
│   └── app.js              # Interactive logic
└── Makefile                # Automation commands

Key design principles:

  • Modular architecture: Independent components for easy maintenance
  • Configuration-driven: All parameters managed via YAML files
  • Automated workflows: Makefile encapsulates common operations
  • End-to-end design: Complete coverage from data to deployment

Model Configuration Explained

config/train_config.yaml is the project’s core configuration file:

# Model architecture configuration
model:
  architecture: "UNet"          # Options: UNet/UNet++/DeepLabV3+/FPN/PSPNet
  encoder_name: "mobilenet_v2"  # Encoder: resnet34/efficientnet-b0 etc.
  image_size: [512, 512]        # Input size

# Training parameters
training:
  epochs: 100                   # Training iterations
  batch_size: 8                 # Batch size
  learning_rate: 0.0001         # Learning rate
  optimizer: "AdamW"            # Optimizer
  loss_function: "bce_dice"     # Loss function (BCE + Dice)

Configuration strategies:

  • Lightweight deployment: Choose MobileNetV2 + UNet combination
  • Accuracy focus: Use ResNet50 + UNet++ combination
  • Balanced approach: EfficientNet-B3 + FPN mid-range configuration
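
Under the hood, these options map naturally onto the segmentation_models.pytorch API. A minimal sketch of turning the YAML values into a model, a combined BCE + Dice loss, and an optimizer (illustrative only, not the project's exact code):

import yaml
import torch
import segmentation_models_pytorch as smp

with open("config/train_config.yaml") as f:
    cfg = yaml.safe_load(f)

# Build the segmentation network from the config (shown here for the "UNet" architecture)
model = smp.Unet(
    encoder_name=cfg["model"]["encoder_name"],   # e.g. "mobilenet_v2"
    encoder_weights="imagenet",                  # start from pretrained encoder weights
    in_channels=3,
    classes=1,                                   # a single "head" class
)

# Combined BCE + Dice loss, matching loss_function: "bce_dice"
bce = smp.losses.SoftBCEWithLogitsLoss()
dice = smp.losses.DiceLoss(mode="binary")
def bce_dice_loss(logits, targets):
    return bce(logits, targets) + dice(logits, targets)

optimizer = torch.optim.AdamW(model.parameters(), lr=cfg["training"]["learning_rate"])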

Performance Comparison of Popular Models

Model Architecture | Encoder     | Dice Score | IoU   | Model Size | Inference Speed
UNet               | MobileNetV2 | 0.95+      | 0.90+ | 9 MB       | 50+ FPS
UNet               | ResNet34    | 0.96+      | 0.92+ | 25 MB      | 30+ FPS
UNet++             | ResNet34    | 0.97+      | 0.93+ | 35 MB      | 25+ FPS
DeepLabV3+         | ResNet50    | 0.96+      | 0.92+ | 45 MB      | 20+ FPS

Performance optimization tips:

  • Limited hardware: Choose MobileNet series encoders
  • Accuracy priority: Use ResNet50/101 backbone
  • Speed sensitivity: Enable model quantization
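
For reference, both metrics in the table are simple overlap measures between the predicted and ground-truth masks; a minimal NumPy sketch:

import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute the Dice coefficient and IoU for two binary masks (values 0/1)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou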

Practical Deployment Strategies

ONNX Format Export

python convert_to_onnx.py --model outputs/model_best.pth
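
The convert_to_onnx.py script handles this step; the core of such an export is PyTorch's built-in exporter, roughly as follows (assuming a full-model checkpoint and the 512x512 input size):

import torch

model = torch.load("outputs/model_best.pth", map_location="cpu")
model.eval()

dummy_input = torch.randn(1, 3, 512, 512)   # matches the configured input size
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["image"],
    output_names=["mask"],
    opset_version=12,
    dynamic_axes={"image": {0: "batch"}, "mask": {0: "batch"}},  # allow variable batch size
)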

Model Quantization & Compression

# 8-bit integer quantization (4x size reduction)
python optimize_onnx_model.py model.onnx --quantize uint8

# ORT format conversion (improved inference speed)
python optimize_onnx_model.py model.onnx --ort --benchmark

Deployment scenario adaptations:

  • Mobile devices: Use uint8 quantized version
  • Hospital servers: FP16 precision optimal
  • Cloud APIs: Original ONNX model
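
Whichever variant is deployed, the exported model can be served with ONNX Runtime. A minimal sketch of loading and running it (input preprocessing must match training):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# A preprocessed image batch of shape (1, 3, 512, 512), float32, values in [0, 1]
batch = np.random.rand(1, 3, 512, 512).astype(np.float32)
logits = session.run(None, {input_name: batch})[0]
mask = (1.0 / (1.0 + np.exp(-logits)) > 0.5).astype(np.uint8)   # sigmoid + threshold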

Essential Command Reference

# Data preparation
make masks      # Generate mask files
make preprocess # Data preprocessing
make split      # Dataset splitting

# Model training
make train      # Complete training workflow
make demo       # Demonstration mode

# Model testing
make inference  # Batch prediction
make benchmark  # Performance testing

# Project management
make clean      # Clean output files
make status     # Show project status

Interactive Web Demo

The project ships with a built-in real-time demonstration system:

  1. Navigate to web directory: cd web
  2. Start server: python -m http.server 8000
  3. Access via browser: http://localhost:8000
  4. Upload baby photo to instantly view segmentation results

Demo highlights:

  • Real-time segmentation rendering
  • Overlay contour display
  • Automatic key measurement calculation
  • Result export functionality

Technical Requirements

  • Python 3.8+
  • PyTorch 1.9+
  • OpenCV for image processing
  • NumPy for scientific computing
  • segmentation_models.pytorch library

The complete dependency list is in requirements.txt.


Contribution Guidelines

We welcome code contributions:

  1. Fork the project repository
  2. Create feature branch: git checkout -b feature/new-feature
  3. Commit code changes
  4. Open a Pull Request

Suggested contribution areas:

  • New model architectures
  • Data augmentation modules
  • Performance optimization solutions
  • Documentation improvements

License & Citation

License: GNU GPL v3.0 (LICENSE)

Academic citation format:

@misc{baby-head-seg,
  title={Baby Head Image Segmentation System},
  author={voyax},
  year={2025},
  url={https://github.com/voyax/baby-head-seg}
}

Acknowledgements

Special thanks to:

  • segmentation_models.pytorch team for base models
  • Labelme developers for annotation tools
  • PyTorch framework supporters
  • Open source community contributors

Frequently Asked Questions (FAQ)

Q1: How much data is needed to train a usable model?

A: Around 50+ annotated samples are enough for initial validation. For production use, 300+ diverse samples covering various lighting conditions, angles, and hair styles are recommended.

Q2: Can this run on regular computers?

A: CPU inference is supported, but NVIDIA GPU acceleration is recommended. Quantized models can run on embedded devices like Raspberry Pi.

Q3: Is this suitable for clinical diagnosis?

A: This tool provides auxiliary measurement functionality. Diagnostic decisions should be made by medical professionals considering complete clinical data.

Q4: How does it handle occlusions?

A: The model handles minor occlusions (like monitoring patches) through data augmentation. Severely occluded images require re-capturing.

Q5: Does it support video stream processing?

A: The current version processes single frames. Batch processing scripts can analyze videos, but real-time video support requires additional development.
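
As a rough illustration, frame-by-frame analysis of a recorded video can be scripted with OpenCV around the single-image workflow (hypothetical sampling interval and paths):

import cv2

cap = cv2.VideoCapture("baby_video.mp4")
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 30 == 0:                                    # roughly one frame per second at 30 FPS
        cv2.imwrite(f"frames/frame_{frame_idx:05d}.jpg", frame)
        # each saved frame can then be passed to src/inference.py as a normal image
    frame_idx += 1
cap.release()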

Q6: How to improve boundary precision?

A: Three optimization approaches: 1) add boundary-focused training samples; 2) apply CRF post-processing; 3) use boundary-sensitive loss functions (see the sketch below).
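
As a minimal sketch of option 3 (illustrative only, assuming binary masks of shape (B, 1, H, W)), a BCE loss can be re-weighted to emphasize pixels near the ground-truth boundary:

import torch
import torch.nn.functional as F

def boundary_weighted_bce(logits, targets, boundary_weight=5.0):
    """BCE loss that up-weights pixels in a thin band around the mask boundary."""
    kernel = torch.ones(1, 1, 3, 3, device=targets.device)
    neighborhood = F.conv2d(targets, kernel, padding=1)       # count of foreground pixels in each 3x3 window
    dilated = torch.clamp(neighborhood, 0, 1)                 # morphological dilation
    eroded = (neighborhood == kernel.numel()).float()         # morphological erosion
    boundary = dilated - eroded                               # 1 on the boundary band, 0 elsewhere
    weights = 1.0 + boundary_weight * boundary
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)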