SmolML: Machine Learning from Scratch, Made Clear!
Introduction
SmolML is a pure Python machine learning library built entirely from the ground up for educational purposes. It aims to provide a transparent, understandable implementation of core machine learning concepts. Unlike powerful libraries such as Scikit-learn, PyTorch, or TensorFlow, SmolML is built using only pure Python and its basic collections, random, and math modules. No NumPy, no SciPy, no C++ extensions: just Python, all the way down. The goal isn’t to compete with production-grade libraries on speed or features, but to help users understand how ML really works.
Core Components
Automatic Differentiation & N-Dimensional Arrays
The foundation of SmolML is a custom N-dimensional array type and an autograd engine. The automatic differentiation engine (Value) tracks the operations applied to values and computes gradients automatically, which is the heart of training neural networks. The N-dimensional arrays (MLArray) are inspired by NumPy and support the common mathematical operations needed for ML. Because they are written in pure Python they are extremely inefficient, but that same simplicity makes them ideal for understanding how N-dimensional arrays work.
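To make the idea of an autograd engine concrete, here is a minimal scalar sketch in the spirit of Value. It is an illustrative assumption, not SmolML’s actual class: the names Value, tanh, and backward are chosen for this example, and only addition, multiplication, and tanh are supported. Each operation records its inputs plus a small closure that applies the chain rule, and backward() runs those closures from the output back to the inputs.

```python
import math


class Value:
    """Hypothetical scalar autograd node for illustration; not SmolML's actual Value class."""

    def __init__(self, data, children=()):
        self.data = float(data)
        self.grad = 0.0                # d(output)/d(self), filled in by backward()
        self._prev = set(children)     # nodes this value was computed from
        self._backward = lambda: None  # pushes self.grad to the children (chain rule)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    # Support expressions such as 2 + x and 3 * x.
    __radd__ = __add__
    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topological order: every node appears after the nodes it was computed from.
        order, visited = [], set()

        def visit(node):
            if node not in visited:
                visited.add(node)
                for child in node._prev:
                    visit(child)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule at each node.
        for node in reversed(order):
            node._backward()
```

For example, for z = x * y + x with x = 2 and y = 3, calling z.backward() leaves x.grad == 4.0 (that is, y + 1) and y.grad == 2.0. Gradients accumulate with +=, which is what makes a value used twice (like x here) receive contributions from both uses.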
Preprocessing Tools
SmolML provides essential preprocessing tools, including the scalers StandardScaler and MinMaxScaler, which are fundamental for preparing data. Many algorithms, especially gradient-based and distance-based ones, perform better when features are on a similar scale, and these tools help achieve that.
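As a rough sketch of what the two scalers compute, the hypothetical functions standard_scale and min_max_scale below transform a single feature column (a list of floats); they are not SmolML’s StandardScaler or MinMaxScaler API. Standardization here uses the population standard deviation, and both functions map a constant column to zeros to avoid dividing by zero.

```python
import math


def standard_scale(column):
    """Hypothetical helper: rescale to zero mean and unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:
        return [0.0 for _ in column]  # constant feature: avoid division by zero
    return [(x - mean) / std for x in column]


def min_max_scale(column):
    """Hypothetical helper: rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: avoid division by zero
    return [(x - lo) / (hi - lo) for x in column]
```

In practice a scaler is fitted on the training data and the same statistics (mean and standard deviation, or minimum and maximum) are reused to transform new data; this sketch skips that fit/transform split for brevity.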
Build Your Own Neural Networks
SmolML allows you to build your own neural networks with various components:
Activation Functions
It offers non-linearities like relu, sigmoid, softmax, and tanh that allow networks to learn complex patterns.
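The four activation functions fit in a few lines of pure Python. This is a minimal sketch on plain floats (and a list of floats for softmax), not necessarily how SmolML implements them on its own array and autograd types. The sigmoid branch and the max-subtraction in softmax are standard tricks to keep math.exp from overflowing.

```python
import math


def relu(x):
    # max(0, x): passes positive inputs through, zeroes out negative ones.
    return max(0.0, x)


def sigmoid(x):
    # Numerically stable logistic function 1 / (1 + e^-x).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # for negative x, e^x cannot overflow
    return z / (1.0 + z)


def tanh(x):
    return math.tanh(x)


def softmax(xs):
    # Subtracting the max does not change the result but keeps exp() from overflowing.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, softmax([1000.0, 1000.0]) returns [0.5, 0.5], whereas exponentiating 1000.0 directly would raise an OverflowError.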
Weight Initializers
Smart strategies like Xavier and He are provided to set initial network weights for stable training.
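A sketch of the two schemes, assuming the common “Xavier uniform” and “He normal” variants (each scheme also has a normal or uniform counterpart, and SmolML may use a different one). The functions xavier_uniform and he_normal are hypothetical names that return a fan_in by fan_out weight matrix as nested lists. Both scale the spread of the initial weights by the layer size so that activations neither explode nor vanish as they pass through many layers.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=random):
    # Xavier/Glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def he_normal(fan_in, fan_out, rng=random):
    # He/Kaiming normal: N(0, std^2) with std = sqrt(2 / fan_in), suited to ReLU layers.
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]
```

Passing a seeded random.Random instance as rng makes the initialization reproducible; Xavier is usually paired with tanh or sigmoid layers and He with ReLU layers.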
Loss Functions
You can use loss functions such as mse_loss, binary_cross_entropy, and categorical_cross_entropy to measure model error.
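The formulas behind the three losses can be sketched on plain Python lists. The (y_true, y_pred) argument order and the clipping constant EPS are assumptions for this example rather than SmolML’s API. Predictions are clipped away from 0 (and from 1 for the binary case) so that the logarithms stay finite.

```python
import math

EPS = 1e-12  # assumed clipping constant to keep log() finite


def mse_loss(y_true, y_pred):
    # (1/n) * sum (y - y_hat)^2
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_pred):
    # -(1/n) * sum [y * log(p) + (1 - y) * log(1 - p)], with p a probability in (0, 1)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_true, y_pred):
    # y_true: one-hot rows; y_pred: probability rows (e.g. softmax outputs).
    # -(1/n) * sum over samples of sum_k y_k * log(p_k)
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total += -sum(t * math.log(max(p, EPS)) for t, p in zip(t_row, p_row))
    return total / len(y_true)
```

For instance, mse_loss([1.0, 2.0], [1.0, 4.0]) is 2.0, and a binary prediction of 0.5 for a positive label costs log(2), roughly 0.693.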
Optimizers
Algorithms like SGD, Adam, and AdaGrad are available to update model weights based on gradients to minimize loss.
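The update rules of the three optimizers can be sketched as small hypothetical classes whose step method takes a flat list of parameters and the matching gradients and returns updated parameters. This is not SmolML’s optimizer interface, and the default hyperparameters follow common conventions (for example Adam’s beta1=0.9 and beta2=0.999) rather than SmolML’s own defaults.

```python
import math


class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, params, grads):
        # w <- w - lr * g
        return [p - self.lr * g for p, g in zip(params, grads)]


class AdaGrad:
    def __init__(self, lr=0.01, eps=1e-8):
        self.lr, self.eps, self.cache = lr, eps, None

    def step(self, params, grads):
        if self.cache is None:
            self.cache = [0.0] * len(params)
        # Accumulate squared gradients; parameters with large past gradients take smaller steps.
        self.cache = [c + g * g for c, g in zip(self.cache, grads)]
        return [p - self.lr * g / (math.sqrt(c) + self.eps)
                for p, g, c in zip(params, grads, self.cache)]


class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        # Exponential moving averages of the gradient (m) and squared gradient (v).
        self.m = [self.beta1 * m + (1 - self.beta1) * g for m, g in zip(self.m, grads)]
        self.v = [self.beta2 * v + (1 - self.beta2) * g * g for v, g in zip(self.v, grads)]
        # Bias correction compensates for m and v starting at zero.
        m_hat = [m / (1 - self.beta1 ** self.t) for m in self.m]
        v_hat = [v / (1 - self.beta2 ** self.t) for v in self.v]
        return [p - self.lr * mh / (math.sqrt(vh) + self.eps)
                for p, mh, vh in zip(params, m_hat, v_hat)]
```

For instance, minimizing (w - 3)^2 from w = 0 by repeatedly calling w = opt.step(w, [2 * (w[0] - 3)]) drives w toward 3 with any of the three. Because of its bias correction, Adam’s very first step moves each parameter by roughly lr, regardless of the gradient’s magnitude.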
Classic ML Models
SmolML also includes implementations of classic ML models:
Regression
It provides implementations of Linear and Polynomial regression for predicting continuous values.
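Here is a minimal sketch of both models, assuming batch gradient descent on the mean squared error (a closed-form least-squares solution would also work). Polynomial regression is simply linear regression on expanded features [x, x^2, ..., x^d]. The helpers polynomial_features, fit_linear, and predict are hypothetical, and the gradients are derived by hand rather than through an autograd engine.

```python
def polynomial_features(xs, degree):
    # Expand each scalar x into [x, x^2, ..., x^degree].
    return [[x ** d for d in range(1, degree + 1)] for x in xs]


def predict(X, w, b):
    # y_hat = w . x + b for each row x.
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def fit_linear(X, y, lr=0.01, epochs=5000):
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        # Gradient of the MSE (1/n) * sum (w . x + b - y)^2 w.r.t. each weight and the bias.
        grad_w, grad_b = [0.0] * k, 0.0
        for row, target in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, row)) + b - target
            for j in range(k):
                grad_w[j] += 2 * err * row[j] / n
            grad_b += 2 * err / n
        # Move against the gradient.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

For instance, fitting y = 2x + 1 on x = 0..4 with polynomial_features(xs, 1) and lr=0.05 recovers a weight close to 2.0 and a bias close to 1.0. Because higher powers of x grow quickly, polynomial features usually need scaling (or a smaller learning rate) for gradient descent to stay stable.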
Tree-Based Models
You can use Decision Tree and Random Forest implementations for classification and regression tasks.
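The core of a classification tree is choosing the split that most reduces impurity. The sketch below uses Gini impurity with exhaustive thresholds on every feature, and builds a random forest by bagging: each tree is trained on a bootstrap sample and the forest takes a majority vote. All function names are hypothetical, the sketch covers classification only, and unlike a full random forest it does not subsample features at each split. Internal nodes are (feature, threshold, left, right) tuples and leaves are class labels, so labels must not themselves be tuples.

```python
import random
from collections import Counter


def gini(labels):
    # Gini impurity: 1 - sum_k p_k^2, where p_k is the fraction of labels in class k.
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature_index, threshold) for the best split, or None."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send at least one sample each way
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or gini(y) == 0.0:
        return majority  # leaf: depth limit reached or node already pure
    split = best_split(X, y)
    if split is None:
        return majority  # leaf: all samples share the same feature values
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict_tree(node, row):
    # Follow thresholds down from the root until reaching a leaf label.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


def fit_forest(X, y, n_trees=10, max_depth=3, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth=max_depth))
    return trees


def predict_forest(trees, row):
    # Majority vote over the individual trees' predictions.
    return Counter(predict_tree(tree, row) for tree in trees).most_common(1)[0][0]
```

A regression tree follows the same recursive structure but scores splits by variance (or squared error) and stores the mean target in each leaf; a real random forest additionally considers only a random subset of features at each split to decorrelate the trees.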
K-Means Clustering
The KMeans clustering algorithm is available for grouping similar data points together.
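K-means is usually fitted with Lloyd’s algorithm: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until the centroids stop changing. The sketch below follows that scheme but is not necessarily SmolML’s implementation. The function kmeans and its parameters are hypothetical; initialization simply picks k random data points (simpler than the k-means++ scheme many libraries use), and math.dist requires Python 3.8 or newer.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Hypothetical k-means (Lloyd's algorithm) on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialise the centroids with k randomly chosen data points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters
```

Because the result depends on the initial centroids, k-means can settle in a poor local optimum on harder data; running it with several seeds and keeping the solution with the lowest within-cluster distance is a common remedy.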
Who is SmolML For?
SmolML is suitable for students learning ML concepts for the first time, developers curious about the internals of ML libraries they use daily, and educators looking for a simple, transparent codebase to demonstrate ML principles. It’s also for anyone who enjoys learning by building!
Limitations
It’s important to note that SmolML is built for learning, not for breaking speed records or handling massive datasets. Being pure Python, it’s much slower than libraries using optimized C/C++/Fortran backends. It’s best suited for small datasets and toy problems where understanding the mechanics is more important than computation time. SmolML is not recommended for production applications; instead, use battle-tested libraries like Scikit-learn, PyTorch, TensorFlow, JAX, etc., for real-world tasks.
Getting Started with SmolML
You can start using SmolML by cloning the repository and exploring the code and examples:
git clone https://github.com/rodmarkun/SmolML
cd SmolML
You can also run the tests in the tests/ folder, which compare SmolML against standard libraries such as TensorFlow and sklearn and generate plots with matplotlib. To do so, install the packages listed in requirements.txt:
cd tests
pip install -r requirements.txt
Contributing to SmolML
Contributions to SmolML are always welcome. If you’re interested in contributing, you can fork the repository and create a new branch for your changes. Once you’re done, submit a pull request to merge your changes into the main branch.
Supporting SmolML
If you find SmolML useful and want to support it, you can star the project on GitHub, donate to the Ko-fi page, or share the project with your friends.
By learning and using SmolML, you can gain a deeper understanding of the core principles of machine learning and build a solid foundation for further work in the field.