Step-by-Step Guide to Fine-Tuning Your Own LLM on Windows 10 Using CPU Only with LLaMA-Factory

SINAPSA Infocomplex


Introduction

Large Language Models (LLMs) have revolutionized AI applications, but accessing GPU resources for fine-tuning remains a barrier for many developers. This guide provides a detailed walkthrough for fine-tuning LLMs using only a CPU on Windows 10 with LLaMA-Factory 0.9.2. Whether you’re customizing models for niche tasks or experimenting with lightweight AI solutions, this tutorial ensures accessibility without compromising technical rigor.


Prerequisites and Setup

1. Install Python 3.12.9

Download the Python 3.12.9 installer from the official website. After installation, you can optionally clear pip's download cache:

pip cache purge

2. Create a Project Directory

Organize your workspace on a drive (e.g., Drive D:):

D:
mkdir lafa

3. Clone and Install LLaMA-Factory

Use Git to clone the repository and install dependencies:

cd D:\lafa
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
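
After installation, you can confirm that the llamafactory-cli entry point is on your PATH (the version subcommand is assumed here; any response from the CLI is enough):

llamafactory-cli version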

Dataset Preparation and Configuration

4. Set Up Custom Datasets

Create a datasets_mymodels folder in D:\lafa and copy these required files into it from LLaMA-Factory's data folder (a sample dataset_info.json entry is shown after the list):

  • identity.json
  • dataset_info.json
  • c4_demo.jsonl
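
For orientation, a minimal dataset_info.json with the stock identity dataset plus one custom Alpaca-style file might look like the sketch below (my_dataset.json is a placeholder name; the column mapping follows LLaMA-Factory's Alpaca format):

{
  "identity": {
    "file_name": "identity.json"
  },
  "my_dataset": {
    "file_name": "my_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}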

5. Configure Default Paths

Edit LLaMA-Factory\src\llamafactory\webui\common.py to define your directories:

DEFAULT_DATA_DIR = "D:/lafa/datasets_mymodels"  # Dataset storage
DEFAULT_SAVE_DIR = "D:/lafa/lafa_llms_created"  # Model output

6. Install Dependencies

Navigate to the LLaMA-Factory root folder and run:

pip install -r requirements.txt

CPU-Specific Adjustments

7. Install CPU-Compatible PyTorch

Replace GPU-dependent PyTorch with the CPU version:

pip uninstall -y torch torchvision torchaudio
pip install torch==2.2.2+cpu torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu
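
To verify that the CPU-only build is active, a quick check is:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

The printed version should end in +cpu and CUDA availability should be False.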

8. Modify Training Scripts

Edit LLaMA-Factory\src\train.py to enforce CPU usage by adding this line before def main(): (add import torch at the top of the file if it is not already imported):

device = torch.device('cpu')
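
As a rough sketch of where this lands (assuming the stock train.py, which wraps run_exp() from llamafactory.train.tuner), the top of the edited file might read:

import torch

from llamafactory.train.tuner import run_exp

device = torch.device('cpu')  # added for step 8: create a CPU device object


def main():
    run_exp()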

9. Adjust WebUI Launch Parameters

In LLaMA-Factory\src\llamafactory\webui\runner.py, update the Popen command to include shell=True:

self.trainer = Popen(["llamafactory-cli", "train", save_cmd(args)], env=env, shell=True)

Launching Training and Monitoring

10. Start the WebUI Interface

From LLaMA-Factory\src, run:

webui --device cpu

Ignore the “CUDA environment not detected” warning. Monitor real-time progress in the command prompt.

11. Troubleshooting Common Issues

  • Training Fails to Start: Verify that llamafactory-cli is installed correctly (see the check in step 3); re-clone the repository and reinstall if needed.
  • Path Errors: Ensure paths in common.py use forward slashes (e.g., D:/lafa).

Model Export and Format Conversion

12. Export as Safetensors

Trained models are saved to LLaMA-Factory\src\saves\Custom\lora\ by default. To customize the output path, update DEFAULT_SAVE_DIR in common.py.

13. Merge Adapter with Base Model

Copy a configuration file (e.g., qwen2vl_lora_sft.yaml from LLaMA-Factory\examples\merge_lora) to your model folder. Modify these entries:

model_name_or_path: "Path_to_HuggingFace_Base_Model"  
adapter_name_or_path: "Path_to_Fine-Tuned_Adapter"  
export_dir: "Output_Path_for_Merged_Model"  
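
For context, the merge configs in LLaMA-Factory's examples typically carry a few more keys; a fuller sketch (key names assumed from those examples, values are placeholders, and template must match your base model) might be:

### model
model_name_or_path: Path_to_HuggingFace_Base_Model
adapter_name_or_path: Path_to_Fine-Tuned_Adapter
template: qwen2_vl
finetuning_type: lora

### export
export_dir: Output_Path_for_Merged_Model
export_size: 2
export_device: cpu
export_legacy_format: false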

Run the export command:

llamafactory-cli export [your_config_file.yaml]

14. Convert to GGUF Format

Install llama.cpp and execute:

python llama.cpp/convert_hf_to_gguf.py [input_model_path] --outfile [output.gguf] --outtype q8_0
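
If llama.cpp is not set up yet, cloning the repository and installing its Python requirements is usually sufficient for the conversion script (paths assume you clone next to your model folders):

git clone https://github.com/ggerganov/llama.cpp.git
pip install -r llama.cpp/requirements.txt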

Note: If conversion fails, validate the config.json file for architecture compatibility.


Model Testing and Deployment

15. Load GGUF into LM Studio

Copy the GGUF file to LM Studio’s model directory (e.g., D:\llm_for_lmstudio\lmstudio_models). The model will appear under “My Models” upon relaunching the software.

16. Validate Model Performance

Test domain-specific knowledge by querying the model. For example, if fine-tuned for medical QA, compare responses before and after training.
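
If you prefer scripted comparisons, LM Studio exposes an OpenAI-compatible local server (assumed here at its default http://localhost:1234/v1, with the server enabled in LM Studio); a minimal Python sketch using the third-party requests package:

import requests

# Hypothetical domain question; swap in prompts from your fine-tuning data.
payload = {
    "model": "your-merged-model",  # the name shown in LM Studio's model list
    "messages": [{"role": "user", "content": "What are common symptoms of anemia?"}],
    "temperature": 0.2,
}

response = requests.post("http://localhost:1234/v1/chat/completions", json=payload, timeout=120)
print(response.json()["choices"][0]["message"]["content"])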


Conclusion

This guide demonstrates that CPU-based LLM fine-tuning is not only feasible but also practical for resource-constrained environments. Key takeaways:

  1. Precision in Configuration: Path formatting and dependency versions are critical.
  2. Iterative Validation: Test workflows with small datasets before scaling.
  3. Future-Proofing: Monitor updates to LLaMA-Factory and llama.cpp for efficiency improvements.