A framework for fine-tuning local LLMs to generate, debug, and optimize code solutions through automated iterative improvement.
CodeEvolveLLM is a framework that uses a local LLM—specifically the Qwen2.5-coder 7B model—to generate, debug, and improve code through a reinforcement learning (RL) loop. The goal is to train the LLM not only to produce code but also to refine its own outputs by learning from its mistakes, especially when solving text-based coding problems (e.g., data structures and algorithms challenges).
Many current LLMs can write code, but they often struggle to fix their own errors or optimize their output. With CodeEvolveLLM, we aim to build a self-improving system that:
Generates code with a chain of thought.
Tests the code in real-time.
Iteratively fixes any bugs.
Learns from correct solutions using RL.
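The four steps above can be sketched as a single generate-test-fix-learn loop. The sketch below is illustrative only: `StubModel`, `run_tests`, and `improvement_loop` are hypothetical names standing in for the project's real components, and the stub model simply "fixes" its code on the second attempt.

```python
from dataclasses import dataclass, field

@dataclass
class StubModel:
    """Stand-in model: the first attempt is buggy, the 'fix' is correct."""
    attempts: list = field(default_factory=lambda: [
        "def add(a, b): return a - b",   # buggy first generation
        "def add(a, b): return a + b",   # corrected reattempt
    ])
    def generate(self, problem): return self.attempts[0]
    def fix(self, problem, code, feedback): return self.attempts[1]

def run_tests(code, tests):
    """Run candidate code with Python's built-in execution and check outputs."""
    ns = {}
    exec(code, ns)  # naive built-in execution, as in the prototype
    for args, expected in tests:
        if ns["add"](*args) != expected:
            return False, f"add{args} != {expected}"
    return True, ""

def improvement_loop(model, problem, tests, max_retries=2):
    """Generate, test, and iteratively fix; only passing code is kept."""
    code = model.generate(problem)
    for _ in range(max_retries + 1):
        ok, feedback = run_tests(code, tests)
        if ok:
            return code  # a correct chain, usable for RL fine-tuning
        code = model.fix(problem, code, feedback)
    return None  # unsolved after the retry budget
```

Here, only code that eventually passes is returned, mirroring the rule that only verified solutions feed back into training.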
This project makes it easier to create a robust, open-source, local coding assistant that runs on your own hardware—ideal for learning, prototyping, and building new tools.
Local LLM Deployment: Run the Qwen2.5-coder 7B model locally after fine-tuning.
Iterative Code Generation: Generate a code chain, test it, and allow up to a few reattempts if errors occur.
Reinforcement Learning Loop: Reward correct outputs (and later compare time and space costs if needed) so that the model learns to generate better code over time.
Dataset Generation: Use the Gemini Flash API to automatically build a dataset of 1,000+ examples of correct code chains for DSA problems.
CLI Tool: A command-line interface to interact with the model, test code, and run training or inference.
Future-Proofing: Start with Python’s built-in code execution, with a plan to add Docker-based isolation for better security in future updates.
Base Model: Qwen2.5-coder 7B, chosen for its speed and performance.
Training Process: Fine-tune the LLM on a cloud platform (e.g., Google Colab) using the generated dataset of correct code chains.
Local Inference: The final LLM model is designed to run on consumer-grade hardware.
Code Interpreter: Initially, we use Python's built-in execution (e.g., exec()) to run generated code and verify outputs. This can later be upgraded to a Docker-based system for better safety.
Iterative Debugging: If the code does not produce the correct output, the model is prompted to fix its errors and try again. Only code that eventually passes the tests is used for fine-tuning.
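A slightly safer stepping stone between raw exec() and full Docker isolation is running each candidate in a separate Python process with a timeout. This is a minimal sketch, not the project's actual interpreter; the function name and return shape are assumptions.

```python
import os
import subprocess
import sys
import tempfile

def execute_candidate(code: str, stdin: str = "", timeout: float = 5.0):
    """Run generated code in a child Python process with a time limit.

    Returns (ok, stdout, stderr). Process isolation limits crashes and
    infinite loops, but is NOT a security sandbox—Docker is the planned
    upgrade for that.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            input=stdin, capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "", "timeout"
    finally:
        os.unlink(path)
```

The boolean result plugs directly into the debugging loop: a False triggers a re-prompt with the captured stderr as feedback.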
The current reward formula is:
Score = C × (w1 * [1 / (1 + T)] + w2 * [1 / (1 + S)] + w3 * R)
C (Correctness): 1 if the code works, 0 if not.
T (Time Cost): Fewer steps lead to a higher reward.
S (Space Cost): Less memory used leads to a higher reward.
R (Readability): A score (0 to 1) for code clarity (to be used later if needed).
Weights (w1, w2, w3): They sum to 1. For now, we primarily focus on correctness (C) and may use T and S to break ties when multiple correct solutions exist.
For the initial version, we will mainly reward correct outputs and will add more nuance later as the project evolves.
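The reward formula translates directly into a small scoring function. The weight values below are illustrative placeholders, not tuned settings; per the note above, the initial version effectively cares only about the correctness gate C.

```python
def score(correct: bool, time_cost: float, space_cost: float,
          readability: float = 0.0, weights=(0.5, 0.4, 0.1)) -> float:
    """Score = C * (w1 * 1/(1+T) + w2 * 1/(1+S) + w3 * R).

    correct      -> C, gates the whole reward (incorrect code scores 0)
    time_cost    -> T, lower is better
    space_cost   -> S, lower is better
    readability  -> R in [0, 1], unused for now (defaults to 0)
    weights      -> (w1, w2, w3), must sum to 1; values here are illustrative
    """
    w1, w2, w3 = weights
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9, "weights must sum to 1"
    c = 1.0 if correct else 0.0
    return c * (w1 / (1.0 + time_cost) + w2 / (1.0 + space_cost) + w3 * readability)
```

Because C multiplies the whole expression, any incorrect solution scores exactly 0, while T and S break ties among correct solutions—a cheaper correct solution always outscores a costlier one.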
Source: Use Gemini Flash via its free API to generate chains of code that solve DSA problems.
Scale: Aim to create a dataset of at least 1,000 examples covering a range of difficulties—from easy to hard.
Format: Store the generated examples in JSON, making it easy to extract the code (using markers like <code> tags) for testing and training.
Parallel Generation: Future updates might include parallel dataset generation to speed up the process.
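Extracting code from a stored example is then a matter of matching the markers. The JSON field names below (`problem`, `chain`) are an assumed shape for illustration; only the `<code>` marker convention comes from the plan above.

```python
import json
import re

# Matches the code span between <code> markers; DOTALL lets it cross newlines.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def extract_code(example: dict) -> "str | None":
    """Pull the runnable code out of a stored chain-of-thought example.

    The 'chain' key is an assumption about the dataset schema.
    """
    match = CODE_RE.search(example["chain"])
    return match.group(1).strip() if match else None

# One record as it might appear in the generated JSON dataset (illustrative).
record = json.loads(
    '{"problem": "reverse a list", '
    '"chain": "Think step by step... <code>def rev(xs): return xs[::-1]</code>"}'
)
```

Keeping the markers in the stored chain means the same record serves both purposes: the full chain for fine-tuning and the extracted code for automated testing.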
Prototype Development:
Build the basic CLI tool.
Implement the built-in Python code executor.
Create a simple RL loop that rewards correct code.
Dataset Creation:
Generate and store 1,000+ DSA problem examples using the Gemini Flash API.
Ensure each example includes a full chain-of-thought that leads to a correct solution.
Model Fine-Tuning:
Fine-tune the Qwen2.5-coder 7B model on the generated dataset.
Test the model’s ability to generate and self-correct code.
Enhancements:
Add safety features and possibly Docker-based code execution.
Incorporate additional reward metrics (like time and space complexity) for tie-breaking.
Expand dataset generation with parallel processing if needed.
Future Features:
Develop a demo web interface.
Compare performance with other code-generating LLMs.
CodeEvolveLLM is an ambitious project that aims to push the boundaries of local LLM performance in code generation. By combining iterative code improvement, reinforcement learning, and a robust dataset of real-world coding challenges, this project aspires to create a tool that not only writes code but also learns to fix and optimize its own output. Whether you are a student, a developer, or a researcher, CodeEvolveLLM provides a new way to harness the power of LLMs for coding tasks—locally and securely.