verigen is a DSPy-native framework for verifiable code generation through evolutionary optimization. Define what you need — it evolves the implementation.
Define your task with a seed program, an evaluator, and a description. verigen handles the rest.
Seed code with # EVOLVE-BLOCK markers. The LLM generates the full program; markers guide where to mutate.
Exports evaluate(code) → dict with score, passed, feedback. Hard constraints stop bad candidates, continuous metrics drive improvement.
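The contract above can be sketched as a minimal evaluator. This is an illustrative skeleton, not verigen's internals: the field names (`score`, `passed`, `feedback`) come from the docs, while the specific checks are hypothetical.

```python
def evaluate(code: str) -> dict:
    """Minimal sketch of the evaluate(code) -> dict contract.
    The hard constraint below (solve(2) == 4) is illustrative."""
    try:
        ns: dict = {}
        exec(code, ns)        # run the candidate program
        fn = ns["solve"]
        assert fn(2) == 4     # hard constraint: must be correct
    except Exception as exc:
        return {"score": 0.0, "passed": False, "feedback": str(exc)}
    return {"score": 1.0, "passed": True, "feedback": "all checks passed"}
```

A failing hard constraint maps to `passed=False`, which is what stops a bad candidate; the continuous `score` is what drives improvement between passing candidates.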
Task description + rich context. The first heading is the short description; full content guides the LLM toward the right implementation.
- ✓ **Accept** if `passed=True` and the score improves
- ✗ **Reject** if hard constraints fail
- ⏹ **Stop early** at the score threshold or on plateau
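The accept/reject/stop decision above amounts to a small rule. A sketch under assumed names (the function and threshold default are hypothetical, not verigen's API):

```python
def decide(candidate: dict, best_score: float, threshold: float = 0.95) -> str:
    """Illustrative greedy hill-climbing rule over evaluate() results."""
    if not candidate["passed"]:
        return "reject"        # hard constraint failed
    if candidate["score"] >= threshold:
        return "stop"          # score threshold reached
    if candidate["score"] > best_score:
        return "accept"        # strictly better: keep it
    return "reject"            # no improvement: discard
```

Greedy acceptance means the best-so-far candidate only ever improves; plateau detection (not shown) stops the loop when too many consecutive rejections accumulate.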
Everything you need to evolve correct, performant Python code — without leaving the terminal.
LLM-guided code improvement with feedback from each evaluation. Greedy hill-climbing that keeps only better candidates.
Candidates that fail tests are automatically rejected. If the seed generation can't pass, the loop stops immediately — no wasted iterations.
Optimize latency, accuracy, throughput — whatever your evaluate() returns. Higher score wins, guided by sigmoid normalization.
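Bounded normalization keeps raw metrics comparable regardless of units. The mappings below are one simple sigmoid-style choice, not necessarily verigen's exact formula:

```python
def latency_score(avg_us: float) -> float:
    """Map average latency (µs, lower is better) into (0, 1]."""
    return 1.0 / (1.0 + avg_us)

def throughput_score(ops_per_s: float) -> float:
    """Map throughput (higher is better) into [0, 1)."""
    return ops_per_s / (ops_per_s + 1.0)
```

Either way, the evaluator is responsible for orienting the metric so that higher `score` always means better.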
Leverages DSPy modules, signatures, and assertions. Works with OpenAI, Anthropic, Google, Ollama, llama.cpp, vLLM — any DSPy-compatible provider.
See per-iteration scores, timing, and best-so-far marks in real time. Statuses: completed, threshold_reached, plateau, initial_failed.
Ships as a pi agent Skill. Auto-discovered at the project root — your agent can scaffold tasks, run evolution, and return optimized code.
A simple task to double an integer — seed code, evaluator, and the result after evolution.
```python
def solve(n: int) -> int:
    """Return n * 2."""
    # EVOLVE-BLOCK-START
    raise NotImplementedError("Replace this!")
    # EVOLVE-BLOCK-END
```
```python
import time

def evaluate(code: str) -> dict:
    ns = {}
    exec(code, ns)
    fn = ns["solve"]
    # Correctness (hard constraints)
    assert fn(5) == 10
    assert fn(0) == 0
    # Performance
    t0 = time.perf_counter()
    for _ in range(1000):
        fn(100)
    avg_us = (time.perf_counter() - t0) / 1000 * 1e6
    return {
        "score": 1.0 / (avg_us + 1.0),  # lower latency -> higher score
        "passed": True,
        "feedback": f"Avg {avg_us:.1f}µs/call",
        "metrics": {"avg_us": avg_us},
    }
```
| Task | Pattern | Difficulty | Notes |
|---|---|---|---|
| tasks/palindrome/ | String processing | Easy | Simple correctness + speed |
| tasks/game_of_life/ | Matrix computation | Medium | Padding vs vectorized |
| tasks/levenshtein/ | DP algorithm | Medium | 2D → 1D row optimization |
| tasks/lru_cache/ | Data structure | Medium | OrderedDict → linked list |
| tasks/topological_sort/ | Graph algorithm | Medium | Kahn's vs DFS optimization |
| tasks/prefilter/ | AST walking | Medium | Dogfooding: ast.walk vs Python recursion |
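As an example of the kind of rewrite evolution can discover, the `tasks/levenshtein/` "2D → 1D row" optimization keeps only one previous DP row instead of the full matrix. A sketch (not the repo's actual solution):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with O(len(b)) memory: only the previous DP row is kept."""
    prev = list(range(len(b) + 1))          # row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]                          # cost of deleting i chars of a
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),   # substitution (free on match)
            ))
        prev = curr
    return prev[-1]
```

Both versions are equally correct, so the continuous metric (memory or speed) is what steers evolution from the 2D form toward this one.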
All options for the CLI interface.
Option values include a DSPy-style model identifier (`openai/gpt-4o`, `ollama_chat/qwen3.6`, etc.), an `auto` default, and a custom API base URL (e.g. `http://localhost:8080/v1`) for local servers.
Import and integrate directly into your Python workflows.
```python
from verigen import VerifiableCodeGen, load_task

# Load a task
task = load_task("tasks/palindrome/")

# Configure evolution
gen = VerifiableCodeGen(
    max_iterations=50,
    score_threshold=0.95,
)

# Run
result = gen(task)

# Results
print(f"Score: {result.best_score:.4f}")
print(result.best_code)

# Full iteration history
for entry in result.trace.entries:
    badge = "✓" if entry.passed else "✗"
    print(f"[{entry.iteration:3d}] {entry.phase:7s} {badge} score={entry.score:.4f}")
```
Install verigen and run your first evolution in minutes.