Every AI agent framework — LangChain, CrewAI, LlamaIndex, AutoGen — shares the same fundamental constraint: an agent can only use the tools you give it before it starts. You define search_web, run_sql, send_email, and the agent picks from that list. Miss a tool? The agent fails, hallucinates, or gives up.
What if the agent could recognize the gap itself — and fill it?
This article builds a Self-Extending Agent: an agentic system that, when faced with a task it has no tool for, writes the tool in Python, validates it in a sandbox, registers it into its own tool registry, and uses it — all at runtime, without any human intervention.
Warning: this is advanced material with real security implications. By the end, you will have a working system and a deep understanding of the risks, guardrails, and architecture required to deploy it safely.
The Core Problem With Static Tool Registries
Consider this task given to a standard LangChain agent:
```python
agent.run("Fetch the latest exchange rate for USD/ILS, convert 5000 USD, and save the result to a CSV file.")
```

Your agent has: `search_web`, `run_python`, `read_file`.
It does not have: `fetch_exchange_rate`, `convert_currency`, `write_csv`.
The agent will either:
- Try to hack together a solution using wrong tools
- Hallucinate a tool call that doesn't exist
- Return "I cannot complete this task"
All three outcomes are failures. The fix isn't to pre-load 200 tools. The fix is to let the agent generate what it needs.
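Concretely, the loop this article builds can be sketched in a dozen lines — with a toy generator standing in for the LLM and none of the safety machinery the rest of the article adds. All names here are illustrative, not framework API:

```python
# Minimal sketch of the self-extension loop. `toy_generate` stands in for
# the LLM call; the real system adds validation before exec().
registry = {}  # name -> callable

def toy_generate(tool_name: str) -> str:
    # Stand-in for the LLM: return Python source for the requested tool.
    return f"def {tool_name}(amount, rate):\n    return amount * rate\n"

def run_task(tool_name: str, *args):
    if tool_name not in registry:                    # 1. capability gap detected
        source = toy_generate(tool_name)             # 2. generate the tool
        namespace = {}
        exec(source, namespace)                      # 3. load it (unsafe without a sandbox)
        registry[tool_name] = namespace[tool_name]   # 4. register it
    return registry[tool_name](*args)                # 5. use it

print(run_task("convert_currency", 5000, 3.5))  # → 17500.0
```

Everything that follows is about making steps 2–4 safe, observable, and persistent.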
Architecture Overview
The Self-Extending Agent is composed of five components that work in a loop:
```
┌─────────────────────────────────────────────┐
│             ORCHESTRATOR AGENT              │
│    Receives task → checks tool registry     │
│    → identifies missing capability          │
└──────────────────┬──────────────────────────┘
                   │ "I need: fetch_exchange_rate"
                   ▼
┌─────────────────────────────────────────────┐
│            TOOL GENERATOR AGENT             │
│   LLM writes a Python function from spec    │
│   → returns code as string                  │
└──────────────────┬──────────────────────────┘
                   │ raw Python code
                   ▼
┌─────────────────────────────────────────────┐
│              SANDBOX VALIDATOR              │
│    Runs code in restricted subprocess       │
│    → checks syntax, safety, output contract │
└──────────────────┬──────────────────────────┘
                   │ validated ✅ or rejected ❌
                   ▼
┌─────────────────────────────────────────────┐
│               TOOL REGISTRY                 │
│  Stores tool as callable + persists to disk │
│  → available for all future agent runs      │
└──────────────────┬──────────────────────────┘
                   │ tool now available
                   ▼
┌─────────────────────────────────────────────┐
│            ORCHESTRATOR (resumed)           │
│  Uses new tool → completes original task ✅ │
└─────────────────────────────────────────────┘
```

Project Setup
```shell
pip install langchain langchain-openai openai pydantic
```

Create the project structure:
```
self_extending_agent/
├── main.py
├── registry.py
├── generator.py
├── validator.py
├── orchestrator.py
└── tools/
    └── manifest.json
```

Step 1: The Tool Registry
The registry is the agent's memory for tools. It stores callables in-process and persists generated tool source to disk so they survive restarts.
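The persistence pattern in miniature, stripped of LangChain — a dependency-free sketch in which `persist`/`load_all` are illustrative names and a temp directory stands in for `tools/`:

```python
import importlib.util
import json
import tempfile
from pathlib import Path

# Each tool is a .py file plus a manifest entry, so a fresh process
# can rebuild every tool it ever wrote.
tools_dir = Path(tempfile.mkdtemp())

def persist(name: str, source: str, description: str) -> None:
    (tools_dir / f"{name}.py").write_text(source)
    manifest_path = tools_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest[name] = {"description": description, "file": f"{name}.py"}
    manifest_path.write_text(json.dumps(manifest, indent=2))

def load_all() -> dict:
    tools = {}
    manifest = json.loads((tools_dir / "manifest.json").read_text())
    for name, meta in manifest.items():
        spec = importlib.util.spec_from_file_location(name, tools_dir / meta["file"])
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        tools[name] = getattr(module, name)
    return tools

persist("double", "def double(x: float) -> float:\n    return x * 2\n", "Double a number")
print(load_all()["double"](21))  # → 42
```

The real registry below does exactly this, with LangChain `StructuredTool` wrappers on top.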
In `registry.py`:

```python
import json
import importlib.util
from pathlib import Path
from typing import Callable, Optional

from langchain.tools import StructuredTool

TOOLS_DIR = Path(__file__).parent / "tools"
TOOLS_DIR.mkdir(exist_ok=True)
MANIFEST_PATH = TOOLS_DIR / "manifest.json"


class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, StructuredTool] = {}
        self._load_persisted_tools()

    def register(self, name: str, func: Callable, description: str) -> StructuredTool:
        tool = StructuredTool.from_function(
            func=func,
            name=name,
            description=description,
        )
        self._tools[name] = tool
        return tool

    def get(self, name: str) -> Optional[StructuredTool]:
        return self._tools.get(name)

    def has(self, name: str) -> bool:
        return name in self._tools

    def all_tools(self) -> list[StructuredTool]:
        return list(self._tools.values())

    def tool_names(self) -> list[str]:
        return list(self._tools.keys())

    def persist_tool(self, name: str, source_code: str, description: str):
        tool_file = TOOLS_DIR / f"{name}.py"
        tool_file.write_text(source_code)
        manifest = self._load_manifest()
        manifest[name] = {"description": description, "file": f"{name}.py"}
        MANIFEST_PATH.write_text(json.dumps(manifest, indent=2))

    def _load_manifest(self) -> dict:
        if MANIFEST_PATH.exists():
            return json.loads(MANIFEST_PATH.read_text())
        return {}

    def _load_persisted_tools(self):
        manifest = self._load_manifest()
        for name, meta in manifest.items():
            tool_file = TOOLS_DIR / meta["file"]
            if not tool_file.exists():
                continue
            try:
                spec = importlib.util.spec_from_file_location(name, tool_file)
                module = importlib.util.module_from_spec(spec)
                spec.loader.exec_module(module)
                func = getattr(module, name, None)
                if callable(func):
                    self.register(name, func, meta["description"])
                    print(f"[Registry] Loaded persisted tool: {name}")
            except Exception as e:
                print(f"[Registry] Failed to load {name}: {e}")


registry = ToolRegistry()
```

Step 2: The Tool Generator Agent
This is the heart of the system. It uses an LLM to write a Python function based on a plain-English description of what the tool needs to do.
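One defensive detail worth adding: despite the "Return ONLY the raw Python code" rule in the prompt, models still occasionally wrap their answer in markdown fences, which would guarantee a downstream syntax error. A small helper (my addition, not part of the listing below) strips them cheaply:

```python
def strip_code_fences(text: str) -> str:
    """Remove a leading ```/```python fence and a trailing ``` if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines).strip()

# Works whether or not the model obeyed the prompt:
assert strip_code_fences("```python\ndef f():\n    return 1\n```") == "def f():\n    return 1"
assert strip_code_fences("def f():\n    return 1") == "def f():\n    return 1"
```

Applying it to `response.content` before returning from `generate` costs nothing and removes a whole class of validation failures.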
In `generator.py`:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

GENERATION_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """You are an expert Python engineer. Your job is to write a single, self-contained Python function.

Rules:
- The function name must exactly match the `tool_name` provided.
- The function must have a clear docstring.
- Use only standard library modules OR: requests, httpx, pandas, pydantic.
- Do NOT import from langchain, openai, or any LLM library.
- Do NOT use file system access outside of /tmp.
- Return ONLY the raw Python code. No markdown, no explanation, no ```python blocks.
- The function must handle errors gracefully with try/except.
- Always include type hints.
"""),
    ("human", """Write a Python function with this specification:

Tool name: {tool_name}
Description: {description}
Expected input parameters: {input_params}
Expected return type: {return_type}
Example usage: {example}
"""),
])


class ToolGeneratorAgent:
    def __init__(self, model: str = "gpt-4o"):
        self.llm = ChatOpenAI(model=model, temperature=0.1)
        self.chain = GENERATION_PROMPT | self.llm

    def generate(
        self,
        tool_name: str,
        description: str,
        input_params: str,
        return_type: str,
        example: str,
    ) -> str:
        print(f"[Generator] Writing tool: {tool_name}...")
        response = self.chain.invoke({
            "tool_name": tool_name,
            "description": description,
            "input_params": input_params,
            "return_type": return_type,
            "example": example,
        })
        return response.content.strip()
```

Step 3: The Sandbox Validator
Generated code must never run unchecked. The validator parses the code with the `ast` module to verify syntax and structure, scans the source for dangerous patterns, and then executes it in an isolated subprocess with a timeout.
In `validator.py`:

```python
import ast
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

BANNED_PATTERNS = [
    "os.system",
    "subprocess.call",
    "subprocess.Popen",
    "__import__",
    "eval(",
    "exec(",
    "shutil.rmtree",
    "sys.exit",
]


class ValidationError(Exception):
    pass


class ToolValidator:
    def validate(self, tool_name: str, source_code: str) -> bool:
        self._check_syntax(source_code)
        self._check_banned_patterns(source_code)
        self._check_function_exists(tool_name, source_code)
        self._run_import_test(tool_name, source_code)
        return True

    def _check_syntax(self, source_code: str):
        try:
            ast.parse(source_code)
        except SyntaxError as e:
            raise ValidationError(f"Syntax error: {e}")

    def _check_banned_patterns(self, source_code: str):
        for pattern in BANNED_PATTERNS:
            if pattern in source_code:
                raise ValidationError(f"Banned pattern detected: '{pattern}'")

    def _check_function_exists(self, tool_name: str, source_code: str):
        tree = ast.parse(source_code)
        functions = [
            node.name for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)
        ]
        if tool_name not in functions:
            raise ValidationError(
                f"Function '{tool_name}' not found. Found: {functions}"
            )

    def _run_import_test(self, tool_name: str, source_code: str):
        # Run the generated source in a fresh interpreter to confirm it
        # imports cleanly and defines a callable with the expected name.
        harness = textwrap.indent(source_code, "    ")
        test_script = (
            "import sys\n"
            "try:\n"
            + harness + "\n"
            + f"    assert callable({tool_name}), 'Not callable'\n"
            + "    print('OK')\n"
            "except Exception as e:\n"
            "    print(f'FAIL: {e}', file=sys.stderr)\n"
            "    sys.exit(1)\n"
        )
        with tempfile.NamedTemporaryFile(
            mode="w", suffix=".py", delete=False
        ) as f:
            f.write(test_script)
            tmp_path = f.name
        try:
            result = subprocess.run(
                [sys.executable, tmp_path],
                capture_output=True,
                text=True,
                timeout=10,
            )
            if result.returncode != 0:
                raise ValidationError(
                    f"Runtime validation failed: {result.stderr.strip()}"
                )
        except subprocess.TimeoutExpired:
            raise ValidationError("Validation timed out (10s limit)")
        finally:
            Path(tmp_path).unlink(missing_ok=True)
```

Step 4: The Orchestrator Agent
The orchestrator is the main agent loop. It receives a task, tries to execute it, and when it encounters a missing capability it triggers the full generation pipeline — then rebuilds itself with the new tool available.
In `orchestrator.py`:

```python
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import StructuredTool
from pydantic import BaseModel, Field

from registry import registry
from generator import ToolGeneratorAgent
from validator import ToolValidator, ValidationError

ORCHESTRATOR_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """You are an autonomous AI agent with the ability to extend your own capabilities.

When you need to perform an action for which you have no tool:
1. Use the `request_new_tool` tool to describe exactly what you need.
2. Wait — the system will generate and register the tool automatically.
3. The new tool will then be available. Use it to complete the task.

Always complete the user's task fully. Never say you cannot do something — if you lack a tool, request it.
"""),
    MessagesPlaceholder("chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])


class NewToolRequest(BaseModel):
    tool_name: str = Field(
        description="snake_case name for the tool, e.g. fetch_exchange_rate"
    )
    description: str = Field(
        description="What the tool does in one clear sentence"
    )
    input_params: str = Field(
        description="Parameters with types, e.g. 'amount: float, from_currency: str'"
    )
    return_type: str = Field(
        description="Return type, e.g. 'float' or 'dict'"
    )
    example: str = Field(
        description="Example call, e.g. fetch_exchange_rate(100.0, 'USD', 'ILS')"
    )


class SelfExtendingOrchestrator:
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4o", temperature=0)
        self.generator = ToolGeneratorAgent()
        self.validator = ToolValidator()
        self._build_executor()

    def _build_executor(self):
        request_tool = StructuredTool.from_function(
            func=self._handle_tool_request,
            name="request_new_tool",
            description=(
                "Call this when you need a capability you don't have. "
                "The system will generate and register the tool for you."
            ),
            args_schema=NewToolRequest,
        )
        tools = [request_tool] + registry.all_tools()
        agent = create_openai_tools_agent(self.llm, tools, ORCHESTRATOR_PROMPT)
        self.executor = AgentExecutor(
            agent=agent,
            tools=tools,
            verbose=True,
            max_iterations=15,
            handle_parsing_errors=True,
        )

    def _handle_tool_request(
        self,
        tool_name: str,
        description: str,
        input_params: str,
        return_type: str,
        example: str,
    ) -> str:
        if registry.has(tool_name):
            return f"Tool '{tool_name}' already exists in the registry. Use it directly."
        for attempt in range(1, 4):
            print(f"\n[Orchestrator] Generating '{tool_name}' (attempt {attempt}/3)")
            try:
                source_code = self.generator.generate(
                    tool_name=tool_name,
                    description=description,
                    input_params=input_params,
                    return_type=return_type,
                    example=example,
                )
                self.validator.validate(tool_name, source_code)
                exec_globals: dict = {}
                exec(source_code, exec_globals)  # noqa: S102
                func = exec_globals[tool_name]
                registry.register(tool_name, func, description)
                registry.persist_tool(tool_name, source_code, description)
                # Rebuild executor so the new tool is immediately available
                self._build_executor()
                print(f"[Orchestrator] ✅ '{tool_name}' registered.")
                return (
                    f"Tool '{tool_name}' created and registered successfully. "
                    f"You can now use it to: {description}"
                )
            except Exception as e:  # covers ValidationError and generation failures
                print(f"[Orchestrator] ❌ Attempt {attempt} failed: {e}")
                if attempt == 3:
                    return f"Failed to generate '{tool_name}' after 3 attempts: {e}"
        return f"Tool generation failed for '{tool_name}'."

    def run(self, task: str) -> str:
        print(f"\n{'=' * 60}")
        print(f"Task: {task}")
        print(f"Available tools: {registry.tool_names()}")
        print(f"{'=' * 60}\n")
        result = self.executor.invoke({"input": task})
        return result["output"]
```

Step 5: Main Entry Point
In `main.py`:

```python
import os

from orchestrator import SelfExtendingOrchestrator

os.environ["OPENAI_API_KEY"] = "your-key-here"  # better: set this in your shell, not in code


def main():
    agent = SelfExtendingOrchestrator()
    tasks = [
        "What is the current USD to ILS exchange rate? Convert 5000 USD.",
        "Get the top 5 trending GitHub repositories today and list their names and stars.",
        "Generate a 16-character secure password and calculate its entropy in bits.",
    ]
    for task in tasks:
        result = agent.run(task)
        print(f"\n✅ Result: {result}\n")
        print("-" * 60)


if __name__ == "__main__":
    main()
```

Live Demo: What Actually Happens
Running main.py with an empty tool registry for the first time:
```
============================================================
Task: What is the current USD to ILS exchange rate? Convert 5000 USD.
Available tools: []
============================================================

> Entering new AgentExecutor chain...

> Invoking: `request_new_tool` with {
    "tool_name": "fetch_exchange_rate",
    "description": "Fetch live exchange rate between two currencies",
    "input_params": "from_currency: str, to_currency: str",
    "return_type": "float",
    "example": "fetch_exchange_rate('USD', 'ILS')"
  }

[Orchestrator] Generating 'fetch_exchange_rate' (attempt 1/3)
[Generator] Writing tool: fetch_exchange_rate...
[Orchestrator] ✅ 'fetch_exchange_rate' registered.

> Invoking: `fetch_exchange_rate` with {"from_currency": "USD", "to_currency": "ILS"}
→ 3.71

> Invoking: `request_new_tool` with {
    "tool_name": "convert_currency",
    "description": "Multiply an amount by an exchange rate",
    "input_params": "amount: float, rate: float",
    "return_type": "float",
    "example": "convert_currency(5000, 3.71)"
  }

[Orchestrator] ✅ 'convert_currency' registered.

> Invoking: `convert_currency` with {"amount": 5000, "rate": 3.71}
→ 18550.0

✅ Result: The current USD to ILS rate is 3.71.
5,000 USD = 18,550 ILS.
```

Second run — tools loaded from tools/manifest.json:

```
[Registry] Loaded persisted tool: fetch_exchange_rate
[Registry] Loaded persisted tool: convert_currency
Available tools: ['fetch_exchange_rate', 'convert_currency']
```

The agent never generates them again. The registry compounds over time. ✅
Risks and Guardrails You Must Implement
This architecture is powerful — and dangerous without proper controls.
1. Code Injection
The biggest risk. Mitigations already implemented:
- Static analysis — syntax is AST-parsed and banned patterns are rejected before any execution
- Subprocess isolation — validation runs in a separate process
- No `eval`/`exec` in generated code — enforced by the validator
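Note that substring matching alone is easy to bypass — `import os; getattr(os, "sys" + "tem")("ls")` contains none of the banned strings. A stricter approach walks the AST and whitelists imports rather than blacklisting text. A sketch (the allow-list here is illustrative and deliberately partial):

```python
import ast

# Partial allow-list: stdlib modules the tools legitimately need, plus the
# third-party packages the generator prompt permits.
ALLOWED_IMPORTS = {"json", "math", "re", "datetime", "secrets",
                   "requests", "httpx", "pandas", "pydantic"}

def check_imports(source: str) -> list[str]:
    """Return a list of policy violations found by walking the AST."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import of '{alias.name}' not allowed")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"import from '{node.module}' not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name) \
                and node.func.id in {"eval", "exec", "__import__", "compile"}:
            violations.append(f"call to '{node.func.id}' not allowed")
    return violations

print(check_imports("import os\nos.system('ls')"))  # → ["import of 'os' not allowed"]
print(check_imports("import json\nprint(1)"))       # → []
```

Even this is not a sandbox — it only raises the bar. The subprocess (and, in production, Docker) isolation remains the real boundary.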
For production, replace exec() in the orchestrator with Docker container execution:
```python
result = subprocess.run(
    ["docker", "run", "--rm", "--network=none",
     "--memory=64m", "--cpus=0.5",
     "-v", f"{tool_file}:/tool.py:ro",
     "python:3.12-slim", "python", "/tool.py"],
    capture_output=True,
    timeout=15,
)
```

2. Runaway Generation

Cap how many tools the agent may create, per session and in total:
```python
MAX_TOOLS_PER_SESSION = 5
MAX_TOOLS_TOTAL = 50

if len(registry.tool_names()) >= MAX_TOOLS_TOTAL:
    return "Registry full. Review and prune unused tools first."
```

3. Cost Control
Each tool generation costs 1–3 LLM calls. Track spend with a callback:
```python
from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    source_code = self.generator.generate(...)
print(f"[Cost] Tool generation: ${cb.total_cost:.4f}")
```

4. Human-in-the-Loop for Sensitive Tools
For tools that write to databases, send emails, or make financial transactions:
```python
SENSITIVE_KEYWORDS = ["send", "write", "delete", "pay", "transfer"]

if any(kw in tool_name for kw in SENSITIVE_KEYWORDS):
    approval = input(f"\n⚠️ Approve creation of '{tool_name}'? [y/N] ")
    if approval.lower() != "y":
        return "Tool creation rejected by operator."
```

Production Architecture
For a real deployment, the full stack looks like this:
```
┌──────────────────────┐
│  Orchestrator Agent  │
└──────────┬───────────┘
           │ generate_tool(spec)
┌──────────▼───────────┐
│  Tool Generator LLM  │
└──────────┬───────────┘
           │ source_code string
┌──────────▼───────────┐
│    Docker Sandbox    │ ← no network, read-only FS,
│  AST + runtime check │   5s timeout, 64MB RAM
└──────────┬───────────┘
           │ ✅ validated
┌──────────▼───────────┐
│    Tool Registry     │
│   (S3 + DynamoDB)    │ ← shared across instances
└──────────────────────┘
```

Key Takeaways
- Static tool registries are a ceiling. The Self-Extending Agent removes that ceiling entirely.
- The generate → validate → register loop is the core primitive. Keep it tight and safe.
- Persistence makes it compound. Every generated tool is an investment — it is never re-generated.
- Safety is non-negotiable. Always validate. Never skip the sandbox. Add human approval for sensitive actions.
- This is where agentic AI is heading. Systems that expand their own capabilities at runtime are the next frontier — and you now know how to build one.
What's Next
The natural evolution of this system is a Self-Improving Agent — one that not only creates new tools but rewrites existing ones when it detects they are underperforming. Combine this with an evaluation loop, a performance memory, and a regression test suite, and you have a system that genuinely learns from its own operational history.
That is the subject of Part 2: The Self-Improving Agent.
📂 Source Code
All code examples from this article are available on GitHub: OneManCrew/self-extending-agent

