Batch 2 of 3 · Generative AI Engineering Curriculum

From Chatbot to Action Machine

Phases 4–6: Where LLMs stop just talking and start actually doing things — tool use, MCP, and production SDKs.

Phase 04

Tool Use

Tool Calling · APIs · Database Queries · File Access · Web Search · Workflow Actions

🔧 Tool Calling — From Talk to Action

How LLMs decide when and how to invoke external functions

OpenAI introduced function calling in June 2023 as an extension to the Chat Completions API. The breakthrough insight: instead of trying to make the model execute code (which it can't), let it output structured JSON describing which function to call and with what arguments. The orchestration layer then executes the function and feeds the result back. Anthropic followed with tool use for Claude, reaching general availability in May 2024. By 2025, every major LLM provider supported it, and the concept had evolved into the foundation of all agentic AI systems.

Critical misconception: LLMs do NOT execute functions. They are next-token predictors. When "calling a tool," the model outputs tokens in a structured format (JSON) that the surrounding code interprets as a function invocation. The code executes the function and passes the result back as new context.

The 5-step loop: (1) User asks a question → (2) LLM receives query + tool definitions (JSON schemas) → (3) LLM outputs a structured tool call → (4) Your code executes the function → (5) Result is fed back to the LLM, which generates a final response.
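
To make the loop concrete, here is a minimal sketch of all five steps using the OpenAI Python SDK. The get_weather function is a stub and the model name is illustrative; any provider that supports function calling follows the same shape.

weather_tool_loop.py

import json
from openai import OpenAI

client = OpenAI()

# (2) Tool definitions the model receives as JSON schemas.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"18°C and sunny in {city}"  # stub; a real tool would call a weather API

# (1) The user asks a question.
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# (3) The model outputs a structured tool call; it does NOT execute anything.
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)          # (4) your code executes the function
        messages.append({                     # (5) result fed back as new context
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)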

Why it matters for GenAI: Tool calling is what transforms an LLM from a text generator into an agent. It can search the web, query databases, read files, send emails, update CRMs, create tickets — anything you can wrap in a function.

Real-world uses: Customer support bots that look up orders, coding assistants that run tests, research agents that search the web and synthesize findings, data analysts that query databases with natural language.

Without tool calling: LLMs would be trapped inside their training data. They couldn't check the weather, look up a stock price, or tell you the status of your order. They'd be brilliant librarians locked in a room with no books.

Before tool calling: Developers used regex parsing of free-text responses (fragile), fine-tuned models for specific tasks (expensive), or multi-model pipelines with classifiers (complex). Tool calling unified all of this into one clean pattern.

Layman's Terms

Imagine you hire a brilliant consultant who knows everything about strategy but can't use a phone. Tool calling is like giving them a phone and a contact list. They can now say "Call the sales department and ask for Q3 numbers" — they don't make the call themselves, but they know exactly who to call and what to ask.

Explain Like I'm 10

Picture a super-smart kid in a game show. They know the answer to everything, but sometimes the answer isn't in their head — it's in a locked box. Tool calling is giving them keys to different boxes: one for math, one for weather, one for finding things online. They pick the right key, open the box, read what's inside, and give you the answer!

tool_call_simulator.py

[Interactive demo. Select a user query to watch the complete tool-calling loop animate step by step: 👤 user query → 🧠 LLM decides (tool selection) → ⚡ execute tool (your code runs) → 📋 return result (feed to LLM) → 💬 final answer.]

🗂 Tool Types — APIs, Databases, Files & Web

The different categories of tools an LLM can leverage

API Tools: The most common type. Wrap any REST or GraphQL API as a tool — weather services, CRM lookups, payment processing, shipping tracking. The LLM generates the right parameters and your code makes the HTTP call.

Database Tools: Let the LLM generate SQL queries against your databases. A user asks "What were our top 5 products last month?" and the LLM writes SELECT product, SUM(revenue) FROM sales WHERE date >= '2026-03-01' GROUP BY product ORDER BY 2 DESC LIMIT 5.
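
A sketch of the execution side of a database tool, assuming a local SQLite file named sales.db with that schema (both hypothetical). The model generates the SQL string; your code guards execution, here with a read-only connection plus a crude SELECT-only check.

sales_query_tool.py

import sqlite3

def query_sales(sql: str) -> list[tuple]:
    """Tool body: run an LLM-generated query against the (hypothetical) sales DB."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    conn = sqlite3.connect("file:sales.db?mode=ro", uri=True)  # open read-only
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

# The query from the example above, exactly as the model would generate it:
rows = query_sales(
    "SELECT product, SUM(revenue) FROM sales "
    "WHERE date >= '2026-03-01' GROUP BY product ORDER BY 2 DESC LIMIT 5"
)

Note that the read-only connection, not the string check, is the real safety boundary; prefix checks are easy to bypass.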

File Access Tools: Read documents, parse PDFs, analyze CSVs, write reports. Critical for document-heavy workflows like legal review, financial analysis, or research synthesis.

Web Search Tools: Give the LLM real-time access to the internet. It can search for current events, verify facts, find pricing, and research competitors. Both OpenAI and Anthropic offer built-in web search tools.

Workflow/Action Tools: The most powerful category — tools that actually DO things. Send emails, create Jira tickets, update Salesforce records, deploy code, schedule meetings. These turn chatbots into digital coworkers.
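
Action tools are just ordinary functions with side effects. Here is a sketch of an email tool using Python's standard library; the SMTP host and addresses are placeholders, and a real deployment would add authentication and, ideally, a human approval step before the send.

send_email_tool.py

import smtplib
from email.message import EmailMessage

def send_email(to: str, subject: str, body: str) -> str:
    """Tool body: actually performs the action the model requested."""
    msg = EmailMessage()
    msg["From"] = "bot@example.com"       # placeholder sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP("smtp.example.com", 587) as server:  # placeholder host
        server.starttls()
        server.send_message(msg)
    return f"Email sent to {to}"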

Layman's Terms

Think of tools as the AI's Swiss Army knife. An API tool is like a phone (it can call services). A database tool is like a filing cabinet key. A web tool is like a browser. An action tool is like a remote control — it can actually press buttons in your software systems.

Explain Like I'm 10

Imagine your robot has different attachments: a magnifying glass (search), a calculator (math), a walkie-talkie (send messages), and a drill (build things). Depending on what you ask, it snaps on the right tool and gets to work!

Phase 05

Model Context Protocol

MCP Clients · MCP Servers · Tool Exposure · Context Sharing · Permissions

🔌 MCP — The USB-C of AI

The open protocol that standardizes how AI connects to everything

Anthropic introduced MCP in November 2024 as an open-source protocol, initially targeting Claude Desktop. The genius was solving the N×M integration problem: if you have 3 AI models and 10 tools, you'd traditionally need 30 custom integrations. MCP reduces this to 3 clients + 10 servers = 13 total components, each built once and reusable everywhere.

Adoption was explosive. OpenAI officially adopted MCP in March 2025, and Google DeepMind and Microsoft followed. In December 2025, Anthropic donated MCP to the Agentic AI Foundation (AAIF) under the Linux Foundation, co-founded by Anthropic, Block, and OpenAI. As of early 2026, MCP has reached 97 million monthly SDK downloads and over 5,800 community-built servers.

The Problem ("Context Starvation"): LLMs are incredibly capable but isolated from real-world data and systems. A model that can't read your codebase, check your calendar, or query your database is limited to operating on whatever text you paste into a prompt.

How MCP Works: It's a client-server protocol built on JSON-RPC 2.0. The architecture has three layers:

1. Host: The AI application (Claude Desktop, Cursor, VS Code) that contains the MCP client.
2. Client: Maintains a 1:1 connection to an MCP server, handling protocol negotiation and message routing.
3. Server: A lightweight program that exposes capabilities via three primitives: Tools (functions the model can invoke), Resources (data the model can read), and Prompts (pre-crafted instruction templates).
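
On the wire, a tool invocation is a plain JSON-RPC 2.0 exchange. It is shown here as Python literals; the get_weather tool is illustrative, but the method and field names follow the MCP spec.

mcp_wire_format.py

# What the client sends to the server:
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Paris"}},
}

# What the server sends back:
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "18°C and sunny in Paris"}]},
}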

Real-world impact: Claude Code can generate an entire web app from a Figma design (via MCP). Enterprise chatbots connect to multiple databases across an organization. AI coding assistants access GitHub, Jira, and CI/CD pipelines through a single standard protocol.

Without MCP: Every AI app would need custom integrations for every tool. Switching from OpenAI to Claude would mean rebuilding all your connectors. The ecosystem would be fragmented, slow, and expensive — exactly where we were before November 2024.

Before MCP: OpenAI's ChatGPT plugin system (2023, now deprecated), custom REST wrappers per tool, LangChain tool abstractions (framework-specific, not universal). The closest current alternative, Google's A2A (Agent2Agent) protocol, targets agent-to-agent communication rather than tool connectivity, making it complementary rather than competitive.

Layman's Terms

MCP is USB-C for AI. Before USB-C, every device maker had a different charger: Lightning, Micro-USB, proprietary barrel plugs. Now one cable charges everything. MCP does the same for AI: one protocol connects any AI model to any tool — your calendar, your database, your CRM, your code editor. Build the connector once, use it with every AI system.

Explain Like I'm 10

Imagine every game console used a different TV cable — one for PlayStation, one for Xbox, one for Nintendo. That would be so annoying! MCP is like making one magic cable that works with ALL consoles and ALL TVs. Now your AI "console" can plug into any "TV" (tool) without needing a special cable each time.

mcp_architecture.py

[Interactive demo. Click any MCP server to see how a request flows from Host → Client → Server and back. Hosts with built-in MCP clients (🤖 Claude Desktop, 💻 Cursor IDE, 🌐 your app via the SDK) speak JSON-RPC through the protocol's three primitives (🔧 Tools: model-controlled functions; 📄 Resources: app-controlled data; 💬 Prompts: user-controlled templates) to servers such as 🐙 GitHub (repos, issues, PRs), 💬 Slack (messages, channels), 🗄 PostgreSQL (queries, schemas), and 📁 File System (read, write, search).]

🏗 Building MCP Clients & Servers

Hands-on: Creating your own MCP server with the Python SDK

MCP Servers are lightweight programs that expose tools, resources, and prompts. Using the official Python SDK, you define tools with simple decorators — no manual JSON schema writing required. The SDK handles protocol negotiation, message routing, and transport.
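
A minimal server sketch using the official Python SDK's FastMCP helper (pip install mcp). The add tool and greeting resource are illustrative; the SDK derives the JSON schemas from the type hints and docstrings.

demo_server.py

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@mcp.resource("greeting://{name}")
def greeting(name: str) -> str:
    """A readable resource the host can pull into context."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport for local hosts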

MCP Clients connect to servers and make their capabilities available to the host application. In production, you'll typically use a host that already has a client built in (Claude Desktop, Cursor), but for custom apps, you build the client with the SDK.
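
And a sketch of the client side with the same SDK, launching the demo server above as a local subprocess over stdio:

demo_client.py

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Spawn the server script from the previous sketch as a child process.
    params = StdioServerParameters(command="python", args=["demo_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()            # protocol negotiation
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # ['add']
            result = await session.call_tool("add", {"a": 2, "b": 3})
            print(result.content)                 # text content: '5'

asyncio.run(main())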

Transport: MCP is transport-agnostic by design. The spec defines stdio (for local child processes) and Streamable HTTP (which superseded the original HTTP+SSE transport in the 2025 spec revision); custom transports such as WebSocket are possible but not standardized. Most local tools use stdio; remote services use Streamable HTTP.

Security considerations: Prompt injection through malicious server responses is a real risk. Always validate tool outputs, enforce permission boundaries, and use OAuth 2.1 for remote authentication (part of the MCP authorization spec, with further hardening on the 2026 roadmap).

Phase 06

Agent SDKs

OpenAI Agents SDK · Orchestration · Handoffs · Traces · Tool Schemas · LangChain/LangGraph

🚀 OpenAI Agents SDK & Orchestration

The production-ready framework for building multi-agent workflows

The OpenAI Agents SDK evolved from Swarm, an experimental multi-agent framework OpenAI released in October 2024. Swarm proved the pattern — lightweight agents with instructions, tools, and handoffs — but wasn't production-ready. In March 2025, OpenAI launched the Agents SDK alongside the Responses API, offering the same simplicity with production-grade features: built-in tracing, guardrails, sessions, and provider-agnostic design (it works with 100+ LLMs via LiteLLM).

By early 2026, OpenAI had added Agent Builder (a visual drag-and-drop workflow builder), ChatKit (a frontend toolkit), and Realtime Agents for voice. The legacy Assistants API sunsets in August 2026, making the Agents SDK the standard path forward.

Core Primitives:

Agents: LLMs configured with instructions, tools, and optional guardrails. Think of each agent as a specialist: one for research, one for coding, one for customer support.

Handoffs: Let agents delegate to other agents. A triage agent might hand off to a billing agent or a technical support agent based on the user's issue. Handoffs carry full context.

Guardrails: Input/output validators that run before or after the LLM responds. They catch policy violations, sensitive data leaks, or off-topic responses.

Tracing: Built-in observability that logs every step — agent decisions, tool calls, handoffs, guardrail checks. Traces feed into OpenAI's dashboard for debugging, evaluation, and fine-tuning.

Sessions: Persistent conversation state across turns. No more manually appending chat history — the SDK manages it automatically via SQLite or server-side storage.
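
A sketch of those primitives with the openai-agents package: a triage agent that hands off to a billing specialist, which owns one stubbed tool. Names and instructions are illustrative.

triage_handoff_demo.py

from agents import Agent, Runner, function_tool

@function_tool
def check_order(order_id: str) -> str:
    """Look up an order's status (stubbed here)."""
    return f"Order {order_id}: shipped, arriving Thursday"

billing_agent = Agent(
    name="Billing",
    instructions="Handle billing and order-status questions.",
    tools=[check_order],
)

triage_agent = Agent(
    name="Triage",
    instructions="Route each request to the right specialist.",
    handoffs=[billing_agent],  # delegation carries full conversation context
)

result = Runner.run_sync(triage_agent, "Where is order 1234?")
print(result.final_output)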

Without agent SDKs: You'd be manually managing conversation history, writing custom routing logic, building your own tracing infrastructure, and handling state across turns. A 2023-era "agent" was 500+ lines of glue code. With the Agents SDK, the same thing is ~30 lines.

Alternatives: LangChain/LangGraph has the more mature ecosystem and excels at complex workflows with cycles and conditional logic, at the cost of a heavier abstraction layer. LlamaIndex is RAG-first, best for retrieval-heavy applications. CrewAI offers role-based multi-agent patterns, great for team simulations. AutoGen is Microsoft's multi-agent conversation framework.

Layman's Terms

The Agents SDK is a construction kit for AI workers. Instead of building an entire factory from raw materials, you get pre-made components: workers (agents) that know their job, a dispatch system (handoffs) that routes tasks to the right worker, a security team (guardrails) that checks everything, and cameras (tracing) that record what happens for debugging.

Explain Like I'm 10

Imagine a team of robot helpers. One knows about math, one about cooking, one about cleaning. The Agents SDK is the instruction manual that tells them how to work together. When someone asks about food, the team captain says "that's a job for the cooking robot!" and hands it over. If the cooking robot needs ingredients, it asks the shopping robot. They all work as a team!

agent_workflow_builder.py

[Interactive demo. Select a scenario and click each step to trace a multi-agent workflow, from triage to specialized agent to final response.]

🔗 LangChain, LangGraph & LlamaIndex

The broader orchestration ecosystem — chains, graphs, and retrieval

LangChain was created by Harrison Chase in October 2022, less than a month before ChatGPT launched. It became the first widely adopted framework for building LLM applications, introducing the concept of "chains": composable sequences of LLM calls, tool uses, and data transformations. LangGraph (2024) extended this with a graph-based execution model for cyclic, stateful workflows, essential for agents that loop and branch. LlamaIndex by Jerry Liu (originally GPT Index, 2022) focused specifically on connecting LLMs to data and became the definitive framework for RAG systems.

LangChain: Best for linear and branching workflows. Its LCEL (LangChain Expression Language) provides clean syntax for composing retrieval, prompting, and output parsing. Strong for RAG pipelines, extraction chains, and structured output workflows.
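
A small LCEL sketch, assuming the langchain-openai integration package is installed; the model name is illustrative. The | operator composes prompt, model, and parser into one runnable chain.

lcel_chain_demo.py

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

print(chain.invoke({"text": "LangChain composes LLM calls into pipelines."}))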

LangGraph: The agent-building layer. Uses directed graphs with nodes (functions) and edges (routing logic). Supports cycles (agent loops), persistence (state checkpoints), and human-in-the-loop patterns. Ideal for multi-step agents that need to iterate.
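
A toy LangGraph sketch showing the cyclic part: a single node loops on itself via a conditional edge until the state says stop. Real agent graphs put an LLM call inside the node, but the control flow is the same.

langgraph_loop_demo.py

from typing import TypedDict
from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    count: int

def work(state: State) -> State:
    return {"count": state["count"] + 1}  # stand-in for an LLM or tool step

def should_continue(state: State):
    return "work" if state["count"] < 3 else END  # loop back or finish

graph = StateGraph(State)
graph.add_node("work", work)
graph.add_edge(START, "work")
graph.add_conditional_edges("work", should_continue)

app = graph.compile()
print(app.invoke({"count": 0}))  # {'count': 3}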

LlamaIndex: The data framework. Best-in-class for ingesting documents from 160+ sources, chunking, indexing, and retrieving. If your application is primarily about connecting an LLM to data, LlamaIndex is often the fastest path.
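
A minimal LlamaIndex sketch, assuming a local docs/ folder (path is illustrative) and an OpenAI API key for the default embedding and LLM settings:

llamaindex_rag_demo.py

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("docs/").load_data()  # ingest a folder of files
index = VectorStoreIndex.from_documents(documents)      # chunk, embed, and index
engine = index.as_query_engine()                        # retrieval + synthesis

print(engine.query("What were our top five products last month?"))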

Layman's Terms

LangChain is like a recipe cookbook — it gives you step-by-step instructions for combining AI ingredients. LangGraph is a flow chart builder — when decisions need to loop back or branch, it handles the complexity. LlamaIndex is a librarian — it's the best at organizing and finding information in your documents so the AI can use it.

Explain Like I'm 10

LangChain is like train tracks — it moves your AI from station to station in order. LangGraph is like a choose-your-own-adventure book — the AI can go different directions and even loop back. LlamaIndex is like a super-organized binder with tabs — it helps the AI find the exact page of information it needs!