A complete, interactive curriculum from Python fundamentals to production-grade AI systems. Batch 1 covers Phases 1–3.
Guido van Rossum created Python in 1991 at CWI in the Netherlands. The "aha" moment was designing a language that prioritized readability — code should read like English pseudocode. Named after Monty Python (not the snake), it became the de facto language of data science around 2015 when libraries like NumPy, Pandas, and scikit-learn matured. When deep learning exploded with TensorFlow (2015) and PyTorch (2016), Python's ecosystem became unassailable.
Problem it solved: Before Python dominated AI, researchers juggled C++, MATLAB, and R. Python unified everything under one language with a massive package ecosystem, gentle learning curve, and first-class support from every major ML framework.
How it works for GenAI: Python is the glue layer — you write orchestration logic, API calls, data pipelines, and agent workflows. Libraries like httpx (async HTTP), FastAPI (web services), and pydantic (data validation) are the building blocks of every GenAI application in 2026.
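To make the "glue layer" idea concrete, here is a minimal sketch that combines pydantic (request validation) with httpx (HTTP call). The endpoint URL and response shape are placeholders for illustration, not any real provider's API.

```python
# Minimal sketch of Python as the glue layer: pydantic validates the request,
# httpx sends it over HTTP. The URL and response fields are hypothetical.
import httpx
from pydantic import BaseModel

class ChatRequest(BaseModel):
    model: str
    prompt: str

def ask(req: ChatRequest) -> str:
    # Placeholder endpoint; a real provider SDK would differ.
    resp = httpx.post(
        "https://api.example.com/v1/chat",
        json=req.model_dump(),   # pydantic v2: model -> plain dict for JSON
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["output"]

print(ask(ChatRequest(model="demo-model", prompt="Say hello")))
```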
Real-world uses: Every major LLM API (OpenAI, Anthropic, Google) ships Python SDKs first. Agent frameworks (LangChain, LlamaIndex, OpenAI Agents SDK) are Python-native. Data pipelines for RAG systems are written in Python.
What if Python didn't exist? AI development would be fragmented — researchers in R, production in Java, glue code in Bash. GenAI innovation would move far more slowly, because prototyping would take weeks instead of hours.
JavaScript/TypeScript: Growing in AI (Vercel AI SDK), but the ecosystem is thinner. Rust/C++: For performance-critical inference engines (e.g., llama.cpp, written in C/C++), but too low-level for app development. Julia: Excellent for numerical computing, but the ecosystem is tiny.
Python is like English for computers — it's the easiest major programming language to read and write. For AI work, it's the only language where every tool you need already exists, so you spend time building products instead of reinventing wheels.
Imagine you're building with LEGO. Python is like having every LEGO set ever made in one giant box. Other languages are like having just one small set — you can still build stuff, but you'd have to make your own pieces first. Python lets you snap things together fast!
REST APIs emerged from Roy Fielding's 2000 doctoral dissertation. JSON was formalized by Douglas Crockford around 2001 as a lightweight alternative to XML. HTTP dates back to Tim Berners-Lee's work in 1991. Together, these three standards form the backbone of how every LLM service communicates — when you call the OpenAI API or Anthropic API, you're sending JSON over HTTP to a REST endpoint.
Problem solved: Applications need a universal way to send structured data between services. JSON became the standard because it's human-readable and maps perfectly to Python dictionaries. Every LLM response is a JSON object with fields like choices, content, and usage.
For GenAI specifically: Understanding HTTP status codes (200 OK, 429 rate-limited, 500 server error), headers (Authorization, Content-Type), and request/response bodies is non-negotiable. Async HTTP (via httpx or aiohttp) lets you make parallel LLM calls — critical when your agent needs to query multiple tools simultaneously.
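Here is a hedged sketch of parallel calls with httpx's async client. The endpoint, bearer token, and response body below are placeholders, not a specific provider's API.

```python
# Sketch: three parallel requests with httpx.AsyncClient.
# Endpoint, key, and response shape are illustrative placeholders.
import asyncio
import httpx

async def call_llm(client: httpx.AsyncClient, prompt: str) -> dict:
    resp = await client.post(
        "https://api.example.com/v1/chat",
        headers={"Authorization": "Bearer YOUR_KEY", "Content-Type": "application/json"},
        json={"prompt": prompt},
    )
    if resp.status_code == 429:      # rate limited: back off and retry
        raise RuntimeError("Rate limited; retry with backoff")
    resp.raise_for_status()          # raises on other 4xx/5xx errors
    return resp.json()               # JSON body parsed into a Python dict

async def main():
    async with httpx.AsyncClient(timeout=30) as client:
        # Fire all three requests concurrently instead of one after another.
        results = await asyncio.gather(*(call_llm(client, p) for p in ["a", "b", "c"]))
        print(results)

asyncio.run(main())
```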
Without REST/JSON, every AI provider would use a different protocol. Integrating OpenAI, Anthropic, and Google would require learning three different communication standards instead of one.
An API is like a restaurant menu: you pick what you want (the request), the kitchen makes it (the server), and the waiter brings it back (the response). JSON is the tray the food comes on — it's a structured, organized format both you and the kitchen understand.
Imagine passing notes in class. You write "What's 5+3?" on a piece of paper (that's HTTP). You fold it a special way so your friend knows how to read it (that's JSON). Your friend writes "8" and passes it back. An API is the rules for how to fold and pass notes!
Git was created by Linus Torvalds in 2005 when the Linux kernel's previous version control system (BitKeeper) revoked its free license. Torvalds built Git in about two weeks. Linux itself dates to 1991, also by Torvalds. SQL was developed at IBM by Donald Chamberlin and Raymond Boyce in the 1970s, based on Edgar Codd's relational model.
Git: Every GenAI project needs version control — prompt templates, agent code, configuration files. Git branches let you experiment with different agent architectures without breaking production.
Linux: Almost all AI services run on Linux servers. Docker containers (which wrap your GenAI apps) are Linux under the hood. Knowing bash commands, file permissions, and process management is essential.
SQL: RAG systems often pull data from databases. Tool-calling agents need to write and execute SQL queries. Understanding joins, filters, and aggregations is critical for building knowledge assistants over structured data.
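For the SQL point, here is a small runnable sketch using Python's built-in sqlite3 module. The customers/orders tables are invented purely for illustration; it shows the join, filter, and aggregation pattern a tool-calling agent might generate.

```python
# Sketch: the kind of SQL an agent might run, via Python's built-in sqlite3.
# The tables and data are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, day TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES
        (1, 1, 40.0, '2026-01-05'),
        (2, 1, 15.0, '2026-01-06'),
        (3, 2, 99.0, '2026-01-06');
""")

# Join + filter + aggregation: total spend per customer since a given date.
rows = conn.execute("""
    SELECT c.name, SUM(o.total) AS spend
    FROM orders o JOIN customers c ON c.id = o.customer_id
    WHERE o.day >= '2026-01-01'
    GROUP BY c.name
    ORDER BY spend DESC
""").fetchall()
print(rows)   # [('Grace', 99.0), ('Ada', 55.0)]
```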
Git is Google Docs "Version History" for code — it tracks every change and lets you undo anything. Linux is the operating system that runs most of the internet's servers. SQL is the language you use to ask questions of databases ("Show me all customers who bought last week").
Git is like a time machine for your homework — you can go back to any version you saved. Linux is like the engine in a car: you don't see it, but everything runs because of it. SQL is like asking the school librarian a very specific question: "Find all books by this author published after 2020."
Tokenization traces back to Byte Pair Encoding (BPE), originally a data compression algorithm from 1994 by Philip Gage. The breakthrough was applying it to NLP — the 2016 paper by Sennrich, Haddow, and Birch introduced BPE for machine translation. OpenAI adopted BPE for GPT-2 (2019), and its tokenizers are now packaged in the open-source tiktoken library; the approach became the industry standard. Context windows started at 512 tokens (BERT, 2018), grew to 1,024 (GPT-2), 2,048 (GPT-3), and 8K–32K (GPT-4), and have now reached 1 million+ tokens in 2026 models.
Problem solved: Neural networks can't process raw text — they need numbers. Tokenizers break text into subword units (tokens), each mapped to an integer ID. Common words like "the" are single tokens, while rare words like "tokenization" might be split into "token" + "ization".
How it works: A context window is the model's total working memory for one request. Everything must fit inside it: your system prompt, conversation history, documents, tool outputs, AND the model's response. In English, 1 token ≈ 0.75 words. A 200K-token window holds roughly 150,000 words (about 500 pages). Critically, the context window resets with every API call — the model has no persistent memory.
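You can see this yourself with OpenAI's open-source tiktoken library. The sketch below encodes a sentence and prints its token IDs; exact splits and counts vary by model and encoding.

```python
# Sketch: counting tokens with tiktoken. Counts are illustrative and depend
# on which encoding/model you use.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # BPE vocabulary used by many OpenAI models
text = "Tokenization breaks text into subword units."

ids = enc.encode(text)                       # list of integer token IDs
print(ids)
print(len(ids), "tokens for", len(text.split()), "words")
print([enc.decode([i]) for i in ids])        # the text piece behind each token
```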
Real-world impact: Context window size determines what's possible. A 4K window can handle a short chat. A 200K window can analyze an entire codebase. A 1M window can process a full legal contract or research paper collection in one pass.
What if tokens didn't exist? Models would have to process individual characters (catastrophically slow) or full words (vocabulary of millions, impossibly expensive). Subword tokenization is the Goldilocks solution.
Tokens are like syllables for AI. The model breaks your message into chunks — "Hello" is 1 token, "artificial" might be 2 tokens ("artific" + "ial"). The context window is how many tokens the AI can see at once — think of it as the size of its desk. Everything (your question + its answer) has to fit on that desk.
Imagine you can only remember 20 words at a time. If someone tells you a story that's 25 words long, you'd forget the first 5 words! That's what a context window is — the AI's short-term memory limit. Tokens are like the individual LEGO bricks that make up each word.
Type text below to see how an LLM tokenizer breaks it into tokens. Each color = one token.
Prompt engineering became a recognized discipline with GPT-3 (2020), when researchers at OpenAI demonstrated that the same model could perform wildly different tasks based solely on how you phrased the input. The landmark paper "Language Models are Few-Shot Learners" (Brown et al., 2020) showed that providing examples in the prompt (few-shot prompting) could rival fine-tuned models. Chain-of-thought prompting was formalized by Wei et al. (2022) at Google, who showed that demonstrating step-by-step reasoning in the prompt dramatically improved performance; Kojima et al. (2022) later showed that even just adding "Let's think step by step" helps.
System prompts are persistent instructions injected at the start of every conversation. They define the model's persona, constraints, and output format. User prompts are the actual queries. The interaction between them determines output quality.
Key techniques: Zero-shot (just ask), few-shot (provide examples), chain-of-thought (ask to reason step by step), structured output (request JSON or XML), and role-based prompting (assign a persona).
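The sketch below layers several of these techniques into one request, using the role/content message format most chat APIs share. Nothing is sent to a provider here; it only builds the message list, which you would pass to the SDK of your choice.

```python
# Sketch: system prompt + few-shot examples + chain-of-thought + structured
# output, expressed as a generic role/content message list.
system_prompt = "You are a precise financial analyst. Always answer in JSON."   # persona + format

few_shot_examples = [
    {"role": "user", "content": "Sentiment of: 'Revenue beat expectations.'"},
    {"role": "assistant", "content": '{"sentiment": "positive"}'},
]

messages = [
    {"role": "system", "content": system_prompt},
    *few_shot_examples,                                    # few-shot: show the expected pattern
    {"role": "user", "content": (
        "Sentiment of: 'Margins shrank despite record sales.' "
        "Think step by step, then output JSON."            # chain-of-thought + structured output
    )},
]

for m in messages:
    print(m["role"], "->", m["content"][:60])
```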
Without prompt engineering, LLMs would be like having a genius employee who never received a job description. The difference between a bad prompt and a good one can be the difference between a useless response and a production-ready output.
A prompt is your instruction to the AI. A system prompt is like a standing memo ("You are a legal assistant who always cites sources") that stays active for the entire conversation. The better your instructions, the better the output. It's the difference between telling someone "write something about dogs" vs. "write a 200-word product description for a premium dog food brand targeting health-conscious pet owners."
Imagine you have a super-smart robot friend. If you say "draw something," it might draw anything random. But if you say "draw a red dragon flying over a castle at sunset, in cartoon style," you'll get exactly what you want. A system prompt is like telling the robot at the start of the day: "Today you're an art teacher who always explains your drawings."
Build a prompt step-by-step. Watch how each technique changes the effective prompt.
OpenAI introduced function calling in June 2023, allowing GPT models to output structured JSON matching predefined function schemas instead of free text. This was the "aha" moment that turned chatbots into software components. Anthropic followed with tool use in Claude, and it's now standard across all major providers. Structured outputs (guaranteed JSON schema conformance) arrived in 2024, using constrained decoding to ensure the model's output is always valid.
Problem solved: Before function calling, extracting structured data from LLMs was unreliable — you'd parse free text with regex and hope it worked. Function calling gives the model a formal schema, and the model responds with structured JSON arguments that map directly to functions in your code.
Structured output goes further: the model is constrained at the token level to only produce valid JSON matching your schema. This means zero parsing failures in production.
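Here is a hedged sketch of the application side: a tool schema in the common JSON-Schema style, plus the dispatch step your code performs when the model "calls" the tool. Exact field names vary by vendor, and the model's output is simulated below.

```python
# Sketch: a tool schema and the dispatch step. The schema shape is the common
# JSON-Schema style; the model's tool call is simulated as a JSON string.
import json

get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def get_weather(city: str, unit: str = "celsius") -> str:
    return f"22 degrees {unit} and sunny in {city}"   # stub implementation

# Pretend the model chose the tool and emitted these arguments as JSON.
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
call = json.loads(model_output)
result = get_weather(**call["arguments"])             # your code executes the action
print(result)
```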
Function calling is like giving the AI a menu of actions it can take: "search the web," "query the database," "send an email." Instead of just talking, the AI can now DO things by outputting a structured request that your code executes.
Imagine you have a robot butler. Before function calling, you'd say "I'm hungry" and it would just say "You should eat something." Now, with function calling, it says "I'll order a pizza for you" and actually presses the buttons to order it!
The concept of representing words as vectors traces to Word2Vec by Tomas Mikolov et al. at Google (2013). The famous result: "King − Man + Woman = Queen" showed that vector arithmetic could capture semantic relationships. This evolved through GloVe (Stanford, 2014), ELMo (Allen Institute for AI, 2018), and finally modern sentence/document embeddings powered by transformers. Today's embedding models (OpenAI's text-embedding-3, Cohere Embed, BGE) produce high-dimensional vectors (typically hundreds to a few thousand dimensions) that capture the semantic meaning of entire passages.
Problem solved: Computers can't understand text natively. Embeddings convert text into dense numerical vectors where semantically similar texts have similar vectors. "How do I fix a bug?" and "debugging my code" would be close in vector space, even though they share few words.
How it works: You send text to an embedding model, which returns a vector (array of floating-point numbers). To find similar content, you compute the cosine similarity between vectors — a value between -1 and 1 where 1 means identical meaning.
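A minimal sketch of the similarity step follows. The three-dimensional vectors are toy stand-ins; real embeddings have hundreds or thousands of dimensions and come from an embedding model, not by hand.

```python
# Sketch: cosine similarity over embedding vectors (toy 3-D stand-ins).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_bug_fix   = [0.9, 0.1, 0.3]   # "How do I fix a bug?"
vec_debugging = [0.8, 0.2, 0.4]   # "debugging my code"
vec_recipe    = [0.1, 0.9, 0.0]   # "best pancake recipe"

print(cosine_similarity(vec_bug_fix, vec_debugging))  # high -> similar meaning
print(cosine_similarity(vec_bug_fix, vec_recipe))     # low  -> unrelated
```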
Real-world uses: Semantic search, recommendation systems, clustering documents, anomaly detection, and — most importantly — RAG (the "R" in RAG). Without embeddings, search would be limited to exact keyword matching. You'd miss documents that discuss the same concept using different words.
Embeddings turn text into coordinates on a map. Similar ideas end up near each other. "Happy" and "joyful" would be neighbors; "happy" and "carburetor" would be on opposite sides of the map. This lets computers find related content even when different words are used.
Imagine every word is a kid in a schoolyard. Kids who like the same things stand close together. "Dog," "puppy," and "canine" are all in the pet-lovers corner. "Airplane" is across the yard with "jet" and "flight." Embeddings are the GPS coordinates of where each word stands!
Click any two words to compute their similarity. Nearby words have high similarity scores.
The concept of retrieval-augmented generation was formalized by Patrick Lewis et al. at Meta (Facebook AI Research) in their 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." The insight was simple but revolutionary: instead of cramming all knowledge into model weights, retrieve relevant documents at inference time and provide them as context. This dramatically reduces hallucination and keeps the model's knowledge current.
Chunking is the first step: splitting documents into smaller pieces (typically 200–1000 tokens) that can be individually embedded and retrieved. Strategies include fixed-size, sentence-based, semantic, and recursive character splitting.
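A minimal chunking sketch, using the simplest strategy (fixed-size windows with overlap) and counting words rather than tokens for readability; production systems usually count tokens.

```python
# Sketch: fixed-size chunking with overlap, measured in words for clarity.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    words = text.split()
    step = chunk_size - overlap        # how far the window slides each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks

doc = "word " * 1000                   # stand-in for a real document
print(len(chunk_text(doc)))            # 7 overlapping chunks of up to 200 words each
```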
Retrieval uses vector similarity search (semantic) and/or keyword search (BM25) to find the most relevant chunks for a given query. Hybrid search combines both: vector search finds semantically similar content, keyword search catches exact terms (product names, IDs, codes).
Reranking is a second pass using a cross-encoder model (like Cohere Rerank or BGE-reranker) that scores each retrieved chunk's relevance to the query more precisely than vector similarity alone.
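Putting the retrieval and reranking steps together, here is a toy sketch of the retrieve-then-rerank flow. The scoring functions are trivial word-overlap stand-ins for an embedding index, BM25, and a cross-encoder; only the control flow is the point.

```python
# Sketch of retrieve-then-rerank. Real systems use embeddings, BM25, and a
# cross-encoder; the scorers below are toy stand-ins to show the pipeline.
def overlap(a: str, b: str) -> float:
    return len(set(a.lower().split()) & set(b.lower().split()))

def vector_score(query: str, chunk: str) -> float:
    return overlap(query, chunk)                            # stand-in for cosine similarity

def keyword_score(query: str, chunk: str) -> float:
    return sum(term in chunk for term in query.split())     # stand-in for BM25

def rerank_score(query: str, chunk: str) -> float:
    return 2 * overlap(query, chunk)                        # stand-in for a cross-encoder

chunks = [
    "SKU-42 widgets ship within 3 business days.",
    "Widgets are small mechanical parts.",
    "Our office is closed on Fridays.",
]
query = "How fast does SKU-42 ship"

# 1) Hybrid retrieval: blend semantic and keyword scores, keep the top 2 chunks.
def hybrid_score(chunk: str) -> float:
    return 0.5 * vector_score(query, chunk) + 0.5 * keyword_score(query, chunk)

candidates = sorted(chunks, key=hybrid_score, reverse=True)[:2]

# 2) Rerank the candidates with the more precise scorer, keep the best chunk.
best = max(candidates, key=lambda c: rerank_score(query, c))
print(best)   # "SKU-42 widgets ship within 3 business days."
```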
Without RAG: LLMs can only use knowledge frozen in their training data. They can't answer questions about your company's internal docs, last week's meeting notes, or your product catalog. RAG is what transforms a general chatbot into a knowledge assistant over your data.
RAG is like giving the AI an open-book exam instead of a closed-book one. When you ask a question, it first searches through your documents (the "retrieval" part), finds the most relevant pages, then reads them and writes an answer (the "generation" part). The result: accurate answers grounded in your actual data, with sources you can verify.
Imagine you're on a quiz show but you're allowed to bring one book. RAG is like having a super-fast librarian: you whisper a question, the librarian zooms through the book, grabs the best 3 pages, and hands them to you. Then you read those pages and answer the question confidently — because you can SEE the answer right there!
Type a query and watch the RAG pipeline in action — from chunking to retrieval to generation.