How to Build an AI Agent with LangChain and OpenAI

A complete, production-minded walkthrough of building an AI agent with LangChain and OpenAI in TypeScript — tools, memory, the agent loop, error handling, deployment, and the things every tutorial leaves out.

"Agent" is the most over-promised word in AI right now. Strip away the marketing and the idea is straightforward: an LLM in a loop, with tools, working toward a goal. The mechanism is easy — what's hard is making one that doesn't burn money, hallucinate tool calls, or get stuck talking to itself at 3am. This guide is the build I wish I'd had when I shipped my first one: every step, in TypeScript, with the parts most tutorials skip.

An agent observes its state, reasons about the next step, acts via a tool, reflects on the result, and repeats — until the goal is met or a hard limit stops it.

What is an AI agent (practical definition)

A chatbot answers in one turn. An agent takes multiple turns, calling external tools (search, a database, an API, a calculator) to do something rather than just describe it. In code, an agent is exactly three things: an LLM, a set of tools with schemas, and a loop with stop conditions. LangChain didn't invent the pattern — it gives you a clean way to wire it without writing the loop from scratch.

If your "agent" only calls the LLM once, it's a chatbot. The loop is what makes it an agent.
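To make those three parts concrete, here is a framework-free sketch of the loop in TypeScript, with no LangChain involved. The model is abstracted as a hypothetical ModelFn that returns either a tool call or a final answer. In a real build that function would wrap an OpenAI chat call. ModelFn, runAgent and the add tool are illustrative names, not a library API.

```typescript
// A framework-free agent loop. All names here are illustrative, not LangChain's API.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelStep = { kind: "tool"; call: ToolCall } | { kind: "final"; answer: string };
type Observation = { call: ToolCall; result: string };

// 1. The LLM: given the goal and the observations so far, pick the next step.
type ModelFn = (goal: string, history: Observation[]) => Promise<ModelStep>;

// 2. The tools: name -> implementation (real tools also carry a description and schema).
type Tools = Record<string, (args: Record<string, unknown>) => Promise<string>>;

// 3. The loop, with two stop conditions: a final answer, or the iteration ceiling.
async function runAgent(
  model: ModelFn,
  tools: Tools,
  goal: string,
  maxIterations = 6,
): Promise<{ answer: string; steps: Observation[] }> {
  const steps: Observation[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const next = await model(goal, steps);
    if (next.kind === "final") return { answer: next.answer, steps };

    const impl = tools[next.call.tool];
    let result: string;
    if (!impl) {
      result = `error: unknown tool ${next.call.tool}`;
    } else {
      try {
        result = await impl(next.call.args);
      } catch (e) {
        // Failures become observations, so the model can react on its next step.
        result = `error: ${e instanceof Error ? e.message : String(e)}`;
      }
    }
    steps.push({ call: next.call, result });
  }
  return { answer: "Stopped: iteration limit reached.", steps };
}

// Usage with a scripted stand-in for the model: call `add` once, then answer.
const scripted: ModelFn = async (_goal, history) =>
  history.length === 0
    ? { kind: "tool", call: { tool: "add", args: { a: 2, b: 3 } } }
    : { kind: "final", answer: `The sum is ${history[0].result}` };

const tools: Tools = {
  add: async ({ a, b }) => String(Number(a) + Number(b)),
};

runAgent(scripted, tools, "What is 2 + 3?").then((r) => console.log(r.answer));
// prints "The sum is 5"
```

Roughly speaking, LangChain's executor wraps this same shape with prompt formatting, parsing of the model's tool calls, and tracing.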

Setting up your project — Node.js + TypeScript

Use Node 20+ and TypeScript so your tool schemas, model outputs, and runtime errors stay typed end to end. A minimal scaffold:

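mkdir flight-agent && cd flight-agent
pnpm init
npm pkg set type=module   # ESM, so the top-level await in src/agent.ts runs
pnpm add langchain @langchain/openai @langchain/core zod dotenv
pnpm add -D typescript tsx vitest @types/node
npx tsc --init --target es2022 --module esnext --moduleResolution bundler --strict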

Add an .env with your key, load it with dotenv, and never commit it. If you're new to typed config in this stack, my TypeScript patterns guide covers the discriminated-union style I use for agent state.
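Failing at startup with a message that names the missing variable is easier to debug than an authentication error surfacing from deep inside a library or halfway through a run. Below is a minimal sketch, assuming the variables are already loaded into process.env (for example by dotenv). requireEnv is a hypothetical helper, not part of dotenv.

```typescript
// Hypothetical helper: read a required variable or fail fast with a clear message.
// Pass process.env in real code; taking the map as a parameter keeps it testable.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name]?.trim();
  if (!value) {
    // Report which variable is missing, never the values of the ones that exist.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage, after `import "dotenv/config"` has populated process.env:
// const apiKey = requireEnv("OPENAI_API_KEY", process.env);
```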

Connecting to OpenAI API

LangChain's ChatOpenAI wrapper handles auth, streaming, and tool-calling conventions. Pin the model explicitly — never use "latest" in production — and pick by the trade-off you care about:

  • gpt-4o — best reasoning + tool-calling reliability. The default for agents.
  • gpt-4o-mini — ~15× cheaper, fast, good enough for routing and simple tools.
  • gpt-4-turbo — older but stable; still common in regulated stacks.
// src/llm.ts
import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";

export const llm = new ChatOpenAI({
  model: "gpt-4o-2024-08-06",  // pin a dated snapshot; bare "gpt-4o" is an alias that moves
  temperature: 0,         // deterministic-ish for tool routing
  maxTokens: 1024,
  apiKey: process.env.OPENAI_API_KEY,
});

Defining tools your agent can call

A tool is a function with three things the model can read: a name, a description, and a parameter schema. The model picks tools entirely from those — vague descriptions are the single most common reason agents flail. Use zod so the same schema validates the model's arguments at runtime and types your tool body at compile time.

// src/tools.ts
import { tool } from "@langchain/core/tools";
import { z } from "zod";

export const searchFlights = tool(
  async ({ from, to, date }) => {
    // URLSearchParams encodes model-supplied values so they can't break the query string
    const res = await fetch(`https://api.example.com/flights?${new URLSearchParams({ from, to, date })}`);
    if (!res.ok) throw new Error(`flights api: ${res.status}`);
    const json = await res.json();
    return JSON.stringify(json.results.slice(0, 5));   // keep tool output small
  },
  {
    name: "search_flights",
    description: "Find available flights between two airports on a given date. Returns top 5 results as JSON.",
    schema: z.object({
      from: z.string().length(3).describe("IATA airport code, e.g. LHR"),
      to:   z.string().length(3).describe("IATA airport code, e.g. JFK"),
      date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/).describe("Departure date in ISO format YYYY-MM-DD"),
    }),
  }
);

export const tools = [searchFlights /*, bookFlight, sendEmail, ... */];
Treat tool descriptions like a public API. The model reads them more carefully than your teammates ever will.
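The schema does real work: it turns a bad tool call into a clear error before any request goes out. To show what that means without zod, here is a hand-written sketch of the checks a slightly stricter version of this schema would enforce. It requires IATA codes to be exactly three letters and dates to be real calendar days. validateFlightArgs and its types are illustrative names, not part of LangChain or zod.

```typescript
// Hand-written stand-in for a stricter search_flights schema (illustrative names).
type FlightArgs = { from: string; to: string; date: string };
type Validation = { ok: true; value: FlightArgs } | { ok: false; error: string };

// Three letters, e.g. "LHR" (lowercase is accepted and normalised below).
const isIata = (v: unknown): v is string => typeof v === "string" && /^[A-Za-z]{3}$/.test(v);

// YYYY-MM-DD that is also a real calendar date (rejects 2026-02-30).
function isIsoDate(v: unknown): v is string {
  if (typeof v !== "string") return false;
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(v);
  if (!m) return false;
  const [y, mo, d] = [Number(m[1]), Number(m[2]), Number(m[3])];
  const dt = new Date(Date.UTC(y, mo - 1, d));
  // Round-trip check: an impossible day rolls over into the next month.
  return dt.getUTCFullYear() === y && dt.getUTCMonth() === mo - 1 && dt.getUTCDate() === d;
}

function validateFlightArgs(raw: unknown): Validation {
  if (typeof raw !== "object" || raw === null) return { ok: false, error: "arguments must be a JSON object" };
  const { from, to, date } = raw as Record<string, unknown>;
  if (!isIata(from)) return { ok: false, error: `from: expected a 3-letter IATA code, got ${JSON.stringify(from)}` };
  if (!isIata(to)) return { ok: false, error: `to: expected a 3-letter IATA code, got ${JSON.stringify(to)}` };
  if (!isIsoDate(date)) return { ok: false, error: `date: expected YYYY-MM-DD, got ${JSON.stringify(date)}` };
  return { ok: true, value: { from: from.toUpperCase(), to: to.toUpperCase(), date } };
}
```

When validation fails, return the error string to the model instead of calling the API with bad input. That gives the model something concrete to fix on its next step.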

Building the agent loop with LangChain

LangChain's AgentExecutor wraps the loop for you: it prompts the model with the tool schemas, parses tool calls, runs them, feeds results back, and stops when the model returns a final answer or hits a limit. Here's the whole agent in 20 lines:

[Diagram: User, Agent, OpenAI, Tool. 1 · question → 2 · prompt + tool specs → 3 · "call search_flights(…)" → 4 · run tool → 5 · tool result → 6 · final answer]
One full agent turn — the model decides which tool to call, the executor runs it, the result goes back to the model, and the loop continues until a final answer is returned.
// src/agent.ts
import { AgentExecutor, createOpenAIToolsAgent } from "langchain/agents";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { llm } from "./llm";
import { tools } from "./tools";

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a flight-booking assistant. Use tools when you need real data. " +
             "If a tool fails twice, tell the user honestly instead of guessing."],
  new MessagesPlaceholder("chat_history"),
  ["human", "{input}"],
  new MessagesPlaceholder("agent_scratchpad"),
]);

const agent = await createOpenAIToolsAgent({ llm, tools, prompt });

export const executor = new AgentExecutor({
  agent,
  tools,
  maxIterations: 6,          // hard ceiling — never let it loop forever
  returnIntermediateSteps: true,
  verbose: process.env.NODE_ENV !== "production",
});

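// usage, e.g. in a separate entry point such as src/main.ts, so that importing
// the executor (from tests or a route handler) doesn't fire a live API call
import { executor } from "./agent";

const result = await executor.invoke({
  input: "Find me morning flights from LHR to JFK on 2026-07-15",
  chat_history: [],
});
console.log(result.output);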
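On each iteration, the earlier tool calls and their results have to be replayed to the model. That is what the agent_scratchpad placeholder holds. The sketch below builds equivalent messages in OpenAI's Chat Completions tool-calling format: an assistant message carrying tool_calls, followed by a tool message with the matching tool_call_id. It illustrates the wire format under that assumption and is not LangChain's internal code. The Step type and the call ids are hypothetical.

```typescript
// Hypothetical record of one executed tool call.
type Step = { id: string; tool: string; args: Record<string, unknown>; result: string };

// Minimal subset of the Chat Completions message shapes used for tool calling.
type AssistantToolCallMessage = {
  role: "assistant";
  content: null;
  tool_calls: { id: string; type: "function"; function: { name: string; arguments: string } }[];
};
type ToolResultMessage = { role: "tool"; tool_call_id: string; content: string };
type ScratchpadMessage = AssistantToolCallMessage | ToolResultMessage;

// Replay each step as "the model asked for this tool" + "here is what it returned".
function toScratchpad(steps: Step[]): ScratchpadMessage[] {
  return steps.flatMap((s): ScratchpadMessage[] => [
    {
      role: "assistant",
      content: null,
      // Arguments travel as a JSON string, not as an object.
      tool_calls: [{ id: s.id, type: "function", function: { name: s.tool, arguments: JSON.stringify(s.args) } }],
    },
    { role: "tool", tool_call_id: s.id, content: s.result },
  ]);
}

const pad = toScratchpad([
  { id: "call_1", tool: "search_flights", args: { from: "LHR", to: "JFK", date: "2026-07-15" }, result: "[]" },
]);
console.log(JSON.stringify(pad, null, 2));
```

When the model requests several tools in one turn (parallel tool calls), those calls share a single assistant message, followed by one tool message per call.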

Memory management strategies

Memory is just "what we put back into the prompt next turn." The art is fitting useful context inside the model's window without paying to resend everything. Three patterns cover almost every case:

[Diagram: Buffer (m₁–m₈ kept verbatim) · Summary (∑ older context + m₆ m₇ m₈) · Entity (user.name = "Maha", trip.from = "LHR", trip.date = "2026-07-15")]
Three memory strategies — a sliding buffer of recent turns, a rolling summary plus recent turns, or a structured record of entities the agent has learned.
  • Buffer window — keep the last N messages verbatim. Cheapest, leakiest. Good for short tasks.
  • Summary buffer — summarise everything older than the window into one paragraph. The default for long-running agents.
  • Entity / vector memory — extract facts (name, dates, preferences) and recall them on demand. Best when conversations resume across sessions.
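import { BufferWindowMemory } from "langchain/memory";
import { executor } from "./agent";   // assumes this file sits next to src/agent.ts

const memory = new BufferWindowMemory({
  k: 10,                       // keep the last 10 exchanges (user + assistant pairs)
  returnMessages: true,        // return message objects, which MessagesPlaceholder expects
  memoryKey: "chat_history",
  inputKey: "input",
});

// one turn: load history into the prompt, run the agent, then save the exchange
export async function chat(input: string): Promise<string> {
  const { chat_history } = await memory.loadMemoryVariables({});
  const result = await executor.invoke({ input, chat_history });
  await memory.saveContext({ input }, { output: result.output });
  return result.output;
}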
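LangChain ships classes for these strategies, but the summary-buffer idea is easiest to see written out by hand. Below is a minimal, framework-free sketch. It assumes you supply the summarise function, which in practice would be a call to a cheap model such as gpt-4o-mini. SummaryBufferMemory is a hypothetical class. It counts messages for simplicity, whereas LangChain's ConversationSummaryBufferMemory budgets by tokens.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };
// Caller-supplied summariser, e.g. a call to a cheap model. Hypothetical signature.
type Summarize = (previousSummary: string, dropped: Message[]) => Promise<string>;

class SummaryBufferMemory {
  private summary = "";
  private recent: Message[] = [];
  private readonly keepLast: number; // messages kept verbatim (2 per exchange)
  private readonly summarize: Summarize;

  constructor(keepLast: number, summarize: Summarize) {
    this.keepLast = keepLast;
    this.summarize = summarize;
  }

  // Record one exchange, then fold whatever falls out of the window into the summary.
  async save(input: string, output: string): Promise<void> {
    this.recent.push({ role: "user", content: input }, { role: "assistant", content: output });
    const overflow = this.recent.length - this.keepLast;
    if (overflow > 0) {
      const dropped = this.recent.splice(0, overflow); // oldest messages first
      this.summary = await this.summarize(this.summary, dropped);
    }
  }

  // What goes into the prompt next turn: the rolling summary (if any) plus recent turns verbatim.
  load(): Message[] {
    const head: Message[] = this.summary
      ? [{ role: "system", content: `Summary of earlier conversation: ${this.summary}` }]
      : [];
    return [...head, ...this.recent];
  }
}
```

The cost of each turn stays roughly flat: one summary message plus a fixed window, plus one extra cheap-model call whenever messages leave the window.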

Handling tool errors gracefully

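Tools fail: APIs rate-limit, payloads change, networks flake. The right pattern is to let the agent see the error and recover, not crash the whole run. The dependable way to do that is to catch failures inside the tool and return a short, structured error message as the tool's result. The model reads it like any other observation and can retry, try a different tool, or ask the user. Don't rely on a raw throw reaching the model: depending on your LangChain version and executor options, an exception from a tool body can end the run instead. handleParsingErrors is meant for the separate case where the model's tool call itself is malformed or fails the tool's schema.

// in your tool body: report failures as the tool result instead of throwing
if (res.status === 429) {
  return "error: rate_limit, try again in a few seconds";
}
if (!res.ok) {
  return `error: flights_api_error status=${res.status}`;
}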

// in the executor — bound the recovery
new AgentExecutor({
  agent, tools,
  maxIterations: 6,
  handleParsingErrors: "Reformat your tool call as valid JSON.",
});

Pair this with a hard maxIterations ceiling. The defining failure mode of agents is the runaway loop, and no prompt-engineering trick beats a hard stop.
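A transient failure such as the 429 rate limit in the tool body above can often be fixed by a short retry inside the tool, before the model ever sees it. That saves a full LLM round trip and an iteration. Below is a minimal sketch of exponential backoff with jitter. withRetry and its options are hypothetical, and the default delays are placeholders to tune against your API's rate-limit guidance.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff and jitter.
async function withRetry<T>(
  op: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  { attempts = 3, baseDelayMs = 500 } = {},
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      // Give up on the last attempt, or immediately for non-retryable errors.
      if (attempt === attempts - 1 || !shouldRetry(err)) break;
      // 500ms, 1s, 2s, ... plus up to 100ms of jitter so parallel callers don't retry in lockstep.
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}

// Usage inside a tool body (sketch): retry only rate limits and 5xx responses.
// const res = await withRetry(async () => {
//   const r = await fetch(url);
//   if (r.status === 429 || r.status >= 500) throw new Error(`retryable: ${r.status}`);
//   return r;
// }, (e) => e instanceof Error && e.message.startsWith("retryable"));
```

Keep the retry budget small. Retries inside a tool multiply with the agent's own iterations, so three attempts times six iterations is already eighteen API calls in the worst case.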

Testing your agent end to end

Agents are non-deterministic, but they're not untestable. Build a small eval set of representative inputs and assert on properties of the output, not exact strings — "did it call search_flights with a valid IATA code?", "did it finish in under N steps?", "did the final answer cite a real result?".

// test/agent.spec.ts
import { describe, it, expect } from "vitest";
import { executor } from "../src/agent";

describe("flight agent", () => {
  it("calls search_flights for a valid request", async () => {
    const r = await executor.invoke({
      input: "Flights from LHR to JFK on 2026-07-15",
      chat_history: [],
    });
    const calls = r.intermediateSteps.map((s: any) => s.action.tool);
    expect(calls).toContain("search_flights");
    expect(calls.length).toBeLessThanOrEqual(3);
    expect(r.output.toLowerCase()).toMatch(/lhr|london/);
  }, 60_000);   // live model + tool calls often exceed vitest's 5s default timeout
});

Pin the model version in tests too. A silent model upgrade is the most common reason a green CI suddenly turns red.
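The properties listed for the eval set can live in one reusable checker that runs over every case, independent of the test runner. The sketch below assumes the intermediate steps have LangChain's AgentStep shape (action.tool, action.toolInput, observation), as returned when returnIntermediateSteps is true. checkRun and its types are hypothetical names.

```typescript
// Assumed to mirror LangChain's AgentStep shape from returnIntermediateSteps.
type StepRecord = { action: { tool: string; toolInput: unknown }; observation: string };
type RunRecord = { output: string; intermediateSteps: StepRecord[] };

const IATA_CODE = /^[A-Z]{3}$/;

// Collects every violated property instead of stopping at the first one.
function checkRun(run: RunRecord, maxSteps = 3): string[] {
  const problems: string[] = [];
  const searches = run.intermediateSteps.filter((s) => s.action.tool === "search_flights");
  if (searches.length === 0) problems.push("never called search_flights");

  for (const s of searches) {
    // toolInput may be an object or a raw string, depending on the agent type.
    const input: Record<string, unknown> =
      typeof s.action.toolInput === "object" && s.action.toolInput !== null
        ? (s.action.toolInput as Record<string, unknown>)
        : {};
    for (const field of ["from", "to"]) {
      const code = input[field];
      if (typeof code !== "string" || !IATA_CODE.test(code)) {
        problems.push(`search_flights.${field} is not an IATA code: ${JSON.stringify(code)}`);
      }
    }
  }

  if (run.intermediateSteps.length > maxSteps) {
    problems.push(`took ${run.intermediateSteps.length} steps (limit ${maxSteps})`);
  }
  if (run.output.trim() === "") problems.push("empty final answer");
  return problems;
}
```

In vitest, expect(checkRun(r)).toEqual([]) then reports every failed property for a case at once, which makes eval failures quicker to triage.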

Deploying on Vercel / Railway

For request-response agents (chat-style), a Next.js route handler on Vercel works well — but watch out for the function timeout. Long agent runs need streaming or a queue.

// app/api/agent/route.ts (Next.js App Router)
import { executor } from "@/src/agent";

export const runtime = "nodejs";
export const maxDuration = 60;   // Vercel: extend the timeout

export async function POST(req: Request) {
  const { input, history = [] } = await req.json();
  const result = await executor.invoke({ input, chat_history: history });
  return Response.json({ output: result.output, steps: result.intermediateSteps });
}
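Whatever maxDuration you set, it is safer to stop waiting on the agent a little before the platform does. The client then receives a clean error instead of a dropped connection. withDeadline and DeadlineExceededError in the sketch below are hypothetical names. The 55-second value in the usage comment is an assumption chosen to sit under the 60-second maxDuration. The wrapper only stops the wait: the agent run continues in the background unless you also cancel it, for example by passing an AbortSignal through the invoke config if your LangChain version supports that.

```typescript
class DeadlineExceededError extends Error {
  name = "DeadlineExceededError";
}

// Hypothetical helper: settle with the promise's result, or reject once `ms` elapses.
function withDeadline<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new DeadlineExceededError(`deadline of ${ms}ms exceeded`)), ms);
  });
  // Clear the timer on both paths so it doesn't keep the process alive.
  return Promise.race([promise, timeout]).then(
    (value) => { clearTimeout(timer); return value; },
    (err) => { clearTimeout(timer); throw err; },
  );
}

// Usage in the route handler (sketch):
// try {
//   const result = await withDeadline(executor.invoke({ input, chat_history: history }), 55_000);
//   return Response.json({ output: result.output });
// } catch (e) {
//   if (e instanceof DeadlineExceededError) return Response.json({ error: "agent timed out" }, { status: 504 });
//   throw e;
// }
```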

For long-running or background work (multi-step research, scheduled agents), deploy a Node worker on Railway or Fly and put work behind a queue (BullMQ, QStash). Vercel's serverless model isn't built for 5-minute reasoning loops.

Real production considerations

The gap between a working demo and a production agent is everything below. None of it is glamorous; all of it is necessary.

  • Cost ceilings. Track token spend per user and per session. Cap it. A bug that loops a $0.01 tool call can drain $400 overnight.
  • Observability. Log every prompt, tool call, and result as a trace. LangSmith, Helicone, or your own trace store — pick one before you need it.
  • Prompt injection. Anything the agent reads from the web or a user can hijack its instructions. Treat tool outputs as untrusted data, never as instructions.
  • Human-in-the-loop on the edges. Cheap, reversible actions: let the agent run. Irreversible or costly (sending email, charging cards, deleting data): require approval.
  • Streaming responses. Users tolerate slow agents far better when they can see thinking happen. Stream tokens and tool calls to the UI.
  • Eval gate in CI. Run your eval set on every PR. Don't let prompt changes ship without a green check.
The agents that work in production are not the most autonomous — they're the most contained.
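The cost-ceiling bullet is worth making concrete. At the numbers above, $400 of $0.01 calls is 40,000 iterations, so the guard has to fail closed rather than log and carry on. Below is a minimal sketch of a per-session budget. It assumes you read token counts from each model response and supply your own per-million-token prices. SessionBudget and BudgetExceededError are hypothetical names, and the prices in the usage lines are placeholders, not current OpenAI pricing.

```typescript
// Prices in USD per million tokens; fill in from your provider's current price list.
type Pricing = { inputPerMillion: number; outputPerMillion: number };

class BudgetExceededError extends Error {
  name = "BudgetExceededError";
}

// Hypothetical per-session guard: record usage after each call, refuse to continue past the cap.
class SessionBudget {
  private spentUsd = 0;
  private readonly capUsd: number;
  private readonly pricing: Pricing;

  constructor(capUsd: number, pricing: Pricing) {
    this.capUsd = capUsd;
    this.pricing = pricing;
  }

  // Call after each model response with the token counts it reports.
  record(inputTokens: number, outputTokens: number): void {
    this.spentUsd +=
      (inputTokens / 1_000_000) * this.pricing.inputPerMillion +
      (outputTokens / 1_000_000) * this.pricing.outputPerMillion;
  }

  // Call before each model or tool call; throws once the session has reached its cap.
  assertWithinBudget(): void {
    if (this.spentUsd >= this.capUsd) {
      throw new BudgetExceededError(`session spent $${this.spentUsd.toFixed(4)} of $${this.capUsd} cap`);
    }
  }

  get spent(): number {
    return this.spentUsd;
  }
}

// Usage with placeholder prices (not real OpenAI pricing):
const budget = new SessionBudget(0.5, { inputPerMillion: 2, outputPerMillion: 8 });
budget.record(10_000, 1_000);   // 0.01 × 2 + 0.001 × 8 = $0.028
budget.assertWithinBudget();    // passes: $0.028 is under the $0.50 cap
```

Keep one SessionBudget per user session, so a single runaway conversation cannot spend past its own cap.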

Frequently asked questions

What's the difference between an AI agent and a chatbot?

A chatbot generates a single response per turn — text in, text out. An AI agent runs a multi-step loop: it can call tools (search, APIs, databases), observe results, and keep going until a goal is met. If your build only invokes the LLM once and returns its reply, you have a chatbot. If it can decide to take an action, run it, and incorporate the result, you have an agent.

Should I use GPT-4o or GPT-4 Turbo for agents?

Almost always GPT-4o for the agent's reasoning, because its tool-calling reliability is materially better than GPT-4 Turbo's. Use gpt-4o-mini for cheap sub-tasks (classification, routing, simple summarisation). GPT-4 Turbo is still fine if you've certified it; just pin the version and don't mix tiers without measuring.

How do I add memory to an AI agent?

Start with BufferWindowMemory — it keeps the last N turns verbatim, which covers most short tasks. Move to ConversationSummaryBufferMemory when sessions get long enough that you're paying to resend context every turn; it summarises older turns into one rolling paragraph. For agents that resume across sessions, add a vector store or an entity memory so facts (names, IDs, preferences) survive process restarts.

Is LangChain production-ready in 2026?

Yes, with caveats. The core @langchain/core and @langchain/openai packages are stable, well-typed, and widely used in production. The wider LangChain ecosystem moves fast: pin versions, commit your lockfile, and don't pull in integrations you don't need. For complex multi-agent orchestration, look at LangGraph. For simple single-agent loops, AgentExecutor still works, though LangChain's own documentation now steers new projects toward LangGraph, so check the current guidance before you start.

Where to go next

An agent that can call tools is the foundation. The two upgrades that will matter most for what you build next:

Build the loop, describe the tools precisely, bound the iterations, trace everything, and put a human on the sharp edges — and "an LLM in a loop" stops being a meme and starts being software you can ship.
