Building LangChain Agents in Python: A Deep Dive for Senior Engineers

If you've spent years building deterministic systems, adding an LLM to your stack feels like hiring a brilliant but unpredictable contractor. They can do things you never could before, but you can't just call a function and trust the output. Agents are the engineering discipline that tames that unpredictability, giving LLMs a structured environment in which to reason, act, and iterate toward a goal.

LangChain has become the de facto framework for building those environments in Python. But "LangChain agents" is not a single thing. It's a spectrum of three distinct paradigms, each with different tradeoffs, different failure modes, and different ceilings. This post walks through all three: ReAct agents, Tool-Calling agents, and LangGraph agents. By the end, you'll have the architectural intuition to choose the right one for your use case and the code to back it up.

All examples use the latest LangChain/LangGraph packages (as of 2025) with OpenAI as the LLM backend.


Prerequisites and Setup

Before diving in, make sure your environment is ready:

pip install langchain langchain-openai langgraph langchain-community

Set your OpenAI API key:

import os
os.environ["OPENAI_API_KEY"] = "your-api-key-here"

Throughout this post we'll use gpt-4o as our model. It has strong reasoning capabilities and native tool-calling support, which matters a lot for the later sections.


A Quick Grounding: What Is a LangChain Agent?

At its core, an agent is a loop. The LLM receives a prompt, decides whether to take an action (call a tool) or produce a final answer, and if it takes an action, the result is fed back into the next iteration. This loop continues until the agent decides it has enough information to respond.

The key primitives you need to understand before writing any agent code are:

  • Tools: Python functions exposed to the LLM. The LLM can choose to invoke them and receives their output as context.
  • The Agent Executor: The runtime that manages the loop, routes tool calls, and handles errors.
  • Memory/State: How the agent retains information across turns or steps.
  • Runnables: LangChain's LCEL (LangChain Expression Language) abstraction. Almost everything in modern LangChain is a Runnable, meaning it has a consistent .invoke(), .stream(), and .batch() interface.
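Stripped of framework details, that loop can be sketched in a few lines of plain Python. The fake_llm stub and the tool registry below are placeholders for illustration, not LangChain APIs:

```python
# A minimal, framework-free sketch of the agent loop described above.
# The "LLM" here is a stub that returns canned decisions; in a real
# agent, the model makes this choice on every iteration.

def fake_llm(messages):
    # Call a tool on the first turn, produce a final answer on the second.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_word_length", "args": {"word": "engineer"}}
    return {"final_answer": messages[-1]["content"]}

TOOLS = {"get_word_length": lambda word: len(word)}

def run_agent(user_input, max_iterations=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_iterations):
        decision = fake_llm(messages)
        if "final_answer" in decision:        # model chose to answer
            return decision["final_answer"]
        tool = TOOLS[decision["tool"]]        # model chose an action
        result = tool(**decision["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("max iterations exceeded")

print(run_agent("How long is 'engineer'?"))  # 8
```

Every paradigm below is a variation on this loop; they differ in how the decision step is expressed and how much of the loop you control.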

With that foundation in place, let's look at each agent paradigm.


1. ReAct Agents: Reasoning and Acting

The Mental Model

ReAct (Reasoning + Acting) is the oldest and most transparent of the three paradigms. Introduced in a 2022 paper by Yao et al., its core idea is elegant: ask the LLM to explicitly alternate between a Thought (reasoning step) and an Action (tool call), then observe the result before continuing.

A typical ReAct trace looks like this:

Thought: I need to find the current price of AAPL stock.
Action: search_tool
Action Input: "AAPL stock price today"
Observation: AAPL is currently trading at $213.45.
Thought: I now have the price. I can answer the question.
Final Answer: AAPL is currently trading at $213.45.

The reasoning is baked into the prompt itself. The LLM is instructed to produce this structured output, and the agent executor parses it to decide what to do next. This is entirely prompt-driven, which is both its strength and its weakness.

How It Works Internally

ReAct agents use a prompt template that instructs the model to follow the Thought/Action/Observation pattern. The agent executor runs a loop: it calls the LLM, parses the output to extract the action and action input, invokes the tool, appends the observation to the conversation, and calls the LLM again. This continues until the model outputs a "Final Answer".
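To make the parsing step concrete, here's a rough stand-in for what a ReAct output parser does. This is a simplified sketch, not LangChain's actual parser:

```python
import re

# Simplified stand-in for a ReAct output parser: extract either a tool
# invocation or a final answer from the model's free-form completion.

def parse_react(text):
    final = re.search(r"Final Answer:\s*(.*)", text)
    if final:
        return {"finish": final.group(1).strip()}
    action = re.search(r"Action:\s*(\S+)\s*Action Input:\s*(.*)", text)
    if action:
        return {"tool": action.group(1),
                "input": action.group(2).strip().strip('"')}
    # This is the fragility ReAct is known for: one malformed
    # completion and the whole step fails.
    raise ValueError(f"Could not parse LLM output: {text!r}")

step = parse_react(
    'Thought: I need the price.\n'
    'Action: search_tool\n'
    'Action Input: "AAPL stock price today"'
)
print(step)  # {'tool': 'search_tool', 'input': 'AAPL stock price today'}
```

Everything the agent does hinges on regex-style extraction like this succeeding, which is exactly why format breakage is the dominant ReAct failure mode.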

Because the reasoning is explicit text in the prompt, you can read exactly what the model was thinking at each step. This makes ReAct agents highly debuggable, which is a significant advantage during development.

Code Example

from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

# Define tools
@tool
def get_word_length(word: str) -> int:
    """Returns the number of characters in a word."""
    return len(word)

@tool
def multiply(a: int, b: int) -> int:
    """Multiplies two integers together."""
    return a * b

tools = [get_word_length, multiply]

# Pull the standard ReAct prompt from LangChain Hub
prompt = hub.pull("hwchase17/react")

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Create the agent and executor
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,       # prints the Thought/Action/Observation trace
    handle_parsing_errors=True
)

# Run the agent
result = agent_executor.invoke({
    "input": "How many characters are in the word 'engineer', and what is that number multiplied by 7?"
})

print(result["output"])

Setting verbose=True is invaluable during development. You'll see the full reasoning chain printed to stdout, which makes it immediately obvious when the model is going off the rails.

Tradeoffs and When to Use ReAct

Strengths:

  • Highly transparent. The reasoning chain is human-readable.
  • Works with any LLM, including models that don't support native function calling.
  • Easy to debug and iterate on, since you can inspect every step.

Weaknesses:

  • Fragile. The agent depends on the LLM correctly formatting its output as Thought/Action/Observation. Weaker models frequently break the format.
  • Slower. Every reasoning step is a separate LLM call, and the prompt grows with each observation.
  • Limited parallelism. Actions are strictly sequential.

Use ReAct when: you're prototyping, you need maximum transparency, or you're working with a model that doesn't support function calling.


2. Tool-Calling Agents: Leveraging Model-Native Function Calling

The Mental Model

Tool-calling agents replace the prompt-engineering trick of ReAct with a first-class API feature. Models like GPT-4o have native support for structured tool/function calling: you pass a list of tool schemas (JSON Schema format) alongside your prompt, and the model returns a structured object specifying which tool to call and with what arguments. No parsing of free-form text required.

This is a fundamentally more reliable architecture. The model isn't trying to format its output as a string you'll parse. It's returning a structured object from a dedicated output channel. The failure modes are different and generally more predictable.

How It Works Internally

When you call llm.bind_tools(tools) in LangChain, it serializes your tool definitions into the OpenAI function-calling schema and attaches them to every request. When the model decides to use a tool, it returns an AIMessage with a tool_calls attribute instead of text content. The agent executor detects this, invokes the appropriate tool, wraps the result in a ToolMessage, and appends it to the conversation history before the next LLM call.

The reasoning is implicit rather than explicit. The model doesn't write out its thoughts; it just decides which tool to call. This is faster and more reliable, but you lose the step-by-step reasoning trace.
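The round trip described above can be illustrated with plain dicts standing in for LangChain's message classes. The shapes below are approximate, for illustration only:

```python
# Plain dicts standing in for LangChain's AIMessage / ToolMessage classes,
# to show the shape of one tool-calling round trip.

TOOLS = {"multiply": lambda a, b: a * b}

# What the model returns when it decides to use a tool: a structured
# tool_calls field, not free-form text that needs parsing.
ai_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"id": "call_1", "name": "multiply", "args": {"a": 8, "b": 7}}],
}

# The executor invokes each requested tool and wraps the result in a
# tool message keyed by the call id, then appends it to the history.
tool_messages = [
    {"role": "tool", "tool_call_id": call["id"],
     "content": TOOLS[call["name"]](**call["args"])}
    for call in ai_message["tool_calls"]
]
print(tool_messages[0]["content"])  # 56
```

The call id linkage is what lets the model correlate each result with the call it made, even when several calls are in flight at once.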

Code Example

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Define tools (same as before)
@tool
def get_word_length(word: str) -> int:
    """Returns the number of characters in a word."""
    return len(word)

@tool
def multiply(a: int, b: int) -> int:
    """Multiplies two integers together."""
    return a * b

tools = [get_word_length, multiply]

# Tool-calling agents need a prompt with a MessagesPlaceholder for agent_scratchpad
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)

# create_tool_calling_agent uses the model's native function calling
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True
)

result = agent_executor.invoke({
    "input": "How many characters are in the word 'engineer', and what is that number multiplied by 7?"
})

print(result["output"])

Notice the key structural difference: instead of pulling a ReAct prompt from the hub, we define a ChatPromptTemplate with a MessagesPlaceholder for the agent scratchpad. This is where the tool call messages and tool results are injected into the conversation history on each loop iteration.

Parallel Tool Calls

One significant advantage of tool-calling agents is that GPT-4o can decide to call multiple tools in a single step when the calls are independent. For example, if you ask "What is the length of 'hello' and the length of 'world'?", the model may return two tool calls in one response rather than two sequential calls. LangChain handles this automatically.
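To see why this matters for latency, here's a framework-free sketch of how an executor might fan out independent tool calls from a single model response. LangChain does this internally; the dispatch helper below is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

TOOLS = {"get_word_length": lambda word: len(word)}

# One model response containing two independent tool calls.
tool_calls = [
    {"id": "call_1", "name": "get_word_length", "args": {"word": "hello"}},
    {"id": "call_2", "name": "get_word_length", "args": {"word": "world"}},
]

def dispatch(call):
    # Run one tool call and pair the result with its call id.
    return call["id"], TOOLS[call["name"]](**call["args"])

# Independent calls can execute concurrently instead of costing
# two separate trips around the agent loop.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(dispatch, tool_calls))

print(results)  # {'call_1': 5, 'call_2': 5}
```

With ReAct, each of those lookups would be a full LLM round trip; here both resolve before the model is consulted again.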

Tradeoffs and When to Use Tool-Calling Agents

Strengths:

  • More reliable than ReAct. No free-form output parsing.
  • Supports parallel tool calls out of the box.
  • Cleaner conversation history. Tool calls and results are structured messages.
  • Generally faster than ReAct for the same task.

Weaknesses:

  • Requires a model with native function calling support (GPT-4o, Claude 3, Gemini, etc.).
  • Less transparent than ReAct. You don't see the model's reasoning, only its decisions.
  • Still uses AgentExecutor, which has limited support for complex control flow.

Use tool-calling agents when: you're building production features with a capable model, you need reliability over transparency, and your workflow is relatively linear (one agent, one goal, a set of tools).


3. LangGraph Agents: Stateful Graph-Based Orchestration

The Mental Model

LangGraph is a separate library built on top of LangChain that treats your agent as a directed graph. Nodes are functions (LLM calls, tool calls, custom logic). Edges define the flow between nodes, and edges can be conditional, meaning the next node to execute depends on the current state. The entire execution is driven by a shared state object that every node can read from and write to.

This is a fundamentally different abstraction from AgentExecutor. Instead of a generic loop that runs until the model says "Final Answer", you define the exact control flow of your agent. You decide what happens after a tool call. You decide when to loop back. You decide when to stop. The LLM is a participant in the graph, not the controller of it.

This is why LangGraph is the right choice for complex, production-grade agents. It gives you the control flow guarantees that senior engineers expect from any other system they build.

Core Concepts

  • State: A TypedDict (or Pydantic model) that represents the full state of the agent at any point in time. Every node receives the current state and returns a partial update.
  • Nodes: Python functions or runnables. Each node takes the state as input and returns a dict of state updates.
  • Edges: Connections between nodes. Can be unconditional (add_edge) or conditional (add_conditional_edges).
  • StateGraph: The graph object you build by adding nodes and edges, then compile into a runnable.
  • Checkpointing: LangGraph supports built-in persistence, so you can pause, resume, and inspect agent runs at any node boundary.
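The State bullet is the least obvious of these, so here's a framework-free sketch of the reducer idea: each node returns a partial update, and a per-key reducer decides how it merges into the state. LangGraph's add_messages behaves roughly like the append reducer below, plus de-duplication by message id:

```python
# Framework-free sketch of LangGraph-style state updates: every node
# returns a *partial* update, and a per-key reducer decides how to
# merge it into the shared state.

def append_reducer(current, update):
    return (current or []) + update   # add_messages appends (and dedupes by id)

def replace_reducer(current, update):
    return update                     # keys without a reducer are overwritten

REDUCERS = {"messages": append_reducer}

def apply_update(state, update):
    new_state = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key, replace_reducer)
        new_state[key] = reducer(state.get(key), value)
    return new_state

state = {"messages": [], "step_count": 0}
state = apply_update(state, {"messages": ["human: hi"]})   # node 1's output
state = apply_update(state, {"messages": ["ai: hello"], "step_count": 1})
print(state)  # {'messages': ['human: hi', 'ai: hello'], 'step_count': 1}
```

This is why nodes can stay small and composable: no node needs to know the whole state shape, only the keys it reads and writes.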

Code Example: A ReAct-Style Agent Built with LangGraph

LangGraph ships with a create_react_agent helper that builds a production-ready graph under the hood. Let's look at both the high-level API and then the lower-level graph construction so you understand what's happening.

High-level (recommended for most use cases):

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def get_word_length(word: str) -> int:
    """Returns the number of characters in a word."""
    return len(word)

@tool
def multiply(a: int, b: int) -> int:
    """Multiplies two integers together."""
    return a * b

tools = [get_word_length, multiply]
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# This returns a compiled StateGraph, not an AgentExecutor
graph = create_react_agent(llm, tools)

result = graph.invoke({
    "messages": [("human", "How many characters are in 'engineer', multiplied by 7?")]
})

# The result is the full state, including all messages
for message in result["messages"]:
    print(f"{message.type}: {message.content}")

Low-level (for custom control flow):

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

# 1. Define the state schema
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

# 2. Define tools
@tool
def get_word_length(word: str) -> int:
    """Returns the number of characters in a word."""
    return len(word)

@tool
def multiply(a: int, b: int) -> int:
    """Multiplies two integers together."""
    return a * b

tools = [get_word_length, multiply]
tool_node = ToolNode(tools)

# 3. Bind tools to the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools(tools)

# 4. Define nodes
def call_model(state: AgentState):
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

# 5. Define the routing logic
def should_continue(state: AgentState) -> str:
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

# 6. Build the graph
graph_builder = StateGraph(AgentState)
graph_builder.add_node("agent", call_model)
graph_builder.add_node("tools", tool_node)
graph_builder.set_entry_point("agent")
graph_builder.add_conditional_edges("agent", should_continue)
graph_builder.add_edge("tools", "agent")  # always loop back after tool use

graph = graph_builder.compile()

# 7. Run
result = graph.invoke({
    "messages": [("human", "How many characters are in 'engineer', multiplied by 7?")]
})

for message in result["messages"]:
    print(f"{message.type}: {message.content}")

This low-level example makes the control flow explicit. After the model runs, we check if it wants to call tools. If yes, we route to the tools node. After the tools run, we always route back to the agent node. This loop continues until the model produces a message with no tool calls, at which point we route to END.

Adding Persistence with Checkpointing

One of LangGraph's most powerful production features is built-in checkpointing. You can persist the full agent state between invocations, enabling multi-turn conversations and the ability to resume interrupted runs:

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
graph = graph_builder.compile(checkpointer=checkpointer)

# Each conversation thread gets a unique config
config = {"configurable": {"thread_id": "user-123-session-1"}}

# First turn
graph.invoke(
    {"messages": [("human", "My name is Alex.")]},
    config=config
)

# Second turn: the agent remembers the previous messages
result = graph.invoke(
    {"messages": [("human", "What is my name?")]},
    config=config
)
print(result["messages"][-1].content)
# Output: "Your name is Alex."

For production, swap MemorySaver for a persistent backend like PostgresSaver or RedisSaver from the langgraph-checkpoint-postgres or langgraph-checkpoint-redis packages.

Tradeoffs and When to Use LangGraph

Strengths:

  • Full control over agent control flow. No magic loops.
  • Built-in state management and persistence.
  • Supports human-in-the-loop: you can interrupt the graph at any node and wait for user input before continuing.
  • Supports multi-agent architectures natively (subgraphs, supervisor patterns).
  • Streaming at the node level: you can stream partial outputs from any node.
  • First-class support in LangSmith for tracing and observability.

Weaknesses:

  • More boilerplate than AgentExecutor for simple use cases.
  • Steeper learning curve. You need to think in graphs.
  • Overkill for simple, single-step agents.

Use LangGraph when: you're building for production, you need multi-turn memory, your agent has complex branching logic, you're building multi-agent systems, or you need human-in-the-loop capabilities.


Choosing the Right Paradigm: A Decision Framework

Here's a practical framework for choosing between the three approaches:

Criteria                  ReAct         Tool-Calling              LangGraph
Stage                     Prototyping   Early production          Production
Control flow complexity   Low           Low to medium             Any
Transparency needed       High          Medium                    High (via graph structure)
Multi-turn memory         Manual        Manual                    Built-in
Model requirement         Any LLM       Function-calling models   Function-calling models
Parallel tool calls       No            Yes                       Yes
Human-in-the-loop         No            No                        Yes
Multi-agent support       No            No                        Yes

The migration path is typically: ReAct for exploration, Tool-Calling for a first production pass, LangGraph for anything that needs to scale or handle complexity.


Production Considerations

Regardless of which paradigm you choose, there are several engineering concerns that apply to all agents in production.

Observability

Agents are non-deterministic. You cannot reason about their behavior from code alone. You need traces. LangSmith (LangChain's observability platform) integrates with all three paradigms with minimal setup:

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"

That's it. Every agent run will be traced and available in the LangSmith UI, including every LLM call, tool invocation, input, output, latency, and token count.

Error Handling and Retries

Tools fail. APIs return errors. LLMs hallucinate tool arguments. Build your tools defensively and handle errors explicitly. In LangGraph, you can add dedicated error-handling nodes. In AgentExecutor, use handle_parsing_errors=True and consider wrapping tools with retry logic using tenacity.
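As a sketch of the retry idea without pulling in tenacity itself, a tool can be wrapped like this. The flaky_tool here is a stand-in that fails twice before succeeding; in production, prefer the real library:

```python
import time

# A minimal retry wrapper in the spirit of tenacity's @retry decorator;
# this just shows the shape, use the real library in production.

def with_retries(fn, attempts=3, base_delay=0.01):
    def wrapped(*args, **kwargs):
        for attempt in range(1, attempts + 1):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == attempts:
                    raise          # surface the error after the last attempt
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return wrapped

calls = {"n": 0}

def flaky_tool(query: str) -> str:
    # Stand-in for a tool backed by an unreliable upstream API:
    # fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream timeout")
    return f"result for {query}"

safe_tool = with_retries(flaky_tool)
print(safe_tool("AAPL price"))  # result for AAPL price
```

Wrap at the tool boundary rather than inside the agent loop: a retried tool looks like a single successful call to the model, which keeps the conversation history clean.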

Latency and Cost

Each agent step is an LLM call. A five-step agent run with GPT-4o can cost 10x more than a single call and take 10x longer. Design your tools to be coarse-grained (one tool that does more is better than five tools that each do a little). Use streaming (graph.stream() or agent_executor.stream()) to improve perceived latency for end users.

Limiting Runaway Loops

Always set a maximum iteration limit to prevent infinite loops:

# AgentExecutor
AgentExecutor(agent=agent, tools=tools, max_iterations=10)

# LangGraph: set recursion limit on the config
graph.invoke(input, config={"recursion_limit": 25})

Structured Outputs

For production agents where you need a guaranteed output schema, use llm.with_structured_output(YourPydanticModel) on the final response node. This eliminates free-form text parsing from your application code entirely.
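As a sketch (assuming Pydantic v2, which modern LangChain uses), the schema you'd hand to with_structured_output might look like this. PriceReport and its fields are hypothetical:

```python
from pydantic import BaseModel, Field

# A hypothetical output schema for a stock-lookup agent. Passing this to
# llm.with_structured_output() constrains the final response to this shape.

class PriceReport(BaseModel):
    ticker: str = Field(description="Stock ticker symbol")
    price: float = Field(description="Latest price in USD")
    currency: str = "USD"

# In the agent's final node (needs an API key, so shown as comments):
#   structured_llm = llm.with_structured_output(PriceReport)
#   report = structured_llm.invoke(state["messages"])
#   report.price  # guaranteed to be a float; no text parsing

# The guarantee comes from the JSON schema Pydantic generates:
schema = PriceReport.model_json_schema()
print(sorted(schema["properties"]))  # ['currency', 'price', 'ticker']
```

The field descriptions are not decoration: they are sent to the model as part of the schema, so write them as carefully as you would a tool docstring.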


Conclusion

LangChain's three agent paradigms represent a maturity curve, not competing alternatives. ReAct gives you transparency and a low barrier to entry, making it ideal for exploration and debugging. Tool-calling agents give you reliability and parallel execution, making them a solid choice for straightforward production features. LangGraph gives you the full power of explicit state management, persistence, and complex control flow, making it the right foundation for anything you're serious about shipping and maintaining.

The most important shift for engineers coming from traditional software is this: you are no longer writing the logic. You are writing the environment in which the logic runs. Your job is to define the state, the tools, the control flow, and the guardrails. The LLM fills in the reasoning. Getting that boundary right is the core skill of agent engineering.

If you're starting today, build your first agent with the tool-calling approach to understand the fundamentals, then migrate to LangGraph as soon as your requirements grow beyond a simple loop. You'll thank yourself later.

Have questions, or want to share what you're building? Drop a comment below or reach out on GitHub. The agent ecosystem is moving fast, and the best insights come from engineers doing it in production.
