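LangGraph Chapter 16: Best Practices and Common Questions

State Design Best Practices

State is the core concept in LangGraph. How well you design it directly affects how maintainable the project is and how efficiently it runs.

Keep State Flat

Bad practice: deeply nested state

from typing import TypedDict

class BadState(TypedDict):
    data: dict  # internal structure is opaque
    # usage: state["data"]["user"]["profile"]["name"]

Recommended practice: use a flat structure

class GoodState(TypedDict):
    user_name: str
    user_email: str
    user_profile: dict  # use a dict only when necessary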
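When the data arrives nested (for example from an API response), one option is to flatten it once at the edge of the graph so that nodes only see flat fields. The sketch below is a minimal illustration in plain Python; `FlatUserState`, `flatten_user_payload`, and the payload shape are assumptions made for this example, not part of LangGraph.

```python
from typing import TypedDict


class FlatUserState(TypedDict):
    user_name: str
    user_email: str


def flatten_user_payload(payload: dict) -> FlatUserState:
    """Copy the fields the graph needs out of a nested payload."""
    user = payload.get("user", {})
    profile = user.get("profile", {})
    return {
        "user_name": profile.get("name", ""),
        "user_email": user.get("email", ""),
    }


# Hypothetical nested input, flattened before it enters the graph.
nested = {"user": {"email": "ada@example.com", "profile": {"name": "Ada"}}}
flat_state = flatten_user_payload(nested)
print(flat_state)
```

Nodes can then read `state["user_name"]` directly instead of walking four levels of dictionaries.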

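Minimize State

Each node should read only the state fields it needs. Avoid keeping intermediate results in the state longer than necessary:

from typing import Annotated, Optional, TypedDict
from langgraph.graph.message import add_messages

class CleanState(TypedDict):
    messages: Annotated[list, add_messages]
    # intermediate result, cleared once the task is done
    intermediate_result: Optional[str]
    final_result: str

def cleanup_node(state: CleanState):
    # process() is a placeholder for your own logic
    result = process(state["intermediate_result"])
    # clear the intermediate value once it has been consumed
    return {"intermediate_result": None, "final_result": result}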

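Use Pydantic Instead of TypedDict

In production, consider defining the state with Pydantic for stronger type validation and serialization support:

from typing import Annotated
from pydantic import BaseModel, ConfigDict, Field
from langgraph.graph.message import add_messages

class ProductionState(BaseModel):
    # Pydantic v2 configuration (replaces the v1-style "class Config")
    model_config = ConfigDict(arbitrary_types_allowed=True)

    messages: Annotated[list, add_messages]
    user_id: str
    metadata: dict = Field(default_factory=dict)  # default value

Advantages of Pydantic:

Runtime type validation
Support for default values
Better JSON serialization and deserialization
Field validators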

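Using Reducers Sensibly

add_messages is the most commonly used reducer, but do not overuse it:

Reducer	Purpose	Example
`add_messages`	Append to a message list (messages with a matching ID are updated)	Conversation history
`operator.add`	Concatenate lists	Result lists produced by multiple nodes
Custom reducer	Special merge logic	Deduplication, max/min aggregation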
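To make the table concrete, here is a minimal sketch of two custom reducers next to `operator.add`. A reducer is a function of the form `(existing, new) -> merged`. The `apply_update` helper is a hypothetical, simplified stand-in for the merge step so the example runs without LangGraph installed; it is not LangGraph's actual implementation, and `ResearchState` and its fields are invented for illustration.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


def merge_unique(existing: list, new: list) -> list:
    """Custom reducer: append new items, skipping ones already present."""
    merged = list(existing)
    for item in new:
        if item not in merged:
            merged.append(item)
    return merged


def keep_max(existing: float, new: float) -> float:
    """Custom reducer: keep the largest value seen so far."""
    return max(existing, new)


class ResearchState(TypedDict):
    results: Annotated[list, operator.add]   # plain concatenation
    sources: Annotated[list, merge_unique]   # concatenation without duplicates
    best_score: Annotated[float, keep_max]   # max aggregation
    summary: str                             # no reducer: last write wins


def apply_update(schema: type, state: dict, update: dict) -> dict:
    """Hypothetical merge step: use the annotated reducer, else overwrite."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"results": ["a"], "sources": ["x.com"], "best_score": 0.4, "summary": ""}
update = {"results": ["b"], "sources": ["x.com", "y.org"], "best_score": 0.9, "summary": "done"}
state = apply_update(ResearchState, state, update)
print(state)
```

A field with no reducer is simply overwritten, so reserve reducers for fields that several nodes contribute to.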

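Node Design Best Practices

Single Responsibility Principle

Each node should do one thing:

# search, analyze and format_results are placeholders for your own helpers

# Bad practice: one node does too much
def bad_node(state):
    result1 = search(state["query"])
    result2 = analyze(result1)
    result3 = format_results(result2)
    return {"output": result3}

# Good practice: split the work across several nodes
def search_node(state):
    return {"raw_results": search(state["query"])}

def analyze_node(state):
    return {"analysis": analyze(state["raw_results"])}

def format_node(state):
    return {"output": format_results(state["analysis"])}

Benefits of single responsibility:

Each node can be tested independently
The execution flow is easier to follow
Breakpoints or human review steps are easy to insert between nodes
A hot reload affects a smaller part of the graph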
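Because a node is just a function from state to a partial update, it can be tested with a plain dictionary and no graph at all. The sketch below assumes a hypothetical `analyze` helper that counts results and picks the longest one; substitute your own logic.

```python
def analyze(raw_results: list[str]) -> dict:
    """Hypothetical analysis helper: count results and pick the longest one."""
    return {"count": len(raw_results), "longest": max(raw_results, key=len, default="")}


def analyze_node(state: dict) -> dict:
    """Single-purpose node: reads raw_results, writes analysis."""
    return {"analysis": analyze(state["raw_results"])}


def test_analyze_node() -> None:
    state = {"query": "langgraph", "raw_results": ["short", "a longer result"]}
    update = analyze_node(state)
    # The node returns only the key it owns...
    assert update == {"analysis": {"count": 2, "longest": "a longer result"}}
    # ...and leaves its input untouched.
    assert state == {"query": "langgraph", "raw_results": ["short", "a longer result"]}


test_analyze_node()
print("analyze_node behaves as expected")
```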

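Pure-Function Style

Keep nodes as close to pure functions as you can: the same input should always produce the same output. Side effects that cannot be avoided, such as logging or LLM calls, are easier to reason about when each one lives in its own dedicated node:

# Bad practice: the node mixes several side effects
def bad_node(state):
    with open("log.txt", "a") as f:  # side effect
        f.write("executed\n")
    return {"result": call_llm(state["input"])}  # an LLM call is also a side effect

# Good practice: isolate each side effect in a dedicated node
def logging_node(state):
    # logging only
    save_log(state)
    return {}

def llm_node(state):
    # LLM call only
    return {"result": call_llm(state["input"])}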
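One way to keep an LLM node testable despite its side effect is to inject the LLM callable instead of referencing it globally. This is a sketch under assumptions: `make_llm_node` and `fake_llm` are hypothetical names, and the `Callable[[str], str]` signature is chosen only for illustration.

```python
from typing import Callable


def make_llm_node(call_llm: Callable[[str], str]) -> Callable[[dict], dict]:
    """Build a node around an injected LLM callable (hypothetical signature)."""

    def llm_node(state: dict) -> dict:
        return {"result": call_llm(state["input"])}

    return llm_node


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for tests only."""
    return f"echo: {prompt}"


node = make_llm_node(fake_llm)
first = node({"input": "hello"})
second = node({"input": "hello"})
print(first)
```

With the fake in place the node behaves like a pure function, so the same input yields the same update on every call; in production you would pass the real client instead.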

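Node Return Value Conventions

What a node returns determines how the state is updated. Keep the following in mind:

The return value is an incremental update: returned keys update the matching keys in the state, and keys that are not returned stay unchanged
Returning None or an empty dict {} leaves the state unchanged
Do not modify the incoming state object directly; always return new values

def good_node(state):
    # correct: return a new value
    return {"counter": state["counter"] + 1}

def bad_node(state):
    # wrong: mutates the input in place
    state["counter"] += 1
    return state  # unsafe!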
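One concrete way the "mutate and return the whole state" pattern goes wrong is with reducer fields: the reducer combines the existing value with everything the node returned, so old entries are counted twice. The sketch below uses a simplified `merge` function as a stand-in for the merge step (an assumption for illustration, not LangGraph's actual code) and hands each node a deep copy of the state.

```python
import copy
import operator

REDUCERS = {"items": operator.add}  # "items" uses a list-concatenating reducer


def merge(state: dict, update: dict) -> dict:
    """Simplified stand-in for the merge step: reducer if declared, else overwrite."""
    merged = dict(state)
    for key, value in (update or {}).items():
        if key in REDUCERS:
            merged[key] = REDUCERS[key](state[key], value)
        else:
            merged[key] = value
    return merged


def good_node(state: dict) -> dict:
    return {"items": ["new"]}  # only the increment


def bad_node(state: dict) -> dict:
    state["items"].append("new")  # mutates its input
    return state                  # and returns the whole state


start = {"items": ["old"], "counter": 0}
after_good = merge(start, good_node(copy.deepcopy(start)))
after_bad = merge(start, bad_node(copy.deepcopy(start)))
print(after_good["items"])  # ['old', 'new']
print(after_bad["items"])   # ['old', 'old', 'new']
```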

Error Handling and Retries

Node-Level Error Handling

def robust_node(state):
    try:
        result = risky_operation(state["input"])
        return {"output": result}
    except ValueError as e:
        # expected error: handle it gracefully
        return {"error": str(e), "output": None}
    except Exception as e:
        # unexpected error: record it and flag it for retry or termination
        return {"error": f"Unknown error: {e}", "output": None, "retry": True}
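For transient failures, retrying inside the node with a growing delay is often enough before involving the graph at all. The decorator below is a minimal sketch in plain Python; `with_retries`, its parameters, and `flaky_node` are hypothetical names invented for this example and are not a LangGraph API.

```python
import functools
import time
from typing import Callable


def with_retries(max_attempts: int = 3, base_delay: float = 0.1) -> Callable:
    """Retry a node on any exception, doubling the delay after each failure."""

    def decorator(node: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @functools.wraps(node)
        def wrapper(state: dict) -> dict:
            for attempt in range(1, max_attempts + 1):
                try:
                    return node(state)
                except Exception as exc:
                    if attempt == max_attempts:
                        # Out of attempts: report the failure through the state.
                        return {"error": str(exc), "output": None}
                    time.sleep(base_delay * 2 ** (attempt - 1))
            return {}  # only reached if max_attempts < 1

        return wrapper

    return decorator


attempts = {"count": 0}


@with_retries(max_attempts=3, base_delay=0.01)
def flaky_node(state: dict) -> dict:
    """Hypothetical node that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"output": state["input"].upper(), "error": None}


result = flaky_node({"input": "ok"})
print(result, attempts["count"])
```

Because the final failure is returned as an `error` field rather than raised, a routing function can still decide what happens next.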

Graph-Level Error Handling

Use conditional edges to route errors:

def error_router(state):
    if state.get("error") and state.get("retry"):
        return "retry_node"
    elif state.get("error"):
        return "error_handler"
    else:
        return "next_step"

# register the error-handling nodes; graph is a StateGraph that already
# contains the "robust_node" and "next_step" nodes
graph.add_node("retry_node", retry_operation)
graph.add_node("error_handler", graceful_handle)
graph.add_conditional_edges("robust_node", error_router)

Preventing Infinite Loops

Infinite loops are among the most common problems in multi-agent systems. A retry counter stored in the state is a simple and effective guard:

class SafeState(TypedDict):
    messages: Annotated[list, add_messages]
    retry_count: int  # retry counter
    max_retries: int  # maximum number of retries
    output: str

def safe_node(state: SafeState):
    if state["retry_count"] >= state["max_retries"]:
        return {"output": "Maximum retries reached; terminating the flow", "messages": [("system", "Flow terminated")]}

    try:
        result = call_llm(state["messages"])
        return {"output": result, "retry_count": 0}
    except Exception:
        return {"retry_count": state["retry_count"] + 1}
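The counter only stops the loop if the routing function checks it. The sketch below simulates the loop in plain Python with an overwrite-style merge; `retry_router` and the always-failing `call_llm` are hypothetical and exist only to exercise the guard. As a second line of defense, LangGraph also accepts a `recursion_limit` entry in the run config, which aborts a run that exceeds the given number of steps.

```python
def call_llm(messages: list) -> str:
    """Hypothetical LLM call that always fails, to exercise the guard."""
    raise TimeoutError("model unavailable")


def safe_node(state: dict) -> dict:
    try:
        return {"output": call_llm(state["messages"]), "retry_count": 0}
    except Exception:
        return {"retry_count": state["retry_count"] + 1}


def retry_router(state: dict) -> str:
    """Decide the next hop: stop on success or when the retry budget is spent."""
    if state.get("output") is not None:
        return "done"
    if state["retry_count"] >= state["max_retries"]:
        return "give_up"
    return "safe_node"


state = {"messages": ["hi"], "retry_count": 0, "max_retries": 3, "output": None}
visits = 0
next_hop = "safe_node"
while next_hop == "safe_node":
    state = {**state, **safe_node(state)}  # overwrite-style merge
    visits += 1
    next_hop = retry_router(state)

print(next_hop, visits)  # give_up 3
```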

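Frequently Asked Questions

Q1: What format should a node's return value have?

A node should normally return a dictionary whose keys are state field names and whose values are the updates. LangGraph merges the returned dictionary into the state key by key: a field that declares a reducer is combined through that reducer, and any other field is overwritten.

# correct
return {"field_name": new_value}

# wrong
return new_value  # not a dict
return [new_value]  # not a dict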

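Q2: How do I pass temporary data between nodes?

Temporary data is intermediate data that does not need to persist in the final state. There are two approaches:

Approach 1: use a state field (and clear it when done)

class State(TypedDict):
    temp_cache: Optional[dict]  # temporary cache
    output: str

def node_a(state):
    return {"temp_cache": {"intermediate": "value"}}

def node_b(state):
    cache = state["temp_cache"]
    result = use_cache(cache)  # use_cache is a placeholder for your own logic
    return {"output": result, "temp_cache": None}  # clear after use

Approach 2: use a system message in messages

# pass the temporary data as a system message (requires "import json");
# with add_messages, the message stays in the history until it is removed
return {"messages": [("system", json.dumps(temp_data))]}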

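Q3: What is the difference between StateGraph and MessageGraph?

Feature	StateGraph	MessageGraph
State type	Custom TypedDict/Pydantic	Message list only
Flexibility	High; any fields can be defined	Low; messages only
Use cases	Complex workflows, multi-field state	Simple conversational scenarios
Extensibility	Easy to extend	Hard to extend

Recommendation: unless your scenario is extremely simple (pure conversation), always use StateGraph. MessageGraph is essentially a special case of StateGraph, so choosing StateGraph adds no extra overhead.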

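Q4: How do I debug a LangGraph program?

Use LangGraph Studio: debug visually and inspect the state node by node
Print logs: add print statements in nodes to follow the execution
Inspect the state history: use get_state_history to view all past states
Enable verbose logging: set the environment variable LANGGRAPH_DEBUG=true (if your LangGraph version supports it)

# debug mode
import os
os.environ["LANGGRAPH_DEBUG"] = "true"

# view the state history (app must be compiled with a checkpointer)
history = app.get_state_history(config={"configurable": {"thread_id": "123"}})
for snapshot in history:
    print(snapshot.values)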
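Scattered print statements are hard to remove later. A small decorator built on the standard `logging` module gives the same visibility and can be switched off through the log level. This is a sketch, not a LangGraph feature: `traced` and `greet_node` are hypothetical names.

```python
import functools
import logging
from typing import Callable

logger = logging.getLogger("graph.trace")


def traced(node: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Log which state keys a node sees and which keys it updates."""

    @functools.wraps(node)
    def wrapper(state: dict) -> dict:
        logger.info("enter %s keys=%s", node.__name__, sorted(state))
        update = node(state) or {}
        logger.info("exit %s update=%s", node.__name__, update)
        return update

    return wrapper


@traced
def greet_node(state: dict) -> dict:
    """Hypothetical node used only to demonstrate the decorator."""
    return {"output": f"hello {state['name']}"}


logging.basicConfig(level=logging.INFO, format="%(name)s %(message)s")
result = greet_node({"name": "Ada"})
```

Since `functools.wraps` preserves the function name, the wrapped node can be registered in the graph exactly like the original.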

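Q5: In what order does a graph execute?

Execution proceeds in discrete steps, often called supersteps. In each step, every node triggered by the previous step runs, and nodes triggered in the same step, such as the branches of a fan-out, run in parallel. A node with several incoming edges is triggered as soon as any one of them fires; to make it wait for all of its upstream nodes, connect them with a single edge that lists all the sources. Conditional edges choose the next path from the result of the routing function.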
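The step-by-step model can be illustrated with a toy scheduler. The sketch below is a deliberately simplified simulation (no state, no conditional edges, no wait-for-all edges) and is not how LangGraph is implemented; `run_supersteps` and the edge dictionaries are invented for illustration.

```python
def run_supersteps(edges: dict[str, list[str]], start: str, max_steps: int = 10) -> list[list[str]]:
    """Return the nodes that run in each step of a simplified superstep model."""
    schedule = []
    active = {start}
    for _ in range(max_steps):
        if not active:
            break
        schedule.append(sorted(active))
        # Every node reachable from the current step becomes active next.
        active = {target for node in active for target in edges.get(node, [])}
    return schedule


# Balanced fan-out: both branches finish in the same step, so "join" runs once.
fan_out = {"start": ["a", "b"], "a": ["join"], "b": ["join"]}
print(run_supersteps(fan_out, "start"))
# [['start'], ['a', 'b'], ['join']]

# Uneven branches: "join" is triggered in two different steps.
uneven = {"start": ["a", "b"], "a": ["join"], "b": ["b2"], "b2": ["join"]}
print(run_supersteps(uneven, "start"))
# [['start'], ['a', 'b'], ['b2', 'join'], ['join']]
```

The second graph shows why branches of different lengths deserve attention: without an edge that waits for all sources, the join node runs once per arriving branch.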

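Q6: Is a checkpointer required?

No. A checkpointer is required only in the following scenarios:

Using the interrupt feature (human-in-the-loop)
Using time-travel debugging
Persisting conversation history
Using get_state_history to inspect history

If your graph is a simple workflow that runs once, you can omit the checkpointer.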

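Q7: How can I improve execution performance?

Use parallel nodes: run nodes that do not depend on each other in parallel where possible
Reduce unnecessary state copying: use Annotated with a suitable reducer so that nodes return only small incremental updates
Use async nodes: write I/O-bound operations as async functions
Cache LLM calls: cache the LLM's response for identical inputs

# async node example
async def async_node(state):
    result = await async_llm_call(state["input"])
    return {"output": result}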
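The benefit of async nodes shows up when independent I/O-bound calls overlap. The sketch below uses plain `asyncio` to run two nodes concurrently and merge their updates; `async_llm_call` here is a hypothetical client whose short sleep stands in for network latency, and `run_parallel` is only an illustration of the idea, not LangGraph's scheduler.

```python
import asyncio

events: list[str] = []


async def async_llm_call(prompt: str) -> str:
    """Hypothetical async LLM client; the sleep stands in for network latency."""
    events.append(f"start {prompt}")
    await asyncio.sleep(0.01)
    events.append(f"end {prompt}")
    return prompt.upper()


async def summarize_node(state: dict) -> dict:
    return {"summary": await async_llm_call(state["text"])}


async def tag_node(state: dict) -> dict:
    return {"tags": await async_llm_call(state["topic"])}


async def run_parallel(state: dict) -> dict:
    """Run two independent nodes concurrently and merge their updates."""
    updates = await asyncio.gather(summarize_node(state), tag_node(state))
    merged = dict(state)
    for update in updates:
        merged.update(update)
    return merged


final_state = asyncio.run(run_parallel({"text": "doc", "topic": "ai"}))
print(final_state)
print(events[:2])  # both calls start before either one finishes
```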

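Summary

This chapter covered best practices for state design, node design, and error handling, along with answers to common questions. Following these principles will help you build more robust and maintainable LangGraph applications. The next chapter wraps up the tutorial and offers a recommended learning path and further resources.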