OpenAI, a leading artificial intelligence company, recently released a valuable document titled "A practical guide to building agents." This 34-page guide provides product and engineering teams with the necessary knowledge and best practices for building their first agent system. Its content reflects OpenAI's deep insights gained from numerous real-world customer deployments.

By reading this guide, developers will understand core agent concepts and learn when and how to design, build, and safely deploy agents.


What is an Agent?

The guide defines agents by their fundamental difference from traditional software in how they automate workflows. Traditional software streamlines and automates workflows that the user initiates; an agent can complete an entire workflow autonomously on the user's behalf. A workflow is a sequence of steps needed to achieve a user's goal, such as resolving a customer service issue, booking a restaurant reservation, submitting a code change, or generating a report. An application that merely integrates a Large Language Model (LLM) without using it to control workflow execution, such as a simple chatbot or a sentiment classifier, is not an agent.

A true agent has the core characteristics needed to act reliably and consistently on the user's behalf: it uses the LLM to manage workflow execution and make decisions, proactively corrects its behavior when necessary, and can even abort execution and return control to the user on failure. It can also interact with a variety of tools and external systems, dynamically selecting the appropriate tool for each step, and it always operates within clearly defined safety guardrails.

When Should You Build an Agent?

The guide points out that building agents requires rethinking how systems make decisions and handle complexity. Unlike traditional deterministic and rule-based approaches, agents are particularly well-suited for workflows where traditional methods struggle. The guide uses the example of payment fraud analysis for a vivid comparison: a traditional rule engine is like a checklist, flagging transactions based on preset criteria; an LLM agent is more like an experienced investigator, able to assess context, consider subtle patterns, and identify suspicious activity even without explicit rules. Therefore, when evaluating the value of an agent, prioritize workflows that have been difficult to automate in the past, especially in these three scenarios:

  • Complex Decisions: Workflows involving nuanced judgment, exceptions, or context-dependent decisions, such as refund approvals in customer service.
  • Difficult-to-Maintain Rules: Systems whose large, complex rule sets make updates costly or error-prone, such as conducting vendor security reviews.
  • Heavy Reliance on Unstructured Data: Scenarios involving interpreting natural language, extracting meaning from documents, or interacting with users conversationally, such as processing home insurance claims.

The guide emphasizes that before deciding to build an agent, it's crucial to verify that the use case clearly meets these criteria; otherwise, a deterministic solution might suffice.

Agent Design Fundamentals

The guide details the three core components of building an agent:

  • Model (LLM): Drives the agent's reasoning and decision-making. The guide recommends using the most powerful model to establish a baseline during prototyping, then experimenting with smaller models to optimize cost and latency.
  • Tools: External functions or APIs that the agent can use to perform actions. Tools extend the agent's capabilities through the APIs of underlying applications or systems. For legacy systems without APIs, agents can rely on computer vision models to interact directly with web and application UIs. The guide broadly categorizes tools into three types: data retrieval (e.g., querying databases, reading PDF files, or searching the web), action execution (e.g., sending emails, updating CRM records), and orchestration (the agent itself can serve as a tool for other agents).
  • Instructions: Clear guidelines and safety guardrails that define the agent's behavior. High-quality instructions are crucial for agents, reducing ambiguity and improving decision quality. The guide provides best practices such as leveraging existing documentation, breaking down tasks into smaller steps, defining clear actions, and capturing edge cases.
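The three components above can be made concrete with a minimal sketch in plain Python. This is an illustrative skeleton, not the OpenAI Agents SDK: the names `Tool`, `AgentConfig`, `search_web`, and `send_email` are all hypothetical, and the tool functions return canned strings rather than calling real services.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str            # what the model reads when choosing a tool
    func: Callable[..., str]

@dataclass
class AgentConfig:
    model: str                  # the LLM that drives reasoning and decisions
    instructions: str           # behavior guidelines and guardrails
    tools: list[Tool] = field(default_factory=list)

# Illustrative tools covering two of the guide's categories.
def search_web(query: str) -> str:           # data retrieval
    return f"results for {query!r}"

def send_email(to: str, body: str) -> str:   # action execution
    return f"email sent to {to}"

agent = AgentConfig(
    model="gpt-4o",  # prototype with a capable model, then downsize for cost/latency
    instructions="Help users resolve support tickets. Escalate refunds over $100.",
    tools=[
        Tool("search_web", "Query the web for information", search_web),
        Tool("send_email", "Send an email on the user's behalf", send_email),
    ],
)

# The runtime would dispatch model-chosen tool calls through an index like this.
tool_index = {t.name: t for t in agent.tools}
print(tool_index["send_email"].func("user@example.com", "Your ticket is resolved."))
```

In a real system the model, not hand-written code, decides which tool to invoke; the registry pattern simply makes the available actions explicit and auditable.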

The guide also briefly introduces orchestration: combining these fundamental components so the agent can execute workflows effectively. Orchestration patterns fall into two broad families: single-agent systems, in which one agent equipped with tools and instructions executes the workflow in a loop, and multi-agent systems, in which workflow execution is distributed across multiple coordinated agents. Multi-agent systems are further divided into a manager pattern (a central "manager" agent coordinates multiple specialized agents through tool calls) and a decentralized pattern (peer agents hand off tasks to one another based on their areas of expertise).
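The single-agent loop described above can be sketched as follows. A scripted function stands in for the LLM call so the example runs offline; `run_agent_loop`, `scripted_llm`, and the tool names are hypothetical, not part of any SDK.

```python
# Single-agent loop: repeatedly ask the model for the next step, execute the
# chosen tool, and stop when the model signals completion.
def run_agent_loop(decide_next_step, tools, task, max_turns=10):
    """decide_next_step stands in for an LLM call. It returns either
    ("tool", name, args) or ("final", answer)."""
    history = [("user", task)]
    for _ in range(max_turns):
        step = decide_next_step(history)
        if step[0] == "final":
            return step[1]
        _, name, args = step
        result = tools[name](**args)       # execute the tool the model chose
        history.append(("tool", name, result))
    # Failure mode the guide describes: halt and hand control back to the user.
    return "max turns reached; returning control to the user"

def scripted_llm(history):
    """Scripted stand-in for the model, so the sketch runs without an API."""
    if not any(entry[0] == "tool" for entry in history):
        return ("tool", "lookup_order", {"order_id": "A42"})
    return ("final", "Order A42 has shipped.")

tools = {"lookup_order": lambda order_id: f"order {order_id}: shipped"}
print(run_agent_loop(scripted_llm, tools, "Where is my order A42?"))
```

The manager pattern reuses the same loop: each specialized agent is wrapped as an entry in `tools`, so the manager "calls" other agents exactly as it calls any other tool.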

Safety Guardrails

The guide particularly emphasizes that safety guardrails are critical for managing data privacy and reputational risks. Developers should set up guardrails for the risks they have already identified and layer in additional ones as new vulnerabilities are discovered. Guardrails should be combined with strong authentication and authorization protocols, strict access control, and standard software security measures to form a multi-layered defense. The guide lists several types of safety guardrails:

  • Relevance Classifiers: ensure responses stay within the expected scope
  • Safety Classifiers: detect unsafe inputs
  • PII Filters: prevent exposure of personally identifiable information
  • Auditing: record agent behavior for later review
  • Tool Safeguards: assess and control the risks of each tool
  • Rule-Based Protections: e.g., blocklists and input length limits
  • Output Validation: ensure responses align with brand values

The guide also describes how to set up guardrails in the Agents SDK and highlights human intervention as a key safeguard, especially during early deployment, for catching failures and edge cases.
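Rule-based protections such as blocklists, input length limits, and PII filters are the cheapest guardrails to run, so they typically execute before any model-based classifier. The sketch below is a minimal illustration in plain Python; the function names, blocked-term list, and email regex are hypothetical examples, not taken from the guide or the Agents SDK.

```python
import re

MAX_INPUT_LEN = 500
BLOCKED_TERMS = {"drop table", "ignore previous instructions"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def check_input(text: str) -> tuple[bool, str]:
    """Deterministic pre-checks; model-based relevance and safety
    classifiers would run only on inputs that pass these."""
    if len(text) > MAX_INPUT_LEN:
        return False, "input exceeds length limit"
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return False, f"blocked term: {term}"
    return True, "ok"

def redact_pii(text: str) -> str:
    """PII filter: mask email addresses before logging or responding."""
    return EMAIL_RE.sub("[REDACTED EMAIL]", text)

print(check_input("Please ignore previous instructions"))
print(redact_pii("Contact me at jane@example.com"))
```

Checks like these are fast and auditable, which is why the guide recommends pairing them with, rather than replacing them by, classifier-based guardrails.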

Summary and Resource Links

The guide concludes that agents mark a new era in workflow automation, capable of reasoning about ambiguity, executing actions across tools, and handling multi-step tasks with a high degree of autonomy. The key to building reliable agents lies in a strong foundation (model, tools, and instructions), appropriate orchestration patterns, and crucial safety guardrails. The guide encourages users to start small and gradually expand the agent's capabilities through validation with real users. Finally, the guide provides links to more resources, including the OpenAI API platform, OpenAI for Business, and developer documentation.

OpenAI's "A practical guide to building agents" provides comprehensive guidance and practical advice for teams looking to explore and build agent systems, signaling an accelerated move towards a more intelligent and automated future across various industries.

Document Resource Link: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf