GPT-4.1 提示指南:如何充分利用新模型的能力?


Highlights

GPT-4.1 不只是性能提升,它在遵循指令的字面性上变化很大。这意味着,以往的 Prompt 经验可能不再适用,需要新的方法才能充分发挥其能力。这篇文章就是为了帮你理解这种转变,避免“老司机翻车”。

这篇指南直接给出了 OpenAI 内部测试验证过的具体技巧。比如,为 AI Agent 的系统 Prompt 加上三个关键提醒,就能把模型在 SWE-bench 编码任务上的解决率提升近 20%。工具调用直接用 API 的 tools 字段,也比手动注入描述的通过率高 2%

读完后,你会掌握一套针对 GPT-4.1 的实用 Prompt 方法:如何构建高效的 AI Agent,怎么利用好 1M token 长上下文(比如指令放头尾效果更佳),以及一个官方推荐的文件 diff 格式。这些都是能直接提升开发效率的经验。

Apr 14, 2025 2025 年 4 月 14 日

Open in Github 在 Github 中打开

The GPT-4.1 family of models represents a significant step forward from GPT-4o in capabilities across coding, instruction following, and long context. In this prompting guide, we collate a series of important prompting tips derived from extensive internal testing to help developers fully leverage the improved abilities of this new model family.

GPT-4.1 系列模型代表了在编码、指令遵循和长上下文方面,相比 GPT-4o 的能力有了显著提升。在本提示指南中,我们整理了一系列重要的提示技巧,这些技巧来源于大量的内部测试,旨在帮助开发者充分利用这个新模型系列所改进的能力。

Many typical best practices still apply to GPT-4.1, such as providing context examples, making instructions as specific and clear as possible, and inducing planning via prompting to maximize model intelligence. However, we expect that getting the most out of this model will require some prompt migration. GPT-4.1 is trained to follow instructions more closely and more literally than its predecessors, which tended to more liberally infer intent from user and system prompts. This also means, however, that GPT-4.1 is highly steerable and responsive to well-specified prompts – if model behavior is different from what you expect, a single sentence firmly and unequivocally clarifying your desired behavior is almost always sufficient to steer the model on course.

许多典型的最佳实践仍然适用于 GPT-4.1,例如提供上下文示例,使指令尽可能具体和清晰,以及通过提示诱导规划以最大限度地提高模型智能。但是,我们预计充分利用此模型需要进行一些提示迁移。GPT-4.1 经过训练,能够比其前代产品更密切、更字面地遵循指令,而其前代产品往往更自由地从用户和系统提示中推断意图。然而,这也意味着 GPT-4.1 具有高度的可操纵性,并且对明确指定的提示具有响应性——如果模型行为与您的预期不同,只需用一句话坚定而明确地澄清您期望的行为,几乎总是足以引导模型回到正轨。

Please read on for prompt examples you can use as a reference, and remember that while this guidance is widely applicable, no advice is one-size-fits-all. AI engineering is inherently an empirical discipline, and large language models inherently nondeterministic; in addition to following this guide, we advise building informative evals and iterating often to ensure your prompt engineering changes are yielding benefits for your use case.

请继续阅读,了解您可以作为参考的提示示例,并记住,虽然此指南具有广泛的适用性,但没有一种建议是万能的。人工智能工程本质上是一门经验学科,而大型语言模型本质上是不确定的;除了遵循本指南外,我们建议构建信息丰富的评估并经常迭代,以确保您的提示工程变更为您的用例带来好处。

1. Agentic Workflows 1. 主动工作流

GPT-4.1 is a great place to build agentic workflows. In model training we emphasized providing a diverse range of agentic problem-solving trajectories, and our agentic harness for the model achieves state-of-the-art performance for non-reasoning models on SWE-bench Verified, solving 55% of problems.

GPT-4.1 是构建主动工作流的绝佳平台。在模型训练中,我们强调提供各种主动的问题解决轨迹,并且我们的模型主动工具在 SWE-bench Verified 上实现了非推理模型的最佳性能,解决了 55% 的问题。

System Prompt Reminders 系统提示语提醒

In order to fully utilize the agentic capabilities of GPT-4.1, we recommend including three key types of reminders in all agent prompts. The following prompts are optimized specifically for the agentic coding workflow, but can be easily modified for general agentic use cases.

为了充分利用 GPT-4.1 的 Agent 能力,我们建议在所有 Agent 提示语中包含三种关键类型的提醒。以下提示语专门针对 Agent 编码工作流程进行了优化,但可以轻松修改以用于通用 Agent 用例。

  1. Persistence: this ensures the model understands it is entering a multi-message turn, and prevents it from prematurely yielding control back to the user. Our example is the following:

    持久性:这确保模型理解它正在进入多消息轮次,并防止它过早地将控制权交还给用户。我们的例子如下:

    You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.
  2. Tool-calling: this encourages the model to make full use of its tools, and reduces its likelihood of hallucinating or guessing an answer. Our example is the following:

    工具调用:这鼓励模型充分利用其工具,并降低其产生幻觉或猜测答案的可能性。我们的例子如下:

    If you are not sure about file content or codebase structure pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.
  3. Planning [optional]: if desired, this ensures the model explicitly plans and reflects upon each tool call in text, instead of completing the task by chaining together a series of only tool calls. Our example is the following:

    计划 [可选]:如果需要,这可确保模型明确地计划和反思文本中的每个工具调用,而不是仅通过链接一系列工具调用来完成任务。我们的例子如下:

    You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.

GPT-4.1 is trained to respond very closely to both user instructions and system prompts in the agentic setting. The model adhered closely to these three simple instructions and increased our internal SWE-bench Verified score by close to 20% – so we highly encourage starting any agent prompt with clear reminders covering the three categories listed above. As a whole, we find that these three instructions transform the model from a chatbot-like state into a much more “eager” agent, driving the interaction forward autonomously and independently.

GPT-4.1 经过训练,能够在代理设置中非常密切地响应用户指令和系统提示。该模型严格遵守这三个简单的指令,并使我们的内部 SWE-bench Verified 分数提高了近 20%——因此我们强烈建议以涵盖上述三个类别的明确提醒来开始任何代理提示。总的来说,我们发现这三个指令将模型从类似聊天机器人的状态转变为更加“渴望”的代理,从而自主和独立地推动交互。

Tool Calls 工具调用

Compared to previous models, GPT-4.1 has undergone more training on effectively utilizing tools passed as arguments in an OpenAI API request. We encourage developers to exclusively use the tools field to pass tools, rather than manually injecting tool descriptions into your prompt and writing a separate parser for tool calls, as some have reported doing in the past. This is the best way to minimize errors and ensure the model remains in distribution during tool-calling trajectories – in our own experiments, we observed a 2% increase in SWE-bench Verified pass rate when using API-parsed tool descriptions versus manually injecting the schemas into the system prompt.

与之前的模型相比,GPT-4.1 在有效利用作为 OpenAI API 请求中的参数传递的工具方面接受了更多的训练。我们鼓励开发者专门使用 tools 字段来传递工具,而不是像过去某些人报告的那样,手动将工具描述注入到提示中并编写单独的工具调用解析器。这是最大限度地减少错误并确保模型在工具调用轨迹期间保持分布的最佳方式——在我们自己的实验中,我们观察到,使用 API 解析的工具描述与手动将模式注入到系统提示中相比,SWE-bench Verified 通过率提高了 2%。

Developers should name tools clearly to indicate their purpose and add a clear, detailed description in the “description” field of the tool. Similarly, for each tool param, lean on good naming and descriptions to ensure appropriate usage. If your tool is particularly complicated and you’d like to provide examples of tool usage, we recommend that you create an # Examples section in your system prompt and place the examples there, rather than adding them into the “description’ field, which should remain thorough but relatively concise. Providing examples can be helpful to indicate when to use tools, whether to include user text alongside tool calls, and what parameters are appropriate for different inputs. Remember that you can use “Generate Anything” in the Prompt Playground to get a good starting point for your new tool definitions.

开发者应该清晰地命名工具以表明其用途,并在工具的“description”字段中添加清晰、详细的描述。同样,对于每个工具参数,依靠良好的命名和描述来确保适当的使用。如果您的工具特别复杂,并且您想提供工具使用示例,我们建议您在系统提示中创建一个 # Examples 部分,并将示例放在那里,而不是将它们添加到“description”字段中,该字段应保持彻底但相对简洁。提供示例有助于指示何时使用工具,是否在工具调用中包含用户文本,以及哪些参数适合不同的输入。请记住,您可以使用 Prompt Playground 中的“Generate Anything”来为您的新工具定义获得一个良好的起点。

Prompting-Induced Planning & Chain-of-Thought提示诱导的规划和思维链

As mentioned already, developers can optionally prompt agents built with GPT-4.1 to plan and reflect between tool calls, instead of silently calling tools in an unbroken sequence. GPT-4.1 is not a reasoning model – meaning that it does not produce an internal chain of thought before answering – but in the prompt, a developer can induce the model to produce an explicit, step-by-step plan by using any variant of the Planning prompt component shown above. This can be thought of as the model “thinking out loud.” In our experimentation with the SWE-bench Verified agentic task, inducing explicit planning increased the pass rate by 4%.

正如已经提到的,开发者可以选择提示使用 GPT-4.1 构建的代理在工具调用之间进行规划和反思,而不是默默地以不间断的顺序调用工具。GPT-4.1 不是一个推理模型——意味着它在回答之前不会产生内部的思维链——但在提示中,开发者可以通过使用上面显示的任何规划提示组件的变体来诱导模型产生一个明确的、逐步的计划。这可以被认为是模型“大声思考”。在我们对 SWE-bench Verified 自主任务的实验中,诱导明确的规划将通过率提高了 4%。

Sample Prompt: SWE-bench Verified示例 Prompt:SWE-bench Verified

Below, we share the agentic prompt that we used to achieve our highest score on SWE-bench Verified, which features detailed instructions about workflow and problem-solving strategy. This general pattern can be used for any agentic task.

在下面,我们分享了用于在 SWE-bench Verified 上获得最高分数的自主提示,其中包含关于工作流程和问题解决策略的详细说明。这种通用模式可用于任何自主任务。

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get(
        "OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"
    )
)

SYS_PROMPT_SWEBENCH = """
You will be tasked to fix an issue from an open-source repository.
Your thinking should be thorough and so it's fine if it's very long. You can think step by step before and after each action you decide to take.
You MUST iterate and keep going until the problem is solved.
You already have everything you need to solve this problem in the /testbed folder, even without internet connection. I want you to fully solve this autonomously before coming back to me.
Only terminate your turn when you are sure that the problem is solved. Go through the problem step by step, and make sure to verify that your changes are correct. NEVER end your turn without having solved the problem, and when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn.
THE PROBLEM CAN DEFINITELY BE SOLVED WITHOUT THE INTERNET.
Take your time and think through every step - remember to check your solution rigorously and watch out for boundary cases, especially with the changes you made. Your solution must be perfect. If not, continue working on it. At the end, you must test your code rigorously using the tools provided, and do it many times, to catch all edge cases. If it is not robust, iterate more and make it perfect. Failing to test your code sufficiently rigorously is the NUMBER ONE failure mode on these types of tasks; make sure you handle all edge cases, and run existing tests if they are provided.
You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.

# Workflow
## High-Level Problem Solving Strategy
1. Understand the problem deeply. Carefully read the issue and think critically about what is required.
2. Investigate the codebase. Explore relevant files, search for key functions, and gather context.
3. Develop a clear, step-by-step plan. Break down the fix into manageable, incremental steps.
4. Implement the fix incrementally. Make small, testable code changes.
5. Debug as needed. Use debugging techniques to isolate and resolve issues.
6. Test frequently. Run tests after each change to verify correctness.
7. Iterate until the root cause is fixed and all tests pass.
8. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness, and remember there are hidden tests that must also pass before the solution is truly complete.
Refer to the detailed sections below for more information on each step.

## 1. Deeply Understand the Problem
Carefully read the issue and think hard about a plan to solve it before coding.

## 2. Codebase Investigation
- Explore relevant files and directories.
- Search for key functions, classes, or variables related to the issue.
- Read and understand relevant code snippets.
- Identify the root cause of the problem.
- Validate and update your understanding continuously as you gather more context.

## 3. Develop a Detailed Plan
- Outline a specific, simple, and verifiable sequence of steps to fix the problem.
- Break down the fix into small, incremental changes.

## 4. Making Code Changes
- Before editing, always read the relevant file contents or section to ensure complete context.
- If a patch is not applied correctly, attempt to reapply it.
- Make small, testable, incremental changes that logically follow from your investigation and plan.

## 5. Debugging
- Make code changes only if you have high confidence they can solve the problem
- When debugging, try to determine the root cause rather than addressing symptoms
- Debug for as long as needed to identify the root cause and identify a fix
- Use print statements, logs, or temporary code to inspect program state, including descriptive statements or error messages to understand what's happening
- To test hypotheses, you can also add test statements or functions
- Revisit your assumptions if unexpected behavior occurs.

## 6. Testing
- Run tests frequently using \`!python3 run_tests.py\` (or equivalent).
- After each change, verify correctness by running relevant tests.
- If tests fail, analyze failures and revise your patch.
- Write additional tests if needed to capture important behaviors or edge cases.
- Ensure all tests pass before finalizing.

## 7. Final Verification
- Confirm the root cause is fixed.
- Review your solution for logic correctness and robustness.
- Iterate until you are extremely confident the fix is complete and all tests pass.

## 8. Final Reflection and Additional Testing
- Reflect carefully on the original intent of the user and the problem statement.
- Think about potential edge cases or scenarios that may not be covered by existing tests.
- Write additional tests that would need to pass to fully validate the correctness of your solution.
- Run these new tests and ensure they all pass.
- Be aware that there are additional hidden tests that must also pass for the solution to be successful.
- Do not assume the task is complete just because the visible tests pass; continue refining until you are confident the fix is robust and comprehensive.
"""

PYTHON_TOOL_DESCRIPTION = """This function is used to execute Python code or terminal commands in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 60.0 seconds. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail. Just as in a Jupyter notebook, you may also execute terminal commands by calling this function with a terminal command, prefaced with an exclamation mark.

In addition, for the purposes of this task, you can call this function with an \`apply_patch\` command as input.  \`apply_patch\` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the \`apply_patch\` command, you should pass a message of the following structure as "input":

%%bash
apply_patch <<"EOF"
*** Begin Patch
[YOUR_PATCH]
*** End Patch
EOF

Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format.
*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete.
For each snippet of code that needs to be changed, repeat the following:
[context_before] -> See below for further instructions on context.
- [old_code] -> Precede the old code with a minus sign.
+ [new_code] -> Precede the new, replacement code with a plus sign.
[context_after] -> See below for further instructions on context.

For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change's [context_after] lines in the second change's [context_before] lines.
- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:
@@ class BaseClass
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
- If a code block is repeated so many times in a class or function such that even a single @@ statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple \`@@\` statements to jump to the right context. For instance:
@@ class BaseClass
@@     def method():
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]

Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as "input" to this function, in order to apply a patch, is shown below.

%%bash
apply_patch <<"EOF"
*** Begin Patch
*** Update File: pygorithm/searching/binary_search.py
@@ class BaseClass
@@     def search():
-        pass
+        raise NotImplementedError()
@@ class Subclass
@@     def search():
-        pass
+        raise NotImplementedError()
*** End Patch
EOF

File references can only be relative, NEVER ABSOLUTE. After the apply_patch command is run, python will always say "Done!", regardless of whether the patch was successfully applied or not. However, you can determine if there are issue and errors by looking at any warnings or logging lines printed BEFORE the "Done!" is output.
"""

python_bash_patch_tool = {
  "type": "function",
  "name": "python",
  "description": PYTHON_TOOL_DESCRIPTION,
  "parameters": {
      "type": "object",
      "strict": True,
      "properties": {
          "input": {
              "type": "string",
              "description": " The Python code, terminal command (prefaced by exclamation mark), or apply_patch command that you wish to execute.",
          }
      },
      "required": ["input"],
  },
}

# Additional harness setup:
# - Add your repo to /testbed
# - Add your issue to the first user message
# - Note: Even though we used a single tool for python, bash, and apply_patch, we generally recommend defining more granular tools that are focused on a single function

response = client.responses.create(
    instructions=SYS_PROMPT_SWEBENCH,
    model="gpt-4.1-2025-04-14",
    tools=[python_bash_patch_tool],
    input=f"Please answer the following question:\nBug: Typerror..."
)
response.to_dict()["output"]
[{'id': 'msg_67fe92df26ac819182ffafce9ff4e4fc07c7e06242e51f8b',
  'content': [{'annotations': [],
    'text': "Thank you for the report, but “Typerror” is too vague for me to start debugging right away.\n\n**To make progress, I need to:**\n1. Find the exact error message text (e.g. \`'TypeError: ...'\`).\n2. Find which file and which line/function/class the error occurred in.\n3. Figure out what triggered the error (test file, usage, reproduction steps).\n4. Find the root cause and details.\n\n**Next steps:**\n- Investigate error/log/test output files for a Python \`TypeError\` message.\n- Examine the relevant code sections for problematic type usage.\n- If possible, reproduce the bug locally.\n\n**Plan:**\n- First, I will search for test files and log output in the \`/testbed\` directory that may contain the full error message and stack trace.\n\nLet’s start by listing the contents of the \`/testbed\` directory to look for clues.",
    'type': 'output_text'}],
  'role': 'assistant',
  'status': 'completed',
  'type': 'message'},
 {'arguments': '{"input":"!ls -l /testbed"}',
  'call_id': 'call_frnxyJgKi5TsBem0nR9Zuzdw',
  'name': 'python',
  'type': 'function_call',
  'id': 'fc_67fe92e3da7081918fc18d5c96dddc1c07c7e06242e51f8b',
  'status': 'completed'}]

2. Long context 2. 长上下文

GPT-4.1 has a performant 1M token input context window, and is useful for a variety of long context tasks, including structured document parsing, re-ranking, selecting relevant information while ignoring irrelevant context, and performing multi-hop reasoning using context.

GPT-4.1 具有高性能的 100 万 token 输入上下文窗口,适用于各种长上下文任务,包括结构化文档解析、重新排序、选择相关信息同时忽略不相关上下文,以及使用上下文执行多跳推理。

Optimal Context Size 最优上下文大小

We observe very good performance on needle-in-a-haystack evaluations up to our full 1M token context, and we’ve observed very strong performance at complex tasks with a mix of both relevant and irrelevant code and other documents. However, long context performance can degrade as more items are required to be retrieved, or perform complex reasoning that requires knowledge of the state of the entire context (like performing a graph search, for example).

我们观察到在“大海捞针”评估中,一直到我们完整的 1M token 上下文,都表现出非常好的性能,并且我们观察到在复杂的任务中,混合了相关和不相关的代码以及其他文档的情况下,表现出非常强大的性能。但是,随着需要检索的项目越来越多,或者执行需要了解整个上下文状态的复杂推理(例如执行图搜索)时,长上下文性能可能会下降。

Tuning Context Reliance 调整上下文依赖性

Consider the mix of external vs. internal world knowledge that might be required to answer your question. Sometimes it’s important for the model to use some of its own knowledge to connect concepts or make logical jumps, while in others it’s desirable to only use provided context

考虑回答您的问题可能需要的外部知识与内部知识的结合。有时,模型需要利用自身的一些知识来连接概念或进行逻辑跳转,而在其他情况下,最好只使用提供的上下文。

# Instructions
// for internal knowledge
- Only use the documents in the provided External Context to answer the User Query. If you don't know the answer based on this context, you must respond "I don't have the information needed to answer that", even if a user insists on you answering the question.

// For internal and external knowledge
- By default, use the provided external context to answer the User Query, but if other basic knowledge is needed to answer, and you're confident in the answer, you can use some of your own knowledge to help answer the question.

Prompt Organization 提示组织

Especially in long context usage, placement of instructions and context can impact performance. If you have long context in your prompt, ideally place your instructions at both the beginning and end of the provided context, as we found this to perform better than only above or below. If you’d prefer to only have your instructions once, then above the provided context works better than below.

尤其是在长上下文使用中,指令和上下文的放置会影响性能。如果你的提示中有长上下文,理想情况下,将你的指令放在所提供上下文的开头和结尾,因为我们发现这样做比仅放在上方或下方效果更好。如果你只想让你的指令出现一次,那么放在所提供上下文的上方比下方效果更好。

3. Chain of Thought3. 思维链

As mentioned above, GPT-4.1 is not a reasoning model, but prompting the model to think step by step (called “chain of thought”) can be an effective way for a model to break down problems into more manageable pieces, solve them, and improve overall output quality, with the tradeoff of higher cost and latency associated with using more output tokens. The model has been trained to perform well at agentic reasoning about and real-world problem solving, so it shouldn’t require much prompting to perform well.

如上所述,GPT-4.1 不是一个推理模型,但提示模型逐步思考(称为“思维链”)可能是模型将问题分解成更易于管理的部分、解决它们并提高整体输出质量的有效方法,但权衡是使用更多输出 token 会带来更高的成本和延迟。该模型已经过训练,可以在关于和解决实际问题的能动推理方面表现良好,因此它不需要太多的提示就可以表现良好。

We recommend starting with this basic chain-of-thought instruction at the end of your prompt:

我们建议从提示末尾的这个基本思维链指令开始:

...

First, think carefully step by step about what documents are needed to answer the query. Then, print out the TITLE and ID of each document. Then, format the IDs into a list.

From there, you should improve your chain-of-thought (CoT) prompt by auditing failures in your particular examples and evals, and addressing systematic planning and reasoning errors with more explicit instructions. In the unconstrained CoT prompt, there may be variance in the strategies it tries, and if you observe an approach that works well, you can codify that strategy in your prompt. Generally speaking, errors tend to occur from misunderstanding user intent, insufficient context gathering or analysis, or insufficient or incorrect step by step thinking, so watch out for these and try to address them with more opinionated instructions.

从那里开始,你应该通过审计你的特定例子和评估中的失败之处,并通过更明确的指示解决系统性的规划和推理错误,来改进你的思维链(CoT)提示。在无约束的 CoT 提示中,它尝试的策略可能存在差异,如果你观察到一种有效的方法,你可以将该策略编入你的提示中。一般来说,错误往往发生在误解用户意图、上下文收集或分析不足,或逐步思考不足或不正确的情况下,所以要注意这些,并尝试用更明确的指示来解决它们。

Here is an example prompt instructing the model to focus more methodically on analyzing user intent and considering relevant context before proceeding to answer.

以下是一个示例提示,指示模型在继续回答之前,更系统地关注分析用户意图并考虑相关上下文。

# Reasoning Strategy
1. Query Analysis: Break down and analyze the query until you're confident about what it might be asking. Consider the provided context to help clarify any ambiguous or confusing information.
2. Context Analysis: Carefully select and analyze a large set of potentially relevant documents. Optimize for recall - it's okay if some are irrelevant, but the correct documents must be in this list, otherwise your final answer will be wrong. Analysis steps for each:
    a. Analysis: An analysis of how it may or may not be relevant to answering the query.
    b. Relevance rating: [high, medium, low, none]
3. Synthesis: summarize which documents are most relevant and why, including all documents with a relevance rating of medium or higher.

# User Question
{user_question}

# External Context
{external_context}

First, think carefully step by step about what documents are needed to answer the query, closely adhering to the provided Reasoning Strategy. Then, print out the TITLE and ID of each document. Then, format the IDs into a list.

4. Instruction Following4. 指令遵循

GPT-4.1 exhibits outstanding instruction-following performance, which developers can leverage to precisely shape and control the outputs for their particular use cases. Developers often extensively prompt for agentic reasoning steps, response tone and voice, tool calling information, output formatting, topics to avoid, and more. However, since the model follows instructions more literally, developers may need to include explicit specification around what to do or not to do. Furthermore, existing prompts optimized for other models may not immediately work with this model, because existing instructions are followed more closely and implicit rules are no longer being as strongly inferred.

GPT-4.1 表现出出色的指令遵循性能,开发人员可以利用它来精确地塑造和控制其特定用例的输出。开发人员通常会广泛提示代理推理步骤、响应语气和声音、工具调用信息、输出格式、要避免的主题等等。但是,由于该模型更严格地遵循指令,因此开发人员可能需要明确说明该做什么或不该做什么。此外,针对其他模型优化的现有提示可能无法立即与此模型一起使用,因为现有指令会被更紧密地遵循,并且不再像以前那样强烈地推断隐式规则。

Here is our recommended workflow for developing and debugging instructions in prompts:

以下是我们在提示中开发和调试指令的推荐工作流程:

  1. Start with an overall “Response Rules” or “Instructions” section with high-level guidance and bullet points.

    首先,从包含高级指导和要点的总体“响应规则”或“说明”部分开始。

  2. If you’d like to change a more specific behavior, add a section to specify more details for that category, like # Sample Phrases.

    如果您想更改更具体的行为,请添加一个部分来指定该类别的更多详细信息,例如 # Sample Phrases

  3. If there are specific steps you’d like the model to follow in its workflow, add an ordered list and instruct the model to follow these steps.

    如果您希望模型在其工作流程中遵循特定的步骤,请添加一个有序列表,并指示模型遵循这些步骤。

  4. If behavior still isn’t working as expected:

    如果行为仍然没有按预期工作:

    1. Check for conflicting, underspecified, or wrong instructions and examples. If there are conflicting instructions, GPT-4.1 tends to follow the one closer to the end of the prompt.

      检查是否存在冲突、欠规范或错误的指令和示例。如果存在冲突的指令,GPT-4.1 往往会遵循更靠近提示末尾的指令。

    2. Add examples that demonstrate desired behavior; ensure that any important behavior demonstrated in your examples are also cited in your rules.

      添加演示所需行为的示例;确保示例中演示的任何重要行为也在您的规则中被引用。

    3. It’s generally not necessary to use all-caps or other incentives like bribes or tips. We recommend starting without these, and only reaching for these if necessary for your particular prompt. Note that if your existing prompts include these techniques, it could cause GPT-4.1 to pay attention to it too strictly.

      通常情况下,没有必要使用全部大写字母或其他激励措施,如贿赂或小费。我们建议从不使用这些开始,只有在您的特定提示需要时才考虑使用。请注意,如果您的现有提示包含这些技术,可能会导致 GPT-4.1 过分关注它。

Note that using your preferred AI-powered IDE can be very helpful for iterating on prompts, including checking for consistency or conflicts, adding examples, or making cohesive updates like adding an instruction and updating instructions to demonstrate that instruction.

请注意,使用您首选的 AI 驱动的 IDE 对于迭代提示非常有帮助,包括检查一致性或冲突、添加示例或进行有凝聚力的更新,例如添加指令并更新指令以演示该指令。

Common Failure Modes 常见失败模式

These failure modes are not unique to GPT-4.1, but we share them here for general awareness and ease of debugging.

这些失败模式并非 GPT-4.1 独有,但我们在此分享它们是为了提高普遍认识并方便调试。

  • Instructing a model to always follow a specific behavior can occasionally induce adverse effects. For instance, if told “you must call a tool before responding to the user,” models may hallucinate tool inputs or call the tool with null values if they do not have enough information. Adding “if you don’t have enough information to call the tool, ask the user for the information you need” should mitigate this.

    指示模型始终遵循特定行为有时会产生不利影响。例如,如果被告知“在回复用户之前,您必须调用工具”,则模型可能会产生幻觉工具输入,或者在没有足够信息的情况下使用空值调用该工具。添加“如果您没有足够的信息来调用该工具,请向用户询问您需要的信息”应该可以缓解这种情况。

  • When provided sample phrases, models can use those quotes verbatim and start to sound repetitive to users. Ensure you instruct the model to vary them as necessary.

    当提供示例短语时,模型可以逐字使用这些引用,并开始让用户觉得重复。确保您指示模型根据需要改变它们。

  • Without specific instructions, some models can be eager to provide additional prose to explain their decisions, or output more formatting in responses than may be desired. Provide instructions and potentially examples to help mitigate.

    如果没有明确的指示,一些模型可能会急于提供额外的文字来解释它们的决策,或者在响应中输出比预期更多的格式。提供指导,并可能提供示例来帮助缓解这种情况。

Example Prompt: Customer Service示例提示:客户服务

This demonstrates best practices for a fictional customer service agent. Observe the diversity of rules, the specificity, the use of additional sections for greater detail, and an example to demonstrate precise behavior that incorporates all prior rules.

这演示了一个虚构的客户服务代理的最佳实践。请注意规则的多样性、特殊性、使用额外的部分来获得更大的细节,以及一个演示包含所有先前规则的精确行为的示例。

Try running the following notebook cell – you should see both a user message and tool call, and the user message should start with a greeting, then echo back their answer, then mention they’re about to call a tool. Try changing the instructions to shape the model behavior, or trying other user messages, to test instruction following performance.

尝试运行以下 notebook 单元格 – 你应该看到用户消息和工具调用,并且用户消息应该以问候语开头,然后回应他们的答案,然后提到他们即将调用工具。 尝试更改指令以塑造模型行为,或尝试其他用户消息,以测试指令遵循性能。

SYS_PROMPT_CUSTOMER_SERVICE = """You are a helpful customer service agent working for NewTelco, helping a user efficiently fulfill their request while adhering closely to provided guidelines.

# Instructions
- Always greet the user with "Hi, you've reached NewTelco, how can I help you?"
- Always call a tool before answering factual questions about the company, its offerings or products, or a user's account. Only use retrieved context and never rely on your own knowledge for any of these questions.
    - However, if you don't have enough information to properly call the tool, ask the user for the information you need.
- Escalate to a human if the user requests.
- Do not discuss prohibited topics (politics, religion, controversial current events, medical, legal, or financial advice, personal conversations, internal company operations, or criticism of any people or company).
- Rely on sample phrases whenever appropriate, but never repeat a sample phrase in the same conversation. Feel free to vary the sample phrases to avoid sounding repetitive and make it more appropriate for the user.
- Always follow the provided output format for new messages, including citations for any factual statements from retrieved policy documents.
- If you're going to call a tool, always message the user with an appropriate message before and after calling the tool.
- Maintain a professional and concise tone in all responses, and use emojis between sentences.
- If you've resolved the user's request, ask if there's anything else you can help with

# Precise Response Steps (for each response)
1. If necessary, call tools to fulfill the user's desired action. Always message the user before and after calling a tool to keep them in the loop.
2. In your response to the user
    a. Use active listening and echo back what you heard the user ask for.
    b. Respond appropriately given the above guidelines.

# Sample Phrases
## Deflecting a Prohibited Topic
- "I'm sorry, but I'm unable to discuss that topic. Is there something else I can help you with?"
- "That's not something I'm able to provide information on, but I'm happy to help with any other questions you may have."

## Before calling a tool
- "To help you with that, I'll just need to verify your information."
- "Let me check that for you—one moment, please."
- "I'll retrieve the latest details for you now."

## After calling a tool
- "Okay, here's what I found: [response]"
- "So here's what I found: [response]"

# Output Format
- Always include your final response to the user.
- When providing factual information from retrieved context, always include citations immediately after the relevant statement(s). Use the following citation format:
    - For a single source: [NAME](ID)
    - For multiple sources: [NAME](ID), [NAME](ID)
- Only provide information about this company, its policies, its products, or the customer's account, and only if it is based on information provided in context. Do not answer questions outside this scope.

# Example
## User
Can you tell me about your family plan options?

## Assistant Response 1
### Message
"Hi, you've reached NewTelco, how can I help you? 😊🎉\n\nYou'd like to know about our family plan options. 🤝 Let me check that for you—one moment, please. 🚀"

### Tool Calls
lookup_policy_document(topic="family plan options")

// After tool call, the assistant would follow up with:

## Assistant Response 2 (after tool call)
### Message
"Okay, here's what I found: 🎉 Our family plan allows up to 5 lines with shared data and a 10% discount for each additional line [Family Plan Policy](ID-010). 📱 Is there anything else I can help you with today? 😊"
"""

get_policy_doc = {
    "type": "function",
    "name": "lookup_policy_document",
    "description": "Tool to look up internal documents and policies by topic or keyword.",
    "parameters": {
        "strict": True,
        "type": "object",
        "properties": {
            "topic": {
                "type": "string",
                "description": "The topic or keyword to search for in company policies or documents.",
            },
        },
        "required": ["topic"],
        "additionalProperties": False,
    },
}

get_user_acct = {
    "type": "function",
    "name": "get_user_account_info",
    "description": "Tool to get user account information",
    "parameters": {
        "strict": True,
        "type": "object",
        "properties": {
            "phone_number": {
                "type": "string",
                "description": "Formatted as '(xxx) xxx-xxxx'",
            },
        },
        "required": ["phone_number"],
        "additionalProperties": False,
    },
}

response = client.responses.create(
    instructions=SYS_PROMPT_CUSTOMER_SERVICE,
    model="gpt-4.1-2025-04-14",
    tools=[get_policy_doc, get_user_acct],
    input="How much will it cost for international service? I'm traveling to France.",
    # input="Why was my last bill so high?"
)
response.to_dict()["output"]
[{'id': 'msg_67fe92d431548191b7ca6cd604b4784b06efc5beb16b3c5e',
  'content': [{'annotations': [],
    'text': "Hi, you've reached NewTelco, how can I help you? 🌍✈️\n\nYou'd like to know the cost of international service while traveling to France. 🇫🇷 Let me check the latest details for you—one moment, please. 🕑",
    'type': 'output_text'}],
  'role': 'assistant',
  'status': 'completed',
  'type': 'message'},
 {'arguments': '{"topic":"international service cost France"}',
  'call_id': 'call_cF63DLeyhNhwfdyME3ZHd0yo',
  'name': 'lookup_policy_document',
  'type': 'function_call',
  'id': 'fc_67fe92d5d6888191b6cd7cf57f707e4606efc5beb16b3c5e',
  'status': 'completed'}]

5. General Advice 5. 一般建议

Prompt Structure 提示结构

For reference, here is a good starting point for structuring your prompts.

作为参考,以下是构建提示的一个良好起点。

# Role and Objective
# Instructions
## Sub-categories for more detailed instructions
# Reasoning Steps
# Output Format
# Examples
## Example 1
# Context
# Final instructions and prompt to think step by step

Add or remove sections to suit your needs, and experiment to determine what’s optimal for your usage.

添加或删除部分以满足您的需求,并通过实验确定最适合您使用的方法。

Delimiters 分隔符

Here are some general guidelines for selecting the best delimiters for your prompt. Please refer to the Long Context section for special considerations for that context type.

以下是为您的提示选择最佳分隔符的一些通用指南。 有关该上下文类型的特殊注意事项,请参阅“长上下文”部分。

  1. Markdown: We recommend starting here, and using markdown titles for major sections and subsections (including deeper hierarchy, to H4+). Use inline backticks or backtick blocks to precisely wrap code, and standard numbered or bulleted lists as needed.

    Markdown:我们建议从这里开始,并使用 Markdown 标题作为主要章节和小节的标题(包括更深的层级结构,直到 H4+)。使用行内反引号或反引号块来精确地包裹代码,并根据需要使用标准编号或项目符号列表。

  2. XML: These also perform well, and we have improved adherence to information in XML with this model. XML is convenient to precisely wrap a section including start and end, add metadata to the tags for additional context, and enable nesting. Here is an example of using XML tags to nest examples in an example section, with inputs and outputs for each:

    XML:这些也表现良好,并且我们已经通过此模型改进了 XML 中信息的遵从性。XML 可以方便地精确包裹包含开始和结束的部分,将元数据添加到标签以获得额外的上下文,并启用嵌套。这是一个使用 XML 标签在示例部分中嵌套示例的示例,每个示例都有输入和输出:

    <examples>
    <example1 type="Abbreviate">
    <input>San Francisco</input>
    <output>- SF</output>
    </example1>
    </examples>
  3. JSON is highly structured and well understood by the model particularly in coding contexts. However it can be more verbose, and require character escaping that can add overhead.

    JSON 具有高度结构化,并且模型对其非常了解,尤其是在编码上下文中。但是,它可能更冗长,并且需要字符转义,这会增加开销。

Guidance specifically for adding a large number of documents or files to input context:

专门针对向输入上下文添加大量文档或文件的指南:

  • XML performed well in our long context testing.

    XML 在我们的长上下文测试中表现良好。

    • Example: <doc id=1 title=”The Fox”>The quick brown fox jumps over the lazy dog</doc>

      示例: <doc id=1 title=”The Fox”>The quick brown fox jumps over the lazy dog</doc>

  • This format, proposed by Lee et al. (ref), also performed well in our long context testing.

    Lee et al. ( ref) 提出的这种格式在我们的长上下文测试中也表现良好。

    • Example: ID: 1 | TITLE: The Fox | CONTENT: The quick brown fox jumps over the lazy dog

      示例: ID: 1 | TITLE: The Fox | CONTENT: The quick brown fox jumps over the lazy dog

  • JSON performed particularly poorly.

    JSON 的表现尤其糟糕。

    • Example: [{“id”: 1, “title”: “The Fox”, “content”: “The quick brown fox jumped over the lazy dog”}]

      示例: [{“id”: 1, “title”: “The Fox”, “content”: “The quick brown fox jumped over the lazy dog”}]

The model is trained to robustly understand structure in a variety of formats. Generally, use your judgement and think about what will provide clear information and “stand out” to the model. For example, if you’re retrieving documents that contain lots of XML, an XML-based delimiter will likely be less effective.

该模型经过训练,可以稳健地理解各种格式的结构。通常,使用你的判断力,思考什么能够提供清晰的信息并使模型“脱颖而出”。例如,如果你正在检索包含大量 XML 的文档,那么基于 XML 的分隔符可能效果较差。

Caveats 注意事项

  • In some isolated cases we have observed the model being resistant to producing very long, repetitive outputs, for example, analyzing hundreds of items one by one. If this is necessary for your use case, instruct the model strongly to output this information in full, and consider breaking down the problem or using a more concise approach.

    在某些孤立的案例中,我们观察到模型会抵制产生非常长、重复的输出,例如,逐一分析数百个项目。如果您的用例需要这样做,请强烈指示模型完整输出此信息,并考虑分解问题或使用更简洁的方法。

  • We have seen some rare instances of parallel tool calls being incorrect. We advise testing this, and considering setting the parallel_tool_calls param to false if you’re seeing issues.

    我们已经看到一些罕见的并行工具调用不正确的例子。我们建议您对此进行测试,如果您遇到问题,请考虑将 parallel_tool_calls 参数设置为 false。

Developers have provided us feedback that accurate and well-formed diff generation is a critical capability to power coding-related tasks. To this end, the GPT-4.1 family features substantially improved diff capabilities relative to previous GPT models. Moreover, while GPT-4.1 has strong performance generating diffs of any format given clear instructions and examples, we open-source here one recommended diff format, on which the model has been extensively trained. We hope that in particular for developers just starting out, that this will take much of the guesswork out of creating diffs yourself.

开发者向我们反馈,准确且格式良好的 diff 生成是支持编码相关任务的关键能力。为此,GPT-4.1 系列在 diff 能力方面相对于之前的 GPT 模型有了显著的改进。此外,虽然 GPT-4.1 在生成任何格式的 diff 时都表现出色(前提是提供清晰的指令和示例),但我们在此开源一种推荐的 diff 格式,该模型已经过广泛的训练。我们希望,特别是对于刚开始的开发者来说,这将大大减少您自己创建 diff 的猜测工作。

Apply Patch 应用补丁

See the example below for a prompt that applies our recommended tool call correctly.

请参见以下示例,了解正确应用我们推荐的工具调用的提示。

APPLY_PATCH_TOOL_DESC = """This is a custom utility that makes it more convenient to add, remove, move, or edit code files. \`apply_patch\` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the \`apply_patch\` command, you should pass a message of the following structure as "input":

%%bash
apply_patch <<"EOF"
*** Begin Patch
[YOUR_PATCH]
*** End Patch
EOF

Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format.
*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete.
For each snippet of code that needs to be changed, repeat the following:
[context_before] -> See below for further instructions on context.
- [old_code] -> Precede the old code with a minus sign.
+ [new_code] -> Precede the new, replacement code with a plus sign.
[context_after] -> See below for further instructions on context.

For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change’s [context_after] lines in the second change’s [context_before] lines.
- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:
@@ class BaseClass
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
- If a code block is repeated so many times in a class or function such that even a single @@ statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple \`@@\` statements to jump to the right context. For instance:
@@ class BaseClass
@@     def method():
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]

Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as "input" to this function, in order to apply a patch, is shown below.

%%bash
apply_patch <<"EOF"
*** Begin Patch
*** Update File: pygorithm/searching/binary_search.py
@@ class BaseClass
@@     def search():
-          pass
+          raise NotImplementedError()
@@ class Subclass
@@     def search():
-          pass
+          raise NotImplementedError()
*** End Patch
EOF
"""

APPLY_PATCH_TOOL = {
    "name": "apply_patch",
    "description": APPLY_PATCH_TOOL_DESC,
    "parameters": {
        "type": "object",
        "properties": {
            "input": {
                "type": "string",
                "description": " The apply_patch command that you wish to execute.",
            }
        },
        "required": ["input"],
    },
}

Reference Implementation: apply_patch.py参考实现: apply_patch.py

Here’s a reference implementation of the apply_patch tool that we used as part of model training. You’ll need to make this an executable and available as apply_patch from the shell where the model will execute commands:

以下是我们用作模型训练一部分的 apply_patch 工具的参考实现。你需要将其制作成可执行文件,并在模型将执行命令的 shell 中将其作为 apply_patch 提供:

#!/usr/bin/env python3
"""
A self-contained **pure-Python 3.9+** utility for applying human-readable
“pseudo-diff” patch files to a collection of text files.
"""
from __future__ import annotations
import pathlib
from dataclasses import dataclass, field
from enum import Enum
from typing import (
    Callable,
    Dict,
    List,
    Optional,
    Tuple,
    Union,
)

# --------------------------------------------------------------------------- #
#  Domain objects
# --------------------------------------------------------------------------- #
class ActionType(str, Enum):
    ADD = "add"
    DELETE = "delete"
    UPDATE = "update"

@dataclass
class FileChange:
    type: ActionType
    old_content: Optional[str] = None
    new_content: Optional[str] = None
    move_path: Optional[str] = None

@dataclass
class Commit:
    changes: Dict[str, FileChange] = field(default_factory=dict)

# --------------------------------------------------------------------------- #
#  Exceptions
# --------------------------------------------------------------------------- #
class DiffError(ValueError):
    """Any problem detected while parsing or applying a patch."""

# --------------------------------------------------------------------------- #
#  Helper dataclasses used while parsing patches
# --------------------------------------------------------------------------- #
@dataclass
class Chunk:
    orig_index: int = -1
    del_lines: List[str] = field(default_factory=list)
    ins_lines: List[str] = field(default_factory=list)

@dataclass
class PatchAction:
    type: ActionType
    new_file: Optional[str] = None
    chunks: List[Chunk] = field(default_factory=list)
    move_path: Optional[str] = None

@dataclass
class Patch:
    actions: Dict[str, PatchAction] = field(default_factory=dict)

# --------------------------------------------------------------------------- #
#  Patch text parser
# --------------------------------------------------------------------------- #
@dataclass
class Parser:
    current_files: Dict[str, str]
    lines: List[str]
    index: int = 0
    patch: Patch = field(default_factory=Patch)
    fuzz: int = 0

    # ------------- low-level helpers -------------------------------------- #
    def _cur_line(self) -> str:
        if self.index >= len(self.lines):
            raise DiffError("Unexpected end of input while parsing patch")
        return self.lines[self.index]

    @staticmethod
    def _norm(line: str) -> str:
        """Strip CR so comparisons work for both LF and CRLF input."""
        return line.rstrip("\r")

    # ------------- scanning convenience ----------------------------------- #
    def is_done(self, prefixes: Optional[Tuple[str, ...]] = None) -> bool:
        if self.index >= len(self.lines):
            return True
        if (
            prefixes
            and len(prefixes) > 0
            and self._norm(self._cur_line()).startswith(prefixes)
        ):
            return True
        return False

    def startswith(self, prefix: Union[str, Tuple[str, ...]]) -> bool:
        return self._norm(self._cur_line()).startswith(prefix)

    def read_str(self, prefix: str) -> str:
        """
        Consume the current line if it starts with *prefix* and return the text
        **after** the prefix.  Raises if prefix is empty.
        """
        if prefix == "":
            raise ValueError("read_str() requires a non-empty prefix")
        if self._norm(self._cur_line()).startswith(prefix):
            text = self._cur_line()[len(prefix) :]
            self.index += 1
            return text
        return ""

    def read_line(self) -> str:
        """Return the current raw line and advance."""
        line = self._cur_line()
        self.index += 1
        return line

    # ------------- public entry point -------------------------------------- #
    def parse(self) -> None:
        while not self.is_done(("*** End Patch",)):
            # ---------- UPDATE ---------- #
            path = self.read_str("*** Update File: ")
            if path:
                if path in self.patch.actions:
                    raise DiffError(f"Duplicate update for file: {path}")
                move_to = self.read_str("*** Move to: ")
                if path not in self.current_files:
                    raise DiffError(f"Update File Error - missing file: {path}")
                text = self.current_files[path]
                action = self._parse_update_file(text)
                action.move_path = move_to or None
                self.patch.actions[path] = action
                continue
            # ---------- DELETE ---------- #
            path = self.read_str("*** Delete File: ")
            if path:
                if path in self.patch.actions:
                    raise DiffError(f"Duplicate delete for file: {path}")
                if path not in self.current_files:
                    raise DiffError(f"Delete File Error - missing file: {path}")
                self.patch.actions[path] = PatchAction(type=ActionType.DELETE)
                continue
            # ---------- ADD ---------- #
            path = self.read_str("*** Add File: ")
            if path:
                if path in self.patch.actions:
                    raise DiffError(f"Duplicate add for file: {path}")
                if path in self.current_files:
                    raise DiffError(f"Add File Error - file already exists: {path}")
                self.patch.actions[path] = self._parse_add_file()
                continue
            raise DiffError(f"Unknown line while parsing: {self._cur_line()}")
        if not self.startswith("*** End Patch"):
            raise DiffError("Missing *** End Patch sentinel")
        self.index += 1  # consume sentinel

    # ------------- section parsers ---------------------------------------- #
    def _parse_update_file(self, text: str) -> PatchAction:
        action = PatchAction(type=ActionType.UPDATE)
        lines = text.split("\n")
        index = 0
        while not self.is_done(
            (
                "*** End Patch",
                "*** Update File:",
                "*** Delete File:",
                "*** Add File:",
                "*** End of File",
            )
        ):
            def_str = self.read_str("@@ ")
            section_str = ""
            if not def_str and self._norm(self._cur_line()) == "@@":
                section_str = self.read_line()
            if not (def_str or section_str or index == 0):
                raise DiffError(f"Invalid line in update section:\n{self._cur_line()}")
            if def_str.strip():
                found = False
                if def_str not in lines[:index]:
                    for i, s in enumerate(lines[index:], index):
                        if s == def_str:
                            index = i + 1
                            found = True
                            break
                if not found and def_str.strip() not in [
                    s.strip() for s in lines[:index]
                ]:
                    for i, s in enumerate(lines[index:], index):
                        if s.strip() == def_str.strip():
                            index = i + 1
                            self.fuzz += 1
                            found = True
                            break
            next_ctx, chunks, end_idx, eof = peek_next_section(self.lines, self.index)
            new_index, fuzz = find_context(lines, next_ctx, index, eof)
            if new_index == -1:
                ctx_txt = "\n".join(next_ctx)
                raise DiffError(
                    f"Invalid {'EOF ' if eof else ''}context at {index}:\n{ctx_txt}"
                )
            self.fuzz += fuzz
            for ch in chunks:
                ch.orig_index += new_index
                action.chunks.append(ch)
            index = new_index + len(next_ctx)
            self.index = end_idx
        return action

    def _parse_add_file(self) -> PatchAction:
        lines: List[str] = []
        while not self.is_done(
            ("*** End Patch", "*** Update File:", "*** Delete File:", "*** Add File:")
        ):
            s = self.read_line()
            if not s.startswith("+"):
                raise DiffError(f"Invalid Add File line (missing '+'): {s}")
            lines.append(s[1:])  # strip leading '+'
        return PatchAction(type=ActionType.ADD, new_file="\n".join(lines))

# --------------------------------------------------------------------------- #
#  Helper functions
# --------------------------------------------------------------------------- #
def find_context_core(
    lines: List[str], context: List[str], start: int
) -> Tuple[int, int]:
    if not context:
        return start, 0
    for i in range(start, len(lines)):
        if lines[i : i + len(context)] == context:
            return i, 0
    for i in range(start, len(lines)):
        if [s.rstrip() for s in lines[i : i + len(context)]] == [
            s.rstrip() for s in context
        ]:
            return i, 1
    for i in range(start, len(lines)):
        if [s.strip() for s in lines[i : i + len(context)]] == [
            s.strip() for s in context
        ]:
            return i, 100
    return -1, 0

def find_context(
    lines: List[str], context: List[str], start: int, eof: bool
) -> Tuple[int, int]:
    if eof:
        new_index, fuzz = find_context_core(lines, context, len(lines) - len(context))
        if new_index != -1:
            return new_index, fuzz
        new_index, fuzz = find_context_core(lines, context, start)
        return new_index, fuzz + 10_000
    return find_context_core(lines, context, start)

def peek_next_section(
    lines: List[str], index: int
) -> Tuple[List[str], List[Chunk], int, bool]:
    old: List[str] = []
    del_lines: List[str] = []
    ins_lines: List[str] = []
    chunks: List[Chunk] = []
    mode = "keep"
    orig_index = index
    while index < len(lines):
        s = lines[index]
        if s.startswith(
            (
                "@@",
                "*** End Patch",
                "*** Update File:",
                "*** Delete File:",
                "*** Add File:",
                "*** End of File",
            )
        ):
            break
        if s == "***":
            break
        if s.startswith("***"):
            raise DiffError(f"Invalid Line: {s}")
        index += 1
        last_mode = mode
        if s == "":
            s = " "
        if s[0] == "+":
            mode = "add"
        elif s[0] == "-":
            mode = "delete"
        elif s[0] == " ":
            mode = "keep"
        else:
            raise DiffError(f"Invalid Line: {s}")
        s = s[1:]
        if mode == "keep" and last_mode != mode:
            if ins_lines or del_lines:
                chunks.append(
                    Chunk(
                        orig_index=len(old) - len(del_lines),
                        del_lines=del_lines,
                        ins_lines=ins_lines,
                    )
                )
            del_lines, ins_lines = [], []
        if mode == "delete":
            del_lines.append(s)
            old.append(s)
        elif mode == "add":
            ins_lines.append(s)
        elif mode == "keep":
            old.append(s)
    if ins_lines or del_lines:
        chunks.append(
            Chunk(
                orig_index=len(old) - len(del_lines),
                del_lines=del_lines,
                ins_lines=ins_lines,
            )
        )
    if index < len(lines) and lines[index] == "*** End of File":
        index += 1
        return old, chunks, index, True
    if index == orig_index:
        raise DiffError("Nothing in this section")
    return old, chunks, index, False

# --------------------------------------------------------------------------- #
#  Patch → Commit and Commit application
# --------------------------------------------------------------------------- #
def _get_updated_file(text: str, action: PatchAction, path: str) -> str:
    if action.type is not ActionType.UPDATE:
        raise DiffError("_get_updated_file called with non-update action")
    orig_lines = text.split("\n")
    dest_lines: List[str] = []
    orig_index = 0
    for chunk in action.chunks:
        if chunk.orig_index > len(orig_lines):
            raise DiffError(
                f"{path}: chunk.orig_index {chunk.orig_index} exceeds file length"
            )
        if orig_index > chunk.orig_index:
            raise DiffError(
                f"{path}: overlapping chunks at {orig_index} > {chunk.orig_index}"
            )
        dest_lines.extend(orig_lines[orig_index : chunk.orig_index])
        orig_index = chunk.orig_index
        dest_lines.extend(chunk.ins_lines)
        orig_index += len(chunk.del_lines)
    dest_lines.extend(orig_lines[orig_index:])
    return "\n".join(dest_lines)

def patch_to_commit(patch: Patch, orig: Dict[str, str]) -> Commit:
    commit = Commit()
    for path, action in patch.actions.items():
        if action.type is ActionType.DELETE:
            commit.changes[path] = FileChange(
                type=ActionType.DELETE, old_content=orig[path]
            )
        elif action.type is ActionType.ADD:
            if action.new_file is None:
                raise DiffError("ADD action without file content")
            commit.changes[path] = FileChange(
                type=ActionType.ADD, new_content=action.new_file
            )
        elif action.type is ActionType.UPDATE:
            new_content = _get_updated_file(orig[path], action, path)
            commit.changes[path] = FileChange(
                type=ActionType.UPDATE,
                old_content=orig[path],
                new_content=new_content,
                move_path=action.move_path,
            )
    return commit

# --------------------------------------------------------------------------- #
#  User-facing helpers
# --------------------------------------------------------------------------- #
def text_to_patch(text: str, orig: Dict[str, str]) -> Tuple[Patch, int]:
    lines = text.splitlines()  # preserves blank lines, no strip()
    if (
        len(lines) < 2
        or not Parser._norm(lines[0]).startswith("*** Begin Patch")
        or Parser._norm(lines[-1]) != "*** End Patch"
    ):
        raise DiffError("Invalid patch text - missing sentinels")
    parser = Parser(current_files=orig, lines=lines, index=1)
    parser.parse()
    return parser.patch, parser.fuzz

def identify_files_needed(text: str) -> List[str]:
    lines = text.splitlines()
    return [
        line[len("*** Update File: ") :]
        for line in lines
        if line.startswith("*** Update File: ")
    ] + [
        line[len("*** Delete File: ") :]
        for line in lines
        if line.startswith("*** Delete File: ")
    ]

def identify_files_added(text: str) -> List[str]:
    lines = text.splitlines()
    return [
        line[len("*** Add File: ") :]
        for line in lines
        if line.startswith("*** Add File: ")
    ]

# --------------------------------------------------------------------------- #
#  File-system helpers
# --------------------------------------------------------------------------- #
def load_files(paths: List[str], open_fn: Callable[[str], str]) -> Dict[str, str]:
    return {path: open_fn(path) for path in paths}

def apply_commit(
    commit: Commit,
    write_fn: Callable[[str, str], None],
    remove_fn: Callable[[str], None],
) -> None:
    for path, change in commit.changes.items():
        if change.type is ActionType.DELETE:
            remove_fn(path)
        elif change.type is ActionType.ADD:
            if change.new_content is None:
                raise DiffError(f"ADD change for {path} has no content")
            write_fn(path, change.new_content)
        elif change.type is ActionType.UPDATE:
            if change.new_content is None:
                raise DiffError(f"UPDATE change for {path} has no new content")
            target = change.move_path or path
            write_fn(target, change.new_content)
            if change.move_path:
                remove_fn(path)

def process_patch(
    text: str,
    open_fn: Callable[[str], str],
    write_fn: Callable[[str, str], None],
    remove_fn: Callable[[str], None],
) -> str:
    if not text.startswith("*** Begin Patch"):
        raise DiffError("Patch text must start with *** Begin Patch")
    paths = identify_files_needed(text)
    orig = load_files(paths, open_fn)
    patch, _fuzz = text_to_patch(text, orig)
    commit = patch_to_commit(patch, orig)
    apply_commit(commit, write_fn, remove_fn)
    return "Done!"

# --------------------------------------------------------------------------- #
#  Default FS helpers
# --------------------------------------------------------------------------- #
def open_file(path: str) -> str:
    with open(path, "rt", encoding="utf-8") as fh:
        return fh.read()

def write_file(path: str, content: str) -> None:
    target = pathlib.Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("wt", encoding="utf-8") as fh:
        fh.write(content)

def remove_file(path: str) -> None:
    pathlib.Path(path).unlink(missing_ok=True)

# --------------------------------------------------------------------------- #
#  CLI entry-point
# --------------------------------------------------------------------------- #
def main() -> None:
    import sys
    patch_text = sys.stdin.read()
    if not patch_text:
        print("Please pass patch text through stdin", file=sys.stderr)
        return
    try:
        result = process_patch(patch_text, open_file, write_file, remove_file)
    except DiffError as exc:
        print(exc, file=sys.stderr)
        return
    print(result)

if __name__ == "__main__":
    main()

Other Effective Diff Formats其他有效的 Diff 格式

If you want to try using a different diff format, we found in testing that the SEARCH/REPLACE diff format used in Aider’s polyglot benchmark, as well as a pseudo-XML format with no internal escaping, both had high success rates.

如果您想尝试使用不同的 diff 格式,我们在测试中发现,Aider 的 polyglot 基准测试中使用的 SEARCH/REPLACE diff 格式,以及一种没有内部转义的伪 XML 格式,都具有很高的成功率。

These diff formats share two key aspects: (1) they do not use line numbers, and (2) they provide both the exact code to be replaced, and the exact code with which to replace it, with clear delimiters between the two.

这些 diff 格式有两个关键方面:(1)它们不使用行号;(2)它们提供要替换的确切代码,以及用于替换它的确切代码,并在两者之间使用清晰的分隔符。

SEARCH_REPLACE_DIFF_EXAMPLE = """
path/to/file.py
\`\`\`
>>>>>>> SEARCH
def search():
    pass
=======
def search():
   raise NotImplementedError()
<<<<<<< REPLACE
"""

PSEUDO_XML_DIFF_EXAMPLE = """
<edit>
<file>
path/to/file.py
</file>
<old_code>
def search():
    pass
</old_code>
<new_code>
def search():
   raise NotImplementedError()
</new_code>
</edit>
"""

阅读原文

掌握系统化思维,构建你的AI智能体

工具思维跨越到智能体思维,掌握系统化的方法论与AI智能体构建能力,构建可迁移的AI自动化技能,随时适应新需求!

7 阶实战体系,一步步带你从零基础到高阶自动化

◆ 立即节省2小时以上的重复劳动,效率提升10倍!

◆ 现在订阅,限时享受独家优惠,解锁你的AI竞争优势!



立即开启AI智能体进化之旅 →

Axton二维码

扫码关注获取更多资源

发表评论

您的邮箱地址不会被公开。 必填项已用 * 标注

滚动至顶部