          Google Cracks the Key to Running AI Agents Efficiently

          Jeremy Kahn
          2025-12-18

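          Is multi-agent collaboration always better than a single agent? Google's latest research suggests it depends on the specific demands of the task.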

          Image credit: VCG via Getty Images

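          Is a team of AI agents always better than a single agent? Google's latest research suggests the answer depends entirely on what you want the task to accomplish.

          2025 was supposed to be the year of AI agents. But as the year draws to a close, it is clear that these predictions from tech vendors were overly optimistic. Yes, some companies have started using AI agents, but most have not, especially not in company-wide deployments.

          McKinsey's “State of AI” survey from last month found that a majority of businesses had yet to begin using AI agents, while 40% said they were experimenting. Fewer than a quarter said they had deployed AI agents at scale in at least one use case, and when the consulting firm asked whether they were using AI in specific functions, such as marketing and sales or human resources, the results were even less encouraging. No more than 10% of respondents said they had AI agents “fully scaled” or “in the process of scaling” in any of these areas. The function with the highest use of scaled agents was IT (where agents are often used to automatically resolve service tickets or install software for employees), but even here only 2% of companies reported being “fully scaled,” with another 8% saying they were “scaling.”

          A big part of the problem is that designing workflows that let AI agents produce reliable results is very difficult. Even today's most capable AI models sit on a strange boundary: they can complete some tasks in a workflow as well as humans, but are unable to do others. Complex tasks that involve gathering data from multiple sources and using software tools over many steps are especially challenging. The longer the workflow, the greater the risk that an error in an early step will compound and lead to a failed outcome. In addition, the most capable AI models are expensive to use at scale, and more expensive still if the workflow requires the agent to do a lot of planning and reasoning.

          Many companies have tried to solve these problems by designing “multi-agent workflows,” in which different agents are spun up, each responsible for just one discrete step in the workflow, sometimes with one agent checking the work of another. This can improve performance, but it can also end up being expensive, sometimes too expensive to make the workflow worth automating.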

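          Are two AI agents always better than one?

          Now a team at Google has conducted a study that aims to give businesses a good rubric for deciding when a single agent is the better choice (as opposed to building a multi-agent workflow), and which type of multi-agent workflow might best suit a particular task.

          The researchers ran 180 controlled experiments using AI models from Google, OpenAI, and Anthropic. They evaluated the models on four different AI agent benchmarks covering a range of goals: retrieving information from multiple websites; planning in a Minecraft game environment; planning and tool use to complete common business tasks (such as answering emails, scheduling meetings, and using project management software); and a finance agent benchmark. The finance test requires agents to retrieve information from U.S. Securities and Exchange Commission (SEC) filings and perform basic analysis, such as comparing actual results with management's forecasts from the prior quarter, working out how revenue from a specific product segment has changed over time, or calculating how much cash a company might have available for M&A activity.

          Over the past year, the conventional wisdom has been that multi-agent workflows produce more reliable results. (I have previously written about this view in Eye on AI, and it is supported by the experience of some companies, such as Prosus.) But the Google researchers found that whether the conventional wisdom holds depends heavily on exactly what the task is.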

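          Single agents do better at sequential steps, worse at parallel ones

          If the task is sequential (as many of the Minecraft benchmark tasks are), it turned out that as long as a single AI agent can perform the task with at least 45% accuracy (a fairly low bar, in my view), deploying a single agent is the better choice. Using multiple agents, in any configuration, reduced overall performance sharply, by between 39% and 70%. According to the researchers, the reason is that if a company has a limited token budget for the entire task, the demands of multiple agents trying to figure out how to use different tools quickly exhaust that budget.

          But if the task involves steps that can be performed in parallel (as many of the financial analysis tasks do), multi-agent systems offer big advantages. The researchers also found that how the agents are configured to work together matters a great deal. For the financial analysis tasks, a centralized multi-agent system, in which a single coordinator agent directs and oversees multiple sub-agents and all communication passes through the coordinator, produced the best results. That system performed 80% better than a single agent. By contrast, an independent multi-agent system (no coordinator, with each agent simply assigned a narrow role that it completes in parallel) was only 57% better than a single agent.

          Research like this should help companies figure out the best ways to configure AI agents and allow the technology to finally begin delivering on last year's promises. For those selling AI agent technology, late is better than never. For people working in businesses that use AI agents, we will have to see what impact these agents have on the labor market. That is a story we will be watching closely as we head into 2026. (Fortune China)

          Translator: 中慧言-王芳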

          Is a team of AI agents always better than using just one agent? New Google research suggests the answer depends on exactly what you want the agents to do.

          Hello. 2025 was supposed to be the year of AI agents. But as the year draws to a close, it is clear such prognostications from tech vendors were overly optimistic. Yes, some companies have started to use AI agents. But most are not yet doing so, especially not in company-wide deployments.

          A McKinsey “State of AI” survey from last month found that a majority of businesses had yet to begin using AI agents, while 40% said they were experimenting. Less than a quarter said they had deployed AI agents at scale in at least one use case; and when the consulting firm asked people about whether they were using AI in specific functions, such as marketing and sales or human resources, the results were even worse. No more than 10% of survey respondents said they had AI agents “fully scaled” or were “in the process of scaling” in any of these areas. The one function with the most usage of scaled agents was IT (where agents are often used to automatically resolve service tickets or install software for employees), and even here only 2% reported having agents “fully scaled,” with an additional 8% saying they were “scaling.”

          A big part of the problem is that designing workflows for AI agents that will enable them to produce reliable results turns out to be difficult. Even the most capable of today’s AI models sit on a strange boundary—capable of doing certain tasks in a workflow as well as humans, but unable to do others. Complex tasks that involve gathering data from multiple sources and using software tools over many steps represent a particular challenge. The longer the workflow, the more risk that an error in one of the early steps in a process will compound, resulting in a failed outcome. Plus, the most capable AI models can be expensive to use at scale, especially if the workflow involves the agent having to do a lot of planning and reasoning.
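
To make the compounding point concrete, here is a minimal sketch. It assumes, purely for illustration, that every step succeeds independently with the same probability and that one failed step sinks the whole workflow. Neither that assumption nor the numbers come from the McKinsey survey or the Google study, and the function name is hypothetical.

```python
def workflow_success_rate(step_success: float, num_steps: int) -> float:
    """Chance that a workflow finishes with no failed step.

    Illustrative assumption (not from the article): steps fail
    independently, and any single failure ruins the final outcome.
    """
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    # Independent steps multiply: p * p * ... * p, num_steps times.
    return step_success ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step reliability of 95%.
    for steps in (1, 5, 10, 20):
        rate = workflow_success_rate(0.95, steps)
        print(f"{steps:>2} steps -> {rate:.3f}")
```

Under these assumptions, an agent that gets each step right 95% of the time completes a 10-step workflow only about 60% of the time, which is why longer workflows are so much harder to make reliable.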

          Many firms have sought to solve these problems by designing “multi-agent workflows,” where different agents are spun up, with each assigned just one discrete step in the workflow, including sometimes using one agent to check the work of another agent. This can improve performance, but it too can wind up being expensive—sometimes too expensive to make the workflow worth automating.

          Are two AI agents always better than one?

          Now a team at Google has conducted research that aims to give businesses a good rubric for deciding when it is better to use a single agent, as opposed to building a multi-agent workflow, and what type of multi-agent workflows might be best for a particular task.

          The researchers conducted 180 controlled experiments using AI models from Google, OpenAI, and Anthropic. They tried them against four different agentic AI benchmarks that covered a diverse set of goals: retrieving information from multiple websites; planning in a Minecraft game environment; planning and tool use to accomplish common business tasks such as answering emails, scheduling meetings, and using project management software; and a finance agent benchmark. That finance test requires agents to retrieve information from SEC filings and perform basic analytics, such as comparing actual results to management’s forecasts from the prior quarter, figuring out how revenue derived from a specific product segment has changed over time, or figuring out how much cash a company might have free for M&A activity.

          In the past year, the conventional wisdom has been that multi-agent workflows produce more reliable results. (I've previously written about this view in Eye on AI; it has been backed up by the experience of some companies, such as Prosus.) But the Google researchers found instead that whether the conventional wisdom held was highly contingent on exactly what the task was.

          Single agents do better at sequential steps, worse at parallel ones

          If the task was sequential, as many of the Minecraft benchmark tasks were, then as long as a single AI agent could perform the task accurately at least 45% of the time (a pretty low bar, in my opinion), it was better to deploy just one agent. Using multiple agents, in any configuration, reduced overall performance sharply, by between 39% and 70%. The reason, according to the researchers, is that when a company has a limited token budget for completing the entire task, the demands of multiple agents each trying to figure out how to use different tools quickly overwhelm that budget.
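
The sequential-task rule of thumb can be written down as a rough sketch. The function and constant names are hypothetical, and the code assumes the 39% to 70% reduction is relative to the single agent's own score, which this column does not spell out.

```python
from typing import Tuple

# Figures as reported in this column for sequential tasks.
SINGLE_AGENT_THRESHOLD = 0.45       # single-agent accuracy at which one agent wins
REPORTED_DROP_RANGE = (0.39, 0.70)  # performance reduction seen with multiple agents


def prefer_single_agent(is_sequential: bool, single_agent_accuracy: float) -> bool:
    """True when the reported rule of thumb says to deploy one agent.

    False only means the rule is silent (for example, on parallel tasks);
    it does not by itself mean a multi-agent system would do better.
    """
    return is_sequential and single_agent_accuracy >= SINGLE_AGENT_THRESHOLD


def multi_agent_accuracy_range(single_agent_accuracy: float) -> Tuple[float, float]:
    """Implied (worst, best) multi-agent accuracy on a sequential task.

    Assumption: the reported drop is relative to the single-agent score.
    """
    low_drop, high_drop = REPORTED_DROP_RANGE
    worst = single_agent_accuracy * (1 - high_drop)
    best = single_agent_accuracy * (1 - low_drop)
    return worst, best
```

On that reading, a single agent that succeeds 50% of the time would be replaced by a multi-agent setup landing somewhere between 15% and 30.5%.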

          But if a task involved steps that could be performed in parallel, as was true for many of the financial analysis tasks, then multi-agent systems conveyed big advantages. What’s more, the researchers found that exactly how the agents are configured to work with one another makes a big difference, too. For the financial-analysis tasks, a centralized multi-agent system—where a single coordinator agent directs and oversees the activity of multiple sub-agents and all communication flows to and from the coordinator—produced the best result. This system performed 80% better than a single agent. Meanwhile, an independent multi-agent system, in which there is no coordinator and each agent is simply assigned a narrow role that they complete in parallel, was only 57% better than a single agent.
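
The two configurations described above differ mainly in who hands out the work. The sketch below is a toy illustration, not the researchers' setup: the agents are stub functions that return strings instead of calling a model, and every name and subtask in it is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Agent = Callable[[str], str]


def make_agent(role: str) -> Agent:
    """Return a stub agent; a real one would call an AI model here."""
    def agent(subtask: str) -> str:
        return f"[{role}] {subtask}"
    return agent


def run_independent(agents: Dict[str, Agent], task: str) -> List[str]:
    """No coordinator: each agent receives the whole task and works in
    parallel within its own narrow role. Nothing merges the outputs."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(agent, task) for agent in agents.values()]
        return [future.result() for future in futures]


def run_centralized(agents: Dict[str, Agent], task: str, plan: Dict[str, str]) -> str:
    """Coordinator: hands each sub-agent its subtask from `plan`
    (role -> subtask), collects every reply, and merges them. All
    communication goes to and from this function."""
    with ThreadPoolExecutor() as pool:
        futures = {role: pool.submit(agents[role], subtask)
                   for role, subtask in plan.items()}
        replies = {role: future.result() for role, future in futures.items()}
    return f"{task}: " + " | ".join(replies[role] for role in plan)
```

In the centralized version the sub-agents still run in parallel, but only the coordinator sees the whole task and the combined result.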

          Research like this should help companies figure out the best ways to configure AI agents and enable the technology to finally begin to deliver on last year’s promises. For those selling AI agent technology, late is better than never. For the people working in the businesses using AI agents, we’ll have to see what impact these agents have on the labor market. That’s a story we’ll be watching closely as we head into 2026.

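          The intellectual property rights to content published on Fortune China are exclusively owned or held by Fortune Media IP Limited and/or the relevant rights holders. Reproduction, excerpting, copying, mirroring, or any other use without permission is prohibited.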