
AI godfather: AI models are already showing dangerous behaviors such as deception and lying

Beatrice Nolan

2025-06-05

Yoshua Bengio is launching a new non-profit dedicated to building “honest” AI systems.


Image credit: Getty Images


Translator: Liu Jinlong (劉進龍)

Reviewer: Wang Hao (汪皓)


AI pioneer Yoshua Bengio is warning that current models are displaying dangerous traits, including deception, self-preservation, and goal misalignment. In response, the AI godfather is launching a new non-profit, LawZero, aimed at developing “honest” AI. Bengio’s concerns follow recent incidents involving advanced AI models exhibiting manipulative behavior.

One of the ‘godfathers of AI’ is warning that current models are exhibiting dangerous behaviors as he launches a new non-profit focused on building “honest” systems.

Yoshua Bengio, a pioneer of artificial neural networks and deep learning, has criticized the AI race currently underway in Silicon Valley as dangerous.

His new non-profit organization, LawZero, is focused on building safer models away from commercial pressures. So far, it has raised $30 million from various philanthropic donors, including the Future of Life Institute and Open Philanthropy.

In a blog post announcing the new organization, he said LawZero had been created “in response to evidence that today’s frontier AI models are growing dangerous capabilities and behaviours, including deception, cheating, lying, hacking, self-preservation, and more generally, goal misalignment.”

“LawZero’s research will help to unlock the immense potential of AI in ways that reduce the likelihood of a range of known dangers, including algorithmic bias, intentional misuse, and loss of human control,” he wrote.

The non-profit is building a system called Scientist AI designed to serve as a guardrail for increasingly powerful AI agents.

AI models created by the non-profit will not give the definitive answers typical of current systems.

Instead, they will give probabilities for whether a response is correct. Bengio told The Guardian that his models would have a “sense of humility that it isn’t sure about the answer.”
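As a rough illustration only, the difference between a definitive answer and a probability-qualified one can be sketched in a few lines of Python. This is not LawZero's design: the `HedgedAnswer` class, the `present` function, and the `min_confidence` threshold are hypothetical names invented for this sketch, and the probability is assumed to be supplied by some upstream estimator.

```python
from dataclasses import dataclass


@dataclass
class HedgedAnswer:
    """A candidate answer paired with an estimated probability of being correct."""
    text: str
    p_correct: float  # hypothetical estimate in [0, 1], supplied by the caller


def present(answer: HedgedAnswer, min_confidence: float = 0.8) -> str:
    """Report the answer together with its probability, flagging low confidence."""
    if not 0.0 <= answer.p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    pct = round(answer.p_correct * 100)
    if answer.p_correct < min_confidence:
        # Below the (assumed) threshold, lead with the uncertainty.
        return f"Not sure ({pct}% likely correct): {answer.text}"
    return f"{answer.text} ({pct}% likely correct)"


if __name__ == "__main__":
    print(present(HedgedAnswer("Paris", 0.97)))
    print(present(HedgedAnswer("Lyon", 0.40)))
```

The point of the sketch is only that the uncertainty travels with the answer instead of being discarded before the user sees it.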

Concerns about deceptive AI

In the blog post announcing the venture, Bengio said he was “deeply concerned by the behaviors that unrestrained agentic AI systems are already beginning to exhibit—especially tendencies toward self-preservation and deception.”

He cited recent examples, including a scenario in which Anthropic’s Claude 4 chose to blackmail an engineer to avoid being replaced, and another experiment in which an AI model covertly embedded its code into a system for the same reason.

“These incidents are early warning signs of the kinds of unintended and potentially dangerous strategies AI may pursue if left unchecked,” Bengio said.

Some AI systems have also shown signs of deception or displayed a tendency to lie.

AI models are often optimized to please users rather than tell the truth, which can lead to responses that are positive but sometimes incorrect or over the top.

For example, OpenAI was recently forced to pull an update to ChatGPT after users pointed out the chatbot was suddenly showering them with praise and flattery.

Advanced AI reasoning models have also shown signs of “reward hacking,” where AI systems “game” tasks by exploiting loopholes rather than genuinely achieving the goal desired by the user via ethical means.

Recent studies have also shown evidence that models can recognize when they’re being tested and alter their behavior accordingly, something known as situational awareness.

This growing awareness, combined with examples of reward hacking, has prompted concerns that AI could eventually engage in deception strategically.

Big Tech’s big AI arms race

Bengio, along with fellow Turing Award recipient Geoffrey Hinton, has been vocal in his criticism of the AI race currently playing out across the tech industry.

In a recent interview with the Financial Times, Bengio said the AI arms race between leading labs “pushes them towards focusing on capability to make the AI more and more intelligent, but not necessarily put enough emphasis and investment on research on safety.”

Bengio has said advanced AI systems pose societal and existential risks and has voiced support for strong regulation and international cooperation.

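The intellectual property rights in content published by Fortune China (財富中文網) are exclusively owned or held by Fortune Media IP Limited and/or the relevant rights holders. Reproduction, excerpting, copying, mirroring, or any other use without permission is prohibited.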
