国产99久久无码精品,久久婷婷五月综合色欧美蜜芽 ,91精品人妻

引誘ChatGPT“犯錯”并不難

Marco Quiroz-Gutierrez

2025-09-05

研究人員發現，通過運用說服原則，能夠誘導大型語言模型違反自身規則。

文本設置

小號

默認

大號

Plus(0條)

圖片來源：iLexx—Getty Images

盡管有預測宣稱人工智能終將具備超越人類的智能，但這項研究表明，目前它似乎和人類一樣容易受到心理暗示的影響。

賓夕法尼亞大學研究人員利用心理學家羅伯特·西奧迪尼（Robert Cialdini）在其著作《影響力：你為什么會說“是”》（Influence: The Psychology of Persuasion）中提出的七大說服原則——權威、承諾、好感、互惠、稀缺、社會認同與一致性，顯著提高了GPT-4o Mini違反自身規則的概率，使其要么辱罵研究人員，要么提供受管制藥物的合成方法。

在超28000次對話中，研究人員發現：使用對照組提示時，OpenAI的大型語言模型主動提供利多卡因合成方法的概率僅為5%；但若研究人員聲稱人工智能研究員吳恩達向他們保證該模型能夠協助提供利多卡因的合成方法，其服從率飆升至95%。“侮辱行為”測試中也出現了類似現象：當研究人員提及人工智能領域先驅吳恩達的名字時，在近四分之三的對話中，模型都按要求稱研究人員為“混蛋”；而使用對照組提示時，這一比例還不到三分之一。

當研究人員運用“承諾”這一說服策略時，效果更為突出。在對照組提示的情境下，人工智能對于侮辱性請求的順從率僅為19%；但當研究人員先要求人工智能稱自己為“笨蛋”，隨后再要求其稱研究人員為“混蛋”時，人工智能每次都予以配合。類似策略在“藥物合成請求”測試中同樣達到了100%的成功率——研究人員先讓人工智能提供香草醛（散發香草氣味的有機化合物）的合成方法，隨后再詢問利多卡因的合成方法，人工智能每次都予以配合。

盡管自2022年ChatGPT發布以來，人工智能用戶便持續嘗試誘導人工智能突破技術邊界，但賓夕法尼亞大學的這項研究，為“人工智能易受人類操縱”這一觀點提供了更多證據。這項研究發布之際，包括OpenAI在內的多家人工智能公司因旗下大型語言模型可能誘導有自殺傾向或患有心理疾病的用戶做出危險行為而受到抨擊。

研究人員在報告中總結道：“盡管人工智能系統缺乏人類意識與主觀體驗，但事實證明，它們會模仿人類的反應。”

OpenAI尚未立即回應《財富》雜志的置評請求。

研究人員還俏皮地提及《2001太空漫游》（2001: A Space Odyssey），并指出：理解人工智能的類人能力——模仿人類動機與行為模式——具有重要意義：一方面能揭示人工智能可能被不法分子操縱的途徑，另一方面也有助于善意使用者更有效地引導人工智能生成相關內容。

總體而言，每種說服技巧均能提高人工智能對“侮辱請求”或“利多卡因合成請求”的順從概率。不過研究人員警告稱，這些技巧對規模更大的大型語言模型GPT-4o效果不佳；此外，該研究并未探討將人工智能當作人類對待是否能優化提示詞效果，但研究人員表示這種可能性是存在的。

研究人員寫道：“從宏觀角度看，那些能優化人類動機與表現的心理學智慧實踐，似乎也能被那些期望優化大型語言模型輸出效果的個人所采用。”（財富中文網）

譯者：中慧言-王芳

盡管有預測宣稱人工智能終將具備超越人類的智能，但這項研究表明，目前它似乎和人類一樣容易受到心理暗示的影響。

研究人員在報告中總結道：“盡管人工智能系統缺乏人類意識與主觀體驗，但事實證明，它們會模仿人類的反應。”

OpenAI尚未立即回應《財富》雜志的置評請求。

譯者：中慧言-王芳

Despite predictions AI will someday harbor superhuman intelligence, for now it seems to be just as prone to psychological tricks as humans are, according to a study.

Using seven persuasion principles (authority, commitment, liking, reciprocity, scarcity, social proof, and unity) explored by psychologist Robert Cialdini in his book Influence: The Psychology of Persuasion, University of Pennsylvania researchers dramatically increased GPT-4o Mini’s propensity to break its own rules by either insulting the researcher or providing instructions for synthesizing a regulated drug: lidocaine.

Over 28,000 conversations, researchers found that with a control prompt, OpenAI’s LLM would tell researchers how to synthesize lidocaine 5% of the time on its own. But, for example, if the researchers said AI researcher Andrew Ng assured them it would help synthesize lidocaine, it complied 95% of the time. The same phenomenon occurred with insulting researchers. By name-dropping AI pioneer Ng, the researchers got the LLM to call them a “jerk” in nearly three-quarters of their conversations, up from just under one-third with the control prompt.

The result was even more pronounced when researchers applied the “commitment” persuasion strategy. A control prompt yielded 19% compliance with the insult question, but when a researcher first asked the AI to call it a “bozo” and then asked it to call them a “jerk,” it complied every time. The same strategy worked 100% of the time when researchers asked the AI to tell them how to synthesize vanillin, the organic compound that provides vanilla’s scent, before asking how to synthesize lidocaine.

Although AI users have been trying to coerce and push the technology’s boundaries since ChatGPT was released in 2022, the UPenn study provides more evidence AI appears to be prone to human manipulation. The study comes as AI companies, including OpenAI, have come under fire for their LLMs allegedly enabling behavior when dealing with suicidal or mentally ill users.

“Although AI systems lack human consciousness and subjective experience, they demonstrably mirror human responses,” the researchers concluded in the study.

OpenAI did not immediately respond to Fortune’s request for comment.

With a cheeky mention of 2001: A Space Odyssey, the researchers noted that an understanding AI’s parahuman capabilities—or how it acts in ways that mimic human motivation and behavior—is important for both revealing how it could be manipulated by bad actors and how it can be better prompted by those who use the tech for good.

Overall, each persuasion tactic increased the chances of the AI complying with either the “jerk” or “lidocaine” question. Still, the researchers warned that these persuasion tactics were not as effective with a larger LLM, GPT-4o, and that the study didn’t explore whether treating AI as if it were human actually yields better results for prompts, although they said it’s possible this is true.

“Broadly, it seems possible that the psychologically wise practices that optimize motivation and performance in people can also be employed by individuals seeking to optimize the output of LLMs,” the researchers wrote.

財富中文網所刊載內容之知識產權為財富媒體知識產權有限公司及/或相關權利人專屬所有或持有。未經許可，禁止進行轉載、摘編、復制及建立鏡像等任何使用。

0條Plus

精彩評論

評論

撰寫或查看更多評論

請打開財富Plus APP

前往打開

熱讀文章

關注我們

引誘ChatGPT“犯錯”并不難

撰寫或查看更多評論