
As summer fades into fall, many in the tech world are worried about winter. Late last month, a Bloomberg columnist asked “is the AI winter finally upon us?” British newspaper The Telegraph was more definitive. “The next AI winter is coming,” it declared. Meanwhile, social media platform X was filled with chatter about a possible AI winter.
An “AI winter” is what folks in artificial intelligence call a period in which enthusiasm for the idea of machines that can learn and think like people wanes—and investment for AI products, companies, and research dries up. There’s a reason this phrase comes so naturally to the lips of AI pundits: We’ve already lived through several AI winters over the 70-year history of artificial intelligence as a research field. If we’re about to enter another one, as some suspect, it’ll be at least the fourth.
The most recent talk of a looming winter has been triggered by growing concerns among investors that AI technology may not live up to the hype surrounding it—and that the valuations of many AI-related companies are far too high. In a worst-case scenario, this AI winter could be accompanied by the popping of an AI-inflated stock market bubble, with reverberations across the entire economy. While there have been AI hype cycles before, they’ve never involved anything close to the multiple hundreds of billions of dollars that investors have sunk into the generative AI boom. And so if there is another AI winter, it could involve polar vortex levels of pain.
The markets have been spooked recently by comments from OpenAI CEO Sam Altman, who told reporters he thought some venture-backed AI startups were grossly overvalued (although not OpenAI, of course, which is one of the most highly valued venture-backed startups of all time). Hot on the heels of Altman’s remarks came a study from MIT that concluded that 95% of AI pilot projects fail.
A look at past AI winters, and what caused them, may give us some indication of whether that chill in the air is just a passing breeze or the first hints of an impending Ice Age. Sometimes those AI winters have been brought on by academic research highlighting the limitations of particular AI techniques. Sometimes they have been caused by frustration over getting AI tech to work well in real-world applications. Sometimes both factors have been at play. But what previous AI winters all had in common was disillusionment among those footing the bill after promising new advances failed to deliver on the ensuing hype.
The first AI hype cycle
The U.S. and allied governments lavishly funded artificial intelligence research throughout the early days of the Cold War. Then, as now, Washington saw the technology as potentially conferring a strategic and military advantage, and much of the funding for AI research came from the Pentagon.
During this period, there were two competing approaches to AI. One was based on hard-coding logical rules for categorizing inputs into symbols and then for manipulating those symbols to arrive at outputs. This was the method that yielded the first great leaps forward in computers that could play checkers and chess, and also led to the world’s first chatbots.
The rival AI method was based on something called a perceptron, which was the forerunner of today’s neural networks, a kind of AI loosely built on a caricature of how the brain works. Rather than starting with rules and logic, a perceptron learned a rule for accomplishing some task from data. The U.S. Office of Naval Research funded much of the early work on perceptrons, which were pioneered by Cornell University neuroscientist and psychologist Frank Rosenblatt. Both the Navy and the CIA tested perceptrons to see if they could classify things like the silhouettes of enemy ships or potential targets in aerial reconnaissance photos.
The two competing camps both made hyperbolic claims that their technology would soon deliver computers that equalled or exceeded human intelligence. Rosenblatt told The New York Times in 1958 that his perceptrons would soon be able to recognize individuals and call out their names, that it was “only one more step of development” before they could instantly translate languages, and that eventually the AI systems would self-replicate and become conscious. Meanwhile Marvin Minsky, cofounder of MIT’s AI Lab and a leading figure in the symbolic AI camp, told Life magazine in 1970 that “in three to eight years we will have a machine with the general intelligence of an average human being.”
That’s the first prerequisite for an AI winter: hype. And there are clear parallels today in statements made by a number of prominent AI figures. Back in January, OpenAI CEO Sam Altman wrote on his personal blog that “we are now confident we know how to build [human-level artificial general intelligence] as we have traditionally understood it” and that OpenAI was turning increasingly towards building super-human “superintelligence.” He wrote that this year “we may see the first AI agents ‘join the workforce’ and materially change the output of companies.” Dario Amodei, the cofounder and CEO of Anthropic, has said that human-level AI could arrive in 2026. Meanwhile, Demis Hassabis, the cofounder and CEO of Google DeepMind, has said that AI matching humans across all cognitive domains would arrive in the next “five to 10 years.”
Government loses faith
But what precipitates an AI winter is some definitive evidence this hype cannot be met. For the first AI winter, that evidence came in a succession of blows. In 1966, a committee commissioned by the National Research Council issued a damning report on the state of natural language processing and machine translation. It concluded that computer-based translation was more expensive, slower, and less accurate than human translation. The research council, which had provided $20 million towards this early kind of language AI (at least $200 million in today’s dollars), cut off all funding.
Then, in 1969, Minsky was responsible for a second punch. That year, he and Seymour Papert, a fellow AI researcher, published a book-length takedown of perceptrons. In the book, Minsky and Papert proved mathematically that a single-layer perceptron, like the kind Rosenblatt had shown off to great fanfare in 1958, could only ever learn classifications that a single straight line can separate—in other words, it could tell black from white, or circles from squares, but it provably could not master a function as simple as “exclusive or,” which fires when exactly one of two inputs is on.
It turned out there was a big problem with Minsky’s and Papert’s critique. While most interpreted the book as definitive proof that neural network-based AI would never come close to human-level intelligence, their proofs applied only to a simple perceptron that had just a single layer: an input layer consisting of several neurons that took in data, all linked to a single output neuron. They had ignored, likely deliberately, that some researchers in the 1960s had already begun experimenting with multilayer perceptrons, which had a middle “hidden” layer of neurons that sat between the input neurons and output neuron. True forerunners of today’s “deep learning,” these multilayer perceptrons could, in fact, learn the kinds of nonlinear distinctions, XOR among them, that single-layer perceptrons provably could not. But at the time, training such a multilayer neural network was fiendishly difficult. And it didn’t matter. The damage was done. After the publication of Minsky’s and Papert’s book, U.S. government funding for neural network-based approaches to AI largely ended.
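To make the limitation concrete, here is a minimal sketch in Python (an illustration of the idea only, not code from Rosenblatt’s hardware or from Minsky and Papert’s proofs; the learning rate and epoch count are arbitrary choices): a single-layer perceptron trained with Rosenblatt’s learning rule masters AND, which a straight line can separate, but tops out at 75% accuracy on XOR, which no straight line can.

    # Single-layer perceptron: one weight per input, a bias, and a
    # hard threshold.
    def train_perceptron(samples, epochs=25, lr=0.1):
        w, b = [0.0, 0.0], 0.0
        for _ in range(epochs):
            for (x1, x2), target in samples:
                out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
                err = target - out
                # Rosenblatt's rule: nudge weights toward the target
                w[0] += lr * err * x1
                w[1] += lr * err * x2
                b += lr * err
        return w, b

    def accuracy(samples, w, b):
        correct = sum(
            (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == target
            for (x1, x2), target in samples
        )
        return correct / len(samples)

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    for name, data in [("AND", AND), ("XOR", XOR)]:
        w, b = train_perceptron(data)
        print(name, accuracy(data, w, b))
    # AND converges to 1.0; XOR never beats 0.75, because the best any
    # linear separator can do on XOR is get three of four cases right.

Adding the middle “hidden” layer described above is exactly what lifts this ceiling, which is why the multilayer story mattered so much.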
Minsky’s and Papert’s attack didn’t just persuade Pentagon funding bodies. It also convinced many computer scientists that neural networks were a dead end. Some neural network researchers came to blame Minsky for setting back the field by decades. In 2006, Terry Sejnowski, a researcher who helped revive interest in neural networks, stood up at a conference and confronted Minsky, asking him if he were the devil. Minsky ignored the question and began detailing what he saw as the failings of neural networks. Sejnowski persisted in asking Minsky again if he were the devil. Eventually an angry Minsky shouted back: “Yes, I am!”
But Minsky’s symbolic AI soon faced a funding drought too. Also in 1969, Congress forced the Defense Advanced Research Projects Agency (DARPA), which had been a major funder of both AI approaches, to change its approach to issuing grants. The agency was told to fund research that had clear, applied military applications, instead of more blue-sky research. And while some symbolic AI research fit this rubric, a lot of it did not.
The final punch came in 1973, when the U.K. parliament commissioned Cambridge University mathematician James Lighthill to investigate the state of AI research in Britain. His conclusion was that AI had failed to show any promise of fulfilling its grand claims of equaling human intelligence and that many of its favored algorithms, while they might work for toy problems, could never deal with the real world’s complexity. Based on Lighthill’s conclusions, the U.K. government curtailed all funding for AI research.
Lighthill had only looked at U.K. AI efforts, but DARPA and other U.S. funders of AI research took note of its conclusions, which reinforced their own growing skepticism of AI. By 1974, U.S. funding for AI projects was a fraction of what it had been in the 1960s. Winter had set in—and it would last until the early 1980s.
Today, too, there are parallels with this first AI winter when it comes to studies suggesting AI isn’t meeting expectations. Two recent papers from researchers at Apple and Arizona State University have cast doubt on whether cutting-edge AI models, which are supposed to use a “chain of thought” to reason about how to answer a prompt, are actually engaging in reasoning at all. Both papers conclude that rather than learning to apply generalizable logical rules and problem-solving techniques to new problems—which is what humans would consider reasoning—the models simply try to match a problem to one seen in their training data. These studies could turn out to be the equivalent of Minsky’s and Papert’s attack on perceptrons.
Meanwhile, there are also a growing number of studies on the real world impact of today’s AI models that parallel the Lighthill and NRC reports. For instance, there’s that MIT study which concluded 95% of AI pilots are failing to boost corporate revenues. There’s a recent study from researchers at Salesforce that concluded most of today’s large language models (LLMs) cannot accurately perform customer relationship management (CRM) tasks—a particularly ironic conclusion since Salesforce itself has been pushing AI agents to automate CRM processes. Anthropic research showed that its Claude model could not successfully run a vending machine business—a relatively simple business compared to many of those that tech boosters say are poised to be “utterly transformed” by AI agents. There’s also a study from the AI research group METR that showed software developers using an AI coding assistant were actually 19% slower at completing tasks than they were without it.
But there are some key differences. Most significantly, today’s AI boom is not dependent on public funding. Although government entities, including the U.S. military, are becoming important customers for AI companies, the money fueling the current boom is almost entirely private. Venture capitalists have invested at least $250 billion into AI startups since ChatGPT debuted in November 2022. And that doesn’t include the vast amount being spent by large, publicly traded tech companies like Microsoft, Alphabet, Amazon, and Meta on their own AI efforts. An estimated $350 billion is being spent to build out AI data centers this year alone, with even more expected next year.
What’s more, unlike in that first AI winter, when AI systems were mostly just research experiments, today AI is being widely deployed across businesses. AI has also become a massive consumer technology—ChatGPT alone is thought to have 700 million weekly users—which was never the case previously. While today’s AI still seems to lack some key aspects of human intelligence, it is a lot better than systems that existed previously and it is hard to argue that people are not finding the technology useful for a good number of tasks.
Winter No. 2: Business loses patience
That first AI winter thawed in the early 1980s thanks largely to increases in computing power and some improved algorithmic techniques. This time, much of the hype in AI was around “expert systems.” These were computer programs that were designed to encode the knowledge of human experts in a particular domain into a set of logical rules which the software would then apply to accomplish some specific task.
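In miniature, an expert system looked something like the following toy sketch in Python (purely for flavor; real systems of the era ran on dedicated shells, were often written in LISP, and encoded hundreds or thousands of rules elicited from human experts, and the medical rules here are invented for illustration):

    # A toy rule-based "expert system": domain knowledge hand-coded
    # as if-then rules, applied in order until one fires.
    def diagnose(symptoms):
        rules = [
            (lambda s: "fever" in s and "rash" in s, "possible measles"),
            (lambda s: "fever" in s and "cough" in s, "possible flu"),
            (lambda s: "headache" in s, "possible tension headache"),
        ]
        for condition, conclusion in rules:
            if condition(symptoms):
                return conclusion
        # Brittleness in miniature: any case the rule set never
        # anticipated falls through without a sensible answer.
        return "no rule applies"

    print(diagnose({"fever", "rash"}))  # -> possible measles
    print(diagnose({"fatigue"}))        # -> no rule applies

The last line hints at the brittleness problem that would soon sour businesses on the technology: the system has no way to reason about anything its rules never anticipated.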
Nevertheless, business was enthusiastic, believing expert systems would lead to a productivity boom. At the height of this AI hype cycle, nearly two-thirds of the Fortune 500 said they had deployed expert systems. By 1985, U.S. corporations were collectively spending more than $1 billion on expert systems and an entire industry, much of it backed by venture capital, sprouted up around the technology. Much of it was focused on building specialized computer hardware, called LISP machines, that were optimized to run expert systems, many of which were coded in the programming language LISP. What’s more, starting in 1983, DARPA returned to funding AI research through the new Strategic Computing Initiative, eventually offering over $100 million to more than 90 different AI projects at universities throughout the U.S.
Although expert systems drew on many of the methods symbolic AI researchers pioneered, many academic computer scientists were wary that inflated expectations would once again precipitate a boom-and-bust cycle that would hurt the field. Among them were Minsky and fellow AI researcher Roger Schank, who coined the term “AI winter” during an AI conference in 1984. The pair chose the neologism to echo the term “nuclear winter”—the devastating and bleak period without sunlight that would likely follow a major nuclear war.
Three things then happened to bring about the next winter. In 1987, a new kind of computer workstation debuted from Sun Microsystems. These workstations, as well as increasingly powerful desktop computers from IBM and Apple, obviated the need for specialized LISP machines. Within a year, the market for LISP machines evaporated. Many venture capitalists lost their shirts—and became wary of ever backing AI-related startups again. That same year, New York University computer scientist Jack Schwartz became head of DARPA’s computing research. He was no fan of AI in general or expert systems in particular, and slashed funding for both.
Meanwhile, businesses gradually discovered that expert systems were difficult and expensive to build and maintain. They were also “brittle”—while they could handle highly routinized tasks well, when they encountered slightly unusual cases, they struggled to apply the logical rules they had been given. In such cases, they often produced bizarre and inaccurate outputs, or simply broke down completely. Delineating rules that would apply to every edge case proved an impossible task. As a result, by the early 1990s, companies were starting to abandon expert systems. Unlike in the first AI boom, where scientists and government funders came to question the technology, this second winter was driven much more by business frustration.
Again there are some clear echoes in what’s happening with AI today. For instance, hundreds of billions of dollars are being invested in AI datacenters being constructed by Microsoft, Alphabet, Amazon’s AWS, Elon Musk’s X.ai, and Meta. OpenAI is working on its $500 billion Project Stargate data center plan with SoftBank, Oracle, and other investors. Nvidia has become the world’s most valuable company with a $4.3 trillion market cap largely by catering to this demand for AI chips for data centers. One of the big suppositions behind the data center boom is that the most cutting-edge AI models will be at least as large, if not larger, than the leading models that exist today. Training and running models of this size requires extremely large data centers.
But, at the same time, a number of startups have found clever ways to create much smaller models that mimic many of the capabilities of the giant models. These smaller models require far less computing resources—and in some cases don’t even require the kinds of specialized AI chips that Nvidia makes. Some might be small enough to run on a smartphone. If this trend continues, it is possible that those massive data centers won’t be required—just as it turned out LISP machines weren’t necessary. That could mean that hundreds of billions of dollars in AI infrastructure investment winds up stranded.
Today’s AI systems are in many ways more capable—and flexible—than the expert systems of the 1980s. But businesses are still finding them complicated and expensive to deploy and their return on investment too often elusive. While more general purpose and less brittle than the expert systems were, today’s AI models remain unreliable, especially when it comes to addressing unusual cases that might not have been well-represented in their training data. They are prone to hallucinations, confidently spewing inaccurate information, and can sometimes make mistakes no human ever would. This means companies and governments cannot use AI to automate mission critical processes. Whether this means companies will lose patience with generative AI and large language models, just as they did with expert systems, remains to be seen. But it could happen.
Winter No. 3: The rise and fall (and rise) of neural networks
The 1980s also saw renewed interest in the other AI method, neural networks, due in part to the work of David Rumelhart, Geoffrey Hinton, and Ronald Williams, who in 1986 figured out a way to overcome a key challenge that had bedeviled multilayer perceptrons since the 1960s. Their innovation was something called backpropagation, or backprop for short, a method for correcting the outputs of the middle, hidden layer of neurons during each training pass so that the network as a whole could learn efficiently.
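The following is a compact sketch of the idea in Python with NumPy (the network size, learning rate, and choice of XOR as the task are illustrative assumptions, not details from the 1986 paper): errors measured at the output are propagated backward to assign blame to the hidden layer’s weights, so the whole stack can learn.

    # Backpropagation on a one-hidden-layer network, learning XOR,
    # the very function a single-layer perceptron cannot represent.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

    W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)  # input -> hidden
    W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)  # hidden -> output

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for step in range(5000):
        # Forward pass
        h = sigmoid(X @ W1 + b1)     # hidden activations
        out = sigmoid(h @ W2 + b2)   # network output

        # Backward pass: push the output error back through the
        # hidden layer to get a gradient for every weight.
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)

        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)

    print(np.round(out.ravel(), 2))  # typically approaches [0, 1, 1, 0]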
Backprop, along with more powerful computers, helped spur a renaissance in neural networks. Soon researchers were building multilayered neural networks that could decipher handwritten letters on envelopes and checks, learn the relationships between people in a family tree, recognize typed characters and read them aloud through a voice synthesizer, and even steer an early self-driving car, keeping it between the lanes of a highway.
This led to a short-lived boom in neural networks in the late 1980s. But neural networks had some big drawbacks too. Training them required a lot of data, and for many tasks, the amount of data required just didn’t exist. They also were extremely slow to train and sometimes slow to run on the computer hardware that existed at the time.
This meant that there were many things neural networks could still not do. Businesses did not rush to adopt neural networks as they had expert systems, because their uses seemed highly circumscribed. Meanwhile, there were other statistical machine learning techniques that used less data and required less computing power that seemed to be making rapid progress. Once again, many AI researchers and engineers wrote off neural networks. Another decade-long AI winter set in.
Two things thawed this third winter. First, the internet created vast amounts of digital data and made accessing it relatively easy, which helped break the data bottleneck that had held neural networks back in the 1980s. Then, starting in 2004, researchers at the University of Maryland and then Microsoft began experimenting with a new kind of computer chip that had been invented for video games, called a graphics processing unit, or GPU, to train and run neural networks. GPUs could perform huge numbers of identical operations in parallel, which is exactly what neural networks required. Soon, Geoffrey Hinton and his graduate students began demonstrating that neural networks, trained on large datasets and run on GPUs, could do things—like classify images into a thousand different categories—that would have been impossible in the late 1980s. The modern “deep learning” revolution was taking off.
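The fit between GPUs and neural networks comes down to matrix arithmetic: every element of a layer’s output can be computed independently, so the work parallelizes almost perfectly. This CPU-only sketch in Python only gestures at the effect (the array sizes are arbitrary, and real GPU speedups are far larger than anything NumPy shows on a CPU), but it illustrates how batching exposes the parallel structure:

    # A neural network layer is essentially a matrix multiplication.
    # Computing inputs one at a time hides the parallelism; batching
    # exposes it, and a GPU runs the batched form across thousands
    # of cores at once.
    import time
    import numpy as np

    inputs = np.random.rand(10_000, 256)   # 10,000 example inputs
    weights = np.random.rand(256, 256)     # one layer's weights

    t0 = time.perf_counter()
    slow = np.array([x @ weights for x in inputs])  # one at a time
    t1 = time.perf_counter()
    fast = inputs @ weights                         # all at once
    t2 = time.perf_counter()

    print(f"looped: {t1 - t0:.3f}s  batched: {t2 - t1:.3f}s")
    assert np.allclose(slow, fast)  # same results, different schedule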
That boom has largely continued through today. At first, neural networks were largely trained to do one particular task well—to play Go, or to recognize faces. But the AI summer deepened in 2017, when researchers at Google designed a particular kind of neural network called a Transformer that was good at figuring out language sequences. It was given another boost in 2019 when OpenAI figured out that Transformers trained on large amounts of text could not only write text well but also master many other language tasks, from translation to summarization. Three years later, an updated version of OpenAI’s transformer-based neural network, GPT-3.5, would be used to power the viral chatbot ChatGPT.
Now, three years after ChatGPT’s debut, the hype around AI has never been greater. There are certainly a few autumnal signs, a falling leaf carried on the breeze here and there, if past AI winters are any guide. But only time will tell if it is the prelude to another Arctic bomb that will freeze AI investment for a generation, or merely a momentary cold snap before the sun appears again.