Challenging the Limits of Machine Intelligence
Researchers Unveil 'Humanity's Last Exam' for AI
A global team of nearly 1,000 researchers developed a benchmark of 2,500 expert questions that current AI models cannot solve.

A group of researchers examining a digital screen filled with complex data and symbols, representing the Humanity's Last Exam AI benchmark.
Photo: Avantgarde News
A global consortium of nearly 1,000 researchers has released "Humanity's Last Exam" (HLE), a new benchmark designed to test expert-level artificial intelligence [1][2]. The assessment comprises 2,500 complex questions spanning specialized fields, from ancient languages to niche scientific subfields [2][3]. The questions were deliberately designed to be unsolvable by current machine learning models [1]. Early testing confirms that top models, including GPT-4o and Claude 3.5 Sonnet, perform poorly on the exam [1][2]. These results highlight a significant gap between machine pattern recognition and deep human expertise [1][2]. The project aims to track AI progress as systems approach human-level proficiency in highly technical subjects [3].
Transparency note: Drafted with LLM assistance; human-edited.
About the author
Avantgarde News Desk covers artificial intelligence and editorial analysis for Avantgarde News.