BlueDot Narrated
Audio versions of the core readings, blog posts, and papers from BlueDot courses.
Measuring AI Ability to Complete Long Tasks
By Thomas Kwa et al.
We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has increased consistently and exponentially over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.
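To make the extrapolation concrete, here is a minimal sketch of the arithmetic behind a fixed doubling time. It assumes the roughly 7-month doubling time quoted above holds steadily. The function names are illustrative and do not come from the paper, and the sketch deliberately assumes no particular starting task length.

```python
import math

# Doubling time quoted in the summary above (approximate, in months).
DOUBLING_MONTHS = 7.0


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative increase in task length after `months` of steady doubling."""
    return 2.0 ** (months / doubling_months)


def months_to_grow(factor: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for task length to grow by `factor` under the same trend."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    # Hypothetical horizons chosen for illustration only.
    for years in (1, 5, 10):
        print(f"{years} years: x{growth_factor(12 * years):,.0f}")
```

As a usage example, a hypothetical 40-fold increase (say, from a one-hour task to a 40-hour work week) corresponds to `months_to_grow(40)`, which is a little over 37 months if the trend continues unchanged.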
Source:
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
A podcast by BlueDot Impact.