Two red-team critiques of METR's research on long tasks
AI benchmarks are one attempt to track and formalise progress in AI capabilities. As explained succinctly in the introduction to “Measuring AI Ability to Complete Long Tasks” (Kwa et al.), published by METR this year, commonly used benchmarks suffer from a variety of issues. One of them is that benchmarks are often not mutually comparable, which makes it hard to track AI progress across time. METR proposes a metric that addresses this problem: the X% (task completion) time horizon, defined as the length of the longest task, measured by how long it takes a human, that an AI can complete X% of the time. I really like this metric as an attempt to come to grips quantitatively with how AI is developing over time.
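To make the definition above concrete, here is a minimal sketch of how an X% time horizon could be read off a fitted curve. It assumes success probability is modelled as a logistic function of the base-2 logarithm of human completion time, which is my understanding of the general approach rather than a reproduction of METR's code. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def success_probability(minutes, intercept, slope):
    """Assumed logistic model: P(success) as a function of log2(human minutes)."""
    z = intercept + slope * math.log2(minutes)
    return 1.0 / (1.0 + math.exp(-z))


def time_horizon(x, intercept, slope):
    """Task length (in human minutes) at which the modelled success rate equals x."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must be strictly between 0 and 1")
    if slope >= 0:
        raise ValueError("slope must be negative: success should fall as tasks get longer")
    logit_x = math.log(x / (1.0 - x))
    # Solve intercept + slope * log2(t) = logit(x) for t.
    return 2.0 ** ((logit_x - intercept) / slope)


# Hypothetical fitted parameters, not taken from any real model or paper.
INTERCEPT, SLOPE = 3.0, -0.5

h50 = time_horizon(0.5, INTERCEPT, SLOPE)  # 50% time horizon
h80 = time_horizon(0.8, INTERCEPT, SLOPE)  # 80% time horizon
print(f"50% horizon: {h50:.1f} min, 80% horizon: {h80:.1f} min")
```

Under this model the 80% horizon is always shorter than the 50% horizon, since demanding a higher success rate restricts the AI to shorter tasks.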
Can superhuman AI help us improve?

As AI continues to improve at writing and math-based tasks, it seems plausible that it either has surpassed or soon will surpass domain experts at them. This post explores to what extent an AI that is superhuman in a domain can instruct and empower humans to improve for themselves.
Making Mathematical MONSTERS
Introduction: axiomatizing natural numbers
In math, we often define objects whose existence and properties seem obvious and commonplace. For example, many introductory university-level courses will go out of their way to define the natural numbers. As mundane as this process can seem, I like the mental motion of formalizing such obvious objects. I see it as a sampling of our subconscious: a brief taste of the contents of our minds that rarely, if ever, exit the world of the murky and implicit.
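As an illustration of what such a definition can look like, here is a minimal Lean 4 sketch of a Peano-style construction of the natural numbers. The names `MyNat` and `MyNat.add` are hypothetical, and this is one common formalization, not necessarily the one used in the rest of this post.

```lean
-- A natural number is either zero or the successor of a natural number.
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

-- Addition, defined by recursion on the second argument.
def MyNat.add : MyNat → MyNat → MyNat
  | n, MyNat.zero => n
  | n, MyNat.succ m => MyNat.succ (MyNat.add n m)

-- 1 + 1 = 2, checked by unfolding the definitions.
example :
    MyNat.add (MyNat.succ MyNat.zero) (MyNat.succ MyNat.zero)
      = MyNat.succ (MyNat.succ MyNat.zero) := rfl
```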
Axiomatic jigsaw puzzles: probability
‘Anytime someone finds a problem with your axioms, you just say “oh, but of course that’s not what I meant,” and you change the axioms.’ - cool math prof
Intro
I recently came across ‘Foundations of the Theory of Probability,’ a 1933 paper by A.N. Kolmogorov that outlines the ‘canonical’ formalization of probability theory we know and love(?) today. I took a probability course last semester and expected the content of Kolmogorov’s paper to be very similar to what I had learned. It was therefore surprising and delightful to find that the foundational axioms given by Kolmogorov are substantially different from the ones I am used to.