Two red-team critiques of METR's research on long tasks
