From VS Code and Claude Code to Cursor 3.0.
METR, which runs the benchmark measuring how well models can complete long-duration tasks, found that Claude Mythos Preview ...