Claude Code Skills 2.0 adds evals and benchmark test sets; the changes target skill reliability as the underlying models are updated over time.
Nested Claude Code runs parallel tasks through tmux; it auto-picks the terminal count and routes input, with real-time activity logs ...
ChatGPT and Claude's default models battle it out in challenges that test everyday uses such as writing, reasoning and ...