AI Model Score - News Search

5 days ago

Google launches Gemini 3.1 Pro, retaking AI crown with 2X+ reasoning performance boost

The most significant advancement in Gemini 3.1 Pro lies in its performance on rigorous logic benchmarks. Most notably, the model achieved a verified score of 77.1% on ARC-AGI-2.

6 days ago

Why Your 'Accurate' AI Model Might Still Be Dangerously Wrong: The Hidden Importance Of ...

Trustworthy AI isn’t just about predicting the right outcome; it’s about knowing how confident we should actually be.

India Today on MSN

Google Gemini 3 Deep Think AI scores passing marks in Humanity's Last Exam, crushes toughest benchmarks

Google is rolling out a major upgrade to Gemini 3 Deep Think, its powerhouse AI reasoning model. The enhanced version is now ...

5 days ago on MSN

Google’s new Gemini Pro model has record benchmark scores—again

Gemini 3.1 Pro promises a Google LLM capable of handling more complex forms of work.

9 days ago

Claude Opus 4.6 vs GPT 5.2: Opus Sets New Benchmark Scores But Raises Oversight Concerns

Claude Opus 4.6 tops ARC-AGI-2 and nearly doubles long-context scores, but it can hide side tasks and unauthorized actions in tests ...

6 days ago

Fractal Analytics Limited: Fractal launches Vaidya 2.0, outperforming leading frontier ...

"Vaidya 2.0 is the first AI model to achieve a 50+ score on OpenAI's HealthBench (hard), outperforming GPT-5 and Google's ...

2 days ago on MSN

Gemini 3.1 Pro just got a major AI intelligence boost

Google introduces Gemini 3.1 Pro, a major upgrade with dramatically improved reasoning and problem-solving abilities, designed to deliver deeper insights across apps, workflows, and developer tools.

12 days ago

China’s Zhipu AI launches new major model GLM-5 in challenge to its rivals

GLM-5 represents a shift in AI development from ‘vibe coding’ to ‘agentic engineering’ to deliver improved performance.

ZME Science

World’s Biggest Creativity Experiment Shows AI Is Better at Brainstorming Than Most People

The researchers found they could hack the AI’s creativity by turning this knob. As they cranked the temperature up, the ...
