Understanding Visual Language Models

ByteDance Launches New LLM With Better Visual Understanding

ByteDance has released its new generation of large language models, Doubao Seed 2.0, as the Chinese tech giant tries to compete at the highest level with U.S. rivals in all types of AI models, from ...

Nasdaq

Alibaba Cloud Releases Latest AI Models For Enhanced Visual Understanding

(RTTNews) - Chinese tech giant Alibaba Cloud on Wednesday unveiled its latest visual-language model, Qwen2.5-VL, which it claims to be a significant improvement from its predecessor, Qwen2-VL. The ...

VentureBeat

Salesforce releases ‘xGen-MM’ open-source multimodal AI models to advance visual ...

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Salesforce, the enterprise software giant, ...

Phys.org

Language shapes visual processing in both human brains and AI models, study finds

Neuroscientists have been trying to understand how the brain processes visual information for over a century. The development of computational models inspired by the brain's layered organization, also ...

13 天on MSN

Xiaomi announces Xiaomi-Robotics-0, its first-generation robot large-scale model

Xiaomi is best known for smartphones, smart home gear, and the occasional electric vehicle update. Now it wants a place in robotics research too. The company has announced Xiaomi-Robotics-0, an ...

EurekAlert!

Assessing and understanding creativity in large language models

A TTCT-inspired dataset was constructed to evaluate LLMs under varied prompts and role-play settings. GPT-4 served as the evaluator to score model outputs. In recent years, the realm of artificial ...

Ars Technica

Microsoft unveils AI model that understands image content, solves visual puzzles

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ ...

VentureBeat

Alibaba releases new AI model Qwen2-VL that can analyze videos more than 20 minutes long

Alibaba Cloud, the cloud services and storage division of the Chinese e-commerce giant, has announced the release of Qwen2-VL, its latest advanced vision-language model designed to enhance visual ...

Forbes

The Next Leap In AI: From Large Language Models To Large World Models?

The realm of artificial intelligence (AI) may be on the cusp of a new transformative leap, transitioning from Large Language Models (LLMs) to an innovative and expansive concept, which we may call ...

20 天on MSN

Sarvam AI unveils 'Sarvam Vision', a multilingual document intelligence model

Sarvam AI's new Document intelligence model surpasses the actuary of Gemini 3 Pro, GPT 5.2, and other AI models when it comes to document intelligence.

一些您可能无法访问的结果已被隐去。

显示无法访问的结果