AI News and Product Search Page

2024-12-18 17:52:23.AIbase

New Breakthrough in Multimodal Models: Fei-Fei Li's Team Unifies Actions and Language, Not Only Understanding Commands but also Reading Implicit Emotions

2024-12-10 08:03:30.AIbase

Zhipu AI Launches Free Multimodal Model GLM-4V-Flash: Enhancing Image Processing Accuracy

Beijing Zhipu Huazhang Technology Co., Ltd. announced that its Zhipu Open Platform BigModel has launched the first free multimodal API—GLM-4V-Flash. This new model leverages the excellent capabilities of the 4V series, achieving improved accuracy in image processing and further lowering the barriers for developers to delve deeper into large models across various fields.

2024-11-30 10:01:37.AIbase

Zhipu AI Open-Sources On-Device Large Language and Multimodal Model Series GLM-Edge

Zhipu Technology recently announced that it has open-sourced GLM-Edge, its series of on-device large language and multimodal models, marking an important step for the company toward real-world on-device use cases. The GLM-Edge series consists of four models of different sizes, GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, and GLM-Edge-V-5B, which are optimized for mobile platforms such as smartphones and vehicle systems, as well as desktop platforms like PCs.

2024-11-19 13:51:41.AIbase

Peking University Team Releases Multimodal Model LLaVA-o1, Inference Capabilities Comparable to GPT-o1!

Recently, research teams from Peking University announced the release of an open-source multimodal model called LLaVA-o1, which is claimed to be the first visual language model capable of spontaneous and systematic reasoning, comparable to GPT-o1. The model excels in six challenging multimodal benchmark tests, with its 11B parameter version outperforming competitors such as Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct.

2024-11-19 09:54:07.AIbase

Mistral Launches the Most Powerful Open Source Multimodal Model Pixtral Large, Upgrading Le Chat to Directly Call Flux Pro

2024-10-25 11:16:59.AIbase

Salesforce AI Research Unveils New Multimodal Model BLIP-3-Video: Cost-Effective Video Understanding

2024-09-27 17:37:02.AIbase

Super Powerful Multimodal Model Emu3: Understanding Images and Videos Through Next Word Prediction

2024-09-26 14:34:11.AIbase

The Open Source Multimodal Model Molmo Can Recognize Objects in Images and Generate Accurate Descriptions

Recently, an open source multimodal AI model named Molmo has drawn widespread attention in the industry. This AI system, based on Qwen2-72B and leveraging OpenAI's CLIP as the visual processing engine, is challenging the dominance of traditional commercial models with its outstanding performance and innovative features. Molmo's standout characteristic is its efficient performance. Despite its relatively small size, it can compete with competitors that are ten times larger in processing capability. This 'small but exquisite' design philosophy not only enhances the model's

2024-08-13 08:15:52.AIbase

Starred Over Ten Thousand! The MiniCPM-V2.6 Model of WallFacer Intelligence Tops GitHub

Version 2.6, the latest in WallFacer's MiniCPM-V series, has rapidly climbed to the Top 3 on the GitHub and HuggingFace trending lists, surpassing ten thousand stars. Since the series was released in February, it has accumulated over a million downloads, becoming a benchmark for on-device model capabilities. With 8 billion parameters, MiniCPM-V2.6 improves on-device multimodal performance, including real-time video understanding, multi-image joint understanding, and multi-image in-context learning, with an on-device memory footprint of only 6GB after quantization and an inference speed of up to 18 tokens per second.

2024-08-02 09:04:21.AIbase

Google Launches Powerful Multimodal Model Gemini 1.5 Pro, Outranking GPT-4o and Claude-3.5 Sonnet

Google has released its latest AI model, Gemini 1.5 Pro, offering an experimental version 0801 through Google AI Studio and the Gemini API. This model leads the LMSYS leaderboard with an Elo score of 1300, surpassing OpenAI's GPT-4o and Anthropic's Claude-3.5 Sonnet. Gemini 1.5 Pro excels in multilingual tasks, mathematics, coding, and visual tasks, and features a context window of 2 million tokens.

2024-07-31 17:56:44.AIbase

Shusheng · Puyu Lingbi Multimodal Model Upgrade Version 2.5 Supports Longer Contexts and Image-Video Understanding Comparable to GPT-4V

Shusheng · Puyu Lingbi (InternLM-XComposer) Version 2.5 was developed by the Shanghai Artificial Intelligence Laboratory. It focuses on long-context input and output, operates smoothly at context lengths of up to 96K, and was trained on 24K interleaved image-text data. Key upgrades include high-resolution image understanding, fine-grained video understanding, and multi-turn multi-image dialogue. In application, it can create web pages and write high-quality articles that combine text and images. Evaluations show it surpasses state-of-the-art open-source models across 16 benchmark tests and, on key tasks, performs on par with GPT-4V and Gem.

2024-07-16 10:24:06.AIbase

Meta Unveils Massive Multimodal Model Llama 3 405B on July 23rd

Meta is about to make a big move: it is set to launch an open-source language model called Llama 3 405B, which is not only its largest model to date but also the largest open-source language model in history. With 405 billion parameters, it can work across images and text, moving beyond earlier models that could only handle text. Key Highlights: Meta will release Llama 3 405B on July 23rd, a multimodal model with 405 billion parameters. Dec

2024-07-04 10:48:36.AIbase

Open-Source Local Real-Time Multimodal Model Moshi: Real-Time Speech Generation with Support for Multiple Accents

The French independent non-profit AI research lab Kyutai has launched a voice assistant called Moshi, a real-time, locally running multimodal foundation model. The model imitates, and in certain aspects surpasses, some of the functionality demonstrated by OpenAI's GPT-4o, released in May. Product Entry: https://top.aibase.com/tool/moshi-chat Moshi is designed to understand and express emotions and can converse in different accents, including French. It can simultane

2024-06-27 16:41:35.AIbase

LeCun Launches New Visual Multimodal Model Cambrian-1, Visual Capabilities Outperform GPT-4V

The AI field has just welcomed a notable new member: Cambrian-1, a multimodal large language model (MLLM) developed by prominent researchers Yann LeCun and Saining Xie. The emergence of this model represents not only a technological leap but also a profound reflection on the research of multimodal learning.

2024-06-19 09:20:50.AIbase

Meta Releases Multiple Models: Multimodal Model Chameleon, Text-to-Music Generation Model JASCO, Audio Watermarking Technology AudioSeal, and More

Recently, Meta quietly released six research achievements that bring new applications and technological breakthroughs to the field of AI. These include a multimodal model, a text-to-music generation model, audio watermarking technology, datasets, and several other projects. Let's take a closer look at these research achievements.

2024-06-17 10:47:33.AIbase

Sketchpad: A Canvas Framework for Multimodal Models to Enhance Mathematical Abilities

Sketchpad enables language models to draw using lines, boxes, and markers, which is closer to human sketching and facilitates reasoning. Additionally, Sketchpad can utilize specialized visual models during the drawing process, such as using object detection models to create bounding boxes and segmentation models to draw masks, further enhancing visual perception and reasoning capabilities.

2024-01-31 10:12:49.AIbase

Microsoft Open Sources Multimodal Model LLaVA-1.5 Comparable to GPT-4V Performance

Microsoft has open-sourced the multimodal model LLaVA-1.5, inheriting the LLaVA architecture and introducing new features. Researchers have tested it in visual question answering, natural language processing, image generation, and other areas, showing that LLaVA-1.5 has reached the highest level among open-source models.

2024-01-15 18:04:00.AIbase

2023 AI Industry Event: GPT-4 Debuts, Multimodal Model War Erupts, AI Sun Yanzi Sparks Controversy

GPT-4 launches with multimodal capabilities surpassing its predecessors, leading AI technology development. Baidu introduces Wenxin Yiyan, initiating the domestic multimodal competition with a user base exceeding 100 million. AI Sun Yanzi emerges, triggering controversies over AI music synthesis and copyright. Miaoya Camera launches, sweeping social media and igniting a trend in AI photography products. The state releases 'Interim Measures for the Management of Generative Artificial Intelligence Services' to regulate the AIGC industry.

2023-12-27 10:28:28.AIbase

Shanghai AI Lab Releases 'PuYi 2.0' OpenMEDLab 2.0

Shanghai AI Lab, together with Ruijin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, has released the medical multimodal foundational model group 'PuYi 2.0'. PuYi 2.0 adds multi-domain models and incremental language parameters, covering various data modalities such as medical images, medical texts, and bioinformatics. New open-source datasets include the medical image segmentation dataset SA-Med2D-20M and the pathology dataset SNOW. PuYi 2.0 incorporates an evaluation module to provide reference implementations for medical model capabilities, delivering a one-stop open-source solution for large medical models.

2023-12-07 08:33:07.AIbase

Google Releases Multimodal Model Gemini 1.0, Set to Launch for Developers Early Next Year

Gemini is the latest generation AI model launched by Google, featuring multimodal capabilities. Gemini comes in three sizes: Ultra, Pro, and Nano, suitable for different tasks and devices. Gemini demonstrates exceptional performance, surpassing other models in multiple benchmark tests. It has multimodal reasoning and encoding capabilities, capable of processing text, images, audio, and more. Gemini is expected to be available for developers and enterprise customers early next year.
