A $1 Million Prize for the Comedy King of AI! Beijing University of Posts and Telecommunications, Nanyang Technological University, and others release FunQA, a dataset of 'silly videos' that teaches algorithms to understand human humor

新智元

Published in AI News · 2 min read · Sep 11, 2023

FunQA, a dataset released by institutions including Beijing University of Posts and Telecommunications and Nanyang Technological University in Singapore, uses 4,000 comedic videos and 310,000 pieces of commentary text to strengthen AI's abilities in accurate video comprehension, counterfactual reasoning, sense of humor, and free-form text generation. FunQA consists of three subsets covering tasks such as timestamp localization, video description, and counterintuitive reasoning, and is designed to assess how well a model understands counterintuitive videos. Existing models, however, generally perform poorly on FunQA: they struggle to comprehend information accurately, to reason logically, and to apply outside knowledge. To promote research in this area, the FunQA Challenge algorithm competition has been launched, with prizes totaling up to $1 million.


This article is from AIbase Daily
