Star-Attention

Efficient Inference Technology for Long Sequence Large Language Models

Common Product, Programming, NVIDIA, Large Language Models

Star-Attention is a block-sparse attention mechanism proposed by NVIDIA to improve the inference efficiency of Transformer-based large language models (LLMs) on long sequences. It significantly speeds up inference through a two-stage operation while retaining 95-100% of the model's accuracy. It is compatible with most Transformer-based LLMs and can be used directly, without additional training or fine-tuning. It can also be combined with other optimization methods, such as Flash Attention and KV cache compression, to further improve performance.
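
The description above does not spell out the two stages. The following is a minimal single-head sketch in plain Python, assuming the stages work as described in NVIDIA's Star Attention paper: in stage one, each context block attends only to itself plus a prepended copy of the first ("anchor") block; in stage two, the query attends to all cached tokens, with per-block results merged through their log-sum-exp. All function names (`phase1_visible`, `attend`, `merge`, `phase2`) and the toy data are hypothetical and are not taken from NVIDIA's code.

```python
import math

def phase1_visible(i, block_size):
    """Indices that context token i may attend to in stage one (assumed rule)."""
    start = (i // block_size) * block_size
    own = list(range(start, i + 1))        # causal attention inside its own block
    if start == 0:
        return own                         # the first block is the anchor itself
    return list(range(block_size)) + own   # anchor block is prepended

def attend(q, keys, values):
    """Softmax attention of one query over keys/values.
    Returns (output vector, log-sum-exp of the scaled scores)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, top + math.log(total)

def merge(partials):
    """Combine per-block (output, lse) pairs into attention over all blocks."""
    top = max(lse for _, lse in partials)
    coeffs = [math.exp(lse - top) for _, lse in partials]
    total = sum(coeffs)
    dim = len(partials[0][0])
    return [sum(c * out[d] for c, (out, _) in zip(coeffs, partials)) / total
            for d in range(dim)]

def phase2(q, keys, values, block_size):
    """Stage two: the query attends to every cached token, one block at a time."""
    partials = [attend(q, keys[s:s + block_size], values[s:s + block_size])
                for s in range(0, len(keys), block_size)]
    return merge(partials)

# Toy data: 6 cached context tokens with 2-dimensional keys and values.
keys = [[0.1, 0.3], [0.5, -0.2], [-0.4, 0.9], [0.7, 0.7], [0.0, -1.0], [1.2, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5], [0.3, 0.3], [4.0, -2.0]]
query = [0.6, -0.1]

blockwise = phase2(query, keys, values, block_size=2)
direct, _ = attend(query, keys, values)
```

In this toy, `blockwise` and `direct` agree up to floating-point error, which suggests why the second stage can be split across blocks (or hosts) without changing the result. Under these assumptions, the approximation comes from the first stage, where context tokens do not see blocks other than their own and the anchor.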

Star-Attention Alternatives

Large World Models — Large World Models: Understanding Video and Language

Transformer Explainer — A visualization tool for in-depth understanding of Transformer models

Models Table — A comprehensive list and information about large language models

FP6-LLM — Efficiently serving large language models

BiTA — Bi-directional tuning for lossless acceleration of large language models

LLM Maybe LongLM — Extends the context window of large language models

Llama-3.1-Nemotron-70B-Instruct — A large language model customized by NVIDIA to improve the helpfulness of its responses to user queries.

Prompt Engineering Guide — A comprehensive guide to prompt engineering for large language models

Benchmarking API Performance of Large Language Models — In-depth analysis of key metrics like TTFT and TPS

Open LLM Leaderboard — A publicly accessible leaderboard of large language models.

Brainglue — An experimental platform for large language models

RoleLLM — Role-playing framework for large language models

xLAM — Research on intelligent agents based on large language models

AutoDAN-Turbo — An automated framework for discovering jailbreak strategies against large language models

DCLM — Comprehensive framework for building and training large language models

CuMo — An architecture for scaling multimodal large language models.

VSP-LLM — A framework that combines Visual Speech Processing with Large Language Models

Entry Point AI — A platform for training customized large language models

Buffer of Thoughts — Improves the accuracy and efficiency of large language models in reasoning

parsera — A lightweight Python library for web scraping using large language models.

MInference — Accelerate the inference process of long context large language models

AIKit — A one-stop solution for hosting, deploying, building, and fine-tuning open-source large language models.

EAGLE — Exploration of the design space for multimodal large language models

prism-alignment — Explore the preferences and value alignment of large language models.

OpenCompass 2.0 Large Language Model Leaderboard — A real-time large language model leaderboard that provides comprehensive performance assessments.

Supervised app — A no-code platform for building supervised large language models.

MM1.5 — Optimization and analysis of multimodal large language models

BitNet — Inference framework for 1-bit large language models

Zhipu AI Large Model Open Platform — Integrate large models with just a few lines of code.