SliceGPT

SliceGPT: Compressing Large Language Models by Deleting Rows and Columns

Common Product, Programming, Sparsification, Model Compression

SliceGPT is a post-training sparsification approach that replaces each weight matrix with a smaller dense matrix, reducing the network's embedding dimension. Through extensive experiments, we demonstrate that SliceGPT can remove up to 25% of the model parameters (including embeddings) from LLAMA2-70B, OPT 66B, and Phi-2 while maintaining 99%, 99%, and 90% of the dense model's zero-shot task performance, respectively. Our sliced models run on fewer GPUs and execute faster without any additional code optimization: on a 24GB consumer-grade GPU, we reduce the total inference compute of LLAMA2-70B to 64% of the dense model's; on a 40GB A100 GPU, we reduce it to 66%. We offer a new insight, computational invariance in transformer networks, which makes SliceGPT possible. We hope it will inspire new avenues for reducing the memory and compute requirements of pre-trained models.
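The computational invariance mentioned above can be illustrated with a small, dependency-free sketch. It is not the SliceGPT implementation: it assumes a single RMSNorm (with no learned gain) followed by one linear layer, and it uses a random Householder reflection as a stand-in orthogonal matrix `Q`, whereas the actual method chooses its transformation so that the deleted directions carry little signal. All names (`rmsnorm`, `householder`, `q_k`, and so on) are hypothetical. The sketch shows that rotating the activations by `Q` and the weights by `Q^T` leaves the output unchanged, and that dropping columns of `Q` then yields smaller dense matrices.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def rmsnorm(x):
    """Scale each row to unit root-mean-square (assumption: no learned gain)."""
    out = []
    for row in x:
        rms = math.sqrt(sum(v * v for v in row) / len(row))
        out.append([v / rms for v in row])
    return out


def householder(v):
    """Return the orthogonal matrix Q = I - 2 v v^T / (v^T v)."""
    n = len(v)
    norm_sq = sum(a * a for a in v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(n)]
        for i in range(n)
    ]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


rng = random.Random(0)
d, d_out, n_tokens = 6, 4, 5  # toy sizes, chosen only for illustration

x = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]  # activations
w = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d)]     # weights
q = householder([rng.gauss(0, 1) for _ in range(d)])                # stand-in Q

# Original computation: y = RMSNorm(x) W
y = matmul(rmsnorm(x), w)

# Rotated computation: RMSNorm(x Q) (Q^T W). Because Q preserves row norms,
# RMSNorm(x Q) = RMSNorm(x) Q, and Q Q^T = I cancels inside the product.
x_rot = matmul(x, q)
w_rot = matmul(transpose(q), w)
y_rot = matmul(rmsnorm(x_rot), w_rot)
invariance_error = max_abs_diff(y, y_rot)

# Slicing: keep only the first k columns of Q, so the activations and the
# weight matrix both shrink from dimension d to k and stay dense.
k = 4
q_k = [row[:k] for row in q]
x_sliced = matmul(x, q_k)             # shape: n_tokens x k
w_sliced = matmul(transpose(q_k), w)  # shape: k x d_out
```

With a random `Q`, the sliced product is only a rough approximation of the original output; the invariance step is exact (up to floating-point error), and the quality of the sliced model depends entirely on how `Q` is chosen.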

SliceGPT Alternatives

WhisperKit — Automatic Speech Recognition Model Compression & Optimization Tool

HandRefiner — The fp16 version of the HandRefiner model after pruning and compression

LongVU — Spatiotemporal Adaptation Compression Model for Long Video Language Understanding

WolframAlpha — Computational intelligence, a master of all trades

1.58-bit FLUX — A state-of-the-art text-to-image generation model utilizing 1.58-bit quantization.

AI-FFmpeg — A free online video processing tool that supports compression, conversion, speed adjustment, and more.

ZipPy — A tool for rapid detection of AI-generated text using compression ratios.

Memory — A scalable memory layer implementation designed to expand model parameters without increasing computational load.

SAS Model Manager — Comprehensive lifecycle management of analytical models

RWKV-6 Mixture of Experts — The largest model in the RWKV family, utilizing MoE technology to enhance efficiency.

FABRIC Model — Make your model more personalized

Baichuan Character Large Model — Intelligent character model, building the best large model foundation.

Ai Model Agency — Leading global AI fashion model agency

BlueLM Large Model — An independently developed intelligent language understanding model by vivo

Diffusion Model with Perceptual Loss — Diffusion Model Based on Perceptual Loss

This Model Does Not Exist — AI-generated model, uploading one photo per day for user voting.

Doubao Large Model — A large model developed by ByteDance, providing multimodal capabilities.

DiffusionLight — A technology that uses the diffusion model to estimate lighting effects.

Xingchen Semantic Large Model — A trillion-parameter large model launched by China Telecom

prime — A framework for efficient global distributed training of AI models

Claude 4 — The strongest programming and reasoning model globally, boosting development efficiency.

OpenAI Model Spec — OpenAI has released a model behavior specification to guide how AI models interact with users safely and beneficially.

Micius Generative Pre-trained Transformer (Micius GPT) — A controllable large language model for generative scenarios.

Allegro — Advanced text-to-video generation model

Arcee Spark — A highly efficient and compact 7B parameter language model

Trustworthy Language Model (TLM) Playground — Try Cleanlab's Trustworthy Language Model (TLM) in your browser

Large Geospatial Model — A geospatial model that employs large-scale machine learning to understand scenes and connect millions of locations worldwide.

Yuanxiang Large Model XChat — A leading Chinese general-purpose large model

Lin's Grand Model Ranking — A ranking of large model products better suited to Chinese users.