Pruna AI, a European startup that develops compression algorithms for AI models, has announced that it is open-sourcing its optimization framework to help developers compress AI models more efficiently.
Pruna AI's framework combines several efficiency methods, including caching, pruning, quantization, and distillation, to make AI models smaller and faster. It also standardizes the saving and loading of compressed models, evaluates whether compression causes significant quality degradation, and measures the performance gains it delivers.
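For a sense of the workflow, here is a minimal sketch of how the open-source framework is used, following the smash/SmashConfig pattern from the project's README; the base model, the specific algorithm names ("deepcache", "hqq"), and the save call are illustrative assumptions and may not match the current release exactly.

```python
# Minimal sketch: pick efficiency methods in a config, "smash" the
# model, then save the compressed result in a standardized format.
# Algorithm names below are assumptions; check the project's README
# for the methods available in the current release.
import torch
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

# Load a base model to compress (a small diffusion model here).
pipe = StableDiffusionPipeline.from_pretrained(
    "segmind/tiny-sd", torch_dtype=torch.float16
)

# Combine several efficiency methods in a single configuration.
config = SmashConfig()
config["cacher"] = "deepcache"  # reuse intermediate computations
config["quantizer"] = "hqq"     # store weights at lower precision

# Apply the selected methods in one call.
smashed = smash(model=pipe, smash_config=config)

# Standardized saving of the compressed model.
smashed.save_pretrained("tiny-sd-smashed")
```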
John Rachwan, co-founder and CTO of Pruna AI, stated: "Our framework is similar to Hugging Face's standardization of transformers and diffusers; we provide a unified way to call and use various efficiency methods." Large companies like OpenAI already employ multiple compression methods in their models, such as distillation to create faster versions of their flagship models.
Distillation is a knowledge transfer technique built on a "teacher-student" setup: developers send requests to a large teacher model and record its outputs, then use those outputs to train a smaller student model to approximate the teacher's behavior. Rachwan noted that while many large companies tend to build such compression tooling in-house, the open-source community typically offers only single-method solutions; Pruna AI's tool integrates multiple methods, which significantly simplifies the process.
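As an illustration of the teacher-student idea (not Pruna AI's specific implementation), the following PyTorch sketch trains a small student network to match a frozen teacher's output distribution; the models, data, and hyperparameters are placeholders.

```python
# A bare-bones teacher-student distillation loop: query the frozen
# teacher, record its outputs, and train the student to match them.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
teacher.eval()  # the teacher is frozen; only the student is trained

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's output distribution

for step in range(100):
    x = torch.randn(64, 32)  # stand-in for real request inputs
    with torch.no_grad():
        teacher_logits = teacher(x)  # "record the teacher's output"
    student_logits = student(x)
    # KL divergence between the softened distributions is the
    # standard distillation loss (scaled by temperature squared).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```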
Currently, Pruna AI's framework supports various model types, including large language models, diffusion models, speech recognition models, and computer vision models. However, the company primarily focuses on optimizing image and video generation models. Companies like Scenario and PhotoRoom already use Pruna AI's services.
Besides the open-source version, Pruna AI offers an enterprise version with advanced optimization features and an optimization agent. Rachwan revealed: "The most exciting feature we're about to release is the compression agent. Users only need to provide the model and specify speed and accuracy requirements; the agent will automatically find the best compression combination."
Pruna AI charges by the hour, similar to renting GPUs from a cloud provider, and optimized models can save enterprises significant inference costs. For example, Pruna AI reduced a Llama model to one-eighth of its original size with almost no loss of accuracy. The company hopes clients will view its compression framework as an investment that pays for itself.
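For a rough sense of what an eight-fold reduction means in practice (the parameter count and 16-bit baseline below are assumptions for illustration, not figures from Pruna AI):

```python
# Back-of-the-envelope storage math for an 8x size reduction.
params = 8e9                      # assumed 8B-parameter Llama model
fp16_bytes = params * 2           # 2 bytes per weight at 16-bit
compressed_bytes = fp16_bytes / 8 # eight-fold reduction
print(f"{fp16_bytes / 1e9:.0f} GB -> {compressed_bytes / 1e9:.0f} GB")
# 16 GB -> 2 GB
```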
Recently, Pruna AI secured $6.5 million in seed funding from investors including EQT Ventures, Daphni, Motier Ventures, and Kima Ventures.
Project: https://github.com/PrunaAI/pruna
Key Highlights:
🌟 Pruna AI has open-sourced an optimization framework that combines multiple compression methods to make AI models smaller and faster.
🤖 Large companies often use techniques like distillation; Pruna AI offers a tool integrating multiple methods, simplifying the process.
💰 The enterprise version adds advanced features, including an optimization agent, helping users compress models and improve performance while maintaining accuracy.