Latest news: Ollama 0.2 has been released! This update enables concurrency by default, allowing Ollama to handle multiple requests simultaneously for a faster, more responsive experience. Beyond concurrent requests, it also supports loading several models at the same time, making Ollama more efficient across a wider range of tasks.


According to Ollama's official announcement, this update lets a single server handle multiple chat sessions at once, serve code completion to a whole team, process different parts of a document in parallel, and even run several agents concurrently. Ollama can also keep different models loaded at the same time, which benefits use cases such as Retrieval Augmented Generation (RAG), where an embedding model and a text-generation model work together, and agent workflows; large and small models can run side by side, improving the system's flexibility and performance.
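As a concrete illustration, here is a minimal sketch (Python, standard library only) that sends two requests to two different models at the same time. It assumes a local Ollama 0.2 server on the default port 11434 and that the example models `llama3` and `phi3` have already been pulled; substitute any models you have installed.

```python
import concurrent.futures
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def generate(model: str, prompt: str) -> str:
    """Send one non-streaming /api/generate request and return the model's reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Two prompts against two different models, submitted at the same time.
    # With concurrency enabled, the server can keep both models loaded and
    # work on the requests in parallel instead of queueing them one by one.
    jobs = [
        ("llama3", "Explain in one sentence why request concurrency matters."),
        ("phi3", "Explain in one sentence what an embedding model does."),
    ]
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(generate, model, prompt): model for model, prompt in jobs}
        for fut in concurrent.futures.as_completed(futures):
            print(f"{futures[fut]}: {fut.result()}")
```

The same pattern applies to chat sessions or document chunks: each request is just another job submitted to the pool, and the server decides how to schedule them.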

This update also adds automatic loading and unloading of models based on incoming requests and available GPU memory, which keeps the server running stably and efficiently. Together, these changes make Ollama more powerful and capable, and give users a noticeably better experience.
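To watch the scheduler's loading and unloading in action, the short sketch below asks the server which models are currently held in memory. It assumes the same local server and the `/api/ps` endpoint (the HTTP counterpart of the `ollama ps` command); the exact field names may vary between releases, so they are read defensively here.

```python
import json
import urllib.request

# Ask the local Ollama server which models are currently loaded in memory.
with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    loaded = json.loads(resp.read()).get("models", [])

if not loaded:
    print("No models are loaded right now.")

for m in loaded:
    # Each entry describes one resident model: its name, how much memory it
    # occupies, and when the server will unload it if it remains idle.
    print(m.get("name"), m.get("size"), m.get("expires_at"))
```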

Want to try Ollama 0.2? Official download link: https://ollama.com/download