iFLYTEK released the Xinghuo X2 large model, trained entirely on a domestic computing power base, achieving independent control of the full stack from underlying compute to top-level applications. The model strengthens general capabilities while focusing on highly specialized fields, aiming to solve real-world problems rather than merely pursuing generality.
Amazon plans to launch an AI content market, allowing publishers to directly sell content copyrights to tech companies, aiming to resolve copyright disputes over training data for large models and promote standardized content licensing.
Blogger Tim reviewed ByteDance's AI video model Seedance 2.0, praising its generation accuracy while raising ethical concerns: it can generate footage of blind spots no camera observed and clone voices without authorization, deepening industry worries about AI training data sources and privacy.
OpenAI is seeking alternative AI computing solutions outside of NVIDIA, as it is dissatisfied with the response speed of NVIDIA's latest chips in the inference stage. The company found that hardware speed has become a bottleneck in complex interactions such as code generation, so its strategic focus is shifting from model training to inference optimization.
Radal is a no-code platform that allows you to fine-tune small language models using your own data. Connect your datasets, configure training visually, and deploy models in minutes.
A highly efficient reinforcement learning framework for training language models that perform reasoning and call search engines.
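Frameworks of this kind typically interleave generation with retrieval inside each RL rollout. A minimal sketch of such a loop, using a hypothetical tag protocol and helper callables rather than this framework's actual API:

```python
# Sketch of a reason-then-search rollout (hypothetical tag protocol and
# helpers, not this framework's real API). The model emits <search> queries
# mid-generation; retrieved results are appended before it continues.
def rollout(generate, search, question: str, max_turns: int = 4) -> str:
    context = question
    for _ in range(max_turns):
        step = generate(context)          # model continues its reasoning trace
        context += step
        if "<answer>" in step:            # final answer reached, stop early
            break
        if "<search>" in step:            # tool call requested by the model
            query = step.split("<search>", 1)[1].split("</search>", 1)[0]
            context += f"\n<results>{search(query)}</results>\n"
    return context  # full trajectory, later scored by an answer-match reward
```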
An embodied-AI one-stop development platform released by Zhiyuan Robotics, covering the entire chain from data acquisition to model inference.
A simple, easy-to-use tool for voice cloning and speech model training.
| Provider | Input tokens/M | Output tokens/M | Context Length |
| --- | --- | --- | --- |
| Google | $0.49 | $2.1 | 1k |
| OpenAI | $2.8 | $11.2 | - |
| xAI | $1.4 | $3.5 | 2k |
| xAI | $7.7 | $30.8 | 200 |
| Anthropic | $105 | $525 | - |
| Anthropic | $0.7 | $7 | - |
| Anthropic | $35 | $17.5 | - |
| Anthropic | $21 | - | - |
| Alibaba | $2 | $20 | - |
| Alibaba | $4 | $16 | - |
| Baidu | $6 | $24 | 128 |
| Baidu | $1 | $10 | 256 |
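For orientation, per-million-token prices convert to per-request cost with simple arithmetic. A generic calculation using the Google row above as the example rate:

```python
# Dollar cost of one request at per-million-token rates (rates below are the
# Google row from the table; substitute any provider's numbers).
INPUT_PRICE_PER_M = 0.49    # $ per 1M input tokens
OUTPUT_PRICE_PER_M = 2.10   # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 12k-token prompt with a 2k-token completion:
print(f"${request_cost(12_000, 2_000):.4f}")  # ≈ $0.0101
```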
prithivMLmods
CodeV is a 7-billion-parameter vision-language model fine-tuned from Qwen2.5-VL-7B-Instruct. Through two-stage training, supervised fine-tuning (SFT) followed by reinforcement learning (RL) with tool-aware policy optimization (TAPO), it aims for reliable and interpretable visual reasoning. It represents visual tools as executable Python code and uses a reward mechanism to keep tool usage consistent with the question's evidence, addressing the failure mode where a model invokes irrelevant tools even while answering accurately.
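The consistency idea can be pictured as a reward that only credits tool calls grounded in the question's evidence. Below is an illustrative sketch under assumed names (`tool_calls` as emitted code strings, `evidence_regions` as identifiers of relevant image regions); it is not CodeV's actual reward implementation:

```python
# Illustrative TAPO-style reward: reward correct answers, but scale the credit
# by how many tool calls actually reference question evidence (all names and
# weights here are hypothetical, not CodeV's real code).
def tool_consistency_reward(answer: str, gold: str,
                            tool_calls: list[str],
                            evidence_regions: set[str]) -> float:
    accuracy = 1.0 if answer.strip() == gold.strip() else 0.0
    if not tool_calls:
        return accuracy
    # A call is "consistent" if its code references at least one evidence region.
    consistent = sum(any(region in call for region in evidence_regions)
                     for call in tool_calls)
    consistency = consistent / len(tool_calls)
    # Irrelevant tool use is penalized even when the final answer is right.
    return accuracy * (0.5 + 0.5 * consistency)
```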
GuangyuanSD
Z-Image-Re-Turbo is a text-to-image model re-accelerated on top of the de-distilled Z-Image-De-Turbo model. It aims to balance training convenience with inference speed: it restores generation speed close to the original Turbo model while keeping Z-Image-De-Turbo's training-friendly properties, so it stays compatible with the many LoRA models already trained in the Z-Image ecosystem.
open-thoughts
OpenThinker-Agent-v1-SFT is an agent model obtained by supervised fine-tuning (SFT) of Qwen/Qwen3-8B. It is the first stage of OpenThinker-Agent-v1's full training pipeline (SFT + RL) and is optimized for agent tasks such as terminal operation and code repair.
PrimeIntellect
INTELLECT-3 is a Mixture of Experts (MoE) model with 106 billion parameters, trained through large-scale reinforcement learning. It demonstrates excellent performance in mathematics, coding, and reasoning benchmark tests. The model, training framework, and environment are all open-sourced under a permissive license.
Shawon16
This is a video understanding model for sign language recognition, fine-tuned from the VideoMAE-base architecture on an unspecified dataset. It reached 18.64% accuracy after 20 training epochs.
Gjm1234
Wan2.2 is a major upgrade of the Wan foundation video model, focused on bringing innovations such as an effective MoE architecture, efficient training strategies, and multimodal fusion into the video diffusion model, offering a more powerful and efficient solution for video generation.
TeichAI
This model is distilled from a high-difficulty reasoning dataset generated with the Gemini 3 Pro preview, on top of the Qwen3-4B-Thinking-2507 base model. It focuses on improving complex reasoning in coding and science, aiming to transfer the reasoning abilities of a large model (Gemini 3 Pro) to a much smaller one efficiently.
Clemylia
Gheya-1 is a new-generation base language model in the LES-IA-ETOILES ecosystem, with 202 million parameters. An upgrade of the older Small-lamina series, it is designed specifically for professional fine-tuning, with targeted training in artificial intelligence, professional language modeling, and biology.
ExaltedSlayer
Gemma 3 is a lightweight open multimodal model family from Google. This version is the 12B-parameter instruction-tuned, quantization-aware-trained model, converted to MXFP4 format for the MLX framework. It takes text and image input, generates text output, offers a 128K context window, and supports over 140 languages.
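On Apple silicon, MLX-format checkpoints like this are commonly run with the mlx-lm package. A generic example; the repo id below is a stand-in, not necessarily this exact upload:

```python
# Load and query an MLX-format Gemma 3 checkpoint with mlx-lm
# (repo id is illustrative).
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-3-12b-it-qat-4bit")  # hypothetical id
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize MXFP4 quantization in one sentence."}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```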
This is a video understanding model fine-tuned from MCG-NJU/videomae-base on an unspecified dataset, optimized for video analysis tasks. After 20 training epochs it reached 13.31% accuracy on the evaluation set.
Charlotte-AMY is a fine-tuned small language model by Clemylia with 51 million parameters, focused on hope, friendship, ethics, and support. Built on the principle that training quality matters more than parameter count, it performs well on semantic clarity and coherence, providing high-quality ethical guidance and emotional support.
VibeThinker-1.5B is a 1.5-billion-parameter dense language model launched by Weibo AI. It is fine-tuned based on Qwen2.5-Math-1.5B and is specifically designed for mathematical and algorithmic coding problems. Trained using the 'Spectrum to Signal Principle' framework, it outperforms larger models in multiple math competition tests. The training cost is approximately $7,800, and it supports an output of up to about 40k tokens.
allenai
Olmo 3 is a new-generation language model family from the Allen Institute for AI, including 7B and 32B Instruct and Think variants. The models excel at long-chain thinking and significantly improve performance on reasoning tasks such as mathematics and coding. All code, checkpoints, and training details will be made public to advance the science of language models.
Olmo 3 is a series of language models developed by the Allen Institute for AI at two scales, 7B and 32B, each with Instruct and Think variants. The models excel at long-chain thinking and markedly improve performance on reasoning tasks such as mathematics and coding. They are trained in multiple stages: supervised fine-tuning, direct preference optimization, and reinforcement learning with verifiable rewards.
Olmo-3-7B-Think-DPO is a 7B-parameter language model from the Allen Institute for AI. Capable of long-chain thinking, it performs well on reasoning tasks such as mathematics and coding. The model went through multi-stage training, including supervised fine-tuning, direct preference optimization, and reinforcement learning with verifiable rewards, and is intended for research and educational use.
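Of the stages these entries mention, direct preference optimization has a compact core. A generic PyTorch sketch of the standard DPO loss (not AI2's training code): given per-sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model, the loss pushes the policy to widen the margin between them.

```python
# Standard DPO loss (generic sketch, not AI2's implementation). Inputs are
# per-sequence log-probability tensors for chosen and rejected responses.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    # Log-ratios of policy vs. frozen reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): small when chosen outranks rejected.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```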
mradermacher
This project provides static quantized versions of the Qwen-4B-Instruct-2507-Self-correct model, supporting tasks such as text generation, bias mitigation, and self-correction. Based on the Qwen-4B architecture, the model has undergone instruction fine-tuning and self-correction training, and multiple quantization levels are offered to fit different hardware.
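Static quants like these are typically distributed as GGUF files; a common way to run one locally is llama-cpp-python. The file name below is illustrative:

```python
# Run a GGUF quant locally with llama-cpp-python (file name is illustrative).
from llama_cpp import Llama

llm = Llama(model_path="Qwen-4B-Instruct-2507-Self-correct.Q4_K_M.gguf",
            n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly introduce yourself."}]
)
print(out["choices"][0]["message"]["content"])
```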
sensenova
SenseNova-SI is a series of spatial-intelligence-enhanced models built on multimodal foundation models. Trained on a carefully curated set of 8 million samples, it achieves excellent performance on multiple spatial intelligence benchmarks while maintaining strong general multimodal understanding.
This model is based on Qwen3-4B-Thinking-2507 and fine-tuned on 1,000 GPT-5-Codex examples. It focuses on text generation and uses Unsloth to double training speed.
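The Unsloth setup behind that kind of speedup generally looks like the sketch below; the checkpoint id and hyperparameters are placeholders, not this model's actual recipe:

```python
# Generic Unsloth QLoRA setup (checkpoint id and hyperparameters are
# placeholders, not this model's actual training recipe).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B-Thinking-2507",  # hypothetical checkpoint id
    max_seq_length=4096,
    load_in_4bit=True,   # 4-bit base weights to cut VRAM
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# ...then train with trl's SFTTrainer on the fine-tuning examples.
```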
SenseNova-SI is a series of spatial-intelligence-enhanced models built on mature multimodal foundation models. Trained on a carefully curated set of 8 million samples, it demonstrates excellent performance on multiple spatial intelligence benchmarks while maintaining strong general multimodal understanding.
This is a video action recognition model fine-tuned on the WLASL dataset based on the VideoMAE-Base architecture. After 200 epochs of training, it achieved a top-1 accuracy of 52.96% and a top-5 accuracy of 79.88% on the evaluation set, specifically designed for sign language action recognition tasks.
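Fine-tunes like these are typically set up with Hugging Face transformers by attaching a fresh classification head to the cited base checkpoint. A sketch; the label count is a placeholder:

```python
# Attach a classification head to the VideoMAE base checkpoint
# (num_labels is a placeholder for the WLASL subset actually used).
from transformers import VideoMAEForVideoClassification

model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",
    num_labels=100,                # hypothetical: e.g. 100 WLASL glosses
    ignore_mismatched_sizes=True,  # new head replaces the pretraining head
)
# ...then fine-tune with transformers.Trainer on fixed-length frame clips.
```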
The Linear Regression MCP project demonstrates an end-to-end machine learning workflow using Claude and the Model Context Protocol (MCP), including data preprocessing, model training, and evaluation.
This project is an MCP server for managing memory text files, helping AI models like Claude maintain context across conversations. It provides tools to add, search, delete, and list memories, with search based on exact substring matching. Memories are stored in simple text files, similar to ChatGPT's memory mechanism, and storage is triggered through prompts.
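A server of this shape fits in a few lines with the official Python MCP SDK. A sketch with illustrative tool names, not this project's actual code:

```python
# File-backed memory tools via the official MCP Python SDK
# (tool names and storage path are illustrative, not this project's code).
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("memory")
MEMORY_FILE = Path("memories.txt")

@mcp.tool()
def add_memory(text: str) -> str:
    """Append one memory line to the store."""
    with MEMORY_FILE.open("a") as f:
        f.write(text.strip() + "\n")
    return "stored"

@mcp.tool()
def search_memory(substring: str) -> list[str]:
    """Exact substring match over stored memories."""
    if not MEMORY_FILE.exists():
        return []
    return [line for line in MEMORY_FILE.read_text().splitlines()
            if substring in line]

if __name__ == "__main__":
    mcp.run()  # list/delete tools would follow the same pattern
```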
A TypeScript server that connects Hevy fitness data with language models and provides tools for fitness history, training progress, and personal records through the MCP protocol.
The Unsloth MCP Server is a server for efficiently fine-tuning large language models. Through optimized algorithms and 4-bit quantization, it doubles training speed and cuts VRAM usage by 80%, and it supports multiple mainstream models.
This is an MCP server that provides a standardized interface for Scikit-learn models, supporting functions such as model training, evaluation, data preprocessing, and persistence.
An MCP server that exposes the PyTorch Lightning framework to tools, agents, and orchestration systems through structured APIs, supporting functions such as training, inspection, validation, testing, prediction, and model checkpoint management.
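The Lightning entry points such a server wraps are the standard Trainer calls. A self-contained toy example:

```python
# Minimal PyTorch Lightning training run; trainer.validate/.test/.predict
# follow the same pattern as trainer.fit below.
import torch
import lightning as L
from torch.utils.data import DataLoader, TensorDataset

class TinyRegressor(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

x, y = torch.randn(256, 4), torch.randn(256, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32)
trainer = L.Trainer(max_epochs=2, logger=False, enable_checkpointing=False)
trainer.fit(TinyRegressor(), loader)
```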
This project demonstrates the training of a linear regression model using Claude and the Model Context Protocol (MCP) for an end-to-end machine learning workflow. Users only need to upload a CSV dataset, and the system can automatically complete the entire process of data preprocessing, model training, and evaluation (RMSE calculation).
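A generic, non-MCP version of that workflow, with placeholder file and column names:

```python
# Load a CSV, fit linear regression, report RMSE
# ("data.csv" and the "target" column are placeholders).
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv").dropna()          # simple preprocessing step
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"RMSE: {rmse:.3f}")
```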
LudusMCP is a model context protocol server for managing the Ludus laboratory environment based on natural language commands. It provides functions for deploying, configuring, and managing virtualized training environments and supports connecting to the Ludus server through WireGuard VPN or SSH tunnel.
The MCP tool is a tool for managing model context in GitHub repositories, supporting version tracking, dataset management, performance recording, and documentation of training configurations.
This project integrates the Strava API with the Model Context Protocol (MCP) SDK to analyze training data and provide personalized recommendations. It supports training activity analysis, automatic token refresh, and API rate limiting.
A Model Context Protocol server that provides access to the Whoop API, supporting queries for health data such as workout cycles, recovery status, and training loads.
This project is an integration of the Strava API with the Model Context Protocol (MCP) SDK, used to analyze training data and provide personalized recommendations.
An MCP service that helps technical coaches create structured Learning Hour sessions. Using the 4C learning model, it generates 60-minute technical training sessions complete with code examples and interactive whiteboards.
This project is research on automated medical coding, providing code for training and evaluating medical coding models on the MIMIC-III and MIMIC-IV datasets, including implementations of multiple models and new dataset splits.