OneGen is an efficient single-pass generation and retrieval framework for large language models (LLMs). It can be used to fine-tune an LLM for generation tasks, retrieval tasks, or hybrid tasks that combine both. The core idea is to place generation and retrieval in the same context by delegating retrieval to special retrieval tokens that the model generates autoregressively. This lets the LLM perform both tasks in a single forward pass. The approach lowers deployment costs, since a single model serves both tasks, and substantially reduces inference costs, since it avoids running two separate forward passes over each query.