Meta has recently launched NotebookLlama, a project that generates podcast-style summaries of uploaded text files, similar to Google's NotebookLM. It relies on Meta's own Llama models for the text processing.
First, NotebookLlama creates a transcript from the file (such as a PDF of a news article or blog post). It then adds "more dramatic effects" and pauses before feeding the transcript into an open text-to-speech model. The results do not sound as good as NotebookLM's, but Meta's researchers say the quality could be improved with more powerful models.
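The three stages described above (transcript, dramatization, text-to-speech) can be pictured as a simple chain of function calls. The following is only a minimal sketch under stated assumptions, not NotebookLlama's actual code: the function names, the prompts, and the stand-in models `echo_llm` and `fake_tts` are all hypothetical and exist only so the example runs without any model installed.

```python
from typing import Callable

# Hypothetical interfaces: any text generator and any speech synthesizer
# can be plugged in. These are NOT NotebookLlama's real APIs.
TextModel = Callable[[str], str]
SpeechModel = Callable[[str], bytes]


def write_transcript(source_text: str, llm: TextModel) -> str:
    """Stage 1: turn the extracted file text into a podcast transcript."""
    prompt = "Write a podcast transcript summarizing this text:\n\n" + source_text
    return llm(prompt)


def dramatize(transcript: str, llm: TextModel) -> str:
    """Stage 2: rewrite the transcript with dramatic effects and pauses."""
    prompt = "Rewrite this transcript with dramatic effects and pauses:\n\n" + transcript
    return llm(prompt)


def text_to_podcast(source_text: str, llm: TextModel, tts: SpeechModel) -> bytes:
    """Run all three stages and return the synthesized audio bytes."""
    transcript = write_transcript(source_text, llm)
    script = dramatize(transcript, llm)
    return tts(script)  # Stage 3: text-to-speech


# Stand-in models (assumptions for illustration only).
def echo_llm(prompt: str) -> str:
    # Pretends to "generate" by tagging the text that follows the instruction.
    return "[generated] " + prompt.split("\n\n", 1)[1]


def fake_tts(script: str) -> bytes:
    # Pretends to synthesize audio by encoding the script as bytes.
    return script.encode("utf-8")


audio = text_to_podcast("Article text.", echo_llm, fake_tts)
```

Keeping the text model and the speech model as separate parameters mirrors the point made in the article: either one could be swapped for a more powerful model without touching the rest of the chain.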
On the NotebookLlama GitHub page, the researchers wrote: "The text-to-speech model limits its naturalness." They added: "Additionally, another way to script podcasts is to have two agents discuss a topic of interest and outline the podcast. Currently, we use a single model to outline the podcast."
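The two-agent alternative mentioned in that note could, in principle, look like a turn-taking loop in which each agent sees the conversation so far and adds one line. This is a hypothetical sketch of the idea only; the project currently uses a single model, and the `discuss` function and the `host` and `guest` stand-ins below are invented for illustration.

```python
from typing import Callable, List

# Hypothetical interface: an agent maps the conversation so far to its next line.
Agent = Callable[[str], str]


def discuss(topic: str, agent_a: Agent, agent_b: Agent, turns: int = 4) -> List[str]:
    """Alternate between two agents and return the lines they produced."""
    history = [f"Topic: {topic}"]
    speakers = (agent_a, agent_b)
    for i in range(turns):
        # Each agent receives the full conversation so far as its context.
        reply = speakers[i % 2]("\n".join(history))
        history.append(reply)
    return history[1:]  # drop the topic line, keep only the dialogue


# Stand-in agents (assumptions): real agents would call a language model.
def host(context: str) -> str:
    return f"Host: question {len(context.splitlines())}"


def guest(context: str) -> str:
    return f"Guest: answer {len(context.splitlines())}"


outline = discuss("NotebookLlama", host, guest, turns=4)
```

The resulting list of lines would then serve as the podcast outline, in place of the outline a single model writes today.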
NotebookLlama is not the first attempt to replicate NotebookLM's podcast feature, but it remains a project worth watching. Like all AI-generated podcasts, however, it shares a common problem: hallucination, meaning the generated episodes inevitably contain some fabricated content.