Researchers at the Musashino University School of Data Science have developed a new algorithm called AMT-APC that automatically generates piano covers with greater precision. The technology leverages Automatic Music Transcription (AMT) models, fine-tuning them to better capture musical nuance and expressiveness and thereby produce piano renditions that closely track the original pieces.

Historically, automatically generating piano music has been limited by poor sound fidelity and a lack of expressive depth: existing models often produce only simple melodies and rhythms, failing to capture the rich detail and emotion of the original pieces. The AMT-APC algorithm takes a different approach. It first uses a pre-trained AMT model to accurately "capture" the individual sounds in a recording, and then fine-tunes this model for the Automatic Piano Cover (APC) task.

The core of the AMT-APC algorithm lies in its two-step strategy:

Step one: Pre-training. Researchers selected a high-performance AMT model named hFT-Transformer as the foundation and further trained it using the MAESTRO dataset to handle longer musical segments.

Step two: Fine-tuning. Researchers created a paired dataset containing original audio and piano performance MIDI files and used this dataset to fine-tune the AMT model, enabling it to generate piano performances that closely align with the style of the original pieces.
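The two steps above can be sketched as a toy experiment. In this hedged illustration, a plain linear model stands in for the hFT-Transformer, and the data and variable names are entirely hypothetical; the point is only the pre-train-then-fine-tune structure:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, Y, lr=0.1, steps=200):
    """Plain gradient descent on mean-squared error."""
    for _ in range(steps):
        grad = X.T @ (X @ w - Y) / len(X)
        w = w - lr * grad
    return w

# Stage 1: "pre-training" on a transcription-style task
# (stand-in audio features -> stand-in note targets).
X = rng.normal(size=(64, 8))
true_w = rng.normal(size=(8, 4))
Y_amt = X @ true_w
w_pre = train(np.zeros((8, 4)), X, Y_amt)

# Stage 2: "fine-tuning" on the related cover-generation task,
# starting from the pre-trained weights rather than from scratch.
Y_apc = X @ (true_w + 0.1 * rng.normal(size=true_w.shape))
w_ft = train(w_pre, X, Y_apc, lr=0.05, steps=100)

loss_ft = float(np.mean((X @ w_ft - Y_apc) ** 2))
```

The sketch captures why the strategy works: because transcription and cover generation are closely related tasks, the pre-trained weights start the model near the fine-tuning optimum, so the second stage converges faster and further than training from scratch.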

To enhance the expressiveness of the generated piano music, researchers introduced a concept called "style vectors." Style vectors are a set of features extracted from each piano performance, including note onset rate distribution, velocity distribution, and pitch distribution. By inputting style vectors along with the original audio into the model, the AMT-APC algorithm can learn different performance styles and reflect them in the generated piano music.
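A style vector of this kind could be extracted roughly as follows. This is a hedged sketch: the feature set (onset-rate, velocity, and pitch distributions) follows the description above, but the bin counts, 1-second windowing, and normalization are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def style_vector(notes, n_rate_bins=8, n_vel_bins=8, n_pitch_bins=8):
    """notes: list of (onset_sec, pitch 0-127, velocity 0-127) events."""
    onsets = np.array([n[0] for n in notes])
    pitches = np.array([n[1] for n in notes])
    vels = np.array([n[2] for n in notes])

    # Onset-rate distribution: count notes per 1 s window,
    # then histogram those per-window counts.
    duration = max(float(onsets.max()), 1.0)
    rates, _ = np.histogram(onsets, bins=int(np.ceil(duration)))
    rate_hist, _ = np.histogram(rates, bins=n_rate_bins, range=(0, 16))

    # Velocity and pitch distributions over the full MIDI range.
    vel_hist, _ = np.histogram(vels, bins=n_vel_bins, range=(0, 128))
    pitch_hist, _ = np.histogram(pitches, bins=n_pitch_bins, range=(0, 128))

    v = np.concatenate([rate_hist, vel_hist, pitch_hist]).astype(float)
    return v / v.sum()  # normalize to a probability-like vector

# Toy usage: four notes of an arpeggiated C major chord.
notes = [(0.0, 60, 80), (0.5, 64, 90), (1.0, 67, 70), (1.5, 72, 100)]
vec = style_vector(notes)
```

Conditioning the model on such a vector alongside the audio lets one trained model reproduce multiple performance styles at inference time by swapping the vector.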

Experimental results show that, compared to existing automatic piano cover models, the AMT-APC algorithm significantly improves both sound fidelity and expressiveness. On Qmax, a metric that evaluates the similarity between the original piece and the generated audio, the AMT-APC model achieved the lowest value, indicating its superior ability to replicate the characteristics of the original piece.
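For intuition, a much-simplified Qmax-style computation (after Serra et al.'s cover-song similarity measure) is sketched below: a binary cross-recurrence matrix between two per-frame feature sequences, followed by dynamic programming over gap-tolerant diagonal paths. The similarity threshold, gap penalties, and the final square-root-normalized distance convention (lower = more similar) are all assumptions for illustration, not the paper's exact evaluation code:

```python
import numpy as np

def qmax_distance(A, B, thresh=0.9, gap_onset=0.5, gap_ext=0.5):
    """A: (n, d) and B: (m, d) per-frame features (e.g. chroma vectors)."""
    # Cosine similarity between every frame pair, thresholded into a
    # binary cross-recurrence matrix.
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    match = (An @ Bn.T) > thresh

    # DP: extend diagonal runs, allowing small penalized gaps.
    n, m = match.shape
    Q = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if match[i - 1, j - 1]:
                Q[i, j] = max(Q[i - 1, j - 1],
                              Q[i - 2, j - 1] if i > 1 else 0.0,
                              Q[i - 1, j - 2] if j > 1 else 0.0) + 1.0
            else:
                Q[i, j] = max(0.0,
                              Q[i - 1, j - 1] - gap_onset,
                              Q[i - 2, j - 1] - gap_ext if i > 1 else 0.0,
                              Q[i - 1, j - 2] - gap_ext if j > 1 else 0.0)

    score = Q.max()  # length of the best aligned stretch
    # Assumed distance convention: lower values = more similar recordings.
    return float(np.sqrt(m) / max(score, 1e-9))
```

Under this convention, a recording compared against itself yields a long aligned path and hence a small distance, while unrelated recordings yield almost no matches and a large distance.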

This study demonstrates that the AMT and APC tasks are closely related, and that leveraging existing AMT research can help develop more advanced APC models. In the future, the researchers plan to explore AMT models better suited to APC applications, aiming for more realistic and expressive automatic piano covers.

Project link: https://misya11p.github.io/amt-apc/

Paper link: https://arxiv.org/pdf/2409.14086