Modeling Melodic Feature Dependency with Modularized Variational Auto-Encoder

This paper is published in the in Proceedings of The 44th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019), Brighton, U.K., May 12-17, 2019. IEEE.

Full paper: Here, arXiv

Automatic melody generation has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melodies has turned out to be highly challenging. This paper introduces 1) a new variant of variational autoencoder (VAE), where the model structure is designed in a modularized manner in order to model polyphonic and dynamic music with domain knowledge, and 2) a hierarchical encoding/decoding strategy, which explicitly models the dependency between melodic features. The proposed framework is capable of generating distinct melodies that sounds natural, and the experiments for evaluating generated music clips show that the proposed model outperforms the baselines in human evaluation. (The first three authors contributed equally.)

Researcher of Natural Language Processing, Information Retrieval, and Search/Recommendation Systems.
Shang-Yu Su