Adding Dynamics to MIDI Performances for a More Natural Sound! – Neural Translation of Musical Style

Simple Title
Malik, Iman, and Carl Henrik Ek. "Neural translation of musical style." arXiv preprint arXiv:1708.03535 (2017).
Posted at
June 6, 2015




Can a machine learn to play sheet music? This thesis investigates whether a suitable computational model can learn musical style and successfully perform from sheet music. Music captures several aspects of a musician's style. Musical style can be observed in the unique dynamics of a musician's performances and in the categorisation of genre. Style is difficult to define, but there is a perceivable relationship between dynamics and style. This thesis investigates whether it is possible for a machine to learn musical style through the dynamics of music. Great advancements have been made in music generation using machine learning; however, the focus of previous research has not been on capturing style. To capture musical style through dynamics, a new architecture called StyleNet is designed. The architecture is capable of synthesising the dynamics of digital sheet music. The Piano dataset, containing Jazz and Classical piano solo MIDIs, is created for the purpose of learning style, and the model is trained on it. Different configurations and training techniques are experimented with. The model's generated performances are then assessed by a musical Turing test, and its ability to perform in different styles is also evaluated. The research concludes that StyleNet's musical performances successfully pass the musical Turing test. This opens many doors for using such a model to assist the creative process in the music industry. To summarise, my main contributions and achievements in this project can be listed as follows:

• I designed StyleNet, a neural network architecture capable of synthesising the dynamics of sheet music.
• I implemented StyleNet using the TensorFlow library, with a total of 1000 lines in Python.
• I implemented a batching system to efficiently train the StyleNet model, with a total of 500 lines in Python.
• I designed the data representation format for StyleNet.
• I implemented a data preprocessing pipeline for MIDI files, with a total of 2000 lines in Python.
• I experimented with different StyleNet configurations and designs.
• I created the Piano dataset, which contains a total of 649 Jazz and Classical piano solo MIDIs.
• I successfully trained a StyleNet model which passed the musical Turing test by producing performances that are indistinguishable from those of a human.
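The abstract mentions a MIDI preprocessing pipeline and a data representation format but does not spell them out. As a hedged sketch only, one plausible piano-roll style format is a grid of quantized time steps × 88 piano keys, with a binary note-on matrix as the model input and MIDI velocities normalized to [0, 1] as the target. All names, dimensions, and the time step below are illustrative assumptions, not the paper's actual format:

```python
# Hypothetical piano-roll representation; the actual StyleNet format is not
# specified in this excerpt. Assumed: 88-key piano range, fixed time grid.
N_KEYS, LOWEST_PITCH = 88, 21   # piano covers MIDI pitches 21..108

def to_piano_roll(notes, step=0.125, n_steps=32):
    """notes: list of (onset_sec, duration_sec, midi_pitch, velocity) tuples.

    Returns (score, vel): score is a binary note-on grid (model input),
    vel holds per-key velocities scaled to [0, 1] (model target).
    """
    score = [[0.0] * N_KEYS for _ in range(n_steps)]
    vel   = [[0.0] * N_KEYS for _ in range(n_steps)]
    for onset, dur, pitch, velocity in notes:
        key = pitch - LOWEST_PITCH
        start = int(onset / step)
        end = min(n_steps, int((onset + dur) / step))
        for t in range(start, max(end, start + 1)):   # at least one step per note
            score[t][key] = 1.0
            vel[t][key] = velocity / 127.0            # MIDI velocity range is 0..127
    return score, vel

# Middle C (pitch 60) for half a second at velocity 100, then E4 at velocity 64.
score, vel = to_piano_roll([(0.0, 0.5, 60, 100), (0.5, 0.25, 64, 64)])
```

A model trained on pairs like `(score, vel)` would then learn to map flat sheet music onto expressive loudness values.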


The model itself is relatively simple: it takes the sheet-music information from a MIDI file as input and outputs how strongly each corresponding note should be played (its MIDI velocity), implemented as an RNN (LSTM). (To make use not only of the notes preceding a given note but also of the notes that follow it, a Bidirectional LSTM is used, which can "read ahead" in the input sequence.)
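The bidirectional idea above can be sketched minimally: run one recurrent pass forward in time and one backward, concatenate the two hidden states at each step, and map them to per-key velocities. The sketch below uses a plain tanh RNN rather than the paper's LSTM, with toy random weights and made-up dimensions, purely to illustrate how future context reaches each prediction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions, not the paper's values):
# T time steps, 88-key note-on vectors in, one velocity per key per step out.
T, N_KEYS, H = 16, 88, 32

def rnn_pass(x, Wx, Wh, b):
    """Plain tanh RNN over the time axis of x; returns all hidden states."""
    h = np.zeros(Wh.shape[0])
    states = []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh + b)
        states.append(h)
    return np.stack(states)

# Separate parameters for the forward and backward directions.
Wx_f, Wh_f, b_f = rng.normal(size=(N_KEYS, H)), rng.normal(size=(H, H)) * 0.1, np.zeros(H)
Wx_b, Wh_b, b_b = rng.normal(size=(N_KEYS, H)), rng.normal(size=(H, H)) * 0.1, np.zeros(H)
Wo, bo = rng.normal(size=(2 * H, N_KEYS)) * 0.1, np.zeros(N_KEYS)

score = (rng.random((T, N_KEYS)) < 0.05).astype(float)  # random binary "sheet music"

h_fwd = rnn_pass(score, Wx_f, Wh_f, b_f)                 # sees past context
h_bwd = rnn_pass(score[::-1], Wx_b, Wh_b, b_b)[::-1]     # sees future context

# Concatenate both directions and squash to per-key velocities in (0, 1).
velocities = 1.0 / (1.0 + np.exp(-(np.concatenate([h_fwd, h_bwd], axis=1) @ Wo + bo)))
print(velocities.shape)  # (16, 88): one predicted velocity per key per time step
```

Because `h_bwd` at step t has already consumed steps t..T-1, the velocity prediction for each note depends on the notes that follow it as well as those before it, which is exactly what the bidirectional wrapper buys over a one-directional LSTM.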


Further Thoughts