Entry

A seq2seq model that generates a bassline to match a given rhythm pattern

Simple Title

Haki, B., & Jordà, S. (2019). A Bassline Generation System Based on Sequence-to-Sequence Learning. Proceedings of the International Conference on New Interfaces for Musical Expression, 204–209.

Description

A paper presented at NIME 2019. Drawing on recent language-modeling techniques (seq-to-seq models), the system takes the audio file of a drum track and generates a bassline that fits it.

Type

Paper

Year

2019

Posted at

June 1, 2021

Overview - What's great about it?

Abstract

This paper presents a detailed explanation of a system generating basslines that are stylistically and rhythmically interlocked with a provided audio drum loop. The proposed system is based on a natural language processing technique: word-based sequence-to-sequence learning. The word-based sequence-to-sequence learning method proposed in this paper is comprised of recurrent neural networks composed of LSTM units. The novelty of the proposed method lies in the fact that the system is not reliant on a voice-by-voice transcription of drums; instead, in this method, a drum representation is used as an input sequence from which a translated bassline is obtained at the output. The drum representation consists of fixed size sequences of onsets detected from a 2-bar audio drum loop in eight different frequency bands. The basslines generated by this method consist of pitched notes with different duration. The proposed system was trained on two distinct datasets compiled for this project by the authors. Each dataset contains a variety of 2-bar drum loops with annotated basslines from two different styles of dance music: House and Soca. A listening experiment designed based on the system revealed that the proposed system is capable of generating basslines that are interesting and are well rhythmically interlocked with the drum loops from which they were generated.

Motivation

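Dance music producers often have no formal musical training, yet the bassline is the heart of dance music. Many producers struggle with bass (that's exactly me!!). So let's build a bassline generation model specialized for dance music!

The bassline has to match the drums. However, MIDI for the drums is rarely available, so a mechanism for extracting drum attack (onset) information from the audio is also needed.

The musical styles covered in this study are House and Soca (a dance music genre originating in Trinidad and Tobago).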

Dataset

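Collect track data by following the YouTube links on Discogs pages for House/Soca releases → find a 2-bar section that contains bass, plus a break (a drums-only section) whose drum pattern resembles that section → decompose the spectrum with Essentia, separating harmonic sounds (bass) from percussive sounds (drums) (left figure below) → manually transcribe the bass and drum patterns (right figure below)

In total, 50 bassline/drum training pairs were collected for each style.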

Separating drums from everything else

Drum transcription

Bassline representation

Represented as an array of MIDI note numbers, sustain (1000), and silence (0).
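To make this encoding concrete, here is a minimal sketch. It assumes each note is given as a (start step, length in steps, MIDI pitch) tuple on a fixed step grid; the helper name `encode_bassline` and that tuple format are my own assumptions for illustration, not taken from the paper's code. Only the sustain (1000) and silence (0) values come from the description above.

```python
SUSTAIN = 1000  # token for "the previous note is still held"
SILENCE = 0     # token for "nothing is sounding"


def encode_bassline(notes, n_steps=32):
    """Turn (start_step, length_in_steps, midi_pitch) tuples into a token list.

    Hypothetical helper for illustration; the note format is an assumption.
    """
    seq = [SILENCE] * n_steps
    for start, length, pitch in notes:
        if not 0 <= start < n_steps:
            raise ValueError(f"start step {start} is outside the grid")
        seq[start] = pitch  # the onset step carries the MIDI note number
        # The remaining steps of the note are marked as sustain (clipped to the grid).
        for step in range(start + 1, min(start + length, n_steps)):
            seq[step] = SUSTAIN
    return seq


# Example on a short 8-step grid: MIDI note 36 held for 4 steps,
# then MIDI note 43 for 2 steps starting at step 6.
bassline = encode_bassline([(0, 4, 36), (6, 2, 43)], n_steps=8)
print(bassline)  # [36, 1000, 1000, 1000, 0, 0, 43, 1000]
```

With the default `n_steps=32` the same helper would produce one token per 16th-note step of a 2-bar loop.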

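Drum pattern representation

Drums are represented by the presence or absence of sound in eight frequency bands. Presence at each time step is encoded as binary 0/1 data. The 0b prefix simply indicates that the value is binary.

['0b11111000', '0b00000000', ..., '0b00000011']

Both sequences cover 2 bars at 16th-note resolution, so each has 32 steps.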
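As a rough illustration of the drum encoding, the sketch below packs eight per-band onset flags into one `'0b...'` word per step. The function names and the input format (a list of rows, each with eight 0/1 flags) are assumptions made for this example, and which band maps to which bit position is also assumed; the onset detection itself is not shown.

```python
def encode_drum_step(band_onsets):
    """Pack eight 0/1 onset flags (one per frequency band) into a '0b...' word.

    Which band maps to which bit position is an assumption in this sketch.
    """
    if len(band_onsets) != 8:
        raise ValueError("expected exactly 8 frequency bands")
    return "0b" + "".join("1" if flag else "0" for flag in band_onsets)


def encode_drum_loop(onset_grid):
    """Encode a whole loop: one word per 16th-note step."""
    return [encode_drum_step(step) for step in onset_grid]


# Toy 4-step grid (a real 2-bar loop would have 32 rows).
grid = [
    [1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
words = encode_drum_loop(grid)
print(words)  # ['0b11111000', '0b00000000', '0b00000000', '0b00000011']
print(int(words[0], 2))  # 248
```

Treating each step as a single word, rather than as eight separate flags, is what lets the drum loop be handled like a sentence in a translation model.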

Architecture

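The relationship between bass and drums can be viewed like that between English and French in translation: the same content expressed in a different language. The approach therefore borrows from seq-to-seq language models.

Conceptual diagram of the seq-to-seq model

The concrete model: a simple LSTM architecture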

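The 34 in the bassline input (input_2) corresponds to the number of distinct MIDI note numbers that occur in the basslines.

The 256 in the drum input (input_1) is the number of onset/no-onset patterns across the 8 bands: 2^8 = 256.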
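The two input sizes can be checked with a small sketch. It enumerates every possible 8-band drum word to confirm the 256-entry vocabulary, and shows how a bass vocabulary could be collected from training sequences. The helper names `one_hot` and `build_bass_vocab` and the toy basslines are assumptions for illustration only; this is not the model code, and the real dataset is what gives the size of 34.

```python
from itertools import product

# Every possible drum "word": each on/off combination of 8 bands.
drum_vocab = ["0b" + "".join(bits) for bits in product("01", repeat=8)]
drum_index = {word: i for i, word in enumerate(drum_vocab)}


def one_hot(index, size):
    """Return a list of zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def build_bass_vocab(basslines):
    """Collect the distinct tokens that occur in a set of bassline sequences."""
    return sorted({token for seq in basslines for token in seq})


print(len(drum_vocab))  # 256
step_vec = one_hot(drum_index["0b00000011"], len(drum_vocab))
print(step_vec.index(1))  # 3

# Toy data only, to show how the vocabulary size depends on the training set.
toy_basslines = [[36, 1000, 0, 43], [36, 0, 0, 41]]
print(build_bass_vocab(toy_basslines))  # [0, 36, 41, 43, 1000]
```

The drum vocabulary is fixed by the representation, while the bass vocabulary depends on which notes actually appear in the training data.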

Somewhat more complex models were also tried, but the simple model gave the best results.

Results

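Evaluated with a listening test on human participants.

The basslines generated by this system were rated highest (though the margin was not large).

I actually tried it myself... using the model trained on the Soca dataset.

Original rhythm

Track with the generated bass added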

Further Thoughts

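It's impressive that they even provide the tool for analyzing drums and building the training data.
Collecting data via the YouTube links on Discogs pages had never occurred to me...
Having only 50 training examples is surprisingly few.
I want to build a tool that works in Ableton!!!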

Links

The collected dataset

behzadhaki/drum_bassline_dataset

A dataset of audio drum loops with accompanying basslines for two styles of music (House and Soca). These two datasets were used for training the following model: https://github.com/behzadhaki/bassline_seq2seq For each Style (House/Soca), 50 samples are available. Each sample drum loop along with bassline is located in a separated subfolder (The folder titles correspond to Discogs release ids).

github.com

GUI tool for building the dataset

behzadhaki/qtHarmonicPercussiveSeparatorTranscriber

A GUI software developed using pyQT5, used for separating mixed signals and/or transcribing melodies or drums - behzadhaki/qtHarmonicPercussiveSeparatorTranscriber

github.com

Talking Drums - the original paper on rhythm generation with a seq2seq model

Talking Drums: Generating drum grooves with neural networks

Presented is a method of generating a full drum kit part for a provided kick-drum sequence. A sequence to sequence neural network model used in natural language translation was adopted to encode multiple musical styles and an online survey was developed to test different techniques for sampling the output of the softmax function.

arxiv.org