Entry

Neural Loop Combiner: rhythm, melody, bassline... which loops should you combine?

Simple Title

Chen, B.-Y., Smith, J. B. L. and Yang, Y.-H. (2020) ‘Neural Loop Combiner: Neural Network Models for Assessing the Compatibility of Loops’.

Description

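Combining loops is central to modern music production. This work judges the compatibility between loops drawn from a large collection and recommends suitable loop combinations.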

Type

Paper

Year

2020

Posted at

June 30, 2021

Overview

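In today's music production, and in dance music above all, combining loops is a core technique. Yet many producers struggle to decide which of their many loops to combine.
This work therefore builds a system that suggests suitable loop combinations.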

Abstract

Music producers who use loops may have access to thousands in loop libraries, but finding ones that are compatible is a time-consuming process; we hope to reduce this burden with automation. State-of-the-art systems for estimating compatibility, such as AutoMashUpper, are mostly rule-based and could be improved on with machine learning. To train a model, we need a large set of loops with ground truth compatibility values. No such dataset exists, so we extract loops from existing music to obtain positive examples of compatible loops, and propose and compare various strategies for choosing negative examples. For reproducibility, we curate data from the Free Music Archive. Using this data, we investigate two types of model architectures for estimating the compatibility of loops: one based on a Siamese network, and the other a pure convolutional neural network (CNN). We conducted a user study in which participants rated the quality of the combinations suggested by each model, and found the CNN to outperform the Siamese network. Both model-based approaches outperformed the rule-based one. We have opened source the code for building the models and the dataset.

Motivation

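The goal is easy to grasp: find the right loops among a huge number of candidates.
Earlier systems were rule-based. They computed compatibility from features such as the similarity of the loops' rhythmic structure and the affinity of their chords. → Could recent deep learning improve on this?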
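The following is a minimal, hypothetical sketch of a rule-based compatibility score of the kind described above. It takes a weighted sum of rhythmic similarity and harmonic similarity. The feature choices (onset strength on a beat grid, 12-bin chroma), the use of cosine similarity, and the weight `w_rhythm` are all assumptions for illustration. This is not AutoMashUpper's actual formula.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))


def rule_based_compatibility(onsets_a, onsets_b, chroma_a, chroma_b, w_rhythm=0.5):
    """Hypothetical rule-based score in [0, 1] for non-negative features.

    onsets_*: onset strength on a fixed beat grid (same length for both loops)
    chroma_*: 12-bin pitch-class profile of each loop
    w_rhythm: weight on rhythm vs. harmony (an assumed value, not from the paper)
    """
    rhythm = cosine(np.asarray(onsets_a, float), np.asarray(onsets_b, float))
    harmony = cosine(np.asarray(chroma_a, float), np.asarray(chroma_b, float))
    return w_rhythm * rhythm + (1.0 - w_rhythm) * harmony


# Usage: two identical C-major, four-on-the-floor loops score (approximately) 1.0.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
four_floor = [1, 0, 0, 0] * 4
print(rule_based_compatibility(four_floor, four_floor, c_major, c_major))
```

Every rule here is hand-designed, which is exactly the limitation that motivates learning compatibility from data instead.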

[Figure: Motivation]

Dataset

The aim is to replace the rule-based approach with machine learning on data → but no dataset of "compatible loop combinations" exists. → So... why not extract loops from music that already exists?

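The data comes from songs published on the Free Music Archive (FMA). Only songs tagged Hip-Hop are used (about 6,000 songs), since that genre almost certainly relies on loops.
Loops are extracted from these songs using prior work on loop extraction (Smith and Goto, 2018; see Links).
Even so, there are far too many duplicate (near-identical) loops → a hashing algorithm converts each spectrogram into a hash value, and loops whose hash values are close are discarded as duplicates.
Result: nearly 14,000 compatible loop pairs.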
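The text does not say which hashing algorithm was used. Below is a minimal sketch of the deduplication step that assumes a simple "average hash": the spectrogram is block-averaged down to 8×8 and thresholded at its mean, and loops are compared by Hamming distance. The hash size and the `max_distance` threshold are hypothetical values.

```python
import numpy as np


def average_hash(spec: np.ndarray, size: int = 8) -> np.ndarray:
    """Average hash of a 2-D magnitude spectrogram (freq x time) as size*size bools.

    The spectrogram is cropped so both axes divide evenly by `size`,
    block-averaged down to size x size, then binarized against its mean.
    """
    f, t = spec.shape
    if f < size or t < size:
        raise ValueError("spectrogram smaller than hash size")
    fs, ts = f // size, t // size
    cropped = spec[: fs * size, : ts * size]
    small = cropped.reshape(size, fs, size, ts).mean(axis=(1, 3))
    return (small > small.mean()).ravel()


def hamming(h1: np.ndarray, h2: np.ndarray) -> int:
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(h1 != h2))


def deduplicate(specs, max_distance: int = 5):
    """Return indices of spectrograms to keep: a loop is dropped if its hash is
    within `max_distance` bits of a loop that was already kept."""
    kept_idx, kept_hashes = [], []
    for i, spec in enumerate(specs):
        h = average_hash(spec)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept_idx.append(i)
            kept_hashes.append(h)
    return kept_idx


# Usage: a louder copy of loop `a` hashes identically, so it is removed.
rng = np.random.default_rng(0)
a = rng.random((64, 64))
c = rng.random((64, 64))
print(deduplicate([a, 2.0 * a, c]))
```

Comparing compact hashes rather than full spectrograms keeps the pairwise check cheap. The quadratic loop above would still need an index (for example, bucketing by hash prefix) at larger scales.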

[Figure: The dataset creation process]

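To create negative (incompatible) loop pairs, loops are combined at random. Drum-only and bass-only loops are excluded, because they may be compatible with almost any loop.
Negative samples are also created artificially: within a song, the loop from one track is shifted or reversed.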
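A minimal sketch of the two negative-sampling ideas above. The loop metadata keys (`id`, `song`, `kind`) are hypothetical. Drawing random pairs only from different songs is an assumption of this sketch. "Shifting" is interpreted here as a circular time shift, although the original could also mean a pitch shift.

```python
import itertools
import random

import numpy as np

# Loop kinds excluded from random pairing (drum-only and bass-only loops, per the text).
EXCLUDED_KINDS = {"drums", "bass"}


def random_negative_pairs(loops, n_pairs, seed=0):
    """Sample loop pairs from *different* songs as presumed-incompatible examples.

    loops: list of dicts with hypothetical keys 'id', 'song' and 'kind'.
    """
    eligible = [lp for lp in loops if lp["kind"] not in EXCLUDED_KINDS]
    candidates = [
        (a["id"], b["id"])
        for a, b in itertools.combinations(eligible, 2)
        if a["song"] != b["song"]
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_pairs, len(candidates)))


def shifted_negative(loop: np.ndarray, shift: int) -> np.ndarray:
    """Circularly shift one loop's audio by `shift` samples so it no longer
    lines up with its partner (a time shift is assumed here)."""
    return np.roll(loop, shift)


def reversed_negative(loop: np.ndarray) -> np.ndarray:
    """Play the loop backwards."""
    return loop[::-1].copy()


# Usage: the drum loop "a2" and same-song pairs never appear in the output.
loops = [
    {"id": "a1", "song": "A", "kind": "melody"},
    {"id": "a2", "song": "A", "kind": "drums"},
    {"id": "b1", "song": "B", "kind": "melody"},
    {"id": "c1", "song": "C", "kind": "chords"},
]
print(random_negative_pairs(loops, 2))
```

Shifted and reversed versions keep the original loop's timbre and key, so they are "hard" negatives that differ only in alignment or direction.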

[Figure: How negative (incompatible) loop pairs are created]

System/Architecture

Two architectures were tried!

Convolutional Neural Network

Takes the spectrogram of the mixed loops as input and outputs a value indicating how well that pair works as a mix.
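To make the CNN's input and output concrete, here is a toy NumPy forward pass. It sums the two loops, takes a log-magnitude spectrogram, then applies one convolution layer, ReLU, global average pooling, a linear layer, and a sigmoid to produce a score in (0, 1). The layer sizes, STFT parameters, and the single-layer design are assumptions for illustration. The paper's network is deeper and trained, whereas these weights are random.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def magnitude_spectrogram(signal: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Hann-windowed STFT magnitude, shape (freq_bins, frames)."""
    frames = sliding_window_view(signal, n_fft)[::hop]
    return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T


def conv2d_valid(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation. x: (H, W); kernels: (C, kh, kw) -> (C, H-kh+1, W-kw+1)."""
    kh, kw = kernels.shape[1:]
    windows = sliding_window_view(x, (kh, kw))  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,ckl->cij", windows, kernels)


def cnn_compatibility_score(loop_a, loop_b, kernels, w, b) -> float:
    """Toy forward pass: mix -> log spectrogram -> conv -> ReLU -> global avg pool -> linear -> sigmoid."""
    mix = loop_a + loop_b  # the two loops are summed into one mixture
    spec = np.log1p(magnitude_spectrogram(mix))
    feats = np.maximum(conv2d_valid(spec, kernels), 0.0)
    pooled = feats.mean(axis=(1, 2))  # one value per channel
    logit = float(pooled @ w + b)
    return float(1.0 / (1.0 + np.exp(-logit)))  # compatibility in (0, 1)


# Usage with random (untrained) weights, so the score itself is meaningless.
rng = np.random.default_rng(0)
loop_a, loop_b = rng.standard_normal(4096), rng.standard_normal(4096)
kernels = rng.standard_normal((4, 3, 3)) * 0.1
w, b = rng.standard_normal(4) * 0.1, 0.0
print(cnn_compatibility_score(loop_a, loop_b, kernels, w, b))
```

Because the model only ever sees the mixture, it can react to how the two loops interact (clashing notes, misaligned beats). The cost is one forward pass per candidate pair.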

Siamese neural network (SNN) = a method that combines two identical networks (sharing the same weights)

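Trained with a contrastive loss, so that the high-dimensional features extracted from the inputs lie close together for positive (compatible) loop pairs and far apart for negative pairs.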
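A minimal sketch of the Siamese idea with one common form of the contrastive loss. With embedding distance d, label y (1 = compatible, 0 = incompatible) and margin m, the loss is L = y·d² + (1 − y)·max(0, m − d)². The tiny linear + ReLU encoder stands in for the real CNN encoder. The margin of 1.0 and the exact loss variant the paper used are assumptions.

```python
import numpy as np


def embed(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Shared encoder: both branches call this with the SAME weights W.
    Linear layer + ReLU, then L2 normalization (a stand-in for the real CNN)."""
    h = np.maximum(W @ x, 0.0)
    n = np.linalg.norm(h)
    return h / n if n > 0 else h


def contrastive_loss(e1: np.ndarray, e2: np.ndarray, y: int, margin: float = 1.0) -> float:
    """y = 1: compatible pair, penalize any distance (pull together).
    y = 0: incompatible pair, penalize only distances below `margin` (push apart)."""
    d = float(np.linalg.norm(e1 - e2))
    return y * d**2 + (1 - y) * max(0.0, margin - d) ** 2


# Usage: embed two loops' (made-up) features with the shared weights, then score the pair.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
e_a, e_b = embed(rng.standard_normal(16), W), embed(rng.standard_normal(16), W)
print(contrastive_loss(e_a, e_b, y=1), contrastive_loss(e_a, e_b, y=0))
```

The margin means incompatible pairs that are already far apart contribute no loss, so training focuses on the confusable ones.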

"Skeleton Model" in the figure refers to a simple CNN followed by fully connected layers.

[Figure: The two architectures]

Results

Quantitative results

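On classification (does a given loop pair fit or not?), the CNN performed better.
On a test that ranks candidate loops for a given loop, however, the Siamese network did better ← perhaps natural given how it works, since it learns distances between embeddings?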
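One way to see why a Siamese model suits ranking: once each loop has an embedding, ranking candidates for a query loop is just sorting by distance. The embeddings below are made up for illustration. In practice they would come from the trained encoder.

```python
import numpy as np


def rank_candidates(query_emb: np.ndarray, candidate_embs: np.ndarray) -> list:
    """Indices of candidate loops, most compatible (smallest embedding distance) first."""
    dists = np.linalg.norm(candidate_embs - query_emb, axis=1)
    return np.argsort(dists, kind="stable").tolist()


# Usage with made-up 2-D embeddings.
query = np.array([0.0, 0.0])
candidates = np.array([[3.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(rank_candidates(query, candidates))  # → [1, 2, 0]
```

Candidate embeddings can be computed once and reused for every query. The mixture-based CNN instead needs a fresh forward pass for each (query, candidate) pair. This is one plausible reason the two models trade places between classification and ranking.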

[Figure: Quantitative results]

Qualitative results

The difference is easiest to see by comparing against the results of prior work.
Example: the task of finding another loop that fits this loop.

With prior work, the result was this...

With this work:

For reference, here is the original song...

Further Thoughts

It would be handy to have this built straight into a DAW like Ableton.
Using a loop-extraction method to generate the training data is a clever idea.

Links

Prior work

AutoMashUpper: Automatic Creation of Multi-Song Music Mashups

In this paper we present a system, AutoMashUpper, for making multi-song music mashups. Central to our system is a measure of "mashability" calculated between phrase sections of an input song and songs in a music collection. We define mashability in terms of harmonic and rhythmic similarity and a measure of spectral balance.

ieeexplore.ieee.org

Loop-extraction work used in this paper

Smith, J. B. L. and Goto, M. (2018) ‘Nonnegative Tensor Factorization for Source Separation of Loops in Audio’, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2018-April(April), pp. 171–175. doi: 10.1109/ICASSP.2018.8461876.

[PDF] Nonnegative Tensor Factorization for Source Separation of Loops in Audio | Semantic Scholar

The prevalence of exact repetition in loop-based music makes it an opportune target for source separation. Nonnegative factorization approaches have been used to model the repetition of looped content, and kernel additive modeling has leveraged periodicity within a piece to separate looped background elements.

www.semanticscholar.org


A tool for separating individual instruments from a recording: SPLEETER

We present and release a new tool for music source separation with pre-trained models called Spleeter. Spleeter was designed with ease of use, separation performance and speed in mind. Spleeter is based on Tensorflow and makes it possible to: * separate audio files into 2, 4 or 5 stems with a single command line using pre-trained models.

createwith.ai