Overview - What's impressive?

A predictive model trained on DJ playlists that answers "Which DJ would play this song?" from common audio features such as VU meter values.

The hypothesis: the values shown in the music software DJs use every day may capture an artist's taste better than spectrograms and other representations commonly used in music analysis.

Abstract

In the recording studio, producers of Electronic Dance Music (EDM) spend more time creating, shaping, mixing and mastering sounds, than with compositional aspects or arrangement. They tune the sound by close listening and by leveraging audio metering and audio analysis tools, until they successfully create the desired sound aesthetics. DJs of EDM tend to play sets of songs that meet their sound ideal. We therefore suggest using audio metering and monitoring tools from the recording studio to analyze EDM, instead of relying on conventional low-level audio features. We test our novel set of features by a simple classification task. We attribute songs to DJs who would play the specific song. This new set of features and the focus on DJ sets is targeted at EDM as it takes the producer and DJ culture into account. With simple dimensionality reduction and machine learning these features enable us to attribute a song to a DJ with an accuracy of 63%. The features from the audio metering and monitoring tools in the recording studio could serve for many applications in Music Information Retrieval, such as genre, style and era classification and music recommendation for both DJs and consumers of electronic dance music.

Motivation

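Hypothesis: DJs may choose tracks based on sound quality and loudness as much as (or even more than?) melody and similar musical elements. Building on this, the authors create a model that predicts which DJ would play a given song from values such as the meter readings found in common music production software. The goal is that, once a DJ's preferred sound is known, songs could be recommended to match that DJ.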

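In the first place, genres themselves are ambiguous, which calls into question the validity of genre classification models. Does it really make sense to train on data with ambiguous genre labels? By contrast, which DJ played which track is a fact (ground truth) that can be used directly for training!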

Dataset

The playlists of the following 10 DJs were collected from 1001Tracklists! The 1262 songs in those playlists were purchased (!!)

1. Martin Garrix
2. Dimitri Vegas & Like Mike
3. Hardwell
4. Armin van Buuren
5. David Guetta
6. DJ Tiësto
7. Don Diablo
8. Afrojack
9. Oliver Heldens
10. Marshmello

Only the middle 3 minutes of each song are used, excluding the intro and outro.
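A minimal sketch of the cropping step, assuming the decoded audio is already a NumPy array with samples along axis 0. `middle_segment` is a hypothetical helper; the note does not say exactly how the paper centers the excerpt or handles tracks shorter than 3 minutes.

```python
import numpy as np


def middle_segment(signal: np.ndarray, sr: int, seconds: float = 180.0) -> np.ndarray:
    """Return the centered `seconds`-long excerpt of `signal` (samples along axis 0).

    Hypothetical helper: assumes the excerpt is centered on the track's midpoint.
    """
    n = int(round(seconds * sr))
    total = signal.shape[0]
    if total <= n:
        # Track shorter than the excerpt: keep it whole (an assumption).
        return signal
    start = (total - n) // 2
    return signal[start:start + n]
```

For a 5-minute track this keeps roughly 1:00 to 4:00, which drops most of the DJ-friendly intro and outro sections.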

The following measurements, as used in common music software, are extracted:

VU meter
Peak level
Dynamic range
RMS (loudness)
Phase Scope (width of the stereo image)...

(Figure: Phase Scope)
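The note does not itemize the paper's 146 features, so the sketch below only illustrates the kind of frame-wise metering values listed above, under stated assumptions: a float stereo array in [-1, 1], non-overlapping 4096-sample frames, crest factor (peak minus RMS in dB) as a simple stand-in for dynamic range, and left/right correlation as a numeric stand-in for what a Phase Scope shows. `frame_meters` is a hypothetical name, not the paper's code.

```python
import numpy as np


def frame_meters(stereo: np.ndarray, hop: int = 4096, eps: float = 1e-12) -> np.ndarray:
    """Per-frame metering values for a (n_samples, 2) float array in [-1, 1].

    Columns: peak dBFS, RMS dBFS, crest factor in dB (peak - RMS, a simple
    dynamic-range proxy), and L/R correlation (a phase-scope proxy).
    """
    n_frames = stereo.shape[0] // hop            # drop the incomplete last frame
    frames = stereo[: n_frames * hop].reshape(n_frames, hop, 2)

    peak = np.abs(frames).max(axis=(1, 2))       # sample peak over both channels
    rms = np.sqrt((frames ** 2).mean(axis=(1, 2)))
    peak_db = 20 * np.log10(peak + eps)
    rms_db = 20 * np.log10(rms + eps)
    crest_db = peak_db - rms_db

    left, right = frames[..., 0], frames[..., 1]
    num = (left * right).sum(axis=1)
    den = np.sqrt((left ** 2).sum(axis=1) * (right ** 2).sum(axis=1)) + eps
    corr = num / den  # +1: mono, around 0: wide/decorrelated, -1: out of phase

    return np.column_stack([peak_db, rms_db, crest_db, corr])
```

A heavily limited EDM master would show a high RMS and a small crest factor in every frame, which is the kind of "sound aesthetic" the authors argue DJs select for.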

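From these, a $146 \times 1983$ feature matrix is extracted for each song (analyzed every 4096 samples; assuming 44.1 kHz audio, 180 s × 44,100 samples/s / 4096 samples ≈ 1937... huh, shouldn't it be 1937 rather than 1983??)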
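A quick check of the frame-count arithmetic, assuming non-overlapping 4096-sample frames and counting only complete frames; both the sample rates tried and the framing are assumptions, since the note does not state them. Neither common rate yields 1983, so the paper may use a different rate, hop, or padding.

```python
def n_frames(duration_s: float, sr: int, hop: int = 4096) -> int:
    """Number of complete, non-overlapping hop-sized frames in an excerpt."""
    return int(duration_s * sr) // hop


for sr in (44_100, 48_000):
    # prints "44100 1937" then "48000 2109"
    print(sr, n_frames(180, sr))
```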

Architecture

PCA compresses the 146 dimensions to 1 → a random forest estimates which of the 10 DJs played the song.

(In effect, each DJ is used as a class of the classification model.)
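A sketch of the PCA → random forest pipeline with scikit-learn. One reading of "146 dimensions to 1" is assumed here: PCA is fitted on all frames pooled together and maps each 146-dim frame to a single value, so each song becomes a 1-D curve of length n_frames that the forest classifies. The helper names, hyperparameters, and the toy random data are all hypothetical, not the paper's setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier


def fit_pca_rf(features: np.ndarray, labels: np.ndarray, seed: int = 0):
    """features: (n_songs, n_frames, n_features), e.g. (n, 1983, 146)."""
    n_songs, n_frames, n_feat = features.shape
    pca = PCA(n_components=1)
    # Pool frames, project each frame onto the first principal component,
    # then restore one row (curve) per song.
    curves = pca.fit_transform(features.reshape(-1, n_feat)).reshape(n_songs, n_frames)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(curves, labels)
    return pca, clf


def predict_dj(pca: PCA, clf: RandomForestClassifier, features: np.ndarray) -> np.ndarray:
    n_songs, n_frames, n_feat = features.shape
    curves = pca.transform(features.reshape(-1, n_feat)).reshape(n_songs, n_frames)
    return clf.predict(curves)


# Toy stand-in data: 30 songs x 20 frames x 146 features, 10 DJ classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 20, 146))
y = np.repeat(np.arange(10), 3)
pca, clf = fit_pca_rf(X, y)
preds = predict_dj(pca, clf, X[:5])
```

When measuring accuracy, both the PCA and the forest should be fitted on the training songs only; fitting PCA on all songs would leak test information into the projection.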

Results

The result: songs can be attributed to the right DJ with roughly 63% accuracy!

The confusion matrix shows that the model tends to confuse Armin van Buuren with Tiësto, and Oliver Heldens with Don Diablo → and indeed, Tiësto and Armin, as well as Heldens and Diablo, are said to sound similar.
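A sketch of how the most-confused DJ pairs can be read off a confusion matrix: add the matrix to its transpose so that A→B and B→A errors count together, then rank the off-diagonal cells. `most_confused_pairs` is a hypothetical helper, and the labels below are placeholder toy values, not the paper's predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def most_confused_pairs(y_true, y_pred, labels, top=2):
    """Return the `top` unordered class pairs with the most mutual confusions."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    sym = cm + cm.T  # count A->B and B->A errors together
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            pairs.append((int(sym[i, j]), labels[i], labels[j]))
    pairs.sort(key=lambda p: p[0], reverse=True)  # stable: ties keep label order
    return [(a, b, n) for n, a, b in pairs[:top]]


# Placeholder toy labels for five hypothetical DJs "A".."E".
y_true = ["A", "A", "B", "B", "C", "D", "E"]
y_pred = ["B", "A", "A", "B", "D", "C", "E"]
labels = ["A", "B", "C", "D", "E"]
print(accuracy_score(y_true, y_pred))
print(most_confused_pairs(y_true, y_pred, labels))
```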

Further Thoughts

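Although the paper argues that VU-meter-style features are better than conventional audio features, it does not compare against a classifier built on conventional audio features. I'm simply curious which would be more accurate in a head-to-head comparison.
Would deep learning improve the accuracy further?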
B. Owsinski, The Mixing Engineer’s Handbook, 2nd ed. Thomson Course Technology, 2006.

According to this, a DJ's nationality is related to their DJ style, but is that really true?
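For the first further thought, the missing comparison against conventional audio features, one way to run it is to score the same classifier on each feature set with identical stratified cross-validation folds. The sketch assumes both feature sets are already computed as (n_songs, n_features) matrices; `compare_feature_sets`, the set names, and the random toy matrices are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def compare_feature_sets(feature_sets: dict, y: np.ndarray, folds: int = 5, seed: int = 0) -> dict:
    """Mean cross-validated accuracy of the same classifier on each feature set."""
    # Same shuffled, stratified folds for every set, so the comparison is paired.
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = {}
    for name, X in feature_sets.items():
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        scores[name] = float(cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean())
    return scores


# Toy stand-ins: 100 songs, 10 per DJ; real matrices would come from the feature extractors.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(10), 10)
sets = {"metering": rng.normal(size=(100, 40)), "conventional": rng.normal(size=(100, 20))}
scores = compare_feature_sets(sets, y)
print(scores)
```

Keeping the classifier and folds fixed means any accuracy gap can be attributed to the features rather than to the model or the split.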

I'd like to make friends with people doing this kind of research. I'm simply curious what kind of people they are.

Novel Recording Studio Features for Music Information Retrieval