Entry

Generating environmental sounds with a GAN → improving the performance of environmental sound classification models

Simple Title

Madhu, A. and K, S. (2021) ‘EnvGAN: Adversarial Synthesis of Environmental Sounds for Data Augmentation’.

Description

Proposes a data augmentation method for training environmental sound classification models

Type

Paper

Year

2021

Posted at

May 18, 2021

Overview - What's great about it?

Abstract

The research in Environmental Sound Classification (ESC) has been progressively growing with the emergence of deep learning algorithms. However, data scarcity poses a major hurdle for any huge advance in this domain. Data augmentation offers an excellent solution to this problem. While Generative Adversarial Networks (GANs) have been successful in generating synthetic speech and sounds of musical instruments, they have hardly been applied to the generation of environmental sounds. This paper presents EnvGAN, the first ever application of GANs for the adversarial generation of environmental sounds. Our experiments on three standard ESC datasets illustrate that the EnvGAN can synthesize audio similar to the ones in the datasets. The suggested method of augmentation outshines most of the futuristic techniques for audio augmentation.

Motivation

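Environmental sound classification with CNNs and similar models has become fairly common, but accuracy has not improved much, partly because there is still little data → Data augmentation is already used, but simply shifting the pitch or mixing sounds together is not enough → So generate plausible sounds with a GAN and use them to augment the data!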

Architecture

The model is essentially based on the WaveGAN architecture.

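Two added layers extend the length of audio that can be generated to about 4 seconds at CD-quality 44,100 Hz, 196,608 samples in total (the original WaveGAN generates 16,384 samples, a little over 1 second at 16,000 Hz).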
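The sample counts above can be sanity-checked with a little arithmetic. This is a minimal sketch: `output_length` and `duration_seconds` are hypothetical helper names. The WaveGAN setting (16 initial time steps followed by five stride-4 transposed convolutions) follows Donahue et al. (2018). The 12-step, seven-layer setting is only an assumed configuration that happens to reproduce 196,608 samples; this note does not confirm that EnvGAN is built that way.

```python
def output_length(initial_steps, stride, num_layers):
    """Length produced by stacking transposed convolutions that each
    upsample the time axis by `stride`."""
    return initial_steps * stride ** num_layers


def duration_seconds(num_samples, sample_rate_hz):
    """Duration of a clip with `num_samples` samples at `sample_rate_hz`."""
    return num_samples / sample_rate_hz


# Original WaveGAN: 16 time steps, five stride-4 layers.
wavegan_samples = output_length(16, 4, 5)

# ASSUMPTION: one configuration with two extra layers that yields the
# sample count quoted in this note; not confirmed as EnvGAN's design.
envgan_samples = output_length(12, 4, 7)

print(wavegan_samples, duration_seconds(wavegan_samples, 16_000))
print(envgan_samples, round(duration_seconds(envgan_samples, 44_100), 2))
```

At 44,100 Hz, 196,608 samples come to roughly 4.46 seconds, so the "4 seconds" figure is approximate.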

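A separate GAN model is trained for each sound class. Generated sounds that are too close to the original training data are discarded (they add nothing as augmentation).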
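The discard step can be sketched as a similarity filter. This is an illustration only: the note does not say which similarity measure or threshold the paper uses, so cosine similarity on raw waveforms, the 0.95 threshold, and the names `cosine_similarity` and `filter_novel` are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length waveforms (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat silence as dissimilar to everything
    return dot / (norm_a * norm_b)


def filter_novel(generated, training, threshold=0.95):
    """Keep generated clips whose closest training clip is below `threshold`.

    ASSUMPTION: the measure and the threshold are placeholders, not the
    criterion used in the paper.
    """
    kept = []
    for clip in generated:
        closest = max(
            (cosine_similarity(clip, ref) for ref in training), default=0.0
        )
        if closest < threshold:
            kept.append(clip)
    return kept


# Toy example for one class: the first generated clip copies a training clip.
training = [[1.0, 0.0, 1.0, 0.0]]
generated = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
print(filter_novel(generated, training))
```

Since one GAN is trained per class, a filter like this would be run class by class, comparing each generated clip only against the training clips of its own class.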

Architecture of the EnvGAN generator

The environmental sound classifier is a fairly standard CNN.

Results

On three datasets (ESC-10, UrbanSound8K, and TUT), the proposed method was shown to achieve the best recognition accuracy.

Results of the classification model

Further Thoughts

I would like to hear the generated sounds...

Related Works

Donahue, C., McAuley, J. and Puckette, M. (2018) ‘Synthesizing Audio with Generative Adversarial Networks’. Available at: http://arxiv.org/abs/1802.04208 (Accessed: 15 February 2018).