Tag: sound

📄

SONICS - A dataset and detection model for AI-generated songs

2024

A framework for identifying songs generated with Suno or Udio

Rahman, Md Awsafur, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, and S. Fattah. 2024. “SONICS: Synthetic or Not -- Identifying Counterfeit Songs,” August. http://arxiv.org/abs/2408.14080.

Paper

music, sound, ethics

April 25, 2025 2:27 PM (GMT+9)

📄

RAVE - A real-time timbre transfer algorithm using a VAE (2021)

2021

Caillon, Antoine, and Philippe Esling. 2021. “RAVE: A Variational Autoencoder for Fast and High-Quality Neural Audio Synthesis.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/2111.05011.

Paper

sound

July 10, 2024 7:32 AM (GMT+9)

📄

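WavJourney - Generating audio content from text input by using an LLM to combine multiple models

2023

Uses an LLM and multiple audio synthesis models to generate audio content (such as radio dramas or podcasts) containing speech, music, sound effects, and more from a text prompt.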

Liu, Xubo, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, et al. 2023. “WavJourney: Compositional Audio Creation with Large Language Models.” arXiv [cs.SD]. arXiv. http://arxiv.org/abs/2307.14335.

Paper

sound

May 25, 2024

📄

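Is music-generating AI really generating new music, or is it merely copying its training data? A quantitative investigation

2024

Is music-generating AI just copying its training data? The paper compares the training data with the generated data.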

Bralios, Dimitrios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, and Jonathan Le Roux. 2024. “Generation or Replication: Auscultating Audio Latent Diffusion Models.” In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1156–60. IEEE.

Paper

music, sound

May 13, 2024

📄

AudioLDM: A model that generates audio (environmental sounds, music, etc.) from text using latent diffusion

2023

Liu, Haohe, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, and Mark D. Plumbley. 2023. “AudioLDM: Text-to-Audio Generation with Latent Diffusion Models.” arXiv [cs.SD]. arXiv. http://arxiv.org/abs/2301.12503.

Achieves SOTA in text-to-audio by using CLAP. It is open source, and there is an online demo you can try right away!

Paper

music, sound

February 10, 2023

📄

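SingSong - A model that takes vocals as input and generates the entire accompaniment as audio

2023

Uses source separation to extract vocals and their corresponding accompaniment, then learns the relationship between the two. The generated accompaniments still fall short of the ground truth (the accompaniment in the original song), but they come close to it in quality.

Donahue, Chris, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, et al. 2023. “SingSong: Generating Musical Accompaniments from Singing.” arXiv [cs.SD]. arXiv. http://arxiv.org/abs/2301.12662.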

Paper

music, sound

January 31, 2023

📄

Moûsai: Music generation with a latent diffusion model

2023

A model that generates music from text using a latent diffusion architecture

Schneider, Flavio, Zhijing Jin, and Bernhard Schölkopf. 2023. “Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion.” arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2301.11757.

Paper

music, sound

January 30, 2023

📄

MusicLM: A model that generates music from text

2023

Generates music as a sound file from text such as “a calming violin melody backed by a distorted guitar riff”. A music counterpart to Stable Diffusion.

Agostinelli, Andrea, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, et al. 2023. “MusicLM: Generating Music From Text.” arXiv [cs.SD]. arXiv. http://arxiv.org/abs/2301.11325.

Paper

music, sound, NLP

January 27, 2023

📄

Waveshaping synthesis using deep learning - NEURAL WAVESHAPING SYNTHESIS

2021

The key point is that it runs smoothly even on a CPU!

Hayes, B., Saitis, C., & Fazekas, G. (2021). Neural Waveshaping Synthesis.

Paper

music, sound

January 19, 2022

📄

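Wav2CLIP: A robust audio representation learning method using CLIP

2021

Proposes Wav2CLIP, a method for deriving audio representations from CLIP. It achieves good results on audio classification and retrieval tasks.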

Wav2CLIP: Learning Robust Audio Representations From CLIP, Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel (2021)

Paper

sound, cross-modal, image

October 31, 2021

💾

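synth1B1 - A dataset of over one billion synth sounds paired with their parameters

2021

The total duration comes to roughly 126 years! torchsynth, a GPU-optimized modular synthesizer implemented in PyTorch to generate the dataset, is released alongside it.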

Turian, J., Shier, J., Tzanetakis, G., McNally, K., & Henry, M. (2021). One Billion Audio Sounds from GPU-enabled Modular Synthesis.

Dataset

sound, music

July 23, 2021
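
The figure of roughly 126 years quoted for synth1B1 can be checked with simple arithmetic. The sketch below assumes each sound is 4 seconds long (a value recalled from the paper's description, not stated on this page) and a 365.25-day year; `total_years` is a hypothetical helper name used only for illustration.

```python
# Assumption: a year is 365.25 days long.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def total_years(num_sounds: int, seconds_per_sound: float) -> float:
    """Return the total duration of a corpus in years."""
    return num_sounds * seconds_per_sound / SECONDS_PER_YEAR


# Assumption: one billion sounds, each 4 seconds long.
years = total_years(1_000_000_000, 4.0)
print(f"{years:.1f} years")  # prints "126.8 years"
```

Under these assumptions the total is a little under 127 years, consistent with the rounded figure in the entry.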

👨‍👩‍👦

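Paint with Music - Turning brush strokes into music with DDSP

2021

The latest project from the Google Magenta team. It uses DDSP: Differentiable Digital Signal Processing, announced in 2020, to turn brush strokes into instrument sounds. You can play sounds as if painting with a brush.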

Paint with Music - Google Magenta

Project

music, sound

June 22, 2021

👨‍👩‍👦

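An audiovisual performance in which the performers face an AI trained on their own appearance and sound: Alexander Schubert - Convergence

2020

Uses GANs and autoencoders. Winner of the Golden Nica (top prize) in the Digital Musics & Sound Art category at Ars Electronica 2021.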

Alexander Schubert - Convergence (2020)

Project

music, performance, sound, GAN

June 18, 2021

📄

Percussion sound synthesis - NEURAL PERCUSSIVE SYNTHESIS

2019

Ramires, A., Chandna, P., Favory, X., Gómez, E., & Serra, X. (2019). Neural Percussive Synthesis Parameterised by High-Level Timbral Features. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2020-May, 786–790. Retrieved from http://arxiv.org/abs/1911.11853

Paper

sound

June 4, 2021

📄

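NSynth: Neural Audio Synthesis - Synthesizing instrument sounds with a WaveNet-based autoencoder

2017

An autoencoder built on WaveNet maps instrument sounds, including how they change over time, into a latent space, then synthesizes instrument sounds from latent vectors. The NSynth dataset, a large collection of instrument sounds used in this research, is released alongside it.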

Engel, J. et al. (2017) ‘Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders’. Available

Paper

music, sound

May 28, 2021

⚒️

essentia - Audio/music analysis library

https://github.com/MTG/essentia

Dmitry Bogdanov, et al. 2013. ESSENTIA: an open-source library for sound and music analysis. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). Association for Computing Machinery, New York, NY, USA, 855–858. DOI:https://doi.org/10.1145/2502081.2502229

Tool

sound, music

May 26, 2021

💾

A dataset of over 10,000 drum and percussion sounds - Freesound One-Shot Percussive Sounds

2020

A dataset of one-shot drum and percussion samples

António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, & Xavier Serra. (2020). Freesound One-Shot Percussive Sounds (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3665275

Dataset

sound, music

May 25, 2021

📄

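Generating environmental sounds with a GAN to improve environmental sound classification models

2021

Proposes a data augmentation method for training environmental sound classification models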

Madhu, A. and K, S. (2021) ‘EnvGAN: Adversarial Synthesis of Environmental Sounds for Data Augmentation’.

Paper

sound, GAN

May 18, 2021

📄

REAL-TIME TIMBRE TRANSFER AND SOUND SYNTHESIS USING DDSP

2021

A plugin that runs Google Magenta's DDSP in real time

Francesco Ganis, Erik Frej Knudsen, Søren V. K. Lyster, Robin Otterbein, David Südholt, Cumhur Erkut (2021)

Paper

music, sound

April 14, 2021

📄

Neural Granular Sound Synthesis

2020

An attempt to generate the grains (small particles of sound) used in granular synthesis with a VAE. It also learns trajectories through the grain space.

Bitton, A., Esling, P. and Harada, T. (2020) ‘Neural Granular Sound Synthesis’.

Paper

music, sound

March 30, 2021

📄

Learning the relationship between sound and video – Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

2018

Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

Paper

sound, visual

May 20, 2018

📄

Sound generation with GANs – Synthesizing Audio with Generative Adversarial Networks

2018

Synthesizing Audio with Generative Adversarial Networks

Paper

GAN, sound

February 16, 2018

📄

Generating matching sound from video – Visual to Sound: Generating Natural Sound for Videos in the Wild

2018

Visual to Sound: Generating Natural Sound for Videos in the Wild

Paper

sound, visual

January 3, 2018

👨‍👩‍👦

Learning to “perform” with machine learning – Performance RNN: Generating Music with Expressive Timing and Dynamics

2017

Performance RNN: Generating Music with Expressive Timing and Dynamics

demo

music, sound, performance

July 1, 2017

📄

Image ⇆ sound generation – Deep Cross-Modal Audio-Visual Generation

2017

Deep Cross-Modal Audio-Visual Generation

Paper

visual, sound

May 14, 2017

⚒️

A speech synthesis system that copies voice characteristics – Lyrebird

2017

Lyrebird

Tool

music, sound

April 27, 2017

👨‍👩‍👦

A drum machine built with machine learning – The Infinite Drum Machine : Thousands of everyday sounds, organized using machine learning.

2017

The Infinite Drum Machine : Thousands of everyday sounds, organized using machine learning

Project

music, visual, sound

April 7, 2017

💾

A dataset of two million sound clips – AudioSet

2017

AudioSet

Dataset

sound, music

March 26, 2017

📄

Detecting bird sounds with a CRNN – Convolutional Recurrent Neural Networks for Bird Audio Detection

2017

Convolutional Recurrent Neural Networks for Bird Audio Detection

Paper

sound

March 13, 2017

👨‍👩‍👦

The Lakh MIDI Dataset v0.1

2016

The Lakh MIDI Dataset v0.1

Dataset

music, sound

December 22, 2016

📄

SoundNet: Learning Sound Representations from Unlabeled Video

2016

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba, "Soundnet: Learning sound representations from unlabeled video.", Advances in neural information processing systems 29, pp892-900 (2016)

Paper

music, sound

December 5, 2016

📄

Self-Supervised VQ-VAE for One-Shot Music Style Transfer

2021

Paper

music, sound
