Entry

言語モデルの学習にかかる環境負荷の比較

Simple Title

Emma Strubell, Ananya Ganesh, Andrew McCallum (2019)

Description

GPT-2などの言語モデルについて、その精度ではなく、学習時に消費している電力及び、二酸化炭素の放出量についてまとめた。この研究の試算では、例えばTransformer の学習に、一般的な自動車のライフサイクルの約5台分、アメリカ人約17人の一年分に相当するカーボンフットプリントがあることがわかった。

Type

Paper

Year

2019

Posted at

June 10, 2021

Overview - 何がすごい?

GPT-2などの言語モデルについて、その精度に焦点が当たっているが、学習時に無視できない電力を消費し、結果 CO2の放出に寄与していることは見逃されがち。

この研究の試算では、例えばTransformer をNAS(Neural Architecture Search -パラメータのチューニング)付きで走らせると、一般的な自動車のライフサイクルの約5台分、アメリカ人約17人の一年分に相当するカーボンフットプリントがあることがわかった。

Abstract

Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.

Motivation

NLPモデルの精度の向上が著しいが、モデルが巨大化するにつれて、モデルを学習するための金銭的コストや環境に与える影響が見逃せないレベルになってきた。しかし、それらを分析するレポートはこれまでなかった。
クラウドのGPUなども普及してきたが、NLPモデルの学習にかかる時間とGPUの数が急増するにつれて一般的なアカデミアの研究者の手の届かないコストになりつつある。

Methods

GPT-2, Transformer, BERTなど、学習のためのコードが公開されているモデルについて、NVIDIA Titan X GPU,1台で一日学習し直してみる。
各オリジナル論文で言及されている

利用されたGPUの機種
GPUの台数
学習にかかった日数

を参考に、NVIDIA Titan X GPUのエネルギー消費量とGPU/CPUの使用パーセンテージを比較して、そのモデルの学習に利用された総電力量を算出する。

各地域の電力源の割合を参考に、CO2の放出量を算出

$\mathrm{CO}_{2} \mathrm{e}=0.954 p_{t}$ $p_t$ がキロワットアワーの時にアメリカでは左の計算式でCO2量を計算できる。単位は pounds (1ポンドはだいたい0.45kg)

エネルギー源の割合

Results

ハイライト

各モデルのCO2のエミッションとクラウドGPUの利用料

BERTモデルを一つ学習する(79時間)のにアメリカ横断のフライトと同じくらいの放出量
Transformer をNAS(Neural Architecture Search -パラメータのチューニング)付きで走らせると、一般的な自動車のライフサイクルの約5台分、アメリカ人約17人の一年分に相当する
下から二番目のモデルはBLEU(翻訳モデルの精度の指標)を 0.1向上させたと論文では報告されているが、そのために$150K = 1700万円のクラウドの利用料がかかっている！ (そしてCO2も)
TPUについては電力消費のデータが公開されていないのでCO2の放出量がわからない

これらのレポートを踏まえた具体的な問題提起/提案

精度だけなく、学習コストも論文に明記すべき = コストパフォーマンスも評価の対象にする

ハイパーパラメータの調整に無駄なコストをかけなくていいように、そもそもハイパーパラメータの変更でどのくらい精度が変わるのかも明記する

このままだとGoogleやAmazonのような企業の研究者しかNLPの研究ができなくなる → 国は大学の研究者が使える共有マシンを用意すべき

商用のクラウドではなく、据置のマシンにすることで、学習にかかるお金が半減する計算

Further Thoughts

思っている以上に環境への影響が大きい。
クラウドの利用料が半端ないことはよくわかった。これではとてもじゃないが一般のユーザは支えない。3億円近い額がかかっているモデルってなんだよ！
Githubのコードを使うと、そのマシンのロケーションから電力源の再生エネルギー/火力発電などの割合から、学習の際に発生したCO2の量についても計算してくれる。

githubで公開されているコードのアウトプット

NFTのカーボンフットプリントなどとも合わせて考えたい

The Unreasonable Ecological Cost of #CryptoArt

22.01.2021 More accurate (based on Gas) figures for NiftyGateway23.01.2021 Anonymized all data. For verification purposes only, details available upon request10.02.2021 Moved some footnotes (e.g. ETH2) to main body26.02.2021 A much shorter summary of this article has been published in the online publication Flash Art under the title " This post has attracted a lot ofcriticism for being too "one-sided".

memoakten.medium.com

The Unreasonable Ecological Cost of #CryptoArt

Links

論文の関連記事

Shrinking deep learning's carbon footprint

MIT researcher Neil Thompson and colleagues show the ability of deep learning models to surpass key benchmarks tracks their nearly exponential rise in computing power use. At this rate, deep nets will survive only if they become radically more efficient.

news.mit.edu

Shrinking deep learning's carbon footprint

Training a single AI model can emit as much carbon as five cars in their lifetimes

The artificial-intelligence industry is often compared to the oil industry: once mined and refined, data, like oil, can be a highly lucrative commodity. Now it seems the metaphor may extend even further. Like its fossil-fuel counterpart, the process of deep learning has an outsize environmental impact.

www.technologyreview.com

Training a single AI model can emit as much carbon as five cars in their lifetimes

Energy and Policy Considerations for Deep Learning in NLP