
Joke Dataset – Collection of over 200,000 short jokes for humour research

Simple Title
Collection of over 200,000 short jokes for humour research
Type
Dataset
Year
2017
Posted at
March 26, 2017
Tags
society

Abstract

Generating humor is a complex machine-learning task: a model must grasp the deeper semantic meaning of a joke before it can generate new ones. Progress on such problems is hindered for several reasons, one of which is the lack of a large, comprehensive database of jokes. To address this, a corpus of over 200,000 jokes was collected by scraping several websites that host short, funny jokes.

The dataset is a single CSV file containing 231,657 jokes, each between 10 and 200 characters long. Each row holds a unique ID and the text of one joke.
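As a minimal sketch, the CSV layout described above can be read with Python's standard `csv` module and checked against the stated 10 to 200 character range. The column names `id` and `phrase` are an assumption taken from the header mentioned in the Data section; the actual file may name them differently. The helper names `load_jokes` and `out_of_range` are hypothetical.

```python
import csv
import io
from typing import Iterable

# Assumed column names (from the ["id", "phrase"] header described on this page);
# adjust them if the real file uses different ones.
ID_COL = "id"
TEXT_COL = "phrase"


def load_jokes(lines: Iterable[str], id_col: str = ID_COL, text_col: str = TEXT_COL) -> dict[str, str]:
    """Read CSV rows into an {id: joke} dict, raising if an ID appears twice."""
    jokes: dict[str, str] = {}
    for row in csv.DictReader(lines):
        joke_id = row[id_col].strip()
        if joke_id in jokes:
            raise ValueError(f"duplicate id: {joke_id}")
        jokes[joke_id] = row[text_col].strip()
    return jokes


def out_of_range(jokes: dict[str, str], lo: int = 10, hi: int = 200) -> list[str]:
    """Return the IDs of jokes whose length falls outside [lo, hi] characters."""
    return [jid for jid, text in jokes.items() if not lo <= len(text) <= hi]


if __name__ == "__main__":
    # Tiny made-up sample standing in for the real file.
    sample = io.StringIO(
        'id,phrase\n'
        '1,"I told my computer a joke, but it didn\'t get it."\n'
        '2,Too short\n'
    )
    jokes = load_jokes(sample)
    print(len(jokes))           # 2
    print(out_of_range(jokes))  # ['2']
```

For the real data, pass a file handle opened with `open(path, newline="", encoding="utf-8")` instead of the `StringIO` sample; the file name is not given on this page, so `path` is left to the reader. Quoted fields containing commas are handled by `csv.DictReader`, which is why the sketch avoids splitting lines on commas by hand.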

Data

About 230,000 jokes are compiled into a CSV file with the columns ["id", "phrase"], and just browsing through them is quite entertaining (lol).

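The data would be even more broadly applicable if information about the situations in which each phrase is used had been published alongside it, but some people have already shared notebooks analyzing the dataset on the Kaggle project page.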

Links