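ReDial Recommendation Dialogues Dataset

ReDial (Recommendation Dialogues) is an annotated dataset of dialogues in which users recommend movies to each other. The dataset was collected by a team of researchers working at Polytechnique Montréal, MILA – Quebec AI Institute, Microsoft Research Montréal, HEC Montreal, and Element AI.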

Dataset Overview

The dataset consists of over 10,000 conversations centered on the theme of providing movie recommendations.

Official website: https://redialdata.github.io/website/
Papers with Code: https://paperswithcode.com/dataset/redial

Structure

The dataset is published in the “jsonl” format, i.e., as a text file where each line corresponds to a Dialogue given as a valid JSON document.

A Dialogue contains these fields:

conversationId: an integer
initiatorWorkerId: an integer identifying the worker initiating the conversation (the recommendation seeker)
respondentWorkerId: an integer identifying the worker responding to the initiator (the recommender)
messages: a list of Message objects
movieMentions: a dict mapping movie IDs mentioned in this dialogue to movie names
initiatorQuestions: a dictionary mapping movie IDs to the labels supplied by the initiator. The labels record whether the movie was suggested by the recommender, whether the initiator (the seeker) said they had seen it, and whether they said they liked it. Each label is an integer code, explained below.
respondentQuestions: a dictionary mapping movie IDs to the labels supplied by the respondent. The labels have the same form as in initiatorQuestions and likewise describe the initiator (the seeker), as judged by the respondent.
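
Because each line of the file is a self-contained JSON document, the dataset can be read line by line. The following is a minimal sketch, not an official loader: the function name parse_dialogues and the sample record are hypothetical, and the sample values are made up rather than taken from ReDial.

```python
import json
from typing import Iterable, Iterator


def parse_dialogues(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one Dialogue dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        yield json.loads(line)


# A made-up record with the fields listed above (not real ReDial data).
sample_line = json.dumps({
    "conversationId": 1,
    "initiatorWorkerId": 10,
    "respondentWorkerId": 20,
    "messages": [],
    "movieMentions": {"1001": "Example Movie (2000)"},
    "initiatorQuestions": {"1001": {"suggested": 1, "seen": 0, "liked": 2}},
    "respondentQuestions": {"1001": {"suggested": 1, "seen": 0, "liked": 2}},
})

dialogues = list(parse_dialogues([sample_line, ""]))
print(len(dialogues), dialogues[0]["movieMentions"])
```

Since an open text file is itself an iterable of lines, a file handle can be passed to parse_dialogues directly. Note that JSON object keys are always strings, so movie IDs used as dictionary keys arrive as strings after parsing.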

Each Message contains these fields:

messageId: a unique ID for this message
text: a string with the actual message. The string may contain a token starting with @ followed by an integer. This is a movie ID which can be looked up in the movieMentions field of the Dialogue object.
timeOffset: time since start of dialogue in seconds
senderWorkerId: the ID of the worker sending the message, either initiatorWorkerId or respondentWorkerId.
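
The @ tokens and the senderWorkerId field can be combined to print a readable transcript. The sketch below assumes the field layout described above; the helper names resolve_mentions and render_dialogue, the role labels, and the sample dialogue are all hypothetical. Unknown IDs are left untouched rather than guessed.

```python
import re

# A token such as "@1001": an at sign followed by an integer movie ID.
MENTION = re.compile(r"@(\d+)")


def resolve_mentions(text, movie_mentions):
    """Replace each @<id> token with the movie name when the ID is known."""
    def substitute(match):
        # JSON object keys are strings, so the ID is looked up as a string.
        name = movie_mentions.get(match.group(1))
        return name if name else match.group(0)
    return MENTION.sub(substitute, text)


def render_dialogue(dialogue):
    """Return 'ROLE: text' lines ordered by timeOffset."""
    roles = {
        dialogue["initiatorWorkerId"]: "SEEKER",
        dialogue["respondentWorkerId"]: "RECOMMENDER",
    }
    ordered = sorted(dialogue["messages"], key=lambda m: m["timeOffset"])
    return [
        roles[m["senderWorkerId"]] + ": "
        + resolve_mentions(m["text"], dialogue["movieMentions"])
        for m in ordered
    ]


# Made-up example data, not taken from ReDial.
dialogue = {
    "initiatorWorkerId": 10,
    "respondentWorkerId": 20,
    "movieMentions": {"1001": "Example Movie (2000)"},
    "messages": [
        {"messageId": 2, "text": "Then try @1002!", "timeOffset": 12,
         "senderWorkerId": 20},
        {"messageId": 1, "text": "Hi, I liked @1001.", "timeOffset": 0,
         "senderWorkerId": 10},
    ],
}

rendered = render_dialogue(dialogue)
for line in rendered:
    print(line)
```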

The labels in initiatorQuestions and respondentQuestions have the following meaning:

suggested: 0 if it was mentioned by the seeker, 1 if it was a suggestion from the recommender
seen: 0 if the seeker has not seen the movie, 1 if they have seen it, 2 if they did not say
liked: 0 if the seeker did not like the movie, 1 if they liked it, 2 if they did not say
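
The integer codes can be turned into readable descriptions with simple lookup tables. This is a sketch under the assumption that each entry in initiatorQuestions or respondentQuestions is a dictionary with the keys suggested, seen, and liked; the function name describe_labels and the description strings are hypothetical.

```python
# Lookup tables mirroring the label meanings listed above.
SUGGESTED = {0: "mentioned by the seeker", 1: "suggested by the recommender"}
SEEN = {0: "not seen", 1: "seen", 2: "did not say"}
LIKED = {0: "did not like", 1: "liked", 2: "did not say"}


def describe_labels(labels):
    """Map the integer codes of one movie's labels to readable text."""
    return {
        "suggested": SUGGESTED[labels["suggested"]],
        "seen": SEEN[labels["seen"]],
        "liked": LIKED[labels["liked"]],
    }


# Made-up label set for a single movie (assumed key names).
described = describe_labels({"suggested": 1, "seen": 0, "liked": 2})
print(described)
```

Using dictionaries here means an unexpected code raises a KeyError instead of being silently mislabeled, which helps surface malformed records early.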

Dataset Size

The dataset contains a total of 11,348 dialogues: 10,006 for training and model selection, and 1,342 for testing.

Citation

If you use ReDial in your research, please cite the paper using the following BibTeX entry:

@inproceedings{li2018conversational,
  title={Towards Deep Conversational Recommendations},
  author={Li, Raymond and Kahou, Samira Ebrahimi and Schulz, Hannes and Michalski, Vincent and Charlin, Laurent and Pal, Chris},
  booktitle={Advances in Neural Information Processing Systems 31 (NIPS 2018)},
  year={2018}
}
