ReDial Recommendation Dialogues Dataset
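ReDial (Recommendation Dialogues) is an annotated dataset of dialogues in which users recommend movies to each other. The dataset was collected by a team of researchers from Polytechnique Montréal, MILA – Quebec AI Institute, Microsoft Research Montréal, HEC Montreal, and Element AI.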
Dataset Overview
The dataset contains over 10,000 conversations centered on giving movie recommendations.
Official website: https://redialdata.github.io/website/
Papers with Code: https://paperswithcode.com/dataset/redial
Structure
The dataset is published in the “jsonl” format, i.e., as a text file where each line corresponds to a Dialogue given as a valid JSON document.
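Because every line is a standalone JSON document, the file can be read line by line with Python's standard `json` module. The following is a minimal sketch. The function names `iter_dialogues` and `load_dialogues` are hypothetical. Skipping blank lines and reporting the failing line number are defensive choices, not requirements of the format.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_dialogues(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one Dialogue dict per non-empty line of a jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report which line failed instead of a bare decode error.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from err


def load_dialogues(path: str) -> List[Dict[str, Any]]:
    """Read the whole file into memory as a list of Dialogue dicts."""
    return list(iter_dialogues(path))
```

Usage: `dialogues = load_dialogues("train_data.jsonl")`. The file name is an assumption about the downloaded archive. `iter_dialogues` streams one dialogue at a time, which helps if you only need a single pass over the data.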
A Dialogue contains these fields:
- conversationId: an integer
- initiatorWorkerId: an integer identifying the worker initiating the conversation (the recommendation seeker)
- respondentWorkerId: an integer identifying the worker responding to the initiator (the recommender)
- messages: a list of Message objects
- movieMentions: a dict mapping movie IDs mentioned in this dialogue to movie names
- initiatorQuestions: a dictionary mapping movie IDs to the labels supplied by the initiator. Each value holds the labels suggested, seen, and liked. Together they record whether the movie was suggested by the recommender and whether the initiator said they saw it and liked it.
- respondentQuestions: a dictionary mapping movie IDs to the labels supplied by the respondent. It uses the same labels as initiatorQuestions, so here the respondent reports whether the movie was suggested and whether the initiator said they saw it and liked it.
Each Message contains these fields:
- messageId: a unique ID for this message
- text: a string with the actual message. The string may contain a token starting with @ followed by an integer. This is a movie ID which can be looked up in the movieMentions field of the Dialogue object.
- timeOffset: time since start of dialogue in seconds
- senderWorkerId: the ID of the worker sending the message, either initiatorWorkerId or respondentWorkerId.
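A common preprocessing step is to turn a Dialogue into readable text. This means mapping each senderWorkerId to a role through the two worker IDs and replacing each @-token with the movie name from movieMentions. The sketch below does both.

Several details are assumptions for illustration:

- the role labels `SEEKER` and `RECOMMENDER`
- the function names
- the choice to leave unknown or nameless IDs untouched
- treating a missing or empty movieMentions as "no mentions"

```python
import re
from typing import Any, Dict, List

# An @ sign immediately followed by one or more digits, e.g. "@12345".
MOVIE_TOKEN = re.compile(r"@(\d+)")


def replace_mentions(text: str, movie_mentions: Dict[str, str]) -> str:
    """Replace @<id> tokens with movie names; unknown IDs are left as-is."""
    def lookup(match: "re.Match[str]") -> str:
        # JSON object keys are strings, so the ID is looked up as a string.
        name = movie_mentions.get(match.group(1))
        return name if name else match.group(0)
    return MOVIE_TOKEN.sub(lookup, text)


def render_dialogue(dialogue: Dict[str, Any]) -> List[str]:
    """Return one "ROLE: text" line per message of a Dialogue dict."""
    roles = {
        dialogue["initiatorWorkerId"]: "SEEKER",
        dialogue["respondentWorkerId"]: "RECOMMENDER",
    }
    # Treat a missing or empty movieMentions as "no mentions".
    mentions = dialogue.get("movieMentions") or {}
    return [
        f'{roles[m["senderWorkerId"]]}: {replace_mentions(m["text"], mentions)}'
        for m in dialogue["messages"]
    ]
```

Keeping `replace_mentions` separate from the role mapping lets it be reused on single messages. It also makes it easy to swap in a different rendering, such as keeping the ID next to the name.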
The labels in initiatorQuestions and respondentQuestions have the following meaning:
- suggested: 0 if it was mentioned by the seeker, 1 if it was a suggestion from the recommender
- seen: 0 if the seeker has not seen the movie, 1 if they have seen it, 2 if they did not say
- liked: 0 if the seeker did not like the movie, 1 if they liked it, 2 if they did not say
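The labels are small integer codes, so decoding them into readable values is a table lookup. The sketch makes several assumptions:

- each entry of initiatorQuestions or respondentQuestions is a dict with the keys `suggested`, `seen`, and `liked`;
- those keys hold the codes listed above;
- an empty container decodes to an empty result.

The decoded strings and the function name are illustrative.

```python
from typing import Dict

# Code tables transcribed from the label descriptions above.
SUGGESTED = {0: "mentioned by seeker", 1: "suggested by recommender"}
SEEN = {0: "not seen", 1: "seen", 2: "did not say"}
LIKED = {0: "did not like", 1: "liked", 2: "did not say"}


def decode_labels(questions: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, str]]:
    """Map each movie ID's integer label codes to human-readable values."""
    decoded: Dict[str, Dict[str, str]] = {}
    # "or {}" makes an empty container (dict or list) decode to {}.
    for movie_id, labels in (questions or {}).items():
        decoded[movie_id] = {
            "suggested": SUGGESTED[labels["suggested"]],
            "seen": SEEN[labels["seen"]],
            "liked": LIKED[labels["liked"]],
        }
    return decoded
```

An unexpected code raises a `KeyError` instead of being silently mapped. This makes malformed annotations easy to spot.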
Dataset Size
The dataset contains a total of 11348 dialogues, 10006 for training and model selection, and 1342 for testing.
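The 10,006 training dialogues serve both training and model selection, so a validation subset has to be held out from them. This section does not say how to make that split. The sketch below uses a seeded random split. The 10% fraction, the seed, and the function name are all assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_valid_split(
    items: Sequence[T], valid_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items with a fixed seed and hold out a validation subset."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    shuffled = list(items)  # copy so the caller's sequence is not reordered
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n_valid = int(round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]
```

With 10,006 training dialogues and the default fraction, this holds out 1,001 dialogues for validation and keeps 9,005 for training. The 1,342 test dialogues stay untouched.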
Citation
If you use ReDial in your research, please cite the paper using the following BibTeX entry:
@inproceedings{li2018conversational,
title={Towards Deep Conversational Recommendations},
author={Li, Raymond and Kahou, Samira Ebrahimi and Schulz, Hannes and Michalski, Vincent and Charlin, Laurent and Pal, Chris},
booktitle={Advances in Neural Information Processing Systems 31 (NIPS 2018)},
year={2018}
}