#620 ["Crazy About Climbing the Open-Source Leaderboard" Round 5] ReDial dataset upload

Closed
created 1 year ago by ZhangbuDong · 1 comment
Dataset URL: https://openi.pcl.ac.cn/ZhangbuDong/ReDial/datasets?type=-1
ZhangbuDong commented 1 year ago
Poster
# ReDial Recommendation Dialogue Dataset

ReDial (Recommendation Dialogues) is an annotated dataset of dialogues in which users recommend movies to each other. It was collected by a team of researchers from Polytechnique Montréal, MILA – Quebec AI Institute, Microsoft Research Montréal, HEC Montreal, and Element AI.

## Dataset Overview

The dataset contains more than 10,000 dialogues centered on providing movie recommendations.

Official website: https://redialdata.github.io/website/

Papers with Code: https://paperswithcode.com/dataset/redial

## Structure

The dataset is published in the “jsonl” format, i.e., as a text file where each line corresponds to a Dialogue given as a valid JSON document.

A Dialogue contains these fields:

- conversationId: an integer
- initiatorWorkerId: an integer identifying the worker initiating the conversation (the recommendation seeker)
- respondentWorkerId: an integer identifying the worker responding to the initiator (the recommender)
- messages: a list of Message objects
- movieMentions: a dict mapping movie IDs mentioned in this dialogue to movie names
- initiatorQuestions: a dictionary mapping movie IDs to the labels supplied by the initiator. Each label is a bool corresponding to whether the initiator has said he saw the movie, liked it, or suggested it.
- respondentQuestions: a dictionary mapping movie IDs to the labels supplied by the respondent. Each label is a bool corresponding to whether the initiator has said he saw the movie, liked it, or suggested it.

Each Message contains these fields:

- messageId: a unique ID for this message
- text: a string with the actual message. The string may contain a token starting with @ followed by an integer. This is a movie ID which can be looked up in the movieMentions field of the Dialogue object.
- timeOffset: time since start of dialogue in seconds
- senderWorkerId: the ID of the worker sending the message, either initiatorWorkerId or respondentWorkerId.

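
The Dialogue and Message layout above could be read roughly as follows. This is a minimal sketch, not an official loader: the helper names `parse_dialogues` and `resolve_mentions` are hypothetical, the sample record is invented for illustration, and the code assumes that movie IDs appear as string keys in `movieMentions` (JSON object keys are always strings).

```python
import json
import re

# An "@" followed by digits is treated as a movie ID token, as described above.
MENTION = re.compile(r"@(\d+)")


def parse_dialogues(lines):
    """Parse an iterable of JSONL lines into a list of Dialogue dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def resolve_mentions(dialogue):
    """Return the message texts with @<movieId> tokens replaced by movie names."""
    # Assumption: movieMentions is a dict; fall back to {} if it is empty or missing.
    mentions = dialogue.get("movieMentions") or {}

    def replace(match):
        name = mentions.get(match.group(1))
        # Leave the token untouched when the ID has no usable name.
        return name if name else match.group(0)

    return [MENTION.sub(replace, m["text"]) for m in dialogue["messages"]]


# Invented one-line sample record, for illustration only (not real ReDial data).
sample = json.dumps({
    "conversationId": 1,
    "initiatorWorkerId": 10,
    "respondentWorkerId": 20,
    "movieMentions": {"111": "Example Movie (2000)"},
    "messages": [
        {"messageId": 1, "text": "Have you seen @111?",
         "timeOffset": 0, "senderWorkerId": 10},
        {"messageId": 2, "text": "No, what about @222?",
         "timeOffset": 5, "senderWorkerId": 20},
    ],
})

dialogues = parse_dialogues([sample])
print(resolve_mentions(dialogues[0]))
```

For a real file, the lines of an opened data file can be passed to `parse_dialogues` in place of the one-element list; the file name depends on the download and is not given in this post.
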

The labels in initiatorQuestions and respondentQuestions have the following meaning:

- suggested: 0 if it was mentioned by the seeker, 1 if it was a suggestion from the recommender
- seen: 0 if the seeker has not seen the movie, 1 if they have seen it, 2 if they did not say
- liked: 0 if the seeker did not like the movie, 1 if they liked it, 2 if they did not say

## Dataset Size

The dataset contains a total of 11348 dialogues, 10006 for training and model selection, and 1342 for testing.

## Citation

If you use ReDial in your research, please cite the paper using the following BibTeX entry:

```
@inproceedings{li2018conversational,
  title={Towards Deep Conversational Recommendations},
  author={Li, Raymond and Kahou, Samira Ebrahimi and Schulz, Hannes and Michalski, Vincent and Charlin, Laurent and Pal, Chris},
  booktitle={Advances in Neural Information Processing Systems 31 (NIPS 2018)},
  year={2018}
}
```
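
As a supplement to the label definitions given earlier in this post (suggested, seen, liked), the numeric codes could be turned into readable text along these lines. This is only a sketch: the function name `describe_labels` is hypothetical, the sample input is invented, and it assumes each movie ID maps to a dict with the integer keys `suggested`, `seen`, and `liked`.

```python
# Code-to-text tables, taken from the label meanings listed in this post.
SUGGESTED = {0: "mentioned by the seeker", 1: "suggested by the recommender"}
SEEN = {0: "not seen", 1: "seen", 2: "did not say"}
LIKED = {0: "did not like", 1: "liked", 2: "did not say"}


def describe_labels(questions):
    """Map each movie ID to a human-readable version of its three labels.

    Assumption: `questions` has the shape of initiatorQuestions or
    respondentQuestions, i.e. {movie_id: {"suggested": int, "seen": int, "liked": int}}.
    """
    return {
        movie_id: {
            "suggested": SUGGESTED[labels["suggested"]],
            "seen": SEEN[labels["seen"]],
            "liked": LIKED[labels["liked"]],
        }
        for movie_id, labels in questions.items()
    }


# Invented sample, for illustration only (not real ReDial data).
sample_questions = {"111": {"suggested": 1, "seen": 0, "liked": 2}}
decoded = describe_labels(sample_questions)
print(decoded["111"])
```

A code outside the listed ranges raises a `KeyError` here, which makes unexpected values visible instead of silently mislabeling them.
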
lip01 added the dataset label 1 year ago
zeizei closed this issue 1 year ago