Apr 21, 2017
For all participants: The NTCIR-13 MedWeb user agreement form is now available.
Apr 10, 2017
For participants of the Twitter subtask: Both the Japanese and English annotation guidelines for the Twitter task are now available (see Dataset).
For participants of the Blog subtask: Unfortunately, the organizers have decided to withdraw the Blog subtask. The Twitter task will proceed as scheduled.
This task requires participants to perform multi-label classification: labels for 8 diseases/symptoms must be assigned to each tweet. Training data and test data will be distributed to participants according to their registered subtasks (Japanese subtask:ja, English subtask:en, Chinese subtask:zh). Given a tweet, the output is a Positive:p or Negative:n label for each of the 8 diseases/symptoms. The targets are not limited to influenza; the task also covers 7 other diseases/symptoms: diarrhea/stomachache, hay fever, cough/sore throat, headache, fever, runny nose, and cold. These targets were chosen on the advice of a Japanese government research center, the National Institute of Infectious Diseases (NIID).
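The output described above can be pictured as one p/n label per disease/symptom for each tweet. The following is a minimal sketch only: it assumes the label order follows the order in which the diseases/symptoms are listed in this description, and the names `LABELS` and `encode_labels` are hypothetical, not part of any official task tooling.

```python
# Assumption: label order follows the order used in the task description.
LABELS = [
    "influenza",
    "diarrhea/stomachache",
    "hay fever",
    "cough/sore throat",
    "headache",
    "fever",
    "runny nose",
    "cold",
]


def encode_labels(positive):
    """Return one 'p' or 'n' per label, given the set of labels judged positive."""
    unknown = set(positive) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return ["p" if name in positive else "n" for name in LABELS]


# A tweet judged positive for influenza and fever only.
print(encode_labels({"influenza", "fever"}))
```

Because the classification is multi-label, any number of the 8 positions may be `p` for the same tweet, including none.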
Distribution of the training corpus has started via e-mail from the NTCIR office. Test data will be distributed on July 24, 2017 (see Important Dates).
Participants will obtain the following data:
Japanese subtask: Training data 1,920 tweets, Test data 640 tweets (2,560 tweets in total)
English subtask: Training data 1,920 tweets, Test data 640 tweets (2,560 tweets in total)
Chinese subtask: Training data 1,920 tweets, Test data 640 tweets (2,560 tweets in total)
These tweets relate to 8 diseases/symptoms: influenza, diarrhea/stomachache, hay fever, cough/sore throat, headache, fever, runny nose, and cold. Note that tweet data crawled using the Twitter API cannot be released because of Twitter’s developer policy on data redistribution. Therefore, we plan to use quasi-tweets (in Japanese) for the 8 diseases/symptoms, collected by means of crowdsourcing. We also generate the English and Chinese corpora by translating a part of the quasi-tweets from Japanese into English and Chinese.
The training corpus consists of 1,920 tweet texts (75% of the whole corpus) with labels. Each tweet is assigned a Positive:p or Negative:n label for each of the 8 diseases/symptoms.
|8888ja|I’m so down with the flu.|p|n|n|n|n|p|n|n|
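A row like the example above can be split into an ID, the tweet text, and its 8 labels. This is a rough sketch under stated assumptions: the file format of the distributed corpus is not specified here, so the pipe-delimited layout is taken from the example row alone, and `parse_row` is a hypothetical helper name. It would also fail on tweet text that itself contains a pipe character.

```python
def parse_row(row):
    """Split a pipe-delimited corpus row into (tweet_id, text, labels)."""
    # Drop empty cells produced by leading, trailing, or doubled pipes.
    cells = [cell.strip() for cell in row.split("|") if cell.strip()]
    if len(cells) != 10:
        raise ValueError(f"expected 10 cells (ID, text, 8 labels), got {len(cells)}")
    tweet_id, text, *labels = cells
    if any(label not in ("p", "n") for label in labels):
        raise ValueError(f"labels must be 'p' or 'n': {labels}")
    return tweet_id, text, labels


row = "|8888ja||I’m so down with the flu.||p||n||n||n||n||p||n||n|"
tweet_id, text, labels = parse_row(row)
print(tweet_id, text, labels)
```

Filtering out empty cells lets the same function accept rows written with single or doubled pipes.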
The test corpus consists of 640 tweet texts (25% of the whole corpus) without labels.
|May 1, 2017|Training Corpus Distribution (O)|
|May 1-Jul 24, 2017|Dry Run (P)|
|Jul 24, 2017|Test Data Distribution (O)|
|Jul 24-Aug 7, 2017|Formal Run (P)|
|Aug 7, 2017|Run Result Submission Due Date (P)|
|Sep 4, 2017|Evaluation Result Release (O)|
|Sep 18, 2017|Early Draft Task Overview Release (O)|
|Sep 25, 2017|Task Participant Paper (Draft) Submission Due Date (P)|
|Oct 9, 2017|Paper Check and Notification (O)|
|Nov 1, 2017|Task Participant Paper (Camera Ready) Submission Due Date (P)|
|Dec 5-8, 2017|NTCIR-13 Conference @ NII, Tokyo, Japan. (O)(P)|
|ARAMAKI Eiji, Ph.D. (Nara Institute of Science and Technology)|
|WAKAMIYA Shoko, Ph.D. (Nara Institute of Science and Technology)|
|MORITA Mizuki, Ph.D. (The University of Okayama)|
|KANO Yoshinobu, Ph.D. (Shizuoka University)|
|OHKUMA Tomoko, Ph.D. (Fuji Xerox)|
|MASUICHI Hiroshi, Ph.D. (Fuji Xerox)|
|Nara Institute of Science and Technology|