NTCIR13 MedWeb

News

Aug 28, 2018
- We are pleased to announce that the NTCIR-13 Test Collection: MedWeb is now available to research groups that did not participate in the task, for research purposes. To download the collection, please fill out the form on the online application page.
Nov 30, 2017
- The program of the NTCIR-13 MedWeb session is available. The MedWeb task session will be held on Thursday, December 7, 2017, from 11:00 a.m. to 1:00 p.m. (JST). The poster session follows the task session, from 1:00 p.m. to 2:30 p.m. (JST). We look forward to seeing you soon!
Oct 11, 2017
- Paper check results were sent to task participants. Please contact us if your group did not receive any notification even though you submitted a paper.
Sep 18, 2017
- The task organizers' draft paper was sent to task participants who submitted formal run results.
Sep 4, 2017
- Evaluation results of the formal run were sent to task participants who submitted formal run results. Please contact us if your group did not receive any e-mail even though you submitted formal run results.
Aug 8, 2017
- Submission of formal run results is now closed.
July 24, 2017
- Test data corpus distribution has started. Task participants who submitted the NTCIR-13 MedWeb user agreement form can access the revised corpus via the download link previously sent by the NTCIR office.
July 5, 2017
- Training corpus redistribution has started. Task participants who submitted the NTCIR-13 MedWeb user agreement form can access the revised corpus via the download link previously sent by the NTCIR office.
July 4, 2017
- Both Japanese and English annotation guidelines are now available (See Dataset).
May 2, 2017
- Training corpus distribution has started. Task participants who submitted the NTCIR-13 MedWeb user agreement form will receive the download link by e-mail from the NTCIR office. (See Annotation Guideline and Dataset for details of the dataset.)
- The task content has changed. (See MedWeb Task.)
Apr 21, 2017
- For all participants: the NTCIR-13 MedWeb user agreement form is now available.
Apr 10, 2017
- For participants in the Twitter subtask: Japanese and English annotation guidelines for the Twitter task are now available (see Dataset).
- For participants in the Blog subtask: Unfortunately, the organizers have decided to withdraw the Blog subtask. The Twitter task will proceed as scheduled.

Welcome to MedWeb (Medical Natural Language Processing for Web Document)

Recently, an increasing number of medical records are being stored electronically rather than on paper, making digital information processing in the medical field increasingly necessary. This trend now extends beyond electronic health records to various data generated by patients themselves. These data, which we call patient texts, include social media posts, web blogs, and so on.

The NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) task provides Twitter-like messages in Japanese, English, and Chinese and asks participants to classify them. Specifically, MedWeb consists of the Twitter task, which has three subtasks: Japanese, English, and Chinese. Because these subtasks can be formalized as multi-label classification of disease/symptom-related texts, the results of this task can be applied almost directly as a core engine for real-world applications.

MedWeb Task

Twitter task - ja, en, zh

Japanese subtask
English subtask
Chinese subtask

This task requires participants to perform multi-label classification, in which labels for 8 diseases/symptoms must be assigned to each tweet. Training data and test data will be distributed to task participants according to their registered subtasks (Japanese subtask: ja, English subtask: en, Chinese subtask: zh). Given a tweet, the output is a Positive (p) or Negative (n) label for each of the 8 diseases/symptoms. The target diseases/symptoms are not limited to influenza; the task also covers 7 other diseases/symptoms: diarrhea/stomachache, hay fever, cough/sore throat, headache, fever, runny nose, and cold. These targets were designed based on advice from a Japanese government research center, the National Institute of Infectious Diseases (NIID).
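The output format described above amounts to eight independent Positive/Negative decisions per tweet. As a minimal sketch of that format only, the following Python code assigns labels with a naive keyword-matching baseline. The keyword lists and the `classify` function are hypothetical illustrations, and the English label names are borrowed from the column headers of the training-data example; this is not the organizers' method or any participant's system.

```python
# Hypothetical keyword-matching baseline illustrating the MedWeb output format:
# one independent 'p'/'n' decision per disease/symptom (binary relevance).
# The keyword lists are illustrative assumptions, not part of the MedWeb data.

LABELS = ["Influenza", "Diarrhea", "Hayfever", "Cough",
          "Headache", "Fever", "Runnynose", "Cold"]

# Hypothetical English trigger phrases for each label.
KEYWORDS = {
    "Influenza": ["flu", "influenza"],
    "Diarrhea": ["diarrhea", "stomachache", "stomach ache"],
    "Hayfever": ["hay fever", "pollen"],
    "Cough": ["cough", "sore throat"],
    "Headache": ["headache"],
    "Fever": ["fever", "temperature"],
    "Runnynose": ["runny nose"],
    "Cold": ["cold"],
}


def classify(tweet: str) -> dict:
    """Return a 'p' or 'n' label for each of the 8 targets, in LABELS order."""
    text = tweet.lower()
    return {
        label: "p" if any(k in text for k in KEYWORDS[label]) else "n"
        for label in LABELS
    }


print(classify("Ugh, headache and a runny nose all day."))
```

Plain substring matching is deliberately crude: "hay fever" also triggers Fever, and "cold" matches "colder". Practical systems typically learn these decisions from the labeled training data instead.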

Annotation Guideline and Dataset

Annotation Guideline

Japanese
- [ja-ver2.0 (July 4, 2017)]
- [ja-ver1.0 (Apr 3, 2017)]
English
- [en-ver2.0 (July 4, 2017)]
- [en-ver1.0 (Apr 10, 2017)]

Dataset

Training corpus distribution has started via e-mail from the NTCIR office. Test data will be distributed on July 24, 2017 (see Important Dates).
Participants will obtain the following data:

Japanese subtask: Training data 1,920 tweets, Test data 640 tweets (2,560 tweets in total)
English subtask: Training data 1,920 tweets, Test data 640 tweets (2,560 tweets in total)
Chinese subtask: Training data 1,920 tweets, Test data 640 tweets (2,560 tweets in total)

These tweets relate to 8 diseases/symptoms: influenza, diarrhea/stomachache, hay fever, cough/sore throat, headache, fever, runny nose, and cold. Note that tweets crawled using the Twitter API cannot be released because of Twitter’s developer policy on data redistribution. Therefore, we are planning to use quasi-tweets (in Japanese) about the 8 diseases/symptoms, created through crowdsourcing. We also generate the English and Chinese corpora by translating part of the quasi-tweets from Japanese into English and Chinese.

(1) Training Data (May 1~) (Revised on July 5)

The training data corpus consists of 1,920 tweet texts (75% of the whole corpus) with labels. Each tweet is annotated with a Positive (p) or Negative (n) label for each of the 8 diseases/symptoms.

An example of training data
ID	Tweet	Influenza	Diarrhea	Hayfever	Cough	Headache	Fever	Runnynose	Cold
8888ja	I’m so down with the flu.	p	n	n	n	n	p	n	n
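The example row above can be turned into per-label boolean targets with a few lines of Python. This is a sketch under assumptions: it supposes the data are stored as tab-separated text with a header row in exactly the column order shown above. The distributed files may use a different container (for example, a spreadsheet), in which case only the loading step changes. The `load_rows` helper and the in-memory `sample` are hypothetical.

```python
import csv
import io

# Column order as shown in the training-data example.
LABELS = ["Influenza", "Diarrhea", "Hayfever", "Cough",
          "Headache", "Fever", "Runnynose", "Cold"]


def load_rows(f):
    """Yield (id, tweet, {label: bool}) from a tab-separated file object.

    Assumes a header row with columns ID, Tweet, then the 8 labels.
    QUOTE_NONE keeps quotation marks inside tweets as literal text.
    """
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        labels = {name: row[name] == "p" for name in LABELS}
        yield row["ID"], row["Tweet"], labels


# Hypothetical in-memory file reproducing the example row.
header = "ID\tTweet\t" + "\t".join(LABELS) + "\n"
body = "8888ja\tI’m so down with the flu.\tp\tn\tn\tn\tn\tp\tn\tn\n"
sample = header + body

for tweet_id, tweet, labels in load_rows(io.StringIO(sample)):
    positives = [name for name, is_pos in labels.items() if is_pos]
    print(tweet_id, positives)
```

For the example row this reports Influenza and Fever as the positive labels, matching the two `p` entries in the table.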

(2) Test Data (July 24~)

The test data corpus consists of 640 tweet texts (25% of the whole corpus) without labels.
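For each subtask, the stated counts and percentages are consistent. With $N$ denoting the total number of tweets per subtask:

```latex
\begin{aligned}
N &= N_{\text{train}} + N_{\text{test}} = 1{,}920 + 640 = 2{,}560 \\
\frac{N_{\text{train}}}{N} &= \frac{1{,}920}{2{,}560} = 0.75, \qquad
\frac{N_{\text{test}}}{N} = \frac{640}{2{,}560} = 0.25
\end{aligned}
```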

Important Dates

~~Aug 24, 2016~~	~~NTCIR-13 Kick-off event in Tokyo: Introduction of MedWeb (O)(P)~~
~~Mar 31, 2017~~	~~Task Registration Deadline (P) (Extended)~~
~~Apr 3, 2017~~	~~Annotation Guideline Distribution (O)~~
~~May 1, 2017~~	~~Training Corpus Distribution (O)~~
~~May 1-Jul 24, 2017~~	~~Dry Run (P)~~
~~Jul 24, 2017~~	~~Test Data Distribution (O)~~
~~Jul 24-Aug 7, 2017~~	~~Formal Run (P)~~
~~Aug 7, 2017~~	~~Run Result Submission Due Date (P)~~
~~Sep 4, 2017~~	~~Evaluation Result Release (O)~~
~~Sep 18, 2017~~	~~Early Draft Task Overview Release (O)~~
~~Sep 25, 2017~~	~~Task Participant Paper (Draft) Submission Due Date (P)~~
~~Oct 9, 2017~~	~~Paper Check and Notification (O)~~
~~Nov 1, 2017~~	~~Task Participant Paper (Camera Ready) Submission Due Date (P)~~
~~Dec 5-8, 2017~~	NTCIR-13 Conference @ NII, Tokyo, Japan. (O)(P) The MedWeb task session will be held on Thursday, December 7, 2017, from 11:00 a.m. to 1:00 p.m. (JST). The poster session follows the task session, from 1:00 p.m. to 2:30 p.m. (JST).

*(P) and (O) indicate actions to be taken by participants and organizers, respectively.

Awards

Best System Award [Award certificate]

Hayate Iso, Camille Ruiz, Taichi Murayama, Katsuya Taguchi, Ryo Takeuchi, Hideya Yamamoto, Shoko Wakamiya and Eiji Aramaki
(NTCIR13 MedWeb Task: multi-label classification of tweets using an ensemble of neural networks)

Best Student Award [Award certificate]

Reine Asakawa and Tomoyoshi Akiba
(AKBL at the NTCIR-13 MedWeb Task)

Organizer

	ARAMAKI Eiji, Ph.D. (Nara Institute of Science and Technology)
	WAKAMIYA Shoko, Ph.D. (Nara Institute of Science and Technology)
	MORITA Mizuki, Ph.D. (Okayama University)
	KANO Yoshinobu, Ph.D. (Shizuoka University)
	OHKUMA Tomoko, Ph.D. (Fuji Xerox)