NTCIR13::MedWeb
The Fourth Medical NLP Shared Task
The One and Only Medical Language Processing Contest

Welcome to MedWeb (Medical Natural Language Processing for Web Document)

What's new: The task registration deadline has been extended to March 31, 2017.

Recently, an increasing number of medical records are being stored electronically rather than on paper, making digital information processing in the medical field more and more necessary. This trend now covers not only electronic health records but also various data coming from patients. These data, which we call patient texts, include social media texts, web blogs, and so on.

The NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) task provides two types of texts, Twitter message texts (in Japanese, English, and Chinese) and disease journal texts (in Japanese), and requires participants to classify them or extract disease information from them. In detail, MedWeb consists of two subtasks: (1) the Twitter subtask (in Japanese, English, and Chinese) and (2) the Blog subtask (in Japanese). Since these subtasks can be formalized as (1) binary classification of disease/symptom-related texts and (2) labeling disease or symptom names in patients’ texts with medical codes, the results of this task can be applied almost directly as a core engine for practical applications.


MedWeb Task

(1) Twitter subtask - ja, en, ch

This subtask requires participants to classify a given tweet into one of two categories: patient or not. The input is a set of tweets, and the output is the same tweets tagged 1 (patient) or 0 (not). The target diseases/symptoms are not limited to influenza: the subtask also covers seven other diseases/symptoms, namely diarrhea/stomachache, hay fever, cough/sore throat, headache, fever, runny nose, and cold. These targets were chosen on the advice of a Japanese government research center, the National Institute of Infectious Diseases (NIID). The detailed definition of each target is given in the annotation guideline (available at the website [doi: 10.6084/m9.figshare.3123160.v1]).
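To make the input/output shape concrete, here is a minimal sketch of a keyword baseline that tags each tweet with 1 or 0. It is only an illustration, not the organizers' method or an official baseline: the keyword list and the function names `label_tweet` and `label_tweets` are assumptions introduced here, and the actual data format is defined by the distributed corpus.

```python
from typing import Iterable, List, Tuple

# Hypothetical keyword list drawn from the eight target diseases/symptoms
# named in the task description; this is not an official resource.
SYMPTOM_KEYWORDS = (
    "influenza", "flu", "diarrhea", "stomachache", "hay fever",
    "cough", "sore throat", "headache", "fever", "runny nose", "cold",
)


def label_tweet(text: str) -> int:
    """Return 1 (patient) if any keyword occurs in the text, else 0 (not)."""
    lowered = text.lower()
    return int(any(keyword in lowered for keyword in SYMPTOM_KEYWORDS))


def label_tweets(tweets: Iterable[str]) -> List[Tuple[str, int]]:
    """Pair each tweet with its 1/0 tag, mirroring the subtask's output."""
    return [(tweet, label_tweet(tweet)) for tweet in tweets]
```

A substring match like this cannot tell whether the author is actually ill (a tweet about "cold weather" would also be tagged 1), so it only shows the shape of the data, not a competitive approach.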

(2) Blog subtask - ja

In this subtask, participants are challenged to extract disease expressions (complaints) from a given cancer patient's blog article. In detail, the input is a set of 110 blog articles (= 10 actual patients x 11 days of articles). The output is tagged Japanese text. There are three kinds of tags: privacy information, complaints, and DateTime.
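As a rough illustration of what tagged output might look like to a program, the sketch below reads inline tags back out of a text. The tag names `<p>`, `<c>`, and `<t>`, the inline markup style, and the function `extract_tagged_spans` are all hypothetical; the real tag set and format are defined by the organizers' annotation guideline.

```python
import re
from typing import List, Tuple

# Hypothetical tag names for the three tag kinds described in the text.
# The actual names and markup are specified by the annotation guideline.
TAG_NAMES = {"p": "privacy", "c": "complaint", "t": "datetime"}

# Matches <p>...</p>, <c>...</c>, or <t>...</t> without nesting.
TAG_PATTERN = re.compile(r"<(p|c|t)>(.*?)</\1>", re.DOTALL)


def extract_tagged_spans(tagged_text: str) -> List[Tuple[str, str]]:
    """Return (tag kind, tagged string) pairs in order of appearance."""
    return [
        (TAG_NAMES[match.group(1)], match.group(2))
        for match in TAG_PATTERN.finditer(tagged_text)
    ]


# Made-up example sentence: "My headache has been bad since yesterday."
sample = "<t>昨日</t>から<c>頭痛</c>がひどい。"
spans = extract_tagged_spans(sample)
```

For the made-up sentence, `spans` holds one DateTime span and one complaint span, in that order.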


Dataset

The training corpus and test data will be distributed on May 1, 2017 and July 24, 2017, respectively (see Important Dates). Participants will obtain the following data:

(1) Twitter subtask - ja, en, ch

The 8 diseases/symptoms are influenza, diarrhea/stomachache, hay fever, cough/sore throat, headache, fever, runny nose, and cold. Note that tweet data crawled using the Twitter API cannot be released because of Twitter’s developer policy on data redistribution. Therefore, we plan to collect quasi-tweets (in Japanese) for the 8 diseases/symptoms by means of crowdsourcing. We will also generate English and Chinese corpora by translating part of the quasi-tweets from Japanese into English and Chinese.

(2) Blog subtask - ja

Important Dates

Aug 24, 2016: NTCIR-13 Kick-off event in Tokyo: Introduction of MedWeb (O)(P)
Mar 31, 2017: Task Registration Deadline (P) (Extended) [Online Registration]
Apr 3, 2017: Annotation Guideline Distribution (O)
May 1, 2017: Training Corpus Distribution (O)
May 1-Jul 24, 2017: Dry Run (P)
Jul 24, 2017: Test Data Distribution (O)
Jul 24-Aug 7, 2017: Formal Run (P)
Aug 7, 2017: Run Result Submission Due Date (P)
Sep 4, 2017: Evaluation Result Release (O)
Sep 18, 2017: Early Draft Task Overview Release (O)
Sep 25, 2017: Task Participant Paper (Draft) Submission Due Date (P)
Oct 9, 2017: Paper Check and Notification (O)
Nov 1, 2017: Task Participant Paper (Camera Ready) Submission Due Date (P)
Dec 5-8, 2017: NTCIR-13 Conference @ NII, Tokyo, Japan (O)(P)
*(P) and (O) mark items that are the responsibility of participants and organizers, respectively.

Registration

Go to How to Participate in NTCIR-13 Task

Organizer

ARAMAKI Eiji, Ph.D. (Nara Institute of Science and Technology)
WAKAMIYA Shoko, Ph.D. (Nara Institute of Science and Technology)
MORITA Mizuki, Ph.D. (The University of Okayama)
KANO Yoshinobu, Ph.D. (Shizuoka University)
OHKUMA Tomoko, Ph.D. (Fuji Xerox)

Advisor

MASUICHI Hiroshi, Ph.D. (Fuji Xerox)

Sponsorship

Nara Institute of Science and Technology

Link

NTCIR MedNLP-Doc

NTCIR MedNLP-2

NTCIR MedNLP-1

mednlp.jp

NII (National Institute of Informatics)

NTCIR-13