Extracting disease names from medical documents is an active research topic in medical language processing. Until now, extraction has relied almost exclusively on standard disease names, typified by the ICD standard disease names. In actual medical practice, however, abbreviations and English names are often used instead of the official disease names.
As a result, standard disease names alone cannot meet the need to extract all information on symptoms and disease names. We therefore extracted the symptom and disease-name terms actually used in clinical practice from electronic medical records and discharge summaries.

We, the Social Computing Laboratory, named this dataset "J-MeDic".
For details, please visit the Japanese website.

Download the original data

The data file described above is available for download.

How was "J-MeDic" made?

Created through a thorough survey of medical record sentences

The survey of medical record sentences yielded a total of 450,000 symptomatic expressions (about 62,000 types). Of these, 28.3% (about 17,000 types) were found not to be covered by the standard disease names alone. Three healthcare workers coded the high-frequency symptomatic expressions (5,600 disease names appearing 30 times), and expressions on which they disagreed were added to the dictionary with the ambiguity retained.

Features

(1) "J-MeDic" contains a vocabulary of about 130,000 symptoms and disease names

  • We automatically extracted terms related to symptoms and disease names from texts obtained at cooperating medical institutions

(2) Correspondence with ICD-10 standard disease names

  • For each term related to a symptom or disease name, the closest ICD-10 disease name is given
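
The mapping from a term used in practice to its closest ICD-10 disease name can be pictured as a simple dictionary lookup. The following is a minimal sketch under stated assumptions, not the laboratory's own tooling: the rows are illustrative and are not taken from J-MeDic, the names `TOY_ROWS`, `build_index` and `lookup` are hypothetical, and case-insensitive matching is an assumption that may not suit every abbreviation.

```python
# Illustrative rows only; they are NOT taken from J-MeDic.
# Each tuple: (surface form, ICD-10 code, standard disease name)
TOY_ROWS = [
    ("AMI", "I21", "Acute myocardial infarction"),
    ("acute myocardial infarction", "I21", "Acute myocardial infarction"),
    ("HT", "I10", "Essential (primary) hypertension"),
]


def build_index(rows):
    """Map each case-folded surface form to its (code, standard name) pair."""
    index = {}
    for surface, code, name in rows:
        index[surface.casefold()] = (code, name)
    return index


def lookup(index, term):
    """Return (ICD-10 code, standard name) for a term, or None if unknown."""
    return index.get(term.strip().casefold())


INDEX = build_index(TOY_ROWS)
print(lookup(INDEX, "ami"))
print(lookup(INDEX, "unknown term"))
```

An abbreviation and the official name resolve to the same code here, which is the behaviour a dictionary of this kind is meant to provide; terms that are not in the dictionary return `None`.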

Data specification

Original data (xlsx format)

  • Terms were extracted from data provided by cooperating medical institutions, limited to those with an occurrence frequency of 100 or more
  • The data contain all disease names in the ICD-10-compliant standard disease name master, version 4.01, together with the symptoms and disease names collected in clinical practice

Format of J-MeDic file

Column name: Description
① Surface form: Symptom or disease name extracted from J-MeDic or from the ICD-10 standard disease name master
② ICD-10 code: ICD-10 code listed in the standard disease name master
③ Standard disease name corresponding to ICD-10: Standard disease name listed in the ICD-10 standard disease name master
④ Reliability level: One of the following levels
  • S: Symptom or disease name described in the ICD-10-compliant standard disease name master (approximately 25,000 disease names)
  • A: Symptom or disease name to which the same code was given by three medical staff
  • B: Symptom or disease name to which the same code was given by one or more medical staff
  • C: Symptom or disease name to which no code could be given
  • D: Symptom or disease name whose code was assigned automatically by computer
⑤ Label: Compound character string consisting of the ICD-10 code and the standard disease name
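
The five columns above can be handled as a small record type, with the reliability level used to decide which entries to trust. The sketch below is an assumption-laden illustration: the field names, the `Entry` class, `parse_row` and `filter_by_reliability` are hypothetical, the rows are invented for the example and are not J-MeDic data, and the label format shown (code, space, name) is a guess that the real file may not follow.

```python
from dataclasses import dataclass

# Reliability levels as listed in the column description.
VALID_LEVELS = {"S", "A", "B", "C", "D"}


@dataclass(frozen=True)
class Entry:
    surface: str        # column 1: term as it appears in text
    icd10_code: str     # column 2: may be empty when no code was given
    standard_name: str  # column 3
    reliability: str    # column 4: one of S, A, B, C, D
    label: str          # column 5: format here is an assumption


def parse_row(row):
    """Turn a 5-cell row into an Entry, rejecting unknown reliability levels."""
    surface, code, name, reliability, label = (cell.strip() for cell in row)
    if reliability not in VALID_LEVELS:
        raise ValueError(f"unknown reliability level: {reliability!r}")
    return Entry(surface, code, name, reliability, label)


def filter_by_reliability(entries, accepted):
    """Keep only entries whose reliability level is in the accepted set."""
    return [e for e in entries if e.reliability in accepted]


# Invented rows for illustration; NOT taken from J-MeDic.
TOY_ROWS = [
    ("Acute myocardial infarction", "I21", "Acute myocardial infarction",
     "S", "I21 Acute myocardial infarction"),
    ("AMI", "I21", "Acute myocardial infarction",
     "A", "I21 Acute myocardial infarction"),
    ("some uncoded expression", "", "", "C", ""),
]

ENTRIES = [parse_row(row) for row in TOY_ROWS]
TRUSTED = filter_by_reliability(ENTRIES, {"S", "A"})
print([e.surface for e in TRUSTED])
```

Filtering by an explicit set of accepted levels, rather than by a ranking, avoids assuming any particular ordering among levels B, C and D.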