Extracting disease names from medical documents is an active research topic in medical language processing. Until now, disease name extraction has almost always relied on standard disease names, typified by the ICD standard disease names. In actual medical practice, however, abbreviations and English names are often used instead of the official disease names.
As a result, standard disease names alone cannot meet the need to extract all information on symptoms and disease names. We therefore extracted the terms for symptoms and disease names that are actually used in clinical practice from electronic medical records and discharge summaries.

We, the Social Computing Laboratory, named this data "J-MeDic".
For details, please visit the Japanese website.


 ■Latest version
 ・Update date: 2017/11/17, Data: MANBYO_v9, File size: 1.49MB
 ■Old version
 ・Update date: 2017/08/23, Data: MANBYO_v5, File size: 9.34MB

How was "J-MeDic" made?

A survey of medical record text yielded a total of 450,000 symptom expressions (about 62,000 distinct types), of which 28.3% (about 17,000 types) were found not to be covered by the standard disease names alone. Three healthcare workers coded the high-frequency symptom expressions (5,600 disease names appearing 30 times), and those on which they disagreed were entered into the dictionary with the ambiguity left in place.


(1) Vocabulary of about 130,000 symptoms and disease names

  • We automatically extracted terms related to symptoms and disease names from texts obtained at cooperating medical institutions

(2) Correspondence with ICD-10 standard disease names

  • For each word related to a symptom or disease name, the closest ICD-10 disease name is given
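
As a rough illustration of how a vocabulary like this can be used for extraction, the following is a minimal sketch of dictionary lookup with longest-match scanning. It is not the method used to build "J-MeDic". The function name `find_terms` and the entries in `SAMPLE_DICT` are invented for this example and are not taken from the dictionary.

```python
# Hypothetical mini-dictionary: surface form -> (ICD-10 code, standard disease name).
# These entries are invented for illustration and are not taken from J-MeDic.
SAMPLE_DICT = {
    "DM": ("E14", "Diabetes mellitus"),
    "diabetes": ("E14", "Diabetes mellitus"),
    "HT": ("I10", "Hypertension"),
    "high blood pressure": ("I10", "Hypertension"),
}


def find_terms(text, dictionary):
    """Scan text left to right, taking the longest dictionary match at each position."""
    # Try longer surface forms first so a long entry wins over a shorter one
    # that starts at the same position.
    surfaces = sorted(dictionary, key=len, reverse=True)
    hits = []
    i = 0
    while i < len(text):
        for surface in surfaces:
            if text.startswith(surface, i):
                code, name = dictionary[surface]
                hits.append((surface, code, name))
                i += len(surface)  # continue after the matched term
                break
        else:
            i += 1  # no entry starts here; move on one character
    return hits
```

Matching is done character by character and is case-sensitive, which suits text without spaces between words but can also match inside longer words, so a real application would likely need extra checks.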

Data specification

Data format
Column name: Description
①Surface form: Symptom or disease name extracted from J-MeDic or the ICD-10 standard disease name master
②ICD-10 code: ICD-10 code listed in the standard disease name master
③Standard disease name corresponding to ICD-10: Standard disease name listed in the ICD-10 standard disease name master
④Degree of Reliability: One of the following levels
  S: Symptom or disease name listed in the ICD-10 standard disease name master (approximately 25,000 disease names)
  A+: Symptom or disease name to which the same code was given by three medical staff in agreement
  A: Symptom or disease name to which the same code was given by two medical staff in agreement
  B+: Symptom or disease name to which the same code was given by one or more medical staff with agreement
  B: Symptom or disease name to which the same code was given by one or more medical staff
  C: Symptom or disease name to which no code could be given
  D: Symptom or disease name coded automatically by computer
⑤Label: Compound character string consisting of the ICD-10 code and the standard disease name
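
The five columns above could be read and filtered by Degree of Reliability roughly as follows. This is only a sketch under several assumptions: that the data is comma-separated with the columns in the order listed, that there is no header row, and that the label joins the code and name with ";". The field names, the helper functions, and the sample rows are all invented for illustration and are not taken from MANBYO_v9.

```python
import csv
import io
from collections import Counter

# Assumed column order, following the five columns in the data format above.
FIELDS = ["surface", "icd10_code", "standard_name", "reliability", "label"]

# Invented sample rows, not taken from MANBYO_v9.
# The ";" separator inside the label column is also an assumption.
SAMPLE_CSV = """\
Hypertension,I10,Hypertension,S,I10;Hypertension
HT,I10,Hypertension,A+,I10;Hypertension
high BP,I10,Hypertension,D,I10;Hypertension
unclear term,,,C,
"""

# Levels treated as trustworthy in this sketch; adjust to the task at hand.
TRUSTED_LEVELS = {"S", "A+", "A"}


def load_entries(handle):
    """Read rows into dictionaries keyed by the assumed field names."""
    return list(csv.DictReader(handle, fieldnames=FIELDS))


def filter_by_reliability(entries, levels=TRUSTED_LEVELS):
    """Keep only the entries whose reliability level is in `levels`."""
    return [entry for entry in entries if entry["reliability"] in levels]


entries = load_entries(io.StringIO(SAMPLE_CSV))
trusted = filter_by_reliability(entries)
# Count entries per reliability level across the whole sample.
level_counts = Counter(entry["reliability"] for entry in entries)
```

Restricting the set to levels S, A+, and A keeps only entries from the standard master or with agreement between at least two coders; whether levels B+ through D are acceptable depends on how much ambiguity the application can tolerate.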

Distribution by Degree of Reliability (MANBYO_v9)



"J-MeDic" has been created with the greatest possible care, but we do not guarantee that it is free of errors. Our laboratory accepts no responsibility for any problems arising from the use of "J-MeDic", so please use it at your own risk, whether for research or any other purpose.