Abstract: This dataset integrates three sources: the English terminology of Chinese medicine (internal draft) developed by People's Medical Publishing House (PMPH), the WHO International Standard Terminologies on Traditional Medicine in the Western Pacific Region formulated by the World Health Organization (WHO), and the International Standard Chinese-English Basic Nomenclature of Chinese Medicine formulated by the World Federation of Chinese Medicine Societies (WFCMS). It aims to promote the standardization of traditional Chinese medicine (TCM) terminology and international communication about TCM. Using the Python pandas package and OCR technology, the data were cleaned, sorted, and merged, and then divided into 56 categories. In total, 16,189 entries were collated and merged into 8,981 terms. The dataset promotes the standardization of TCM terms, facilitates academic communication and the inheritance and development of TCM, and supports the informatization of TCM.
Keywords: Chinese medicine; terminology; Chinese-English