Data Paper Zone II, Version EN1, Vol. 3 (3), 2018
Database of quantum chemical calculation results based on compounds molecular structure
Dates: 2018-06-08; 2018-08-02 (released); 2018-09-29 (published)
Abstract & Keywords
Abstract: Basic physical property data and thermodynamic data are currently lacking for a large number of compounds of known structure. To improve data integrity and usability in chemistry databases, this study performs geometry optimization, thermodynamic analysis and spectrum calculation for about 200,000 compounds with the Gaussian 03 software, starting from compound structure data and compound profiles. Compound types, together with their thermodynamic parameters and spectral data, are then extracted from the quantum calculation results in accordance with international standards. These include the infrared absorption spectrum, dipole moment, exact polarizability, rotational temperatures and constants, zero-point vibrational energy, zero-point correction, molecular internal energy, molecular kinetic energy, enthalpy, free energy, heat capacity at constant volume, and entropy. Secondary processing (data analysis, mining and duplicate checking) yields quantum chemical calculation data for 18,000 compounds, including spectral data for 5,321 compounds. To improve data (re)usability, all resulting data have been standardized and processed.
Keywords: compound structure; quantum chemical calculation; data analysis and processing; thermodynamic data; spectral data
Dataset Profile
Chinese title: 量化计算结果数据库
English title: A database of quantum chemical calculation results
Data corresponding author: Han Qingzhen (qzhan@ipe.ac.cn)
Data authors: Han Qingzhen, Zhao Yuehong, Wen Hao
Time range: 2015–2017
Data format: *.xls
Data service system: <http://www.dx.doi.org/10.11922/sciencedb.630>
Sources of funding: National Science & Technology Infrastructure Program of China – Fundamental Science Data Sharing Platform (DKA2017-12-02-05); CAS Informatization Program of the Thirteenth Five-Year Plan – "Key Database Construction and Application Services for the Discipline of Chemistry" (XXH1350303-103).
Database composition: The dataset consists of 13 subsets: dipole moment (Debye), exact polarizability, approximate polarizability, rotational temperatures (K), rotational constants (GHz), zero-point vibrational energy (kJ/mol), zero-point correction (Hartree/particle), thermal energy (kJ/mol), thermal enthalpy (kJ/mol), thermal free energy (kJ/mol), total molecular kinetic energy (kJ/mol), Cv (J/(mol·K)) and S (J/(mol·K)). The thermodynamic property data obtained from the quantum calculations are stored in one file, "Quantumdata.xls".
1.   Introduction
With the growing demand for new materials and improved research and development capabilities in recent years, many materials with specific properties have emerged to meet varied requirements. However, because data on the thermodynamic properties of these materials and their derivatives are insufficient, our knowledge of many compounds of known structure remains limited. Under these circumstances, performing quantum chemical calculations on the compounds already included in existing chemical databases, and thereby obtaining their thermodynamic and spectral data, benefits researchers in chemical engineering simulation and molecular material design. A database of quantum chemical calculation results is therefore of considerable value.
Because the data needed for chemical engineering simulation and material design often cannot be obtained from the existing literature or through experiments, we adopted reliable quantum chemical simulation methods. On the one hand, the resulting data can be further tested by future calculations. On the other hand, they can be used in chemical engineering simulation, molecular design, water pollution treatment, air purification and other applications. The outcome is a query-oriented database that serves users free of charge. The database will gradually be developed into a full data management and information service system offering web-based services and information queries, and it will be integrated into ChemDB as an affiliated sub-library.
2.   Data collection and processing
2.1   Data pre-processing
The research objects of this study mainly consisted of traffic nodes and traffic lines. Historical documents such as Shiji⁷, Hanshu⁸, Houhanshu⁹ and Parthian Stations¹⁰ provided important sources for identifying the historical names of the traffic nodes. These were supplemented by archaeological materials such as the bamboo slips unearthed at Juyan and Dunhuang. Locating these nodes required us to correlate their historical names with the corresponding modern designations. To this end, we consulted historical documents to match each historical site to a metropolis, prefecture seat, county seat, inhabited locality, bridge or pass. These documents included Historical Atlas of China¹¹, Cihai. Geographical Volume: Historical Geography¹², An Atlas of Chinese Cultural Relics¹³⁻¹⁷, and the 3rd National Cultural Relics Survey. The extracted historical sites were then correlated with their modern designations using the latest administrative data, including Administrative Divisions of the People's Republic of China¹⁸ and the Administrative Division Network (http://www.xzqh.org/html/)¹⁹.
The core of traffic route restoration was to determine the direction of each route and the regions it passed through. Route direction was determined from the descriptive texts of historical documents and relevant archaeological discoveries, supplemented by research findings such as A History of the Silk Road Transportation. After the basic data for the traffic nodes and traffic lines were collected, the attribute tables for the traffic nodes were stored in Excel files and the textual descriptions of the traffic routes in Word documents.
2.2   Methods of thermodynamic data calculation
We use Gaussian 03¹ and Gaussian 09² at the B3LYP/6-31G³ level of theory, with temperature T = 298.15 K and pressure P = 101.3 kPa. Structural optimization and frequency calculations are performed on all input Gaussian files to obtain the thermodynamic parameters of all compounds. Batch processing then selects the output files that terminated normally with converged results. From these files we extract each compound's optimized geometrical parameters and complete thermodynamic parameters, including dipole moment, exact polarizability, approximate polarizability, rotational temperatures, rotational constants, zero-point vibrational energy, zero-point correction, molecular internal energy, molecular kinetic energy, enthalpy, free energy, heat capacity at constant volume and entropy. After unit conversion and standardization⁴, the data are compiled into a standard table (Table 1) and stored in the database of quantum chemical calculation results for online access. The flowchart for data calculation and database building is shown in Figure 1.
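The screening and extraction step can be sketched in Python. This is a minimal illustration, not the authors' batch script. It assumes Gaussian 03/09 log files and treats the line "Normal termination of Gaussian" as evidence of normal termination and "Stationary point found." as evidence of a converged optimization. It reads a few of the quantities above by the labels Gaussian prints next to them. The function names is_usable and parse_gaussian_log are hypothetical.

```python
import re

FLOAT = r"(-?\d+\.\d+)"

# Labels as printed in Gaussian 03/09 optimization + frequency output.
PATTERNS = {
    "zero_point_correction_hartree": re.compile(r"Zero-point correction=\s+" + FLOAT),
    "thermal_energy_corr_hartree": re.compile(r"Thermal correction to Energy=\s+" + FLOAT),
    "thermal_enthalpy_corr_hartree": re.compile(r"Thermal correction to Enthalpy=\s+" + FLOAT),
    "thermal_free_energy_corr_hartree": re.compile(
        r"Thermal correction to Gibbs Free Energy=\s+" + FLOAT),
    "zpve_j_per_mol": re.compile(r"Zero-point vibrational energy\s+" + FLOAT),
    "dipole_debye": re.compile(r"Tot=\s+" + FLOAT),
}


def is_usable(log_text: str) -> bool:
    """Keep only jobs that terminated normally after a converged optimization."""
    return ("Normal termination of Gaussian" in log_text
            and "Stationary point found." in log_text)


def parse_gaussian_log(log_text: str) -> dict:
    """Return raw values in Gaussian's own units, or {} for unusable jobs."""
    if not is_usable(log_text):
        return {}
    result = {}
    for key, pattern in PATTERNS.items():
        matches = pattern.findall(log_text)
        if matches:
            # Optimizations repeat some blocks; the last one belongs to the
            # final, optimized geometry.
            result[key] = float(matches[-1])
    rot = re.findall(r"Rotational constants \(GHZ\):(.*)", log_text)
    if rot:
        # Matching FLOAT skips placeholders such as "*****" (linear molecules).
        result["rotational_constants_ghz"] = [float(v) for v in re.findall(FLOAT, rot[-1])]
    return result
```

In a batch run, the function would be applied to every log file in a directory, and jobs returning an empty dictionary would be set aside.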
Table 1   Thermodynamic parameters of the compounds and their units
Thermodynamic parameter | Symbol and definition | Unit
Dipole moment | Dipole moment (μ) | Debye
Exact polarizability | Exact polarizability |
Approx. polarizability | Approx. polarizability |
Rotational temperatures | Rotational temperatures | K
Rotational constants | Rotational constants | GHz
Zero-point vibrational energy | Zero-point vibrational energy | kJ·mol−1
Zero-point correction | Zero-point correction | Hartree/particle
Thermal energies | Thermal energies | kJ·mol−1
Thermal enthalpies | Thermal enthalpies | kJ·mol−1
Thermal free energies | Thermal free energies | kJ·mol−1
Total molecular kinetic energy | Total molecular kinetic energy (298.15 K) | kJ·mol−1
Heat | q or Q | J
Work | w or W | J
Internal energy | | J
Enthalpy | | J
Thermodynamic temperature | T | K
Entropy | | J·K−1
Gibbs free energy | | J
Isobaric heat capacity | | J·K−1
Isochoric heat capacity | | J·mol−1·K−1
Heat capacity ratio | \(\gamma\ (\text{or}\ \kappa) = C_p/C_V\) |
Compression factor | |
Chemical potential | | J·mol−1
Standard chemical potential | | J·mol−1
Standard reaction Gibbs free energy | | J·mol−1
Reaction affinity | | J·mol−1
Standard reaction enthalpy | | J·mol−1
Standard reaction entropy | | J·mol−1·K−1
Equilibrium constant | |


Fig.1   Flowchart showing data calculation and database building
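Gaussian prints its thermochemistry table in kcal/mol and cal/(mol·K), the zero-point vibrational energy in J/mol and the thermal corrections in Hartree. The unit conversion to the units of Table 1 can therefore be sketched as below. The conversion factors are standard: the CODATA 2018 Hartree energy multiplied by the Avogadro constant, and the thermochemical calorie of 4.184 J. The function names and the rounding to five decimals are illustrative assumptions. The rounding is borrowed from the accuracy requirement in Section 4.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996395  # E_h * N_A (CODATA 2018), kJ/mol
CAL_TO_J = 4.184                      # thermochemical calorie, J
DECIMALS = 5                          # assumed rounding for stored values


def hartree_to_kj_per_mol(value_hartree: float) -> float:
    return round(value_hartree * HARTREE_TO_KJ_PER_MOL, DECIMALS)


def cal_to_joule(value: float) -> float:
    """Calorie-based to joule-based: kcal/mol -> kJ/mol, cal/(mol K) -> J/(mol K)."""
    return round(value * CAL_TO_J, DECIMALS)


def j_to_kj(value_j: float) -> float:
    """E.g. Gaussian's zero-point vibrational energy (J/mol) -> kJ/mol."""
    return round(value_j / 1000.0, DECIMALS)


def thermochemistry_row_to_table1(e_thermal_kcal: float, cv_cal: float,
                                  s_cal: float) -> dict:
    """Convert the 'Total' row of Gaussian's E/CV/S table to Table 1 units."""
    return {
        "Thermal Energies (kJ/mol)": cal_to_joule(e_thermal_kcal),
        "CV (J/(mol K))": cal_to_joule(cv_cal),
        "S (J/(mol K))": cal_to_joule(s_cal),
    }
```

Zero-point correction is kept in Hartree/particle in Table 1, so it passes through unchanged in this sketch.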
2.3   Methods of optical spectrum calculation
Spectrum calculations are performed on the converted input files of the compounds to obtain optimized geometries, complete frequency analysis data and spectral data, as shown in Table 2. Batch processing selects the output files that terminated normally with converged results and extracts the corresponding compounds. GaussSum 2.2 is then used to extract the vibrational frequencies of the compounds and the corresponding infrared absorption data. Finally, Gnuplot is used to plot all related figures, which are packed into a compressed file named “Spectum.zip”. The results are also stored in the database of quantum chemical calculation results for online access.
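Turning the stick spectrum (frequencies and IR intensities) into a plottable curve amounts to convolving each line with a line-shape function. The sketch below convolves the lines with area-normalized Lorentzians on a regular wavenumber grid and writes a two-column file that Gnuplot can draw. Several choices are assumptions for illustration and are not GaussSum's own defaults: the Lorentzian line shape, the 10 cm−1 full width at half maximum, the 400–4000 cm−1 grid, and the function names.

```python
import math


def lorentzian(x: float, center: float, fwhm: float) -> float:
    """Area-normalized Lorentzian line shape."""
    half = fwhm / 2.0
    return (half / math.pi) / ((x - center) ** 2 + half ** 2)


def broaden_ir(frequencies, intensities, fwhm=10.0,
               start=400.0, stop=4000.0, step=1.0):
    """Sum one Lorentzian per IR line (cm^-1, km/mol) on a regular grid."""
    n_points = int(round((stop - start) / step)) + 1
    grid = [start + i * step for i in range(n_points)]
    spectrum = [
        sum(inten * lorentzian(x, freq, fwhm)
            for freq, inten in zip(frequencies, intensities))
        for x in grid
    ]
    return grid, spectrum


def write_gnuplot_data(path: str, grid, spectrum) -> None:
    """Two whitespace-separated columns: wavenumber and intensity."""
    with open(path, "w", encoding="ascii") as fh:
        for x, y in zip(grid, spectrum):
            fh.write(f"{x:.1f} {y:.6e}\n")
```

The resulting file could then be drawn in Gnuplot with a command such as plot "spectrum.dat" using 1:2 with lines. Because each Lorentzian is area-normalized, the area under each band stays close to the computed intensity, apart from the small tails cut off at the grid limits.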
Table 2   Spectral data of the compounds and their units
Spectrum parameter | Symbol and definition | SI unit
Wavelength | λ | m
Refractive index | n |
Frequency | ν | Hz
Angular frequency | ω = 2πν | s−1, rad·s−1
Wavenumber (in vacuum) | \(\tilde{\nu} = \nu/c_0 = 1/(n\lambda)\) | m−1
Wavenumber (in a medium) | \(\sigma = 1/\lambda\) | m−1
Planck constant | h | J·s
Absorption ratio/factor | |
Absorbance | |
Transition wavenumber | | m−1
Transition frequency | | Hz
Electronic term | | m−1
Vibrational term | | m−1
Rotational term | | m−1
Rotational constants (wavenumber) | \(\tilde{A}, \tilde{B}, \tilde{C}\); \(\tilde{A} = h/(8\pi^{2} c I_A)\) | m−1
Rotational constants (frequency) | \(A, B, C\); \(A = h/(8\pi^{2} I_A)\) | Hz
Asymmetry parameter | |
Harmonic vibration wavenumber | \(\omega_e;\ \omega_r\) | m−1
Molecular electric dipole moment | \(\boldsymbol{p}\ (\text{or}\ \boldsymbol{\mu})\), \(E_p = -\boldsymbol{p}\cdot\boldsymbol{E}\) | C·m
Molecular magnetic dipole moment | \(\boldsymbol{m}\ (\text{or}\ \boldsymbol{\mu})\), \(E_p = -\boldsymbol{m}\cdot\boldsymbol{B}\) | J·T−1
Molecular transition dipole moment | \(\boldsymbol{M}\ (\text{or}\ \boldsymbol{R}) = \int \psi'^{*}\,\boldsymbol{p}\,\psi''\,\mathrm{d}\tau\) | C·m
Chemical shift | |
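Several of these quantities are linked by simple relations, which also connect the rotational constants (GHz) and rotational temperatures (K) listed in Table 1. A rotational constant B in frequency units has the wavenumber form B/c0 and corresponds to the rotational temperature hB/kB. The standard relation A = h/(8π²I_A) links it to the moment of inertia. The sketch below uses the exact SI values of h, kB and c0. The function names are hypothetical.

```python
import math

H = 6.62607015e-34    # Planck constant, J s (exact)
K_B = 1.380649e-23    # Boltzmann constant, J/K (exact)
C0 = 299792458.0      # speed of light in vacuum, m/s (exact)


def rotational_temperature_k(b_ghz: float) -> float:
    """Theta_rot = h B / k_B, with B given in GHz."""
    return H * b_ghz * 1e9 / K_B


def rotational_wavenumber_per_m(b_ghz: float) -> float:
    """B~ = B / c0 in m^-1 (divide by 100 for cm^-1)."""
    return b_ghz * 1e9 / C0


def rotational_constant_hz(inertia_kg_m2: float) -> float:
    """A = h / (8 pi^2 I_A) in Hz, for a principal moment of inertia I_A."""
    return H / (8 * math.pi ** 2 * inertia_kg_m2)
```

For a rotational constant of 5.23955 GHz, this gives a rotational temperature of about 0.2515 K and a wavenumber of about 17.48 m−1 (0.1748 cm−1). Comparing such values with the paired columns of the output is a cheap consistency check.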
2.4   Database use facilitation
The database of quantum chemical calculation results is integrated into ChemDB as a sub-library, which offers several ways to query the data and provides global online access. For consistency with ChemDB, the database uses the same compound labels: ID, CAS RN, InChIKey and SRN. Table 3 describes how each label is retrieved and restored.
Table 3   Retrieval and restoration of compound labels
Label | Retrieval and restoration method
ID | The compound ID is derived from the compound's CAS RN after check-digit verification. Compounds whose CAS RN cannot be determined are assigned an ID of the form “B” + serial number, e.g., “B2000166”.
CAS RN | The IDs and CAS RNs in the chemical database are collected from various data sources in the literature and then verified against their check digits.
InChIKey | The InChIKey of each compound is generated with InChI Software Version 1.02, released by the International Union of Pure and Applied Chemistry (IUPAC).
SRN | The SRN is generated by the compound structure registration system of the compound reference library. It is a decimal integer consisting of a main part and a check digit.¹
Note: The SRN check digit is generated with the MOD 11 check-character system of the ISO 7064:1983 standard.
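The check-digit verification applied to CAS RNs follows the published CAS rule. The digits before the check digit are weighted 1, 2, 3, ... from the right, and the weighted sum modulo 10 must equal the check digit. The sketch below implements that rule together with the “B” + serial-number fallback for IDs. The text does not describe how serial numbers are allocated, so assign_compound_id and its serial argument are hypothetical. The SRN check code is not reproduced here because the text does not say which ISO 7064 MOD 11 variant is used.

```python
import re
from typing import Optional

CAS_RN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_rn_is_valid(cas_rn: str) -> bool:
    """Verify a CAS RN's format and check digit."""
    match = CAS_RN.match(cas_rn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight 1 for the rightmost digit, 2 for the next one, and so on.
    weighted = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return weighted % 10 == check


def assign_compound_id(cas_rn: Optional[str], serial: int) -> str:
    """Hypothetical: use a verified CAS RN as the ID, else "B" + serial."""
    if cas_rn and cas_rn_is_valid(cas_rn):
        return cas_rn.strip()
    return f"B{serial}"
```

For benzaldehyde (100-52-7) the weighted sum is 2·1 + 5·2 + 0·3 + 0·4 + 1·5 = 17, whose last digit matches the check digit 7.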
3.   Description of data samples
To date, more than 200,000 compounds have been calculated, yielding thermodynamic data for 25,000 compounds. Quantum chemical calculation results for about 18,000 compounds have been added to the database, and the amount of data will continue to grow as more compounds are calculated.
The elements of the database were determined by analyzing the contents of the calculation results, and they serve as the basis for designing the database structure. The index structure of the database lists the required elements in the order confirmed by this targeted analysis. The index structure is described in Table 4.
Table 4   Index structure of the database
Index | Sample | Note
OseChemX C.20110210.111413.257D
8 8 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.2124 0.7000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.4249 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.6373 0.7000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.6373 2.1000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.4249 2.8000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.2124 2.1000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2124 0.7000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 2 0
4 5 1 0
5 6 2 0
6 7 1 0
2 7 2 0
1 8 2 0
M END
Compound English name | benzaldehyde | Optional
Compound Chinese name | benzaldehyde | Mandatory
Compound formula | C7H6O | Optional
CAS RN | 100-52-7 | Optional
Internal number | E-PN-001 | Mandatory
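The compound formula field can be cross-checked against the structure record. The sketch below reads a V2000 molfile: three header lines, the counts line, then the fixed-width atom and bond blocks. It adds implicit hydrogens from default valences and writes the formula in Hill order. It assumes neutral atoms and bond orders 1 to 3, and it ignores charges, radicals, aromatic bond type 4 and the properties block. The function name formula_from_molfile and the valence table are hypothetical. Applied to the benzaldehyde structure in Table 4, placed under a standard three-line header, it returns C7H6O, which matches the formula field.

```python
from collections import Counter

# Default valences of neutral atoms (simplifying assumption).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "P": 3,
                   "F": 1, "Cl": 1, "Br": 1, "I": 1}


def formula_from_molfile(molfile: str) -> str:
    lines = molfile.splitlines()
    counts_line = lines[3]                        # after the 3 header lines
    n_atoms, n_bonds = int(counts_line[0:3]), int(counts_line[3:6])
    atom_block = lines[4:4 + n_atoms]
    bond_block = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # V2000 atom lines: x, y, z in 10-character fields, symbol in columns 32-34.
    symbols = [line[31:34].strip() for line in atom_block]
    bonded = [0] * n_atoms                        # sum of bond orders per atom
    for line in bond_block:
        a1, a2, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        bonded[a1 - 1] += order
        bonded[a2 - 1] += order

    elements = Counter(symbols)
    # Implicit hydrogens fill each atom up to its default valence.
    elements["H"] += sum(max(DEFAULT_VALENCE.get(s, 0) - b, 0)
                         for s, b in zip(symbols, bonded))

    present = [e for e, n in elements.items() if n > 0]
    if "C" in present:                            # Hill order: C, H, then A-Z
        order = (["C"] + (["H"] if "H" in present else [])
                 + sorted(e for e in present if e not in ("C", "H")))
    else:
        order = sorted(present)
    return "".join(e + (str(elements[e]) if elements[e] > 1 else "") for e in order)
```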
The table of required elements is intended mainly to standardize and structure the data fields. It serves as the basis for analyzing and designing the database structure and as a reference for program writing (Table 5).
The table of required elements is described as follows:
a) Element Chinese name: the Chinese name of the data field in the database.
b) Element English name: the English name of the data field in the database.
c) Data type: numerical data are stored as int (or float) fields, long textual data as text fields, and other data as varchar fields.
d) Size: the storage space required by the element, in bytes. An English letter or Arabic numeral occupies one byte, and a Chinese character occupies two bytes.
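The size rule in d) corresponds to a double-byte encoding such as GBK: one byte per English letter or Arabic numeral, and two bytes per Chinese character. The text does not name the encoding, so GBK is an assumption here. Under UTF-8, common Chinese characters take three bytes each. A minimal check of whether a value fits a field size, with hypothetical function names:

```python
def storage_size_bytes(value: str, encoding: str = "gbk") -> int:
    """Number of bytes the value occupies under the given encoding."""
    return len(value.encode(encoding))


def fits_field(value: str, size_bytes: int, encoding: str = "gbk") -> bool:
    """True if the encoded value fits a field of size_bytes bytes."""
    return storage_size_bytes(value, encoding) <= size_bytes
```

Under this assumption, a 100-byte Chinese-name field holds at most 50 Chinese characters.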
Table 5   Required elements
Element Chinese name | Element English name | Data type | Size (bytes)
化合物结构 | Str_compound | Image |
化合物InChI_Key | InChI key | varchar | 27
化合物InChI码 | InChI | text |
分子式 | Formula | varchar | 100
中文名称 | Name_CN | varchar | 100
英文名称 | Name_EN | varchar | 150
中文别名 | OthName_CN | text |
英文别名 | OthName_EN | text |
化合物CAS登录号 | CASRN | int | 10
内部id | ID | varchar | 25
分子量 | Mol_weight | float |
吉布斯自由能 | G | float |
偶极矩 | Dipole | float |
绝对极化率 | Exact Polarizability | float |
相对极化率 | Approx Polarizability | float |
转动温度 | Rotational Temperatures | float |
转动常数 | Rotational Constants | float |
零点振动能 | Zero-point Vibrational Energy | float |
零点校正 | Zero-point Correction | float |
分子热能 | Thermal Energies | float |
分子反应焓 | Thermal Enthalpies | float |
分子反应自由能 | Thermal Free Energies | float |
分子总动能 | Total Molecular Kinetic Energy (298.15 K) | float |
热力学温度 | T | float |
 | S | float |
等容热容 | Cv | float |
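As a reference for program writing, the required elements can be turned into a table definition. The sketch below uses Python's built-in sqlite3 module. The choice of SQLite, the table name quantum_data and the mapping of "Image" to BLOB are assumptions. Column names are taken verbatim from the English-name column and quoted, because some of them contain spaces. SQLite does not enforce VARCHAR lengths, so the sizes serve only as documentation here.

```python
import sqlite3

# (column name from Table 5, SQL type); sizes become VARCHAR(n).
DESCRIPTIVE_COLUMNS = [
    ("Str_compound", "BLOB"),        # Table 5 type "Image"
    ("InChI key", "VARCHAR(27)"),
    ("InChI", "TEXT"),
    ("Formula", "VARCHAR(100)"),
    ("Name_CN", "VARCHAR(100)"),
    ("Name_EN", "VARCHAR(150)"),
    ("OthName_CN", "TEXT"),
    ("OthName_EN", "TEXT"),
    ("CASRN", "INT"),
    ("ID", "VARCHAR(25)"),
]
FLOAT_COLUMNS = [
    "Mol_weight", "G", "Dipole", "Exact Polarizability", "Approx Polarizability",
    "Rotational Temperatures", "Rotational Constants",
    "Zero-point Vibrational Energy", "Zero-point Correction", "Thermal Energies",
    "Thermal Enthalpies", "Thermal Free Energies",
    "Total Molecular Kinetic Energy (298.15 K)", "T", "S", "Cv",
]
COLUMNS = DESCRIPTIVE_COLUMNS + [(name, "FLOAT") for name in FLOAT_COLUMNS]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical quantum_data table with all 26 elements."""
    columns_sql = ", ".join(f'"{name}" {sql_type}' for name, sql_type in COLUMNS)
    conn.execute(f"CREATE TABLE quantum_data ({columns_sql})")
```

An int column cannot hold the hyphens of a CAS RN such as 100-52-7. Storing CASRN as int, as Table 5 specifies, therefore implies dropping them (100527) or using some other encoding; the text does not say which.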
4.   Data quality control and assessment
Database construction rests on the reliability of the collected data. Mature, validated theoretical methods and calculation models are used for the quantum chemical calculations on the compound structures to ensure this reliability. During data collection, calculation results that were obviously wrong or inconsistent with the basic laws of physical chemistry were eliminated. Data are input, updated and processed in batches about every six months, while manual input is carried out monthly. The database sets the following requirements on its data. First, time range: each compound structure must reach convergence within 100 hours of calculation. Second, discipline scope: the compounds come mainly from chemistry subject databases. Third, data amount: about 18,000 compounds have been accumulated so far, and this number will keep growing as the server calculates new data sources. Fourth, data accuracy: usually 5 decimal places are retained. Fifth, language: English and Chinese. Sixth, data type: the basic data types include text, numbers, images and custom binary formats.
At present, the input and output data types of the database include text, numerical, image and custom binary formats, and both data types and values fall within the normative range of the database system. The optimized quantum chemical calculation results are filtered, extracted and then sorted in Excel. To ensure data quality, typographical and type errors have been checked and eliminated, and data formats have been converted where appropriate.
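The removal of results that contradict basic physical chemistry can be illustrated with a few generic consistency rules. The rules below are assumptions chosen for illustration, not the authors' documented criteria:
a) an optimized minimum has no imaginary frequencies;
b) entropy and Cv are positive;
c) G = H − TS holds at T = 298.15 K within a tolerance, with H and G on the same basis (both thermal corrections, or both including the electronic energy).
The five-decimal rounding follows the accuracy requirement above. The function names are hypothetical.

```python
T = 298.15      # K, temperature used in the calculations
DECIMALS = 5    # decimal places retained in the database


def passes_consistency_checks(h_kj: float, g_kj: float, s_j_per_k: float,
                              cv_j_per_k: float, n_imaginary: int,
                              tol_kj: float = 0.01) -> bool:
    """Hypothetical screening rules for one compound's thermal data.

    h_kj, g_kj: thermal enthalpy and free energy, kJ/mol
    s_j_per_k, cv_j_per_k: entropy and Cv, J/(mol K)
    n_imaginary: number of imaginary frequencies from the frequency job
    """
    if n_imaginary != 0:            # an optimized minimum has none
        return False
    if s_j_per_k <= 0 or cv_j_per_k <= 0:
        return False
    # G = H - T S, with S converted from J to kJ.
    return abs(g_kj - (h_kj - T * s_j_per_k / 1000.0)) <= tol_kj


def standardize(value: float) -> float:
    """Round to the retained number of decimal places."""
    return round(value, DECIMALS)
```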
5.   Data usage and recommendation
The quantum chemical calculation results are described in 13 categories: dipole moment, exact polarizability, approximate polarizability, rotational temperatures, rotational constants, zero-point vibrational energy, zero-point correction, thermal energies, thermal enthalpies, thermal free energies, total molecular kinetic energy, Cv and entropy. The database also provides infrared spectra for some of the structures. Users can query the corresponding thermodynamic calculation data by entering a compound's CAS RN, structure, SRN or InChIKey. This spares them the tedious work of building molecular structure models, running calculations and extracting results. The database thereby supports the evaluation, analysis, structural design, and chemical and industrial application of new materials.
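Users may enter a CAS RN, an SRN or an InChIKey, so a text-query front end first has to decide which label it received. The sketch below does this with regular expressions derived from the label formats described in Section 2.4: a hyphenated CAS RN, a 27-character InChIKey, “B” + serial IDs and all-digit SRNs. The patterns, the "name" fallback and the function name are assumptions. Structure queries would go through a separate path.

```python
import re

# The patterns are mutually exclusive, so their order does not matter.
QUERY_PATTERNS = [
    ("CASRN", re.compile(r"^\d{2,7}-\d{2}-\d$")),
    ("InChIKey", re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")),
    ("ID", re.compile(r"^B\d+$")),
    ("SRN", re.compile(r"^\d+$")),
]


def classify_query(query: str) -> str:
    """Guess which compound label a user typed; fall back to a name search."""
    text = query.strip()
    for label, pattern in QUERY_PATTERNS:
        if pattern.match(text):
            return label
    return "name"
```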
References
1. Frisch MJ, Trucks GW, Schlegel HB et al. Gaussian 03, Revision B.03. Pittsburgh, PA: Gaussian Inc, 2003.
2. Frisch MJ, Trucks GW, Schlegel HB et al. Gaussian 09, Revision A.02. Pittsburgh, PA: Gaussian Inc, 2009.
3. Becke AD. Density-functional thermochemistry. III. The role of exact exchange. Journal of Chemical Physics 98 (1993): 5648–5652.
4. Jiang L. The Chemical Data Sources Integration Research Based on Ontology of ChDR. Master's Thesis, University of Chinese Academy of Sciences (Beijing), 2015.
Data citation
1. Han Q, Zhao Y & Wen H. A database of quantum chemical calculation results. Science Data Bank. DOI: 10.11922/sciencedb.630 (2018).
Article and author information
How to cite this article
Han Q, Zhao Y & Wen H. Database of quantum chemical calculation results based on compounds molecular structure. China Scientific Data 3(2018). DOI: 10.11922/csdata.2018.0037.zh
Han Qingzhen
database creation; calculation, analysis, update and maintenance of the quantum data.
qzhan@ipe.ac.cn
PhD, Associate Professor; research area: computational chemistry and chemical engineering.
Zhao Yuehong
database creation, operation and maintenance.
PhD, Associate Professor; research area: computational chemistry and chemical engineering.
Wen Hao
database creation and development.
PhD, Professor; research area: computational chemistry and chemical engineering.
Publication records
Published: Sept. 29, 2018 (Version EN1)
Released: Aug. 2, 2018 (Version ZH2)
Published: Sept. 29, 2018 (Version ZH3)