Abstract: Because spatial data on industrial output value are lacking, risk and disaster assessment for the industrial economy cannot effectively address global change. Against this background, we developed a new method to spatialize industrial output value by combining DMSP/OLS (Defense Meteorological Satellite Program / Operational Linescan System) nighttime light data, MODIS (Moderate Resolution Imaging Spectroradiometer) annual vegetation data, industrial land distribution data, and urbanization rate, and used it to build a 1-km grid dataset of industrial output value in China. The major steps for creating this dataset were: (1) preprocess raw data and select stable light data; (2) construct an Enhanced Vegetation Index (EVI)-adjusted nighttime light index (EANTLI); (3) obtain an optimum light index based on industrial land distribution data; (4) construct a spatial distribution model of industrial output value; (5) verify data accuracy. We randomly selected 105 cities nationwide to assess the accuracy of the dataset. The relative error across all samples ranged from 0% to 39.6%, the relative error of most samples was less than 15%, and the average accuracy of the dataset was 81.40%. The dataset breaks new ground in distinguishing the spatial data of industrial output from those of service output. Unconstrained by administrative boundaries, it directly reflects the spatial and temporal distribution of industrial output value in China and can be used to identify key industrial areas or to discern changing industrial trends.
Keywords: industrial output value; China; DMSP/OLS; spatialization
|Chinese title||2010 年中国工业产值公里格网数据集|
|English title||A 1-km grid dataset of industrial output value in China (2010)|
|Data authors||Xue Qian, Song Wei, Zhu Huiyi|
|Data corresponding author||Song Wei (email@example.com)|
|Geographical scope||Mainland China|
|Spatial resolution||1000 m|
|Data volume||15.1 MB|
|Data format||*.tif ,*.shp, *.mxd, *.xlsx|
|Data service system||<http://www.sciencedb.cn/dataSet/handle/551>|
|Sources of funding||National Key Research and Development Program – Global Change and Mitigation Project: Global change risk of population and economic system: mechanism and assessment (Grant No. 2016YFA0602402)|
|Dataset composition||The dataset is composed of three subsets.|
(1) “1 km industrial output value grid data set of China (2010).gab.zip” stores the administrative boundary data of mainland China and the spatial distribution data of its industrial output value in 2010. The spatial distribution data are on a 1-km grid, totaling a data volume of 13.8 MB.
(2) “Accuracy verification.xls” stores the statistical industrial output values of 90 randomly sampled cities in China and their assessed accuracy. The data volume is 23 KB.
(3) “1 km industrial output value grid data set of China (2010).mxd” is an ArcGIS map document, with a data volume of 730 KB.
Risk analysis in the social and economic system is an important component of climate change studies.1 In industrial systems, climate change indirectly affects processes such as raw material storage, processing, and transportation through shifts in mean climate conditions,2 and extreme climate events may also damage each link of industrial production and harm the staff involved (e.g., through infrastructure damage and casualties).3–4 Accurate assessment of the risks and losses that climate change causes in an industrial economic system depends on an exposure analysis of the system,5 and spatial distribution mapping of industrial output provides important basic data for that analysis.
In recent years, considerable progress has been made in mapping the spatial distribution of gross domestic product (GDP) worldwide.6 However, with conventional remote sensing approaches, it is difficult to distinguish accurately in space between the output values of secondary and tertiary industries. Grid data of industrial output are hence relatively scarce in China and globally. Existing spatial mappings of industrial output are largely at the provincial, municipal, or county level,7 taking the administrative region as the mapping unit, and so cannot characterize the spatial distribution of industrial output within a province or city. As a result, it is difficult in risk assessment to overlay these industrial output mappings on relevant climate grid data. While spatialized data sets exist for the output value of certain specific industries,8–9 few are large-scale (nationwide), high-resolution, and comprehensive. In this context, we conducted remote sensing inversion of the industrial output of China in 2010 based on a combination of DMSP/OLS nighttime light data, provincial statistical data of industrial output, and MODIS vegetation index products. The results were then modified using the spatial distribution data of industrial land, thus forming the 1-km grid data set of industrial output in China. This dataset provides basic data for risk and disaster assessment in the industrial economic system against the background of global climate change.
Establishing this dataset involved the following steps: data collection; preprocessing of nighttime light data and vegetation index data; construction of an EVI-adjusted nighttime light index (EANTLI); modification of the light index using the spatial distribution data of industrial land; establishment of the distribution model; and data accuracy verification (Figure 1).
2.1 Data sources
“Industry” here is defined in accordance with the statistical standard of the World Bank, which covers mining, manufacturing, construction, and the production and supply of electricity, gas, and water. The World Bank’s definition differs from that of China’s National Bureau of Statistics in that the former incorporates construction into the scope of industry. Hence, industrial output in this study is the sum of China’s industrial and construction outputs in 2010. These output data were obtained from the China Statistical Yearbook released by China’s National Bureau of Statistics and from the statistical yearbooks of the respective provinces (http://www.stats.gov.cn/tjsj/ndsj/).
For nighttime light data, we selected the nighttime mean light product obtained through imaging by DMSP/OLS (an American meteorological satellite system), which is not radiometrically calibrated. The light intensity changes recorded by this product reflect geographic entity information. In particular, its stable light data are often used for urban region extraction10–11 and socio-economic data inversion.12,13 We therefore selected the stable light data (https://ngdc.noaa.gov) with a spatial resolution of 1 km, acquired in 2010. The grey value of each pixel (digital number, DN) ranges from 0 to 63. Saturation and overflow in DMSP/OLS data weaken the correlation between light data and socio-economic data,14 thus affecting the inversion accuracy. As previous studies have shown a good correlation between vegetation index and light intensity change,15 vegetation index can be used to reduce the impact of saturation and overflow on socio-economic data inversion. We selected the enhanced vegetation index (EVI) from the composite vegetation index product (MOD13A3, with a spatial resolution of 1 km) (http://ladsweb.nascom.nasa.gov/data) to correct saturation and overflow in the nighttime light data.
In addition, we selected the spatial distribution data of industrial land in 2010 from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (http://www.resdc.cn) to modify the EANTLI index.
2.2 Data processing
The correlation between nighttime light data and vegetation index was used to reduce saturation and overflow in nighttime light data and to establish an optimal light index. The light index was then modified using the spatial distribution data of industrial land to produce optimal light data for industrial land, which were combined with industrial output data to establish a distribution model. To correct output value underestimation in western China caused by the region's lower industrial output and scattered distribution of industrial land, we used urbanization rate to modify the weaker light values in Xinjiang, Tibet, Qinghai, and Yunnan, and thereby built the 1-km grid data set of China's industrial output in 2010. The specific steps of data processing are as follows:
(1) Nighttime light data were preprocessed through administrative boundary clipping, binarization, and normalization, during which light intensity values were extracted and normalized light intensity values were calculated.
(2) MODIS vegetation index data were processed. EVI values of China from January to December in 2010 were extracted from the MOD13A3 product, which then underwent preliminary clipping, mosaicking and projection.
(3) EVI data were processed. Mean values of EVI were calculated. Considering monthly variations of the nighttime light data, the mean value data of EVI for the 12 months were selected. Regions with EVI values less than 0.01, namely water areas and bare rocks, were also excluded.
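(4) An optimal light index was established via the following formula:14

$$\mathrm{EANTLI} = \frac{1 + (\mathrm{NTL}_n - \mathrm{EVI}_i)}{1 - (\mathrm{NTL}_n - \mathrm{EVI}_i)} \times \mathrm{NTL}$$

where EANTLI is the optimal light index, NTLn is the normalized light intensity, EVIi is the processed EVI data, and NTL is the original light intensity.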
By greatly eliminating light saturation and overflow, EANTLI helps highlight the light intensity variations within the city, which is conducive to economic data inversion.
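As a rough illustration of Steps 1 to 4, the sketch below computes the index for individual pixels. It assumes the EANTLI form published for the EVI-based saturation correction cited above, EANTLI = (1 + (NTLn − EVI)) / (1 − (NTLn − EVI)) × NTL, with NTLn taken as DN divided by its maximum of 63 and pixels with mean EVI below 0.01 excluded. The function name, constants, and sample values are illustrative and are not taken from the dataset's processing files.

```python
from typing import Optional

DN_MAX = 63.0   # largest digital number in DMSP/OLS stable-light data
EVI_MIN = 0.01  # pixels below this mean EVI (water, bare rock) are excluded


def eantli(ntl: float, evi_mean: float) -> Optional[float]:
    """Return the EVI-adjusted light index for one pixel, or None if excluded.

    Assumed form: (1 + (NTLn - EVI)) / (1 - (NTLn - EVI)) * NTL,
    where NTLn = NTL / 63 is the normalized light intensity.
    """
    if evi_mean < EVI_MIN:
        return None  # masked out, as in Step 3
    ntl_n = ntl / DN_MAX
    diff = ntl_n - evi_mean  # at most 0.99 because evi_mean >= 0.01
    return (1.0 + diff) / (1.0 - diff) * ntl


# Two saturated pixels (DN = 63) that the raw data cannot tell apart.
core = eantli(63, 0.1)    # sparse vegetation, e.g. a dense urban core
fringe = eantli(63, 0.4)  # greener pixel, e.g. an urban fringe
water = eantli(12, 0.005) # excluded pixel
```

Both sample pixels have the same raw DN, yet the less vegetated one receives the larger index, which is how the adjustment restores contrast inside saturated urban areas.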
(5) The EANTLI index data obtained in Step 4 were modified using the spatial distribution data of industrial land. Because this study assumed that industrial output exists only on industrial land, we extracted the light values of industrial land from the EANTLI index data based on industrial land use data, thereby obtaining an optimal light index for industrial land, EI.
(6) Based on China Statistical Yearbook released by China’s National Bureau of Statistics,16 the industrial output data of each province (2010) were calculated. A regression analysis was then performed on the industrial output data and the optimal light index to establish a distribution model:
where I represents the industrial output of each grid, Ii represents the industrial output of a specific province, EIi represents the optimal light index for industrial land of a specific province.
Through this step, the 1-km grid data of China’s industrial output were initially obtained.
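One way to picture Step 6 is as a proportional allocation. The sketch below assumes that each province's statistical output is distributed over its industrial-land cells in proportion to each cell's share of the province's summed EI. This is an assumption made for illustration, not the authors' fitted regression; the function name and sample numbers are hypothetical.

```python
from typing import List


def allocate_output(province_total: float, cell_ei: List[float]) -> List[float]:
    """Split a provincial industrial output across cells in proportion to EI."""
    ei_sum = sum(cell_ei)
    if ei_sum <= 0:
        raise ValueError("province has no lit industrial land to allocate to")
    return [province_total * ei / ei_sum for ei in cell_ei]


# Hypothetical province with four industrial-land cells; the first is unlit.
cells = allocate_output(1000.0, [0.0, 50.0, 150.0, 300.0])
```

Under this assumption the cell values always sum back to the provincial statistic, so the gridded data stay consistent with the yearbook total at the provincial level.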
(7) Data validation in Step 6 showed that, owing to lower industrial output and more scattered industrial land, industrial output values were significantly underestimated in Xinjiang, Qinghai, Tibet, and Yunnan. To correct the underestimation, we used supplementary data, namely the land urbanization rates of these four provinces, which are significantly correlated with industrial output on the municipal scale (at the 0.01 significance level), to establish a multivariate linear regression model to modify the industrial output data. The modification model is as follows:

$$I_g = a I_n + b U_n + c$$

where Ig represents modified industrial output, In represents industrial output of the n-th grid obtained in Step 6, Un represents urbanization rate of the n-th grid, a and b are the coefficients of In and Un respectively, and c is a constant.
(8) Industrial output data of each city were validated against the municipal statistical data. The data set was treated as final only once its accuracy was verified; otherwise, the distribution model was reestablished.
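The correction in Step 7 is a linear model with two predictors, so its coefficients can be estimated by ordinary least squares. The sketch below fits a, b, and c from the normal equations using only the standard library. It assumes the model is fitted against statistical output values; the helper names and the synthetic sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_correction(
    i_n: Sequence[float], u_n: Sequence[float], observed: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of observed = a * I_n + b * U_n + c."""
    rows = [(x, u, 1.0) for x, u in zip(i_n, u_n)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    xty = [sum(r[p] * y for r, y in zip(rows, observed)) for p in range(3)]
    a, b, c = solve3(xtx, xty)
    return a, b, c


# Synthetic data generated from known coefficients (a=1.5, b=20, c=3).
i_vals = [10.0, 20.0, 35.0, 50.0, 80.0]   # initial output estimates
u_vals = [0.2, 0.5, 0.3, 0.7, 0.6]        # urbanization rates
stats = [1.5 * i + 20.0 * u + 3.0 for i, u in zip(i_vals, u_vals)]
a, b, c = fit_correction(i_vals, u_vals, stats)
corrected = [a * i + b * u + c for i, u in zip(i_vals, u_vals)]
```

Because the sample data are noise-free, the fit recovers the generating coefficients; with real data the residuals would indicate how much of the underestimation the urbanization rate explains.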
Through data processing, we obtained the 1-km grid data set of China’s industrial output in 2010. This data set clearly reflected spatial distribution characteristics of the industrial output by kilometers. Due to their geographical and policy advantages, China’s coastal regions showed higher industrial output values, especially the Yangtze River delta, the Pearl River delta and the Bohai Rim (Figure 2). By contrast, located in the hinterlands with poorer transportation conditions, northwestern China revealed significantly lower industrial output values, as well as a more scattered distribution of the values.
In the data validation process, we carried out a correlation analysis between the provincial optimal light data for industrial land and the industrial output data. The two data groups were significantly correlated at the 0.01 significance level, with a correlation coefficient of 0.72. We also randomly selected 105 municipal administrative units (Figure 3), computed statistics of the 1-km grid industrial output data for each of these cities using the statistical tools in ArcGIS, and compared the results with the industrial output data calculated from the municipal statistical yearbooks. Overall, the average accuracy reached 81.40% (Figure 4). To provide references for in-depth analysis of other industries, we further selected cities of different industry types from the 105 validation samples for accuracy comparison. According to the classification criteria in the Sustainable Development Planning of China’s National Resource-based Cities (2013 – 2020) and other relevant literature,17,18 we selected 11 resource-based cities, 10 comprehensive cities, and 17 industry-oriented cities (Table 1). The average accuracy was 82.27% for the 11 resource-based cities (82.75% for the seven coal-based cities, 94.08% for the non-ferrous metal-based city, and 77.22% for the three oil-based cities), 82.34% for the 10 comprehensive cities, and 77.25% for the 17 industry-oriented cities. On the whole, underdeveloped regions showed slightly lower inversion accuracy for industrial output, while developed regions such as the Yangtze River delta, the Pearl River delta, and the Bohai Rim had relatively higher accuracy.
Lower accuracy in the western regions can be attributed mainly to their weak light intensity, caused by a lower level of economic development, and to smaller internal differences in economic development. To address this problem, we modified the industrial output data of Xinjiang, Tibet, Qinghai, and Yunnan using the urbanization rate data of these provinces, which correlated well with industrial output values. After modification, the accuracy increased from 62.4% to 74.5% (a gain of 12.1 percentage points).
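The validation figures can be checked with a few lines of arithmetic. The sketch below assumes that the accuracy of a city is 100% minus the absolute relative error of the gridded total against the yearbook value; the text does not state the formula, so this definition is an assumption. It also shows that weighting the reported subgroup accuracies by their city counts reproduces the reported average for resource-based cities. Function names are illustrative.

```python
from typing import Iterable, Tuple


def accuracy_pct(gridded_total: float, yearbook_value: float) -> float:
    """Accuracy in percent, assumed to be 100 * (1 - |relative error|)."""
    if yearbook_value <= 0:
        raise ValueError("yearbook value must be positive")
    rel_err = abs(gridded_total - yearbook_value) / yearbook_value
    return (1.0 - rel_err) * 100.0


def weighted_mean(groups: Iterable[Tuple[float, int]]) -> float:
    """Mean of group accuracies weighted by the number of cities per group."""
    groups = list(groups)
    total_n = sum(n for _, n in groups)
    return sum(acc * n for acc, n in groups) / total_n


# Reported subgroup accuracies: 7 coal-based, 1 non-ferrous, 3 oil-based cities.
resource_based = weighted_mean([(82.75, 7), (94.08, 1), (77.22, 3)])
print(round(resource_based, 2))  # 82.27, matching the reported average

# Hypothetical city: gridded total 88 against a yearbook value of 100.
example = accuracy_pct(88.0, 100.0)
```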
Figure 4 Comparisons of the industrial output value between this dataset and the statistical yearbook, and the data accuracy
Notes: The horizontal axis represents the serial number of each city selected for accuracy validation; the left vertical axis represents the industrial output value of each sampled city, from this dataset and from the statistical yearbook; and the right vertical axis represents the validation accuracy of the spatialized industrial output value.
|Resource-based cities||Comprehensive cities||Industry-oriented cities|
|City||Accuracy (%)||City||Accuracy (%)||City||Accuracy (%)||City||Accuracy (%)|
|Non-ferrous metal-based cities||Huludao||94.08||Foshan||86.60||Wenzhou||75.29|
|Average accuracy (%)||82.27||82.34||77.25|
Data processing was conducted in the geographic information system software ArcGIS, following a standardized workflow, which helps ensure the accuracy and reliability of the spatialized data.
As the 1-km grid dataset of industrial output is not constrained by administrative boundaries, it allows industrial output to be inverted over large areas from nighttime light data, yielding spatialized grid data with high accuracy and high spatial resolution. The grid data can thus be used to analyze visually the quantity and spatiotemporal distribution of industrial output values in different regions, providing data support for identifying key industrial areas, discerning changing trends in industrial structure, and evaluating the efficiency of industrial land use in China. In particular, the dataset provides basic data for industrial exposure and vulnerability assessments under climate change, thus informing industrial layout planning, industrial disaster early warning, and disaster assessment under extreme conditions.
In addition, because the definition of industry adopted in our study follows the World Bank standard, the data set can be used for industrial output assessment on a global scale. Although many regionally or globally spatialized data products are available, most are for GDP or population. In particular, existing spatialized data sets for GDP mostly do not distinguish between the secondary and tertiary industries, and few separate spatialized data products for these industries are available. Moreover, the methods of remote sensing inversion, industrial land modification, and urbanization rate modification proposed in our study help distinguish the spatial output values of the secondary and tertiary industries, which enriches the methods for spatializing social and economic data and provides a reference for similar research in the future. Lastly, the average accuracy of our data set reached 81.40%, which is significantly better than that of previous datasets of the kind.
This data set includes 1-km industrial output value grid data files (.tif), validation data files (.xls), and a geodatabase of the processing workflow (.gdb). All the data can be read and written directly in ArcGIS, or through the relevant libraries of mainstream programming languages such as Python, which enables batch processing of the data.
With a spatial resolution of 1 km, this national-scale industrial output data set meets the accuracy demands of global climate change studies. 1 km is the spatial resolution commonly used for nighttime light image data, MODIS vegetation index data, and national-scale land use grid data; data at this resolution therefore best match the basic spatial data used for mapping industrial output. Because the basic data used for spatial inversion of industrial output have a spatial resolution of 1 km, this data set is already at the finest resolution those inputs support. To match coarser meteorological data over large areas (such as 0.1° or 0.25° grids), the dataset can be aggregated to a coarser grid using the percentage grid method, which does not reduce data accuracy when the resolution is decreased.
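To make the idea of coarsening concrete, the sketch below sums output values over square blocks of 1-km cells. It is a simplified stand-in for the percentage grid method, not an implementation of it: it assumes each coarse cell covers a whole number of fine cells and that every fine cell falls entirely inside one coarse cell. The function name and the sample grid are hypothetical.

```python
from typing import List


def aggregate_sum(grid: List[List[float]], factor: int) -> List[List[float]]:
    """Sum cell values over non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if factor <= 0 or rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block_total = sum(
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            )
            out_row.append(block_total)
        out.append(out_row)
    return out


# Hypothetical 4 x 4 patch of 1-km output values, coarsened to 2-km cells.
fine = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
coarse = aggregate_sum(fine, 2)
```

Summing rather than averaging keeps the total output of the patch unchanged, which is the property that matters when the coarsened grid is overlaid on climate data.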
In addition, the data will be updated according to the update cycles of the basic data sets. Since land use data, nighttime light image data, MODIS vegetation index data, and statistical yearbook data have update cycles of 5 years, 1 year, 1 year, and 1 year, respectively, 5 years will be taken as the update cycle of this data set. To incorporate more grid data of industrial output in the future, we will conduct multi-source data fusion simulation to produce spatial data of industrial output under different climate change scenarios using dynamical downscaling methods.
Working Group I, IPCC. Climate change 2013: The physical science basis. Contribution of Working 43 (2013): 866 – 871.
Wang SR. Impact of Climate Change on Sustainable Economic and Social Development in China and Its Response. Beijing: Science Press, 2011.
Cruz AM & Krausmann E. Vulnerability of the oil and gas sector to climate change and extreme weather events. Climatic Change 121 (2013): 41 – 53.
Lereboullet A, Beltrando G & Bardsley D. Socio-ecological adaptation to climate change: A comparative case study from the Mediterranean wine industry in France and Australia. Agriculture Ecosystems & Environment 164 (2013): 273 – 285.
Sutton PC & Costanza R. Global estimates of market and non-market values derived from nighttime satellite imagery, land cover, and ecosystem service valuation. Ecological Economics 41 (2002): 509 – 527.
Liu YW & Yan QW. Spatial distribution simulation of carbon emission with grid transformation based on SLM in China. Geography and Geo-Information Science 31 (2015): 76 – 80.
Liu XL & Fang CL. Wind energy resources distribution and spatial differences of wind power industry in China. Marietta: American Scholars Press, 2007.
Dong L, Liang H, Gao Z et al. Spatial distribution of China's renewable energy industry: Regional features and implications for a harmonious development future. Renewable & Sustainable Energy Reviews 58 (2016): 1521 – 1531.
Huang X, Schneider A & Friedl MA. Mapping sub-pixel urban expansion in China using MODIS and DMSP/OLS nighttime lights. Remote Sensing of Environment 175 (2016): 92 – 108.
Chen Z, Yu B, Song W et al. A new approach for detecting urban centers and their spatial structure with nighttime light remote sensing. IEEE Transactions on Geoscience & Remote Sensing 55 (2017): 1 – 15.
Wang W, Cheng H & Zhang L. Poverty assessment using DMSP/OLS night-time light satellite imagery at a provincial scale in China. Advances in Space Research 49 (2012): 1253 – 1264.
Wu J, Wang Z, Li W et al. Exploring factors affecting the relationship between light consumption and GDP based on DMSP/OLS nighttime satellite imagery. Remote Sensing of Environment 134 (2013): 111 – 119.
Zhuo L, Zheng J, Zhang X et al. An improved method of night-time light saturation reduction based on EVI. AAG, Chicago, USA, 2015.
Wang Z, Yao F, Li W et al. Saturation correction for nighttime lights data based on the relative NDVI. Remote Sensing 9 (2017): 759 – 772.
National Bureau of Statistics. China Statistical Yearbook 2011. Beijing: China Statistics Yearbook, 2011.
Zhang WJ, Liu SM & Zheng YD. Review on economic transformation of China’s resources-based cities. Resources & Industries 17 (2015): 22 – 26.
1. Chen M, Xu S, Liu H et al. A 1-km grid dataset of industrial output value in China (2010). Science Data Bank. DOI: 10.11922/sciencedb.551
How to cite this article
Xue Q, Song W & Zhu H. A 1-km grid dataset of industrial output value in China (2010). China Scientific Data 3(2018). DOI: 10.11922/csdata.2017.14.zh