Abstract: Inland water distribution data are indispensable for global water security and management, climate research, and dynamic monitoring of the ecological environment. Based on GF-1 and Landsat 8 OLI data, we used the minimum-redundancy maximum-relevancy (mRMR) feature selection algorithm and an object-oriented knowledge rule set to automatically extract the inland water distribution of Hainan Island for 2013–2017. High-resolution remote sensing images and Google Earth images were combined to verify and improve the classification accuracy. The Kappa coefficients of the five yearly subsets are 84.67%, 85.98%, 80.61%, 88.66% and 90.66%, respectively, indicating high classification accuracy. This dataset can be used directly to study the spatial and temporal distribution of surface water. It also provides basic data for water-related environmental research, such as studies of water bodies and safety assessments of water resources.
Keywords: inland water; object-oriented knowledge rule set; Hainan Island; medium and high resolution remote sensing data
|English title||A Dataset of Inland Water Distribution 2013–2017 in Hainan Island Based on Remote Sensing|
|Corresponding author||Meng Qingyan (firstname.lastname@example.org)|
|Data author(s)||Xu Fen, Meng Qingyan, Zhang Linlin|
|Geographical scope||Hainan Island (18°10′N–20°10′N, 108°37′E–111°03′E)|
|Spatial resolution||15 m, 16 m||Data volume||11.6 MB|
|Data service system||<http://www.sciencedb.cn/dataSet/handle/673>|
|Source(s) of funding||Major Science and Technology Plan of Hainan Province (ZDKJ2016021); Science and Technology Program of Sichuan Province (2018JZ0054).|
|Dataset/Database composition||This dataset consists of six subsets in total: water distribution data for five consecutive years and one set of verification points. Each subset contains eight files: *.dbf, *.prj, *.shp, *.shx, *.jpeg, *.sbx, *.sbn and *.xml. Taking 2015 as an example:|
1. 2015.dbf is the dBASE attribute table for the 2015 vector data.
2. 2015.prj is the coordinate projection file for the 2015 vector data.
3. 2015.shp is the main file for the 2015 vector data.
4. 2015.shx is the index file for the 2015 vector data (the index of geometric features).
5. 2015.jpeg is the water distribution image product for 2015.
6. 2015*.sbx is a spatial index file for the 2015 vector data.
7. 2015*.sbn is a spatial index file for the 2015 vector data.
8. 2015*.xml is the metadata file that describes the data and facilitates data exchange.
Water is an irreplaceable strategic resource for human survival and social development. Basic information on water is indispensable for global water security and management, climate change research, and dynamic monitoring of the ecological environment[1,2,3]. Hainan Island is rich in water resources, with a per capita share of 3,700 m³, which is 1.76 times the national average. With changes in the natural environment and socioeconomic development, flooding caused by surface water and the pollution of water bodies have become issues of major public concern[5,6]. It is therefore necessary to develop surface water products that monitor rapid changes in surface water dynamics and capture the spatial patterns of surface water change in a timely manner, which also helps reduce risk.
Existing large-scale surface water products, produced in China and abroad, rely mainly on satellite data with low spatial resolution, which limits their use for evaluating mesoscale dynamics. For example, the Global Lakes and Wetlands Database (GLWD), provided by the United Nations Environment Programme World Conservation Monitoring Centre (WCMC), represents lakes, reservoirs and wetlands as polygon vectors characterized by area and volume. GLWD has a spatial resolution of 30 arc seconds (about 1 km) and covers latitudes from 60°S to 90°N. In reality, however, most inland water bodies are relatively small and are difficult to extract from low spatial resolution satellite data. Medium and high spatial resolution remote sensing images are therefore of great significance for water extraction at the mesoscale. In this paper, water body information for Hainan Island in the rainy seasons from 2013 to 2017 is obtained from GF-1 and Landsat 8 OLI data by applying an object-oriented knowledge rule set followed by manual correction. On the one hand, the results provide effective information for revealing the distribution and dynamic change patterns of water bodies. On the other hand, they provide accurate water boundary information for water quality monitoring and water environment assessment. The surface water information, which covers the land area of Hainan Island and is extracted from medium and high spatial resolution remote sensing images, can provide data support for future research on water resources and environmental change.
2.1 Data collection
2.1.1 Area coverage
Hainan Island is located at 18°10′N–20°10′N and 108°37′E–111°03′E. It has a typical tropical monsoon climate with distinct dry and wet seasons. The wet season runs from May to October, with total precipitation of about 1,500 mm, accounting for 70%–90% of the annual total. The dry season runs from November to the following April, and its precipitation accounts for only 10%–30% of the annual total. The terrain of Hainan Island is high in the center and low around the edges, and most rivers originate in the central mountains, forming a radial river system.
2.1.2 Data source
High spatial resolution remote sensing images are an important data source for accurate extraction of inland water information. However, as spatial resolution increases, the coverage of each image becomes smaller, and more data are needed to cover the whole island. Because the weather in Hainan Island is often cloudy and rainy, it is difficult to achieve full coverage with cloud-free or nearly cloud-free data. Medium and high spatial resolution data can meet both the requirement of accurate water extraction and the requirement of covering the whole island. In this paper, GF-1 data and Landsat 8 OLI data covering the whole land area of Hainan Island in the rainy seasons (May–October) of 2013–2017 are taken as the data source, with spatial resolutions of 16 m and 15 m, respectively, and cloud cover of less than 9%. GF-1 data are from the China Resources Satellite Application Center (http://www.cresda.com/CN/), and Landsat 8 OLI data are from the Geospatial Data Cloud website (http://www.gscloud.cn/). The coverage of the remote sensing images is shown in Figure 1.
2.2 Data processing
The dataset uses GF-1 data and Landsat 8 OLI data, preprocessed by orthorectification, atmospheric correction and related steps, as the data source. First, image segmentation is used to generate an image object layer that reasonably represents surface object types, especially water. Second, knowledge rules are constructed from the optimal features selected using samples, and are used to extract water information. Finally, the classification results are filtered to remove speckles and then manually corrected, which generates the water body distribution product. The processing flow is shown in Figure 2.
2.2.1 Data preprocessing
First, DEM data are used to orthorectify the GF-1 and Landsat 8 data to eliminate geometric distortion caused by terrain relief. Then, radiometric calibration is performed using the calibration coefficients of the data to eliminate radiometric distortion. In addition, the improved 6S atmospheric correction model is used for atmospheric correction: a sensitivity analysis of the solar zenith angle, water vapor, ozone concentration, aerosol optical thickness and other parameters is carried out using a per-pixel look-up-table calculation to obtain the atmospheric correction coefficients of the surface reflectance image, thus removing the influence of the atmosphere. The surface reflectance product of the study area is finally obtained.
2.2.2 Extraction of water information
First, based on FNEA (Fractal Net Evolution Approach), spectral and shape heterogeneity are used to segment the image at multiple scales to form the image object layer, and the feature information of the objects in this layer is statistically analyzed. Second, based on mRMR (minimum-redundancy maximum-relevancy), the optimal feature subset is selected from the candidate features, including the water body index, the GLCM (Gray Level Co-occurrence Matrix) and information entropy, and is used to extract water body information.
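The mRMR step can be illustrated with a minimal sketch. It assumes that the object features have already been discretized into integer bins and uses the common difference criterion (relevance minus mean redundancy); the function names `mutual_information` and `mrmr_select` and the toy data are hypothetical and are not the authors' implementation or settings.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)          # joint counts
    joint /= joint.sum()                      # joint probabilities
    px = joint.sum(axis=1, keepdims=True)     # marginal of x
    py = joint.sum(axis=0, keepdims=True)     # marginal of y
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mrmr_select(features, labels, k):
    """Greedy mRMR: pick the feature maximizing relevance minus mean redundancy."""
    n_features = features.shape[1]
    relevance = [mutual_information(features[:, j], labels)
                 for j in range(n_features)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(features[:, j], features[:, s])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy data (hypothetical): feature 1 duplicates feature 0, feature 2 is independent.
a = np.array([0, 0, 1, 1, 0, 0, 1, 1])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
toy_labels = 2 * a + b
toy_features = np.column_stack([a, a, b])
chosen = mrmr_select(toy_features, toy_labels, k=2)
print(chosen)
```

In this toy case the duplicated feature is skipped because its redundancy with the first pick cancels its relevance, which is the behavior that motivates keeping only one of several strongly correlated water indices.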
The water body indices include the normalized difference water index (NDWI) proposed by McFeeters in 1996, the modified normalized difference water index (MNDWI) proposed by Xu, and the mixed water body index (CIWI). NDWI extracts water information using the spectral difference between water and non-water in the green and near-infrared bands. MNDWI was proposed to resolve the confusion between water and soil or buildings, but it is not applicable to GF-1 data.
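For reference, the two published indices have the standard forms NDWI = (Green − NIR) / (Green + NIR) and MNDWI = (Green − SWIR) / (Green + SWIR). The sketch below computes them with NumPy; the function names and the toy reflectance values are illustrative assumptions. Because MNDWI needs a shortwave-infrared band, it cannot be computed from the four visible and near-infrared bands of the GF-1 multispectral sensors.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with 0 where the denominator is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndwi(green, nir):
    """NDWI from green and near-infrared reflectance."""
    return normalized_difference(green, nir)

def mndwi(green, swir):
    """MNDWI from green and shortwave-infrared reflectance."""
    return normalized_difference(green, swir)

# Toy reflectance values (hypothetical): a water-like pixel and a vegetation-like pixel.
green = np.array([0.08, 0.10])
nir = np.array([0.02, 0.30])
index = ndwi(green, nir)
print(index)
```

Water pixels, which reflect more in the green band than in the near-infrared band, give positive NDWI values, while vegetation gives negative values.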
Based on an analysis of the advantages and disadvantages of NDWI and MNDWI and of the spectral characteristics of each band, the mixed water index is constructed from the near-infrared and red bands, and is calculated as follows:
In the formula, CIWI is the mixed water body index, NIR is the near-infrared band of the remote sensing image, NIRmean is the mean value of the near-infrared band, and R is the red band. During feature optimization, the three water indices were found to be strongly correlated. To improve computational efficiency, only CIWI, which contributes most to water information extraction, is retained. The specific steps are as follows. First, the approximate water body range is extracted using CIWI; this range still contains cloud and mountain shadow information. Then, an aspect ratio threshold is selected by adjustment to eliminate the mountain shadows. Next, the cloud information is extracted by adjusting the information entropy threshold, and the land cover under the clouds is manually interpreted by comparing images from the previous and current periods. Finally, the extracted water body information is post-processed, for example by removing small speckles, the water body distribution products are compiled, and the accuracy of the final classification result is evaluated using the verification points.
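The text does not define how the information entropy feature is computed. As a minimal sketch, assuming it is the Shannon entropy of the gray-level histogram of the pixels inside one image object, it could be computed as below; `shannon_entropy`, the bin count and the value range are illustrative choices rather than the authors' settings, and no threshold value is implied.

```python
import numpy as np

def shannon_entropy(pixels, bins=256, value_range=(0, 255)):
    """Shannon entropy (bits) of the gray-level histogram of one image object."""
    counts, _ = np.histogram(np.asarray(pixels).ravel(),
                             bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()   # probabilities of occupied bins
    return float(np.sum(p * np.log2(1.0 / p)))

# Toy objects (hypothetical): a uniform patch and a two-level patch.
flat_object = np.full(16, 120)
mixed_object = np.array([0, 0, 255, 255])
print(shannon_entropy(flat_object), shannon_entropy(mixed_object))
```

A spectrally uniform object has zero entropy, and entropy grows as the gray levels inside the object become more varied, which is what makes it usable as a texture rule in an object-based rule set.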
The datasets are then named according to their respective years, and together cover the water distribution on Hainan Island over five years. The water distribution maps for 2013–2017, displayed in ArcGIS, are shown in Figure 3.
4.1 Qualitative assessment of data quality
To ensure data quality, we use images with low cloud cover to extract water information. For cloud-covered areas, the information is reprocessed by manually interpreting the feature types in cloud-free images from nearby dates, which reduces the impact of clouds on the water body distribution. To reduce the influence of seasonal precipitation differences on water distribution, only rainy-season images (May to October) are selected for each year.
In terms of spatial distribution, water bodies are distributed radially, with more in the east and fewer in the west. This is consistent with the fact that the windward slopes of the hills and mountains in central and eastern Hainan Island are rainy, while the leeward side of the coastal mountains in the west is drier. The dataset also captures both overall and detailed information: it covers large rivers and lakes as well as scattered water bodies such as pits and ponds. On the whole, the dataset represents the distribution characteristics of water bodies well, but there are some discontinuities in the extracted water bodies (Fig. 4) because the images used resolve fine ground-object details, such as vegetation around pits and rivers and bridges across rivers.
Figure 4. Schematic of the local details of the dataset
(a) Vegetation around a pit and a river in the remote sensing image (left) and the corresponding extraction results (right); (b) a bridge in the remote sensing image (left) and the corresponding extraction results (right)
In terms of data attributes, the water area during the rainy seasons of 2013–2017 is correlated with precipitation, which is consistent with the fact that changes in the water resources of Hainan Island are strongly affected by changes in precipitation. Among the years studied, the water body area is largest in 2013, when rainfall reached 2393.7 mm, 36.9% higher than the average annual precipitation. The water area is smallest in 2015, when precipitation was 1403.5 mm, the lowest of the years studied and 19.8% lower than the average. The water areas for the other three years are also positively correlated with precipitation, which generally supports the reasonableness of the dataset.
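The two precipitation figures can be checked against each other with simple arithmetic. The sketch below back-calculates the average annual precipitation implied by each statement, assuming both percentages refer to the same long-term average; the helper name `implied_mean` is hypothetical.

```python
def implied_mean(value_mm, relative_anomaly):
    """Mean implied by a value stated as (1 + anomaly) times the mean."""
    return value_mm / (1.0 + relative_anomaly)

mean_from_2013 = implied_mean(2393.7, 0.369)    # 36.9% above the average
mean_from_2015 = implied_mean(1403.5, -0.198)   # 19.8% below the average
print(round(mean_from_2013, 1), round(mean_from_2015, 1))
```

Both values come out close to 1750 mm, so the two stated anomalies are mutually consistent.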
4.2 Quantitative assessment of data quality
Using the Create Random Points tool in ArcGIS, 300 verification points are randomly generated in the study area (Fig. 5), and the attribute values of the verification points are determined by combining Google Earth images from different periods with 2 m spatial resolution images from the GF-1 PMS sensor where available. These attribute values are compared with the classification results to construct an error matrix for accuracy evaluation, as shown in Tables 1–5. The user's accuracy in the confusion matrix is the conditional probability that a random sample taken from the classification results has the same type as the actual ground object. The overall accuracy is the proportion of all samples that are correctly classified. The Kappa coefficient reflects the overall classification accuracy: the closer it is to 1, the better the classification quality.
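The accuracy measures described above follow directly from the error matrix. The sketch below computes overall accuracy, user's accuracy, producer's accuracy and the Kappa coefficient with their standard definitions; the 2 × 2 matrix is a hypothetical example with 300 points and is not taken from Tables 1–5, and the row and column convention (rows are classified classes, columns are reference classes) is an assumption.

```python
import numpy as np

def accuracy_metrics(error_matrix):
    """Overall accuracy, user's and producer's accuracy, and Kappa.

    Rows are classified classes, columns are reference classes.
    """
    m = np.asarray(error_matrix, dtype=float)
    n = m.sum()
    row_totals = m.sum(axis=1)          # points per classified class
    col_totals = m.sum(axis=0)          # points per reference class
    overall = np.trace(m) / n
    users = np.diag(m) / row_totals
    producers = np.diag(m) / col_totals
    expected = np.sum(row_totals * col_totals) / n ** 2   # chance agreement
    kappa = (overall - expected) / (1.0 - expected)
    return overall, users, producers, kappa

# Hypothetical error matrix for 300 points: classes are [water, non-water].
example = [[90, 5],
           [10, 195]]
overall, users, producers, kappa = accuracy_metrics(example)
print(round(overall, 4), round(kappa, 4))
```

Kappa is lower than overall accuracy here because it discounts the agreement expected by chance, which is substantial when one class dominates the sample.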
The overall accuracy of the classification results for 2013–2017 is relatively high: the Kappa coefficient is greater than 0.8 in every year, and the highest value, 90.66%, is reached in 2017. The accuracy for 2013 and 2014 is slightly lower than for 2016 and 2017, which is attributed to errors in the validation samples. The accuracy for 2015 is the lowest, mainly because of high cloud cover in some 2015 images, where clouds and cloud shadows are easily misclassified as water bodies. On the whole, however, the classification performs well, and the omission and commission rates are low, which meets the requirements of water distribution product mapping.
This dataset contains five years of water distribution data products extracted from medium and high spatial resolution remote sensing data. It can better identify small water bodies such as lakes and rivers, and can be used to provide information for regional water quality and water safety research. It will also contribute to future research on water resource conservation and environmental change.
We thank the Geospatial Data Cloud (http://www.gscloud.cn/) for providing Landsat 8 OLI data and digital elevation data (GDEM DEM 30 m), and the China Resources Satellite Application Center (http://www.cresda.com/CN/) for providing GF-1 data.
[1] CHAUMETTE P. Changes in land surface water dynamics since the 1990s and relation to population pressure. Geophysical Research Letters, 39(2012): 85-93.
[2] LUO S, HU H X, CHENG S P, et al. A primary study on species diversity of water birds and its relationship to water environment at Lake Jinyinhu, Wuhan. Resources & Environment in the Yangtze Basin, 19(2010): 671-677.
[3] QIAO X J, HE B Y, ZHANG W, et al. MODIS-based retrieval and change analysis of suspended sediment concentration in middle Yangtze River. Resources & Environment in the Yangtze Basin, 2013.
[4] LI H. Hainan international tourism island construction and water resources protection and development strategy. New Oriental, 3(2012): 18-22.
[5] PIERDICCA N, PULVIRENTI L, CHINI M, et al. Observing floods from space: Experience gained from COSMO-SkyMed observations. Acta Astronautica, 84(2013): 122-133.
[6] ALDERMAN K, TURNER L R, TONG S. Floods and human health: a systematic review. Environment International, 47(2012): 37-47.
[7] FENG M, SEXTON J O, CHANNAN S, et al. A global, high-resolution (30-m) inland water body dataset for 2000: first results of a topographic–spectral classification algorithm. International Journal of Digital Earth, 9(2015): 1-21.
[8] LIAO A, CHEN L, CHEN J. Global surface water high-resolution remote sensing mapping. Chinese Science: Earth Science, 8(2014): 1634-1645.
[9] ZHAO Y. Principles and methods of remote sensing application analysis. Beijing: Science Press, 2013.
[10] MARKHAM B, STOREY J, MORFITT R. Landsat-8 sensor characterization and calibration. Remote Sensing, 7(2015): 2279-2282.
[11] PENG Y, HE G, ZHANG Z, et al. Study on atmospheric correction approach of Landsat-8 imageries based on 6S model and lookup table. Journal of Applied Remote Sensing, 10(2016).
[12] XU H. The research of extracting water body information based on improved normalized differential water body index (MNDWI). Journal of Remote Sensing, 9(2005): 589-595.
Xu F, Meng Q & Zhang L. A Dataset of Inland Water Distribution 2013–2017 in Hainan Island Based on Remote Sensing. Science Data Bank, DOI: 10.11922/sciencedb.673 (2018).
How to cite this article
Xu F, Meng Q & Zhang L. A Dataset of Inland Water Distribution 2013–2017 in Hainan Island Based on Remote Sensing. China Scientific Data 4(2019). DOI: 10.11922/csdata.2018.0068.zh