Abstract: A key step in river provenance analysis is identifying and quantifying the components of sand and sediment. The traditional statistical approach is not only time-consuming and laborious but also yields data of uneven quality. Because different laboratories apply different processing standards, the resulting data often cannot be meaningfully compared. Automatic identification through machine learning could relieve geologists of this tedious work, but training such models requires a large number of microscopic images. To facilitate data disclosure and sharing, we publish a photomicrograph dataset of sand grains collected from the Yarlung Tsangpo, Tibet, China. The dataset consists of 8734 tagged clastic particle images with their corresponding coordinate information files, 1876 sand microscope images, 120 numbered base maps, and two tables for sand composition identification. We hope it will provide a sound basis for training models for automatic sand component identification, and that it will serve as a reference for identifying detrital components in sands from other rivers.
Keywords: sand grains; photomicrograph; sedimentology; machine learning; Yarlung Tsangpo; river sand