Constructing BMMQ-Tree from rasters using Parallel Primitives on GPGPUs

URLS:

1)      source code: http://134.74.112.65/primquad/primquad_src.tar

2)      test data: http://134.74.112.65/primquad/prec_jan_3_2.bil.gz (or other tiles with the naming convention prec_jan_x_y.bil.gz where x=0..10 and y=0..5 )

3)      gray scale image of raw test data: http://134.74.112.65/primquad/prec_jan_3_2_org.png

4)      color image of binned test data (8 bins, quantile): http://134.74.112.65/primquad/prec_jan_3_2_binned.png

Steps to repeat experiments in the report:

1)      Download the source code http://134.74.112.65/primquad/primquad_src.tar and expand the source code to any directory on a 64 bit linux machine with CUDA SDK 4.0 or above installed.

2)      Compile the source code using the following command line:

nvcc ijgis.cu -o ijgis -O2

g++ ijgis_c.cpp -o ijgis_c -O2

g++ ijgis_z.cpp -o ijgis_z -O2

3)      Donload and unzip the test raster tile data : http://134.74.112.65/primquad/prec_jan_3_2.bil.gz (or other tiles with the naming convention listed above) to the same directory

4)      Run the program using the following command line: [./ijgis prec_jan_3_2.bil]. On Nvidia Quadro 6000, it takes about 13.33 milliseconds to complete the whole quadtree contruction process (excluding data transfer from CPU to GPU time). The runtime remains stable for other raster tiles.

The toy version for debugging and manual inspection:

We have applied the 8*8 raster example described in the report to the program for debugging and manual inspection purposes. There is more debugging information in this toy version but the code is not well polished. You can compile and run the program using the following two command lines:

nvcc ijgis_toy.cu  -o ijgis_toy -O2

./ijgis _toy

The data is in text format and can be accessed at http://134.74.112.65/primquad/p1.data. 

The dyanmic version using user defined tile sizes:

We have also provided a slightly different CUDA program called ijgis_dynamic.cu (included in the source file package) that allows to use arbitary power 2 sized raster tile at expenses of larger runtime for z_odering. The program can be compiled using the command line: nvcc ijgis_dynamic.cu -O3 -arch=sm_20 -o ijgis_dynamic. A raster tile with a size of 16384*16384 has been provided for test purposes. The data can be downloaded at http://134.74.112.65/primquad/prec_jan_16384.bil.gz. After the file is unzipped, the following commandline can be used to execute the program [./ijgis_dynamic prec_jan_16384.bil 16384]. On Nvidia Quadro 6000, it takes about 230 milliseconds to constrct a quadtree which is roughly 16 times of 15.40 milliseoncds when running the dynamic program on 4096*4096 tiles. Note that the z-order transformation runtime is increased from about 3 milliseconds to 5 milliseoncds due to allowing dyanmic raster size for 4096*4096 tiles.

Additional instructions to use your own data in the program:

Along with the GPU Thrust/CUDA source code, we have released two CPU C++ programs to prepare input data for experiment purposes. You can use the same programs to prepare your own input data for test purposes. Keep in mind that you will also need to modify the binning functor in the GPU programs (both static and dynamic versions) to make it suitable for your own data. The instructions to compile and run the data preprocessing programs are provided at the beginning of the source code of the two programs:

1)      TilingBIL.cpp: extract multiple tiles from a large raster in BIL format (single band).  6 parameters are required (in_bil_file, in_w, in_h, out_bil_file_root, out_w, out_h). Among them, in_bil_file and out_bil_file_root are the file name of the input BIL file and the prefix of the output BIL files, respectively. in_w and in_h are the width and height of the input BIL file, respectively. out_w and out_h are the width and height of the desired tiles and they need to be the same (squared).

2)      TilingBIL.cpp: to extract a tile from a large raster in BIL format (single band). 8 parameters are required (in_bil_file in_w in_h out_bil_fil out_org_x out_org_y out_w out_h). The two additional parameters, i.e., out_org_x and out_org_y are the origins of the intended tile in the original raster.

If your data is in some other raster formats rather than BIL, you can use the gdal_translate tool for easy conversions. An example of converting from geotiff format to the bil format has been provided at the beginning of the two programs.

Suggestions on defining bins

While binning is optional in the quadtree construction process from a methodological perspective, without binning, the spatial heterogeneity in most raster data will turn result quadtrees into full pyramids which makes tree index sizes very big. Large index sizes are uninteresting from a data management perspective. There are a couple of ways to obtain bin boundaries and here we provide two:

1) The gdalinfo program (http://www.gdal.org/gdalinfo.html) that is available in the open source GDAL package allows generating a histogram by turning on the –hist option. This can also be done programmatically as shown in the code segment at http://134.74.112.65/primquad/GetBin.cpp. A problem is that the GetHistogram GDAL function only support linear historgramming with equal-sized binns. However, further process can be done on the equal-sized binns to derive coarser-grained binns.

2) The commercial ArcGIS package provides extensive supports on binning. Add the BIL file (you will need to provide the companying HDR file which can be modified from http://134.74.112.65/primquad/prec_jan_3_2.hdr) and then right click on the raster layer you just added and bring up the “Layer Property” dialog. Under the “Symbology” tab, choose “Classified” on the right and then click “Classify” button on the right. In the “Classification” dialog, you can choose from quite a few ways of binning (or classification in ArcGIS terminology), including manual, defined interval, equal interval, quantile, natural breaks, etc. Obviously, ArcGIS is more powerful on this aspect as gdalinfo only provides equal interval binning.