dataset #

dataset module contains Two classes Dataset and DataCube.

.. digraph:: Linking

dataset -> Dataset;
dataset -> DataCube;
dpi=200;

Dataset represent a raster object which could be created from reading a geotiff, netcdf, ascii or any file format/driver supported by gdal.
The raster could have single or multi bands.
The raster could have different variables (like netcdf file) and these variable can have similar or different dimensions.
DataCube represent a stack of raster's which have the same dimensions, contains data that have same dimensions (rows & columns).

Dataset #

The main purpose of the Dataset object is to deal with raster objects, single or multi-bands, has variables/subsets like netcdf file or has one variable like most GeoTIFF files.

dataset

The Dataset object data model is as following

dataset-alone

To import the Dataset object

from pyramids.dataset import Dataset

to start exploring the functionality of the dataset module read_file method to read any raster file.

read_file #

to read any files using the Dataset object you can use the read_file method.

path = "examples/data/dem/DEM5km_Rhine_burned_fill.tif"
dataset = Dataset.read_file(path)
dataset.plot(title="Rhine river basin", ticks_spacing=500,cmap="terrain", color_scale=1, vmin=0,
         cbar_label="Elevation (m)")

The read_file method detects the type of the input file from the extension at the end of the path.
Similarly, you can read an ascii file using the same way.
For ASCII files

path = "examples/data/dem/dem5km_rhine.asc"
dataset = Dataset.read_file(path)
dataset.plot(title="Rhine river basin", ticks_spacing=500,cmap="terrain", color_scale=1, vmin=0,
            cbar_label="Elevation (m)")

- For netcdf file

path = "examples/data/dem/dem5km_rhine.nc"
dataset = Dataset.read_file(path)
dataset.plot(title="Rhine river basin", ticks_spacing=500,cmap="terrain", color_scale=1, vmin=0,
            cbar_label="Elevation (m)")

Dataset object attributes #

The Dataset object has the following attributes, which enables you to access all the stored data in you raster file (GeoTIFF/NetCDF/ASCII)

dataset-attributes

raster #

print(dataset.raster)
<osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x0000026C8DD51FE0> >

cell_size #

print(dataset.cell_size)
>>> 5000.0

values #

print(dataset.values)
>>> array([[-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
         -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
         [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
         -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
         [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
         -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
         ...,
         [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
         -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
         [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
         -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
         [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
         -3.4028235e+38, -3.4028235e+38, -3.4028235e+38]], dtype=float32)

shape #

print(dataset.shape)
(1, 125, 93)

rows #

print(dataset.rows)
>>> 125

columns #

print(dataset.columns)
>>> 93

pivot_point #

The upper left corner of the raster (minimum lon/x, maximum lat/y).

print(dataset.pivot_point)
>>> (32239263.70388, 5756081.42235)

geotransform #

geotransform data of the upper left corner of the raster (minimum lon/x, pixel-size, rotation, maximum lat/y, rotation, pixel-size).

print(dataset.geotransform)
>>> (32239263.70388, 5000.0, 0.0, 5756081.42235, 0.0, -5000.0)

bounds #

print(dataset.bounds)
>>> 0  POLYGON ((32239263.704 5756081.422, 32239263.7...

dataset.bounds.plot()

bbox #

print(dataset.bbox)
>>> [32239263.70388, 5131081.42235, 32704263.70388, 5756081.42235]

epsg #

integer reference number that defines the projection (https://epsg.io/)

print(dataset.epsg)
>>> 4647

crs #

print(dataset.crs)
>>> 'PROJCS["ETRS89 / UTM zone 32N (zE-N)",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_1989",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6258"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4258"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",9],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",32500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","4647"]]'

lat/y #

dataset.lat or dataset.y
>>> array([5753581.42235, 5748581.42235, 5743581.42235, 5738581.42235,
        5733581.42235, 5728581.42235, 5723581.42235, 5718581.42235,
        5713581.42235, 5708581.42235, 5703581.42235, 5698581.42235,
        5693581.42235, 5688581.42235, 5683581.42235, 5678581.42235,
        5673581.42235, 5668581.42235, 5663581.42235, 5658581.42235,
        5653581.42235, 5648581.42235, 5643581.42235, 5638581.42235,
        5633581.42235, 5628581.42235, 5623581.42235, 5618581.42235,
        5613581.42235, 5608581.42235, 5603581.42235, 5598581.42235,
        5593581.42235, 5588581.42235, 5583581.42235, 5578581.42235,
        5573581.42235, 5568581.42235, 5563581.42235, 5558581.42235,
        5553581.42235, 5548581.42235, 5543581.42235, 5538581.42235,
        5533581.42235, 5528581.42235, 5523581.42235, 5518581.42235,
        5513581.42235, 5508581.42235, 5503581.42235, 5498581.42235,
        5493581.42235, 5488581.42235, 5483581.42235, 5478581.42235,
        5473581.42235, 5468581.42235, 5463581.42235, 5458581.42235,
        5453581.42235, 5448581.42235, 5443581.42235, 5438581.42235,
        5433581.42235, 5428581.42235, 5423581.42235, 5418581.42235,
        5413581.42235, 5408581.42235, 5403581.42235, 5398581.42235,
        5393581.42235, 5388581.42235, 5383581.42235, 5378581.42235,
        5373581.42235, 5368581.42235, 5363581.42235, 5358581.42235,
        5353581.42235, 5348581.42235, 5343581.42235, 5338581.42235,
        5333581.42235, 5328581.42235, 5323581.42235, 5318581.42235,
        5313581.42235, 5308581.42235, 5303581.42235, 5298581.42235,
        5293581.42235, 5288581.42235, 5283581.42235, 5278581.42235,
        5273581.42235, 5268581.42235, 5263581.42235, 5258581.42235,
        5253581.42235, 5248581.42235, 5243581.42235, 5238581.42235,
        5233581.42235, 5228581.42235, 5223581.42235, 5218581.42235,
        5213581.42235, 5208581.42235, 5203581.42235, 5198581.42235,
        5193581.42235, 5188581.42235, 5183581.42235, 5178581.42235,
        5173581.42235, 5168581.42235, 5163581.42235, 5158581.42235,
        5153581.42235, 5148581.42235, 5143581.42235, 5138581.42235,
        5133581.42235])

lon/x #

dataset.lon/dataset.x
array([32241763.70388, 32246763.70388, 32251763.70388, 32256763.70388,
    32261763.70388, 32266763.70388, 32271763.70388, 32276763.70388,
    32281763.70388, 32286763.70388, 32291763.70388, 32296763.70388,
    32301763.70388, 32306763.70388, 32311763.70388, 32316763.70388,
    32321763.70388, 32326763.70388, 32331763.70388, 32336763.70388,
    32341763.70388, 32346763.70388, 32351763.70388, 32356763.70388,
    32361763.70388, 32366763.70388, 32371763.70388, 32376763.70388,
    32381763.70388, 32386763.70388, 32391763.70388, 32396763.70388,
    32401763.70388, 32406763.70388, 32411763.70388, 32416763.70388,
    32421763.70388, 32426763.70388, 32431763.70388, 32436763.70388,
    32441763.70388, 32446763.70388, 32451763.70388, 32456763.70388,
    32461763.70388, 32466763.70388, 32471763.70388, 32476763.70388,
    32481763.70388, 32486763.70388, 32491763.70388, 32496763.70388,
    32501763.70388, 32506763.70388, 32511763.70388, 32516763.70388,
    32521763.70388, 32526763.70388, 32531763.70388, 32536763.70388,
    32541763.70388, 32546763.70388, 32551763.70388, 32556763.70388,
    32561763.70388, 32566763.70388, 32571763.70388, 32576763.70388,
    32581763.70388, 32586763.70388, 32591763.70388, 32596763.70388,
    32601763.70388, 32606763.70388, 32611763.70388, 32616763.70388,
    32621763.70388, 32626763.70388, 32631763.70388, 32636763.70388,
    32641763.70388, 32646763.70388, 32651763.70388, 32656763.70388,
    32661763.70388, 32666763.70388, 32671763.70388, 32676763.70388,
    32681763.70388, 32686763.70388, 32691763.70388, 32696763.70388,
    32701763.70388])

band_count #

print(dataset.band_count)
1

band_names #

print(dataset.band_names)
['Band_1']

variables #

print(dataset.variables)
{}

no_data_value #

print(dataset.no_data_value)
[-3.4028234663852886e+38]

meta_data #

print(dataset.meta_data)
{'Band1#grid_mapping': 'transverse_mercator', 'Band1#long_name': 'GDAL Band Number 1', 'Band1#_FillValue': '-3.4028235e+38', 'NC_GLOBAL#Conventions': 'CF-1.5', 'NC_GLOBAL#GDAL': 'GDAL 3.6.3, released 2023/03/07', 'NC_GLOBAL#GDAL_AREA_OR_POINT': 'Area', 'NC_GLOBAL#history': 'Sun Apr 16 22:17:20 2023: GDAL CreateCopy( examples/data/dem/dem5km_rhine.nc, ... )', 'transverse_mercator#crs_wkt': 'PROJCS["ETRS89 / UTM zone 32N (zE-N)",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_1989",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6258"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4258"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",9],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",32500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","4647"]]', 'transverse_mercator#false_easting': '32500000', 'transverse_mercator#false_northing': '0', 'transverse_mercator#GeoTransform': '32239263.70388 5000 0 5756081.42235 0 -5000 ', 'transverse_mercator#grid_mapping_name': 'transverse_mercator', 'transverse_mercator#inverse_flattening': '298.257222101004', 'transverse_mercator#latitude_of_projection_origin': '0', 'transverse_mercator#longitude_of_central_meridian': '9', 'transverse_mercator#longitude_of_prime_meridian': '0', 'transverse_mercator#long_name': 'CRS definition', 'transverse_mercator#scale_factor_at_central_meridian': '0.9996', 'transverse_mercator#semi_major_axis': '6378137', 'transverse_mercator#spatial_ref': 'PROJCS["ETRS89 / UTM zone 32N (zE-N)",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_1989",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6258"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4258"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",9],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",32500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","4647"]]', 'x#long_name': 'x coordinate of projection', 'x#standard_name': 'projection_x_coordinate', 'x#units': 'm', 'y#long_name': 'y coordinate of projection', 'y#standard_name': 'projection_y_coordinate', 'y#units': 'm'}

dtype #

print(dataset.dtype)
[6]

file_name #

print(dataset.file_name)
'examples/data/dem/dem5km_rhine.nc'

time_stamp #

print(dataset.time_stamp)

driver_type #

print(dataset.driver_type)
'geotiff'

color_table #

print(dataset.color_table)
band values  red green blue alpha
 0    1      0    0     0    0     0
 1    1      1  112   153   89   255
 2    1      2    0     0    0     0
 3    1      3  242   238  162   255
 4    1      4    0     0    0     0
 5    1      5  242   206  133   255
 6    1      6    0     0    0     0
 7    1      7  194   140  124   255
 8    1      8    0     0    0     0
 9    1      9  214   193  156   255

Create Dataset object #

dataset_like #

dataset_like method creates a Geotiff raster like another input raster, new raster will have the same projection, coordinates or the top left corner of the original raster, cell size, nodata value, and number of rows and columns the raster and the dem should have the same number of columns and rows

Parameters #

src : [gdal.dataset]
    source raster to get the spatial information
array : [numpy array]
    to store in the new raster
path : [String]
    path to save the new raster including new raster name and extension (.tif)
pixel_type : [integer]
    type of the data to be stored in the pixels,default is 1 (float32)
    for example pixel type of flow direction raster is unsigned integer
    1 for float32
    2 for float64
    3 for Unsigned integer 16
    4 for Unsigned integer 32
    5 for integer 16
    6 for integer 32

Returns #

save the new raster to the given path

If we have made some calculation on raster array and we want to save the array back in the raster

arr2 = np.ones(shape=arr.shape, dtype=np.float64) * nodataval
arr2[~np.isclose(arr, nodataval, rtol=0.001)] = 5

path = "examples/data/rasterlike.tif"
Raster.rasterLike(src, arr2, path)

- Now to check the raster that has been saved we can read it again with gda.Open

dst = gdal.Open(path)
Map.plot(dst, title="Flow Accumulation", color_scale=1)

Access data methods #

access-data

read_array #

arr = Dataset.read_array(src)
print(arr)
array([[-3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38,  0.000000e+00,  0.000000e+00,  0.000000e+00,
    -3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38,  1.000000e+00,  0.000000e+00,  2.000000e+00,
    -3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38, -3.402823e+38, -3.402823e+38,  0.000000e+00,
     0.000000e+00,  2.000000e+00,  0.000000e+00,  4.000000e+00,
     0.000000e+00,  0.000000e+00, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38, -3.402823e+38, -3.402823e+38,  0.000000e+00,
     4.000000e+00,  4.000000e+00,  0.000000e+00,  5.000000e+00,
     2.000000e+00,  0.000000e+00, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38, -3.402823e+38, -3.402823e+38,  0.000000e+00,
     0.000000e+00,  1.100000e+01,  0.000000e+00,  0.000000e+00,
     1.000000e+01,  1.000000e+00, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38, -3.402823e+38,  0.000000e+00,  0.000000e+00,
     0.000000e+00,  1.500000e+01,  0.000000e+00,  0.000000e+00,
     0.000000e+00,  1.300000e+01, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38,  0.000000e+00,  1.000000e+00,  1.000000e+00,
     1.500000e+01,  2.300000e+01,  4.500000e+01,  1.000000e+00,
     0.000000e+00,  1.500000e+01, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38,  0.000000e+00,  1.000000e+00,  1.100000e+01,
     6.000000e+00,  0.000000e+00,  2.000000e+00,  4.900000e+01,
     5.400000e+01,  0.000000e+00,  1.600000e+01,  1.700000e+01,
     0.000000e+00, -3.402823e+38],
   [-3.402823e+38,  0.000000e+00,  6.000000e+00,  4.000000e+00,
     0.000000e+00,  1.000000e+00,  1.000000e+00,  0.000000e+00,
     0.000000e+00,  5.500000e+01,  1.000000e+00,  2.000000e+00,
     8.600000e+01, -3.402823e+38],
   [ 0.000000e+00,  4.000000e+00,  2.000000e+00,  0.000000e+00,
     0.000000e+00,  0.000000e+00, -3.402823e+38,  0.000000e+00,
     1.000000e+00,  2.000000e+00,  5.900000e+01,  6.300000e+01,
     0.000000e+00,  8.800000e+01],
   [ 0.000000e+00,  1.000000e+00,  1.000000e+00, -3.402823e+38,
    -3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38,  0.000000e+00,  1.000000e+00,  0.000000e+00,
    -3.402823e+38, -3.402823e+38],
   [-3.402823e+38,  0.000000e+00,  0.000000e+00, -3.402823e+38,
    -3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38, -3.402823e+38, -3.402823e+38,
    -3.402823e+38, -3.402823e+38]], dtype=float32)

Blocksize and ReadAsArray #

The read_array method reads the raster as a numpy array, the method can take a block size to read the raster in blocks, this is useful when the raster is too large to be read at once.

When you know the block size, you can more effectively plan and execute data processing tasks: - Data Reading/Writing: When reading or writing data, doing so in multiples of the block size can reduce the number of disk accesses required, as each access operation will align with the blocks on disk. - Optimizations: Some formats are optimized for specific block sizes, or for being accessed in certain ways. For example, tiled TIFFs might perform better with square block sizes.

dataset = Dataset.read_file("tests/data/geotiff/era5_land_monthly_averaged.tif")
arr = dataset.read_array(window=[0, 0, 5, 5])
print(arr.shape)
(5, 5)

- to get the block size of the dataset

print(dataset.blocksize)
(128, 128)

Band statistics (stats)#

To get a summary statistics (min, max, mean, std) of a band/all bands.
The method returns a DataFrame of statistics values of each band, the dataframe has the following columns: [min, max, mean, std], the index of the dataframe is the band names.

era5_image = "tests/data/geotiff/era5_land_monthly_averaged.tif"
dataset = Dataset.read_file(era5_image)
stats = dataset.stats()
print(stats)
             min         max        mean       std
  Band_1  270.369720  270.762299  270.551361  0.154270
  Band_2  269.611938  269.744751  269.673645  0.043788
  Band_3  273.641479  274.168823  273.953979  0.198447
  Band_4  273.991516  274.540344  274.310669  0.205754

- The method can also take a mask (polygon/dataset) to calculate the statistics of the masked area only.

era5_image = "tests/data/geotiff/era5_land_monthly_averaged.tif"
dataset = Dataset.read_file(era5_image)
mask = gpd.read_file("tests/data/geotiff/era5-mask.geojson")
stats = dataset.stats(mask=mask)
print(stats)
             min         max        mean       std
  Band_1  270.369720  270.399017  270.384369  0.014648
  Band_2  269.651001  269.744751  269.697876  0.046875
  Band_3  273.889526  273.901245  273.895386  0.005859
  Band_4  274.235657  274.255188  274.245422  0.009766

Write raster to disk #

to write the dataset object to disk using any of the raster formats (GeoTIFF/NetCDF/ASCII), you can use the to_file method.

to_file #

to_file writes the Dataset object to disk.

Parameters #

path: [str]
    a path including the name of the raster and extension.
    >>> path = "data/cropped.tif"
driver: [str]
        driver = "geotiff"/"ascii"/"netcdf".
band: [int]
    band index, needed only in case of ascii drivers. Default is 0.

GeoTIFF #

dataset.to_file("examples/data/dem/dem5km_rhine.tif", driver="geotiff")

ASCII #

dataset.to_file("examples/data/dem/dem5km_rhine.asc", driver="ascii")

- The ASCII file will look like

ncols         93
nrows         125
xllcorner     32239263.70388
yllcorner     5131081.42235
cellsize      5000.0
NODATA_value  -3.4028230607370965e+38
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03

NetCDF #

dataset.to_file("examples/data/dem/dem5km_rhine.nc", driver="netcdf")

Spatial properties #

convert_longitude #

some files (especially netcdf files) uses longitude values from 0 degrees to 360 degrees, instead of the usual, GIS-standard, arrangement of -180 degrees to 180 degrees for longitude centered on the Prime Meridian, and -90 degrees to 90 degrees for latitude centered on the Equator. the convert_longitude method corrects such behavior.

read the raster files using the read_file and plot it with the plot method.

dataset = Dataset.read_file(path)
fig, ax = dataset.plot(
    band=0, figsize=(10, 5), title="Noah daily Precipitation 1979-01-01", cbar_label="Raindall mm/day", vmax=30,
    cbar_length=0.85
)

- You cas see how the most left of the map lies the african continent instead of north and south america.

To correct the values of the longitude you can use the convert_longitude as follows.

new_dataset = dataset.convert_longitude()
new_dataset.plot(
    band=0, figsize=(10, 5), title="Noah daily Precipitation 1979-01-01", cbar_label="Raindall mm/day", vmax=30,
    cbar_length=0.85
)

resample #

resample reproject a raster to any projection (default the WGS84 web mercator projection, without resampling) The function returns a GDAL in-memory file object, where you can ReadAsArray etc.

Parameters #

src : [gdal.Dataset]
     gdal raster (src=gdal.Open("dem.tif"))
cell_size : [integer]
     new cell size to resample the raster. (default empty so raster will not be resampled)
resample_technique : [String]
    resampling technique default is "Nearest"
    https://gisgeography.com/raster-resampling/
    "Nearest" for nearest neighbour,"cubic" for cubic convolution,
    "bilinear" for bilinear

Returns #

raster : [gdal.Dataset]
     gdal object (you can read it by ReadAsArray)

print("Original Cell Size =" + str(geo[1]))
cell_size = 100
dst = Raster.resampleRaster(src, cell_size, resample_technique="bilinear")

dst_arr = Raster.read_array(dst)
_, new_geo = Raster.getProjectionData(dst)
print("New cell size is " + str(new_geo[1]))
Map.plot(dst, title="Flow Accumulation")

Original Cell Size =4000.0
New cell size is 100.0

to_crs #

to_crs re-projects a raster to any projection (default the WGS84 web mercator projection, without resampling) The function returns a GDAL in-memory file object, where you can ReadAsArray etc.

Parameters #

to_epsg: [integer]
    reference number to the new projection (https://epsg.io/)
    (default 3857 the reference no of WGS84 web mercator )
method: [String]
    resampling technique default is "Nearest"
    https://gisgeography.com/raster-resampling/
    "nearest neighbour" for nearest neighbour,"cubic" for cubic convolution,
    "bilinear" for bilinear
maintain_alignment : [bool]
    True to maintain the number of rows and columns of the raster the same after re-projection. Default is False.

Returns #

raster:
    gdal dataset (you can read it by ReadAsArray)

print("current EPSG - " + str(epsg))
to_epsg = 4326
dst = Raster.projectRaster(src, to_epsg=to_epsg, option=1)
new_epsg, new_geo = Raster.getProjectionData(dst)
print("New EPSG - " + str(new_epsg))
print("New Geotransform - " + str(new_geo))

current EPSG - 32618
New EPSG - 4326
New Geotransform - (-75.60441, 0.03606600000000526, 0.0, 4.704305, 0.0, -0.03606600000000526)

Option 2

dst = Raster.projectRaster(src, to_epsg=to_epsg, option=2)
new_epsg, new_geo = Raster.getProjectionData(dst)
print("New EPSG - " + str(new_epsg))
print("New Geotransform - " + str(new_geo))

New EPSG - 4326
New Geotransform - (-75.60441003848668, 0.03611587177268461, 0.0, 4.704560448076901, 0.0, -0.03611587177268461)

crop #

If you have an array and you want to clip/crop it using another raster/array.

Crop array using a raster #

The crop method clips/crops (matches the location of nodata value from source raster to destination raster).

Parameters """""""""" src: [gdal.dataset/np.ndarray] raster you want to clip/store NoDataValue in its cells exactly the same like mask raster mask: [gdal.dataset/np.ndarray] mask raster to get the location of the NoDataValue and where it is in the array mask_noval: [numeric] in case the mask is np.ndarray, the mask_noval have to be given.

Returns """"""" dst: [gdal.dataset] the second raster with NoDataValue stored in its cells exactly the same like src raster

aligned_raster = "examples/data/Evaporation_ECMWF_ERA-Interim_mm_daily_2009.01.01.tif"
band = 1
dst = Dataset.Open(aligned_raster)
dst_arr = dst.read_array()
dst_nodataval = dst.no_data_value[band - 1]

Map.plot(
    dst_arr,
    nodataval=dst_nodataval,
    title="Before Cropping-Evapotranspiration",
    color_scale=1,
    ticks_spacing=0.01,
)

dst_arr_cropped = Raster.cropAlligned(dst_arr, src)
Map.plot(
    dst_arr_cropped,
    nodataval=nodataval,
    title="Cropped array",
    color_scale=1,
    ticks_spacing=0.01,
)

Crop raster using another raster while preserving the alignment #

cropping rasters may change the alignment of the cells and to keep the alignment during cropping a raster we will crop the same previous raster but will give the input to the function as a gdal.dataset object.

dst_cropped = Raster.cropAlligned(dst, src)
Map.plot(dst_cropped, title="Cropped raster", color_scale=1, ticks_spacing=0.01)

Crop raster using array #

dst_cropped = Raster.cropAlligned(dst, arr, mask_noval=nodataval)
Map.plot(dst_cropped, title="Cropped array", color_scale=1, ticks_spacing=0.01)

crop #

crop method crops a raster using another raster/polygon.

Parameters """""""""" mask: [Polygon GeoDataFrame/Dataset] GeodataFrame with a polygon geometry, or a Dataset object. touch: [bool] To include the cells that touch the polygon not only those that lie entirely inside the polygon mask. Default is True. inplace: [bool] True to make the changes in place.

Returns """"""" Dataset: Dataset object.

RasterA = gdal.Open(aligned_raster)
epsg, geotransform = Raster.getProjectionData(RasterA)
print("Raster EPSG = " + str(epsg))
print("Raster Geotransform = " + str(geotransform))
Map.plot(RasterA, title="Raster to be cropped", color_scale=1, ticks_spacing=1)

Raster EPSG = 32618
Raster Geotransform = (432968.1206170588, 4000.0, 0.0, 520007.787999178, 0.0, -4000.0)

We will use the soil raster from the previous example as a mask so the projection is different between the raster and the mask and the cell size is also different

dst = Raster.crop(RasterA, soil_raster)
dst_epsg, dst_geotransform = Raster.getProjectionData(dst)
print("resulted EPSG = " + str(dst_epsg))
print("resulted Geotransform = " + str(dst_geotransform))
Map.plot(dst, title="Cropped Raster", color_scale=1, ticks_spacing=1)

resulted EPSG = 32618
resulted Geotransform = (432968.1206170588, 4000.0, 0.0, 520007.787999178, 0.0, -4000.0)

align #

matchRasterAlignment method matches the coordinate system and the number of of rows & columns between two rasters alignment_src is the source of the coordinate system, number of rows, number of columns & cell size data_src is the source of data values in cells the result will be a raster with the same structure like alignment_src but with values from data_src using Nearest Neighbour interpolation algorithm

Parameters #

alignment_src : [gdal.dataset/string]
    spatial information source raster to get the spatial information
    (coordinate system, no of rows & columns)
data_src : [gdal.dataset/string]
    data values source raster to get the data (values of each cell)

Returns #

dst : [gdal.dataset]
    result raster in memory

soil_raster = gdal.Open(soilmappath)
epsg, geotransform = Raster.getProjectionData(soil_raster)
print("Before alignment EPSG = " + str(epsg))
print("Before alignment Geotransform = " + str(geotransform))
# cell_size = geotransform[1]
Map.plot(soil_raster, title="To be aligned", color_scale=1, ticks_spacing=1)

Before alignment EPSG = 3116
Before alignment Geotransform = (830606.744300001, 30.0, 0.0, 1011325.7178760837, 0.0, -30.0)

soil_aligned = Raster.matchRasterAlignment(src, soil_raster)
New_epsg, New_geotransform = Raster.getProjectionData(soil_aligned)
print("After alignment EPSG = " + str(New_epsg))
print("After alignment Geotransform = " + str(New_geotransform))
Map.plot(soil_aligned, title="After alignment", color_scale=1, ticks_spacing=1)

After alignment EPSG = 32618
After alignment Geotransform = (432968.1206170588, 4000.0, 0.0, 520007.787999178, 0.0, -4000.0)

get_cell_coords #

getCellCoords returns the coordinates of all cell centres inside the domain (only the cells that does not have nodatavalue)

Parameters #

src : [gdal_Dataset]
    Get the data from the gdal datasetof the DEM

Returns #

coords : array
    Array with a list of the coordinates to be interpolated, without the Nan
mat_range : array
    Array with all the centres of cells in the domain of the DEM

coords, centers_coords = Raster.getCellCoords(src)
print(coords)
array([[434968.12061706, 520007.78799918],
   [434968.12061706, 520007.78799918],
   [434968.12061706, 520007.78799918],
   [434968.12061706, 520007.78799918],
   [434968.12061706, 520007.78799918],
   [434968.12061706, 520007.78799918],
   [434968.12061706, 520007.78799918],

print(centers_coords)
array([[[434968.12061706, 520007.78799918],
    [438968.12061706, 520007.78799918],
    [442968.12061706, 520007.78799918],
    [446968.12061706, 520007.78799918],
    [450968.12061706, 520007.78799918],
    [454968.12061706, 520007.78799918],
    [458968.12061706, 520007.78799918],

path = "examples/data/save_raster_test.tif"
Raster.saveRaster(src, path)

overlay #

The overlay function takes two ascii files the BaseMap which is the raster/asc file of the polygons and the secon is the asc file you want to extract its values.

def overlayMap(
        path: str,
        classes_map: Union[str, np.ndarray],
        exclude_value: Union[float, int],
        compressed: bool=False,
        occupied_cells_only: bool=True) -> Tuple[Dict[List[float], List[float]], int]:
"""
"""overlayMap.

        OverlayMap extracts and return a list of all the values in an ASCII file,
        if you have two maps one with classes, and the other map contains any type of values,
        and you want to know the values in each class

Parameters #

path: [str] a path to ascii file. classes_map: [str/array] a path including the name of the ASCII and extension, or an array >>> path = "classes.asc" exclude_value: [Numeric] values you want to exclude from extracted values. compressed: [Bool] if the map you provided is compressed. occupied_cells_only: [Bool] if you want to count only cells that is not zero.

Returns #

ExtractedValues: [Dict] dictionary with a list of values in the basemap as keys and for each key a list of all the intersected values in the maps from the path. NonZeroCells: [dataframe] the number of cells in the map.

To extract the

import Hapi.raster as R

Path = "F:/02Case studies/Hapi Examples/"
SavePath  = Path + "results/ZonalStatistics"
BaseMapF = Path + "data/Polygons.tif"
ExcludedValue = 0
Compressed = True
OccupiedCellsOnly = False

ExtractedValues, Cells = R.OverlayMap(Path+"DepthMax22489.zip", BaseMapF,ExcludedValue, Compressed,OccupiedCellsOnly)

count_domain_cells #

To number of cells in a raster that are not no_data_value value.

Parameters #

band: [int] band index. Default is 0.

Returns #

int: Number of cells

    path = "examples/data/dem/DEM5km_Rhine_burned_fill.tif"
    dataset = Dataset.read_file(path)
    cells = dataset.count_domain_cells()
    print(f"Number of cells = {cells}")

    Number of cells = 6374

in case the dataset is a multi-band raster, you can specify the band index.

    cells = dataset.count_domain_cells(band=1)

Mathematical operations #

apply #

apply executes a mathematical operation on raster array and returns the result

Parameters #

src : [gdal.dataset]
    source raster to that you want to make some calculation on its values
fun: [function]
    defined function that takes one input which is the cell value

Returns #

Dataset
    gdal dataset object

def classify(val):
    if val < 20:
        val = 1
    elif val < 40:
        val = 2
    elif val < 60:
        val = 3
    elif val < 80:
        val = 4
    elif val < 100:
        val = 5
    else:
        val = 0
    return val

dst = Raster.mapAlgebra(src, classify)
Map.plot(dst, title="Classes", color_scale=4, ticks_spacing=1)

fill #

fill takes a raster and fill it with one value.

Parameters #

src : [gdal.dataset]
    source raster
val: [numeric]
    numeric value
save_to : [str]
    path including the extension (.tif)

Returns #

raster : [saved on disk]
    the raster will be saved directly to the path you provided.

path = "examples/data/fill-raster-example.tif"
value = 20
Raster.rasterFill(src, value, save_to=path)

"now the resulted raster is saved to disk"
dst = gdal.Open(path)
Map.plot(dst, title="Flow Accumulation")

nearestNeighbour #

nearestCell calculates the the indices (row, col) of nearest cell in a given raster to a station coordinate system of the raster has to be projected to be able to calculate the distance

Parameters #

Raster: [gdal.dataset]
    raster to get the spatial information (coordinates of each cell)
StCoord: [Dataframe]
    dataframe with two columns "x", "y" contains the coordinates
    of each station

Returns #

StCoord:
    the same input dataframe with two extra columns "cellx","celly"

points = pd.read_csv("examples/data/points.csv")
print(points)
   id            x            y
0   1  454795.6728  503143.3264
1   2  443847.5736  481850.7151
2   3  454044.6935  481189.4256
3   4  464533.7067  502683.6482
4   5  463231.1242  486656.3455
5   6  487292.5152  478045.5720

points["row"] = np.nan
points["col"] = np.nan

points.loc[:, ["row", "col"]] = GC.nearestCell(src, points[["x", "y"]][:]).values
print(points)

   id            x            y   row   col
0   1  454795.6728  503143.3264   4.0   5.0
1   2  443847.5736  481850.7151   9.0   2.0
2   3  454044.6935  481189.4256   9.0   5.0
3   4  464533.7067  502683.6482   4.0   7.0
4   5  463231.1242  486656.3455   8.0   7.0
5   6  487292.5152  478045.5720  10.0  13.0

Plotting #

color_table #

The color_table property in the Dataset object can assign a certain symbology to each band in the raster.
To assign a certain symbology you have to have to create a Dataframe containing the values and corresponding colors (hexadecimal number) for each band in the raster.
assigning a color_table to the raster file will help when opening the file in GIS software like QGIS or ArcGIS, the raster will be displayed with the colors you assigned to it.without the need to assign the colors manually.

print(df)
        band  values    color
>>> 0    1       1  #709959
>>> 1    1       2  #F2EEA2
>>> 2    1       3  #F2CE85
>>> 3    2       1  #C28C7C
>>> 4    2       2  #D6C19C
>>> 5    2       3  #D6C19C

- Assign the DataFrame to the color_table property.

dataset.color_table = df

When saving the raster to disk, the following file will be created along side the raster file.

.. code:: xml

<PAMDataset>
  <PAMRasterBand band="1">
    <ColorInterp>Palette</ColorInterp>
    <ColorTable>
      <Entry c1="0" c2="0" c3="0" c4="0" />
      <Entry c1="112" c2="153" c3="89" c4="255" />
      <Entry c1="0" c2="0" c3="0" c4="0" />
      <Entry c1="242" c2="238" c3="162" c4="255" />
      <Entry c1="0" c2="0" c3="0" c4="0" />
      <Entry c1="242" c2="206" c3="133" c4="255" />
      <Entry c1="0" c2="0" c3="0" c4="0" />
      <Entry c1="194" c2="140" c3="124" c4="255" />
      <Entry c1="0" c2="0" c3="0" c4="0" />
      <Entry c1="214" c2="193" c3="156" c4="255" />
    </ColorTable>
  </PAMRasterBand>
</PAMDataset>

.. note::

- The values in the `values` column in the `DataFrame` should cover the entir range of  values in the raster.
- Any value that is not in the `values` column will not be assigned any color.

Overviews #

Overviews are essentially lower resolution versions of a raster dataset. They are used to improve the performance of rendering large raster datasets by providing pyramid layers of the same data at different resolutions. When you view a large image at a small scale (zoomed out), Pyramids can use these overviews instead of the full-resolution data, which speeds up the loading and display of the raster.

overviews can be stored in two ways:

Internal Overviews:
    These are overviews that are stored within the same file as the main dataset. Many raster formats support
    internal overviews.

External Overviews:
    These are overviews stored in separate files, typically with the extension .ovr. External overviews are
    useful when the raster format does not support internal overviews or when you want to keep the overviews
    separate from the main dataset file.

To create an overview for a raster dataset you can use the create_overviews method.

create_overviews #

Parameters """""""""" resampling_method : str, optional The resampling method used to create the overviews, by default "NEAREST"

    possible values are:
            "NEAREST", "CUBIC", "AVERAGE", "GAUSS", "CUBICSPLINE", "LANCZOS", "MODE", "AVERAGE_MAGPHASE", "RMS",
            "BILINEAR".
overview_levels : list, optional
    The overview levels, overview_levels are restricted to the typical power-of-two reduction factors.
    Default [2, 4, 8, 16, 32]

Returns """"""" internal/external overviews: The overview (also known as pyramids) could be internal or external depending on the state you read the dataset with.

    - External (.ovr file):
        If the dataset is read with a`read_only=True` then the overviews' file will be created as an
        in the same directory of the dataset, with the same name of the dataset and .ovr extension.
    - Internal:
        If the dataset is read with a`read_only=False` then the overviews will be created internally in the
        dataset, and the dataset needs to be saved/flushed to save the new changes to disk.
overview_count: [list]
    a list property attribute of the overviews for each band.

path = "tests\data\geotiff\rhine\0_Rainfall_MSWEP_mm_daily_1979_1_1.tif"
dataset = Dataset.read_file(path)
dataset.create_overviews(resampling_method="nearest", overview_levels=[2, 4, 8, 16, 32])

print(dataset.overview_number)

>>> [5]

The previous code will create 5 overviews for each band in the dataset.

get_overview #

To get the overview of a certain band you can use the get_overview method. The overview is a gdal band object.

Parameters """""""""" band : int, optional The band index, by default 0 overview_index: [int] index of the overview. Default is 0.

Returns """"""" gdal.Band gdal band object

overview = dataset.get_overview(band=0, overview_index=0)
print(overview)

>>> <osgeo.gdal.Band; proxy of <Swig Object of type 'GDALRasterBandShadow *' at 0x0000020C1F9F9C60> >

recreate_overviews #

Parameters """""""""" resampling_method : str, optional The resampling method used to create the overviews, by default "nearest".

    possible values are:
        "NEAREST", "CUBIC", "AVERAGE", "GAUSS", "CUBICSPLINE", "LANCZOS", "MODE", "AVERAGE_MAGPHASE", "RMS",
        "BILINEAR".

Raises """""" ValueError resampling_method should be one of {"NEAREST", "CUBIC", "AVERAGE", "GAUSS", "CUBICSPLINE", "LANCZOS", "MODE", "AVERAGE_MAGPHASE", "RMS", "BILINEAR"} ReadOnlyError If the overviews are internal and the Dataset is opened with a read only. Please read the dataset using read_only=False

References #

.. target-notes:: .. Digital-Earth: https://github.com/Serapieum-of-alex/Digital-Earth .. cropAlignedFolder: https://github.com/MAfarrag/pyramids/tree/main/examples/data/crop_aligned_folder