descriptors module

statista.descriptors

Statistical descriptors.
rmse(obs, sim)
Root Mean Squared Error.
Calculates the Root Mean Squared Error between observed and simulated values. RMSE is a commonly used measure of the differences between values predicted by a model and the values actually observed.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Measured/observed values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated/predicted values as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The RMSE value representing the square root of the average squared difference between observed and simulated values. |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths. |
Examples:
- Using lists:
- Using numpy arrays:
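A minimal sketch of both cases (the commented value assumes the standard RMSE formula, i.e. the square root of the mean squared difference):

```python
>>> import numpy as np
>>> from statista.descriptors import rmse
>>> observed = [10, 20, 30, 40, 50]
>>> simulated = [12, 18, 33, 42, 48]
>>> error = rmse(observed, simulated)  # sqrt(25 / 5) ~= 2.236 for these values
>>> # the same call works with numpy arrays
>>> error = rmse(np.array(observed), np.array(simulated))
```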
See Also
- mae: Mean Absolute Error
- mbe: Mean Bias Error
Source code in statista/descriptors.py
rmse_hf(obs, sim, ws_type, n, alpha)
Weighted Root Mean Square Error for High flow.
Calculates a weighted version of RMSE that gives more importance to high flow values. Different weighting schemes can be applied based on the ws_type parameter.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed flow values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated flow values as a list or numpy array. | required |
ws_type | int | Weighting scheme type (integer between 1 and 4): 1: uses h^n weighting where h is the rational discharge; 2: uses (h/alpha)^n weighting with a cap at 1 for h > alpha; 3: binary weighting, 0 for h <= alpha and 1 for h > alpha; 4: same as type 3; any other value: uses sigmoid function weighting. | required |
n | int | Power parameter for the weighting function. | required |
alpha | Union[int, float] | Upper limit parameter for the weighting function (between 0 and 1). | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The weighted RMSE value for high flows. |

Raises:

Type | Description |
---|---|
TypeError | If ws_type is not an integer, alpha is not a number, or n is not a number. |
ValueError | If ws_type is not between 1 and 4, n is negative, or alpha is not between 0 and 1. |
Examples:
>>> import numpy as np
>>> from statista.descriptors import rmse_hf
>>> observed = [10, 20, 50, 100, 200]
>>> simulated = [12, 18, 55, 95, 190]
>>> error = rmse_hf(observed, simulated, ws_type=1, n=2, alpha=0.5)
>>> print(f"Weighted RMSE for high flows: {error:.4f}")
Weighted RMSE for high flows: 7.2111
>>> error = rmse_hf(observed, simulated, ws_type=3, n=1, alpha=0.7)
>>> print(f"Weighted RMSE for high flows: {error:.4f}")
Weighted RMSE for high flows: 8.3666
See Also
- rmse: Root Mean Square Error
- rmse_lf: Weighted Root Mean Square Error for Low flow
Source code in statista/descriptors.py
rmse_lf(obs, qsim, ws_type, n, alpha)
Weighted Root Mean Square Error for Low flow.
Calculates a weighted version of RMSE that gives more importance to low flow values. Different weighting schemes can be applied based on the ws_type parameter.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed flow values as a list or numpy array. | required |
qsim | Union[list, ndarray] | Simulated flow values as a list or numpy array. | required |
ws_type | int | Weighting scheme type (integer between 1 and 4): 1: uses qr^n weighting where qr is the rational discharge for low flows; 2: uses a quadratic function of (1-qr) with a cap at 0 for (1-qr) > alpha; 3: same as type 2; 4: uses the linear function 1-((1-qr)/alpha) with a cap at 0 for (1-qr) > alpha; any other value: uses sigmoid function weighting. | required |
n | int | Power parameter for the weighting function. | required |
alpha | Union[int, float] | Upper limit parameter for the weighting function (between 0 and 1). | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The weighted RMSE value for low flows. |

Raises:

Type | Description |
---|---|
TypeError | If ws_type is not an integer, alpha is not a number, or n is not a number. |
ValueError | If ws_type is not between 1 and 4, n is negative, or alpha is not between 0 and 1. |
Examples:
>>> import numpy as np
>>> from statista.descriptors import rmse_lf
>>> observed = [10, 20, 50, 100, 200]
>>> simulated = [12, 18, 55, 95, 190]
>>> error = rmse_lf(observed, simulated, ws_type=1, n=2, alpha=0.5)
>>> print(f"Weighted RMSE for low flows: {error:.4f}")
Weighted RMSE for low flows: 2.8284
>>> error = rmse_lf(observed, simulated, ws_type=4, n=1, alpha=0.7)
>>> print(f"Weighted RMSE for low flows: {error:.4f}")
Weighted RMSE for low flows: 2.0000
See Also
- rmse: Root Mean Square Error
- rmse_hf: Weighted Root Mean Square Error for High flow
Source code in statista/descriptors.py
kge(obs, sim)
Kling-Gupta Efficiency.
Calculates the Kling-Gupta Efficiency (KGE) between observed and simulated values.
KGE addresses limitations of using a single error function like Nash-Sutcliffe Efficiency (NSE) or RMSE by decomposing the error into three components: correlation, variability, and bias. This provides a more comprehensive assessment of model performance.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed flow values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated flow values as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The KGE value. KGE ranges from -∞ to 1, with 1 being perfect agreement. Values closer to 1 indicate better model performance. |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths or contain invalid values. |
Examples:
- Example with good performance:
- Example with poorer performance:
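A minimal sketch of both cases (illustrative only; the exact KGE values depend on the variant of the decomposition used internally, so the comments describe the expected behaviour rather than exact numbers):

```python
>>> from statista.descriptors import kge
>>> observed = [10, 20, 30, 40, 50]
>>> # simulated values close to the observations: KGE should be close to 1
>>> kge_good = kge(observed, [12, 18, 33, 42, 48])
>>> # strongly biased simulation: correlation stays high, but the bias and
>>> # variability components pull the KGE well below 1
>>> kge_poor = kge(observed, [30, 45, 60, 75, 90])
```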
References
Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377(1-2), 80-91.
See Also
- nse: Nash-Sutcliffe Efficiency
- rmse: Root Mean Square Error
Source code in statista/descriptors.py
wb(obs, sim)
Water Balance Error.
Calculates the water balance error, which measures how well the model reproduces the total stream flow volume.
This metric allows error compensation between time steps and is not an indication of the temporal accuracy of the model. It only measures the overall volume balance. Note that the naive model of Nash-Sutcliffe (simulated flow equals the average observed flow) will result in a WB error of 100%.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed flow values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated flow values as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The water balance error as a percentage (0-100). 100% indicates perfect volume balance, while lower values indicate poorer performance. |

Raises:

Type | Description |
---|---|
ValueError | If the sum of observed values is zero (division by zero). |
ValueError | If the input arrays have different lengths. |
Examples:
- Example with good volume balance:
- Example with volume underestimation:
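A minimal sketch of both cases (illustrative only; the exact percentage depends on the formula used internally, so no specific value is asserted):

```python
>>> from statista.descriptors import wb
>>> observed = [10, 20, 30, 40, 50]
>>> # total simulated volume nearly matches the observed volume
>>> wb_good = wb(observed, [12, 18, 33, 42, 48])
>>> # simulated volume is consistently lower than observed (underestimation)
>>> wb_under = wb(observed, [8, 15, 24, 32, 40])
>>> print(f"good balance: {wb_good:.1f}%, underestimation: {wb_under:.1f}%")
```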
References
Oudin, L., Andréassian, V., Mathevet, T., Perrin, C., & Michel, C. (2006). Dynamic averaging of rainfall-runoff model simulations from complementary model parameterizations. Water Resources Research, 42(7).
See Also
- rmse: Root Mean Square Error
- nse: Nash-Sutcliffe Efficiency
Source code in statista/descriptors.py
nse(obs, sim)
Nash-Sutcliffe Efficiency.
Calculates the Nash-Sutcliffe Efficiency (NSE), a widely used metric for assessing the performance of hydrological models.
NSE measures the relative magnitude of the residual variance compared to the variance of the observed data. It indicates how well the model predictions match the observations compared to using the mean of the observations as a predictor.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed flow values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated flow values as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The NSE value. NSE ranges from -∞ to 1: NSE = 1: perfect match between simulated and observed values; NSE = 0: model predictions are as accurate as the mean of observed data; NSE < 0: mean of observed data is a better predictor than the model. |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths. |
ValueError | If the variance of observed values is zero. |
Examples:
- Example with good performance:
- Example with poorer performance:
- Example with negative NSE (poor model):
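A minimal sketch of the three cases (the commented value follows from the standard NSE formula, 1 minus the ratio of residual variance to observed variance):

```python
>>> from statista.descriptors import nse
>>> observed = [10, 20, 30, 40, 50]
>>> # close agreement: residual variance is small, NSE near 1 (about 0.975 here)
>>> nse_good = nse(observed, [12, 18, 33, 42, 48])
>>> # larger errors: NSE drops noticeably
>>> nse_poor = nse(observed, [20, 10, 45, 25, 35])
>>> # a reversed simulation: the observed mean is a far better predictor,
>>> # so NSE becomes negative
>>> nse_bad = nse(observed, [50, 40, 30, 20, 10])
```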
See Also
- nse_hf: Modified Nash-Sutcliffe Efficiency for high flows
- nse_lf: Modified Nash-Sutcliffe Efficiency for low flows
- kge: Kling-Gupta Efficiency
Source code in statista/descriptors.py
nse_hf(obs, sim)
Modified Nash-Sutcliffe Efficiency for High Flows.
Calculates a modified version of the Nash-Sutcliffe Efficiency that gives more weight to high flow values. This is particularly useful for evaluating model performance during flood events or peak flows.
This modification weights the squared errors by the observed flow values, giving more importance to errors during high flow periods.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed flow values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated flow values as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The modified NSE value for high flows. Like standard NSE, it ranges from -∞ to 1: NSE_HF = 1: perfect match between simulated and observed values; NSE_HF = 0: model predictions are as accurate as the mean of observed data; NSE_HF < 0: mean of observed data is a better predictor than the model. |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths. |
ValueError | If the weighted variance of observed values is zero. |
Examples:
- Example with good performance on high flows
- Example with poor performance on high flows
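A minimal sketch of both cases (illustrative only; the flow-weighted formula is implementation-specific, so the comments describe the expected behaviour rather than exact values):

```python
>>> from statista.descriptors import nse_hf
>>> observed = [5, 10, 50, 200, 400]
>>> # the peaks (200, 400) are reproduced well: nse_hf stays close to 1
>>> good_peaks = nse_hf(observed, [8, 14, 45, 195, 405])
>>> # the peaks are badly underestimated: nse_hf drops sharply even though
>>> # the low flows are matched exactly
>>> poor_peaks = nse_hf(observed, [5, 10, 50, 120, 250])
```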
References
Hundecha Y. & Bárdossy A. (2004). Modeling of the effect of land use changes on the runoff generation of a river basin through parameter regionalization of a watershed model. Journal of Hydrology, 292(1-4), 281-295.
See Also
- nse: Standard Nash-Sutcliffe Efficiency
- nse_lf: Modified Nash-Sutcliffe Efficiency for low flows
Source code in statista/descriptors.py
nse_lf(obs, sim)
Modified Nash-Sutcliffe Efficiency for Low Flows.
Calculates a modified version of the Nash-Sutcliffe Efficiency that gives more weight to low flow values. This is particularly useful for evaluating model performance during drought periods or base flow conditions.
This modification applies a logarithmic transformation to the flow values before calculating the NSE, which gives more weight to relative errors during low flow periods.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed flow values as a list or numpy array. Values must be positive. | required |
sim | Union[list, ndarray] | Simulated flow values as a list or numpy array. Values must be positive. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The modified NSE value for low flows. Like standard NSE, it ranges from -∞ to 1: NSE_LF = 1: perfect match between simulated and observed values; NSE_LF = 0: model predictions are as accurate as the mean of observed data; NSE_LF < 0: mean of observed data is a better predictor than the model. |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths. |
ValueError | If any values in the input arrays are zero or negative (logarithm cannot be applied). |
ValueError | If the weighted variance of log-transformed observed values is zero. |
Examples:
- Example with good performance on low flows:
- Example with poor performance on low flows:
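A minimal sketch of both cases (illustrative only; the log-based weighting is implementation-specific, so the comments describe the expected behaviour rather than exact values):

```python
>>> from statista.descriptors import nse_lf
>>> observed = [2, 5, 20, 100, 300]
>>> # low flows (2, 5) are matched closely: nse_lf stays close to 1
>>> good_low = nse_lf(observed, [2.1, 4.8, 22, 110, 290])
>>> # low flows are off by a large relative factor: nse_lf is noticeably lower,
>>> # even though the high flows are reproduced well
>>> poor_low = nse_lf(observed, [8, 20, 22, 102, 295])
```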
References
Hundecha Y. & Bárdossy A. (2004). Modeling of the effect of land use changes on the runoff generation of a river basin through parameter regionalization of a watershed model. Journal of Hydrology, 292(1-4), 281-295.
See Also
- nse: Standard Nash-Sutcliffe Efficiency
- nse_hf: Modified Nash-Sutcliffe Efficiency for high flows
Source code in statista/descriptors.py
mbe(obs, sim)
Mean Bias Error (MBE).
Calculates the Mean Bias Error between observed and simulated values. MBE measures the average tendency of the simulated values to be larger or smaller than the observed values. A positive value indicates overestimation bias, while a negative value indicates underestimation bias.
Formula: MBE = Σ(sim - obs) / n, where n is the number of values.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated values as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The Mean Bias Error value. MBE = 0: no bias; MBE > 0: overestimation bias (simulated values tend to be larger than observed); MBE < 0: underestimation bias (simulated values tend to be smaller than observed). |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths. |
Examples:
- Example with overestimation bias:
- Example with underestimation bias:
- Example with no bias:
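A minimal sketch of the three cases (the commented values follow directly from the formula above):

```python
>>> from statista.descriptors import mbe
>>> observed = [10, 20, 30, 40, 50]
>>> # simulated values are 2 units too high on average -> MBE = 2.0
>>> mbe_over = mbe(observed, [12, 22, 32, 42, 52])
>>> # simulated values are 2 units too low on average -> MBE = -2.0
>>> mbe_under = mbe(observed, [8, 18, 28, 38, 48])
>>> # positive and negative errors cancel out -> MBE = 0.0
>>> mbe_none = mbe(observed, [12, 18, 32, 38, 50])
```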
See Also
- mae: Mean Absolute Error
- rmse: Root Mean Square Error
Source code in statista/descriptors.py
mae(obs, sim)
Mean Absolute Error (MAE).
Calculates the Mean Absolute Error between observed and simulated values. MAE measures the average magnitude of the errors without considering their direction. It's the average of the absolute differences between observed and simulated values.
Formula: MAE = Σ|obs - sim| / n, where n is the number of values.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated values as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The Mean Absolute Error value. MAE is always non-negative, with smaller values indicating better model performance. MAE = 0: perfect match between observed and simulated values; MAE > 0: average absolute difference between observed and simulated values. |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths. |
Examples:
>>> import numpy as np
>>> from statista.descriptors import mae
>>> observed = [10, 20, 30, 40, 50]
>>> simulated = [12, 18, 33, 42, 48]
>>> mae_value = mae(observed, simulated)
>>> print(f"MAE: {mae_value:.1f}")
MAE: 2.2
- Example with larger errors:
```python
>>> observed = [10, 20, 30, 40, 50]
>>> simulated = [15, 15, 35, 35, 55]
>>> mae_value = mae(observed, simulated)
>>> print(f"MAE: {mae_value:.1f}")
MAE: 5.0
```
- Example with perfect match:
```python
>>> observed = [10, 20, 30, 40, 50]
>>> simulated = [10, 20, 30, 40, 50]
>>> mae_value = mae(observed, simulated)
>>> print(f"MAE: {mae_value:.1f}")
MAE: 0.0
```
See Also
- mbe: Mean Bias Error
- rmse: Root Mean Square Error (gives more weight to larger errors)
Source code in statista/descriptors.py
pearson_corr_coeff(x, y)
Pearson Correlation Coefficient.
Calculates the Pearson correlation coefficient between two variables, which measures the linear relationship between them.
Key properties:
- Independent of the magnitude of the numbers (scale-invariant)
- Sensitive to relative changes only
- Measures only linear relationships
The mathematical formula is: R = Cov(x,y) / (σx * σy) where Cov is the covariance and σ is the standard deviation.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
x | Union[list, ndarray] | First variable as a list or numpy array. | required |
y | Union[list, ndarray] | Second variable as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
Number | Number | The correlation coefficient between -1 and 1: R = 1: perfect positive linear relationship; R = 0: no linear relationship; R = -1: perfect negative linear relationship. |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths. |
ValueError | If either array has zero variance (standard deviation = 0). |
Examples:
- Perfect positive correlation:
- Perfect negative correlation:
- No correlation:
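A minimal sketch of the three cases (the commented values follow from the standard definition of the Pearson coefficient):

```python
>>> from statista.descriptors import pearson_corr_coeff
>>> x = [1, 2, 3, 4, 5]
>>> # y increases linearly with x -> coefficient of 1.0
>>> r_pos = pearson_corr_coeff(x, [2, 4, 6, 8, 10])
>>> # y decreases linearly as x increases -> coefficient of -1.0
>>> r_neg = pearson_corr_coeff(x, [10, 8, 6, 4, 2])
>>> # y shows no linear trend with x -> coefficient of 0.0 (covariance is zero)
>>> r_none = pearson_corr_coeff(x, [3, 5, 2, 5, 3])
```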
See Also
- r2: Coefficient of determination
Source code in statista/descriptors.py
r2(obs, sim)
Coefficient of Determination (R²).
Calculates the coefficient of determination (R²) between observed and simulated values.
R² measures how well the predicted values match the observed values, based on the distance between the points and the 1:1 line (not the best-fit regression line). The closer the data points are to the 1:1 line, the higher the coefficient of determination.
Important properties:
- Unlike the Pearson correlation coefficient, R² depends on the magnitude of the numbers
- It measures the actual agreement between values, not just correlation
- It can range from negative infinity to 1
Parameters:

Name | Type | Description | Default |
---|---|---|---|
obs | Union[list, ndarray] | Observed values as a list or numpy array. | required |
sim | Union[list, ndarray] | Simulated values as a list or numpy array. | required |

Returns:

Name | Type | Description |
---|---|---|
float | float | The coefficient of determination: R² = 1: perfect match between simulated and observed values; R² = 0: model predictions are as accurate as using the mean of observed data; R² < 0: model predictions are worse than using the mean of observed data. |

Raises:

Type | Description |
---|---|
ValueError | If the input arrays have different lengths. |
Examples:
- Good model fit:
- Poor model fit:
- Negative R² (very poor model):
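A minimal sketch of the three cases (illustrative only; since this R² is measured against the 1:1 line as described above, the comments describe the expected range rather than exact values):

```python
>>> from statista.descriptors import r2
>>> observed = [10, 20, 30, 40, 50]
>>> # points lie close to the 1:1 line -> R² close to 1
>>> r2_good = r2(observed, [12, 18, 33, 42, 48])
>>> # large scatter around the 1:1 line -> much lower R²
>>> r2_poor = r2(observed, [20, 10, 45, 25, 35])
>>> # predictions worse than simply using the observed mean -> negative R²
>>> r2_bad = r2(observed, [50, 40, 30, 20, 10])
```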
See Also
- pearson_corr_coeff: Pearson correlation coefficient (measures correlation, not agreement)
- nse: Nash-Sutcliffe Efficiency (mathematically equivalent to R² for the 1:1 line)