Statistics¶

This notebook demonstrates the usage of the StatisticalGlyph class from the Cleopatra package.
The StatisticalGlyph class creates statistical plots, specifically histograms.


import numpy as np
from cleopatra.statistical_glyph import StatisticalGlyph

# Set the random seed for reproducibility
np.random.seed(1)

1. Creating Histograms with 1D Data¶

Let's start by creating a histogram for 1D data.
We'll generate some random data and use the StatisticalGlyph class to create a histogram.

Generate 1D data¶


data_1d = 4 + np.random.normal(0, 1.5, 200)

Create a StatisticalGlyph object with the 1D data¶


stat_plot_1d = StatisticalGlyph(data_1d)

Generate a histogram plot for the 1D data¶

fig_1d, ax_1d, hist_1d = stat_plot_1d.histogram()

# Display information about the histogram
print(f"Number of bins: {len(hist_1d['n'][0])}")
print(f"Bin counts: {hist_1d['n'][0]}")
print(f"Bin edges: {hist_1d['bins'][0][:5]}... (showing first 5)")

Number of bins: 15
Bin counts: [ 2.  4.  3. 10. 11. 20. 30. 27. 31. 25. 17.  8.  5.  6.  1.]
Bin edges: [0.34774335 0.8440597  1.34037605 1.8366924  2.33300874]... (showing first 5)
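
The printed output can be cross-checked with a little arithmetic. The sketch below copies the counts and the first five edges from the output above and assumes equal-width bins, which the printed edges are consistent with. The variable names are illustrative only and are not part of Cleopatra.

```python
# Values copied from the printed output above.
counts = [2, 4, 3, 10, 11, 20, 30, 27, 31, 25, 17, 8, 5, 6, 1]
first_edges = [0.34774335, 0.8440597, 1.34037605, 1.8366924, 2.33300874]

n_bins = len(counts)
n_edges = n_bins + 1   # N bins are delimited by N + 1 edges
total = sum(counts)    # every sample falls into exactly one bin

# With equal-width bins, consecutive edges differ by a constant.
widths = [b - a for a, b in zip(first_edges, first_edges[1:])]
bin_width = widths[0]

# Under the equal-width assumption, the last edge is the first edge
# plus n_bins widths; it should sit at roughly the largest data value.
last_edge = first_edges[0] + n_bins * bin_width

print(f"bins={n_bins}, edges={n_edges}, total count={total}")
print(f"bin width={bin_width:.4f}, estimated last edge={last_edge:.2f}")
```

The total count equals the 200 samples generated earlier, which is a quick sanity check that no values were dropped.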

1.1 Customizing the Histogram¶

Now let's customize the histogram by changing the number of bins, color, transparency, and width.

Create a StatisticalGlyph object with the 1D data¶


stat_plot_1d_custom = StatisticalGlyph(data_1d)

Generate a customized histogram plot¶

fig_1d_custom, ax_1d_custom, hist_1d_custom = stat_plot_1d_custom.histogram(
    bins=20,                # Use 20 bins instead of the 15 seen above
    color=["#FF5733"],      # Orange-red bars
    alpha=0.8,              # Bar opacity (0 is transparent, 1 is opaque)
    rwidth=0.9,             # Bar width relative to the bin width
    xlabel="Values",        # x-axis label
    ylabel="Frequency",     # y-axis label
    xlabel_font_size=12,    # x-axis label font size
    ylabel_font_size=12,    # y-axis label font size
    grid_alpha=0.3,         # Grid line opacity
)

2. Creating Histograms with 2D Data¶

The StatisticalGlyph class can also handle 2D data, drawing one histogram per column in the same plot.
Let's generate some 2D data and create histograms.

Generate 2D data with 3 columns¶

data_2d = np.zeros((200, 3))
data_2d[:, 0] = 3 + np.random.normal(0, 1.0, 200)  # Mean of 3, std of 1.0
data_2d[:, 1] = 5 + np.random.normal(0, 1.2, 200)  # Mean of 5, std of 1.2
data_2d[:, 2] = 7 + np.random.normal(0, 0.8, 200)  # Mean of 7, std of 0.8
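
The same kind of array can also be built in a single call. The sketch below is an alternative, not what the notebook uses: it relies on NumPy's `Generator.normal`, whose `loc` and `scale` arguments broadcast against `size`. The name `data_2d_alt` is hypothetical, and because it uses a different random stream the values will not match `data_2d` above.

```python
import numpy as np

# Generator API; its stream differs from the legacy np.random.seed(1) stream.
rng = np.random.default_rng(1)

means = np.array([3.0, 5.0, 7.0])
stds = np.array([1.0, 1.2, 0.8])

# loc and scale broadcast along the last axis, so column j is drawn
# from a normal distribution with mean means[j] and std stds[j].
data_2d_alt = rng.normal(loc=means, scale=stds, size=(200, 3))

print(data_2d_alt.shape)
print("column means:", data_2d_alt.mean(axis=0).round(2))
print("column stds: ", data_2d_alt.std(axis=0).round(2))
```

Printing the per-column mean and standard deviation is a cheap way to confirm that each column received the intended parameters before plotting.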

Create a StatisticalGlyph object with the 2D data¶


stat_plot_2d = StatisticalGlyph(data_2d)

Generate a histogram plot for the 2D data¶

Note: we need to provide one color for each column.


fig_2d, ax_2d, hist_2d = stat_plot_2d.histogram(color=["red", "green", "blue"])

2.1 Customizing the 2D Histogram¶

Let's customize the 2D histogram with more options.

Create a StatisticalGlyph object with the 2D data and custom parameters¶

stat_plot_2d_custom = StatisticalGlyph(
    data_2d,
    color=["#FF5733", "#33FF57", "#3357FF"],  # One custom color per column
    alpha=0.5,                                # Bar opacity (0 is transparent, 1 is opaque)
    rwidth=0.8,                               # Bar width relative to the bin width
)

Generate a customized histogram plot¶

fig_2d_custom, ax_2d_custom, hist_2d_custom = stat_plot_2d_custom.histogram(
    bins=25,                # Use 25 bins
    xlabel="Values",        # x-axis label
    ylabel="Frequency",     # y-axis label
    xlabel_font_size=14,    # x-axis label font size
    ylabel_font_size=14,    # y-axis label font size
    xtick_font_size=10,     # x-axis tick font size
    ytick_font_size=10,     # y-axis tick font size
    grid_alpha=0.2,         # Grid line opacity
    figsize=(10, 6),        # Figure size
)

3. Comparing Distributions¶

The StatisticalGlyph class is particularly useful for comparing multiple distributions.
Let's create an example that compares different distributions.

Generate data from different distributions¶


n_samples = 1000
data_distributions = np.zeros((n_samples, 3))

Normal distribution¶


data_distributions[:, 0] = np.random.normal(0, 1, n_samples)

Exponential distribution¶


data_distributions[:, 1] = np.random.exponential(1, n_samples)

Uniform distribution¶


data_distributions[:, 2] = np.random.uniform(-1.5, 1.5, n_samples)
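
Before comparing the histograms, it can help to know what each distribution should look like numerically. The sketch below is a standalone check, separate from the notebook's data: it draws a larger sample with NumPy's `default_rng` (an assumption; the notebook uses the legacy `np.random` functions) and compares sample moments with the textbook values. For the uniform distribution on [-1.5, 1.5], the variance is (b - a)^2 / 12 = 0.75.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # larger than the notebook's 1000 so the estimates are tight

samples = {
    "normal": rng.normal(0, 1, n),
    "exponential": rng.exponential(1, n),
    "uniform": rng.uniform(-1.5, 1.5, n),
}

# Theoretical (mean, variance) for each distribution.
theory = {
    "normal": (0.0, 1.0),
    "exponential": (1.0, 1.0),
    "uniform": (0.0, 0.75),
}

observed = {name: (float(x.mean()), float(x.var())) for name, x in samples.items()}

for name, (mean, var) in observed.items():
    t_mean, t_var = theory[name]
    print(f"{name:12s} mean={mean:+.3f} (theory {t_mean:+.2f})  "
          f"var={var:.3f} (theory {t_var:.2f})")
```

The exponential column is the only one with a nonzero mean and no negative values, so its histogram is expected to sit to the right of zero and be skewed, while the normal and uniform columns are centered on zero.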

Create a StatisticalGlyph object with the distribution data¶

stat_plot_distributions = StatisticalGlyph(
    data_distributions,
    color=["#3498DB", "#E74C3C", "#2ECC71"],  # Blue, Red, Green
    alpha=0.6,
    rwidth=0.9
)

Generate a histogram plot comparing the distributions¶

fig_dist, ax_dist, hist_dist = stat_plot_distributions.histogram(
    bins=30,
    xlabel="Values",
    ylabel="Frequency",
    figsize=(12, 7)
)

# Add a legend to identify the distributions
ax_dist.legend(["Normal", "Exponential", "Uniform"])


4. Error Handling¶

The StatisticalGlyph class checks that the number of colors provided matches
the number of samples (columns) in the data and raises a ValueError otherwise.
Let's see what happens when we provide an incorrect number of colors.

try:
    # Create a StatisticalGlyph object with the 3-column 2D data
    stat_plot_error = StatisticalGlyph(data_2d)

    # This raises a ValueError because only 2 colors are given for 3 columns
    fig_error, ax_error, hist_error = stat_plot_error.histogram(color=["red", "green"])
except ValueError as e:
    print(f"Error: {e}")

Error: The number of colors:2 should be equal to the number of samples:3
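
A check of this kind is simple to write. The sketch below is a hypothetical stand-in, not Cleopatra's actual implementation: the function name `check_color_count` and its signature are assumptions, and only the message text mirrors the output shown above.

```python
from typing import Sequence


def check_color_count(n_columns: int, colors: Sequence[str]) -> None:
    """Raise ValueError unless there is exactly one color per data column.

    Hypothetical helper for illustration; not part of Cleopatra.
    """
    if len(colors) != n_columns:
        raise ValueError(
            f"The number of colors:{len(colors)} should be equal to "
            f"the number of samples:{n_columns}"
        )


# Three colors for three columns passes silently.
check_color_count(3, ["red", "green", "blue"])

# Two colors for three columns is rejected.
try:
    check_color_count(3, ["red", "green"])
except ValueError as e:
    message = str(e)
    print(f"Error: {message}")
```

Validating before plotting keeps the failure close to the mistake, instead of surfacing later as a less obvious plotting error.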

Summary¶

In this notebook, we've explored the StatisticalGlyph class from the Cleopatra package. We've seen how to:

Create histograms for 1D data
Create histograms for 2D data
Customize histograms with various parameters
Compare different distributions
Handle errors when using the class

The StatisticalGlyph class provides a convenient way to create and customize histograms for statistical analysis and visualization.