Statistics
- This notebook demonstrates the usage of the StatisticalGlyph class from the Cleopatra package.
- The StatisticalGlyph class provides functionality for creating statistical plots, specifically histograms.
import numpy as np
from cleopatra.statistical_glyph import StatisticalGlyph
# Set the random seed for reproducibility
np.random.seed(1)
1. Creating Histograms with 1D Data
- Let's start by creating a histogram for 1D data.
- We'll generate some random data and use the StatisticalGlyph class to create a histogram.
Generate 1D data
data_1d = 4 + np.random.normal(0, 1.5, 200)
Create a StatisticalGlyph object with the 1D data
stat_plot_1d = StatisticalGlyph(data_1d)
Generate a histogram plot for the 1D data
fig_1d, ax_1d, hist_1d = stat_plot_1d.histogram()
# Display information about the histogram
print(f"Number of bins: {len(hist_1d['n'][0])}")
print(f"Bin counts: {hist_1d['n'][0]}")
print(f"Bin edges: {hist_1d['bins'][0][:5]}... (showing first 5)")
Number of bins: 15
Bin counts: [ 2. 4. 3. 10. 11. 20. 30. 27. 31. 25. 17. 8. 5. 6. 1.]
Bin edges: [0.34774335 0.8440597 1.34037605 1.8366924 2.33300874]... (showing first 5)
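The counts and edges printed above follow the usual histogram convention: N bins are delimited by N + 1 edges, and the counts add up to the number of data points. The sketch below shows that relationship with plain `np.histogram`, independent of Cleopatra. Whether `StatisticalGlyph.histogram` delegates to the same routine internally is an assumption, so the numbers need not match digit for digit.

```python
import numpy as np

np.random.seed(1)
data = 4 + np.random.normal(0, 1.5, 200)

# 15 equal-width bins spanning [data.min(), data.max()]
counts, edges = np.histogram(data, bins=15)

print(f"Number of bins:  {len(counts)}")
print(f"Number of edges: {len(edges)}")
print(f"Total count:     {counts.sum()}")
print(f"Bin width:       {edges[1] - edges[0]:.4f}")
```

The edge array is always one element longer than the count array, which is why the notebook slices the edges separately when printing them.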
1.1 Customizing the Histogram
- Now let's customize the histogram by changing the number of bins, the bar color, opacity, and relative width, and by adding axis labels.
Create a StatisticalGlyph object with the 1D data
stat_plot_1d_custom = StatisticalGlyph(data_1d)
Generate a customized histogram plot
fig_1d_custom, ax_1d_custom, hist_1d_custom = stat_plot_1d_custom.histogram(
    bins=20,  # Increase the number of bins
    color=["#FF5733"],  # Change the color to orange-red
    alpha=0.8,  # Bar opacity (0 is fully transparent, 1 is opaque)
    rwidth=0.9,  # Relative bar width, as a fraction of the bin width
    xlabel="Values",  # Add x-axis label
    ylabel="Frequency",  # Add y-axis label
    xlabel_font_size=12,  # Set x-axis label font size
    ylabel_font_size=12,  # Set y-axis label font size
    grid_alpha=0.3,  # Grid line opacity (lower values give fainter lines)
)
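To make the `rwidth` setting concrete: in Matplotlib's `hist`, `rwidth` is the bar width expressed as a fraction of the bin width, so 0.9 leaves a 10% gap between neighboring bars. The sketch below works out both widths with NumPy for 20 bins. It assumes that Cleopatra forwards `rwidth` to Matplotlib unchanged, which this notebook does not confirm.

```python
import numpy as np

np.random.seed(1)
data = 4 + np.random.normal(0, 1.5, 200)

bins = 20
rwidth = 0.9

# 20 equal-width bins need 21 edges
edges = np.histogram_bin_edges(data, bins=bins)
bin_width = edges[1] - edges[0]

bar_width = rwidth * bin_width  # drawn width of each bar
gap = bin_width - bar_width     # empty space left between bars

print(f"Bin width: {bin_width:.4f}")
print(f"Bar width: {bar_width:.4f}")
print(f"Gap:       {gap:.4f}")
```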
2. Creating Histograms with 2D Data
- The StatisticalGlyph class can also handle 2D data, drawing one histogram per column in the same plot.
- Let's generate some 2D data and create histograms.
Generate 2D data with 3 columns
data_2d = np.zeros((200, 3))
data_2d[:, 0] = 3 + np.random.normal(0, 1.0, 200) # Mean of 3, std of 1.0
data_2d[:, 1] = 5 + np.random.normal(0, 1.2, 200) # Mean of 5, std of 1.2
data_2d[:, 2] = 7 + np.random.normal(0, 0.8, 200) # Mean of 7, std of 0.8
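The three columns can also be drawn in a single call, because NumPy broadcasts array-valued `loc` and `scale` arguments against the requested output shape. The sketch below uses the newer `np.random.default_rng` generator; the seed and variable names are illustrative, and the values will differ from `data_2d` above because the random stream is different.

```python
import numpy as np

# Generator API; this is a different random stream from np.random.seed(1)
rng = np.random.default_rng(1)

means = np.array([3.0, 5.0, 7.0])
stds = np.array([1.0, 1.2, 0.8])

# loc and scale broadcast across the last axis of size=(200, 3),
# so column j is drawn from Normal(means[j], stds[j])
data = rng.normal(loc=means, scale=stds, size=(200, 3))

print(data.shape)
print(np.round(data.mean(axis=0), 2))
print(np.round(data.std(axis=0, ddof=1), 2))
```

The printed column means and standard deviations should land close to the requested parameters, which is a quick sanity check before plotting.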
Create a StatisticalGlyph object with the 2D data
stat_plot_2d = StatisticalGlyph(data_2d)
Generate a histogram plot for the 2D data
Note: The color list must contain one entry per column (three here).
fig_2d, ax_2d, hist_2d = stat_plot_2d.histogram(color=["red", "green", "blue"])
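When several columns share one plot, the bars are only comparable if every column is binned on the same edges. Matplotlib's `hist` handles multi-column input this way, computing the edges from the pooled data. The NumPy sketch below shows the same idea on freshly generated data; it illustrates the concept and is not Cleopatra's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=[3.0, 5.0, 7.0], scale=[1.0, 1.2, 0.8], size=(200, 3))

# One set of edges computed from all values pooled together
edges = np.histogram_bin_edges(data.ravel(), bins=15)

# Count each column against the shared edges; one row of counts per column
counts = np.array(
    [np.histogram(data[:, j], bins=edges)[0] for j in range(data.shape[1])]
)

print(counts.shape)        # (3, 15)
print(counts.sum(axis=1))  # each column accounts for all 200 of its values
```

Because the shared edges span the pooled minimum and maximum, no value falls outside the bins, and each row of counts sums to the column length.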
2.1 Customizing the 2D Histogram
- Let's customize the histogram of the 2D data with more options.
Create a StatisticalGlyph object with the 2D data and custom parameters
stat_plot_2d_custom = StatisticalGlyph(
    data_2d,
    color=["#FF5733", "#33FF57", "#3357FF"],  # Custom colors, one per column
    alpha=0.5,  # Bar opacity (0 is fully transparent, 1 is opaque)
    rwidth=0.8,  # Relative bar width, as a fraction of the bin width
)
Generate a customized histogram plot
fig_2d_custom, ax_2d_custom, hist_2d_custom = stat_plot_2d_custom.histogram(
    bins=25,  # Increase the number of bins
    xlabel="Values",  # Add x-axis label
    ylabel="Frequency",  # Add y-axis label
    xlabel_font_size=14,  # Set x-axis label font size
    ylabel_font_size=14,  # Set y-axis label font size
    xtick_font_size=10,  # Set x-axis tick font size
    ytick_font_size=10,  # Set y-axis tick font size
    grid_alpha=0.2,  # Grid line opacity (lower values give fainter lines)
    figsize=(10, 6),  # Set figure size
)
3. Comparing Distributions
- The StatisticalGlyph class is particularly useful for comparing multiple distributions.
- Let's create an example that compares a normal, an exponential, and a uniform distribution.
Generate data from different distributions
n_samples = 1000
data_distributions = np.zeros((n_samples, 3))
Normal distribution
data_distributions[:, 0] = np.random.normal(0, 1, n_samples)
Exponential distribution
data_distributions[:, 1] = np.random.exponential(1, n_samples)
Uniform distribution
data_distributions[:, 2] = np.random.uniform(-1.5, 1.5, n_samples)
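Before plotting, it can help to check each column against the moments its distribution should have. For the parameters used above, the theoretical means are 0, 1, and 0, and the theoretical standard deviations are 1, 1, and 3/√12 ≈ 0.866. The sketch below regenerates comparable data with its own seed (so the values differ from the cells above) and prints sample moments next to the theoretical ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 1000

samples = np.column_stack([
    rng.normal(0, 1, n_samples),        # standard normal
    rng.exponential(1, n_samples),      # exponential with scale 1
    rng.uniform(-1.5, 1.5, n_samples),  # uniform on [-1.5, 1.5)
])

names = ["Normal", "Exponential", "Uniform"]
expected_mean = np.array([0.0, 1.0, 0.0])
expected_std = np.array([1.0, 1.0, 3.0 / np.sqrt(12.0)])

sample_mean = samples.mean(axis=0)
sample_std = samples.std(axis=0, ddof=1)

for name, m, s, em, es in zip(names, sample_mean, sample_std, expected_mean, expected_std):
    print(f"{name:<12} mean={m:+.3f} (theory {em:+.3f})  std={s:.3f} (theory {es:.3f})")
```

The exponential column is the only one with a nonzero mean and no negative values, which is the asymmetry to look for in the histogram.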
Create a StatisticalGlyph object with the distribution data
stat_plot_distributions = StatisticalGlyph(
    data_distributions,
    color=["#3498DB", "#E74C3C", "#2ECC71"],  # Blue, Red, Green
    alpha=0.6,
    rwidth=0.9,
)
Generate a histogram plot comparing the distributions
fig_dist, ax_dist, hist_dist = stat_plot_distributions.histogram(
    bins=30,
    xlabel="Values",
    ylabel="Frequency",
    figsize=(12, 7),
)
# Add a legend to identify the distributions
ax_dist.legend(["Normal", "Exponential", "Uniform"])
4. Error Handling
- The StatisticalGlyph class checks that the number of colors provided matches the number of columns in the data (the error message calls these "samples").
- Let's see what happens when we provide an incorrect number of colors.
try:
    # Create a StatisticalGlyph object from the 3-column data
    stat_plot_error = StatisticalGlyph(data_2d)
    # This should raise a ValueError because only 2 colors are given for 3 columns
    fig_error, ax_error, hist_error = stat_plot_error.histogram(color=["red", "green"])
except ValueError as e:
    print(f"Error: {e}")
Error: The number of colors:2 should be equal to the number of samples:3
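The check behind this message can be pictured as a comparison between the length of the color list and the number of data columns. The sketch below is a hypothetical stand-alone helper, `check_colors`, written only to illustrate that logic; it mirrors the message text shown above but is not Cleopatra's actual implementation.

```python
import numpy as np


def check_colors(data, colors):
    """Hypothetical helper: require exactly one color per data column."""
    data = np.asarray(data)
    # A 1D array counts as a single column
    n_columns = 1 if data.ndim == 1 else data.shape[1]
    if len(colors) != n_columns:
        raise ValueError(
            f"The number of colors:{len(colors)} should be equal to "
            f"the number of samples:{n_columns}"
        )


data = np.zeros((200, 3))
check_colors(data, ["red", "green", "blue"])  # three colors, three columns: passes

try:
    check_colors(data, ["red", "green"])  # two colors, three columns: raises
except ValueError as e:
    message = str(e)
    print(f"Error: {message}")
```

Validating up front like this gives a clear message before any plotting happens, rather than a harder-to-read failure from the plotting backend.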
Summary
In this notebook, we've explored the StatisticalGlyph class from the Cleopatra package. We've seen how to:
- Create histograms for 1D data
- Create histograms for 2D data
- Customize histograms with various parameters
- Compare different distributions
- Handle errors when using the class
The StatisticalGlyph class provides a convenient way to create and customize histograms for statistical analysis and visualization.