Tools Used in Descriptive Statistics: Deep Learning

Raajeev H Dave (AI Man)
Dec 15, 2024

1. Tools Used in Descriptive Statistics

Here are some of the most commonly used tools to perform descriptive statistics:

a. Calculator

For small datasets, a calculator is often sufficient to find averages, medians, and ranges.

b. Spreadsheet Software (Excel, Google Sheets)

  • Excel is widely used for descriptive statistics.
  • Built-in functions like AVERAGE(), MEDIAN(), MODE(), and STDEV() help calculate measures quickly.
  • You can also create charts, like histograms and pie charts, to visualize data.

c. Statistical Software (R, Python, SPSS)

  • For larger datasets, tools like Python (with libraries such as NumPy, Pandas, and Matplotlib) or R make calculations efficient.
  • They compute measures like variance and standard deviation at scale and can plot complex graphs.

d. Data Visualization Tools (Tableau, Power BI)

  • These tools are great for creating interactive visualizations and dashboards.
  • You can explore your data using bar charts, scatter plots, and heatmaps to find hidden patterns.

2. Methods Used in Descriptive Statistics

Let’s break them into three categories:

A. Measures of Central Tendency (What’s typical in the data?)

These describe where the center of the data lies.

Mean (Average):

The sum of all data points divided by the number of points.

  • Example: Test scores: 50, 60, 70 → Mean = (50 + 60 + 70) / 3 = 60
  • In Python:
import numpy as np 
data = [50, 60, 70]
mean = np.mean(data)
print(mean)

Median (Middle Value):

The middle number when data is arranged in order.

  • Example: Test scores: 50, 60, 70 → Median = 60.
  • If there’s an even number of values, take the average of the two middle values.
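Both median cases can be checked in Python. This is a minimal sketch using NumPy's np.median; the four-score list is an illustrative extension of the example above, not data from a real test.

```python
import numpy as np

odd_scores = [70, 50, 60]        # three values: the middle one is the median
even_scores = [70, 50, 60, 80]   # four values: average the two middle ones

# np.median sorts internally, so the input order does not matter
median_odd = np.median(odd_scores)    # sorted: 50, 60, 70
median_even = np.median(even_scores)  # sorted: 50, 60, 70, 80 -> (60 + 70) / 2

print(median_odd)
print(median_even)
```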

Mode (Most Frequent):

The value that appears most often in the dataset.

  • Example: Test scores: 50, 60, 60, 70 → Mode = 60.
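NumPy has no dedicated mode function, so one option is the standard library's statistics.multimode (Python 3.8+). The sketch below assumes that choice; the tied list is made up to show that a dataset can have more than one mode.

```python
from statistics import multimode

scores = [50, 60, 60, 70]            # 60 appears twice, everything else once
tied_scores = [50, 50, 60, 60, 70]   # hypothetical data: 50 and 60 tie

# multimode returns a list of all values that share the highest count
modes = multimode(scores)
tied_modes = multimode(tied_scores)

print(modes)
print(tied_modes)
```

Returning a list makes ties visible instead of silently picking one value.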

B. Measures of Spread (How different is the data?)

These tell you how much the data varies.

Range:

The difference between the highest and lowest values.

  1. Example: Heights of students: 150 cm, 155 cm, 170 cm → Range = 170 − 150 = 20 cm.

Variance:

The average of the squared differences between each data point and the mean; it measures how spread out the data is around the mean.

  • Small variance means the data points are close to the mean.
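Variance is usually computed as the average squared deviation from the mean. Here is a step-by-step sketch on the earlier test scores, assuming the population form (divide by the number of points) and compared against np.var, which uses the same form by default.

```python
import numpy as np

data = [50, 60, 70]

mean = sum(data) / len(data)                      # 60.0
squared_devs = [(x - mean) ** 2 for x in data]    # [100.0, 0.0, 100.0]
variance = sum(squared_devs) / len(data)          # population variance: 200 / 3

print(variance)
print(np.var(data))  # NumPy's default (ddof=0) is the same calculation
```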

Standard Deviation (SD):

The square root of the variance; it shows the typical distance of data points from the mean, in the same units as the data.

  • Example in Python:
import numpy as np
data = [50, 60, 70]
std_dev = np.std(data)  # population SD: divides by n (ddof=0)
print(std_dev)
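One detail worth knowing: np.std divides by n by default (the population standard deviation). When the data is a sample from a larger population, the n − 1 form is the usual choice, and NumPy exposes it through the ddof parameter. A small sketch on the same three scores:

```python
import numpy as np

data = [50, 60, 70]

population_sd = np.std(data)        # ddof=0: divide by n
sample_sd = np.std(data, ddof=1)    # ddof=1: divide by n - 1

print(population_sd)
print(sample_sd)
```

Excel's STDEV() uses the n − 1 form, so it corresponds to ddof=1 here, which is why the two tools can disagree on the same data.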

C. Data Visualization Methods

Visualization makes it easier to interpret descriptive statistics.

  1. Histograms: Show the distribution of data. For example, a histogram of student scores can show how many students scored in each range (e.g., 0–49, 50–100).
  2. Bar Charts: Great for comparing categories, like sales by region or test scores by subject.
  3. Box Plots: Display the median, the quartiles, and any outliers in a dataset. They are helpful for spotting extreme values.
  4. Scatter Plots: Show relationships between two variables. For example, plotting hours studied vs. test scores can show whether more studying is associated with better results.
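The numbers behind a box plot can also be computed directly. The sketch below uses made-up test scores and the common 1.5 × IQR rule for flagging outliers, which is also the default whisker length in Matplotlib's boxplot; treat the rule as a convention rather than a fixed definition.

```python
import numpy as np

# Hypothetical test scores with one unusually high value
scores = [52, 55, 58, 60, 61, 63, 65, 95]

# Quartiles (NumPy's default linear interpolation)
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Outliers: {outliers}")
```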

3. Real-Life Example Using Python

Imagine we have a dataset of monthly sales numbers for 12 months: 1200, 1500, 1300, 1700, 1600, 1800, 2200, 2100, 2000, 1900, 2300, 2500

Here’s how we calculate descriptive statistics using Python:

import numpy as np
import matplotlib.pyplot as plt
from statistics import multimode
# Data: monthly sales for 12 months
sales = [1200, 1500, 1300, 1700, 1600, 1800, 2200, 2100, 2000, 1900, 2300, 2500]
# Central Tendency
mean = np.mean(sales)
median = np.median(sales)
# multimode returns every most-frequent value. All 12 values here are
# distinct, so there is no single mode and the whole list comes back.
mode = multimode(sales)
# Spread
range_sales = max(sales) - min(sales)
std_dev = np.std(sales)  # population SD (ddof=0); use ddof=1 for a sample
# Print results
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode(s): {mode}")
print(f"Range: {range_sales}")
print(f"Standard Deviation: {std_dev:.2f}")
# Visualization: Histogram
plt.hist(sales, bins=5, color='skyblue', edgecolor='black')
plt.title('Monthly Sales Distribution')
plt.xlabel('Sales Range')
plt.ylabel('Frequency')
plt.show()

4. Connecting to Deep Learning

How Descriptive Statistics Helps in AI/ML:

Data Preparation:

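  1. Before training a model, we often compute each feature's mean and standard deviation and use them to standardize the data (subtract the mean, then divide by the standard deviation), so that all features are on a comparable scale.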
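As a rough illustration of this step, here is a sketch that rescales the monthly sales figures from the earlier example using their mean and standard deviation (often called z-score standardization). It uses plain NumPy and is only one way to do it.

```python
import numpy as np

sales = np.array([1200, 1500, 1300, 1700, 1600, 1800,
                  2200, 2100, 2000, 1900, 2300, 2500], dtype=float)

mean = sales.mean()
std = sales.std()   # population SD (ddof=0)

# Subtract the mean, then divide by the standard deviation
standardized = (sales - mean) / std

print(standardized.round(2))
print(standardized.mean().round(6), standardized.std().round(6))
```

After this transformation the values have mean 0 and standard deviation 1, whatever units the original feature used.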

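Feature Scaling:

  1. If one feature (like age) has a much larger range than another (like gender coded as 0 or 1), it can dominate models that are sensitive to feature scale. Descriptive statistics such as the range and standard deviation reveal these differences so the features can be rescaled.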
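To make the imbalance concrete, the sketch below compares column ranges before and after rescaling. The feature matrix is made up, and min-max scaling is used here as one common option; the article itself does not prescribe a particular method.

```python
import numpy as np

# Hypothetical feature matrix: column 0 = age in years, column 1 = a 0/1 flag
X = np.array([[22, 0], [35, 1], [47, 0], [58, 1]], dtype=float)

col_min = X.min(axis=0)
col_max = X.max(axis=0)
raw_ranges = col_max - col_min            # age spans 36, the flag spans 1

# Min-max scaling: map each column onto [0, 1]
X_scaled = (X - col_min) / (col_max - col_min)
scaled_ranges = X_scaled.max(axis=0) - X_scaled.min(axis=0)

print("Raw ranges:   ", raw_ranges)
print("Scaled ranges:", scaled_ranges)
```

Note that a constant column would make the denominator zero, so real code would need to handle that case.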

Understanding Relationships:

  1. By visualizing and summarizing the data, you can identify correlations. For example, is higher income correlated with higher spending?
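A quick way to put a number on such a relationship is the Pearson correlation coefficient, available through np.corrcoef. The income and spending figures below are invented purely for illustration.

```python
import numpy as np

# Hypothetical data, in thousands per year
income = [30, 45, 60, 75, 90]
spending = [20, 28, 35, 41, 52]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two variables
r = np.corrcoef(income, spending)[0, 1]

print(f"Pearson correlation: {r:.3f}")
```

A value near +1 indicates a strong positive linear relationship; it does not by itself show that one variable causes the other.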

Spotting Errors or Outliers:

  1. Descriptive statistics can highlight outliers (like an impossible sales value of 1 million) that could skew the model.
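One simple way to flag such a value is the z-score: how many standard deviations a point sits from the mean. The sketch below appends an implausible 1,000,000 to the monthly sales data and assumes a cutoff of 3, which is a common convention rather than a fixed rule.

```python
import numpy as np

# Monthly sales from the earlier example plus one implausible entry
sales = np.array([1200, 1500, 1300, 1700, 1600, 1800, 2200,
                  2100, 2000, 1900, 2300, 2500, 1_000_000], dtype=float)

# Distance from the mean, measured in standard deviations
z_scores = (sales - sales.mean()) / sales.std()

threshold = 3.0  # assumed cutoff
outliers = sales[np.abs(z_scores) > threshold]

print(outliers)
```

The outlier itself inflates both the mean and the standard deviation, so this check is blunt on small datasets: with ten or fewer points, no value can reach a z-score above 3. The 1.5 × IQR rule used by box plots is a more robust alternative in that situation.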

In Summary

Descriptive statistics is essential for understanding data. Whether it’s calculating averages or visualizing distributions, these tools and methods simplify large datasets into actionable insights.
