Types of Activation Functions: Sigmoid, Tanh, ReLU, Softmax (Part 1)
Activation functions in neural networks help determine if a neuron should be activated (fired) or not, similar to how our brain decides when to send a signal. Without activation functions, neural networks wouldn’t be able to learn or model complex patterns.
1. Sigmoid Activation Function
- Description: The sigmoid function squashes the input values to a range between 0 and 1. It’s like a switch that turns on slowly, reaching closer to 1 for larger inputs and closer to 0 for smaller inputs.
- Formula: $\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$ (a quick numeric check follows this list)
- Example: Imagine this as a dimmer switch for a light. A small increase in the signal (x) gradually turns on the light until it’s almost fully bright (close to 1) or almost off (close to 0).
- Use Case: Often used in binary classification tasks (where you need to classify something into two groups, like ‘yes’ or ‘no’).
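To make the formula concrete, here is a quick NumPy check on a few arbitrary sample inputs (a minimal sketch; fuller data examples come later in this post):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(x))  # ≈ [0.007, 0.269, 0.5, 0.731, 0.993]
# Large negative inputs approach 0, zero maps to exactly 0.5, large positive inputs approach 1.
```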
2. Tanh (Hyperbolic Tangent) Activation Function
- Description: Tanh is similar to Sigmoid but squashes values between -1 and 1. It’s centered around zero, which makes it better for many networks because it handles negative and positive values.
- Formula: $\text{Tanh}(x) = \frac{2}{1 + e^{-2x}} - 1$ (see the short check after this list)
- Example: Think of it as a temperature gauge that can go up or down. The closer you get to the extreme temperatures, the closer you are to 1 (hot) or -1 (cold).
- Use Case: It’s preferred over Sigmoid in many cases because it allows the network to handle both positive and negative inputs more effectively.
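The same kind of quick check for Tanh, comparing NumPy's built-in np.tanh with the formula above (arbitrary sample inputs):

```python
import numpy as np

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.tanh(x))                    # ≈ [-0.9999, -0.7616, 0.0, 0.7616, 0.9999]
print(2 / (1 + np.exp(-2 * x)) - 1)  # same values, computed directly from the formula
# The outputs are zero-centered: negative inputs stay negative, positive inputs stay positive.
```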
3. ReLU (Rectified Linear Unit) Activation Function
- Description: ReLU is simple. If the input is positive, it outputs the same value. If the input is negative, it outputs 0.
- Formula: $\text{ReLU}(x) = \max(0, x)$ (illustrated in the snippet after this list)
- Example: Imagine a tap that only lets water flow if you turn it above a certain level (0). If you turn it below that level, no water flows out (0).
- Use Case: ReLU is very popular in hidden layers of neural networks because it’s simple and helps models learn faster.
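A quick sketch of the formula on arbitrary sample inputs:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

x = np.array([-3.0, -0.5, 0.0, 2.0, 7.0])
print(relu(x))  # [0. 0. 0. 2. 7.]: negatives are clipped to 0, positives pass through unchanged
```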
4. Leaky ReLU Activation Function
- Description: Leaky ReLU is similar to ReLU but with a small “leak” for negative values. Instead of giving a strict 0 for negative inputs, it gives a tiny negative output (like 0.01 times the input).
- Formula: $\text{Leaky ReLU}(x) = \max(0.01x, x)$ (see the snippet after this list)
- Example: Imagine the same tap, but this time, it allows a small trickle of water even when it’s almost off.
- Use Case: This can help avoid situations where neurons “die” by always being zero in a network, which helps the network learn better.
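A quick sketch using the 0.01 leak from the formula above (arbitrary sample inputs):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0, 7.0])
print(leaky_relu(x))  # [-0.03 -0.005 0. 2. 7.]: negatives keep a small 1% "leak" instead of becoming 0
```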
5. Softmax Activation Function
- Description: Softmax is used when you want the output to be a set of probabilities. It takes multiple values and squashes them into a range from 0 to 1, where the total adds up to 1. This is useful for multi-class classification.
- Formula: $\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$ (worked through in the snippet after this list)
- Example: Imagine you have 3 options: cat, dog, or rabbit. Softmax converts the outputs so that you get a probability for each option, like 0.7 (70%) for cat, 0.2 (20%) for dog, and 0.1 (10%) for rabbit.
- Use Case: Commonly used in the output layer when dealing with multiple classes (like classifying images as either a cat, dog, or rabbit).
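A quick sketch of the cat/dog/rabbit idea with arbitrary raw scores (the exact probabilities depend on the scores you feed in):

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))  # subtracting the max avoids overflow for large scores
    return e_x / e_x.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw scores for cat, dog, rabbit
probs = softmax(scores)
print(probs)        # ≈ [0.66, 0.24, 0.10]
print(probs.sum())  # ≈ 1.0: the probabilities always add up to 1
```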
Recap Table

| Function | Output Range | Formula | Typical Use |
| --- | --- | --- | --- |
| Sigmoid | 0 to 1 | $\frac{1}{1 + e^{-x}}$ | Binary (yes/no) classification |
| Tanh | -1 to 1 | $\frac{2}{1 + e^{-2x}} - 1$ | Data with both positive and negative values; zero-centered |
| ReLU | 0 to ∞ | $\max(0, x)$ | Hidden layers; simple and fast to learn |
| Leaky ReLU | -∞ to ∞ | $\max(0.01x, x)$ | Hidden layers; avoids "dead" neurons |
| Softmax | 0 to 1 (sums to 1) | $\frac{e^{x_i}}{\sum_j e^{x_j}}$ | Output layer for multi-class classification |
Let’s go through some scenarios with everyday-life examples to help clarify when you might use each activation function.
1. Sigmoid Activation Function
- Scenario: Imagine you’re trying to decide whether or not to attend a concert based on ticket prices.
- Example: If the ticket is really cheap, you’re very likely to go (output close to 1). If it’s expensive, you’re unlikely to go (output close to 0). Sigmoid helps you make a binary decision by squashing the range to either a “yes” or “no.”
- Best Used For: Situations where you need a binary (two-option) outcome, like yes/no, true/false, or go/stay.
2. Tanh Activation Function
- Scenario: Imagine you’re grading homework on a scale from -10 to +10, where negative scores mean mistakes and positive scores mean good points.
- Example: If someone performs really well, their score will be close to +10 (closer to 1 in Tanh terms). If they perform poorly, their score will be closer to -10 (closer to -1 in Tanh terms). This is useful because it distinguishes positive and negative outcomes and is “centered” around zero.
- Best Used For: Tasks where you want to account for both positive and negative outcomes, such as sentiment analysis (positive/negative emotion) or temperature gauges (hot/cold).
3. ReLU (Rectified Linear Unit) Activation Function
- Scenario: Think of a digital assistant that only responds to loud enough sounds.
- Example: If you speak softly, the assistant won’t react (output is 0). But if you speak louder, it responds based on how loud you were (output is directly proportional to loudness). ReLU works in a similar way by ignoring all negative values and focusing on positive signals, allowing only “significant” information to pass through.
- Best Used For: Layers in neural networks where you want to speed up learning without “cluttering” the data with unnecessary information.
4. Leaky ReLU Activation Function
- Scenario: Imagine a “drip” faucet that always lets a tiny trickle of water flow, even when nearly shut off.
- Example: If you only open it a bit, a small amount of water (signal) still flows out, and if you open it more, more water flows. Leaky ReLU is like ReLU but allows a small trickle for negative values to keep the flow going.
- Best Used For: Avoiding the “dead neuron” problem in deep networks, where some neurons might otherwise stop learning due to being stuck at zero output.
5. Softmax Activation Function
- Scenario: Let’s say you’re choosing a favorite ice cream flavor from chocolate, vanilla, and strawberry.
- Example: Instead of choosing just one, you assign a probability to each flavor: 50% for chocolate, 30% for vanilla, and 20% for strawberry. Softmax gives you a probability distribution across multiple classes, letting you express that you prefer chocolate the most, but you’d still consider the others.
- Best Used For: Multi-class classification tasks where you need to assign probabilities across several categories (e.g., categorizing images into types of animals).
Quick Summary
- Sigmoid: Great for yes/no decisions. Use when you need a binary choice, like deciding to attend a concert based on cost.
- Tanh: Useful for handling positive/negative distinctions, like scoring with both positive and negative values.
- ReLU: Ideal for ignoring minor or negative values and focusing on stronger signals, like only responding to loud sounds.
- Leaky ReLU: Adds a small response to negative values to prevent “dead” spots, like a dripping faucet.
- Softmax: Use when you want a probability across multiple options, like choosing a favorite flavor with weighted preferences.
Below, I’ll add Python code that uses each activation function with sample data for actual analysis. We’ll use NumPy and pandas for computations and some simple synthetic data for illustration.
1. Sigmoid Activation Function (Binary Classification)
- Example: Let’s analyze whether students pass based on their test scores. We’ll use the sigmoid function to classify scores as “pass” or “fail.”
```python
import numpy as np
import pandas as pd

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Sample data: student scores
data = pd.DataFrame({'Student': ['A', 'B', 'C', 'D', 'E'],
                     'Score': [55, 45, 70, 65, 50]})

# Apply sigmoid to normalize scores
data['Pass Probability'] = sigmoid(data['Score'] - 50)  # Shift to center around 50
data['Pass'] = data['Pass Probability'].apply(lambda x: 'Yes' if x > 0.5 else 'No')
print(data)
```
- Explanation: Here, the sigmoid function gives a probability of passing based on the score. Scores above 50 result in a higher probability of passing, classified as “Yes,” and those below as “No.”
2. Tanh Activation Function (Sentiment Analysis)
- Example: Imagine analyzing sentiment scores from customer reviews. We’ll use Tanh to map scores onto a scale from -1 (negative) to +1 (positive).
```python
def tanh(x):
    return np.tanh(x)

# Sample data: sentiment scores from customer feedback
data = pd.DataFrame({'Customer': ['X', 'Y', 'Z', 'W', 'V'],
                     'Sentiment Score': [-3, -1, 0, 1, 3]})

# Apply Tanh to normalize sentiment scores
data['Normalized Sentiment'] = tanh(data['Sentiment Score'])
print(data)
```
- Explanation: The Tanh function scales the scores from -1 to 1, helping distinguish positive and negative sentiments. This function is particularly useful when analyzing data centered around zero.
3. ReLU Activation Function (Threshold for Audio Level)
- Example: Let’s say we’re monitoring noise levels in a quiet area and only want to capture sound above a certain level. We’ll use ReLU to “filter out” low noise levels.
```python
def relu(x):
    return np.maximum(0, x)

# Sample data: noise levels in decibels (dB)
data = pd.DataFrame({'Time': ['8AM', '9AM', '10AM', '11AM', '12PM'],
                     'Noise Level (dB)': [-2, 5, 10, 3, -1]})

# Apply ReLU to filter noise levels
data['Filtered Noise Level'] = relu(data['Noise Level (dB)'])
print(data)
```
- Explanation: Negative noise levels are set to 0, effectively filtering out values below a threshold. This makes ReLU useful for threshold-based data filtering.
4. Leaky ReLU Activation Function (Detecting Subtle Temperature Variations)
- Example: In temperature monitoring, small negative values might still have meaning. Leaky ReLU helps retain some of these smaller signals.
```python
def leaky_relu(x, alpha=0.1):
    # A 10% leak is used here (larger than the usual 0.01) so the effect is easy to see
    return np.where(x > 0, x, alpha * x)

# Sample data: daily temperature changes (in °C)
data = pd.DataFrame({'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                     'Temperature Change': [-5, -2, 0, 3, 5]})

# Apply Leaky ReLU to temperature changes
data['Adjusted Temperature Change'] = leaky_relu(data['Temperature Change'])
print(data)
```
- Explanation: Negative values are allowed to “leak” through with a small scaling factor. This can be useful in data with small variations, ensuring that minor negative changes are not completely ignored.
5. Softmax Activation Function (Multi-Class Probability Distribution)
- Example: Imagine predicting a student’s preferred subject based on scores in different subjects. We’ll use Softmax to generate probabilities for each subject.
```python
def softmax(x):
    e_x = np.exp(x - np.max(x))  # For numerical stability
    return e_x / e_x.sum(axis=0)

# Sample data: scores in different subjects
data = pd.DataFrame({'Student': ['A', 'B', 'C', 'D', 'E'],
                     'Math': [90, 80, 70, 60, 50],
                     'Science': [85, 78, 65, 58, 50],
                     'English': [88, 85, 60, 55, 45]})

# Apply Softmax to each student's scores (row by row)
data[['Math_prob', 'Science_prob', 'English_prob']] = data[['Math', 'Science', 'English']].apply(softmax, axis=1)
print(data)
```
- Explanation: Softmax normalizes the scores into probabilities for each subject, indicating the likelihood that a student’s preferred subject is Math, Science, or English. This is useful for multi-class classification problems.
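As a quick sanity check (building on the data DataFrame from the snippet above), each student's three probabilities should add up to 1:

```python
# Each row of Softmax probabilities should sum to (approximately) 1
print(data[['Math_prob', 'Science_prob', 'English_prob']].sum(axis=1))
```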
Summary
These examples show how different activation functions can be used to handle various types of data:
- Sigmoid: Binary classification for pass/fail scenarios.
- Tanh: Sentiment analysis with values ranging from -1 to 1.
- ReLU: Filtering low signals based on a threshold.
- Leaky ReLU: Retaining small negative values, useful for minor variations.
- Softmax: Multi-class probability distribution, useful for subject preferences.
Try running these on your own data, and see how the activation functions transform it!