LSTM Simple learnning and output
Absolutely! This is one of the best ways to understand how LSTMs work. By using simple, synthetic data, you can create a “lab environment” to observe the model’s learning process and its final predictions clearly.
Here’s a step-by-step guide on how to do this, complete with a simple example you can run.
1. Choose a Simple, Learnable Pattern
The key is to give the LSTM a pattern that is sequential and has dependencies that span multiple steps. A simple LSTM should be able to learn this.
Great Beginner Example: A Simple Sum
- Input Sequence: A sequence of 3 numbers.
- Target Output: The sum of the first two numbers.
- Why not the sum of all three? This makes it more interesting. The model has to remember the first two numbers and ignore the third. This tests its ability to use memory and gates.
Example Data Generation:
import numpy as np
# Let's create 1000 sample sequences
num_samples = 1000
seq_length = 3
# Generate random data
X = np.random.randint(0, 100, (num_samples, seq_length))
# Create the target: sum of the first two numbers, ignore the third
y = X[:, 0] + X[:, 1]
print("Example input and output:")
for i in range(5):
print(f"Input: {X[i]} -> Target: {y[i]}")
# Example output:
# Input: [5 23 89] -> Target: 28
# Input: [42 11 76] -> Target: 53
# Input: [1 65 33] -> Target: 66
2. Prepare the Data for the LSTM
LSTMs expect input in a specific 3D shape: (number_of_samples, number_of_timesteps, number_of_features).
- Our
Xhas 1000 samples, each with 3 timesteps. - Each timestep is just one number (one feature), so we need to reshape it.
# Reshape X to be [samples, timesteps, features]
X_reshaped = X.reshape((X.shape[0], X.shape[1], 1))
# It's also good practice to scale the data for neural networks.
# Since our output is a sum, scaling both input and output will help.
from sklearn.preprocessing import MinMaxScaler
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
# Scale the input (fit on flattened data, then reshape)
X_flattened = X_reshaped.reshape(-1, 1)
X_scaled_flattened = x_scaler.fit_transform(X_flattened)
X_final = X_scaled_flattened.reshape(X_reshaped.shape)
# Scale the target
y_final = y_scaler.fit_transform(y.reshape(-1, 1))
3. Build a Very Small LSTM Model
We’ll create a model with only one LSTM layer and a small number of units to see if it can learn our simple function.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
model = Sequential()
# Let's use only 5 units ("switchboard operators")!
model.add(LSTM(5, input_shape=(seq_length, 1))) # (timesteps, features)
# The output is a single number (the sum)
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()
The model.summary() will show you the number of parameters. For this tiny model, it will be:
( (1 + 5) * 5 * 4 ) + (5 * 1) = 120 + 5 = 125 parameters(1 input + 5 previous hidden states) * 5 units * 4 gates + output layer weights. This is much easier to track than thousands!
4. Train the Model and Watch it Learn
Train the model for a short while and see if the loss decreases.
history = model.fit(X_final, y_final, epochs=50, batch_size=16, verbose=1, validation_split=0.2)
# Plot the learning process
import matplotlib.pyplot as plt
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.show()
What to observe: The loss should start high and quickly drop close to zero. This shows the model is successfully learning the pattern “look at the first two numbers and add them.”
5. Test the Model on New Data
Now for the fun part! Let’s generate some new “phone numbers” and see what the model predicts.
# Create some new unseen test data
new_data = np.array([[10, 20, 99], # Should predict ~30
[1, 2, 100], # Should predict ~3
[50, 60, 0]]) # Should predict ~110
# Prepare the new data exactly like the training data
new_data_reshaped = new_data.reshape((new_data.shape[0], new_data.shape[1], 1))
new_data_flattened = new_data_reshaped.reshape(-1, 1)
new_data_scaled = x_scaler.transform(new_data_flattened)
new_data_final = new_data_scaled.reshape(new_data_reshaped.shape)
# Make predictions
predictions_scaled = model.predict(new_data_final)
# Inverse transform the predictions to get the actual values
predictions = y_scaler.inverse_transform(predictions_scaled)
# Print the results
print("\nModel Predictions:")
for i in range(len(new_data)):
print(f"Input: {new_data[i]} -> Predicted: {predictions[i][0]:.2f} | Actual Sum: {new_data[i][0] + new_data[i][1]}")
Expected Output: The predictions should be very close to the actual sum of the first two numbers. The third number should have almost no effect, proving the model learned the right pattern and uses its gates to ignore irrelevant information.
Other Simple Pattern Ideas to Experiment With:
- The Copy Task: Input:
[A, B, C, 0]-> Target:A. The model must remember the first value and output it at the end. - The Reverse Task: Input:
[A, B, C]-> Target:C. The model must remember the last value. - Simple Binary Patterns: Input:
[1, 0, 1, 0]-> Target:1(predict the next in the alternating sequence).
By starting with these trivial examples, you can confidently see the LSTM learning, overfitting, or failing, which gives you incredible intuition for how it works before you ever touch complex real-world data.