RNNs: Processing Sequences
How Recurrent Neural Networks introduced memory to neural networks, enabling them to process sequential data.
The Problem: Data That Flows in Order
Early neural networks had a fundamental limitation—they treated each input independently. Feed them a sentence word by word, and they'd forget the previous word by the time they saw the next one. This made them useless for language, speech, stock prices, or anything where order matters.
Consider translating "The cat sat on the mat." To understand "sat," you need to remember "cat." To understand the sentence, you need context from every word before it. Standard neural networks couldn't do this—they had no memory.
The RNN Solution: Loops and Memory
Recurrent Neural Networks introduced a simple but powerful idea: loops. Instead of processing inputs in isolation, RNNs pass information from one step to the next. When processing the word "sat," the network also receives information about "The cat" from previous steps.
This creates a form of memory. The network maintains a hidden state that accumulates information as it processes the sequence. Each new input updates this hidden state, which then influences how the next input is processed.
- Step 1: Process "The" → store in hidden state
- Step 2: Process "cat" + hidden state → update hidden state
- Step 3: Process "sat" + hidden state → now knows about "The cat"

Result: Each word has context from all previous words.
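To make the loop concrete, here is a minimal sketch of a single RNN step in NumPy. This is illustrative rather than taken from any specific library: the names `W_xh`, `W_hh`, and `b_h`, the `tanh` update rule, and the sizes are the textbook formulation chosen here, and the random vectors simply stand in for word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 8, 16
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden weights (the "loop")
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One time step: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

# Process a toy three-word sequence, carrying the hidden state forward.
sequence = [rng.normal(size=input_size) for _ in range(3)]  # stand-ins for word vectors
h = np.zeros(hidden_size)                                   # empty memory before the first word
for x in sequence:
    h = rnn_step(x, h)  # each step sees the current input AND the accumulated memory

print(h.shape)  # (16,) -- a single vector summarizing everything seen so far
```

Note that the same weights are reused at every step; only the hidden state changes, which is exactly what gives the network its running memory of the sequence.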
Where RNNs Work Well
| Use Case | Example | Why RNNs Work |
|---|---|---|
| Character prediction | Autocomplete | Short sequences, simple patterns |
| Sensor data | IoT anomaly detection | Time-ordered readings |
| Simple speech | Basic voice commands | Short utterances |
The Critical Limitation: Vanishing Gradients
RNNs have a fatal flaw. As sequences get longer, the network struggles to remember information from early steps. During training, the error signals that help the network learn become weaker and weaker as they travel back through time—a problem called vanishing gradients.
In practice, plain RNNs reliably capture dependencies spanning only about 10-20 steps. Ask them to remember something from 100 steps ago, and they fail. It's like trying to pass a message through 100 people: by the end, the original message is lost.
Backpropagation through time (BPTT) is how RNNs learn: gradients must flow backward through every time step. At each step, the gradient is multiplied by the recurrent weight matrix and the derivative of the activation function. When those repeated factors are smaller than 1, the gradient shrinks exponentially (and when they are larger than 1, it explodes instead). This mathematical reality is the vanishing gradient problem.
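A toy illustration of that shrinkage, under stated assumptions: this is not a full BPTT implementation, only the repeated matrix multiplication at its core. The spectral norm of 0.9 and the hidden size of 16 are arbitrary choices made here to put the matrix in the vanishing regime.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 16

# Recurrent weight matrix scaled so its largest singular value is 0.9 (< 1),
# the regime in which repeated multiplication shrinks any vector.
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hh *= 0.9 / np.linalg.norm(W_hh, ord=2)

grad = rng.normal(size=hidden_size)  # stand-in for the error signal at the final step
for steps in (1, 10, 50, 100):
    g = grad.copy()
    for _ in range(steps):
        g = W_hh.T @ g               # one backward step through time
    print(f"after {steps:3d} steps back: |gradient| = {np.linalg.norm(g):.2e}")

# The printed norms collapse toward zero as the number of steps grows:
# early time steps receive almost no learning signal, so the network
# cannot learn long-range dependencies.
```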
This limitation directly led to the next evolution: LSTMs.