In general, a neural network consists of an input layer, hidden layers, and an output layer. Real-world networks are nonlinear because they include activation functions that introduce nonlinearity.
In our case, the model takes a single input feature, which is passed to a linear layer with 13 outputs, followed by a Tanh activation applied to those 13 values. The result is then passed through another linear layer that reduces the 13 values to a single output.
Linear layers contain weights and biases, following the equation y = Wx + b (a weight matrix applied to the input plus a bias vector), while the tanh activation function outputs values in the range [-1, 1].
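A minimal sketch of that architecture (the variable name seq_model matches the snippets below):

```python
import torch
import torch.nn as nn

# 1 input feature -> 13 hidden units -> Tanh -> 1 output value
seq_model = nn.Sequential(
    nn.Linear(1, 13),
    nn.Tanh(),
    nn.Linear(13, 1))
```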
1. What’s happening overall
This section explains how PyTorch stores and updates the weights and biases (called parameters) of a small neural network built using nn.Sequential. A parameter is simply a number the model can learn during training, such as a weight or a bias.
2. model.parameters()
When you call model.parameters(), PyTorch collects all the weights
and biases from every layer in your model.
[param.shape for param in seq_model.parameters()]
# Output:
[torch.Size([13, 1]), torch.Size([13]), torch.Size([1, 13]), torch.Size([1])]
This means:
- First layer weights: [13, 1] (13 outputs × 1 input)
- First layer bias: [13] (one bias per hidden unit)
- Second layer weights: [1, 13] (1 output × 13 inputs)
- Second layer bias: [1]
These are the exact numbers the optimizer (like SGD or Adam) will update during training.
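For example, a minimal sketch of handing those parameters to an optimizer (the learning rate is just a placeholder value):

```python
import torch.optim as optim

# The optimizer keeps a reference to every parameter tensor the
# generator yields and updates them in place on each step() call
optimizer = optim.SGD(seq_model.parameters(), lr=1e-3)
```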
3. After backward()
When you run loss.backward(), PyTorch calculates how each parameter
should change (the gradient).
The typical sequence is:
- Compute the loss
- Call loss.backward() → get the gradients
- Call optimizer.step() → update the weights
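Putting those steps together, a minimal training step might look like this (the input x, target y, and loss function are placeholders; seq_model and optimizer come from the sketches above):

```python
x = torch.randn(20, 1)   # hypothetical input batch
y = torch.randn(20, 1)   # hypothetical targets
loss_fn = nn.MSELoss()

optimizer.zero_grad()        # clear gradients left over from the previous step
y_pred = seq_model(x)        # forward pass
loss = loss_fn(y_pred, y)    # compute the loss
loss.backward()              # fill .grad for every parameter
optimizer.step()             # update all parameters using those gradients
```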
4. named_parameters()
This method yields each parameter together with its name, which makes the parameters easier to tell apart.
for name, param in seq_model.named_parameters():
print(name, param.shape)
# Output:
0.weight torch.Size([13, 1])
0.bias torch.Size([13])
2.weight torch.Size([1, 13])
2.bias torch.Size([1])
Here 0 and 2 are the positions of the layers inside nn.Sequential; the Tanh layer at position 1 has no parameters, so it does not appear in the list.
5. Using OrderedDict for readable names
from collections import OrderedDict
seq_model = nn.Sequential(OrderedDict([
('hidden_linear', nn.Linear(1, 8)),
('hidden_activation', nn.Tanh()),
('output_linear', nn.Linear(8, 1))
]))
Now the parameter names are more descriptive:
hidden_linear.weight torch.Size([8, 1])
hidden_linear.bias torch.Size([8])
output_linear.weight torch.Size([1, 8])
output_linear.bias torch.Size([1])
6. Accessing specific parameters
seq_model.output_linear.bias
# Output:
Parameter containing:
tensor([-0.0173], requires_grad=True)
requires_grad=True means PyTorch tracks gradients for this tensor, so this bias will be updated during training.
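Since the layers were registered by name through the OrderedDict, the same attribute-style access works for any of them; a small sketch:

```python
# Submodules named via OrderedDict become attributes of the Sequential
print(seq_model.hidden_linear.weight.shape)   # torch.Size([8, 1])
print(seq_model.output_linear.bias)           # the Parameter shown above
```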
7. Checking gradients
After calling loss.backward(), you can inspect the gradient of each parameter through its .grad attribute:
seq_model.hidden_linear.weight.grad
This shows the gradient of the loss with respect to each weight in the hidden layer, i.e. the direction and size of change the optimizer will use at the next step.
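A small sketch, reusing the placeholder tensors x, y, and loss_fn from the training-step example above together with the renamed 1 → 8 → 1 model:

```python
loss = loss_fn(seq_model(x), y)
loss.backward()

print(seq_model.hidden_linear.weight.grad.shape)  # torch.Size([8, 1])
print(seq_model.hidden_linear.bias.grad)          # one gradient per hidden unit

# Gradients accumulate across backward() calls, so they are normally
# cleared before the next step (optimizer.zero_grad() or model.zero_grad())
seq_model.zero_grad()
```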
Summary Table
| Concept | Meaning |
|---|---|
| parameters() | Collects all weights and biases of the model |
| named_parameters() | Same, but includes names for easier identification |
| loss.backward() | Calculates gradients (how much each parameter should change) |
| optimizer.step() | Updates all parameters using those gradients |
| OrderedDict | Lets you name your layers instead of using numbers |
| .grad | Shows the gradient of a parameter after backpropagation |
In short: PyTorch tracks every learnable weight and bias in your model, computes their gradients when you train, and updates them using the optimizer to make the model perform better.