
Batch Size and Minibatches in Machine Learning


Key Concepts: Minibatches, DataLoader, and the Limits of Fully Connected Networks

This document summarizes several fundamental ideas in deep learning training pipelines, including minibatch gradient descent, PyTorch’s DataLoader, model capacity, and the limitations of fully connected (dense) networks for image data. These concepts motivate the transition to convolutional neural networks (CNNs).

1. Minibatch Gradient Descent

Training with minibatches means computing gradients on a small subset of the dataset rather than the full dataset. This introduces noise, which has important benefits:

  • Efficiency: Computing gradients on the entire dataset is slow; minibatches make training fast and scalable.
  • Useful Noise: Minibatch gradients are noisy approximations of the full-dataset gradient. This stochasticity can help the optimizer escape poor local minima and often improves generalization.
  • Learning Rate Requirements: Because minibatch gradients fluctuate from batch to batch, a reasonably small learning rate is needed to keep training stable.

Shuffling the dataset each epoch ensures the sequence of minibatches remains representative of the overall data distribution.
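The per-epoch shuffling and slicing described above can be sketched in pure Python (a minimal illustration over a hypothetical dataset of indices, not how DataLoader is implemented internally):

```python
import random

def minibatches(n_samples, batch_size, seed=None):
    """Yield minibatches of sample indices in a freshly shuffled order.

    Calling this once per epoch reshuffles, so each epoch's sequence of
    minibatches stays representative of the overall data distribution.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)                      # new random order each epoch
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

batches = list(minibatches(n_samples=10, batch_size=4, seed=0))
# Three batches of sizes 4, 4, 2; together they cover every sample exactly once.
```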

2. PyTorch DataLoader

The DataLoader automates:

  • Batching of samples
  • Shuffling each epoch
  • Iterating over data easily within training loops

A typical DataLoader setup:

import torch

train_loader = torch.utils.data.DataLoader(
    cifar2,            # a Dataset of (image, label) pairs
    batch_size=64,     # 64 samples per minibatch
    shuffle=True,      # reshuffle at the start of every epoch
)

Each iteration returns a minibatch of images and labels, ready for processing in the forward pass.

3. The Training Loop

Each training step consists of:

  • Forward pass
  • Loss computation
  • Zeroing gradients
  • Backward propagation
  • Optimizer step

Example batch shapes:

  • imgs: 64 × 3 × 32 × 32 (batch × channels × height × width)
  • labels: 64 (one class index per image)

After training, accuracy is measured on a separate validation set without tracking gradients.
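The five steps above can be sketched as one PyTorch training step. The model, optimizer, and the random batch below are placeholders for illustration; any nn.Module and (imgs, labels) minibatch would do:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the model, optimizer, and one minibatch
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

imgs = torch.randn(64, 3, 32, 32)       # one minibatch: 64 x 3 x 32 x 32
labels = torch.randint(0, 2, (64,))     # 64 class indices

outputs = model(imgs)                   # forward pass
loss = loss_fn(outputs, labels)         # loss computation
optimizer.zero_grad()                   # zero gradients from the previous step
loss.backward()                         # backward propagation
optimizer.step()                        # optimizer step

# Evaluation without tracking gradients:
with torch.no_grad():
    preds = model(imgs).argmax(dim=1)
    accuracy = (preds == labels).float().mean()
```

In a real run, the evaluation loop would iterate over a separate validation DataLoader rather than reusing a training batch.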

4. Increasing Model Capacity and Overfitting

Adding more layers or larger layers increases the model’s capacity. This leads to:

  • Near-perfect training accuracy
  • Limited improvement in validation accuracy

This behavior indicates overfitting: the model memorizes the training set rather than learning generalizable patterns.

You can inspect the number of trainable parameters with sum(p.numel() for p in model.parameters()). Fully connected layers tend to produce extremely large parameter counts.

5. Why Fully Connected Networks Fail for Images

A. They Ignore Spatial Relationships

Flattening an image into a 1D vector removes the natural 2D structure. The network must learn pixel relationships independently for every location:

  • An airplane at one position must be learned separately from an airplane shifted by a few pixels.
  • The model is not translation invariant.

B. They Require Massive Numbers of Parameters

Every output neuron connects to every input pixel, so the weight count is the product of the input and output sizes. For image inputs, especially high-resolution ones, this product grows very quickly: a single fully connected layer on a 1024×1024 RGB image can easily require billions of parameters.

This is impractical in terms of both computation and memory.
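The arithmetic behind the billions-of-parameters claim, assuming (hypothetically) a hidden layer of 1024 units:

```python
# One fully connected layer on a 1024x1024 RGB image
inputs = 1024 * 1024 * 3      # 3,145,728 input values after flattening
hidden = 1024                 # hypothetical hidden-layer width
weights = inputs * hidden     # 3,221,225,472 weights
biases = hidden
total = weights + biases      # over 3.2 billion parameters for one layer
```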

6. Motivation for Convolutional Layers

The shortcomings of fully connected layers lead naturally to the need for convolutional neural networks (CNNs):

  • They exploit local patterns through small receptive fields.
  • They reuse parameters across spatial positions (weight sharing).
  • They are translation equivariant: shifting the input shifts the feature map, and pooling then yields approximate translation invariance.
  • They scale efficiently to large images.

Convolutional layers are therefore the standard architecture for image tasks.
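The scaling advantage is easy to verify. The sketch below compares a small convolutional layer against a dense layer producing the same output size on a 32×32 input (the layer sizes are illustrative choices, not from the text above):

```python
import torch.nn as nn

# A 3->16 channel conv with a 3x3 kernel: the parameter count is fixed by the
# kernel, independent of the image size (weight sharing).
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv_params = sum(p.numel() for p in conv.parameters())
# 16 * 3 * 3 * 3 weights + 16 biases = 448 parameters

# A dense layer mapping the same 3x32x32 input to a 16x32x32 output:
dense = nn.Linear(3 * 32 * 32, 16 * 32 * 32)
dense_params = sum(p.numel() for p in dense.parameters())
# 3072 * 16384 weights + 16384 biases = 50,348,032 parameters
```

The gap only widens with resolution: the conv layer stays at 448 parameters for any image size, while the dense layer grows with the square of the pixel count.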

Conclusion

  • Minibatches provide efficiency and useful randomness during training.
  • PyTorch’s DataLoader simplifies data handling.
  • Fully connected networks are prone to overfitting and do not scale well to images.
  • CNNs solve these issues by leveraging the 2D structure of images and promoting translation invariance.

These concepts form the foundation for understanding modern deep learning approaches to image classification.

What Is Translation Variance?

Translation variance refers to a model’s tendency to produce different outputs when an input image is shifted (translated) left, right, up, or down.

This is often an undesirable property in image recognition because the meaning of the image does not change if an object shifts a few pixels.

Why Translation Variance Happens

Fully connected (dense) neural networks treat an image as a large 1D vector, ignoring the spatial relationships between neighboring pixels. As a result:

  • A feature learned at one location must be relearned at every other location.
  • Shifting the object in the input produces a completely different pattern of values.
  • The model often fails to recognize the same object in a different position.

As a result, the model generalizes poorly to translated images.

Translation Invariance vs. Translation Variance

Concept                | Meaning                                                                  | Example Behavior
Translation Invariance | The model's prediction does not change when the input image is shifted.  | A CNN recognizes a cat regardless of whether it appears at the top-left or center.
Translation Variance   | The model's prediction does change when the image is shifted.            | A fully connected network fails to identify the same plane if it moves a few pixels.

Why CNNs Fix Translation Variance

Convolutional neural networks achieve approximate translation invariance because they use:

  • Local receptive fields – small regions of the image are processed at a time.
  • Weight sharing – one filter slides across the whole image.

This means the same pattern can be detected anywhere in the image, allowing the model to recognize objects regardless of their position.
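Weight sharing makes this property easy to demonstrate: shifting the input shifts the convolution's response by the same amount (equivariance). A minimal sketch with a random 3×3 filter and a single bright pixel as the "feature":

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

img = torch.zeros(1, 1, 8, 8)
img[0, 0, 2, 2] = 1.0                          # a single "feature" at (2, 2)
shifted = torch.roll(img, shifts=2, dims=3)    # same feature, 2 pixels right

out = conv(img)
out_shifted = conv(shifted)

# The response to the shifted input equals the shifted response:
# the same filter detects the same pattern at its new location.
matches = torch.allclose(torch.roll(out, shifts=2, dims=3),
                         out_shifted, atol=1e-6)
```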

Summary

  • Translation variance → predictions change when the image is shifted.
  • Fully connected networks → translation-variant (bad for images).
  • CNNs → translation-invariant (ideal for vision tasks).
