
Batch Size and Minibatches in Machine Learning


Key Concepts: Minibatches, DataLoader, and the Limits of Fully Connected Networks

This document summarizes several fundamental ideas in deep learning training pipelines, including minibatch gradient descent, PyTorch’s DataLoader, model capacity, and the limitations of fully connected (dense) networks for image data. These concepts motivate the transition to convolutional neural networks (CNNs).

1. Minibatch Gradient Descent

Training with minibatches means computing gradients on a small subset of the dataset rather than the full dataset. Each minibatch gradient is therefore a noisy estimate of the full gradient, which has several consequences:

  • Efficiency: Computing gradients on the entire dataset is slow; minibatches make training fast and scalable.
  • Useful Noise: Minibatch gradients are noisy approximations of the full-dataset gradient. This stochasticity can help the optimizer escape shallow local minima and saddle points.
  • Learning Rate Requirements: Because minibatch gradients fluctuate, a reasonably small learning rate prevents instability.

Shuffling the dataset each epoch ensures the sequence of minibatches remains representative of the overall data distribution.
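
The ideas above can be sketched without any framework. The following is a minimal illustration, assuming a toy one-dimensional linear regression problem; the function name minibatch_sgd, the data, and the hyperparameters are all hypothetical choices made for the example, not part of the original material.

```python
import random

def minibatch_sgd(xs, ys, batch_size=4, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b with minibatch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # reshuffle the dataset every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this minibatch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy data generated from y = 3x + 1 (noise-free, for illustration only)
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]
w, b = minibatch_sgd(xs, ys)
print(round(w, 3), round(b, 3))
```

Each update uses only four samples, so individual steps point in slightly different directions, yet the parameters still settle near the values used to generate the data.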

2. PyTorch DataLoader

The DataLoader automates:

  • Batching of samples
  • Shuffling each epoch
  • Iterating over data easily within training loops

A typical DataLoader setup:

import torch

# cifar2 is assumed to be a Dataset of (image, label) pairs defined earlier
train_loader = torch.utils.data.DataLoader(
    cifar2, batch_size=64, shuffle=True
)

Each iteration returns a minibatch of images and labels, ready for processing in the forward pass.
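
To see what each iteration yields, here is a small self-contained sketch. Because cifar2 is not defined in this document, the example substitutes a hypothetical TensorDataset of random tensors with the same per-sample shape; the dataset size of 200 is an arbitrary assumption.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for cifar2: 200 random "images" with binary labels
images = torch.randn(200, 3, 32, 32)
targets = torch.randint(0, 2, (200,))
dataset = TensorDataset(images, targets)

train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Collect the shape of every minibatch produced in one epoch
batch_shapes = [(tuple(imgs.shape), tuple(labels.shape)) for imgs, labels in train_loader]
for img_shape, label_shape in batch_shapes:
    print(img_shape, label_shape)
```

With 200 samples and a batch size of 64, one epoch produces three full batches and a final smaller batch of 8, since drop_last defaults to False.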

3. The Training Loop

Each training step consists of:

  • Forward pass
  • Loss computation
  • Zeroing gradients
  • Backward propagation
  • Optimizer step

Example batch shapes:

  • imgs: 64 × 3 × 32 × 32
  • labels: 64

After training, accuracy is measured on a separate validation set without tracking gradients.
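
The steps listed above, followed by a validation pass, might look like the sketch below. The data are random tensors standing in for a real dataset, and the layer sizes, learning rate, and epoch count are illustrative assumptions rather than recommended settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: random tensors shaped like 32x32 RGB images
train_set = TensorDataset(torch.randn(128, 3, 32, 32), torch.randint(0, 2, (128,)))
val_set = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 2, (64,)))
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)

# Small fully connected model (layer sizes are illustrative)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.Tanh(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for epoch in range(2):
    for imgs, labels in train_loader:
        outputs = model(imgs)            # forward pass
        loss = loss_fn(outputs, labels)  # loss computation
        optimizer.zero_grad()            # zero gradients from the previous step
        loss.backward()                  # backward propagation
        optimizer.step()                 # optimizer step

# Validation: gradients are not needed, so tracking is disabled
correct, total = 0, 0
model.eval()
with torch.no_grad():
    for imgs, labels in val_loader:
        preds = model(imgs).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.shape[0]
val_accuracy = correct / total
print(f"validation accuracy: {val_accuracy:.2f}")
```

Since the labels here are random, the reported accuracy carries no meaning; the point is only the order of operations and the use of torch.no_grad() during evaluation.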

4. Increasing Model Capacity and Overfitting

Adding more layers or larger layers increases the model’s capacity. On a small training set, this often leads to:

  • Near-perfect training accuracy
  • Limited improvement in validation accuracy

This behavior indicates overfitting: the model memorizes the training set rather than learning generalizable patterns.

You can count the trainable parameters by summing p.numel() over model.parameters(). Fully connected layers tend to produce very large parameter counts because every input is connected to every output.
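
As a minimal sketch of that count, assume a small fully connected model for 32 × 32 RGB inputs; the hidden width of 512 is an arbitrary choice for illustration.

```python
import torch.nn as nn

# Illustrative model: 3072 inputs -> 512 hidden units -> 2 outputs
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512),
    nn.Tanh(),
    nn.Linear(512, 2),
)

# One entry per parameter tensor: weight and bias of each Linear layer
per_tensor = [p.numel() for p in model.parameters() if p.requires_grad]
total_params = sum(per_tensor)
print(per_tensor, total_params)
```

The first weight matrix alone holds 3072 × 512 = 1,572,864 values, which dominates the total of 1,574,402.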

5. Why Fully Connected Networks Fail for Images

A. They Ignore Spatial Relationships

Flattening an image into a 1D vector removes the natural 2D structure. The network must learn pixel relationships independently for every location:

  • An airplane at one position must be learned separately from an airplane shifted by a few pixels.
  • The model is not translation invariant.

B. They Require Massive Numbers of Parameters

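Every output neuron connects to every input pixel, so the number of weights in a layer is the number of input values multiplied by the number of output units. The parameter count therefore grows in proportion to the pixel count, that is, quadratically with the image side length. For example, a 1024×1024 RGB image has about 3.1 million input values, so a single fully connected layer with 1,024 output units would require over 3 billion weights.

Such a layer is impractical in both computation and memory.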
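
The arithmetic can be checked directly. The helper below is a hypothetical function written for this example, and the choice of 1,024 output units is an assumption made only to give a concrete number.

```python
def dense_layer_params(height, width, channels, out_features):
    """Weights plus biases of one fully connected layer on a flattened image."""
    in_features = height * width * channels
    return in_features * out_features + out_features

small = dense_layer_params(32, 32, 3, 1024)      # CIFAR-sized input
large = dense_layer_params(1024, 1024, 3, 1024)  # 1024x1024 RGB input
print(f"{small:,}")
print(f"{large:,}")
```

Stored as 32-bit floats, the larger layer would occupy roughly 12.9 GB for its weights alone.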

6. Motivation for Convolutional Layers

The shortcomings of fully connected layers lead naturally to the need for convolutional neural networks (CNNs):

  • They exploit local patterns through small receptive fields.
  • They reuse parameters across spatial positions (weight sharing).
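  • They are translation equivariant by construction (a shifted input gives a correspondingly shifted feature map) and become approximately translation invariant when combined with pooling.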
  • They scale efficiently to large images.

Convolutional layers are therefore the standard architecture for image tasks.
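
To make the effect of weight sharing concrete, the sketch below compares parameter counts for a convolutional layer and for a fully connected layer producing an output of the same size. Both helper functions are hypothetical, and the configuration (3 input channels, 16 output channels, 3 × 3 kernels, 32 × 32 images) is assumed for illustration.

```python
def conv_layer_params(in_channels, out_channels, kernel_size):
    """One kernel_size x kernel_size filter per (input, output) channel pair, plus biases."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def dense_layer_params(in_features, out_features):
    """One weight per (input, output) pair, plus biases."""
    return in_features * out_features + out_features

conv = conv_layer_params(3, 16, 3)
# Dense layer mapping 3x32x32 inputs to a 16x32x32 output of the same size
dense = dense_layer_params(3 * 32 * 32, 16 * 32 * 32)
print(f"conv:  {conv:,}")
print(f"dense: {dense:,}")
```

The convolutional count does not depend on the image height or width at all, which is why the same layer can be applied to much larger images without adding parameters.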

Conclusion

  • Minibatches provide efficiency and useful randomness during training.
  • PyTorch’s DataLoader simplifies data handling.
  • Fully connected networks are prone to overfitting and do not scale well to images.
  • CNNs solve these issues by leveraging the 2D structure of images and promoting translation invariance.

These concepts form the foundation for understanding modern deep learning approaches to image classification.

What Is Translation Variance?

Translation variance refers to a model’s tendency to produce different outputs when an input image is shifted (translated) left, right, up, or down.

This is often an undesirable property in image recognition because the meaning of the image does not change if an object shifts a few pixels.

Why Translation Variance Happens

Fully connected (dense) neural networks treat an image as a large 1D vector, ignoring the spatial relationships between neighboring pixels. As a result:

  • A feature learned at one location must be relearned at every other location.
  • Shifting the object in the input produces a completely different pattern of values.
  • The model often fails to recognize the same object in a different position.

As a result, the model generalizes poorly to translated images.

Translation Invariance vs. Translation Variance

| Concept | Meaning | Example Behavior |
|---|---|---|
| Translation Invariance | The model's prediction does not change when the input image is shifted. | A CNN recognizes a cat regardless of whether it appears at the top-left or center. |
| Translation Variance | The model's prediction does change when the image is shifted. | A fully connected network fails to identify the same plane if it moves a few pixels. |

Why CNNs Fix Translation Variance

Convolutional neural networks naturally achieve translation invariance because they use:

  • Local receptive fields – small regions of the image are processed at a time.
  • Weight sharing – one filter slides across the whole image.

This means the same pattern can be detected anywhere in the image, allowing the model to recognize objects regardless of their position.
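
A one-dimensional toy example can illustrate the difference. It is a sketch under simplifying assumptions: the signals, the filter, and the dense weights are hand-picked rather than learned, and a global maximum stands in for pooling.

```python
def correlate1d(signal, kernel):
    """Slide one shared kernel across the signal (valid positions only)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def dense_unit(signal, weights):
    """One fully connected unit: a separate weight for every input position."""
    return sum(s * w for s, w in zip(signal, weights))

pattern = [1, 2, 1]
original = [0, 1, 2, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0]  # same pattern moved 3 positions right

conv_orig = correlate1d(original, pattern)
conv_shift = correlate1d(shifted, pattern)

# The peak response moves with the input; its maximum value is unchanged
print(conv_orig, max(conv_orig))
print(conv_shift, max(conv_shift))

# A dense unit tuned to the original position misses the shifted pattern
dense_weights = [0, 1, 2, 1, 0, 0, 0, 0]
dense_orig = dense_unit(original, dense_weights)
dense_shift = dense_unit(shifted, dense_weights)
print(dense_orig, dense_shift)
```

Strictly speaking, the sliding filter alone is translation equivariant: the response shifts along with the input. Taking the maximum over positions, as pooling layers do, is what turns this into an output that stays the same under the shift.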

Summary

  • Translation variance → predictions change when the image is shifted.
  • Fully connected networks → translation-variant (bad for images).
  • CNNs → translation-invariant (ideal for vision tasks).
