In machine learning and deep learning (ML/DL), machines are quite effective at recognizing patterns. Convolutional networks, for example, apply convolution operations to extract meaningful features for tasks such as object recognition. These systems can identify not only objects but also more abstract patterns, such as the sentiment of a piece of text, and they can classify inputs into multiple categories.
Today, artificial intelligence (AI) has become advanced enough to converse like a human, provided it is given some context or product details. AI can also summarize text and translate between languages in real time using pretrained models. These models are trained on millions of data samples, which makes them highly accurate. For example, ResNet-18 is pretrained on the ImageNet dataset of over a million labeled images, and it is a highly effective starting point for image classification tasks.
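As a quick illustration, the short sketch below (not part of the tutorial code) loads ResNet-18 with its pretrained ImageNet weights through torchvision and classifies a single image; the file name your_image.jpg is only a placeholder.

import torch
from PIL import Image
from torchvision import models
from torchvision.models import ResNet18_Weights

# Load ResNet-18 with its pretrained ImageNet weights
weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

# The weights object ships with the matching preprocessing pipeline
preprocess = weights.transforms()

# "your_image.jpg" is a placeholder; point this at any RGB image
image = Image.open("your_image.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    prediction = model(batch).argmax(dim=1).item()

# Map the predicted index to a human-readable ImageNet class name
print(weights.meta["categories"][prediction])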
In this tutorial, we will use the PyTorch library to classify images. PyTorch is a widely used deep learning library for Python. It provides modules for building neural networks, loss functions, optimizers, and more.
The code is simple and comes with a .ipynb file (Jupyter Notebook) and a dataset, so you can start from scratch.
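Before diving in, here is a minimal sketch of the PyTorch building blocks the tutorial relies on: a small network, a loss function, and an optimizer performing one training step on dummy data. The layer sizes and hyperparameters here are arbitrary placeholders.

import torch
import torch.nn as nn
import torch.optim as optim

# A tiny fully connected network; the layer sizes are arbitrary placeholders
net = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 2)
)

criterion = nn.CrossEntropyLoss()                 # loss function
optimizer = optim.SGD(net.parameters(), lr=0.01)  # optimizer

# One training step on random dummy data
inputs = torch.randn(4, 10)
targets = torch.randint(0, 2, (4,))

outputs = net(inputs)
loss = criterion(outputs, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())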
Steps to Run the Code
If you are using Google Colab:
1. Open the .ipynb file in Google Colab.
2. Upload the .zip file containing the dataset.
3. Run the code cells sequentially.
4. Test with your own image or data to verify whether the model is working.
If you are using Jupyter Notebook locally:
1. If not already installed, install Jupyter Notebook using the command:
   pip install jupyter notebook
2. Open the notebook using the command:
   jupyter notebook
3. Run each cell one by one to execute the code.
Code
import torch
import torchvision.transforms as transforms
from torchvision import datasets, models
from torchvision.models import ResNet18_Weights
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim
# Use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Image transformations
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],   # ImageNet channel means
                         [0.229, 0.224, 0.225])   # ImageNet channel standard deviations
])
# Load dataset (must be in ImageFolder format)
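# Expected directory layout: one subfolder per class label, e.g.
# (the folder and file names below are only illustrative)
#   dataset/
#       class_a/img001.jpg
#       class_b/img002.jpg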
dataset = datasets.ImageFolder(root="dataset", transform=transform)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
label_names = dataset.classes
num_classes = len(label_names)
# Load pretrained model
model = models.resnet18(weights=ResNet18_Weights.DEFAULT)
# Freeze all layers
for param in model.parameters():
    param.requires_grad = False
# Replace final fully connected layer to match number of classes
model.fc = nn.Linear(model.fc.in_features, num_classes)
# Unfreeze final layer
for param in model.fc.parameters():
    param.requires_grad = True
model.to(device)
# Define loss and optimizer (only final layer is being optimized)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)
# Training loop
for epoch in range(5):
    running_loss = 0.0
    model.train()
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {running_loss / len(dataloader):.4f}")
View Full Code on GitHub