Lung_Cancer_PET_DICOM_Classification


Hybrid CNN + Radiomics for PET DICOM Classification

A two-stream deep learning approach that combines CNN spatial feature extraction with quantitative image statistics (radiomics).



1. Environment Setup & Imports

The notebook uses PyTorch for the model, pydicom and SimpleITK for reading medical images, OpenCV for resizing, and scikit-learn for splitting and evaluation.

# -------------------------
# 1. Imports
# -------------------------
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import pydicom
import cv2
import SimpleITK as sitk
import kagglehub

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
Using device: cuda

The Dataset

Downloading DICOM Lung Cancer CT-PET subset from KaggleHub.

Dataset Link: https://www.kaggle.com/datasets/sshhwweettaa/lung-cancer-ct-pet-subset-dicom-format
Class Distribution

[Figure: Class Distribution Map]

2. Data Loading & Splitting

path = kagglehub.dataset_download("sshhwweettaa/lung-cancer-ct-pet-subset-dicom-format")
classes = ["A", "B", "E", "G"]
filepaths, labels = [], []
data_dir = os.path.join(path, "imbalanced_dataset")

for idx, cls in enumerate(classes):
    cls_dir = os.path.join(data_dir, cls)
    for f in os.listdir(cls_dir):
        if f.endswith(".dcm"):
            filepaths.append(os.path.join(cls_dir, f))
            labels.append(idx)
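
Because the data folder is named imbalanced_dataset, it can help to count samples per class before splitting. A minimal sketch, assuming integer class indices like those built in the loop above; `class_counts` and `toy_labels` are hypothetical names, and the toy values are not the dataset's real counts:

```python
from collections import Counter

def class_counts(labels, class_names):
    """Return {class_name: number_of_samples} for integer-encoded labels."""
    counts = Counter(labels)
    return {name: counts.get(i, 0) for i, name in enumerate(class_names)}

# Hypothetical labels for illustration only (not the real dataset counts)
toy_labels = [0, 0, 0, 1, 2, 2, 3]
toy_counts = class_counts(toy_labels, ["A", "B", "E", "G"])
print(toy_counts)
```

With the real data, the same call would be `class_counts(labels, classes)`.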

Train / Validation Split:

train_files, val_files, train_labels, val_labels = train_test_split(
    filepaths, labels, test_size=0.2, stratify=labels, random_state=42
)
print("Train size:", len(train_files))
print("Validation size:", len(val_files))
Train size: 14800
Validation size: 3700
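
The `stratify=labels` argument is what keeps the class proportions the same in both splits, which matters for an imbalanced dataset. A small sketch on made-up labels (the file names and the 80/20 class ratio are assumptions for illustration):

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy labels: 80 samples of class 0, 20 of class 1
toy_labels = [0] * 80 + [1] * 20
toy_files = [f"file_{i}.dcm" for i in range(len(toy_labels))]

tr_files, va_files, tr_labels, va_labels = train_test_split(
    toy_files, toy_labels, test_size=0.2, stratify=toy_labels, random_state=42
)

# Both splits keep the 80/20 class ratio of the full set
print("train:", Counter(tr_labels))
print("val:  ", Counter(va_labels))
```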

3. Hybrid Dataset & Radiomics Extraction

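This class produces both inputs for each sample: the DICOM pixel array, resized to 64x64 and min-max normalized, for the CNN stream, and four first-order statistics (mean, std, min, max) for the radiomics stream. The statistics are computed over a lesion mask when a matching _mask.nii file exists, and over the whole image otherwise.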

class HybridPETDataset(Dataset):
    def __init__(self, filepaths, labels):
        self.filepaths = filepaths
        self.labels = labels

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, idx):
        filepath = self.filepaths[idx]
        label = self.labels[idx]
        img_dcm = pydicom.dcmread(filepath)
        img = img_dcm.pixel_array.astype(np.float32)

        if img.ndim == 3:
            img = img.mean(axis=-1)

        img = cv2.resize(img, (64,64))
        img = (img - img.min()) / (img.max() - img.min() + 1e-6)

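        # Radiomics: use a mask stored next to the DICOM file if there is one,
        # otherwise fall back to whole-image statistics.
        mask_path = filepath.replace(".dcm", "_mask.nii")
        masked_pixels = img.flatten()
        if os.path.exists(mask_path):
            mask_array = sitk.GetArrayFromImage(sitk.ReadImage(mask_path))
            # Assumes a single-slice 2D mask; drop singleton axes and resize it
            # to the 64x64 image grid so the boolean indexing below is valid.
            mask_array = np.squeeze(mask_array).astype(np.uint8)
            mask_array = cv2.resize(mask_array, (64, 64), interpolation=cv2.INTER_NEAREST)
            if np.any(mask_array > 0):
                masked_pixels = img[mask_array > 0]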

        rad_vec = np.array([
            masked_pixels.mean(),
            masked_pixels.std(),
            masked_pixels.min(),
            masked_pixels.max()
        ], dtype=np.float32)

        return torch.tensor(img).unsqueeze(0).float(), torch.tensor(rad_vec).float(), torch.tensor(label).long()
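
The radiomics step can be looked at in isolation. The sketch below reproduces the four statistics and the whole-image fallback with NumPy only; `first_order_features` is a hypothetical helper name, and the 2x2 image and mask are made up for illustration:

```python
import numpy as np

def first_order_features(img, mask=None):
    """Mean, std, min, max over masked pixels, or over the whole image
    when no usable mask is given."""
    img = np.asarray(img, dtype=np.float32)
    if mask is not None and mask.shape == img.shape and np.any(mask > 0):
        pixels = img[mask > 0]
    else:
        pixels = img.ravel()
    return np.array(
        [pixels.mean(), pixels.std(), pixels.min(), pixels.max()],
        dtype=np.float32,
    )

# Hypothetical 2x2 normalized image and binary mask
toy_img = np.array([[0.0, 0.2], [0.6, 1.0]], dtype=np.float32)
toy_mask = np.array([[0, 0], [1, 1]])

whole = first_order_features(toy_img)            # statistics over all pixels
lesion = first_order_features(toy_img, toy_mask)  # statistics over the masked row
print(whole, lesion)
```

In the whole-image fallback the statistics are taken after min-max normalization, so for any non-constant image the min entry is 0 and the max entry is approximately 1 by construction. Only the mean and std carry information in that case.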

4. Hybrid CNN + Radiomics Architecture

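The model flattens the CNN feature map (64 channels of 32x32 after the single pooling step), passes the radiomics vector through a 128-unit linear layer, and concatenates the two before the final classification layers.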

class HybridCNNRadiomics(nn.Module):
    def __init__(self, radiomics_dim, num_classes=4):
        super().__init__()
        self.conv1 = nn.Conv2d(1,32,3,padding=1)
        self.conv2 = nn.Conv2d(32,64,3,padding=1)
        self.pool = nn.MaxPool2d(2,2)
        self.rad_fc1 = nn.Linear(radiomics_dim,128)
        self.fc1 = nn.Linear(64*32*32 + 128,256)
        self.fc2 = nn.Linear(256,num_classes)

    def forward(self, x_img, x_rad):
        x = F.relu(self.conv1(x_img))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        r = F.relu(self.rad_fc1(x_rad))
        combined = torch.cat((x,r), dim=1)
        return self.fc2(F.relu(self.fc1(combined)))
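
The input size of fc1, 64*32*32 + 128, follows from the layer settings. The arithmetic can be checked with the standard output-size formula for convolution and pooling layers; `conv2d_out` is a hypothetical helper written for this check:

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

side = 64                                       # input is 1 x 64 x 64
side = conv2d_out(side, kernel=3, padding=1)    # conv1 keeps 64
side = conv2d_out(side, kernel=3, padding=1)    # conv2 keeps 64
side = conv2d_out(side, kernel=2, stride=2)     # MaxPool2d(2, 2) halves to 32

cnn_features = 64 * side * side       # 64 output channels from conv2
fused_features = cnn_features + 128   # plus the 128-d radiomics embedding
print(side, cnn_features, fused_features)
```

If the input resolution or the number of pooling steps changes, the first argument of fc1 has to change with it.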

5. Model Training

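# radiomics_dim=4 matches the (mean, std, min, max) vector built by HybridPETDataset
model = HybridCNNRadiomics(radiomics_dim=4, num_classes=len(classes)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)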
criterion = nn.CrossEntropyLoss()

for epoch in range(15):
    model.train()
    # ... training loop logic ...
    print(f"Epoch {epoch+1}: Loss = {avg_loss:.4f}")
Epoch 1: Loss = 0.5901
Epoch 5: Loss = 0.0600
Epoch 10: Loss = 0.0193
Epoch 15: Loss = 0.0040
[Figure: Loss Curve]
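
The loop body is elided in the listing above. One common shape for it is sketched below; this is not the author's actual code. `train_one_epoch` and `TinyTwoStream` are hypothetical names, the data is synthetic, and the batch size is an assumption because the original value is not shown:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, optimizer, criterion, device):
    """Run one pass over `loader` and return the mean per-batch loss."""
    model.train()
    total = 0.0
    for imgs, rads, targets in loader:
        imgs, rads, targets = imgs.to(device), rads.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(imgs, rads), targets)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

class TinyTwoStream(nn.Module):
    """Hypothetical stand-in for HybridCNNRadiomics, kept small so the sketch runs fast."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(64 * 64 + 4, 4)

    def forward(self, x_img, x_rad):
        return self.fc(torch.cat((x_img.flatten(1), x_rad), dim=1))

torch.manual_seed(0)
device = torch.device("cpu")
# Synthetic data: images (N, 1, 64, 64), radiomics (N, 4), labels in [0, 4)
data = TensorDataset(
    torch.rand(16, 1, 64, 64), torch.rand(16, 4), torch.randint(0, 4, (16,))
)
loader = DataLoader(data, batch_size=8, shuffle=True)  # batch size is an assumption

model = TinyTwoStream().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
avg_loss = train_one_epoch(model, loader, optimizer, criterion, device)
print(f"Epoch 1: Loss = {avg_loss:.4f}")
```

With the real pipeline, the loader would wrap HybridPETDataset(train_files, train_labels) and the model would be HybridCNNRadiomics.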

6. Final Evaluation (ROC-AUC)

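Final performance on the validation set is measured with the Area Under the Receiver Operating Characteristic Curve (ROC-AUC), computed with a One-vs-Rest (OvR) strategy: each class is scored against all the others, and the per-class values are averaged.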

model.eval()
y_true, y_scores = [], []
with torch.no_grad():
    for imgs, radiomics, labels in val_loader:
        out = model(imgs.to(device), radiomics.to(device))
        prob = torch.softmax(out, dim=1)
        y_true.extend(labels.numpy())
        y_scores.extend(prob.cpu().numpy())

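# One-hot encode the labels so each class is scored one-vs-rest
from sklearn.preprocessing import label_binarize
y_scores_np = np.array(y_scores)
y_true_bin = label_binarize(y_true, classes=list(range(len(classes))))
roc_auc = roc_auc_score(y_true_bin, y_scores_np, multi_class='ovr')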
print("ROC-AUC:", roc_auc)
ROC-AUC: 0.998938374226931
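
To make the One-vs-Rest averaging concrete, the sketch below computes a macro-averaged OvR AUC with NumPy alone. It uses the pairwise-ranking definition of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as half) rather than scikit-learn's implementation. `binary_auc`, `macro_ovr_auc`, and the toy values are hypothetical:

```python
import numpy as np

def binary_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=np.float64)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # every positive/negative pair
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def macro_ovr_auc(labels, probs):
    """Average the per-class AUCs, treating each class as positive in turn."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=np.float64)
    aucs = [
        binary_auc((labels == k).astype(int), probs[:, k])
        for k in range(probs.shape[1])
    ]
    return float(np.mean(aucs))

# Hypothetical 3-class example with perfectly separated scores
toy_labels = [0, 1, 2, 0]
toy_probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
]
toy_auc = macro_ovr_auc(toy_labels, toy_probs)
print("Macro OvR AUC:", toy_auc)
```

The pairwise version is quadratic in the number of samples, so it is meant for understanding the metric, not for scoring a validation set of this size.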
