In this article, we will explore how to classify categories from tabular data stored in a .csv file using a neural network built with PyTorch. Suppose you're given a dataset where each row corresponds to an instance, and it includes both numerical features and a target class label such as Class1, Class2, or Class3. Your task is to train a model that can predict the correct class based on the input features.
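A network cannot consume string labels directly, so the class names are first mapped to integer indices. The sketch below illustrates that mapping with scikit-learn's LabelEncoder; the `labels` list is a made-up stand-in for the target column of a real dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the target column of a real dataset
labels = ["Class2", "Class1", "Class3", "Class1"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# LabelEncoder assigns indices following the sorted order of the class names
class_names = list(encoder.classes_)
decoded = list(encoder.inverse_transform(encoded))

print(class_names)        # the sorted class names
print(encoded.tolist())   # integer index of each label
```

The same fitted encoder can later turn predicted indices back into readable class names through `inverse_transform`.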
In our example, the target classes are Introvert, Extrovert, and Ambivert, and the dataset contains 29 other columns representing various input features. We aim to build a classification model using a feedforward neural network in PyTorch. This includes defining multiple layers, selecting an appropriate loss function (e.g., CrossEntropyLoss), and optimizing the model using techniques like the Adam optimizer to improve accuracy.
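One detail worth seeing in isolation: CrossEntropyLoss expects raw, unnormalized scores (logits) and integer class indices, and applies log-softmax internally, so the network's last layer should not apply softmax itself. The following is a minimal sketch on random stand-in values (the tensors `logits` and `targets` are illustrative, not taken from the dataset).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in values: a batch of 4 samples with scores for 3 classes
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])  # integer class indices, not one-hot

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# The same quantity computed in two explicit steps:
# log-softmax over the class dimension, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.item(), manual.item())
```

Both values should agree up to floating-point tolerance, which is why the model can return its final linear layer's output unchanged.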
Machine learning (ML) and deep learning (DL) models are particularly good at detecting patterns in data. While convolutional neural networks (CNNs) are commonly used for tasks like image recognition, fully connected neural networks are well-suited for tabular classification tasks. These models can learn complex relationships in data and make predictions on abstract categories, such as sentiment, user behavior, or personality type.
In this tutorial, we will use PyTorch to build and train a neural network that classifies individuals into one of the three personality types: Introvert, Extrovert, or Ambivert.
The code is simple and comes with a .ipynb file (Jupyter Notebook) and a dataset so you can start from scratch.
Steps to Run the Code
If you are using Google Colab:
- 1. Open the .ipynb file in Google Colab.
- 2. Upload the .zip file containing the dataset.
- 3. Run the code cells sequentially.
- 4. Test with your own data to verify whether the model is working.
If you are using Jupyter Notebook locally:
- 1. If not already installed, install Jupyter Notebook using the command:
pip install jupyter notebook
- 2. Open the notebook using the command:
jupyter notebook
- 3. Run each cell one by one to execute the code.
Code for personality-type classification
import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# Load your dataset
# Replace this path with your own file or URL
my_df = pd.read_csv('personality_synthetic_dataset.csv')
# Split features and target
X = my_df.drop('personality_type', axis=1).values # 29 features
y = my_df['personality_type'].values # target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)
# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)
# Label encode the target
label_encoder = LabelEncoder()
y_train = torch.LongTensor(label_encoder.fit_transform(y_train))
y_test = torch.LongTensor(label_encoder.transform(y_test))
# Normalize the inputs
X_train_mean = X_train.mean(dim=0)
X_train_std = X_train.std(dim=0).clamp(min=1e-8)  # guard against zero std in constant columns
X_train = (X_train - X_train_mean) / X_train_std
X_test = (X_test - X_train_mean) / X_train_std
# Define the neural network model
class Model(nn.Module):
    def __init__(self, in_features=29, h1=64, h2=32, h3=16, out_features=3):
        super().__init__()
        self.fc1 = nn.Linear(in_features, h1)
        self.fc2 = nn.Linear(h1, h2)
        self.fc3 = nn.Linear(h2, h3)
        self.fc4 = nn.Linear(h3, out_features)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return self.fc4(x)  # raw logits; CrossEntropyLoss applies softmax internally
# Instantiate model
model = Model()
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Train loop
epochs = 100
losses = []
for epoch in range(epochs):
    optimizer.zero_grad()
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    if epoch % 10 == 0:
        print(f'Epoch {epoch}: Loss = {loss.item()}')
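The training loop reports only the training loss, so a natural next step is to check accuracy on the held-out test split. The sketch below is one way to do that, not the notebook's own evaluation code: `accuracy` is a hypothetical helper, and `demo_model`, `X_demo`, and `y_demo` are random stand-ins so the snippet runs on its own.

```python
import torch
import torch.nn as nn


def accuracy(model, X, y):
    """Return the fraction of rows in X whose predicted class matches y."""
    model.eval()                       # switch off training-only behavior
    with torch.no_grad():              # no gradients needed for evaluation
        logits = model(X)
        preds = logits.argmax(dim=1)   # index of the highest-scoring class
    return (preds == y).float().mean().item()


# Stand-in data and model (assumptions for illustration only)
torch.manual_seed(0)
X_demo = torch.randn(20, 29)               # 20 rows, 29 features
y_demo = torch.randint(0, 3, (20,))        # 3 classes
demo_model = nn.Sequential(nn.Linear(29, 16), nn.ReLU(), nn.Linear(16, 3))

acc = accuracy(demo_model, X_demo, y_demo)
print(f"Accuracy: {acc:.2%}")
```

With the variables from the code above, the equivalent call would be `accuracy(model, X_test, y_test)`, and `label_encoder.inverse_transform` can map predicted indices back to the personality-type names.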
View Full Code on GitHub