Overview
The Malignant Comment Classifier uses Natural Language Processing (NLP) and machine learning to identify and moderate toxic, hateful, and abusive language on social media platforms, helping keep online communities safer.
Objective
- Detect and classify online comments into toxicity levels
- Minimize false positives to maintain freedom of expression
- Enable real-time moderation using a lightweight model
Dataset
The project uses Kaggle’s Toxic Comment Classification Challenge dataset, which contains over 159,000 comments labeled under six categories (a quick label-count sketch follows the list):
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
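The labels are not mutually exclusive: a single comment can carry several at once. A quick way to inspect label prevalence, assuming the standard Kaggle train.csv layout:

import pandas as pd

data = pd.read_csv("train.csv")
label_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Positive counts per label, and how many comments are flagged at all
print(data[label_cols].sum())
print("Flagged comments:", (data[label_cols].sum(axis=1) > 0).sum())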
Data Preprocessing
Text was cleaned and prepared using regular expressions, NLTK stopword removal, and TF-IDF vectorization for feature extraction.
import re
import pandas as pd
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

data = pd.read_csv("train.csv")

# Lowercase each comment and replace everything except letters with spaces
data["comment_text"] = data["comment_text"].apply(
    lambda x: re.sub('[^a-zA-Z]', ' ', x.lower())
)

# TF-IDF features with NLTK's English stop-word list
# (requires a one-time nltk.download("stopwords"))
vectorizer = TfidfVectorizer(max_features=10000, stop_words=stopwords.words("english"))
X = vectorizer.fit_transform(data["comment_text"])
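The result is a sparse document-term matrix with one row per comment; a quick sanity check (get_feature_names_out assumes scikit-learn 1.0 or newer):

# One row per comment, up to 10,000 TF-IDF columns (stored sparse)
print(X.shape)
print(vectorizer.get_feature_names_out()[:5])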
Model Development
A Logistic Regression classifier was used for multi-label classification, with one binary classifier fitted per label (one-vs-rest) so the model predicts a toxicity probability for each label independently.
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

# Multi-label targets: one binary column per toxicity category
label_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
y = data[label_cols]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One-vs-rest: an independent logistic regression is fitted per label
model = OneVsRestClassifier(LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))  # subset accuracy over all six labels
print(classification_report(y_test, pred, target_names=label_cols))
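The ROC-AUC figure reported below can be reproduced from the model's probability outputs; a minimal sketch, assuming the one-vs-rest setup above:

from sklearn.metrics import roc_auc_score

# Macro-averaged ROC-AUC across the six labels, from per-label probabilities
probs = model.predict_proba(X_test)
print("ROC-AUC:", roc_auc_score(y_test, probs, average="macro"))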
Performance Metrics
- Accuracy: 95.8%
- Cross-validation accuracy: 95.82%
- ROC-AUC: ≈ 0.97
The model achieved high performance, effectively distinguishing between toxic and non-toxic comments, with balanced precision and recall.
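The cross-validation figure can be reproduced with scikit-learn's cross_val_score; a sketch assuming 5-fold splits (the fold count is not stated in the original):

from sklearn.model_selection import cross_val_score

# For multi-label targets, 'accuracy' here means exact-match (subset) accuracy
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("CV accuracy: %.2f%% (+/- %.2f)" % (scores.mean() * 100, scores.std() * 100))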
Deployment
The trained model was serialized using pickle for web deployment.
It can be integrated into REST APIs or moderation systems to detect toxicity in real time.
import pickle

# Serialize the trained classifier for reuse in a web service
with open('malignant_model.pkl', 'wb') as f:
    pickle.dump(model, f)
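At inference time the fitted TF-IDF vectorizer is needed as well, so it is worth pickling alongside the model. A minimal loading-and-scoring sketch; the vectorizer file name and the score_comment helper are illustrative, not part of the original project:

import pickle
import re

# Persist the fitted vectorizer too (file name is an assumption)
with open('tfidf_vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)

# --- inside the moderation service ---
with open('malignant_model.pkl', 'rb') as f:
    model = pickle.load(f)
with open('tfidf_vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)

def score_comment(text):
    """Return per-label toxicity probabilities for one comment."""
    cleaned = re.sub('[^a-zA-Z]', ' ', text.lower())  # mirror training-time cleaning
    return model.predict_proba(vectorizer.transform([cleaned]))[0]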
Conclusion
This project demonstrates the power of classic ML models in NLP-based moderation. Future work includes upgrading to transformer-based models like BERT and incorporating multilingual datasets for broader coverage.