Palak Gupta👋
Turning data into insights with my Strategic Data Analysis
Portfolio Project 9:
The Spam Classifier Model project focused on developing a machine learning model that can accurately distinguish between spam and legitimate (ham) messages. The objective was to automate spam detection in SMS or email systems using Natural Language Processing (NLP) techniques and classification algorithms. This model enhances user experience by reducing unnecessary or malicious communication.
Research: The research phase involved analyzing common characteristics of spam messages, such as promotional keywords, urgent language, links, and unusual formatting. A labeled SMS Spam Collection dataset was used, comprising thousands of real-world messages marked as 'spam' or 'ham'.
Information Architecture: The dataset was cleaned by removing stop words and punctuation, then tokenized and normalized with stemming and lemmatization. Text vectorization methods such as TF-IDF (Term Frequency–Inverse Document Frequency) and CountVectorizer converted the raw text into numerical features suitable for machine learning models.
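To make the cleaning and vectorization steps concrete, here is a minimal sketch that uses only the Python standard library. It is not the project's code: the stop-word list, the sample messages, and the helper names `tokenize` and `tfidf` are assumptions made for illustration, and stemming and lemmatization are left out. It uses the textbook weighting tf × log(N / df); library vectorizers typically add smoothing and normalization, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "at", "and", "are", "we"}


def tokenize(text):
    """Lowercase the text, drop punctuation, split into words, remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty messages
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Made-up sample messages, not taken from the dataset.
messages = [
    "WIN a FREE prize now!!!",
    "Are we meeting for lunch at noon?",
    "Claim your free prize, call now",
]
vectors = tfidf(messages)
```

In this toy corpus, "win" appears in only one message, so it receives a higher weight in the first message than "free", which appears in two. That is the behavior that lets rare, distinctive terms stand out as features.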
Wireframing and Prototyping: A basic user interface prototype was created using Streamlit, where users could enter a message to check whether it is spam. Behind the scenes, Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine (SVM) models were trained and compared on accuracy, precision, recall, and F1-score.
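The four comparison metrics can all be derived from the counts of true and false positives and negatives. The sketch below shows one way to compute them in plain Python, treating "spam" as the positive class. The function name `classification_metrics` and the label lists are hypothetical examples, not results from the project.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for a binary labeling."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Precision: of the messages flagged as spam, how many really were spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real spam messages, how many were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, purely for illustration.
y_true = ["spam", "ham", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham"]
metrics = classification_metrics(y_true, y_pred)
```

Accuracy alone can mislead on spam data because ham messages usually dominate, which is why precision, recall, and F1 are worth reporting alongside it.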
The final model, based on Multinomial Naive Bayes with TF-IDF vectorization, achieved ~98% accuracy and an F1-score of 0.97 on the test set. It successfully detected spam messages containing promotional content, phishing links, and urgent calls to action. The real-time Streamlit app provided an intuitive way to classify messages, demonstrating the model's practicality for integration into email/SMS systems. The project highlighted how simple yet powerful NLP techniques can significantly improve digital communication hygiene and user trust.
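For readers curious how Multinomial Naive Bayes reaches a decision, the following from-scratch sketch shows the core idea: add the log prior of each class to the log likelihood of each word, using Laplace smoothing. It is an illustration under stated assumptions, not the project's model. It works on raw word counts rather than TF-IDF weights, the class name `TinyMultinomialNB` is hypothetical, and the four training messages are invented.

```python
import math
import re
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes over raw word counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # messages per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(self._tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            score = math.log(n / n_docs)  # log prior
            for w in self._tokens(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy training data, not drawn from the SMS Spam Collection.
train_texts = [
    "win a free prize now",
    "free cash claim now",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
spam_guess = model.predict("claim your free prize")
ham_guess = model.predict("lunch meeting tomorrow")
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word from driving a class probability to zero.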