Palak Gupta - Data Analyst

Portfolio Project 2:

Cyber Security Anomaly Detection

Services:

Data Analysis | Machine Learning | Cyber Security | Python

Overview

The Cybersecurity Anomaly Detection project set out to build an AI-driven framework that identifies malicious activity in network traffic using unsupervised learning models. Working with public intrusion-detection datasets such as NSL-KDD and CIC-IDS2017, the goal was to detect anomalies in real time and support proactive threat mitigation in modern cybersecurity environments, including emerging digital spaces like the Metaverse.

Research: The project began with a study of common cyber threats (DoS, DDoS, port scanning, and data exfiltration) and of how they appear in the structure of known cybersecurity datasets. Research also focused on why anomaly detection can catch zero-day attacks that signature-based systems miss: it flags deviations from normal behavior rather than matching known attack signatures.

Information Architecture: Data was preprocessed by encoding categorical variables, normalizing numerical features, and labeling events as normal or anomalous. Data pipelines were structured to support both batch and real-time processing, keeping the architecture scalable and modular.
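The preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming pandas and scikit-learn; the column names follow NSL-KDD conventions, but the rows are invented and the choice of OneHotEncoder and StandardScaler is an assumption, not a record of the project's actual pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy records with NSL-KDD-style column names; the values are invented.
records = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp", "icmp"],
    "service": ["http", "domain_u", "ftp", "eco_i"],
    "duration": [0, 2, 120, 0],
    "src_bytes": [181, 44, 5200, 1032],
    "label": ["normal", "normal", "neptune", "smurf"],
})

categorical = ["protocol_type", "service"]
numerical = ["duration", "src_bytes"]

# One-hot encode categorical columns and standardize numerical ones.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numerical),
    ],
    sparse_threshold=0,
)

X = preprocess.fit_transform(records[categorical + numerical])

# Binary label: 0 = normal, 1 = anomalous. With unsupervised models this
# would be used for evaluation only, not for training.
y = (records["label"] != "normal").astype(int).to_numpy()
```

Wrapping the steps in a single ColumnTransformer means the same fitted object can be reused on batch files and on live events, which fits the modular design described above.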

Wireframing and Prototyping: Initial wireframes outlined how anomalies would be detected and visualized. A prototype dashboard was created to display real-time alerts, SHAP-based model explanations, and threat classification summaries.

Challenges

High Dimensionality:

Challenge: Network traffic data had hundreds of features, making it computationally expensive and harder to interpret.
Solution: Feature selection (e.g., mutual information scores) and dimensionality reduction (e.g., PCA) were used to retain only the most informative signal in the data.
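A minimal sketch of both techniques on synthetic data, assuming scikit-learn. Mutual information ranks the original columns and needs labels, so a labeled subset is assumed here; PCA builds new components and needs no labels. The data, the number of features kept, and the 95% variance target are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 400

# Synthetic labels (1 = anomalous) and 10 features: the first two depend
# on the label, the remaining eight are pure noise.
y = rng.integers(0, 2, size=n)
informative = 3.0 * y[:, None] + rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 8))
X = np.hstack([informative, noise])

# Feature selection: rank original columns by mutual information with y
# and keep the two highest-scoring ones.
mi = mutual_info_classif(X, y, random_state=0)
top_k = np.argsort(mi)[::-1][:2]
X_selected = X[:, top_k]

# Dimensionality reduction: keep as many principal components as needed
# to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
```

Selected columns stay interpretable as original traffic features, while PCA components are mixtures of them, which matters later when explanations are shown to analysts.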

Imbalanced Data:

Challenge: The majority of traffic was normal, with very few labeled anomalies.
Solution: Used unsupervised models such as Isolation Forest and Autoencoders, which do not rely on labeled data and are therefore well suited to skewed distributions.
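As an illustration of the unsupervised approach, the sketch below fits scikit-learn's IsolationForest to synthetic, heavily imbalanced data without using any labels. The data, the contamination value, and the other hyperparameters are assumptions chosen for the example, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic traffic: 980 normal events and 20 anomalous ones (2%).
normal = rng.normal(loc=0.0, scale=1.0, size=(980, 4))
attacks = rng.uniform(low=6.0, high=10.0, size=(20, 4))
X = np.vstack([normal, attacks])

# contamination is the expected share of anomalies; it only sets the
# decision threshold and is an assumption about the data.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
model.fit(X)  # no labels are passed

pred = model.predict(X)              # +1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower = more anomalous
flagged = np.where(pred == -1)[0]    # row indices of flagged events
```

Because the model isolates points that are easy to separate from the rest, it does not need balanced classes; the trade-off is that the contamination threshold has to be chosen or tuned rather than learned from labels.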

Model Explainability:

Challenge: Security teams require transparency in AI decisions.
Solution: Integrated SHAP (SHapley Additive exPlanations) to explain why a model flagged specific network events as anomalies.
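To show the idea behind SHAP without depending on the library, the sketch below computes exact Shapley values for one flagged event in plain Python. It is not the SHAP library's API; the scoring function, feature names, and baseline are hypothetical, and the exact method is only practical for a handful of features because its cost grows as 2^n.

```python
from itertools import combinations
from math import factorial


def shapley_values(score, x, baseline):
    """Exact Shapley attribution of score(x) - score(baseline) per feature.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (score(with_i) - score(without_i))
    return phi


# Hypothetical anomaly score over three features (illustration only).
FEATURES = ["src_bytes", "failed_logins", "duration"]


def anomaly_score(v):
    src_bytes, failed_logins, duration = v
    # Failed logins weigh more when the connection is long-lived.
    return 0.001 * src_bytes + 0.5 * failed_logins + 0.2 * failed_logins * (duration > 60)


baseline = [200.0, 0.0, 5.0]   # assumed "typical" event
event = [5200.0, 4.0, 120.0]   # assumed flagged event

phi = shapley_values(anomaly_score, event, baseline)
explanation = dict(zip(FEATURES, phi))
```

The attributions sum to the difference between the event's score and the baseline score, so an analyst can read each value as that feature's share of why the event was flagged.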

Real-Time Integration:

Challenge: Processing high-velocity data in real time while maintaining accuracy.
Solution: Designed a lightweight inference pipeline and used stream processing frameworks to feed model outputs into alert systems.
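The shape of such a pipeline can be sketched in plain Python. A generator stands in for the stream processing framework, which the write-up does not name, and score_batch is a hypothetical placeholder for a fitted model; the batch size and threshold are also assumptions.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Event = Sequence[float]


def micro_batches(stream: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Group an event stream into lists of at most `size` events."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def alerts(
    stream: Iterable[Event],
    score_batch: Callable[[List[Event]], List[float]],
    threshold: float,
    batch_size: int = 3,
) -> Iterator[Tuple[int, float]]:
    """Yield (event_index, score) for events scoring above the threshold."""
    index = 0
    for batch in micro_batches(stream, batch_size):
        # Scoring a whole batch at once amortizes per-call model overhead.
        for score in score_batch(batch):
            if score > threshold:
                yield index, score
            index += 1


def score_batch(batch: List[Event]) -> List[float]:
    """Hypothetical stand-in for a fitted model: sum of absolute values."""
    return [sum(abs(v) for v in event) for event in batch]


# Invented events; in practice these would arrive from a message queue.
events = [[0.1, 0.2], [0.0, 0.3], [9.0, 7.5], [0.2, 0.1], [6.0, 8.0]]
found = list(alerts(events, score_batch, threshold=5.0))
```

Since alerts is a generator, events are scored as they arrive and only flagged ones are passed downstream, which keeps memory use flat regardless of stream length.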

Results/Conclusion:

The anomaly detection system successfully flagged previously unseen threats with high accuracy and low false-positive rates. The hybrid use of Isolation Forest, Autoencoders, and K-Means allowed for robust detection across different attack types. SHAP visualizations added transparency, helping cybersecurity analysts trust and act on model predictions. The tool is adaptable for securing digital platforms like the Metaverse, and future improvements may include online learning for model updates and integration with SIEM systems for enterprise-scale deployment.
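One way a hybrid of detectors can be combined is sketched below, assuming scikit-learn. How the project actually fused its models is not stated, so the rank-averaging scheme is an assumption; the autoencoder is left out to keep the example dependency-light, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Two clusters of normal behavior plus 12 scattered outliers (last rows).
X = np.vstack([
    rng.normal(0.0, 1.0, size=(300, 3)),
    rng.normal(5.0, 1.0, size=(300, 3)),
    rng.uniform(-15.0, 20.0, size=(12, 3)),
])


def rank_normalize(scores):
    """Map scores to [0, 1] by rank so detectors on different scales can be averaged."""
    ranks = np.argsort(np.argsort(scores))
    return ranks / (len(scores) - 1)


# Isolation Forest: negate score_samples so that higher = more anomalous.
iso = IsolationForest(random_state=7).fit(X)
iso_score = -iso.score_samples(X)

# K-Means: distance to the nearest centroid as the anomaly score.
km = KMeans(n_clusters=2, n_init=10, random_state=7).fit(X)
km_score = km.transform(X).min(axis=1)

# Simple fusion: average of rank-normalized scores from both detectors.
combined = (rank_normalize(iso_score) + rank_normalize(km_score)) / 2
top = np.argsort(combined)[::-1][:12]  # indices of the 12 highest scores
```

Averaging ranks rather than raw scores keeps one detector's scale from dominating, and an autoencoder's reconstruction error could be added as a third score in the same way.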

Let's 👋 Work Together

Palak Gupta👋