
Palak Gupta 👋

Turning data into insights with my Strategic Data Analysis


Portfolio Project 8:

Customer Churn Analysis

Services:

Data Analysis

Github

Overview

The Customer Churn Analysis project aimed to identify customers who are likely to leave a service or subscription and understand the key factors influencing churn behavior. By leveraging historical customer data and predictive modeling, the project sought to help businesses improve retention strategies, reduce churn rates, and enhance customer satisfaction.

Research: The project began with research into churn behavior across telecom, retail, and SaaS. The focus was on understanding common reasons for customer churn, such as poor service, pricing issues, lack of engagement, and better competitor offerings.

Information Architecture: The dataset included features such as customer demographics (age, gender, location), account tenure, usage metrics (monthly charges, total charges, number of services), customer support interactions, and churn status. Data preprocessing involved handling missing values, encoding categorical variables, and scaling numerical features for model readiness.
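The three preprocessing steps named above (imputing missing values, encoding categoricals, scaling numerics) can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's actual pipeline: the records, the field names (`tenure`, `monthly_charges`, `contract`), and the choice of mean imputation are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical toy records; field names are illustrative, not the project's schema.
rows = [
    {"tenure": 2, "monthly_charges": 80.0, "contract": "monthly"},
    {"tenure": 40, "monthly_charges": None, "contract": "yearly"},
    {"tenure": 12, "monthly_charges": 60.0, "contract": "monthly"},
]

def impute_mean(rows, key):
    """Replace missing (None) values in a numeric column with the column mean."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    return [fill if r[key] is None else r[key] for r in rows]

def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def one_hot(rows, key):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted({r[key] for r in rows})
    return categories, [[int(r[key] == c) for c in categories] for r in rows]

charges = impute_mean(rows, "monthly_charges")      # missing value filled with 70.0
scaled_charges = standardize(charges)               # centered on zero
categories, contract_flags = one_hot(rows, "contract")
```

In a real pipeline the fill value, mean, and standard deviation would be computed on the training split only and then reused on new data, so that no information leaks from the test set.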

Wireframing and Prototyping: Exploratory Data Analysis (EDA) visualizations were created using Seaborn and Power BI, including churn distribution, service usage patterns, and tenure heatmaps. Predictive models like Logistic Regression, Random Forest, and XGBoost were developed. A user-friendly Streamlit dashboard prototype was also created for real-time churn prediction based on new customer data.

Challenges

Class Imbalance:
  • Challenge: Churned customers often represent a small percentage of the overall dataset, leading to biased models.
  • Solution: Applied resampling techniques like SMOTE (Synthetic Minority Over-sampling Technique) and adjusted class weights during model training to handle imbalance.
Feature Importance:
  • Challenge: Identifying which features most strongly influence customer churn.
  • Solution: Used SHAP (SHapley Additive exPlanations) values and feature importance rankings from tree-based models to interpret the key drivers of churn, such as monthly charges, tenure, and service usage.
Model Interpretability vs. Accuracy:
  • Challenge: Balancing explainability with predictive power.
  • Solution: While complex models like XGBoost gave better accuracy, simpler models like Logistic Regression were used alongside for better stakeholder understanding and interpretability.
Customer Segmentation:
  • Challenge: Not all customers churn for the same reasons; generic retention strategies were ineffective.
  • Solution: Applied K-Means clustering to segment users based on behavior and designed tailored retention plans for each segment.
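
For the class-imbalance item above, one common way to "adjust class weights" is the inverse-frequency heuristic, where each class is weighted by n_samples / (n_classes * class_count). The sketch below shows that calculation only; the label list is made up, and whether the project used this exact weighting is an assumption.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical labels: 1 = churned, 0 = retained (8 retained, 2 churned).
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
# Errors on the rare churn class now cost 2.5 / 0.625 = 4x more than errors on the majority class.
```

These weights would typically be passed to the model's loss function, so that misclassifying a churner is penalized more heavily than misclassifying a retained customer.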

Results/Conclusion:

The churn prediction model achieved an accuracy of over 85% and an AUC-ROC score of 0.91, successfully identifying high-risk customers. Key churn indicators included high monthly bills, short tenure, and multiple service downgrades. The business team used these insights to offer targeted retention strategies like loyalty programs and personalized outreach, resulting in an estimated 18% reduction in churn over a quarter. This project showcased practical applications of machine learning in business and underlined the importance of blending technical insights with human-centered strategy.
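
To make the two reported metrics concrete, here is a small sketch of how accuracy and AUC-ROC are computed from true labels and predicted churn scores. The labels, scores, and the 0.5 threshold are invented for illustration and are unrelated to the figures reported above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random churner is scored above a random
    non-churner, counting ties as half (pairwise definition of AUC)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = churned, 0 = retained.
y_true = [0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.6, 0.5, 0.9]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold of 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, while AUC-ROC measures how well the scores rank churners above non-churners across all thresholds. On imbalanced churn data the ranking metric is usually the more informative of the two.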
