Palak Gupta👋
Turning data into insights with my Strategic Data Analysis
Portfolio Project 7:
The COVID-19 Data Prediction project focused on analyzing and forecasting the spread of COVID-19 using historical case data. The aim was to build predictive models that could estimate future infection rates and help health agencies, policymakers, and the public prepare for potential surges. The analysis involved trend visualization, statistical modeling, and machine learning techniques to forecast daily cases, recoveries, and deaths.
Research: The research phase involved studying publicly available COVID-19 datasets from sources such as Johns Hopkins University and Kaggle. Key variables included daily confirmed cases, deaths, recoveries, testing rates, and vaccination status. Additional features such as population density and lockdown dates were also reviewed to assess their impact on infection spread.
Information Architecture: The data was organized by country, region, and date. After extensive cleaning (handling missing values, normalizing case counts by population, and resolving reporting inconsistencies), the dataset was prepared for time series modeling and regression analysis.
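The cleaning steps above can be illustrated with a small sketch. It is not the project's actual pipeline: the function names, the sample numbers, and the choice to clamp downward revisions to zero are all assumptions made for illustration, and it uses plain Python so it runs without extra libraries.

```python
def forward_fill(cumulative):
    """Replace missing (None) cumulative totals with the last reported value."""
    filled, last = [], 0
    for value in cumulative:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def daily_from_cumulative(cumulative):
    """Difference cumulative totals into daily counts.

    The first total is treated as that day's count. Downward revisions would
    produce negative days, so they are clamped to zero (an assumption).
    """
    daily, previous = [], 0
    for total in cumulative:
        daily.append(max(total - previous, 0))
        previous = total
    return daily


def per_100k(values, population):
    """Normalize counts by population so regions of different size compare."""
    return [v * 100_000 / population for v in values]


def rolling_mean(values, window=7):
    """Trailing mean; the first days average over whatever history exists."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical cumulative confirmed cases for one region (None = no report).
cumulative = [10, 25, None, 40, 38, 60]
filled = forward_fill(cumulative)                # gap filled with 25
daily = daily_from_cumulative(filled)            # revision 40 -> 38 becomes 0
rate = per_100k(daily, population=50_000)        # daily cases per 100,000 people
smoothed = rolling_mean(daily, window=3)         # short window for the demo
```

A trailing window is used here so that each smoothed value depends only on past days, which keeps the series safe to feed into a forecasting model.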
Time series plots (using Matplotlib and Plotly) and dashboards (in Power BI) were created to display trends in confirmed cases, death rates, and recovery patterns. Prototypes of forecasting models were built using ARIMA, Facebook Prophet, and LSTM neural networks to simulate future case trajectories.
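To give a feel for what a forecasting prototype does, here is a minimal baseline sketch. It deliberately swaps in Holt's linear-trend exponential smoothing, which is simpler than the ARIMA, Prophet, and LSTM models used in the project, so it should be read as a stand-in and not as the project's method. The function name `holt_forecast`, the smoothing parameters, and the sample history are hypothetical.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.5):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    alpha smooths the level, beta smooths the trend; both values here are
    arbitrary defaults, not tuned settings.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to estimate a trend")

    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the newly observed slope with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend

    # Case counts cannot be negative, so forecasts are floored at zero.
    return [max(0.0, level + step * trend) for step in range(1, horizon + 1)]


# Hypothetical daily case counts rising by 10 per day.
history = [100, 110, 120, 130, 140]
forecast = holt_forecast(history, horizon=3)
print(forecast)
```

On a perfectly linear history like this one, the method simply continues the line. Real case data is noisier and changes regime, which is why richer models were prototyped.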
The project successfully demonstrated the use of data science in real-world crisis prediction. LSTM-based models outperformed traditional time series models, especially for longer forecast windows. Forecasts helped predict peaks and declines with reasonable accuracy, and region-specific insights were derived regarding infection trends and recovery speed. The findings were visualized on interactive dashboards, aiding public understanding and institutional decision-making. This project highlighted the importance of data quality, external factor inclusion, and transparent modeling in health analytics. Future work could involve real-time data pipelines and integration with hospital resource prediction models.
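A comparison across forecast windows, like the one described above, is usually made by measuring error separately at each step ahead. The sketch below shows one common way to do that, rolling-origin evaluation with mean absolute error. It is an assumed setup, not the project's evaluation code: `rolling_origin_mae` and `naive_forecast` are hypothetical names, the series is made up, and the naive forecaster stands in for any model with the same call signature.

```python
def naive_forecast(train, horizon):
    """Baseline: repeat the last observed value for every future step."""
    return [train[-1]] * horizon


def rolling_origin_mae(series, forecaster, horizon, min_train):
    """Mean absolute error at each step ahead, averaged over rolling origins.

    For every cut-off point, the model sees only data before the cut-off and
    is scored on the next `horizon` values.
    """
    totals = [0.0] * horizon
    origins = 0
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]
        actual = series[origin:origin + horizon]
        predicted = forecaster(train, horizon)
        for step in range(horizon):
            totals[step] += abs(actual[step] - predicted[step])
        origins += 1
    if origins == 0:
        raise ValueError("series too short for this horizon and min_train")
    return [total / origins for total in totals]


# Hypothetical steadily rising daily counts.
series = [10, 12, 14, 16, 18, 20]
errors = rolling_origin_mae(series, naive_forecast, horizon=2, min_train=3)
print(errors)  # error one day ahead, then two days ahead
```

Running the same function with each candidate model in place of `naive_forecast` yields a per-horizon error profile for each, which makes it visible where one model starts to pull ahead on longer windows.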