Project Report: Titanic Dataset Analysis
Executive Summary
This project aimed to analyze the Titanic passenger dataset to gain insights into factors influencing survival rates. We conducted extensive data exploration, data cleaning,
and feature engineering to prepare the data for analysis. We then built a predictive machine learning model to estimate passenger survival. Here are the key findings and insights:
Key Findings and Insights
-
Demographic Insights:
- Age and Gender: The majority of passengers were young adults. More males were on board than females.
- Most passengers were in third class (lower class), followed by first and second class.
-
Survival Rate Analysis
- Overall Survival Rate: The overall survival rate was approximately 38%.
- Passenger Class: First-class passengers had the highest survival rate, followed by second class, and third class.
- Gender: Females had a significantly higher survival rate compared to males.
- Age Groups: Infants and children had higher survival rates than adults.
- Embarkation Port: Passengers who boarded at Cherbourg (C) had a higher survival rate compared to other ports.
Family Relationships
- Family Size: Passengers with small family sizes (1-3 members) had a higher survival rate than those traveling alone or with larger families.
- Traveling Alone: Passengers traveling alone had a lower survival rate compared to those with family members on board.
Fare Analysis
- Fare Distribution: Most passengers paid lower fares, but there was a wide range of fares.
- Passenger Class: First-class passengers paid significantly higher fares on average than second and third-class passengers.
Cabin Investigation
- Limited Cabin Information: The dataset had limited cabin information, with many missing values in the 'Cabin' column.
- Survival Rate: Passengers with cabin information did not show a significant difference in survival rate compared to those without.
Machine Learning Model
- We built a predictive logistic regression model to estimate passenger survival based on selected features.
The model achieved a perfect accuracy score of 1.00, making no prediction errors on the test dataset.
The precision, recall, and F1-Score for both survival and non-survival classes were 1.00, indicating a balanced and accurate model.
Recommendations
Based on our analysis, we recommend considering passenger class, gender, and age as important factors when making predictions or inferences related to survival on the Titanic.
Future analysis could explore more detailed cabin data if available and its impact on survival rates. In conclusion, our analysis of the Titanic dataset revealed significant insights into the factors affecting passenger survival. Our machine learning model demonstrated exceptional predictive accuracy. These findings can be valuable for understanding historical events and serve as a foundation for more complex analyses in the future.
View the code on GitHub:
Titanic Analysis Project