Movie Analysis Project
Overview
This project analyzes a movie dataset to answer questions about revenue, budgets, ratings, and related factors. It uses Python with data analysis and visualization libraries to explore the data.
Tasks
Data Import and Overview
- Import the necessary Python libraries for data analysis and visualization.
- Read the movie data from a CSV file.
- Check for missing data in the dataset.
- Display the data types of each column.
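As a minimal sketch of this step, assuming pandas is the library in use: the CSV content below is a made-up stand-in (the real file name and its full column list are not given here), and `missing_share` is a hypothetical variable name.

```python
import io

import pandas as pd

# Made-up stand-in for the movie CSV; values are illustrative only.
csv_text = """name,company,year,budget,gross,score
Film A,Studio X,2001,1000000,5000000,7.1
Film B,Studio Y,2003,,2500000,6.4
Film C,Studio X,2005,3000000,,8.0
"""

# For real data, pass the path of the CSV file instead of a StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

# Share of missing values in each column (0.0 means no missing data).
missing_share = df.isnull().mean()
print(missing_share)

# Data type that pandas inferred for each column.
print(df.dtypes)
```

Columns that contain missing numbers are read as floating point, which is worth keeping in mind before converting anything to integers.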
Data Exploration and Visualization
- Visualize potential outliers in the 'gross' column using a box plot.
- Remove duplicate rows from the dataset.
- Sort the data by 'gross' in descending order.
- Create scatter plots to examine the relationships between 'gross' and 'budget', as well as 'score' and 'gross'.
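The exploration steps above could look roughly like the following sketch. It assumes pandas and matplotlib; the sample rows are invented, and only the 'gross', 'budget', and 'score' column names come from the task list.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample rows; "Film B" appears twice to illustrate duplicate removal.
df = pd.DataFrame({
    "name": ["Film A", "Film B", "Film B", "Film C", "Film D"],
    "budget": [1.0e6, 2.0e6, 2.0e6, 5.0e6, 8.0e6],
    "gross": [5.0e6, 2.5e6, 2.5e6, 4.0e7, 9.0e6],
    "score": [7.1, 6.4, 6.4, 8.0, 5.9],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Box plot of 'gross' to spot potential outliers.
df.boxplot(column="gross", ax=axes[0])

# Drop exact duplicate rows, then sort by 'gross' with the highest value first.
df = df.drop_duplicates().sort_values(by="gross", ascending=False)

# Scatter plots: budget against gross, and score against gross.
df.plot.scatter(x="budget", y="gross", ax=axes[1], title="Budget vs. gross")
df.plot.scatter(x="score", y="gross", ax=axes[2], title="Score vs. gross")
fig.tight_layout()
```

Calling `plt.show()` afterwards displays the figure when running outside a notebook.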
Correlation Analysis
- Calculate correlation matrices using different methods (Pearson, Kendall, and Spearman) to explore relationships between numeric columns.
- Visualize the correlation matrices using heatmaps.
- Extend the correlation analysis to categorical columns by first converting them to numerical values.
- Visualize the correlation matrix for the entire dataset.
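One way to compare the three correlation methods is sketched below, assuming pandas, seaborn, and matplotlib. The numbers are invented and `correlations` is a hypothetical name. The Kendall method in pandas relies on SciPy, so SciPy needs to be installed for that case.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented numeric sample; 'gross' rises steadily with 'budget'.
df = pd.DataFrame({
    "budget": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gross": [2.0, 4.1, 5.9, 8.2, 9.9],
    "score": [6.0, 5.5, 7.0, 6.5, 8.0],
})

methods = ("pearson", "kendall", "spearman")

# One correlation matrix per method, keyed by method name.
correlations = {method: df.corr(method=method) for method in methods}

# One heatmap per method, on a shared colour scale from -1 to 1.
fig, axes = plt.subplots(1, len(methods), figsize=(15, 4))
for ax, method in zip(axes, methods):
    sns.heatmap(correlations[method], annot=True, vmin=-1, vmax=1,
                cmap="coolwarm", ax=ax)
    ax.set_title(f"{method.capitalize()} correlation")
fig.tight_layout()
```

Pearson measures linear association, while Kendall and Spearman are rank-based, so they can disagree when a relationship is monotonic but not linear.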
Exploring Categorical Correlations
- Analyze correlations between categorical variables in the dataset.
- Identify and display pairs of variables with high correlations.
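Listing highly correlated pairs can be sketched as follows, assuming the columns have already been converted to numbers. The data, the `company_code` column, and the 0.5 cutoff are all illustrative assumptions rather than values from the project.

```python
import numpy as np
import pandas as pd

# Invented, already-numeric sample; 'company_code' stands in for an encoded category.
df = pd.DataFrame({
    "budget": [1, 2, 3, 4, 5, 6],
    "gross": [2, 4, 5, 8, 11, 12],
    "score": [7.0, 5.0, 6.5, 5.5, 7.5, 6.0],
    "company_code": [0, 1, 0, 2, 1, 2],
})

corr = df.corr()

# Keep only the upper triangle (above the diagonal) so each pair appears once
# and a column is never paired with itself.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

threshold = 0.5  # assumed cutoff for a "high" correlation
high_pairs = pairs[pairs.abs() > threshold].sort_values(ascending=False)
print(high_pairs)
```

The result is a Series indexed by column pairs, so `high_pairs.index` gives the names of the variables involved.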
Company and Revenue Analysis
- Analyze the top 15 movie production companies by gross revenue.
- Group and analyze gross revenue by company and year.
- Group and analyze gross revenue by company.
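The grouping steps above might be sketched like this with pandas. The `company` and `year` column names and all values are assumptions made for illustration.

```python
import pandas as pd

# Invented sample; 'company' and 'year' are assumed column names.
df = pd.DataFrame({
    "company": ["Studio X", "Studio Y", "Studio X", "Studio Z", "Studio Y"],
    "year": [2001, 2001, 2002, 2002, 2002],
    "gross": [5.0e6, 2.5e6, 4.0e7, 9.0e6, 1.0e6],
})

# Total gross revenue per company, largest first.
gross_by_company = (
    df.groupby("company")["gross"].sum().sort_values(ascending=False)
)

# Top 15 companies (fewer if the data has fewer companies).
top_companies = gross_by_company.head(15)
print(top_companies)

# Total gross revenue per company and year.
gross_by_company_year = df.groupby(["company", "year"])["gross"].sum()
print(gross_by_company_year)
```

Summing is one choice of aggregate; a mean or median per company would answer a different question about typical rather than total revenue.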
Budget vs. Gross Visualization
- Create a scatter plot to visualize the relationship between movie budgets and gross earnings.
Data Preparation for Correlation Analysis
- Convert categorical columns to numerical values for correlation analysis.
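A common way to do this conversion in pandas is to map each category to an integer code, sketched below. Whether the project uses this exact approach is an assumption, as are the column names and values.

```python
import pandas as pd

# Invented sample; 'company' and 'rating' are assumed categorical columns.
df = pd.DataFrame({
    "company": ["Studio X", "Studio Y", "Studio X", "Studio Z"],
    "rating": ["PG", "R", "PG-13", "R"],
    "gross": [5.0e6, 2.5e6, 4.0e7, 9.0e6],
})

# Work on a copy so the original labels stay available.
df_numeric = df.copy()
for column in df_numeric.columns:
    if not pd.api.types.is_numeric_dtype(df_numeric[column]):
        # Each distinct value becomes an integer code (categories sorted alphabetically).
        df_numeric[column] = df_numeric[column].astype("category").cat.codes

# Correlation matrix over the fully numeric frame.
corr = df_numeric.corr()
print(corr)
```

The integer codes impose an arbitrary ordering on the categories, so correlations involving encoded columns are best read as rough hints rather than precise measures.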
Final Correlation Matrix Visualization
- Calculate and visualize the correlation matrix for the dataset after data preparation.
Rating and Gross Visualization
- Create swarm and strip plots to visualize the relationship between movie ratings and gross earnings.
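Swarm and strip plots are available in seaborn, so a sketch under that assumption follows. It treats `rating` as a categorical column (for example an age rating), which is a guess about the dataset; the values are invented.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented sample; 'rating' is assumed to be a categorical column.
df = pd.DataFrame({
    "rating": ["PG", "PG", "PG-13", "PG-13", "R", "R", "R"],
    "gross": [5.0e6, 7.5e6, 4.0e7, 2.2e7, 9.0e6, 1.0e6, 3.5e6],
})

fig, (ax_swarm, ax_strip) = plt.subplots(1, 2, figsize=(12, 4))

# Swarm plot: points are nudged sideways so they do not overlap.
sns.swarmplot(data=df, x="rating", y="gross", ax=ax_swarm)
ax_swarm.set_title("Swarm plot")

# Strip plot: points are scattered with random jitter within each category.
sns.stripplot(data=df, x="rating", y="gross", jitter=True, ax=ax_strip)
ax_strip.set_title("Strip plot")

fig.tight_layout()
```

Swarm plots become slow and crowded on large datasets, in which case the strip plot is usually the more practical of the two.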
Project Outcome
The project aims to show how factors such as budget, rating, and production company relate to a movie's gross revenue. Its analysis and visualizations are intended to help stakeholders make informed decisions about the movie industry, such as identifying successful production companies, understanding the impact of budgets on earnings, and exploring the relationship between ratings and revenue.
Note: The actual project may go further, interpreting the findings and drawing conclusions or recommendations from them.
View the code on GitHub: Movie Analysis Project