Data-driven strategist turning insights into impact
This project is maintained by kevinjbts
Building on a sample project by Alex the Analyst, this analysis uses the IMDB Movie database to explore how various features in the dataset correlate with movie performance. The initial hypothesis is that movie votes and gross profit are positively correlated. The data was cleaned and analyzed in Python, with key visualizations including regression plots, heatmaps and correlation matrices.
The dataset contains 16 features and 7668 observations. The features include string, integer and categorical data types. To clean the dataset, the following steps were taken:
As a Data Analyst & Strategist tasked with providing recommendations to NEWCO’s founder, I used the EDA on current e-commerce sales to produce an action plan that can be iterated on in the future. In the analysis, the top 10 states by sales volume represented $30k in sales and had an evenly distributed number of customers (no outliers). Because of this, allocating more advertising spend to these regions should be tested for improved return on ad spend (ROAS) against underperforming states.
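The state ranking described above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`state`, `customer_id`, `sales`), the helper `top_states_by_sales` and all values are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical order-level data; column names and values are illustrative only.
orders = pd.DataFrame({
    "state": ["CA", "NY", "CA", "TX", "NY", "WA"],
    "customer_id": [1, 2, 3, 4, 2, 5],
    "sales": [120.0, 80.0, 60.0, 40.0, 90.0, 30.0],
})

def top_states_by_sales(df, n=10):
    """Return total sales and unique customer counts per state, largest sales first."""
    summary = df.groupby("state").agg(
        total_sales=("sales", "sum"),
        customers=("customer_id", "nunique"),
    )
    return summary.sort_values("total_sales", ascending=False).head(n)

top_states = top_states_by_sales(orders, n=3)
print(top_states)
```

Keeping the customer count next to total sales makes it easier to check whether a state's volume comes from a broad customer base or from a handful of large buyers.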
In addition to informing advertising targeting, the sales analysis segments the top 20% of purchasers for the company. This segment is a highly engaged audience for customized marketing campaigns and/or outreach by sales reps.
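One way to build such a segment is to total sales per customer and keep everyone at or above the 80th percentile. The sketch below assumes hypothetical `customer_id` and `sales` columns, a hypothetical helper `top_purchasers`, and invented numbers; the original notebook may define the top 20% differently.

```python
import pandas as pd

# Hypothetical purchase records; values are invented for illustration.
purchases = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 1, 2, 6, 7, 8, 9, 10],
    "sales": [500.0, 300.0, 50.0, 40.0, 30.0, 200.0,
              100.0, 20.0, 10.0, 15.0, 25.0, 35.0],
})

def top_purchasers(df, share=0.20):
    """Return per-customer sales totals for the top `share` of customers."""
    totals = df.groupby("customer_id")["sales"].sum()
    cutoff = totals.quantile(1 - share)  # e.g. the 80th percentile of totals
    return totals[totals >= cutoff].sort_values(ascending=False)

vip = top_purchasers(purchases)
print(vip)
```

The resulting customer IDs could then be exported as an audience list for campaigns or handed to sales reps.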
Lastly, the insights that can be generated are limited by the features available in the current dataset. Future customer surveys and/or additional data collection could open new opportunities to analyze performance and further guide advertising targeting.
View the full Python EDA in Jupyter Notebook here:
Based on a basic regression, there is a positive correlation between gross profit and total movie budget.
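A basic regression of this kind can be reproduced with a least-squares line fit. The sketch below uses NumPy and made-up budget and gross figures (not values from the IMDB dataset) to show how the slope and the Pearson correlation are obtained.

```python
import numpy as np

# Toy figures in millions, invented for illustration; not taken from the IMDB data.
budget = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
gross = np.array([25.0, 45.0, 80.0, 95.0, 130.0])

# Fit gross = slope * budget + intercept by ordinary least squares.
slope, intercept = np.polyfit(budget, gross, deg=1)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(budget, gross)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

A positive slope together with a correlation close to 1 is what a "positive correlation" looks like numerically. For the plotted version, seaborn's `regplot` draws the scatter and the fitted line in one call.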
A heatmap of the features in the IMDB database shows that movie votes and gross profit are correlated (0.63), as are gross profit and movie budget (0.74), which further supports the previous analysis.
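The numbers behind a heatmap like this come from a pairwise correlation matrix. A minimal sketch, assuming a DataFrame with hypothetical numeric columns `budget`, `votes`, `gross` and `runtime` and invented values (so the coefficients will not match the ones reported above):

```python
import pandas as pd

# Toy data with invented values; column names are assumptions for illustration.
movies = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50],
    "votes": [5, 9, 20, 18, 40],
    "gross": [25, 45, 80, 95, 130],
    "runtime": [100, 95, 120, 90, 110],
})

# Pairwise Pearson correlations between all numeric columns.
corr = movies.corr(method="pearson")
print(corr.round(2))

# To draw the heatmap itself, the matrix can be passed to seaborn, e.g.:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True)
```

Each cell holds the correlation between the row and column feature, so the matrix is symmetric with ones on the diagonal.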
The pattern holds when all features are included in the correlation analysis: the same features emerge as the most strongly correlated.
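Including every feature requires turning non-numeric columns into numbers first. One common approach, shown here as an assumption rather than the notebook's confirmed method, is to replace each category with an integer code and then rank all column pairs by correlation. The column names and values below are invented.

```python
import pandas as pd

# Toy data with one categorical column; all values are invented.
movies = pd.DataFrame({
    "genre": ["Action", "Drama", "Action", "Comedy", "Action"],
    "budget": [10, 20, 30, 40, 50],
    "votes": [5, 9, 20, 18, 40],
    "gross": [25, 45, 80, 95, 130],
})

# Replace every non-numeric column with integer category codes.
encoded = movies.copy()
for col in encoded.columns:
    if not pd.api.types.is_numeric_dtype(encoded[col]):
        encoded[col] = encoded[col].astype("category").cat.codes

# Flatten the correlation matrix into (feature, feature) pairs.
pairs = encoded.corr().unstack()

# Keep each unordered pair once and drop self-correlations.
first = pairs.index.get_level_values(0)
second = pairs.index.get_level_values(1)
pairs = pairs[first < second]

top_pairs = pairs.sort_values(ascending=False)
print(top_pairs)
```

Category codes are assigned in arbitrary (alphabetical) order, so correlations involving encoded columns should be read with caution; the ranking is most meaningful for the genuinely numeric features.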
Charting gross earnings directly against budget shows the same positive correlation.
Across these visualizations, it is clear that gross earnings and budget are strongly correlated.
While gross earnings are highly correlated with budget overall, it would also be interesting to look at movies that over-indexed on gross earnings while staying below a budget threshold. If high-budget movies tend to perform well, then finding what lower-budget movies with outsized earnings have in common might show how to create higher-earning movies without large budgets. It could also help smaller producers spot trends that allow their films to over-perform.
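As a starting point for that follow-up, the low-budget over-performers could be isolated with a simple filter on budget and on the ratio of gross earnings to budget. The helper `low_budget_overperformers`, the thresholds and the data below are all hypothetical choices for illustration.

```python
import pandas as pd

# Toy data in millions; titles and values are invented.
movies = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "budget": [1.0, 5.0, 8.0, 60.0, 150.0],
    "gross": [30.0, 6.0, 40.0, 200.0, 300.0],
})

def low_budget_overperformers(df, budget_cap, min_ratio):
    """Return movies at or under budget_cap whose gross is at least min_ratio times budget."""
    low = df[df["budget"] <= budget_cap].copy()
    low["gross_to_budget"] = low["gross"] / low["budget"]
    hits = low[low["gross_to_budget"] >= min_ratio]
    return hits.sort_values("gross_to_budget", ascending=False)

hits = low_budget_overperformers(movies, budget_cap=10.0, min_ratio=3.0)
print(hits)
```

From there, the remaining features of the selected movies (genre, release timing, and so on) could be compared against the rest of the low-budget group to look for shared traits.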