CHOCOLATE BAR RECIPE TREND ANALYSIS
(2006-2020)
Here we study chocolate bar ingredient trends, company preferences, and ratings. We mostly use NumPy and Pandas to compute the results, and Matplotlib and Seaborn to plot graphs. The dataset used in this project is taken from kaggle.com and contains data about 66 chocolate bar companies, with columns such as 'company', 'company_location', 'country_of_bean_origin', 'review_date', 'rating', and 'cocoa_percent', along with information on common ingredients and tastes.
According to the Flavors of Cacao rating scale:
4.0 - 5.0 = Outstanding
3.5 - 3.9 = Highly Recommended
3.0 - 3.49 = Recommended
2.0 - 2.9 = Disappointing
1.0 - 1.9 = Unpleasant
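The scale above can be turned into a small lookup helper. This is only a minimal sketch: the function name `rating_category` is hypothetical, and it assumes that a rating falling between two listed bands (for example 3.95) belongs to the lower band, which the scale itself does not state.

```python
from bisect import bisect_right

# Lower bound of each band on the rating scale, in ascending order.
BAND_FLOORS = [1.0, 2.0, 3.0, 3.5, 4.0]
BAND_LABELS = [
    "Unpleasant",
    "Disappointing",
    "Recommended",
    "Highly Recommended",
    "Outstanding",
]


def rating_category(rating: float) -> str:
    """Return the scale label for a rating between 1.0 and 5.0 (hypothetical helper)."""
    if not 1.0 <= rating <= 5.0:
        raise ValueError(f"rating out of range: {rating}")
    # bisect_right counts how many band floors are <= rating;
    # subtracting 1 gives the index of the band the rating falls into.
    # Assumption: values in the gaps between listed bands go to the lower band.
    return BAND_LABELS[bisect_right(BAND_FLOORS, rating) - 1]
```

A helper like this could later be applied to a 'rating' column, for example with `.apply()`, to group chocolates by band.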
Table of contents
- Downloading the Dataset
  - Python's OS module
  - Pandas "read_csv" method
- Data Preparation and Cleaning
  - .columns, .sample(), .shape, .sum()
  - .unique(), .apply(), lambda
  - .value_counts(), .info(), .describe()
- Exploratory Analysis and Visualization
  - Company And Ingredients
  - Tastes
  - Percentage of Cocoa and Variation Over Years
  - Rating and Cocoa Percent
  - Correlation between different columns
- Questions and Answers
  - How does the presence of cocoa butter and lecithin affect ratings in the latest three years (2018-2020)?
  - How much cocoa do the top companies actually prefer?
  - From which countries do the top companies import cocoa beans?
  - What must have been the recipe of the top-rated chocolate in 2019?
  - What tastes are found in top-rated chocolates during 2016-2020?
  - What are the major chocolate regions whose companies generally make it to the Top 50?
- Inferences and Conclusion
- References and Future Work
Downloading the Dataset
- Let's download the dataset from kaggle.com to a local folder for analysis, or download it using the opendatasets Python library. The data is in CSV format.
- The original dataset will later be uploaded to the Jovian files section along with this project, using the command:
jovian.commit(project=project_name, files=['chocolate-bar-2020/chocolate.csv'])
- If the dataset is downloaded to a local computer, we create a local Conda environment and install the required libraries by running commands in a terminal, for example:
conda create -n zerotopandas -y python=3.8
conda activate zerotopandas
pip install jovian jupyter numpy pandas matplotlib seaborn --upgrade
We can now start the Jupyter notebook using the command:
jupyter notebook
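Once the notebook is running, it can help to confirm that the libraries installed above are visible to the current Python environment. The following is a minimal sketch, not part of the original project: the helper name `missing_packages` is hypothetical, and it assumes each library's import name matches the pip package name listed in the install command.

```python
from importlib.util import find_spec

# Packages from the pip install command above.
# Assumption: the import name equals the pip package name for each of them.
REQUIRED = ["jovian", "jupyter", "numpy", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", missing if missing else "none")
```

If any name is reported as missing, the notebook is probably running outside the `zerotopandas` environment, or the install step did not complete.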
!pip install opendatasets --upgrade --quiet
dataset_url = 'https://www.kaggle.com/soroushghaderi/chocolate-bar-2020'