Updated 4 years ago
Exploratory Data Analysis with Chocolate
Analysing the dataset of Chocolate Bar Ratings to unearth key insights hidden within the data
Context
Chocolate is one of the most popular candies in the world. Each year, residents of the United States collectively eat more than 2.8 billion pounds of it. However, not all chocolate bars are created equal! This dataset contains expert ratings of over 1,700 individual chocolate bars, along with information on their regional origin, their percentage of cocoa, the variety of bean used, and where the beans were grown.
Rating System
- 5 = Elite (transcending beyond the ordinary limits)
- 4 = Premium (superior flavor development, character and style)
- 3 = Satisfactory (3.0) to praiseworthy (3.75) (well made with special qualities)
- 2 = Disappointing (passable but contains at least one significant flaw)
- 1 = Unpleasant (mostly unpalatable)
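The scale above can be turned into readable labels for later grouping. The following is only a minimal sketch: the helper name rating_label is hypothetical, and it assumes that a fractional rating belongs to the band of its whole-number part (so 2.75 counts as "Disappointing"), which the rating system itself does not spell out.

```python
# Labels copied from the rating scale above.
RATING_LABELS = {
    5: "Elite",
    4: "Premium",
    3: "Satisfactory to praiseworthy",
    2: "Disappointing",
    1: "Unpleasant",
}

def rating_label(rating):
    """Map a numeric rating between 1 and 5 to its descriptive band.

    Assumption: the band is decided by the whole-number part of the rating.
    """
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    return RATING_LABELS[int(rating)]  # int() truncates, e.g. 3.75 -> 3

for value in (1.0, 2.75, 3.0, 3.75, 4.0, 5.0):
    print(value, "->", rating_label(value))
```

Once the data is loaded into a DataFrame, the same helper could be applied with df["Rating"].map(rating_label).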
Acknowledgements
- The dataset used here was obtained from Rachael Tatman's Chocolate Bar Ratings dataset on Kaggle.
- The original ratings were compiled by Brady Brelinski, Founding Member of the Manhattan Chocolate Society. For up-to-date information, as well as additional content (including interviews with craft chocolate makers), please see his website: Flavors of Cacao
Loading Data
# Import necessary libraries
import pandas as pd  # data-wrangling library
import matplotlib.pyplot as plt  # data-visualization library
import seaborn as sns  # data-visualization library

# Load the dataset from local storage
df = pd.read_csv("Dataset/flavors_of_cacao.csv")

# Understand the basic ground information of the data
def all_about_my_data(df):
    print("Here is some Basic Ground Info about your Data:\n")

    # Shape of the dataframe
    print("Number of Instances:", df.shape[0])
    print("Number of Features:", df.shape[1])

    # Summary stats (numeric columns only)
    print("\nSummary Stats:")
    print(df.describe())

    # Missing value inspection
    print("\nMissing Values:")
    print(df.isna().sum())

all_about_my_data(df)
Here is some Basic Ground Info about your Data:
Number of Instances: 1795
Number of Features: 9
Summary Stats:
               REF  Review\nDate       Rating
count  1795.000000   1795.000000  1795.000000
mean   1035.904735   2012.325348     3.185933
std     552.886365      2.927210     0.478062
min       5.000000   2006.000000     1.000000
25%     576.000000   2010.000000     2.875000
50%    1069.000000   2013.000000     3.250000
75%    1502.000000   2015.000000     3.500000
max    1952.000000   2017.000000     5.000000
Missing Values:
Company \n(Maker-if known)           0
Specific Bean Origin\nor Bar Name    0
REF                                  0
Review\nDate                         0
Cocoa\nPercent                       0
Company\nLocation                    0
Rating                               0
Bean\nType                           1
Broad Bean\nOrigin                   1
dtype: int64
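Two things stand out in this output. Several column names contain embedded newlines (for example Review\nDate), which makes them awkward to type, and Cocoa\nPercent does not appear in the summary statistics, which suggests pandas read it as text rather than as a number. The sketch below shows one possible cleanup on a tiny stand-in frame. The header strings are copied from the output above, but the row values, the helper name tidy_columns, and the assumption that cocoa percentages are stored as strings like "63%" are illustrative only.

```python
import pandas as pd

# Stand-in frame: headers copied from the output above, row values made up.
sample = pd.DataFrame({
    "Company \n(Maker-if known)": ["Maker A", "Maker B"],
    "Review\nDate": [2016, 2015],
    "Cocoa\nPercent": ["63%", "70%"],  # assumed format: text ending in "%"
    "Rating": [3.75, 2.75],
})

def tidy_columns(frame):
    """Return a copy whose column names have whitespace runs collapsed to one space."""
    cleaned = frame.copy()
    cleaned.columns = [" ".join(name.split()) for name in cleaned.columns]
    return cleaned

tidy = tidy_columns(sample)

# Strip the trailing "%" and convert to float so the column joins describe().
tidy["Cocoa Percent"] = tidy["Cocoa Percent"].str.rstrip("%").astype(float)

print(tidy.columns.tolist())
print(tidy.describe())
```

With the real data, the same two steps would be applied to df in place of sample, after confirming how the cocoa percentages are actually formatted.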