
Exploratory Data Analysis with Chocolate

Analysing the dataset of Chocolate Bar Ratings to unearth key insights hidden within the data

Context

Chocolate is one of the most popular candies in the world. Each year, residents of the United States collectively eat more than 2.8 billion pounds of it. However, not all chocolate bars are created equal! This dataset contains expert ratings of over 1,700 individual chocolate bars, along with information on their regional origin, percentage of cocoa, the variety of chocolate bean used, and where the beans were grown.

Rating System

  • 5 = Elite (transcending beyond the ordinary limits)
  • 4 = Premium (superior flavor development, character and style)
  • 3 = Satisfactory (3.0) to praiseworthy (3.75) (well made with special qualities)
  • 2 = Disappointing (passable but contains at least one significant flaw)
  • 1 = Unpleasant (mostly unpalatable)
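
As a rough illustration of how this scale could be applied to numeric ratings, the sketch below maps each rating to its band. It assumes that a fractional rating belongs to the band of its whole-number part (so 3.75 falls under 3). That reading is consistent with the "3.0 to 3.75" range in the list, but the source does not spell it out. The names `RATING_LABELS` and `rating_band` are hypothetical helpers, not part of the dataset.

```python
import math

import pandas as pd

# Labels taken from the rating system listed above.
RATING_LABELS = {
    1: "Unpleasant",
    2: "Disappointing",
    3: "Satisfactory to praiseworthy",
    4: "Premium",
    5: "Elite",
}


def rating_band(rating):
    """Return the band label for a rating between 1 and 5.

    Assumption: a fractional rating belongs to the band of its integer part.
    """
    return RATING_LABELS[math.floor(rating)]


# Made-up example ratings, only to show the mapping.
ratings = pd.Series([1.0, 2.75, 3.0, 3.75, 4.0, 5.0])
bands = ratings.map(rating_band)
print(pd.DataFrame({"rating": ratings, "band": bands}))
```

Once the data is loaded, the same mapping could be applied to the Rating column to count how many bars fall into each band.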

Acknowledgements

  • The dataset used here was obtained from Rachael Tatman's Chocolate Bar Ratings dataset on Kaggle.
  • The original ratings were compiled by Brady Brelinski, Founding Member of the Manhattan Chocolate Society. For up-to-date information, as well as additional content (including interviews with craft chocolate makers), please see his website: Flavors of Cacao

Loading Data

# Import necessary libraries
import pandas as pd #data-wrangling library
import matplotlib.pyplot as plt #data-visualization library
import seaborn as sns #data-visualization library
# load the dataset from local storage
df=pd.read_csv("Dataset/flavors_of_cacao.csv")

# Understanding the basic ground information of my data
def all_about_my_data(df):
    print("Here is some Basic Ground Info about your Data:\n")
    
    # Shape of the dataframe
    print("Number of Instances:",df.shape[0])
    print("Number of Features:",df.shape[1])
    
    # Summary Stats
    print("\nSummary Stats:")
    print(df.describe())
    
    # Missing Value Inspection
    print("\nMissing Values:")
    print(df.isna().sum())

all_about_my_data(df)
Here is some Basic Ground Info about your Data:

Number of Instances: 1795
Number of Features: 9

Summary Stats:
               REF  Review\nDate       Rating
count  1795.000000  1795.000000  1795.000000
mean   1035.904735  2012.325348     3.185933
std     552.886365     2.927210     0.478062
min       5.000000  2006.000000     1.000000
25%     576.000000  2010.000000     2.875000
50%    1069.000000  2013.000000     3.250000
75%    1502.000000  2015.000000     3.500000
max    1952.000000  2017.000000     5.000000

Missing Values:
Company \n(Maker-if known)           0
Specific Bean Origin\nor Bar Name    0
REF                                  0
Review\nDate                         0
Cocoa\nPercent                       0
Company\nLocation                    0
Rating                               0
Bean\nType                           1
Broad Bean\nOrigin                   1
dtype: int64
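
The output lists column names with embedded line breaks (for example Review\nDate), and describe() reports only three numeric columns, which suggests that Cocoa\nPercent is stored as text. Below is a minimal sketch of one way to tidy this up. It runs on a small made-up frame rather than the real file, and it assumes the percentages are strings such as "63%". The helper name `tidy_columns` is hypothetical.

```python
import pandas as pd

# Toy frame that mimics a few of the raw headers (the values are made up).
raw = pd.DataFrame({
    "Company \n(Maker-if known)": ["Maker A", "Maker B"],
    "Review\nDate": [2016, 2015],
    "Cocoa\nPercent": ["63%", "70.5%"],
    "Rating": [3.75, 2.75],
})


def tidy_columns(frame):
    """Return a copy whose column names have newlines and repeated spaces collapsed."""
    out = frame.copy()
    out.columns = [" ".join(str(col).split()) for col in out.columns]
    return out


clean = tidy_columns(raw)

# Assumption: cocoa percentages are strings ending in "%".
clean["Cocoa Percent"] = clean["Cocoa Percent"].str.rstrip("%").astype(float)

print(clean.dtypes)
```

With the headers normalised, columns can be referenced without typing the newline characters, and the cocoa percentage would then appear in the describe() summary alongside the other numeric columns.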