Updated 3 years ago
from os import path
import sys
!{sys.executable} -m pip install opendatasets --upgrade --quiet
import opendatasets as od
import pandas as pd
import numpy as np
!{sys.executable} -m pip install matplotlib --upgrade --quiet
import matplotlib.pyplot as plt
!{sys.executable} -m pip install seaborn --upgrade --quiet
import seaborn as sns
from datetime import datetime
import locale
locale.setlocale(locale.LC_ALL, '')
!{sys.executable} -m pip install jovian --upgrade --quiet
import jovian
Dataset
I've chosen to analyse the IMDb movie dataset (https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset).
The dataset consists of 4 files:
- "movies" - contains information about movies: their genre, average rating, year,
- "ratings" - contains detailed rating information for each movie, including voters' age and gender,
- "names" - contains information about people (not only those from movies),
- "title_principals" - contains data connecting people with movies, describing their role in the movie and, additionally, their character's name in the movie (if available).
The files come with a .csv extension. CSV stands for Comma Separated Values.
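To make the format concrete, here is a minimal sketch that parses a tiny CSV snippet with Python's standard csv module. The column names and values below are made up for illustration; they are not rows from the IMDb files.

```python
import csv
import io

# Made-up sample data, NOT real rows from the dataset.
sample = "title,year,genre\nExample Movie,1999,Drama\nAnother Film,2005,Comedy\n"

# DictReader treats the first line as column names
# and yields one dict per following line.
rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    print(row["title"], row["year"], row["genre"])
```

Note that the csv module returns every value as a string, so the year here is "1999", not 1999.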
Downloading
This dataset is fairly large (over 200 MB), so the download might take a while.
Luckily we have to download it only once :P
if not path.isdir('imdb-extensive-dataset'):
    od.download('https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset')
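The "download only once" check can be wrapped in a small helper. This is just a sketch: download_once and fake_download are hypothetical names, and fake_download stands in for the real od.download call so the example runs without network access.

```python
import os
import tempfile


def download_once(target_dir, download):
    """Call download() only if target_dir is missing; return True if it ran."""
    if os.path.isdir(target_dir):
        return False
    download()
    return True


# Try it in a temporary directory with a stand-in downloader.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "imdb-extensive-dataset")
    calls = []

    def fake_download():
        # Hypothetical stand-in for od.download(...): just creates the folder.
        calls.append(1)
        os.makedirs(target)

    first = download_once(target, fake_download)   # folder missing, downloads
    second = download_once(target, fake_download)  # folder present, skips
    print(first, second, len(calls))
```

The check only looks at whether the folder exists, so a partially downloaded folder would also be treated as complete.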
Check the contents
Let's load the downloaded files as pandas DataFrames. To do so, we'll use the read_csv function.
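As a quick illustration of how read_csv behaves, here is a minimal sketch on a small in-memory sample. The columns and values are invented for the example and do not come from the dataset; read_csv accepts a file path as well as a file-like object such as StringIO.

```python
import io

import pandas as pd

# Invented sample data, NOT taken from the IMDb files.
sample_csv = io.StringIO(
    "title,year,avg_vote\n"
    "Example Movie,1999,7.5\n"
    "Another Film,2005,6.1\n"
)

# read_csv uses the first line as the header and infers column types.
df = pd.read_csv(sample_csv)

print(df.shape)
print(df.dtypes)
```

Unlike the plain csv module, pandas infers numeric types here, so the year column holds integers and avg_vote holds floats.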