
Content Based Recommender System

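A content-based filtering system recommends products using the products' own attributes, such as their metadata, categories, and descriptions. It builds a picture of what a user likes from the features of the products that user has already interacted with, then matches those features against the attributes of other products, including newly listed ones.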
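A minimal sketch of this idea, assuming a toy catalogue of four hypothetical products: each product is represented by TF-IDF weights over its description, the user's profile is the average vector of the products they viewed, and unseen products are ranked by cosine similarity to that profile. The catalogue, the helper `recommend_for_user`, and the averaging step are illustrative assumptions, not this article's implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue: product names mapped to short descriptions.
catalogue = {
    "trail running shoes": "lightweight running shoes with grippy trail sole",
    "road running shoes": "cushioned running shoes for road and track",
    "espresso machine": "stainless steel espresso machine with milk frother",
    "coffee grinder": "burr grinder for espresso and filter coffee",
}
names = list(catalogue)

# Represent every product by the TF-IDF weights of the words in its description.
vectorizer = TfidfVectorizer(stop_words="english")
item_vectors = vectorizer.fit_transform(list(catalogue.values()))

def recommend_for_user(viewed, top_n=2):
    """Rank unseen products against a profile built from the products a user viewed."""
    viewed_idx = [names.index(name) for name in viewed]
    # User profile (an assumption for this sketch): the mean feature vector of the viewed products.
    profile = np.asarray(item_vectors[viewed_idx].mean(axis=0))
    scores = cosine_similarity(profile, item_vectors).ravel()
    # Highest similarity first; skip products already seen and products with no overlap at all.
    ranked = [names[i] for i in np.argsort(-scores)
              if i not in viewed_idx and scores[i] > 0]
    return ranked[:top_n]

print(recommend_for_user(["trail running shoes"]))  # ['road running shoes']
```

Averaging is only one way to build a profile; weighting recent or purchased items more heavily is a common variation.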

Content-based recommendation systems are used in a variety of domains, including recommending web pages, news articles, restaurants, television programs, and hotels. A key advantage of content-based filtering is that it avoids the item cold-start problem: because recommendations depend on product attributes rather than on interaction history, even a newly launched website with no user history can start recommending, and any newly listed product can be recommended right away.
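To illustrate the item cold-start point, the sketch below ranks products for a shopper viewing product 0, once by popularity and once by content similarity. The descriptions and `view_counts` are hypothetical; the newest product has no views, so a popularity ranking puts it last, while its description alone is enough for the content-based ranking.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical catalogue; the last product was listed today and has no views yet.
descriptions = [
    "wireless bluetooth headphones with noise cancelling",
    "wired studio headphones",
    "ceramic coffee mug",
    "noise cancelling wireless earbuds",  # new product, zero interactions
]
view_counts = [120, 45, 300, 0]  # hypothetical interaction history

# TfidfVectorizer L2-normalises each row by default, so linear_kernel gives cosine similarity.
tfidf = TfidfVectorizer().fit_transform(descriptions)
content_scores = linear_kernel(tfidf[0], tfidf).ravel()  # similarity to product 0

# Rank the other products for a shopper currently viewing product 0.
candidates = [1, 2, 3]
by_popularity = sorted(candidates, key=lambda i: view_counts[i], reverse=True)
by_content = sorted(candidates, key=lambda i: content_scores[i], reverse=True)
print(by_popularity)  # [2, 1, 3]: the new product comes last because it has no views
print(by_content)     # [3, 1, 2]: the new product's description already matches product 0
```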

Suppose we are launching a new e-commerce platform. Merchants have signed up and listed the products they want to sell, and traffic from website users is starting to arrive, but we have no user history yet. We will therefore build a content-based recommendation system that analyzes product descriptions to identify the products most likely to interest each user.

We want to recommend products based on those a user has already booked, viewed, or otherwise shown interest in, using cosine similarity: the products most similar to those earlier ones are the ones we recommend. The quality of the recommender therefore depends heavily on choosing an appropriate similarity measure. Finally, we select a subset of products to display to the user, or decide the order in which to display them.
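One way to compute these similarities, sketched under assumptions: a hypothetical `products` table with `item_name` and `description` columns (the dataset's columns beyond `item_name` are not shown here), descriptions vectorized with `TfidfVectorizer`, and pairwise scores from `linear_kernel`, both of which this notebook imports. Because `TfidfVectorizer` L2-normalizes each row by default, the linear kernel of two rows equals their cosine similarity. The helper `similar_products` is hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical product table standing in for the real catalogue.
products = pd.DataFrame({
    "item_name": ["Blue denim jacket", "Black denim jeans", "Leather jacket",
                  "Cotton t-shirt", "Denim shirt"],
    "description": [
        "classic blue denim jacket with buttons",
        "slim fit black denim jeans",
        "black leather jacket with zip",
        "plain white cotton t-shirt",
        "light blue denim shirt with buttons",
    ],
})

# TF-IDF rows are L2-normalised, so this matrix holds pairwise cosine similarities.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(products["description"])
similarity = linear_kernel(tfidf, tfidf)

def similar_products(item_name, k=2):
    """Return the k products most similar to item_name, most similar first."""
    idx = products.index[products["item_name"] == item_name][0]
    scores = similarity[idx].copy()
    scores[idx] = -1  # never recommend the product the user is already looking at
    top = scores.argsort()[::-1][:k]
    return products["item_name"].iloc[top].tolist()

print(similar_products("Blue denim jacket"))
```

Using `linear_kernel` rather than `cosine_similarity` skips a redundant normalization step. For a catalogue of roughly 118,000 products, however, a full dense similarity matrix would need on the order of 100 GB of memory, so computing one row at a time (for example `linear_kernel(tfidf[idx], tfidf)`) is the safer choice at that scale.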

# Data handling
import re
import random
import numpy as np
import pandas as pd

# Text processing and topic modelling
from nltk.corpus import stopwords  # run nltk.download('stopwords') once before first use
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import linear_kernel

# Plotting with plotly and cufflinks in offline (notebook) mode
import plotly.graph_objs as go
import plotly.figure_factory as ff
import chart_studio.plotly as py
import cufflinks
from plotly.offline import iplot
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='solar')

# Notebook display settings: show up to 30 columns and the result of every expression in a cell
from IPython.core.interactiveshell import InteractiveShell
pd.options.display.max_columns = 30
InteractiveShell.ast_node_interactivity = 'all'

# pyLDAvis interprets the topics in a topic model that has been fit to a corpus of text data.
# Its scikit-learn helpers moved from pyLDAvis.sklearn to pyLDAvis.lda_model in version 3.4.
import pyLDAvis
try:
    import pyLDAvis.lda_model
except ImportError:
    import pyLDAvis.sklearn
pyLDAvis.enable_notebook()

# Load the product catalogue (reading .xlsx files requires the openpyxl package)
df = pd.read_excel('Test_Pandas.xlsx')
df.head()
print('We have', df['item_name'].nunique(), 'unique products')
We have 118345 unique products