
This notebook illustrates SingleRank, a graph-based keyphrase extraction method, on an OpenShift 4 dataset.

Outline

  • Download the dataset
  • Preprocessing
  • Initialize SingleRank
  • Extract keyphrases
  • Dump the results
In [4]:
import pke
import pandas as pd
In [47]:
# skips useless warnings in the pke methods
import logging

logging.basicConfig(level=logging.CRITICAL)
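The `WARNING:root:` lines that still show up in the outputs further down suggest that this call did not take effect in the recorded session. One possible cause (an assumption, not verified against this notebook's kernel state) is that `logging.basicConfig` does nothing when the root logger already has a handler attached. A minimal sketch that sets the level on the root logger directly instead:

```python
import logging

# Assumption: the messages go through the root logger,
# since they are prefixed with "WARNING:root:".
root_logger = logging.getLogger()

# Setting the level directly works even if a handler is already attached,
# whereas basicConfig() is a no-op in that case (unless force=True, Python 3.8+).
root_logger.setLevel(logging.CRITICAL)
```

With the level raised this way, records below CRITICAL are dropped by the root logger before they reach any handler.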
In [57]:
def keyphrases(text):
    """Return up to 10 keyphrases extracted from `text` with SingleRank."""

    # define the set of valid part-of-speech tags
    pos = {'NOUN', 'PROPN', 'ADJ'}

    # create a SingleRank extractor
    singleRank_extractor = pke.unsupervised.SingleRank()

    # load the content of the document
    singleRank_extractor.load_document(input=text, language='en', normalization=None)

    # candidate selection: keep the longest sequences of nouns and adjectives
    singleRank_extractor.candidate_selection(pos)

    # candidate weighting: build a graph whose nodes are the words with a valid
    # part-of-speech tag (nouns and adjectives), connected when they co-occur
    # within a window of 10 words. Words are scored with a random walk, and each
    # candidate phrase is weighted by the sum of its words' scores.
    singleRank_extractor.candidate_weighting(window=10, pos=pos)

    # rank the candidates and keep the 10 highest-scoring ones
    keyphrases_with_scores = singleRank_extractor.get_n_best(n=10)
    phrases = [keyphrase for keyphrase, score in keyphrases_with_scores]

    return phrases
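The weighting step described in the comments above can be sketched in plain Python. This is an illustrative approximation, not pke's implementation: the function name `singlerank_sketch`, its parameters, the damping factor of 0.85, and the fixed iteration count are all assumptions, and the input is expected to be already tokenized and POS-tagged.

```python
from collections import defaultdict

def singlerank_sketch(tagged_words, candidates, window=10, damping=0.85, iterations=50):
    """Hypothetical helper: score candidate phrases in the spirit of SingleRank.

    tagged_words: list of (word, pos) pairs in document order.
    candidates:   list of tuples of words, one tuple per candidate phrase.
    """
    keep = {'NOUN', 'PROPN', 'ADJ'}

    # Edge weights count how often two kept words co-occur within the window.
    weights = defaultdict(lambda: defaultdict(float))
    for i, (w1, p1) in enumerate(tagged_words):
        if p1 not in keep:
            continue
        for j in range(i + 1, min(i + window, len(tagged_words))):
            w2, p2 = tagged_words[j]
            if p2 in keep and w1 != w2:
                weights[w1][w2] += 1.0
                weights[w2][w1] += 1.0

    nodes = list(weights)
    if not nodes:
        return []

    # Weighted random walk (PageRank-style power iteration) over the word graph.
    score = {w: 1.0 / len(nodes) for w in nodes}
    out_sum = {w: sum(weights[w].values()) for w in nodes}
    for _ in range(iterations):
        score = {
            w: (1 - damping) / len(nodes)
            + damping * sum(score[v] * weights[v][w] / out_sum[v] for v in weights[w])
            for w in nodes
        }

    # A phrase's score is the sum of the scores of its words.
    ranked = [(" ".join(c), sum(score.get(w, 0.0) for w in c)) for c in candidates]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy input with hand-assigned tags (not taken from the dataset).
tagged = [("ssh", "NOUN"), ("bastion", "NOUN"), ("pod", "NOUN"),
          ("on", "ADP"), ("cluster", "NOUN"), ("nodes", "NOUN")]
ranked = singlerank_sketch(tagged, [("ssh", "bastion", "pod"), ("cluster", "nodes")])
```

Because a phrase's score is a sum over its words, longer candidates tend to rank higher when the word scores are similar, which is typical for short texts such as titles.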
In [68]:
df = pd.read_csv('openshift4_demo.csv')
In [69]:
df.head()
Out[69]:
In [70]:
text_content = df['allTitle'][3]

SingleRank initialization

In [71]:
# define the set of valid POS
pos = {'NOUN', 'PROPN', 'ADJ'}
In [72]:
# create a SingleRank extractor
singleRank_extractor = pke.unsupervised.SingleRank()
In [73]:
# load the content of the document
singleRank_extractor.load_document(input=text_content, language='en', normalization=None)

Keyphrase Extraction

In [74]:
# candidate selection
singleRank_extractor.candidate_selection(pos)
In [75]:
# candidate weighting using the default weighting scheme
singleRank_extractor.candidate_weighting(window=10, pos=pos)
In [76]:
keyphrases_with_scores = singleRank_extractor.get_n_best(n=10)
keyphrases_with_scores
WARNING:root:Not enough candidates to choose from (10 requested, 3 given)
Out[76]:
[('ssh bastion pod', 0.3750001099999999),
 ('openshift container platform', 0.37500003999999987),
 ('cluster nodes', 0.25000007999999996)]
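The scores in this output are consistent with each phrase being weighted by the sum of its words' scores. A quick check, using only the numbers printed above (the variable names are illustrative):

```python
# Scores copied from the output above.
keyphrases_with_scores = [
    ('ssh bastion pod', 0.3750001099999999),
    ('openshift container platform', 0.37500003999999987),
    ('cluster nodes', 0.25000007999999996),
]

# Divide each phrase score by its number of words.
per_word = {phrase: score / len(phrase.split())
            for phrase, score in keyphrases_with_scores}

# Sum of all phrase scores.
total = sum(score for _, score in keyphrases_with_scores)
```

Each per-word value comes out at roughly 0.125 and the total at roughly 1, so for this title every word appears to carry about the same weight and the ranking mostly reflects phrase length.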
In [77]:
phrases = [keyphrase for keyphrase, score in keyphrases_with_scores]
In [78]:
phrases
Out[78]:
['ssh bastion pod', 'openshift container platform', 'cluster nodes']

Keyphrase extraction (KPE) from titles

In [79]:
df['allTitle_kpe'] = df['allTitle'].apply(keyphrases)
WARNING:root:Not enough candidates to choose from (10 requested, 2 given)
WARNING:root:Not enough candidates to choose from (10 requested, 1 given)
WARNING:root:Not enough candidates to choose from (10 requested, 3 given)
...
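When a function is applied to a whole column, rows with missing or blank text can be a problem: pandas represents empty CSV cells as float NaN rather than as strings. Whether this dataset contains such rows is not shown here, so the following is only a defensive sketch; `skip_empty` is a hypothetical helper name.

```python
def skip_empty(extract):
    """Wrap `extract` so that missing or blank inputs yield an empty list."""
    def wrapper(value):
        # Missing CSV cells arrive as float NaN, not as strings.
        if not isinstance(value, str) or not value.strip():
            return []
        return extract(value)
    return wrapper

# Hypothetical usage with the notebook's names:
# df['allTitle_kpe'] = df['allTitle'].apply(skip_empty(keyphrases))
```

The wrapper leaves the extraction function itself unchanged, so it can be dropped if the column is known to be clean.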
In [82]:
df.head(20)
Out[82]:
In [83]:
results = df[['allTitle', 'allTitle_kpe']]
In [84]:
results.to_csv('results.csv')
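Note that a column holding Python lists is written to CSV as the text of each list, so reading the file back gives strings rather than lists. A small sketch of the round trip for a single cell, using the standard library's `ast.literal_eval`:

```python
import ast

# A list cell as it ends up in the CSV: the text form of the Python list.
cell = str(['ssh bastion pod', 'openshift container platform', 'cluster nodes'])

# ast.literal_eval safely parses that text back into a list.
phrases = ast.literal_eval(cell)
```

When loading the file with pandas, the same function can be passed through the `converters` argument of `pd.read_csv`, for example `converters={'allTitle_kpe': ast.literal_eval}`.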