!wget http://10.114.38.22:8888/data.tsv -O data.tsv
--2019-05-23 02:44:20--  http://10.114.38.22:8888/data.tsv
Connecting to 10.114.38.22:8888... connected.
HTTP request sent, awaiting response... 200 OK
Length: 844413808 (805M) [text/tab-separated-values]
Saving to: 'data.tsv'
data.tsv 100%[===================>] 805.29M 371MB/s in 2.2s
2019-05-23 02:44:22 (371 MB/s) - 'data.tsv' saved [844413808/844413808]
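# jieba and pd are used below; import them explicitly in case the star import does not expose them.
import jieba
import pandas as pd
from iwork.pipe import *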
from iwork.nlp.utils import Sent2Vec, VecQuery
from gensim.models.fasttext import load_facebook_model
9%|▊ | 869476/10000000 [00:19<00:32, 276794.62it/s]
sv = Sent2Vec(load_facebook_model('./skipgram.title'))
def get_sent_vec(s):
    return sv.transform(str(s), jieba.cut)
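The internals of `Sent2Vec.transform` are not shown in this notebook. One common way to turn word vectors into a sentence vector is mean pooling: tokenize the sentence, look up each token's vector, and average them. The sketch below illustrates only that idea, under stated assumptions: `mean_pool_sentence`, `toy_vectors`, and the choice to skip unknown tokens are hypothetical and are not taken from `iwork` (a fastText model would normally build vectors for unknown words from subword n-grams rather than skip them).

```python
from typing import Callable, Dict, Iterable, List


def mean_pool_sentence(
    sentence: str,
    tokenize: Callable[[str], Iterable[str]],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the vectors of all known tokens in a sentence.

    Hypothetical helper: tokens missing from word_vectors are skipped,
    and a zero vector is returned when no token is known.
    """
    total = [0.0] * dim
    count = 0
    for token in tokenize(sentence):
        vec = word_vectors.get(token)
        if vec is None:
            continue  # unknown token: ignored in this toy version
        for i, x in enumerate(vec):
            total[i] += x
        count += 1
    if count == 0:
        return total
    return [x / count for x in total]


# Toy 2-dimensional vocabulary, for illustration only.
toy_vectors = {"deep": [1.0, 0.0], "learning": [0.0, 1.0]}
result = mean_pool_sentence("deep learning rocks", str.split, toy_vectors, dim=2)
print(result)
```

Here `str.split` plays the role that `jieba.cut` plays in `get_sent_vec`: any callable that maps a string to an iterable of tokens can be passed in.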
%%time
# Pass the separator by keyword: recent pandas versions no longer accept it positionally.
df_ = pd.read_csv('data.tsv', sep='\t', names=['id', 'title'], usecols=[0, 1])
CPU times: user 16.4 s, sys: 1.2 s, total: 17.6 s
Wall time: 17.6 s