
Tweets

This dataset was created as part of a paper published at the 2020 IEEE International Conference on Big Data (Special Session 2) to identify possible speculators and influencers in a stock market. Although our project used both tweet data and the companies' market data, we chose to split them into two separate datasets when sharing them on Kaggle. This dataset is useful for anyone interested in tweets that mention Amazon, Apple, Google, Microsoft, and Tesla by their stock ticker symbols.

Note: For those interested in how speculators and influencers in a stock market are evaluated, the companion dataset of company market values at the following link may be helpful.
https://www.kaggle.com/omermetinn/values-of-top-nasdaq-copanies-from-2010-to-2020

Content
This dataset contains over 3 million unique tweets, each with its tweet ID, author, post date, text body, and number of comments, likes, and retweets, matched to the related company.
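A minimal pandas sketch of how these fields could be matched to companies and summarized per ticker. The layout (a tweets table plus a separate tweet-to-ticker mapping table) and the column names `tweet_id`, `comment_num`, `retweet_num`, `like_num`, and `ticker_symbol` are assumptions, and the rows are toy values made up for illustration, not taken from the dataset.

```python
import pandas as pd

# Toy rows made up for illustration; column names are assumptions, not a confirmed schema.
tweets = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    "comment_num": [0, 2, 1, 5],
    "retweet_num": [1, 0, 3, 2],
    "like_num": [4, 1, 7, 10],
})

# Assumed mapping table: one row per (tweet, ticker) pair, so a tweet that
# mentions two companies appears twice.
company_tweet = pd.DataFrame({
    "tweet_id": [1, 2, 3, 3, 4],
    "ticker_symbol": ["AAPL", "TSLA", "AAPL", "MSFT", "AMZN"],
})


def engagement_by_ticker(tweets: pd.DataFrame, company_tweet: pd.DataFrame) -> pd.DataFrame:
    """Match tweets to companies, then count tweets and total engagement per ticker."""
    merged = tweets.merge(company_tweet, on="tweet_id", how="inner")
    # Engagement here is simply comments + retweets + likes for each tweet.
    merged["engagement"] = merged[["comment_num", "retweet_num", "like_num"]].sum(axis=1)
    return (
        merged.groupby("ticker_symbol")
        .agg(tweets=("tweet_id", "nunique"), total_engagement=("engagement", "sum"))
        .sort_values("total_engagement", ascending=False)
    )


summary = engagement_by_ticker(tweets, company_tweet)
print(summary)
```

The inner merge keeps only tweets that have a ticker match. With the real data, both tables would be read from the downloaded files with `pd.read_csv` instead of being built inline.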

Acknowledgements
Tweets were collected from Twitter with a Selenium-based parsing script.
Note: For those interested in the script, it is available at the following link.
https://github.com/omer-metin/TweetCollector

Inspiration
Some interesting questions (tasks) that can be explored with this dataset:

  1. Determining the correlation between a company's market value and public opinion about that company.
  2. Sentiment analysis of each company as a time series graph, and reasoning about possible declines and rises.
  3. Evaluating troll users who try to dominate the social agenda.
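Tasks 1 and 2 can be prototyped roughly as follows. This sketch swaps in a crude word-list sentiment score (the `POSITIVE` and `NEGATIVE` sets are hypothetical placeholders for a real sentiment model), averages it per day, and computes the Pearson correlation between daily sentiment and daily price returns. The `post_date` (assumed Unix seconds) and `body` column names, the tweets, and the closing prices are all assumptions or toy values made up for illustration.

```python
import pandas as pd

# Hypothetical word lists; a real analysis would use a trained sentiment model.
POSITIVE = {"up", "strong", "buy", "beat", "bullish"}
NEGATIVE = {"down", "weak", "sell", "miss", "bearish"}


def lexicon_score(text: str) -> int:
    """Number of positive words minus number of negative words."""
    words = [w.strip(".,!?$").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def daily_sentiment(tweets: pd.DataFrame) -> pd.Series:
    """Mean sentiment per calendar day; post_date is assumed to be Unix seconds (UTC)."""
    scored = tweets.assign(
        day=pd.to_datetime(tweets["post_date"], unit="s").dt.normalize(),
        score=tweets["body"].map(lexicon_score),
    )
    return scored.groupby("day")["score"].mean()


def sentiment_return_correlation(sentiment: pd.Series, close: pd.Series) -> float:
    """Pearson correlation between daily sentiment and same-day percentage price change."""
    returns = close.pct_change()
    aligned = pd.concat({"sentiment": sentiment, "return": returns}, axis=1).dropna()
    return aligned["sentiment"].corr(aligned["return"])


# Toy tweets about one ticker over four days (2015-01-01 to 2015-01-04, UTC).
toy_tweets = pd.DataFrame({
    "post_date": [1420070400, 1420074000, 1420156800, 1420243200, 1420329600],
    "body": [
        "$AAPL looks strong",
        "Time to buy $AAPL",
        "$AAPL down again, weak quarter",
        "$AAPL up today",
        "$AAPL bullish, buy more",
    ],
})
# Toy closing prices for the same four days.
toy_close = pd.Series(
    [100.0, 95.0, 99.75, 104.7375],
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
)

sentiment = daily_sentiment(toy_tweets)
corr = sentiment_return_correlation(sentiment, toy_close)
print(sentiment)
print(f"correlation: {corr:.2f}")
```

A correlation over a handful of days says nothing about causation. On the real data it would be computed separately per ticker, and a lagged variant (sentiment on day t against the return on day t+1) is a natural next step.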

How to run the code

This is an executable Jupyter notebook hosted on Jovian.ml, a platform for sharing data science projects. You can run and experiment with the code in a couple of ways: using free online resources (recommended) or on your own computer.

Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks. You can also select "Run on Colab" or "Run on Kaggle".

Option 2: Running on your computer locally
  1. Install Conda by following these instructions. Add Conda binaries to your system PATH, so you can use the conda command on your terminal.

  2. Create a Conda environment and install the required libraries by running these commands on the terminal:

conda create -n zerotopandas -y python=3.8 
conda activate zerotopandas
pip install jovian jupyter numpy pandas matplotlib seaborn opendatasets --upgrade
  3. Press the "Clone" button above to copy the command for downloading the notebook, and run it in the terminal. This will create a new directory and download the notebook into it. The command will look something like this:
jovian clone notebook-owner/notebook-id
  4. Enter the newly created directory using cd directory-name and start the Jupyter notebook:
jupyter notebook

You can now access Jupyter's web interface by clicking the link that shows up on the terminal or by visiting http://localhost:8888 on your browser. Click on the notebook file (it has a .ipynb extension) to open it.
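Once the notebook is open, a quick check like this sketch can confirm that the libraries from the pip install command above are importable from the active kernel. It uses only the standard library's `importlib.util.find_spec`; the package list mirrors that command minus `jupyter` itself, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Top-level import names of the libraries installed by the pip command above.
REQUIRED = ["jovian", "numpy", "pandas", "matplotlib", "seaborn", "opendatasets"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported missing, the kernel is probably not running inside the zerotopandas environment.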

Downloading the Dataset

The dataset is shared on Kaggle, and we download it into the notebook using the opendatasets library, which the next cell installs.


!pip install jovian opendatasets --upgrade --quiet
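A minimal sketch of the download step, assuming the `opendatasets` function `download(url, data_dir=...)`, which prompts for a Kaggle username and API key, and assuming it saves the files into a folder named after the dataset's URL slug. The tweets dataset's own Kaggle URL is not given above, so `DATASET_URL` is a placeholder to replace; the helpers `expected_data_dir` and `download_dataset` are hypothetical names.

```python
from pathlib import Path

# Placeholder: replace with the Kaggle page URL of the tweets dataset.
DATASET_URL = "https://www.kaggle.com/<owner>/<dataset-slug>"


def expected_data_dir(url: str, data_dir: str = ".") -> Path:
    """Folder the download is assumed to land in: data_dir/<last URL segment>."""
    slug = url.rstrip("/").split("/")[-1]
    return Path(data_dir) / slug


def download_dataset(url: str, data_dir: str = ".") -> Path:
    """Download a Kaggle dataset with opendatasets and return its assumed folder."""
    import opendatasets as od  # imported lazily so this cell loads even before installation

    od.download(url, data_dir=data_dir)
    return expected_data_dir(url, data_dir)
```

For example, `download_dataset("https://www.kaggle.com/omermetinn/values-of-top-nasdaq-copanies-from-2010-to-2020")` would fetch the companion market-value dataset linked earlier.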