The aim is to predict whether or not a tweet is about a natural disaster, using a dataset provided by a Kaggle task.
The training data consists of over 7000 labeled tweets. Columns:
- id: a unique identifier for each tweet
- text: the text of the tweet
- location: the location the tweet was sent from (may be blank)
- keyword: a particular keyword from the tweet (may be blank)
- target: in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)
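As a rough illustration of the columns above, here is a minimal sketch that parses rows in this layout and maps blank `keyword` and `location` values to `None`. The sample rows and the `load_rows` helper are made up for this example and are not part of the project's code; it uses only the standard library, although the project itself lists pandas as a dependency.

```python
import csv
import io

# Tiny in-memory stand-in for train.csv; these rows are invented for illustration.
SAMPLE_CSV = """id,keyword,location,text,target
1,,,Forest fire near the village,1
2,ablaze,London,This mixtape is ablaze,0
"""


def load_rows(handle):
    """Read rows into dicts, turning blank keyword/location into None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "id": int(row["id"]),
            "keyword": row["keyword"] or None,    # may be blank
            "location": row["location"] or None,  # may be blank
            "text": row["text"],
            "target": int(row["target"]),         # 1 = real disaster, 0 = not
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
```

To try it on the real file, pass an open file handle instead, for example `open("train.csv", newline="", encoding="utf-8")`.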
pip install -r requirements.txt
- numpy
- pandas
- tensorflow
- scikit-learn
Just run the main.py file! You'll be asked to copy-paste the tweet you want to classify.
The model is a 4-layer RNN with LSTM units and dropout, followed by two fully connected layers at the end.
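One possible reading of this architecture in Keras is sketched below. It assumes that "4-layer" means four stacked LSTM layers, each followed by dropout, and that token ids pass through an embedding layer first. The embedding layer, the vocabulary size, the layer widths, the dropout rate and the optimizer are all hypothetical choices for this example, not values taken from main.py.

```python
import tensorflow as tf

SEQ_LEN = 30         # sequence length used in this project
VOCAB_SIZE = 10000   # hypothetical vocabulary size
EMBED_DIM = 64       # hypothetical embedding width
UNITS = 64           # hypothetical LSTM width
DROPOUT = 0.2        # hypothetical dropout rate


def build_model():
    """Four stacked LSTM layers with dropout, then two dense layers."""
    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(SEQ_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),    # assumed, not stated in the README
        layers.LSTM(UNITS, return_sequences=True),  # LSTM layer 1
        layers.Dropout(DROPOUT),
        layers.LSTM(UNITS, return_sequences=True),  # LSTM layer 2
        layers.Dropout(DROPOUT),
        layers.LSTM(UNITS, return_sequences=True),  # LSTM layer 3
        layers.Dropout(DROPOUT),
        layers.LSTM(UNITS),                         # LSTM layer 4, keeps last state only
        layers.Dropout(DROPOUT),
        layers.Dense(32, activation="relu"),        # fully connected layer 1
        layers.Dense(1, activation="sigmoid"),      # fully connected layer 2: P(disaster)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_model()
```

The single sigmoid output matches the binary target (1 for a real disaster, 0 otherwise).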
So far, the sequence length has been set to 30: longer tweets are cut and shorter ones are padded.
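The cut-and-pad rule can be sketched in a few lines of plain Python. This example assumes tweets have already been converted to lists of integer token ids, that the padding id is 0, and that both cutting and padding happen at the end of the sequence. The README does not say which side is used, and the `pad_or_truncate` helper is hypothetical.

```python
SEQ_LEN = 30  # sequence length stated above
PAD_ID = 0    # assumed padding token id


def pad_or_truncate(token_ids, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Cut sequences longer than seq_len and right-pad shorter ones."""
    clipped = list(token_ids[:seq_len])
    return clipped + [pad_id] * (seq_len - len(clipped))


short = pad_or_truncate([5, 8, 13])               # 3 tokens, padded up to 30
long = pad_or_truncate(list(range(1, 41)))        # 40 tokens, cut down to 30
```

Either way, every tweet ends up as a fixed-length list of 30 ids, which is what lets tweets be batched together for the RNN.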