Learning word embedding models (Word2Vec and Doc2Vec) from the electricity consumption of various home appliances.
- Python 3.8.10+
- Pip / Anaconda
- Jupyter Notebook
- Python libraries: numpy, scipy, torch, pandas, seaborn, matplotlib, scikit_learn, gensim and matplotlib_venn
Dependencies and their version details are listed in the requirements.txt file. They can be easily installed with the setup.py script:
$ git clone https://github.com/mpinta/tracevec
$ cd tracevec
$ python setup.py install
The project consists of five interconnected parts:
- Training word embedding models (using the Gensim topic modelling library)
- Clustering (of Doc2Vec vectors into clusters)
- Classification (of the electrical device type using Doc2Vec vectors)
- Prediction (of the next electricity consumption category using Word2Vec vectors)
- RNN forecasting (of the next electricity consumption category using an RNN with GRU cells)
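Clustering and classification both operate on per-trace document vectors. The sketch below illustrates them with scikit-learn and is not taken from the notebook: synthetic 16-dimensional points from make_blobs stand in for Doc2Vec vectors (in practice they would be read from a trained model's dv), KMeans groups them without labels, and LogisticRegression is a swapped-in classifier for predicting the device type. The notebook may use other algorithms and settings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split

# Stand-in for Doc2Vec vectors: 90 traces from 3 synthetic "device types".
vectors, device_type = make_blobs(n_samples=90, centers=3, n_features=16,
                                  cluster_std=1.0, random_state=0)

# Clustering: group the vectors without looking at the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(vectors)
# Agreement between clusters and true device types (1.0 = identical grouping).
agreement = adjusted_rand_score(device_type, cluster_ids)

# Classification: learn to predict the device type from a vector.
X_train, X_test, y_train, y_test = train_test_split(
    vectors, device_type, test_size=0.3, random_state=0, stratify=device_type)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
accuracy = accuracy_score(y_test, classifier.predict(X_test))
```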
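The forecasting part uses an RNN with GRU cells. A minimal PyTorch sketch of that idea follows, assuming consumption categories are encoded as integer ids and the model sees a fixed window of past categories (NUM_CATEGORIES and WINDOW are hypothetical) and outputs logits for the next one. The layer sizes, the toy periodic sequence and the training loop are illustrative, not the notebook's configuration.

```python
import torch
from torch import nn

NUM_CATEGORIES = 4  # hypothetical number of consumption categories
WINDOW = 3          # hypothetical number of past categories fed to the model


class GRUForecaster(nn.Module):
    """Embed category ids, run a GRU, predict the next category."""

    def __init__(self, num_categories, embed_dim=8, hidden_dim=16):
        super().__init__()
        self.embed = nn.Embedding(num_categories, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_categories)

    def forward(self, x):  # x: (batch, window) tensor of category ids
        output, _ = self.gru(self.embed(x))  # (batch, window, hidden_dim)
        return self.head(output[:, -1, :])   # logits from the last time step


def make_windows(sequence, window):
    """Split a category sequence into (history window, next category) pairs."""
    inputs = [sequence[i:i + window] for i in range(len(sequence) - window)]
    targets = [sequence[i + window] for i in range(len(sequence) - window)]
    return torch.tensor(inputs), torch.tensor(targets)


torch.manual_seed(0)
sequence = [0, 1, 2, 3] * 25  # toy periodic category sequence
x, y = make_windows(sequence, WINDOW)

model = GRUForecaster(NUM_CATEGORIES)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

losses = []
for epoch in range(200):  # full-batch training on the toy data
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    next_category = model(torch.tensor([[1, 2, 3]])).argmax(dim=1).item()
```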
First, prepare your Pip or Anaconda environment and make sure all of the above dependencies are installed. Then open the tracevec.ipynb notebook, which contains and describes all the results of our training and model analysis. You can also run and modify the code yourself, as it is fully documented with descriptive comments. Our trained Word2Vec and Doc2Vec models are stored in the models directory (skip the model training part if you don't want to create new ones).
All data sets required to run the code are included in the repository. If you run the code without the included data sets, you only need to clone the tracebase repository, which is the project's main data set, into the datasets directory. All other derived data sets (consumptions, samples, forecast-train and forecast-test) are created step by step by the notebook itself. The tracebase data set is not our property and is used only as a dependency (submodule); we appreciate the work done by its authors. Make sure to initialize the submodule with:
$ git submodule init
$ git submodule update
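Once the submodule is in place, the raw traces can be read with the Python standard library. This is a hedged sketch: it assumes each trace line has the form timestamp;reading;reading (semicolon-separated, timestamp as DD/MM/YYYY HH:MM:SS, readings in watts) and keeps only the first reading. Verify this layout against the files in the datasets directory before relying on it; parse_trace and TIMESTAMP_FORMAT are hypothetical names.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"  # assumed tracebase timestamp format


def parse_trace(lines):
    """Return (timestamp, watts) pairs from semicolon-separated trace lines.

    Assumes rows look like 'timestamp;reading;...' and keeps the first reading.
    """
    readings = []
    for row in csv.reader(lines, delimiter=";"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        timestamp = datetime.strptime(row[0].strip(), TIMESTAMP_FORMAT)
        readings.append((timestamp, float(row[1])))
    return readings


# In-memory sample in the assumed format (made-up values).
sample = io.StringIO("17/01/2012 00:00:05;0;0\n17/01/2012 00:00:06;1830;1805\n")
trace = parse_trace(sample)
```

For a file on disk, pass the handle returned by open(path, newline="") instead of the in-memory sample.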
The code was originally used in the following publication:
Pintarič, Matic (2022).
S strojnim učenjem podprta analiza vzorcev vektorizirane porabe električne energije [Machine-learning-supported analysis of vectorized electricity consumption patterns].
Maribor: University of Maribor, Faculty of Electrical Engineering and Computer Science.
Contains information from the tracebase data set, which is made available at http://www.tracebase.org under the Open Database License (ODbL).