missunderstoodninja/ClimateWins


Predicting Climate Change Using Machine Learning at ClimateWins

ClimateWins is interested in using machine learning to help predict the consequences of climate change around Europe and, potentially, the world. The organisation has been sorting through hurricane predictions from the National Oceanic and Atmospheric Administration (NOAA) in the U.S., typhoon data from the Japan Meteorological Agency (JMA) in Japan, world temperatures, and a great deal of other data. However, it is hard to grasp how everything in the world is changing at once. This is where machine learning comes in!

Getting Started

For this project, we are interested in identifying the best resources for predicting climate changes where people live. We have chosen to use both supervised and unsupervised machine learning algorithms.

Models

We have chosen three supervised learning models:

  • K-Nearest Neighbour (KNN)
  • Decision Tree
  • Artificial Neural Network

Data

  • The data is obtained from the European Climate Assessment & Dataset (ECA&D) project.
  • The dataset consists of weather observations from 18 European weather stations from the late 1800s to 2022.
  • Recordings exist for almost every day with values such as temperature, wind speed, snow, global radiation, and more.

Using ML in Weather Data

Machine learning (ML) algorithms are a set of techniques that allow computers to learn from data and make predictions or decisions without being explicitly programmed for a specific task. ML can be highly beneficial for analysing and predicting weather patterns, for example forecasting temperature, humidity, wind speed, and rainfall, or detecting unusual events such as heatwaves or unseasonal rain. For this project, we are using three ML algorithms:

  • k-Nearest Neighbour (KNN) - a simple, versatile, and widely used algorithm for both classification and regression tasks. It predicts a new observation from the k most similar (nearest) labelled observations in the training data: by majority vote for classification, or by averaging for regression. It is a supervised learning algorithm, meaning it relies on labelled training data to learn and make predictions.
  • Decision Tree - also used for classification and regression tasks. It works by repeatedly splitting the data into subsets based on the values of input features, which makes it highly interpretable and effective for a range of practical applications.
  • Artificial Neural Network - used for a wide range of machine learning tasks, including classification, regression, and more complex problems such as image recognition, natural language processing, and time-series forecasting.
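The k-NN voting rule described above is short enough to sketch directly. The following is a minimal from-scratch illustration in plain Python, not the project's code: the function `knn_predict` and the toy training points (two hypothetical features, e.g. mean temperature in °C and precipitation in mm, with a hypothetical binary label) are assumptions for illustration and do not come from the ECA&D data.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical helper: classify `query` by majority vote of its k nearest neighbours."""
    # Rank every labelled training point by its distance to the query
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    # Keep the k closest and let their labels vote
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up observations: (feature 1, feature 2) with a binary label
train_X = [(22.0, 0.0), (24.5, 0.2), (19.8, 0.0), (8.1, 6.3), (3.4, 2.1), (12.0, 9.5)]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, (21.0, 0.1)))
print(knn_predict(train_X, train_y, (6.0, 4.0)))
```

Because k-NN compares raw distances, features measured on very different scales (for example temperature versus global radiation) would normally be standardised before fitting; the toy values here skip that step.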

Ethical Considerations

To address ethical concerns, we need to consider any bias that may affect how the analysis is conducted and what its results show. Bias in machine learning can affect model performance, accuracy, fairness, and overall generalisability.

Some biases observed in this project include:

  1. Collection Bias: The data was collected from 18 weather stations, whereas according to the ECA&D there are 23,755 weather stations across Europe. This small sample of stations may not be representative.
  2. Temporal Bias: Because the data covers such a long period (late 1800s to 2022), some of the older observations may no longer be relevant and could distort the models' results.
  3. Location Bias: The data comes only from European weather stations, so models trained on it may not predict weather patterns well in other parts of the world, where climates differ.

Results

K-Nearest Neighbour

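We ran the data through a KNN model, which achieved a mean accuracy of 88.15% across the 15 weather stations in the table below. Apart from Sonnblick, Valentia has the highest accuracy (95.83%), well above the mean. Sonnblick shows an accuracy of 100%, but this perfect score is not evidence of predictive skill: all 5,738 of its observations fall into a single class, so the model never has to tell the classes apart. Overfitting, where a model adapts so closely to its training data that it captures even random fluctuations, would instead show up as a gap between training and test accuracy.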

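In each row, the first two counts are correct predictions and the last two are misclassifications; every row sums to 5,738 observations, and accuracy is the share of correct predictions.

| Station    | Correct: Predicted Negative | Correct: Predicted Positive | Misclassified: Actual Negative | Misclassified: Actual Positive | Accuracy |
|------------|----------------------------:|----------------------------:|-------------------------------:|-------------------------------:|---------:|
| BASEL      | 3907 |  935 | 465 | 431 |  84.38% |
| BELGRADE   | 3238 | 1502 | 460 | 538 |  82.61% |
| BUDAPEST   | 3416 | 1432 | 406 | 484 |  84.49% |
| DEBILT     | 4346 |  732 | 369 | 291 |  88.50% |
| DUSSELDORF | 4167 |  800 | 431 | 340 |  86.56% |
| HEATHROW   | 4161 |  754 | 414 | 409 |  85.66% |
| KASSEL     | 4563 |  607 | 316 | 252 |  90.10% |
| LJUBLJANA  | 3726 | 1133 | 410 | 469 |  84.68% |
| MAASTRICHT | 4249 |  819 | 357 | 313 |  88.32% |
| MADRID     | 2735 | 2257 | 313 | 433 |  87.00% |
| MUNCHEN    | 4222 |  766 | 426 | 324 |  86.93% |
| OSLO       | 4624 |  507 | 352 | 255 |  89.42% |
| SONNBLICK  | 5738 |    0 |   0 |   0 | 100.00% |
| STOCKHOLM  | 4449 |  588 | 384 | 317 |  87.78% |
| VALENTIA   | 5391 |  108 | 168 |  71 |  95.83% |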
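The accuracy column and the 88.15% mean can be reproduced from the raw counts. This short check copies the counts from the KNN results table in their printed column order; treating the first two counts as the correct predictions is an inference from the arithmetic (those two columns, divided by the row total, give exactly the reported accuracy), and the names `KNN_COUNTS` and `accuracy` are hypothetical.

```python
# Counts copied from the KNN results table (columns in their printed order).
# In every row, the first two counts divided by the row total reproduce the
# reported accuracy, so they are treated as correct predictions here.
KNN_COUNTS = {
    "BASEL": (3907, 935, 465, 431),
    "BELGRADE": (3238, 1502, 460, 538),
    "BUDAPEST": (3416, 1432, 406, 484),
    "DEBILT": (4346, 732, 369, 291),
    "DUSSELDORF": (4167, 800, 431, 340),
    "HEATHROW": (4161, 754, 414, 409),
    "KASSEL": (4563, 607, 316, 252),
    "LJUBLJANA": (3726, 1133, 410, 469),
    "MAASTRICHT": (4249, 819, 357, 313),
    "MADRID": (2735, 2257, 313, 433),
    "MUNCHEN": (4222, 766, 426, 324),
    "OSLO": (4624, 507, 352, 255),
    "SONNBLICK": (5738, 0, 0, 0),
    "STOCKHOLM": (4449, 588, 384, 317),
    "VALENTIA": (5391, 108, 168, 71),
}


def accuracy(counts):
    """Share of correct predictions: (first two counts) / (all four counts)."""
    return (counts[0] + counts[1]) / sum(counts)


per_station = {station: accuracy(c) for station, c in KNN_COUNTS.items()}
# Unweighted mean of the per-station accuracies
mean_accuracy = sum(per_station.values()) / len(per_station)

for station, acc in per_station.items():
    print(f"{station:<11} {acc:.2%}")
print(f"{'MEAN':<11} {mean_accuracy:.2%}")
```

Because the mean is unweighted, Sonnblick's single-class 100% pulls it up; dropping that station lowers the mean to roughly 87.3%.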

Decision Tree

The decision tree splits the data into progressively smaller subsets based on patterns in the input features. The tree we created is very deep and complex, which suggests it is overfitting the training data. Pruning it would reduce this complexity and should improve its accuracy on unseen data.

[Figure: Decision Tree]
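Pruning can be done in more than one way. The sketch below is a from-scratch, CART-style tree in plain Python (Gini impurity, binary threshold splits), not the project's model. It illustrates pre-pruning by capping the depth with a hypothetical `max_depth` argument; cost-complexity post-pruning, exposed as the `ccp_alpha` parameter of scikit-learn's `DecisionTreeClassifier`, is a common alternative. The eight one-feature training points are toy values.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must put data on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_depth=None, depth=0):
    """Grow a tree; max_depth=None keeps splitting until every leaf is pure."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority  # leaf: predict the majority class
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left], max_depth, depth + 1),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right], max_depth, depth + 1),
    }


def predict(tree, row):
    # Walk down the tree until a leaf (a bare class label) is reached
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


def count_leaves(tree):
    if not isinstance(tree, dict):
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


def tree_depth(tree):
    if not isinstance(tree, dict):
        return 0
    return 1 + max(tree_depth(tree["left"]), tree_depth(tree["right"]))


def training_accuracy(tree, X, y):
    return sum(predict(tree, row) == lab for row, lab in zip(X, y)) / len(y)


# Toy one-feature data with two "noisy" labels (at x = 4 and x = 5)
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

full_tree = build_tree(X, y)                 # grown until every leaf is pure
pruned_tree = build_tree(X, y, max_depth=1)  # pre-pruned to a single split

for name, tree in [("full", full_tree), ("pruned", pruned_tree)]:
    print(name, count_leaves(tree), tree_depth(tree), training_accuracy(tree, X, y))
```

On this toy data the unpruned tree memorises every training point (4 leaves, depth 3, training accuracy 1.0), while the depth-1 tree has 2 leaves and misclassifies one training point (0.875). A lower training accuracy is expected after pruning; any benefit shows up on held-out data.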

Artificial Neural Network

For the first run of the ANN (used here, like the other two models, as a supervised learning algorithm), we obtained accuracy scores of 50.02% and 49.93% on the training and test data, respectively. After changing the number of hidden layers and the number of iterations the model runs through, the scores improved to 88.49% (training) and 59.72% (test). The large gap between training and test accuracy suggests that the model is overfitting.

[Figure: ANN]
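To make "hidden layers and iterations" concrete, here is a from-scratch network in plain Python with a single hidden layer of sigmoid units, trained by per-sample gradient descent on the log loss. It is an illustrative sketch, not the project's model: the class `TinyMLP` and its `n_hidden` and `max_iter` parameters are hypothetical stand-ins for the network size and iteration count, and the XOR-style data is a toy example. Each iteration nudges the weights to reduce the training loss, which is also why more capacity and more iterations can raise training accuracy faster than test accuracy.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """Hypothetical one-hidden-layer network for binary classification."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        return int(self._forward(x)[1] >= 0.5)

    def log_loss(self, X, y):
        eps = 1e-12  # clamp probabilities so log() stays finite
        total = 0.0
        for x, t in zip(X, y):
            p = min(max(self._forward(x)[1], eps), 1 - eps)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
        return total / len(y)

    def fit(self, X, y, max_iter=1000, lr=0.5):
        # One iteration = one pass of per-sample gradient descent over the data
        for _ in range(max_iter):
            for x, t in zip(X, y):
                hidden, out = self._forward(x)
                d_out = out - t  # gradient of log loss w.r.t. the output pre-activation
                for j, h in enumerate(hidden):
                    # Backpropagate through hidden unit j (uses w2[j] before updating it)
                    d_hidden = d_out * self.w2[j] * h * (1 - h)
                    self.w2[j] -= lr * d_out * h
                    self.b1[j] -= lr * d_hidden
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hidden * xi
                self.b2 -= lr * d_out
        return self


# Toy XOR-style data (hypothetical, not from the ECA&D dataset)
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

untrained = TinyMLP(n_inputs=2, n_hidden=4, seed=1)
trained = TinyMLP(n_inputs=2, n_hidden=4, seed=1).fit(X, y, max_iter=2000)
print(untrained.log_loss(X, y), trained.log_loss(X, y))
```

Training loss alone says nothing about generalisation; as the 88.49% versus 59.72% result shows, a held-out test set is what reveals whether extra layers and iterations are helping or just memorising.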
