Predicting Climate Change Using Machine Learning at ClimateWins

ClimateWins is interested in using machine learning to help predict the consequences of climate change around Europe and, potentially, the world. It has been sorting through hurricane predictions from the National Oceanic and Atmospheric Administration (NOAA) in the U.S., typhoon data from the Japan Meteorological Agency (JMA), world temperatures, and a great deal of other data. However, it is hard to grasp how everything in the world is changing at once. This is where machine learning comes in!

Getting Started

For this project, we are interested in identifying the best resources for predicting climate changes where people live. We have chosen to use both supervised and unsupervised machine learning algorithms.

Models

We have chosen three models. All three are applied here as supervised learners, since each is trained on labelled data and scored by its accuracy:

K-Nearest Neighbour (KNN)
Decision Tree
Artificial Neural Network

Data

The data is obtained from the European Climate Assessment & Data Set (ECA&D) project.
The dataset consists of weather observations from 18 European weather stations from the late 1800s to 2022.
Recordings exist for almost every day with values such as temperature, wind speed, snow, global radiation, and more.

Using ML in Weather Data

Machine learning (ML) algorithms are a set of techniques that allow computers to learn from data and make predictions or decisions without being explicitly programmed for a specific task. They can be highly beneficial for analysing and predicting weather patterns, including forecasting temperature, humidity, wind speed, and rainfall. They can also be used to detect unusual events or patterns such as heatwaves or unseasonal rains. For this project, we are using three ML algorithms:

K-Nearest Neighbour (KNN) - a simple, versatile, and widely used algorithm for both classification and regression tasks. It is a supervised learning algorithm, meaning it relies on labelled training data to learn and make predictions.
Decision Tree - also used for classification and regression tasks. It works by repeatedly splitting the data into subsets based on the values of input features, which makes it highly interpretable and effective for a range of practical applications.
Artificial Neural Network - used for a wide range of machine learning tasks, including classification and regression as well as more complex problems such as image recognition, natural language processing, and time-series forecasting.
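
To make the nearest-neighbour idea concrete, the following is a minimal sketch in plain Python rather than the code used in this project. The function name `knn_predict`, the two features, and the toy labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    # Euclidean distance from the query to every labelled training point
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    # Keep the k closest points and count how often each label occurs
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (mean temperature, global radiation) -> label 0 or 1
train_X = [(2.0, 0.5), (4.0, 0.8), (5.0, 1.0), (18.0, 2.5), (21.0, 3.0), (24.0, 2.8)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (20.0, 2.7)))  # three nearest points are all labelled 1
print(knn_predict(train_X, train_y, (3.0, 0.6)))   # three nearest points are all labelled 0
```

Because the prediction depends directly on distances, features measured on very different scales would normally be standardised first; that step is left out here to keep the sketch short.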

Ethical Considerations

To address ethical concerns, we have to consider any bias that may affect how the analysis is conducted and the results it produces. Bias in machine learning can affect model performance, accuracy, fairness, and overall generalisability.

Some biases observed in this project include:

Collection Bias: The data was collected from 18 weather stations. However, according to ECA&D, there are a total of 23,755 weather stations across Europe, so this sample may not be representative.
Temporal Bias: Because the data covers such a long period (late 1800s to 2022), some of the older data is likely no longer relevant and could distort the outcomes of the models.
Location Bias: The data was collected only from European weather stations, so the models may not be able to predict weather patterns in other parts of the world, where climates differ.

Models

K-Nearest Neighbour

We ran the data through a KNN model, which yielded an overall accuracy score of 88.15% across the 15 weather stations in the table below. Sonnblick showed an accuracy score of 100%, which points to overfitting, i.e. the model has overadapted to the training data and captures even random fluctuations. The table also shows that every Sonnblick observation falls into a single class, so this score says little about the model's predictive skill. Excluding Sonnblick, Valentia has the best accuracy score at 95.83%, well above the mean of 88.15%.

Station	Predicted Negative	Predicted Positive	Actual Negative	Actual Positive	Accuracy
BASEL	3907	935	465	431	84.38%
BELGRADE	3238	1502	460	538	82.61%
BUDAPEST	3416	1432	406	484	84.49%
DEBILT	4346	732	369	291	88.50%
DUSSELDORF	4167	800	431	340	86.56%
HEATHROW	4161	754	414	409	85.66%
KASSEL	4563	607	316	252	90.10%
LJUBLJANA	3726	1133	410	469	84.68%
MAASTRICHT	4249	819	357	313	88.32%
MADRID	2735	2257	313	433	87.00%
MUNCHEN	4222	766	426	324	86.93%
OSLO	4624	507	352	255	89.42%
SONNBLICK	5738	0	0	0	100.00%
STOCKHOLM	4449	588	384	317	87.78%
VALENTIA	5391	108	168	71	95.83%
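
The accuracy column can be reproduced from the four count columns. The sketch below assumes that the first two columns hold the correctly classified negatives and positives and the last two hold the misclassified observations. That reading is an assumption based on the reported percentages, not something the column headers state. The function and variable names are illustrative.

```python
# Counts copied from three rows of the table above.
# Assumed order: correct negatives, correct positives, then the two error counts.
rows = {
    "BASEL": (3907, 935, 465, 431),
    "SONNBLICK": (5738, 0, 0, 0),
    "VALENTIA": (5391, 108, 168, 71),
}

def accuracy(correct_neg, correct_pos, errors_a, errors_b):
    """Share of observations that were classified correctly."""
    total = correct_neg + correct_pos + errors_a + errors_b
    return (correct_neg + correct_pos) / total

for station, counts in rows.items():
    print(f"{station}: {accuracy(*counts):.2%}")
```

Under this reading each row sums to 5,738 observations, and the Sonnblick row contains no positive cases at all, which is why its accuracy reaches 100%.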

Decision Tree

The decision tree recognises patterns in the data and uses them to split the data into subsets. The tree we created is quite deep and complex, which suggests it is overfitting. It would therefore need to be pruned, which should reduce its complexity and is likely to improve predictive accuracy on unseen data.
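
One common way to prune a tree is cost-complexity pruning, available in scikit-learn through the `ccp_alpha` parameter. The sketch below is not the project's code: it uses synthetic data from `make_classification` as a stand-in for the weather data, and the value `ccp_alpha=0.01` is an arbitrary illustration that would need tuning in practice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise (assumption, not the weather data)
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Unpruned tree: keeps splitting until the leaves are pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pruned tree: ccp_alpha > 0 removes branches that add complexity for little gain
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pruned", pruned_tree)]:
    print(
        name,
        "depth:", tree.get_depth(),
        "leaves:", tree.get_n_leaves(),
        "train acc:", round(tree.score(X_train, y_train), 3),
        "test acc:", round(tree.score(X_test, y_test), 3),
    )
```

Comparing training and test accuracy for both trees shows whether pruning narrows the gap between them; limiting `max_depth` is a simpler alternative with a similar aim.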

Artificial Neural Network

For the first run of the ANN, we obtained accuracy scores of 50.02% and 49.93% on the training and test data, respectively. After changing the number of hidden layers and the number of iterations the model runs through, we obtained improved scores of 88.49% (training) and 59.72% (test). The large gap between training and test accuracy suggests that the tuned model is overfitting.
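
Varying the hidden layers and the iteration limit can be sketched with scikit-learn's `MLPClassifier`, as shown below. This is an illustration under assumptions, not the configuration used in this project: the data is synthetic, and the layer sizes and `max_iter` values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption, not the weather data)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Neural networks train more reliably on standardised inputs
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Hypothetical settings: a small network with few iterations, then a larger one
configs = [
    {"hidden_layer_sizes": (5,), "max_iter": 200},
    {"hidden_layer_sizes": (50, 50), "max_iter": 1000},
]

results = []
for cfg in configs:
    model = MLPClassifier(random_state=0, **cfg).fit(X_train_s, y_train)
    train_acc = model.score(X_train_s, y_train)
    test_acc = model.score(X_test_s, y_test)
    results.append((cfg, train_acc, test_acc))
    print(cfg, "train acc:", round(train_acc, 3), "test acc:", round(test_acc, 3))
```

Printing both scores for every configuration makes it easy to spot settings where training accuracy rises while test accuracy does not, which is the usual sign of overfitting. The smaller configuration may emit a convergence warning if it stops before converging.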

missunderstoodninja/ClimateWins

Folders and files

Latest commit

History

Repository files navigation

Predicting Climate Change Using Machine Learning at ClimateWins

Getting Started

Models

Data

Using ML in Weather Data

Ethical Considerations

Models

K-Nearest Neighbour

Decision Tree

Artificial Neural Network

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages