As a researcher exploring new ways to analyze climate data, I recently tested out some advanced machine learning techniques on a dataset of weather measurements from Switzerland. My goal was to see if I could uncover hidden patterns and groupings within the data, without relying on human-provided labels. Here are the key things I tried:
Preparing the Data
First I cleaned up the raw data (over 1,700 observations spanning 10 years) to handle issues such as missing values. I then rescaled the different weather measurements (temperature, precipitation, etc.) onto the same standardized scale. This normalization step matters because variance- and distance-based methods such as PCA and k-means would otherwise be dominated by whichever measurement has the largest numeric range.
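To make the preparation step concrete, here is a minimal sketch using NumPy. It assumes mean imputation for missing values and z-score scaling; the post does not say which strategies the project actually used, and the function name and the four sample rows are made up for illustration.

```python
import numpy as np

def impute_and_standardize(X):
    """Fill missing values with column means, then z-score each column."""
    X = np.asarray(X, dtype=float)
    # Assumed strategy: replace each NaN with the mean of its column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(np.isnan(X), col_means, X)
    mean = filled.mean(axis=0)
    std = filled.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return (filled - mean) / std

# Synthetic stand-in rows: temperature (°C) and precipitation (mm).
raw = np.array([
    [12.0, 3.1],
    [np.nan, 0.0],
    [18.5, 12.4],
    [7.2, np.nan],
])
scaled = impute_and_standardize(raw)
print(scaled.round(2))
```

After this step every column has zero mean and unit standard deviation, so no single measurement dominates simply because of its units.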
Reducing Dimensions
Next, I used a custom algorithm based on L1-regularized PCA to reduce the number of dimensions from 14 down to 6. Projecting the multi-dimensional data into a lower-dimensional space surfaces the components that carry most of the information while filtering out noise. Compared with standard PCA, my revised approach aims to reduce the influence of outliers.
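For reference, this is a sketch of plain PCA via the singular value decomposition, going from 14 columns to 6. It is the standard baseline, not the custom L1-regularized variant from the project, and the data is random noise generated purely to exercise the function.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X onto its top principal components (standard PCA, no L1 penalty)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of the total variance captured by each direction.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio[:n_components]

# Synthetic stand-in for the standardized weather table: 200 rows, 14 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))
Z, ratio = pca_project(X, n_components=6)
print(Z.shape, ratio.sum())
```

The explained-variance ratios are a common way to judge how many components to keep. Standard PCA minimizes squared error, which is why a few extreme observations can pull the components toward themselves.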
Clustering
Here’s where the unsupervised learning comes in! With no category labels to guide them, clustering algorithms like k-means and Birch look for intrinsic groups within the climate data based purely on similarity, so I experimented with both. After evaluation, k-means grouped the observations into 3 clusters, while Birch identified 5 clusters in a subset. Visually checking the cluster distributions showed promising separations.
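To show what k-means does under the hood, here is a small NumPy sketch of Lloyd's algorithm. It is an illustration on synthetic 2-D blobs rather than the project's code, it covers k-means only (not Birch), and the function name, seeds and blob centers are arbitrary choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it in place if it has none.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids

# Three synthetic, well-separated blobs standing in for the reduced weather data.
rng = np.random.default_rng(1)
blobs = [rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])]
X = np.vstack(blobs)
labels, centroids = kmeans(X, k=3)
print(np.bincount(labels, minlength=3))
```

Because the result depends on the random starting centroids, it is usual to run the algorithm from several seeds and keep the solution with the lowest within-cluster sum of squares.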
Validating a Classifier
As a final test, I built a supervised classifier that used the cluster assignments as proxy labels, to check whether the clusters were consistent and separable. My SVM model scored reasonably well, correctly classifying most observations into their assigned groups. The classifier's confusion matrix also gave insight into areas of overlap between the clusters.
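As a small illustration of how a confusion matrix exposes overlap, the sketch below builds one from cluster labels and classifier predictions. Both label arrays are invented for the example and are not results from the project, and the SVM itself is not shown here.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

# Hypothetical data: cluster assignments used as proxy labels, and a classifier's guesses.
cluster_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
predicted = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0, 2])

cm = confusion_matrix(cluster_labels, predicted, n_classes=3)
accuracy = np.trace(cm) / cm.sum()  # diagonal entries are the correct predictions
print(cm)
print(accuracy)
```

Off-diagonal entries show where two clusters get confused. In this made-up example, one observation from cluster 0 is predicted as cluster 1 and one from cluster 2 as cluster 0, which would suggest those pairs of clusters overlap.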
Final Thoughts
In the end, the unsupervised learning pipeline showed encouraging ability to detect patterns in the climate data without human supervision. Going forward, I’m excited to refine these methods further and apply them to larger meteorological datasets. The better we understand Earth’s intricate weather machinery, the better we can model and predict its behavior.
To see more details, check out the paper.
Disclaimer: This project was completed as part of my MSc in Data Science at Lancaster University. This blog post is LLM-generated text based on the hand-written report.
Disclaimer 2: This was my first introduction to ML.