Threshold Methods for Extreme Value Theory

April 15, 2021

Series: msc

The report introduces tevt, an R package for threshold estimation methods and diagnostic plots for extreme value theory (EVT). It aims to fix issues with existing EVT packages and provide the most comprehensive collection of threshold methods.

Background

Provides theory on EVT, the generalized Pareto distribution (GPD), and the importance of choosing a threshold above which the GPD approximates the tail. Discusses the bias-variance tradeoff in that choice: a lower threshold keeps more exceedances and so reduces variance, but it introduces bias if the GPD approximation does not yet hold there. Introduces kernel density estimation (KDE), which estimates a density nonparametrically, without assuming a parametric form. This is important for the mixture models used later.

Exploration of Danish Fire Insurance Data
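Kernel density estimation, introduced in the background, is the building block of the KDE + GPD mixtures tried on the Danish data. The Python sketch below shows the basic construction. It is written in Python rather than the package's R, and it rests on several assumptions:

- a Gaussian kernel with Silverman's rule-of-thumb bandwidth;
- a KDE bulk below a threshold `u` and a GPD tail above it;
- a tail fraction `phi_u` taken from the KDE by default.

All function names are hypothetical and are not the tevt or evmix API. The two-tailed and boundary-corrected variants are not covered.

```python
import math
from statistics import stdev

SQRT2PI = math.sqrt(2 * math.pi)


def silverman_bandwidth(data):
    """Normal-reference rule of thumb (assumed choice): h = 1.06 * sd * n^(-1/5)."""
    return 1.06 * stdev(data) * len(data) ** (-1 / 5)


def kde_pdf(x, data, h):
    """Gaussian KDE: average of normal bumps of width h centred on each observation."""
    total = sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)
    return total / (len(data) * h * SQRT2PI)


def kde_cdf(x, data, h):
    """Integral of the Gaussian KDE from -inf up to x."""
    total = sum(0.5 * (1 + math.erf((x - d) / (h * math.sqrt(2)))) for d in data)
    return total / len(data)


def gpd_pdf(y, sigma_u, xi):
    """GPD density for an excess y = x - u; xi == 0 uses the exponential limit."""
    if y < 0:
        return 0.0
    if xi == 0:
        return math.exp(-y / sigma_u) / sigma_u
    t = 1 + xi * y / sigma_u
    if t <= 0:  # beyond the finite upper endpoint when xi < 0
        return 0.0
    return t ** (-1 / xi - 1) / sigma_u


def kde_gpd_pdf(x, data, u, sigma_u, xi, h=None, phi_u=None):
    """Hypothetical KDE bulk below threshold u, GPD tail above it; phi_u = P(X > u)."""
    if h is None:
        h = silverman_bandwidth(data)
    bulk_mass = kde_cdf(u, data, h)
    if phi_u is None:
        phi_u = 1 - bulk_mass  # tail fraction implied by the KDE (assumption)
    if x <= u:
        # Rescale the KDE so the bulk integrates to 1 - phi_u on (-inf, u].
        return (1 - phi_u) * kde_pdf(x, data, h) / bulk_mass
    return phi_u * gpd_pdf(x - u, sigma_u, xi)
```

Because the bulk is rescaled by `(1 - phi_u) / kde_cdf(u)`, the density integrates to one for any `phi_u` in (0, 1). A plain Gaussian KDE also places some mass below zero. That is the usual reason for boundary correction on positive data such as insurance losses.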

Applies tevt to the Danish fire insurance data. Most threshold methods estimate thresholds between 2 and 3.3. Fits a normal + GPD mixture model, but the normal bulk fits poorly, which motivates semi- and nonparametric models. Tries a two-tailed normal + GPD model, a two-tailed KDE + GPD model, and a boundary-corrected KDE + GPD model. The KDE models fit best but carry a heavier computational burden. Questions whether the lower tail needs a GPD or whether a normal distribution suffices; further investigation is needed.

Conclusion

tevt extends the functionality of existing EVT packages and is available on GitHub. Further work could include more threshold methods, compatibility with other packages, more computationally efficient mixture models, and more analysis of the lower tail of the Danish data.

Technical Details

Implements the GPD (distribution functions and parameter estimation) and 11 threshold estimation methods. Allows diagnostic plots to be created individually or automatically, without console input. Handles corner cases such as ξ = 0. Uses functions from evmix for mixture model fitting.
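The ξ = 0 corner case exists because the general GPD formulas divide by ξ. At ξ = 0 the GPD reduces to its exponential limit, so that case needs a separate branch. The following is a minimal Python sketch of that handling. tevt is an R package, and these function names are hypothetical, not its API.

The fitting step uses the method of moments purely as a simple swapped-in estimator. It is valid only for ξ < 1/2 and is not necessarily the estimator tevt implements.

```python
import math
import random
from statistics import fmean, variance


def gpd_cdf(y, sigma, xi):
    """P(Y <= y) for excesses Y ~ GPD(sigma, xi)."""
    if y <= 0:
        return 0.0
    if xi == 0:  # exponential limit as xi -> 0
        return 1 - math.exp(-y / sigma)
    t = 1 + xi * y / sigma
    if t <= 0:  # past the upper endpoint -sigma/xi when xi < 0
        return 1.0
    return 1 - t ** (-1 / xi)


def gpd_quantile(p, sigma, xi):
    """Inverse CDF for p in [0, 1); xi == 0 again needs its own branch."""
    if xi == 0:
        return -sigma * math.log(1 - p)
    return sigma * ((1 - p) ** (-xi) - 1) / xi


def gpd_sample(n, sigma, xi, seed=None):
    """Draw n GPD excesses by inverse-transform sampling."""
    rng = random.Random(seed)
    return [gpd_quantile(rng.random(), sigma, xi) for _ in range(n)]


def gpd_fit_moments(excesses):
    """Method-of-moments estimates (sigma, xi); requires xi < 1/2.

    Uses mean = sigma / (1 - xi) and var = sigma^2 / ((1 - xi)^2 (1 - 2 xi)),
    so mean^2 / var = 1 - 2 xi.
    """
    m = fmean(excesses)
    ratio = m * m / variance(excesses)
    xi = 0.5 * (1 - ratio)
    sigma = 0.5 * m * (1 + ratio)
    return sigma, xi
```

For example, `gpd_fit_moments(gpd_sample(100_000, 1.0, 0.0, seed=1))` should return estimates close to σ = 1 and ξ = 0. Maximum likelihood is the more common choice in practice, but the moment equations keep this sketch short and dependency-free.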

To see more details, check out the paper (the original R Markdown version of the report has been lost, but a copy was found in LaTeX).

Disclaimer: This project was completed as part of my MSc in Data Science at Lancaster University. This blog post is LLM-generated text, based on the hand-written report.

This is a post in the msc series.
Other posts in this series:

September 15, 2021 - MSc Thesis - Recipe Box Production Planning
May 13, 2021 - Geostatistical Models
May 12, 2021 - Satellite Semantic Segmentation
April 15, 2021 - Threshold Methods for Extreme Value Theory
March 25, 2021 - char n-gram based language identification
March 18, 2021 - Reviewing LiDAR for Road Applications
February 19, 2021 - Comparing approaches for Deep Learning Time Series Classification
December 18, 2020 - Climate Clustering with AutoML