The report introduces tevt, an R package for threshold estimation methods and diagnostic plots for extreme value theory (EVT). It aims to fix issues with existing EVT packages and provide the most comprehensive collection of threshold methods.
Background
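The report covers the theory of EVT, the generalized Pareto distribution (GPD), and the importance of choosing a threshold above which the GPD approximates the tail. It discusses the bias-variance tradeoff in that choice: a threshold set too low includes observations for which the GPD approximation is poor (bias), while a threshold set too high leaves too few exceedances to estimate the parameters reliably (variance). It also introduces kernel density estimation (KDE), which estimates a density nonparametrically without assuming a distributional form and is an important building block for mixture models.
Exploration of Danish Fire Insurance Data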
The report applies tevt to the Danish fire insurance data. Most threshold methods estimate a threshold between 2 and 3.3. A normal + GPD mixture model is fitted first; the poor fit of the normal component motivates semiparametric and nonparametric alternatives. The report then tries a two-tailed normal + GPD model, a two-tailed KDE + GPD model, and a boundary-corrected KDE + GPD model. The KDE models fit best but carry a computational burden. The report questions whether the lower tail needs a GPD or can be modelled with a normal distribution, and concludes that further investigation is needed.
Conclusion
tevt extends the functionality of existing EVT packages and is available on GitHub. Further work could include more threshold methods, compatibility with other packages, more computationally efficient mixture models, and more analysis of the lower tail of the Danish data.
Technical Details
- Implemented GPD distribution and parameter estimation, 11 threshold estimation methods
- Allows individual and automatic diagnostic plot creation without console input
- Handles corner cases like ξ=0
- Uses functions from evmix for mixture model fitting
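To illustrate why a shape parameter ξ equal to zero is a corner case: the usual GPD formula divides by ξ, so at ξ = 0 the distribution has to be replaced by its exponential limit. The sketch below is written in Python with only the standard library and is not taken from tevt; the function names `gpd_cdf` and `gpd_quantile`, the argument order, and the tolerance `XI_TOL` are assumptions made for this example.

```python
import math

# Hypothetical tolerance below which xi is treated as exactly zero.
XI_TOL = 1e-12


def gpd_cdf(x, xi, sigma, u=0.0):
    """GPD distribution function for a value x, given threshold u,
    shape xi and scale sigma (illustrative, not the tevt implementation)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - u) / sigma
    if z <= 0:
        return 0.0  # at or below the threshold
    if abs(xi) < XI_TOL:
        # xi = 0: the GPD reduces to an exponential distribution.
        return -math.expm1(-z)
    t = 1.0 + xi * z
    if t <= 0:
        # Only reachable for xi < 0: x is past the upper endpoint u - sigma / xi.
        return 1.0
    # General case: 1 - (1 + xi * z) ** (-1 / xi), written with log1p/expm1
    # so that it stays accurate when xi is small but non-zero.
    return -math.expm1(-math.log1p(xi * z) / xi)


def gpd_quantile(p, xi, sigma, u=0.0):
    """Inverse of gpd_cdf for a probability p in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if abs(xi) < XI_TOL:
        return u - sigma * math.log1p(-p)
    return u + sigma * math.expm1(-xi * math.log1p(-p)) / xi
```

Writing the general branch with `log1p` and `expm1` keeps the two branches numerically close to each other as ξ approaches zero, so the switch at `XI_TOL` does not introduce a visible jump.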
To see more details, check out the paper. (The original report has been lost to rmarkdown, but a copy was found in LaTeX.)
Disclaimer: This project was completed as part of my MSc in Data Science at Lancaster University. This blog post is LLM-generated text, based upon the hand-written report.