STK-MAT2011 - Project Work in Finance, Insurance, Risk and Data Analysis

Projects

Below is a preliminary list of suggested projects, but you may also contact other possible supervisors. Send an e-mail to Gudmund Hermansen about your decision.

The project paper should be about 15 pages long and must include the official front page.

Project 1: Explainable AI (XAI) (Camilla Lingjærde - camiling@math.uio.no). This project focuses on Explainable AI (XAI) and aims to explore methods for understanding predictions from machine learning models. A central focus will be on Shapley values, a game-theory-based approach with a strong theoretical foundation for distributing feature importance fairly. Students will begin by studying the concepts of XAI and the theory behind Shapley values. They will then set up a simulation study that compares Shapley values with other XAI methods. The project might also involve applying a machine learning model, such as random forest, to a real-world dataset, potentially in a medical context, and analyzing feature importance using Shapley values. The main goal of the project is to evaluate the strengths and limitations of Shapley values and to identify potential areas for improvement.
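To give a feel for the Shapley values mentioned above, here is a minimal sketch that computes them exactly by enumerating all coalitions, which is only feasible for a handful of features. The toy model, the baseline of 0 for "absent" features, and the names `shapley_values` and `toy_value` are assumptions made for illustration; they are not part of the project description.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [j for j in players if j != i]
        for size in range(n_players):
            # Probability weight of a coalition of this size preceding player i.
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i when joining coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical model f(x) = 2*x0 + x1 + x0*x2 evaluated at x = (1, 1, 1).

    Features outside the coalition are set to an assumed baseline of 0.
    """
    x = [1.0 if j in coalition else 0.0 for j in range(3)]
    return 2 * x[0] + x[1] + x[0] * x[2]


phi = shapley_values(3, toy_value)
print(phi)
```

In this toy example the attributions sum to f(1, 1, 1) - f(0, 0, 0), and the interaction term x0*x2 is split equally between features 0 and 2. For real models the value function has to be estimated, which is where most of the practical difficulty lies.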

Project 2: Predicting Long-Term Economic Growth and State-Based Violence (Gudmund Hermansen and Jonas Vestby (PRIO) - gudmunhh@math.uio.no). Predicting the future is challenging, especially over long time horizons. In this project, we will examine traditional econometric models (regression-based) for long-term economic growth and long-term changes in state-based violence. The macroeconomic literature includes several claims about the identification of significant factors (explanatory variables) that can explain such phenomena over extended time periods. If these claims are true, we should, for example, be able to predict future economic growth for a given country, or at least simulate realistic and plausible scenarios (given reasonable explanatory variables). In practice, however, this is rarely the case. Attempts to predict or simulate reveal significant challenges. For instance, such data are often non-stationary, meaning that properties of and relationships in the data and/or the model may change over time. Furthermore, traditional regression-based models struggle to adequately account for the noise/uncertainty inherent in such data. In this project, we will replicate the results of the traditional models and explore how they can be adjusted to better address the observed issues. Additionally, we will investigate how models commonly used in the machine learning literature can tackle these problems.

Project 3: Investigating Intraday High-Low Prediction Claims (Gudmund Hermansen - gudmunhh@math.uio.no). This project explores the hypothesis that the opening range (the price range established during the first hour of trading) can predict the high or low of the day with an 88% probability. This claim has been popularised on various social media platforms. In this project, the claim will be tested using intraday data from currencies and cryptocurrencies (and other markets). In addition to assessing the validity of the claim, we will investigate whether it can be refined and/or generalised. The project will also explore the potential for developing predictive models based on the "first-hour" data and examine the viability of trading strategies informed by these findings.
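As a rough sketch of how such a claim could be checked, the snippet below counts the fraction of days on which the day's high or low is already set within the first few bars. This is only one possible reading of the claim, and the function names, the bar layout (one list of highs and one list of lows per day) and the tiny hand-made data are assumptions for illustration, not real market data.

```python
def extreme_set_in_opening(highs, lows, n_open):
    """True if the day's high or low is attained within the first n_open bars."""
    day_high, day_low = max(highs), min(lows)
    return max(highs[:n_open]) == day_high or min(lows[:n_open]) == day_low


def hit_rate(days, n_open):
    """Fraction of days whose high or low falls inside the opening range."""
    hits = sum(extreme_set_in_opening(h, l, n_open) for h, l in days)
    return hits / len(days)


# Two made-up days with three bars each: (highs, lows) per day.
days = [
    ([1.0, 2.0, 3.0], [0.5, 1.5, 2.5]),  # low of the day is set in the first bar
    ([2.0, 3.0, 2.5], [1.5, 2.2, 1.0]),  # both extremes occur after the first bar
]
rate = hit_rate(days, n_open=1)
print(rate)
```

With real intraday data, `n_open` would be the number of bars covering the first hour, and the observed rate would have to be compared against a sensible benchmark, for example the rate obtained from a simulated random walk.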

Project 4: Theoretical aspects of generative models in deep learning (Odd Kolbjørnsen - oddkol@math.uio.no / odd.kolbjornsen@akerbp.com). Current state-of-the-art generative models in deep learning (such as SR3) are connected to autoregressive models. In this project we will investigate theoretical aspects of the methodology by analyzing its performance on a Gaussian mixture distribution, for which analytical expressions are available. If time allows, we will further investigate conditional sampling in this setting. This is the backbone of super-resolution in conditional image generation. The project can go in a theoretical direction or in a programming direction.

Project 5: Signal alignment for problems with slowly varying deviations (Odd Kolbjørnsen - oddkol@math.uio.no / odd.kolbjornsen@akerbp.com). Analysis of multimodal data (i.e. data from different types of sensors or origins) is the next big thing in deep learning. A key aspect in this setting is registration, i.e. transforming different sets of data into one coordinate system. In this project we will investigate a Bayesian method for aligning two time series with both registration and observation errors. The project requires programming.

Project 6: The effect of using antithetic samples in the Ensemble Kalman smoother (Odd Kolbjørnsen - oddkol@math.uio.no / odd.kolbjornsen@akerbp.com). Ensemble Kalman methods are the state of the art in multiple problems for data integration and data assimilation, such as meteorology and history matching of oil reservoirs. In brief, the method can be said to generate multiple realizations from the prior distribution (an ensemble of realizations) and modify these to approximate the posterior distribution. In the linear setting the method is exact. In the common workflow the samples are generated independently; in this project we will investigate the effect of having two sample sets that are negatively correlated. The project can be done purely theoretically or mostly through programming (but the best is a combination of both).
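The idea of two negatively correlated sample sets can be sketched as follows: half of the prior ensemble is drawn as usual, and the other half is obtained by mirroring the standard normal draws. This is a minimal illustration of antithetic sampling for a Gaussian prior only, not of the full smoother; the function name `prior_ensemble` and the chosen mean and covariance are assumptions.

```python
import numpy as np


def prior_ensemble(mean, chol, n_members, rng, antithetic=False):
    """Draw n_members realizations from N(mean, chol @ chol.T).

    With antithetic=True, n_members is assumed to be even.
    """
    dim = mean.shape[0]
    if antithetic:
        half = rng.standard_normal((dim, n_members // 2))
        # Pair every draw with its mirror image around the mean.
        z = np.concatenate([half, -half], axis=1)
    else:
        z = rng.standard_normal((dim, n_members))
    return mean[:, None] + chol @ z


rng = np.random.default_rng(0)
mean = np.array([1.0, -2.0])
chol = np.linalg.cholesky(np.array([[2.0, 0.6], [0.6, 1.0]]))

independent = prior_ensemble(mean, chol, 50, rng)
paired = prior_ensemble(mean, chol, 50, rng, antithetic=True)

print("independent ensemble mean:", independent.mean(axis=1))
print("antithetic ensemble mean: ", paired.mean(axis=1))
```

By construction the antithetic ensemble mean reproduces the prior mean up to rounding, whereas the independent ensemble mean carries Monte Carlo error. How this carries over to the updated (posterior) ensemble is the kind of question the project would look at.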

Project 7: Ridge regression for sparse data (Gudmund Hermansen - gudmunhh@math.uio.no). In this project, you will work through the main theory behind what is known as ridge regression. Ridge is one (of several) popular techniques for regularisation used in statistics and machine learning. We can interpret ridge as giving additional weights (which could also be zero) to the input features, or covariates, in a regression model. The project will be based on the lecture notes https://arxiv.org/pdf/1509.09169.pdf. In addition to understanding the theory underlying ridge regression, the main focus of this project is to understand the effect of regularisation with sparse input data. Imagine estimating a linear regression model when some of the input features are mostly 0 (sparse). Regularisation techniques, such as ridge, have a tendency to give such sparse features either too high or too low weight, and we will investigate potential solutions to this problem.
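As a starting point, the closed-form ridge estimate solves (X^T X + lambda I) beta = X^T y. The sketch below fits it to simulated data with one dense and one mostly-zero feature. The simulation design (no intercept, no standardisation, 5% non-zero entries, a penalty of 10) and the name `ridge_fit` are assumptions chosen for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(1)
n = 200
dense = rng.standard_normal(n)
# Sparse feature: non-zero in roughly 5% of the rows (an assumed design).
sparse = (rng.random(n) < 0.05) * rng.standard_normal(n)
X = np.column_stack([dense, sparse])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.standard_normal(n)

beta_ols = ridge_fit(X, y, 0.0)     # lam = 0 gives ordinary least squares
beta_ridge = ridge_fit(X, y, 10.0)  # the same penalty is applied to both features

print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
print("ratio:", beta_ridge / beta_ols)
```

Comparing the per-coefficient ratio gives a first look at how unevenly a single penalty can act on dense and sparse features.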

Project 8: Machine learning and high-frequency financial time series (Gudmund Hermansen - gudmunhh@math.uio.no). In this project, you will compare more traditional statistical models developed for high-frequency financial time series with competing methods from machine learning. You will work with several examples of high-frequency tick data from foreign exchange, and explore the possibilities and limitations of both approaches.

Project 9: Applied data analysis and statistical modelling for a Kaggle-like competition or dataset (Gudmund Hermansen - gudmunhh@math.uio.no). In this project, you will carry out applied data analysis and predictive modelling. You may choose a competition or dataset of interest from one of the popular data science platforms: Kaggle, Topcoder or the UC Irvine Machine Learning Repository. A preliminary data analysis should be performed first, followed by careful statistical modelling and inference, and finally an evaluation of the predictions and an explanation of the results.

Project 10: Stochastic analysis, finance, insurance and risk (Fred Espen Benth - fredb@math.uio.no). Students who are interested in a project within stochastic analysis, finance, or insurance and risk should contact Fred Espen Benth (fredb@math.uio.no) and Gudmund Hermansen (gudmunhh@math.uio.no) for more information.

Published Jan. 20, 2025 - Last modified Jan. 20, 2025