
Mathematics, Statistics, and Computer Science

(7) Mathematics, Statistics, or Computer Science (from IMADA)

 

These topics are generally only applicable for scient.oecon. students.

Topics from the above subject areas (when admissible) can also be chosen by scient.oecon. students. If an IMADA supervisor is desired, choose Mathematics, Statistics, or Computer Science as your main topic area and refer to the topic number in the topic application.

Questions about the topics within the subject area of

All topics in this subject area can be written in English or Danish.

 

 

7.1.a Surrogate modeling of option price values via Gaussian process regression

In its general form for multiple assets, the Black-Scholes equation has no known closed-form solution. Moreover, the cost of numerical approximations may quickly become prohibitive due to the curse of dimensionality. The objective of this thesis is to construct and examine surrogate models for the Black-Scholes equation under suitable parametric variations using Gaussian process regression. Gaussian process regression can be thought of as a statistical multivariate interpolation method that allows incorporating gradient information as well as data layers of variable fidelity.
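
As a minimal sketch of the surrogate idea, assume the single-asset case, where the Black-Scholes call price is available in closed form and can stand in for an expensive numerical solver. The code below fits a zero-mean Gaussian process interpolant of the price as a function of the volatility. The squared-exponential kernel, its length scale, the jitter term, and all market parameters are illustrative assumptions and not part of the topic description.

```python
import math

def bs_call(s, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call on one asset."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return s * cdf(d1) - k * math.exp(-r * t) * cdf(d2)

def rbf(a, b, length=0.15):
    """Squared-exponential kernel (an assumed covariance function)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_gp(xs, ys, jitter=1e-10):
    """Return a function evaluating the GP posterior mean (zero prior mean)."""
    k = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    w = solve(k, ys)
    return lambda x: sum(wi * rbf(x, xi) for wi, xi in zip(w, xs))

# Training data: prices at six volatilities between 0.1 and 0.6.
sigmas = [0.1 + 0.1 * i for i in range(6)]
prices = [bs_call(100.0, 100.0, 0.02, s, 1.0) for s in sigmas]
surrogate = fit_gp(sigmas, prices)

# Compare surrogate and exact price at a volatility not used for training.
error = abs(surrogate(0.35) - bs_call(100.0, 100.0, 0.02, 0.35, 1.0))
```

In the multi-asset setting the closed-form function would be replaced by samples from a numerical solver, and the scalar input by a vector of model parameters.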

Literature (suggestions):

Santner, T.J. and Williams, B.J. and Notz, W. I.: The Design and Analysis of Computer Experiments, Springer, New York Berlin Heidelberg, 2003

Espen Benth, F.: Option Theory with Stochastic Analysis: An Introduction to Mathematical Finance, Springer, Berlin, Heidelberg, 2004

Prerequisites: MM533, ST521 (IMADA)

(This topic is related to finance; in particular the topic derivatives)

 

7.2.a Network dynamics

Networks such as the Internet, the World Wide Web, and social and biological networks permeate our modern societies. Much recent research on networks takes a dynamical systems view, in which the vertices of a graph represent discrete dynamical entities with their own rules of behavior, and the edges represent direct interactions between the entities. Such networks, for instance epidemic networks in which a virus spreads or economic networks that host capital flows, have not only topological properties but also dynamical properties.

The aim of this project is to introduce the student to concepts and methods for the analysis of networks and their dynamical properties. The project covers the following topics:

  • Graph theory: basic concepts, theorems and algorithms regarding network flows;
  • Markov Chains: basic concepts and results;
  • Linear dynamic systems on networks;
  • Random graphs: basic concepts and theorems;
  • Application of network dynamics in epidemics or economics.
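
To illustrate how Markov chains and linear dynamics on networks fit together, the sketch below iterates the linear system x(t+1) = x(t) P, where P is the transition matrix of a simple random walk on a small undirected graph. The graph (a triangle with one pendant vertex) is an assumed toy example. For a connected, non-bipartite graph the iteration converges to the stationary distribution, which is proportional to the vertex degrees.

```python
def transition_matrix(n, edges):
    """Row-stochastic matrix of the simple random walk on an undirected graph."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return [[a / sum(row) for a in row] for row in adj]

def step(x, p):
    """One step of the linear dynamics x -> x P (x is a row vector)."""
    n = len(x)
    return [sum(x[i] * p[i][j] for i in range(n)) for j in range(n)]

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
p = transition_matrix(4, edges)

x = [1.0, 0.0, 0.0, 0.0]  # all probability mass starts on vertex 0
for _ in range(200):
    x = step(x, p)

# Degrees are 2, 2, 3, 1 (sum 8), so the limit is degree / 8.
stationary = [2 / 8, 2 / 8, 3 / 8, 1 / 8]
```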

Literature (suggestions):

R. Diestel, Graph Theory, 4th edition.

M. Newman, A. Barabasi and D. J. Watts, The Structure and Dynamics of Networks.

O. Haeggstroem, Finite Markov Chains and Algorithmic Applications.

S. I. Resnick, Adventures in stochastic processes.

 

7.3.a Computational Optimization
Projects in computational optimization deal with real-life applications in planning, scheduling and logistics. They focus on mathematical modeling, implementation of the model in a programming language learned during the bachelor programme, experimentation, analysis of results and technical writing. If existing solvers are not enough to solve the problem at hand, the implementation part will expand to the development of advanced, ad hoc techniques, as is often the case for difficult optimization problems. Examples of advanced techniques are column generation, branch and cut, and metaheuristics.
Specific industrial cases may arise in connection with an industrial partner at the time of applying for the bachelor project. They can also be brought by the student.

Recent cases included the optimization of:

  • Production scheduling in manufacturing
  • Routing in distribution and services
  • Course/exam timetabling
  • Train timetabling and bus line planning
  • Revenue management in the airline sector
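
As a small illustration of the modeling and implementation steps, consider a deliberately simple production scheduling problem (an assumed toy case, not one of the industrial cases above): jobs with processing times and weights are run one after another on a single machine, and the sum of weighted completion times is minimized. The sketch uses descent over adjacent swaps, the basic building block of many metaheuristics, and compares it with the ordering by processing time divided by weight (Smith's rule), which is optimal for this particular problem. The job data are made up.

```python
def weighted_completion(order, proc, weight):
    """Sum of weight * completion time for jobs run back to back in `order`."""
    t, total = 0, 0
    for j in order:
        t += proc[j]
        total += weight[j] * t
    return total

def local_search(order, proc, weight):
    """First-improvement descent over swaps of adjacent jobs."""
    order = list(order)
    best = weighted_completion(order, proc, weight)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            order[i], order[i + 1] = order[i + 1], order[i]
            cost = weighted_completion(order, proc, weight)
            if cost < best:
                best, improved = cost, True
            else:
                # The swap did not help, so undo it.
                order[i], order[i + 1] = order[i + 1], order[i]
    return order, best

proc = [4, 2, 7, 1, 5]    # processing times (made-up data)
weight = [3, 1, 6, 2, 2]  # job weights (made-up data)
start = list(range(len(proc)))

order, cost = local_search(start, proc, weight)

# Reference solution: sort by processing time / weight (Smith's rule).
wspt = sorted(start, key=lambda j: proc[j] / weight[j])
wspt_cost = weighted_completion(wspt, proc, weight)
```

For harder problems such as routing or timetabling, no such simple optimal rule exists, and the neighborhood, acceptance criterion and stopping rule become design choices to be studied experimentally.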

Prerequisites for these projects are the courses:

  • DM545 Integer and Linear Programming
  • DM550 Introduction to Programming
  • DM507 Algorithms and Data Structures

Further possible projects may involve self-study of complementary topics such as machine learning techniques (e.g., linear and logistic regression and neural networks), visualization techniques, computational social choice, and other programming languages. Examples are:

  • Credit risk analysis: statistical models to predict the defaults in credit requests
  • Computational social choice: how hard is it to determine the winner of a certain election system or how traditional optimization problems can be formulated with an objective function that takes into account not only a utilitarian but also a social welfare point of view.
  • Crowd intelligence: using the opinions of experts to improve performance in automatic prediction systems.
  • Visualization methods: how to efficiently and effectively visualize data such that we can make sense of them.

 

7.4.a Extreme value statistics and reinsurance pricing
Extreme value statistics deals with modelling extreme events, that is, events with a low frequency of occurrence but a high impact. At the theoretical level this means that we focus on studying the tails of distribution functions, while for practical data analysis the focus is on the largest observations in a random sample. Insurance claim data typically exhibit heavy-tailed behaviour, which makes the class of Pareto-type distributions a good candidate for modelling this type of data. The project aims to study this class of heavy-tailed distributions and apply it to a dataset of car insurance claims. Insurance companies typically protect themselves against large claims by entering into re-insurance contracts with re-insurance companies. In such contracts the re-insurer intervenes if a claim exceeds the level specified in the re-insurance contract and pays the excess over that level. Re-insurers only intervene for extreme claims, and hence extreme value statistics is a crucial tool for them in order to obtain an accurate description of the upper tail of the claim size distribution. The project will also focus on pricing re-insurance contracts based on extreme value statistics.
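
To make the pricing idea concrete, here is a minimal sketch under a strong simplifying assumption: claim sizes follow an exact Pareto distribution with P(X > x) = (x / x_m)^(-alpha) for x >= x_m and alpha > 1. The expected amount the re-insurer pays per claim above a retention level R >= x_m is then E[max(X - R, 0)] = x_m^alpha * R^(1 - alpha) / (alpha - 1). The function names and parameter values are hypothetical, and the Monte Carlo run is only a plausibility check of the formula.

```python
import random

def xl_pure_premium(alpha, x_m, retention):
    """Expected excess over `retention` per claim for exact Pareto claims."""
    if alpha <= 1.0 or retention < x_m:
        raise ValueError("requires alpha > 1 and retention >= x_m")
    return x_m**alpha * retention ** (1.0 - alpha) / (alpha - 1.0)

def simulated_premium(alpha, x_m, retention, n, seed=0):
    """Monte Carlo estimate of the same quantity via inverse-CDF sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # 1 - U lies in (0, 1], so the power below is always finite.
        claim = x_m * (1.0 - rng.random()) ** (-1.0 / alpha)
        total += max(claim - retention, 0.0)
    return total / n

exact = xl_pure_premium(alpha=3.0, x_m=1.0, retention=2.0)
approx = simulated_premium(alpha=3.0, x_m=1.0, retention=2.0, n=100_000)
```

For real claim data the tail is only approximately Pareto, so alpha has to be estimated from the largest observations.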

7.4.b Estimation of small tail probabilities with application to insurance

This project is on the estimation of a small tail probability of a random variable, say X, based on a random sample X_1,…,X_n. To estimate P(X>x) we can obviously use a simple empirical estimator like #{i:X_i>x}/n. This estimator works well when x is in the range of the data, but for x outside (above) the range of X_1,…,X_n, it produces a trivial estimate of zero. In such situations extreme value theory becomes useful, as it allows extrapolation outside the data range. In the project we focus on estimation of P(X>x) where x is large, for the class of heavy-tailed or Pareto-type distributions. This class of distributions is very relevant for modelling non-life insurance claims, which typically show heavy-tailed behavior. An estimator for P(X>x) will be studied theoretically and illustrated on insurance claim data.
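
The contrast between the empirical estimator and an extrapolating one can be sketched as follows. The code uses the Hill estimator of the tail index together with a Weissman-type tail probability estimator, (k/n) * (x / X_{n-k,n})^(-1/gamma), which is one common choice and not necessarily the estimator studied in the project. To keep the example reproducible, the "sample" is an assumed deterministic set of Pareto quantiles with P(X > x) = x^(-2), and the choice k = 100 is arbitrary.

```python
import math

def empirical_tail(sample, x):
    """Empirical estimator #{i : X_i > x} / n."""
    return sum(1 for v in sample if v > x) / len(sample)

def hill(sample, k):
    """Hill estimator of the tail index based on the k largest observations."""
    xs = sorted(sample)
    threshold = xs[len(xs) - k - 1]  # order statistic X_{n-k,n}
    return sum(math.log(v / threshold) for v in xs[len(xs) - k:]) / k

def weissman_tail(sample, k, x):
    """Weissman-type estimator of P(X > x) for x above the threshold."""
    xs = sorted(sample)
    n = len(xs)
    threshold = xs[n - k - 1]
    gamma = hill(sample, k)
    return (k / n) * (x / threshold) ** (-1.0 / gamma)

# Deterministic Pareto quantiles with alpha = 2 (true tail index 1/2).
n, alpha = 1000, 2.0
sample = [(1.0 - (i - 0.5) / n) ** (-1.0 / alpha) for i in range(1, n + 1)]

x = 100.0                                  # lies above the sample maximum
p_emp = empirical_tail(sample, x)          # trivial estimate
p_evt = weissman_tail(sample, k=100, x=x)  # extrapolated estimate
gamma_hat = hill(sample, k=100)
```

Here the true value is P(X > 100) = 100^(-2) = 0.0001, which the empirical estimator cannot detect because no observation exceeds 100.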

Literature:
Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J.L., 2004. Statistics of extremes. Wiley.
de Haan, L., Ferreira, A., 2006. Extreme value theory. Springer.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling extremal events for insurance and finance. Springer. 
Mikosch, T., 2009. Non-life insurance mathematics. Springer.

7.4.c Covariance matrix estimation

Covariance matrices play a key role in multivariate statistics. The estimation of a covariance matrix is an important part of principal component analysis, discriminant analysis, inferring a graphical model structure, portfolio selection and other problems. The natural estimator of the covariance matrix is the sample covariance matrix. It is of great interest to understand the behaviour of the sample covariance matrix and also the behaviour of its eigenvalues and eigenvectors. The classical results for the sample covariance matrix are established in the low-dimensional setting under the assumption of Gaussianity. Nowadays, high-dimensional data sets, where the number of features is larger or even much larger than the number of observations, are very common. For such data the sample covariance matrix is a poor estimator, and a number of alternative approaches have been proposed, such as shrinkage methods or assuming that the covariance matrix has a certain structure (for example, low rank plus sparse). The goal of this project is to study covariance matrix estimation in the low-dimensional setting and the high-dimensional setting from a theoretical point of view, as well as to illustrate the performance of certain covariance estimation approaches using simulated and real data (for example, in the portfolio selection problem).
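
A minimal sketch of why shrinkage helps when there are more features than observations: with n = 2 observations of p = 3 features, the sample covariance matrix is singular, while a convex combination with a scaled identity matrix is positive definite. The shrinkage intensity delta = 0.5 is fixed by hand here as an assumption; choosing it from the data, as in the Ledoit and Wolf reference, is not shown. The data are made up.

```python
def sample_covariance(data):
    """Unbiased sample covariance matrix of the rows of `data`."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
             / (n - 1) for j in range(p)] for i in range(p)]

def shrink_to_identity(s, delta):
    """Linear shrinkage (1 - delta) * S + delta * mu * I, mu = trace(S) / p."""
    p = len(s)
    mu = sum(s[i][i] for i in range(p)) / p
    return [[(1.0 - delta) * s[i][j] + (delta * mu if i == j else 0.0)
             for j in range(p)] for i in range(p)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

data = [[1.0, 2.0, 3.0],
        [2.0, 4.0, 1.0]]  # 2 observations, 3 features (made-up data)

s = sample_covariance(data)
shrunk = shrink_to_identity(s, delta=0.5)

singular = det3(s)         # zero: S has rank at most n - 1 = 1
regular = det3(shrunk)     # strictly positive after shrinkage
```

The shrinkage target keeps the trace of the matrix unchanged, so the total variance is preserved while the eigenvalues are pulled towards their mean.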

References:
T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Statistics. Wiley-Interscience, Third Edition, 2003.
Clifford Lam. High-dimensional covariance matrix estimation. WIREs Computational Statistics, 12(2):e1485, 2020.
Olivier Ledoit and Michael Wolf. Honey, I shrunk the sample covariance matrix. The Journal of Portfolio Management, 30(4):110–119, 2004.
H. Neudecker and A.M. Wesselman. The asymptotic variance matrix of the sample correlation matrix. Linear Algebra and its Applications, 127:589–599, 1990.

 

7.4.d Modelling dependence with copula functions

Copula functions play an important role in statistics. A copula function is a joint distribution function with uniform [0,1] marginal distribution functions. Besides having many interesting theoretical properties, they are also practically very relevant for modelling multivariate data. This is mainly due to Sklar’s theorem which states that for every joint distribution function F with marginal distribution functions F_1 and F_2 there exists a copula function C such that F(y_1,y_2)=C(F_1(y_1),F_2(y_2)) for all real y_1, y_2. This entails that for every joint distribution function F we can disentangle the marginal behaviour given by F_1 and F_2, and the dependence structure given by C. In the project we will study copula functions and their properties in general, and also look at important classes of copulas. The theory studied will be applied to the bivariate Loss-ALAE insurance dataset from Frees and Valdez (1998), where we will also illustrate how copulas can be used for reinsurance pricing. 
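
The construction in Sklar's theorem can be sketched in a few lines. The example combines two exponential marginal distribution functions with a Clayton copula, C(u,v) = (u^(-theta) + v^(-theta) - 1)^(-1/theta), to obtain a joint distribution function F(y_1,y_2) = C(F_1(y_1),F_2(y_2)). The Clayton family, the value theta = 2 and the exponential rates are assumptions chosen for illustration and are not fitted to the Loss-ALAE data.

```python
import math

def clayton(u, v, theta=2.0):
    """Clayton copula with parameter theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def exp_cdf(y, rate):
    """Distribution function of the exponential distribution."""
    return 1.0 - math.exp(-rate * y) if y > 0.0 else 0.0

def joint_cdf(y1, y2):
    """Joint distribution function built from the marginals and the copula."""
    return clayton(exp_cdf(y1, 1.0), exp_cdf(y2, 0.5))

# Uniform marginals: C(u, 1) = u and C(1, v) = v.
check_u = clayton(0.3, 1.0)
check_v = clayton(1.0, 0.7)

# Letting y2 grow recovers the first marginal: F(y1, infinity) = F_1(y1).
marginal = joint_cdf(2.0, 1e9)

# Positive dependence: C(0.5, 0.5) lies between 0.25 (independence) and 0.5.
middle = clayton(0.5, 0.5)
```

Replacing `clayton` by another copula changes the dependence structure while leaving both marginal distributions untouched.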

Literature:
Albrecher, H., Beirlant, J., Teugels, J., 2017. Reinsurance - actuarial and statistical aspects. Wiley.  
Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., 2004. Statistics of extremes. Wiley.
Cebrian, A., Denuit, M., Lambert, P., 2003. Analysis of bivariate tail dependence using extreme value copulas: an application to the SOA medical large claims database. Belgian Actuarial Bulletin, 3, 33-41.

 

7.5.a Lattice simulations of pandemics

Epidemics are typically modeled in terms of differential equations, which assume that the various states of individuals are uniformly distributed in space. In order to take into account correlations and clustering, lattice-based models have been introduced. Grassberger considered synchronous (cellular automaton) versions of such models and showed that the epidemic growth goes through a critical behavior: transmission remains local when the infection rate is below a critical value, and spreads throughout the system (i.e., it percolates) when it is above that value.

 

Critical values and exponents can be determined through numerical simulations, where the effect of containment strategies can also be estimated.
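
A minimal sketch of such a simulation, assuming a square lattice with nearest-neighbour contacts: in each synchronous step every infected site infects each susceptible neighbour independently with probability p and then becomes removed. The lattice size, the single central seed and the function name are illustrative choices, and containment strategies are not modeled.

```python
import random

def simulate_epidemic(size, p, seed=0):
    """Return the number of sites ever infected on a size x size lattice."""
    rng = random.Random(seed)
    susceptible, infected_state, removed = 0, 1, 2
    state = [[susceptible] * size for _ in range(size)]
    centre = size // 2
    state[centre][centre] = infected_state
    infected = [(centre, centre)]
    total = 1
    while infected:
        newly = []
        for r, c in infected:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < size and 0 <= nc < size
                if inside and state[nr][nc] == susceptible:
                    if rng.random() < p:
                        state[nr][nc] = infected_state
                        newly.append((nr, nc))
        # Sites infected in the previous step are removed simultaneously.
        for r, c in infected:
            state[r][c] = removed
        infected = newly
        total += len(newly)
    return total

no_spread = simulate_epidemic(21, 0.0)    # only the seed is infected
full_spread = simulate_epidemic(21, 1.0)  # every site is reached
outbreak_sizes = [simulate_epidemic(21, 0.5, seed=s) for s in range(20)]
```

Repeating the simulation over a grid of p values and lattice sizes, and recording how often the outbreak reaches the boundary, is one way to locate the critical value numerically.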

Ref: Grassberger, Peter (1983). "On the critical behavior of the general epidemic process and dynamical percolation". Mathematical Biosciences. 63 (2): 157–172

 

Last updated: 22.02.2024