Example : emhawkes package

Basic Hawkes model

Univariate Hawkes process

library(emhawkes)

This subsection outlines the steps for constructing, running simulations, and estimating a univariate Hawkes model. To begin, create an hspec object, which defines the Hawkes model. The S4 class hspec contains slots for the model parameters: mu, alpha, beta, dimens, rmark, and impact.

In a univariate model, the basic parameters of the model—mu, alpha, and beta—can be given as numeric values. If numeric values are provided, they will be converted to matrices. Below is an example of a univariate Hawkes model without a mark.

mu1 <- 0.3; alpha1 <- 1.2; beta1 <- 1.5
hspec1 <- new("hspec", mu = mu1, alpha = alpha1, beta = beta1)
show(hspec1)
#> An object of class "hspec" of 1-dimensional Hawkes process
#> 
#> Slot mu: 
#>      [,1]
#> [1,]  0.3
#> 
#> Slot alpha: 
#>      [,1]
#> [1,]  1.2
#> 
#> Slot beta: 
#>      [,1]
#> [1,]  1.5

The function hsim implements simulation where the input arguments are hspec, size, and the initial values of the intensity component process, lambda_component0, and the initial values of the Hawkes processes, N0. More precisely, the intensity process of the basic univariate Hawkes model is represented by

\[ \lambda(t) = \mu + \int_{-\infty}^t \alpha e^{-\beta (t-s)} d N(s) = \mu + \lambda_c(0) e^{-\beta t} + \int_0^t \alpha e^{-\beta (t-s)} d N(s) \]

where the lambda_component0 denotes

\[ \lambda_c(0) = \int_{-\infty}^0 \alpha e^{\beta s} d N(s). \]

If lambda_component0 is not provided, the internally determined initial values for the intensity process are used. If size is sufficiently large, the exact value of lambda_component0 may not be important. The default initial value of the counting process, N0, is zero.

set.seed(1107)
res1 <- hsim(hspec1, size = 1000)
summary(res1)
#> -------------------------------------------------------
#> Simulation result of exponential (marked) Hawkes model.
#> Realized path :
#>        arrival N1 mark lambda1
#>  [1,]  0.00000  0    0 0.90000
#>  [2,]  0.97794  1    1 0.43838
#>  [3,]  1.09001  2    1 1.43128
#>  [4,]  1.28999  3    1 2.02711
#>  [5,]  1.53225  4    1 2.33527
#>  [6,]  1.65001  5    1 3.01139
#>  [7,]  2.51807  6    1 1.36377
#>  [8,]  2.81710  7    1 1.74553
#>  [9,]  2.87547  8    1 2.72378
#> [10,]  3.16415  9    1 2.65016
#> [11,]  3.51378 10    1 2.40131
#> [12,]  4.22355 11    1 1.43843
#> [13,] 16.96752 12    1 0.30000
#> [14,] 17.71654 13    1 0.69015
#> [15,] 19.10293 14    1 0.49874
#> [16,] 24.06354 15    1 0.30082
#> [17,] 24.09256 16    1 1.44967
#> [18,] 28.40173 17    1 0.30366
#> [19,] 28.53743 18    1 1.28198
#> [20,] 28.56702 19    1 2.38725
#> ... with 980 more rows 
#> -------------------------------------------------------

The results of hsim is an S3 class hreal, which consists of hspec, inter_arrival, arrival, type, mark, N, Nc, lambda, lambda_component, rambda, rambda_component.

hspec is the model specification.
inter_arrival is the inter-arrival time of every event.
arrival is the cumulative sum of inter_arrival.
type is the type of events, i.e., $i$ for $N_i$, and is used for a multivariate model.
mark is a numeric vector that represents additional information for the event.
lambda represents $\lambda$, which is the left continuous and right limit version.
The right continuous version of intensity is rambda.
lambda_component represents $\lambda_{ij}$, and rambda_component is the right continuous version.

inter_arrival, type, mark, N, and Nc start at zero. Using the summary() function, one can print the first 20 elements of arrival, N, and lambda. The print() function can also be used.

By definition, we have lambda == mu + lambda_component.

# first and third columns are the same
cbind(res1$lambda[1:5], res1$lambda_component[1:5], mu1 + res1$lambda_component[1:5])
#>          [,1]     [,2]     [,3]
#> [1,] 0.900000 0.600000 0.900000
#> [2,] 0.438383 0.138383 0.438383
#> [3,] 1.431282 1.131282 1.431282
#> [4,] 2.027111 1.727111 2.027111
#> [5,] 2.335269 2.035269 2.335269

For all rows except the first, rambda equals lambda + alpha in this model.

# second and third columns are the same
cbind(res1$lambda[1:5], res1$rambda[1:5], res1$lambda[1:5] + alpha1)
#>          [,1]     [,2]     [,3]
#> [1,] 0.900000 0.900000 2.100000
#> [2,] 0.438383 1.638383 1.638383
#> [3,] 1.431282 2.631282 2.631282
#> [4,] 2.027111 3.227111 3.227111
#> [5,] 2.335269 3.535269 3.535269

Additionally, verify that the exponential decay is accurately represented in the model.

# By definition, the following two are equal:
res1$lambda[2:6]
#> [1] 0.438383 1.431282 2.027111 2.335269 3.011391
mu1 + (res1$rambda[1:5] - mu1) * exp(-beta1 * res1$inter_arrival[2:6])
#> [1] 0.438383 1.431282 2.027111 2.335269 3.011391

The log-likelihood function is calculated using the logLik method. In this context, the inter-arrival times and hspec are provided as inputs to the function.

logLik(hspec1, inter_arrival = res1$inter_arrival)
#> The initial values for intensity processes are not provided. Internally determined initial values are used.
#> loglikelihood 
#>     -214.2385

The likelihood estimation is performed using the hfit function. The specification of the initial parameter values, hspec0, is required. Note that only inter_arrival is needed for this univariate model. For more accurate simulation, it is recommended to specify lambda0, the initial value of the lambda component. If lambda0 is not provided, the function uses internally determined initial values. By default, the BFGS method is employed for numerical optimization.

# initial value for numerical optimization
mu0 <- 0.5; alpha0 <- 1.0; beta0 <- 1.8
hspec0 <- new("hspec", mu = mu0, alpha = alpha0, beta = beta0)
# the intial values are provided through hspec
mle <- hfit(hspec0, inter_arrival = res1$inter_arrival)
summary(mle)
#> --------------------------------------------
#> Maximum Likelihood estimation
#> BFGS maximization, 24 iterations
#> Return code 0: successful convergence 
#> Log-Likelihood: -213.4658 
#> 3  free parameters
#> Estimates:
#>        Estimate Std. error t value Pr(> t)    
#> mu1     0.33641    0.03475   9.682  <2e-16 ***
#> alpha1  1.16654    0.09608  12.141  <2e-16 ***
#> beta1   1.52270    0.12468  12.213  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------

Bivariate Hawkes model

The intensity process of a basic bivariate Hawkes model is defined by

\[ \lambda_1(t) = \mu_1 + \int_{-\infty}^t \alpha_{11} e^{-\beta_{11}(t-s)} d N_1(s) + \int_{-\infty}^t \alpha_{12} e^{-\beta_{12}(t-s)} d N_2(s), \]

\[ \lambda_2(t) = \mu_2 + \int_{-\infty}^t \alpha_{21} e^{-\beta_{21}(t-s)} d N_1(s) + \int_{-\infty}^t \alpha_{22} e^{-\beta_{22}(t-s)} d N_2(s). \]

In a bivariate model, the parameters within the slots of hspec are matrices. Specifically, mu is a 2-by-1 matrix, while alpha and beta are 2-by-2 matrices.

\[ \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \alpha = \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix} \]

rmark is a random number generating function for marks and is not used in non-mark models. lambda_component0 is a 2-by-2 matrix that represents the initial values of lambda_component, which includes the set of values lambda11, lambda12, lambda21, and lambda22. The intensity processes are represented by

\[ \lambda_1(t) = \mu_1 + \lambda_{11}(t) + \lambda_{12}(t), \]

\[ \lambda_2(t) = \mu_2 + \lambda_{21}(t) + \lambda_{22}(t). \]

The terms $\lambda_{ij}$ are referred to as lambda components, and lambda0 represents $_{ij}(0). The parameterlambda_component0` can be omitted in this model, in which case internally determined initial values will be used.

mu2 <- matrix(c(0.2), nrow = 2)
alpha2 <- matrix(c(0.5, 0.9, 0.9, 0.5), nrow = 2, byrow = TRUE)
beta2 <- matrix(c(2.25, 2.25, 2.25, 2.25), nrow = 2, byrow = TRUE)
hspec2 <- new("hspec", mu=mu2, alpha=alpha2, beta=beta2)
print(hspec2)
#> An object of class "hspec" of 2-dimensional Hawkes process
#> 
#> Slot mu: 
#>      [,1]
#> [1,]  0.2
#> [2,]  0.2
#> 
#> Slot alpha: 
#>      [,1] [,2]
#> [1,]  0.5  0.9
#> [2,]  0.9  0.5
#> 
#> Slot beta: 
#>      [,1] [,2]
#> [1,] 2.25 2.25
#> [2,] 2.25 2.25

To perform a simulation, use the hsim function.

set.seed(1107)
res2 <- hsim(hspec2,  size=1000)
summary(res2)
#> -------------------------------------------------------
#> Simulation result of exponential (marked) Hawkes model.
#> Realized path :
#>       arrival N1 N2 mark lambda1 lambda2
#>  [1,] 0.00000  0  0    0 0.52941 0.52941
#>  [2,] 0.57028  1  0    1 0.29130 0.29130
#>  [3,] 1.66175  1  1    1 0.25073 0.28505
#>  [4,] 2.17979  1  2    1 0.49638 0.38238
#>  [5,] 2.47685  1  3    1 0.81319 0.54975
#>  [6,] 2.64001  2  3    1 1.24825 0.78866
#>  [7,] 2.70249  3  3    1 1.54519 1.49341
#>  [8,] 2.94547  4  3    1 1.26810 1.46968
#>  [9,] 3.39313  4  4    1 0.77271 0.99242
#> [10,] 3.52533  4  5    1 1.29379 1.15989
#> [11,] 3.56971  5  5    1 2.00432 1.52115
#> [12,] 3.70761  5  6    1 1.88965 1.82866
#> [13,] 4.30122  5  7    1 0.88106 0.75983
#> [14,] 4.34337  6  7    1 1.63800 1.16393
#> [15,] 4.40222  7  7    1 1.89764 1.83275
#> [16,] 4.58943  8  7    1 1.64219 1.86211
#> [17,] 5.14665  9  7    1 0.75437 0.93131
#> [18,] 5.18186  9  8    1 1.17407 1.70707
#> [19,] 5.36167  9  9    1 1.45050 1.53925
#> [20,] 5.89118 10  9    1 0.85331 0.75875
#> ... with 980 more rows 
#> -------------------------------------------------------

In multivariate models, type is crucial as it represents the type of event.

# Under bi-variate model, there are two types, 1 or 2.
res2$type[1:10]
#>  [1] 0 1 2 2 2 1 1 1 2 2

In multivariate models, the column names of N are N1, N2, N3, and so on.

res2$N[1:3, ]
#>      N1 N2
#> [1,]  0  0
#> [2,]  1  0
#> [3,]  1  1

Similarly, the column names of lambda are lambda1, lambda2, lambda3, and so on.

res2$lambda[1:3, ]
#>        lambda1   lambda2
#> [1,] 0.5294118 0.5294118
#> [2,] 0.2913028 0.2913028
#> [3,] 0.2507301 0.2850475

The column names of lambda_component are lambda_component11, lambda_component12, lambda_component13, and so on.

res2$lambda_component[1:3, ]
#>        lambda11    lambda12   lambda21    lambda22
#> [1,] 0.11764706 0.211764706 0.21176471 0.117647059
#> [2,] 0.03260813 0.058694641 0.05869464 0.032608134
#> [3,] 0.04569443 0.005035631 0.08224997 0.002797573

By definition, the following two expressions are equivalent:

mu2[1] + rowSums(res2$lambda_component[1:5, c("lambda11", "lambda12")])
#> [1] 0.5294118 0.2913028 0.2507301 0.4963769 0.8131889
res2$lambda[1:5, "lambda1"]
#> [1] 0.5294118 0.2913028 0.2507301 0.4963769 0.8131889

From the results, we obtain vectors of realized inter_arrival and type. A bivariate model requires both inter_arrival and type for estimation.

inter_arrival2 <- res2$inter_arrival
type2 <- res2$type

The log-likelihood is computed using the logLik function.

logLik(hspec2, inter_arrival = inter_arrival2, type = type2)
#> The initial values for intensity processes are not provided. Internally determined initial values are used.
#> loglikelihood 
#>     -974.2809

Maximum log-likelihood estimation is performed using the hfit function. In this process, the parameter values in hspec0, such as mu, alpha, and beta, serve as starting points for the numerical optimization. For illustration purposes, we set hspec0 <- hspec2. Since the true parameter values are unknown in practical applications, these initial guesses are used. The realized inter_arrival and type data are utilized for estimation.

hspec0 <- hspec2
mle <- hfit(hspec0, inter_arrival = inter_arrival2, type = type2)
summary(mle)
#> --------------------------------------------
#> Maximum Likelihood estimation
#> BFGS maximization, 36 iterations
#> Return code 0: successful convergence 
#> Log-Likelihood: -970.1408 
#> 4  free parameters
#> Estimates:
#>          Estimate Std. error t value  Pr(> t)    
#> mu1       0.19095    0.01636  11.671  < 2e-16 ***
#> alpha1.1  0.48217    0.07405   6.511 7.45e-11 ***
#> alpha1.2  0.98625    0.09495  10.387  < 2e-16 ***
#> beta1.1   2.07987    0.16952  12.269  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------

coef(mle)
#>       mu1  alpha1.1  alpha1.2   beta1.1 
#> 0.1909541 0.4821725 0.9862542 2.0798691

miscTools::stdEr(mle)
#>        mu1   alpha1.1   alpha1.2    beta1.1 
#> 0.01636127 0.07405118 0.09495461 0.16952428

- Basic Hawkes model