The IWLS algorithm used to fit conditional logit models

The package “mclogit” fits conditional logit models by maximum likelihood. It maximizes the log-likelihood function using an iterative weighted least-squares (IWLS) algorithm that follows the one used by the glm.fit() function from the “stats” package of R (Nelder and Wedderburn 1972; McCullagh and Nelder 1989; R Core Team 2023).

If $\pi_{ij}$ is the probability that individual $i$ chooses alternative $j$ from his/her choice set $\mathcal{S}_i$, where

$$ \pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})} $$

and if $y_{ij}$ is the dummy variable that equals 1 if individual $i$ chooses alternative $j$ and equals 0 otherwise, the log-likelihood function (given that the choices are independent across individuals, conditional on the $\pi_{ij}$) can be written as

$$ \ell = \sum_{i,j} y_{ij}\ln\pi_{ij} = \sum_{i,j} y_{ij}\eta_{ij} - \sum_i \ln\left(\sum_j\exp(\eta_{ij})\right) $$

If the data are aggregated in terms of counts, such that $n_{ij}$ is the number of individuals that share the same choice set and the same choice probabilities $\pi_{ij}$ and have chosen alternative $j$, the log-likelihood is (again assuming independent choices given the $\pi_{ij}$)

$$ \ell = \sum_{i,j} n_{ij}\ln\pi_{ij} = \sum_{i,j} n_{ij}\eta_{ij} - \sum_i n_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right) $$

where $n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}$.
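
As a concrete illustration, here is a small R sketch (with made-up counts; this is not mclogit's internal code) that computes the choice probabilities and checks that the two forms of the aggregated log-likelihood agree:

```r
## Toy example (assumed data): two choice sets with three alternatives each.
set <- rep(1:2, each = 3)                 # choice-set membership
eta <- c(0.2, 1.1, -0.5, 0.0, 0.3, 0.9)   # linear predictors eta_ij
n   <- c(10, 25, 5, 8, 12, 20)            # counts n_ij

## pi_ij = exp(eta_ij) / sum_{k in S_i} exp(eta_ik), computed within each
## choice set (subtracting the maximum for numerical stability)
pr <- ave(eta, set, FUN = function(e) exp(e - max(e)) / sum(exp(e - max(e))))

## Both forms of the log-likelihood coincide:
ll1 <- sum(n * log(pr))
ll2 <- sum(n * eta) -
  sum(tapply(n, set, sum) * log(tapply(exp(eta), set, sum)))
all.equal(ll1, ll2)  # TRUE
```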

If

$$ \eta_{ij} = \alpha_1 x_{1ij} + \cdots + \alpha_r x_{rij} = \boldsymbol{x}_{ij}'\boldsymbol{\alpha} $$

then the gradient of the log-likelihood with respect to the coefficient vector α is

$$ \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \sum_{i,j} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial\ell}{\partial\eta_{ij}} = \sum_{i,j} \boldsymbol{x}_{ij} (n_{ij}-n_{i+}\pi_{ij}) = \sum_{i,j} \boldsymbol{x}_{ij} n_{i+} (y_{ij}-\pi_{ij}) = \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) $$

and the Hessian is

$$ \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\,\partial\boldsymbol{\alpha}'} = \sum_{i,j,k} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial^2\ell}{\partial\eta_{ij}\,\partial\eta_{ik}} \frac{\partial\eta_{ik}}{\partial\boldsymbol{\alpha}'} = - \sum_{i,j,k} \boldsymbol{x}_{ij}\, n_{i+}\pi_{ij}(\delta_{jk}-\pi_{ik})\, \boldsymbol{x}_{ik}' = - \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} $$

Here $y_{ij}=n_{ij}/n_{i+}$, $\boldsymbol{N}$ is a diagonal matrix with diagonal elements $n_{i+}$, and $\boldsymbol{W}$ is block-diagonal with blocks $\boldsymbol{W}_i=n_{i+}\left(\operatorname{diag}(\boldsymbol{\pi}_i)-\boldsymbol{\pi}_i\boldsymbol{\pi}_i'\right)$, where $\boldsymbol{\pi}_i$ collects the probabilities $\pi_{ij}$, $j\in\mathcal{S}_i$.
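
The middle expression of the Hessian relies on the second derivatives of $\ell$ with respect to the linear predictors; these follow from differentiating the choice probabilities, a step worth making explicit:

$$ \frac{\partial\pi_{ij}}{\partial\eta_{ik}} = \pi_{ij}(\delta_{jk}-\pi_{ik}) \qquad\text{so that}\qquad \frac{\partial^2\ell}{\partial\eta_{ij}\,\partial\eta_{ik}} = -n_{i+}\frac{\partial\pi_{ij}}{\partial\eta_{ik}} = -n_{i+}\pi_{ij}(\delta_{jk}-\pi_{ik}) $$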

Newton-Raphson iterations then take the form

$$ \boldsymbol{\alpha}^{(s+1)} = \boldsymbol{\alpha}^{(s)} - \left( \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} \right)^{-1} \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \boldsymbol{\alpha}^{(s)} + \left( \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \right)^{-1} \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) $$

where $\boldsymbol{\pi}$ and $\boldsymbol{W}$ are evaluated at $\boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}$.

Pre-multiplying both sides by $\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}$ gives

$$ \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\boldsymbol{\alpha}^{(s+1)} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\boldsymbol{\alpha}^{(s)} + \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) = \boldsymbol{X}'\boldsymbol{W}\left(\boldsymbol{X}\boldsymbol{\alpha}^{(s)} + \boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\right) = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^* $$

where $\boldsymbol{W}^-$ is a generalized inverse of $\boldsymbol{W}$ and $\boldsymbol{y}^*$ is a “working response vector” with elements

$$ y_{ij}^*=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}} $$
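
To make the equivalence concrete, the following self-contained R sketch (toy data and variable names are assumptions for illustration; this is not mclogit's internal code) performs one update both as a Newton-Raphson step and by solving the weighted least-squares equation with the working response:

```r
## One coefficient update, two ways (toy data as in the earlier sketch)
set <- rep(1:2, each = 3)
X   <- cbind(c(0.2, 1.1, -0.5, 0.0, 0.3, 0.9),   # two covariates x_ij
             c(1, 0, 0, 0, 1, 0))
n   <- c(10, 25, 5, 8, 12, 20)
n_i <- ave(n, set, FUN = sum)                    # n_{i+}
y   <- n / n_i                                   # y_ij = n_ij / n_{i+}

alpha <- c(0, 0)                                 # current value alpha^(s)
eta   <- drop(X %*% alpha)
pr    <- ave(eta, set, FUN = function(e) exp(e - max(e)) / sum(exp(e - max(e))))

## Block-diagonal W with blocks n_{i+} (diag(pi_i) - pi_i pi_i')
W <- matrix(0, length(pr), length(pr))
for (i in unique(set)) {
  idx <- which(set == i)
  W[idx, idx] <- n_i[idx][1] * (diag(pr[idx]) - tcrossprod(pr[idx]))
}

## Newton-Raphson form: alpha + (X'WX)^{-1} X'N(y - pi)
a_newton <- alpha + drop(solve(crossprod(X, W %*% X),
                               crossprod(X, n_i * (y - pr))))

## Working-response form: solve X'WX alpha = X'W y*
ystar <- eta + (y - pr) / pr
a_wls <- drop(solve(crossprod(X, W %*% X), crossprod(X, W %*% ystar)))
all.equal(a_newton, a_wls)  # TRUE: both forms give the same update
```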

The IWLS algorithm thus involves the following steps:

  1. Create some suitable starting values for $\boldsymbol{\pi}$, $\boldsymbol{W}$, and $\boldsymbol{y}^*$

  2. Construct the “working dependent variable” $\boldsymbol{y}^*$

  3. Solve the equation

    $$ \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\boldsymbol{\alpha} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^* $$

    for $\boldsymbol{\alpha}$.

  4. Compute updated $\boldsymbol{\eta}$, $\boldsymbol{\pi}$, $\boldsymbol{W}$, and $\boldsymbol{y}^*$.

  5. Compute the updated value for the log-likelihood or the deviance

    $$ d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}} $$

  6. If the decrease in the deviance (or the increase in the log-likelihood) is smaller than a given tolerance criterion (typically $\Delta d \le 10^{-7}$), stop the algorithm and declare it converged; otherwise go back to step 2 with the updated value of $\boldsymbol{\alpha}$. A bare-bones R version of this loop is sketched below.
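
Putting the steps together, the loop might look as follows. This is a sketch under the toy-data assumptions used above, not mclogit's actual implementation; for simplicity it starts from uniform probabilities rather than the package's count-based starting values described next:

```r
## Bare-bones IWLS loop (sketch; toy data as above, not mclogit's code)
set <- rep(1:2, each = 3)
X   <- cbind(c(0.2, 1.1, -0.5, 0.0, 0.3, 0.9),
             c(1, 0, 0, 0, 1, 0))
n   <- c(10, 25, 5, 8, 12, 20)
n_i <- ave(n, set, FUN = sum)
y   <- n / n_i

softmax_by_set <- function(eta, set)
  ave(eta, set, FUN = function(e) exp(e - max(e)) / sum(exp(e - max(e))))

make_W <- function(pr, set, n_i) {        # block-diagonal weight matrix
  W <- matrix(0, length(pr), length(pr))
  for (i in unique(set)) {
    idx <- which(set == i)
    W[idx, idx] <- n_i[idx][1] * (diag(pr[idx]) - tcrossprod(pr[idx]))
  }
  W
}

eta <- rep(0, length(n))                  # step 1: uniform starting values
dev_old <- Inf
for (s in 1:25) {
  pr    <- softmax_by_set(eta, set)
  ystar <- eta + (y - pr) / pr            # step 2: working dependent variable
  W     <- make_W(pr, set, n_i)
  alpha <- drop(solve(crossprod(X, W %*% X),   # step 3: solve X'WX a = X'W y*
                      crossprod(X, W %*% ystar)))
  eta   <- drop(X %*% alpha)              # step 4: update eta and pi
  pr    <- softmax_by_set(eta, set)
  dev   <- 2 * sum(n * log(y / pr))       # step 5: deviance (all n_ij > 0 here)
  if (dev_old - dev < 1e-7) break         # step 6: convergence check
  dev_old <- dev
}
alpha
```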

The starting values for the algorithm used by the mclogit package are constructed as follows:

  1. Set

    $$ \eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12) - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ik}+\tfrac12) $$

    (where $q_i$ is the size of the choice set $\mathcal{S}_i$)

  2. Compute the starting values of the choice probabilities $\pi_{ij}^{(0)}$ according to the first equation above

  3. Compute initial values of the working dependent variable according to

    $$ y_{ij}^{*(0)} = \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}} $$
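
In R, these three steps might look as follows (a sketch with assumed toy counts, mirroring the construction above; not mclogit's internals):

```r
## Sketch of the count-based starting values (toy counts as above)
set <- rep(1:2, each = 3)
n   <- c(10, 25, 5, 8, 12, 20)
n_i <- ave(n, set, FUN = sum)
y   <- n / n_i

## Step 1: eta^(0)_ij = ln(n_ij + 1/2) minus its mean over the choice set
eta0 <- log(n + 0.5) - ave(log(n + 0.5), set, FUN = mean)

## Step 2: pi^(0) via the softmax formula from the beginning of the section
pr0 <- ave(eta0, set, FUN = function(e) exp(e - max(e)) / sum(exp(e - max(e))))

## Step 3: initial working dependent variable
ystar0 <- eta0 + (y - pr0) / pr0
```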

References

McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. Monographs on Statistics & Applied Probability. Boca Raton et al.: Chapman & Hall/CRC.
Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society. Series A (General) 135 (3): 370–84. https://doi.org/10.2307/2344614.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.