The package “mclogit” fits
conditional logit models by maximum likelihood. It
maximizes the log-likelihood function with an iterative
weighted least-squares (IWLS) algorithm modelled on the one
used by the glm.fit()
function from the “stats”
package of R (Nelder and Wedderburn
1972; McCullagh and Nelder 1989; R Core Team 2023).
If $\pi_{ij}$ is the probability that individual $i$ chooses alternative $j$ from his or her choice set $\mathcal{S}_i$, where
$$ \pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})} $$
and if $y_{ij}$ is a dummy variable that equals 1 if individual $i$ chooses alternative $j$ and 0 otherwise, then the log-likelihood function (assuming that the choices are independent across individuals given the $\pi_{ij}$) can be written as
$$ \ell = \sum_{i,j} y_{ij}\ln\pi_{ij} = \sum_{i,j} y_{ij}\eta_{ij} - \sum_i \ln\Bigl(\sum_j \exp(\eta_{ij})\Bigr) $$
If the data are aggregated into counts, such that $n_{ij}$ is the number of individuals who share the same choice set and the same choice probabilities $\pi_{ij}$ and have chosen alternative $j$, then the log-likelihood (again assuming conditionally independent choices) is
$$ \ell = \sum_{i,j} n_{ij}\ln\pi_{ij} = \sum_{i,j} n_{ij}\eta_{ij} - \sum_i n_{i+}\ln\Bigl(\sum_j \exp(\eta_{ij})\Bigr) $$
where $n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}$.
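The equivalence of the two forms of $\ell$ can be checked numerically. The following minimal Python sketch uses made-up counts and linear predictors for a single choice set with three alternatives:

```python
import math

# Made-up data: one choice set with three alternatives.
# n[j] is the number of individuals choosing alternative j,
# eta[j] the linear predictor of alternative j.
n = [5.0, 3.0, 2.0]
eta = [0.4, -0.1, 0.0]

denom = sum(math.exp(e) for e in eta)
pi = [math.exp(e) / denom for e in eta]      # choice probabilities

n_plus = sum(n)
# First form: sum_j n_j ln(pi_j)
ll_probs = sum(nj * math.log(pj) for nj, pj in zip(n, pi))
# Second form: sum_j n_j eta_j - n_+ ln(sum_j exp(eta_j))
ll_eta = sum(nj * ej for nj, ej in zip(n, eta)) - n_plus * math.log(denom)

assert abs(ll_probs - ll_eta) < 1e-12
```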
If
$$ \eta_{ij} = \alpha_1 x_{1ij} + \cdots + \alpha_r x_{rij} = \boldsymbol{x}_{ij}'\boldsymbol{\alpha} $$
then the gradient of the log-likelihood with respect to the coefficient vector $\boldsymbol{\alpha}$ is
$$ \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \sum_{i,j} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial\ell}{\partial\eta_{ij}} = \sum_{i,j} \boldsymbol{x}_{ij} (n_{ij}-n_{i+}\pi_{ij}) = \sum_{i,j} \boldsymbol{x}_{ij} n_{i+} (y_{ij}-\pi_{ij}) = \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) $$
and the Hessian is
$$ \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} = \sum_{i,j,k} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial\eta_{ik}}{\partial\boldsymbol{\alpha}'} \frac{\partial^2\ell}{\partial\eta_{ij}\partial\eta_{ik}} = - \sum_{i,j,k} \boldsymbol{x}_{ij} n_{i+} \pi_{ij}(\delta_{jk}-\pi_{ik}) \boldsymbol{x}_{ik}' = - \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} $$
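Both expressions can be verified against finite differences of the log-likelihood. The Python sketch below uses made-up data (one choice set, three alternatives, two covariates) and compares the closed-form score and Hessian with numerical approximations:

```python
import math

# Made-up data: one choice set, three alternatives, two covariates.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # rows are x_ij
n = [5.0, 3.0, 2.0]
n_plus = sum(n)
alpha = [0.2, -0.3]

def loglik(a):
    eta = [sum(x * c for x, c in zip(row, a)) for row in X]
    denom = sum(math.exp(e) for e in eta)
    return sum(nj * ej for nj, ej in zip(n, eta)) - n_plus * math.log(denom)

eta = [sum(x * c for x, c in zip(row, alpha)) for row in X]
denom = sum(math.exp(e) for e in eta)
pi = [math.exp(e) / denom for e in eta]

# Score: sum_j x_ij (n_ij - n_i+ pi_ij)
grad = [sum(X[j][r] * (n[j] - n_plus * pi[j]) for j in range(3))
        for r in range(2)]

# Hessian: -sum_{j,k} x_ij n_i+ pi_ij (delta_jk - pi_ik) x_ik'
hess = [[-sum(X[j][r] * n_plus * pi[j]
              * ((1.0 if j == k else 0.0) - pi[k]) * X[k][s]
              for j in range(3) for k in range(3))
         for s in range(2)] for r in range(2)]

# Check the score against central finite differences.
h = 1e-6
for r in range(2):
    ap, am = list(alpha), list(alpha)
    ap[r] += h
    am[r] -= h
    assert abs((loglik(ap) - loglik(am)) / (2 * h) - grad[r]) < 1e-6

# Check the Hessian the same way (four-point central differences).
h = 1e-4
for r in range(2):
    for s in range(2):
        num = 0.0
        for dr, ds, sign in [(h, h, 1), (h, -h, -1), (-h, h, -1), (-h, -h, 1)]:
            a = list(alpha)
            a[r] += dr
            a[s] += ds
            num += sign * loglik(a)
        num /= 4 * h * h
        assert abs(num - hess[r][s]) < 1e-5
```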
Here $y_{ij}=n_{ij}/n_{i+}$, $\delta_{jk}$ denotes the Kronecker delta, $\boldsymbol{N}$ is a diagonal matrix with diagonal elements $n_{i+}$, and $\boldsymbol{W}$ is the block-diagonal matrix with blocks $n_{i+}\bigl(\operatorname{diag}(\boldsymbol{\pi}_i)-\boldsymbol{\pi}_i\boldsymbol{\pi}_i'\bigr)$.
Newton-Raphson iterations then take the form
$$ \boldsymbol{\alpha}^{(s+1)} = \boldsymbol{\alpha}^{(s)} - \left( \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} \right)^{-1} \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \boldsymbol{\alpha}^{(s)} + \left( \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \right)^{-1} \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) $$
where $\boldsymbol{\pi}$ and $\boldsymbol{W}$ are evaluated at $\boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}$.
Multiplying both sides by $\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}$ gives
$$ \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\boldsymbol{\alpha}^{(s+1)} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\boldsymbol{\alpha}^{(s)} + \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) = \boldsymbol{X}'\boldsymbol{W}\left(\boldsymbol{X}\boldsymbol{\alpha}^{(s)} + \boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\right) = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^* $$
where $\boldsymbol{W}^-$ is a generalized inverse of $\boldsymbol{W}$ and $\boldsymbol{y}^*$ is a “working response vector” with elements
$$ y_{ij}^*=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}} $$
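One convenient generalized inverse is the elementwise choice $\boldsymbol{W}^-=\operatorname{diag}\bigl(1/(n_{i+}\pi_{ij})\bigr)$, which yields exactly these working-response elements. The Python sketch below (made-up data, a single choice set) checks numerically that $\boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*$ equals $\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\boldsymbol{\alpha}+\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})$:

```python
import math

# Made-up data: one choice set, three alternatives, two covariates.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n = [5.0, 3.0, 2.0]
alpha = [0.2, -0.3]
n_plus = sum(n)
y = [nj / n_plus for nj in n]

eta = [sum(x * a for x, a in zip(row, alpha)) for row in X]
denom = sum(math.exp(e) for e in eta)
pi = [math.exp(e) / denom for e in eta]

# Working response: eta_ij + (y_ij - pi_ij) / pi_ij
ystar = [e + (yj - pj) / pj for e, yj, pj in zip(eta, y, pi)]

# W for this choice set: entries n_+ pi_j (delta_jk - pi_k)
W = [[n_plus * pi[j] * ((1.0 if j == k else 0.0) - pi[k])
      for k in range(3)] for j in range(3)]

# X'W y*  versus  X'W X alpha + X'N (y - pi); note X alpha = eta.
lhs = [sum(X[j][r] * W[j][k] * ystar[k]
           for j in range(3) for k in range(3)) for r in range(2)]
rhs = [sum(X[j][r] * W[j][k] * eta[k]
           for j in range(3) for k in range(3))
       + sum(X[j][r] * n_plus * (y[j] - pi[j]) for j in range(3))
       for r in range(2)]
for a, b in zip(lhs, rhs):
    assert abs(a - b) < 1e-10
```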
The IWLS algorithm thus involves the following steps:

1. Create suitable starting values for $\boldsymbol{\pi}$, $\boldsymbol{W}$, and $\boldsymbol{y}^*$.

2. Construct the “working dependent variable” $\boldsymbol{y}^*$.

3. Solve the equation
   $$ \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\boldsymbol{\alpha} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^* $$
   for $\boldsymbol{\alpha}$.

4. Compute the updated $\boldsymbol{\eta}$, $\boldsymbol{\pi}$, $\boldsymbol{W}$, and $\boldsymbol{y}^*$.

5. Compute the updated value of the log-likelihood or of the deviance
   $$ d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}} $$

6. If the decrease in the deviance (or the increase in the log-likelihood) is smaller than a given tolerance criterion (typically $\Delta d \leq 10^{-7}$), stop the algorithm and declare it converged. Otherwise go back to step 2 with the updated value of $\boldsymbol{\alpha}$.
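The loop above can be sketched in plain Python. The sketch uses the algebraically equivalent Newton form of the update, $\boldsymbol{\alpha}\leftarrow\boldsymbol{\alpha}+(\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})$, with made-up data (two choice sets, three alternatives each, two coefficients, so the $2\times 2$ system can be solved by Cramer's rule); it is an illustration, not the package's implementation:

```python
import math

# Made-up aggregated data: two choice sets, each with three
# alternatives and two covariates; X[i][j] is x_ij, n[i][j] the counts.
X = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     [[0.5, 1.0], [1.0, 0.0], [0.0, 0.0]]]
n = [[5.0, 3.0, 2.0],
     [4.0, 4.0, 2.0]]

def probs(alpha):
    """Choice probabilities pi_ij within each choice set."""
    out = []
    for Xi in X:
        eta = [sum(x * a for x, a in zip(row, alpha)) for row in Xi]
        m = max(eta)                       # guard against overflow
        w = [math.exp(e - m) for e in eta]
        total = sum(w)
        out.append([v / total for v in w])
    return out

def deviance(alpha):
    d = 0.0
    for ni, pii in zip(n, probs(alpha)):
        tot = sum(ni)
        for nij, pij in zip(ni, pii):
            if nij > 0:
                d += 2.0 * nij * math.log((nij / tot) / pij)
    return d

alpha = [0.0, 0.0]
dev = deviance(alpha)
for _ in range(50):
    pi = probs(alpha)
    # Score X'N(y - pi) and information X'WX, accumulated per choice set.
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for Xi, ni, pii in zip(X, n, pi):
        tot = sum(ni)
        for j in range(len(ni)):
            for r in range(2):
                g[r] += Xi[j][r] * (ni[j] - tot * pii[j])
            for k in range(len(ni)):
                w = tot * pii[j] * ((1.0 if j == k else 0.0) - pii[k])
                for r in range(2):
                    for s in range(2):
                        H[r][s] += Xi[j][r] * w * Xi[k][s]
    # Solve (X'WX) step = X'N(y - pi) for the 2x2 case (Cramer's rule).
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    step = [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
            (H[0][0] * g[1] - H[1][0] * g[0]) / det]
    alpha = [a + s for a, s in zip(alpha, step)]
    new_dev = deviance(alpha)
    if dev - new_dev < 1e-7:               # converged
        break
    dev = new_dev

print("alpha:", alpha, "deviance:", deviance(alpha))
```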
The starting values for the algorithm used by the mclogit package are constructed as follows:

1. Set
   $$ \eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12) - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ik}+\tfrac12) $$
   (where $q_i$ is the size of the choice set $\mathcal{S}_i$).

2. Compute the starting values of the choice probabilities $\pi_{ij}^{(0)}$ according to the equation at the beginning of the page.

3. Compute initial values of the working dependent variable according to
   $$ y_{ij}^{*(0)} = \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}} $$
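These steps can be traced in a few lines of Python (made-up counts, one choice set). Note that after centring, $\pi_{ij}^{(0)}$ reduces to the share of $n_{ij}+\tfrac12$ in the choice set's total:

```python
import math

# Made-up counts for one choice set with q = 3 alternatives.
n = [5.0, 3.0, 2.0]
q = len(n)
n_plus = sum(n)
y = [nj / n_plus for nj in n]

# Step 1: eta^(0) are centred log counts (half added to avoid log(0)).
log_n = [math.log(nj + 0.5) for nj in n]
eta0 = [l - sum(log_n) / q for l in log_n]

# Step 2: pi^(0) from the multinomial-logit link.
denom = sum(math.exp(e) for e in eta0)
pi0 = [math.exp(e) / denom for e in eta0]

# Step 3: initial working dependent variable.
ystar0 = [e + (yj - pj) / pj for e, yj, pj in zip(eta0, y, pi0)]

# Centring makes eta^(0) sum to zero within the choice set, and pi^(0)
# is simply the share of (n_ij + 1/2) in the set's total.
assert abs(sum(eta0)) < 1e-12
assert abs(pi0[0] - (n[0] + 0.5) / sum(nk + 0.5 for nk in n)) < 1e-12
```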