---
title: The IWLS algorithm used to fit conditional logit models
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The IWLS algorithm used to fit conditional logit models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: mclogit.bib
---

The package "mclogit" fits conditional logit models using a maximum likelihood estimator. It does this by maximizing the log-likelihood function using an *iterative weighted least-squares* (IWLS) algorithm, which follows the algorithm used by the `glm.fit()` function from the "stats" package of *R* [@nelder.wedderburn:glm;@mccullagh.nelder:glm.2ed;@Rcore].

If $\pi_{ij}$ is the probability that individual $i$ chooses alternative $j$ from his/her choice set $\mathcal{S}_i$, where

$$
\pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}
$$

and if $y_{ij}$ is the dummy variable that equals 1 if individual $i$ chooses alternative $j$ and equals 0 otherwise, the log-likelihood function (given that the choices are independent given the $\pi_{ij}$) can be written as

$$
\ell=\sum_{i,j}y_{ij}\ln\pi_{ij}
=\sum_{i,j}y_{ij}\eta_{ij}-\sum_i\ln\left(\sum_j\exp(\eta_{ij})\right)
$$

If the data are aggregated into counts, such that $n_{ij}$ is the number of individuals with the same choice set and the same choice probabilities $\pi_{ij}$ that have chosen alternative $j$, the log-likelihood is (under the same independence assumption)

$$
\ell=\sum_{i,j}n_{ij}\ln\pi_{ij}
=\sum_{i,j}n_{ij}\eta_{ij}-\sum_in_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right)
$$

where $n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}$.

If

$$
\eta_{ij} = \alpha_1x_{1ij}+\cdots+\alpha_rx_{rij}=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}
$$

then the gradient of the log-likelihood with respect to the coefficient vector $\boldsymbol{\alpha}$ is

$$
\frac{\partial\ell}{\partial\boldsymbol{\alpha}}
=
\sum_{i,j}
\frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}}
\frac{\partial\ell}{\partial\eta_{ij}}
=
\sum_{i,j}
\boldsymbol{x}_{ij}
(n_{ij}-n_{i+}\pi_{ij})
=
\sum_{i,j}
\boldsymbol{x}_{ij}
n_{i+}
(y_{ij}-\pi_{ij})
=
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
$$

and the Hessian is

$$
\frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'}
=
\sum_{i,j,k}
\frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}}
\frac{\partial^2\ell}{\partial\eta_{ij}\partial\eta_{ik}}
\frac{\partial\eta_{ik}}{\partial\boldsymbol{\alpha}'}
=
-
\sum_{i,j,k}
\boldsymbol{x}_{ij}
n_{i+}
\pi_{ij}(\delta_{jk}-\pi_{ik})
\boldsymbol{x}_{ik}'
=
-
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
$$

Here $y_{ij}=n_{ij}/n_{i+}$, $\delta_{jk}$ is Kronecker's delta, $\boldsymbol{N}$ is a diagonal matrix with diagonal elements $n_{i+}$, and $\boldsymbol{W}$ is a block-diagonal matrix whose $i$-th block is $n_{i+}\left(\operatorname{diag}(\boldsymbol{\pi}_i)-\boldsymbol{\pi}_i\boldsymbol{\pi}_i'\right)$.

Newton-Raphson iterations then take the form

$$
\boldsymbol{\alpha}^{(s+1)}
=
\boldsymbol{\alpha}^{(s)}
-
\left(
\frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'}
\right)^{-1}
\frac{\partial\ell}{\partial\boldsymbol{\alpha}}
=
\boldsymbol{\alpha}^{(s)}
+
\left(
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\right)^{-1}
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
$$

where $\boldsymbol{\pi}$ and $\boldsymbol{W}$ are evaluated at $\boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}$.
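The quantities in this derivation are easy to verify numerically. The following R function is a minimal illustrative sketch, not the internal code of "mclogit"; all names (`cl.pieces`, `alpha`, `X`, `n`, `set`, where `set` labels the choice set each row belongs to) are hypothetical. It exploits the block-diagonal structure of $\boldsymbol{W}$ to compute the gradient $\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})$ and the matrix $\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}$ without ever forming $\boldsymbol{W}$ explicitly.

```r
## Minimal sketch (hypothetical, not mclogit's internal code) of the
## quantities derived above; `set` identifies the choice set of each row.
cl.pieces <- function(alpha, X, n, set) {
  eta <- drop(X %*% alpha)                       # linear predictors eta_ij
  e <- exp(eta - ave(eta, set, FUN = max))       # stabilised exponentials
  pi <- e / ave(e, set, FUN = sum)               # softmax within choice sets
  n.plus <- ave(n, set, FUN = sum)               # n_{i+}, repeated per row
  y <- n / n.plus                                # relative frequencies y_ij
  grad <- drop(crossprod(X, n.plus * (y - pi)))  # X'N(y - pi)
  ## X'WX, with W block diagonal, blocks n_{i+}(diag(pi_i) - pi_i pi_i')
  xbar <- rowsum(pi * X, set)[as.character(set), , drop = FALSE]
  WX <- n.plus * pi * (X - xbar)                 # W X, block by block
  list(pi = pi, gradient = grad, information = crossprod(X, WX))
}
```

With these pieces, one Newton-Raphson step as displayed above would be `alpha + solve(pieces$information, pieces$gradient)`.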
Multiplying by $\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}$ gives

$$
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\boldsymbol{\alpha}^{(s+1)}
=
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\boldsymbol{\alpha}^{(s)}
+
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
=
\boldsymbol{X}'\boldsymbol{W}
\left(\boldsymbol{X}\boldsymbol{\alpha}^{(s)}+\boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\right)
=
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*
$$

where $\boldsymbol{W}^-$ is a generalized inverse of $\boldsymbol{W}$ and $\boldsymbol{y}^*$ is a "working response vector" with elements

$$
y_{ij}^*=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}}
$$

The IWLS algorithm thus involves the following steps:

1. Create suitable starting values for $\boldsymbol{\pi}$, $\boldsymbol{W}$, and $\boldsymbol{y}^*$.

2. Construct the "working dependent variable" $\boldsymbol{y}^*$.

3. Solve the equation
   $$
   \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
   \boldsymbol{\alpha}
   =
   \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*
   $$
   for $\boldsymbol{\alpha}$.

4. Compute updated $\boldsymbol{\eta}$, $\boldsymbol{\pi}$, $\boldsymbol{W}$, and $\boldsymbol{y}^*$.

5. Compute the updated value of the log-likelihood or of the deviance
   $$
   d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}}
   $$

6. If the decrease of the deviance (or the increase of the log-likelihood) is smaller than a given tolerance criterion (typically $\Delta d \leq 10^{-7}$), stop the algorithm and declare it converged. Otherwise go back to step 2 with the updated value of $\boldsymbol{\alpha}$.

The starting values for the algorithm used by the *mclogit* package are constructed as follows:

1. Set
   $$
   \eta_{ij}^{(0)}
   = \ln (n_{ij}+\tfrac12)
   - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ik}+\tfrac12)
   $$
   (where $q_i$ is the size of the choice set $\mathcal{S}_i$).

2. Compute the starting values of the choice probabilities $\pi_{ij}^{(0)}$ according to the equation at the beginning of this vignette.

3. Compute initial values of the working dependent variable according to
   $$
   y_{ij}^{*(0)}
   =
   \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}}
   $$
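For concreteness, the sketch below puts these steps together into a bare-bones IWLS loop. Again, this is a hypothetical illustration under the same assumptions as the previous code block (rows grouped by a choice-set index `set`, all names invented), not the actual fitting routine of "mclogit".

```r
## Hypothetical bare-bones IWLS loop for the conditional logit model;
## not mclogit's actual fitting function. Rows are grouped by `set`.
cl.iwls <- function(X, n, set, eps = 1e-7, maxit = 25) {
  softmax <- function(eta) {                # probabilities within sets
    e <- exp(eta - ave(eta, set, FUN = max))
    e / ave(e, set, FUN = sum)
  }
  n.plus <- ave(n, set, FUN = sum)          # n_{i+}
  y <- n / n.plus                           # y_ij = n_ij / n_{i+}
  ## Step 1: starting values as described above
  l <- log(n + 0.5)
  eta <- l - ave(l, set, FUN = mean)        # eta^(0)
  pi <- softmax(eta)                        # pi^(0)
  dev <- Inf
  for (s in seq_len(maxit)) {
    ## Step 2: working response
    ystar <- eta + (y - pi) / pi
    ## Step 3: solve X'WX alpha = X'W ystar, using W's block structure
    Wystar <- n.plus * pi * (ystar - ave(pi * ystar, set, FUN = sum))
    xbar <- rowsum(pi * X, set)[as.character(set), , drop = FALSE]
    WX <- n.plus * pi * (X - xbar)
    alpha <- solve(crossprod(X, WX), crossprod(X, Wystar))
    ## Step 4: updated linear predictors and probabilities
    eta <- drop(X %*% alpha)
    pi <- softmax(eta)
    ## Steps 5 and 6: deviance and convergence check
    dev.new <- 2 * sum(ifelse(n > 0, n * log(y / pi), 0))
    if (abs(dev - dev.new) < eps) break
    dev <- dev.new
  }
  list(coefficients = drop(alpha), deviance = dev.new, iterations = s)
}
```

# References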