core/reconstruction/track_fitting.md

0001 # Track Fitting
0002
0003 The track fitting algorithms estimate the track parameters.
0004 It is part of the pattern recognition/track  reconstruction/tracking.
0005 We can run the track fitting algorithms, after we allocated all hits to single tracks with the help of a track finding algorithm.
0006 It is not necessary, that all points of a track are present.
0007
0008 Currently, we have implementations for three different fitters:
0009 * Kalman Filter (KF)
0010 * Gaussian Sum Filter (GSF)
0011 * Global Chi-Square Fitter (GX2F) [in development]
0012 Even though all of them are least-squares fits, the concepts are quite different.
0013 Therefore, we should not expect identical results from all of them.
0014
0015 (kf_core)=
0016 ## Kalman Filter (KF) [wip]
0017 The Kalman Filter is an iterative fitter.
0018 It successively combines measurements to obtain an estimate of the track parameters.
0019 The KF needs an estimate as a starting point. The procedure alternates between two methods:
0020 1. Extrapolate the current state to the next surface.
0021 2. Update the extrapolation using the measurement of the new surface.[^billoir]
0022 The meaning of "this surface" and "the next surface" changes with the context.
0023 There are three different interpretations for this.
0024 The KF can give us those three interpretations as sets of track parameters:
0025     * predicted: Uses "older" data (i.e. from the last surfaces) to make the prediction. This prediction is an extrapolation from the old data onto the current surface.
0026     * filtered: Uses the "current" data (i.e. the predicted data updated with the measurement on the current surface). It is some kind of weighted mean.
0027     * smoothed: Uses the "future" data to predict the current parameters. This can only be evaluated if the whole propagation is finished once. This can be done in to ways: one uses backwards-propagation and one does not.
0028
0029 :::{todo}
0030 Complete Kalman Filter description
0031 :::
0032
0033 (gsf_core)=
0034 ## Gaussian Sum Filter (GSF)
0035
0036 The GSF is an extension of the Kalman-Filter that allows to handle non-gaussian errors by modelling the track state as a gaussian mixture:
0037
0038 $$
0039 p(\vec{x}) = \sum_i w_i \varphi(\vec{x}; \mu_i, \Sigma_i), \quad \sum_i w_i = 1
0040 $$
0041
0042 A common use case of this is electron fitting. The energy-loss of Bremsstrahlung for electrons in matter are highly non-Gaussian, and thus cannot be modelled accurately by the default material interactions in the Kalman Filter. Instead, the Bremsstrahlung is modelled as a Bethe-Heitler distribution, where $z$ is the fraction of the energy remaining after the interaction ($E_f/E_i$), and $t$ is the material thickness in terms of the radiation length:
0043
0044 $$
0045 f(z) = \frac{(- \ln z)^{c-1}}{\Gamma(c)}, \quad c = t/\ln 2
0046 $$
0047
0048 (figBetheHeitler)=
0049 :::{figure} figures/gsf_bethe_heitler_approx.svg
0050 :width: 450px
0051 :align: center
0052 The true Bethe-Heitler distribution compared with a gaussian mixture approximation (in thin lines the individual components are drawn) at t = 0.1 (corresponds to ~ 10mm Silicon).
0053 :::
0054
0055 To be able to handle this with the Kalman filter mechanics, this distribution is approximated by a gaussian mixture as well (see {numref}`figBetheHeitler`). The GSF Algorithm works then as follows (see also {numref}`figGsf`)
0056
0057 * On a surface with material, the Bethe-Heitler energy-loss distribution is approximated with a fixed number of gaussian components for each component. Since this way the number of components would grow exponentially with each material interaction, components that are close in terms of their *Kullback–Leibler divergence* are merged to limit the computational cost.
0058 * On a measurement surface, for each component a Kalman update is performed. Afterwards, the component weights are corrected according to each component's compatibility with the measurement.
0059
0060 (figGsf)=
0061 :::{figure} figures/gsf_overview.svg
0062 :width: 450px
0063 :align: center
0064 Simplified overview of the GSF algorithm.
0065 :::
0066
0067 ### The Multi-Stepper
0068 To implement the GSF, a special stepper is needed, that can handle a multi-component state internally: The {class}`Acts::MultiEigenStepperLoop`, which is based on the {class}`Acts::EigenStepper` and thus shares a lot of code with it. It interfaces to the navigation as one aggregate state to limit the navigation overhead, but internally processes a multi-component state. How this aggregation is performed can be configured via a template parameter, by default weighted average is used ({struct}`Acts::WeightedComponentReducerLoop`).
0069
0070 Even though the multi-stepper interface exposes only one aggregate state and thus is compatible with most standard tools, there is a special aborter is required to stop the navigation when the surface is reached, the {struct}`Acts::MultiStepperSurfaceReached`. It checks if all components have reached the target surface already and updates their state accordingly. Optionally, it also can stop the propagation when the aggregate state reaches the surface.
0071
0072
0073 ### Using the GSF
0074
0075 The GSF is implemented in the class {struct}`Acts::GaussianSumFitter`. The interface of its `fit(...)`-functions is very similar to the one of the {class}`Acts::KalmanFitter` (one for the standard {class}`Acts::Navigator` and one for the {class}`Acts::DirectNavigator` that takes an additional `std::vector<const Acts::Surface *>` as an argument):
0076
0077 ```{doxygenstruct} Acts::GaussianSumFitter
0078 ---
0079 members: fit
0080 outline:
0081 ---
0082 ```
0083
0084 The fit can be customized with several options. Important ones are:
0085 * *maximum components*: How many components at maximum should be kept.
0086 * *weight cut*: When to drop components.
0087 * *component merging*: How a multi-component state is reduced to a single set of parameters and covariance. The method can be chosen with the enum {enum}`Acts::ComponentMergeMethod`. Two methods are supported currently:
0088     * The *mean* computes the mean and the covariance of the mean.
0089     * *max weight* takes the parameters of component with the maximum weight and computes the variance around these. This is a cheap approximation of the mode, which is not implemented currently.
0090 * *mixture reduction*: How the number of components is reduced to the maximum allowed number. Can be configured via a {class}`Acts::Delegate`:
0091     * *Weight cut*: Keep only the N components with the largest weights. Implemented in {func}`Acts::reduceMixtureLargestWeights`.
0092     * *KL distance*: Merge the closest components until the required amount is reached. The distance measure is the *Kullback-Leibler distance* in the *q/p* component. Implemented in {func}`Acts::reduceMixtureWithKLDistance`.
0093
0094 :::{note}
0095 A good starting configuration is to use 12 components, the *max weight* merging and the *KL distance* reduction.
0096 :::
0097
0098 All options can be found in the {struct}`Acts::GsfOptions`:
0099
0100 ```{doxygenstruct} Acts::GsfOptions
0101 ---
0102 outline:
0103 ---
0104 ```
0105
0106 If the GSF finds the column with the string identifier *"gsf-final-multi-component-state"* (defined in `Acts::GsfConstants::kFinalMultiComponentStateColumn`) in the track container, it adds the final multi-component state to the track as a `std::optional<Acts::MultiComponentBoundTrackParameters<SinglyCharged>>` object.
0107
0108 A GSF example can be found in the ACTS Examples Framework [here](https://github.com/acts-project/acts/blob/main/Examples/Scripts/Python/truth_tracking_gsf.py).
0109
0110 ### Customising the Bethe-Heitler approximation
0111
0112 The GSF needs an approximation of the Bethe-Heitler distribution as a Gaussian mixture on each material interaction (see above). This task is delegated to a separate class, that can be provided by a template parameter to {struct}`Acts::GaussianSumFitter`, so in principle it can be implemented in different ways.
0113
0114 However, ACTS ships with the class {class}`Acts::AtlasBetheHeitlerApprox` that implements the ATLAS strategy for this task: To be able to evaluate the approximation of the Bethe-Heitler distribution for different materials and thicknesses, the individual Gaussian components (weight, mean, variance of the ratio $E_f/E_i$) are parametrised as polynomials in $x/x_0$. This class can load files in the ATLAS format that can be found [here](https://gitlab.cern.ch/atlas/athena/-/tree/main/Tracking/TrkFitter/TrkGaussianSumFilter/Data). A default parameterization can be created with {func}`Acts::makeDefaultBetheHeitlerApprox`.
0115
0116 The {class}`Acts::AtlasBetheHeitlerApprox` is constructed with two parameterizations, allowing to use different parameterizations for different $x/x_0$. In particular, it has this behaviour:
0117 * $x/x_0 < 0.0001$: Return no change
0118 * $x/x_0 < 0.002$: Return a single gaussian approximation
0119 * $x/x_0 < 0.1$: Return the approximation for low $x/x_0$.
0120 * $x/x_0 \geq 0.1$: Return the approximation for high $x/x_0$. The maximum possible value is $x/x_0 = 0.2$, for higher values it is clipped to 0.2 and the GSF emits a warning.
0121
0122 ### Further reading
0123
0124 * *Thomas Atkinson*, Electron reconstruction with the ATLAS inner detector, 2006, see [here](https://cds.cern.ch/record/1448253)
0125 * *R Frühwirth*, Track fitting with non-Gaussian noise, 1997, see [here](https://doi.org/10.1016/S0010-4655(96)00155-5)
0126 * *R Frühwirth*, A Gaussian-mixture approximation of the Bethe–Heitler model of electron energy loss by bremsstrahlung, 2003, see [here](https://doi.org/10.1016/S0010-4655(03)00292-3)
0127
0128 (gx2f_core)=
0129 ## Global Chi-Square Fitter (GX2F)
0130
0131 In general the *GX2F* is a weighted least squares fit, minimising the
0132
0133 $$
0134 \chi^2 = \sum_i \frac{r_i^2}{\sigma_i^2}
0135 $$
0136
0137 of a track.
0138 Here, $r_i$ are our residuals that we weight with $\sigma_i^2$, the covariance of the measurement (a detector property).
0139 Unlike the *KF* and the *GSF*, the *GX2F* looks at all measurements at the same time and iteratively minimises the starting parameters.
0140
0141 With the *GX2F* we can obtain the final parameters $\vec\alpha_n$ from starting parameters $\vec\alpha_0$.
0142 We set the $\chi^2 = \chi^2(\vec\alpha)$ as a function of the track parameters, but the $\chi^2$-minimisation could be used for many other problems.
0143 Even in the context of track fitting, we are quite free on how to use the *GX2F*.
0144 Especially the residuals $r_i$ can have many interpretations.
0145 Most of the time we will see them as the distance between a measurement and our prediction.
0146 But we can also use scattering angles, energy loss, ... as residuals.
0147 Therefore, the subscript $i$ stands most of the time for a measurement surface, since we want to go over all of them.
0148
0149 This chapter on the *GX2F* guides through:
0150 - Mathematical description of the base algorithm
0151 - Mathematical description of the multiple scattering
0152 - (coming soon) Mathematical description of the energy loss
0153 - Implementation in ACTS
0154 - Pros/Cons
0155
0156 ### Mathematical description of the base algorithm
0157
0158 :::{note}
0159 The mathematical derivation is shortened at some places.
0160 There will be a publication including the full derivation coming soon.
0161 :::
0162
0163 To begin with, there will be a short overview on the algorithm.
0164 Later in this section, each step is described in more detail.
0165 1. Minimise the $\chi^2$ function
0166 2. Update the initial parameters (iteratively)
0167 3. Calculate the covariance for the final parameters
0168
0169 But before going into detail, we need to introduce a few symbols.
0170 As already mentioned, we have our track parameters $\vec\alpha$ that we want to fit.
0171 To fit them we, we need to calculate our residuals as
0172
0173 $$
0174 r_i = m_i - f_i^m(\vec\alpha)
0175 $$
0176
0177 where $f^m(\vec\alpha)$ is the projection of our propagation function $f(\vec\alpha)$ into the measurement dimension.
0178 Basically, if we have a pixel measurement we project onto the surface, discarding all angular information.
0179 This projection could be different for each measurement surface.
0180
0181 #### 1. Minimise the $\chi^2$ function
0182
0183 We expect the minimum of the $\chi^2$ function at
0184
0185 $$
0186 \frac{\partial\chi^2(\vec\alpha)}{\partial\vec\alpha} = 0.
0187 $$
0188
0189 To find the zero(s) of this function we could use any method, but we will stick to a modified [Newton-Raphson method](https://en.wikipedia.org/wiki/Newton%27s_method),
0190 since it requires just another derivative of the $\chi^2$ function.
0191
0192 #### 2. Update the initial parameters (iteratively)
0193
0194 Since we are using the Newton-Raphson method to find the minimum of the $\chi^2$ function, we need to iterate.
0195 Each iteration (should) give as improved parameters $\vec\alpha$.
0196 While iterating we update a system, therefore we want to bring it in this form:
0197
0198 $$
0199 \vec\alpha_{n+i} = \vec\alpha_n + \vec{\delta\alpha}_n.
0200 $$
0201
0202 After some derivations of the $\chi^2$ function and the Newton-Raphson method, we find matrix equation to calculate $\vec{\delta\alpha}_n$:
0203
0204 $$
0205 [a_{kl}] \vec{\delta\alpha}_n = \vec b
0206 $$
0207
0208 with
0209
0210 $$
0211 a_{kl} = \sum_{i=1}^N \frac{1}{\sigma_i^2} \frac{\partial f_i^m(\vec\alpha)}{\partial \alpha_k}\frac{\partial f_i^m(\vec\alpha)}{\partial \alpha_l}\\
0212 $$
0213
0214 (where we omitted second order derivatives) and
0215
0216 $$
0217 b_k = \sum_{i=1}^N \frac{r_i}{\sigma_i^2} \frac{\partial f_i^m(\vec\alpha)}{\partial \alpha_k}.
0218 $$
0219
0220 At first sight, these expression might seem intimidating and hard to compute.
0221 But having a closer look, we see, that those derivatives already exist in our framework.
0222 All derivatives are elements of the Jacobian
0223
0224 $$
0225 \mathbf{J} = \begin{pmatrix}
0226                  \cdot & \dots & \cdot\\
0227                  \vdots & \frac{\partial f^m(\vec\alpha)}{\partial \alpha_k} & \vdots\\
0228                  \cdot & \dots & \cdot
0229              \end{pmatrix}.
0230 $$
0231
0232 At this point we got all information to perform a parameter update and repeat until the parameters $\vec\alpha$ converge.
0233
0234 #### 3. Calculate the covariance for the final parameters
0235
0236 The calculation of the covariance of the final parameters is quite simple compared to the steps before:
0237
0238 $$
0239 cov_{\vec\alpha} = [a_{kl}]^{-1}
0240 $$
0241
0242 Since it only depends on the $[a_{kl}]$ of the last iteration, the *GX2F* does not need an initial estimate for the covariance.
0243
0244 ### Mathematical description of the multiple scattering
0245
0246 To describe multiple scattering, the *GX2F* can fit the scattering angles as they were normal parameters.
0247 Of course, fitting more parameters increases the dimensions of all matrices.
0248 This makes it computationally more expensive to.
0249
0250 But first shortly recap on multiple scattering.
0251 To describe scattering, on a surface, only the two angles $\theta$ and $\phi$ are needed, where:
0252 - $\theta$ is the angle between the extrapolation of the incoming trajectory and the scattered trajectory
0253 - $\phi$ is the rotation around the extrapolation of the incoming trajectory
0254
0255 This description is only valid for thin materials.
0256 To model thicker materials, one could in theory add multiple thin materials together.
0257 It can be shown, that it is enough to two sets of $\theta$ and $\phi$ on both sides of the material.
0258 We could name them $\theta_{in}$, $\theta_{out}$, $\phi_{in}$, and $\phi_{out}$.
0259 But in the end they are just multiple parameters in our fit.
0260 That's why we will only look at $\theta$ and $\phi$ (like for thin materials).
0261
0262 By defining residuals and covariances for the scattering angles, we can put them into our $\chi^2$ function.
0263 For the residuals we choose (since the expected angle is 0)
0264
0265 $$
0266 r_s = \begin{cases}
0267          0 - \theta_s(\vec\alpha) \\
0268          0 - \sin(\theta_{loc})\phi_s(\vec\alpha)
0269       \end{cases}
0270 $$
0271
0272 with $\theta_{loc}$ the angle between incoming trajectory and normal of the surface.
0273 (We cannot have angle information $\phi$ if we are perpendicular.)
0274 For the covariances we use the Highland form [^cornelissen] as
0275
0276 $$
0277 \sigma_{scat} = \frac{13.6 \text{MeV}}{\beta c p} Z\prime \sqrt{\frac{x}{X_0}} \left( 1 + 0.038 \ln{\frac{x}{X_0}} \right)
0278 $$
0279
0280 with
0281 - $x$ ... material layer with thickness
0282 - $X_0$ ... radiation length
0283 - $p$ ... particle momentum
0284 - $Z\prime$ ... charge number
0285 - $\beta c$ ... velocity
0286
0287 Combining those terms we can write our $\chi^2$ function including multiple scattering as
0288
0289 $$
0290 \chi^2 = \sum_{i=1}^N \frac{r_i^2}{\sigma_i^2} + \sum_{s}^S \left(\frac{\theta_s^2}{\sigma_s^2} + \frac{\sin^2{(\theta_{loc})}\phi_s^2}{\sigma_s^2}\right)
0291 $$
0292
0293 Note, that both scattering angles have the same covariance.
0294
0295 ### (coming soon) Mathematical description of the energy loss [wip]
0296
0297 :::{todo}
0298 Write *GX2F*: Mathematical description of the energy loss
0299
0300 The development work on the energy loss has not finished yet.
0301 :::
0302
0303 ### Implementation in ACTS
0304
0305 The implementation is in some points similar to the KF, since the KF interface was chosen as a starting point.
0306 This makes it easier to replace both fitters with each other.
0307 The structure of the *GX2F* implementation follows coarsely the mathematical outline given above.
0308 It is best to start reading the implementation from `fit()`:
0309 1. Set up the fitter:
0310    - Actor
0311    - Aborter
0312    - Propagator
0313    - Variables we need longer than one iteration
0314 2. Iterate
0315    1. Update parameters
0316    2. Propagate through geometry
0317    3. Collect:
0318        - Residuals
0319        - Covariances
0320        - Projected Jacobians
0321    4. Loop over collection and calculate and sum over:
0322        - $\chi^2$
0323        - $[a_{kl}]$
0324        - $\vec b$
0325    5. Solve $[a_{kl}] \vec{\delta\alpha}_n = \vec b$
0326    6. Check for convergence
0327 3. Calculate covariance of the final parameters
0328 4. Prepare and return the final track
0329
0330 #### Configuration
0331
0332 Here is a simplified example of the configuration of the fitter.
0333
0334 ```cpp
0335 template <typename traj_t>
0336 struct Gx2FitterOptions {
0337 Gx2FitterOptions( ... ) : ... {}
0338
0339 Gx2FitterOptions() = delete;
0340
0341 ...
0342 //common options:
0343 // geoContext, magFieldContext, calibrationContext, extensions,
0344 // propagatorPlainOptions, referenceSurface, multipleScattering,
0345 // energyLoss, freeToBoundCorrection
0346
0347 /// Max number of iterations during the fit (abort condition)
0348 size_t nUpdateMax = 5;
0349
0350 /// Disables the QoP fit in case of missing B-field
0351 bool zeroField = false;
0352
0353 /// Check for convergence (abort condition). Set to 0 to skip.
0354 double relChi2changeCutOff = 1e-7;
0355 };
0356 ```
0357
0358 Common options like the geometry context or toggling of the energy loss are similar to the other fitters.
0359 For now there are three *GX2F* specific options:
0360 1. `nUpdateMax` sets an abort condition for the parameter update as a maximum number of iterations allowed.
0361 We do not really want to use this condition, but it stops the fit in case of poor convergence.
0362 2. `zeroField` toggles the q/p-fit.
0363 If there is no magnetic field, we get no q/p-information.
0364 This would crash the fitter when needing matrix inverses.
0365 When this option is set to `true`, most of the matrices will omit the q/p-rows and -columns.
0366 3. `relChi2changeCutOff` is the desired convergence criterion.
0367 We compare at each step of the iteration the current to the previous $\chi^2$.
0368 If the relative change is small enough, we finish the fit.
0369
0370 ### Pros/Cons
0371
0372 There are some reasons for and against the *GX2F*.
0373 The biggest issue of the *GX2F* is its performance.
0374 Currently, the most expensive part is the propagation.
0375 Since we need to do a full propagation each iteration, we end up with at least 4-5 full propagation.
0376 This is a lot compared to the 2 propagations of the *KF*.
0377 However, since the *GX2F* is a global fitter, it can easier resolve left-right-ambiguous measurements, like in the TRT (Transition Radiation Tracker – straw tubes).
0378
0379 [^billoir]: [https://twiki.cern.ch/twiki/pub/LHCb/ParametrizedKalman/paramKalmanV01.pdf](https://twiki.cern.ch/twiki/pub/LHCb/ParametrizedKalman/paramKalmanV01.pdf)
0380 [^cornelissen]: [https://cds.cern.ch/record/1005181/files/thesis-2006-072.pdf#page=80](https://cds.cern.ch/record/1005181/files/thesis-2006-072.pdf)