Track Crowds in Real-Time with FairMOT - A Detailed Tutorial
FairMOT is a real-time, anchor-free tracking system that reduces identity switches by combining object detection and re-identification in a single network, which makes it well suited to crowded scenes, surveillance, sports, and autonomous systems.

What if I told you that current multi-object tracking systems fail up to 35% of the time in crowded scenarios?
Imagine a surveillance system that loses track of a suspect the moment they pass behind a crowd. A self-driving car that "forgets" a pedestrian at a crosswalk after a brief occlusion. An athlete tracking system that swaps jerseys mid-play.
These aren't hypothetical failures; they're daily realities of multi-object tracking (MOT) limitations that plague our most critical applications.
Did you know that established algorithms like SORT and DeepSORT consistently underperform in complex scenarios?
Despite extensive research, the same problems persist: objects disappear during occlusion, identities get swapped in crowded scenes, and tracking accuracy plummets when objects move non-linearly.
The root cause lies in their two-step design, in which detection and identity association are handled by separate components.
FairMOT addresses this flaw with a different approach: it performs detection and re-identification in parallel within a single network.
In this blog, we are going to see how this new approach works in the real world and how you can implement FairMOT yourself.
What is Object Tracking?
Multi‑Object Tracking (MOT) aims to detect objects (e.g., people, vehicles) in each video frame and associate these detections over time to form continuous trajectories.
Traditional MOT pipelines follow the tracking‑by‑detection paradigm, which has two steps:
- Detection step: A detector (e.g., Faster R‑CNN, YOLO) finds bounding boxes per frame.
- Association step: Appearance (ReID) and motion cues link detections to existing tracks.
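To make the association step concrete, here is a minimal sketch that links boxes by overlap alone. It assumes boxes in (x1, y1, x2, y2) form and uses a greedy, highest-IoU-first matcher with a hypothetical min_iou threshold of 0.3. This is a simplification: SORT-style trackers typically solve the matching with the Hungarian algorithm and a Kalman filter, and DeepSORT adds ReID distances, so treat it as an illustration rather than any tracker's actual code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: List[Box], detections: List[Box],
                 min_iou: float = 0.3) -> List[Tuple[int, int]]:
    """Pair each track with at most one detection, taking the highest-IoU pairs first."""
    candidates = sorted(
        ((iou(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, ti, di in candidates:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if ti in used_tracks or di in used_dets:
            continue  # each track and each detection can be used only once
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


if __name__ == "__main__":
    tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(greedy_match(tracks, detections))
```

Here track 1 is matched to detection 0 and track 0 to detection 1, while detection 2 overlaps nothing and would start a new track. Overlap alone breaks down when boxes cross, which is exactly where appearance (ReID) cues become important.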
While two‑step methods benefit from specialized detectors and ReID networks, they suffer from high computational cost, because the ReID network runs separately on every detected box, and from feature inconsistency, because the detector and the ReID network are trained independently.
One‑shot trackers merge these tasks into a single network, but they tend to favor detection over ReID, leading to high ID switch rates and unstable identities in crowded scenes.
What is FairMOT and what previous tracking problem does it solve?
FairMOT is a single‐step, anchor‐free tracking system built on the CenterNet detector.
Earlier one-shot trackers treated appearance (ReID) as a side task and focused mostly on finding objects, which often caused identity confusion when people crossed paths.
FairMOT fixes this by giving detection and ReID equal weight, so it learns to both locate objects and recognize them consistently. Its main ideas are:
- Two equal branches: One head predicts object centers and box sizes, and the other produces a feature vector (embedding) for every location on the output feature map.
- Center‐based sampling: FairMOT takes each object’s appearance embedding exactly at its detected center, rather than from anchors that may overlap several objects. This cuts identity switches by nearly 30% compared to older methods.
- Balanced training loss: We use an uncertainty‐based formula to keep the detection loss and ReID loss in check so that neither task overwhelms the other.
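The center-based sampling idea can be sketched in plain Python. This is an illustration, not FairMOT's code: it assumes the network has already produced a center heatmap and a per-location embedding map on the same grid, treats a cell as an object center when its score exceeds a threshold and is the maximum of its 3x3 neighborhood (the peak picking used by CenterNet-style decoders), and then reads the ReID vector stored at that cell. The names find_centers and sample_embeddings are hypothetical, and the real model works on GPU tensors rather than nested lists.

```python
from typing import List, Tuple

Grid = List[List[float]]          # heatmap: one center score per cell
EmbeddingMap = List[List[List[float]]]  # one embedding vector per cell


def find_centers(heatmap: Grid, threshold: float = 0.4) -> List[Tuple[int, int]]:
    """Return (row, col) cells that beat the threshold and are 3x3 local maxima."""
    h, w = len(heatmap), len(heatmap[0])
    centers = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighborhood = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            if score >= max(neighborhood):  # ties on a plateau keep every tied cell
                centers.append((r, c))
    return centers


def sample_embeddings(embedding_map: EmbeddingMap,
                      centers: List[Tuple[int, int]]) -> List[List[float]]:
    """Read the ReID vector stored exactly at each detected center cell."""
    return [embedding_map[r][c] for r, c in centers]


if __name__ == "__main__":
    heatmap = [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.3, 0.0],
        [0.1, 0.3, 0.2, 0.7],
        [0.0, 0.0, 0.1, 0.2],
    ]
    # Toy embeddings: each cell stores its own coordinates so the lookup is easy to see.
    embeddings = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    centers = find_centers(heatmap)
    print(centers, sample_embeddings(embeddings, centers))
```

Because each object contributes exactly one center, each object also gets exactly one embedding, which is the property the center-based design relies on.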
By training these branches together, FairMOT tracks objects more reliably. It reduces “ID switch” mistakes, boosts overall accuracy, and still runs in real time.
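The balanced loss mentioned above follows the uncertainty-weighting idea proposed by Kendall et al. for multi-task learning: each task loss is scaled by e^(-w), where w is a learnable scalar, and w is added back as a penalty. The sketch below assumes the form L_total = 0.5 * (e^(-w1) * L_det + e^(-w2) * L_id + w1 + w2) and uses plain floats; in real training, w1 and w2 would be learnable parameters updated by the optimizer. The helper names are hypothetical.

```python
import math


def balanced_loss(det_loss: float, id_loss: float, w1: float, w2: float) -> float:
    """Uncertainty-weighted sum of the detection and ReID losses.

    w1 and w2 are plain floats here; during training they would be learnable scalars.
    """
    det_term = math.exp(-w1) * det_loss  # detection loss scaled by e^(-w1)
    id_term = math.exp(-w2) * id_loss    # ReID loss scaled by e^(-w2)
    # The + w1 + w2 penalty stops the optimizer from zeroing both terms by pushing w to infinity.
    return 0.5 * (det_term + id_term + w1 + w2)


def optimal_weight(task_loss: float) -> float:
    """For a fixed positive loss L, setting d/dw [e^(-w) * L + w] = 0 gives w = ln(L)."""
    return math.log(task_loss)


if __name__ == "__main__":
    det, reid = 2.0, 8.0
    w1, w2 = optimal_weight(det), optimal_weight(reid)
    print(balanced_loss(det, reid, w1, w2))
    # At the optimum each scaled term e^(-w) * L is about 1, so neither task dominates.
    print(math.exp(-w1) * det, math.exp(-w2) * reid)
```

The derivation in optimal_weight shows why the scheme is "balanced": a task with a larger loss ends up with a smaller multiplier e^(-w), so the two weighted terms settle at the same scale instead of one overwhelming the other.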
Tracking using FairMOT
Tracking in a large area
Tracking in a small area
Tracking running athletes
Tracking jogging people
Implementing FairMOT using GitHub
Running FairMOT is straightforward: clone its official repository and follow a few setup steps.
git clone https://github.com/ifzhang/FairMOT.git
Then create and activate a conda environment:
conda create -n FairMOT
conda activate FairMOT
After that, install PyTorch and torchvision with these exact versions (built against CUDA toolkit 10.2):
conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.2 -c pytorch
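Next, install the remaining Python requirements from inside the cloned repository (${FAIRMOT_ROOT} stands for the path of your FairMOT folder). The demo also needs ffmpeg, so install it with your system package manager if it is not already available.
cd ${FAIRMOT_ROOT}
pip install cython
pip install -r requirements.txt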
Then install DCNv2 (deformable convolution, pytorch_1.7 branch), which FairMOT's backbone network uses:
git clone -b pytorch_1.7 https://github.com/ifzhang/DCNv2.git
cd DCNv2
./make.sh
You also need a pretrained tracking model (detection plus ReID) in the models directory at the repository root.
The official FairMOT repository provides several models pretrained on large datasets that can be used directly for tracking.
Here we use fairmot_dla34.pth: download it and save it in ${FAIRMOT_ROOT}/models (create the folder if it does not exist).
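Before running the demo, it can help to confirm that everything installed above is visible from Python. The script below is a hypothetical helper, not part of the FairMOT repository: it checks that ffmpeg is on PATH, that PyTorch 1.7 is importable and can see a GPU, that the downloaded weights sit in models/, and, as an assumption about how the DCNv2 build exposes itself, that a module named dcn_v2 can be found. Run it from the repository root inside the activated FairMOT environment.

```python
import importlib
import importlib.util
import shutil
from pathlib import Path


def check_environment(repo_root: Path = Path("."), expected_torch: str = "1.7") -> dict:
    """Return a name -> bool map of the prerequisites from the setup steps."""
    report = {}
    # The demo needs ffmpeg, so it must be reachable on PATH.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    # PyTorch: installed, matching the pinned version, and able to see a GPU.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = False
        report["cuda"] = False
    else:
        torch = importlib.import_module("torch")
        report["torch"] = torch.__version__.startswith(expected_torch)
        report["cuda"] = bool(torch.cuda.is_available())
    # Assumption: the DCNv2 build is importable under the module name `dcn_v2`.
    report["dcn_v2"] = importlib.util.find_spec("dcn_v2") is not None
    # The pretrained weights downloaded into the models directory.
    report["model"] = (repo_root / "models" / "fairmot_dla34.pth").is_file()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:<8} {'OK' if ok else 'CHECK'}")
```

Any line marked CHECK points back to the setup step that still needs attention.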
Now, you can perform tracking on any video using this command.
cd src
python demo.py mot --load_model ../models/fairmot_dla34.pth --input-video /path/to/your/video.mp4 --output-root /path/to/output/directory --conf_thres 0.4
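If you have several clips to process, a small Python wrapper can call the same command once per video. This sketch reuses only the flags shown in the command above; the helper names and the folder layout (one output sub-directory per clip) are illustrative assumptions. It launches demo.py with the current interpreter, so run it inside the activated FairMOT environment and point src_dir at FairMOT's src folder.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_demo_command(video: Path, output_root: Path,
                       model: Path = Path("../models/fairmot_dla34.pth"),
                       conf_thres: float = 0.4) -> List[str]:
    """Assemble the demo.py invocation shown above for a single video."""
    return [
        sys.executable, "demo.py", "mot",  # current interpreter, i.e. the active environment
        "--load_model", str(model),
        "--input-video", str(video),
        "--output-root", str(output_root),
        "--conf_thres", str(conf_thres),
    ]


def track_folder(video_dir: Path, output_dir: Path, src_dir: Path = Path(".")) -> None:
    """Run the FairMOT demo on every .mp4 in video_dir, one output folder per clip."""
    for video in sorted(video_dir.glob("*.mp4")):
        out = output_dir / video.stem
        out.mkdir(parents=True, exist_ok=True)
        # Absolute paths, because demo.py runs with src_dir as its working directory.
        cmd = build_demo_command(video.resolve(), out.resolve())
        subprocess.run(cmd, cwd=src_dir, check=True)  # stop on the first failing clip


if __name__ == "__main__" and len(sys.argv) == 3:
    track_folder(Path(sys.argv[1]), Path(sys.argv[2]))
```

The default model path stays relative (../models/...) on purpose: it is resolved from src_dir, exactly as in the single-video command.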
Limitations of FairMOT
Despite its significant improvements over previous methods, FairMOT faces several limitations that affect its performance in certain scenarios:
- Small Object Tracking: Performance degrades for tiny objects due to limited feature resolution.
- Heavy Crowds: Extremely dense scenes can still induce identity switches when multiple centers overlap.
- Static ReID Features: Embeddings are extracted per frame; long‑term occlusions may break tracks without temporal smoothing.
- Hyperparameter Sensitivity: Association thresholds and loss‑weight balances require careful tuning for new domains.
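One common way to add the temporal smoothing mentioned above is to keep an exponential moving average of each track's embedding, so that a single noisy or partially occluded frame does not overwrite the track's appearance. The sketch assumes L2-normalized embeddings and a smoothing factor alpha of 0.9, an illustrative value rather than a tuned one; smooth_feature and cosine_similarity are hypothetical helper names.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def smooth_feature(track_feat: Vector, new_feat: Vector, alpha: float = 0.9) -> Vector:
    """Exponential moving average: keep alpha of the history, blend in 1 - alpha of the new frame."""
    blended = [alpha * t + (1 - alpha) * n for t, n in zip(track_feat, new_feat)]
    return l2_normalize(blended)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Appearance similarity used to decide whether a detection belongs to a track."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


if __name__ == "__main__":
    track = [1.0, 0.0]
    noisy_frame = [0.0, 1.0]  # e.g. an embedding taken while the person is half occluded
    track = smooth_feature(track, noisy_frame)
    print(track, cosine_similarity(track, [1.0, 0.0]))
```

A larger alpha makes a track's identity more stable but slower to adapt to genuine appearance changes, which is one more of the hyperparameters that needs tuning per domain.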
Despite these caveats, FairMOT sets a strong baseline for real‑time, fair multi‑object tracking.
Conclusion
FairMOT is a significant advance in multi-object tracking because it directly addresses the fairness problem, the bias toward detection over re-identification, that limited previous one-shot trackers.
Through its anchor-free design and equal treatment of detection and re-identification, FairMOT achieves 76.5 MOTA and 79.3 IDF1 on challenging datasets while running in real time at about 30 FPS.
This marks a shift from the traditional "detection first, re-ID secondary" approach to a balanced tracking system.
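The two metrics quoted above are computed from error counts accumulated over a whole sequence, using the standard definitions MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN). The sketch below only evaluates these formulas from counts you already have; producing the counts requires matching predictions to ground truth, which is the job of evaluation tools such as py-motmetrics. The counts in the example are made up for illustration.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy; negative when errors outnumber ground-truth objects."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: harmonic mean of identity precision and identity recall."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


if __name__ == "__main__":
    # Hypothetical counts for one sequence, reported as percentages like MOT benchmarks.
    print(f"MOTA {100 * mota(100, 50, 10, 1000):.1f}")
    print(f"IDF1 {100 * idf1(800, 100, 100):.1f}")
```

Note that MOTA counts an identity switch as just one error per event, while IDF1 penalizes every frame assigned to the wrong identity, which is why IDF1 is the better indicator of the identity consistency FairMOT targets.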
FAQs
What problem does FairMOT solve in multi-object tracking?
FairMOT reduces identity switches by treating object detection and re-identification equally within one network, unlike earlier trackers that favored detection or ran ReID as a separate step.
Is FairMOT suitable for real-time applications?
Yes. FairMOT can run at around 30 FPS on a single modern GPU, which makes it suitable for real-time applications such as surveillance, traffic monitoring, and sports analytics.
How is FairMOT different from DeepSORT or SORT?
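SORT associates detections using only motion and box overlap, and DeepSORT adds a separately trained ReID network on top of an external detector. FairMOT instead uses a single network that performs detection and ReID simultaneously, resulting in better IDF1 and fewer ID switches.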
