StrongSORT Tutorial: Master Multi-Object Tracking

Learn what StrongSORT is, how it improves multi-object tracking, and how to easily implement it in your own projects using modern detectors like YOLO.

StrongSORT for multi-object tracking

Have you ever wondered how security cameras can follow multiple people through crowded spaces, or how autonomous vehicles track dozens of cars simultaneously on busy highways?

This seemingly magical capability is powered by Multiple Object Tracking (MOT), a critical technology behind modern surveillance systems, self-driving cars, sports analytics, and even robotic automation.


However, traditional tracking methods often struggle with challenges like object occlusion, identity switches, and computational efficiency. 

This is where StrongSORT emerges as a game-changing solution that significantly improves upon classic tracking algorithms.

StrongSORT is a next-generation tracking algorithm that builds on earlier methods like DeepSORT to deliver more robust, accurate, and efficient tracking.

Designed to overcome the limitations of previous approaches, StrongSORT is quickly becoming a go-to solution for reliable multi-object tracking in dynamic, real-world environments.

In this blog, we'll explore how StrongSORT works, why it's a major leap forward, and how you can implement it in your own computer vision projects.

What is Object Tracking?

Object tracking is the process of locating a moving object or multiple objects over time in a video stream. 

Unlike object detection, which identifies objects in individual frames, tracking maintains consistent identities for objects across multiple frames, creating trajectories that show how objects move through space and time.

How does Tracking work generally?

The tracking process typically involves three key components:

Detection: Algorithms identify objects of interest in each video frame using methods like YOLO (You Only Look Once) or other deep learning models. This step provides bounding boxes around detected objects along with confidence scores.

Prediction: The system predicts where each tracked object will appear in the next frame based on its previous movements. This often uses motion models like Kalman filters to estimate future positions.

Data Association: The most challenging aspect involves matching newly detected objects with existing tracks from previous frames. This process must handle scenarios where objects temporarily disappear, change appearance, or overlap with other objects.
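The three components above can be sketched as a minimal, hypothetical tracking loop — here with a constant-velocity motion model standing in for a Kalman filter, and greedy IoU matching standing in for full data association (the thresholds and class names are illustrative, not from any real tracker):

```python
import numpy as np

def iou(a, b):
    """IoU overlap of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

class Track:
    def __init__(self, tid, box):
        self.id = tid
        self.box = np.asarray(box, float)
        self.velocity = np.zeros(4)      # per-frame box motion

    def predict(self):
        # Constant-velocity prediction (a stand-in for a Kalman filter)
        self.box = self.box + self.velocity
        return self.box

    def update(self, box):
        box = np.asarray(box, float)
        self.velocity = box - self.box   # re-estimate motion from the match
        self.box = box

def step(tracks, detections, next_id, iou_thresh=0.3):
    """One tracking step: predict, associate, spawn new tracks."""
    predicted = [t.predict() for t in tracks]        # 2. Prediction
    used = set()
    for det in detections:                           # 3. Data association (greedy)
        best, best_iou = None, iou_thresh
        for i, pbox in enumerate(predicted):
            if i in used:
                continue
            score = iou(pbox, det)
            if score > best_iou:
                best, best_iou = i, score
        if best is not None:
            used.add(best)
            tracks[best].update(det)
        else:                                        # unmatched -> new identity
            tracks.append(Track(next_id, det))
            next_id += 1
    return next_id

tracks, nid = [], 0
nid = step(tracks, [[0, 0, 10, 10], [50, 50, 60, 60]], nid)   # frame 1: two new tracks
nid = step(tracks, [[1, 1, 11, 11], [51, 51, 61, 61]], nid)   # frame 2: same IDs kept
```

Frame 2's detections overlap the predicted boxes strongly, so both identities are preserved instead of new tracks being spawned.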

How does StrongSORT work?

StrongSORT is not built from scratch; it’s a powerful evolution of the well-known DeepSORT multi-object tracking framework.

While DeepSORT laid the foundation for robust online tracking by integrating object detection with appearance-based re-identification, StrongSORT addresses its key limitations by upgrading core components and introducing more sophisticated association strategies.

1. Enhanced Detection Backbone

DeepSORT relies heavily on the quality of detections. StrongSORT replaces legacy detectors (like YOLOv3 or Faster R-CNN) with YOLOX-X, a state-of-the-art object detector that provides more accurate and reliable bounding boxes.

This immediately improves the quality of input to the tracking pipeline.

2. Upgraded Appearance Embedding with OSNet

DeepSORT uses a shallow CNN to extract appearance features for re-identification.

StrongSORT upgrades this to OSNet (Omni-Scale Network), a more powerful ReID network that captures multi-scale features, improving the model's ability to distinguish between visually similar objects across frames.
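Whatever ReID backbone produces the embeddings, re-identification typically boils down to comparing feature vectors by cosine distance. A toy sketch — the 3-D vectors below stand in for real OSNet embeddings, which have hundreds of dimensions:

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two appearance embeddings (lower = more similar)."""
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    return 1.0 - float(np.dot(a, b))

# Toy embeddings standing in for OSNet outputs
person_a_frame1 = np.array([0.9, 0.1, 0.3])
person_a_frame2 = np.array([0.85, 0.15, 0.32])  # same person, slightly changed view
person_b = np.array([0.1, 0.9, 0.2])            # visually different person

same = cosine_distance(person_a_frame1, person_a_frame2)
diff = cosine_distance(person_a_frame1, person_b)
```

A track keeps its identity when the distance between its stored embedding and a new detection's embedding stays below a threshold — which is why a stronger ReID network like OSNet directly reduces identity switches.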

3. Smarter Motion Compensation

While DeepSORT uses a Kalman Filter for motion prediction, it can struggle with occlusions and abrupt motion.

StrongSORT enhances this with an exponential moving average (EMA) update strategy layered on top of the Kalman filter, which adapts better to object drift and stabilizes trajectories over time.
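The EMA idea can be shown in isolation: each update blends the smoothed state with the new measurement instead of trusting either one fully. (In the StrongSORT paper this blending is applied to a track's appearance features with a smoothing factor around 0.9; the 2-D numbers below are made up purely for illustration.)

```python
import numpy as np

def ema_update(state, measurement, alpha=0.9):
    """Exponential moving average: keep most of the history, blend in the new value.
    alpha near 1 -> smooth but slow to react; near 0 -> responsive but jittery."""
    return alpha * state + (1 - alpha) * measurement

state = np.array([100.0, 50.0])            # smoothed track state
noisy = [np.array([104.0, 48.0]),          # noisy per-frame measurements
         np.array([97.0, 53.0]),
         np.array([102.0, 51.0])]
for m in noisy:
    state = ema_update(state, m)
# state stays close to 100, 50 despite the measurement noise
```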

4. Motion-Aware Matching Cascade

StrongSORT introduces a motion-aware matching cascade, which prioritizes recent and confidently tracked objects during data association. This is a more refined approach than DeepSORT's Hungarian matching and helps reduce ID switches in crowded scenes.
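The exact cascade logic lives in the StrongSORT codebase, but the core of such schemes is an association cost that blends appearance and motion distances, solved as an assignment problem. A minimal sketch — the matrices and the weight `lam` are illustrative, not values from the paper:

```python
from itertools import permutations
import numpy as np

def combined_cost(appearance_dist, motion_dist, lam=0.98):
    """Blend appearance and motion distances into one cost matrix.
    lam weights appearance; (1 - lam) weights motion."""
    return lam * appearance_dist + (1 - lam) * motion_dist

def assign(cost):
    """Exact minimum-cost assignment by brute force — fine for tiny matrices;
    real trackers use the Hungarian algorithm instead."""
    n = cost.shape[0]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return list(best)

# Toy 2-tracks x 2-detections distance matrices (rows: tracks, cols: detections)
appearance = np.array([[0.1, 0.8],
                       [0.9, 0.2]])
motion = np.array([[0.2, 0.7],
                   [0.6, 0.3]])

matches = assign(combined_cost(appearance, motion))  # track i -> detection matches[i]
```

Here the low-cost diagonal wins: track 0 pairs with detection 0 and track 1 with detection 1, because both appearance and motion agree.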

Tracking using StrongSORT

Tracking athletes

Tracking Cars


Tracking Planes


Tracking People


Tracking people in a crowd

Implementing StrongSORT

Implementing StrongSORT is easy using its GitHub repository, which also supports YOLO object detection.

  1. Clone the repository recursively:

git clone --recurse-submodules https://github.com/mikel-brostrom/Yolov7_StrongSORT_OSNet.git

cd Yolov7_StrongSORT_OSNet

Clone the Git Repo

  2. Make sure that you fulfill all the requirements: Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7.

To install, run:


pip install -r requirements.txt

Install the required libraries

Tracking can be run on most video formats:


python track.py --source 0  # webcam
                           img.jpg  # image
                           vid.mp4  # video
                           path/  # directory
                           path/*.jpg  # glob
                           'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                           'rtsp://example.com/media.mp4'

Track on any video format

If you want to track a subset of the MS COCO classes, add their corresponding indices after the --classes flag:


python track.py --source 0 --yolo-weights yolov7.pt --classes 16 17  # tracks cats and dogs, only

Track specific classes of COCO dataset

The above applies to StrongSORT models as well. Choose a ReID model based on your needs from this ReID model zoo.

ReID models are essential for maintaining object consistency when they are momentarily occluded or leave the camera's field of view.


python track.py --source 0 --strong-sort-weights osnet_x0_25_market1501.pt
                                                 osnet_x0_5_market1501.pt
                                                 osnet_x0_75_msmt17.pt
                                                 osnet_x1_0_msmt17.pt
                                                 ...

Use any ReID model

For the best results, use this command:


python track.py --source path/to/video --yolo-weights yolov7x.pt --strong-sort-weights osnet_x0_25_msmt17.pt --save-vid --show-vid

CMD for best accuracy

Limitations of StrongSORT

Even though StrongSORT is a big improvement over older tracking methods, it still has some challenges, especially in tough conditions.

For example, tracking can fail for multiple reasons, such as occlusion or detection failure.

StrongSORT, just like any other tracking algorithm, has various limitations, such as:

1. Relies Heavily on Object Detection

StrongSORT depends on a detector (YOLO) to find objects. If this detector misses something or draws bad boxes, the tracker can’t do its job properly. This is a problem in poor lighting, bad weather, or when objects are too small, tilted, or oddly shaped.

2. Needs Good Hardware

Compared to simple trackers, StrongSORT uses more advanced steps like deep feature extraction and motion modeling. These steps need more processing power, which can slow things down on weaker devices, especially if you're tracking many objects at once.

3. Appearance Matching Can Struggle

StrongSORT uses a model to recognize how an object looks. But if an object’s appearance changes a lot (due to lighting, angle, or partial hiding), the model might get confused. Also, if the model hasn’t seen similar-looking objects during training, it might not work well.

4. Tuning Settings is Important

To get the best results, you need to tweak several settings in StrongSORT, like how strict the tracker is when matching objects across frames. This can be tricky and time-consuming, especially if your video conditions change often.

5. Crowded Scenes Cause Problems

In very busy scenes with many similar-looking objects, the tracker can make mistakes, like switching identities or losing track of people. Also, if object sizes vary a lot in the same frame, it can confuse the system.

6. Struggles with Long Occlusions

StrongSORT can reconnect objects that go missing for a short time using AFLink. But if an object disappears for too long, it’s hard for the tracker to link it correctly again, especially if it comes back in a different spot or looks slightly different.
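This trade-off is usually exposed as a "max age" parameter: a track that goes unmatched survives only a limited number of frames before its identity is deleted. A hypothetical sketch of that lifecycle (the class and parameter names are illustrative):

```python
class Track:
    def __init__(self, tid, max_age=30):
        self.id = tid
        self.max_age = max_age          # frames a track survives without a match
        self.time_since_update = 0

    def mark_missed(self):
        self.time_since_update += 1     # no detection matched this frame

    def mark_matched(self):
        self.time_since_update = 0      # re-identified: reset the clock

    @property
    def is_deleted(self):
        return self.time_since_update > self.max_age

track = Track(tid=7, max_age=3)
for _ in range(3):
    track.mark_missed()   # occluded for 3 frames: identity still alive
track.mark_missed()       # 4th missed frame exceeds max_age: identity is dropped
```

Raising `max_age` tolerates longer occlusions but increases the risk of linking the wrong object when something reappears nearby.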

Conclusion

StrongSORT is a powerful object tracking system that improves on older methods like DeepSORT by using better tools for detecting and tracking objects.

With upgrades like newer YOLO for detection and advanced motion tracking, it performs very well in tests, especially in complex scenes.

It’s great at handling challenges like occlusion (when objects are blocked) and fast movement.

Features like AFLink help keep track of objects over time without slowing things down too much. This makes it useful for self-driving cars, security cameras, and smart city systems.

However, StrongSORT still has some limits. It needs a good detector, strong hardware, and careful setup to work well. Teams using it should be ready to adjust settings and check if their devices can handle it.

As demand for smart tracking grows, StrongSORT is a strong choice for developers building advanced computer vision systems.

FAQs

What is StrongSORT and how is it different from DeepSORT?

StrongSORT is an advanced object tracking algorithm that improves upon DeepSORT by integrating better object detection (YOLOX), re-identification (OSNet), and smarter motion models (EMA-Kalman).

What are the hardware requirements to run StrongSORT?

StrongSORT requires a GPU for optimal performance due to its deep feature extraction and motion modeling steps. Running on CPU can be slow for real-time tracking.

How can I implement StrongSORT in my project?

You can clone the GitHub repo, install dependencies from requirements.txt, and run track.py with your video source and chosen detection/ReID models.

References

StrongSORT using YOLO | GitHub
