Agent What are Multi-Agent Systems? A Beginner's Guide Multi-Agent Systems (MAS) use multiple smart agents that sense, decide, and act independently while working together. Unlike traditional AI, they adapt quickly, scale easily, and power real-world solutions from traffic control to healthcare and e-commerce.
Language Models 5 Open-Source Coding LLMs You Can Run Locally in 2025 In 2025, open-source coding LLMs like Qwen3-Coder, Devstral, StarCoder2, Codestral, and Qwen2.5-Coder offer sophisticated multi-language support, agentic task handling, long context windows, and state-of-the-art code generation for local use.
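For a sense of what "local use" looks like in practice, here is a minimal inference sketch using Hugging Face transformers; the Qwen2.5-Coder checkpoint is one published option, and the prompt and hardware assumptions (a single fp16-capable GPU) are illustrative rather than prescriptive.

```python
# A minimal local-inference sketch; any locally downloadable coding LLM can be swapped in.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-Coder-7B-Instruct",  # assumes roughly 16 GB of GPU memory in fp16
    device_map="auto",
    torch_dtype="auto",
)

prompt = "Write a Python function that checks whether a string is a palindrome."
output = generator(prompt, max_new_tokens=256, do_sample=False)
print(output[0]["generated_text"])
```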
qwen Qwen3 Coder: Agentic LLM-Coder For Software Development In 2025, open-source coding LLMs like Qwen3-Coder offer sophisticated multi-language support, agentic task handling, long context windows, and state-of-the-art code generation for local use, empowering developers to build, debug, translate, and optimize code securely on their own hardware.
mixture of experts GLM-4.5 by Zhipu AI: Model for Coding, Reasoning, and Vision GLM-4.5 delivers state-of-the-art open-source capabilities across language, code, and multimodal vision. Combining a 355B-parameter Mixture-of-Experts architecture, dual-mode reasoning, and native tool use, it sets new standards for coding, agentic, and multilingual tasks.
Vision AI Advanced Vision Language Models: Gemma 3 and 3n Explained Gemma 3 and Gemma 3n represent a leap in vision-language AI, featuring SigLIP-based visual encoders, up to 128k-token context windows, and state-of-the-art multilingual and function-calling capabilities.
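As a rough illustration of running a Gemma 3 vision-language checkpoint locally, here is a sketch built on the transformers image-text-to-text pipeline; the model id and image URL are placeholders, and since the checkpoint is gated a Hugging Face access token may be required.

```python
# A minimal sketch, assuming a recent transformers release with Gemma 3 support
# and access to the gated "google/gemma-3-4b-it" checkpoint.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-4b-it", device_map="auto")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/scene.jpg"},  # placeholder image
            {"type": "text", "text": "Describe what is happening in this image."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])  # chat turns, ending with the model's description
```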
Image Generation Qwen: AI-Powered Visual Creation and Precise Image Editing Qwen-Image & Qwen-Image-Edit leverage 20B parameter Multimodal Diffusion Transformers for sophisticated image understanding and editing—from adding/removing objects to style transfer and bilingual text editing.
nvidia NVIDIA Isaac GR00T N1 Foundation Model For Humanoid Robots NVIDIA's Isaac GR00T N1 is an open foundation model and simulation toolkit enabling powerful reasoning, multistep manipulation, and cross-embodiment skills for humanoid robots. Modular, simulation-powered, and scalable for rapid AI robot development.
Generative AI How Genie 3 Builds Interactive 3D Scenes from Text Genie 3, Google DeepMind’s latest AI world model, creates immersive 3D environments in real time from simple text prompts. It supports continuous interaction, dynamic world changes, and persistent memory—revolutionizing AI simulation and agent training.
ChatGPT GPT-OSS Review: OpenAI's Free Model GPT-OSS is OpenAI's family of open-weight models released under a permissive license. It supports local deployment, fine-tuning, and integration while offering transparency and flexibility for research and production.
dino DINOv3 Explained: The Future of Self-Supervised Learning DINOv3 is Meta’s open-source vision backbone trained on over a billion images using self-supervised learning. It provides pretrained models, adapters, training code, and deployment support for advanced, annotation-free vision solutions.
Product Update Product Update: July 2025 Our July 2025 update brings lots of improvements to the image and video annotation experience, reduces software latency, and adds some new features. These changes will help you work faster, stay organized, and get more done. Advanced File Search & Filtering We've significantly enhanced our file search capabilities with powerful new
cvpr CVPR 2025: Breakthroughs in GenAI and Computer Vision CVPR 2025 (June 11–15, Music City Center, Nashville & virtual) features top-tier computer vision research: 3D modeling, multimodal AI, embodied agents, AR/VR, deep learning, workshops, demos, art exhibits and robotics innovations.
AI KOSMOS-2 Explained: Microsoft’s Multimodal Marvel KOSMOS-2 brings grounding to vision-language models, letting AI pinpoint visual regions based on text. In this blog, I explore how well it performs through real-world experiments and highlight both its promise and limitations in grounding and image understanding.
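To show what grounded generation looks like in code, here is a small sketch using the transformers KOSMOS-2 integration; the image URL is a placeholder.

```python
# A minimal grounding sketch with the "microsoft/kosmos-2-patch14-224" checkpoint.
import requests
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

model = Kosmos2ForConditionalGeneration.from_pretrained("microsoft/kosmos-2-patch14-224")
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")

image = Image.open(requests.get("https://example.com/snowman.jpg", stream=True).raw)  # placeholder
prompt = "<grounding>An image of"  # the <grounding> tag asks the model to emit region tokens

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=64)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# post_process_generation separates the caption from (phrase, span, bounding boxes) tuples
caption, entities = processor.post_process_generation(generated_text)
print(caption)
print(entities)
```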
cvpr CVPR 2025: Breakthroughs in Object Detection & Segmentation CVPR 2025 (June 11–15, Music City Center, Nashville & virtual) features top-tier computer vision research: 3D modeling, multimodal AI, embodied agents, AR/VR, deep learning, workshops, demos, art exhibits and robotics innovations.
Vision Language Model BLIP Explained: Use It For VQA & Captioning BLIP (Bootstrapping Language‑Image Pre‑training) is a Vision‑Language Model that fuses image and text understanding. This blog dives into BLIP’s architecture, training tasks, and shows you how to set it up locally for captioning, visual QA, and cross‑modal retrieval.
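As a quick taste of running BLIP locally, here is a minimal captioning sketch using the transformers integration; the image URL is a placeholder, and the VQA variant ("Salesforce/blip-vqa-base" with BlipForQuestionAnswering) follows the same pattern.

```python
# A minimal captioning sketch with the "Salesforce/blip-image-captioning-base" checkpoint.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open(requests.get("https://example.com/dog.jpg", stream=True).raw).convert("RGB")  # placeholder

# Unconditional captioning: the model describes the image from scratch.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```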
SAM SAM Fine-Tuning Using LoRA Learn how to fine‑tune SAM with LoRA to achieve precise, domain‑specific segmentation without massive GPU costs. Freeze the SAM backbone, train only tiny low‑rank adapters, and deploy high‑accuracy models on modest hardware—fast, modular, and efficient.
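The core recipe (freeze the SAM backbone, train only small low-rank adapters) can be sketched with transformers and peft; the target module name below matches the combined qkv projection in the transformers SAM vision encoder, but treat it as an assumption to verify against your own setup.

```python
# A minimal sketch of wrapping SAM's vision encoder with LoRA adapters via peft.
from transformers import SamModel
from peft import LoraConfig, get_peft_model

sam = SamModel.from_pretrained("facebook/sam-vit-base")

lora_config = LoraConfig(
    r=8,                     # rank of the low-rank update
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["qkv"],  # attention projections inside the vision encoder (assumption)
    bias="none",
)

model = get_peft_model(sam, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters should show up as trainable
```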
object tracking OC-SORT Tutorial: Critical Insights You Can’t Miss! OC‑SORT claims improved occlusion and non‑linear motion handling by back‑filling virtual paths and using real detections, but its dependence on detector quality, straight‑line interpolation, and no appearance features can still lead to identity errors in dense or erratic scenes.
hidream Text-to-Image Magic: HiDream-E1's Image Editing Hack In the fast-paced fashion industry, HiDream-E1 cuts time and cost by using natural language to edit images: change colors, backgrounds, and accessories with pixel-perfect accuracy. Built on a 17B foundation model and a Sparse Diffusion Transformer, it's a game-changer for creative workflows.
AI Model 5 Best AI Reasoning Models of 2025: Ranked! Who leads AI reasoning in 2025? Explore how OpenAI o3, Gemini 2.5, Claude 3.7 Sonnet, Grok 3, DeepSeek-R1, and AM-Thinking-v1 stack up in benchmarks, context window, cost-efficiency, and real-world use cases. Spot the right fit for your next-gen AI project.
object tracking Track Objects Fast: BoT-SORT + YOLO Explained! BoT-SORT boosts multi-object tracking by refining box predictions, compensating for camera motion, and combining motion with appearance cues. Easily integrate it via Ultralytics YOLO or BoxMOT for robust tracking in crowded, dynamic scenes.
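The Ultralytics route is close to a one-liner in practice; below is a minimal sketch where the detector checkpoint and video path are placeholders.

```python
# A minimal BoT-SORT tracking sketch with Ultralytics YOLO; "crowd.mp4" is a placeholder.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # any YOLO detection checkpoint serves as the tracker's detector

# tracker="botsort.yaml" selects the built-in BoT-SORT configuration.
results = model.track(source="crowd.mp4", tracker="botsort.yaml", persist=True, show=False)

for r in results:
    if r.boxes.id is not None:
        print(r.boxes.id.int().tolist())  # track IDs assigned in this frame
```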
ai agent The Rise of AI Agents in Data Labeling Explained AI agents are reshaping data labeling pipelines. By combining semi-supervised learning, active learning, and human-in-the-loop workflows, they reduce manual effort by ~50% and cut annotation costs by up to 4×, all while maintaining accuracy above 90%.
minimax MiniMax-M1: 1M-Token Open-Source Hybrid-Attention AI Meet MiniMax-M1: a 456B-parameter, hybrid-attention reasoning model under Apache 2.0. Thanks to a hybrid Mixture-of-Experts architecture and lightning attention, it handles 1M-token contexts with 75% lower FLOPs, delivering top-tier math, coding, long-context, and RL-based reasoning.
object tracking Track Crowds in Real-Time with FairMOT - A Detailed Tutorial FairMOT is a real-time, anchor-free tracking system that solves identity switch issues by combining object detection and re-identification in a single network, ideal for crowded scenes, surveillance, sports, and autonomous systems.
Zero-Shot Segmentation Qwen 2.5-VL 7B Fine-Tuning Guide for Segmentation Unlock the full power of Qwen 2.5-VL 7B. This complete guide walks you through dataset prep, LoRA/adapter fine-tuning with Roboflow Maestro or PyTorch, segmentation heads, evaluation, and optimized deployment for object-level tasks.
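As a rough starting point for the LoRA route in plain PyTorch/transformers, here is a setup sketch; the model class requires a recent transformers release with Qwen2.5-VL support, and the target modules are a common choice rather than anything the guide prescribes.

```python
# A minimal LoRA setup sketch for Qwen 2.5-VL 7B; training loop and dataset prep are omitted.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # common choice, verify for your build
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, format the segmentation dataset as chat-style image+text pairs
# and train with a standard Trainer / SFT loop.
```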
Product Update Product Update: May 2025 You told us you wanted a faster and more organized way to manage your projects, datasets, and API keys. We listened. Today, we are excited to announce a completely redesigned workflow experience. Our May 2025 update introduces a new landing page, a dedicated datasets section, and a secure place to