YOLO11 + OCR: AI-Based Fashion Brand Scanner
What Is AI-Based Inventory Automation? Inventory management serves as the backbone of the retail industry, yet the process has remained surprisingly outdated for decades. It involves logging critical product details (Brand, Size, and Price) to track stock levels efficiently. In the past, this was a strictly manual job. Shopkeepers and warehouse managers would spend hours with clipboards, physically handling every garment to decipher the tags and manually enter the digits into Excel. This method is slow, prone to costly human errors, and difficult to scale as a business grows. Today, we are using computer vision to completely transform this workflow.
Transitioning from manual entry to AI automation improves both speed and data integrity. An AI system treats a price tag scan as a sequence of intelligent, high-speed decisions. For every video frame captured by a webcam, the AI must answer two critical questions: "Where is the data?" and "What does it say?"
The first question presents a localization challenge, which we solve using object detection. The second is a semantic problem, solved by Optical Character Recognition (OCR). By combining these technologies, we create a powerful system that "sees" a shirt tag not just as a piece of paper, but as structured data waiting to be captured.
The Core Tech: YOLO11 Meets EasyOCR
This project combines two distinct branches of Artificial Intelligence to solve a specific problem: reading messy fashion tags in real-time. Unlike standardized barcodes, retail tags vary wildly in font, color, and layout.
We paired YOLO11 with EasyOCR to tackle this. YOLO11 represents the latest evolution in the famous "You Only Look Once" family of models, while EasyOCR is a robust tool for text recognition. This hybrid approach allows us to leverage the specific strengths of each model effectively.
Real-Time Tag Data Extraction with Automated Excel Logging
- YOLO11 (The Detector): YOLO11 is engineered for speed and serves as the "eyes" of our system. Its primary job is object detection. It analyzes a chaotic input image—which might contain the shop floor, a user's hand, or the garment fabric—and filters out the noise. It draws precise bounding boxes around the MRP (Price) and the Size. We chose YOLO11 over older versions because of its superior ability to detect small objects. Price digits on a tag are often tiny, occupying less than 5% of the frame. YOLO11 can detect these micro-objects with high confidence, ensuring we don't miss a price tag just because the camera is a few inches away. Its efficiency means it can run in real-time on standard consumer hardware, making this solution accessible to small shop owners who lack powerful computers.
- EasyOCR (The Reader): While YOLO locates the box, it cannot understand what is inside. That is where EasyOCR comes in. EasyOCR acts as the "brain," interpreting pixels into alphanumeric text. Older OCR tools often struggle with artistic fonts or busy backgrounds, but EasyOCR uses deep learning to read text reliably. However, we do not run EasyOCR on the entire image. That would capture irrelevant noise like washing instructions. Instead, we feed it only the cropped regions provided by YOLO. This pipeline ensures maximum accuracy: YOLO provides the focus, and EasyOCR provides the reading comprehension.
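The hand-off between the two models can be sketched in a few lines. This is a minimal outline, not the project's exact code: it assumes the `ultralytics` and `easyocr` packages, and the weights filename `tag_detector.pt` is a hypothetical placeholder for the fine-tuned checkpoint.

```python
def clamp_box(xyxy, width, height):
    """Convert YOLO's float xyxy coordinates into integer crop indices
    clamped to the frame, so the slice never runs out of bounds."""
    x1, y1, x2, y2 = xyxy
    x1 = max(0, int(x1))
    y1 = max(0, int(y1))
    x2 = min(width, int(round(x2)))
    y2 = min(height, int(round(y2)))
    return x1, y1, x2, y2

def build_scanner(weights="tag_detector.pt"):
    """Load the detector and the reader (requires ultralytics + easyocr)."""
    from ultralytics import YOLO   # pip install ultralytics
    import easyocr                 # pip install easyocr
    return YOLO(weights), easyocr.Reader(["en"], gpu=False)

def scan_frame(frame, model, reader):
    """YOLO provides the focus; EasyOCR reads only the cropped regions."""
    results = model(frame, verbose=False)[0]
    h, w = frame.shape[:2]
    fields = {}
    for box in results.boxes:
        label = model.names[int(box.cls)]          # e.g. "Brand", "Size", "MRP"
        x1, y1, x2, y2 = clamp_box(box.xyxy[0].tolist(), w, h)
        crop = frame[y1:y2, x1:x2]                 # feed OCR only this region
        fields[label] = " ".join(reader.readtext(crop, detail=0))
    return fields
```

On a live feed, each webcam frame would be passed through `scan_frame(frame, *build_scanner())`, yielding a dictionary of labeled text fields rather than a raw transcript of the whole image.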
Handling Real-World Challenges
Testing in a controlled lab is simple, but a real retail environment is messy. When we moved from static images to live webcam feeds, we faced several real-world issues that required specific engineering solutions.
- Currency Symbols (The "Rupee" Problem): We encountered a persistent issue with currency symbols. Price tags in India often display ₹ 1299 or Rs. 1299. While a human easily distinguishes the symbol from the number, an OCR model often confuses the ₹ symbol with the number 2, or reads Rs. as 85. This caused significant data errors, logging a ₹ 1299 shirt as 21299. Since fonts vary too much to simply train the OCR to ignore them, we implemented a geometric solution called Surgical Image Preprocessing. Before the code reads the price, it applies a "Left Cut," physically slicing off the leftmost 20% of the image. This removes the symbol entirely, leaving only clean digits for the AI to read.
- Stylized Fonts & Brand Logos: Fashion brands use artistic fonts to stand out, which creates problems for AI. A "PEPE JEANS" logo might look like "PEPF JENS" to the model, or a "MAX" logo might appear as a graphic rather than text. If our database logs "PEPF JENS," it creates a corrupt entry. To fix this, we implemented Fuzzy Matching. This logic acts like an auto-correct system. It scans the tag and compares every word against a known database of brands. If the AI detects "ADIDY BIRLA," the system recognizes "ADITYA BIRLA" as a close match and automatically corrects the entry, keeping the database clean and standardized.
- Lighting & Blur: Some shops are bright, while others are dim. Additionally, handheld scanning introduces motion blur. We needed the model to be robust, so we included diverse lighting conditions in our training data. We also implemented 2x Cubic Upscaling. Before reading any text, we resize the tiny cropped image to double its size. This makes the small digits appear "fatter" and more distinct, giving the OCR engine more pixels to work with and reducing errors on low-quality webcams.
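The "Left Cut" and the 2x upscaling are both simple array operations on the cropped region. The sketch below shows the idea on a NumPy image; note that the real pipeline uses bicubic interpolation (e.g. `cv2.resize(crop, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)`), while the stand-in here uses dependency-free nearest-neighbour repetition to keep the example self-contained.

```python
import numpy as np

LEFT_CUT = 0.20  # fraction of width removed to drop the currency symbol

def left_cut(crop: np.ndarray, fraction: float = LEFT_CUT) -> np.ndarray:
    """Slice off the leftmost 20% of a price crop (H x W x C array),
    removing the currency symbol before OCR ever sees it."""
    start = int(crop.shape[1] * fraction)
    return crop[:, start:]

def upscale_2x(crop: np.ndarray) -> np.ndarray:
    """Double the crop's size so small digits get more pixels.
    Nearest-neighbour stand-in for the cubic resize used in the project."""
    return crop.repeat(2, axis=0).repeat(2, axis=1)
```

Order matters: the cut runs first on the original crop, then the upscaled result is handed to the OCR engine.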
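The fuzzy-matching auto-correct can be expressed with the standard library's `difflib`; the brand list below is an illustrative stand-in for the project's actual database.

```python
import difflib

# Illustrative subset of the known-brands database.
KNOWN_BRANDS = ["PEPE JEANS", "MAX", "ADITYA BIRLA", "LEVI'S"]

def correct_brand(ocr_text, brands=KNOWN_BRANDS, cutoff=0.6):
    """Snap a noisy OCR brand string to the closest known brand.
    Returns None if nothing scores above the similarity cutoff,
    so gibberish never enters the database."""
    matches = difflib.get_close_matches(ocr_text.upper(), brands,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this logic, a misread like "ADIDY BIRLA" snaps to "ADITYA BIRLA", while a string that matches no known brand is rejected outright instead of polluting the log.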
The Pipeline
We built a complete, end-to-end pipeline that transforms a raw video feed into a structured Excel database. It consists of four main stages.
The AI Tag OCR System Workflow
- Data Preparation and Annotation: This serves as the foundation. We collected hundreds of tag images featuring various brands, angles, and lighting conditions. We used Labellerr to draw precise bounding boxes around Brand, Size, and MRP, teaching the AI exactly what to look for.
- YOLO Model Training (Detection): We trained the YOLO11 model on this dataset using transfer learning. We started with a model that already understood basic shapes and then fine-tuned it to detect price tags. This allowed the model to learn specific features quickly without needing millions of images.
- Surgical Cropping & OCR Optimization: This is the inference engine. The live video feed passes to YOLO, which detects the boxes. We crop and upscale these regions, apply the "Left Cut" to remove symbols, and then pass the clean image to EasyOCR. Finally, we use Regex to strip any remaining unwanted characters.
- Database Validation & Logging: The final step involves business logic. The system extracts the data (Brand: MAX, Size: L, Price: 999) and checks the database. If it detects a new item, it appends a row to the Excel file with a timestamp. If the item is a duplicate, it is ignored. This automation turns the physical action of scanning into a digital record instantly.
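The training stage (stage 2) maps onto the `ultralytics` API roughly as follows. The dataset paths and epoch count are hypothetical placeholders; the three class names come from the annotation scheme described above.

```python
# The three regions annotated in the labeling tool.
CLASS_NAMES = ["Brand", "Size", "MRP"]

# YOLO-format dataset descriptor; paths are hypothetical placeholders.
DATA_YAML = """\
path: datasets/fashion_tags
train: images/train
val: images/val
names: [Brand, Size, MRP]
"""

def train_detector(epochs=50):
    """Fine-tune pretrained YOLO11 weights on the annotated tag dataset
    (transfer learning: start from a checkpoint that already knows shapes)."""
    from ultralytics import YOLO   # pip install ultralytics
    with open("data.yaml", "w") as f:
        f.write(DATA_YAML)
    model = YOLO("yolo11n.pt")     # pretrained YOLO11-nano checkpoint
    model.train(data="data.yaml", epochs=epochs, imgsz=640)
    return model
```

Starting from `yolo11n.pt` rather than random weights is what lets a few hundred annotated tags suffice instead of millions of images.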
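Stages 3 and 4 end in a regex cleanup plus a duplicate check before anything is written to Excel. A simplified sketch of that logic (the column names and Excel filename are illustrative, not the project's exact schema):

```python
import re
from datetime import datetime

def clean_price(raw: str) -> str:
    """Regex safety net: strip any non-digit characters that survived
    the Left Cut, e.g. 'Rs. 999' -> '999'."""
    return re.sub(r"[^0-9]", "", raw)

def make_row(brand, size, raw_price):
    """Assemble one inventory record with a scan timestamp."""
    return {"brand": brand, "size": size,
            "price": clean_price(raw_price),
            "scanned_at": datetime.now().isoformat(timespec="seconds")}

def append_if_new(rows, row):
    """Append only if (brand, size, price) is not already logged;
    duplicates are silently ignored."""
    key = (row["brand"], row["size"], row["price"])
    seen = {(r["brand"], r["size"], r["price"]) for r in rows}
    if key in seen:
        return False
    rows.append(row)
    return True

def export_to_excel(rows, path="inventory.xlsx"):
    """Persist the accumulated rows (requires pandas + openpyxl)."""
    import pandas as pd
    pd.DataFrame(rows).to_excel(path, index=False)
```

A scan of (Brand: MAX, Size: L, Price: 999) would pass through `make_row` and `append_if_new`, and `export_to_excel` flushes the list to the spreadsheet.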
No AI Model Works Without High-Quality Data
This is especially true in retail, where tags come in infinite shapes, colors, and layouts. Backgrounds are noisy, and text is often tiny. An AI model is only as good as the data it learns from.
We utilized Labellerr for this critical step because it allowed us to annotate our image dataset with high precision. We drew tight bounding boxes for the YOLO training set, ensuring we distinguished clearly between MRP, Size, and Brand. Consistency is vital. Many tags contain multiple numbers, such as barcodes, style codes, and prices. If a human annotator accidentally labels a barcode as a "Price," the model will become confused and fail during deployment.
Using Labellerr’s intuitive interface, we ensured every label was accurate. We also captured tags in difficult conditions (crumpled, shadowed, or angled) to ensure the model wouldn't fail when a shopkeeper holds a tag loosely. We specifically included "negative samples," which are images with no tags, to teach the model what not to detect. Once annotated, the data was exported for training. This rigorous preparation enables our model to distinguish between a price of "999" and a style code of "99955" situated right next to it.
Validation and Results
The validation phase was the proving ground. We tested the system on unseen tags from brands the model had never encountered during training. We looked for specific issues like "flicker," where the bounding box appears and disappears rapidly, and "hallucination," where the OCR reads text that isn't there.
YOLO11 showed impressive stability in locating the tiny text boxes, even when we rotated the tag. The integration of the "Left Cut" logic with EasyOCR proved to be a game changer, increasing our price reading accuracy from roughly 70% to over 98%. The fuzzy matching logic was equally successful, correcting 9 out of 10 brand name typos.
The final output is incredibly valuable for business owners. It turns a physical pile of clothes into a structured Excel database instantly. It reveals stock levels in real-time and eliminates the human error of manual typing. For a small business owner, this means saving hours of labor every week and having a financial record they can trust.
Conclusion
This project demonstrates the immense potential of AI in retail automation. By moving from simple, fragile scripts to a fully optimized YOLO11 + OCR pipeline, we built a system that rivals expensive industrial barcode scanners using nothing but a standard webcam and Python. It highlights that the future of retail isn't just about collecting data, but about using intelligent vision systems to make that process effortless, accurate, and invisible.
Frequently Asked Questions
What models are used to detect and read the text on the price tags?
The system uses a two-step approach: a YOLO (You Only Look Once) model is trained to detect the specific regions of the tag (like the price or size box), and an OCR model (such as EasyOCR) is used to recognize and extract the text from those cropped regions.
How does the system handle data storage after scanning a tag?
Once the text is extracted and structured by the Python logic script, it is automatically pushed to an Excel file. This allows for real-time inventory updates without manual data entry.
What is the role of Labellerr in this workflow?
Labellerr is utilized in Phase 1 (Data & Model Preparation). It is the platform used to annotate the collected dataset of brand tags, which is essential for fine-tuning the custom YOLO model to accurately identify tag regions.