Gemini 3.1 Pro Google Gemini 3.1 Pro Review and Analysis Gemini 3.1 Pro is Google’s most advanced reasoning model yet, built for deep agentic workflows, large-scale code generation, and multimodal tasks. With 65K output tokens and major benchmark gains, it shifts AI from conversation to autonomous execution.
SAM 3 Benchmarking SAM and SAM 3 on Aerial Data Compare SAM and SAM 3 for aerial image segmentation. See zero-shot benchmark results on satellite datasets, where the two models differ in performance, and how to use both inside Labellerr for faster geospatial annotation workflows.
humanoid robot learning VideoMimic: How Robots Learn Human Motion VideoMimic turns monocular human videos into deployable humanoid robot policies. By combining 4D reconstruction, scene geometry, and reinforcement learning, it enables context-aware robot control without motion capture or handcrafted rewards.
Spatial Reasoning How Think3D Gives Vision Models a Real Sense of Space Think3D enables AI models to reason directly in 3D space instead of flat images. By combining 3D reconstruction, camera geometry, and reinforcement learning, it transforms how vision-language models understand depth, occlusion, and viewpoint change.
egocentric video generation EgoControl: Controllable First Person Video Simulation EgoControl reframes egocentric video generation as embodied simulation. By conditioning diffusion models on future 3D full-body poses, it enables controllable, physically grounded first-person video prediction aligned with intended human motion.
SemanticGen Why SemanticGen Is a Leap for Long-Form Video AI SemanticGen redefines video generation by separating semantic planning from pixel synthesis. Using a two-stage diffusion process, it enables long-form, coherent videos while avoiding the computational limits of traditional diffusion models.
Genie 3 Genie 3 Doesn't Make Videos, It Builds Worlds Genie 3 by Google DeepMind is a real-time 3D world model that creates interactive, persistent environments. It enables scalable egocentric data for robotics training, helping embodied AI learn navigation, perception, and long-horizon reasoning.
NeoVerse NeoVerse 4D World Model: Escaping the 4D Data Bottleneck NeoVerse is a scalable 4D world model that reconstructs dynamic scenes directly from in-the-wild monocular videos. Using a pose-free, feed-forward design, it eliminates multi-view capture and heavy preprocessing while enabling fast, high-quality 4D reconstruction and video generation.
egocentric datasets How EgoX Converts Third-Person to First-Person Video EgoX transforms a single third-person video into a realistic first-person experience by grounding video diffusion models in 3D geometry, enabling accurate egocentric perception without extra sensors or ground-truth data.
LTX-2 Generate Video and Audio Together with LTX-2 LTX-2 is the first open-source model that generates synchronized audio and video together using a joint diffusion process, enabling realistic speech, sound effects, and motion alignment in a single system.
computer vision Small Object Detection using YOLO with SAHI Explained Small object detection often fails with standard YOLO inference because resizing large images shrinks small objects below a detectable size. This blog shows how Slicing Aided Hyper Inference (SAHI) improves recall by running detection on overlapping image slices and merging the results back onto the full image.
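The slicing idea behind SAHI can be sketched in a few lines of plain Python: tile the image with overlapping windows, run the detector on each tile, then shift each slice-local box back into full-image coordinates. This is a simplified illustration of the technique, not the `sahi` library's actual API; the window size and overlap values are assumptions for the example.

```python
def slice_windows(img_w, img_h, slice_w=512, slice_h=512, overlap=0.2):
    """Compute overlapping slice windows (x0, y0, x1, y1) covering the image.

    Adjacent windows overlap by `overlap` * slice size so objects cut by a
    slice boundary still appear whole in at least one neighboring slice.
    """
    step_w = int(slice_w * (1 - overlap))
    step_h = int(slice_h * (1 - overlap))
    windows = []
    y0 = 0
    while True:
        y1 = min(y0 + slice_h, img_h)
        x0 = 0
        while True:
            x1 = min(x0 + slice_w, img_w)
            windows.append((x0, y0, x1, y1))
            if x1 >= img_w:
                break
            x0 += step_w
        if y1 >= img_h:
            break
        y0 += step_h
    return windows


def to_full_image(box, window):
    """Shift a slice-local box (x0, y0, x1, y1) back to full-image coordinates."""
    wx0, wy0, _, _ = window
    x0, y0, x1, y1 = box
    return (x0 + wx0, y0 + wy0, x1 + wx0, y1 + wy0)


# Example: a 1024x1024 aerial image tiled into 512x512 slices with 20% overlap.
windows = slice_windows(1024, 1024)
# A detection found inside one slice is mapped back before merging (e.g. NMS).
full_box = to_full_image((10, 20, 50, 60), windows[4])
```

In the real pipeline, per-slice boxes mapped back this way are deduplicated with non-maximum suppression, which is what recovers small objects that full-image inference misses.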
robot brain architecture Omni-Bodied Robot Brain: How One Brain Controls Many Robots Omni-bodied robot brains separate intelligence from hardware, enabling robots to share skills, adapt across different bodies, and scale faster through foundation models, simulation, and shared data.
Synthetic Training Data The Truth About Synthetic Robot Data Synthetic training data enables robots to learn perception, motion, and interaction at scale. Generated in simulation, it offers low-cost labeling, safe edge-case testing, and faster development while addressing real-world data scarcity.
Teleoperation Datasets Teleoperation Datasets: The Fuel for Robot Learning Teleoperation datasets capture real robot behavior through human control. They provide high-quality demonstrations that help robots learn manipulation, navigation, and coordination in real-world environments.
computer vision End-to-End AI-Based Bottle Cap Quality Inspection System Learn how to build an AI-powered bottle cap inspection system using computer vision. Detect missing caps in real time, reduce defects, and improve quality control on high-speed production lines.
Robotics How Egocentric Data Fixes Robot Perception Egocentric datasets train robots using first-person vision, aligning perception with action. By capturing real hand–object interactions, they reduce perception–action mismatch and enable more reliable robot manipulation and learning.
Robotics Why Data, Not Models, Is the Real Bottleneck in Robotics Robots learn from data, not rules. This blog explains egocentric, teleoperation, simulation, and multimodal robotics datasets, why data quality matters, and how accurate labeling enables reliable real-world robot deployment.