Visual intelligence requires the ability to understand and detect novel concepts. A paper presented at CVPR 2023, held in June, introduced Grounding DINO. Its authors aim to build a robust system that can detect arbitrary objects specified through human language input.
They refer to this task