Shilong Liu Homepage


Hi! This is Shilong Liu, 刘世隆 in Chinese. I’m a ceil(now() - 2020.9)th year Ph.D. candidate at the Department of Computer Science and Technology, Tsinghua University, under the supervision of Prof. Lei Zhang, Prof. Hang Su, and Prof. Jun Zhu. I received my bachelor’s degree from the Department of Industrial Engineering, Tsinghua University in 2020.

I am a computer vision intern at the International Digital Economy Academy (IDEA), under the supervision of Prof. Lei Zhang.

I was a summer intern at Microsoft Research, Redmond from June to September 2023, under the supervision of Dr. Chunyuan Li and Dr. Hao Cheng. I also collaborate closely with Dr. Jianwei Yang.

My research interests include computer vision, object detection, and multi-modal learning.

Contact me by email: slongliu86 AT or liusl20 AT

I expect to graduate in June 2025 and am open to both academic positions and industrial research positions. Download my CV.
Google Scholar · GitHub · Twitter · Zhihu (知乎) · my CV


May 17, 2024 We introduce Grounding DINO 1.5, which is our most powerful open-world object detection model series. View our blog and tech report for more details. Try our demo.
Feb 19, 2024 Invited talk at the Rising Stars in AI Symposium 2024 at KAUST. I really enjoyed the trip.
Dec 1, 2023 DINO and DAB-DETR were recognized as the most influential papers of ICLR 2023 and ICLR 2022, respectively. Mask DINO was selected as one of the most influential papers of CVPR 2023.
Nov 5, 2023 I was awarded the CCF-CV Academic Emerging Scholar 2023 (CCF-CV 学术新锐学者, 3 awardees per year)! Thanks to the China Computer Federation.
Sep 29, 2023 Invited talks at the Institute for AI Industry Research (AIR), Tsinghua University; the HPC-AI Lab, National University of Singapore; the Gaoling School of Artificial Intelligence, Renmin University of China (RUC); and the VALSE Student Webinar. View the slides here (slides about detection, grounding, and large language models).
Mar 13, 2023 We release Grounding DINO, a strong open-set object detection model that achieves the best results on open-set object detection tasks: 52.5 zero-shot AP on COCO detection, without any COCO training data, and 63.0 AP on COCO after fine-tuning. Code and checkpoints will be available here.
Sep 22, 2022 We release detrex, a toolbox that provides state-of-the-art Transformer-based detection algorithms, including an improved DINO with better performance. You are welcome to try it!

Selected Publications

  1. Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
    Tianhe Ren*, Qing Jiang*, Shilong Liu*, and 13 more authors
    arXiv:2405.10300, 2024
    Grounding DINO 1.5 Pro — our most capable model for open-set object detection.
  2. LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
    Shilong Liu, Hao Cheng, Haotian Liu, and 10 more authors
    arXiv:2311.05437, 2023
    Equip multimodal large language models with tools to create multimodal agents.
  3. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, and 8 more authors
    arXiv:2303.05499, 2023
    SOTA open-set object detector. 52.5 AP on COCO without COCO training data!
  4. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
    Hao Zhang*, Feng Li*, Shilong Liu*, and 5 more authors
    In International Conference on Learning Representations, 2023
    The first DETR-based object detector that achieved 1st on the COCO detection leaderboard.
  5. DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
    Shilong Liu, Feng Li, Hao Zhang, and 5 more authors
    In International Conference on Learning Representations, 2022
    A deep analysis of DETR’s queries, reformulating them as anchor boxes.
  6. Query2Label: A Simple Transformer Way to Multi-Label Classification
    Shilong Liu, Lei Zhang, Xiao Yang, and 2 more authors
    arXiv:2107.10834, 2021
    A novel transformer-based multi-label classification model, achieving SOTA on four benchmarks.