Shilong Liu Homepage

Hi! This is Shilong Liu (刘世隆). I am a Research Scientist at Bytedance Seed.

I obtained my PhD from the Department of Computer Science and Technology, Tsinghua University, under the supervision of Prof. Lei Zhang, Prof. Hang Su, and Prof. Jun Zhu. I got my bachelor’s degree from the Department of Industrial Engineering, Tsinghua University in 2020.

During most of my PhD years, I had the opportunity to intern at the International Digital Economy Academy (IDEA), under the supervision of Prof. Lei Zhang. I also interned at Bytedance, NVIDIA, Shengshu-tech, and Microsoft Research, where I was fortunate to collaborate with talented researchers including Guilin Liu, Zhiding Yu, Chunyuan Li, Hao Cheng, and Jianwei Yang.

My previous work focused on computer vision, multi-modal learning, and LLM agents. Here are some of my representative works:

Visual Perception: We proposed a series of improvements to the Detection Transformer (DETR), including DAB-DETR, DN-DETR, DINO, MaskDINO, and Stable-DINO. Notably, DINO was the FIRST DETR-like model to achieve state-of-the-art performance on the COCO object detection leaderboard.
Open-world Visual Understanding/Multi-modal Learning: Our models Grounding DINO, Grounded-SAM, and Grounding-DINO-1.5/1.6/DINO-X have made significant strides in this field. Grounding DINO has become one of the most popular open-world object detection models, enabling us to DETECT and SEGMENT ANYTHING.
LLM Agents: We proposed Alita, a generalist agent that ranked 1st on the GAIA benchmark, outperforming OpenAI Deep Research. We introduced LLaVA-Plus, which enhances multi-modal large language models with visual expertise, and Crab, a Python-based framework for building and benchmarking LLM agent environments. We also extended agents to specific fields such as medicine and history.

If you’re interested in related topics and would like to collaborate, feel free to reach out by email: slongliu86 [AT] gmail.com. (Note: The email liusl20 [AT] mails.tsinghua.edu.cn will be deprecated; please use the Gmail address instead.)

Feel free to add me on WeChat: SLONG_88 (please include a brief note about yourself when sending a request).

News

Nov 11, 2024	Invited talk at EECS 542, University of Michigan. [Slides]
Jul 22, 2024	Started my internship at NVIDIA, collaborating with Guilin Liu and Zhiding Yu. See you in the Bay Area, USA.
Jul 4, 2024	I received the 2024 WAIC Yunfan Award – Rising Star. [News]
Jul 3, 2024	Six papers were accepted to ECCV 2024. See their details: Grounding DINO, LLaVA-Plus, TAPTR: Tracking Any Point, T-Rex2: Text-Visual Prompted Detector, LLaVA-Grounding, and Semantic-SAM.
May 17, 2024	We introduce Grounding DINO 1.5, which is our most powerful open-world object detection model series. View our blog and tech report for more details. Try our demo.
Feb 19, 2024	Invited talk at Rising Stars in AI Symposium 2024 at KAUST. I really enjoyed the trip.
Dec 1, 2023	DINO and DAB-DETR were named the most influential papers for ICLR 2023 and ICLR 2022, respectively. Mask DINO was selected as one of the most influential papers for CVPR 2023.
Nov 5, 2023	I was named a CCF-CV Academic Emerging Scholar 2023 (CCF-CV 学术新锐学者, 3 people per year)! Thanks to the China Computer Federation.
Sep 29, 2023	Invited talks at the Institute for AI Industry Research (AIR), Tsinghua University; the HPC-AI Lab, National University of Singapore; the Gaoling School of Artificial Intelligence, Renmin University of China (RUC); and the VALSE Student Webinar. View the slides here (slides about detection, grounding, and large language models).
Mar 13, 2023	We released Grounding DINO, a strong open-set object detection model that achieves the best results on open-set object detection tasks. It achieves 52.5 zero-shot AP on COCO detection, without any COCO training data! It achieves 63.0 AP on COCO after fine-tuning. Code and checkpoints will be available here.
Sep 22, 2022	We released detrex, a toolbox that provides state-of-the-art Transformer-based detection algorithms. It includes DINO with better performance. Feel free to try it!

Selected Publications

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Tianhe Ren*, Qing Jiang*, Shilong Liu*, and 13 more authors

arXiv:2405.10300, 2024

Grounding DINO 1.5 Pro: our most capable model for open-set object detection.

@article{ren2024grounding,
  title = {Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection},
  author = {Ren*, Tianhe and Jiang*, Qing and Liu*, Shilong and Zeng*, Zhaoyang and Liu, Wenlong and Gao, Han and Huang, Hongjie and Ma, Zhengyu and Jiang, Xiaoke and Chen, Yihao and Xiong, Yuda and Zhang, Hao and Li, Feng and Tang, Peijun and Yu, Kent and Zhang, Lei},
  year = {2024},
  journal = {arXiv:2405.10300},
  eprint = {2405.10300},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV},
}

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Shilong Liu, Hao Cheng, Haotian Liu, and 10 more authors

To be shown in ECCV, 2024

Equip multimodal large language models with tools to create multimodal agents.

@article{liu2023grounding,
  title = {LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents},
  author = {Liu, Shilong and Cheng, Hao and Liu, Haotian and Zhang, Hao and Li, Feng and Ren, Tianhe and Zou, Xueyan and Yang, Jianwei and Su, Hang and Zhu, Jun and Zhang, Lei and Gao, Jianfeng and Li, Chunyuan},
  journal = {To be shown in ECCV},
  year = {2024},
  codebadge = {https://img.shields.io/github/stars/LLaVA-VL/LLaVA-Plus-Codebase},
}

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, and 8 more authors

To be shown in ECCV, 2024

SOTA open-set object detector. 52.5 AP on COCO without COCO training data!

@article{liu2023groundinh,
  title = {Grounding {DINO}: Marrying {DINO} with Grounded Pre-Training for Open-Set Object Detection},
  author = {Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal = {To be shown in ECCV},
  year = {2024},
  codebadge = {https://img.shields.io/github/stars/IDEA-Research/GroundingDINO,https://img.shields.io/github/stars/IDEA-Research/Grounded-Segment-Anything}
}

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Hao Zhang*, Feng Li*, Shilong Liu*, and 5 more authors

In International Conference on Learning Representations, 2023

The first DETR-based object detector to rank 1st on the COCO detection leaderboard.

@inproceedings{zhang2022dino,
  title = {DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
  author = {Zhang*, Hao and Li*, Feng and Liu*, Shilong and Zhang, Lei and Su, Hang and Zhu, Jun and Ni, Lionel M. and Shum, Heung-Yeung},
  booktitle = {International Conference on Learning Representations},
  year = {2023},
  codebadge = {https://img.shields.io/github/stars/IDEA-Research/DINO}
}

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

Shilong Liu, Feng Li, Hao Zhang, and 5 more authors

In International Conference on Learning Representations, 2022

A deep understanding of DETR’s query, and formulating queries as anchor boxes.

@inproceedings{liu2022dabdetr,
  title = {{DAB}-{DETR}: Dynamic Anchor Boxes are Better Queries for {DETR}},
  author = {Liu, Shilong and Li, Feng and Zhang, Hao and Yang, Xiao and Qi, Xianbiao and Su, Hang and Zhu, Jun and Zhang, Lei},
  booktitle = {International Conference on Learning Representations},
  year = {2022},
  url = {https://openreview.net/forum?id=oMI9PjOb9Jl},
  codebadge = {https://img.shields.io/github/stars/IDEA-Research/DAB-DETR}
}

Query2Label: A Simple Transformer Way to Multi-Label Classification

Shilong Liu, Lei Zhang, Xiao Yang, and 2 more authors

arXiv:2107.10834, 2021

A novel transformer-based multi-label classification model, achieving SOTA on four benchmarks.

@article{liu2021query2label,
  title = {Query2Label: A Simple Transformer Way to Multi-Label Classification},
  author = {Liu, Shilong and Zhang, Lei and Yang, Xiao and Su, Hang and Zhu, Jun},
  year = {2021},
  journal = {arXiv:2107.10834},
  eprint = {2107.10834},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV},
  codebadge = {https://img.shields.io/github/stars/SlongLiu/query2labels}
}