<div align="center">
<h2>Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data</h2>

[**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://scholar.google.com/citations?user=NmHgX-wAAAAJ)<sup>2&dagger;</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup> · [**Xiaogang Xu**](https://xiaogang00.github.io/)<sup>3,4</sup> · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup>

<sup>1</sup>HKU&emsp;&emsp;&emsp;&emsp;<sup>2</sup>TikTok&emsp;&emsp;&emsp;&emsp;<sup>3</sup>CUHK&emsp;&emsp;&emsp;&emsp;<sup>4</sup>ZJU

&dagger;project lead&emsp;*corresponding author

**CVPR 2024**

<a href="https://arxiv.org/abs/2401.10891"><img src='https://img.shields.io/badge/arXiv-Depth Anything-red' alt='Paper PDF'></a>
<a href='https://depth-anything.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything-green' alt='Project Page'></a>
<a href='https://huggingface.co/spaces/LiheYoung/Depth-Anything'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
<a href='https://huggingface.co/papers/2401.10891'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Paper-yellow'></a>
</div>

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation, trained on a combination of 1.5M labeled images and **62M+ unlabeled images**.

![teaser](assets/teaser.png)

## News

* **2024-02-27:** Depth Anything is accepted by CVPR 2024.
* **2024-02-05:** [Depth Anything Gallery](./gallery.md) is released. Thanks to all the users!
* **2024-02-02:** Depth Anything serves as the default depth processor for [InstantID](https://github.com/InstantID/InstantID) and [InvokeAI](https://github.com/invoke-ai/InvokeAI/releases/tag/v3.6.1).
* **2024-01-25:** Support [video depth visualization](./run_video.py). An [online demo for video](https://huggingface.co/spaces/JohanDL/Depth-Anything-Video) is also available.
* **2024-01-23:** The new ControlNet based on Depth Anything is integrated into [ControlNet WebUI](https://github.com/Mikubill/sd-webui-controlnet) and [ComfyUI's ControlNet](https://github.com/Fannovel16/comfyui_controlnet_aux).
* **2024-01-23:** Depth Anything [ONNX](https://github.com/fabio-sim/Depth-Anything-ONNX) and [TensorRT](https://github.com/spacewalk01/depth-anything-tensorrt) versions are supported.
* **2024-01-22:** Paper, project page, code, models, and demo ([HuggingFace](https://huggingface.co/spaces/LiheYoung/Depth-Anything), [OpenXLab](https://openxlab.org.cn/apps/detail/yyfan/depth_anything)) are released.


## Features of Depth Anything

***If you need other features, please first check the [existing community support](#community-support).***

- **Relative depth estimation**

    Our foundation models listed [here](https://huggingface.co/spaces/LiheYoung/Depth-Anything/tree/main/checkpoints) robustly provide relative depth estimation for any given image. Please refer [here](#running) for details.

- **Metric depth estimation**

    We fine-tune our Depth Anything model with metric depth information from NYUv2 or KITTI. It offers strong in-domain and zero-shot metric depth estimation. Please refer [here](./metric_depth) for details.


- **Better depth-conditioned ControlNet**

    We re-train **a better depth-conditioned ControlNet** based on Depth Anything. It offers more precise synthesis than the previous MiDaS-based ControlNet. Please refer [here](./controlnet/) for details. You can also use our new ControlNet based on Depth Anything in [ControlNet WebUI](https://github.com/Mikubill/sd-webui-controlnet) or [ComfyUI's ControlNet](https://github.com/Fannovel16/comfyui_controlnet_aux).

- **Downstream high-level scene understanding**

    The Depth Anything encoder can be fine-tuned for downstream high-level perception tasks, *e.g.*, semantic segmentation (86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K). Please refer [here](./semseg/) for details.


## Performance

Here we compare our Depth Anything with the previously best MiDaS v3.1 BEiT<sub>L-512</sub> model.

Please note that the latest MiDaS is also trained on KITTI and NYUv2, whereas our model is not.

| Method | Params | KITTI || NYUv2 || Sintel || DDAD || ETH3D || DIODE ||
|-|-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| | | AbsRel | $\delta_1$ | AbsRel | $\delta_1$ | AbsRel | $\delta_1$ | AbsRel | $\delta_1$ | AbsRel | $\delta_1$ | AbsRel | $\delta_1$ |
| MiDaS | 345.0M | 0.127 | 0.850 | 0.048 | *0.980* | 0.587 | 0.699 | 0.251 | 0.766 | 0.139 | 0.867 | 0.075 | 0.942 | 
| **Ours-S** | 24.8M | 0.080 | 0.936 | 0.053 | 0.972 | 0.464 | 0.739 | 0.247 | 0.768 | 0.127 | **0.885** | 0.076 | 0.939 |
| **Ours-B** | 97.5M | *0.080* | *0.939* | *0.046* | 0.979 | **0.432** | *0.756* | *0.232* | *0.786* | **0.126** | *0.884* | *0.069* | *0.946* |
| **Ours-L** | 335.3M | **0.076** | **0.947** | **0.043** | **0.981** | *0.458* | **0.760** | **0.230** | **0.789** | *0.127* | 0.882 | **0.066** | **0.952** |

We highlight the **best** and *second best* results in **bold** and *italic* respectively (**better results**: AbsRel $\downarrow$, $\delta_1 \uparrow$).
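
For reference, AbsRel and $\delta_1$ are the standard monocular depth metrics: with predicted depth $d_i$ and ground-truth depth $d_i^*$ over $N$ valid pixels,

$$\mathrm{AbsRel} = \frac{1}{N}\sum_{i=1}^{N}\frac{\lvert d_i - d_i^* \rvert}{d_i^*},$$

and $\delta_1$ is the fraction of pixels satisfying $\max\left(d_i / d_i^*,\ d_i^* / d_i\right) < 1.25$.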

## Pre-trained models

We provide three models of varying scales for robust relative depth estimation:

| Model | Params | Inference Time on V100 (ms) | A100 (ms) | RTX4090 (ms, [TensorRT](https://github.com/spacewalk01/depth-anything-tensorrt)) |
|:-|-:|:-:|:-:|:-:|
| Depth-Anything-Small | 24.8M | 12 | 8 | 3 |
| Depth-Anything-Base | 97.5M | 13 | 9 | 6 |
| Depth-Anything-Large | 335.3M | 20 | 13 | 12 |

Note that the V100 and A100 inference times (*without TensorRT*) exclude the pre-processing and post-processing stages, whereas the RTX4090 time (*with TensorRT*) includes both stages (please refer to [Depth-Anything-TensorRT](https://github.com/spacewalk01/depth-anything-tensorrt)).
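
If you want to reproduce a comparable number yourself, a minimal timing sketch (not the exact benchmarking script used above) could look like the following, assuming a CUDA device and a dummy pre-processed 518x518 input:

```python
import time

import torch

from depth_anything.dpt import DepthAnything

model = DepthAnything.from_pretrained('LiheYoung/depth_anything_vits14').cuda().eval()
x = torch.randn(1, 3, 518, 518).cuda()  # dummy input, as if already pre-processed

with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
    print(f'{(time.time() - start) / 100 * 1000:.1f} ms per forward pass')
```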

You can easily load our pre-trained models as follows:
```python
from depth_anything.dpt import DepthAnything

encoder = 'vits' # can also be 'vitb' or 'vitl'
depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{:}14'.format(encoder))
```
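
As an optional sanity check, you can verify the parameter count against the table above and switch the model to inference mode (assuming a CUDA device is available for the ``.cuda()`` call):

```python
depth_anything = depth_anything.cuda().eval()  # move to GPU and set evaluation mode
print(f'{sum(p.numel() for p in depth_anything.parameters()) / 1e6:.1f}M parameters')  # ~24.8M for 'vits'
```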

Depth Anything is also supported in [``transformers``](https://github.com/huggingface/transformers). You can use it for depth prediction within [3 lines of code](https://huggingface.co/docs/transformers/main/model_doc/depth_anything) (credit to [@niels](https://huggingface.co/nielsr)).

### *No network connection, cannot load these models?*

<details>
<summary>Click here for solutions</summary>

- First, manually download the three checkpoints: [depth-anything-large](https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints/depth_anything_vitl14.pth), [depth-anything-base](https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints/depth_anything_vitb14.pth), and [depth-anything-small](https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints/depth_anything_vits14.pth).

- Second, upload the folder containing the checkpoints to your remote server.

- Lastly, load the model locally:
```python
import torch

from depth_anything.dpt import DepthAnything

model_configs = {
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]}
}

encoder = 'vitl' # or 'vitb', 'vits'
depth_anything = DepthAnything(model_configs[encoder])
depth_anything.load_state_dict(torch.load(f'./checkpoints/depth_anything_{encoder}14.pth'))
```
Note that with this local loading approach, you do not need to install the ``huggingface_hub`` package. In that case, feel free to delete this [line](https://github.com/LiheYoung/Depth-Anything/blob/e7ef4b4b7a0afd8a05ce9564f04c1e5b68268516/depth_anything/dpt.py#L5) and remove ``PyTorchModelHubMixin`` from this [line](https://github.com/LiheYoung/Depth-Anything/blob/e7ef4b4b7a0afd8a05ce9564f04c1e5b68268516/depth_anything/dpt.py#L169).
</details>


## Usage 

### Installation

```bash
git clone https://github.com/LiheYoung/Depth-Anything
cd Depth-Anything
pip install -r requirements.txt
```

### Running

```bash
python run.py --encoder <vits | vitb | vitl> --img-path <img-directory | single-img | txt-file> --outdir <outdir> [--pred-only] [--grayscale]
```
Arguments:
- ``--img-path``: you can either 1) point it to an image directory containing all images of interest, 2) point it to a single image, or 3) point it to a text file listing all image paths.
- ``--pred-only``: saves only the predicted depth map. By default, the image and its depth map are visualized side by side.
- ``--grayscale``: saves the depth map in grayscale. By default, a color palette is applied to the depth map.

For example:
```bash
python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
```

**If you want to use Depth Anything on videos:**
```bash
python run_video.py --encoder vitl --video-path assets/examples_video --outdir video_depth_vis
```

### Gradio demo <a href='https://github.com/gradio-app/gradio'><img src='https://img.shields.io/github/stars/gradio-app/gradio'></a> 

To use our gradio demo locally:

```bash
python app.py
```

You can also try our [online demo](https://huggingface.co/spaces/LiheYoung/Depth-Anything).

### Import Depth Anything to your project

If you want to use Depth Anything in your own project, you can simply follow [``run.py``](run.py) to load our models and define data pre-processing. 

<details>
<summary>Code snippet (note the difference between our data pre-processing and that of MiDaS)</summary>

```python
from depth_anything.dpt import DepthAnything
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet

import cv2
import torch
from torchvision.transforms import Compose

encoder = 'vits' # can also be 'vitb' or 'vitl'
depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{:}14'.format(encoder)).eval()

transform = Compose([
    Resize(
        width=518,
        height=518,
        resize_target=False,
        keep_aspect_ratio=True,
        ensure_multiple_of=14,
        resize_method='lower_bound',
        image_interpolation_method=cv2.INTER_CUBIC,
    ),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet(),
])

image = cv2.cvtColor(cv2.imread('your image path'), cv2.COLOR_BGR2RGB) / 255.0
image = transform({'image': image})['image']
image = torch.from_numpy(image).unsqueeze(0)

# depth shape: 1xHxW
depth = depth_anything(image)
```
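
The prediction above is produced at the transformed (padded) resolution and is not normalized. A minimal post-processing sketch, roughly following [``run.py``](run.py): resize the prediction back to the original resolution and rescale it to 0-255 for visualization. Here ``h, w`` are the original image's height and width, recorded before applying the transform:

```python
import numpy as np
import torch.nn.functional as F

# h, w: the original image's height and width (record them before the transform overwrites `image`)
depth = F.interpolate(depth[None], (h, w), mode='bilinear', align_corners=False)[0, 0]
depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
depth_vis = depth.detach().cpu().numpy().astype(np.uint8)  # single-channel uint8 depth map
cv2.imwrite('depth_vis.png', depth_vis)  # cv2 is already imported above
```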
</details>

### Do not want to define image pre-processing or download model definition files?

Easily use Depth Anything through [``transformers``](https://github.com/huggingface/transformers) within 3 lines of code! Please refer to [these instructions](https://huggingface.co/docs/transformers/main/model_doc/depth_anything) (credit to [@niels](https://huggingface.co/nielsr)).

**Note:** If you encounter ``KeyError: 'depth_anything'``, please install the latest [``transformers``](https://github.com/huggingface/transformers) from source:
```bash
pip install git+https://github.com/huggingface/transformers.git
```
<details>
<summary>Click here for a brief demo:</summary>

```python
from transformers import pipeline
from PIL import Image

image = Image.open('Your-image-path')
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
depth = pipe(image)["depth"]
```
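
The ``depth`` entry above is a PIL image that you can save or display directly; if you need the raw values, the pipeline output also contains a ``predicted_depth`` tensor:

```python
raw_depth = pipe(image)["predicted_depth"]  # torch.Tensor with the unnormalized relative depth
```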
</details>

## Community Support

**We sincerely appreciate all the community extensions built on Depth Anything. Thank you all!**

Here we list the extensions we have found:
- Depth Anything TensorRT: 
    - https://github.com/spacewalk01/depth-anything-tensorrt
    - https://github.com/thinvy/DepthAnythingTensorrtDeploy
    - https://github.com/daniel89710/trt-depth-anything
- Depth Anything ONNX: https://github.com/fabio-sim/Depth-Anything-ONNX
- Depth Anything in Transformers.js (3D visualization): https://huggingface.co/spaces/Xenova/depth-anything-web
- Depth Anything for video (online demo): https://huggingface.co/spaces/JohanDL/Depth-Anything-Video
- Depth Anything in ControlNet WebUI: https://github.com/Mikubill/sd-webui-controlnet
- Depth Anything in ComfyUI's ControlNet: https://github.com/Fannovel16/comfyui_controlnet_aux
- Depth Anything in X-AnyLabeling: https://github.com/CVHub520/X-AnyLabeling
- Depth Anything in OpenXLab: https://openxlab.org.cn/apps/detail/yyfan/depth_anything
- Depth Anything in OpenVINO: https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/280-depth-anything
- Depth Anything ROS:
    - https://github.com/scepter914/DepthAnything-ROS
    - https://github.com/polatztrk/depth_anything_ros
- Depth Anything Android:
    - https://github.com/FeiGeChuanShu/ncnn-android-depth_anything
    - https://github.com/shubham0204/Depth-Anything-Android
- Depth Anything in TouchDesigner: https://github.com/olegchomp/TDDepthAnything
- LearnOpenCV research article on Depth Anything: https://learnopencv.com/depth-anything
- Learn more about the DPT architecture we used: https://github.com/heyoeyo/muggled_dpt

If you have an amazing project that supports or improves Depth Anything (*e.g.*, in speed), please feel free to open an issue. We will add it here.


## Acknowledgement

We would like to express our deepest gratitude to [AK (@_akhaliq)](https://twitter.com/_akhaliq) and the awesome HuggingFace team ([@niels](https://huggingface.co/nielsr), [@hysts](https://huggingface.co/hysts), and [@yuvraj](https://huggingface.co/ysharma)) for helping improve the online demo and build the HF models.

We also thank the [MagicEdit](https://magic-edit.github.io/) team for providing some video examples for video depth estimation, and [Tiancheng Shen](https://scholar.google.com/citations?user=iRY1YVoAAAAJ) for evaluating the depth maps with MagicEdit.

## Citation

If you find this project useful, please consider citing:

```bibtex
@inproceedings{depthanything,
      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
      author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
      booktitle={CVPR},
      year={2024}
}
```