Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images

TPAMI 2024

Xiaoxiao Long*1, Yuhang Zheng*2, Yupeng Zheng2, Beiwen Tian2, Cheng Lin3,
Lingjie Liu4, Hao Zhao2, Guyue Zhou2, Wenping Wang5

(*indicates equal contribution, indicates corresponding author)

1 HKU, 2 Tsinghua AIR, 3 Tencent Games, 4 UPenn, 5 Texas A&M


We introduce a novel approach to learn geometry, such as depth and surface normals, from images while incorporating geometric context. Existing methods struggle to capture geometric context reliably, which impedes their ability to enforce consistency between different geometric properties and limits the quality of geometric estimation. We therefore propose the Adaptive Surface Normal (ASN) constraint, a simple yet efficient method. Our approach extracts geometric context that encodes the geometric variations present in the input image and correlates depth estimation with geometric constraints. By dynamically determining reliable local geometry from randomly sampled candidates, we establish a surface normal constraint in which the validity of these candidates is evaluated using the geometric context. Furthermore, our normal estimation leverages the geometric context to prioritize regions that exhibit significant geometric variations, so that the predicted normals accurately capture intricate and detailed geometric information. Through the integration of geometric context, our method unifies depth and surface normal estimation within a cohesive framework, which enables the generation of high-quality 3D geometry from images. We validate the superiority of our approach over state-of-the-art methods through extensive evaluations and comparisons on diverse indoor and outdoor datasets, showcasing its efficiency and robustness.


  • We propose a unified scheme to jointly estimate depth and surface normals under the guidance of geometric context.
  • We propose a context-guided normal estimation network that improves normal prediction and captures rich geometric details.
  • We demonstrate that the ASN module is effective with both transformer and CNN backbones.
  • Both qualitatively and quantitatively, our results are significantly better than those of our conference version, ASN, especially on 3D metrics.
  • We show that our method not only works well in indoor scenes but is also robust in outdoor scenes.

Pipeline of ASN++


Overview of our method. Taking a single image as input, our model produces a depth map, geometric context, and a surface normal map from three separate decoders. We recover surface normals from the predicted depth map with our proposed Adaptive Surface Normal (ASN) computation method. The similarity kernels computed from the geometric context make this surface normal calculation aware of local geometry, such as shape boundaries and corners. Furthermore, the geometric context encodes rich geometric variations that the predicted surface normals usually struggle to capture, so we also use it to guide the surface normal estimation. Finally, pixel-wise depth/normal supervision is enforced on the predicted depth/normal, while geometric supervision is enforced on the recovered surface normals.
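The idea of recovering a normal from depth with context-aware weighting can be illustrated with a small NumPy sketch. This is not the released implementation: the function names (`backproject`, `cosine_similarity`, `adaptive_normal`), the pinhole intrinsics, the square sampling window, and the use of cosine similarity between context features as the candidate weight are all assumptions made for illustration. The sketch samples random point triplets around a target pixel, computes one candidate normal per triplet, and averages the candidates with weights derived from the geometric context, so that neighbours lying across a shape boundary contribute little.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to camera-space points (H, W, 3), pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two feature vectors (assumed stand-in kernel)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def adaptive_normal(points, context, i, j, radius=3, num_triplets=64, rng=None):
    """Similarity-weighted normal at pixel (i, j) from randomly sampled triplets.

    points:  (H, W, 3) back-projected 3D points
    context: (H, W, C) per-pixel geometric context features (hypothetical layout)
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = points.shape
    p0, f0 = points[i, j], context[i, j]
    acc = np.zeros(3)
    for _ in range(num_triplets):
        # Sample two neighbours inside a local window, clamped to the image.
        ia, ib = np.clip(i + rng.integers(-radius, radius + 1, size=2), 0, h - 1)
        ja, jb = np.clip(j + rng.integers(-radius, radius + 1, size=2), 0, w - 1)
        # Candidate normal of the plane through the target and both neighbours.
        n = np.cross(points[ia, ja] - p0, points[ib, jb] - p0)
        length = np.linalg.norm(n)
        if length < 1e-8:  # degenerate (coincident or collinear) triplet
            continue
        n = n / length
        if np.dot(n, p0) > 0:  # orient the normal towards the camera at the origin
            n = -n
        # Down-weight candidates whose neighbours differ in geometric context.
        wa = max(cosine_similarity(f0, context[ia, ja]), 0.0)
        wb = max(cosine_similarity(f0, context[ib, jb]), 0.0)
        acc += wa * wb * n
    length = np.linalg.norm(acc)
    return acc / length if length > 0 else acc
```

With a depth map containing a step edge and context features that differ on the two sides of the edge, candidates that straddle the edge receive zero weight in this sketch, so the normal next to the edge is still that of the plane the pixel lies on. An unweighted average over the same window would instead be tilted by the depth discontinuity.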

Outdoor Dataset


Qualitative Results

NYUD-V2 Dataset (Indoor)


MVS-SYNTH & SEVERS Dataset (Outdoor)



KITTI Seq07 & Beijing Rose Garden (Outdoor)

scene1 scene2

Tsinghua AIR (Indoor)



  title={Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images},
  author={Long, Xiaoxiao and Zheng, Yuhang and Zheng, Yupeng and Tian, Beiwen and Lin, Cheng and Liu, Lingjie and Zhao, Hao and Zhou, Guyue and Wang, Wenping},
  journal={arXiv preprint arXiv:2402.05869},