P-MapNet: Far-seeing Map Generator Enhanced by
both SDMap and HDMap Priors

1 Beijing Institute of Technology     2 Institute for AI Industry Research (AIR), Tsinghua University    
3 Beihang University     4 Tsinghua University
Under Review

*Indicates Equal Contribution

Abstract

Autonomous vehicles are gradually entering city roads today with the help of high-definition maps (HDMaps). However, the reliance on HDMaps prevents autonomous vehicles from entering regions that lack this expensive digital infrastructure. This fact has driven many researchers to study online HDMap construction algorithms, but the performance of these algorithms in far regions remains unsatisfactory. We present P-MapNet, in which the letter P highlights our focus on incorporating map priors to improve model performance. Specifically, we exploit priors from both SDMaps and HDMaps. On one hand, we extract weakly aligned SDMap data from OpenStreetMap and encode it as an additional conditioning branch. Despite the misalignment challenge, our attention-based architecture adaptively attends to relevant SDMap skeletons and significantly improves performance. On the other hand, we exploit a masked autoencoder to capture the prior distribution of HDMaps, which serves as a refinement module that mitigates occlusions and artifacts. We benchmark on the nuScenes and Argoverse2 datasets. Through comprehensive experiments, we show that: (1) our SDMap prior improves online map construction performance, using both rasterized (by up to +18.73 mIoU) and vectorized (by up to +8.50 mAP) output representations; (2) our HDMap prior improves the map perceptual metric by up to 6.34%; (3) P-MapNet can be switched between different inference modes that cover different regions of the accuracy-efficiency trade-off landscape; (4) P-MapNet is a far-seeing solution that brings larger improvements at longer perception ranges.
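To make the two priors concrete, below is a minimal PyTorch sketch of how an SDMap conditioning branch (cross-attention from BEV features to SDMap tokens) and an HDMap refinement module could be wired up. All module names, shapes, and dimensions are our illustrative assumptions, and the lightweight convolutional refiner stands in for the paper's MAE-based module; this is not the released P-MapNet implementation.

import torch
import torch.nn as nn

class SDMapPriorAttention(nn.Module):
    """Conditions BEV features on a rasterized SDMap skeleton via cross-attention,
    letting the model attend to relevant SDMap cells despite weak alignment."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Encode the rasterized SDMap branch to the same channel width as BEV features.
        self.sd_encoder = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, bev_feat, sd_raster):
        # bev_feat: (B, C, H, W) BEV features; sd_raster: (B, 1, H, W) SDMap mask.
        B, C, H, W = bev_feat.shape
        q = bev_feat.flatten(2).transpose(1, 2)                    # (B, H*W, C) queries
        kv = self.sd_encoder(sd_raster).flatten(2).transpose(1, 2) # SDMap tokens
        out, _ = self.attn(q, kv, kv)                              # attend to SDMap prior
        q = self.norm(q + out)                                     # residual fusion
        return q.transpose(1, 2).reshape(B, C, H, W)

class HDMapPriorRefiner(nn.Module):
    """Refinement in the spirit of the paper's masked-autoencoder HDMap prior:
    trained to reconstruct masked ground-truth HDMaps, then applied to coarse
    predictions to fix occlusions and artifacts. A conv stack is used here only
    as a simple stand-in for the MAE architecture."""
    def __init__(self, num_classes=4, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, num_classes, 1),
        )

    def forward(self, coarse_logits):
        return self.net(coarse_logits)  # refined per-class map logits

In this sketch the SDMap prior is injected before decoding, while the HDMap prior acts purely as a post-hoc refiner, which mirrors the two-stage usage described in the abstract.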

Pipeline

"Better performances in more occluded and rotated scenes."

Experiments


Performance comparison between the HDMapNet baseline and our method on the nuScenes val set. "S" indicates that our method utilizes only the SDMap prior, while "S+H" indicates the utilization of both priors. "M" denotes the input modality of our method and "Epoch" denotes the number of refinement epochs.
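For reference, the table's mIoU on rasterized outputs can be computed per class as the intersection over union of thresholded predictions against ground-truth masks. The sketch below, with an assumed 0.5 threshold, is one plausible way to compute it, not the paper's exact evaluation code.

import torch

def miou(pred_logits, gt_masks, thresh=0.5, eps=1e-6):
    """pred_logits, gt_masks: (B, num_classes, H, W); returns per-class IoU and the mean."""
    pred = (pred_logits.sigmoid() > thresh).float()
    inter = (pred * gt_masks).sum(dim=(0, 2, 3))                 # per-class intersection
    union = ((pred + gt_masks) > 0).float().sum(dim=(0, 2, 3))   # per-class union
    iou = inter / (union + eps)
    return iou, iou.mean()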


Perceptual metric of the HDMap prior. We use the LPIPS metric to evaluate the realism of the fusion model over a $120m\times 60m$ perception range. The improvements brought by the HDMap prior module are more significant than those of the SDMap prior module.
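As a hedged illustration of this evaluation, the snippet below scores rasterized map predictions with the public lpips package (https://github.com/richzhang/PerceptualSimilarity). Replicating the single-channel raster to three channels and using the AlexNet backbone are our assumptions, not necessarily the paper's exact setup.

import lpips
import torch

loss_fn = lpips.LPIPS(net='alex')  # AlexNet-based LPIPS, as in the reference implementation

def map_lpips(pred_mask, gt_mask):
    """pred_mask, gt_mask: (B, 1, H, W) rasterized maps with values in [0, 1]."""
    # LPIPS expects 3-channel images scaled to [-1, 1]; replicating the single
    # channel is our assumption for applying it to binary map rasters.
    to3 = lambda x: (x.repeat(1, 3, 1, 1) * 2.0) - 1.0
    with torch.no_grad():
        return loss_fn(to3(pred_mask), to3(gt_mask)).mean()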

Visualization

We provide additional qualitative results under diverse weather conditions, where our method exhibits superior performance.

We conduct a comparative analysis within a $240m\times 60m$ range on the nuScenes dataset and a $120m\times 60m$ range on the Argoverse2 dataset, using camera plus LiDAR (C+L) as input. In our notation, "S" indicates that our method utilizes only the SDMap prior, while "S+H" indicates the utilization of both priors. Our method consistently outperforms the baseline under various weather conditions and in scenarios involving viewpoint occlusion.

BibTeX

@misc{jiang2024pmapnet,
  title={P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors},
  author={Zhou Jiang and Zhenxin Zhu and Pengfei Li and Huan-ang Gao and Tianyuan Yuan and Yongliang Shi and Hang Zhao and Hao Zhao},
  year={2024},
  eprint={2403.10521},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}