Learning Naturally Aggregated Appearance for Efficient 3D Editing

¹HKUST  ²Ant Group  ³Stanford

Efficient 3D editing with the learned canonical image and projection field.
The novel-view videos without editing are provided on the left.

Abstract

Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to their implicit nature. This work studies the task of efficient 3D editing, where we focus on editing speed and user interactivity. To this end, we propose to learn the color field as an explicit 2D appearance aggregation, also called a canonical image, with which users can easily customize their 3D editing via 2D image processing. We complement the canonical image with a projection field that maps 3D points onto 2D pixels for texture query. This field is initialized with a pseudo canonical camera model and optimized with offset regularity to ensure the naturalness of the canonical image. Extensive experiments on different datasets suggest that our representation, dubbed AGAP, supports various ways of 3D editing well (e.g., stylization, instance segmentation, and interactive drawing). Our approach demonstrates remarkable efficiency by being at least 20 times faster per edit than existing NeRF-based editing methods.

Method

AGAP Pipeline

Overview of our method. AGAP consists of two components: (1) an explicit 3D density grid Φ_G that estimates geometry in the form of density σ; (2) an explicit canonical image Φ_I with an associated view-dependent projection field P that aggregates appearance to produce color c. By performing 2D image processing on the canonical image, our method enables various kinds of editing (e.g., instance segmentation, interactive drawing, and scene stylization) through volume rendering without the need for re-optimization.
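The idea behind the pipeline can be illustrated with a small, self-contained sketch. It is not the AGAP implementation: the `project` function is a hypothetical stand-in for the projection field P (a toy pinhole "pseudo canonical camera" with a zero offset that ignores the view direction), the density is a hand-written sphere rather than a learned grid, and all function names are assumptions made for illustration. What it does show is the general mechanism: each sample along a ray takes its density from the geometry, takes its color from a 2D image lookup, and is composited with standard volume rendering, so editing the 2D image changes the rendered color while the geometry stays fixed.

```python
import math

def project(point, view_dir, focal=0.5):
    # Hypothetical stand-in for the projection field P: a toy pinhole
    # "pseudo canonical camera" at the origin looking along +z. A learned
    # field would add a view-dependent offset; this sketch uses a zero
    # offset and ignores view_dir.
    x, y, z = point
    u = 0.5 + focal * x / z
    v = 0.5 + focal * y / z
    return min(max(u, 0.0), 1.0), min(max(v, 0.0), 1.0)

def sample_bilinear(image, u, v):
    # Bilinear texture query on the canonical image, with (u, v) in [0, 1]^2.
    h, w = len(image), len(image[0])
    fx, fy = u * (w - 1), v * (h - 1)
    x0, y0 = int(fx), int(fy)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = fx - x0, fy - y0

    def lerp(a, b, t):
        return tuple(p + (q - p) * t for p, q in zip(a, b))

    top = lerp(image[y0][x0], image[y0][x1], tx)
    bottom = lerp(image[y1][x0], image[y1][x1], tx)
    return lerp(top, bottom, ty)

def render_ray(origin, direction, ts, density, image):
    # Standard volume rendering: alpha_i = 1 - exp(-sigma_i * delta_i),
    # weight_i = T_i * alpha_i, with T_i the accumulated transmittance.
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for i, t in enumerate(ts):
        delta = ts[i + 1] - t if i + 1 < len(ts) else ts[-1] - ts[-2]
        point = tuple(o + t * d for o, d in zip(origin, direction))
        alpha = 1.0 - math.exp(-density(point) * delta)
        c = sample_bilinear(image, *project(point, direction))
        weight = transmittance * alpha
        color = [acc + weight * ch for acc, ch in zip(color, c)]
        transmittance *= 1.0 - alpha
    return tuple(color), 1.0 - transmittance

def sphere_density(point, center=(0.0, 0.0, 2.0), radius=0.5):
    # Toy geometry (assumption): a dense sphere instead of a learned grid.
    return 50.0 if math.dist(point, center) < radius else 0.0

# A 2x2 canonical image, and an "edited" copy painted uniformly magenta.
canonical = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
             [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
edited_image = [[(1.0, 0.0, 1.0)] * 2 for _ in range(2)]

# A novel-view ray from (1, 0, 0) aimed at the sphere center.
norm = math.sqrt(5.0)
ray_o, ray_d = (1.0, 0.0, 0.0), (-1.0 / norm, 0.0, 2.0 / norm)
ts = [1.0 + 0.1 * i for i in range(26)]

original, opacity_before = render_ray(ray_o, ray_d, ts, sphere_density, canonical)
edited, opacity_after = render_ray(ray_o, ray_d, ts, sphere_density, edited_image)
print("original color:", original)
print("edited color:  ", edited)
```

In this sketch, the second render reuses the same density and projection and only swaps the 2D image, which mirrors why edits on the canonical image need no re-optimization of the 3D representation.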

3D Scene Editing


Scene Stylization

Instance Segmentation

Texture Editing

BibTeX

@inproceedings{cheng2025learning,
    title     = {Learning Naturally Aggregated Appearance for Efficient 3D Editing}, 
    author    = {Ka Leong Cheng and Qiuyu Wang and Zifan Shi and Kecheng Zheng and Yinghao Xu and Hao Ouyang and Qifeng Chen and Yujun Shen},
    booktitle = {Proceedings of the International Conference on 3D Vision},
    year      = {2025},
    pages     = {},
}