CoPart

Contextual Part Latents for 3D Generation

ICCV 2025


1HKUST    2CUHK    3SenseTime Research   

*Equal contribution    Corresponding author


CoPart: high-quality part-based 3D generation.


PartVerse Dataset

We are pleased to release PartVerse, the first large-scale, manually annotated 3D object part dataset.



We follow a "raw data → mesh segmentation algorithm → human post-correction" pipeline to produce the part-level data.



Part-level text captions are provided, covering appearance, shape, and the relationship between each part and the whole object.


Abstract

To generate 3D objects, early research focused on multi-view-driven approaches that rely solely on 2D renderings. Recently, the 3D native latent diffusion paradigm has demonstrated superior performance in 3D generation, because it fully leverages the geometric information provided in ground-truth 3D data. Despite this rapid progress, 3D diffusion still faces three challenges. First, most of these methods represent a 3D object with a single latent, regardless of its complexity, which can lead to detail loss when generating objects with multiple complicated parts. Second, most 3D assets are designed part by part, yet the current holistic latent representation overlooks both the independence of these parts and their interrelationships, limiting the model's generative ability. Third, current methods rely on global conditions (e.g., text, image, point cloud) to control the generation process and therefore lack fine-grained controllability. Motivated by how 3D designers create a 3D object, we present a new part-based 3D generation framework, CoPart, which represents a 3D object with multiple contextual part latents and simultaneously generates coherent 3D parts. This part-based framework has several advantages: i) it reduces the encoding burden of intricate objects by decomposing them into simpler parts, ii) it facilitates part learning and part-relationship modeling, and iii) it naturally supports part-level control. To ensure the coherence of part latents and to harness the powerful priors of foundation models, we propose a novel mutual guidance strategy that fine-tunes pre-trained diffusion models for joint part-latent denoising. We also provide part-level text captions for each part, describing its shape, appearance, and relationship to the whole object.

Method Overview



The CoPart framework operates as follows: Gaussian noise is added to the part image tokens and geometric tokens extracted by the VAEs, which are then fed into the 2D and 3D denoisers. Mutual guidance (a) facilitates information exchange between the 3D and 2D modalities (via Cross-Modality Attention) and between different parts (via Cross-Part Attention). Additionally, (b) the 3D bounding boxes are treated as cube meshes, and the extracted box tokens are injected into the 3D denoiser through cross-attention; simultaneously, the boxes are rendered as 2D images and injected into the 2D denoiser via ControlNet.
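The two attention mechanisms behind mutual guidance can be sketched as below. This is a minimal illustration only, not the authors' implementation: the module name `MutualGuidanceBlock`, the token shapes, and the pre-norm residual layout are all assumptions; the actual denoisers, token dimensions, and conditioning paths differ.

```python
import torch
import torch.nn as nn

class MutualGuidanceBlock(nn.Module):
    """Hypothetical sketch of mutual guidance: 3D part tokens first attend
    to the matching 2D image tokens (cross-modality), then all parts
    attend to each other (cross-part)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_modality = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_part = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens_3d: torch.Tensor, tokens_2d: torch.Tensor) -> torch.Tensor:
        # tokens_3d, tokens_2d: (num_parts, seq_len, dim), one row per part.
        # Cross-Modality Attention: each part's 3D tokens query its 2D tokens.
        q = self.norm1(tokens_3d)
        attn_out, _ = self.cross_modality(q, tokens_2d, tokens_2d)
        tokens_3d = tokens_3d + attn_out
        # Cross-Part Attention: flatten parts into one sequence so every
        # token can attend to tokens of every other part.
        p, s, d = tokens_3d.shape
        flat = tokens_3d.reshape(1, p * s, d)
        q = self.norm2(flat)
        attn_out, _ = self.cross_part(q, q, q)
        return (flat + attn_out).reshape(p, s, d)

# Usage: 4 parts, each with 16 geometric tokens of width 64.
block = MutualGuidanceBlock(dim=64)
t3d = torch.randn(4, 16, 64)   # noised geometric tokens per part
t2d = torch.randn(4, 16, 64)   # matching 2D image tokens per part
out = block(t3d, t2d)
print(out.shape)  # torch.Size([4, 16, 64])
```

A symmetric block would let the 2D tokens query the 3D tokens in the same way, so that both denoisers stay consistent across parts during joint denoising.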

BibTeX

@inproceedings{dong2025copart,
  title={From One to More: Contextual Part Latents for 3D Generation},
  author={Dong, Shaocong and Ding, Lihe and Chen, Xiao and Li, Yaokun and Wang, Yuxin and Wang, Yucheng and Wang, Qi and Kim, Jaehyeok and Gao, Chenjian and Huang, Zhanpeng and Wang, Zibin and Xue, Tianfan and Xu, Dan},
  booktitle={ICCV},
  year={2025}
}