Recently, many techniques have emerged to create high-quality 3D assets and scenes. When it comes to editing these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, then matches semantically corresponding regions across chosen segmented dataset views using DINO features. A color or texture change from a particular region of the edit image can then be applied to the other views automatically in a semantically sensible manner. These edited views act as an updated dataset for further training and re-styling of the 3D scene; the end result is an edited 3D model. Our framework enables a wide variety of editing tasks, such as manual local edits, correspondence-based style transfer from any example image, and combinations of styles from multiple example images. We use Gaussian Splats as our primary 3D representation due to their speed and ease of local editing, but our technique also works for other representations such as NeRFs. We show through multiple examples that our method produces higher-quality results while offering fine-grained control over editing.
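The region-matching step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each segmented region is summarized by average-pooling a dense per-pixel feature map inside its mask (a real pipeline would use upsampled DINO patch features; here a random array stands in for the backbone output), and pairs regions across views by cosine similarity.

```python
import numpy as np

def region_descriptors(features, masks):
    """Average-pool per-pixel features inside each boolean mask.

    features: (H, W, D) dense feature map (stand-in for DINO features).
    masks:    list of (H, W) boolean segmentation masks.
    Returns an (N, D) array of L2-normalized region descriptors.
    """
    descs = []
    for m in masks:
        d = features[m].mean(axis=0)              # mean feature in the region
        descs.append(d / (np.linalg.norm(d) + 1e-8))
    return np.stack(descs)

def match_regions(src_desc, dst_desc):
    """For each source region, pick the most similar target region by
    cosine similarity (descriptors are already unit-norm)."""
    sim = src_desc @ dst_desc.T                   # (N_src, N_dst) similarities
    return sim.argmax(axis=1)

# Toy demo: two "views" whose regions share feature structure.
rng = np.random.default_rng(0)
H, W, D = 8, 8, 16
feats_a = rng.normal(size=(H, W, D))
masks_a = [np.zeros((H, W), bool), np.zeros((H, W), bool)]
masks_a[0][:4], masks_a[1][4:] = True, True       # top half / bottom half
# Second view: same content with mild noise, mask order swapped.
feats_b = feats_a + 0.1 * rng.normal(size=(H, W, D))
masks_b = [masks_a[1], masks_a[0]]

pairing = match_regions(region_descriptors(feats_a, masks_a),
                        region_descriptors(feats_b, masks_b))
# Region 0 of view A should pair with region 1 of view B, and vice versa.
print(pairing)
```

In the full method the matched pairings are what let a single edited view drive edits across every chosen dataset view.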
Image-conditional editing is shown above: an example image is segmented, and its parts are matched with the parts of each chosen, segmented dataset view. After our DINO-based mask-matching algorithm assigns these pairings, color and/or texture can be transferred between matched parts. Alternatively, users can choose colors or textures to apply directly, as demonstrated in the user workflow below.
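Once regions are paired, the per-pair transfer itself can be very simple. The sketch below is a hedged stand-in, not the paper's exact transfer: it assumes a mean-color replacement with an illustrative `blend` parameter (the method also supports texture transfer, which would paste or tile texels into the matched mask instead).

```python
import numpy as np

def transfer_region_color(edit_img, edit_mask, view_img, view_mask, blend=1.0):
    """Recolor a matched region in a dataset view using the mean color of
    the corresponding region in the edit image.

    edit_img, view_img:   (H, W, 3) float images in [0, 1].
    edit_mask, view_mask: boolean masks of the matched region pair.
    blend: 1.0 fully replaces the region's color; lower values mix.
    Returns a recolored copy of view_img.
    """
    target_color = edit_img[edit_mask].mean(axis=0)   # (3,) mean RGB
    out = view_img.copy()
    out[view_mask] = (1 - blend) * out[view_mask] + blend * target_color
    return out

# Toy demo: paint the matched region of a gray view red.
edit = np.zeros((4, 4, 3)); edit[..., 0] = 1.0        # all-red edit image
view = np.full((4, 4, 3), 0.5)                        # uniform gray view
mask = np.zeros((4, 4), bool); mask[:2] = True        # matched region
recolored = transfer_region_color(edit, mask, view, mask)
```

Repeating this over all chosen views yields the edited dataset used to re-train the 3D scene.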
We show correspondence-based editing in the left two examples, where we restyle the drum set to match the single drum's color scheme and give the plant fall colors from the fall tree. Targeted editing is shown in the right two examples, where a sand texture is applied to the water and a blue marble texture to the plate.
A variety of edits on mics and chairs. The top row shows the original mic and chair. In the mid-left we turn the mic green; in the bottom-left we apply the texture of Starry Night to it. In the mid-right we give the chair back a wood texture, and in the bottom-right we use correspondence to automatically style the chair with the appearance of the throne.
Applying a blue color to the truck and a grass texture to the dining table, with both edits combined at the bottom.
Applying various colors to the tablecloth.
Turning the grass in the bicycle scene to ice and snow.
Giving the vegetation fall colors.
Comparing color editing across our method, Distilled Feature Fields, and CLIP-NeRF.
Comparing texture editing across our method, Blended-NeRF, and Vox-E.
Applying an ice texture to the helmet, gold foil to the coffee cup, and a toasted texture to the bread.
Turning the sidewalk light blue without affecting the street.
Turning just the yellow car red.
@misc{jaganathan2024iceg,
title={ICE-G: Image Conditional Editing of 3D Gaussian Splats},
author={Vishnu Jaganathan and Hannah Hanyun Huang and Muhammad Zubair Irshad and Varun Jampani and Amit Raj and Zsolt Kira},
year={2024},
eprint={2406.08488},
archivePrefix={arXiv}
}