[Paper Reading] Gaussian Grouping: Segment and Edit Anything in 3D Scenes

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

Table of Contents

  • Gaussian Grouping: Segment and Edit Anything in 3D Scenes
    • 1. What
    • 2. Why
    • 3. How
      • 3.1 Anything Mask Input and Consistency
      • 3.2 3D Gaussian Rendering and Grouping
      • 3.3 Downstream: Local Gaussian Editing

1. What

What does this paper set out to do? (Summarize it in one sentence, drawing on the abstract and conclusion.)

Gaussian Grouping is the first 3D Gaussian-based approach to jointly reconstruct and segment anything in open-world 3D scenes.
Each Gaussian is augmented with a compact Identity Encoding, which is supervised by 2D masks from SAM together with a 3D spatial consistency regularization. The grouped Gaussians can then be used for editing.

  • Explanation of "open-world"

    An open-world scenario is an uncertain, dynamic, and complex environment that contains a wide variety of objects, scenes, and tasks.

    More specifically, "open-world scene understanding" refers to a model's ability to generalize to scenes or environments that it was not explicitly trained on. The model must adapt to and understand a wide range of scenes, including ones that are very different from the scenes in its training data.

2. Why

Under what conditions or needs was this research proposed, what core problems or deficiencies does it address, what have others done, and what are the innovations? (From the Introduction and Related Work.)

This may cover Background, Question, Others, and Innovation:

  • Existing methods [8, 37] rely on manually labeled datasets or require accurately scanned 3D point clouds [33, 42] as input.
  • Existing NeRF-based methods [14, 17, 25, 39] are computationally expensive and hard to adapt to downstream tasks, because the learned neural networks, such as MLPs, cannot easily decompose the 3D scene into its parts or modules.
  • Radiance-based open-world scene understanding: unlike Gaussian Grouping, most of these methods are designed for in-domain scene modeling and cannot generalize to open-world scenarios.

3. How

The method is introduced in detail below, following the pipeline in Figure 2.

[Figure 2: overview of the pipeline, panels (a) to (c)]

3.1 Anything Mask Input and Consistency

As shown in Figure 2(a), the inputs are a set of multi-view captures, the 2D segmentations automatically generated by SAM, and the corresponding cameras calibrated via SfM.

As shown in Figure 2(b), a well-trained zero-shot tracker [7] propagates and associates the masks across views, so that each 2D mask is assigned a unique ID in the 3D scene. In Figure 2(b), different colors represent different segmentation labels.
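
To make the goal of this association step concrete, here is a minimal sketch in plain Python. It does not reproduce the zero-shot tracker [7]. As a stand-in, it uses a greedy IoU matching between two consecutive views, which can only work when the camera moves little between them. The names `iou`, `associate`, and `threshold`, and the representation of a mask as a set of pixel coordinates, are assumptions made for illustration.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def associate(prev_masks, new_masks, threshold=0.5):
    """Carry mask IDs from one view to the next (hypothetical stand-in for a tracker).

    prev_masks: dict mapping mask ID -> set of pixels in the previous view.
    new_masks:  list of pixel sets from the current view.
    Returns a dict mapping mask ID -> pixel set for the current view.
    """
    next_id = max(prev_masks, default=0) + 1
    assigned = {}
    for mask in new_masks:
        best_id, best_score = None, threshold
        for mask_id, prev in prev_masks.items():
            if mask_id in assigned:
                continue  # each previous ID is reused at most once
            score = iou(mask, prev)
            if score >= best_score:
                best_id, best_score = mask_id, score
        if best_id is None:
            # No sufficiently overlapping mask: treat it as a new object.
            best_id = next_id
            next_id += 1
        assigned[best_id] = mask
    return assigned


previous = {1: {(0, 0), (0, 1)}, 2: {(5, 5)}}
current = [{(0, 0), (0, 1), (0, 2)}, {(9, 9)}]
print(associate(previous, current))
```

In this toy run the first mask overlaps the old mask 1 well enough to inherit its ID, while the second mask overlaps nothing and receives a fresh ID.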

3.2 3D Gaussian Rendering and Grouping

As shown in Figure 2(c), this stage brings together all the core concepts of the paper.

  1. Identity Encoding

    A new parameter, the Identity Encoding, is added to each Gaussian alongside its original parameters $S_{\Theta_{i}}=\{\mathbf{p}_{i},\mathbf{s}_{i},\mathbf{q}_{i},\alpha_{i},\mathbf{c}_{i}\}$ (position, scale, rotation quaternion, opacity, and color). It is a compact vector of length 16 and, like the Spherical Harmonic (SH) coefficients that represent color, it is differentiable and learnable.

  2. Grouping via Rendering

    Labels are rendered in a way similar to $\alpha$-blending:

    $$E_{\text{id}}=\sum_{i\in\mathcal{N}}e_i\alpha_i'\prod_{j=1}^{i-1}(1-\alpha_j'),$$

    but the symbols have different meanings. $e_i$ is the length-16 Identity Encoding of each Gaussian, and $\alpha_i'$ is its influence weight on the pixel, computed by evaluating a 2D Gaussian with covariance $\Sigma^{2\mathrm{D}}$ and multiplying by the learned per-point opacity $\alpha_i$, where $\Sigma^{2\mathrm{D}}=JW\Sigma^{3\mathrm{D}}W^TJ^T$ according to [61].

  3. Grouping Loss

    • 2D Identity Loss: Given the rendered 2D features $E_{\text{id}}$ from above as input, a linear layer $f$ first maps the feature dimension back to $K+1$, where $K$ is the total number of masks in the 3D scene, and then $\mathrm{softmax}(f(E_{\text{id}}))$ is taken for identity classification. A cross-entropy loss is used for training.

    • 3D Regularization Loss:

      The 3D Regularization Loss leverages 3D spatial consistency: it enforces the Identity Encodings of the top $k$-nearest 3D Gaussians to be close in feature distance.

      $$\mathcal{L}_{\mathrm{3d}}=\frac{1}{m}\sum_{j=1}^{m}D_{\mathrm{kl}}(P\|Q)=\frac{1}{mk}\sum_{j=1}^{m}\sum_{i=1}^{k}F(e_{j})\log\left(\frac{F(e_{j})}{F(e_{i}^{\prime})}\right)$$

      where $P$ contains the sampled Identity Encoding $e$ of a 3D Gaussian, while the set $Q=\{e_1^{\prime},e_2^{\prime},...,e_k^{\prime}\}$ consists of its $k$ nearest neighbors in 3D space. Here $F$ denotes the softmax applied after the linear layer shared with the 2D Identity Loss, and $m$ is the number of sampled Gaussians.
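
The three pieces of Section 3.2 can be sketched together in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the encodings have length 2 instead of 16, there are only two classes, $F$ is taken to be the softmax of the shared linear layer $f$, and every Gaussian is used as a sample instead of drawing $m$ of them. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def linear(weights, bias, x):
    """Shared linear layer f: maps an encoding to K+1 class logits."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def render_identity(encodings, alphas):
    """Blend depth-sorted encodings e_i with weights alpha'_i, front to back."""
    blended = [0.0] * len(encodings[0])
    transmittance = 1.0  # running product of (1 - alpha'_j) for j < i
    for e, a in zip(encodings, alphas):
        for d, value in enumerate(e):
            blended[d] += value * a * transmittance
        transmittance *= 1.0 - a
    return blended


def identity_loss_2d(weights, bias, e_id, target_id):
    """Cross-entropy between softmax(f(E_id)) and the pixel's mask ID."""
    probs = softmax(linear(weights, bias, e_id))
    return -math.log(probs[target_id])


def regularization_loss_3d(weights, bias, positions, encodings, k):
    """Mean KL divergence between each Gaussian and its k nearest neighbours."""
    m = len(positions)
    dists = [softmax(linear(weights, bias, e)) for e in encodings]  # F(e)
    total = 0.0
    for j in range(m):
        neighbours = sorted(
            (i for i in range(m) if i != j),
            key=lambda i: math.dist(positions[j], positions[i]),
        )[:k]
        for i in neighbours:
            total += sum(p * math.log(p / q) for p, q in zip(dists[j], dists[i]))
    return total / (m * k)


# Toy data: two Gaussians along one ray, three Gaussians in space.
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
e_id = render_identity([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
loss_2d = identity_loss_2d(W, b, e_id, target_id=0)
loss_3d = regularization_loss_3d(
    W, b,
    positions=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0)],
    encodings=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    k=1,
)
print(e_id, loss_2d, loss_3d)
```

In the toy data, the two nearby Gaussians share an encoding and contribute nothing to the 3D loss. The distant third Gaussian disagrees with its nearest neighbour, so it is the only source of the penalty.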

3.3 Downstream: Local Gaussian Editing

The focus here is on inpainting: first delete the relevant 3D Gaussians, then add a small number of new Gaussians, which are supervised during rendering by the 2D inpainting results from LaMa [41].
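
The deletion step can be sketched as a simple filter over the Gaussians. This is a minimal illustration that assumes each Gaussian's ID is the argmax of its class logits (for example, the output of the linear layer applied to its Identity Encoding). The function names are hypothetical, and the later step of adding and fine-tuning new Gaussians is not shown.

```python
def predict_ids(logits_per_gaussian):
    """Pick the most likely identity for each Gaussian (argmax over class logits)."""
    return [max(range(len(logits)), key=logits.__getitem__)
            for logits in logits_per_gaussian]


def split_by_id(gaussians, ids, target_id):
    """Separate the Gaussians of the object to delete from the rest of the scene."""
    kept = [g for g, i in zip(gaussians, ids) if i != target_id]
    removed = [g for g, i in zip(gaussians, ids) if i == target_id]
    return kept, removed


# Toy scene: three Gaussians represented only by a name.
gaussians = ["g0", "g1", "g2"]
ids = predict_ids([[2.0, 0.1], [0.2, 1.5], [0.3, 0.9]])
kept, removed = split_by_id(gaussians, ids, target_id=1)
print(ids, kept, removed)
```

The `kept` list is the scene after removal. The region formerly covered by `removed` is where the new Gaussians would be placed before being supervised by the 2D inpainting results.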
