<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>BeYoung</title>
    <link>https://lyapple2008.github.io/</link>
    <description>Recent content on BeYoung</description>
    <generator>Hugo -- 0.147.9</generator>
    <language>zh</language>
    <copyright>See this site's source code here, licensed under GPLv3</copyright>
    <lastBuildDate>Mon, 20 Apr 2026 15:02:03 +0800</lastBuildDate>
    <atom:link href="https://lyapple2008.github.io/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>[Pinned] 2026</title>
      <link>https://lyapple2008.github.io/posts/2026-todo/</link>
      <pubDate>Sun, 22 Feb 2026 09:43:54 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/2026-todo/</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Tracking what I plan to do and what I am working on in 2026.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h2 id=&#34;目标&#34;&gt;Goals&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Healthy living&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Get my weight down to 60 kg&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div id=&#34;weight-chart&#34; style=&#34;width: 100%; height: 300px;&#34;&gt;&lt;/div&gt;
&lt;script src=&#34;https://cdn.jsdelivr.net/npm/echarts@5/dist/echarts.min.js&#34;&gt;&lt;/script&gt;
&lt;script&gt;
var chart = echarts.init(document.getElementById(&#39;weight-chart&#39;));
chart.setOption({
    tooltip: { trigger: &#39;axis&#39; },
    xAxis: { type: &#39;category&#39;, data: [&#39;Jan&#39;, &#39;Feb&#39;, &#39;Mar&#39;, &#39;Apr&#39;] },
    yAxis: { type: &#39;value&#39;, min: 55, max: 70 },
    series: [{
        data: [65, 65.8, 65.6, 64.4],
        type: &#39;line&#39;,
        smooth: true,
        areaStyle: { opacity: 0.3 },
        markLine: { data: [{ yAxis: 60, name: &#39;Target&#39; }], lineStyle: { color: &#39;#52c41a&#39;, type: &#39;dashed&#39; } }
    }]
});
&lt;/script&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Exploring possibilities&lt;/p&gt;</description>
      <content:encoded><![CDATA[<blockquote>
<p>Tracking what I plan to do and what I am working on in 2026.</p></blockquote>
<h2 id="目标">Goals</h2>
<ul>
<li>
<p>Healthy living</p>
<ul>
<li><input disabled="" type="checkbox"> Get my weight down to 60 kg</li>
</ul>
</li>
</ul>
<div id="weight-chart" style="width: 100%; height: 300px;"></div>
<script src="https://cdn.jsdelivr.net/npm/echarts@5/dist/echarts.min.js"></script>
<script>
var chart = echarts.init(document.getElementById('weight-chart'));
chart.setOption({
    tooltip: { trigger: 'axis' },
    xAxis: { type: 'category', data: ['Jan', 'Feb', 'Mar', 'Apr'] },
    yAxis: { type: 'value', min: 55, max: 70 },
    series: [{
        data: [65, 65.8, 65.6, 64.4],
        type: 'line',
        smooth: true,
        areaStyle: { opacity: 0.3 },
        markLine: { data: [{ yAxis: 60, name: 'Target' }], lineStyle: { color: '#52c41a', type: 'dashed' } }
    }]
});
</script>
<ul>
<li>
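<p>Exploring possibilities</p>
<ul>
<li><input disabled="" type="checkbox"> English: become fluent in 3 short speeches</li>
<li><input disabled="" type="checkbox"> Indie development: complete 4 independent projects</li>
<li><input disabled="" type="checkbox"> WeChat official account: publish 10 articles</li>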
</ul>
</li>
</ul>
<h2 id="正在做的项目">Projects in progress</h2>
<ul>
<li>
<p><input checked="" disabled="" type="checkbox"> iOS bilingual subtitle app</p>
</li>
<li>
<p><input checked="" disabled="" type="checkbox"> A helper (cheat) tool for the mini-game 拼了还拼</p>
</li>
<li>
<p><input disabled="" type="checkbox"> Applying <a href="https://github.com/karpathy/autoresearch">autoresearch</a> to training an audio noise-reduction model</p>
</li>
</ul>
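<h2 id="一些随想">Random thoughts</h2>
<blockquote>
<p>Small impressions and ideas from life and work</p></blockquote>
<ul>
<li>2026.03.29</li>
</ul>
<blockquote>
<p>Recording and quantifying progress every day is a burden in itself. It does little for efficiency and works against long-term tasks. Rather than watching task progress daily,
I will try a milestone-based way of working from now on: set goals for a period of time, check how they went when the period ends, and in between record only
events that are real milestones.</p></blockquote>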
]]></content:encoded>
    </item>
    <item>
      <title>Audio Resampling: From Theory to Engineering Implementation</title>
      <link>https://lyapple2008.github.io/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/</link>
      <pubDate>Mon, 20 Apr 2026 15:02:03 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/</guid>
      <description>&lt;p&gt;A close look at how sinc convolution reconstructs a signal, and the path from a naive algorithm to the optimizations in SpeexDSP&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;第一部分重采样的原理与处理过程&#34;&gt;Part 1: Principles and Process of Resampling&lt;/h2&gt;
&lt;h3 id=&#34;目录&#34;&gt;Contents&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&#34;https://lyapple2008.github.io/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#sampling-nature&#34;&gt;The nature of sampling: spectral replication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://lyapple2008.github.io/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#reconstruction-condition&#34;&gt;The condition for reconstruction: no aliasing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://lyapple2008.github.io/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#ideal-lpf&#34;&gt;The ideal low-pass filter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://lyapple2008.github.io/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#convolution-reconstruction&#34;&gt;Reconstruction by convolution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://lyapple2008.github.io/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#resampling-process&#34;&gt;The resampling process&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://lyapple2008.github.io/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#why-sinc&#34;&gt;Why sinc and not some other function&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h3 id=&#34;11-采样操作的本质频谱复制&#34;&gt;1.1 The Nature of Sampling: Spectral Replication&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key result (the Whittaker-Shannon interpolation formula):&lt;/strong&gt;&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>A close look at how sinc convolution reconstructs a signal, and the path from a naive algorithm to the optimizations in SpeexDSP</p>
<hr>
<h2 id="第一部分重采样的原理与处理过程">Part 1: Principles and Process of Resampling</h2>
<h3 id="目录">Contents</h3>
<ol>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#sampling-nature">The nature of sampling: spectral replication</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#reconstruction-condition">The condition for reconstruction: no aliasing</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#ideal-lpf">The ideal low-pass filter</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#convolution-reconstruction">Reconstruction by convolution</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#resampling-process">The resampling process</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#why-sinc">Why sinc and not some other function</a></li>
</ol>
<hr>
<h3 id="11-采样操作的本质频谱复制">1.1 The Nature of Sampling: Spectral Replication</h3>
<blockquote>
<p><strong>Key result (the Whittaker-Shannon interpolation formula):</strong></p>
<p>$$x_c(t) = \sum_{n=-\infty}^{\infty} x[n] \cdot \text{sinc}\left(\frac{t - nT}{T}\right)$$</p>
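<p>where $\text{sinc}(u) = \frac{\sin(\pi u)}{\pi u}$.
<strong>In other words: the continuous value at any instant = the convolution of the discrete samples with the sinc function.</strong></p></blockquote>
<h4 id="第一步采样定理的数学表达">Step 1: The sampling theorem in mathematical form</h4>
<p>Let the continuous signal $x_c(t)$ have Fourier transform $X_c(j\Omega)$, bandlimited to $|\Omega| \leq \Omega_0$.</p>
<p><strong>Sampling</strong> (with sampling period $T$) is equivalent to multiplying $x_c(t)$ by an impulse train:</p>
<p>$$x_s(t) = x_c(t) \cdot \sum_{n=-\infty}^{\infty} \delta(t - nT)$$</p>
<p>In the frequency domain the product becomes a convolution. The Fourier transform of an impulse train is again an impulse train (with period $\Omega_s = 2\pi/T$):</p>
<p>$$\sum_{n=-\infty}^{\infty} \delta(t - nT) \xrightarrow{\mathcal{F}} \frac{2\pi}{T}\sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_s)$$</p>
<p>Carrying out the convolution (which brings its own factor of $\frac{1}{2\pi}$) shows that <strong>the spectrum of the sampled signal is an infinite set of copies of the original spectrum</strong>:</p>
<p>$$X_s(j\Omega) = \frac{1}{T}\sum_{k=-\infty}^{\infty} X_c\left(j(\Omega - k\Omega_s)\right)$$</p>
<h4 id="直观理解为什么采样--频谱复制">Intuition: why sampling = spectral replication</h4>
<p><strong>Time-domain intuition: sampling is &quot;pressing the shutter&quot;</strong></p>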
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Continuous:  ∿∿∿∿∿∿∿∿∿∿∿∿∿∿∿∿∿∿∿∿  (a continuum of points)
</span></span><span class="line"><span class="cl">Sampled:       ●     ●     ●     ●     (discrete points)
</span></span><span class="line"><span class="cl">             t=0   t=T  t=2T  t=3T</span></span></code></pre></td></tr></table>
</div>
</div>
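<p>Mathematically, this is the signal multiplied by an impulse train: $x_s(t) = x_c(t) \cdot \sum_n \delta(t - nT)$. The impulse train acts like a &quot;comb&quot;: it &quot;grabs&quot; the signal value only at integer multiples of T and zeroes everything else.</p>
<p><strong>What happens in the frequency domain?</strong></p>
<p>Multiplication in time → convolution in frequency (a basic property of the Fourier transform). And the Fourier transform of an impulse train is <strong>again an impulse train</strong>:</p>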
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Time:  ↑     ↑     ↑     ↑     (impulse train, period T)
</span></span><span class="line"><span class="cl">       ↓ Fourier transform ↓
</span></span><span class="line"><span class="cl">Freq:  ↑     ↑     ↑     ↑     (impulse train, period Ωs = 2π/T)</span></span></code></pre></td></tr></table>
</div>
</div>
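<p><strong>Convolving with an impulse train = copying and pasting the function at every impulse position.</strong> It is like a photo (the original spectrum) photocopied endlessly, with one copy placed in each frame.</p>
<h4 id="复制品重叠与否决定混叠的关键">Do the copies overlap? The key to aliasing</h4>
<p><strong>Sampling rate high enough</strong> (Ωs &gt; 2Ω₀): the copies are separated by gaps, so a low-pass filter can extract the original spectrum perfectly:</p>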
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">     ╱╲       ╱╲       ╱╲       ╱╲
</span></span><span class="line"><span class="cl">    ╱  ╲     ╱  ╲     ╱  ╲     ╱  ╲
</span></span><span class="line"><span class="cl">   ╱    ╲   ╱    ╲   ╱    ╲   ╱    ╲
</span></span><span class="line"><span class="cl">──╱──────╲─╱──────╲─╱──────╲─╱──────╲──→ Ω
</span></span><span class="line"><span class="cl">              ↑
</span></span><span class="line"><span class="cl">         gaps exist, so perfect reconstruction is possible</span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Sampling rate too low</strong> (Ωs &lt; 2Ω₀): the copies overlap and the original spectrum is corrupted. This is aliasing:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">     ╱╲     ╱╲
</span></span><span class="line"><span class="cl">    ╱  ╲╱╲╱  ╲
</span></span><span class="line"><span class="cl">   ╱   ╳╳╳╳   ╲       ← overlap! the original spectrum is corrupted
</span></span><span class="line"><span class="cl">──╱──╱──────╲──╲──→ Ω
</span></span><span class="line"><span class="cl">         ↑
</span></span><span class="line"><span class="cl">    high and low frequencies mix and cannot be separated</span></span></code></pre></td></tr></table>
</div>
</div>
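<h4 id="数字例子30hz-信号在-40hz-采样率下">Numeric example: a 30 Hz signal sampled at 40 Hz</h4>
<p>The highest signal frequency is 30 Hz and the sampling rate is 40 Hz, so the copies sit 40 Hz apart (frequencies are given in Hz here rather than rad/s):</p>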
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Original spectrum: energy in 0~30Hz
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Spectral copies after sampling (one every 40Hz):
</span></span><span class="line"><span class="cl">  copy 0:  [0 ~ 30Hz]       ← original
</span></span><span class="line"><span class="cl">  copy 1:  [40 ~ 70Hz]      ← 40+0 ~ 40+30
</span></span><span class="line"><span class="cl">  copy -1: [-40 ~ -10Hz]    ← mirrors onto 10~40Hz on the positive axis</span></span></code></pre></td></tr></table>
</div>
</div>
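<p>The part of copy -1 that lies in -30~-10Hz mirrors onto 10~30Hz on the positive axis (the spectrum of a real signal is symmetric), where it <strong>overlaps</strong> the original 10~30Hz band. A 30Hz component is thus &quot;mistaken&quot; for 10Hz, and this is aliasing.</p>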
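<p>The numeric example can be checked directly. The sketch below is only an illustration (the helper <code>sample_cosine</code> is a hypothetical name, not part of any library): it samples a 30 Hz cosine and a 10 Hz cosine at 40 Hz and compares the two sample sequences.</p>

```python
import math

def sample_cosine(freq_hz, fs_hz, num_samples):
    """Sample cos(2*pi*f*t) at t = n / fs for n = 0 .. num_samples-1."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]

fs = 40.0                          # sampling rate used in the example
tone_30 = sample_cosine(30.0, fs, 16)
tone_10 = sample_cosine(10.0, fs, 16)

# Largest sample-by-sample difference between the two sequences.
max_diff = max(abs(a - b) for a, b in zip(tone_30, tone_10))
print(f"max difference between the 30 Hz and 10 Hz samples: {max_diff:.2e}")
```

<p>The difference is at the level of floating-point rounding: once sampled at 40 Hz, the two tones can no longer be told apart.</p>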
<hr>
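<h3 id="12-重建的条件无混叠">1.2 The Condition for Reconstruction: No Aliasing</h3>
<p>If $\Omega_s &gt; 2\Omega_0$ (the sampling rate exceeds the Nyquist rate $2\Omega_0$), the spectral copies <strong>do not overlap</strong>, and an ideal low-pass filter can extract the original spectrum.</p>
<p><strong>Reconstruction condition (the sampling theorem):</strong></p>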
<p>$$\Omega_s = \frac{2\pi}{T} &gt; 2\Omega_0 \implies T &lt; \frac{\pi}{\Omega_0}$$</p>
<hr>
<h3 id="13-理想低通滤波器">1.3 The Ideal Low-Pass Filter</h3>
<p>To reconstruct $x_c(t)$ perfectly, pass $x_s(t)$ through an <strong>ideal brick-wall low-pass filter</strong> $H(j\Omega)$:</p>
<p>$$H(j\Omega) = \begin{cases} T, &amp; |\Omega| \leq \Omega_c \\ 0, &amp; |\Omega| &gt; \Omega_c \end{cases}$$</p>
<p>where $\Omega_c$ satisfies $\Omega_0 \leq \Omega_c \leq \Omega_s - \Omega_0$ (the usual choice is $\Omega_c = \Omega_s/2 = \pi/T$).</p>
<p>The filter's <strong>impulse response</strong> (its inverse Fourier transform) is:</p>
<p>$$h(t) = \mathcal{F}^{-1}\{H(j\Omega)\} = \frac{1}{2\pi}\int_{-\Omega_c}^{\Omega_c} T e^{j\Omega t}\, d\Omega = T \cdot \frac{\sin(\Omega_c t)}{\pi t}$$</p>
<p>Setting $\Omega_c = \pi/T$ gives:</p>
<p>$$h(t) = T \cdot \frac{\sin(\pi t/T)}{\pi t} = \frac{\sin(\pi t/T)}{\pi t/T} = \text{sinc}\left(\frac{t}{T}\right)$$</p>
<blockquote>
<p><strong>This is exactly the sinc function!</strong></p></blockquote>
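<p>As a sanity check on this result, the inverse Fourier transform of the brick-wall filter can be evaluated numerically and compared with $\text{sinc}(t/T)$. This is a minimal sketch using a midpoint-rule integral; the function names and the value of <code>T</code> are assumptions made for illustration.</p>

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def brickwall_impulse_response(t, T, steps=20000):
    """Numerically evaluate h(t) = (1/2pi) * integral of T * e^{j*Omega*t}
    over |Omega| <= pi/T with the midpoint rule. The imaginary part cancels
    by symmetry, so only the cosine term is integrated."""
    omega_c = math.pi / T
    d_omega = 2 * omega_c / steps
    total = 0.0
    for k in range(steps):
        omega = -omega_c + (k + 0.5) * d_omega
        total += T * math.cos(omega * t) * d_omega
    return total / (2 * math.pi)

T = 1.0 / 48000.0   # assumed sampling period, for illustration only
errors = [abs(brickwall_impulse_response(m * T, T) - sinc(m))
          for m in (0.0, 0.3, 1.0, 2.5)]
print(max(errors))
```

<p>The numerical integral agrees with $\text{sinc}(t/T)$ to within the integration error, including $h(0) = 1$ and the zero at $t = T$.</p>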
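<h4 id="sinc-同时完成重建与低通">sinc performs reconstruction and low-pass filtering at once</h4>
<p>In the frequency domain, sinc convolution does one thing: <strong>keep the target band and cut away all the other copies</strong>. Seen from different angles, it goes by two names:</p>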
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Frequency-domain view:
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Sampled spectrum:   |original|  copy  |  copy  |  copy
</span></span><span class="line"><span class="cl">                        ↑
</span></span><span class="line"><span class="cl">                 extract only this block
</span></span><span class="line"><span class="cl">                 = reconstruction = low-pass filtering = removing the copies</span></span></code></pre></td></tr></table>
</div>
</div>
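<table>
  <thead>
      <tr>
          <th>Viewpoint</th>
          <th>What sinc does</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Reconstruction</strong></td>
          <td>Recovers the continuous signal from the discrete samples</td>
      </tr>
      <tr>
          <td><strong>Low-pass / anti-aliasing</strong></td>
          <td>Removes the spectral copies created by sampling</td>
      </tr>
  </tbody>
</table>
<p><strong>These are two names for the same filtering operation.</strong> Reconstruction = low-pass filtering = sinc convolution.</p>
<h4 id="sinc-如何确定截止频率">How does sinc set the cutoff frequency?</h4>
<p>sinc is not a &quot;filter with parameters&quot;; it <strong>is</strong> the time-domain form of the ideal low-pass filter. The cutoff is set by how &quot;wide&quot; or &quot;narrow&quot; the sinc is:</p>
<p><strong>Scaling rule:</strong> the narrower the sinc in time, the wider the rectangle in frequency (time and frequency scale inversely).</p>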
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">sinc(t/T)       →  cutoff = π/T rad/s = Ωs/2
</span></span><span class="line"><span class="cl">sinc(2Wt)       →  cutoff = W Hz  (= 2πW rad/s)
</span></span><span class="line"><span class="cl">sinc(Ωc·t/π)    →  cutoff = Ωc rad/s</span></span></code></pre></td></tr></table>
</div>
</div>
<h4 id="重采样中截止频率的选择">Choosing the cutoff frequency when resampling</h4>
<p>The cutoff is not arbitrary. It is set by the <strong>lower of the two sampling rates</strong>: <strong>sinc cutoff = $\min(F_{\text{in}}, F_{\text{out}})/2$.</strong> When upsampling this is the Nyquist frequency of the original rate; when downsampling it is the Nyquist frequency of the target rate.</p>
<p><strong>Upsampling (interpolation):</strong> from $F_s$ up to $M \cdot F_s$</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Original Nyquist = Fs/2
</span></span><span class="line"><span class="cl">New Nyquist      = M·Fs/2
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">After zero-stuffing:  |original|  image  |  image  |  image
</span></span><span class="line"><span class="cl">                          ↑
</span></span><span class="line"><span class="cl">              the sinc filter keeps only this part
</span></span><span class="line"><span class="cl">              cutoff = original Nyquist = Fs/2 (the images lie above it)</span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Downsampling (decimation):</strong> from $F_s$ down to $F_s/M$</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Original Nyquist = Fs/2
</span></span><span class="line"><span class="cl">New Nyquist      = Fs/(2M)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Original spectrum:  |████████████████|
</span></span><span class="line"><span class="cl">                      ↑
</span></span><span class="line"><span class="cl">              keep only this part, cutoff = new Nyquist = Fs/(2M)
</span></span><span class="line"><span class="cl">              then decimate</span></span></code></pre></td></tr></table>
</div>
</div>
<hr>
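<h3 id="14-卷积重建">1.4 Reconstruction by Convolution</h3>
<p>The filter output is the convolution of the input with the impulse response:</p>
<p>$$x_c(t) = x_s(t) * h(t) = \left[x_c(t) \cdot \sum_{n}\delta(t-nT)\right] * \text{sinc}\left(\frac{t}{T}\right)$$</p>
<p>Multiplying by the impulse train yields a train of shifted, weighted impulses:</p>
<p>$$x_s(t) = \sum_{n=-\infty}^{\infty} x_c(nT) \cdot \delta(t - nT)$$</p>
<p>Convolving with sinc (each impulse $\delta(t - nT)$ simply shifts the sinc to $t = nT$):</p>
<p>$$x_c(t) = \sum_{n=-\infty}^{\infty} x_c(nT) \cdot \text{sinc}\left(\frac{t - nT}{T}\right)$$</p>
<blockquote>
<p><strong>What the key formula means:</strong></p>
<ul>
<li>A bandlimited continuous signal is <strong>uniquely</strong> determined by its samples $\{x_c(nT)\}$</li>
<li>Each sample scales one shifted sinc, and the sum of all these sincs reconstructs the continuous signal</li>
<li>Each sinc peaks at its own $t = nT$ and is zero at every other sampling instant, so the reconstruction passes exactly through the samples (the shifted sincs are also mutually orthogonal)</li>
</ul></blockquote>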
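<p>The interpolation formula can be tried out on a simple test signal. The sketch below makes several assumptions for illustration: a cosine at 0.2 cycles per sample, a finite window of samples $n = -1000 \ldots 1000$ in place of the infinite sum, and a hypothetical helper <code>reconstruct</code>. Because the sum is truncated, the result is close to the true value but not exact.</p>

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples, first_index, t, T):
    """Evaluate sum_n x[n] * sinc((t - n*T) / T) over the available samples.
    samples[i] holds x[first_index + i]."""
    return sum(x_n * sinc((t - (first_index + i) * T) / T)
               for i, x_n in enumerate(samples))

T = 1.0        # sampling period (assumed)
f0 = 0.2       # tone frequency in cycles per sample, below the Nyquist limit 0.5
M = 1000       # samples n = -M .. M stand in for the infinite sum
samples = [math.cos(2 * math.pi * f0 * n * T) for n in range(-M, M + 1)]

worst = 0.0
for t in (0.25, 0.5, 1.3, -1.7):            # instants between the sample points
    exact = math.cos(2 * math.pi * f0 * t)
    approx = reconstruct(samples, -M, t, T)
    worst = max(worst, abs(exact - approx))
print(f"worst reconstruction error: {worst:.2e}")
```

<p>The remaining error comes from cutting off the slowly decaying sinc tails, which is exactly the issue that practical resamplers address with windowing.</p>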
<hr>
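<h3 id="15-重采样过程">1.5 The Resampling Process</h3>
<p>Now <strong>resample</strong> the signal from sampling period $T_1$ to sampling period $T_2$.</p>
<p><strong>Integer-factor upsampling</strong> ($T_2 = T_1/N$, i.e. the rate increases $N$-fold):</p>
<ol>
<li>Insert $N-1$ zeros between samples → relative to the new sampling rate the spectrum is compressed by $N$, and images appear</li>
<li>Apply a low-pass filter (gain $N$) to remove the image spectra</li>
<li>That filter is the sinc function</li>
</ol>
<p><strong>Rational-ratio up/downsampling</strong> ($T_2 = \frac{Q}{P}T_1$, i.e. the rate changes by a factor of $P/Q$):</p>
<ul>
<li>First upsample by $P$ (interpolation)</li>
<li>Then downsample by $Q$ (decimation)</li>
<li>Both steps need sinc low-pass filtering; in practice a single filter placed between them, with its cutoff at the lower of the two Nyquist frequencies, does both jobs</li>
</ul>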
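<p>For a concrete pair of rates, the factors $P$ and $Q$ are just the rate ratio reduced to lowest terms. A minimal sketch (the function name <code>resampling_factors</code> is an assumption for illustration, not an API of any resampling library):</p>

```python
from fractions import Fraction

def resampling_factors(fs_in, fs_out):
    """Return (P, Q) in lowest terms such that fs_out = fs_in * P / Q."""
    ratio = Fraction(fs_out, fs_in)
    return ratio.numerator, ratio.denominator

P, Q = resampling_factors(44100, 48000)
print(P, Q)   # upsample by P, then downsample by Q
```

<p>For 44.1 kHz to 48 kHz this gives $P = 160$ and $Q = 147$: conceptually the signal is raised to 7.056 MHz and then decimated, which is why practical implementations never build the intermediate signal explicitly.</p>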
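<p><strong>Arbitrary-ratio resampling:</strong></p>
<p>Evaluate the sum directly at the target instants $t = mT_2$ (as written this holds for $T_2 \leq T_1$; when downsampling, the sinc must be widened to the period $T_2$ and scaled by $T_1/T_2$ so that it also suppresses aliasing):</p>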
<p>$$x[m] = \sum_{n=-\infty}^{\infty} x_1[n] \cdot \text{sinc}\left(\frac{mT_2 - nT_1}{T_1}\right)$$</p>
<hr>
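<h3 id="16-为什么是-sinc-而不是其他函数">1.6 Why sinc and not some other function?</h3>
<table>
  <thead>
      <tr>
          <th>Function</th>
          <th>Origin</th>
          <th>Properties</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>sinc</strong></td>
          <td>Inverse FT of the ideal low-pass filter</td>
          <td>Gives perfect reconstruction; the only choice at the critical sampling rate</td>
      </tr>
      <tr>
          <td>Other interpolation kernels (linear, cubic, etc.)</td>
          <td>Short, finite-length stand-ins for sinc</td>
          <td>Cheap to compute, but only approximate the ideal response, so they introduce error</td>
      </tr>
  </tbody>
</table>
<blockquote>
<p><strong>sinc is unique:</strong> within the sampling-theorem framework, only the ideal brick-wall response, whose time-domain form is sinc, achieves <strong>theoretically perfect</strong> reconstruction of the continuous signal at the critical sampling rate. Every practical kernel approximates it, at the cost of high-frequency roll-off and residual images (short kernels such as linear interpolation) or ripple and ringing (a truncated sinc, the Gibbs phenomenon).</p></blockquote>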
<hr>
<h3 id="17-物理直觉">1.7 Physical Intuition</h3>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Original continuous signal:  ~~~waveform~~~
</span></span><span class="line"><span class="cl">Sample points:   ●    ●    ●    ●
</span></span><span class="line"><span class="cl">                  ↖   ↗   ↖   ↗
</span></span><span class="line"><span class="cl">              each point &#34;carries&#34; one sinc; their sum rebuilds the waveform</span></span></code></pre></td></tr></table>
</div>
</div>
<p>The sinc function is the <strong>time-domain response of the ideal low-pass filter</strong>. Convolution is filtering, and filtering is reconstruction.</p>
<hr>
<h2 id="第二部分工程实现从朴素到优化">Part 2: Engineering Implementation, from Naive to Optimized</h2>
<p>Using the SpeexDSP resampler as the example, this part starts from plain sinc interpolation and introduces the optimizations one by one, to show what problem each of them solves.</p>
<h3 id="目录-1">Contents</h3>
<ol>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#naive-implementation">The naive implementation: computing sinc directly</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#lookup-table">Optimization 1: table lookup instead of run-time computation</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#polyphase">Optimization 2: polyphase filtering</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#speex-hybrid">Optimization 3: Speex's hybrid strategy</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#accumulator">Optimization 4: advancing the fractional position</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#fixed-point">Optimization 5: fixed-point arithmetic</a></li>
<li><a href="/posts/202604/2026-04-20-%E9%9F%B3%E9%A2%91%E9%87%8D%E9%87%87%E6%A0%B7/#summary">The big picture</a></li>
</ol>
<hr>
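<h3 id="21-最原始的实现直接算-sinc">2.1 The Naive Implementation: Computing sinc Directly</h3>
<p>By the Whittaker-Shannon formula, the signal value at an arbitrary instant $t$ is:</p>
<p>$$x(t) = \sum_{n} x[n] \cdot \text{sinc}\left(\frac{t - nT}{T}\right)$$</p>
<p>The most direct implementation: <strong>for each output sample, loop over the nearby input samples and evaluate sinc on the fly.</strong></p>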
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">naive_resample</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">fs_in</span><span class="p">,</span> <span class="n">fs_out</span><span class="p">,</span> <span class="n">num_samples_out</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Naive resampling: evaluate sinc directly for every tap.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">output</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_samples_out</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_samples_out</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">t</span> <span class="o">=</span> <span class="n">m</span> <span class="o">/</span> <span class="n">fs_out</span>
</span></span><span class="line"><span class="cl">        <span class="n">pos</span> <span class="o">=</span> <span class="n">t</span> <span class="o">*</span> <span class="n">fs_in</span>  <span class="c1"># fractional &#34;virtual position&#34; in the input sequence</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">n_center</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">pos</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="n">half_len</span> <span class="o">=</span> <span class="mi">16</span>  <span class="c1"># use only ±16 input samples (truncated sinc)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">s</span> <span class="o">=</span> <span class="mf">0.0</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_center</span> <span class="o">-</span> <span class="n">half_len</span><span class="p">,</span> <span class="n">n_center</span> <span class="o">+</span> <span class="n">half_len</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">n</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">                <span class="n">sinc_val</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sinc</span><span class="p">(</span><span class="n">pos</span> <span class="o">-</span> <span class="n">n</span><span class="p">)</span>  <span class="c1"># sin(πx)/(πx)</span>
</span></span><span class="line"><span class="cl">                <span class="n">s</span> <span class="o">+=</span> <span class="n">x</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">*</span> <span class="n">sinc_val</span>
</span></span><span class="line"><span class="cl">        <span class="n">output</span><span class="p">[</span><span class="n">m</span><span class="p">]</span> <span class="o">=</span> <span class="n">s</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">output</span></span></span></code></pre></td></tr></table>
</div>
</div>
<h4 id="问题计算量太大">Problem: far too much computation</h4>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">For each output sample:
</span></span><span class="line"><span class="cl">  → visit about 2 × half_len input samples
</span></span><span class="line"><span class="cl">  → each one calls np.sinc() (a sin plus a division internally)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Total cost: num_out × 2 × half_len × (sin + division)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Example: 10 s of 48 kHz audio, half_len=32
</span></span><span class="line"><span class="cl">  480000 × 64 = 30.72 million sin() calls</span></span></code></pre></td></tr></table>
</div>
</div>
<blockquote>
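<p><strong>Core bottleneck:</strong> sin() is an expensive operation, and it is recomputed for every tap of every output sample.</p></blockquote>
<hr>
<h3 id="22-第一步优化查表代替实时计算">2.2 Optimization 1: Table Lookup Instead of Run-Time Computation</h3>
<p>The sinc function is fixed, so a table can be <strong>precomputed</strong> and simply looked up at run time.</p>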
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Before:  compute sin(πx)/(πx) each time   → expensive (trig function + division)
</span></span><span class="line"><span class="cl">After:   look up sinc_table[i]            → fast (one memory access)</span></span></code></pre></td></tr></table>
</div>
</div>
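<p><strong>New problem: limited precision.</strong> The output position is usually fractional (for example <code>pos = 3.7</code>), while the sinc table only holds values on a fixed grid. <strong>Interpolation</strong> is needed to approximate sinc at an arbitrary fractional position.</p>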
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Target: sinc(3.7)
</span></span><span class="line"><span class="cl">Table has: sinc(3.50), sinc(3.75)   ← spacing 1/4
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Option A: linear interpolation  → sinc(3.7) ≈ 0.2×sinc(3.50) + 0.8×sinc(3.75)
</span></span><span class="line"><span class="cl">Option B: cubic interpolation   → cubic fit through the 4 neighbouring points (more accurate)
</span></span><span class="line"><span class="cl">Option C: a denser table        → oversample=16 or 32, take the nearest entry (costs memory)</span></span></code></pre></td></tr></table>
</div>
</div>
<p>Speex's <strong>interpolated mode</strong> takes option B: it stores a fairly sparse sinc table and uses cubic interpolation to approximate arbitrary positions.</p>
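<p>To make the lookup idea concrete, here is a minimal sketch of option A (linear interpolation), which is simpler than the cubic scheme Speex uses. The table layout, the constants <code>OVERSAMPLE</code> and <code>MAX_ARG</code>, and the function <code>sinc_lookup</code> are all assumptions chosen for illustration and do not mirror SpeexDSP's internals.</p>

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

OVERSAMPLE = 4   # table entries per unit interval (spacing 1/4), assumed
MAX_ARG = 8      # table covers 0 <= |u| <= MAX_ARG; sinc is even in u

# One extra entry at the end so that index i + 1 is always valid.
SINC_TABLE = [sinc(i / OVERSAMPLE) for i in range(MAX_ARG * OVERSAMPLE + 2)]

def sinc_lookup(u):
    """Approximate sinc(u) by linear interpolation between table entries."""
    pos = abs(u) * OVERSAMPLE
    i = int(pos)          # index of the table entry at or below u
    frac = pos - i        # how far u lies towards the next entry (0..1)
    return (1.0 - frac) * SINC_TABLE[i] + frac * SINC_TABLE[i + 1]

approx = sinc_lookup(3.7)   # blends sinc(3.50) and sinc(3.75) with weights 0.2 / 0.8
exact = sinc(3.7)
print(f"table: {approx:.5f}  exact: {exact:.5f}  error: {abs(approx - exact):.1e}")
```

<p>With a spacing of 1/4 the linear estimate is off by a few thousandths, which is why a real resampler either uses a higher-order interpolator or a much denser table.</p>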
<hr>
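<h3 id="23-第二步优化多相滤波--避免重复计算">2.3 Optimization 2: Polyphase Filtering to Avoid Repeated Work</h3>
<p>The methods above treat every output sample as an independent convolution with its own set of coefficients. But when the input and output rates form a <strong>ratio of integers</strong>, the coefficient sets repeat <strong>periodically</strong> and can be reused.</p>
<h4 id="关键观察">Key observation</h4>
<p>When upsampling by 4, the sinc coefficients take only 4 distinct offsets:</p>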
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Output sample 0: sinc offset 0/4 → coefficients [a0, a1, a2, a3]
</span></span><span class="line"><span class="cl">Output sample 1: sinc offset 1/4 → coefficients [b0, b1, b2, b3]
</span></span><span class="line"><span class="cl">Output sample 2: sinc offset 2/4 → coefficients [c0, c1, c2, c3]
</span></span><span class="line"><span class="cl">Output sample 3: sinc offset 3/4 → coefficients [d0, d1, d2, d3]
</span></span><span class="line"><span class="cl">Output sample 4: sinc offset 4/4 = 0/4 → coefficients [a0, a1, a2, a3]  ← repeats!</span></span></code></pre></td></tr></table>
</div>
</div>
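<p><strong>There are only 4 distinct coefficient sets</strong>, so there is no need to look up fresh coefficients for every output sample.</p>
<h4 id="多相滤波器结构">Polyphase filter structure</h4>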
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Input x[n] ──┬──→ [sub-filter a] ──→ output t=0/4
</span></span><span class="line"><span class="cl">             ├──→ [sub-filter b] ──→ output t=1/4
</span></span><span class="line"><span class="cl">             ├──→ [sub-filter c] ──→ output t=2/4
</span></span><span class="line"><span class="cl">             └──→ [sub-filter d] ──→ output t=3/4
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">4 sub-filters, each applied once to the same input segment</span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>The cost drops from N×M to N×M/P</strong> (P = upsampling factor).</p>
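<p>The structure above can be sketched in C as follows. This is a minimal illustration, not SpeexDSP code: the function name, the parameter names, and the layout of the prototype filter <code>h</code> (length <code>P * taps</code>, with phase <code>p</code> using <code>h[p]</code>, <code>h[p + P]</code>, ...) are assumptions made for the example.</p>

```c
#include <stddef.h>

/* Polyphase upsampling by an integer factor P (illustrative sketch).
 * h: prototype low-pass filter with P * taps coefficients.
 *    Phase p uses the sub-filter h[p], h[p + P], h[p + 2P], ...
 * x: n_in input samples; samples before x[0] are treated as zero.
 * y: receives n_in * P output samples. */
void polyphase_upsample(const float *h, int P, int taps,
                        const float *x, size_t n_in, float *y)
{
    for (size_t n = 0; n < n_in; n++) {
        for (int p = 0; p < P; p++) {
            float acc = 0.0f;
            for (int k = 0; k < taps; k++) {
                if ((size_t)k <= n)          /* skip samples before the start */
                    acc += h[p + P * k] * x[n - k];
            }
            y[n * P + p] = acc;              /* each output costs `taps` multiplies */
        }
    }
}
```

<p>Each output sample touches only the <code>taps</code> coefficients of its own phase rather than the whole prototype filter, which is where the factor of P comes from.</p>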
<hr>
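<h3 id="24-第三步优化speex-的混合策略">2.4 Optimization step 3: Speex's hybrid strategy</h3>
<p>SpeexDSP handles <strong>arbitrary-ratio</strong> resampling (not only integer ratios), so a pure polyphase scheme cannot be used directly. Its strategy is to choose the table-lookup method automatically from the parameters.</p>
<h4 id="策略一direct-sinc-tableden_rate-较小时">Strategy 1: direct sinc table (small den_rate)</h4>
<p>When the denominator <code>den_rate</code> is not too large, a sinc table of size <code>filt_len × den_rate</code> is stored and looked up exactly.</p>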
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">den_rate = 160 (denominator)
</span></span><span class="line"><span class="cl">filt_len = 64  (filter length)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">sinc table size = 64 × 160 = 10240 coefficients
</span></span><span class="line"><span class="cl">memory ≈ 10240 × 2 bytes = 20 KB (16-bit coefficients)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">lookup: sinc_table[samp_frac_num * filt_len + j]
</span></span><span class="line"><span class="cl">                   ↑                          ↑
</span></span><span class="line"><span class="cl">                   current fractional phase   tap j</span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Pros:</strong> direct lookup with no interpolation error<br>
<strong>Cons:</strong> memory grows linearly with den_rate</p>
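<p>A direct lookup amounts to selecting one row of the table and taking a dot product. The sketch below assumes a layout in which the <code>filt_len</code> taps of one fractional phase are stored contiguously; the function name and the caller-supplied table are illustrative, not the library's API.</p>

```c
/* One output sample from a direct table (illustrative sketch).
 * table: den_rate rows of filt_len coefficients, one row per phase.
 * phase: current fractional position, 0 <= phase < den_rate.
 * in:    the filt_len input samples covered by the filter. */
float direct_output(const float *table, int filt_len, int phase,
                    const float *in)
{
    const float *row = &table[phase * filt_len];   /* taps for this phase */
    float sum = 0.0f;
    for (int j = 0; j < filt_len; j++)
        sum += row[j] * in[j];
    return sum;
}
```

<p>No interpolation happens here: the row for the exact phase already exists, which is why the table needs one row for every possible value of the phase.</p>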
<h4 id="策略二interpolated-sinc-tableden_rate-较大时">Strategy 2: interpolated sinc table (large den_rate)</h4>
<p><strong>The core tension:</strong> direct mode has to store <code>filt_len × den_rate</code> coefficients. With <code>den_rate = 48000</code>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">filt_len=64, den_rate=48000 → 64 × 48000 = 3.07 million coefficients → about 6 MB of memory</span></span></code></pre></td></tr></table>
</div>
</div>
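<p>That is too large. Can a small table be stored instead, with interpolation used to estimate the values in between?</p>
<h4 id="sinc-表存的到底是什么">What does the sinc table actually store?</h4>
<p>The sinc coefficients needed for each output sample are samples of the sinc function taken at <strong>different offsets</strong>.</p>
<p><strong>Direct mode: store one set for every possible fractional position</strong></p>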
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">output position 0.000:  uses sinc at ..., -2.000, -1.000, 0.000, 1.000, ...
</span></span><span class="line"><span class="cl">output position 0.001:  uses sinc at ..., -1.999, -0.999, 0.001, 1.001, ...
</span></span><span class="line"><span class="cl">output position 0.002:  uses sinc at ..., -1.998, -0.998, 0.002, 1.002, ...
</span></span><span class="line"><span class="cl">...
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">with den_rate=1000, filt_len coefficients are stored for each of the 1000 offsets
</span></span><span class="line"><span class="cl">table size = filt_len × 1000</span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Interpolated mode: store only oversample sets and interpolate in between</strong></p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">with oversample=16, coefficients are stored for only 16 offsets:
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">offset 0/16:  sinc at 0.000, 1.000, 2.000, ...
</span></span><span class="line"><span class="cl">offset 1/16:  sinc at 0.0625, 1.0625, 2.0625, ...
</span></span><span class="line"><span class="cl">offset 2/16:  sinc at 0.125, 1.125, 2.125, ...
</span></span><span class="line"><span class="cl">...
</span></span><span class="line"><span class="cl">offset 15/16: sinc at 0.9375, 1.9375, 2.9375, ...
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">table size = filt_len × 16</span></span></code></pre></td></tr></table>
</div>
</div>
<h4 id="完整的查找过程数字示例">The full lookup, with numbers</h4>
<p>Setup:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">signal: 48 kHz → 44.1 kHz resampling
</span></span><span class="line"><span class="cl">filt_len = 8  (filter length: 8 input samples are used)
</span></span><span class="line"><span class="cl">oversample = 16
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">sinc table size = 8 × 16 + 8 = 136 coefficients</span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Step 1: locate the position in the input sequence</strong></p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">input position (output sample m = 100): pos = m × (fs_in / fs_out) = 100 × (48000/44100) = 108.8435...
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">integer part:    n_center = 108
</span></span><span class="line"><span class="cl">fractional part: frac_input = 0.8435...</span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Step 2: map the fractional part onto the oversample grid</strong></p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">oversample_index = frac_input × oversample = 0.8435... × 16 = 13.4966...
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">integer offset:    offset = 13        ← base position in the sinc table
</span></span><span class="line"><span class="cl">fractional offset: frac = 0.4966...   ← input to the cubic interpolation</span></span></code></pre></td></tr></table>
</div>
</div>
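<p>Steps 1 and 2 can be checked with exact integer arithmetic. The sketch below is an illustration only: the struct and function names are hypothetical, and the position is kept as a fraction over <code>fs_out</code> so that no rounding enters until the last line.</p>

```c
#include <stdint.h>

/* Position of output sample m in the input stream, split into the pieces
 * used by an interpolated table lookup. All names are illustrative. */
typedef struct {
    int64_t n_center;   /* integer input index */
    int     offset;     /* index on the oversample grid, 0 .. oversample-1 */
    double  frac;       /* remainder between two grid points, 0 <= frac < 1 */
} table_pos;

table_pos locate(int64_t m, int64_t fs_in, int64_t fs_out, int oversample)
{
    table_pos p;
    int64_t num    = m * fs_in;          /* pos = num / fs_out */
    int64_t rem    = num % fs_out;       /* fractional part, scaled by fs_out */
    int64_t scaled = rem * oversample;   /* frac_input * oversample, scaled by fs_out */

    p.n_center = num / fs_out;
    p.offset   = (int)(scaled / fs_out);
    p.frac     = (double)(scaled % fs_out) / (double)fs_out;
    return p;
}
```

<p>For <code>locate(100, 48000, 44100, 16)</code> this gives <code>n_center = 108</code>, <code>offset = 13</code>, and <code>frac</code> of about 0.4966.</p>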
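<p><strong>Step 3: for each filter tap, take 4 adjacent table values and apply cubic interpolation</strong></p>
<p><strong>Step 4: weighted sum over all taps</strong></p>
<p>$$\mathrm{output}[m] = \sum_{j=0}^{7} \mathrm{sinc\_value}[j] \times \mathrm{input}\big[n_{\mathrm{center}} + k_j\big]$$</p>
<p>Here <code>sinc_value[j]</code> is the interpolated coefficient from step 3 and k<sub>j</sub> is the position of tap j relative to n<sub>center</sub>.</p>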
<h4 id="cubic-插值系数">Cubic interpolation coefficients</h4>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="n">interp</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.16667</span><span class="o">*</span><span class="n">frac</span> <span class="o">+</span> <span class="mf">0.16667</span><span class="o">*</span><span class="n">frac</span><span class="o">*</span><span class="n">frac</span><span class="o">*</span><span class="n">frac</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="n">interp</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span>  <span class="n">frac</span> <span class="o">+</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">frac</span><span class="o">*</span><span class="n">frac</span> <span class="o">-</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">frac</span><span class="o">*</span><span class="n">frac</span><span class="o">*</span><span class="n">frac</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="n">interp</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.33333</span><span class="o">*</span><span class="n">frac</span> <span class="o">+</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">frac</span><span class="o">*</span><span class="n">frac</span> <span class="o">-</span> <span class="mf">0.16667</span><span class="o">*</span><span class="n">frac</span><span class="o">*</span><span class="n">frac</span><span class="o">*</span><span class="n">frac</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="n">interp</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span>  <span class="mf">1.f</span> <span class="o">-</span> <span class="n">interp</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">interp</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">interp</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span></span></span></code></pre></td></tr></table>
</div>
</div>
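<p>These are the weights of a cubic (third-order Lagrange) interpolator through four neighbouring table entries. The comment in the SpeexDSP source describes them as <strong>MMSE-optimal (minimum mean squared error)</strong> on a sinc.</p>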
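<p>To make the role of the four weights concrete, here is the same formula wrapped in a function, plus a helper that applies it. The mapping of <code>w[0..3]</code> to sample positions 2, 1, 0, -1 (in table steps) is worked out from the polynomials themselves; the function names are illustrative.</p>

```c
/* Cubic interpolation weights (same formulas as the listing above).
 * w[0], w[1], w[2], w[3] weight samples located at t = 2, 1, 0, -1,
 * measured in table steps; frac is the target position, 0 <= frac < 1. */
void cubic_weights(float frac, float w[4])
{
    w[0] = -0.16667f * frac + 0.16667f * frac * frac * frac;
    w[1] = frac + 0.5f * frac * frac - 0.5f * frac * frac * frac;
    w[3] = -0.33333f * frac + 0.5f * frac * frac - 0.16667f * frac * frac * frac;
    w[2] = 1.0f - w[0] - w[1] - w[3];   /* forces the weights to sum to 1 */
}

/* Interpolate between table samples: y[0..3] hold the values at t = 2, 1, 0, -1. */
float cubic_interp(const float y[4], float frac)
{
    float w[4];
    cubic_weights(frac, w);
    return w[0] * y[0] + w[1] * y[1] + w[2] * y[2] + w[3] * y[3];
}
```

<p>At <code>frac = 0</code> all the weight falls on <code>w[2]</code>, and at <code>frac = 1</code> it would fall on <code>w[1]</code>, so the interpolated value moves smoothly from one stored table entry to the next.</p>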
<p><strong>Why cubic rather than linear?</strong></p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
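<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">linear interpolation: straight line between two points → large error on an oscillating shape like sinc
</span></span><span class="line"><span class="cl">cubic interpolation:  fits a third-order curve → less high-frequency distortion, close to direct-lookup accuracy</span></span></code></pre></td></tr></table>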
</div>
</div>
<h4 id="对比省了多少内存">Comparison: how much memory is saved?</h4>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">scenario: a ratio that does not reduce, e.g. 44101 Hz → 48000 Hz (48 kHz → 44.1 kHz itself reduces to 160/147), filt_len=64
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Direct (den_rate=48000):
</span></span><span class="line"><span class="cl">  table size = 64 × 48000 = 3,072,000 coefficients
</span></span><span class="line"><span class="cl">  memory = 3,072,000 × 2 bytes ≈ 6 MB
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Interpolated (oversample=16):
</span></span><span class="line"><span class="cl">  table size = 64 × 16 + 8 = 1,032 coefficients
</span></span><span class="line"><span class="cl">  memory = 1,032 × 2 bytes ≈ 2 KB
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">saving: 6 MB → 2 KB, roughly 3000× smaller
</span></span><span class="line"><span class="cl">cost: cubic interpolation adds a small error (estimated here at ~0.01 dB, not audible)</span></span></code></pre></td></tr></table>
</div>
</div>
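<p>The arithmetic behind this comparison is small enough to write down. The helpers below are a sketch with hypothetical names; the only facts taken from the text are the two size formulas, <code>filt_len × den_rate</code> and <code>filt_len × oversample + 8</code>. The <code>gcd_u32</code> helper shows why the worst case needs a ratio that does not reduce: 48000/44100 shares a factor of 300 and shrinks to 160/147.</p>

```c
#include <stdint.h>

/* Greatest common divisor (Euclid's algorithm). */
uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Coefficient count of a direct table: one row per fractional phase. */
uint64_t direct_table_len(uint32_t filt_len, uint32_t den_rate)
{
    return (uint64_t)filt_len * den_rate;
}

/* Coefficient count of an interpolated table, using the
 * filt_len * oversample + 8 size quoted in this section. */
uint64_t interp_table_len(uint32_t filt_len, uint32_t oversample)
{
    return (uint64_t)filt_len * oversample + 8;
}
```

<p>With <code>filt_len = 64</code>, a reduced denominator of 147 needs only 9,408 direct coefficients, while an unreduced 48000 needs 3,072,000; the interpolated table stays at 1,032 either way.</p>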
<hr>
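<h3 id="25-第四步优化分数位置推进--避免除法">2.5 Optimization step 4: advancing the fractional position without division</h3>
<p>How is the fractional position of each output sample computed? The naive approach:</p>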
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="c1">// naive: one modulo and one division per sample (the cast avoids integer division)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">frac_pos</span> <span class="o">=</span> <span class="p">(</span><span class="kt">double</span><span class="p">)((</span><span class="n">m</span> <span class="o">*</span> <span class="n">fs_in</span><span class="p">)</span> <span class="o">%</span> <span class="n">fs_out</span><span class="p">)</span> <span class="o">/</span> <span class="n">fs_out</span><span class="p">;</span></span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Division is expensive.</strong> Speex avoids it with an <strong>accumulator</strong>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="c1">// precomputed once (integer division, only at initialization)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">int_advance</span>  <span class="o">=</span> <span class="n">num_rate</span> <span class="o">/</span> <span class="n">den_rate</span><span class="p">;</span>     <span class="c1">// integer part
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">frac_advance</span> <span class="o">=</span> <span class="n">num_rate</span> <span class="o">%</span> <span class="n">den_rate</span><span class="p">;</span>     <span class="c1">// fractional step
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// per output sample (integer additions and one comparison only)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">last_sample</span>  <span class="o">+=</span> <span class="n">int_advance</span><span class="p">;</span>            <span class="c1">// integer step
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">samp_frac_num</span> <span class="o">+=</span> <span class="n">frac_advance</span><span class="p">;</span>          <span class="c1">// accumulate the fraction
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">if</span> <span class="p">(</span><span class="n">samp_frac_num</span> <span class="o">&gt;=</span> <span class="n">den_rate</span><span class="p">)</span> <span class="p">{</span>        <span class="c1">// carry
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">samp_frac_num</span> <span class="o">-=</span> <span class="n">den_rate</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">last_sample</span><span class="o">++</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
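<p>The same update can be packaged so that it is easy to check against the closed-form position. This is a self-contained sketch: the struct and function names are made up for the example, while the field names follow the snippet above.</p>

```c
#include <stdint.h>

typedef struct {
    uint32_t int_advance;    /* num_rate / den_rate */
    uint32_t frac_advance;   /* num_rate % den_rate */
    uint32_t den_rate;
    uint64_t last_sample;    /* integer input position */
    uint32_t samp_frac_num;  /* numerator of the fractional position, < den_rate */
} pos_accum;

void pos_init(pos_accum *p, uint32_t num_rate, uint32_t den_rate)
{
    p->int_advance   = num_rate / den_rate;   /* the only divisions, done once */
    p->frac_advance  = num_rate % den_rate;
    p->den_rate      = den_rate;
    p->last_sample   = 0;
    p->samp_frac_num = 0;
}

/* Advance by one output sample using additions and one comparison. */
void pos_step(pos_accum *p)
{
    p->last_sample   += p->int_advance;
    p->samp_frac_num += p->frac_advance;
    if (p->samp_frac_num >= p->den_rate) {    /* carry into the integer part */
        p->samp_frac_num -= p->den_rate;
        p->last_sample++;
    }
}
```

<p>With <code>num_rate = 160</code> and <code>den_rate = 147</code> (48 kHz to 44.1 kHz after reduction), 100 steps leave <code>last_sample = 108</code> and <code>samp_frac_num = 124</code>, which is 100 × 160 / 147 = 108 + 124/147.</p>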
<p><strong>The position update uses only integer additions and comparisons: no division and no floating-point arithmetic.</strong></p>
<hr>
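<h3 id="26-第五步优化定点运算">2.6 Optimization step 5: fixed-point arithmetic</h3>
<p>Floating-point arithmetic is slow on some platforms (embedded DSP chips). Speex supports a 16-bit fixed-point mode:</p>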
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="c1">// floating-point version
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">sum</span> <span class="o">+=</span> <span class="n">sinc_val</span> <span class="o">*</span> <span class="n">input_val</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// fixed-point version (Q15 format: 1 sign bit + 15 fractional bits)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">sum</span> <span class="o">+=</span> <span class="nf">MULT16_16</span><span class="p">(</span><span class="n">sinc_val_q15</span><span class="p">,</span> <span class="n">input_val_q15</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="c1">// MULT16_16 is a plain 16x16 → 32-bit multiply, (int32_t)a * (int32_t)b; the &gt;&gt; 15 back to Q15 is applied to the accumulated sum
</span></span></span></code></pre></td></tr></table>
</div>
</div>
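<p>A Q15 multiply-accumulate loop can be sketched as below. This is a generic illustration rather than the SpeexDSP macros: the function name is hypothetical, a 64-bit accumulator is used to keep the example free of overflow concerns, and the right shift of a negative value is assumed to be arithmetic, as it is on common compilers.</p>

```c
#include <stdint.h>

/* Q15 dot product: accumulate full-width products, then apply one
 * rounding shift and saturate to 16 bits. */
int16_t dot_q15(const int16_t *a, const int16_t *b, int n)
{
    int64_t sum = 0;                              /* wide accumulator */
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];     /* Q15 * Q15 = Q30 */

    sum = (sum + (1 << 14)) >> 15;                /* round, Q30 -> Q15 */
    if (sum > 32767)  sum = 32767;                /* saturate */
    if (sum < -32768) sum = -32768;
    return (int16_t)sum;
}
```

<p>For example, 0.5 × 0.5 in Q15 is 16384 × 16384, and the function returns 8192, which is 0.25 in Q15.</p>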
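<p><strong>Trading precision for speed:</strong> with 16-bit fixed point the stopband attenuation is roughly 90 dB, which is enough for speech applications.</p>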
<hr>
<h3 id="27-全景总结">2.7 The big picture</h3>
<h4 id="优化路径">Optimization path</h4>
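<table>
  <thead>
      <tr>
          <th>Step</th>
          <th>Optimization</th>
          <th>Remaining bottleneck</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>0</td>
          <td>Compute sinc directly</td>
          <td>sin() is too slow</td>
      </tr>
      <tr>
          <td>1</td>
          <td>sinc table lookup + interpolation</td>
          <td>Every sample does an independent lookup; work is repeated</td>
      </tr>
      <tr>
          <td>2</td>
          <td>Polyphase filtering (coefficients reused for integer ratios)</td>
          <td>Arbitrary ratios are not covered</td>
      </tr>
      <tr>
          <td>3</td>
          <td>Speex hybrid strategy</td>
          <td>A division per sample to find the fractional position</td>
      </tr>
      <tr>
          <td>4</td>
          <td>Accumulator stepping (integer addition instead of division)</td>
          <td>Floating-point arithmetic is slow</td>
      </tr>
      <tr>
          <td>5</td>
          <td>Fixed-point arithmetic (16-bit multiply-accumulate)</td>
          <td>End result: pure multiply-accumulate, no division, no sin(), SIMD-friendly</td>
      </tr>
  </tbody>
</table>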
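<h4 id="各步骤对比">Step-by-step comparison</h4>
<table>
  <thead>
      <tr>
          <th>Optimization</th>
          <th>Problem addressed</th>
          <th>Core technique</th>
          <th>Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Table lookup</td>
          <td>sin() is too expensive</td>
          <td>Precomputation + memory access</td>
          <td>Needs interpolation to recover precision</td>
      </tr>
      <tr>
          <td>Cubic interpolation</td>
          <td>The sinc table is too large</td>
          <td>Small table + cubic approximation</td>
          <td>Small high-frequency error</td>
      </tr>
      <tr>
          <td>Polyphase / accumulator</td>
          <td>Repeated computation + division</td>
          <td>Periodic reuse + integer accumulation</td>
          <td>Pure polyphase applies only to integer ratios</td>
      </tr>
      <tr>
          <td>Hybrid strategy</td>
          <td>Arbitrary ratios + memory/precision trade-off</td>
          <td>Automatic choice between Direct and Interpolated</td>
          <td>Code complexity</td>
      </tr>
      <tr>
          <td>Fixed point</td>
          <td>Floating point is too slow</td>
          <td>Q15 fixed-point multiply-accumulate</td>
          <td>Limited precision (~90 dB)</td>
      </tr>
  </tbody>
</table>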
<h4 id="speex-质量等级与滤波器参数">Speex quality levels and filter parameters</h4>
<table>
  <thead>
      <tr>
          <th>Level</th>
          <th>filt_len</th>
          <th>Oversample</th>
          <th>Stopband attenuation</th>
          <th>Typical use</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Q0</td>
          <td>8</td>
          <td>4</td>
          <td>~60 dB</td>
          <td>Low-power real time</td>
      </tr>
      <tr>
          <td>Q4</td>
          <td>64</td>
          <td>8</td>
          <td>~80 dB</td>
          <td>General speech</td>
      </tr>
      <tr>
          <td>Q5</td>
          <td>80</td>
          <td>16</td>
          <td>~100 dB</td>
          <td>High-quality speech</td>
      </tr>
      <tr>
          <td>Q10</td>
          <td>256</td>
          <td>32</td>
          <td>~100 dB+</td>
          <td>Music / offline processing</td>
      </tr>
  </tbody>
</table>
<p>A longer filt_len → the truncated sinc is closer to ideal → better stopband attenuation → more computation.</p>
<hr>
<h2 id="参考">References</h2>
<ul>
<li><a href="https://ccrma.stanford.edu/~jos/resample/">Stanford CCRMA - Digital Audio Resampling</a></li>
<li>SpeexDSP source: <code>speex/libspeexdsp/resample.c</code></li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>How Claude Code Works</title>
      <link>https://lyapple2008.github.io/posts/202604/2026-04-02-claude-code%E6%98%AF%E5%A6%82%E4%BD%95%E5%B7%A5%E4%BD%9C%E7%9A%84/</link>
      <pubDate>Thu, 02 Apr 2026 09:58:19 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202604/2026-04-02-claude-code%E6%98%AF%E5%A6%82%E4%BD%95%E5%B7%A5%E4%BD%9C%E7%9A%84/</guid>
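      <description>&lt;h1 id=&#34;从claude-code学习agent开发&#34;&gt;Learning Agent Development from Claude Code&lt;/h1&gt;
&lt;p&gt;I have been using Claude Code heavily lately, and I keep getting bombarded with new terms: vibe coding, spec coding, agent, openclaw, harness, and so on. I came across a tutorial on the engineering principles behind Claude Code, which gave me a first understanding of agent engineering, so here are my notes.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<h1 id="从claude-code学习agent开发">Learning Agent Development from Claude Code</h1>
<p>I have been using Claude Code heavily lately, and I keep getting bombarded with new terms: vibe coding, spec coding, agent, openclaw, harness, and so on. I came across a tutorial on the engineering principles behind Claude Code, which gave me a first understanding of agent engineering, so here are my notes.</p>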
<blockquote>
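<p>Tutorial: https://learn.shareai.run/en/<br>
Repository: https://github.com/shareAI-lab/learn-claude-code</p></blockquote>
<h2 id="课程介绍">Course overview</h2>
<p>The course consists of 12 progressive lessons, from a simple loop to isolated autonomous execution. Each lesson adds one harness mechanism, and each mechanism comes with a motto. I highly recommend working through it:</p>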
<blockquote>
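<p><strong>s01</strong>   <em>One loop &amp; Bash is all you need</em>: one tool + one loop = one agent</p>
<p><strong>s02</strong>   <em>Adding a tool means adding just one handler</em>: the loop stays untouched; register the new tool in the dispatch map</p>
<p><strong>s03</strong>   <em>An agent without a plan just drifts</em>: list the steps first, then act, and the completion rate doubles</p>
<p><strong>s04</strong>   <em>Split big tasks into small ones, each with a clean context</em>: a subagent uses its own messages[] and does not pollute the main conversation</p>
<p><strong>s05</strong>   <em>Load knowledge only when it is needed</em>: inject it through tool_result instead of stuffing it into the system prompt</p>
<p><strong>s06</strong>   <em>The context always fills up, so there must be a way to make room</em>: a three-layer compression strategy buys unlimited sessions</p>
<p><strong>s07</strong>   <em>Break a big goal into small tasks, order them, and record them on disk</em>: a file-persisted task graph lays the groundwork for multi-agent collaboration</p>
<p><strong>s08</strong>   <em>Push slow operations to the background while the agent plans the next step</em>: a background thread runs the command and injects a notification when it finishes</p>
<p><strong>s09</strong>   <em>When a task is too big for one, it must be handed to teammates</em>: persistent teammates + asynchronous mailboxes</p>
<p><strong>s10</strong>   <em>Teammates need a shared communication protocol</em>: one request-response pattern drives all negotiation</p>
<p><strong>s11</strong>   <em>Teammates watch the board themselves and claim work</em>: no leader assigning tasks one by one; the team self-organizes</p>
<p><strong>s12</strong>   <em>Everyone works in their own directory without interfering</em>: tasks manage goals, worktrees manage directories, bound by ID</p></blockquote>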
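<h2 id="什么是agentharness">What is an Agent/Harness</h2>
<h2 id="如何开发适用于自己工作流的agentharness">How to build an Agent/Harness for your own workflow</h2>
<ol>
<li>What the workflow looks like</li>
<li>What specific knowledge it requires</li>
<li>Which tools it needs, and the scenarios and usage notes for each tool</li>
</ol>
<h2 id="总结">Summary</h2>
<ol>
<li><strong>The design mirrors how humans develop software</strong>: every feature is something a real workflow uses</li>
<li><strong>Save tokens, load on demand</strong>: the two-level load mechanism for skills + the compression mechanism for context</li>
<li><strong>The new terms all lead to the same place</strong>: prompt/harness/skill are all ways of giving the agent background knowledge</li>
<li><strong>A model example</strong>: Claude Code is a model example of combining a coding workflow with a large AI model</li>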
</ol>
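<h3 id="claude-code-与-ai-模型的关系">How Claude Code relates to the AI model</h3>
<ul>
<li><strong>Claude model</strong>: responsible for "thinking" and "reasoning"; it understands code, analyzes tasks, and decides the next step</li>
<li><strong>Claude Code (agentic harness)</strong>: responsible for "enabling" and "executing"; it provides tool interfaces, context management, and a terminal environment so that reasoning turns into actual operations</li>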
</ul>
<blockquote>
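<p>The relationship: Claude Code is the "shell / operating system", and the Claude model is the "intelligent core" inside it. The model proposes plans; Claude Code carries them out through tools and the environment.</p></blockquote>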
<hr>
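<p>Building an agent is simply a way of materializing your own abilities. As the saying goes, a large model is an amplifier, and your own ability sets the ceiling for the agent. This era actually demands more of people, not less. Only someone who has thought their workflow through can design an efficient and sensible agent process.</p>
<p>When the weakest link in the whole system is the human, there are no excuses left.</p>
<p>References:</p>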
<ol>
<li><a href="https://github.com/shareAI-lab/learn-claude-code">github: learn-claude-code</a></li>
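<li><a href="https://x.com/HiTw93/status/2032091246588518683">What You Don't Know About Claude Code: Architecture, Governance, and Engineering Practice</a></li>
<li><a href="https://x.com/HiTw93/status/2034627967926825175">What You Don't Know About Agents: Principles, Architecture, and Engineering Practice</a></li>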
</ol>
]]></content:encoded>
    </item>
    <item>
      <title>How Far a Programmer Will Go to Avoid Watching Ads</title>
      <link>https://lyapple2008.github.io/posts/202603/2026-03-28-%E4%B8%80%E4%B8%AA%E7%A8%8B%E5%BA%8F%E5%91%98%E4%B8%BA%E4%BA%86%E4%B8%8D%E7%9C%8B%E5%B9%BF%E5%91%8A%E6%9C%89%E5%A4%9A%E6%8B%BC/</link>
      <pubDate>Sat, 28 Mar 2026 14:17:17 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202603/2026-03-28-%E4%B8%80%E4%B8%AA%E7%A8%8B%E5%BA%8F%E5%91%98%E4%B8%BA%E4%BA%86%E4%B8%8D%E7%9C%8B%E5%B9%BF%E5%91%8A%E6%9C%89%E5%A4%9A%E6%8B%BC/</guid>
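      <description>&lt;h2 id=&#34;一缘起&#34;&gt;1. How it started&lt;/h2&gt;
&lt;p&gt;During the Spring Festival holiday, to kill time, I got hooked on a small jigsaw puzzle game. The rules are simple: you get a pretty picture, it is shuffled into pieces, and you put it back together within a time limit. Sounds relaxing, right? &lt;strong&gt;too young too simple&lt;/strong&gt;.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<h2 id="一缘起">1. How it started</h2>
<p>During the Spring Festival holiday, to kill time, I got hooked on a small jigsaw puzzle game. The rules are simple: you get a pretty picture, it is shuffled into pieces, and you put it back together within a time limit. Sounds relaxing, right? <strong>too young too simple</strong>.</p>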
<img src="/images/2026-03/微信拼图小游戏.PNG" height="400">
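<p>The catch: <strong>every timeout means watching a 30-second video ad</strong>. Being both bad at the game and hooked on it, I often had to watch ads to clear a level.</p>
<p>Finally, after sitting through the &quot;if you're my brother, come slash me&quot; ad for the 17th time, I slammed the table: <strong>enough, I am going to do something about this</strong>.</p>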
<blockquote>
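<p>📢 A friendly note: this post does not involve cheats or any modification of the game itself. It only looks at how a program can automatically handle known puzzle images.</p></blockquote>
<h2 id="二动手干">2. Getting to work</h2>
<h3 id="21-拼图还原问题">2.1 The jigsaw reassembly problem</h3>
<p><strong>The Jigsaw Puzzle Reassembly Problem</strong> is an important research topic in computer vision and image processing. In short: given an image that has been broken into pieces, how can a computer work out how the pieces relate to each other and reassemble the original image?</p>
<h4 id="问题分类">Problem categories</h4>
<p>Depending on the characteristics of the pieces, the problem falls into the following categories:</p>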
<table>
  <thead>
      <tr>
          <th>Type</th>
          <th>Piece characteristics</th>
          <th>Difficulty</th>
          <th>Typical scenario</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Type 1</strong></td>
          <td>Rectangular pieces, regular layout, <strong>no rotation</strong></td>
          <td>⭐⭐</td>
          <td>Images cut on a grid</td>
      </tr>
      <tr>
          <td><strong>Type 2</strong></td>
          <td>Rectangular pieces, regular layout, <strong>with rotation</strong></td>
          <td>⭐⭐⭐</td>
          <td>Puzzle games with rotatable pieces</td>
      </tr>
      <tr>
          <td><strong>Type 3</strong></td>
          <td>Traditional interlocking edges, <strong>no rotation</strong></td>
          <td>⭐⭐⭐⭐</td>
          <td>Traditional jigsaw puzzles</td>
      </tr>
      <tr>
          <td><strong>Type 4</strong></td>
          <td>Traditional interlocking edges, <strong>with rotation</strong></td>
          <td>⭐⭐⭐⭐⭐</td>
          <td>Complex puzzles, restoration of archaeological artifacts</td>
      </tr>
  </tbody>
</table>
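<h4 id="本文面对的问题">The problem in this post</h4>
<p>The puzzle handled here has these properties:</p>
<ul>
<li>✅ <strong>Rectangular pieces</strong>: every piece is a regular rectangle, without the interlocking edges of a traditional jigsaw</li>
<li>✅ <strong>Grid layout</strong>: three sizes, 6×6, 7×7, and 8×8, all neatly aligned</li>
<li>✅ <strong>No rotation</strong>: every piece is upright, so no rotation matching is needed</li>
<li>❌ <strong>No reference image</strong>: once shuffled, the original image is not visible at all, so positions have to be inferred from the pieces themselves</li>
</ul>
<p>This is <strong>Type 1</strong> in the table above: rectangular pieces without rotation on a regular grid. It is one of the simpler categories, but with the "no reference image" constraint it still calls for computer vision techniques such as edge matching.</p>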
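<h3 id="22-实施方案">2.2 Implementation</h3>
<h4 id="221-提取拼图区域">2.2.1 Extracting the puzzle region</h4>
<p><strong>Input</strong>: the raw screenshot, including UI<br>
<strong>Output</strong>: the puzzle-only image + bbox</p>
<p><strong>Method</strong>: segmentation by background color</p>
<p>The background color is sampled from the four borders of the image, and the RGB Euclidean distance to it is computed for every pixel. Pixels with a distance &gt; 35 are marked as non-background. After morphological opening and closing to remove noise and fill holes, the largest connected component is taken as the puzzle region.</p>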
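<p>The per-pixel test can be sketched as follows. This is a simplified illustration, not the project's code: the function name is hypothetical, and squared distances are compared so that no square root is needed.</p>

```c
#include <stdint.h>

/* Returns 1 when an RGB pixel is farther than `threshold` from the
 * background colour (Euclidean distance in RGB), 0 otherwise. */
int is_foreground(const uint8_t px[3], const uint8_t bg[3], int threshold)
{
    int dr = (int)px[0] - (int)bg[0];
    int dg = (int)px[1] - (int)bg[1];
    int db = (int)px[2] - (int)bg[2];
    /* compare squared values: d > t is equivalent to d*d > t*t for d, t >= 0 */
    return dr * dr + dg * dg + db * db > threshold * threshold;
}
```

<p>With the threshold of 35 from the text, a pixel that differs by 20 in every channel (distance about 34.6) still counts as background, while a difference of 21 per channel does not.</p>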
<hr>
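<h4 id="222-检测拼图块边界">2.2.2 Detecting piece boundaries</h4>
<p><strong>Input</strong>: the puzzle-only image<br>
<strong>Output</strong>: M×N patches of equal size</p>
<p><strong>Method</strong>: detect split lines from an axis-aligned texture signal</p>
<p>After conversion to grayscale, the standard deviation is computed along each row and each column. Inside a piece the texture varies, so the signal is high; in the gap between pieces the texture is uniform, so the signal is low. Each split line is located by searching for a local minimum of the signal, and finally all patches are normalized to the same size.</p>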
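<p>A sketch of the row signal and the minimum search is shown below. It is an illustration with assumed names, and it uses the variance instead of the standard deviation: variance is the square of the standard deviation, so the minima fall in the same places and no square root is needed.</p>

```c
#include <stddef.h>

/* Variance of each row of a grayscale image (row-major, width * height). */
void row_variance(const unsigned char *gray, size_t width, size_t height,
                  double *out)
{
    for (size_t y = 0; y < height; y++) {
        double sum = 0.0, sum_sq = 0.0;
        for (size_t x = 0; x < width; x++) {
            double v = gray[y * width + x];
            sum += v;
            sum_sq += v * v;
        }
        double mean = sum / (double)width;
        out[y] = sum_sq / (double)width - mean * mean;
    }
}

/* Index of the smallest value in signal[lo .. hi-1]; used to search for a
 * split line inside a window around its expected position. */
size_t argmin_range(const double *signal, size_t lo, size_t hi)
{
    size_t best = lo;
    for (size_t i = lo + 1; i < hi; i++)
        if (signal[i] < signal[best])
            best = i;
    return best;
}
```

<p>The column signal is computed the same way with the roles of <code>x</code> and <code>y</code> swapped. Restricting the search to a window around each expected grid line keeps a flat region inside a piece from being mistaken for a gap.</p>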
<hr>
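<h4 id="223-遗传算法拼图">2.2.3 Solving the puzzle with a genetic algorithm</h4>
<p><strong>Input</strong>: the M×N shuffled patches<br>
<strong>Output</strong>: the restored grid + the reconstructed image</p>
<p><strong>Method</strong>: an evolutionary algorithm that minimizes the edge difference between adjacent pieces</p>
<p>The population is 200 random permutations. Fitness is based on the sum of edge-pixel differences between adjacent pieces (MSE / median / percentile / Huber). Parents are chosen by roulette-wheel selection, children are produced by crossover, the top 2 individuals are kept as elites, and the run stops early after 10 consecutive generations without improvement.</p>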
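<p>The building block of the fitness is the mismatch along one shared edge. The sketch below shows the MSE variant for a left/right pair of square grayscale patches; the function name and the patch layout are assumptions made for the example.</p>

```c
#include <stddef.h>

/* Mean squared difference between the right edge column of patch `left`
 * and the left edge column of patch `right`.
 * Both patches are grayscale, row-major, size x size. */
double edge_mse_lr(const unsigned char *left, const unsigned char *right,
                   size_t size)
{
    double acc = 0.0;
    for (size_t y = 0; y < size; y++) {
        double d = (double)left[y * size + (size - 1)]
                 - (double)right[y * size];
        acc += d * d;
    }
    return acc / (double)size;
}
```

<p>The cost of a whole arrangement is then the sum of this value over every horizontally adjacent pair, plus the analogous top/bottom term over every vertically adjacent pair; a lower total means a better individual.</p>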
<p><img alt="Puzzle reconstruction example" loading="lazy" src="/images/2026-03/jigsaw-recontruct-example.png"></p>
<h2 id="三最终效果展示">3. Final result</h2>
<iframe src="//player.bilibili.com/player.html?bvid=BV1E2XDBBEN1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true" width="100%" height="500"></iframe>
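<h2 id="四总结">4. Summary</h2>
<p>This project took me roughly <strong>the weekends of four weeks</strong>. The road from &quot;these ads are so annoying&quot; to &quot;this thing actually runs&quot; was a winding one. Even though it was vibe coding all the way, I still had to work out the overall direction myself and explain it clearly to the AI before getting a good result.</p>
<p>The current result is still far from what I first imagined, a one-click run through the whole level, and plenty of problems remain. It does meet the original goal, though: since I started using it I have not watched a single ad in this puzzle game. When I have time I may look into other ways to get to one-click completion, such as RL. For now I will stop here; after all, this is only a side quest.</p>
<p>Project: <a href="https://github.com/lyapple2008/JigsawPuzzleReconstruction">JigsawPuzzleReconstruction</a></p>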
<hr>
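<p><strong>Finally</strong>, if game ads are wearing you down too, this approach may be worth a try: <strong>sometimes the best cheat is using your own skills to get around the ads</strong> (rather than hacking the game server).</p>
<p>Happy puzzling, and <strong>may you never watch an ad again</strong>!</p>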
]]></content:encoded>
    </item>
    <item>
      <title>Building a Bilingual Subtitle App on iOS System APIs</title>
      <link>https://lyapple2008.github.io/posts/202603/2026-03-09-%E5%9F%BA%E4%BA%8Eios%E7%B3%BB%E7%BB%9F%E6%8E%A5%E5%8F%A3%E5%AE%9E%E7%8E%B0%E5%8F%8C%E8%AF%AD%E5%AD%97%E5%B9%95app/</link>
      <pubDate>Mon, 09 Mar 2026 18:24:47 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202603/2026-03-09-%E5%9F%BA%E4%BA%8Eios%E7%B3%BB%E7%BB%9F%E6%8E%A5%E5%8F%A3%E5%AE%9E%E7%8E%B0%E5%8F%8C%E8%AF%AD%E5%AD%97%E5%B9%95app/</guid>
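      <description>&lt;p&gt;The bilingual subtitle app for iOS that I announced earlier is finally done. Without further ado, here is the result. The current implementation uses the iOS system APIs as a baseline; third-party open-source solutions can be plugged in later. The project link is at the end of the post.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>The bilingual subtitle app for iOS that I announced earlier is finally done. Without further ado, here is the result. The current implementation uses the iOS system APIs as a baseline; third-party open-source solutions can be plugged in later. The project link is at the end of the post.</p>
<h2 id="模块流程">Module pipeline</h2>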

<div class="mermaid">graph LR
    A[Capture system playback audio] --&gt; B[Speech recognition ASR]
    B --&gt; C[Translation]
    B --&gt; D[Bilingual subtitles]
    C --&gt; D</div>
<h2 id="效果展示">Demo</h2>
<iframe width="100%" max-width="400" height="600" src="//player.bilibili.com/player.html?bvid=BV1irXGBtEnU&autoplay=0" frameborder="0" allowfullscreen></iframe>
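<h2 id="一些感想">Some thoughts</h2>
<p>This is also my first project written entirely with AI coding, and it gave me the illusion that I can do anything. As an older programmer I have given up resisting: if you can't beat them, join them.</p>
<h3 id="ai-是放大器">AI is an amplifier</h3>
<p>The hardest part of this project for me was the UI. As I wrote in <a href="/posts/202601/2026-02-01-%E5%85%B3%E4%BA%8Eaicoding%E7%9A%84%E4%B8%80%E4%BA%9B%E6%84%9F%E6%83%B3%E5%92%8C%E5%90%8E%E7%BB%AD%E7%9A%84%E8%A7%84%E5%88%92/">Some Thoughts on AI Coding and Plans Going Forward</a>, the AI's ability is roughly equal to the user's own ability. I recently heard the phrase &quot;AI is an amplifier&quot;, and the metaphor fits well. Suppose AI amplifies by 10×. In a field you know well, where you score 10, the result is 100. In a field you do not know, such as iOS client development, where I score roughly 0, the result is still roughly 0. At that point you are at the mercy of luck: you take whatever the AI hands you, and even when something is wrong I cannot tell, let alone correct the AI. So are all those online stories about building an app with zero experience, shipping it, and earning revenue really possible? Or am I just using it wrong?</p>
<h3 id="狠狠地用起来">Use it, and use it hard</h3>
<p>In an unfamiliar field, whether the AI produces usable code involves a fair amount of luck. But in a collaboration the learning goes both ways: the AI learns from human code, and humans can learn from AI code. In this project, for the client UI code I was unfamiliar with, I had the AI walk me through it, from syntax to architecture to why it was written that way. Through repeated rounds of asking → learning → asking again → learning again, I gradually gained more control and better judgment over AI output in areas I do not know well.</p>
<p>This is the best time there has ever been for turning ideas into reality. In the past you might have got stuck on a problem and given up because no good answer could be found, or been unable to ship because of a technology you did not understand. Now you can get the answers you need from a large model at any time and have it help you realize your idea. Be like Karpathy, who gets anxious when his tokens go unused: keep talking to the AI, asking, discussing, and shipping ideas. Try embedding AI into your own engineering workflow. I have not found my way there yet, but I believe the direction is right. So use it, and use it hard.</p>
<h2 id="项目链接">Project link</h2>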
]]></content:encoded>
    </item>
    <item>
      <title>A First Look at the ASR Task</title>
      <link>https://lyapple2008.github.io/posts/202602/2026-02-14-asr%E4%BB%BB%E5%8A%A1%E5%88%9D%E4%BD%93%E9%AA%8C/</link>
      <pubDate>Sat, 14 Feb 2026 17:59:07 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202602/2026-02-14-asr%E4%BB%BB%E5%8A%A1%E5%88%9D%E4%BD%93%E9%AA%8C/</guid>
      <description>&lt;blockquote&gt;
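&lt;p&gt;This post is another development log. I am building a real-time bilingual subtitle app for iOS. Since it needs speech recognition, I looked into the current state of the ASR task and the mainstream approaches, as preparation for choosing a solution later.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<blockquote>
<p>This post is another development log. I am building a real-time bilingual subtitle app for iOS. Since it needs speech recognition, I looked into the current state of the ASR task and the mainstream approaches, as preparation for choosing a solution later.</p></blockquote>
<h3 id="模块流程">Module pipeline</h3>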

<div class="mermaid">graph LR
    A[Capture system playback audio] --&gt; B[Speech recognition ASR]
    B --&gt; C[Translation]
    B --&gt; D[Bilingual subtitles]
    C --&gt; D</div>
<h1 id="什么是asr任务">What is the ASR task</h1>
<p>ASR (Automatic Speech Recognition) is the process of converting a continuous audio signal into the corresponding text sequence. It is a typical sequence-to-sequence task:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">audio (continuous signal)  →  text (discrete tokens)</span></span></code></pre></td></tr></table>
</div>
</div>
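<p>From a signal-processing point of view, a traditional ASR system has to solve three core problems:</p>
<ol>
<li><strong>Acoustic modeling</strong>: map audio features to phonemes or characters</li>
<li><strong>Language modeling</strong>: capture the probabilistic relationships between words</li>
<li><strong>Alignment</strong>: align audio frames with output tokens</li>
</ol>
<p>In modern end-to-end deep learning systems, however, all three are handled jointly by a single neural network, so separate acoustic and language models are no longer required. Through end-to-end training the model learns to map audio features directly to text.</p>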
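<h1 id="与降噪任务的区别">How ASR Differs from Denoising</h1>
<p>For anyone who has worked on audio denoising before this iOS bilingual subtitle project, it helps to contrast the two tasks, because they are fundamentally different:</p>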
<table>
  <thead>
      <tr>
          <th>Aspect</th>
          <th>Denoising</th>
          <th>ASR</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Input / output</strong></td>
          <td>Audio → audio</td>
          <td>Audio → text</td>
      </tr>
      <tr>
          <td><strong>Task type</strong></td>
          <td>Signal regression</td>
          <td>Sequence-to-sequence</td>
      </tr>
      <tr>
          <td><strong>Metrics</strong></td>
          <td>SNR, PESQ</td>
          <td>WER, CER</td>
      </tr>
      <tr>
          <td><strong>Main difficulty</strong></td>
          <td>Preserving speech quality</td>
          <td>Recognition accuracy</td>
      </tr>
      <tr>
          <td><strong>Model structure</strong></td>
          <td>Often encoder-style (frame-to-frame mapping)</td>
          <td>Encoder plus a decoder or a CTC/transducer output layer</td>
      </tr>
  </tbody>
</table>
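<p>In short:</p>
<ul>
<li><strong>Denoising</strong>: takes noisy audio in and produces clean audio (same-modality conversion)</li>
<li><strong>ASR</strong>: takes audio in and produces text (cross-modal conversion)</li>
</ul>
<p>ASR is hard because it has to &quot;understand&quot; the audio content and turn it into semantic symbols rather than simply process a waveform. In addition, its input and output are not in one-to-one correspondence as they are in denoising, so alignment must be handled as well.</p>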
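<p>The WER in the table above is the word-level edit distance between the recognized text and the reference, divided by the number of reference words; CER is the same computation over characters. A minimal Swift sketch, assuming whitespace-separated words (<code>editDistance</code> and <code>wordErrorRate</code> are illustrative names, not part of any ASR toolkit):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Levenshtein distance between two token sequences
/// (substitutions, deletions, and insertions all cost 1).
func editDistance&lt;T: Equatable&gt;(_ ref: [T], _ hyp: [T]) -&gt; Int {
    if ref.isEmpty { return hyp.count }
    if hyp.isEmpty { return ref.count }
    // previous[j]: distance between the first i-1 ref tokens and the first j hyp tokens.
    var previous = Array(0...hyp.count)
    for i in 1...ref.count {
        var current = [i] + Array(repeating: 0, count: hyp.count)
        for j in 1...hyp.count {
            let substitution = previous[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1)
            let deletion = previous[j] + 1
            let insertion = current[j - 1] + 1
            current[j] = min(substitution, deletion, insertion)
        }
        previous = current
    }
    return previous[hyp.count]
}

/// WER = (substitutions + deletions + insertions) / number of reference words.
func wordErrorRate(reference: String, hypothesis: String) -&gt; Double {
    let ref = reference.split(separator: " ").map(String.init)
    let hyp = hypothesis.split(separator: " ").map(String.init)
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

let wer = wordErrorRate(reference: "the cat sat on the mat",
                        hypothesis: "the cat sit on mat")
print(wer)
</code></pre></div>
<p>In this toy pair the hypothesis has one substitution and one deletion against six reference words, so the WER is 2/6. Real scoring usually normalizes case and punctuation first, which this sketch skips.</p>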
<h1 id="目前主流的实现方案">Mainstream Approaches</h1>
<p>There are three mainstream approaches to ASR: CTC, RNN-T, and attention-based Seq2Seq.</p>
<h2 id="ctc-connectionist-temporal-classification">CTC (Connectionist Temporal Classification)</h2>
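<p>CTC is a classic alignment method. Its core idea is that <strong>no explicit alignment labels are needed</strong>; the alignment is learned automatically through a &quot;blank&quot; mechanism.</p>
<p>CTC introduces a blank symbol and a collapsing rule:</p>
<ul>
<li>Consecutive repeated characters are merged (e.g. &ldquo;aaabbb&rdquo; → &ldquo;ab&rdquo;)</li>
<li>The blank symbol produces no output, and a blank between two identical characters keeps them separate</li>
<li>The probability over all possible paths is computed with dynamic programming</li>
</ul>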
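<p>The collapsing rule can be sketched in a few lines of Swift. This is only an illustration: the function name <code>ctcCollapse</code> and the use of <code>-</code> as the blank symbol are assumptions made for readability.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Collapses a frame-level CTC path into the output string:
/// consecutive repeats are merged, then blanks are dropped.
func ctcCollapse(_ path: [Character], blank: Character = "-") -&gt; String {
    var output = ""
    var previous: Character? = nil
    for symbol in path {
        // Emit only when the symbol differs from the previous frame and is not blank.
        if symbol != previous &amp;&amp; symbol != blank {
            output.append(symbol)
        }
        previous = symbol
    }
    return output
}

print(ctcCollapse(Array("aaabbb")))      // ab
print(ctcCollapse(Array("hh-e-ll-lo")))  // hello
</code></pre></div>
<p>The second call shows why the blank matters: the blank between the two runs of <code>l</code> is what lets the double letter in &quot;hello&quot; survive the merge.</p>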
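<p>For the detailed principle, see <a href="https://distill.pub/2017/ctc/">Sequence Modeling With CTC</a>.</p>
<p>Training maximizes the total probability of the paths that collapse to the target token sequence. Inference searches for the most probable token sequence and outputs it, usually with one of two methods:</p>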
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Greedy Search</strong></td>
          <td>Output the most probable token at every step</td>
      </tr>
      <tr>
          <td><strong>Beam Search</strong></td>
          <td>Keep the top-N candidates at every step and output the token sequence whose path probability is highest</td>
      </tr>
  </tbody>
</table>
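<p>As a rough sketch of the greedy variant for CTC: take the arg-max token of each frame, then apply the collapsing rule. The vocabulary, the toy posteriors, and the convention that index 0 is the blank are all assumptions made for this example.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Greedy CTC decoding: pick the highest-scoring token of every frame,
/// then merge repeats and remove blanks.
func greedyCTCDecode(frames: [[Double]], vocabulary: [String], blankIndex: Int = 0) -&gt; String {
    var tokens: [String] = []
    var previousIndex = -1
    for scores in frames {
        // Index of the highest-scoring token in this frame.
        guard let best = scores.indices.max(by: { scores[$0] &lt; scores[$1] }) else { continue }
        if best != previousIndex &amp;&amp; best != blankIndex {
            tokens.append(vocabulary[best])
        }
        previousIndex = best
    }
    return tokens.joined()
}

// Toy per-frame posteriors over ["&lt;blank&gt;", "h", "i"] for five frames.
let vocabulary = ["&lt;blank&gt;", "h", "i"]
let frames: [[Double]] = [
    [0.10, 0.80, 0.10],  // h
    [0.20, 0.70, 0.10],  // h again, merged with the previous frame
    [0.90, 0.05, 0.05],  // blank
    [0.10, 0.10, 0.80],  // i
    [0.60, 0.20, 0.20],  // blank
]
print(greedyCTCDecode(frames: frames, vocabulary: vocabulary))  // hi
</code></pre></div>
<p>Beam search replaces the single arg-max with a set of candidate prefixes, which costs more computation per frame but can recover sequences that greedy search misses.</p>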
<h2 id="rnn-t-recurrent-neural-network-transducer">RNN-T (Recurrent Neural Network Transducer)</h2>
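<p>RNN-T extends CTC with an additional prediction network that models the dependencies between output tokens.</p>
<p>RNN-T consists of three parts:</p>
<ol>
<li><strong>Encoder</strong>: converts audio features into acoustic representations</li>
<li><strong>Prediction network</strong>: predicts the next token from the tokens already emitted</li>
<li><strong>Joint network</strong>: combines the encoder and prediction network outputs to predict the next token</li>
</ol>
<p>When producing output, RNN-T therefore takes as input not only the current audio frame but also its own output history.</p>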
<h2 id="seq2seq-with-attention">Seq2Seq with Attention</h2>
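<p>Attention-based sequence-to-sequence models are currently the most popular approach and are widely used in modern ASR systems such as Whisper. Paraformer also builds on an attention-based encoder-decoder, although it decodes non-autoregressively.</p>
<h3 id="原理">How It Works</h3>
<p>Classic structure:</p>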
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Encoder → Attention → Decoder → Output</span></span></code></pre></td></tr></table>
</div>
</div>
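<ul>
<li><strong>Encoder</strong>: encodes audio features into high-dimensional representations (usually a Transformer or Conformer)</li>
<li><strong>Attention</strong>: lets the decoder &quot;attend&quot; to different parts of the input while generating each token</li>
<li><strong>Decoder</strong>: generates the output text autoregressively</li>
</ul>
<h3 id="优缺点">Pros and Cons</h3>
<p><strong>Pros</strong>:</p>
<ul>
<li>Can model dependencies across sequences of arbitrary length</li>
<li>High recognition accuracy</li>
<li>Easy to integrate a language model</li>
<li>Well suited to large-scale pretraining</li>
</ul>
<p><strong>Cons</strong>:</p>
<ul>
<li>Higher inference latency (needs the full audio or a fairly large chunk)</li>
<li>Streaming recognition is complex to implement</li>
<li>High compute requirements</li>
</ul>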
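<h1 id="流式模型和非流式模型">Streaming and Non-streaming Models</h1>
<p>Streaming and non-streaming (offline) are two design modes for ASR systems, distinguished by their real-time requirements.</p>
<h2 id="什么是非流式模型">What Is a Non-streaming Model</h2>
<p>A non-streaming model (offline ASR) must <strong>wait for the complete audio before it can start recognizing</strong>.</p>
<p>Characteristics:</p>
<ul>
<li>Input: a complete audio file or a long audio segment</li>
<li>Latency: high, because it has to wait for the audio to end</li>
<li>Accuracy: usually higher, because it sees the full context</li>
<li>Use cases: video subtitle generation, meeting transcription, processing recorded files</li>
</ul>
<h2 id="什么是流式模型">What Is a Streaming Model</h2>
<p>A streaming model (streaming ASR) can <strong>output recognition results while it is still receiving audio</strong>, which enables real-time recognition.</p>
<p>Characteristics:</p>
<ul>
<li>Input: a continuous audio stream (usually short pieces, e.g. 30 ms per frame)</li>
<li>Latency: low, output can arrive within a few hundred milliseconds</li>
<li>Accuracy: usually slightly lower than non-streaming, because only the history and a limited amount of future context are available</li>
<li>Use cases: real-time voice conversation, voice assistants, live-stream subtitles</li>
</ul>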
<h2 id="技术实现差异">Implementation Differences</h2>
<table>
  <thead>
      <tr>
          <th>Aspect</th>
          <th>Streaming</th>
          <th>Non-streaming</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Context</strong></td>
          <td>Limited context (lookahead)</td>
          <td>Full context</td>
      </tr>
      <tr>
          <td><strong>Latency</strong></td>
          <td>Low (&lt;500ms)</td>
          <td>High</td>
      </tr>
      <tr>
          <td><strong>Accuracy</strong></td>
          <td>Slightly lower</td>
          <td>Higher</td>
      </tr>
      <tr>
          <td><strong>Model complexity</strong></td>
          <td>Higher (must handle segmentation)</td>
          <td>Lower</td>
      </tr>
      <tr>
          <td><strong>Memory footprint</strong></td>
          <td>Smaller</td>
          <td>Larger</td>
      </tr>
  </tbody>
</table>
<p>Streaming models usually need special design choices, such as:</p>
<ul>
<li><strong>Chunked Attention</strong>: split the audio into small chunks and process them one at a time</li>
<li><strong>CTC Prefix</strong>: use CTC prefix decoding</li>
<li><strong>Lookahead</strong>: consider only a limited number of future frames</li>
</ul>
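<p>To make chunked attention and lookahead concrete, here is one possible way to build the attention mask, as a sketch only. The function name and parameters are hypothetical, and real streaming encoders typically also cap the amount of left context, which is omitted here.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Builds a chunked attention mask: frame i may attend to frame j only if
/// j lies in the same chunk as i, in an earlier chunk, or within
/// `lookaheadChunks` future chunks.
func chunkedAttentionMask(frameCount: Int, chunkSize: Int, lookaheadChunks: Int = 0) -&gt; [[Bool]] {
    precondition(chunkSize &gt; 0, "chunkSize must be positive")
    return (0..&lt;frameCount).map { i in
        (0..&lt;frameCount).map { j in
            j / chunkSize &lt;= i / chunkSize + lookaheadChunks
        }
    }
}

// Six frames, chunks of two frames, no lookahead.
let mask = chunkedAttentionMask(frameCount: 6, chunkSize: 2)
for row in mask {
    print(row.map { $0 ? "1" : "0" }.joined(separator: " "))
}
</code></pre></div>
<p>Each row is one query frame. With no lookahead, the first two rows see only the first chunk, the next two see the first two chunks, and so on; raising <code>lookaheadChunks</code> trades latency for more future context.</p>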
<hr>
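<h1 id="自回归解码和非自回归解码">Autoregressive and Non-autoregressive Decoding</h1>
<p>The decoding method determines how the model generates its output text.</p>
<h2 id="自回归解码-autoregressive-decoding">Autoregressive Decoding</h2>
<p>Autoregressive decoding is currently the dominant method. It <strong>generates one token at a time, and each token depends on all previously generated tokens</strong>.</p>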
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Output: &#34;hello world&#34;
</span></span><span class="line"><span class="cl">Generation process:
</span></span><span class="line"><span class="cl">  1. Generate &#34;h&#34;
</span></span><span class="line"><span class="cl">  2. Given &#34;h&#34;, extend to &#34;he&#34;
</span></span><span class="line"><span class="cl">  3. Given &#34;he&#34;, extend to &#34;hel&#34;
</span></span><span class="line"><span class="cl">  4. ...</span></span></code></pre></td></tr></table>
</div>
</div>
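<p>Characteristics:</p>
<ul>
<li><strong>Pros</strong>: high output quality; can model long-range dependencies</li>
<li><strong>Cons</strong>: generation is serial, so inference is slow (O(n) sequential decoding steps, where n is the output length)</li>
<li>Typical models: Transformer decoder, RNN-T</li>
</ul>
<h2 id="非自回归解码-non-autoregressive-decoding">Non-autoregressive Decoding</h2>
<p>Non-autoregressive decoding generates tokens <strong>in parallel</strong>, emitting the whole sequence at once.</p>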
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Output: &#34;hello world&#34;
</span></span><span class="line"><span class="cl">Generation process:
</span></span><span class="line"><span class="cl">  1. Emit the complete sentence &#34;hello world&#34; directly</span></span></code></pre></td></tr></table>
</div>
</div>
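<p>Characteristics:</p>
<ul>
<li><strong>Pros</strong>: generation is parallel, so inference is fast (O(1) decoding passes, independent of the output length)</li>
<li><strong>Cons</strong>: hard to model dependencies between output tokens, so output quality may be lower</li>
<li>Typical implementations: CTC, FastCorrect, Mask-Predict</li>
</ul>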
<h2 id="对比">Comparison</h2>
<table>
  <thead>
      <tr>
          <th>Aspect</th>
          <th>Autoregressive</th>
          <th>Non-autoregressive</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Generation</strong></td>
          <td>Serial, token by token</td>
          <td>Parallel, all at once</td>
      </tr>
      <tr>
          <td><strong>Inference speed</strong></td>
          <td>Slow</td>
          <td>Fast</td>
      </tr>
      <tr>
          <td><strong>Output quality</strong></td>
          <td>High</td>
          <td>Slightly lower</td>
      </tr>
      <tr>
          <td><strong>Dependencies</strong></td>
          <td>Models dependencies between tokens</td>
          <td>Assumes conditional independence</td>
      </tr>
      <tr>
          <td><strong>Typical models</strong></td>
          <td>RNN-T, Seq2Seq</td>
          <td>CTC, Mask-Predict</td>
      </tr>
  </tbody>
</table>
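<h1 id="方案选择">Choosing an Approach</h1>
<p>All of the above is preparation for choosing a speech recognition approach for a minimal prototype. The goal is not to train a SOTA model, so this is only a shallow survey for now.</p>
<p>Requirements:</p>
<ul>
<li><strong>Target device</strong>: iOS mobile (limited resources)</li>
<li><strong>Language support</strong>: multilingual</li>
<li><strong>Real-time behavior</strong>: audio comes in live and text goes out live</li>
</ul>
<h2 id="选择优先级">Priorities</h2>
<ol>
<li><strong>First, it must run</strong>: model size and compute must be within what a mobile device can handle</li>
<li><strong>Then, multilingual</strong>: it must recognize multiple languages</li>
<li><strong>Finally, accuracy</strong>: optimize performance once the feature works</li>
</ol>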
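<h2 id="优先级一模型能在移动端跑起来">Priority 1: The Model Runs on Mobile</h2>
<h3 id="参数量选择">Parameter Count</h3>
<p>Mobile resources are limited, and the parameter count directly determines whether a model can run at all. I do not have a firm numeric limit yet; I will see how things behave in practice, and quantization can shrink the model later.</p>
<p><strong>Quantization</strong>: relative to FP32 weights, INT8 quantization shrinks the model to roughly 1/4 of its size, usually with a small accuracy loss (often quoted as under 5%, though this depends on the model and the method); INT4 shrinks it further to roughly 1/8, with a more noticeable accuracy drop.</p>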
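<p>The size ratios follow directly from the bits stored per weight. A back-of-the-envelope sketch, assuming a hypothetical 50M-parameter model and ignoring quantization scales, zero points, and file overhead (<code>modelSizeMB</code> is an illustrative helper):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Approximate weight storage in megabytes for a given parameter count
/// and bit width. Ignores quantization metadata and container overhead.
func modelSizeMB(parameters: Double, bitsPerWeight: Double) -&gt; Double {
    return parameters * bitsPerWeight / 8.0 / 1_000_000.0
}

let parameters = 50_000_000.0  // hypothetical 50M-parameter model
for (name, bits) in [("FP32", 32.0), ("FP16", 16.0), ("INT8", 8.0), ("INT4", 4.0)] {
    let size = modelSizeMB(parameters: parameters, bitsPerWeight: bits)
    print("\(name): \(size) MB")
}
</code></pre></div>
<p>Under these assumptions the weights take about 200 MB in FP32, 50 MB in INT8, and 25 MB in INT4, which is where the 1/4 and 1/8 figures come from.</p>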
<h3 id="架构选择">Architecture</h3>
<table>
  <thead>
      <tr>
          <th>Architecture</th>
          <th>Compute</th>
          <th>Memory</th>
          <th>Mobile suitability</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Transformer Encoder</strong></td>
          <td>Medium</td>
          <td>Medium</td>
          <td>★★★★☆</td>
      </tr>
      <tr>
          <td><strong>Conformer</strong></td>
          <td>Medium-high</td>
          <td>Medium</td>
          <td>★★★★☆</td>
      </tr>
      <tr>
          <td><strong>RNN/LSTM</strong></td>
          <td>Low</td>
          <td>Low</td>
          <td>★★★☆☆</td>
      </tr>
  </tbody>
</table>
<p><strong>Recommendation</strong>: choose an encoder-only or lightweight Conformer structure and, as a rough starting point rather than a hard limit, keep the parameter count within about 50M.</p>
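<h2 id="优先级二支持多语言">Priority 2: Multilingual Support</h2>
<p>Once the model is confirmed to run on mobile, multilingual capability is the next consideration.</p>
<h3 id="不同模型架构的多语言能力">Multilingual Capability by Architecture</h3>
<table>
  <thead>
      <tr>
          <th>Architecture</th>
          <th>Multilingual support</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>End-to-end Seq2Seq</strong></td>
          <td>★★★★★</td>
          <td>Trained on multilingual data, so multilingual support comes naturally</td>
      </tr>
      <tr>
          <td><strong>RNN-T</strong></td>
          <td>★★★☆☆</td>
          <td>Possible, but a multilingual version has to be trained specifically</td>
      </tr>
      <tr>
          <td><strong>CTC</strong></td>
          <td>★★☆☆☆</td>
          <td>Usually built for a single language; multilingual versions are less common</td>
      </tr>
  </tbody>
</table>
<h3 id="结论">Conclusion</h3>
<p>When multilingual support is required, <strong>prefer an end-to-end Seq2Seq architecture</strong>; pretrained models of this kind usually already cover dozens to over a hundred languages.</p>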
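<h2 id="优先级三实时性要求">Priority 3: Real-time Requirements</h2>
<p>Real-time subtitles require the model to output text while it is still receiving audio.</p>
<h3 id="流式-vs-非流式">Streaming vs Non-streaming</h3>
<table>
  <thead>
      <tr>
          <th>Type</th>
          <th>Latency</th>
          <th>Implementation difficulty</th>
          <th>Fit for real-time subtitles</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Streaming</strong></td>
          <td>&lt;500ms</td>
          <td>High</td>
          <td>★★★★★</td>
      </tr>
      <tr>
          <td><strong>Non-streaming</strong></td>
          <td>&gt;1s</td>
          <td>Low</td>
          <td>★★☆☆☆</td>
      </tr>
  </tbody>
</table>
<p><strong>Compromise</strong>: with a non-streaming model, a &quot;chunked processing&quot; strategy can approximate streaming behavior:</p>
<ul>
<li>Split the audio into fixed-length chunks (e.g. 1 second)</li>
<li>Recognize chunk by chunk and stitch the results together</li>
<li>Cache historical context to reduce errors</li>
</ul>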
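<p>A minimal sketch of the chunking step, assuming 16 kHz mono <code>Float</code> samples. The function name, the 1-second chunk length, and the 0.2-second overlap are illustrative choices, not measured settings.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Splits a sample stream into fixed-length chunks. Each chunk is prefixed
/// with the last `overlap` samples before it, acting as cached context.
func makeChunks(samples: [Float], chunkSize: Int, overlap: Int) -&gt; [[Float]] {
    precondition(chunkSize &gt; 0 &amp;&amp; overlap &gt;= 0, "invalid chunk parameters")
    var chunks: [[Float]] = []
    var start = 0
    while start &lt; samples.count {
        let contextStart = max(0, start - overlap)
        let end = min(samples.count, start + chunkSize)
        chunks.append(Array(samples[contextStart..&lt;end]))
        start += chunkSize
    }
    return chunks
}

// 2.5 s of silence at 16 kHz, cut into 1 s chunks with 0.2 s of history.
let sampleRate = 16_000
let audio = [Float](repeating: 0, count: sampleRate * 5 / 2)
let chunks = makeChunks(samples: audio, chunkSize: sampleRate, overlap: sampleRate / 5)
print(chunks.map { $0.count })  // [16000, 19200, 11200]
</code></pre></div>
<p>Because consecutive chunks share the overlap, the text recognized for the overlapping part has to be dropped or merged when the results are stitched together; how best to do that depends on the recognizer.</p>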
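<h3 id="解码方式">Decoding Method</h3>
<table>
  <thead>
      <tr>
          <th>Decoding method</th>
          <th>Latency</th>
          <th>Implementation complexity</th>
          <th>Real-time fit</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Non-autoregressive (e.g. CTC with greedy search)</strong></td>
          <td>Low</td>
          <td>Simple</td>
          <td>★★★★★</td>
      </tr>
      <tr>
          <td><strong>Non-autoregressive (e.g. CTC with beam search)</strong></td>
          <td>Medium</td>
          <td>Medium</td>
          <td>★★★★☆</td>
      </tr>
      <tr>
          <td><strong>Autoregressive</strong></td>
          <td>High</td>
          <td>Complex</td>
          <td>★★☆☆☆</td>
      </tr>
  </tbody>
</table>
<p><strong>Recommendation</strong>: for real-time scenarios, prefer <strong>non-autoregressive decoding</strong> with greedy search, which has the lowest latency.</p>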
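<h2 id="总结">Summary</h2>
<p><strong>Core idea</strong>: validate the end-to-end pipeline first, then optimize where real-world use shows it is needed. Mobile ASR is an iterative process; there is no need to get everything right in one step.</p>
<p>I will document the actual iOS implementation in later posts. Fellow travelers, remember to like and follow along.</p>
<p><img alt="Remember to like, share, and follow" loading="lazy" src="/images/%E4%B8%80%E9%94%AE%E4%B8%89%E8%BF%9E.jpg"></p>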
]]></content:encoded>
    </item>
    <item>
      <title>Some Thoughts on AI Coding and What Comes Next</title>
      <link>https://lyapple2008.github.io/posts/202601/2026-02-01-%E5%85%B3%E4%BA%8Eaicoding%E7%9A%84%E4%B8%80%E4%BA%9B%E6%84%9F%E6%83%B3%E5%92%8C%E5%90%8E%E7%BB%AD%E7%9A%84%E8%A7%84%E5%88%92/</link>
      <pubDate>Sun, 01 Feb 2026 09:28:45 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202601/2026-02-01-%E5%85%B3%E4%BA%8Eaicoding%E7%9A%84%E4%B8%80%E4%BA%9B%E6%84%9F%E6%83%B3%E5%92%8C%E5%90%8E%E7%BB%AD%E7%9A%84%E8%A7%84%E5%88%92/</guid>
      <description>&lt;blockquote&gt;
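&lt;p&gt;AI is without doubt the hottest topic right now, and AI Coding, one of its applications, is advancing fast and looks poised to replace programmers. I have been using AI Coding for the past few months, and my mood has gone from an initial "that is it, unemployment is coming early" to "I am safe, I can hang on a few more years". Below are some impressions from using it, plus some plans for this blog in the face of the AI Coding wave.&lt;/p&gt;</description>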
      <content:encoded><![CDATA[<blockquote>
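<p>AI is without doubt the hottest topic right now, and AI Coding, one of its applications, is advancing fast and looks poised to replace programmers. I have been using AI Coding for the past few months, and my mood has gone from an initial &quot;that is it, unemployment is coming early&quot; to &quot;I am safe, I can hang on a few more years&quot;. Below are some impressions from using it, plus some plans for this blog in the face of the AI Coding wave.</p></blockquote>
<p><img alt="A programmer using AI coding to boost productivity" loading="lazy" src="/images/2026-02-01/%E7%A8%8B%E5%BA%8F%E5%91%98%E4%BD%BF%E7%94%A8AI%E7%BC%96%E7%A0%81%E6%8F%90%E5%8D%87%E6%95%88%E7%8E%87.jpg"></p>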
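<h3 id="1-ai-coding-非常有帮助收费的优于免费的">1. AI Coding Helps a Lot, and Paid Tools Beat Free Ones</h3>
<p>AI has advanced rapidly in recent years. It is long past the days of &quot;artificial stupidity&quot; and can genuinely serve as a productivity tool. How much AI Coding improves efficiency depends on many factors, such as the share of coding in your daily work, your proficiency with AI, and the quality of the AI Coding tool itself. Either way, AI Coding is a net positive for programmers and worth the time to adopt.</p>
<p>The tools I use most are Cursor and Claude Code. Cursor is provided by my company; for Claude Code I subscribed to the MiniMax Coding Plan and configured it myself. Before that I also tried some free AI Coding tools from China (such as Trae and Qcoder). My impression from that brief use was that they were not as good as the paid ones: a single problem could take several rounds of back-and-forth and still not work.</p>
<p>Of course, these are purely off-the-cuff personal impressions; I did not use those tools for long or in much depth, and there are examples online of people building working applications with Trae and Qcoder. What matters is that AI Coding really does improve development efficiency, and everyone should start using it.</p>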
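<h3 id="2-ai-coding-的能力约等于使用者自身">2. What AI Coding Can Do Is Roughly What Its User Can Do</h3>
<p><img alt="The capability boundary of AI Coding" loading="lazy" src="/images/2026-02-01/1770474310208.jpg"></p>
<p>At this stage AI Coding is still a passive tool that needs its user to plan and direct tasks, so the user's capability boundary is roughly the capability boundary of the user working with AI Coding.</p>
<p>Software development is a complex piece of systems engineering, and each requirement breaks down into many steps. Even if AI Coding succeeds at each step 95% of the time, and the steps are independent, the chance that all 8 steps succeed is only about 66% (0.95^8 ≈ 0.66). Moreover, AI Coding gets its requirements mainly from the user, and something is lost when the user describes them to the AI, which lowers the success rate further. The good news is that users can keep expanding their own boundary while interacting with AI, turning areas they merely know about into areas they know well, so in theory AI Coding has no boundary.</p>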
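<p>The compounding effect is easy to check. A small sketch, assuming every step succeeds independently with the same probability (a simplification; <code>overallSuccess</code> is an illustrative helper):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">import Foundation

/// Probability that all `steps` independent steps succeed.
func overallSuccess(perStep: Double, steps: Int) -&gt; Double {
    return pow(perStep, Double(steps))
}

for steps in [1, 4, 8, 16] {
    let p = overallSuccess(perStep: 0.95, steps: steps)
    let percent = String(format: "%.1f", p * 100)
    print("\(steps) steps: \(percent)%")
}
</code></pre></div>
<p>At 95% per step, eight steps land at roughly 66%, and doubling the number of steps to sixteen drops the overall success rate below one half.</p>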
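<p>This suggests a few things to me:</p>
<ol>
<li>In production, use it only within the areas you know well, reaching at most into areas you know something about; beyond that, the results are left to fate. Experimental projects are exempt from this rule.</li>
<li>When describing requirements to AI, be as detailed as possible. Think through the overall architecture and goals as if you were writing the code yourself, and write them down as a document. The document both conveys the requirements and lets you restore context for the AI at any time, which compensates for the limited context of AI Coding.</li>
<li>Veteran programmers, drawing on years of accumulated experience and using AI Coding to make up for a lack of &quot;coding stamina&quot;, may get a second spring.</li>
</ol>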
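<h3 id="3-ai-coding-还只是辅助编程使用者需要为其质量兜底">3. AI Coding Is Still Only Assistance, and the User Must Backstop Its Quality</h3>
<p><img alt="AI and the programmer" loading="lazy" src="/images/2026-02-01/%E7%A8%8B%E5%BA%8F%E5%91%98%E8%A2%ABAI%E9%A9%BE%E9%A9%B6%E6%B1%BD%E8%BD%A6%E6%8B%A6%E4%BD%8F%E7%BD%9A%E6%AC%BE.png"></p>
<p>Much like autonomous driving, AI Coding is currently only at L2, the assisted stage, and when an accident happens the user is still responsible. So when using AI Coding in production, review the generated code rigorously.</p>
<p>Recently, though, I have seen another approach. As the volume of AI-generated code grows rapidly, traditional line-by-line review becomes impractical and inefficient. Some people therefore accept AI-generated code by testing rather than reviewing: while the AI modifies and generates code, it also generates the corresponding unit tests, and humans check that the tests pass. This may be another viable way to collaborate with AI Coding.</p>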
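<h2 id="后续的规划">Plans Ahead</h2>
<p>As AI spreads, acquiring knowledge, and even skills, is getting easier and easier. The downside is that an individual's knowledge and skills are worth less and less; the upside is that the knowledge and skills an individual can draw on keep growing and become easier to use.</p>
<p><img alt="Peter Steinberger's GitHub profile" loading="lazy" src="/images/2026-02-01/20260208-132545.jpg"></p>
<p>OpenClaw has been very popular lately, and <a href="https://github.com/steipete">Peter Steinberger</a> is its core developer. His GitHub profile shows that, before Claude took off, he had been steadily building and iterating on all kinds of tools and products (any program with a fully defined function counts as a product; it does not have to be something like WeChat, WhatsApp, or TikTok). The success of OpenClaw is simply the payoff of years of iteration. I hope to follow his example: learn by doing and do while learning, which may be the fastest way to improve in the AI era.</p>
<p>From now on I will stop sharing isolated pieces of knowledge and instead share the process of building a real, usable product. The product only needs to provide a complete, usable function.</p>
<p>In an era where knowledge and skills are ever easier to acquire, iteration speed may be what wins.</p>
<hr>
<p>OK, those are my shallow, scattered thoughts and my flag for 2026. The rest is up to time and execution. Fellow travelers, remember to like and follow, and let us keep at it together!</p>
<p><img alt="Please like, share, and follow" loading="lazy" src="/images/2026-02-01/1770529335422.jpg"></p>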
]]></content:encoded>
    </item>
    <item>
      <title>iOS Audio Capture</title>
      <link>https://lyapple2008.github.io/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/</link>
      <pubDate>Sun, 25 Jan 2026 07:52:25 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/</guid>
      <description>&lt;blockquote&gt;
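&lt;p&gt;This is a development log for an iOS bilingual subtitle app. The goal is to recognize and translate what is being played in real time while watching a video on iOS and to display bilingual subtitles, lowering the barrier to watching foreign-language video.&lt;/p&gt;</description>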
      <content:encoded><![CDATA[<blockquote>
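<p>This is a development log for an iOS bilingual subtitle app. The goal is to recognize and translate what is being played in real time while watching a video on iOS and to display bilingual subtitles, lowering the barrier to watching foreign-language video.</p></blockquote>
<h3 id="模块流程">Module Pipeline</h3>

<div class="mermaid">graph LR
    A[Capture system playback audio] --&gt; B[Speech recognition ASR]
    B --&gt; C[Translation]
    B --&gt; D[Bilingual subtitles]
    C --&gt; D</div>
<h1 id="ios音频捕获与数据共享">iOS Audio Capture and Data Sharing</h1>
<p>This post describes how to capture system audio on iOS: a Broadcast Upload Extension captures the audio the system is playing and shares the data with the main app through an App Group.</p>
<h2 id="目录">Contents</h2>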
<ul>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e4%b8%80%e7%b3%bb%e7%bb%9f%e9%9f%b3%e9%a2%91%e6%8d%95%e8%8e%b7">1. System Audio Capture</a>
<ul>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#broadcast-upload-extension%e9%85%8d%e7%bd%ae">Broadcast Upload Extension Setup</a></li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e5%90%af%e5%8a%a8%e4%b8%8e%e5%85%b3%e9%97%ad">Starting and Stopping</a></li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e9%9f%b3%e9%a2%91%e6%a0%bc%e5%bc%8f%e8%bd%ac%e6%8d%a2">Audio Format Conversion</a></li>
</ul>
</li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e4%ba%8cextension%e4%b8%8e%e4%b8%bb%e5%ba%94%e7%94%a8%e6%95%b0%e6%8d%ae%e5%85%b1%e4%ba%ab">2. Data Sharing Between the Extension and the Main App</a>
<ul>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#app-group%e9%85%8d%e7%bd%ae">App Group Setup</a></li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e6%95%b0%e6%8d%ae%e8%af%bb%e5%86%99%e5%ae%9e%e7%8e%b0">Reading and Writing Data</a></li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#darwin%e9%80%9a%e7%9f%a5">Darwin Notifications</a></li>
</ul>
</li>
</ul>
<hr>
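<h2 id="一系统音频捕获">1. System Audio Capture</h2>
<p>For security and privacy reasons, iOS <strong>does not allow an app to capture system audio directly</strong> (video playback, music, and so on). The way to get it is a Broadcast Upload Extension, which receives the audio through screen recording. Note that sound played by apps running in voice-call mode cannot be captured.</p>
<h3 id="broadcast-upload-extension配置">Broadcast Upload Extension Setup</h3>
<p>For the Extension to receive data from ReplayKit, all of the following must hold:</p>
<ol>
<li>The project contains a Broadcast Upload Extension target</li>
<li>The main app starts the broadcast through the system UI</li>
<li>The Extension's Info.plist, Capabilities, and class inheritance are all correct</li>
</ol>
<p><strong>Steps:</strong></p>
<ol>
<li>
<p>In Xcode, add a new target and choose Broadcast Upload Extension as the Extension type</p>
</li>
<li>
<p>Configure Info.plist:</p>
</li>
</ol>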
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-xml" data-lang="xml"><span class="line"><span class="cl"><span class="nt">&lt;key&gt;</span>NSExtension<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;dict&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;key&gt;</span>NSExtensionPointIdentifier<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;string&gt;</span>com.apple.broadcast-services-upload<span class="nt">&lt;/string&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;key&gt;</span>NSExtensionPrincipalClass<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;string&gt;</span>$(PRODUCT_MODULE_NAME).SampleHandler<span class="nt">&lt;/string&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;/dict&gt;</span></span></span></code></pre></td></tr></table>
</div>
</div>
<ol start="3">
<li>Configure UIBackgroundModes in the main app:</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-xml" data-lang="xml"><span class="line"><span class="cl"><span class="nt">&lt;key&gt;</span>UIBackgroundModes<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;array&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;string&gt;</span>audio<span class="nt">&lt;/string&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;/array&gt;</span></span></span></code></pre></td></tr></table>
</div>
</div>
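<p><strong>Note</strong>: without the audio background mode, audio may not be received, or the Extension may be suspended once the screen locks.</p>
<h3 id="启动与关闭">Starting and Stopping</h3>
<p>A Broadcast Upload Extension cannot be started directly from code; it can only be triggered through the system UI. Stopping is similar: only the Extension itself can end the broadcast by calling <code>finishBroadcastWithError</code>, so the main app can stop it only &quot;indirectly&quot;.</p>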
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">import</span> <span class="nc">ReplayKit</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Run inside a UIViewController, e.g. in viewDidLoad().</span>
</span></span><span class="line"><span class="cl"><span class="kd">let</span> <span class="nv">picker</span> <span class="p">=</span> <span class="n">RPSystemBroadcastPickerView</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">frame</span><span class="p">:</span> <span class="n">CGRect</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="n">width</span><span class="p">:</span> <span class="mi">44</span><span class="p">,</span> <span class="n">height</span><span class="p">:</span> <span class="mi">44</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">// Bundle identifier of the Broadcast Upload Extension (placeholder value).</span>
</span></span><span class="line"><span class="cl"><span class="n">picker</span><span class="p">.</span><span class="n">preferredExtension</span> <span class="p">=</span> <span class="s">&#34;com.xxx.broadcast&#34;</span>
</span></span><span class="line"><span class="cl"><span class="c1">// Show the microphone toggle in the system picker.</span>
</span></span><span class="line"><span class="cl"><span class="n">picker</span><span class="p">.</span><span class="n">showsMicrophoneButton</span> <span class="p">=</span> <span class="kc">true</span>
</span></span><span class="line"><span class="cl"><span class="n">view</span><span class="p">.</span><span class="n">addSubview</span><span class="p">(</span><span class="n">picker</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div>
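<h3 id="音频格式转换">Audio Format Conversion</h3>
<p>The speech recognition engine expects 16 kHz mono audio, so the captured audio has to be converted first. Note that no official documentation specifies the format of the data ReplayKit delivers in its callback, so the code has to handle a variety of formats.</p>
<p><strong>Format detection and extraction:</strong></p>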
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kr">override</span> <span class="kd">func</span> <span class="nf">processSampleBuffer</span><span class="p">(</span><span class="kc">_</span> <span class="n">sampleBuffer</span><span class="p">:</span> <span class="n">CMSampleBuffer</span><span class="p">,</span> <span class="n">with</span> <span class="n">sampleBufferType</span><span class="p">:</span> <span class="n">RPSampleBufferType</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="k">case</span> <span class="p">.</span><span class="n">audioApp</span> <span class="p">=</span> <span class="n">sampleBufferType</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="kd">let</span> <span class="nv">formatDescription</span> <span class="p">=</span> <span class="n">CMSampleBufferGetFormatDescription</span><span class="p">(</span><span class="n">sampleBuffer</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">          <span class="kd">let</span> <span class="nv">streamDesc</span> <span class="p">=</span> <span class="n">CMAudioFormatDescriptionGetStreamBasicDescription</span><span class="p">(</span><span class="n">formatDescription</span><span class="p">)?.</span><span class="n">pointee</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">inputSampleRate</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mSampleRate</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">channelCount</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">streamDesc</span><span class="p">.</span><span class="n">mChannelsPerFrame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">bitsPerChannel</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mBitsPerChannel</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">formatFlags</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mFormatFlags</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isFloat</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsFloat</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isNonInterleaved</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsNonInterleaved</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isBigEndian</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsBigEndian</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// Extract the audio data...</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Full audio processing implementation:</strong></p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">private</span> <span class="kd">let</span> <span class="nv">targetSampleRate</span><span class="p">:</span> <span class="nb">Double</span> <span class="p">=</span> <span class="mf">16000.0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">/// Processes an audio sample buffer.</span>
</span></span><span class="line"><span class="cl"><span class="kd">private</span> <span class="kd">func</span> <span class="nf">processAudioBuffer</span><span class="p">(</span><span class="kc">_</span> <span class="n">sampleBuffer</span><span class="p">:</span> <span class="n">CMSampleBuffer</span><span class="p">,</span> <span class="n">source</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="kd">let</span> <span class="nv">formatDescription</span> <span class="p">=</span> <span class="n">CMSampleBufferGetFormatDescription</span><span class="p">(</span><span class="n">sampleBuffer</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">          <span class="kd">let</span> <span class="nv">streamDesc</span> <span class="p">=</span> <span class="n">CMAudioFormatDescriptionGetStreamBasicDescription</span><span class="p">(</span><span class="n">formatDescription</span><span class="p">)?.</span><span class="n">pointee</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">inputSampleRate</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mSampleRate</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">channelCount</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">streamDesc</span><span class="p">.</span><span class="n">mChannelsPerFrame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">bitsPerChannel</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mBitsPerChannel</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">formatFlags</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mFormatFlags</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isFloat</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsFloat</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isNonInterleaved</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsNonInterleaved</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isBigEndian</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsBigEndian</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// Get the AudioBufferList. Non-interleaved audio carries one AudioBuffer per channel,</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// so allocate room for every channel; a bare AudioBufferList() holds only one buffer.</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">maxBuffers</span> <span class="p">=</span> <span class="n">isNonInterleaved</span> <span class="p">?</span> <span class="bp">max</span><span class="p">(</span><span class="n">channelCount</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="p">:</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">audioBufferListPointer</span> <span class="p">=</span> <span class="n">AudioBufferList</span><span class="p">.</span><span class="n">allocate</span><span class="p">(</span><span class="n">maximumBuffers</span><span class="p">:</span> <span class="n">maxBuffers</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="n">free</span><span class="p">(</span><span class="n">audioBufferListPointer</span><span class="p">.</span><span class="n">unsafeMutablePointer</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// The retained block buffer keeps the sample memory alive while it is read.</span>
</span></span><span class="line"><span class="cl">    <span class="kd">var</span> <span class="nv">blockBuffer</span><span class="p">:</span> <span class="n">CMBlockBuffer</span><span class="p">?</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">status</span> <span class="p">=</span> <span class="n">CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">sampleBuffer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">bufferListSizeNeededOut</span><span class="p">:</span> <span class="kc">nil</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">bufferListOut</span><span class="p">:</span> <span class="n">audioBufferListPointer</span><span class="p">.</span><span class="n">unsafeMutablePointer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">bufferListSize</span><span class="p">:</span> <span class="n">AudioBufferList</span><span class="p">.</span><span class="n">sizeInBytes</span><span class="p">(</span><span class="n">maximumBuffers</span><span class="p">:</span> <span class="n">maxBuffers</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">blockBufferAllocator</span><span class="p">:</span> <span class="kc">nil</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">blockBufferMemoryAllocator</span><span class="p">:</span> <span class="kc">nil</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">flags</span><span class="p">:</span> <span class="n">kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">blockBufferOut</span><span class="p">:</span> <span class="p">&amp;</span><span class="n">blockBuffer</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="n">status</span> <span class="p">==</span> <span class="n">noErr</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">numBuffers</span> <span class="p">=</span> <span class="n">audioBufferListPointer</span><span class="p">.</span><span class="bp">count</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">frameCount</span> <span class="p">=</span> <span class="n">CMSampleBufferGetNumSamples</span><span class="p">(</span><span class="n">sampleBuffer</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kd">var</span> <span class="nv">floatSamples</span><span class="p">:</span> <span class="p">[</span><span class="nb">Float</span><span class="p">]</span> <span class="p">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// Handle the non-interleaved format (one buffer per channel)</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">isNonInterleaved</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="kd">var</span> <span class="nv">channelData</span><span class="p">:</span> <span class="p">[[</span><span class="nb">Float</span><span class="p">]]</span> <span class="p">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">bufferIndex</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="n">numBuffers</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">buffer</span> <span class="p">=</span> <span class="n">audioBufferListPointer</span><span class="p">[</span><span class="n">bufferIndex</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">            <span class="k">guard</span> <span class="kd">let</span> <span class="nv">data</span> <span class="p">=</span> <span class="n">buffer</span><span class="p">.</span><span class="n">mData</span> <span class="k">else</span> <span class="p">{</span> <span class="k">continue</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">dataByteSize</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">buffer</span><span class="p">.</span><span class="n">mDataByteSize</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="kd">var</span> <span class="nv">channelSamples</span><span class="p">:</span> <span class="p">[</span><span class="nb">Float</span><span class="p">]</span> <span class="p">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">isFloat</span> <span class="o">&amp;&amp;</span> <span class="n">bitsPerChannel</span> <span class="p">==</span> <span class="mi">32</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="kd">let</span> <span class="nv">floatPtr</span> <span class="p">=</span> <span class="n">data</span><span class="p">.</span><span class="n">assumingMemoryBound</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="nb">Float</span><span class="p">.</span><span class="kc">self</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="kd">let</span> <span class="nv">count</span> <span class="p">=</span> <span class="n">dataByteSize</span> <span class="o">/</span> <span class="n">MemoryLayout</span><span class="p">&lt;</span><span class="nb">Float</span><span class="p">&gt;.</span><span class="n">size</span>
</span></span><span class="line"><span class="cl">                <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="bp">count</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kd">var</span> <span class="nv">value</span> <span class="p">=</span> <span class="n">floatPtr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                    <span class="k">if</span> <span class="n">isBigEndian</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                        <span class="n">value</span> <span class="p">=</span> <span class="nb">Float</span><span class="p">(</span><span class="n">bitPattern</span><span class="p">:</span> <span class="n">value</span><span class="p">.</span><span class="n">bitPattern</span><span class="p">.</span><span class="n">bigEndian</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="cl">                    <span class="n">channelSamples</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="n">bitsPerChannel</span> <span class="p">==</span> <span class="mi">16</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="kd">let</span> <span class="nv">int16Ptr</span> <span class="p">=</span> <span class="n">data</span><span class="p">.</span><span class="n">assumingMemoryBound</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="nb">Int16</span><span class="p">.</span><span class="kc">self</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="kd">let</span> <span class="nv">count</span> <span class="p">=</span> <span class="n">dataByteSize</span> <span class="o">/</span> <span class="n">MemoryLayout</span><span class="p">&lt;</span><span class="nb">Int16</span><span class="p">&gt;.</span><span class="n">size</span>
</span></span><span class="line"><span class="cl">                <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="bp">count</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kd">var</span> <span class="nv">value</span> <span class="p">=</span> <span class="n">int16Ptr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                    <span class="k">if</span> <span class="n">isBigEndian</span> <span class="p">{</span> <span class="n">value</span> <span class="p">=</span> <span class="n">value</span><span class="p">.</span><span class="n">bigEndian</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">                    <span class="n">channelSamples</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="nb">Float</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="o">/</span> <span class="mf">32768.0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="n">channelData</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">channelSamples</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1">// Downmix to mono by averaging the channels</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="kd">let</span> <span class="nv">firstChannel</span> <span class="p">=</span> <span class="n">channelData</span><span class="p">.</span><span class="bp">first</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">channelData</span><span class="p">.</span><span class="bp">count</span> <span class="p">==</span> <span class="mi">1</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="n">floatSamples</span> <span class="p">=</span> <span class="n">firstChannel</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="n">firstChannel</span><span class="p">.</span><span class="bp">count</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kd">var</span> <span class="nv">sum</span><span class="p">:</span> <span class="nb">Float</span> <span class="p">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">                    <span class="k">for</span> <span class="n">ch</span> <span class="k">in</span> <span class="n">channelData</span> <span class="k">where</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">ch</span><span class="p">.</span><span class="bp">count</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                        <span class="n">sum</span> <span class="o">+=</span> <span class="n">ch</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="cl">                    <span class="n">floatSamples</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">sum</span> <span class="o">/</span> <span class="nb">Float</span><span class="p">(</span><span class="n">channelData</span><span class="p">.</span><span class="bp">count</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="c1">// Interleaved format handling...</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// Resample to 16 kHz</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">inputSampleRate</span> <span class="o">!=</span> <span class="n">targetSampleRate</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="n">floatSamples</span> <span class="p">=</span> <span class="n">resample</span><span class="p">(</span><span class="n">floatSamples</span><span class="p">,</span> <span class="n">from</span><span class="p">:</span> <span class="n">inputSampleRate</span><span class="p">,</span> <span class="n">to</span><span class="p">:</span> <span class="n">targetSampleRate</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">sharedBuffer</span><span class="p">.</span><span class="n">writeAudioSamples</span><span class="p">(</span><span class="n">floatSamples</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">/// Resamples mono Float32 samples using AVAudioConverter.</span>
</span></span><span class="line"><span class="cl"><span class="kd">private</span> <span class="kd">func</span> <span class="nf">resample</span><span class="p">(</span><span class="kc">_</span> <span class="n">samples</span><span class="p">:</span> <span class="p">[</span><span class="nb">Float</span><span class="p">],</span> <span class="n">from</span> <span class="n">inputRate</span><span class="p">:</span> <span class="nb">Double</span><span class="p">,</span> <span class="n">to</span> <span class="n">outputRate</span><span class="p">:</span> <span class="nb">Double</span><span class="p">)</span> <span class="p">-&gt;</span> <span class="p">[</span><span class="nb">Float</span><span class="p">]</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="n">inputRate</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">outputRate</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">,</span> <span class="n">inputRate</span> <span class="o">!=</span> <span class="n">outputRate</span><span class="p">,</span> <span class="o">!</span><span class="n">samples</span><span class="p">.</span><span class="bp">isEmpty</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">samples</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="kd">let</span> <span class="nv">inputFormat</span> <span class="p">=</span> <span class="n">AVAudioFormat</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">commonFormat</span><span class="p">:</span> <span class="p">.</span><span class="n">pcmFormatFloat32</span><span class="p">,</span> <span class="n">sampleRate</span><span class="p">:</span> <span class="n">inputRate</span><span class="p">,</span> <span class="n">channels</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="n">interleaved</span><span class="p">:</span> <span class="kc">false</span>
</span></span><span class="line"><span class="cl">    <span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">outputFormat</span> <span class="p">=</span> <span class="n">AVAudioFormat</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">commonFormat</span><span class="p">:</span> <span class="p">.</span><span class="n">pcmFormatFloat32</span><span class="p">,</span> <span class="n">sampleRate</span><span class="p">:</span> <span class="n">outputRate</span><span class="p">,</span> <span class="n">channels</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="n">interleaved</span><span class="p">:</span> <span class="kc">false</span>
</span></span><span class="line"><span class="cl">    <span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">converter</span> <span class="p">=</span> <span class="n">AVAudioConverter</span><span class="p">(</span><span class="n">from</span><span class="p">:</span> <span class="n">inputFormat</span><span class="p">,</span> <span class="n">to</span><span class="p">:</span> <span class="n">outputFormat</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">samples</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">outputFrameCount</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">ceil</span><span class="p">(</span><span class="nb">Double</span><span class="p">(</span><span class="n">samples</span><span class="p">.</span><span class="bp">count</span><span class="p">)</span> <span class="o">*</span> <span class="n">outputRate</span> <span class="o">/</span> <span class="n">inputRate</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="kd">let</span> <span class="nv">inputBuffer</span> <span class="p">=</span> <span class="n">AVAudioPCMBuffer</span><span class="p">(</span><span class="n">pcmFormat</span><span class="p">:</span> <span class="n">inputFormat</span><span class="p">,</span> <span class="n">frameCapacity</span><span class="p">:</span> <span class="n">AVAudioFrameCount</span><span class="p">(</span><span class="n">samples</span><span class="p">.</span><span class="bp">count</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">          <span class="kd">let</span> <span class="nv">outputBuffer</span> <span class="p">=</span> <span class="n">AVAudioPCMBuffer</span><span class="p">(</span><span class="n">pcmFormat</span><span class="p">:</span> <span class="n">outputFormat</span><span class="p">,</span> <span class="n">frameCapacity</span><span class="p">:</span> <span class="n">AVAudioFrameCount</span><span class="p">(</span><span class="n">outputFrameCount</span><span class="p">))</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">samples</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">inputBuffer</span><span class="p">.</span><span class="n">frameLength</span> <span class="p">=</span> <span class="n">AVAudioFrameCount</span><span class="p">(</span><span class="n">samples</span><span class="p">.</span><span class="bp">count</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">inputData</span> <span class="p">=</span> <span class="n">inputBuffer</span><span class="p">.</span><span class="n">floatChannelData</span><span class="p">!</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="n">samples</span><span class="p">.</span><span class="bp">count</span> <span class="p">{</span> <span class="n">inputData</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="p">=</span> <span class="n">samples</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="kd">var</span> <span class="nv">error</span><span class="p">:</span> <span class="n">NSError</span><span class="p">?</span>
</span></span><span class="line"><span class="cl">    <span class="kd">var</span> <span class="nv">didSupplyInput</span> <span class="p">=</span> <span class="kc">false</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">status</span> <span class="p">=</span> <span class="n">converter</span><span class="p">.</span><span class="n">convert</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="n">outputBuffer</span><span class="p">,</span> <span class="n">error</span><span class="p">:</span> <span class="p">&amp;</span><span class="n">error</span><span class="p">)</span> <span class="p">{</span> <span class="kc">_</span><span class="p">,</span> <span class="n">outStatus</span> <span class="k">in</span>
</span></span><span class="line"><span class="cl">        <span class="c1">// Supply the chunk once; returning it on every call would feed the same samples again.</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">didSupplyInput</span> <span class="p">{</span> <span class="n">outStatus</span><span class="p">.</span><span class="n">pointee</span> <span class="p">=</span> <span class="p">.</span><span class="n">noDataNow</span><span class="p">;</span> <span class="k">return</span> <span class="kc">nil</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">didSupplyInput</span> <span class="p">=</span> <span class="kc">true</span>
</span></span><span class="line"><span class="cl">        <span class="n">outStatus</span><span class="p">.</span><span class="n">pointee</span> <span class="p">=</span> <span class="p">.</span><span class="n">haveData</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">inputBuffer</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="n">status</span> <span class="o">!=</span> <span class="p">.</span><span class="n">error</span><span class="p">,</span> <span class="n">error</span> <span class="p">==</span> <span class="kc">nil</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="n">samples</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">outputData</span> <span class="p">=</span> <span class="n">outputBuffer</span><span class="p">.</span><span class="n">floatChannelData</span><span class="p">!</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">outputLength</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">outputBuffer</span><span class="p">.</span><span class="n">frameLength</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">(</span><span class="mf">0.</span><span class="p">.&lt;</span><span class="n">outputLength</span><span class="p">).</span><span class="bp">map</span> <span class="p">{</span> <span class="n">outputData</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="nv">$0</span><span class="p">]</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
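<p><strong>Implementation notes:</strong></p>
<ol>
<li><strong>Memory safety</strong>: <code>UnsafeMutableAudioBufferListPointer</code> combined with <code>defer</code> ensures the allocated buffer list is released on every exit path.</li>
<li><strong>Multiple sample formats</strong>: Float32, Int16, and Int32 samples are supported (the excerpt above shows the Float32 and Int16 paths).</li>
<li><strong>Byte order</strong>: both big-endian and little-endian samples are handled.</li>
<li><strong>Non-interleaved audio</strong>: formats that store each channel in its own buffer are read correctly.</li>
<li><strong>Downmixing</strong>: multi-channel audio such as stereo is averaged down to mono automatically.</li>
<li><strong>High-quality resampling</strong>: the conversion to 16 kHz is delegated to the system's <code>AVAudioConverter</code>.</li>
</ol>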
<hr>
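<h2 id="二extension与主应用数据共享">2. Sharing Data Between the Extension and the Main App</h2>
<p>The Broadcast Extension and the main app run in separate processes, so passing audio between them requires inter-process communication. This project uses an App Group shared container for the exchange because it is comparatively simple to implement.</p>
<h3 id="app-group配置">App Group Configuration</h3>
<p><strong>Entitlements:</strong></p>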
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-xml" data-lang="xml"><span class="line"><span class="cl"><span class="nt">&lt;key&gt;</span>com.apple.security.application-groups<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;array&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;string&gt;</span>group.com.xxx.shared<span class="nt">&lt;/string&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;/array&gt;</span></span></span></code></pre></td></tr></table>
</div>
</div>
<p>Create the App Group in the Apple Developer Portal, then enable it in Xcode for both targets (the main app and the extension).</p>
<h3 id="数据读写实现">Reading and Writing the Data</h3>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span><span class="lnt">33
</span><span class="lnt">34
</span><span class="lnt">35
</span><span class="lnt">36
</span><span class="lnt">37
</span><span class="lnt">38
</span><span class="lnt">39
</span><span class="lnt">40
</span><span class="lnt">41
</span><span class="lnt">42
</span><span class="lnt">43
</span><span class="lnt">44
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">class</span> <span class="nc">AudioSharedBuffer</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kd">static</span> <span class="kd">let</span> <span class="nv">appGroupId</span> <span class="p">=</span> <span class="s">&#34;group.com.xxx.shared&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="kd">private</span> <span class="kd">let</span> <span class="nv">sharedContainerURL</span> <span class="p">=</span> <span class="n">FileManager</span><span class="p">.</span><span class="k">default</span><span class="p">.</span><span class="n">containerURL</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">forSecurityApplicationGroupIdentifier</span><span class="p">:</span> <span class="n">AudioSharedBuffer</span><span class="p">.</span><span class="n">appGroupId</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// The extension writes the processed audio</span>
</span></span><span class="line"><span class="cl">    <span class="kd">func</span> <span class="nf">writeAudioSamples</span><span class="p">(</span><span class="kc">_</span> <span class="n">samples</span><span class="p">:</span> <span class="p">[</span><span class="nb">Float</span><span class="p">])</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">guard</span> <span class="kd">let</span> <span class="nv">url</span> <span class="p">=</span> <span class="n">sharedContainerURL</span><span class="p">?.</span><span class="n">appendingPathComponent</span><span class="p">(</span><span class="s">&#34;audio.raw&#34;</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="kd">let</span> <span class="nv">data</span> <span class="p">=</span> <span class="n">samples</span><span class="p">.</span><span class="n">withUnsafeBufferPointer</span> <span class="p">{</span> <span class="n">Data</span><span class="p">(</span><span class="n">buffer</span><span class="p">:</span> <span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">FileManager</span><span class="p">.</span><span class="k">default</span><span class="p">.</span><span class="n">fileExists</span><span class="p">(</span><span class="n">atPath</span><span class="p">:</span> <span class="n">url</span><span class="p">.</span><span class="n">path</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">handle</span> <span class="p">=</span> <span class="k">try</span><span class="p">?</span> <span class="n">FileHandle</span><span class="p">(</span><span class="n">forWritingTo</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">handle</span><span class="p">?.</span><span class="n">seekToEndOfFile</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            <span class="n">handle</span><span class="p">?.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">handle</span><span class="p">?.</span><span class="n">closeFile</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">try</span><span class="p">?</span> <span class="n">data</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">postDarwinNotification</span><span class="p">(</span><span class="s">&#34;com.xxx.newAudioData&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// The main app reads the audio</span>
</span></span><span class="line"><span class="cl">    <span class="kd">func</span> <span class="nf">readAudioSamples</span><span class="p">()</span> <span class="p">-&gt;</span> <span class="p">[</span><span class="nb">Float</span><span class="p">]?</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">guard</span> <span class="kd">let</span> <span class="nv">url</span> <span class="p">=</span> <span class="n">sharedContainerURL</span><span class="p">?.</span><span class="n">appendingPathComponent</span><span class="p">(</span><span class="s">&#34;audio.raw&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">              <span class="n">FileManager</span><span class="p">.</span><span class="k">default</span><span class="p">.</span><span class="n">fileExists</span><span class="p">(</span><span class="n">atPath</span><span class="p">:</span> <span class="n">url</span><span class="p">.</span><span class="n">path</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="kc">nil</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">guard</span> <span class="kd">let</span> <span class="nv">data</span> <span class="p">=</span> <span class="k">try</span><span class="p">?</span> <span class="n">Data</span><span class="p">(</span><span class="n">contentsOf</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="kc">nil</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">try</span><span class="p">?</span> <span class="n">FileManager</span><span class="p">.</span><span class="k">default</span><span class="p">.</span><span class="n">removeItem</span><span class="p">(</span><span class="n">at</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="kd">let</span> <span class="nv">floatCount</span> <span class="p">=</span> <span class="n">data</span><span class="p">.</span><span class="bp">count</span> <span class="o">/</span> <span class="n">MemoryLayout</span><span class="p">&lt;</span><span class="nb">Float</span><span class="p">&gt;.</span><span class="n">size</span>
</span></span><span class="line"><span class="cl">        <span class="kd">var</span> <span class="nv">samples</span> <span class="p">=</span> <span class="p">[</span><span class="nb">Float</span><span class="p">](</span><span class="n">repeating</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="bp">count</span><span class="p">:</span> <span class="n">floatCount</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="kc">_</span> <span class="p">=</span> <span class="n">samples</span><span class="p">.</span><span class="n">withUnsafeMutableBufferPointer</span> <span class="p">{</span> <span class="n">data</span><span class="p">.</span><span class="n">copyBytes</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">samples</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">postDarwinNotification</span><span class="p">(</span><span class="kc">_</span> <span class="n">name</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="kd">let</span> <span class="nv">center</span> <span class="p">=</span> <span class="n">CFNotificationCenterGetDarwinNotifyCenter</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="n">CFNotificationCenterPostNotification</span><span class="p">(</span><span class="n">center</span><span class="p">,</span> <span class="n">CFNotificationName</span><span class="p">(</span><span class="n">name</span> <span class="k">as</span> <span class="n">CFString</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="kc">nil</span><span class="p">,</span> <span class="kc">nil</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
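<h3 id="darwin通知">Darwin Notifications</h3>
<p>After writing data, the extension posts a Darwin notification; the main app observes that notification and reads the data as soon as it arrives:</p>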
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">startListening</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">CFNotificationCenterAddObserver</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">CFNotificationCenterGetDarwinNotifyCenter</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">        <span class="nb">Unmanaged</span><span class="p">.</span><span class="n">passUnretained</span><span class="p">(</span><span class="kc">self</span><span class="p">).</span><span class="n">toOpaque</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">        <span class="p">{</span> <span class="kc">_</span><span class="p">,</span> <span class="n">observer</span><span class="p">,</span> <span class="kc">_</span><span class="p">,</span> <span class="kc">_</span><span class="p">,</span> <span class="kc">_</span> <span class="k">in</span>
</span></span><span class="line"><span class="cl">            <span class="k">guard</span> <span class="kd">let</span> <span class="nv">observer</span> <span class="p">=</span> <span class="n">observer</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">selfPtr</span> <span class="p">=</span> <span class="nb">Unmanaged</span><span class="p">&lt;</span><span class="n">YourClass</span><span class="p">&gt;.</span><span class="n">fromOpaque</span><span class="p">(</span><span class="n">observer</span><span class="p">).</span><span class="n">takeUnretainedValue</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="kd">let</span> <span class="nv">samples</span> <span class="p">=</span> <span class="n">selfPtr</span><span class="p">.</span><span class="n">audioBuffer</span><span class="p">.</span><span class="n">readAudioSamples</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="n">selfPtr</span><span class="p">.</span><span class="n">onAudioReceived</span><span class="p">?(</span><span class="n">samples</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="cl">        <span class="s">&#34;com.xxx.newAudioData&#34;</span> <span class="k">as</span> <span class="n">CFString</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="kc">nil</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">.</span><span class="n">deliverImmediately</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
<hr>
<h2 id="samplehandler完整示例">Complete SampleHandler Example</h2>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">class</span> <span class="nc">SampleHandler</span><span class="p">:</span> <span class="n">RPBroadcastSampleHandler</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kd">private</span> <span class="kd">let</span> <span class="nv">sharedBuffer</span> <span class="p">=</span> <span class="n">AudioSharedBuffer</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="kd">private</span> <span class="kd">let</span> <span class="nv">targetSampleRate</span><span class="p">:</span> <span class="nb">Double</span> <span class="p">=</span> <span class="mf">16000.0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">broadcastStarted</span><span class="p">(</span><span class="n">withSetupInfo</span> <span class="n">setupInfo</span><span class="p">:</span> <span class="p">[</span><span class="nb">String</span> <span class="p">:</span> <span class="n">NSObject</span><span class="p">]?)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="n">sharedBuffer</span><span class="p">.</span><span class="n">clearAudioData</span><span class="p">()</span> <span class="c1">// helper not shown above; assumed to delete any stale audio.raw</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">broadcastPaused</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">broadcastResumed</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">broadcastFinished</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">processSampleBuffer</span><span class="p">(</span><span class="kc">_</span> <span class="n">sampleBuffer</span><span class="p">:</span> <span class="n">CMSampleBuffer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                     <span class="n">with</span> <span class="n">sampleBufferType</span><span class="p">:</span> <span class="n">RPSampleBufferType</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">switch</span> <span class="n">sampleBufferType</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">case</span> <span class="p">.</span><span class="n">audioApp</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="c1">// Process app audio (the audio the system is playing)</span>
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">samples</span> <span class="p">=</span> <span class="n">convertTo16kMono</span><span class="p">(</span><span class="n">sampleBuffer</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">sharedBuffer</span><span class="p">.</span><span class="n">writeAudioSamples</span><span class="p">(</span><span class="n">samples</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">case</span> <span class="p">.</span><span class="n">audioMic</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="c1">// Ignore microphone audio</span>
</span></span><span class="line"><span class="cl">            <span class="k">break</span>
</span></span><span class="line"><span class="cl">        <span class="k">case</span> <span class="p">.</span><span class="n">video</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="c1">// Ignore video</span>
</span></span><span class="line"><span class="cl">            <span class="k">break</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Speech Denoising Model Inference with ORT</title>
      <link>https://lyapple2008.github.io/posts/202511/2025-11-03-%E4%BD%BF%E7%94%A8ort%E8%BF%9B%E8%A1%8C%E8%AF%AD%E9%9F%B3%E9%99%8D%E5%99%AA%E6%8E%A8%E7%90%86/</link>
      <pubDate>Mon, 03 Nov 2025 19:39:53 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202511/2025-11-03-%E4%BD%BF%E7%94%A8ort%E8%BF%9B%E8%A1%8C%E8%AF%AD%E9%9F%B3%E9%99%8D%E5%99%AA%E6%8E%A8%E7%90%86/</guid>
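      <description>&lt;p&gt;When deploying a deep-learning speech denoising model, the choice of inference engine matters a great deal. ONNX Runtime (ORT), Microsoft's open-source cross-platform inference engine, does well on performance, compatibility, and ease of use, and has become the first choice in many production environments. This article explains why to choose ORT, introduces its core concepts and inference workflow, and covers the key points to watch when using ORT for speech denoising inference, in particular hidden-state management for recurrent models such as GRU and LSTM.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>When deploying a deep-learning speech denoising model, the choice of inference engine matters a great deal. ONNX Runtime (ORT), Microsoft's open-source cross-platform inference engine, does well on performance, compatibility, and ease of use, and has become the first choice in many production environments. This article explains why to choose ORT, introduces its core concepts and inference workflow, and covers the key points to watch when using ORT for speech denoising inference, in particular hidden-state management for recurrent models such as GRU and LSTM.</p>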
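<h2 id="一为什么选择ort">1. Why ORT?</h2>
<h3 id="11-跨平台支持">1.1 Cross-Platform Support</h3>
<p>ORT supports a wide range of platforms, including:</p>
<ul>
<li><strong>CPU inference</strong>: x86, ARM, and other architectures, running on Windows, Linux, macOS, Android, iOS, and more</li>
<li><strong>GPU acceleration</strong>: CUDA (NVIDIA GPUs), DirectML (Windows), TensorRT, and others</li>
<li><strong>Dedicated hardware</strong>: CoreML (Apple devices), OpenVINO (Intel), QNN (Qualcomm), and others</li>
</ul>
<p>Because of this, the same code can run on different devices, which greatly reduces deployment cost.</p>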
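<h3 id="12-性能优化">1.2 Performance Optimization</h3>
<p>ORT includes extensive performance optimizations:</p>
<ul>
<li><strong>Graph optimization</strong>: automatically applies operator fusion, constant folding, dead-code elimination, and similar transformations</li>
<li><strong>Execution Providers</strong>: hardware-specific optimized implementations</li>
<li><strong>Dynamic shape support</strong>: dynamic batch size and sequence length, which suits real-time inference</li>
</ul>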
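<h3 id="13-模型格式标准化">1.3 A Standardized Model Format</h3>
<p>ORT is built on ONNX (Open Neural Network Exchange), an industry-standard model interchange format:</p>
<ul>
<li><strong>Framework-agnostic</strong>: ONNX models can be exported from PyTorch, TensorFlow, Keras, and other frameworks</li>
<li><strong>Version compatibility</strong>: the ONNX specification keeps evolving, and ORT remains backward compatible</li>
<li><strong>Tooling ecosystem</strong>: a rich set of model conversion and optimization tools</li>
</ul>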
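<h3 id="14-易于集成">1.4 Easy Integration</h3>
<p>ORT provides bindings for several languages:</p>
<ul>
<li><strong>C++ API</strong>: suited to high-performance scenarios and embedded devices</li>
<li><strong>Python API</strong>: convenient for rapid prototyping and debugging</li>
<li><strong>C#, Java, JavaScript</strong>: cover a variety of application scenarios</li>
</ul>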
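<h3 id="15-活跃的社区支持">1.5 Active Community Support</h3>
<p>As a Microsoft open-source project, ORT has an active community and is updated continuously, so bug fixes and new features arrive quickly.</p>
<h2 id="二ort基本概念与推理流程">2. ORT Core Concepts and Inference Workflow</h2>
<h3 id="21-核心概念">2.1 Core Concepts</h3>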
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">   ┌───────────────────────────────────────────┐
</span></span><span class="line"><span class="cl">   │ OrtEnv (runtime environment)              │
</span></span><span class="line"><span class="cl">   │ └─ manages global resources, thread pools │
</span></span><span class="line"><span class="cl">   └─────────────────────┬─────────────────────┘
</span></span><span class="line"><span class="cl">                         │
</span></span><span class="line"><span class="cl">   ┌─────────────────────┴─────────────────────┐
</span></span><span class="line"><span class="cl">   │ OrtSession (inference session)            │
</span></span><span class="line"><span class="cl">   │ └─ holds the loaded ONNX model            │
</span></span><span class="line"><span class="cl">   └─────────────────────┬─────────────────────┘
</span></span><span class="line"><span class="cl">                         │
</span></span><span class="line"><span class="cl">   ┌─────────────────────┴─────────────────────┐
</span></span><span class="line"><span class="cl">   │ OrtRun (a single inference call)          │
</span></span><span class="line"><span class="cl">   │ ├─ input OrtValue (Tensor, etc.)          │
</span></span><span class="line"><span class="cl">   │ ├─ output OrtValue                        │
</span></span><span class="line"><span class="cl">   │ └─ runs on the Env/Session thread pool    │
</span></span><span class="line"><span class="cl">   └───────────────────────────────────────────┘</span></span></code></pre></td></tr></table>
</div>
</div>
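<h4 id="ortenv运行时环境">OrtEnv (Runtime Environment)</h4>
<p><code>OrtEnv</code> is ORT's global runtime environment; it manages global resources such as thread pools and logging. The snippets in this section write the calls in the short form <code>OrtCreateEnv(...)</code>; in ORT 1.x the same functions are obtained from the <code>OrtApi</code> table returned by <code>OrtGetApiBase()-&gt;GetApi(ORT_API_VERSION)</code> (for example <code>api-&gt;CreateEnv(...)</code>). A process normally needs only one <code>OrtEnv</code> instance:</p>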
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;onnxruntime_c_api.h&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// Create the runtime environment
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">OrtEnv</span><span class="o">*</span> <span class="n">env</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="n">OrtStatus</span><span class="o">*</span> <span class="n">status</span> <span class="o">=</span> <span class="nf">OrtCreateEnv</span><span class="p">(</span><span class="n">ORT_LOGGING_LEVEL_WARNING</span><span class="p">,</span> <span class="s">&#34;ORT&#34;</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">env</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// Error handling
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">msg</span> <span class="o">=</span> <span class="nf">OrtGetErrorMessage</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">OrtReleaseStatus</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
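<p>The status convention used above (a <code>NULL</code> status means success; a non-<code>NULL</code> status carries an error message and must be released by the caller) is easy to wrap in a small helper. The following is a minimal, self-contained sketch of that pattern only: <code>Status</code>, <code>make_error</code>, <code>load_model</code>, and <code>check_status</code> are hypothetical stand-ins invented for illustration and are not part of the ORT API.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for OrtStatus: a NULL pointer means success. */
typedef struct Status { char msg[64]; } Status;

/* Allocates an error status that the caller must release. */
static Status* make_error(const char* msg) {
    Status* s = malloc(sizeof *s);
    if (s == NULL) abort();
    strncpy(s->msg, msg, sizeof s->msg - 1);
    s->msg[sizeof s->msg - 1] = '\0';
    return s;
}

/* Hypothetical API call: fails when the model path is empty. */
static Status* load_model(const char* path) {
    return (path[0] == '\0') ? make_error("empty model path") : NULL;
}

/* Returns 0 on success and -1 on failure.
   On failure it logs the message and releases the status object. */
static int check_status(Status* status, const char* what) {
    assert(what != NULL);
    if (status == NULL) return 0;
    fprintf(stderr, "%s failed: %s\n", what, status->msg);
    free(status);
    return -1;
}
```

<p>With a helper like this, each call site reduces to <code>if (check_status(load_model(path), "load_model") != 0) return -1;</code>, and no status object is leaked on the error path.</p>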
<h4 id="ortsession推理会话">OrtSession (Inference Session)</h4>
<p><code>OrtSession</code> loads an ONNX model and runs inference on it. Creating a session requires creating the session options first:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;onnxruntime_c_api.h&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// 1. Create the session options
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">OrtSessionOptions</span><span class="o">*</span> <span class="n">session_options</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtCreateSessionOptions</span><span class="p">(</span><span class="o">&amp;</span><span class="n">session_options</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// 2. Create the inference session
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">OrtSession</span><span class="o">*</span> <span class="n">session</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">model_path</span> <span class="o">=</span> <span class="s">&#34;denoise_model.onnx&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="n">status</span> <span class="o">=</span> <span class="nf">OrtCreateSession</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">model_path</span><span class="p">,</span> <span class="n">session_options</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">session</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// Handle the error
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">msg</span> <span class="o">=</span> <span class="nf">OrtGetErrorMessage</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">OrtReleaseStatus</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// 3. Release resources (once inference is finished)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtReleaseSessionOptions</span><span class="p">(</span><span class="n">session_options</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtReleaseSession</span><span class="p">(</span><span class="n">session</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtReleaseEnv</span><span class="p">(</span><span class="n">env</span><span class="p">);</span></span></span></code></pre></td></tr></table>
</div>
</div>
<h4 id="execution-provider-ep">Execution Provider (EP)</h4>
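<p>The execution provider (EP) determines which hardware the model runs on. In the C API, EPs are added to the session options through the <code>OrtSessionOptionsAppendExecutionProvider</code> family of functions:</p>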
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="c1">// CPU execution (the default; no explicit registration needed)
</span></span></span><span class="line"><span class="cl"><span class="c1">// Creating the session directly is enough to run on the CPU
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// CUDA execution (requires an NVIDIA GPU)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtSessionOptionsAppendExecutionProvider_CUDA</span><span class="p">(</span><span class="n">session_options</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// TensorRT execution (requires an NVIDIA GPU and TensorRT)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">OrtTensorRTProviderOptions</span> <span class="n">trt_options</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtSessionOptionsAppendExecutionProvider_TensorRT</span><span class="p">(</span><span class="n">session_options</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">trt_options</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// CoreML execution (macOS/iOS)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtSessionOptionsAppendExecutionProvider_CoreML</span><span class="p">(</span><span class="n">session_options</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
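</span></span><span class="line"><span class="cl"><span class="c1">// Create the session (EPs are prioritized in the order they were added; nodes an EP cannot handle fall back to the next EP and finally to the CPU)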
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtCreateSession</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">model_path</span><span class="p">,</span> <span class="n">session_options</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">session</span><span class="p">);</span></span></span></code></pre></td></tr></table>
</div>
</div>
<h4 id="inputoutput">Input/Output</h4>
<p>Model inputs and outputs are passed as <code>OrtValue</code> objects, which the caller must create and release manually:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="c1">// 1. Query input/output metadata
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="kt">size_t</span> <span class="n">num_input_nodes</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="n">OrtStatus</span><span class="o">*</span> <span class="n">status</span> <span class="o">=</span> <span class="nf">OrtSessionGetInputCount</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">num_input_nodes</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">input_name</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="n">OrtTypeInfo</span><span class="o">*</span> <span class="n">input_type_info</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtSessionGetInputName</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">input_name</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtSessionGetInputTypeInfo</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">input_type_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// 2. Prepare the input data
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="kt">float</span> <span class="n">input_data</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* audio_features data */</span> <span class="p">};</span>
</span></span><span class="line"><span class="cl"><span class="kt">int64_t</span> <span class="n">input_shape</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">480</span><span class="p">};</span>  <span class="c1">// batch_size, feature_dim
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="kt">size_t</span> <span class="n">input_tensor_size</span> <span class="o">=</span> <span class="mi">480</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">OrtValue</span><span class="o">*</span> <span class="n">input_tensor</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="n">OrtMemoryInfo</span><span class="o">*</span> <span class="n">memory_info</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtCreateCpuMemoryInfo</span><span class="p">(</span><span class="n">OrtArenaAllocator</span><span class="p">,</span> <span class="n">OrtMemTypeDefault</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">memory_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtCreateTensorWithDataAsOrtValue</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">memory_info</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">input_data</span><span class="p">,</span> <span class="n">input_tensor_size</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="n">input_shape</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="o">&amp;</span><span class="n">input_tensor</span>
</span></span><span class="line"><span class="cl"><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// 3. Run inference
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">input_names</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="n">input_name</span><span class="p">};</span>
</span></span><span class="line"><span class="cl"><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">output_names</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="s">&#34;output&#34;</span><span class="p">};</span>  <span class="c1">// must match the model's actual output name
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="n">OrtValue</span><span class="o">*</span> <span class="n">output_tensor</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">status</span> <span class="o">=</span> <span class="nf">OrtRun</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">input_names</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">input_tensor</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">output_names</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">output_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// 4. Read the output data
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="kt">float</span><span class="o">*</span> <span class="n">output_data</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtGetTensorMutableData</span><span class="p">(</span><span class="n">output_tensor</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&amp;</span><span class="n">output_data</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="c1">// Use output_data...
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// 5. Release resources
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtReleaseValue</span><span class="p">(</span><span class="n">output_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtReleaseValue</span><span class="p">(</span><span class="n">input_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtReleaseMemoryInfo</span><span class="p">(</span><span class="n">memory_info</span><span class="p">);</span></span></span></code></pre></td></tr></table>
</div>
</div>
<h3 id="22-性能优化选项">2.2 Performance Tuning Options</h3>
<p>ORT offers several performance tuning options, which the C API exposes through <code>OrtSessionOptions</code>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="n">OrtSessionOptions</span><span class="o">*</span> <span class="n">session_options</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="nf">OrtCreateSessionOptions</span><span class="p">(</span><span class="o">&amp;</span><span class="n">session_options</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Graph optimization level
</span></span></span><span class="line"><span class="cl"><span class="c1">// ORT_DISABLE_ALL, ORT_ENABLE_BASIC, ORT_ENABLE_EXTENDED, ORT_ENABLE_ALL
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtSetSessionGraphOptimizationLevel</span><span class="p">(</span><span class="n">session_options</span><span class="p">,</span> <span class="n">ORT_ENABLE_ALL</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Thread counts
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtSetIntraOpNumThreads</span><span class="p">(</span><span class="n">session_options</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>  <span class="c1">// threads used to parallelize work inside a single operator
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtSetInterOpNumThreads</span><span class="p">(</span><span class="n">session_options</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span> <span class="c1">// threads used to run independent operators in parallel (only takes effect with ORT_PARALLEL)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// Memory settings
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtEnableMemPattern</span><span class="p">(</span><span class="n">session_options</span><span class="p">);</span>  <span class="c1">// enable memory-pattern optimization
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtEnableCpuMemArena</span><span class="p">(</span><span class="n">session_options</span><span class="p">);</span> <span class="c1">// enable the CPU memory arena
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// Execution mode
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtSetSessionExecutionMode</span><span class="p">(</span><span class="n">session_options</span><span class="p">,</span> <span class="n">ORT_SEQUENTIAL</span><span class="p">);</span>  <span class="c1">// run operators sequentially
</span></span></span><span class="line"><span class="cl"><span class="c1">// OrtSetSessionExecutionMode(session_options, ORT_PARALLEL);  // run independent operators in parallel
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// Optional: save the optimized model to disk (useful for inspecting or reusing the optimized graph)
</span></span></span><span class="line"><span class="cl"><span class="c1">// OrtSetOptimizedModelFilePath(session_options, &#34;optimized_model.onnx&#34;);
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="cl"><span class="c1">// The options take effect when the session is created
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtCreateSession</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">model_path</span><span class="p">,</span> <span class="n">session_options</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">session</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Release the options once they are no longer needed
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="nf">OrtReleaseSessionOptions</span><span class="p">(</span><span class="n">session_options</span><span class="p">);</span></span></span></code></pre></td></tr></table>
</div>
</div>
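<h2 id="三语音降噪推理的特殊注意事项">3. Special Considerations for Speech Denoising Inference</h2>
<p>Speech denoising models usually rely on recurrent sequence models such as GRU or LSTM. These networks carry a hidden state, so real-time inference requires careful state management.</p>
<h3 id="31-为什么ort不保存隐状态">3.1 Why Doesn't ORT Keep the Hidden State?</h3>
<p>ORT (ONNX Runtime) is <strong>stateless</strong> by design: every inference call is independent, and ORT keeps no state between calls. There are several reasons for this design:</p>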
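<h4 id="311-设计理念无状态推理">3.1.1 Design Philosophy: Stateless Inference</h4>
<p>ORT's core design principle is that each <code>OrtRun</code> call is fully independent and does not depend on the results of earlier calls. This brings the following advantages:</p>
<ol>
<li><strong>Thread safety</strong>: multiple threads can run inference on the same <code>OrtSession</code> concurrently without race conditions caused by shared state</li>
<li><strong>Reproducibility</strong>: the same inputs always produce the same outputs, regardless of earlier calls</li>
<li><strong>Flexibility</strong>: the application decides when to reset the state and when to reuse it, which suits a range of use cases</li>
</ol>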
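<h4 id="312-状态管理的责任归属">3.1.2 Who Owns State Management</h4>
<p>In ORT's design, <strong>state management is the application's responsibility</strong>, not the inference engine's. The benefits are:</p>
<ul>
<li><strong>Application-level control</strong>: the application decides, based on its own requirements, when to reset state and how to manage the state of multiple streams</li>
<li><strong>Memory management</strong>: the application controls exactly when state buffers are allocated and freed</li>
<li><strong>Multi-instance support</strong>: one model can process several independent audio streams at once, with each stream keeping its own state</li>
</ul>
<h4 id="313-与训练框架的差异">3.1.3 Differences from Training Frameworks</h4>
<p>In training frameworks such as PyTorch and TensorFlow, an RNN/LSTM layer usually handles the hidden state inside the layer call: it steps through the whole sequence itself and carries the state from one time step to the next:</p>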
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Behavior in PyTorch during training</span>
</span></span><span class="line"><span class="cl"><span class="n">lstm</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LSTM</span><span class="p">(</span><span class="n">input_size</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">output</span><span class="p">,</span> <span class="p">(</span><span class="n">hidden</span><span class="p">,</span> <span class="n">cell</span><span class="p">)</span> <span class="o">=</span> <span class="n">lstm</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="p">(</span><span class="n">hidden</span><span class="p">,</span> <span class="n">cell</span><span class="p">))</span>  <span class="c1"># the layer steps through the sequence and carries the state internally</span></span></span></code></pre></td></tr></table>
</div>
</div>
<p>For ONNX export and ORT inference, however, the hidden state is <strong>made explicit</strong> as model inputs and outputs:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="c1">// ONNX model structure
</span></span></span><span class="line"><span class="cl"><span class="c1">// Inputs:  [audio_features, hidden_state, cell_state]  // explicit inputs
</span></span></span><span class="line"><span class="cl"><span class="c1">// Outputs: [denoised_features, new_hidden_state, new_cell_state]  // explicit outputs
</span></span></span></code></pre></td></tr></table>
</div>
</div>
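<p>Making the state explicit means that:</p>
<ul>
<li>The state is visible and controllable from outside the model</li>
<li>Behavior stays consistent across frameworks and platforms</li>
<li>Debugging and optimization become easier</li>
</ul>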
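<p>The explicit-state calling convention can be sketched in Python. This is a toy illustration, not ORT code: <code>run_frame</code> is a hypothetical stand-in for one stateless inference call, and its one-unit leaky integrator (with the made-up constant <code>ALPHA</code>) replaces a real GRU. It only shows how carrying the returned state into the next call changes the result.</p>

```python
# Toy stand-in for one stateless inference call. The "model" is a one-unit
# leaky integrator, NOT a real GRU; it only illustrates the calling convention.
ALPHA = 0.9  # hypothetical smoothing factor


def run_frame(feature, hidden):
    """Stateless step: everything it needs arrives as an argument."""
    new_hidden = ALPHA * hidden + (1.0 - ALPHA) * feature
    output = new_hidden  # a real model would apply further layers here
    return output, new_hidden


def process_stream(frames, carry_state=True):
    hidden = 0.0  # initial state: zero at the start of a stream
    outputs = []
    for feature in frames:
        output, new_hidden = run_frame(feature, hidden)
        outputs.append(output)
        if carry_state:
            # Feed this call's output state into the next call.
            hidden = new_hidden
        # Otherwise hidden stays at 0.0, i.e. the state is reset every frame.
    return outputs


frames = [1.0, 1.0, 1.0, 1.0]
carried = process_stream(frames, carry_state=True)
reset = process_stream(frames, carry_state=False)
print("state carried:", carried)
print("state reset:  ", reset)
```

<p>With the state carried over, each output builds on the earlier frames; with the state reset, every frame is processed as if it were the first one in the stream.</p>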
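<h4 id="314-实际影响">3.1.4 Practical Implications</h4>
<p>For sequential workloads such as speech denoising, the fact that ORT keeps no hidden state means:</p>
<ol>
<li><strong>State must be passed manually</strong>: on every inference call, the state output by the previous call has to be fed in as an input to the next one</li>
<li><strong>State persistence is up to the application</strong>: if state needs to be saved (for example, to resume processing after an interruption), the application has to implement it</li>
<li><strong>Multi-stream processing needs independent state</strong>: when handling several audio streams, each stream needs its own set of state variables</li>
</ol>
<p>This design adds complexity at the application level, but it offers more flexibility and control, which suits complex production scenarios particularly well.</p>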
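<p>Per-stream state can be kept in a small lookup table keyed by stream. The sketch below is an illustration under stated assumptions: <code>StreamStates</code>, <code>run_frame</code>, and <code>ALPHA</code> are hypothetical names, and a one-unit leaky integrator stands in for a real stateless model call.</p>

```python
ALPHA = 0.9  # hypothetical smoothing factor of the toy model


def run_frame(feature, hidden):
    """Toy stateless model call: returns (output, new_hidden)."""
    new_hidden = ALPHA * hidden + (1.0 - ALPHA) * feature
    return new_hidden, new_hidden


class StreamStates:
    """Keeps one hidden state per audio stream, outside the model."""

    def __init__(self):
        self._hidden = {}  # stream_id -> latest hidden state

    def process(self, stream_id, feature):
        hidden = self._hidden.get(stream_id, 0.0)  # new streams start from zero
        output, new_hidden = run_frame(feature, hidden)
        self._hidden[stream_id] = new_hidden  # remembered for this stream only
        return output

    def reset(self, stream_id):
        """Forget a stream's state, e.g. when that stream ends."""
        self._hidden.pop(stream_id, None)


# Frames from two streams arrive interleaved.
states = StreamStates()
interleaved = [("a", 1.0), ("b", -1.0), ("a", 1.0), ("b", -1.0)]
results = {"a": [], "b": []}
for stream_id, feature in interleaved:
    results[stream_id].append(states.process(stream_id, feature))

# Processing stream "a" on its own yields the same outputs.
alone = StreamStates()
a_alone = [alone.process("a", 1.0) for _ in range(2)]
print(results["a"] == a_alone)
```

<p>Because each stream's state lives in its own slot, interleaving frames from other streams does not affect the result, and calling <code>reset</code> makes the next frame of that stream start from a zero state again.</p>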
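<h3 id="32-实战使用ort进行rnnoise降噪推理">3.2 Hands-On: RNNoise Denoising Inference with ORT</h3>
<p>RNNoise is a deep-learning-based real-time speech denoising model that uses three GRU layers (VAD GRU, Noise GRU, and Denoise GRU) for temporal modeling. When running it with ORT, the hidden states of these three GRU layers need particular care.</p>
<h4 id="321-转换成onnx模型时导出gru隐状态输入输出端口">3.2.1 Exposing GRU Hidden-State Input/Output Ports When Converting to ONNX</h4>
<p>The RNNoise Keras training model normally takes only the feature input, and the GRU hidden states are managed internally. When exporting an ONNX model for ORT inference, the hidden states have to be exposed as explicit model inputs and outputs so that the application can control how state is passed along.</p>
<p><strong>Key steps:</strong></p>
<ol>
<li><strong>Rebuild the model structure</strong>: create a new inference model in which each GRU layer takes an <code>initial_state</code> input and is built with <code>return_state=True</code> so that it also outputs its final state</li>
<li><strong>Copy the weights</strong>: copy the weights of every layer from the training model to the new model</li>
<li><strong>Define the inputs and outputs</strong>: the new model has 4 inputs (features + 3 GRU states) and 5 outputs (denoise_output + vad_output + 3 GRU states)</li>
</ol>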
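<p>On the inference side, a model with 4 inputs and 5 outputs implies a loop that feeds each returned GRU state back in on the next frame. The following Python sketch shows only that wiring. All tensor names are hypothetical; the state sizes (24, 48, 96), the 42-dimensional feature vector, and the 22 output gains are assumptions based on the original RNNoise architecture and should be checked against the exported model; <code>fake_run</code> is a stub with the same call shape as a session's <code>run(output_names, input_feed)</code>, not a real model.</p>

```python
# Hypothetical tensor names; sizes are assumptions based on the original RNNoise.
STATE_SIZES = {"vad_gru_state": 24, "noise_gru_state": 48, "denoise_gru_state": 96}
OUTPUT_NAMES = ["denoise_output", "vad_output"] + [name + "_out" for name in STATE_SIZES]


def fake_run(output_names, input_feed):
    """Stub model: returns dummy gains/VAD and adds 1.0 to every state value."""
    outputs = {
        "denoise_output": [1.0] * 22,  # assumed: one gain per frequency band
        "vad_output": [0.0],
    }
    for name in STATE_SIZES:
        outputs[name + "_out"] = [v + 1.0 for v in input_feed[name]]
    return [outputs[name] for name in output_names]


class DenoiseStream:
    """Holds the three GRU states for one audio stream."""

    def __init__(self, run):
        self._run = run
        # All states start at zero at the beginning of a stream.
        self.states = {name: [0.0] * size for name, size in STATE_SIZES.items()}

    def process_frame(self, features):
        feed = {"features": features, **self.states}  # 4 inputs
        outputs = self._run(OUTPUT_NAMES, feed)        # 5 outputs
        gains, vad = outputs[0], outputs[1]
        # Feed the 3 returned states back in on the next frame.
        for name, new_state in zip(STATE_SIZES, outputs[2:]):
            self.states[name] = new_state
        return gains, vad


stream = DenoiseStream(fake_run)
for _ in range(3):
    gains, vad = stream.process_frame([0.0] * 42)  # assumed 42 features per frame
print({name: state[0] for name, state in stream.states.items()})
```

<p>With a real exported model, <code>fake_run</code> would be replaced by the session's run call, and the input and output names would have to match those of the exported graph.</p>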
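<p>In the figure below, the left side visualizes the ONNX model exported without hidden states, where the GRU hidden state is reset on every call; the right side visualizes the ONNX model exported with hidden states, where each GRU node has a GRU state input port and a GRU state output port.</p>
<p><img alt="RNNoise ONNX models without and with exported hidden states" loading="lazy" src="/images/2025-11-03/rnnoise-onnx-%E5%AF%BC%E5%87%BA%E9%9A%90%E7%8A%B6%E6%80%81.jpg"></p>
<p>The complete conversion code follows:</p>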
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">keras.backend</span> <span class="k">as</span> <span class="nn">K</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">keras.constraints</span> <span class="kn">import</span> <span class="n">Constraint</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Input</span><span class="p">,</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">GRU</span><span class="p">,</span> <span class="n">concatenate</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">keras.models</span> <span class="kn">import</span> <span class="n">Model</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">my_crossentropy</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">K</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">K</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">y_true</span> <span class="o">-</span> <span class="mf">0.5</span><span class="p">)</span> <span class="o">*</span> <span class="n">K</span><span class="o">.</span><span class="n">binary_crossentropy</span><span class="p">(</span><span class="n">y_pred</span><span class="p">,</span> <span class="n">y_true</span><span class="p">),</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">mymask</span><span class="p">(</span><span class="n">y_true</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">K</span><span class="o">.</span><span class="n">minimum</span><span class="p">(</span><span class="n">y_true</span> <span class="o">+</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">msse</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">K</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">mymask</span><span class="p">(</span><span class="n">y_true</span><span class="p">)</span> <span class="o">*</span> <span class="n">K</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">K</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">y_pred</span><span class="p">)</span> <span class="o">-</span> <span class="n">K</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">y_true</span><span class="p">)),</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">mycost</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">K</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">mymask</span><span class="p">(</span><span class="n">y_true</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="o">*</span> <span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="mi">10</span> <span class="o">*</span> <span class="n">K</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">K</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">K</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">y_pred</span><span class="p">)</span> <span class="o">-</span> <span class="n">K</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">y_true</span><span class="p">)))</span>
</span></span><span class="line"><span class="cl">            <span class="o">+</span> <span class="n">K</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">K</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">y_pred</span><span class="p">)</span> <span class="o">-</span> <span class="n">K</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">y_true</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">            <span class="o">+</span> <span class="mf">0.01</span> <span class="o">*</span> <span class="n">K</span><span class="o">.</span><span class="n">binary_crossentropy</span><span class="p">(</span><span class="n">y_pred</span><span class="p">,</span> <span class="n">y_true</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">my_accuracy</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">K</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">K</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">y_true</span> <span class="o">-</span> <span class="mf">0.5</span><span class="p">)</span> <span class="o">*</span> <span class="n">K</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">K</span><span class="o">.</span><span class="n">equal</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span> <span class="n">K</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="n">y_pred</span><span class="p">)),</span> <span class="n">K</span><span class="o">.</span><span class="n">floatx</span><span class="p">()),</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">WeightClip</span><span class="p">(</span><span class="n">Constraint</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Accept **kwargs to be compatible with Keras deserialization that may pass &#39;name&#39; etc.</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>  <span class="c1"># kwargs may include &#39;name&#39;</span>
</span></span><span class="line"><span class="cl">        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">c</span> <span class="o">=</span> <span class="n">c</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">K</span><span class="o">.</span><span class="n">clip</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="o">-</span><span class="bp">self</span><span class="o">.</span><span class="n">c</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">c</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">get_config</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="p">{</span><span class="s1">&#39;name&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span> <span class="s1">&#39;c&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">c</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">CUSTOM_OBJECTS</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;my_crossentropy&#39;</span><span class="p">:</span> <span class="n">my_crossentropy</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;mymask&#39;</span><span class="p">:</span> <span class="n">mymask</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;msse&#39;</span><span class="p">:</span> <span class="n">msse</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;mycost&#39;</span><span class="p">:</span> <span class="n">mycost</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;my_accuracy&#39;</span><span class="p">:</span> <span class="n">my_accuracy</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;WeightClip&#39;</span><span class="p">:</span> <span class="n">WeightClip</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">rebuild_model_with_states</span><span class="p">(</span><span class="n">training_model</span><span class="p">:</span> <span class="n">Model</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Model</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="cl"><span class="s2">    Rebuild the model, adding GRU hidden-state input/output ports.
</span></span></span><span class="line"><span class="cl"><span class="s2">    If the model already has GRU state ports, return it unchanged.
</span></span></span><span class="line"><span class="cl"><span class="s2">    &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Check whether the model already has GRU state ports</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">training_model</span><span class="o">.</span><span class="n">inputs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">4</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">training_model</span><span class="o">.</span><span class="n">outputs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;  Model already has GRU state ports, skipping rebuild&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">training_model</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;  Rebuilding model with GRU state inputs/outputs...&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># New inference inputs (with GRU states)</span>
</span></span><span class="line"><span class="cl">    <span class="n">features_in</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="mi">42</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;features&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_state_in</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">24</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;vad_gru_state&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">noise_state_in</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">48</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;noise_gru_state&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_state_in</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">96</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;denoise_gru_state&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Copy each layer configuration from the training model and load its weights</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># 1) input_dense</span>
</span></span><span class="line"><span class="cl">    <span class="n">input_dense_src</span> <span class="o">=</span> <span class="n">training_model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">&#39;input_dense&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">input_dense</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">24</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">&#39;tanh&#39;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;input_dense_export&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="n">kernel_constraint</span><span class="o">=</span><span class="n">input_dense_src</span><span class="o">.</span><span class="n">kernel_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="n">bias_constraint</span><span class="o">=</span><span class="n">input_dense_src</span><span class="o">.</span><span class="n">bias_constraint</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">tmp_export</span> <span class="o">=</span> <span class="n">input_dense</span><span class="p">(</span><span class="n">features_in</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">input_dense</span><span class="o">.</span><span class="n">set_weights</span><span class="p">(</span><span class="n">input_dense_src</span><span class="o">.</span><span class="n">get_weights</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 2) vad_gru (return_sequences+return_state)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_gru_src</span> <span class="o">=</span> <span class="n">training_model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">&#39;vad_gru&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_gru_exp</span> <span class="o">=</span> <span class="n">GRU</span><span class="p">(</span><span class="mi">24</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">&#39;tanh&#39;</span><span class="p">,</span> <span class="n">recurrent_activation</span><span class="o">=</span><span class="s1">&#39;sigmoid&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                      <span class="n">return_sequences</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">return_state</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;vad_gru_export&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                      <span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">vad_gru_src</span><span class="o">.</span><span class="n">kernel_regularizer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                      <span class="n">recurrent_regularizer</span><span class="o">=</span><span class="n">vad_gru_src</span><span class="o">.</span><span class="n">recurrent_regularizer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                      <span class="n">kernel_constraint</span><span class="o">=</span><span class="n">vad_gru_src</span><span class="o">.</span><span class="n">kernel_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                      <span class="n">recurrent_constraint</span><span class="o">=</span><span class="n">vad_gru_src</span><span class="o">.</span><span class="n">recurrent_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                      <span class="n">bias_constraint</span><span class="o">=</span><span class="n">vad_gru_src</span><span class="o">.</span><span class="n">bias_constraint</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_seq</span><span class="p">,</span> <span class="n">vad_state_out</span> <span class="o">=</span> <span class="n">vad_gru_exp</span><span class="p">(</span><span class="n">tmp_export</span><span class="p">,</span> <span class="n">initial_state</span><span class="o">=</span><span class="n">vad_state_in</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_gru_exp</span><span class="o">.</span><span class="n">set_weights</span><span class="p">(</span><span class="n">vad_gru_src</span><span class="o">.</span><span class="n">get_weights</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 3) vad_output</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_output_src</span> <span class="o">=</span> <span class="n">training_model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">&#39;vad_output&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_output_exp_layer</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">&#39;sigmoid&#39;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;vad_output_export&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                 <span class="n">kernel_constraint</span><span class="o">=</span><span class="n">vad_output_src</span><span class="o">.</span><span class="n">kernel_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                 <span class="n">bias_constraint</span><span class="o">=</span><span class="n">vad_output_src</span><span class="o">.</span><span class="n">bias_constraint</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_output_exp</span> <span class="o">=</span> <span class="n">vad_output_exp_layer</span><span class="p">(</span><span class="n">vad_seq</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vad_output_exp_layer</span><span class="o">.</span><span class="n">set_weights</span><span class="p">(</span><span class="n">vad_output_src</span><span class="o">.</span><span class="n">get_weights</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 4) noise_gru input: concat([tmp_export, vad_seq, features_in])</span>
</span></span><span class="line"><span class="cl">    <span class="n">noise_in</span> <span class="o">=</span> <span class="n">concatenate</span><span class="p">([</span><span class="n">tmp_export</span><span class="p">,</span> <span class="n">vad_seq</span><span class="p">,</span> <span class="n">features_in</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;noise_concat_export&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">noise_gru_src</span> <span class="o">=</span> <span class="n">training_model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">&#39;noise_gru&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">noise_gru_exp</span> <span class="o">=</span> <span class="n">GRU</span><span class="p">(</span><span class="mi">48</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">&#39;relu&#39;</span><span class="p">,</span> <span class="n">recurrent_activation</span><span class="o">=</span><span class="s1">&#39;sigmoid&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="n">return_sequences</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">return_state</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;noise_gru_export&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">noise_gru_src</span><span class="o">.</span><span class="n">kernel_regularizer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="n">recurrent_regularizer</span><span class="o">=</span><span class="n">noise_gru_src</span><span class="o">.</span><span class="n">recurrent_regularizer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="n">kernel_constraint</span><span class="o">=</span><span class="n">noise_gru_src</span><span class="o">.</span><span class="n">kernel_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="n">recurrent_constraint</span><span class="o">=</span><span class="n">noise_gru_src</span><span class="o">.</span><span class="n">recurrent_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="n">bias_constraint</span><span class="o">=</span><span class="n">noise_gru_src</span><span class="o">.</span><span class="n">bias_constraint</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">noise_seq</span><span class="p">,</span> <span class="n">noise_state_out</span> <span class="o">=</span> <span class="n">noise_gru_exp</span><span class="p">(</span><span class="n">noise_in</span><span class="p">,</span> <span class="n">initial_state</span><span class="o">=</span><span class="n">noise_state_in</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">noise_gru_exp</span><span class="o">.</span><span class="n">set_weights</span><span class="p">(</span><span class="n">noise_gru_src</span><span class="o">.</span><span class="n">get_weights</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 5) denoise_gru input: concat([vad_seq, noise_seq, features_in])</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_in</span> <span class="o">=</span> <span class="n">concatenate</span><span class="p">([</span><span class="n">vad_seq</span><span class="p">,</span> <span class="n">noise_seq</span><span class="p">,</span> <span class="n">features_in</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;denoise_concat_export&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_gru_src</span> <span class="o">=</span> <span class="n">training_model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">&#39;denoise_gru&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_gru_exp</span> <span class="o">=</span> <span class="n">GRU</span><span class="p">(</span><span class="mi">96</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">&#39;tanh&#39;</span><span class="p">,</span> <span class="n">recurrent_activation</span><span class="o">=</span><span class="s1">&#39;sigmoid&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                          <span class="n">return_sequences</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">return_state</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;denoise_gru_export&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                          <span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">denoise_gru_src</span><span class="o">.</span><span class="n">kernel_regularizer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                          <span class="n">recurrent_regularizer</span><span class="o">=</span><span class="n">denoise_gru_src</span><span class="o">.</span><span class="n">recurrent_regularizer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                          <span class="n">kernel_constraint</span><span class="o">=</span><span class="n">denoise_gru_src</span><span class="o">.</span><span class="n">kernel_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                          <span class="n">recurrent_constraint</span><span class="o">=</span><span class="n">denoise_gru_src</span><span class="o">.</span><span class="n">recurrent_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                          <span class="n">bias_constraint</span><span class="o">=</span><span class="n">denoise_gru_src</span><span class="o">.</span><span class="n">bias_constraint</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_seq</span><span class="p">,</span> <span class="n">denoise_state_out</span> <span class="o">=</span> <span class="n">denoise_gru_exp</span><span class="p">(</span><span class="n">denoise_in</span><span class="p">,</span> <span class="n">initial_state</span><span class="o">=</span><span class="n">denoise_state_in</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_gru_exp</span><span class="o">.</span><span class="n">set_weights</span><span class="p">(</span><span class="n">denoise_gru_src</span><span class="o">.</span><span class="n">get_weights</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 6) denoise_output</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_output_src</span> <span class="o">=</span> <span class="n">training_model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">&#39;denoise_output&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_output_exp_layer</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">22</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">&#39;sigmoid&#39;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;denoise_output_export&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                     <span class="n">kernel_constraint</span><span class="o">=</span><span class="n">denoise_output_src</span><span class="o">.</span><span class="n">kernel_constraint</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                     <span class="n">bias_constraint</span><span class="o">=</span><span class="n">denoise_output_src</span><span class="o">.</span><span class="n">bias_constraint</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_output_exp</span> <span class="o">=</span> <span class="n">denoise_output_exp_layer</span><span class="p">(</span><span class="n">denoise_seq</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">denoise_output_exp_layer</span><span class="o">.</span><span class="n">set_weights</span><span class="p">(</span><span class="n">denoise_output_src</span><span class="o">.</span><span class="n">get_weights</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">export_model</span> <span class="o">=</span> <span class="n">Model</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">features_in</span><span class="p">,</span> <span class="n">vad_state_in</span><span class="p">,</span> <span class="n">noise_state_in</span><span class="p">,</span> <span class="n">denoise_state_in</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">        <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="n">denoise_output_exp</span><span class="p">,</span> <span class="n">vad_output_exp</span><span class="p">,</span> <span class="n">vad_state_out</span><span class="p">,</span> <span class="n">noise_state_out</span><span class="p">,</span> <span class="n">denoise_state_out</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">        <span class="n">name</span><span class="o">=</span><span class="s1">&#39;rnnoise_export_with_states&#39;</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;  ✓ Model rebuilt successfully with GRU state ports&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">export_model</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">convert</span><span class="p">(</span><span class="n">hdf5_path</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">onnx_path</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">opset</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">13</span><span class="p">,</span> <span class="n">auto_rebuild</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">hdf5_path</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">raise</span> <span class="ne">FileNotFoundError</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;HDF5 model not found: </span><span class="si">{</span><span class="n">hdf5_path</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Loading Keras model from: </span><span class="si">{</span><span class="n">hdf5_path</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Load with custom objects registered for deserialization</span>
</span></span><span class="line"><span class="cl">    <span class="n">model</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">load_model</span><span class="p">(</span><span class="n">hdf5_path</span><span class="p">,</span> <span class="n">custom_objects</span><span class="o">=</span><span class="n">CUSTOM_OBJECTS</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># Auto-rebuild model with GRU states if needed</span>
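</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">auto_rebuild</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>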
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;</span><span class="se">\n</span><span class="s2">=== Auto-Rebuild Mode ===&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;  Checking if model needs GRU state ports...&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">model</span> <span class="o">=</span> <span class="n">rebuild_model_with_states</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;  Model ready for conversion with GRU state ports</span><span class="se">\n</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Check if the model has GRU state inputs/outputs</span>
</span></span><span class="line"><span class="cl">    <span class="n">num_inputs</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">num_outputs</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">outputs</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Model has </span><span class="si">{</span><span class="n">num_inputs</span><span class="si">}</span><span class="s2"> input(s) and </span><span class="si">{</span><span class="n">num_outputs</span><span class="si">}</span><span class="s2"> output(s)&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># Print input information</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">inp</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;  Input </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">inp</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">, shape: </span><span class="si">{</span><span class="n">inp</span><span class="o">.</span><span class="n">shape</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># Print output information</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">out</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">outputs</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;  Output </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">out</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">, shape: </span><span class="si">{</span><span class="n">out</span><span class="o">.</span><span class="n">shape</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># Check if this is a model with GRU states (4 inputs and 5 outputs)</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">num_inputs</span> <span class="o">==</span> <span class="mi">4</span> <span class="ow">and</span> <span class="n">num_outputs</span> <span class="o">==</span> <span class="mi">5</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Detected model with GRU state inputs/outputs&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># Build input signature for model with efficient state management</span>
</span></span><span class="line"><span class="cl">        <span class="n">input_specs</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">inp</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">inp_name</span> <span class="o">=</span> <span class="n">inp</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">            <span class="n">inp_shape</span> <span class="o">=</span> <span class="n">inp</span><span class="o">.</span><span class="n">shape</span><span class="o">.</span><span class="n">as_list</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            
</span></span><span class="line"><span class="cl">            <span class="c1"># Handle different input shapes</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">inp_shape</span><span class="p">)</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>  <span class="c1"># features: (None, None, 42)</span>
</span></span><span class="line"><span class="cl">                <span class="n">spec</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">TensorSpec</span><span class="p">([</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="n">inp_shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]],</span> <span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="n">inp_name</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">inp_shape</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>  <span class="c1"># GRU states: (None, hidden_size)</span>
</span></span><span class="line"><span class="cl">                <span class="n">spec</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">TensorSpec</span><span class="p">([</span><span class="kc">None</span><span class="p">,</span> <span class="n">inp_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]],</span> <span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="n">inp_name</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="c1"># Fallback: use dynamic shape</span>
</span></span><span class="line"><span class="cl">                <span class="n">spec</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">TensorSpec</span><span class="p">([</span><span class="kc">None</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">inp_shape</span><span class="p">),</span> <span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="n">inp_name</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            
</span></span><span class="line"><span class="cl">            <span class="n">input_specs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">spec</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Converting to ONNX (opset </span><span class="si">{</span><span class="n">opset</span><span class="si">}</span><span class="s2">) with GRU state inputs/outputs...&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># Convert with all input signatures</span>
</span></span><span class="line"><span class="cl">        <span class="n">tf2onnx</span><span class="o">.</span><span class="n">convert</span><span class="o">.</span><span class="n">from_keras</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">input_signature</span><span class="o">=</span><span class="n">input_specs</span><span class="p">,</span> <span class="n">output_path</span><span class="o">=</span><span class="n">onnx_path</span><span class="p">,</span> <span class="n">opset</span><span class="o">=</span><span class="n">opset</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">num_inputs</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Detected standard model without GRU state ports&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># Use a dynamic input signature (None, None, 42) to preserve time dimension flexibility</span>
</span></span><span class="line"><span class="cl">        <span class="n">input_name</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">spec</span> <span class="o">=</span> <span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">TensorSpec</span><span class="p">([</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="mi">42</span><span class="p">],</span> <span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="n">input_name</span><span class="p">),)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Converting to ONNX (opset </span><span class="si">{</span><span class="n">opset</span><span class="si">}</span><span class="s2">)...&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># Convert directly from the Keras model</span>
</span></span><span class="line"><span class="cl">        <span class="n">tf2onnx</span><span class="o">.</span><span class="n">convert</span><span class="o">.</span><span class="n">from_keras</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">input_signature</span><span class="o">=</span><span class="n">spec</span><span class="p">,</span> <span class="n">output_path</span><span class="o">=</span><span class="n">onnx_path</span><span class="p">,</span> <span class="n">opset</span><span class="o">=</span><span class="n">opset</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># Generic conversion for models with multiple inputs but unknown structure</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Converting to ONNX (opset </span><span class="si">{</span><span class="n">opset</span><span class="si">}</span><span class="s2">) with </span><span class="si">{</span><span class="n">num_inputs</span><span class="si">}</span><span class="s2"> inputs...&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">input_specs</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">inp</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">inp_name</span> <span class="o">=</span> <span class="n">inp</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">            <span class="n">inp_shape</span> <span class="o">=</span> <span class="n">inp</span><span class="o">.</span><span class="n">shape</span><span class="o">.</span><span class="n">as_list</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            <span class="c1"># Use dynamic shapes for flexibility</span>
</span></span><span class="line"><span class="cl">            <span class="n">spec</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">TensorSpec</span><span class="p">([</span><span class="kc">None</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">inp_shape</span><span class="p">),</span> <span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="n">inp_name</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">input_specs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">spec</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">tf2onnx</span><span class="o">.</span><span class="n">convert</span><span class="o">.</span><span class="n">from_keras</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">input_signature</span><span class="o">=</span><span class="n">input_specs</span><span class="p">,</span> <span class="n">output_path</span><span class="o">=</span><span class="n">onnx_path</span><span class="p">,</span> <span class="n">opset</span><span class="o">=</span><span class="n">opset</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Saved ONNX model to: </span><span class="si">{</span><span class="n">onnx_path</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">(</span><span class="n">description</span><span class="o">=</span><span class="s1">&#39;Convert Keras HDF5 model to ONNX for RNNoise.&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--input&#39;</span><span class="p">,</span> <span class="s1">&#39;-i&#39;</span><span class="p">,</span> <span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Path to Keras HDF5 model file&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--output&#39;</span><span class="p">,</span> <span class="s1">&#39;-o&#39;</span><span class="p">,</span> <span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Path to output ONNX file&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--opset&#39;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">int</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="mi">13</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;ONNX opset version (default: 13)&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--auto-rebuild&#39;</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">&#39;store_true&#39;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">                        <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Automatically rebuild model with GRU state ports if missing&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">input_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">input</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">output_path</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">output</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">output_path</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">base</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">splitext</span><span class="p">(</span><span class="n">input_path</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">output_path</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="s1">&#39;.onnx&#39;</span>
</span></span><span class="line"><span class="cl">    <span class="n">output_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="n">output_path</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">output_path</span><span class="p">),</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">convert</span><span class="p">(</span><span class="n">input_path</span><span class="p">,</span> <span class="n">output_path</span><span class="p">,</span> <span class="n">opset</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">opset</span><span class="p">,</span> <span class="n">auto_rebuild</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">auto_rebuild</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">main</span><span class="p">()</span></span></span></code></pre></td></tr></table>
</div>
</div>
<h4 id="322-推理时对隐状态进行管理">3.2.2 Managing hidden states during inference</h4>
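<p>When the ONNX model was exported above, a hidden-state input port and a hidden-state output port were added for each GRU layer. For every frame, inference therefore only needs to feed the hidden states saved from the previous run into the corresponding state inputs, and then save the values of the GRU state outputs once the run finishes. This is what lets the GRUs keep their history across frames in streaming inference.</p>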
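<p>The per-frame loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used in this post: <code>FakeSession</code> is a hypothetical stand-in that only mimics the <code>run(output_names, input_feed)</code> call shape of an ONNX Runtime session, the input names are assumed to match the variable names in the export script, and the GRU widths 24/48/96 are the usual RNNoise values rather than something confirmed here.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import numpy as np

# 42 features and 22 gains come from the export script above.
# The input names and the GRU widths (24, 48, 96) are assumptions for this sketch.
STATE_SIZES = {"vad_state_in": 24, "noise_state_in": 48, "denoise_state_in": 96}


class FakeSession:
    """Hypothetical stand-in for an ONNX Runtime session, only to make the sketch runnable."""

    def run(self, output_names, input_feed):
        frame_mean = float(input_feed["features_in"].mean())
        # Pretend each GRU accumulates the frame mean into its state.
        new_states = [input_feed[name] + frame_mean for name in STATE_SIZES]
        gains = np.full((1, 1, 22), 0.5, dtype=np.float32)
        vad = np.full((1, 1, 1), 0.5, dtype=np.float32)
        # Same output order as the exported model: gains, vad, then the three states.
        return [gains, vad] + new_states


class StreamingDenoiser:
    def __init__(self, session):
        self.session = session
        self.reset()

    def reset(self):
        # Start every stream from all-zero hidden states.
        self.states = {name: np.zeros((1, size), dtype=np.float32)
                       for name, size in STATE_SIZES.items()}

    def process_frame(self, features):
        # One frame per call: shape (batch=1, time=1, 42).
        feed = {"features_in": features.reshape(1, 1, -1).astype(np.float32)}
        feed.update(self.states)  # previous states go into the state inputs
        outputs = self.session.run(None, feed)
        gains, vad = outputs[0], outputs[1]
        # Save the state outputs so the next frame sees the history.
        self.states = dict(zip(STATE_SIZES, outputs[2:]))
        return gains.reshape(-1), float(vad.reshape(-1)[0])


denoiser = StreamingDenoiser(FakeSession())
for _ in range(3):
    gains, vad = denoiser.process_frame(np.ones(42, dtype=np.float32))
</code></pre></div>
<p>With a real model, an <code>onnxruntime.InferenceSession</code> would take the place of <code>FakeSession</code>, and the feed keys would have to match the input names actually present in the exported graph. Calling <code>reset()</code> at the start of each new stream avoids carrying history over from the previous one.</p>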
<p>The core functions are listed below. Embedding them into the original rnnoise denoising code is enough to run inference with ONNX Runtime (ort).</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">  1
</span><span class="lnt">  2
</span><span class="lnt">  3
</span><span class="lnt">  4
</span><span class="lnt">  5
</span><span class="lnt">  6
</span><span class="lnt">  7
</span><span class="lnt">  8
</span><span class="lnt">  9
</span><span class="lnt"> 10
</span><span class="lnt"> 11
</span><span class="lnt"> 12
</span><span class="lnt"> 13
</span><span class="lnt"> 14
</span><span class="lnt"> 15
</span><span class="lnt"> 16
</span><span class="lnt"> 17
</span><span class="lnt"> 18
</span><span class="lnt"> 19
</span><span class="lnt"> 20
</span><span class="lnt"> 21
</span><span class="lnt"> 22
</span><span class="lnt"> 23
</span><span class="lnt"> 24
</span><span class="lnt"> 25
</span><span class="lnt"> 26
</span><span class="lnt"> 27
</span><span class="lnt"> 28
</span><span class="lnt"> 29
</span><span class="lnt"> 30
</span><span class="lnt"> 31
</span><span class="lnt"> 32
</span><span class="lnt"> 33
</span><span class="lnt"> 34
</span><span class="lnt"> 35
</span><span class="lnt"> 36
</span><span class="lnt"> 37
</span><span class="lnt"> 38
</span><span class="lnt"> 39
</span><span class="lnt"> 40
</span><span class="lnt"> 41
</span><span class="lnt"> 42
</span><span class="lnt"> 43
</span><span class="lnt"> 44
</span><span class="lnt"> 45
</span><span class="lnt"> 46
</span><span class="lnt"> 47
</span><span class="lnt"> 48
</span><span class="lnt"> 49
</span><span class="lnt"> 50
</span><span class="lnt"> 51
</span><span class="lnt"> 52
</span><span class="lnt"> 53
</span><span class="lnt"> 54
</span><span class="lnt"> 55
</span><span class="lnt"> 56
</span><span class="lnt"> 57
</span><span class="lnt"> 58
</span><span class="lnt"> 59
</span><span class="lnt"> 60
</span><span class="lnt"> 61
</span><span class="lnt"> 62
</span><span class="lnt"> 63
</span><span class="lnt"> 64
</span><span class="lnt"> 65
</span><span class="lnt"> 66
</span><span class="lnt"> 67
</span><span class="lnt"> 68
</span><span class="lnt"> 69
</span><span class="lnt"> 70
</span><span class="lnt"> 71
</span><span class="lnt"> 72
</span><span class="lnt"> 73
</span><span class="lnt"> 74
</span><span class="lnt"> 75
</span><span class="lnt"> 76
</span><span class="lnt"> 77
</span><span class="lnt"> 78
</span><span class="lnt"> 79
</span><span class="lnt"> 80
</span><span class="lnt"> 81
</span><span class="lnt"> 82
</span><span class="lnt"> 83
</span><span class="lnt"> 84
</span><span class="lnt"> 85
</span><span class="lnt"> 86
</span><span class="lnt"> 87
</span><span class="lnt"> 88
</span><span class="lnt"> 89
</span><span class="lnt"> 90
</span><span class="lnt"> 91
</span><span class="lnt"> 92
</span><span class="lnt"> 93
</span><span class="lnt"> 94
</span><span class="lnt"> 95
</span><span class="lnt"> 96
</span><span class="lnt"> 97
</span><span class="lnt"> 98
</span><span class="lnt"> 99
</span><span class="lnt">100
</span><span class="lnt">101
</span><span class="lnt">102
</span><span class="lnt">103
</span><span class="lnt">104
</span><span class="lnt">105
</span><span class="lnt">106
</span><span class="lnt">107
</span><span class="lnt">108
</span><span class="lnt">109
</span><span class="lnt">110
</span><span class="lnt">111
</span><span class="lnt">112
</span><span class="lnt">113
</span><span class="lnt">114
</span><span class="lnt">115
</span><span class="lnt">116
</span><span class="lnt">117
</span><span class="lnt">118
</span><span class="lnt">119
</span><span class="lnt">120
</span><span class="lnt">121
</span><span class="lnt">122
</span><span class="lnt">123
</span><span class="lnt">124
</span><span class="lnt">125
</span><span class="lnt">126
</span><span class="lnt">127
</span><span class="lnt">128
</span><span class="lnt">129
</span><span class="lnt">130
</span><span class="lnt">131
</span><span class="lnt">132
</span><span class="lnt">133
</span><span class="lnt">134
</span><span class="lnt">135
</span><span class="lnt">136
</span><span class="lnt">137
</span><span class="lnt">138
</span><span class="lnt">139
</span><span class="lnt">140
</span><span class="lnt">141
</span><span class="lnt">142
</span><span class="lnt">143
</span><span class="lnt">144
</span><span class="lnt">145
</span><span class="lnt">146
</span><span class="lnt">147
</span><span class="lnt">148
</span><span class="lnt">149
</span><span class="lnt">150
</span><span class="lnt">151
</span><span class="lnt">152
</span><span class="lnt">153
</span><span class="lnt">154
</span><span class="lnt">155
</span><span class="lnt">156
</span><span class="lnt">157
</span><span class="lnt">158
</span><span class="lnt">159
</span><span class="lnt">160
</span><span class="lnt">161
</span><span class="lnt">162
</span><span class="lnt">163
</span><span class="lnt">164
</span><span class="lnt">165
</span><span class="lnt">166
</span><span class="lnt">167
</span><span class="lnt">168
</span><span class="lnt">169
</span><span class="lnt">170
</span><span class="lnt">171
</span><span class="lnt">172
</span><span class="lnt">173
</span><span class="lnt">174
</span><span class="lnt">175
</span><span class="lnt">176
</span><span class="lnt">177
</span><span class="lnt">178
</span><span class="lnt">179
</span><span class="lnt">180
</span><span class="lnt">181
</span><span class="lnt">182
</span><span class="lnt">183
</span><span class="lnt">184
</span><span class="lnt">185
</span><span class="lnt">186
</span><span class="lnt">187
</span><span class="lnt">188
</span><span class="lnt">189
</span><span class="lnt">190
</span><span class="lnt">191
</span><span class="lnt">192
</span><span class="lnt">193
</span><span class="lnt">194
</span><span class="lnt">195
</span><span class="lnt">196
</span><span class="lnt">197
</span><span class="lnt">198
</span><span class="lnt">199
</span><span class="lnt">200
</span><span class="lnt">201
</span><span class="lnt">202
</span><span class="lnt">203
</span><span class="lnt">204
</span><span class="lnt">205
</span><span class="lnt">206
</span><span class="lnt">207
</span><span class="lnt">208
</span><span class="lnt">209
</span><span class="lnt">210
</span><span class="lnt">211
</span><span class="lnt">212
</span><span class="lnt">213
</span><span class="lnt">214
</span><span class="lnt">215
</span><span class="lnt">216
</span><span class="lnt">217
</span><span class="lnt">218
</span><span class="lnt">219
</span><span class="lnt">220
</span><span class="lnt">221
</span><span class="lnt">222
</span><span class="lnt">223
</span><span class="lnt">224
</span><span class="lnt">225
</span><span class="lnt">226
</span><span class="lnt">227
</span><span class="lnt">228
</span><span class="lnt">229
</span><span class="lnt">230
</span><span class="lnt">231
</span><span class="lnt">232
</span><span class="lnt">233
</span><span class="lnt">234
</span><span class="lnt">235
</span><span class="lnt">236
</span><span class="lnt">237
</span><span class="lnt">238
</span><span class="lnt">239
</span><span class="lnt">240
</span><span class="lnt">241
</span><span class="lnt">242
</span><span class="lnt">243
</span><span class="lnt">244
</span><span class="lnt">245
</span><span class="lnt">246
</span><span class="lnt">247
</span><span class="lnt">248
</span><span class="lnt">249
</span><span class="lnt">250
</span><span class="lnt">251
</span><span class="lnt">252
</span><span class="lnt">253
</span><span class="lnt">254
</span><span class="lnt">255
</span><span class="lnt">256
</span><span class="lnt">257
</span><span class="lnt">258
</span><span class="lnt">259
</span><span class="lnt">260
</span><span class="lnt">261
</span><span class="lnt">262
</span><span class="lnt">263
</span><span class="lnt">264
</span><span class="lnt">265
</span><span class="lnt">266
</span><span class="lnt">267
</span><span class="lnt">268
</span><span class="lnt">269
</span><span class="lnt">270
</span><span class="lnt">271
</span><span class="lnt">272
</span><span class="lnt">273
</span><span class="lnt">274
</span><span class="lnt">275
</span><span class="lnt">276
</span><span class="lnt">277
</span><span class="lnt">278
</span><span class="lnt">279
</span><span class="lnt">280
</span><span class="lnt">281
</span><span class="lnt">282
</span><span class="lnt">283
</span><span class="lnt">284
</span><span class="lnt">285
</span><span class="lnt">286
</span><span class="lnt">287
</span><span class="lnt">288
</span><span class="lnt">289
</span><span class="lnt">290
</span><span class="lnt">291
</span><span class="lnt">292
</span><span class="lnt">293
</span><span class="lnt">294
</span><span class="lnt">295
</span><span class="lnt">296
</span><span class="lnt">297
</span><span class="lnt">298
</span><span class="lnt">299
</span><span class="lnt">300
</span><span class="lnt">301
</span><span class="lnt">302
</span><span class="lnt">303
</span><span class="lnt">304
</span><span class="lnt">305
</span><span class="lnt">306
</span><span class="lnt">307
</span><span class="lnt">308
</span><span class="lnt">309
</span><span class="lnt">310
</span><span class="lnt">311
</span><span class="lnt">312
</span><span class="lnt">313
</span><span class="lnt">314
</span><span class="lnt">315
</span><span class="lnt">316
</span><span class="lnt">317
</span><span class="lnt">318
</span><span class="lnt">319
</span><span class="lnt">320
</span><span class="lnt">321
</span><span class="lnt">322
</span><span class="lnt">323
</span><span class="lnt">324
</span><span class="lnt">325
</span><span class="lnt">326
</span><span class="lnt">327
</span><span class="lnt">328
</span><span class="lnt">329
</span><span class="lnt">330
</span><span class="lnt">331
</span><span class="lnt">332
</span><span class="lnt">333
</span><span class="lnt">334
</span><span class="lnt">335
</span><span class="lnt">336
</span><span class="lnt">337
</span><span class="lnt">338
</span><span class="lnt">339
</span><span class="lnt">340
</span><span class="lnt">341
</span><span class="lnt">342
</span><span class="lnt">343
</span><span class="lnt">344
</span><span class="lnt">345
</span><span class="lnt">346
</span><span class="lnt">347
</span><span class="lnt">348
</span><span class="lnt">349
</span><span class="lnt">350
</span><span class="lnt">351
</span><span class="lnt">352
</span><span class="lnt">353
</span><span class="lnt">354
</span><span class="lnt">355
</span><span class="lnt">356
</span><span class="lnt">357
</span><span class="lnt">358
</span><span class="lnt">359
</span><span class="lnt">360
</span><span class="lnt">361
</span><span class="lnt">362
</span><span class="lnt">363
</span><span class="lnt">364
</span><span class="lnt">365
</span><span class="lnt">366
</span><span class="lnt">367
</span><span class="lnt">368
</span><span class="lnt">369
</span><span class="lnt">370
</span><span class="lnt">371
</span><span class="lnt">372
</span><span class="lnt">373
</span><span class="lnt">374
</span><span class="lnt">375
</span><span class="lnt">376
</span><span class="lnt">377
</span><span class="lnt">378
</span><span class="lnt">379
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="c1">// Initialize ONNX model
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="kt">int</span> <span class="nf">initialize_onnx_model</span><span class="p">(</span><span class="n">RNNoiseContext</span><span class="o">*</span> <span class="n">ctx</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">model_path</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// Get ONNX Runtime API
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="k">const</span> <span class="n">OrtApiBase</span><span class="o">*</span> <span class="n">api_base</span> <span class="o">=</span> <span class="nf">OrtGetApiBase</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">api_base</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting ONNX Runtime API base</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span> <span class="o">=</span> <span class="n">api_base</span><span class="o">-&gt;</span><span class="nf">GetApi</span><span class="p">(</span><span class="n">ORT_API_VERSION</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting ONNX Runtime API</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Initialize ONNX Runtime environment
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">OrtStatus</span><span class="o">*</span> <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">CreateEnv</span><span class="p">(</span><span class="n">ORT_LOGGING_LEVEL_WARNING</span><span class="p">,</span> <span class="s">&#34;RNNoiseONNX&#34;</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">env</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error creating ONNX Runtime environment</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Create session options
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">CreateSessionOptions</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session_options</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error creating session options</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Set session options
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SetIntraOpNumThreads</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session_options</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error setting intra-op threads</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SetSessionGraphOptimizationLevel</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session_options</span><span class="p">,</span> <span class="n">ORT_ENABLE_EXTENDED</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error setting optimization level</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Create session
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">CreateSession</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">env</span><span class="p">,</span> <span class="n">model_path</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session_options</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error creating ONNX session</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Get allocator
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetAllocatorWithDefaultOptions</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting allocator</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Get input/output names
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="kt">size_t</span> <span class="n">num_input_nodes</span><span class="p">,</span> <span class="n">num_output_nodes</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetInputCount</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">num_input_nodes</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting input count</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetOutputCount</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">num_output_nodes</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting output count</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;ONNX Model Info:</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Input nodes: %zu</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">num_input_nodes</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Output nodes: %zu</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">num_output_nodes</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Detect model type: 4 inputs + 5 outputs = model with GRU states
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">has_gru_states</span> <span class="o">=</span> <span class="p">(</span><span class="n">num_input_nodes</span> <span class="o">==</span> <span class="mi">4</span> <span class="o">&amp;&amp;</span> <span class="n">num_output_nodes</span> <span class="o">==</span> <span class="mi">5</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">has_gru_states</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Model type: WITH GRU state inputs/outputs</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1">// Get all input names
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetInputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting features input name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetInputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_vad_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting VAD state input name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetInputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_noise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting noise state input name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetInputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_denoise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting denoise state input name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1">// Get all output names
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetOutputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_denoise</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting denoise output name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetOutputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_vad</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting VAD output name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetOutputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_vad_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting VAD state output name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetOutputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_noise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting noise state output name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetOutputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_denoise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting denoise state output name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Inputs:</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [0] %s (features)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [1] %s (VAD GRU state)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_vad_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [2] %s (noise GRU state)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_noise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [3] %s (denoise GRU state)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_denoise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Outputs:</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [0] %s (denoise)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_denoise</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [1] %s (VAD)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_vad</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [2] %s (VAD GRU state)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_vad_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [3] %s (noise GRU state)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_noise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;    [4] %s (denoise GRU state)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_denoise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Model type: Standard (without GRU state ports)</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1">// Get input name (standard model)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetInputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting input name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1">// Get output names (standard model)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetOutputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_denoise</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting denoise output name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">SessionGetOutputName</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">allocator</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_vad</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting VAD output name</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Input: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Output denoise: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_denoise</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;  Output VAD: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_vad</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Allocate buffers
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_buffer</span> <span class="o">=</span> <span class="p">(</span><span class="kt">float</span><span class="o">*</span><span class="p">)</span><span class="nf">malloc</span><span class="p">(</span><span class="n">FRAME_SIZE</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_buffer</span> <span class="o">=</span> <span class="p">(</span><span class="kt">float</span><span class="o">*</span><span class="p">)</span><span class="nf">malloc</span><span class="p">(</span><span class="n">FRAME_SIZE</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    
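</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_buffer</span> <span class="o">||</span> <span class="o">!</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_buffer</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error: Memory allocation failed</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="c1">// Release whichever buffer was allocated; free(NULL) is a no-op</span>
</span></span><span class="line"><span class="cl">        <span class="nf">free</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_buffer</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="nf">free</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_buffer</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_buffer</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_buffer</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>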
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Initialize RNNoise state for feature extraction
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">denoise_state</span> <span class="o">=</span> <span class="nf">rnnoise_create</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">denoise_state</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error: Failed to create RNNoise state</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// In upstream RNNoise, rnnoise_create() already calls rnnoise_init(); a second call would re-allocate its internal GRU buffers</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Initialize biquad filter memory
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">mem_hp_x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.0f</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">mem_hp_x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.0f</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Initialize processing buffers
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">X</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">X</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">P</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">P</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">Ex</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">Ex</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">Ep</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">Ep</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">Exp</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">Exp</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">lastg</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">lastg</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">synthesis_mem</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">synthesis_mem</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Initialize frame count
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">frame_count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Zero the GRU state buffers (called for both model types)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="nf">initialize_gru_states</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;ONNX model loaded successfully: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">model_path</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Initialize GRU states
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="kt">void</span> <span class="nf">initialize_gru_states</span><span class="p">(</span><span class="n">RNNoiseContext</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">vad_gru_state</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">vad_gru_state</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">noise_gru_state</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">noise_gru_state</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memset</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">denoise_gru_state</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">denoise_gru_state</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">gru_states_initialized</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// ONNX inference with external state management
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="kt">int</span> <span class="nf">onnx_inference_with_states</span><span class="p">(</span><span class="n">RNNoiseContext</span><span class="o">*</span> <span class="n">ctx</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">features</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">gains</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">vad</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// Prepare separate input tensors for features and GRU states
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="kt">float</span> <span class="n">features_data</span><span class="p">[</span><span class="mi">42</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="kt">float</span> <span class="n">vad_state_data</span><span class="p">[</span><span class="mi">24</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="kt">float</span> <span class="n">noise_state_data</span><span class="p">[</span><span class="mi">48</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="kt">float</span> <span class="n">denoise_state_data</span><span class="p">[</span><span class="mi">96</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Copy features
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="nf">memcpy</span><span class="p">(</span><span class="n">features_data</span><span class="p">,</span> <span class="n">features</span><span class="p">,</span> <span class="mi">42</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Copy GRU states (feed the states saved from the previous frame into this one)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="nf">memcpy</span><span class="p">(</span><span class="n">vad_state_data</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">vad_gru_state</span><span class="p">,</span> <span class="mi">24</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memcpy</span><span class="p">(</span><span class="n">noise_state_data</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">noise_gru_state</span><span class="p">,</span> <span class="mi">48</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memcpy</span><span class="p">(</span><span class="n">denoise_state_data</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">denoise_gru_state</span><span class="p">,</span> <span class="mi">96</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Define input tensor shapes
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="k">const</span> <span class="kt">int64_t</span> <span class="n">features_shape</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">42</span><span class="p">};</span>
</span></span><span class="line"><span class="cl">    <span class="k">const</span> <span class="kt">int64_t</span> <span class="n">vad_state_shape</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">24</span><span class="p">};</span>
</span></span><span class="line"><span class="cl">    <span class="k">const</span> <span class="kt">int64_t</span> <span class="n">noise_state_shape</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">48</span><span class="p">};</span>
</span></span><span class="line"><span class="cl">    <span class="k">const</span> <span class="kt">int64_t</span> <span class="n">denoise_state_shape</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">96</span><span class="p">};</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">OrtMemoryInfo</span><span class="o">*</span> <span class="n">memory_info</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">OrtStatus</span><span class="o">*</span> <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">CreateCpuMemoryInfo</span><span class="p">(</span><span class="n">OrtArenaAllocator</span><span class="p">,</span> <span class="n">OrtMemTypeDefault</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">memory_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
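</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error creating memory info: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetErrorMessage</span><span class="p">(</span><span class="n">status</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseStatus</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>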
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Create input tensors
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">OrtValue</span><span class="o">*</span> <span class="n">features_tensor</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">OrtValue</span><span class="o">*</span> <span class="n">vad_state_tensor</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">OrtValue</span><span class="o">*</span> <span class="n">noise_state_tensor</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">OrtValue</span><span class="o">*</span> <span class="n">denoise_state_tensor</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">CreateTensorWithDataAsOrtValue</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">memory_info</span><span class="p">,</span> <span class="n">features_data</span><span class="p">,</span> <span class="mi">42</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">features_shape</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">features_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
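</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error creating features tensor: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetErrorMessage</span><span class="p">(</span><span class="n">status</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseStatus</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>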
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseMemoryInfo</span><span class="p">(</span><span class="n">memory_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">CreateTensorWithDataAsOrtValue</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">memory_info</span><span class="p">,</span> <span class="n">vad_state_data</span><span class="p">,</span> <span class="mi">24</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">vad_state_shape</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">vad_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
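</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error creating VAD state tensor: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetErrorMessage</span><span class="p">(</span><span class="n">status</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseStatus</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>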
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">features_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseMemoryInfo</span><span class="p">(</span><span class="n">memory_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">CreateTensorWithDataAsOrtValue</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">memory_info</span><span class="p">,</span> <span class="n">noise_state_data</span><span class="p">,</span> <span class="mi">48</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">noise_state_shape</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">noise_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
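</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error creating noise state tensor: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetErrorMessage</span><span class="p">(</span><span class="n">status</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseStatus</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>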
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">features_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">vad_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseMemoryInfo</span><span class="p">(</span><span class="n">memory_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">CreateTensorWithDataAsOrtValue</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">memory_info</span><span class="p">,</span> <span class="n">denoise_state_data</span><span class="p">,</span> <span class="mi">96</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">denoise_state_shape</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">denoise_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
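</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error creating denoise state tensor: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetErrorMessage</span><span class="p">(</span><span class="n">status</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseStatus</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>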
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">features_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">vad_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">noise_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseMemoryInfo</span><span class="p">(</span><span class="n">memory_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Prepare input names and tensors
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">input_names</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_vad_state</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">                                 <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_noise_state</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">input_name_denoise_state</span><span class="p">};</span>
</span></span><span class="line"><span class="cl">    <span class="n">OrtValue</span><span class="o">*</span> <span class="n">input_tensors</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="n">features_tensor</span><span class="p">,</span> <span class="n">vad_state_tensor</span><span class="p">,</span> <span class="n">noise_state_tensor</span><span class="p">,</span> <span class="n">denoise_state_tensor</span><span class="p">};</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Prepare output names
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">output_names</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_denoise</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_vad</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">                                  <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_vad_state</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_noise_state</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">                                  <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">output_name_denoise_state</span><span class="p">};</span>
</span></span><span class="line"><span class="cl">    <span class="n">OrtValue</span><span class="o">*</span> <span class="n">output_tensors</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">};</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Run inference
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">Run</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">session</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">input_names</span><span class="p">,</span> <span class="p">(</span><span class="k">const</span> <span class="n">OrtValue</span><span class="o">*</span> <span class="k">const</span><span class="o">*</span><span class="p">)</span><span class="n">input_tensors</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                   <span class="n">output_names</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="n">output_tensors</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
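</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error running inference: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetErrorMessage</span><span class="p">(</span><span class="n">status</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseStatus</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>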
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">features_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">vad_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">noise_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">denoise_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseMemoryInfo</span><span class="p">(</span><span class="n">memory_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Get output data
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="kt">float</span><span class="o">*</span> <span class="n">denoise_output</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">float</span><span class="o">*</span> <span class="n">vad_output</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">float</span><span class="o">*</span> <span class="n">updated_vad_state</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">float</span><span class="o">*</span> <span class="n">updated_noise_state</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">float</span><span class="o">*</span> <span class="n">updated_denoise_state</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetTensorMutableData</span><span class="p">(</span><span class="n">output_tensors</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&amp;</span><span class="n">denoise_output</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting denoise output data</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">goto</span> <span class="n">cleanup</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetTensorMutableData</span><span class="p">(</span><span class="n">output_tensors</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&amp;</span><span class="n">vad_output</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting VAD output data</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">goto</span> <span class="n">cleanup</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetTensorMutableData</span><span class="p">(</span><span class="n">output_tensors</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&amp;</span><span class="n">updated_vad_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting updated VAD state data</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">goto</span> <span class="n">cleanup</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetTensorMutableData</span><span class="p">(</span><span class="n">output_tensors</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&amp;</span><span class="n">updated_noise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting updated noise state data</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">goto</span> <span class="n">cleanup</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">status</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">GetTensorMutableData</span><span class="p">(</span><span class="n">output_tensors</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&amp;</span><span class="n">updated_denoise_state</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nf">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&#34;Error getting updated denoise state data</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="k">goto</span> <span class="n">cleanup</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Store results
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="nf">memcpy</span><span class="p">(</span><span class="n">gains</span><span class="p">,</span> <span class="n">denoise_output</span><span class="p">,</span> <span class="n">NB_BANDS</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="o">*</span><span class="n">vad</span> <span class="o">=</span> <span class="n">vad_output</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1">// Save the updated GRU states returned by the model so the next frame starts from them
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="nf">memcpy</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">vad_gru_state</span><span class="p">,</span> <span class="n">updated_vad_state</span><span class="p">,</span> <span class="mi">24</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memcpy</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">noise_gru_state</span><span class="p">,</span> <span class="n">updated_noise_state</span><span class="p">,</span> <span class="mi">48</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">memcpy</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">denoise_gru_state</span><span class="p">,</span> <span class="n">updated_denoise_state</span><span class="p">,</span> <span class="mi">96</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">gru_states_initialized</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl"><span class="nl">cleanup</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// Cleanup
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">features_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">vad_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">noise_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">denoise_state_tensor</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">5</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">output_tensors</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseValue</span><span class="p">(</span><span class="n">output_tensors</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">api</span><span class="o">-&gt;</span><span class="nf">ReleaseMemoryInfo</span><span class="p">(</span><span class="n">memory_info</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
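<h3 id="33-推理性能对比">3.3 Inference Performance Comparison</h3>
<p>Without hand-writing a C implementation of every operator, ORT alone delivers a speedup of nearly 5x (4.76x in the run below), which is an excellent return for very little effort.</p>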
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">=== Overall Inference Time Statistics ===
</span></span><span class="line"><span class="cl">Total frames processed: 2048
</span></span><span class="line"><span class="cl">Frames with inference: 2047
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">ONNX Inference:
</span></span><span class="line"><span class="cl">  Total time: 73.568 ms
</span></span><span class="line"><span class="cl">  Average per frame: 0.036 ms
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">C Inference:
</span></span><span class="line"><span class="cl">  Total time: 349.820 ms
</span></span><span class="line"><span class="cl">  Average per frame: 0.171 ms
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Comparison:
</span></span><span class="line"><span class="cl">  ONNX / C ratio: 0.21x
</span></span><span class="line"><span class="cl">  Speedup: 4.76x (ONNX faster)
</span></span><span class="line"><span class="cl">==========================================</span></span></code></pre></td></tr></table>
</div>
</div>
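<p>As a sanity check, the summary figures can be recomputed from the raw totals in the log. The sketch below uses only the numbers printed above. The 10 ms frame duration is an assumption (the log does not state it) and is there only to show how a per-frame cost relates to a real-time budget.</p>

```python
# Recompute the summary statistics from the totals printed in the log above.
onnx_total_ms = 73.568
c_total_ms = 349.820
frames_with_inference = 2047

onnx_avg_ms = onnx_total_ms / frames_with_inference
c_avg_ms = c_total_ms / frames_with_inference
ratio = onnx_total_ms / c_total_ms      # "ONNX / C ratio" in the log
speedup = c_total_ms / onnx_total_ms    # "Speedup" in the log

# ASSUMPTION: each frame covers 10 ms of audio; change this to your frame size.
frame_ms = 10.0
onnx_budget_share = onnx_avg_ms / frame_ms

print(f"ONNX average per frame: {onnx_avg_ms:.3f} ms")
print(f"C average per frame:    {c_avg_ms:.3f} ms")
print(f"ONNX / C ratio:         {ratio:.2f}x")
print(f"Speedup:                {speedup:.2f}x")
print(f"Share of frame budget:  {onnx_budget_share:.2%}")
```

<p>The first four values match the log (0.036 ms, 0.171 ms, 0.21x, 4.76x), confirming that the speedup is simply the ratio of the two total times.</p>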
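<h2 id="四总结">4. Summary</h2>
<p>As a cross-platform inference engine, ORT offers clear advantages for deploying speech denoising models. Using it correctly requires:</p>
<ol>
<li><strong>Understand the basic concepts</strong>: know the core abstractions such as InferenceSession and Execution Provider.</li>
<li><strong>Follow the inference flow</strong>: load the model, prepare the inputs, run inference, and fetch the results, in that order.</li>
<li><strong>Manage hidden states</strong>: for sequential models, the hidden states must be passed in and updated correctly on every frame.</li>
<li><strong>Tune performance</strong>: choose the optimization options and execution providers that fit your scenario.</li>
</ol>
<p>For real-time speech denoising, hidden-state management is the crux: the state-passing logic must be designed carefully so that the model can actually use the history from previous frames.</p>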
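<p>The state-passing pattern can be sketched independently of ORT. The example below is a minimal illustration in pure Python, not the code from this post: <code>fake_model</code> is a hypothetical stand-in for the real session run, and the 42-element feature frame is an assumption. Only the state sizes (24, 48 and 96 floats) are taken from the <code>memcpy</code> calls in the C code above.</p>

```python
# Sketch of carrying recurrent state across frames (pure Python, no ORT).
# State sizes follow the memcpy calls in the C code: 24, 48 and 96 floats.
VAD_SIZE, NOISE_SIZE, DENOISE_SIZE = 24, 48, 96


def fake_model(features, vad_state, noise_state, denoise_state):
    """Hypothetical stand-in for the real model: returns the updated states."""
    drive = sum(features) / len(features)
    new_vad = [0.5 * s + drive for s in vad_state]
    new_noise = [0.5 * s + drive for s in noise_state]
    new_denoise = [0.5 * s + drive for s in denoise_state]
    return new_vad, new_noise, new_denoise


class StreamingContext:
    """Keeps the hidden states between calls, like ctx in the C code."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Start every new stream from all-zero states.
        self.vad_state = [0.0] * VAD_SIZE
        self.noise_state = [0.0] * NOISE_SIZE
        self.denoise_state = [0.0] * DENOISE_SIZE

    def process_frame(self, features):
        outputs = fake_model(
            features, self.vad_state, self.noise_state, self.denoise_state
        )
        # Copy the updated states back so the next frame sees this frame's history.
        self.vad_state, self.noise_state, self.denoise_state = (
            list(state) for state in outputs
        )
        return self.vad_state[0]


ctx = StreamingContext()
frame = [1.0] * 42  # ASSUMPTION: 42 features per frame, for illustration only
first = ctx.process_frame(frame)
second = ctx.process_frame(frame)
print(first, second)  # the second value differs because state was carried over
```

<p>Feeding the same frame twice gives different outputs because the second call starts from the states written by the first. Forgetting the copy-back step, or forgetting to call <code>reset()</code> between unrelated streams, is the typical way this goes wrong.</p>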
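<p>Used well, ORT lets a deep-learning speech denoising model reach its full performance and run real-time inference efficiently and reliably.</p>
<p>ORT also has many advanced features that are worth exploring on your own.</p>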
<img src="/images/To-Be-Continued.jpeg"/>]]></content:encoded>
    </item>
    <item>
      <title>A Guide to Evaluating Speech Enhancement Algorithms</title>
      <link>https://lyapple2008.github.io/posts/202508/2025-08-11-%E9%9F%B3%E9%A2%91%E7%AE%97%E6%B3%95%E8%AF%84%E4%BC%B0/</link>
      <pubDate>Mon, 11 Aug 2025 22:27:28 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202508/2025-08-11-%E9%9F%B3%E9%A2%91%E7%AE%97%E6%B3%95%E8%AF%84%E4%BC%B0/</guid>
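      <description>&lt;h1 id=&#34;语音增强算法评估指南&#34;&gt;A Guide to Evaluating Speech Enhancement Algorithms&lt;/h1&gt;
&lt;p&gt;Speech enhancement has become a core component of smart devices, video conferencing, and hearing aids: it rescues clean speech from noisy environments. But how do you tell whether an algorithm is any good? That is what evaluation is for. This post walks through how speech enhancement algorithms are evaluated, using an international challenge as the entry point, and covers the key metrics and how to compute them step by step. Whether you are a beginner or a practitioner, it should help you organize your thinking.&lt;/p&gt;</description>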
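      <content:encoded><![CDATA[<h1 id="语音增强算法评估指南">A Guide to Evaluating Speech Enhancement Algorithms</h1>
<p>Speech enhancement has become a core component of smart devices, video conferencing, and hearing aids: it rescues clean speech from noisy environments. But how do you tell whether an algorithm is any good? That is what evaluation is for. This post walks through how speech enhancement algorithms are evaluated, using an international challenge as the entry point, and covers the key metrics and how to compute them step by step. Whether you are a beginner or a practitioner, it should help you organize your thinking.</p>
<h2 id="前言为什么需要评估语音增强算法">Introduction: Why Evaluate Speech Enhancement Algorithms?</h2>
<p>Imagine you have built a speech enhancement model and believe it removes background noise perfectly. In real use, though, users report that "the voice sounds odd" or that "it fails completely with certain kinds of noise." This is why evaluation matters: it provides an objective, quantitative standard that helps developers identify an algorithm's strengths and weaknesses, improve its performance, and compare it fairly with other methods.</p>
<p>Evaluation serves three main purposes:</p>
<ul>
<li><strong>Guiding development</strong>: metric feedback drives iteration on the model design and keeps subjective bias out.</li>
<li><strong>Benchmarking</strong>: in competitions and papers, a shared standard measures the progress of different algorithms.</li>
<li><strong>Real-world deployment</strong>: it verifies that the algorithm is robust and general in real scenarios, such as on mobile devices or at low signal-to-noise ratios.</li>
</ul>
<p>Without evaluation, algorithm development is like the blind men describing the elephant; with it, development becomes an engineering discipline. Next we take the URGENT 2024 challenge, part of the NeurIPS 2024 competition track, as a case study of an evaluation framework. The challenge targets universal speech enhancement models and emphasizes performance across different noise types, sampling rates, and microphone configurations.</p>
<h2 id="urgent-2024挑战评估体系的典范">The URGENT 2024 Challenge: A Model Evaluation Framework</h2>
<p>The <a href="https://urgent-challenge.github.io/urgent2024/">URGENT 2024 (Universality, Robustness, and Generalizability for EnhancemeNT)</a> challenge aims to address a long-standing weakness of speech enhancement research: many algorithms are tuned for specific conditions and generalize poorly across scenarios. Participants must train a single model on a unified set of public datasets, handle a variety of distortions (such as noise and reverberation), and support different input formats (such as single/multi-channel audio and different sampling rates).</p>
<p>A highlight of the challenge is its comprehensive evaluation framework, which covers non-intrusive metrics (no reference signal), intrusive metrics (reference signal required), and downstream-task metrics such as word error rate (WER). It also adds subjective MOS (Mean Opinion Score) ratings in the final blind-test phase. The challenge provides a baseline model built on the ESPnet toolkit and encourages data augmentation, but strictly limits the sources of training data to keep the comparison fair.</p>
<p>The challenge shows that evaluation is more than scoring: it is a benchmark that pushes the field toward real-world conditions. The rest of this post focuses on the core objective metrics used in the challenge: PESQ, ESTOI, SDR, MCD, LSD, DNSMOS, and NISQA.</p>
<h2 id="评估指标详解每个指标都在测什么">Metrics in Detail: What Does Each One Measure?</h2>
<p>Speech enhancement metrics fall broadly into intrusive ones (which need a clean reference signal) and non-intrusive ones (which need no reference and so mirror real-world use). URGENT 2024 uses the metrics below to assess speech quality, intelligibility, and fidelity. Each is explained in turn:</p>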
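<ul>
<li>
<p><strong>PESQ (Perceptual Evaluation of Speech Quality)</strong>: an intrusive metric that estimates perceived quality by comparing the enhanced speech with the clean reference signal. It models how distortion and noise affect human hearing; scores typically range from -0.5 to 4.5 (higher is better). In speech enhancement, PESQ is commonly used as an objective test of overall quality and is especially suited to telephony and VoIP scenarios.</p>
</li>
<li>
<p><strong>ESTOI (Extended Short-Time Objective Intelligibility)</strong>: an intrusive metric that focuses on the intelligibility of the enhanced speech. It analyzes short-time segments of the signal to predict how well a listener can understand speech in noise; scores range from 0 to 1 (higher means more intelligible). It is particularly useful at low signal-to-noise ratios, where it helps tune an algorithm toward what listeners can actually understand.</p>
</li>
<li>
<p><strong>SDR (Signal-to-Distortion Ratio)</strong>: an intrusive metric that computes the ratio of the energy of the desired signal to the energy of the distortion (noise and artifacts included), usually in dB (higher is better). It measures the overall fidelity of the enhanced signal and is very practical in multi-channel or complex-noise scenarios.</p>
</li>
<li>
<p><strong>MCD (Mel-Cepstral Distortion)</strong>: an intrusive metric that quantifies the difference between the enhanced and reference signals in the mel-cepstral coefficient domain (lower is better). It focuses on spectral distortion, gives insight into perceived quality, and is often used to assess how well an algorithm preserves the speech spectrum.</p>
</li>
<li>
<p><strong>LSD (Log-Spectral Distance)</strong>: an intrusive metric that measures the distance between the log power spectra of the enhanced and reference signals (lower is better). It assesses spectral accuracy, shows how well an algorithm preserves the original speech features, and suits frequency-domain analysis.</p>
</li>
<li>
<p><strong>DNSMOS (Deep Noise Suppression Mean Opinion Score)</strong>: a non-intrusive metric that needs no reference signal and uses a deep learning model to predict speech quality, imitating human ratings on a 1 to 5 scale. The model is trained on human ratings, which makes it suitable for real-world evaluation, especially when no clean reference is available.</p>
</li>
<li>
<p><strong>NISQA (Non-Intrusive Speech Quality Assessment)</strong>: another non-intrusive metric that uses machine learning to predict perceived quality without a reference. It rates overall speech quality and is most valuable in real deployments where no reference signal exists.</p>
</li>
</ul>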
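<p>Of these metrics, SDR is the easiest to compute by hand. The sketch below implements the plain textbook definition, 10·log10 of signal energy over distortion energy, in pure Python. It is an illustration only: the function name <code>sdr_db</code> is hypothetical, and the challenge's own script may use a different SDR variant, so the numbers are not guaranteed to match the official scores.</p>

```python
import math


def sdr_db(reference, estimate):
    """Plain SDR in dB: 10 * log10(signal energy / distortion energy)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if distortion_energy == 0.0:
        return math.inf   # perfect reconstruction gives an infinite SDR
    if signal_energy == 0.0:
        return -math.inf  # silent reference with a non-silent estimate
    return 10.0 * math.log10(signal_energy / distortion_energy)


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.1, 0.1, -0.9, 0.1]  # the reference plus a constant 0.1 error
print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
print(f"SDR of a perfect copy: {sdr_db(reference, reference)}")
```

<p>Here the signal energy is 2 and the distortion energy is about 0.04, so the SDR is about 17 dB. The infinite value for a perfect copy is the special case that a scoring script has to handle explicitly when averaging over a test set.</p>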
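<p>Used together, these metrics evaluate an algorithm along several dimensions: quality, intelligibility, and distortion. The URGENT challenge stresses that non-intrusive metrics such as DNSMOS and NISQA are closer to reality, because a clean reference is usually unavailable in real environments.</p>
<h2 id="评测数据集从哪里获取如何使用">Evaluation Datasets: Where to Get Them and How to Use Them</h2>
<p>Computing these metrics requires reliable data. The URGENT 2024 challenge links its official evaluation datasets from its GitHub repository and hosts them on Hugging Face for easy download.</p>
<ul>
<li><strong>Official evaluation datasets</strong>: the validation set, the non-blind test set, and the blind test set, at https://huggingface.co/datasets/urgent-challenge/urgent2024_official. They contain speech samples under a variety of distortion conditions and are well suited to testing how general an algorithm is.</li>
<li><strong>MOS dataset</strong>: an additional speech quality dataset with human-annotated MOS scores, at https://huggingface.co/datasets/urgent-challenge/urgent2024_mos. It is used to validate against subjective ratings.</li>
</ul>
<p>Access is simple: search for and download the datasets on Hugging Face, or load them with the Python datasets library. They are designed to cover different noise, reverberation, and microphone configurations so that the evaluation is comprehensive.</p>
<blockquote>
<p>The code below saves the validation set from Hugging Face locally as WAV files, so that different algorithms can process the files and their outputs can be evaluated afterwards.</p></blockquote>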
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span><span class="lnt">33
</span><span class="lnt">34
</span><span class="lnt">35
</span><span class="lnt">36
</span><span class="lnt">37
</span><span class="lnt">38
</span><span class="lnt">39
</span><span class="lnt">40
</span><span class="lnt">41
</span><span class="lnt">42
</span><span class="lnt">43
</span><span class="lnt">44
</span><span class="lnt">45
</span><span class="lnt">46
</span><span class="lnt">47
</span><span class="lnt">48
</span><span class="lnt">49
</span><span class="lnt">50
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">datasets</span> <span class="kn">import</span> <span class="n">load_dataset</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">soundfile</span> <span class="k">as</span> <span class="nn">sf</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">os</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># ===== Parameters =====</span>
</span></span><span class="line"><span class="cl"><span class="n">DATASET_NAME</span> <span class="o">=</span> <span class="s2">&#34;urgent-challenge/urgent2024_official&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">SPLIT</span> <span class="o">=</span> <span class="s2">&#34;validation&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">SAVE_DIR</span> <span class="o">=</span> <span class="s2">&#34;../data/validation&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">NUM_PROC</span> <span class="o">=</span> <span class="mi">8</span>  <span class="c1"># number of parallel processes; set it to your CPU core count</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># ===== Prepare directories =====</span>
</span></span><span class="line"><span class="cl"><span class="n">clean_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">SAVE_DIR</span><span class="p">,</span> <span class="s2">&#34;clean&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">noisy_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">SAVE_DIR</span><span class="p">,</span> <span class="s2">&#34;noisy&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">clean_dir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">noisy_dir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># ===== Load the data =====</span>
</span></span><span class="line"><span class="cl"><span class="n">dataset</span> <span class="o">=</span> <span class="n">load_dataset</span><span class="p">(</span><span class="n">DATASET_NAME</span><span class="p">,</span> <span class="n">SPLIT</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># ===== Save function =====</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">save_audio</span><span class="p">(</span><span class="n">example</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">uid</span> <span class="o">=</span> <span class="n">example</span><span class="p">[</span><span class="s2">&#34;id&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># noisy</span>
</span></span><span class="line"><span class="cl">    <span class="n">noisy</span> <span class="o">=</span> <span class="n">example</span><span class="p">[</span><span class="s2">&#34;noisy_audio&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">noisy_samples</span><span class="p">,</span> <span class="n">noisy_sr</span> <span class="o">=</span> <span class="n">noisy</span><span class="p">[</span><span class="s2">&#34;array&#34;</span><span class="p">],</span> <span class="n">noisy</span><span class="p">[</span><span class="s2">&#34;sampling_rate&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># clean</span>
</span></span><span class="line"><span class="cl">    <span class="n">clean</span> <span class="o">=</span> <span class="n">example</span><span class="p">[</span><span class="s2">&#34;clean_audio&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">clean_samples</span><span class="p">,</span> <span class="n">clean_sr</span> <span class="o">=</span> <span class="n">clean</span><span class="p">[</span><span class="s2">&#34;array&#34;</span><span class="p">],</span> <span class="n">clean</span><span class="p">[</span><span class="s2">&#34;sampling_rate&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Make sure the sample rates match</span>
</span></span><span class="line"><span class="cl">    <span class="k">assert</span> <span class="n">noisy_sr</span> <span class="o">==</span> <span class="n">clean_sr</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&#34;Sample rate mismatch: noisy=</span><span class="si">{</span><span class="n">noisy_sr</span><span class="si">}</span><span class="s2">, clean=</span><span class="si">{</span><span class="n">clean_sr</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Output file paths</span>
</span></span><span class="line"><span class="cl">    <span class="n">noisy_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">noisy_dir</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">uid</span><span class="si">}</span><span class="s2">.wav&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">clean_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">clean_dir</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">uid</span><span class="si">}</span><span class="s2">.wav&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Write the files</span>
</span></span><span class="line"><span class="cl">    <span class="n">sf</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">noisy_path</span><span class="p">,</span> <span class="n">noisy_samples</span><span class="p">,</span> <span class="n">noisy_sr</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">sf</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">clean_path</span><span class="p">,</span> <span class="n">clean_samples</span><span class="p">,</span> <span class="n">clean_sr</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">{</span><span class="s2">&#34;noisy_path&#34;</span><span class="p">:</span> <span class="n">noisy_path</span><span class="p">,</span> <span class="s2">&#34;clean_path&#34;</span><span class="p">:</span> <span class="n">clean_path</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># ===== Export with multiple processes =====</span>
</span></span><span class="line"><span class="cl"><span class="n">dataset</span><span class="o">.</span><span class="n">map</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">save_audio</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">num_proc</span><span class="o">=</span><span class="n">NUM_PROC</span><span class="p">,</span>      <span class="c1"># enable multiprocessing</span>
</span></span><span class="line"><span class="cl">    <span class="n">desc</span><span class="o">=</span><span class="s2">&#34;Exporting audio&#34;</span><span class="p">,</span> <span class="c1"># description shown on the tqdm progress bar</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div>
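<h2 id="指标计算实践一步步上手">Computing the Metrics in Practice: A Step-by-Step Walkthrough</h2>
<p>Computing these metrics takes tooling and scripts. The URGENT challenge GitHub repository (https://github.com/urgent-challenge/urgent2024_challenge) provides ready-made scripts in its evaluation_metrics folder: calculate_intrusive_se_metrics.py handles the intrusive metrics (PESQ, ESTOI, SDR, MCD, LSD, and so on, including handling of infinite SDR values), while calculate_nonintrusive_dnsmos.py and calculate_nonintrusive_nisqa.py compute DNSMOS and NISQA respectively.</p>
<p><strong>Example steps</strong>:</p>
<blockquote>
<p>See evaluation_metrics/README.md</p></blockquote>
<ol>
<li><strong>Prepare the data</strong>: run your algorithm on the noisy files of the validation set to produce the enhanced files, and organize them in the directory layout below: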
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">📁 /path/to/your/data/
</span></span><span class="line"><span class="cl">├── 📁 enhanced/
</span></span><span class="line"><span class="cl">│   ├── 🔈 fileid_1.wav
</span></span><span class="line"><span class="cl">│   ├── 🔈 fileid_2.wav
</span></span><span class="line"><span class="cl">│   └── ...
</span></span><span class="line"><span class="cl">└── 📁 clean/
</span></span><span class="line"><span class="cl">    ├── 🔈 fileid_1.wav
</span></span><span class="line"><span class="cl">    ├── 🔈 fileid_2.wav
</span></span><span class="line"><span class="cl">    └── ...</span></span></code></pre></td></tr></table>
</div>
</div>
</li>
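<li><strong>Generate the scp files</strong>: create one scp file for the clean data and one for the enhanced data. Each scp file has two columns, a unique file ID and the file path, as shown below: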
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl"># enhanced.scp
</span></span><span class="line"><span class="cl">fileid_1 /path/to/your/data/enhanced/fileid_1.wav
</span></span><span class="line"><span class="cl">fileid_2 /path/to/your/data/enhanced/fileid_2.wav
</span></span><span class="line"><span class="cl">...
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># reference.scp
</span></span><span class="line"><span class="cl">fileid_1 /path/to/your/data/clean/fileid_1.wav
</span></span><span class="line"><span class="cl">fileid_2 /path/to/your/data/clean/fileid_2.wav
</span></span><span class="line"><span class="cl">...</span></span></code></pre></td></tr></table>
</div>
</div>
</li>
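<li><strong>Run the scripts</strong>: for the intrusive metrics, run <code>calculate_intrusive_se_metrics.py</code> with the enhanced and reference scp files as input; it outputs PESQ and the other scores. Non-intrusive metrics such as DNSMOS take only the enhanced speech as input.</li>
<li><strong>Example script</strong>: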
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="cp">#!/bin/bash
</span></span></span><span class="line"><span class="cl"><span class="cp"></span><span class="nv">nj</span><span class="o">=</span><span class="m">8</span>  <span class="c1"># Number of parallel CPU jobs for speedup</span>
</span></span><span class="line"><span class="cl"><span class="nv">python</span><span class="o">=</span>python3
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nv">output_prefix</span><span class="o">=</span>metrics_score
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># PESQ, ESTOI, SDR, MCD, LSD</span>
</span></span><span class="line"><span class="cl"><span class="si">${</span><span class="nv">python</span><span class="si">}</span> calculate_intrusive_se_metrics.py <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>   --ref_scp reference.scp <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>   --inf_scp enhanced_webrtc.scp <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>   --output_dir <span class="s2">&#34;</span><span class="si">${</span><span class="nv">output_prefix</span><span class="si">}</span><span class="s2">&#34;</span>/scoring_webrtc <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>   --nj <span class="si">${</span><span class="nv">nj</span><span class="si">}</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>   --chunksize <span class="m">60</span></span></span></code></pre></td></tr></table>
</div>
</div>
For batch computation, the repository scripts also accept a folder as input.</li>
</ol>
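<p>The scp files from step 2 are easy to generate with a few lines of Python. The sketch below is one possible way to do it and is not part of the challenge repository: the helper names <code>write_scp</code> and <code>build_scp_pair</code> are hypothetical, and it assumes the <code>enhanced/</code> and <code>clean/</code> layout from step 1, WAV files, and paths without spaces.</p>

```python
from pathlib import Path


def write_scp(audio_dir, scp_path, ext=".wav"):
    """Write one 'file_id path' line per audio file in audio_dir; return the ids."""
    files = sorted(p for p in Path(audio_dir).iterdir() if p.suffix == ext)
    lines = [f"{p.stem} {p.resolve()}" for p in files]
    Path(scp_path).write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return [p.stem for p in files]


def build_scp_pair(data_root, out_dir):
    """Create enhanced.scp and reference.scp and check that their ids agree."""
    data_root, out_dir = Path(data_root), Path(out_dir)
    enhanced_ids = write_scp(data_root / "enhanced", out_dir / "enhanced.scp")
    reference_ids = write_scp(data_root / "clean", out_dir / "reference.scp")
    # Intrusive metrics compare files pairwise, so both sides need the same ids.
    if enhanced_ids != reference_ids:
        raise ValueError("enhanced/ and clean/ do not contain the same file ids")
    return enhanced_ids
```

<p>Calling <code>build_scp_pair("/path/to/your/data", ".")</code> writes both files to the current directory. Failing early on mismatched IDs is a deliberate choice: it is cheaper to catch a missing file here than partway through a long scoring run.</p>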
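<p>In practice, validate the objective metrics against subjective listening tests (such as MOS) to avoid the "high score, poor experience" trap.</p>
<h2 id="结语评估驱动创新">Closing Thoughts: Evaluation Drives Innovation</h2>
<p>Evaluating speech enhancement algorithms is a starting point, not an end point. Challenges such as URGENT 2024 show how an evaluation framework pushes algorithms toward generality and robustness. As more non-intrusive metrics and multimodal data are brought in, evaluation will move even closer to the real world.</p>
<p>Finally, let's see how the WebRTC NS algorithm introduced in an earlier post measures up. Its final metrics are shown below, followed by those of the unprocessed noisy input.</p>
<p><img alt="Evaluation metrics of the WebRTC NS algorithm" loading="lazy" src="/images/2025-08-11/validation_webrtc_ns.png"></p>
<p><img alt="noisy" loading="lazy" src="/images/2025-08-11/validation_noisy.png"></p>
<p>WebRTC NS scores only slightly better than the noisy input, so there is plenty of room for improvement. Traditional algorithms all rest on assumptions to some degree, and those assumptions hold only within a narrow range of conditions, which is what limits them. In recent years, data-driven deep learning methods have pushed past these limits and greatly raised the ceiling of what audio algorithms can achieve. Coming up next: stepping into the deep learning era.</p>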
<img src="/images/To-Be-Continued.jpeg"/>]]></content:encoded>
    </item>
  </channel>
</rss>
