<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Dev Log on BeYoung</title>
    <link>https://lyapple2008.github.io/tags/%E5%BC%80%E5%8F%91%E6%97%A5%E5%BF%97/</link>
    <description>Recent content in Dev Log on BeYoung</description>
    <generator>Hugo -- 0.147.9</generator>
    <language>zh</language>
    <copyright>See this site’s source code here, licensed under GPLv3</copyright>
    <lastBuildDate>Mon, 09 Mar 2026 18:24:47 +0800</lastBuildDate>
    <atom:link href="https://lyapple2008.github.io/tags/%E5%BC%80%E5%8F%91%E6%97%A5%E5%BF%97/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Building a Bilingual Subtitle App with iOS System APIs</title>
      <link>https://lyapple2008.github.io/posts/202603/2026-03-09-%E5%9F%BA%E4%BA%8Eios%E7%B3%BB%E7%BB%9F%E6%8E%A5%E5%8F%A3%E5%AE%9E%E7%8E%B0%E5%8F%8C%E8%AF%AD%E5%AD%97%E5%B9%95app/</link>
      <pubDate>Mon, 09 Mar 2026 18:24:47 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202603/2026-03-09-%E5%9F%BA%E4%BA%8Eios%E7%B3%BB%E7%BB%9F%E6%8E%A5%E5%8F%A3%E5%AE%9E%E7%8E%B0%E5%8F%8C%E8%AF%AD%E5%AD%97%E5%B9%95app/</guid>
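      <description>&lt;p&gt;The bilingual subtitle app for iOS that I teased earlier is finally done. Without further ado, here is the demo. The current implementation is built on iOS system APIs and serves as a baseline; third-party open-source solutions can be plugged in later. The project link is at the end of the post, so help yourselves.&lt;/p&gt;</description>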
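      <content:encoded><![CDATA[<p>The bilingual subtitle app for iOS that I teased earlier is finally done. Without further ado, here is the demo. The current implementation is built on iOS system APIs and serves as a baseline; third-party open-source solutions can be plugged in later. The project link is at the end of the post, so help yourselves.</p>
<h2 id="模块流程">Module Pipeline</h2>

<div class="mermaid">graph LR
    A[Capture system playback audio] --&gt; B[ASR speech recognition]
    B --&gt; C[Translation]
    B --&gt; D[Bilingual subtitles]
    C --&gt; D</div>
<h2 id="效果展示">Demo</h2>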
<iframe width="100%" max-width="400" height="600" src="//player.bilibili.com/player.html?bvid=BV1irXGBtEnU&autoplay=0" frameborder="0" allowfullscreen></iframe>
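<h2 id="一些感想">Some Thoughts</h2>
<p>This is also my first project written entirely with AI coding, and AI coding gives me the illusion that I can do anything. As an aging programmer I have given up resisting: if you can't beat them, join them.</p>
<h3 id="ai-是放大器">AI Is an Amplifier</h3>
<p>In this project the hardest part for me was the UI. As I wrote in <a href="/posts/202601/2026-02-01-%E5%85%B3%E4%BA%8Eaicoding%E7%9A%84%E4%B8%80%E4%BA%9B%E6%84%9F%E6%83%B3%E5%92%8C%E5%90%8E%E7%BB%AD%E7%9A%84%E8%A7%84%E5%88%92/">Some Thoughts on AI Coding and Plans Going Forward</a>, what AI can do is roughly bounded by what its user can do. I recently heard the phrase &quot;AI is an amplifier&quot;, and the metaphor is spot on. Suppose AI amplifies by 10x: in a field you know well, where you score 10, the amplified result is 100; in a field you don't know, such as iOS client development, where I score about 0, the amplified result is still about 0. At that point you are at the mercy of luck: you take whatever the AI gives you, and you can't tell when something is wrong, let alone correct the AI. So all those online stories about building an app with zero experience, shipping it, and earning revenue: are they really possible? Or am I just using it wrong?</p>
<h3 id="狠狠地用起来">Use It Hard</h3>
<p>In a field you don't know, getting usable code out of AI is largely a matter of luck. Still, collaboration goes both ways: AI learns from human code, and humans can learn from AI code. In this project, for the client UI code I was unfamiliar with, I asked the AI to walk me through it, from syntax to architecture to why it was written that way. Through repeated rounds of ask → learn → ask again → learn again, I gradually gained more control over, and better judgment of, AI output in areas I don't know well.</p>
<p>This is the best time ever to turn ideas into reality. In the past you might get stuck on a problem and give up because you couldn't find a useful answer, or fail to ship because you didn't understand some technology. Now you can get the answer you need from a large language model at any time and have it help you realize your idea. Be like Karpathy, who gets anxious about leaving tokens unused: keep talking to AI, asking, discussing, and shipping ideas. Try to embed AI into your own engineering workflow. I haven't figured out how to do that yet, but I believe the direction is right. So use it, and use it hard.</p>
<h2 id="项目链接">Project Link</h2>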
<p><a href="https://github.com/lyapple2008/DoubleSubtitleUseSystemAPI">DoubleSubtitleUseSystemAPI</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>A First Look at the ASR Task</title>
      <link>https://lyapple2008.github.io/posts/202602/2026-02-14-asr%E4%BB%BB%E5%8A%A1%E5%88%9D%E4%BD%93%E9%AA%8C/</link>
      <pubDate>Sat, 14 Feb 2026 17:59:07 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202602/2026-02-14-asr%E4%BB%BB%E5%8A%A1%E5%88%9D%E4%BD%93%E9%AA%8C/</guid>
      <description>&lt;blockquote&gt;
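&lt;p&gt;Another dev log entry. I am building a real-time bilingual subtitle app for iOS; since it needs speech recognition, I surveyed the current state of the ASR task and the mainstream approaches to prepare for choosing a solution later.&lt;/p&gt;</description>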
      <content:encoded><![CDATA[<blockquote>
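<p>Another dev log entry. I am building a real-time bilingual subtitle app for iOS; since it needs speech recognition, I surveyed the current state of the ASR task and the mainstream approaches to prepare for choosing a solution later.</p></blockquote>
<h3 id="模块流程">Module Pipeline</h3>

<div class="mermaid">graph LR
    A[Capture system playback audio] --&gt; B[ASR speech recognition]
    B --&gt; C[Translation]
    B --&gt; D[Bilingual subtitles]
    C --&gt; D</div>
<h1 id="什么是asr任务">What Is the ASR Task</h1>
<p>ASR (Automatic Speech Recognition) is the task of converting a continuous audio signal into the corresponding text sequence. It is a typical sequence-to-sequence conversion task:</p>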
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">audio (continuous signal)  →  text (discrete tokens)</span></span></code></pre></td></tr></table>
</div>
</div>
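<p>From a signal-processing perspective, traditional ASR has to solve the following core problems:</p>
<ol>
<li><strong>Acoustic modeling</strong>: mapping audio features to phonemes or characters</li>
<li><strong>Language modeling</strong>: capturing the probabilistic relationships between words</li>
<li><strong>Alignment</strong>: aligning audio frames with output tokens</li>
</ol>
<p>In modern end-to-end deep-learning approaches, however, all three problems are handled inside a single neural network, with no separate acoustic and language models. Through end-to-end training the model learns to map audio features directly to text output.</p>
<h1 id="与降噪任务的区别">How It Differs from Noise Reduction</h1>
<p>I had worked on noise-reduction tasks before starting the iOS bilingual subtitle project, and the two are fundamentally different:</p>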
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Noise reduction</th>
          <th>ASR</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Input / output</strong></td>
          <td>Audio → audio</td>
          <td>Audio → text</td>
      </tr>
      <tr>
          <td><strong>Task type</strong></td>
          <td>Signal regression</td>
          <td>Sequence-to-sequence</td>
      </tr>
      <tr>
          <td><strong>Metrics</strong></td>
          <td>SNR, PESQ</td>
          <td>WER, CER</td>
      </tr>
      <tr>
          <td><strong>Main difficulty</strong></td>
          <td>Preserving speech quality</td>
          <td>Recognition accuracy</td>
      </tr>
      <tr>
          <td><strong>Model structure</strong></td>
          <td>Encoder</td>
          <td>Encoder-Decoder</td>
      </tr>
  </tbody>
</table>
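<p>In short:</p>
<ul>
<li><strong>Noise reduction</strong>: noisy audio in, clean audio out (same-modality conversion)</li>
<li><strong>ASR</strong>: audio in, text out (cross-modal conversion)</li>
</ul>
<p>The difficulty of ASR is that it has to &quot;understand&quot; the audio content and turn it into semantic symbols, rather than simply process a waveform. In addition, unlike noise reduction, the input and output of ASR are not in one-to-one correspondence, so alignment has to be handled as well.</p>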
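<p>The comparison table lists WER (word error rate) as an ASR metric. As a reminder of what it measures, here is a minimal Swift sketch that computes it as a word-level edit distance divided by the reference length. The function name <code>wordErrorRate</code> and the example sentences are illustrative assumptions, not part of any library; CER follows the same idea on characters instead of words.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Word error rate: (substitutions + deletions + insertions) / reference length,
/// computed with a word-level Levenshtein distance.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.split(separator: " ")
    let hyp = hypothesis.split(separator: " ")
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    // distance[j] is the edit distance between the first i reference words
    // and the first j hypothesis words (one row of the DP table at a time).
    var distance = Array(0...hyp.count)
    for i in 1...ref.count {
        var diagonal = distance[0]  // value from the previous row, column j - 1
        distance[0] = i
        if hyp.isEmpty { continue }
        for j in 1...hyp.count {
            let substitution = diagonal + (ref[i - 1] == hyp[j - 1] ? 0 : 1)
            diagonal = distance[j]
            distance[j] = min(substitution, distance[j] + 1, distance[j - 1] + 1)
        }
    }
    return Double(distance[hyp.count]) / Double(ref.count)
}

// One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
let wer = wordErrorRate(reference: "the cat sat on the mat",
                        hypothesis: "the cat sit on mat")
print(wer)
</code></pre></div>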
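<h1 id="目前主流的实现方案">Mainstream Approaches Today</h1>
<p>There are three mainstream approaches to ASR: CTC, RNN-T, and attention-based Seq2Seq.</p>
<h2 id="ctc-connectionist-temporal-classification">CTC (Connectionist Temporal Classification)</h2>
<p>CTC is a classic alignment method. Its core idea is that <strong>no explicit alignment labels are needed</strong>; alignment is learned automatically through a &quot;blank&quot; mechanism.</p>
<p>CTC introduces a blank symbol and a collapsing rule:</p>
<ul>
<li>Consecutive repeated characters are collapsed (e.g. &ldquo;aaabbb&rdquo; → &ldquo;ab&rdquo;)</li>
<li>The blank symbol produces no output</li>
<li>The probability of all possible paths is computed with dynamic programming</li>
</ul>
<p>For the detailed theory, see <a href="https://distill.pub/2017/ctc/">Sequence Modeling With CTC</a>.</p>
<p>Training maximizes the probability of the paths that yield the target token sequence, while inference searches for the most probable token path and outputs it. Two methods are commonly used:</p>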
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Greedy Search</strong></td>
          <td>Output the most probable token at every step</td>
      </tr>
      <tr>
          <td><strong>Beam Search</strong></td>
          <td>Keep the top-N candidates at every step and output the token sequence whose path probability is highest</td>
      </tr>
  </tbody>
</table>
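<p>Greedy search combined with the collapsing rule can be sketched in a few lines of Swift. This is a toy illustration under stated assumptions: the per-frame probabilities, the three-symbol vocabulary, and the name <code>ctcGreedyDecode</code> are made up for the example, and a real model would emit probabilities over a much larger vocabulary.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Greedy CTC decoding: take the most probable token in every frame,
/// merge consecutive repeats, then drop the blank symbol.
/// frames[t][k] is the (hypothetical) probability of token k at frame t.
func ctcGreedyDecode(frames: [[Double]], vocabulary: [String], blankIndex: Int = 0) -> String {
    var output = ""
    var previous = -1  // index chosen in the previous frame
    for probabilities in frames {
        // argmax over the vocabulary for this frame
        var best = 0
        for (index, p) in probabilities.enumerated() where p > probabilities[best] {
            best = index
        }
        // emit only when the token changes and is not blank
        if best != previous, best != blankIndex {
            output += vocabulary[best]
        }
        previous = best
    }
    return output
}

// Toy example: vocabulary [blank, "a", "b"], six frames.
let vocabulary = ["", "a", "b"]
let frames: [[Double]] = [
    [0.1, 0.8, 0.1],    // a
    [0.1, 0.7, 0.2],    // a (repeat, merged)
    [0.9, 0.05, 0.05],  // blank
    [0.2, 0.7, 0.1],    // a (a new "a" because a blank came in between)
    [0.1, 0.2, 0.7],    // b
    [0.1, 0.1, 0.8],    // b (repeat, merged)
]
print(ctcGreedyDecode(frames: frames, vocabulary: vocabulary))  // prints "aab"
</code></pre></div>
<p>The blank in the third frame is what lets the decoder output two separate &quot;a&quot; tokens; without it the repeats would be merged into one.</p>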
<h2 id="rnn-t-recurrent-neural-network-transducer">RNN-T (Recurrent Neural Network Transducer)</h2>
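<p>RNN-T extends CTC by adding a prediction network that models the dependencies between output tokens.</p>
<p>RNN-T has three parts:</p>
<ol>
<li><strong>Encoder</strong>: converts audio features into acoustic representations</li>
<li><strong>Prediction Network</strong>: predicts the next token from the tokens emitted so far</li>
<li><strong>Joint Network</strong>: combines the Encoder and Prediction Network outputs to predict the next token</li>
</ol>
<p>When producing output, RNN-T takes as input not only the current audio frame but also the previously emitted tokens as reference.</p>
<h2 id="seq2seq-with-attention">Seq2Seq with Attention</h2>
<p>Attention-based sequence-to-sequence models are currently the most popular approach and are widely used in modern ASR systems such as Whisper and Paraformer.</p>
<h3 id="原理">How It Works</h3>
<p>Classic structure:</p>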
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Encoder → Attention → Decoder → Output</span></span></code></pre></td></tr></table>
</div>
</div>
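<ul>
<li><strong>Encoder</strong>: encodes audio features into high-dimensional representations (usually a Transformer or Conformer)</li>
<li><strong>Attention</strong>: lets the Decoder &quot;attend&quot; to different parts of the input when generating each token</li>
<li><strong>Decoder</strong>: generates the output text autoregressively</li>
</ul>
<h3 id="优缺点">Pros and Cons</h3>
<p><strong>Pros</strong>:</p>
<ul>
<li>Can model dependencies across sequences of arbitrary length</li>
<li>High recognition accuracy</li>
<li>Easy to integrate a language model</li>
<li>Well suited to large-scale pretraining</li>
</ul>
<p><strong>Cons</strong>:</p>
<ul>
<li>High inference latency (needs the complete audio or a large chunk)</li>
<li>Streaming recognition is complex to implement</li>
<li>High compute requirements</li>
</ul>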
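<h1 id="流式模型和非流式模型">Streaming and Non-Streaming Models</h1>
<p>Streaming and non-streaming (offline) are two design modes for ASR systems, distinguished by their real-time requirements.</p>
<h2 id="什么是非流式模型">What Is a Non-Streaming Model</h2>
<p>A non-streaming model (offline ASR) must <strong>wait for the complete audio input before it can start recognizing</strong>.</p>
<p>Characteristics:</p>
<ul>
<li>Input: a complete audio file or a long audio segment</li>
<li>Latency: high, since it has to wait for the audio to end</li>
<li>Accuracy: usually higher, because the full context is visible</li>
<li>Use cases: video subtitle generation, meeting transcription, processing recorded files</li>
</ul>
<h2 id="什么是流式模型">What Is a Streaming Model</h2>
<p>A streaming model (streaming ASR) can <strong>output recognition results while audio is still arriving</strong>, which enables real-time recognition.</p>
<p>Characteristics:</p>
<ul>
<li>Input: a continuous audio stream (usually short segments, e.g. 30 ms frames)</li>
<li>Latency: low, with output within a few hundred milliseconds</li>
<li>Accuracy: usually slightly lower than non-streaming, because only past context and a limited amount of future context are available</li>
<li>Use cases: real-time voice conversation, voice assistants, live-stream subtitles</li>
</ul>
<h2 id="技术实现差异">Differences in Technical Implementation</h2>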
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Streaming model</th>
          <th>Non-streaming model</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Context</strong></td>
          <td>Limited context (lookahead)</td>
          <td>Full context</td>
      </tr>
      <tr>
          <td><strong>Latency</strong></td>
          <td>Low (&lt;500ms)</td>
          <td>High</td>
      </tr>
      <tr>
          <td><strong>Accuracy</strong></td>
          <td>Slightly lower</td>
          <td>Higher</td>
      </tr>
      <tr>
          <td><strong>Model complexity</strong></td>
          <td>Higher (must handle segmentation)</td>
          <td>Lower</td>
      </tr>
      <tr>
          <td><strong>Memory footprint</strong></td>
          <td>Smaller</td>
          <td>Larger</td>
      </tr>
  </tbody>
</table>
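<p>Streaming models usually need special design, such as:</p>
<ul>
<li><strong>Chunked Attention</strong>: process the audio in small chunks</li>
<li><strong>CTC Prefix</strong>: use CTC prefix decoding</li>
<li><strong>Lookahead</strong>: consider only a limited number of future frames</li>
</ul>
<hr>
<h1 id="自回归解码和非自回归解码">Autoregressive and Non-Autoregressive Decoding</h1>
<p>The decoding method determines how the model generates its output text.</p>
<h2 id="自回归解码-autoregressive-decoding">Autoregressive Decoding</h2>
<p>Autoregressive decoding is currently the most common approach. It <strong>generates token by token, and each token depends on all previously generated tokens</strong>.</p>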
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Output: &#34;hello world&#34;
</span></span><span class="line"><span class="cl">Generation process:
</span></span><span class="line"><span class="cl">  1. Generate &#34;h&#34;
</span></span><span class="line"><span class="cl">  2. Given &#34;h&#34;, generate &#34;he&#34;
</span></span><span class="line"><span class="cl">  3. Given &#34;he&#34;, generate &#34;hel&#34;
</span></span><span class="line"><span class="cl">  4. ...</span></span></code></pre></td></tr></table>
</div>
</div>
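<p>Characteristics:</p>
<ul>
<li><strong>Pros</strong>: high output quality; can model long-range dependencies</li>
<li><strong>Cons</strong>: serial generation, slow inference (O(n) sequential decoding steps, where n is the output length)</li>
<li>Typical models: Transformer Decoder, RNN-T</li>
</ul>
<h2 id="非自回归解码-non-autoregressive-decoding">Non-Autoregressive Decoding</h2>
<p>Non-autoregressive decoding is a <strong>parallel generation</strong> method that outputs the entire sequence at once.</p>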
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Output: &#34;hello world&#34;
</span></span><span class="line"><span class="cl">Generation process:
</span></span><span class="line"><span class="cl">  1. Emit the complete sentence &#34;hello world&#34; directly</span></span></code></pre></td></tr></table>
</div>
</div>
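<p>Characteristics:</p>
<ul>
<li><strong>Pros</strong>: parallel generation, fast inference (O(1) decoding passes, independent of output length)</li>
<li><strong>Cons</strong>: hard to model dependencies between output tokens, so output quality may be lower</li>
<li>Typical implementations: CTC, FastCorrect, Mask-Predict</li>
</ul>
<h2 id="对比">Comparison</h2>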
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Autoregressive decoding</th>
          <th>Non-autoregressive decoding</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Generation</strong></td>
          <td>Serial, token by token</td>
          <td>Parallel, all at once</td>
      </tr>
      <tr>
          <td><strong>Inference speed</strong></td>
          <td>Slow</td>
          <td>Fast</td>
      </tr>
      <tr>
          <td><strong>Output quality</strong></td>
          <td>High</td>
          <td>Slightly lower</td>
      </tr>
      <tr>
          <td><strong>Dependencies</strong></td>
          <td>Models dependencies between tokens</td>
          <td>Assumes conditional independence</td>
      </tr>
      <tr>
          <td><strong>Typical models</strong></td>
          <td>RNN-T, Seq2Seq</td>
          <td>CTC, Mask-Predict</td>
      </tr>
  </tbody>
</table>
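<h1 id="方案选择">Choosing an Approach</h1>
<p>OK, everything covered so far is preparation for choosing a speech recognition solution for the minimum viable prototype. The goal is not to train a SOTA model, so this is only a shallow survey for now.</p>
<p>Based on the requirements:</p>
<ul>
<li><strong>Target device</strong>: iOS mobile (limited resources)</li>
<li><strong>Language support</strong>: multilingual</li>
<li><strong>Real-time</strong>: audio in and text out in real time</li>
</ul>
<h2 id="选择优先级">Selection Priorities</h2>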
<ol>
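<li><strong>Make it run first</strong>: model size and compute must be within what a mobile device can handle</li>
<li><strong>Then multilingual</strong>: it must support recognition in multiple languages</li>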
<li><strong>Real-time performance and accuracy last</strong>: tune latency and accuracy once the feature works</li>
</ol>
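<h2 id="优先级一模型能在移动端跑起来">Priority 1: The Model Must Run on Mobile</h2>
<h3 id="参数量选择">Choosing the Parameter Count</h3>
<p>Mobile resources are limited, and the parameter count directly determines whether a model can run at all. I don't have a hard numeric limit yet; I'll see how it behaves in practice, and quantization can shrink the model later if needed.</p>
<p><strong>Quantization</strong>: relative to FP32 weights, INT8 quantization shrinks the model to about 1/4 of its size, usually with an accuracy loss below 5%; INT4 shrinks it further to about 1/8, but with a more noticeable accuracy drop.</p>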
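<p>The size ratios follow directly from the number of bits stored per weight. A small Swift sketch of the arithmetic, assuming a hypothetical 50M-parameter model and ignoring metadata and any layers kept in higher precision (the helper <code>modelSizeMB</code> is made up for illustration):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Approximate weight storage in megabytes (1 MB = 1,000,000 bytes).
/// Purely illustrative arithmetic, not a measurement of any real model.
func modelSizeMB(parameters: Int, bitsPerWeight: Int) -> Double {
    let bytes = Double(parameters) * Double(bitsPerWeight) / 8.0
    return bytes / 1_000_000.0
}

let parameters = 50_000_000  // hypothetical 50M-parameter model
for (name, bits) in [("FP32", 32), ("INT8", 8), ("INT4", 4)] {
    print(name, modelSizeMB(parameters: parameters, bitsPerWeight: bits), "MB")
}
// FP32 200.0 MB
// INT8 50.0 MB
// INT4 25.0 MB
</code></pre></div>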
<h3 id="架构选择">Choosing the Architecture</h3>
<table>
  <thead>
      <tr>
          <th>Architecture</th>
          <th>Compute</th>
          <th>Memory footprint</th>
          <th>Mobile suitability</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Transformer Encoder</strong></td>
          <td>Medium</td>
          <td>Medium</td>
          <td>★★★★☆</td>
      </tr>
      <tr>
          <td><strong>Conformer</strong></td>
          <td>Medium-high</td>
          <td>Medium</td>
          <td>★★★★☆</td>
      </tr>
      <tr>
          <td><strong>RNN/LSTM</strong></td>
          <td>Low</td>
          <td>Low</td>
          <td>★★★☆☆</td>
      </tr>
  </tbody>
</table>
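<p><strong>Recommendation</strong>: choose an encoder-only or lightweight Conformer structure and, as a rough starting point, keep the parameter count under 50M.</p>
<h2 id="优先级二支持多语言">Priority 2: Multilingual Support</h2>
<p>Once the model is confirmed to run on mobile, multilingual capability comes next.</p>
<h3 id="不同模型架构的多语言能力">Multilingual Capability of Different Architectures</h3>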
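<table>
  <thead>
      <tr>
          <th>Model architecture</th>
          <th>Multilingual support</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>End-to-end Seq2Seq</strong></td>
          <td>★★★★★</td>
          <td>Trained on multilingual data, so multilingual support comes naturally</td>
      </tr>
      <tr>
          <td><strong>RNN-T</strong></td>
          <td>★★★☆☆</td>
          <td>Possible, but needs a purpose-trained multilingual version</td>
      </tr>
      <tr>
          <td><strong>CTC</strong></td>
          <td>★★☆☆☆</td>
          <td>Usually targets a single language; multilingual versions are less common</td>
      </tr>
  </tbody>
</table>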
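<h3 id="结论">Conclusion</h3>
<p>When multilingual support is required, <strong>prefer an end-to-end Seq2Seq architecture</strong>; pretrained models of this kind usually already support dozens to over a hundred languages.</p>
<h2 id="优先级三实时性要求">Priority 3: Real-Time Requirements</h2>
<p>Real-time subtitles require the model to output text while it is still receiving audio.</p>
<h3 id="流式-vs-非流式">Streaming vs Non-Streaming</h3>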
<table>
  <thead>
      <tr>
          <th>Type</th>
          <th>Latency</th>
          <th>Implementation difficulty</th>
          <th>Suitability for real-time subtitles</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Streaming model</strong></td>
          <td>&lt;500ms</td>
          <td>High</td>
          <td>★★★★★</td>
      </tr>
      <tr>
          <td><strong>Non-streaming model</strong></td>
          <td>&gt;1s</td>
          <td>Low</td>
          <td>★★☆☆☆</td>
      </tr>
  </tbody>
</table>
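<p><strong>Compromise</strong>: with a non-streaming model, a &quot;chunked processing&quot; strategy can approximate streaming behavior:</p>
<ul>
<li>Split the audio into fixed-length chunks (e.g. 1 second)</li>
<li>Recognize chunk by chunk and concatenate the results</li>
<li>Cache historical context to reduce errors</li>
</ul>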
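<p>The chunking step can be sketched as follows. This is a minimal illustration, not the implementation used in the app: the function name <code>makeChunks</code>, the 16 kHz sample rate, and the 200 ms of cached history prepended to each chunk are all assumptions chosen for the example.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Split a mono sample stream into fixed-length chunks, each prefixed with a
/// short window of earlier samples so the recognizer sees some left context.
func makeChunks(samples: [Float], sampleRate: Int = 16_000,
                chunkMilliseconds: Int = 1_000, historyMilliseconds: Int = 200) -> [[Float]] {
    let chunkSize = sampleRate * chunkMilliseconds / 1_000
    let historySize = sampleRate * historyMilliseconds / 1_000
    guard chunkSize > 0 else { return [] }
    var chunks: [[Float]] = []
    // Visit chunk start offsets 0, chunkSize, 2 * chunkSize, ...
    for start in stride(from: 0, to: samples.count, by: chunkSize) {
        let contextStart = max(0, start - historySize)   // cached history
        let end = min(samples.count, start + chunkSize)  // last chunk may be short
        chunks.append(Array(samples[contextStart...(end - 1)]))
    }
    return chunks
}

let samples = [Float](repeating: 0, count: 40_000)  // 2.5 s of silence at 16 kHz
let chunks = makeChunks(samples: samples)
print(chunks.map { $0.count })  // prints "[16000, 19200, 11200]"
</code></pre></div>
<p>Because each chunk after the first overlaps the previous one, the recognized text for the overlapping region has to be deduplicated when the results are concatenated.</p>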
<h3 id="解码方式">Decoding Method</h3>
<table>
  <thead>
      <tr>
          <th>Decoding method</th>
          <th>Latency</th>
          <th>Implementation complexity</th>
          <th>Real-time suitability</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Non-autoregressive (greedy search)</strong></td>
          <td>Low</td>
          <td>Simple</td>
          <td>★★★★★</td>
      </tr>
      <tr>
          <td><strong>Non-autoregressive (beam search)</strong></td>
          <td>Medium</td>
          <td>Medium</td>
          <td>★★★★☆</td>
      </tr>
      <tr>
          <td><strong>Autoregressive</strong></td>
          <td>High</td>
          <td>Complex</td>
          <td>★★☆☆☆</td>
      </tr>
  </tbody>
</table>
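<p><strong>Recommendation</strong>: for real-time use, prefer <strong>non-autoregressive decoding</strong> with greedy search, which has the lowest latency.</p>
<h2 id="总结">Summary</h2>
<p><strong>Core idea</strong>: validate the end-to-end pipeline first, then optimize based on actual experience. Mobile ASR is an iterative process; there is no need to get everything right in one step.</p>
<p>I'll document the concrete iOS implementation in follow-up posts. Don't forget to like the post and follow along.</p>
<p><img alt="Don't forget to like, comment, and share" loading="lazy" src="/images/%E4%B8%80%E9%94%AE%E4%B8%89%E8%BF%9E.jpg"></p>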
]]></content:encoded>
    </item>
    <item>
      <title>iOS Audio Capture</title>
      <link>https://lyapple2008.github.io/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/</link>
      <pubDate>Sun, 25 Jan 2026 07:52:25 +0800</pubDate>
      <guid>https://lyapple2008.github.io/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/</guid>
      <description>&lt;blockquote&gt;
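&lt;p&gt;This is the dev log for an iOS bilingual subtitle app. The goal is to recognize and translate what is being played in real time while you watch a video on iOS and to show bilingual subtitles, lowering the barrier to watching foreign-language videos.&lt;/p&gt;</description>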
      <content:encoded><![CDATA[<blockquote>
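<p>This is the dev log for an iOS bilingual subtitle app. The goal is to recognize and translate what is being played in real time while you watch a video on iOS and to show bilingual subtitles, lowering the barrier to watching foreign-language videos.</p></blockquote>
<h3 id="模块流程">Module Pipeline</h3>

<div class="mermaid">graph LR
    A[Capture system playback audio] --&gt; B[ASR speech recognition]
    B --&gt; C[Translation]
    B --&gt; D[Bilingual subtitles]
    C --&gt; D</div>
<h1 id="ios音频捕获与数据共享">iOS Audio Capture and Data Sharing</h1>
<p>This post describes how to capture system audio on iOS: a Broadcast Upload Extension captures the audio the system is playing and shares the data with the main app through an App Group.</p>
<h2 id="目录">Contents</h2>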
<ul>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e4%b8%80%e7%b3%bb%e7%bb%9f%e9%9f%b3%e9%a2%91%e6%8d%95%e8%8e%b7">1. System Audio Capture</a>
<ul>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#broadcast-upload-extension%e9%85%8d%e7%bd%ae">Broadcast Upload Extension Configuration</a></li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e5%90%af%e5%8a%a8%e4%b8%8e%e5%85%b3%e9%97%ad">Starting and Stopping</a></li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e9%9f%b3%e9%a2%91%e6%a0%bc%e5%bc%8f%e8%bd%ac%e6%8d%a2">Audio Format Conversion</a></li>
</ul>
</li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e4%ba%8cextension%e4%b8%8e%e4%b8%bb%e5%ba%94%e7%94%a8%e6%95%b0%e6%8d%ae%e5%85%b1%e4%ba%ab">2. Sharing Data Between the Extension and the Main App</a>
<ul>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#app-group%e9%85%8d%e7%bd%ae">App Group Configuration</a></li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#%e6%95%b0%e6%8d%ae%e8%af%bb%e5%86%99%e5%ae%9e%e7%8e%b0">Reading and Writing Data</a></li>
<li><a href="/posts/202601/2026-01-25-ios%E9%9F%B3%E9%A2%91%E6%8D%95%E8%8E%B7/#darwin%e9%80%9a%e7%9f%a5">Darwin Notifications</a></li>
</ul>
</li>
</ul>
<hr>
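<h2 id="一系统音频捕获">1. System Audio Capture</h2>
<p>For security and privacy reasons, iOS <strong>does not allow an app to capture system audio directly</strong> (video playback, music, and so on). You have to use a Broadcast Upload Extension and obtain the audio data through screen recording. Note that audio played by apps running in call mode cannot be captured even this way.</p>
<h3 id="broadcast-upload-extension配置">Broadcast Upload Extension Configuration</h3>
<p>For the Extension to receive data from ReplayKit, all of the following must hold:</p>
<ol>
<li>The project contains a Broadcast Upload Extension target</li>
<li>The main app starts the broadcast through the system UI</li>
<li>The Extension's Info.plist, Capabilities, and class inheritance are all correct</li>
</ol>
<p><strong>Steps:</strong></p>
<ol>
<li>
<p>In Xcode, when creating a new Extension, choose the Broadcast Upload Extension type</p>
</li>
<li>
<p>Configure Info.plist:</p>
</li>
</ol>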
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-xml" data-lang="xml"><span class="line"><span class="cl"><span class="nt">&lt;key&gt;</span>NSExtension<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;dict&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;key&gt;</span>NSExtensionPointIdentifier<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;string&gt;</span>com.apple.broadcast-services-upload<span class="nt">&lt;/string&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;key&gt;</span>NSExtensionPrincipalClass<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;string&gt;</span>$(PRODUCT_MODULE_NAME).SampleHandler<span class="nt">&lt;/string&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;/dict&gt;</span></span></span></code></pre></td></tr></table>
</div>
</div>
<ol start="3">
<li>Configure UIBackgroundModes in the main app:</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-xml" data-lang="xml"><span class="line"><span class="cl"><span class="nt">&lt;key&gt;</span>UIBackgroundModes<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;array&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;string&gt;</span>audio<span class="nt">&lt;/string&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;/array&gt;</span></span></span></code></pre></td></tr></table>
</div>
</div>
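<p><strong>Note</strong>: without the audio background mode, audio may not be received, or the Extension may be suspended once the screen locks.</p>
<h3 id="启动与关闭">Starting and Stopping</h3>
<p>A Broadcast Upload Extension cannot be started directly from code; it can only be triggered through the system UI. Stopping is similarly restricted: only the Extension itself can call <code>finishBroadcastWithError</code>, so the main app can only control shutdown &quot;indirectly&quot;.</p>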
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">let</span> <span class="nv">picker</span> <span class="p">=</span> <span class="n">RPSystemBroadcastPickerView</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">frame</span><span class="p">:</span> <span class="n">CGRect</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="n">width</span><span class="p">:</span> <span class="mi">44</span><span class="p">,</span> <span class="n">height</span><span class="p">:</span> <span class="mi">44</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">picker</span><span class="p">.</span><span class="n">preferredExtension</span> <span class="p">=</span> <span class="s">&#34;com.xxx.broadcast&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">picker</span><span class="p">.</span><span class="n">showsMicrophoneButton</span> <span class="p">=</span> <span class="kc">true</span>
</span></span><span class="line"><span class="cl"><span class="n">view</span><span class="p">.</span><span class="n">addSubview</span><span class="p">(</span><span class="n">picker</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div>
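<h3 id="音频格式转换">Audio Format Conversion</h3>
<p>The speech recognition engine expects 16 kHz mono audio, so the format has to be converted first. Note that no official documentation specifies the format of the data delivered by the ReplayKit callback, so the code has to handle a variety of formats.</p>
<p><strong>Format detection and extraction:</strong></p>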
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kr">override</span> <span class="kd">func</span> <span class="nf">processSampleBuffer</span><span class="p">(</span><span class="kc">_</span> <span class="n">sampleBuffer</span><span class="p">:</span> <span class="n">CMSampleBuffer</span><span class="p">,</span> <span class="n">with</span> <span class="n">sampleBufferType</span><span class="p">:</span> <span class="n">RPSampleBufferType</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="k">case</span> <span class="p">.</span><span class="n">audioApp</span> <span class="p">=</span> <span class="n">sampleBufferType</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="kd">let</span> <span class="nv">formatDescription</span> <span class="p">=</span> <span class="n">CMSampleBufferGetFormatDescription</span><span class="p">(</span><span class="n">sampleBuffer</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">          <span class="kd">let</span> <span class="nv">streamDesc</span> <span class="p">=</span> <span class="n">CMAudioFormatDescriptionGetStreamBasicDescription</span><span class="p">(</span><span class="n">formatDescription</span><span class="p">)?.</span><span class="n">pointee</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">inputSampleRate</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mSampleRate</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">channelCount</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">streamDesc</span><span class="p">.</span><span class="n">mChannelsPerFrame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">bitsPerChannel</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mBitsPerChannel</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">formatFlags</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mFormatFlags</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isFloat</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsFloat</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isNonInterleaved</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsNonInterleaved</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isBigEndian</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsBigEndian</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// Extract the audio data...</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
<p><strong>Complete audio processing implementation:</strong></p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">private</span> <span class="kd">let</span> <span class="nv">targetSampleRate</span><span class="p">:</span> <span class="nb">Double</span> <span class="p">=</span> <span class="mf">16000.0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">/// Process an audio sample buffer</span>
</span></span><span class="line"><span class="cl"><span class="kd">private</span> <span class="kd">func</span> <span class="nf">processAudioBuffer</span><span class="p">(</span><span class="kc">_</span> <span class="n">sampleBuffer</span><span class="p">:</span> <span class="n">CMSampleBuffer</span><span class="p">,</span> <span class="n">source</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="kd">let</span> <span class="nv">formatDescription</span> <span class="p">=</span> <span class="n">CMSampleBufferGetFormatDescription</span><span class="p">(</span><span class="n">sampleBuffer</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">          <span class="kd">let</span> <span class="nv">streamDesc</span> <span class="p">=</span> <span class="n">CMAudioFormatDescriptionGetStreamBasicDescription</span><span class="p">(</span><span class="n">formatDescription</span><span class="p">)?.</span><span class="n">pointee</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">inputSampleRate</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mSampleRate</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">channelCount</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">streamDesc</span><span class="p">.</span><span class="n">mChannelsPerFrame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">bitsPerChannel</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mBitsPerChannel</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">formatFlags</span> <span class="p">=</span> <span class="n">streamDesc</span><span class="p">.</span><span class="n">mFormatFlags</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isFloat</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsFloat</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isNonInterleaved</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsNonInterleaved</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">isBigEndian</span> <span class="p">=</span> <span class="p">(</span><span class="n">formatFlags</span> <span class="o">&amp;</span> <span class="n">kAudioFormatFlagIsBigEndian</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// Fetch the AudioBufferList. A non-interleaved stream carries one AudioBuffer</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// per channel, so reserve room for every channel up front.</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">bufferCapacity</span> <span class="p">=</span> <span class="n">isNonInterleaved</span> <span class="p">?</span> <span class="bp">max</span><span class="p">(</span><span class="n">channelCount</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="p">:</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">audioBufferListPointer</span> <span class="p">=</span> <span class="n">AudioBufferList</span><span class="p">.</span><span class="n">allocate</span><span class="p">(</span><span class="n">maximumBuffers</span><span class="p">:</span> <span class="n">bufferCapacity</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="p">{</span> <span class="n">free</span><span class="p">(</span><span class="n">audioBufferListPointer</span><span class="p">.</span><span class="n">unsafeMutablePointer</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="kd">var</span> <span class="nv">blockBuffer</span><span class="p">:</span> <span class="n">CMBlockBuffer</span><span class="p">?</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">status</span> <span class="p">=</span> <span class="n">CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">sampleBuffer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">bufferListSizeNeededOut</span><span class="p">:</span> <span class="kc">nil</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">bufferListOut</span><span class="p">:</span> <span class="n">audioBufferListPointer</span><span class="p">.</span><span class="n">unsafeMutablePointer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">bufferListSize</span><span class="p">:</span> <span class="n">AudioBufferList</span><span class="p">.</span><span class="n">sizeInBytes</span><span class="p">(</span><span class="n">maximumBuffers</span><span class="p">:</span> <span class="n">bufferCapacity</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">blockBufferAllocator</span><span class="p">:</span> <span class="kc">nil</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">blockBufferMemoryAllocator</span><span class="p">:</span> <span class="kc">nil</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">flags</span><span class="p">:</span> <span class="n">kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">blockBufferOut</span><span class="p">:</span> <span class="p">&amp;</span><span class="n">blockBuffer</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="n">status</span> <span class="p">==</span> <span class="n">noErr</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// All reads below go through audioBufferListPointer, the list that the</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// call above filled in. It is allocated with malloc, so the defer above</span>
</span></span><span class="line"><span class="cl">    <span class="c1">// releases it with free().</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">numBuffers</span> <span class="p">=</span> <span class="n">audioBufferListPointer</span><span class="p">.</span><span class="bp">count</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">frameCount</span> <span class="p">=</span> <span class="n">CMSampleBufferGetNumSamples</span><span class="p">(</span><span class="n">sampleBuffer</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kd">var</span> <span class="nv">floatSamples</span><span class="p">:</span> <span class="p">[</span><span class="nb">Float</span><span class="p">]</span> <span class="p">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// Handle the non-interleaved format</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">isNonInterleaved</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="kd">var</span> <span class="nv">channelData</span><span class="p">:</span> <span class="p">[[</span><span class="nb">Float</span><span class="p">]]</span> <span class="p">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">bufferIndex</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="n">numBuffers</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">buffer</span> <span class="p">=</span> <span class="n">audioBufferListPointer</span><span class="p">[</span><span class="n">bufferIndex</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">            <span class="k">guard</span> <span class="kd">let</span> <span class="nv">data</span> <span class="p">=</span> <span class="n">buffer</span><span class="p">.</span><span class="n">mData</span> <span class="k">else</span> <span class="p">{</span> <span class="k">continue</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">dataByteSize</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">buffer</span><span class="p">.</span><span class="n">mDataByteSize</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="kd">var</span> <span class="nv">channelSamples</span><span class="p">:</span> <span class="p">[</span><span class="nb">Float</span><span class="p">]</span> <span class="p">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">isFloat</span> <span class="o">&amp;&amp;</span> <span class="n">bitsPerChannel</span> <span class="p">==</span> <span class="mi">32</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="kd">let</span> <span class="nv">floatPtr</span> <span class="p">=</span> <span class="n">data</span><span class="p">.</span><span class="n">assumingMemoryBound</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="nb">Float</span><span class="p">.</span><span class="kc">self</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="kd">let</span> <span class="nv">count</span> <span class="p">=</span> <span class="n">dataByteSize</span> <span class="o">/</span> <span class="n">MemoryLayout</span><span class="p">&lt;</span><span class="nb">Float</span><span class="p">&gt;.</span><span class="n">size</span>
</span></span><span class="line"><span class="cl">                <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="bp">count</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kd">var</span> <span class="nv">value</span> <span class="p">=</span> <span class="n">floatPtr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                    <span class="k">if</span> <span class="n">isBigEndian</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                        <span class="n">value</span> <span class="p">=</span> <span class="nb">Float</span><span class="p">(</span><span class="n">bitPattern</span><span class="p">:</span> <span class="n">value</span><span class="p">.</span><span class="n">bitPattern</span><span class="p">.</span><span class="n">bigEndian</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="cl">                    <span class="n">channelSamples</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="n">bitsPerChannel</span> <span class="p">==</span> <span class="mi">16</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="kd">let</span> <span class="nv">int16Ptr</span> <span class="p">=</span> <span class="n">data</span><span class="p">.</span><span class="n">assumingMemoryBound</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="nb">Int16</span><span class="p">.</span><span class="kc">self</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="kd">let</span> <span class="nv">count</span> <span class="p">=</span> <span class="n">dataByteSize</span> <span class="o">/</span> <span class="n">MemoryLayout</span><span class="p">&lt;</span><span class="nb">Int16</span><span class="p">&gt;.</span><span class="n">size</span>
</span></span><span class="line"><span class="cl">                <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="bp">count</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kd">var</span> <span class="nv">value</span> <span class="p">=</span> <span class="n">int16Ptr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                    <span class="k">if</span> <span class="n">isBigEndian</span> <span class="p">{</span> <span class="n">value</span> <span class="p">=</span> <span class="n">value</span><span class="p">.</span><span class="n">bigEndian</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">                    <span class="n">channelSamples</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="nb">Float</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="o">/</span> <span class="mf">32768.0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="n">channelData</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">channelSamples</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1">// Downmix to mono</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="kd">let</span> <span class="nv">firstChannel</span> <span class="p">=</span> <span class="n">channelData</span><span class="p">.</span><span class="bp">first</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">channelData</span><span class="p">.</span><span class="bp">count</span> <span class="p">==</span> <span class="mi">1</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="n">floatSamples</span> <span class="p">=</span> <span class="n">firstChannel</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="n">firstChannel</span><span class="p">.</span><span class="bp">count</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kd">var</span> <span class="nv">sum</span><span class="p">:</span> <span class="nb">Float</span> <span class="p">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">                    <span class="k">for</span> <span class="n">ch</span> <span class="k">in</span> <span class="n">channelData</span> <span class="k">where</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">ch</span><span class="p">.</span><span class="bp">count</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                        <span class="n">sum</span> <span class="o">+=</span> <span class="n">ch</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="cl">                    <span class="n">floatSamples</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">sum</span> <span class="o">/</span> <span class="nb">Float</span><span class="p">(</span><span class="n">channelData</span><span class="p">.</span><span class="bp">count</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="c1">// Interleaved format handling...</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// Resample to 16 kHz</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">inputSampleRate</span> <span class="o">!=</span> <span class="n">targetSampleRate</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="n">floatSamples</span> <span class="p">=</span> <span class="n">resample</span><span class="p">(</span><span class="n">floatSamples</span><span class="p">,</span> <span class="n">from</span><span class="p">:</span> <span class="n">inputSampleRate</span><span class="p">,</span> <span class="n">to</span><span class="p">:</span> <span class="n">targetSampleRate</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">sharedBuffer</span><span class="p">.</span><span class="n">writeAudioSamples</span><span class="p">(</span><span class="n">floatSamples</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">/// Resample using AVAudioConverter</span>
</span></span><span class="line"><span class="cl"><span class="kd">private</span> <span class="kd">func</span> <span class="nf">resample</span><span class="p">(</span><span class="kc">_</span> <span class="n">samples</span><span class="p">:</span> <span class="p">[</span><span class="nb">Float</span><span class="p">],</span> <span class="n">from</span> <span class="n">inputRate</span><span class="p">:</span> <span class="nb">Double</span><span class="p">,</span> <span class="n">to</span> <span class="n">outputRate</span><span class="p">:</span> <span class="nb">Double</span><span class="p">)</span> <span class="p">-&gt;</span> <span class="p">[</span><span class="nb">Float</span><span class="p">]</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="n">inputRate</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">outputRate</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">,</span> <span class="n">inputRate</span> <span class="o">!=</span> <span class="n">outputRate</span><span class="p">,</span> <span class="o">!</span><span class="n">samples</span><span class="p">.</span><span class="bp">isEmpty</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">samples</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="kd">let</span> <span class="nv">inputFormat</span> <span class="p">=</span> <span class="n">AVAudioFormat</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">commonFormat</span><span class="p">:</span> <span class="p">.</span><span class="n">pcmFormatFloat32</span><span class="p">,</span> <span class="n">sampleRate</span><span class="p">:</span> <span class="n">inputRate</span><span class="p">,</span> <span class="n">channels</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="n">interleaved</span><span class="p">:</span> <span class="kc">false</span>
</span></span><span class="line"><span class="cl">    <span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">outputFormat</span> <span class="p">=</span> <span class="n">AVAudioFormat</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">commonFormat</span><span class="p">:</span> <span class="p">.</span><span class="n">pcmFormatFloat32</span><span class="p">,</span> <span class="n">sampleRate</span><span class="p">:</span> <span class="n">outputRate</span><span class="p">,</span> <span class="n">channels</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="n">interleaved</span><span class="p">:</span> <span class="kc">false</span>
</span></span><span class="line"><span class="cl">    <span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">converter</span> <span class="p">=</span> <span class="n">AVAudioConverter</span><span class="p">(</span><span class="n">from</span><span class="p">:</span> <span class="n">inputFormat</span><span class="p">,</span> <span class="n">to</span><span class="p">:</span> <span class="n">outputFormat</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">samples</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// frameCapacity is read-only, so size the output buffer when creating it</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">ratio</span> <span class="p">=</span> <span class="n">outputRate</span> <span class="o">/</span> <span class="n">inputRate</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">outputFrameCount</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">ceil</span><span class="p">(</span><span class="nb">Double</span><span class="p">(</span><span class="n">samples</span><span class="p">.</span><span class="bp">count</span><span class="p">)</span> <span class="o">*</span> <span class="n">ratio</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="kd">let</span> <span class="nv">inputBuffer</span> <span class="p">=</span> <span class="n">AVAudioPCMBuffer</span><span class="p">(</span><span class="n">pcmFormat</span><span class="p">:</span> <span class="n">inputFormat</span><span class="p">,</span> <span class="n">frameCapacity</span><span class="p">:</span> <span class="n">AVAudioFrameCount</span><span class="p">(</span><span class="n">samples</span><span class="p">.</span><span class="bp">count</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">          <span class="kd">let</span> <span class="nv">outputBuffer</span> <span class="p">=</span> <span class="n">AVAudioPCMBuffer</span><span class="p">(</span><span class="n">pcmFormat</span><span class="p">:</span> <span class="n">outputFormat</span><span class="p">,</span> <span class="n">frameCapacity</span><span class="p">:</span> <span class="n">AVAudioFrameCount</span><span class="p">(</span><span class="n">outputFrameCount</span><span class="p">))</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">samples</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">inputBuffer</span><span class="p">.</span><span class="n">frameLength</span> <span class="p">=</span> <span class="n">AVAudioFrameCount</span><span class="p">(</span><span class="n">samples</span><span class="p">.</span><span class="bp">count</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">inputData</span> <span class="p">=</span> <span class="n">inputBuffer</span><span class="p">.</span><span class="n">floatChannelData</span><span class="p">!</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mf">0.</span><span class="p">.&lt;</span><span class="n">samples</span><span class="p">.</span><span class="bp">count</span> <span class="p">{</span> <span class="n">inputData</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="p">=</span> <span class="n">samples</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">var</span> <span class="nv">error</span><span class="p">:</span> <span class="n">NSError</span><span class="p">?</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">status</span> <span class="p">=</span> <span class="n">converter</span><span class="p">.</span><span class="n">convert</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="n">outputBuffer</span><span class="p">,</span> <span class="n">error</span><span class="p">:</span> <span class="p">&amp;</span><span class="n">error</span><span class="p">)</span> <span class="p">{</span> <span class="kc">_</span><span class="p">,</span> <span class="n">outStatus</span> <span class="k">in</span>
</span></span><span class="line"><span class="cl">        <span class="n">outStatus</span><span class="p">.</span><span class="n">pointee</span> <span class="p">=</span> <span class="p">.</span><span class="n">haveData</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">inputBuffer</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">guard</span> <span class="n">status</span> <span class="p">==</span> <span class="p">.</span><span class="n">haveData</span><span class="p">,</span> <span class="n">error</span> <span class="p">==</span> <span class="kc">nil</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="n">samples</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">outputData</span> <span class="p">=</span> <span class="n">outputBuffer</span><span class="p">.</span><span class="n">floatChannelData</span><span class="p">!</span>
</span></span><span class="line"><span class="cl">    <span class="kd">let</span> <span class="nv">outputLength</span> <span class="p">=</span> <span class="nb">Int</span><span class="p">(</span><span class="n">outputBuffer</span><span class="p">.</span><span class="n">frameLength</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">(</span><span class="mf">0.</span><span class="p">.&lt;</span><span class="n">outputLength</span><span class="p">).</span><span class="bp">map</span> <span class="p">{</span> <span class="n">outputData</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="nv">$0</span><span class="p">]</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
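<p><strong>Implementation notes:</strong></p>
<ol>
<li><strong>Memory safety</strong>: <code>UnsafeMutableAudioBufferListPointer</code> together with <code>defer</code> ensures the allocated memory is always released</li>
<li><strong>Multiple sample formats</strong>: Float32, Int16, and Int32 are supported</li>
<li><strong>Byte order</strong>: both big-endian and little-endian data are handled</li>
<li><strong>Non-interleaved layout</strong>: formats that keep each channel in its own buffer are handled correctly</li>
<li><strong>Downmixing</strong>: stereo is mixed down to mono automatically</li>
<li><strong>High-quality resampling</strong>: done by the system's <code>AVAudioConverter</code></li>
</ol>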
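<p>Points 2 and 5 can be illustrated with a small standalone sketch. It is not the project's conversion code: the helper name <code>downmixToMono</code> and the choice to normalize by 32768 are assumptions made for this example. It takes interleaved 16-bit PCM and averages the channels of each frame into one mono Float sample.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">/// Converts interleaved Int16 PCM into mono Float samples in [-1, 1).
/// Hypothetical helper for illustration; not part of the project.
func downmixToMono(interleaved: [Int16], channelCount: Int) -&gt; [Float] {
    guard channelCount &gt; 0 else { return [] }
    let frameCount = interleaved.count / channelCount
    var mono = [Float](repeating: 0, count: frameCount)
    for frame in 0..&lt;frameCount {
        var sum: Float = 0
        for channel in 0..&lt;channelCount {
            // Int16 spans -32768...32767, so dividing by 32768 maps it to [-1, 1).
            sum += Float(interleaved[frame * channelCount + channel]) / 32768.0
        }
        // Average the channels so the mono signal cannot clip.
        mono[frame] = sum / Float(channelCount)
    }
    return mono
}

// Two stereo frames: (L, R) = (16384, -16384) and (32767, 32767).
let mono = downmixToMono(interleaved: [16384, -16384, 32767, 32767], channelCount: 2)
</code></pre></div>
<p>Averaging rather than summing keeps the result within range at the cost of some loudness, which is usually acceptable for speech recognition input.</p>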
<hr>
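<h2 id="二extension与主应用数据共享">2. Sharing data between the extension and the main app</h2>
<p>The Broadcast Extension and the main app run in separate processes, so they need some form of inter-process communication. This project exchanges data through an App Group shared container, which is comparatively simple to implement.</p>
<h3 id="app-group配置">App Group configuration</h3>
<p><strong>Entitlements configuration:</strong></p>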
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-xml" data-lang="xml"><span class="line"><span class="cl"><span class="nt">&lt;key&gt;</span>com.apple.security.application-groups<span class="nt">&lt;/key&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;array&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;string&gt;</span>group.com.xxx.shared<span class="nt">&lt;/string&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;/array&gt;</span></span></span></code></pre></td></tr></table>
</div>
</div>
<p>The App Group has to be created in the Apple Developer Portal and then enabled in Xcode for both targets (the main app and the Broadcast Extension).</p>
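<p>One way to confirm the setup from either target is to request the container URL, which on iOS is <code>nil</code> when the group is missing from that target's entitlements. A minimal sketch: the helper name <code>sharedContainer(for:)</code> is made up for this example, and <code>group.com.xxx.shared</code> is the placeholder identifier used above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">import Foundation

/// Returns the App Group container, or nil if this target cannot access the group.
/// Hypothetical helper for illustration.
func sharedContainer(for groupId: String) -&gt; URL? {
    guard let url = FileManager.default.containerURL(
        forSecurityApplicationGroupIdentifier: groupId
    ) else {
        print(&#34;App Group \(groupId) is unavailable; check this target's entitlements&#34;)
        return nil
    }
    return url
}

// Call once at launch in both the app and the extension.
let container = sharedContainer(for: &#34;group.com.xxx.shared&#34;)
</code></pre></div>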
<h3 id="数据读写实现">Reading and writing data</h3>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span><span class="lnt">33
</span><span class="lnt">34
</span><span class="lnt">35
</span><span class="lnt">36
</span><span class="lnt">37
</span><span class="lnt">38
</span><span class="lnt">39
</span><span class="lnt">40
</span><span class="lnt">41
</span><span class="lnt">42
</span><span class="lnt">43
</span><span class="lnt">44
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">class</span> <span class="nc">AudioSharedBuffer</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kd">static</span> <span class="kd">let</span> <span class="nv">appGroupId</span> <span class="p">=</span> <span class="s">&#34;group.com.xxx.shared&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="kd">private</span> <span class="kd">let</span> <span class="nv">sharedContainerURL</span> <span class="p">=</span> <span class="n">FileManager</span><span class="p">.</span><span class="k">default</span><span class="p">.</span><span class="n">containerURL</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">forSecurityApplicationGroupIdentifier</span><span class="p">:</span> <span class="n">AudioSharedBuffer</span><span class="p">.</span><span class="n">appGroupId</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// The extension writes the processed audio</span>
</span></span><span class="line"><span class="cl">    <span class="kd">func</span> <span class="nf">writeAudioSamples</span><span class="p">(</span><span class="kc">_</span> <span class="n">samples</span><span class="p">:</span> <span class="p">[</span><span class="nb">Float</span><span class="p">])</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">guard</span> <span class="kd">let</span> <span class="nv">url</span> <span class="p">=</span> <span class="n">sharedContainerURL</span><span class="p">?.</span><span class="n">appendingPathComponent</span><span class="p">(</span><span class="s">&#34;audio.raw&#34;</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="kd">let</span> <span class="nv">data</span> <span class="p">=</span> <span class="n">samples</span><span class="p">.</span><span class="n">withUnsafeBufferPointer</span> <span class="p">{</span> <span class="n">Data</span><span class="p">(</span><span class="n">buffer</span><span class="p">:</span> <span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">FileManager</span><span class="p">.</span><span class="k">default</span><span class="p">.</span><span class="n">fileExists</span><span class="p">(</span><span class="n">atPath</span><span class="p">:</span> <span class="n">url</span><span class="p">.</span><span class="n">path</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">handle</span> <span class="p">=</span> <span class="k">try</span><span class="p">?</span> <span class="n">FileHandle</span><span class="p">(</span><span class="n">forWritingTo</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">handle</span><span class="p">?.</span><span class="n">seekToEndOfFile</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            <span class="n">handle</span><span class="p">?.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">handle</span><span class="p">?.</span><span class="n">closeFile</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">try</span><span class="p">?</span> <span class="n">data</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">postDarwinNotification</span><span class="p">(</span><span class="s">&#34;com.xxx.newAudioData&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1">// The main app reads the audio</span>
</span></span><span class="line"><span class="cl">    <span class="kd">func</span> <span class="nf">readAudioSamples</span><span class="p">()</span> <span class="p">-&gt;</span> <span class="p">[</span><span class="nb">Float</span><span class="p">]?</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">guard</span> <span class="kd">let</span> <span class="nv">url</span> <span class="p">=</span> <span class="n">sharedContainerURL</span><span class="p">?.</span><span class="n">appendingPathComponent</span><span class="p">(</span><span class="s">&#34;audio.raw&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">              <span class="n">FileManager</span><span class="p">.</span><span class="k">default</span><span class="p">.</span><span class="n">fileExists</span><span class="p">(</span><span class="n">atPath</span><span class="p">:</span> <span class="n">url</span><span class="p">.</span><span class="n">path</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="kc">nil</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">guard</span> <span class="kd">let</span> <span class="nv">data</span> <span class="p">=</span> <span class="k">try</span><span class="p">?</span> <span class="n">Data</span><span class="p">(</span><span class="n">contentsOf</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="kc">nil</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">try</span><span class="p">?</span> <span class="n">FileManager</span><span class="p">.</span><span class="k">default</span><span class="p">.</span><span class="n">removeItem</span><span class="p">(</span><span class="n">at</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="kd">let</span> <span class="nv">floatCount</span> <span class="p">=</span> <span class="n">data</span><span class="p">.</span><span class="bp">count</span> <span class="o">/</span> <span class="n">MemoryLayout</span><span class="p">&lt;</span><span class="nb">Float</span><span class="p">&gt;.</span><span class="n">size</span>
</span></span><span class="line"><span class="cl">        <span class="kd">var</span> <span class="nv">samples</span> <span class="p">=</span> <span class="p">[</span><span class="nb">Float</span><span class="p">](</span><span class="n">repeating</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="bp">count</span><span class="p">:</span> <span class="n">floatCount</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">samples</span><span class="p">.</span><span class="n">withUnsafeMutableBufferPointer</span> <span class="p">{</span> <span class="kc">_</span> <span class="p">=</span> <span class="n">data</span><span class="p">.</span><span class="n">copyBytes</span><span class="p">(</span><span class="n">to</span><span class="p">:</span> <span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">samples</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">postDarwinNotification</span><span class="p">(</span><span class="kc">_</span> <span class="n">name</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="kd">let</span> <span class="nv">center</span> <span class="p">=</span> <span class="n">CFNotificationCenterGetDarwinNotifyCenter</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="n">CFNotificationCenterPostNotification</span><span class="p">(</span><span class="n">center</span><span class="p">,</span> <span class="n">CFNotificationName</span><span class="p">(</span><span class="n">name</span> <span class="k">as</span> <span class="n">CFString</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="kc">nil</span><span class="p">,</span> <span class="kc">nil</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
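<p>With a single shared file there is a window in which the extension can append samples after the main app has read the file but before it deletes it, and those samples are lost. One possible alternative, sketched here under assumptions and not taken from the project, is to write each batch to its own file and let the reader consume finished files in order. The class name <code>ChunkedAudioBuffer</code> and the <code>chunk-NNNNNNNN.raw</code> naming are hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">import Foundation

/// Hypothetical chunk-per-file variant of the shared buffer (illustration only).
final class ChunkedAudioBuffer {
    private let directory: URL
    private var nextIndex = 0  // Resets with each new instance; a real writer would persist it.

    init(directory: URL) {
        self.directory = directory
    }

    /// Writer side: one file per batch. `.atomic` writes a temporary file and renames it,
    /// so a chunk only appears under its final name once it is complete.
    func write(_ samples: [Float]) throws {
        let data = samples.withUnsafeBufferPointer { Data(buffer: $0) }
        let digits = String(nextIndex)
        let padding = String(repeating: &#34;0&#34;, count: max(0, 8 - digits.count))
        let url = directory.appendingPathComponent(&#34;chunk-\(padding)\(digits).raw&#34;)
        try data.write(to: url, options: .atomic)
        nextIndex += 1
    }

    /// Reader side: consume finished chunks in name order, deleting each after reading it.
    func readAll() -&gt; [Float] {
        let names = (try? FileManager.default.contentsOfDirectory(atPath: directory.path)) ?? []
        let chunkNames = names.filter { $0.hasPrefix(&#34;chunk-&#34;) &amp;&amp; $0.hasSuffix(&#34;.raw&#34;) }.sorted()
        var samples: [Float] = []
        for name in chunkNames {
            let url = directory.appendingPathComponent(name)
            guard let data = try? Data(contentsOf: url) else { continue }
            try? FileManager.default.removeItem(at: url)
            var chunk = [Float](repeating: 0, count: data.count / MemoryLayout&lt;Float&gt;.size)
            chunk.withUnsafeMutableBufferPointer { _ = data.copyBytes(to: $0) }
            samples.append(contentsOf: chunk)
        }
        return samples
    }
}
</code></pre></div>
<p>In the extension and the main app, <code>directory</code> would be the App Group container URL. The zero-padded index keeps lexicographic order equal to write order, so a plain <code>sorted()</code> is enough on the reader side.</p>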
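<h3 id="darwin通知">Darwin notifications</h3>
<p>After writing data, the extension posts a Darwin notification; the main app observes it and reads the data as soon as it arrives:</p>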
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">startListening</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">CFNotificationCenterAddObserver</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">CFNotificationCenterGetDarwinNotifyCenter</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">        <span class="nb">Unmanaged</span><span class="p">.</span><span class="n">passUnretained</span><span class="p">(</span><span class="kc">self</span><span class="p">).</span><span class="n">toOpaque</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">        <span class="p">{</span> <span class="kc">_</span><span class="p">,</span> <span class="n">observer</span><span class="p">,</span> <span class="kc">_</span><span class="p">,</span> <span class="kc">_</span><span class="p">,</span> <span class="kc">_</span> <span class="k">in</span>
</span></span><span class="line"><span class="cl">            <span class="k">guard</span> <span class="kd">let</span> <span class="nv">observer</span> <span class="p">=</span> <span class="n">observer</span> <span class="k">else</span> <span class="p">{</span> <span class="k">return</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">selfPtr</span> <span class="p">=</span> <span class="nb">Unmanaged</span><span class="p">&lt;</span><span class="n">YourClass</span><span class="p">&gt;.</span><span class="n">fromOpaque</span><span class="p">(</span><span class="n">observer</span><span class="p">).</span><span class="n">takeUnretainedValue</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="kd">let</span> <span class="nv">samples</span> <span class="p">=</span> <span class="n">selfPtr</span><span class="p">.</span><span class="n">audioBuffer</span><span class="p">.</span><span class="n">readAudioSamples</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="n">selfPtr</span><span class="p">.</span><span class="n">onAudioReceived</span><span class="p">?(</span><span class="n">samples</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="cl">        <span class="s">&#34;com.xxx.newAudioData&#34;</span> <span class="k">as</span> <span class="n">CFString</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="kc">nil</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">.</span><span class="n">deliverImmediately</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
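<p>Because the observer pointer is passed unretained, the callback would dereference a dangling pointer if the object were deallocated while still registered. A minimal sketch of pairing registration with removal follows; the class name <code>AudioListener</code>, the <code>onNotification</code> closure, and <code>stopListening()</code> are illustrative and not taken from the project.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift">import Foundation

/// Hypothetical listener showing paired registration and removal.
final class AudioListener {
    private let notificationName = &#34;com.xxx.newAudioData&#34; as CFString
    var onNotification: (() -&gt; Void)?

    func startListening() {
        CFNotificationCenterAddObserver(
            CFNotificationCenterGetDarwinNotifyCenter(),
            Unmanaged.passUnretained(self).toOpaque(),
            { _, observer, _, _, _ in
                // The callback is a C function pointer, so it recovers self from the observer.
                guard let observer = observer else { return }
                let listener = Unmanaged&lt;AudioListener&gt;.fromOpaque(observer).takeUnretainedValue()
                listener.onNotification?()
            },
            notificationName,
            nil,
            .deliverImmediately
        )
    }

    func stopListening() {
        CFNotificationCenterRemoveObserver(
            CFNotificationCenterGetDarwinNotifyCenter(),
            Unmanaged.passUnretained(self).toOpaque(),
            CFNotificationName(notificationName),
            nil
        )
    }

    deinit {
        // Unregister before the object goes away so the callback never sees a dangling pointer.
        stopListening()
    }
}
</code></pre></div>
<p>Darwin notifications carry no payload, which is why the data itself travels through the shared container and the notification only signals that something new is available.</p>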
<hr>
<h2 id="samplehandler完整示例">Complete SampleHandler example</h2>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-swift" data-lang="swift"><span class="line"><span class="cl"><span class="kd">import</span> <span class="nc">ReplayKit</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">class</span> <span class="nc">SampleHandler</span><span class="p">:</span> <span class="n">RPBroadcastSampleHandler</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kd">private</span> <span class="kd">let</span> <span class="nv">sharedBuffer</span> <span class="p">=</span> <span class="n">AudioSharedBuffer</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="kd">private</span> <span class="kd">let</span> <span class="nv">targetSampleRate</span><span class="p">:</span> <span class="nb">Double</span> <span class="p">=</span> <span class="mf">16000.0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">broadcastStarted</span><span class="p">(</span><span class="n">withSetupInfo</span> <span class="n">setupInfo</span><span class="p">:</span> <span class="p">[</span><span class="nb">String</span> <span class="p">:</span> <span class="n">NSObject</span><span class="p">]?)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="n">sharedBuffer</span><span class="p">.</span><span class="n">clearAudioData</span><span class="p">()</span> <span class="c1">// not shown above; presumably removes stale audio.raw</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">broadcastPaused</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">broadcastResumed</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">broadcastFinished</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">override</span> <span class="kd">func</span> <span class="nf">processSampleBuffer</span><span class="p">(</span><span class="kc">_</span> <span class="n">sampleBuffer</span><span class="p">:</span> <span class="n">CMSampleBuffer</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                     <span class="n">with</span> <span class="n">sampleBufferType</span><span class="p">:</span> <span class="n">RPSampleBufferType</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">switch</span> <span class="n">sampleBufferType</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">case</span> <span class="p">.</span><span class="n">audioApp</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="c1">// App audio, i.e. whatever the system is playing</span>
</span></span><span class="line"><span class="cl">            <span class="kd">let</span> <span class="nv">samples</span> <span class="p">=</span> <span class="n">convertTo16kMono</span><span class="p">(</span><span class="n">sampleBuffer</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">sharedBuffer</span><span class="p">.</span><span class="n">writeAudioSamples</span><span class="p">(</span><span class="n">samples</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">case</span> <span class="p">.</span><span class="n">audioMic</span><span class="p">,</span> <span class="p">.</span><span class="n">video</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="c1">// Microphone audio and video frames are not needed</span>
</span></span><span class="line"><span class="cl">            <span class="k">break</span>
</span></span><span class="line"><span class="cl">        <span class="p">@</span><span class="n">unknown</span> <span class="k">default</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="c1">// Ignore buffer types added by future SDKs</span>
</span></span><span class="line"><span class="cl">            <span class="k">break</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div>
]]></content:encoded>
    </item>
  </channel>
</rss>
