April 30, 2025

用本地gpt-oss-120b驱动DEVONthink, Brave, ShellGPT, Raycast, BBEdit

[2025年8月更新，切换到本地gpt-oss-120b模型]

[2025年7月更新，添加Claude Code写代码]

[2025年6月更新，添加DEVONthink 4分析论文]

这篇文章展示如何用LM Studio在Apple M4 Max和128GB内存的MacBook Pro上运行本地gpt-oss-120b模型，用于阅读论文、浏览器、命令行、快捷启动、编辑器和写代码，AI无处不在。

硬件环境 Link to this section

Apple M4 Max处理器采用统一内存架构，与传统GPU-CPU分离设计形成鲜明对比。统一架构将内存整合于同一芯片，CPU和GPU直接共享内存空间，消除了组件间的数据复制环节。这种设计本质上消除了PCI-E带宽瓶颈，显著降低了数据传输开销，直接提升模型运算速度。

Apple M4 Max提供高达546GB/s的内存带宽，这种超高带宽使得大规模模型推理时的数据吞吐不再受限，显著减少了GPU等待数据的时间，进一步提升了推理性能。

MLX是Apple专为Silicon芯片开发的机器学习框架，特点是直接利用神经网络加速器，为本地LLM提供高性能支持。这不是普通的框架移植，而是从芯片架构层面优化的解决方案。

配备128GB统一内存的M4 Max可实现120B参数量FP4模型的近实时响应。MLX格式模型在此环境下完全依靠GPU加速，CPU资源占用极低，这对整机散热和能耗控制具有显著意义。

相同显存容量下，Apple Silicon处理LLM任务的效率优于传统独立GPU架构（如RTX 5090），主要受益于统一内存架构减少的数据传输环节，提高了计算管线效率。

MLX与GGUF格式对比，两种主流模型格式在M系列芯片上表现各有特点：

MLX是Apple专为Silicon优化，完全GPU加速，限制是无法加载超过物理内存的模型。
GGUF是通用格式，支持GPU offloading技术，理论上可加载超过物理内存的大模型，但会显著占用CPU资源。适合需要特大规模模型且不在意CPU占用的场景。

在实际应用中，这种本地算力支持为多种应用场景提供了基础，包括论文分析、命令行工具、浏览器AI和辅助写代码。

配置本地gpt-oss-120b模型 Link to this section

我使用LM Studio在MacBook Pro上运行gpt-oss-120b和gpt-oss-20b模型。为获得最佳性能，OpenAI建议gpt-oss-120b和gpt-oss-20b的推理参数如下：

Temperature = 1.0
Top K Sampling = 0（或尝试设置为100是否能获得更好结果）
Top P Sampling = 1.0
Min P Sampling = 0
推荐的最小上下文窗口：16,384
支持的最大上下文窗口：131,072

System prompt:

plain
You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-05

# Valid channels: analysis, commentary, final. Channel must be included for every message.

配置好推理参数后，打开服务器：

用DEVONthink读论文 Link to this section

DEVONthink是Mac平台的知识管理工具，新版本4.x集成了AI功能，功能也越来越完善，

目前可以实现基本的论文总结功能，总结的还不错，多轮对话稍差：

Brave浏览器Leo AI Link to this section

作为Google Chrome的替代浏览器，Brave浏览器内置的Leo AI支持自定义本地模型：

使用Leo AI可以在Brave浏览器中实现一些简单功能，比如总结网页：

用ShellGPT补全命令行 Link to this section

安装shell-gpt，并使用本地gpt-oss-120b模型，

shell
uv tool install "shell-gpt"

mkdir -p ~/.config/shell_gpt
cat > ~/.config/shell_gpt/.sgptrc << "EOF"
API_BASE_URL=http://localhost:1234/v1
OPENAI_API_KEY=not_needed
DEFAULT_MODEL=gpt-oss-120b
#DEFAULT_MODEL=gpt-oss-20b
EOF

如果是Ubuntu下RTX 5090单卡，32GB显存只支持gpt-oss-20b模型。

测试1，

shell
└> sgpt -s "List all the git submodules"
git submodule foreach --quiet 'echo $name'
[E]xecute, [D]escribe, [A]bort: E
third_party/MathReader
third_party/TextNormalizationCoveringGrammars
third_party/ldoce-mcp-server
third_party/llama.cpp

测试2，

shell
└> sgpt -s "Find all files larger than 100MB"
find . -type f -size +100M
[E]xecute, [D]escribe, [A]bort: E

用Raycast快捷操作 Link to this section

先将gpt-oss模型配置到Raycast自定义模型中：

shell
cat > ~/.config/raycast/ai/providers.yaml << "EOF"
providers:
  - id: lmstudio
    name: LM Studio
    base_url: http://localhost:1234/v1
    api_keys:
      lmstudio: "not_needed"
    models:
      - id: gpt-oss-120b
        name: "GPT-OSS-120B (LM Studio)"
        context: 131072
        provider: lmstudio # matches "lmstudio" in api_keys
        abilities:
          temperature:
            supported: true
          vision:
            supported: false
          system_message:
            supported: true
          tools:
            supported: true
          reasoning_effort:
            supported: true
      - id: gpt-oss-20b
        name: "GPT-OSS-20B (LM Studio)"
        context: 131072
        provider: lmstudio # matches "lmstudio" in api_keys
        abilities:
          temperature:
            supported: true
          vision:
            supported: false
          system_message:
            supported: true
          tools:
            supported: true
          reasoning_effort:
            supported: true
EOF

接下来，设置Raycast Quick AI Model使用GPT-OSS-120B (LM Studio)模型，AI Commands Model也设置成GPT-OSS-120B (LM Studio)模型：

测试Raycast AI使用gpt-oss-120b和高德地图MCP回答问题：

通过Tab键调用Quick AI Chat，快速访问gpt-oss-120b模型：

BBEdit AI工作表 Link to this section

BBEdit是Mac上对大文件支持最好的编辑器，从15.1版本开始集成了AI工作表。为BBEdit添加LM Studio本地模型支持：

shell
mkdir -p "$HOME/Library/Application Support/BBEdit/Chat API Descriptions/"

cat > "$HOME/Library/Application Support/BBEdit/Chat API Descriptions/LMStudio.json" << "EOF"
{
  "com.barebones.DocumentType": "com.barebones.bbedit.ChatAPIDescription",
  "com.barebones.DocumentFormatVersion": 1,
  "identifier": "lmstudio-docsample",
  "displayName": "LM Studio",
  "endpoint": "http://localhost:1234/v1/chat/completions",
  "models":
    [
      "gpt-oss-120b",
      "gpt-oss-20b"
    ],
  "defaultModel": "gpt-oss-120b",
  "requiresAPIKey": 0,
  "explicitStreamParameterRequired": true
}
EOF

创建LMStudio.json文件后，在设置界面选择gpt-oss-120b模型：

接下来创建LM Studio Worksheet，可以在一个文档里和gpt-oss-120b聊天。输入问题之后，通过快捷键control+回车发送给LM Studio，这种方式的优点是可以记录完整聊天过程：

用CLINE写代码 Link to this section

Visual Studio Code作为最流行的代码编辑器之一，结合CLINE插件和本地LM Studio可以实现类似GitHub Copilot的代码辅助功能：

使用时，只需在VSCode中安装CLINE插件，并选择LM Studio API Provider，然后选择gpt-oss-120b模型即可。CLINE+gpt-oss-120b编程效果不太好，会删除不该删除的文件，还不如免费版本。

用Claude Code写代码 Link to this section

另外一篇文章在MacBook Pro上用gpt-oss-120b驱动Claude Code

用本地gpt-oss-120b驱动DEVONthink, Brave, ShellGPT, Raycast, BBEdit

硬件环境 Link to this section #

配置本地gpt-oss-120b模型 Link to this section #

用DEVONthink读论文 Link to this section #

Brave浏览器Leo AI Link to this section #

用ShellGPT补全命令行 Link to this section #

用Raycast快捷操作 Link to this section #

BBEdit AI工作表 Link to this section #

用CLINE写代码 Link to this section #

用Claude Code写代码 Link to this section #

参考资料 Link to this section #