# Building a Local Offline RAG Knowledge Base with Go + Ollama

📅 2026/6/30 23:23:35

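## Background

In LLM application development, RAG (Retrieval-Augmented Generation) has become the most widely used way to give a model access to external knowledge. Cloud APIs, however, bring cost, privacy, and reliability concerns. This article shows how to build a RAG knowledge-base Q&A system in Go on top of local Ollama models, so the whole system runs offline.

## Project Overview

The project is written in Go. Core features:

- ✅ **Fully offline** - questions are still answered with the network disconnected
- ✅ **Local model support** - multiple backends: Ollama, OpenAI, DeepSeek
- ✅ **Automatic document chunking** - Markdown is split along its heading hierarchy
- ✅ **Multi-turn conversation** - per-session context memory
- ✅ **Config-driven design** - a single config file controls every parameter

## Technical Architecture

```
┌─────────────────────────────────────────────┐
│                  HTTP API                   │
│               (Gin framework)               │
└─────────────────────────────────────────────┘
                       │
         ┌─────────────┼─────────────┐
         ▼             ▼             ▼
   Document svc     RAG svc     Session mgmt
         │             │             │
         ▼             ▼             ▼
   Text chunking  Vector search     SQLite
    (Markdown)   (Memory/Milvus) (chat history)
         │             │
         ▼             ▼
 ┌──────────────┐ ┌──────────────┐
 │  Embedding   │ │     LLM      │
 │   (Ollama)   │ │   (Ollama)   │
 │ nomic-embed- │ │  qwen2.5:7b  │
 │     text     │ │              │
 └──────────────┘ └──────────────┘
```

## Core Module Implementation

### 1. Configuration Management

All parameters live in a TOML config file, so switching model backends is a one-line change:

```toml
# config.toml
[vector]
type = "memory"   # can be switched to "milvus"
dim = 768         # vector dimension of nomic-embed-text

[embedding]
provider = "ollama"   # can be switched to "openai" or "deepseek"
model = "nomic-embed-text"

[llm]
provider = "ollama"
model = "qwen2.5:7b"

[rag]
topk = 5          # number of documents to retrieve
min_score = 0.6   # minimum similarity threshold
```

Core of the config parser:

```go
func LoadConfig(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}

	// Defaults; values found in the file override them
	cfg := &Config{
		Server:    ServerConfig{Host: "localhost", Port: 8080},
		Embedding: EmbeddingConfig{Provider: "ollama", Model: "nomic-embed-text"},
		LLM:       LLMConfig{Provider: "ollama", Model: "qwen2.5:7b"},
	}

	// Minimal TOML parsing: remember which [section] we are in
	lines := strings.Split(string(data), "\n")
	currentSection := ""
	for _, line := range lines {
		line = strings.TrimSpace(line) // tolerate indentation and \r\n line endings
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			currentSection = line[1 : len(line)-1]
			continue
		}
		// ... parse key = value within currentSection
	}
	return cfg, nil
}
```

### 2. Local Embedding Implementation

Ollama exposes a native Embedding API, so no OpenAI key is needed:

```go
func (c *Client) ollamaEmbeddings(ctx context.Context, texts []string) ([][]float32, error) {
	baseURL := "http://localhost:11434"
	vectors := make([][]float32, 0, len(texts))

	for _, text := range texts {
		reqBody := map[string]interface{}{
			"model": c.config.Model, // nomic-embed-text
			"input": text,
		}
		reqBytes, err := json.Marshal(reqBody)
		if err != nil {
			return nil, err
		}

		// Ollama embed API; the request is bound to ctx so it can be cancelled
		req, err := http.NewRequestWithContext(ctx, http.MethodPost,
			baseURL+"/api/embed", bytes.NewReader(reqBytes))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}

		var result struct {
			Embeddings [][]float32 `json:"embeddings"`
		}
		err = json.NewDecoder(resp.Body).Decode(&result)
		resp.Body.Close() // close on every iteration; a defer would pile up in the loop
		if err != nil {
			return nil, err
		}
		// An empty result would otherwise surface later as a dimension error (problem 3)
		if len(result.Embeddings) == 0 {
			return nil, fmt.Errorf("ollama returned no embeddings for %q", text)
		}
		vectors = append(vectors, result.Embeddings[0])
	}
	return vectors, nil
}
```

**Key points**:

- Uses Ollama's newer `/api/embed` endpoint
- `nomic-embed-text` produces 768-dimensional vectors
- `/api/embed` also accepts a list of strings in `input`, so the per-text loop above can be collapsed into one batched request for better throughput

### 3. Unified LLM Client

A unified `Client` interface makes the backends interchangeable:

```go
type Client interface {
	Chat(ctx context.Context, messages []Message) (string, error)
	ChatStream(ctx context.Context, messages []Message, callback func(string)) error
	GetModel() string
}

// Ollama implementation
type OllamaClient struct {
	model   string
	baseURL string
}

func (c *OllamaClient) Chat(ctx context.Context, messages []Message) (string, error) {
	reqBody := map[string]interface{}{
		"model":    c.model,
		"messages": messages,
		"stream":   false,
	}
	reqBytes, err := json.Marshal(reqBody)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		c.baseURL+"/api/chat", bytes.NewReader(reqBytes))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// With "stream": false the reply is one JSON object; the text is in message.content
	var result struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", err
	}
	return result.Message.Content, nil
}

// OpenAI/DeepSeek-compatible implementation
type OpenAIClient struct {
	model   string
	apiKey  string
	baseURL string
}
```

### 4. Markdown Document Splitter

The splitter cuts along the heading hierarchy so each chunk keeps its place in the document structure:

```go
type MarkdownSplitter struct{}

func (s *MarkdownSplitter) Split(text string) []Chunk {
	chunks := make([]Chunk, 0)
	headingRegex := regexp.MustCompile(`^(#{1,6})\s(.+)`)

	var currentContent strings.Builder
	var currentHeading string

	// flush stores the accumulated text as one chunk under the current heading
	flush := func() {
		if currentContent.Len() > 0 {
			chunks = append(chunks, Chunk{
				Content: currentContent.String(),
				Metadata: map[string]interface{}{
					"heading": currentHeading,
				},
			})
		}
		currentContent.Reset()
	}

	for _, line := range strings.Split(text, "\n") {
		if match := headingRegex.FindStringSubmatch(line); match != nil {
			// New heading: save the previous chunk first
			flush()
			currentHeading = match[2]
		} else {
			currentContent.WriteString(line + "\n")
		}
	}
	flush() // without this, the last section of the document would be dropped
	return chunks
}
```

### 5. Multi-turn Conversation Context

Message history is stored in SQLite, keyed by session ID. This excerpt covers session and history handling only; the retrieval step is not shown:

```go
func (s *RAGService) Ask(ctx context.Context, req *AskRequest) (*AskResponse, error) {
	// 1. Get or create the session
	sessionID := req.SessionID
	if sessionID == "" {
		sessionID = fmt.Sprintf("session_%d", time.Now().UnixNano())
	}

	// 2. Load the history, keeping only the 10 most recent messages
	history, err := s.repo.GetChatHistory(sessionID)
	if err != nil {
		return nil, err
	}
	messages := make([]llm.Message, 0, len(history)+1)
	startIdx := 0
	if len(history) > 10 {
		startIdx = len(history) - 10
	}
	for _, msg := range history[startIdx:] {
		messages = append(messages, llm.Message{
			Role:    msg.Role,
			Content: msg.Content,
		})
	}

	// 3. Append the current question
	messages = append(messages, llm.Message{
		Role:    "user",
		Content: req.Question,
	})

	// 4. Call the LLM (ctx is a parameter so the call can be cancelled)
	answer, err := s.llmClient.Chat(ctx, messages)
	if err != nil {
		return nil, err
	}

	// 5. Persist both sides of the exchange
	s.repo.CreateMessage(&ChatMessage{
		SessionID: sessionID,
		Role:      "user",
		Content:   req.Question,
	})
	s.repo.CreateMessage(&ChatMessage{
		SessionID: sessionID,
		Role:      "assistant",
		Content:   answer,
	})

	return &AskResponse{
		SessionID: sessionID,
		Answer:    answer,
	}, nil
}
```

## Deployment and Testing

### Prerequisites

```bash
# 1. Install Ollama
winget install Ollama.Ollama

# 2. Pull the models
ollama pull nomic-embed-text   # embedding model, 274MB
ollama pull qwen2.5:7b         # LLM, 4.7GB
```

### Start the Service

```bash
cd cmd/rag
go build -o rag-server.exe .
./rag-server.exe
```

### Test Procedure

```bash
# 1. Health check
curl http://localhost:8080/api/v1/health

# 2. Upload a document
curl -X POST http://localhost:8080/api/v1/documents/upload \
  -F "file=@test.md"

# 3. Single-turn Q&A
curl -X POST http://localhost:8080/api/v1/chat/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the features of the Go language?"}'

# 4. Multi-turn follow-up (reuse the returned session_id)
curl -X POST http://localhost:8080/api/v1/chat/ask \
  -H "Content-Type: application/json" \
  -d '{"session_id": "session_xxx", "question": "Tell me more about its concurrency"}'
```

## Test Results

| Feature | Status | Notes |
|------|------|------|
| Document chunking | ✅ | Markdown split into 9 chunks, all stored |
| Vectorization | ✅ | 768-dimensional vectors generated correctly |
| RAG retrieval | ✅ | Top-5 documents recalled accurately |
| LLM answers | ✅ | Qwen2.5:7b responds normally |
| Multi-turn conversation | ✅ | Context carried across turns correctly |
| Offline operation | ✅ | Questions still answered with the network disconnected |

## Problems Encountered and Solutions

### Problem 1: Wrong Embedding URL

**Error**: `ollama request failed: 404`

**Cause**: The base URL was configured with the `/v1` prefix used by OpenAI-compatible endpoints, but Ollama's native API path is `/api/embed`.

**Fix**: Normalize the baseURL inside `ollamaEmbeddings`:

```go
baseURL := "http://localhost:11434"
if c.config.BaseURL != "" {
	baseURL = strings.TrimSuffix(c.config.BaseURL, "/v1")
}
```

### Problem 2: TOML Config Parsing Failure

**Error**: Config values included the trailing comment text, e.g. `ollama # optional...`

**Cause**: The parser did not handle `#` comments.

**Fix**: Strip the comment before using the value:

```go
value := strings.TrimSpace(parts[1])
if idx := strings.Index(value, "#"); idx != -1 {
	value = strings.TrimSpace(value[:idx])
}
```

### Problem 3: Vector Dimension Mismatch

**Error**: `vector dim 0 not match collection definition`

**Cause**: The Embedding call returned an empty vector because of the baseURL misconfiguration.

**Fix**: Use the same baseURL handling logic in both the Embedding and LLM clients.

## Next Steps

1. **Streaming output** - add an SSE endpoint, `/api/v1/chat/stream`
2. **Milvus integration** - replace the in-memory store to support large-scale data
3. **More document formats** - PDF and Word parsing
4. **Hybrid retrieval** - combine BM25 with vector search
5. **Web UI** - add a front-end

## Project Repository

GitHub: https://github.com/Blue-wu/golllm
Branch: feature/local-model

---

**Summary**

This project implements a fully offline RAG knowledge base. The core ideas are:

1. Local Ollama models replace cloud APIs
2. A unified client interface makes backends switchable
3. Config-driven management lowers maintenance cost

Together these give a workable technical approach for private deployment, sensitive-data processing, and cost control.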
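The architecture lists an in-memory vector store alongside Milvus, and the `[rag]` section of the config sets `topk` and `min_score`. The article doesn't show the search itself. Below is a minimal sketch of how those two settings could drive a brute-force cosine-similarity search. `Doc`, `Hit`, and `Search` are hypothetical names, not the project's API. The dimension check mirrors the mismatch error from problem 3.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Doc is a hypothetical stored chunk: an ID plus its embedding vector.
type Doc struct {
	ID     string
	Vector []float32
}

// Hit is one search result with its cosine similarity to the query.
type Hit struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero norm.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Search scores every doc against the query (brute force), keeps hits with
// score >= minScore, and returns at most topK of them, best first.
func Search(docs []Doc, query []float32, topK int, minScore float64) ([]Hit, error) {
	hits := make([]Hit, 0, len(docs))
	for _, d := range docs {
		if len(d.Vector) != len(query) {
			return nil, fmt.Errorf("doc %s: vector dim %d does not match query dim %d",
				d.ID, len(d.Vector), len(query))
		}
		if s := cosine(d.Vector, query); s >= minScore {
			hits = append(hits, Hit{ID: d.ID, Score: s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if len(hits) > topK {
		hits = hits[:topK]
	}
	return hits, nil
}

func main() {
	docs := []Doc{
		{ID: "goroutines", Vector: []float32{1, 0}},
		{ID: "channels", Vector: []float32{0.8, 0.6}},
		{ID: "gc", Vector: []float32{0, 1}},
	}
	// topk = 5 and min_score = 0.6, as in config.toml
	hits, err := Search(docs, []float32{1, 0}, 5, 0.6)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Printf("%s %.2f\n", h.ID, h.Score)
	}
}
```

A linear scan is fine for a small local corpus. At larger scale this is the part that Milvus (next steps, item 2) would replace with an approximate-nearest-neighbor index.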
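The comment-stripping fix for problem 2 cuts at the first `#`. That would also truncate a quoted value that legitimately contains one, such as a title like `"C# notes"`. Below is a quote-aware variant, sketched under the assumption that values are either bare or double-quoted with no escape sequences. `stripComment` and `parseKeyValue` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// stripComment removes a trailing "# ..." comment while ignoring '#'
// inside double-quoted strings. Escape sequences are not handled.
func stripComment(s string) string {
	inQuote := false
	for i, r := range s {
		switch {
		case r == '"':
			inQuote = !inQuote
		case r == '#' && !inQuote:
			return strings.TrimSpace(s[:i])
		}
	}
	return strings.TrimSpace(s)
}

// parseKeyValue strips any comment from a `key = value` line, splits it,
// and removes one pair of surrounding double quotes from the value.
func parseKeyValue(line string) (key, value string, ok bool) {
	parts := strings.SplitN(stripComment(line), "=", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	key = strings.TrimSpace(parts[0])
	value = strings.TrimSpace(parts[1])
	if len(value) >= 2 && strings.HasPrefix(value, `"`) && strings.HasSuffix(value, `"`) {
		value = value[1 : len(value)-1]
	}
	return key, value, key != ""
}

func main() {
	for _, line := range []string{
		`provider = "ollama"   # can be switched to openai or deepseek`,
		`min_score = 0.6       # minimum similarity threshold`,
		`title = "C# notes"    # the '#' inside quotes is kept`,
	} {
		if k, v, ok := parseKeyValue(line); ok {
			fmt.Printf("%s=%q\n", k, v)
		}
	}
}
```

Beyond this subset (arrays, inline tables, escapes), a full TOML library such as `github.com/BurntSushi/toml` is the safer choice than extending a hand-written parser.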
