SuperCompress(arjunkshah/supercompress)

📅 2026/7/1 14:10:54
SuperCompress(arjunkshah/supercompress)
一、主要功能SuperCompress 是面向 LLM 的轻量级提示词 / 上下文压缩工具核心目标在几乎不丢失语义的前提下大幅减少输入给大模型的 token 数量。对prompt、对话历史、RAG 检索结果、工具返回内容JSON / 日志做智能精简压缩率 60%–95%10k token → 几百几千 token语义损失极低LLM 输出质量几乎不变显著降低 OpenAI / Anthropic 等 LLM 的调用成本、延迟、上下文溢出风险轻量、可本地部署、CPU 即可运行无需 GPU二、实现原理核心SuperCompress 通过轻量小模型 规则引擎的语义感知压缩。小模型训练方法参考文档https://arjunkshah-supercompress-55.mintlify.app/development/training三、效果验证#!/usr/bin/env python3# -*- coding: utf-8 -*-importreimportsysimportollamafromsupercompressimportcompress_context# 配置 OLLAMA_HOSThttp://localhost:11434BUDGET_RATIO0.8NUM_PREDICT2048RETRY_LIMIT2# 测试数据同前long_context...# 请粘贴您的长文本queryWhat do fetchone, fetchall, and fetchmany return when no rows are found?key_points[{name:fetchone returns None,keywords:[fetchone,[none,null]]},{name:fetchall returns empty list,keywords:[fetchall,[empty list,[],empty array]]},{name:fetchmany returns empty list,keywords:[fetchmany,[empty list,[],empty array]]}]# 辅助函数 defget_ollama_client():returnollama.Client(hostOLLAMA_HOST)defis_model_available(model_nameqwen3.5:4b):try:get_ollama_client().show(model_name)returnTrueexceptExceptionase:print(f❌ 模型检查失败{e})returnFalsedefclean_context(text):清理压缩文本中的多余空行和特殊符号提高可读性# 将多个连续换行合并为两个linestext.splitlines()cleaned[]prev_emptyFalseforlineinlines:ifline.strip():ifnotprev_empty:cleaned.append()prev_emptyTrueelse:cleaned.append(line.strip())prev_emptyFalsereturn\n.join(cleaned)defask_qwen(context,question,modelqwen3.5:4b,retryRETRY_LIMIT):# 清理上下文contextclean_context(context)promptfAnswer the question based ONLY on the reference below. If the reference does not contain the answer, say Not provided. Reference:{context}Question:{question}Answer:clientget_ollama_client()forattemptinrange(retry1):try:# 固定温度不随时间改变respclient.chat(modelmodel,messages[{role:user,content:prompt}],options{temperature:0.7,top_p:0.9,repeat_penalty:1.1,num_ctx:4096,num_predict:NUM_PREDICT,})answerresp[message][content].strip()# 打印原始回答调试用print(f [调试] 原始回答内容{answer})iflen(answer)3andattemptretry:print(f [重试{attempt1}] 回答过短重试...)continuereturnanswerifanswerelse[空回答]exceptExceptionase:print(f [尝试{attempt1}] 异常{e})ifattemptretry:returnf[调用失败:{e}]return[调用失败: 未知]defcheck_context_has_answer(context,points):context_lowercontext.lower()results[]forpointinpoints:method,value_optionspoint[keywords]method_hitmethod.lower()incontext_lower value_hitany(v.lower()incontext_lowerforvinvalue_options)results.append((method,method_hit,value_hit))returnresultsdefjaccard_similarity(text1,text2):ifnottext1ornottext2:return0.0words1set(re.findall(r\w,text1.lower()))words2set(re.findall(r\w,text2.lower()))interwords1words2 unionwords1|words2returnlen(inter)/len(union)ifunionelse0.0defcosine_similarity(text1,text2):try:fromsklearn.feature_extraction.textimportTfidfVectorizerfromsklearn.metrics.pairwiseimportcosine_similarityassk_cosifnottext1ornottext2:return0.0vecTfidfVectorizer().fit_transform([text1,text2])returnsk_cos(vec[0:1],vec[1:2])[0][0]exceptImportError:return0.0defcalc_recall(answer,points):answer_loweranswer.lower()hits[]forpinpoints:method,valsp[keywords]hits.append(method.lower()inanswer_lowerandany(v.lower()inanswer_lowerforvinvals))returnsum(hits)/len(points),hits# 主流程 defmain():print(*80)print(f检查 Ollama 服务 ({OLLAMA_HOST}) ...)ifnotis_model_available(qwen3.5:4b):sys.exit(1)# ---- 压缩 ----print(\n开始上下文压缩...)compress_resultcompress_context(long_context,query,budget_ratioBUDGET_RATIO)compressed_rawcompress_result.compressed_text# ---- 清理压缩文本 ----compressed_textclean_context(compressed_raw)print(\n【压缩后文本预览前500字符】)print(compressed_text[:500]...)# ---- 检查关键信息 ----info_okall(mandvfor_,m,vincheck_context_has_answer(compressed_text,key_points))ifnotinfo_ok:print(\n⚠️ 压缩后缺少关键信息改用原始上下文。)compressed_textlong_contextelse:print(\n✅ 压缩后包含所有关键信息。)# ---- 生成回答 ----print(\n正在生成原始上下文回答...)answer_originalask_qwen(long_context,query)print(\n正在生成压缩后上下文回答...)answer_compressedask_qwen(compressed_text,query)# ---- 如果压缩回答仍为空尝试用原始上下文但减小温度额外尝试 ----ifanswer_compressed[空回答]orlen(answer_compressed)3:print(\n⚠️ 压缩回答仍为空。尝试用原始上下文再生成一次作为替代仅用于展示相似度...)# 为保持对比我们用原始上下文生成另一个答案但标记为压缩版实际内容可能相同# 但为了演示我们可以直接借用原始答案作为替代但这样相似度会很高失去意义。# 更好的做法尝试将压缩文本重新组织增加明确的“答案”提示。# 这里我们改用更直接的 prompt 询问压缩文本中的内容fallback_promptfExtract the answer from the following text about fetch methods when no rows are found. Answer concisely. Text:{compressed_text}Answer:clientget_ollama_client()try:respclient.chat(modelqwen3.5:4b,messages[{role:user,content:fallback_prompt}],options{temperature:0.3,num_predict:512})answer_compressedresp[message][content].strip()ifnotanswer_compressed:answer_compressed[空回答]exceptExceptionase:answer_compressedf[调用失败:{e}]# ---- 输出 ----print(\n*80)print(【诊断报告】)print(f原始 Token 数{compress_result.original_tokens})print(f压缩后 Token 数{compress_result.kept_tokens})print(f压缩比例{compress_result.kv_savings_pct:.1f}%)print(\n【信息保留检查】)formethod,m_hit,v_hitincheck_context_has_answer(compressed_text,key_points):print(f{method}:{✅ifm_hitandv_hitelse❌})print(\n【回答内容】)print(f原始回答{answer_original})print(f压缩回答{answer_compressed})print(\n【量化评估】)rec_ori,hits_oricalc_recall(answer_original,key_points)rec_comp,hits_compcalc_recall(answer_compressed,key_points)print(f召回率原始{rec_ori:.2f}压缩{rec_comp:.2f})jacjaccard_similarity(answer_original,answer_compressed)coscosine_similarity(answer_original,answer_compressed)print(fJaccard 相似度{jac:.3f})print(fCosine 相似度{cos:.3f})print(\n【命中明细】)fori,pinenumerate(key_points):print(f{p[name]}: 原始{✅ifhits_ori[i]else❌}压缩{✅ifhits_comp[i]else❌})if__name____main__:main()在小模型场景下压缩后反而会使得think时间变长而且对中文支持不是很好需要自己重新额外训练