平面设计都需要什么软件_新媒体网络营销的概念_免费网站服务器_关键词com

大模型用例生成前置工作之文档读取——构建你的自动化测试基础

在群友的极力推荐下，开始了streamlit的学习之旅。本文将介绍如何使用Streamlit开发一个多功能文档处理工具，支持读取、预览、格式转换和导出多种测试相关文档（YAML、JSON、DOCX、PDF、Excel、Markdown），并提供Markdown预览选项，为大模型测试用例生成奠定基础。源码以上传云盘，欢迎下载试用，云盘地址见文章底部截图。

功能概述

支持的文档格式：
- YAML、JSON、DOCX、PDF、Excel、Markdown
核心功能：
- 文件上传与预览
- 转换为Markdown格式预览
- 格式转换（Excel ↔ JSON/Markdown，JSON ↔ Excel/Markdown）
- 文件导出为转换后的格式
测试用例生成预留接口：
- 为后续集成大模型生成测试用例提供数据准备支持

依赖项安装

在开始前，请确保安装以下Python库：

pip install streamlit pandas openpyxl python-docx pdfminer.six PyYAML markdown2

完整代码实现

import streamlit as st
import pandas as pd
import yaml
from docx import Document
from pdfminer.high_level import extract_text
from markdown2 import markdown
import base64
from io import BytesIO# 主标题与副标题
st.title("文档处理工具：大模型测试用例生成的前置准备")
st.subheader("支持YAML/JSON/DOCX/PDF/Excel/Markdown的读取、预览与格式转换")def file_uploader_section():"""文件上传与基本信息显示"""uploaded_file = st.file_uploader("上传文件",type=["yml", "json", "docx", "pdf", "xlsx", "md"],accept_multiple_files=False)if uploaded_file:st.write(f"文件名：{uploaded_file.name}")st.write(f"文件类型：{uploaded_file.type}")st.write(f"文件大小：{uploaded_file.size/1024:.2f} KB")return uploaded_filedef file_reader(file):"""根据文件类型读取内容并转换为字符串"""content = ""file_type = file.name.split('.')[-1]if file_type == 'yml':content = yaml.safe_load(file)content = yaml.dump(content)  # 转为字符串elif file_type == 'json':df = pd.read_json(file)content = df.to_string(index=False)elif file_type == 'docx':doc = Document(file)paragraphs = [para.text for para in doc.paragraphs]content = '\n'.join(paragraphs)elif file_type == 'pdf':content = extract_text(file)elif file_type == 'xlsx':df = pd.read_excel(file)content = df.to_string(index=False)elif file_type == 'md':content = file.read().decode()return contentdef format_converter(content, convert_to_md):"""将文本转换为Markdown格式"""if convert_to_md:return markdown(content)else:return contentdef file_exporter(file, converted_data, export_format):"""生成文件导出链接"""buffer = BytesIO()if export_format == "Original":file.seek(0)data = file.read()elif export_format == "JSON":if file.name.endswith('.xlsx'):df = pd.read_excel(file)data = df.to_json(orient='records').encode()else:st.error("仅Excel支持导出为JSON")return Noneelif export_format == "Markdown":if isinstance(converted_data, str):data = converted_data.encode()else:data = converted_data.to_markdown().encode()elif export_format == "Excel":if file.name.endswith('.json'):df = pd.read_json(file)df.to_excel(buffer, index=False)data = buffer.getvalue()else:st.error("仅JSON支持导出为Excel")return Noneelse:st.error("无效格式")return Noneb64 = base64.b64encode(data).decode()href = f'<a href="data:file/{export_format.lower()};base64,{b64}" ' \f'download="{file.name}.{export_format.lower()}">下载文件</a>'return hrefdef conversion_options(file_type):"""根据文件类型生成转换选项"""options = ["Original"]if file_type == 'xlsx':options += ["JSON", "Markdown"]elif file_type == 'json':options += ["Excel", "Markdown"]return optionsdef main():uploaded_file = file_uploader_section()if uploaded_file:content = file_reader(uploaded_file)file_type = uploaded_file.name.split('.')[-1]# 文档预览with st.expander("文档预览"):convert_to_md = st.checkbox("转换为Markdown格式预览")converted_content = format_converter(content, convert_to_md)if convert_to_md:st.markdown(converted_content, unsafe_allow_html=True)else:st.text(converted_content)# 格式转换与导出with st.expander("格式转换与导出"):options = conversion_options(file_type)selected_format = st.selectbox("选择导出格式", options)if selected_format != "Original":export_link = file_exporter(uploaded_file, converted_content, selected_format)if export_link:st.markdown(export_link, unsafe_allow_html=True)# 测试用例生成预留接口with st.expander("测试用例生成（预留）"):st.write("该功能需要集成NLP模型实现，当前版本暂不支持")if __name__ == "__main__":main()

代码分块详解

1. 文件上传与基本信息显示

def file_uploader_section():uploaded_file = st.file_uploader("上传文件",type=["yml", "json", "docx", "pdf", "xlsx", "md"],accept_multiple_files=False)if uploaded_file:st.write(f"文件名：{uploaded_file.name}")st.write(f"文件类型：{uploaded_file.type}")st.write(f"文件大小：{uploaded_file.size/1024:.2f} KB")return uploaded_file

功能：提供文件上传入口，显示文件名、类型和大小
关键点：
- st.file_uploader支持指定文件类型
- accept_multiple_files=False限制单次上传一个文件
上传一个EXCEL文档

2. 文件内容读取与解析

def file_reader(file):content = ""file_type = file.name.split('.')[-1]if file_type == 'yml':content = yaml.safe_load(file)content = yaml.dump(content)  # 转为字符串elif file_type == 'json':df = pd.read_json(file)content = df.to_string(index=False)# 其他文件类型处理...return content

功能：根据文件类型解析内容并返回字符串
关键点：
- 使用pandas处理Excel/JSON的表格数据
- yaml.dump()将YAML对象转为字符串便于后续处理
文档预览：

3. Markdown格式转换

def format_converter(content, convert_to_md):if convert_to_md:return markdown(content)else:return content

功能：将文本内容转换为Markdown格式
依赖库：markdown2实现文本到Markdown的渲染

4. 文件导出功能

def file_exporter(file, converted_data, export_format):buffer = BytesIO()if export_format == "JSON":if file.name.endswith('.xlsx'):df = pd.read_excel(file)data = df.to_json(orient='records').encode()else:st.error("仅Excel支持导出为JSON")return Noneelif export_format == "Excel":if file.name.endswith('.json'):df = pd.read_json(file)df.to_excel(buffer, index=False)data = buffer.getvalue()else:st.error("仅JSON支持导出为Excel")return None# ...其他格式处理...b64 = base64.b64encode(data).decode()href = f'<a ...>下载文件</a>'return href

功能：生成文件导出链接
关键点：
- 使用base64编码生成可下载的文件流
- 支持Excel ↔ JSON/Markdown、JSON ↔ Excel/Markdown的双向转换
Excel ↔ JSON：

[{"Test Name":"用例A","Status":"Pass","Execution Time":1744365600000,"Failure Reason":null,"Duration (s)":5,"Tester":"测试A","Environment":"Test","Version":"v1.0"},{"Test Name":"用例B","Status":"Fail","Execution Time":1744365900000,"Failure Reason":"请求超时","Duration (s)":3,"Tester":"测试A","Environment":"Test","Version":"v1.0"}
]

5. 格式转换选项

def conversion_options(file_type):options = ["Original"]if file_type == 'xlsx':options += ["JSON", "Markdown"]elif file_type == 'json':options += ["Excel", "Markdown"]return options

功能：根据文件类型动态生成转换选项
支持的转换：
- Excel → JSON/Markdown
- JSON → Excel/Markdown

6. 主函数逻辑

def main():uploaded_file = file_uploader_section()if uploaded_file:content = file_reader(uploaded_file)file_type = uploaded_file.name.split('.')[-1]# 预览部分with st.expander("文档预览"):convert_to_md = st.checkbox("转换为Markdown格式预览")converted_content = format_converter(content, convert_to_md)# 显示预览内容# 转换与导出部分with st.expander("格式转换与导出"):options = conversion_options(file_type)selected_format = st.selectbox("选择导出格式", options)if selected_format != "Original":export_link = file_exporter(...)st.markdown(export_link, unsafe_allow_html=True)# 测试用例生成预留with st.expander("测试用例生成（预留）"):st.write("该功能需要集成NLP模型实现，当前版本暂不支持")

功能：整合所有模块并组织界面
关键点：
- 使用st.expander折叠功能区域，提升界面整洁度
- 预留测试用例生成扩展位置

使用示例

运行应用：
```
streamlit run document_tool.py
```
界面操作：
- 上传文件 → 查看文件信息 → 预览内容 → 选择导出格式 → 下载文件

扩展建议

测试用例生成功能：
- 集成deepseek-r1模型
- 根据文档内容生成功能测试用例或接口测试用例
用户体验优化：
- 添加文件内容搜索功能
- 支持多文件批量处理

通过本文的工具，测试工程师可以高效地完成文档预处理工作，为大模型驱动的自动化测试奠定基础。代码完整且可直接运行，适合快速集成到现有测试流程中。源码获取路径：
在这里插入图片描述

在这里插入图片描述