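AppAgent Documentation Generation Tutorial
This tutorial explains in detail how to use the AppAgent code to generate documentation for an application. The code takes a user demonstration (demo) as input, generates documentation from it, and saves the result to a specified directory.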
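1. Code Overview
The code does three main things:
- Parses command-line arguments to obtain the application name, the demo directory, and related settings.
- Loads the configuration file and initializes a large language model (such as OpenAI or Qwen).
- Generates documentation from the demo record and saves it to the specified directory.
2. How the Code Runs
2.1 Importing Dependencies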
import argparse
import ast
import json
import os
import re
import sys
import time

import prompts
from config import load_config
from model import OpenAIModel, QwenModel
from utils import print_with_color
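- argparse: parses command-line arguments.
- ast: safely parses strings into Python objects.
- json: handles JSON data.
- os: handles file and directory paths.
- re: performs regular-expression operations.
- sys: handles system-level operations, such as exiting the program.
- time: controls the interval between requests.
- prompts: contains the prompt templates used to generate documentation.
- config: contains the function that loads the configuration file.
- model: contains the large language model implementations (OpenAI and Qwen).
- utils: contains helper functions, such as colored printing.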
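The role of ast is easiest to see in isolation. The script stores each element's documentation as the string form of a dictionary and later reads it back with ast.literal_eval. The sketch below reproduces that round trip with a made-up documentation string; it illustrates the technique and is not an excerpt from AppAgent.

```python
import ast

# Same dictionary shape the script stores for each UI element;
# the "tap" text here is a made-up placeholder.
doc_content = {
    "tap": "Tapping this UI element opens the keyboard.",
    "text": "",
    "v_swipe": "",
    "h_swipe": "",
    "long_press": "",
}

serialized = str(doc_content)            # what gets written to the .txt file
restored = ast.literal_eval(serialized)  # what gets read back later

# literal_eval only accepts Python literals, so arbitrary expressions
# such as function calls are rejected instead of being executed.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True

print(restored == doc_content, rejected)
```

Because literal_eval refuses anything that is not a literal, it is a safer choice than eval for reading these files back.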
2.2 Parsing Command-Line Arguments
arg_desc = "AppAgent - Human Demonstration"
parser = argparse.ArgumentParser(
    formatter_class=argparse.RawDescriptionHelpFormatter, description=arg_desc
)
parser.add_argument("--app", required=True)
parser.add_argument("--demo", default="demo_notes_2024-12-27_14-16-43")
parser.add_argument("--root_dir", default="./")
args = vars(parser.parse_args())
- --app: the application name (required).
- --demo: the demo directory name (defaults to demo_notes_2024-12-27_14-16-43).
- --root_dir: the root directory (defaults to the current directory, ./).
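To check how the defaults behave without running the whole script, the same parser can be fed an explicit argument list. The build_parser wrapper below is a hypothetical helper added for this illustration; the arguments themselves match the ones defined above.

```python
import argparse


def build_parser():
    """Hypothetical wrapper around the tutorial's parser definition."""
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="AppAgent - Human Demonstration",
    )
    parser.add_argument("--app", required=True)
    parser.add_argument("--demo", default="demo_notes_2024-12-27_14-16-43")
    parser.add_argument("--root_dir", default="./")
    return parser


# Passing a list instead of reading sys.argv makes the parser easy to try out.
args = vars(build_parser().parse_args(["--app", "notes"]))
print(args)
```

Only --app was supplied, so --demo and --root_dir fall back to their defaults. Since the default demo name is a timestamped directory from one specific recording, in practice --demo usually needs to be passed explicitly.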
2.3 Loading the Configuration and Initializing the Model
configs = load_config()
if configs["MODEL"] == "OpenAI":
    mllm = OpenAIModel(
        base_url=configs["OPENAI_API_BASE"],
        api_key=configs["OPENAI_API_KEY"],
        model=configs["OPENAI_API_MODEL"],
        temperature=configs["TEMPERATURE"],
        max_tokens=configs["MAX_TOKENS"],
    )
elif configs["MODEL"] == "Qwen":
    mllm = QwenModel(api_key=configs["DASHSCOPE_API_KEY"], model=configs["QWEN_MODEL"])
else:
    print_with_color(f"ERROR: Unsupported model type {configs['MODEL']}!", "red")
    sys.exit()
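- load_config(): loads the configuration file and returns a configuration dictionary.
- The MODEL entry in the configuration determines which large language model (OpenAI or Qwen) is initialized; any other value prints an error and exits the program.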
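The dispatch on MODEL can be tried without API keys by swapping in stand-in classes. In the sketch below, StubOpenAIModel, StubQwenModel, build_model, and all configuration values are hypothetical placeholders for this illustration; the real OpenAIModel and QwenModel classes live in AppAgent's model module and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class StubOpenAIModel:
    """Hypothetical stand-in for model.OpenAIModel."""
    base_url: str
    api_key: str
    model: str
    temperature: float
    max_tokens: int


@dataclass
class StubQwenModel:
    """Hypothetical stand-in for model.QwenModel."""
    api_key: str
    model: str


def build_model(configs):
    """Pick a model class from configs['MODEL'], mirroring the if/elif above."""
    kind = configs["MODEL"]
    if kind == "OpenAI":
        return StubOpenAIModel(
            base_url=configs["OPENAI_API_BASE"],
            api_key=configs["OPENAI_API_KEY"],
            model=configs["OPENAI_API_MODEL"],
            temperature=configs["TEMPERATURE"],
            max_tokens=configs["MAX_TOKENS"],
        )
    if kind == "Qwen":
        return StubQwenModel(
            api_key=configs["DASHSCOPE_API_KEY"], model=configs["QWEN_MODEL"]
        )
    # Raising instead of calling sys.exit() keeps the function testable.
    raise ValueError(f"Unsupported model type {kind}!")


# Placeholder values only; a real configuration comes from load_config().
sample_configs = {
    "MODEL": "Qwen",
    "DASHSCOPE_API_KEY": "placeholder-key",
    "QWEN_MODEL": "placeholder-model",
}
mllm = build_model(sample_configs)
print(type(mllm).__name__)
```

Raising an exception rather than exiting is a design choice of this sketch; the original script prints the error in red and calls sys.exit().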
2.4 Setting Up the Working Directory and Paths
root_dir = args["root_dir"]
work_dir = os.path.join(root_dir, "apps")
if not os.path.exists(work_dir):
    os.mkdir(work_dir)
app = args["app"]
work_dir = os.path.join(work_dir, app)
demo_dir = os.path.join(work_dir, "demos")
demo_name = args["demo"]
task_dir = os.path.join(demo_dir, demo_name)
xml_dir = os.path.join(task_dir, "xml")
labeled_ss_dir = os.path.join(task_dir, "labeled_screenshots")
record_path = os.path.join(task_dir, "record.txt")
task_desc_path = os.path.join(task_dir, "task_desc.txt")
if (
    not os.path.exists(task_dir)
    or not os.path.exists(xml_dir)
    or not os.path.exists(labeled_ss_dir)
    or not os.path.exists(record_path)
    or not os.path.exists(task_desc_path)
):
    sys.exit()
log_path = os.path.join(task_dir, f"log_{app}_{demo_name}.txt")

docs_dir = os.path.join(work_dir, "demo_docs")
if not os.path.exists(docs_dir):
    os.mkdir(docs_dir)
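- Builds the working directory and all paths from the command-line arguments.
- Checks that the required directories and files exist; if any is missing, the program exits without printing a message.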
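Because the script exits silently when something is missing, it can help to check the layout beforehand. The sketch below rebuilds the same paths and reports which ones are absent; demo_paths, missing_paths, and the demo name demo_1 are hypothetical names introduced for this illustration.

```python
import os
import tempfile


def demo_paths(root_dir, app, demo_name):
    """Return the paths the script expects, keyed by the variable names above."""
    task_dir = os.path.join(root_dir, "apps", app, "demos", demo_name)
    return {
        "task_dir": task_dir,
        "xml_dir": os.path.join(task_dir, "xml"),
        "labeled_ss_dir": os.path.join(task_dir, "labeled_screenshots"),
        "record_path": os.path.join(task_dir, "record.txt"),
        "task_desc_path": os.path.join(task_dir, "task_desc.txt"),
    }


def missing_paths(paths):
    """List the names of the expected paths that do not exist."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


# Build a partial layout in a temporary directory: the two folders exist,
# but record.txt and task_desc.txt have not been created.
with tempfile.TemporaryDirectory() as root:
    paths = demo_paths(root, "notes", "demo_1")
    os.makedirs(paths["xml_dir"])
    os.makedirs(paths["labeled_ss_dir"])
    missing = missing_paths(paths)

print(missing)
```

Running a check like this before document_generation.py shows exactly which file or folder would cause the silent exit.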
2.5 Generating Documentation
print_with_color(
    f"Starting to generate documentations for the app {app} based on the demo {demo_name}",
    "yellow",
)
doc_count = 0
with open(record_path, "r") as infile:
    step = len(infile.readlines()) - 1
    infile.seek(0)
    for i in range(1, step + 1):
        img_before = os.path.join(labeled_ss_dir, f"{demo_name}_{i}.png")
        img_after = os.path.join(labeled_ss_dir, f"{demo_name}_{i + 1}.png")
        rec = infile.readline().strip()
        action, resource_id = rec.split(":::")
        action_type = action.split("(")[0]
        action_param = re.findall(r"\((.*?)\)", action)[0]
        if action_type == "tap":
            prompt_template = prompts.tap_doc_template
            prompt = re.sub(r"<ui_element>", action_param, prompt_template)
        elif action_type == "text":
            input_area, input_text = action_param.split(":sep:")
            prompt_template = prompts.text_doc_template
            prompt = re.sub(r"<ui_element>", input_area, prompt_template)
        elif action_type == "long_press":
            prompt_template = prompts.long_press_doc_template
            prompt = re.sub(r"<ui_element>", action_param, prompt_template)
        elif action_type == "swipe":
            swipe_area, swipe_dir = action_param.split(":sep:")
            if swipe_dir == "up" or swipe_dir == "down":
                action_type = "v_swipe"
            elif swipe_dir == "left" or swipe_dir == "right":
                action_type = "h_swipe"
            prompt_template = prompts.swipe_doc_template
            prompt = re.sub(r"<swipe_dir>", swipe_dir, prompt_template)
            prompt = re.sub(r"<ui_element>", swipe_area, prompt)
        else:
            break
        task_desc = open(task_desc_path, "r").read()
        prompt = re.sub(r"<task_desc>", task_desc, prompt)

        doc_name = resource_id + ".txt"
        doc_path = os.path.join(docs_dir, doc_name)

        if os.path.exists(doc_path):
            doc_content = ast.literal_eval(open(doc_path).read())
            if doc_content[action_type]:
                if configs["DOC_REFINE"]:
                    suffix = re.sub(
                        r"<old_doc>",
                        doc_content[action_type],
                        prompts.refine_doc_suffix,
                    )
                    prompt += suffix
                    print_with_color(
                        f"Documentation for the element {resource_id} already exists. The doc will be "
                        f"refined based on the latest demo.",
                        "yellow",
                    )
                else:
                    print_with_color(
                        f"Documentation for the element {resource_id} already exists. Turn on DOC_REFINE "
                        f"in the config file if needed.",
                        "yellow",
                    )
                    continue
        else:
            doc_content = {
                "tap": "",
                "text": "",
                "v_swipe": "",
                "h_swipe": "",
                "long_press": "",
            }

        print_with_color(
            f"Waiting for GPT-4V to generate documentation for the element {resource_id}",
            "yellow",
        )
        status, rsp = mllm.get_model_response(prompt, [img_before, img_after])
        if status:
            doc_content[action_type] = rsp
            with open(log_path, "a") as logfile:
                log_item = {
                    "step": i,
                    "prompt": prompt,
                    "image_before": f"{demo_name}_{i}.png",
                    "image_after": f"{demo_name}_{i + 1}.png",
                    "response": rsp,
                }
                logfile.write(json.dumps(log_item) + "\n")
            with open(doc_path, "w") as outfile:
                outfile.write(str(doc_content))
            doc_count += 1
            print_with_color(f"Documentation generated and saved to {doc_path}", "yellow")
        else:
            print_with_color(rsp, "red")
        time.sleep(configs["REQUEST_INTERVAL"])

print_with_color(
    f"Documentation generation phase completed. {doc_count} docs generated.", "yellow"
)
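- Reads the demo record file and processes each step line by line.
- Selects the prompt template that matches the action type (tap, text, long_press, or swipe).
- If documentation for the element already exists and DOC_REFINE is set to True, refines the existing documentation; if DOC_REFINE is off, the step is skipped.
- Calls the large language model to generate the documentation and saves it to the specified directory.
- Writes a log entry containing the step, the prompt, the before and after screenshots, and the model response.
- Sleeps between requests to avoid sending them too frequently.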
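The record-parsing step can be isolated as a small function. The sketch below assumes the line format implied by the code above, namely action(params):::resource_id with :sep: separating parameters; parse_record_line and the sample lines are hypothetical and only illustrate that format.

```python
import re


def parse_record_line(rec):
    """Split one record line into (action_type, target, extra, resource_id)."""
    action, resource_id = rec.strip().split(":::")
    action_type = action.split("(")[0]
    action_param = re.findall(r"\((.*?)\)", action)[0]

    if action_type in ("text", "swipe"):
        # Two-part parameters: the element tag, then the text or the direction.
        target, extra = action_param.split(":sep:")
    else:
        target, extra = action_param, None

    if action_type == "swipe":
        # Swipes are documented separately per axis, as in the script.
        if extra in ("up", "down"):
            action_type = "v_swipe"
        elif extra in ("left", "right"):
            action_type = "h_swipe"
        else:
            raise ValueError(f"Unknown swipe direction: {extra}")
    return action_type, target, extra, resource_id


# Made-up record lines in the assumed format.
tap_step = parse_record_line("tap(4):::com.example.id_add_1")
swipe_step = parse_record_line("swipe(7:sep:up):::com.example.id_list_2")
print(tap_step)
print(swipe_step)
```

Unlike the script, which leaves the action type as swipe when the direction is unrecognized, this sketch raises an error so that a malformed record is noticed early.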
3. Example
Prerequisite: the AppAgent source code (step_recorder.py)
python document_generation.py --app notes
Generated files:
- demo_docs/com.miui.notes.id_list_root_view_com.miui.notes.id_content_add_1.txt
{'tap': 'Tapping this UI element will open the keyboard, allowing the user to input or edit text within the task entry field. This is useful for adding or modifying task details.', 'text': '', 'v_swipe': '', 'h_swipe': '', 'long_press': ''}
- log_notes_demo_notes_2024-12-27_14-16-43.txt
{"step": 1,"prompt": "I will give you the screenshot of a mobile app before and after tapping the UI element labeled \nwith the number 4 on the screen. The numeric tag of each element is located at the center of the element. \nTapping this UI element is a necessary part of proceeding with a larger task, which is to record \"hello world\". Your task is to \ndescribe the functionality of the UI element concisely in one or two sentences. Notice that your description of the UI \nelement should focus on the general function. For example, if the UI element is used to navigate to the chat window \nwith John, your description should not include the name of the specific person. Just say: \"Tapping this area will \nnavigate the user to the chat window\". Never include the numeric tag of the UI element in your description. You can use \npronouns such as \"the UI element\" to refer to the element.\nA documentation of this UI element generated from previous demos is shown below. Your \ngenerated description should be based on this previous doc and optimize it. Notice that it is possible that your \nunderstanding of the function of the UI element derived from the given screenshots conflicts with the previous doc, \nbecause the function of a UI element can be flexible. In this case, your generated description should combine both.\nOld documentation of this UI element: Tapping this UI element will open the keyboard, allowing the user to input or edit text within the task entry field.","image_before": "demo_notes_2024-12-27_14-16-43_1.png","image_after": "demo_notes_2024-12-27_14-16-43_2.png","response": "Tapping this UI element will open the keyboard, allowing the user to input or edit text within the task entry field. This is useful for adding or modifying task details."
}
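The log file is written one JSON object per line, so it can be read back line by line with json.loads. The sketch below writes two placeholder entries to a temporary file in the same way the script does and then loads them; read_log, the file name, and the entry values are hypothetical.

```python
import json
import os
import tempfile


def read_log(log_path):
    """Load every non-empty line of the log as one JSON object."""
    with open(log_path, "r") as logfile:
        return [json.loads(line) for line in logfile if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "log_notes_demo_1.txt")
    # Append entries the same way the script does: json.dumps plus a newline.
    with open(log_path, "a") as logfile:
        for step in (1, 2):
            log_item = {
                "step": step,
                "prompt": "placeholder prompt",
                "image_before": f"demo_1_{step}.png",
                "image_after": f"demo_1_{step + 1}.png",
                "response": "placeholder response",
            }
            logfile.write(json.dumps(log_item) + "\n")
    entries = read_log(log_path)

for entry in entries:
    print(entry["step"], entry["image_before"], entry["image_after"])
```

Since the script opens the log in append mode, repeated runs on the same demo add new lines to the same file rather than overwriting it.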
Overall directory structure: