Zero-Shot Object Detection with Google Gemini 3.5 Flash: A Getting-Started Tutorial

📅 2026/6/26 4:03:49
This tutorial shows how to run zero-shot object detection with Google Gemini 3.5 Flash and how to parse the detections the model returns with the supervision library.

It covers the following:

- Building an object detection prompt for Gemini
- Detecting multiple classes with a single prompt
- Detecting classes separately and merging the results
- Using structured output to force JSON in dense scenes

Contents

- Install required packages
- Configure the API key
- Download example images
- Imports and helper functions
- Example: avocado detection (single prompt)
- Example: hot air balloons
- Example: birds
- Example: bananas
- Example: vehicles and lanes
- Example: sealed packages
- Example: package labels
- Example: packages on a conveyor belt
- Example: yellow swim rings (free-form response)
- Example: people detection (structured output)

Install required packages

    !pip install -q google-genai "supervision @ git+https://github.com/roboflow/supervision.git@add-gemini-3.5-vlm-support"

Configure the API key

You need a Gemini API key from Google AI Studio. Add it to the Colab Secrets panel under the name GOOGLE_API_KEY.

    from google.colab import userdata
    from google import genai

    # Read the key from Colab Secrets and create the Gemini client.
    GOOGLE_API_KEY = userdata.get("GOOGLE_API_KEY")
    client = genai.Client(api_key=GOOGLE_API_KEY)

Download example images

    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/pexels-vanessa-loring-5966631.jpg
    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/pexels-eyup-sayar-290427017-18373303.jpg
    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/pexels-mutecevvil-18013812.jpg
    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/pexels-shvets-production-7195054.jpg
    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/pexels-spencer-4353558.jpg
    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/top-shot-of-a-worker-scanning-boxes-using-a-bar-co-2026-01-11-09-59-09-utc.jpg
    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/warehouse-workers-inspecting-boxes-along-conveyor-2026-01-11-09-55-23-utc.jpg
    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/top-view-of-people-relaxing-in-the-pool-on-yellow-2026-03-24-21-54-59-utc.jpg
    !wget -q https://storage.googleapis.com/com-roboflow-marketing/playground-examples/aerial-drone-photograph-of-traffic-jam-in-metropol-2026-03-18-17-36-02-utc.jpg

Imports and helper functions

Here we define a prompt template that asks Gemini to return bounding boxes in [ymin, xmin, ymax, xmax] format, with coordinates normalized to the 0-1000 range. We also define a reusable annotation function that draws the detected boxes onto the image.

    from google.genai import types
    from pydantic import BaseModel, Field
    from PIL import Image
    import supervision as sv

    # Double braces are literal braces; single braces are filled in by str.format().
    DETECTION_PROMPT_TEMPLATE = """
    Carefully examine this image and detect ALL visible objects, including small, distant, or partially visible ones.

    IMPORTANT: Focus on finding as many objects as possible, even if you are only moderately confident. Make sure each bounding box is as tight as possible.

    Valid object classes: {class_list}

    For each detected object, provide:
    - label: the exact class name from the list above
    - confidence: your certainty (between 0.0 and 1.0)
    - box_2d: the bounding box [ymin, xmin, ymax, xmax] normalized to 0-1000

    Detect everything that matches the valid classes. Do not be conservative; include objects even with moderate confidence.

    Return a JSON array, for example:
    [
      {{"label": "{class_example}", "confidence": 0.95, "box_2d": [100, 200, 300, 400]}}
    ]
    """

    COLOR = sv.ColorPalette.from_hex([
        "#ffff00", "#ff9b00", "#ff66ff", "#3399ff", "#ff66b2", "#ff8080",
        "#b266ff", "#9999ff", "#66ffff", "#33ff99", "#66ff66", "#99ff00",
    ])


    class Detection(BaseModel):
        """Schema for one detection, used later for structured output."""
        label: str
        confidence: float = Field(ge=0, le=1)
        box_2d: list[int] = Field(min_length=4, max_length=4)


    def build_detection_prompt(classes: list[str]) -> str:
        """Fill the template with the class list; the first class serves as the JSON example."""
        return DETECTION_PROMPT_TEMPLATE.format(
            class_list=", ".join(classes),
            class_example=classes[0],
        ).strip()


    def annotate_image(image, detections, with_labels=True):
        """Draw boxes (and optionally labels) on a copy of the image, then downscale it for display."""
        text_scale = sv.calculate_optimal_text_scale(resolution_wh=image.size)
        thickness = sv.calculate_optimal_line_thickness(resolution_wh=image.size)
        annotated = image.copy()
        annotated = sv.BoxAnnotator(color=COLOR, thickness=thickness).annotate(annotated, detections)
        if with_labels:
            annotated = sv.LabelAnnotator(
                color=COLOR,
                text_color=sv.Color.BLACK,
                text_scale=text_scale,
                text_thickness=thickness,
                smart_position=True,
            ).annotate(annotated, detections)
        annotated.thumbnail((1000, 1000))
        return annotated

Example: avocado detection (single prompt)

Detect all avocado-related classes in a single API call.

    IMAGE_PATH = "pexels-vanessa-loring-5966631.jpg"
    CLASSES = ["avocado with the pit", "avocado without the pit", "pit"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )
    print(response.text)

    # Parse the JSON response into supervision detections in pixel coordinates.
    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=False)

Example: hot air balloons

    IMAGE_PATH = "pexels-eyup-sayar-290427017-18373303.jpg"
    CLASSES = ["air balloon"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=False)

Example: birds

    IMAGE_PATH = "pexels-mutecevvil-18013812.jpg"
    CLASSES = ["bird"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=False)

Example: bananas

    IMAGE_PATH = "pexels-shvets-production-7195054.jpg"
    CLASSES = ["open banana", "closed banana"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=False)

Example: vehicles and lanes

    IMAGE_PATH = "aerial-drone-photograph-of-traffic-jam-in-metropol-2026-03-18-17-36-02-utc.jpg"
    CLASSES = [
        "car on 1st lane", "car on 2nd lane", "car on 3rd lane",
        "car on 4th lane", "car on 5th lane", "car on 6th lane",
    ]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    # No config is passed in this example, so the model runs with its default settings.
    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=True)

Example: sealed packages

    IMAGE_PATH = "top-shot-of-a-worker-scanning-boxes-using-a-bar-co-2026-01-11-09-59-09-utc.jpg"
    CLASSES = ["sealed package"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=True)

Example: package labels

    IMAGE_PATH = "top-shot-of-a-worker-scanning-boxes-using-a-bar-co-2026-01-11-09-59-09-utc.jpg"
    CLASSES = ["package label"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=True)

Example: packages on a conveyor belt

    IMAGE_PATH = "warehouse-workers-inspecting-boxes-along-conveyor-2026-01-11-09-55-23-utc.jpg"
    CLASSES = ["sealed package with no label"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=True)

Example: yellow swim rings (free-form response)

In dense scenes with many objects, the model's response may be cut off before the JSON array is complete. This example uses the standard prompt and does not force an output format.

    IMAGE_PATH = "top-view-of-people-relaxing-in-the-pool-on-yellow-2026-03-24-21-54-59-utc.jpg"
    CLASSES = ["yellow swim ring"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=False)

Example: people detection (structured output)

Setting response_mime_type="application/json" together with response_schema forces Gemini to return valid JSON that conforms to our schema. This is especially useful in dense scenes, where a free-form response may be truncated partway through the JSON.

    IMAGE_PATH = "top-view-of-people-relaxing-in-the-pool-on-yellow-2026-03-24-21-54-59-utc.jpg"
    CLASSES = ["person"]

    image = Image.open(IMAGE_PATH)
    prompt = build_detection_prompt(CLASSES)

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            # Constrain the output to a JSON array of Detection objects.
            response_mime_type="application/json",
            response_schema=list[Detection],
            temperature=0,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    detections = sv.Detections.from_vlm(
        vlm=sv.VLM.GOOGLE_GEMINI_3_5,
        result=response.text,
        resolution_wh=image.size,
        classes=CLASSES,
    )
    annotate_image(image, detections, with_labels=False)
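To make the coordinate convention concrete, here is a minimal sketch of the conversion from Gemini's normalized [ymin, xmin, ymax, xmax] boxes (0-1000 range) to pixel-space [xmin, ymin, xmax, ymax] boxes. The helper names box_2d_to_xyxy and parse_detections are hypothetical and written only for illustration; they are not part of supervision or google-genai, and they are not a description of how sv.Detections.from_vlm is implemented. The sketch also assumes the response text is a bare JSON array, as it is when structured output is enabled.

```python
import json


def box_2d_to_xyxy(box_2d, resolution_wh):
    """Hypothetical helper: map [ymin, xmin, ymax, xmax] on a 0-1000 scale
    to pixel [xmin, ymin, xmax, ymax] for an image of size (width, height)."""
    width, height = resolution_wh
    ymin, xmin, ymax, xmax = box_2d
    return [
        xmin * width / 1000,
        ymin * height / 1000,
        xmax * width / 1000,
        ymax * height / 1000,
    ]


def parse_detections(response_text, resolution_wh):
    """Hypothetical helper: parse a bare JSON array of detections and
    attach pixel-space boxes. Raises json.JSONDecodeError on truncated text."""
    items = json.loads(response_text)
    return [
        {
            "label": item["label"],
            "confidence": item.get("confidence"),
            "xyxy": box_2d_to_xyxy(item["box_2d"], resolution_wh),
        }
        for item in items
    ]


# Same shape as the example in the prompt template, on a 2000x1000 image.
sample = '[{"label": "person", "confidence": 0.95, "box_2d": [100, 200, 300, 400]}]'
print(parse_detections(sample, resolution_wh=(2000, 1000)))
```

Note that Gemini lists the y coordinate first, while pixel-space xyxy boxes list x first, so the values must be reordered as well as rescaled. PIL's image.size is already (width, height), so it can be passed directly as resolution_wh.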