Voiceprint Purification Based on Silero VAD

📅 2026/6/24 10:41:45

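The function below extracts clean speech from an audio source and reports the total speech duration, the total non-speech duration, and the longest non-speech interval. It is written as a method: `self.vad_model` is the Silero VAD model (presumably created with `load_silero_vad()`), `self.debug` toggles log output, and `load_audio` is the project's own helper imported from `src.ultis`.

    import torchaudio
    from silero_vad import load_silero_vad, get_speech_timestamps, collect_chunks
    from src.ultis import load_audio

    def purified_voice(self, audio_source, sample_rate=16000,
                       min_silence_duration_ms=700, speech_pad_ms=100,
                       output_path=None):
        """Voice / voiceprint purification.

        Args:
            audio_source (str | Path | bytes | np.ndarray | torch.Tensor):
                A path, byte stream, NumPy array, or Tensor.
            sample_rate (int, optional): Defaults to 16000.
            min_silence_duration_ms (int, optional): Minimum silence duration. Defaults to 700.
            speech_pad_ms (int, optional): Padding around each speech segment. Defaults to 100.
            output_path (str, optional): Defaults to None.

        Returns:
            Dict:
                - waveform (torch.Tensor): shape [1, t].
                - has_speech (bool): True if speech was found and purified;
                  False if no speech was found and the original audio is returned.
                - speech_duration (float): speech duration in seconds.
                - silence_duration (float): non-speech duration in seconds.
                - max_silence_interval (float): longest non-speech interval in seconds.
        """
        waveform, sr, total_frames, duration_time = load_audio(audio_source, target_sr=sample_rate)
        ori_waveform = waveform.flatten()

        # Speech segments as dicts with "start"/"end" sample indices.
        speech_timestamps = get_speech_timestamps(
            ori_waveform,
            self.vad_model,
            sampling_rate=sample_rate,
            threshold=0.5,
            min_silence_duration_ms=min_silence_duration_ms,
            speech_pad_ms=speech_pad_ms,
        )

        speech_duration = sum(
            ((seg["end"] - seg["start"]) / sample_rate for seg in speech_timestamps), 0.0
        )
        silence_duration = max(duration_time - speech_duration, 0.0)

        max_silence_interval = 0.0
        if not speech_timestamps:
            max_silence_interval = duration_time
        else:
            # Silence before the first speech segment.
            max_silence_interval = max(max_silence_interval, speech_timestamps[0]["start"] / sample_rate)
            # Silence between consecutive speech segments.
            for i in range(len(speech_timestamps) - 1):
                gap = (speech_timestamps[i + 1]["start"] - speech_timestamps[i]["end"]) / sample_rate
                max_silence_interval = max(max_silence_interval, gap)
            # Silence after the last speech segment.
            max_silence_interval = max(
                max_silence_interval,
                (len(ori_waveform) - speech_timestamps[-1]["end"]) / sample_rate,
            )

        if speech_timestamps:
            # Concatenate the speech segments and restore the channel dimension.
            purified_waveform = collect_chunks(speech_timestamps, ori_waveform)
            purified_waveform = purified_waveform[None, :]
            if output_path:
                torchaudio.save(output_path, purified_waveform, sample_rate,
                                encoding="PCM_S", bits_per_sample=16)
            if self.debug:
                print(f"Purification done, waveform shape: {ori_waveform.shape}, "
                      f"purified_waveform shape: {purified_waveform.shape}")
                if output_path:
                    print(f"Saved to {output_path}")
            return {
                "waveform": purified_waveform,
                "has_speech": True,
                "speech_duration": round(speech_duration, 4),
                "silence_duration": round(silence_duration, 4),
                "max_silence_interval": round(max_silence_interval, 4),
            }
        else:
            if self.debug:
                print("No valid speech detected in the audio.")
            return {
                "waveform": ori_waveform[None, :],
                "has_speech": False,
                "speech_duration": 0.0,
                "silence_duration": round(duration_time, 4),
                "max_silence_interval": round(duration_time, 4),
            }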
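The duration bookkeeping inside `purified_voice` does not depend on the model or on torch, so it can be checked in isolation. The following is a minimal sketch, assuming segments in the form that `get_speech_timestamps` returns by default (dicts with `start` and `end` sample indices). The helper name `silence_stats` and its result keys are hypothetical and are not part of silero-vad or of the original code.

```python
from typing import Dict, List


def silence_stats(
    speech_timestamps: List[Dict[str, int]],
    total_samples: int,
    sample_rate: int = 16000,
) -> Dict[str, float]:
    """Summarize speech/silence durations from sample-indexed speech segments."""
    total = total_samples / sample_rate
    if not speech_timestamps:
        # No speech at all: the whole clip is one silent interval.
        return {"speech": 0.0, "silence": total, "max_gap": total}

    speech = sum(
        (seg["end"] - seg["start"]) / sample_rate for seg in speech_timestamps
    )

    # Candidate silent stretches (in samples): before the first segment,
    # between consecutive segments, and after the last segment.
    gaps = [speech_timestamps[0]["start"]]
    gaps += [
        nxt["start"] - cur["end"]
        for cur, nxt in zip(speech_timestamps, speech_timestamps[1:])
    ]
    gaps.append(total_samples - speech_timestamps[-1]["end"])

    return {
        "speech": round(speech, 4),
        "silence": round(max(total - speech, 0.0), 4),
        "max_gap": round(max(max(gaps), 0) / sample_rate, 4),
    }


# A 6-second clip at 16 kHz with speech at 1-2 s and 4-5 s.
example = [{"start": 16000, "end": 32000}, {"start": 64000, "end": 80000}]
stats = silence_stats(example, total_samples=96000)
print(stats)  # {'speech': 2.0, 'silence': 4.0, 'max_gap': 2.0}
```

Here the longest silent stretch is the 2-second gap between the two segments, which is larger than the 1-second silences at the start and end of the clip.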
