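# Supports extracting clean speech, total speech duration, total non-speech
# duration, and the longest non-speech interval.
import torchaudio
from silero_vad import load_silero_vad, get_speech_timestamps, collect_chunks
from src.ultis import load_audio


# Method of a class that provides `self.vad_model` and `self.debug`.
def purified_voice(
    self,
    audio_source,
    sample_rate=16000,
    min_silence_duration_ms=700,
    speech_pad_ms=100,
    output_path=None,
):
    """
    Voice / voiceprint purification.

    Args:
        audio_source (str | Path | bytes | np.ndarray | torch.Tensor):
            Accepts a path, a byte stream, a NumPy array, or a Tensor.
        sample_rate (int, optional): Defaults to 16000.
        min_silence_duration_ms (int, optional): Minimum silence duration.
            Defaults to 700.
        speech_pad_ms (int, optional): Padding added around each speech
            segment. Defaults to 100.
        output_path (str, optional): Defaults to None.

    Returns:
        Dict:
            - waveform (torch.Tensor): shape [1, t].
            - has_speech (bool): True if speech was found and purified;
              False if no speech was found and the original audio is returned.
            - speech_duration (float): speech duration in seconds.
            - silence_duration (float): non-speech duration in seconds.
            - max_silence_interval (float): longest non-speech interval
              in seconds.
    """
    waveform, sr, total_frames, duration_time = load_audio(
        audio_source, target_sr=sample_rate
    )
    ori_waveform = waveform.flatten()

    # Segment boundaries are returned as sample indices, not seconds.
    speech_timestamps = get_speech_timestamps(
        ori_waveform,
        self.vad_model,
        sampling_rate=sample_rate,
        threshold=0.5,
        min_silence_duration_ms=min_silence_duration_ms,
        speech_pad_ms=speech_pad_ms,
    )

    speech_duration = sum(
        ((seg["end"] - seg["start"]) / sample_rate for seg in speech_timestamps),
        0.0,
    )
    silence_duration = max(duration_time - speech_duration, 0.0)

    max_silence_interval = 0.0
    if not speech_timestamps:
        # No speech at all: the whole clip is one silent interval.
        max_silence_interval = duration_time
    else:
        # Leading silence before the first speech segment.
        max_silence_interval = max(
            max_silence_interval, speech_timestamps[0]["start"] / sample_rate
        )
        # Gaps between consecutive speech segments.
        for i in range(len(speech_timestamps) - 1):
            gap = (
                speech_timestamps[i + 1]["start"] - speech_timestamps[i]["end"]
            ) / sample_rate
            max_silence_interval = max(max_silence_interval, gap)
        # Trailing silence after the last speech segment.
        max_silence_interval = max(
            max_silence_interval,
            (len(ori_waveform) - speech_timestamps[-1]["end"]) / sample_rate,
        )

    if speech_timestamps:
        # Concatenate the speech segments and restore the channel dimension.
        purified_waveform = collect_chunks(speech_timestamps, ori_waveform)
        purified_waveform = purified_waveform[None, :]

        if output_path:
            torchaudio.save(
                output_path,
                purified_waveform,
                sample_rate,
                encoding="PCM_S",
                bits_per_sample=16,
            )
            if self.debug:
                print(
                    f"Purification complete, waveform shape: {ori_waveform.shape}, "
                    f"purified_waveform shape: {purified_waveform.shape},\n"
                    f"saved to {output_path}"
                )

        return {
            "waveform": purified_waveform,
            "has_speech": True,
            "speech_duration": round(speech_duration, 4),
            "silence_duration": round(silence_duration, 4),
            "max_silence_interval": round(max_silence_interval, 4),
        }
    else:
        if self.debug:
            print("No valid speech detected in the audio.")
        return {
            "waveform": ori_waveform[None, :],
            "has_speech": False,
            "speech_duration": 0.0,
            "silence_duration": duration_time,
            "max_silence_interval": duration_time,
        }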
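
The longest-silence calculation in `purified_voice` can be checked in isolation, without loading a VAD model. The sketch below is a minimal illustration, assuming segments are dicts with `start` and `end` given as sample indices (the shape `get_speech_timestamps` returns by default). The helper name `longest_silence` and its parameters are hypothetical and are not part of the original code.

```python
from typing import Dict, List


def longest_silence(
    segments: List[Dict[str, int]],
    total_samples: int,
    sample_rate: int = 16000,
) -> float:
    """Return the longest non-speech stretch in seconds.

    Hypothetical helper: `segments` is assumed to be sorted and
    non-overlapping, with `start`/`end` expressed in samples.
    """
    if not segments:
        # No speech detected: the whole clip counts as silence.
        return total_samples / sample_rate

    # Leading silence before the first segment.
    longest = segments[0]["start"]
    # Gaps between each pair of consecutive segments.
    for prev, nxt in zip(segments, segments[1:]):
        longest = max(longest, nxt["start"] - prev["end"])
    # Trailing silence after the last segment.
    longest = max(longest, total_samples - segments[-1]["end"])

    return longest / sample_rate


if __name__ == "__main__":
    # 10 s clip at 16 kHz with speech at 1-2 s and 5-6 s.
    demo = [{"start": 16000, "end": 32000}, {"start": 80000, "end": 96000}]
    print(longest_silence(demo, total_samples=160000))
```

Working in samples and dividing once at the end avoids accumulating floating-point error across many segments. The original function divides per gap instead, which gives the same result up to rounding.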