How to Convert a Hugging Face-Format Whisper Model to OpenAI Format

1. Summary

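At the time of writing, OpenAI provides 11 Whisper models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2 and large-v3. The models ending in .en are English-only. Because Whisper is open source and can be fine-tuned, Hugging Face hosts many fine-tuned variants, such as models optimized for Thai speech recognition. These can be used for speech recognition directly in Hugging Face format. If, however, you already have speech recognition code written for OpenAI-format Whisper models, you first need to convert the Hugging Face model to the OpenAI format. That conversion is what this article covers.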

2. convert hf to openai

First, install two Python dependencies:

pip install openai-whisper transformers -i https://pypi.tuna.tsinghua.edu.cn/simple

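Next, find the Whisper model you want to convert on Hugging Face. Here I use a large-v3 model fine-tuned for Thai, downloaded to the local directory ./whisper-large-v3-Thai.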

Copy the following code to your machine and save it as convert_hf_to_openai.py:

import argparse

import torch
from torch import nn
from transformers import WhisperConfig, WhisperForConditionalGeneration

# Create the reverse mapping adapting it from the original `WHISPER_MAPPING` in
# the `convert_openai_to_hf.py` script:
REVERSE_WHISPER_MAPPING = {
    "layers": "blocks",
    "fc1": "mlp.0",
    "fc2": "mlp.2",
    "final_layer_norm": "mlp_ln",
    ".self_attn.q_proj": ".attn.query",
    ".self_attn.k_proj": ".attn.key",
    ".self_attn.v_proj": ".attn.value",
    ".self_attn_layer_norm": ".attn_ln",
    ".self_attn.out_proj": ".attn.out",
    ".encoder_attn.q_proj": ".cross_attn.query",
    ".encoder_attn.k_proj": ".cross_attn.key",
    ".encoder_attn.v_proj": ".cross_attn.value",
    ".encoder_attn_layer_norm": ".cross_attn_ln",
    ".encoder_attn.out_proj": ".cross_attn.out",
    "decoder.layer_norm.": "decoder.ln.",
    "encoder.layer_norm.": "encoder.ln_post.",
    "embed_tokens": "token_embedding",
    "encoder.embed_positions.weight": "encoder.positional_embedding",
    "decoder.embed_positions.weight": "decoder.positional_embedding",
}


def reverse_rename_keys(s_dict: dict) -> dict:
    """Renames the keys back from Hugging Face to OpenAI Whisper format.

    By using this function on an HF model's state_dict, we should get the names
    in the format expected by Whisper.

    Args:
        s_dict (`dict`): A dictionary with keys in Hugging Face format.

    Returns:
        `dict`: The same dictionary but in OpenAI Whisper format.
    """
    keys = list(s_dict.keys())
    for orig_key in keys:
        new_key = orig_key
        for key_r, value_r in REVERSE_WHISPER_MAPPING.items():
            if key_r in orig_key:
                new_key = new_key.replace(key_r, value_r)
        # print(f"{orig_key} -> {new_key}")
        s_dict[new_key] = s_dict.pop(orig_key)
    return s_dict


def make_emb_from_linear(linear: nn.Linear) -> nn.Embedding:
    """Converts a linear layer's weights into an embedding layer.

    The linear layer's `out_features` dimension corresponds to the vocabulary size
    and its `in_features` dimension corresponds to the embedding size.

    Args:
        linear (`nn.Linear`): The linear layer to be converted.

    Returns:
        `nn.Embedding`: An embedding layer with weights set to those of the input linear layer.
    """
    vocab_size, emb_size = linear.weight.data.shape
    emb_layer = nn.Embedding(vocab_size, emb_size, _weight=linear.weight.data)
    return emb_layer


def extract_dims_from_hf(config: WhisperConfig) -> dict:
    """Extracts necessary dimensions from Hugging Face's WhisperConfig.

    Extracts necessary dimensions and related configuration data from the Hugging Face
    model and then restructures them for the OpenAI Whisper format.

    Args:
        config (`WhisperConfig`): Configuration of the Hugging Face's model.

    Returns:
        `dict`: The `dims` of the OpenAI Whisper model.
    """
    dims = {
        "n_vocab": config.vocab_size,
        "n_mels": config.num_mel_bins,
        "n_audio_state": config.d_model,
        "n_text_ctx": config.max_target_positions,
        "n_audio_layer": config.encoder_layers,
        "n_audio_head": config.encoder_attention_heads,
        "n_text_layer": config.decoder_layers,
        "n_text_head": config.decoder_attention_heads,
        "n_text_state": config.d_model,
        "n_audio_ctx": config.max_source_positions,
    }
    return dims


def convert_tfms_to_openai_whisper(hf_model_path: str, whisper_dump_path: str):
    """Converts a Whisper model from the Hugging Face to the OpenAI format.

    Takes in the path to a Hugging Face Whisper model, extracts its state_dict,
    renames keys as needed, and then saves the model in OpenAI's format.

    Args:
        hf_model_path (`str`): Path to the pretrained Whisper model in Hugging Face format.
        whisper_dump_path (`str`): Destination path where the converted model in
            Whisper/OpenAI format will be saved.

    Returns:
        `None`
    """
    print("HF model path:", hf_model_path)
    print("OpenAI model path:", whisper_dump_path)

    # Load the HF model and its state_dict
    model = WhisperForConditionalGeneration.from_pretrained(hf_model_path)
    state_dict = model.state_dict()

    # Use a reverse mapping to rename state_dict keys
    state_dict = reverse_rename_keys(state_dict)

    # Extract configurations and other necessary metadata
    dims = extract_dims_from_hf(model.config)

    # Remove the proj_out weights from state dictionary
    del state_dict["proj_out.weight"]

    # Construct the Whisper checkpoint structure
    state_dict = {k.replace("model.", "", 1): v for k, v in state_dict.items()}
    whisper_checkpoint = {"dims": dims, "model_state_dict": state_dict}

    # Save in Whisper's format
    torch.save(whisper_checkpoint, whisper_dump_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Required parameters
    parser.add_argument(
        "--checkpoint",
        type=str,
        required=True,
        help="Path or name of the Hugging Face checkpoint.",
    )
    parser.add_argument(
        "--whisper_dump_path",
        type=str,
        required=True,
        help="Path to the output Whisper model.",
    )
    args = parser.parse_args()

    convert_tfms_to_openai_whisper(args.checkpoint, args.whisper_dump_path)
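The heart of the script is `reverse_rename_keys`: every Hugging Face parameter name is rewritten by plain substring replacement, and the leading `model.` prefix is stripped afterwards. The short sketch below applies the same mapping to a few parameter names to show the effect. `hf_key_to_openai` and the sample key list are illustrative only (not part of the script), and the names assume the usual `WhisperForConditionalGeneration` state_dict layout.

```python
# Same substring mapping as REVERSE_WHISPER_MAPPING in convert_hf_to_openai.py
REVERSE_WHISPER_MAPPING = {
    "layers": "blocks",
    "fc1": "mlp.0",
    "fc2": "mlp.2",
    "final_layer_norm": "mlp_ln",
    ".self_attn.q_proj": ".attn.query",
    ".self_attn.k_proj": ".attn.key",
    ".self_attn.v_proj": ".attn.value",
    ".self_attn_layer_norm": ".attn_ln",
    ".self_attn.out_proj": ".attn.out",
    ".encoder_attn.q_proj": ".cross_attn.query",
    ".encoder_attn.k_proj": ".cross_attn.key",
    ".encoder_attn.v_proj": ".cross_attn.value",
    ".encoder_attn_layer_norm": ".cross_attn_ln",
    ".encoder_attn.out_proj": ".cross_attn.out",
    "decoder.layer_norm.": "decoder.ln.",
    "encoder.layer_norm.": "encoder.ln_post.",
    "embed_tokens": "token_embedding",
    "encoder.embed_positions.weight": "encoder.positional_embedding",
    "decoder.embed_positions.weight": "decoder.positional_embedding",
}


def hf_key_to_openai(key: str) -> str:
    """Rename one Hugging Face parameter name the way the script does (hypothetical helper)."""
    new_key = key
    for hf_part, openai_part in REVERSE_WHISPER_MAPPING.items():
        # Patterns are tested against the original name, as in reverse_rename_keys
        if hf_part in key:
            new_key = new_key.replace(hf_part, openai_part)
    # The script finally drops the leading "model." prefix (first occurrence only)
    return new_key.replace("model.", "", 1)


# Hypothetical sample of names from a WhisperForConditionalGeneration state_dict
SAMPLE_KEYS = [
    "model.encoder.layers.0.self_attn.q_proj.weight",
    "model.encoder.layers.0.fc2.bias",
    "model.encoder.layer_norm.weight",
    "model.decoder.layers.3.encoder_attn.k_proj.weight",
    "model.decoder.layers.3.final_layer_norm.bias",
    "model.decoder.embed_positions.weight",
]

for hf_key in SAMPLE_KEYS:
    print(f"{hf_key} -> {hf_key_to_openai(hf_key)}")
```

Note that `proj_out.weight` has no counterpart in the mapping: OpenAI Whisper reuses the token embedding as its output projection, which is why the script deletes that entry instead of renaming it.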

Finally, the Hugging Face model can be converted to the OpenAI format in either of the following two ways.

2.1 Using the command line

python convert_hf_to_openai.py \
    --checkpoint ./whisper-large-v3-Thai \
    --whisper_dump_path large-v3.th.pt

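The first argument, --checkpoint, is the Hugging Face model to convert; here it is the local directory downloaded from Hugging Face, and because the value is passed to from_pretrained, a Hugging Face Hub model ID also works. The second argument, --whisper_dump_path, is the file name of the converted OpenAI-format model. The name is up to you; large-v3.th.pt is just an example.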
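The file written by `torch.save` is a plain dictionary with two entries, `dims` and `model_state_dict`, which is the layout `openai-whisper` reads when it builds a model from a checkpoint. As a quick sanity check before using the output, a helper along the lines below could be run on the loaded dictionary. `check_whisper_checkpoint` is a hypothetical helper, and the expected field names are assumed to match the `ModelDimensions` dataclass of the installed `openai-whisper` version.

```python
# Field names of openai-whisper's ModelDimensions dataclass (assumed; see whisper/model.py)
EXPECTED_DIMS = {
    "n_mels", "n_audio_ctx", "n_audio_state", "n_audio_head", "n_audio_layer",
    "n_vocab", "n_text_ctx", "n_text_state", "n_text_head", "n_text_layer",
}


def check_whisper_checkpoint(ckpt: dict) -> list:
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    # The converted file is expected to contain exactly these two top-level entries
    for top_key in ("dims", "model_state_dict"):
        if top_key not in ckpt:
            problems.append(f"missing top-level key: {top_key}")

    dims = ckpt.get("dims", {})
    missing = EXPECTED_DIMS - set(dims)
    extra = set(dims) - EXPECTED_DIMS
    if missing:
        problems.append(f"missing dims: {sorted(missing)}")
    if extra:
        # Unknown fields would not fit the ModelDimensions dataclass
        problems.append(f"unexpected dims: {sorted(extra)}")

    # Names that still look like Hugging Face keys were not renamed or removed
    state = ckpt.get("model_state_dict", {})
    leftovers = [k for k in state if k.startswith("model.") or k == "proj_out.weight"]
    if leftovers:
        problems.append(f"unconverted keys: {leftovers}")
    return problems
```

In practice you would pass it the result of `torch.load("large-v3.th.pt", map_location="cpu")`. It only checks the dictionary layout, not tensor shapes.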

2.2 Using Python code

# Run from the directory that contains convert_hf_to_openai.py (the script saved above)
from convert_hf_to_openai import convert_tfms_to_openai_whisper

convert_tfms_to_openai_whisper("./whisper-large-v3-Thai", "large-v3.th.pt")

The first argument is the Hugging Face model to convert, and the second is the file name of the converted OpenAI-format model.
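Once converted, the file can be used by existing code written for the `openai-whisper` package, because `whisper.load_model` accepts a path to a local checkpoint as well as an official model name. Below is a minimal sketch, assuming a recent `openai-whisper` release that already supports large-v3 (128 mel bins); the helper name `transcribe_with_converted_model` is illustrative, not part of any library.

```python
def transcribe_with_converted_model(checkpoint_path: str, audio_path: str, language: str = "th") -> str:
    """Transcribe one audio file with a converted OpenAI-format checkpoint (hypothetical helper)."""
    # Imported inside the function so this module can be imported without openai-whisper
    import whisper

    # load_model accepts either an official model name or a path to a local .pt file
    model = whisper.load_model(checkpoint_path)
    # Extra keyword arguments such as language are forwarded to the decoding options
    result = model.transcribe(audio_path, language=language)
    return result["text"]
```

For example, `transcribe_with_converted_model("large-v3.th.pt", "audio.wav")` would return the recognized text for a hypothetical file `audio.wav`. Passing `language="th"` explicitly skips Whisper's automatic language detection.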

For more content, visit my personal tech blog.
A video version of this article is available on my Bilibili channel.
