Could you please explain how the original Whisper tiny model was exported to Sentis format?

Thank you.

I know I can use Optimum to export the encoder and decoder, and I confirmed that the encoder and decoder can be replaced with those ONNX models.
I’d like to know how the LogMelSpectro model was exported.

The trick is getting the STFT to export, because ONNX export can’t handle it.
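Why a library like conv-stft helps: an STFT can be written as framing plus a multiplication by a fixed bank of windowed cosine and sine kernels, which is exactly what a strided Conv1d computes, and Conv is an ordinary ONNX operator. The NumPy sketch below is only an illustration of that idea under stated assumptions (periodic Hann window, n_fft=400, hop=160, no center padding); the function names are hypothetical and it is not conv-stft’s actual code.

```python
import numpy as np


def fourier_kernel_bank(n_fft: int) -> np.ndarray:
    """Fixed (2 * n_bins, n_fft) bank: windowed cosine rows, then windowed -sine rows."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]               # one row per one-sided frequency bin
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)   # periodic Hann, like torch.hann_window
    angle = 2 * np.pi * k * n / n_fft
    return np.concatenate([window * np.cos(angle), -window * np.sin(angle)], axis=0)


def stft_via_kernels(audio: np.ndarray, n_fft: int = 400, hop: int = 160) -> np.ndarray:
    """STFT as a strided 'valid' convolution with the fixed kernel bank (no center padding)."""
    kernels = fourier_kernel_bank(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    # Column i holds the i-th hop-strided frame; kernels @ frames equals the output of a
    # Conv1d with stride=hop whose weights are these kernels.
    frames = np.stack([audio[i * hop : i * hop + n_fft] for i in range(n_frames)], axis=1)
    out = kernels @ frames
    n_bins = n_fft // 2 + 1
    return out[:n_bins] + 1j * out[n_bins:]              # complex, shape (n_bins, n_frames)


rng = np.random.default_rng(0)
spec = stft_via_kernels(rng.standard_normal(16000))
print(spec.shape)  # (201, 98)
```

Because the kernels are constants, an exporter only has to emit a convolution (or matmul) with baked-in weights, rather than a dedicated STFT operator.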

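import torch
import whisper
from whisper.audio import mel_filters
from conv_stft import STFT

class LogMelNet(torch.nn.Module):
    def __init__(self):
        super(LogMelNet, self).__init__()
        # Whisper's mel filter bank, shape (n_mels, n_fft // 2 + 1)
        self.filters = mel_filters('cpu', whisper.audio.N_MELS)
        # Exportable STFT from conv-stft, with Whisper's n_fft=400 and hop_length=160
        self.logmelmodel = STFT(win_len=400, win_hop=160, fft_len=400)

    def forward(self, audio):
        magnitudes = self.logmelmodel(audio)
        mel_spec = self.filters @ magnitudes

        # Same normalization as whisper.log_mel_spectrogram
        log_spec = torch.clamp(mel_spec, min=1e-10).log10()
        log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
        log_spec = (log_spec + 4.0) / 4.0
        return log_spec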
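The post-STFT steps in forward() are plain arithmetic, so they can be checked without PyTorch. A minimal NumPy sketch of the same clamp, log10, 8-decade floor and rescale follows; the uniform filter bank in the usage is a placeholder assumption, not Whisper’s actual mel filters.

```python
import numpy as np


def log_mel_from_power(power: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Apply a mel filter bank to a power spectrogram and normalize like LogMelNet.forward."""
    mel_spec = filters @ power                             # (n_mels, n_bins) @ (n_bins, n_frames)
    log_spec = np.log10(np.maximum(mel_spec, 1e-10))       # clamp first so log10 never sees 0
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)  # keep at most 8 decades (80 dB) of range
    return (log_spec + 4.0) / 4.0                          # shift and scale


filters = np.ones((80, 201)) / 201                         # placeholder bank: averages all bins
print(log_mel_from_power(np.zeros((201, 3000)), filters)[0, 0])  # -1.5 for silence
```

Since the floor sits 8 units below the maximum in log10 space, every output value lies within 2.0 of the output maximum after the final division by 4.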

Thank you. I found your previous post as well, but I ran into a problem:

ValueError: not enough values to unpack (expected 2, got 1)

I followed the steps in your previous post.

Maybe it’s a shape error?

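import torch
import whisper

audio = torch.randn(1, 16000*30)  # one 30-second clip at 16 kHz, shape (batch, samples)
logmel = whisper.log_mel_spectrogram(audio)  # reference output from Whisper itself
logmelmodel = LogMelNet()
torch.onnx.export(logmelmodel, (audio,), "LogMelSepctro.onnx", export_params=True, do_constant_folding=True,
                  input_names=['audio'], output_names=['log_mel'])  #, dynamic_axes={'audio_input' : {1 : 'n_mels', 2 : 'n_ctx'}})
logmel2 = logmelmodel(audio)  # exportable module's output, to compare against logmel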

This works for me.
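The shape hypothesis can be illustrated: Python raises exactly this message when two names are unpacked from a one-element tuple, for example a layer doing batch, n_samples = x.shape on 1-D audio with no batch dimension. The helper below is hypothetical (it is not conv-stft’s code), and the thread does not establish where the error actually came from.

```python
def split_batch_and_samples(shape: tuple) -> tuple:
    """Hypothetical helper that expects a 2-D (batch, samples) input shape."""
    batch, n_samples = shape  # raises ValueError when shape has only one dimension
    return batch, n_samples


print(split_batch_and_samples((1, 16000 * 30)))  # (1, 480000)

try:
    split_batch_and_samples((16000 * 30,))       # 1-D audio, batch dimension missing
except ValueError as err:
    print(err)                                   # not enough values to unpack (expected 2, got 1)
```

Printing the input’s shape right before the failing call is a quick way to confirm or rule this out.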

What versions of PyTorch and conv-stft are you using?
With PyTorch 2.2.2 and conv-stft 0.1.2 (I don’t know why PyPI shows 0.2.0 but installs 0.1.2), I get an error that torch.rfft is not defined. If I switch to torch.fft.rfft, it still reports an error.
With PyTorch 2.2.2 and the GitHub source version, I get the error I mentioned earlier.
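A likely reason the direct swap to torch.fft.rfft still fails: torch.rfft was removed in PyTorch 1.8, and the old torch.rfft(x, 1) returned a real tensor with a trailing dimension of size 2 (real, imaginary), while torch.fft.rfft returns a complex tensor. Code written for the old layout, such as indexing [..., 0] and [..., 1], breaks on the new one. In PyTorch, torch.view_as_real(torch.fft.rfft(x)) should give back the old layout. The NumPy sketch below only illustrates the two layouts; the function name is hypothetical and it is not a patch for conv-stft.

```python
import numpy as np


def rfft_old_layout(x: np.ndarray) -> np.ndarray:
    """One-sided FFT with real and imaginary parts stacked on a trailing axis of size 2.

    Mimics the layout of the removed torch.rfft(x, 1); in current PyTorch the
    counterpart is torch.view_as_real(torch.fft.rfft(x)).
    """
    spec = np.fft.rfft(x, axis=-1)                     # complex, shape (..., n // 2 + 1)
    return np.stack([spec.real, spec.imag], axis=-1)   # real, shape (..., n // 2 + 1, 2)


x = np.random.default_rng(2).standard_normal((3, 400))
print(np.fft.rfft(x).shape, rfft_old_layout(x).shape)  # (3, 201) (3, 201, 2)
```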

Here’s a Colab notebook that reproduces the problem.

Python 3.9, latest conv-stft.

I forgot to grant access to the link. Could you please have a look at it? I followed the same instructions you mentioned.

Could you please have a look at the notebook that reproduces the problem?
Could you also provide the original LogMelSpectro.onnx?
The one in the repo is in Sentis format. I’m trying to build a MindSpore Lite backend for Sentis, and the MindSpore Lite converter supports ONNX models.

Thank you.

I finally fixed it with these lines, using the latest version of conv_stft:

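        # With return_type='magphase', transform() returns (magnitude, phase); keep the magnitude
        stft = self.logmelmodel.transform(audio, return_type='magphase')[0]
        # Drop the last frame and square to get power, as whisper.log_mel_spectrogram does
        magnitudes = stft[..., :-1].abs() ** 2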
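The [..., :-1] slice mirrors whisper.log_mel_spectrogram, which calls torch.stft with its default center=True and then keeps stft[..., :-1].abs() ** 2. With center padding of n_fft // 2 on each side, 30 s of 16 kHz audio at hop 160 yields 3001 frames, and dropping the last one leaves the 3000 frames Whisper’s encoder expects. The sketch below works through that arithmetic in NumPy; the helper names are hypothetical, and whether conv_stft pads the input the same way depends on its settings, which is an assumption worth checking.

```python
import numpy as np


def centered_frame_count(n_samples: int, hop: int, n_fft: int = 400) -> int:
    """Frames from an STFT that pads n_fft // 2 samples on each side (center=True)."""
    padded = n_samples + 2 * (n_fft // 2)
    return 1 + (padded - n_fft) // hop


def whisper_style_power(stft: np.ndarray) -> np.ndarray:
    """Drop the last frame and square the magnitude, as in the fix above."""
    return np.abs(stft[..., :-1]) ** 2


print(centered_frame_count(16000 * 30, 160))                # 3001
print(whisper_style_power(np.ones((201, 3001))).shape)      # (201, 3000)
```

When the input is already a magnitude (as with return_type='magphase'), the .abs() is a harmless no-op; it matters only for complex STFT output.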