I have trouble understanding what you did to the model structure of the original roneneldan/TinyStories-33M (and how and why you did it).
In the original TinyStories, the ONNX model seems to have the following structure:
Three INPUTS (input_ids, attention_mask, position_ids) and only one OUTPUT (logits). This is exactly the structure of GPT-Neo, which the model is based on.
The Sentis TinyStories model, on the other hand, has only ONE input (the input indices) and NINE outputs (values and keys).
To check the difference, I used the Python Optimum exporter to produce an ONNX model from the original roneneldan/TinyStories-33M PyTorch model with:
optimum-cli export onnx --opset 15 --model model --task text-generation model_tinyorig_ONNX
The reason I did this: I trained a new GPT-Neo model (which has the same structure as roneneldan/TinyStories-33M), exported it to ONNX with the same CLI command as above, and tried to use it in place of the Sentis model in the TinyStories.cs sample code. I get the same error as with the original roneneldan model, saying that the model's input dimensions are not correct, which is to be expected, since the models' input and output structures differ.
The error message is:
AssertionException: ModelOutputs.ValueError: inputs length does not equal model input count 1, 3
Assertion failure. Value was False Expected: True
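My guess (an assumption on my part, not confirmed) is that the extra key/value outputs come from a KV-cache export. Optimum can produce such a graph with the "-with-past" task variant, something like:

```shell
# Assumption: exporting with past key/values is what produces the extra
# key/value ("present") outputs seen in the Sentis model.
# model_tinywithpast_ONNX is a placeholder output directory.
optimum-cli export onnx --opset 15 --model model \
    --task text-generation-with-past model_tinywithpast_ONNX
```

Even then, a with-past export normally still takes more than one input (it also consumes the past key/value tensors), so this may not be the whole story, which is why I am asking.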
Is there a guide or description of how exactly you exported the model to ONNX, or any further explanation of the process?
Thank you in advance!