[Generative AI] VoiceGPT

Now available offline!

VoiceGPT is a highly realistic text-to-voice AI solution for fantasy voices. This Unity Editor extension can create voices from text, clone voices, and trim, combine and equalize audio files. Choose from 60+ voices. There are no subscriptions or recurring payments. This documentation will help you get started with the extension and get the most out of the asset.

Link to the asset - VoiceGPT

EXAMPLES
- Voices

In the code of destiny, debug doubts and execute the program of unwavering determination. :arrow_forward: PLAY!

Beyond the screen, discover the uncharted lands of perseverance and claim the trophy of resilience. :arrow_forward: PLAY!

Press forward, no matter the level. The adventure of a lifetime awaits in the next frame. :arrow_forward: PLAY!

Seriously!? Can you stop teasing me? It isn’t funny anymore. :arrow_forward: PLAY!

- Accents

Now, retired, I sit in my small dacha, sipping hot tea, memories of comrades and distant battles warming my heart. My babushka’s borscht, a taste of home, brings comfort in the quiet days. Life was tough. :arrow_forward: PLAY!

Ja! It’s all so different now… Used to bike through tulip fields, and now, dodging zombies! I hate this! :arrow_forward: PLAY!

- Non-Word Sounds

Compilation by different characters of - Slow laugh, Ouch, Uh Uh Ah, Uh huh, Uff, Aha, Nuh uh, Mmm, Oh - :arrow_forward: PLAY!

- Languages

千里之行,始于足下 :arrow_forward: PLAY!
Die beste Zeit für einen Neuanfang ist jetzt. :arrow_forward: PLAY!
सपने वो नहीं जो हम सोते वक्त देखते हैं, सपने वो हैं जो हमें सोने नहीं देते। :arrow_forward: PLAY!
Liberté, égalité, fraternité. :arrow_forward: PLAY!
삶이 있는 한 희망은 있다 :arrow_forward: PLAY!
Onde há vontade, há um caminho. :arrow_forward: PLAY!
La vita è breve, l’arte è lunga. :arrow_forward: PLAY!
A los Tontos No les Dura el Dinero :arrow_forward: PLAY!
Doe normaal, dan doe je al gek genoeg :arrow_forward: PLAY!
Az élet szép :arrow_forward: PLAY!
Güzel şeylere inan :arrow_forward: PLAY!
Fortuna kołem się toczy. :arrow_forward: PLAY!
العقل زينة. :arrow_forward: PLAY!
Co tě nezabije, to tě posílí. :arrow_forward: PLAY!
Береги платье снову, а честь смолоду. :arrow_forward: PLAY!

Note: All languages are available in all of the 60+ voices.
You can find more examples in the documentation.

ABOUT
VoiceGPT is a LAM (Large Audio Model): a set of networks and libraries, made for Unity, capable of lifelike voice generation from text using AI and deep learning. It works in real time in both Edit Mode and Play Mode inside the Unity Editor, or on any mobile device. The asset has a one-click, beginner-friendly GUI and does not require any coding to use.

QUOTA
You get 500,000 characters per month of voice-over and narration takes with VoiceGPT. 500,000 characters translates to about 150 pages of 12-point text in Calibri. This quota is issued on the 1st of every month. Process up to 8x more characters.

LINKS
Documentation | Website

Please note: The voices you hear in this description and the videos (Trailer and Getting Started) are AI generated.
Please check out the forum page for the latest developments and discussion related to this asset. We are researching and adding more functionality continuously. Your support is appreciated.

FEATURES
Ultra Fast Voice Cloning: Clone any voice from a voice clip just 3-6 seconds long.

Text to Voice Converter: Simply enter the text to be voiced and click Generate. Get game-ready voices with any voice of your choice plus 60 more options.

Language and Accent Support: The VoiceGPT_X model supports different languages such as English, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Dutch, Hungarian, Turkish, Polish, Arabic, Czech, and Russian.

Voice Modulation controls: These controls allow users to adjust parameters such as speech clarity and variability in voices, as well as add emotions through text prompting. By manipulating these parameters, users can customize the generated speech to better suit their needs and preferences.

:wavy_dash: Preview waveform: Play sound clips right inside the editor without entering Play Mode. Scrub the playhead to play any part of the clip. Timestamps and a simple graphic of the waveform are shown for better clarity inside the editor.

:scissors: Trim audio: A user-friendly GUI in the Editor to trim the ends of an audio clip when part of the clip is empty or not required.

:heavy_plus_sign: Combine clips: Multiple audio clips can be combined into one using an intuitive feature in the editor. Simply select the clips, rearrange their order and merge them into one.

:gear: Equalize tracks: Mastering audio clips involves equalization, which can easily be done within the editor itself. Simply select the clip and adjust the gain, pitch and frequency band sliders. A 6-band equalizer is offered in the editor.

Editor Script: The Editor Script displays all the options neatly in one panel. The editor has an in-built preview audio player. Simple design for trimming, combining and equalizing or mastering audio tracks.

EDITOR
Keeping it all in the editor: Keeping all assets in one workspace inside the Editor and switching between fewer services can have several benefits, such as:

  • Improved Efficiency: When all assets are located in one workspace, it becomes easier to access and manage them. Users do not have to spend time switching between different services or applications, which can be time-consuming and lead to a loss of productivity.

  • Streamlined Workflow: Having all assets in one workspace can help create a more streamlined workflow. This is because users can easily move between different assets, such as code files, images, and documents, without having to navigate between different services. This can help to speed up the development process and make it more efficient.

  • Reduced Complexity: Using fewer services can help to reduce the complexity of the development process.

In the pack, you will find a demo scene and an editor window which help you to access the TTS models. There are other useful audio settings like trimming, combining and mastering the audio track that can be accessed through the VoiceGPT Editor Window.

DEPENDENCIES
This tool requires the Editor Coroutines package from the package manager and an active internet connection.

LIMITATIONS
Since this tool is still under development, there are a few limitations:

  • Process up to 500 characters at a time. This limit will increase as we scale up.
  • There are 60+ voices to choose from. With Voice Cloning, you can add however many you’d like.
  • Audio generation time is ~5 seconds per clip. This may increase with an increased number of tokens and user base.
  • Character count per month is limited to 500,000.
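
The per-request and monthly limits above suggest splitting longer scripts before generating them. Below is a minimal sketch in Python (used only for illustration, outside Unity), assuming just the two figures stated in the list: 500 characters per request and 500,000 characters per month. The names `split_for_voicegpt` and `quota_used` are hypothetical helpers and are not part of the asset, and whether spaces count toward the quota is also an assumption.

```python
import re

MAX_CHARS_PER_REQUEST = 500   # per-request limit stated in the list above
MONTHLY_QUOTA = 500_000       # monthly character limit stated in the list above


def split_for_voicegpt(text, limit=MAX_CHARS_PER_REQUEST):
    """Split text into chunks of at most `limit` characters.

    Breaks at sentence boundaries where possible; a single sentence longer
    than `limit` is hard-cut. Hypothetical helper, not part of the asset.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit into one request on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def quota_used(chunks, quota=MONTHLY_QUOTA):
    """Return the fraction of the monthly quota these chunks would consume."""
    return sum(len(chunk) for chunk in chunks) / quota
```

Each returned chunk can then be pasted into the editor window as one generation, and `quota_used` gives a rough idea of how much of the month's allowance a script will take.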

oulala I want that !!


This is what I need! I checked samples of all the languages and they sound fantastic, and the sound is generated in real time. I was looking for a Unity plugin that would let me generate a unique voice for each NPC in each language, and this is it! The NPC would read the generated text and convert it into speech. Unfortunately, there is one serious drawback that disqualifies your program: it requires an Internet connection, and there are limits on the number of words. I would pay even 3x more if I could use it offline without limits, even if the package required a huge amount of disk space and a powerful processor. I am disappointed and at the same time very impressed by such multilingual technology!

Yet another server based product that cannot be unyoked from the cloud. You can do the same with raspy or any number of huggingface repos with ONNX for import to Unity. You would make real money with a portable solution.


Hello ippdev,

Thank you for this feedback.

Luckily, one of our server-based assets now ships with an offline version. It occupies about 2GB and generates images on your own hardware without being connected to the internet. We are slowly starting to convert all models to an offline + online hybrid version (offline for users who have a high-compute graphics card and online for those who want to deploy the model to mobile phones).

Hello Pioxon,

Just an hour ago we published an offline version for Ai.Fy which runs inside the Editor on your own hardware!
You can use the offline version for Ai.Fy without limits.

We are also working on an offline version for VoiceGPT. We will update VoiceGPT with the offline version as soon as we are able to develop and ship it to Unity!

hi
asset is fine
there is an error, it breaks DeepVoice:

The name 'WaveUtils' does not exist in the current context
The name 'FoldOuts' does not exist in the current context

unsure how to proceed

Hello the_unity_saga,

We will update the asset so you can use DeepVoice and VoiceGPT together in the same project file.

Edit: We have sent out an update that should resolve the conflicts between the assets.


I opened a support ticket two days ago with this exact same issue, and the solution. Plus other issues, some with solutions and others without. I guess that ticket hasn’t been looked at yet.

Here is an excerpt of the ticket I created, the solution to this issue is the first bullet point, but you’ll have other issues afterwards, which have their solution in this ticket too.

  1. When installing the package, if one already has DeepVoice installed in the same project, errors show up in the console about the “FoldOuts” namespace being unknown. Adding “using AiKodexVoiceGPT” to DeepVoiceEditor.cs fixes the errors.

  2. When opening the VoiceGPT window for the first time, an error appears in the console saying the folder “Assets/VoiceGPT/Voices/” does not exist. It isn’t clear to me whether VoiceGPT is DeepVoice v3 or a different asset entirely, but right now it installs itself into DeepVoice and merges with it. Creating that “Voices” folder fixes the error.

  3. When testing one of the preview voices in VoiceGPT (let’s say the very first one), I get a cascade of errors in the console. The stack trace is a bit long, but here is the relevant info:

NullReferenceException
System.Reflection.RuntimeMethodInfo.Invoke (System.Object obj, System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) (at <1c8569827291471e9db0dcd976e97952>:0)
Rethrow as TargetInvocationException: Exception has been thrown by the target of an invocation.
System.Reflection.RuntimeMethodInfo.Invoke (System.Object obj, System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) (at <1c8569827291471e9db0dcd976e97952>:0)
System.Reflection.MethodBase.Invoke (System.Object obj, System.Object[] parameters) (at <1c8569827291471e9db0dcd976e97952>:0)
AiKodexVoiceGPT.VoiceGPTEditor.PlayClip (UnityEngine.AudioClip clip, System.Int32 startSample, System.Boolean loop) (at Assets/DeepVoice/Editor/Scripts/VoiceGPTEditor.cs:1342)
AiKodexVoiceGPT.VoiceGPTEditor.OnGUI () (at Assets/DeepVoice/Editor/Scripts/VoiceGPTEditor.cs:373)

This error is fixed by creating a folder named “Preview Voices” inside the manually created “Voices” folder (see above) and moving all the preview voices wav files inside it.

  4. When moving my mouse pointer over the VoiceGPT window, I get a bunch of “null texture passed to GUI.DrawTexture” warnings in the console. Here is the stack trace:

null texture passed to GUI.DrawTexture
UnityEngine.GUI:DrawTexture (UnityEngine.Rect,UnityEngine.Texture,UnityEngine.ScaleMode,bool,single)
AiKodexVoiceGPT.VoiceGPTEditor:OnGUI () (at Assets/DeepVoice/Editor/Scripts/VoiceGPTEditor.cs:299)
UnityEngine.GUIUtility:ProcessEvent (int,intptr,bool&)

The thing that concerns me the most is that the output voices, while good, are only 22kHz, while DeepVoice outputs 44kHz, and the difference is noticeable. Voice cloning works well, but I didn’t test it thoroughly. If it works as well as it seems, one could reuse 15-second long voice files generated with DeepVoice, and use them as a base to generate voice lines with VoiceGPT. But with the current difference in sound quality, you won’t want to do that yet. The player would immediately notice the difference between sounds created with DeepVoice and sounds created with VoiceGPT, even if they’re the same voice.
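
For context on the 22kHz versus 44kHz point: a sampled signal can only represent frequencies up to half its sample rate (the Nyquist limit), which is one reason a lower-rate clip can sound duller. A small Python sketch, assuming the two outputs are 22,050 Hz and 44,100 Hz (the usual exact rates behind "22kHz" and "44kHz"; the exact figures for either asset are not stated here):

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at `sample_rate_hz` can represent."""
    return sample_rate_hz / 2


# Assumed sample rates; the exact figures are not given in this thread.
for name, rate in (("VoiceGPT", 22_050), ("DeepVoice", 44_100)):
    print(f"{name}: {rate} Hz sample rate, content up to {nyquist_hz(rate):.0f} Hz")
```

Under these assumed rates, the lower-rate output cannot carry anything above roughly 11 kHz, while the higher-rate one reaches roughly 22 kHz.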

Hope this helps.

Can anybody say something about current laws? Is it allowed to use sound files generated by AI in a final build published on the Google/Apple stores?

I have read a few articles, and so far it seems to be fine as long as the content is not offensive or against their terms and conditions.

Hello NightKin,

Thank you so much for communicating these issues in such a detailed manner. We are grateful for the time you took out to write this forum reply and also give out a few solutions.

Points 1, 2, 3 and 4 are caused by conflicting GUIDs, and we have released an update to fix this. It should be live in a day or two.

We are working hard to develop an offline system, since it is very clear to us that developers want to be able to run inference locally, which is what VoiceGPT aims to accomplish in a few weeks.

Thank you for the quick answer. I just wanted to point out that although you also answered my support ticket in private, you referred here to “point 6”, which was a private remark to you only.

Hi cyrus234,

According to Google, beginning early this year, applications using AI-generated content will need to include a feature that lets users flag or report offensive material in order to remain on Google’s Play Store. There are exceptions to this policy, notably for applications that only use AI to summarize existing content, such as books, and for productivity apps that integrate AI as a feature; in such cases, the new policy does not apply. For character voicing, which is mostly our use case, there isn’t much to be concerned about unless the voices are offensive or politically charged in a way that attracts hate and controversy.

As long as the AI-generated voices are not used to promote hate or discrimination, they may be included in the application, game, visual novel, etc. that you are building. The Apple App Store and Steam have similar policies, as they naturally do not want to get roped into anything political that was stirred up using their service.

Avoid the use of AI-generated content to promote hate or discrimination within your application and you should not have any issues.


Ah, thank you for pointing this out, we have edited our reply.

What is the difference between DeepVoice and VoiceGPT? What does VoiceGPT do that DeepVoice doesn’t?

Hi LoveAndDreams,

Without a lot of fluff, here’s the direct answer:
VoiceGPT has an edge on:

  • Voice Cloning
  • Massive Quota
  • Faster inference speeds
  • A great possibility of an offline version

DeepVoice has an edge on:

  • 44 kHz generations (high audio quality)
  • Undoubtedly studio-like vocals with Mono and Multi models
  • More languages
  • More AI models to choose from

Is this asset meant for use at runtime? It seems to be tied to the AssetDatabase, and I can’t seem to decouple it. Is it just for editor usage?
It can’t be built into a player, only used in-editor right now.

Hi TaylorCaudle,

In the CanvasController script, you have the option to decouple the AssetDatabase Editor API. The script currently contains only a few calls that specify paths and instruct the Editor to Refresh to display the asset. These aren’t essential in runtime builds, so you can easily remove them.

However, we advise against using the cloud-based service for runtime builds. Our endpoint for the asset is not static, and changes to it could affect your application’s uptime. Keeping your app compatible with the asset’s API would mean updating it every time our endpoint changes, which requires constant work and monitoring on your side.

[Announcement]

We are glad to announce that VoiceGPT will be available offline in the next update!
Please select the local model and you will be able to generate voices locally using your own hardware.
Please note that this is just the initial release so improvements will follow.


All good, yeah. I figured it out and swapped that out with some file redirecting. I just needed some TTS for an assistant tool I use.