[Generative AI] DeepVoice - Text To Voice

DeepVoice is an ultra-realistic Text To Voice AI solution. This tool can create voices from text, trim, combine and equalize audio files. Choose from 80+ voices.

No sign-up, no API keys, no recurring payments, no subscription fees and no additional costs: just one-click, easy-to-use inference on our voice model.
ABOUT
DeepVoice is an LAM (Large Audio Model): a set of networks and libraries, made for Unity, that generate life-like voices from text using AI and deep learning.

INVOICE NUMBER
You can find the Invoice Number here: https://assetstore.unity.com/orders
Enter this invoice number to gain access to the Voice Generator. Please contact us at info@aikodex.com if you need any assistance.

QUOTA
30,000 characters per month of voice-over and narration takes with DeepVoice, issued in two halves: 15,000 characters on the 1st and 15,000 characters on the 15th of every month. 15,000 characters translates to about 5 pages of 12-point text in Calibri.
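
The schedule above can be sketched in a few lines of Python. This is only an illustration, not part of the asset: the function names are hypothetical, and the figure of 3,000 characters per page is simply derived from the "15,000 characters is about 5 pages" estimate.

```python
from datetime import date

CHARS_PER_PERIOD = 15_000  # issued on the 1st and on the 15th (from the quota text)
CHARS_PER_PAGE = 3_000     # derived: 15,000 characters is about 5 pages


def next_quota_reset(today: date) -> date:
    """Return the next date, strictly after `today`, on which quota is issued."""
    if today.day < 15:
        return today.replace(day=15)
    # On or after the 15th: the next issue date is the 1st of the following month.
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


def approx_pages(characters: int) -> float:
    """Rough page estimate for 12-point Calibri text."""
    return characters / CHARS_PER_PAGE
```

For example, `next_quota_reset(date(2023, 7, 15))` gives 1 August 2023, the day after the period that runs from the 15th to the end of July.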

LINKS
Works in real time in both Edit Mode and Play Mode inside the Unity Editor. This asset has a one-click, beginner-friendly GUI and does not require any coding to use.

Website and Support | Documentation

Pipelines Supported: Standard, HDRP, URP and SRP. (All)

FEATURES
Text to Voice Converter: The main function of the asset is to provide you with production-ready voices. Simply enter the text to be voiced and click on generate.

Examples for prompting:

Narration / Dialogues / Voice over / Dubbing
“In the darkest of nights, hope shines like a single star, reminding us that heroes are born from adversity.”

“Had to be me. Someone else might have gotten it wrong.”

“I think it was called Ueno Station, but I’m not sure. I’ve never been to Tokyo before, so everything is unfamiliar to me.”

Pauses
“So I think - I should take this route if I want to reach on time”

Or
“But well… I’m not entirely convinced”

Emotions
Note: The dialogue tag (“he said confused”, “he shouted angrily”) has been cut out using the audio trimmer within the asset.
“I have had enough!” he shouted angrily.

“I wish you were right, I truly do, but you’re not” he said, assertively.

Famous Personalities
“I don’t hire a lot of number-crunchers, and I don’t trust fancy marketing surveys. I do my own surveys and draw my own conclusions.”

“Nothing can stand in the way of the power of millions of voices calling for change.”

More examples are given in the description of the asset page.

Language and Accent Support: The DeepVoice_Multi model supports different languages such as English, Japanese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian.

Voice Modulation controls: These controls allow users to adjust parameters such as speech clarity and variability in voices, as well as add emotions through text prompting. By manipulating these parameters, users can customize the generated speech to better suit their needs and preferences.

:wavy_dash: Preview waveform: Play sound clips right inside the editor without going into Play Mode. Scrub the play head to play any part of the clip. Timestamps and a simple graphic of the waveform are shown for better clarity inside the editor.

:scissors: Trim audio: A user-friendly GUI in the Editor to trim the ends of an audio clip in case a part of the clip is not required or is empty.

:heavy_plus_sign: Combine clips: Multiple audio clips can be combined into one using an intuitive, user-friendly feature in the editor. Simply select clips, rearrange their order and merge them into one.

:gear: Equalize tracks: Mastering audio clips involves equalization, which can easily be done within the editor itself. Simply select the clip, then adjust the gain, pitch and frequency band sliders. The editor offers a 6-band equalizer.

Editor Script: The Editor Script displays all the options neatly in one panel. The editor has a built-in preview audio player and a simple design for trimming, combining, and equalizing or mastering audio tracks.

EDITOR
Keeping it all in the editor: Keeping all assets in one workspace inside the Editor and switching between fewer services can have several benefits, such as:

  • Improved Efficiency: When all assets are located in one workspace, it becomes easier to access and manage them. Users do not have to spend time switching between different services or applications, which can be time-consuming and lead to a loss of productivity.

  • Streamlined Workflow: Having all assets in one workspace can help create a more streamlined workflow. This is because users can easily move between different assets, such as code files, images, and documents, without having to navigate between different services. This can help to speed up the development process and make it more efficient.

  • Reduced Complexity: Using fewer services can help to reduce the complexity of the development process.

In the pack, you will find a demo scene and an editor window that help you access the TTS models. Other useful audio tools, such as trimming, combining and mastering the audio track, can be accessed through the DeepVoice Editor Window.

DEPENDENCIES
This tool requires the Editor Coroutines package from the package manager and an active internet connection.

LIMITATIONS
Since this tool is still under development, there are a few limitations:

  • For now, the text that can be processed per request is limited to 200 characters (roughly 30 to 50 words, 5 to 6 sentences, or one paragraph).
  • There are around 80 voices to choose from, out of which Mono/Multi have 15. We are working on adding more.
  • Audio generation time is ~8-15 seconds per clip. This may increase with an increased number of tokens and user base.
  • Character count per fortnight is limited to 15,000. Per month, this translates to a limit of 30,000 characters.
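
Given the 200-character limit in the list above, longer scripts have to be sent in pieces. The following Python sketch shows one way to do that; it is not part of the asset, and `split_for_tts` is a hypothetical helper name. It packs whole sentences into chunks so that each request stays within the limit, and falls back to word boundaries for a single over-long sentence.

```python
import re

MAX_CHARS = 200  # per-request limit stated in the limitations list


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks no longer than `limit`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut on word boundaries.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            cut = sentence.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit  # no space available: hard cut
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would be a separate generation, so every chunk still counts against the character quota, and tone may vary slightly between the resulting clips.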

Please check out the documentation for an in-depth explanation of how the asset works. If you have any questions, suggestions or inquiries about private servers, or would like to share your thoughts, please send us an email at info@aikodex.com.

What about copyright? Can this product be used commercially? For example, Steam may remove the game due to AI content!

Hi guys,

I’m having issues. I bought this asset today and it seemed to work for the first hour. I have only used 3882 characters so far, but I have had some fun.

Now I settled down to do some work with the asset and it refuses to generate voices anymore. My invoice number has been accepted and I hit save, but when I select any voice and hit generate, it starts and then stops immediately with no new voice being generated. I’m using different voices, so I’m not getting filename issues.

I get this error:

“There was an error in generating the voice. Please check your invoice/order number and try again or check the documentation for more information.”

There’s nothing in the docs regarding this particular error. I’m using short sentences; in fact, I have 125 characters left for the message which fails (they all fail atm, actually)…

Any help would be gratefully received.

thanks,

P.

Thank you for reaching out to us. We received your email and have offered solutions that should hopefully resolve these issues. We have also updated the back end to process requests with special formatting.

For developers browsing this forum thread:
For a new line please use \n instead of the Enter key.
To use quotes, please use \" instead of ".
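
As a rough illustration of the two rules above, here is a small Python sketch that prepares text before it is pasted into the generator. The helper name is hypothetical, and the quote rule is read here as "backslash-escape each double quote", which is an assumption about what the backend expects.

```python
def escape_for_deepvoice(text: str) -> str:
    """Hypothetical helper: apply the two formatting rules to raw text."""
    # Normalise Windows and old-Mac line endings to a single newline first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rule 1: write each line break as the two characters backslash and n.
    text = text.replace("\n", "\\n")
    # Rule 2 (assumption): write each double quote as backslash plus quote.
    text = text.replace('"', '\\"')
    return text
```

Text that has already been escaped should not be passed through the helper a second time, since the quotes would be escaped twice.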

We will be bringing out an update which allows the user to use special formatting without the above exceptions.

Thank you.

Yes, this product can be used commercially. All the voices offered in the asset are in the open public domain or are based on fictitious characters. We’ve included terms of use and service within the asset for the developers’ perusal. The asset has models such as Neural and Standard, based on Text to Speech software by Amazon that is thoroughly licensed, has been on the market since November 2016 and includes 50 voices across many different accents. Steam states that the legal ownership of AI-generated art is unclear. The use of AI-generated voices, on the other hand, has been widespread, and some of them were licensed a decade ago (Google Text to Speech - 13 November 2013).

I can confirm within an hour the team had resolved my issue and I was back developing new lines of speech for my game.

Thanks AiKodex for an excellent asset and excellent service! Great job!

Hello! Really interested in this app, but I wanted to ask some questions to see if this might work with my current project. Without going into details, I am building an iOS/Android app experience where it would be cool to have AI-generated voices narrate random moments that occur within the game. So my questions are as follows:

  • Will this run on a phone app?
  • If it can run on an app, would each running instance (say, 100 users playing) eat into the 30,000 character limit? In other words, would this be scalable to multiple instances?

Thanks in advance, and major kudos! This is a really cool asset!

Will other language models be added in the future? Such as Chinese, Japanese, etc.

Hello yung_beezy93,

Yes, the application can run on a mobile app. Since it is server-based, mobile devices will have the same generation time as a PC. The app can send out requests and download files, and it can use Application.persistentDataPath to write the bytes.
Regarding the character limit, it is difficult to calculate how many players will generate how many characters. However, to give you an idea, 30,000 characters can fill approximately 10 pages of a document with 12 point Calibri font. If the sole functionality of the game is to generate voices, then 30,000 characters may not be sufficient for a large number of users.
If you have 100 users playing simultaneously and each instance consumes characters for voice generation, it is possible that the 30,000 character limit could be reached quickly. This may impact the scalability of the application for multiple instances.
It is worth noting that there is an option to purchase another license of the asset to obtain a new invoice number and double the character quota. However, it may not be the most economical solution.
Nevertheless, the developers behind the asset are working on finding ways to increase the character quota, so there may be future improvements to address scalability concerns.
Thank you for your interest in the app, and I’m glad you find it cool! If you have any more questions, feel free to ask.
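
To put rough numbers on the scalability question, here is a small Python sketch of the arithmetic. It is only an estimate under stated assumptions: the function names are hypothetical, usage is assumed to be spread evenly across users, and each extra license is assumed to add one further 30,000-character monthly quota (the reply above only states that a second license doubles it).

```python
import math

MONTHLY_QUOTA = 30_000  # characters per month for one invoice number


def chars_per_user(active_users: int, quota: int = MONTHLY_QUOTA) -> int:
    """Characters each user could generate per month if usage is shared evenly."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return quota // active_users


def licenses_needed(active_users: int, chars_per_user_per_month: int,
                    quota: int = MONTHLY_QUOTA) -> int:
    """Licenses required, assuming each one adds a separate monthly quota."""
    total = active_users * chars_per_user_per_month
    return max(1, math.ceil(total / quota))
```

With 100 users, `chars_per_user(100)` is 300 characters per user per month, which is only a sentence or two beyond a single 200-character request.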

Hello tt10977,
As of now, Chinese and Japanese are not supported by the asset. We would like to support them in the future, but we are uncertain whether we will be able to do so.

Is anything known about a possible future subscription plan from the company that created these AI voices? I assume that they might not keep it free forever… that is my only concern.

We are in the process of developing an offline AI voice model using Barracuda instances with a Python framework in Unity. Conversion of these NN models is possible using ONNX (Open Neural Network Exchange). We already offer Ai.Fy, which includes two offline super-resolution AI models. We hope to bring this into DeepVoice as well, at least for a few voices initially if not all.

How long does it take for Unity to assign an invoice number?

EDIT: It took 2.5 hours to populate.

This asset is absolutely amazing - perfect for our project!

I think this asset definitely needs some sort of packs to buy more characters, though. I already find myself having to re-do my inputs to get the perfect emotion, so I quickly burn through my character allotment even though it is just for one generation.

I also think the character limit needs to be much higher. I have so few characters left when using the tagging feature that it becomes quite a nuisance, because I can only get one sentence done. If I break it up into smaller submissions, I lose the tone and the flow of the dialogue, and I also burn through more characters having to re-do the same tags with each submission. I think the best way to go about this is to make tags such as “he said nervously” free of the character limit, while still using them to know where they are in the sentence. It would be even greater to have an option to automatically omit these tags from the final generation if a toggle is on, though this may be asking too much programmatically.

Either way, absolutely amazing asset especially at the current cost!

Hello Ghosthowl,

Thank you for your positive feedback!

We understand that there may be a character crunch. Our margins are slim when we talk about the server support in terms of longevity. However, we have seen an exponential growth in the customer base in these short days. Feedback and reviews motivate us a lot to do better, and we’d be grateful if you could write a few words on the store.

As for increasing the quota, we will actively work on expanding our offerings. We acknowledge the inconvenience it may cause when utilizing the tagging feature and will explore options to either increase the limit or exempt tags from counting towards it.

Your valuable feedback contributes to our ongoing efforts to enhance the asset’s functionality and user experience. We appreciate your support and encourage you to share any further suggestions or questions you may have :slight_smile:

I’m thinking about buying this amazing plugin, but I need to know if it actually does runtime text to speech. If yes, what’s the response delay?

Hello sael-you,

Yes, you can perform Text to Speech during runtime. We offer a demo scene that performs runtime generations. The delay in generations is ~5 to 10 seconds depending upon the number of characters. Hope this answers your question :slight_smile:

[Announcement]

Automatic Quota Reset to 15,000 characters.

15,000 characters allotted for the period 15-07-2023 to 31-07-2023.

Add Japanese, please.

[Announcement]

We will have a short server maintenance check from 17:00 to 17:30 UTC. The servers are expected to return to normal functionality at 17:30 UTC.

Your patience is appreciated.