RT-Voice - Run-time text-to-speech solution

Thank you, please let me know!

Hi, I’m working on a mobile VR project that requires multi-language text-to-speech, so I’m thinking of giving RT-Voice a try.

It’s my first project using TTS, so sorry for the trivial question: how can I include the languages I need? Can I choose them from a preset list in Unity, or do I have to rely on the TTS languages installed on the device?

In the latter case, could I then turn to the MaryTTS system to select a set of languages?

Just to clarify, I need English, Italian, German, French, Spanish and Chinese.

Thanks!

Hi

Thank you for your interest in RT-Voice!

Here are the answers to your questions:

  • RTV uses the underlying TTS system and its installed languages. Apple and Google TTS support all your desired languages by default.
  • Yes, you could use MaryTTS. Afaik, there are no voices for Spanish and Chinese, but you can probably find them or create them on your own :wink:

I hope this gets you further!

Cheers
Stefan

Hi Stefan! Windows allows users to specify TTS playback rate in the control panel. Is there any way in RTVoice to use that rate when calling Speaker.SpeakNative, or is there some way to query RTVoice as to what the system’s default playback rate is? Thanks in advance!

Hi

You can change the rate of the speech - there is a parameter called “rate” to accomplish that.
For more, please see:
https://www.crosstales.com/media/data/assets/rtvoice/api/class_crosstales_1_1_r_t_voice_1_1_speaker.html#a7083a472c301a672038851f9e64df0a4
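As a rough illustration of the "rate" parameter, here is a minimal sketch. It assumes that Speaker.SpeakNative accepts "rate" as a named float argument and that 1 means normal speed; the class name RateExample, the field speechRate and the sample text are hypothetical. Please check the linked API page for the exact signature in your RT-Voice version.

```csharp
using UnityEngine;
using Crosstales.RTVoice;

// Hypothetical example component, not part of RT-Voice itself.
public class RateExample : MonoBehaviour
{
    // Assumption: 1 is the normal speed, larger values speak faster.
    public float speechRate = 1.5f;

    private void Start()
    {
        // Wait until the voices are available before speaking.
        Speaker.OnVoicesReady += onVoicesReady;
    }

    private void OnDestroy()
    {
        Speaker.OnVoicesReady -= onVoicesReady;
    }

    private void onVoicesReady()
    {
        // "rate" is passed as a named argument; the parameter name is taken
        // from the answer above, its type (float) is an assumption.
        Speaker.SpeakNative("Hello world", Speaker.VoiceForCulture("EN"), rate: speechRate);
    }
}
```

The value could also come from your own settings menu, so that users can adjust it inside the application.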

Cheers
Stefan

I understand that I can change the speech rate, and that I can provide an interface for my users so that they can change it themselves. What I’m asking specifically is this: is there a way for my program to know which voice and what rate my user has selected in Windows, so that when they load my program it is already using the voice and rate that they prefer?

Ok, I see.
No, that’s currently not possible, but I will add it in the next release. :wink:

So long,
Stefan

Thanks. :smile:

Hi,

I have a problem: when I test TTS from the Inspector (the “Preview Voice” button) everything works, but I get no audio when I run my application from Unity (on Windows).

I am using RTVoice version 2.7.2.

Here are the settings of the RTVoice node (split into 2 screenshots):

I am using RT-Voice with Vuforia. My ARCamera has an audio listener:

Here is my scene hierarchy:

Also, I see the following error in the Console:

ArgumentNullException: Argument cannot be null.
Parameter name: key
System.Collections.Generic.Dictionary`2[System.String,UnityEngine.AudioSource].ContainsKey (System.String key) (at /Users/builduser/buildslave/mono/build/mcs/class/corlib/System.Collections.Generic/Dictionary.cs:458)
Crosstales.RTVoice.Speaker.Silence (System.String uid)
Crosstales.RTVoice.Tool.TextFileSpeaker.Silence ()
Crosstales.RTVoice.Tool.TextFileSpeaker.Speak ()
Crosstales.RTVoice.Tool.TextFileSpeaker.Start ()

Hi, I just bought RT-Voice, but I can't get SSML to work on my iPhone.

Does it work on iOS?

Thanks

Hi Jean

SSML works only on Windows standalone or together with MaryTTS.

Cheers
Stefan

Thanks for the answer.
Is there any plan to add this functionality later (or EmotionML)?

On the iPhone I'm able to download more voice packages (in the Accessibility menu). I managed to use a downloaded voice, but not the Siri voice. Is it possible to use the Siri voice?

Unfortunately, iOS doesn’t support SSML, so we can’t do much about it.
But MaryTTS should be a solution - the only downside is imho the required Internet connection.

Regarding “Siri” - this is also not possible since Apple hasn’t integrated it into their speech synthesizer :frowning:

I'm trying to read a list of sentences, using events to wait for the end of each sentence. It works well on my laptop (MacBook), but when I try it on the iPhone there is no waiting time.

I tried to use the same concept as in your dialog scene (which works great on my phone), but it's not working. Here is my code:

using System.Collections;
using System;
using System.Collections.Generic;
using UnityEngine;
using Crosstales.RTVoice;
using Crosstales.RTVoice.Model;
using UnityEngine.UI;

public class TalkOnClick : MonoBehaviour
{

    public string text;
    
 
    private Book theBook;
    private Dictionary<string,Crosstales.RTVoice.Model.Voice> currentVoices = new Dictionary<string, Crosstales.RTVoice.Model.Voice>();
    public bool IsSpeaking=false;
    private string uidSpeaker;

    private TranslationManager translationManager = new TranslationManager();

    private void Start()
    {
        // Subscribe event listeners
        Speaker.OnSpeakStart += speakStartMethod;
        Speaker.OnSpeakComplete += speakCompleteMethod;

        try
        {
            Debug.Log("start");
            TextAsset textFile = Resources.Load<TextAsset>("books");
            theBook = JsonUtility.FromJson<Book>(textFile.text);
        }
        catch (System.Exception ex)
        {
            Debug.Log(ex.Message);
        }
    
    }

    void OnDestroy()
    {
        // Unsubscribe event listeners
        Speaker.OnSpeakStart -= speakStartMethod;
        Speaker.OnSpeakComplete -= speakCompleteMethod;
    }

    private void speakStartMethod(Wrapper wrapper)
    {
        if (wrapper.Uid.Equals(uidSpeaker))
        {
            Debug.Log("speakStartMethod - Speaker : " + wrapper);
            IsSpeaking = true;
        }
     
    }

    private void speakCompleteMethod(Wrapper wrapper)
    {
        if (wrapper.Uid.Equals(uidSpeaker))
        {
            Debug.Log("speakCompleteMethod - Speaker : " + wrapper);
            IsSpeaking = false;
        }
    }

    private bool startReading = false;

    void Update()
    {
        if (startReading)
            return;
     
        if (Input.GetMouseButtonDown(0))
        {
            startReading = true;
            StartCoroutine(ReadBook());
        }
    }

   
  
    IEnumerator ReadBook()
    {
        yield return SayAndWait(theBook.name, theBook.culture, 0.5f);

        foreach (var sentence in theBook.sentences)
        {
            yield return SayAndWait(sentence, theBook.culture, 0.5f);
        }

        startReading = false;
    }


    private IEnumerator SayAndWait(string sentence,string culture="EN", float waitSeconds=0)
    {
        Say(sentence,culture);

        //wait until ready
        do
        {
            yield return null;
        } while (!IsSpeaking && startReading);

        //wait until played
        do
        {
            yield return null;
        } while (IsSpeaking && startReading);

        yield return new WaitForSeconds(waitSeconds);

    }

    private void Say(string sentence,string culture="EN") {

        if (string.IsNullOrEmpty(sentence))
            return;
      
        if (string.IsNullOrEmpty(culture))
            culture = "EN";
       
        // Cache the voice for this culture on first use
        if (!currentVoices.ContainsKey(culture))
        {
            currentVoices.Add(culture, Speaker.VoiceForCulture(culture));
        }
     
        Debug.Log("[" + culture + "]" + "Say :" +sentence );
        uidSpeaker =      Speaker.Speak(sentence, null, currentVoices[culture]);
    }
}

Did I miss something?

Thanks for your support!

EDIT: I just noticed that I get this error:
[TTS] _BeginSpeaking: couldn’t begin playback

Please make sure you wait for the event “OnVoicesReady” before starting the first speech.
That should solve the problem.

Hi, I created a new standalone version of my script which includes the OnVoicesReady event, but I still have the same problem.

Can you try it?

  • start a new project
  • switch to the iPhone platform
  • import RT-Voice
  • add the RT-Voice prefab
  • add this script to the RT-Voice instance
  • start and touch the screen to launch the reading

Here is the script:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Crosstales.RTVoice;
using Crosstales.RTVoice.Model;

public class TalkOnClick : MonoBehaviour
{
    private Dictionary<string,Crosstales.RTVoice.Model.Voice> currentVoices = new Dictionary<string, Crosstales.RTVoice.Model.Voice>();
    public bool IsSpeaking = false;
    public bool voiceReady = false;

    private string uidSpeaker;
    private void Start()
    {
        Speaker.OnVoicesReady += onVoicesReady;
        // Subscribe event listeners
        Speaker.OnSpeakStart += speakStartMethod;
        Speaker.OnSpeakComplete += speakCompleteMethod;
    }

    private void onVoicesReady()
    {
        voiceReady = true;

        Debug.Log("voiceReady");
    }

    void OnDestroy()
    {
        // Unsubscribe event listeners
        Speaker.OnSpeakStart -= speakStartMethod;
        Speaker.OnSpeakComplete -= speakCompleteMethod;
        Speaker.OnVoicesReady -= onVoicesReady;

    }
    private void speakStartMethod(Wrapper wrapper)
    {
        if (wrapper.Uid.Equals(uidSpeaker))
        {
            Debug.Log("speakStartMethod - Speaker : " + wrapper);
            IsSpeaking = true;
        }
      
    }

    private void speakCompleteMethod(Wrapper wrapper)
    {
        if (wrapper.Uid.Equals(uidSpeaker))
        {
            Debug.Log("speakCompleteMethod - Speaker : " + wrapper);
            IsSpeaking = false;
        }
    }

    private bool startReading = false;

    void Update()
    {
        if (!voiceReady || startReading)
            return;
      
        if (Input.GetMouseButtonDown(0))
        {
            startReading = true;

            StartCoroutine(ReadBook());
        }


        for (int i = 0; i < Input.touchCount; ++i)
        {
            startReading = true;
            if (Input.GetTouch(i).phase == TouchPhase.Began)
                StartCoroutine(ReadBook());
           
        }
    }

    
   
    IEnumerator ReadBook()
    {
        Debug.Log("talk !!");
    
            for (int i = 0; i < 10; i++)
            {
                
            yield return SayAndWait(string.Format("this is the sentence {0}",i+1));
            }
        startReading = false; 
    }


    private IEnumerator SayAndWait(string sentence,string culture="EN", float waitSeconds=0)
    {
        Say(sentence,culture);

        //wait until ready
        do
        {
            yield return null;
        } while (!IsSpeaking && startReading);

        //wait until played
        do
        {
            yield return null;
        } while (IsSpeaking && startReading);
        yield return new WaitForSeconds(waitSeconds);

    }

    private void Say(string sentence,string culture="EN") {
        if (string.IsNullOrEmpty(sentence))
            return;
       
        if (string.IsNullOrEmpty(culture))
            culture = "EN";
       
        if (!currentVoices.ContainsKey(culture))
            currentVoices.Add(culture, Speaker.VoiceForCulture(culture));
       
        Debug.Log("[" + culture + "]" + "Say :" +sentence );
        uidSpeaker =      Speaker.Speak(sentence, null, currentVoices[culture]);
    }
}

Thanks

Hi

I’m currently on a business trip and don’t have an iOS-device available.
However, I found some potential problems and fixed them in your script (see #CT).
Can you please try it again and tell me if it helps?

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Crosstales.RTVoice;
using Crosstales.RTVoice.Model;

public class TalkOnClick : MonoBehaviour
{
    private Dictionary<string, Voice> currentVoices = new Dictionary<string, Voice>();
    private bool IsSpeaking = false;
    private bool voiceReady = false;

    private string uidSpeaker;
    private bool startReading = false;

    public void Start()
    {
        Speaker.OnVoicesReady += onVoicesReady;
        // Subscribe event listeners
        Speaker.OnSpeakStart += speakStartMethod;
        Speaker.OnSpeakComplete += speakCompleteMethod;
    }

    public void OnDestroy()
    {
        // Unsubscribe event listeners
        Speaker.OnSpeakStart -= speakStartMethod;
        Speaker.OnSpeakComplete -= speakCompleteMethod;
        Speaker.OnVoicesReady -= onVoicesReady;

    }

    public void Update()
    {
        if (!voiceReady || startReading)
            return;

#if UNITY_STANDALONE || UNITY_EDITOR //#CT: added to prevent double speak
        if (Input.GetMouseButtonDown(0))
        {
            startReading = true;

            StartCoroutine(ReadBook());
        }
#else

        for (int i = 0; i < Input.touchCount; ++i)
        {
            if (Input.GetTouch(i).phase == TouchPhase.Began)
            {
                StartCoroutine(ReadBook());
                startReading = true;
                break; //#CT: multiple calls prevented
            }
        }
#endif
    }

    private void onVoicesReady()
    {
        voiceReady = true;

        Debug.Log("voiceReady");
    }

    private void speakStartMethod(Wrapper wrapper)
    {
        if (wrapper.Uid.Equals(uidSpeaker))
        {
            Debug.Log("speakStartMethod - Speaker : " + wrapper);
            IsSpeaking = true;
        }

    }

    private void speakCompleteMethod(Wrapper wrapper)
    {
        if (wrapper.Uid.Equals(uidSpeaker))
        {
            Debug.Log("speakCompleteMethod - Speaker : " + wrapper);
            IsSpeaking = false;
        }
    }

    private IEnumerator ReadBook()
    {
        Debug.Log("talk !!");

        for (int i = 0; i < 10; i++)
        {

            yield return SayAndWait(string.Format("this is the sentence {0}", i + 1));
        }
        startReading = false;
    }


    private IEnumerator SayAndWait(string sentence, string culture = "EN", float waitSeconds = 0.1f) //#CT: small delay added
    {
        Say(sentence, culture);

        //wait until ready
        do
        {
            yield return null;
        } while (!IsSpeaking && startReading);

        //wait until played
        do
        {
            yield return null;
        } while (IsSpeaking && startReading);

        yield return new WaitForSeconds(waitSeconds);
    }

    private void Say(string sentence, string culture = "EN")
    {
        if (string.IsNullOrEmpty(sentence))
            return;

        if (string.IsNullOrEmpty(culture))
            culture = "EN";

        if (!currentVoices.ContainsKey(culture))
            currentVoices.Add(culture, Speaker.VoiceForCulture(culture));

        Debug.Log("[" + culture + "]" + "Say :" + sentence);
        //uidSpeaker = Speaker.Speak(sentence, null, currentVoices[culture]);
        uidSpeaker = Speaker.SpeakNative(sentence, currentVoices[culture]); //#CT: iOS only supports native
    }
}

Thank you!

So long,
Stefan

Yes, it's working :slight_smile:

Thanks a lot for your support!

Another question: I plan to buy a lip-sync asset (like SALSA or LipSync Pro). Which one is easier to integrate and has the best support for RT-Voice?

You’re welcome!

I suggest SALSA - it offers a great integration and support.

So long,
Stefan

Thanks for the advice, I'll buy it.