Easy speech synthesis on a Mac

I needed speech synthesis for a recent project. I started out using Watson’s text-to-speech service, but in less than a week I hit the limit of their free tier (10,000 characters). Since I’m on a Mac, I decided to try Apple’s speech instead, and I love it. The voice quality is at least as good, if not better; the performance is great, and it’s free.

Here’s the code:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Events;

public class AppleSpeechSynth : MonoBehaviour {
      
    public string voice = "Samantha";
    // Audio device ID passed to say's -a option; run say -a '?' in Terminal to list the IDs.
    public int outputChannel = 48;
  
    public UnityEvent onStartedSpeaking;
    public UnityEvent onStoppedSpeaking;
  
    System.Diagnostics.Process speechProcess;
    bool wasSpeaking;
  
    void Update() {
        // Poll the say process each frame and fire an event when its running state changes.
        bool isSpeaking = (speechProcess != null && !speechProcess.HasExited);
        if (isSpeaking != wasSpeaking) {
            if (isSpeaking) onStartedSpeaking.Invoke();
            else onStoppedSpeaking.Invoke();
            wasSpeaking = isSpeaking;
        }
    }
  
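    public void Speak(string text) {
        // Quote the voice name so multi-word voices such as "Bad News" work.
        // Double quotes in the text are swapped for commas so they cannot end the quoted argument early.
        string cmdArgs = string.Format("-a {2} -v \"{0}\" \"{1}\"", voice, text.Replace("\"", ","), outputChannel);
        speechProcess = System.Diagnostics.Process.Start("/usr/bin/say", cmdArgs);
    }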

}

Just call the Speak method, and bask in the sultry (or manly, as you prefer) sounds of speech.

Note that I needed the outputChannel parameter to redirect the output (through Soundflower) to QuickTime when recording this demo video. That was a PITA, because then I couldn’t hear it while recording… but anyway, if you have any trouble hearing the speech, run say -a '?' on the command line, and check that the output channel you have selected is the correct number for “Built-in Output”.

Hi Joe, I want to use this script in my Unity project. How can I do that? Should I attach it to my main camera or to a game object?

It’s just a MonoBehaviour. Attach it to whatever you want.

Hello! May I ask how you went about using Watson’s service? I was trying out their speech-to-text service, but I got errors when compiling the script (shown in the Inspector).

Hi,
This is VERY interesting!!!
How do you implement voice recognition on the Mac?
Can you also use Apple’s tools for the recognition?

Sorry, I have no idea about that.

I tried everything I could; it’s not possible using the native macOS binaries. Also note that for the TTS feature (the say command) there are some legal restrictions…

Hey! I’m a beginning coder and I need some text to speech in my (mac based) project. Can you explain what the two UnityEvents:

    public UnityEvent onStartedSpeaking;
    public UnityEvent onStoppedSpeaking;

do in the script? Do I need to use them in some way, or just call the Speak method?

Thank you so much for this code, btw, it’s exactly what I needed.

Those simply provide events other code can use if they need them. If you don’t need them, you don’t need them.

If you’re not familiar with Unity events and all the cool ways they let you decouple your code, check out this tutorial (old but still applies today).

Thanks! I’m not familiar with Unity events so I appreciate the link to the tutorial.

Can someone explain what this code is doing exactly?
I need to implement voice recognition in my (iOS) game and I feel like this actually works, but I don’t know what it’s doing. I’ve placed it into my game, but I also don’t know if it’s working. Am I supposed to have downloaded something else, or should this work no matter what?

It’s just launching the /usr/bin/say command (a built-in command-line app on macOS) as a separate process. This is speech synthesis, not voice recognition. I wouldn’t expect it to work on iOS.

Thank you. This definitely helped me save some time.

Great stuff, thank you. I missed hearing Fred’s voice.

I like this way of calling say, and I wanted to make a second command so I could poll the local macOS install and get a list of all available SpeechSynthesis voices.

I found a command that does this in a bash terminal and tried to work it into the same format as your say command above, but I couldn’t get it all the way there because I wasn’t sure if I needed a path declaration like the one you had.

This is the command I want to run as a bash/terminal command from Unity C#:

ls /System/Library/Speech/Voices | sed 's/.SpeechVoice$//'

I tried setting it up like this:

    public void ListAvailable() {
        string cmdArgs = "ls /System/Library/Speech/Voices | sed 's/.SpeechVoice$//'";
        speechProcess = System.Diagnostics.Process.Start(cmdArgs);     
    }

I wasn’t sure if I needed to add the first part of the command format from your original Speak function, where it says “/usr/bin/say”, or if I could just pass one string as the command.

I also don’t know how to capture the console’s response. I know it would return text, but I am no expert on talking to bash indirectly.

Meanwhile I have another question:
Is it possible to access some kind of phoneme or viseme stream on the Mac side of SpeechSynthesis, or in another library, for timing the mouth poses on an avatar? Obviously the Speak function engages synthesis. I’m not sure if there’s any kind of timing system exposed, but I can see where people have set the speed parameter for speech on the Mac side, so I guess there’s something back there.

Discussion threads on this issue are surprisingly scant given the amount of time all of these systems have been coexisting. Thanks so much for any expertise you can offer.

@ippdev and I are trying to crack this nut so we can get universal speech support.

Hmm, you’re trying to execute a compound command, where the output of one command is piped into another. Piping is something the shell does, so I’m not certain that System.Diagnostics.Process.Start can do that.

An alternative would be to use just the “ls” command (which is actually “/bin/ls”) as the first argument to Process.Start, with “/System/Library/Speech/Voices” as the cmdArgs (second argument).

But then you will need to process the returned text. That is a little tricky, but it is doable; Process.Start returns a Process object, which has a StandardOutput stream you can read from. See these answers for some examples. Once your code is reading the results of the ls command, you can search it yourself for SpeechVoice entries.
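
Here is a minimal sketch of reading a command's output through StandardOutput, as described above. It assumes the standard .NET ProcessStartInfo API with UseShellExecute turned off, which output redirection requires. The ProcessOutput class and RunAndCapture method are hypothetical names for illustration, not part of the original script.

```csharp
using System.Diagnostics;

// Hypothetical helper (not from the original post) that runs a program
// directly, without a shell, and returns what it printed.
public static class ProcessOutput {

    public static string RunAndCapture(string fileName, string arguments) {
        var startInfo = new ProcessStartInfo {
            FileName = fileName,
            Arguments = arguments,
            UseShellExecute = false,        // needed before output can be redirected
            RedirectStandardOutput = true,  // exposes process.StandardOutput
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo)) {
            // Read everything first, then wait, so a full output buffer cannot stall the child.
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }
}
```

Calling ProcessOutput.RunAndCapture("/bin/ls", "/System/Library/Speech/Voices") should then return the directory listing as one string, one entry per line. Note that the call blocks until the command exits, which is fine for something as quick as ls but would freeze a frame for longer commands.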

Or, better yet: why are we going to all this work to run ls in a shell to get a list of files the hard way? C# has built-in methods to get files in a directory. Just use one of those instead.
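
As a rough sketch of that last suggestion, the snippet below uses System.IO to list the voices folder and strip the .SpeechVoice suffix, which is what the sed command was doing. The VoiceList class and its method names are hypothetical, and the folder path is the one quoted earlier in this thread; it may not hold every installed voice on every macOS version.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not from the original post) that lists voice names
// without launching any external process.
public static class VoiceList {

    // Path taken from the ls command in this thread; treat it as an assumption.
    public const string DefaultVoicesFolder = "/System/Library/Speech/Voices";

    // Keeps only entries ending in .SpeechVoice and returns their bare names.
    public static List<string> NamesFromPaths(IEnumerable<string> paths) {
        var names = new List<string>();
        foreach (string path in paths) {
            if (Path.GetExtension(path) == ".SpeechVoice") {
                names.Add(Path.GetFileNameWithoutExtension(path));
            }
        }
        return names;
    }

    // Returns an empty list if the folder does not exist (for example, off macOS).
    public static List<string> InstalledVoices(string folder = DefaultVoicesFolder) {
        if (!Directory.Exists(folder)) return new List<string>();
        // Voices may be bundles (directories), so list files and directories alike.
        return NamesFromPaths(Directory.GetFileSystemEntries(folder));
    }
}
```

Each name returned by VoiceList.InstalledVoices() could then be assigned to the voice field of the script above.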

Hi, is there any way to save the audio generated by the speech synthesis? I’m trying to display the audio generated visually.

Yes. Type “man say” in Terminal for details.
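
For a concrete starting point, say accepts an -o option that writes the speech to an audio file instead of playing it. The sketch below builds that command line from C#. The SpeechToFile class and its method names are hypothetical, and swapping double quotes for apostrophes is a simplification I chose to keep the quoting safe, not something say requires.

```csharp
using System.Diagnostics;

// Hypothetical helper (not from the original post) that asks say to write
// its speech to a file rather than to the speakers.
public static class SpeechToFile {

    // Builds: -v "<voice>" -o "<outputPath>" "<text>"
    public static string BuildArguments(string voice, string text, string outputPath) {
        return string.Format("-v \"{0}\" -o \"{1}\" \"{2}\"",
            Sanitize(voice), Sanitize(outputPath), Sanitize(text));
    }

    // Assumption: replacing double quotes is enough to keep each value in one argument.
    static string Sanitize(string value) {
        return value.Replace("\"", "'");
    }

    // Starts say and returns the process so the caller can check HasExited
    // before trying to load the finished file.
    public static Process Save(string voice, string text, string outputPath) {
        return Process.Start("/usr/bin/say", BuildArguments(voice, text, outputPath));
    }
}
```

For example, SpeechToFile.Save("Samantha", "Hello there", "/tmp/hello.aiff") should produce an AIFF file once the returned process has exited. The man page lists the other file formats and options.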