I needed speech synthesis for a recent project. I started out using Watson’s text-to-speech service, but in less than a week I hit the limit of their free tier (10,000 characters). Since I’m on a Mac, I decided to try Apple’s speech instead, and I love it. The voice quality is at least as good, if not better; the performance is great, and it’s free.
Here’s the code:
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Events;

public class AppleSpeechSynth : MonoBehaviour {
    public string voice = "Samantha";
    // Audio output device number; list the available devices with: say -a '?'
    public int outputChannel = 48;

    public UnityEvent onStartedSpeaking;
    public UnityEvent onStoppedSpeaking;

    System.Diagnostics.Process speechProcess;
    bool wasSpeaking;

    void Update() {
        // We are speaking for as long as the say process is still running.
        bool isSpeaking = (speechProcess != null && !speechProcess.HasExited);
        // Fire the matching event only when the state changes.
        if (isSpeaking != wasSpeaking) {
            if (isSpeaking) onStartedSpeaking.Invoke();
            else onStoppedSpeaking.Invoke();
            wasSpeaking = isSpeaking;
        }
    }

    public void Speak(string text) {
        // Double quotes in the text are swapped for commas so they cannot
        // end the quoted argument early.
        string cmdArgs = string.Format("-a {2} -v {0} \"{1}\"", voice, text.Replace("\"", ","), outputChannel);
        speechProcess = System.Diagnostics.Process.Start("/usr/bin/say", cmdArgs);
    }
}
Just call the Speak method, and bask in the sultry (or manly, as you prefer) sounds of speech.
Note that I needed the outputChannel parameter in order to redirect the output (through Soundflower) to QuickTime when recording this demo video. That was a PITA, because then I couldn’t hear it while recording… but anyway, if you have any trouble hearing the speech, run say -a '?' on the command line, and check that the output channel you have selected is the correct number for “Built-in Output”.
Hello! May I ask how you went about using Watson’s service? I was trying out their speech-to-text service, but I got errors when compiling the script (the error shows up in the Inspector section).
I tried everything I could; it’s not possible using the native macOS binaries. Also note that for the TTS feature (the say command) there are some legal restrictions…
Can someone explain what this code is doing exactly?
I need to implement voice recognition in my (iOS) game and I feel like this actually works, but I don’t know what it’s doing. I’ve placed it into my game, but I also don’t know if it’s working. Am I supposed to have downloaded something else, or should this work no matter what?
It’s just launching the /usr/bin/say command (a built-in command-line app on macOS) as a separate process. This is speech synthesis, not voice recognition. I wouldn’t expect it to work on iOS.
I like this way of calling say, and I wanted to make a second method so I could poll the local macOS system and get a list of all available speech synthesis voices.
I found a command that does this in a bash terminal and tried to work it into the same format as your say command above, but I couldn’t figure it out all the way, because I wasn’t sure whether I needed a path declaration like the one you had.
This is the command I want to run as a bash/terminal command from Unity C#:
ls /System/Library/Speech/Voices | sed 's/.SpeechVoice$//'
I tried setting up:
public void ListAvailable() {
    string cmdArgs = "ls /System/Library/Speech/Voices | sed 's/.SpeechVoice$//'";
    speechProcess = System.Diagnostics.Process.Start(cmdArgs);
}
I wasn’t sure whether I needed to add the first part of the other command format from your original Speak function, where it says “/usr/bin/say”, or whether I could just pass one string as a command argument.
I also don’t know how to capture the console’s response to that. I know it would return text, but I am no expert on talking to bash indirectly.
Meanwhile I have another question:
Is it possible to access some kind of phoneme or viseme stream on the Mac side of speech synthesis, or in another library, for timing the mouth poses on an avatar? Obviously the Speak function engages the synthesizer. I’m not sure whether any kind of timing system is exposed. I can see where people have set the speed parameter for speech on the Mac side, so I guess there’s something back there.
Discussion threads on this issue are surprisingly scant given the amount of time all of these systems have been coexisting. Thanks so much for any expertise you can offer.
@ippdev and I are trying to crack this nut so we can get universal speech support
Hmm, you’re trying to execute a compound command, where the output of one command is piped into another. I’m not certain that System.Diagnostics.Process.Start can do that.
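One possible workaround, offered as an untested-in-Unity sketch: the pipe character only means something to a shell, so you could start /bin/sh yourself and hand it the whole pipeline with -c. ShellPipeline and Run are names I made up for illustration, and the quoting here assumes the command contains no backslashes.

```csharp
using System.Diagnostics;

public static class ShellPipeline {
    // Runs a command line (pipes allowed) through /bin/sh and returns its stdout.
    // Assumption: commandLine contains no backslashes; only double quotes are escaped.
    public static string Run(string commandLine) {
        var info = new ProcessStartInfo {
            FileName = "/bin/sh",
            Arguments = "-c \"" + commandLine.Replace("\"", "\\\"") + "\"",
            UseShellExecute = false,        // required in order to redirect output
            RedirectStandardOutput = true
        };
        using (Process process = Process.Start(info)) {
            // Read everything first, then wait, so a full pipe buffer can't stall us.
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }
}
```

With that in place, something like ShellPipeline.Run("ls /System/Library/Speech/Voices | sed 's/.SpeechVoice$//'") should return the voice names, one per line.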
An alternative would be to use just the “ls” command (which is actually “/bin/ls”) as the first argument to Process.Start, with “/System/Library/Speech/Voices” as the cmdArgs (second argument).
But then you will need to process the returned text. That is a little tricky, but it is doable; Process.Start returns a Process object, which has a StandardOutput stream you can read from. See these answers for some examples. Once your code is reading the results of the ls command, you can search it yourself for SpeechVoice entries.
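Roughly, that approach might look like the sketch below. LsVoiceLister and ListVoices are hypothetical names of mine; the parts that come from the real API are ProcessStartInfo, RedirectStandardOutput, and the StandardOutput stream.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LsVoiceLister {
    // Runs /bin/ls on the given directory and returns the names of the
    // entries ending in ".SpeechVoice", with that suffix removed.
    public static List<string> ListVoices(string voicesDirectory) {
        var info = new ProcessStartInfo {
            FileName = "/bin/ls",
            Arguments = "\"" + voicesDirectory + "\"",  // quoted in case the path has spaces
            UseShellExecute = false,                      // required in order to redirect output
            RedirectStandardOutput = true
        };
        var voices = new List<string>();
        const string suffix = ".SpeechVoice";
        using (Process process = Process.Start(info)) {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            // ls prints one entry per line when its output is not a terminal.
            foreach (string line in output.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)) {
                if (line.EndsWith(suffix, StringComparison.Ordinal))
                    voices.Add(line.Substring(0, line.Length - suffix.Length));
            }
        }
        return voices;
    }
}
```

You would call it as LsVoiceLister.ListVoices("/System/Library/Speech/Voices").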
Or, better yet: why are we going to all this work to run ls in a shell to get a list of files the hard way? C# has built-in methods to get files in a directory. Just use one of those instead.
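For instance, a minimal sketch using System.IO, assuming the voices really do live in /System/Library/Speech/Voices as entries named Something.SpeechVoice (VoiceDirectory and ListVoices are made-up names):

```csharp
using System.Collections.Generic;
using System.IO;

public static class VoiceDirectory {
    // Returns the voice names found in the given directory, i.e. the names of
    // all entries matching *.SpeechVoice with the extension stripped.
    // The default path is taken from the ls command earlier in this thread.
    public static List<string> ListVoices(string voicesDirectory = "/System/Library/Speech/Voices") {
        var voices = new List<string>();
        if (!Directory.Exists(voicesDirectory)) return voices;
        // GetFileSystemEntries matches both files and folders, so it does not
        // matter which of the two a voice entry is.
        foreach (string entry in Directory.GetFileSystemEntries(voicesDirectory, "*.SpeechVoice")) {
            voices.Add(Path.GetFileNameWithoutExtension(entry));
        }
        voices.Sort();
        return voices;
    }
}
```

In Unity you could then do foreach (string v in VoiceDirectory.ListVoices()) Debug.Log(v); with no process or shell involved at all.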