Subtitle System for SFX

Hello all, I want to work on this system. First, a question:
Are subtitles needed only for SFX, or for spoken dialogue as well?

Hey @Harsh-NJ, the dialogue is already written, and the sound FX for dialogue is a blabber, so I think that we don’t need to add subtitles for that. What do you think?

Do you think that we can use the Localization and TMPro packages when creating this system?

I saw the card on the roadmap, so I created this thread, since the final decision belongs to the community, of course.

Yes, we can use TMPro (for a bit of styling), but I see no use for the Localization package, as
Chooo Chooo!
in English is the same as
Chooo Chooo!
in Spanish, German, Hindi, and all other languages.

Except if they use different letters, for example in Greece, Russia, Arabic-speaking countries or China.


Many onomatopoeias differ between languages. For example, "Yummy" in English is "Miam" in French, and "Woof-woof" is "ouaf-ouaf".

Also, different letters may be needed for certain languages, as @Smurjo said.

Hey Harsh, thanks for opening the thread. I have added more info on the card, I’ll replicate it here too:

Hope this clears up what the idea behind the task is. In any case, keep in mind it’s a lower priority than other things we’re doing at the moment.

We don’t say “choo choo” in Italian! :smile: We say “ciuf ciuf” (unmistakable proof)

I have been working on this task on and off over the past couple of months and I'd like to share some thoughts and possible insights I've had while developing it. My work focused mostly on the system itself rather than the UI, so I did not explore any use of TMPro.

  • We should absolutely use LocalizedString for caption descriptions.
  • We should do our best to match the goals of Closed Captioning as described here. (Note: These are US guidelines. The only other list of guidelines I found was from the EU, but most of the differences concerned formatting rather than general goals.) These goals include:

      • Accuracy. Describe as much of the sound stage as possible at any given moment.

      • Synchronous. Ensure that descriptions occur at the same time as the sound.

      • Complete. Describe sounds consistently throughout the use of the application.

      • Placement. Place the captions so that they do not block important visual elements of the game.

  • We should use LocalizedAsset to switch between caption formatting for different regions.

  • Balancing accuracy and placement will be the hardest goal to manage. There will likely be several sounds playing at once, and we will need to decide programmatically which sounds are most important and show only those we can fit in the UI. Which leads me to what I consider the hardest programming challenge of this task...

  • Unity does not have an API to retrieve a list of AudioSources sorted by audibility (except in the Profiler). So, we will need to programmatically figure out which audio sources are most audible to the listener at any given moment. This is difficult because a sound goes through several transformations in Unity before it gets outputted. It has a base average AudioClip loudness, AudioSource volume, spatialization, rolloff, and AudioMixerGroup attenuation.

My approach was to calculate each clip's RMS at AssetPostProcess time and save the results in a ScriptableObject (SO) database, because RMS calculations are very expensive and, in our case, the clip loudness should not change at runtime. The database would also have an editable field for the LocalizedString description. Then, as AudioSources play, you look up the clip's base average loudness and run it through all the runtime calculations such as AudioSource volume, rolloff, etc., exiting early if the value ever reaches 0. Then put each AudioSource into a list, sort by the calculated audibility, look up the clip once more for the description, and voilà, you have the most important captions!
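To make the idea more concrete, here is a minimal, engine-independent sketch of the ranking step. It is only an illustration under assumptions, not the code I actually wrote: `CaptionCandidate`, `AudibilityRanker`, and all field names are hypothetical, the description is a plain string standing in for the LocalizedString lookup, and the runtime stages are simplified to linear gains that are multiplied together. In Unity the samples would come from `AudioClip.GetData`, and the rolloff and mixer values would have to be evaluated from the real curves and mixer settings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one playing sound; all names are illustrative.
public sealed class CaptionCandidate
{
    public string Description = "";  // stands in for the LocalizedString description
    public float ClipRms;            // precomputed once at import time
    public float SourceVolume;       // AudioSource volume, 0..1
    public float RolloffGain;        // distance attenuation as linear gain, 0..1
    public float MixerGain;          // mixer group attenuation as linear gain
}

public static class AudibilityRanker
{
    // Root mean square of the samples: sqrt(mean(x^2)).
    public static float Rms(float[] samples)
    {
        if (samples == null || samples.Length == 0) return 0f;
        double sumOfSquares = 0.0;
        foreach (float s in samples) sumOfSquares += (double)s * s;
        return (float)Math.Sqrt(sumOfSquares / samples.Length);
    }

    // Applies each gain in turn and exits early once the level reaches 0.
    public static float EstimateAudibility(CaptionCandidate c)
    {
        float level = c.ClipRms;
        foreach (float gain in new[] { c.SourceVolume, c.RolloffGain, c.MixerGain })
        {
            if (level <= 0f) return 0f;
            level *= gain;
        }
        return Math.Max(level, 0f);
    }

    // Returns the descriptions of the loudest audible candidates, loudest first.
    public static List<string> TopCaptions(IEnumerable<CaptionCandidate> playing, int maxCaptions)
    {
        return playing
            .Select(c => new { c.Description, Level = EstimateAudibility(c) })
            .Where(x => x.Level > 0f)
            .OrderByDescending(x => x.Level)
            .Take(maxCaptions)
            .Select(x => x.Description)
            .ToList();
    }
}
```

`maxCaptions` would be whatever number of lines the UI can fit, which is where the accuracy versus placement trade-off shows up.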

In my opinion, this is necessary to address the accuracy goal of captions. But I understand if we want to avoid all this in favor of a rougher approximation based on some sort of priority. Anyway, I'd be happy to be of any help with this task.

Hey @drod7425, thanks for your explanation. But how will we make the SFX indicator sprite appear on screen so that it points to where the sound is coming from in the 3D world (the position of the AudioEmitter)?

Hey @Harsh-NJ ! Ciro left you a hint in the card info :wink: Just google “offscreen indicator unity”. There are many tutorials. Here’s just one that I found:

I don’t know if that particular tutorial is the best approach, but most are going to boil down to converting the object’s world position to screen space via Camera.WorldToScreenPoint and doing the math to figure out if/where an indicator is necessary.
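As a rough sketch of that math (not taken from any particular tutorial), the function below takes a screen-space point in the form `Camera.WorldToScreenPoint` returns it (x and y in pixels, z as the distance in front of the camera) and decides whether an indicator is needed and where on the screen edge it should sit. `IndicatorPlacement`, `OffscreenIndicator`, and the `margin` parameter are hypothetical names, and mirroring the point when z is negative is a common way of handling targets behind the camera rather than the only one.

```csharp
using System;

// Hypothetical result type: where to draw the arrow and how to rotate it.
public readonly struct IndicatorPlacement
{
    public readonly bool IsOffscreen;
    public readonly float X;
    public readonly float Y;
    public readonly float AngleDegrees;

    public IndicatorPlacement(bool isOffscreen, float x, float y, float angleDegrees)
    {
        IsOffscreen = isOffscreen; X = x; Y = y; AngleDegrees = angleDegrees;
    }
}

public static class OffscreenIndicator
{
    // sx, sy, sz: screen-space point (pixels, z = depth in front of the camera).
    public static IndicatorPlacement Place(float sx, float sy, float sz,
                                           float width, float height, float margin)
    {
        float cx = width * 0.5f, cy = height * 0.5f;

        // Behind the camera the projection is mirrored through the screen centre.
        if (sz < 0f) { sx = width - sx; sy = height - sy; }

        bool onScreen = sz > 0f && sx >= 0f && sx <= width && sy >= 0f && sy <= height;

        // Direction from the screen centre towards the target.
        float dx = sx - cx, dy = sy - cy;
        if (dx == 0f && dy == 0f) dy = -1f; // degenerate case: point the arrow down

        float angle = (float)(Math.Atan2(dy, dx) * 180.0 / Math.PI);
        if (onScreen) return new IndicatorPlacement(false, sx, sy, angle);

        // Scale the direction so it just touches the inset screen rectangle.
        float halfW = cx - margin, halfH = cy - margin;
        float scaleX = dx != 0f ? halfW / Math.Abs(dx) : float.PositiveInfinity;
        float scaleY = dy != 0f ? halfH / Math.Abs(dy) : float.PositiveInfinity;
        float t = Math.Min(scaleX, scaleY);

        return new IndicatorPlacement(true, cx + dx * t, cy + dy * t, angle);
    }
}
```

In a Unity script the inputs would be the components of `camera.WorldToScreenPoint(target.position)` together with `Screen.width` and `Screen.height`.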

I am making my own implementation and I ran into a problem. I made a system that cooperates with an AudioCueEventChannel, but it seems the Finish event is never called. I thought the Finish event is called when an audio clip has finished playing. Is this a mistake in the system, or am I misunderstanding the meaning of that event?
PS: Sorry if my question is stupid, but I am new to this open project stuff and I am still figuring out what to do :smile:

I just picked this up on Friday too :smile: I did some prototyping and have a few things to discuss.
First of all, I split this task into two parts: displaying the text on the screen and the off-screen indicator.

  1. Should we display SFX that are looping?
  2. What should we do if one object has two sounds, like the campfire (BoilingWater & Campfire)? Maybe display them together as multiple lines?
  3. Ciro suggested “script that is in charge of playing the sound will notify the ClosedCaptioningSystem and pass the necessary parameters”. Wouldn’t it be better to register for ‘AudioCueEventChannelSO’ in the ClosedCaptioningSystem and separate it completely from AudioManager?

Game preview:


In my configuration the campfire (loop) is not displayed, because I didn’t know if we would like to have it on the screen all the time. The bard has two singing sounds, long and short, so you can observe different states: ‘La en’ & ‘Lalala en’.

Implementation preview:
Onomatopoeia prefab to instantiate in the sound location:

using System.Collections;
using TMPro;
using UnityEngine;

namespace Assets.Scripts.Audio
{
    public class ClosedCaptioningSystem : MonoBehaviour
    {
        public GameObject OnomatopoeiaPrefab;

        // Spawns a caption at the sound's position and removes it after its duration.
        public void VisualiseAudioClip(Onomatopoeia onomatopoeia, Vector3 position = default)
        {
            var newOnomatopoeia = Instantiate(OnomatopoeiaPrefab, position, Quaternion.identity);
            var onomatopoeiaTextComponent = newOnomatopoeia.GetComponentInChildren<TextMeshPro>();

            // Only set the text when a localization table is assigned.
            if (!string.IsNullOrEmpty(onomatopoeia.SoundText.TableReference))
                onomatopoeiaTextComponent.text = onomatopoeia.SoundText.GetLocalizedString();

            StartCoroutine(DestroyNewOnomatopoeia(newOnomatopoeia, onomatopoeia.Duration));
        }

        // Waits for the caption's duration, then destroys the spawned object.
        IEnumerator DestroyNewOnomatopoeia(GameObject newOnomatopoeia, float duration)
        {
            yield return new WaitForSeconds(duration);
            Destroy(newOnomatopoeia);
        }
    }
}


Lines added in the AudioManager:

What do you think? I will be adding object pooling and refactoring a bit later, but first I wanted to ask about the approach.

This is more interesting than my idea. I tried to do the same on the Canvas UI, with a pointer pointing towards the audio direction (relative to the center of the screen), and you did it in 3D space.

And what about support for sprites?

Anyway, great work.

I think it would be better, because it makes the system more modular. That was my thinking when I was creating my script.

It would be the best and simplest solution.

And by the way, thank you. Your post helped me figure out some stuff. Thanks a lot <3 Great work :smile:

@Harsh-NJ I think that a canvas will be required for the ‘off-screen’ sounds (when the sound is behind you, for example). I didn’t look into this part though. The first part shows the text in the 3D world, and the second will have to notify the player that some sound exists outside of the screen.

Sprite support can easily be added as an image reference on the Onomatopoeia object, next to the Duration and Sound text. However, I wasn’t planning to add it. It should be easy to extend later.

@GoodBoySK Agree with you, it will be more modular in the separate listener class. Good to hear that my post was useful for you! :slight_smile:

I am planning to refactor the code a little bit and add object pooling. Then will share with you guys. Hopefully will have some time soon.

Update: I’ve moved the code into a separate listener class and added object pooling.

Here is the link to my repo in case anyone would like to see how it is done so far:

I added an option to display an AudioCueSO with many AudioClips attached. Previously it was displaying them one on top of another. Initially I thought about just concatenating them, but then realised that two AudioClips can have different durations. In the end I just added a simple shift in the position of each caption.

This is just an example to easily present the case (campfire has two sounds in the loop and they won’t be displayed - at least for now):

I also added settings for the captions to the Language section of the settings:

I added the off-screen indicator to this solution. All scripts were taken from the following video (big thanks to the author of this tutorial!):

I modified them for our purpose and it looks quite cool. There is an arrow pointing to each caption that is out of the camera view. Once we look at the caption, the arrow disappears. I hope someone from the community would like to improve the visual aspect of the caption and the arrow indicator. I took the image from the tutorial's solution just to have something to visualise it. Maybe a sound icon attached to the arrow would do? A few screenshots below:



Brief description of the off-screen solution:
The Caption prefab has an OffscreenTargetIndicator child object (with the arrow image and the TargetIndicator script).
Once the Caption is displayed, I also add it to the captionEmmiters list, which keeps all the captions currently displayed in the game. This list is needed by the Update method of the ClosedCaptioningManger, which takes care of the off-screen indicators (showing/hiding them and updating their position/rotation). When the caption duration ends and the script is about to return the caption object to the pool, the object is also removed from the captionEmmiters list, so the Update method no longer processes it. The arrow also disappears, because it is a child object of the caption prefab (which has already been returned to the pool).
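As a rough illustration of that lifecycle only (this is not the code from the repo), the sketch below keeps the pool and the list of active captions in one engine-independent class. `Caption`, `CaptionTracker`, `Tick`, and the `isOnScreen` callback are hypothetical names; in the actual scripts the captions are pooled GameObjects and the visibility check is done against the camera.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a pooled caption object with its arrow indicator.
public sealed class Caption
{
    public string Text = "";
    public float RemainingSeconds;
    public bool IndicatorVisible;
}

public sealed class CaptionTracker
{
    private readonly Stack<Caption> _pool = new Stack<Caption>();
    // Mirrors the list of currently displayed captions described above.
    private readonly List<Caption> _captionEmitters = new List<Caption>();

    public int ActiveCount => _captionEmitters.Count;
    public int PooledCount => _pool.Count;

    // Takes a caption from the pool (or creates one) and starts tracking it.
    public Caption Show(string text, float duration)
    {
        Caption caption = _pool.Count > 0 ? _pool.Pop() : new Caption();
        caption.Text = text;
        caption.RemainingSeconds = duration;
        _captionEmitters.Add(caption);
        return caption;
    }

    // Called once per frame; isOnScreen is supplied by the caller.
    public void Tick(float deltaSeconds, Func<Caption, bool> isOnScreen)
    {
        // Iterate backwards so that removing an item does not skip the next one.
        for (int i = _captionEmitters.Count - 1; i >= 0; i--)
        {
            Caption caption = _captionEmitters[i];
            caption.RemainingSeconds -= deltaSeconds;

            if (caption.RemainingSeconds <= 0f)
            {
                // Expired: hide the arrow, stop tracking, return to the pool.
                caption.IndicatorVisible = false;
                _captionEmitters.RemoveAt(i);
                _pool.Push(caption);
                continue;
            }

            // The arrow is shown only while the caption is out of view.
            caption.IndicatorVisible = !isOnScreen(caption);
        }
    }
}
```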

I am open to discussing this solution. The link to the Git repo is below. What do you think, guys?


Here is the video of the functionality:


If anyone is interested, I am posting the pull request below:


Oh wow, great implementation! It’s exactly as I imagined it. Obviously the text and the markers can be made more readable, but the functionality is there.

Thanks! I’ll take a look at the PR as soon as we can. Sorry if it won’t be immediately, it’s a feature that was a bit lower on the priority list.
