Is this a functional approach to implementing a custom scripting language inside Unity?

I’m working on an RPG, and I’d like to give our writers the ability to arbitrarily execute game logic, so that they can move NPCs, change quest flags, or otherwise manipulate the game state from within dialogues. I’ve come up with and implemented an incredibly bare-bones scripting language which allows this, but I’m literally inventing things as I go, and since this involves an awful lot of token-parsing and string comparison, it’d make me feel a lot better if I could get some feedback on my design. It’s worked perfectly thus far, but since it’s using C# to directly compare strings instead of making use of a dedicated lexer generator and parser generator, I’m concerned that there are tons of hidden gotchas just waiting to cause problems under the surface.

“Scripting commands” in my system start out as a generic string that gets sent to a scripting interpreter. The interpreter then has three jobs, in order: it splits the string into a space-delimited string[], checks whether the result represents a well-formed command, and if it does, uses a service locator to find the appropriate game APIs and executes the requested logic.

Determining if the string is well-formed is a two-step process. The first word in any command must identify what kind of logic it executes: SpawnCharacter, MoveCamera, AddItem, and so forth. I have a CommandType enum containing every acceptable type of command, so I test the first word by trying to convert it from a string to an enum. If it converts cleanly, and the interpreter has logic accompanying that command type, it runs a second check on the remainder of the string which is different for each command type. SpawnCharacter commands, for instance, must have exactly two words following “SpawnCharacter,” and of those two words the first must match a valid character ID, and the second must match a valid spawn point.

If all of the above conditions are met, the string gets handed to some ExecuteSpawnCharacter(string command) method which knows how to turn the string into an ID + spawn point and invoke the spawn manager to actually place the desired guy in the desired location. If any of the above conditions are not met at any time, the entire test fails upwards, and the scripting interpreter rejects the string.
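For anyone skimming, the split / enum check / per-command check described above could look roughly like the sketch below. Only CommandType and SpawnCharacter come from my description; ScriptInterpreter, the two HashSets, and the sample IDs are stand-ins invented for illustration (the real lookups go through the service locator). One gotcha worth noting: Enum.TryParse happily accepts numeric strings like "7", so this sketch checks the name with Enum.IsDefined instead.

```csharp
using System;
using System.Collections.Generic;

public enum CommandType { SpawnCharacter, MoveCamera, AddItem }

public static class ScriptInterpreter
{
    // Hypothetical stand-ins for data that would really come from game services.
    static readonly HashSet<string> CharacterIds = new HashSet<string> { "Bob", "Alice" };
    static readonly HashSet<string> SpawnPoints = new HashSet<string> { "TownGate", "Inn" };

    // Returns true and fills 'type' and 'args' only if the command is well-formed.
    public static bool TryParse(string command, out CommandType type, out string[] args)
    {
        type = default(CommandType);
        args = null;
        if (string.IsNullOrWhiteSpace(command)) return false;

        // Job 1: split into space-delimited words.
        string[] words = command.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Job 2a: the first word must be the exact name of a CommandType member.
        if (!Enum.IsDefined(typeof(CommandType), words[0])) return false;
        type = (CommandType)Enum.Parse(typeof(CommandType), words[0]);

        // Job 2b: per-command validation of the remaining words.
        switch (type)
        {
            case CommandType.SpawnCharacter:
                // Exactly two arguments: a known character ID, then a known spawn point.
                if (words.Length != 3) return false;
                if (!CharacterIds.Contains(words[1])) return false;
                if (!SpawnPoints.Contains(words[2])) return false;
                break;
            default:
                // No validation logic registered for this command type yet.
                return false;
        }

        // Job 3 (execution) would take these already-validated arguments.
        args = new[] { words[1], words[2] };
        return true;
    }
}
```

Handing the validated arguments to the execute step, rather than the raw string, would avoid splitting the same command twice.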

That’s all there is to the entire system. I like that it’s fairly extensible: adding a new command is as simple as adding a new entry to my CommandType enum, plus a function in the interpreter to handle that command type. I hate that actually sanitizing incoming strings involves tons and tons of converting strings to enums and comparing string to string (e.g. if (characterID == "Bob")). The degree to which this ties the scripting interpreter into the rest of the game’s architecture is also a little worrying, but since I access all game functionality through public interfaces retrieved via a service locator, and not through direct references, even that is technically decoupled.

So am I missing any big problems with this approach? It is so clumsy and frankenstein-y compared to a properly-designed scripting system that uses purpose-built lexers and parsers, but it doesn’t have any performance issues (I can interpret 1,000 commands/tick with no lag, in gameplay it’ll be interpreting 3-5 commands once or twice every few minutes). My instinct is to shrug and say “It’s technically not great, but if it works, it’s modular, and it’s extensible, that means it’s good enough,” but since I have very little formal software training I’m concerned that this is an architectural implosion waiting to happen.

One of the biggest problems with interpreting logic at runtime like this is that you often generate large amounts of garbage memory from all the string parsing. That results in more frequent GC calls, and as we know in Unity land… the garbage collector is very lacking and slow. This causes frame stutter.

It sounds to me like you have text files for the dialogue scripts, and that your writers are sprinkling these commands in there. (if it’s not a text file, and it’s through a script and the inspector editor… well, there’s completely different ways to deal with what you want).

I would suggest PRE-PARSING the entire script.

Instead of having a text asset, you have a custom asset (ScriptableObject), and you write an editor script that parses the text file into the contents of this custom asset.

It breaks the text up into its dialogue chunks, and it converts any string commands to special token objects (your enum and a parameter collection, for instance) that get serialized into the asset.

This way at runtime you just check if this next block is dialogue, or a command. If it’s a command you just invoke it, the parsing is already done.
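A minimal sketch of what those pre-parsed blocks could look like, written as plain C# so it runs outside Unity. Everything here is hypothetical naming (ScriptBlock, ScriptPreParser), and the rule that a command line starts with '#' is just an assumption for the example. In Unity, the resulting list would live on a ScriptableObject and the Parse call would run from an editor script at import time.

```csharp
using System;
using System.Collections.Generic;

public enum CommandType { SpawnCharacter, MoveCamera, AddItem }

// One entry in the pre-parsed script: either a dialogue chunk or a command token.
[Serializable]
public class ScriptBlock
{
    public bool isCommand;
    public string text;          // dialogue text when isCommand is false
    public CommandType command;  // valid only when isCommand is true
    public string[] parameters;  // command arguments, already split
}

public static class ScriptPreParser
{
    // Intended to run once at edit/import time, not during gameplay.
    // Assumed convention: a line starting with '#' is a command, anything else is dialogue.
    public static List<ScriptBlock> Parse(IEnumerable<string> lines)
    {
        var blocks = new List<ScriptBlock>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;

            if (line[0] != '#')
            {
                blocks.Add(new ScriptBlock { text = line });
                continue;
            }

            string[] words = line.Substring(1).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            // Failing here means a typo is caught at import time instead of mid-cutscene.
            if (words.Length == 0 || !Enum.IsDefined(typeof(CommandType), words[0]))
                throw new FormatException("Unknown command: " + line);

            var parameters = new string[words.Length - 1];
            Array.Copy(words, 1, parameters, 0, parameters.Length);
            blocks.Add(new ScriptBlock
            {
                isCommand = true,
                command = (CommandType)Enum.Parse(typeof(CommandType), words[0]),
                parameters = parameters
            });
        }
        return blocks;
    }
}
```

At runtime the loop then only has to look at isCommand and dispatch on the enum; no strings get split or compared while the game is running.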

If you don’t want to pre-parse, let me suggest that you come up with a less garbage-intensive way to do comparisons on the string.

To give you an example, here is a simple arithmetic parser I wrote a long while back that parses a string into a math problem. It supports passing in an object as an optional parameter, and also has multiple math functions:

Note that in it I use a special class I wrote called ‘ReusableStringReader’:

And when I parse, I parse one ‘char’ at a time. Since ints and chars are value types, they don’t generate garbage. I then build up the commands with a StringBuilder (StringBuilder allocates a chunk of memory and sticks chars into it, rather than creating a new string with every append, further reducing garbage), keeping my creation of strings to a minimum overall (note how I parse my numbers digit by digit, rather than creating a string and using float.Parse).
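To illustrate just the digit-by-digit idea (this is a fresh sketch, not the linked parser, and CharParsing / TryReadNumber are made-up names), here is a number reader that walks the chars of an existing string and allocates nothing. It assumes plain decimal input with an optional sign and no exponent, and it will be slightly less precise than float.Parse for long fractions.

```csharp
public static class CharParsing
{
    // Reads an optionally signed decimal number from 's' starting at 'index'.
    // On success, 'index' is advanced past the number; on failure it is left unchanged.
    // Only value types are touched, so no garbage is created.
    public static bool TryReadNumber(string s, ref int index, out double value)
    {
        value = 0;
        int i = index;

        bool negative = false;
        if (i < s.Length && (s[i] == '-' || s[i] == '+'))
        {
            negative = s[i] == '-';
            i++;
        }

        double mantissa = 0;  // all digits read so far, as if there were no decimal point
        double divisor = 1;   // 10^(number of digits after the decimal point)
        int digits = 0;

        while (i < s.Length && s[i] >= '0' && s[i] <= '9')
        {
            mantissa = mantissa * 10 + (s[i] - '0');
            i++;
            digits++;
        }

        if (i < s.Length && s[i] == '.')
        {
            i++;
            while (i < s.Length && s[i] >= '0' && s[i] <= '9')
            {
                mantissa = mantissa * 10 + (s[i] - '0');
                divisor *= 10;
                i++;
                digits++;
            }
        }

        if (digits == 0) return false; // no digits at all, e.g. "abc" or a lone "-"

        value = mantissa / divisor;
        if (negative) value = -value;
        index = i;
        return true;
    }
}
```

The same walk-an-index pattern works for command keywords too: compare chars in place against a known keyword instead of calling Split or Substring.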


This is one of those topics that inevitably come up for dialogue systems, which is why Dialogue System for Unity includes a Lua scripting environment. I would just like to add, though, that while it might seem like a fantastic idea to give more control to non-programmers, you’ll end up reaching a point where you need to direct a scene with more precise control than the simple Lua commands can give. You can say “move 5 steps to the right” or “go to waypoint 3, turn towards character A”, but in order to REALLY direct a scene using those kinds of commands, you’re going to have dozens of them, one after another, and it’s going to become unrealistic for the dialogue writers to properly visualize the scene in enough detail to get a decent result. Worse, when it’s time to go in and fine-tune the results, your own system will be less “helpful” and more “in the way”.

It’s definitely a thing you can do, and many have preceded you in that direction, but be careful not to rely on it too heavily, and leave yourself a way to completely override sets of scene/dialogue commands in a non-destructive way that won’t just get wiped out when it comes time to update your dialogue files again. If you are going to go through with it, then do it with Lua, or even just parse XML documents or something similar into pre-made object types (Dialogue Nodes, Command Nodes, etc.), and don’t parse text as literal function names to use in reflection, or allow C# scripts that are compiled at runtime, or anything like that. Use the keyword approach: translate keywords like “CameraMove” into command objects, and just queue up the command objects to run in sequence. Give only as much control as absolutely necessary to get the result you need.
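As a rough sketch of that keyword approach: a fixed whitelist maps each keyword to a builder that produces a command object, and the objects are queued and run in order. All names here (ISceneCommand, SceneState, CommandFactory, CommandRunner) are hypothetical, and SceneState is just a stand-in for whatever game interfaces the commands would really talk to.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the game state that commands are allowed to touch.
public sealed class SceneState
{
    public string CameraTarget = "None";
    public readonly List<string> Animations = new List<string>();
}

public interface ISceneCommand { void Execute(SceneState scene); }

public sealed class CameraMoveCommand : ISceneCommand
{
    readonly string target;
    public CameraMoveCommand(string target) { this.target = target; }
    public void Execute(SceneState scene) { scene.CameraTarget = target; }
}

public sealed class CueAnimationCommand : ISceneCommand
{
    readonly string actor, animation;
    public CueAnimationCommand(string actor, string animation) { this.actor = actor; this.animation = animation; }
    public void Execute(SceneState scene) { scene.Animations.Add(actor + ":" + animation); }
}

public static class CommandFactory
{
    // Whitelist of keyword -> builder. Anything not listed is rejected; no reflection involved.
    // A builder returns null when the argument count is wrong.
    static readonly Dictionary<string, Func<string[], ISceneCommand>> Builders =
        new Dictionary<string, Func<string[], ISceneCommand>>
        {
            { "CameraMove", a => a.Length == 1 ? new CameraMoveCommand(a[0]) : null },
            { "CueAnimation", a => a.Length == 2 ? new CueAnimationCommand(a[0], a[1]) : null },
        };

    public static bool TryCreate(string keyword, string[] args, out ISceneCommand command)
    {
        command = null;
        Func<string[], ISceneCommand> build;
        if (!Builders.TryGetValue(keyword, out build)) return false;
        command = build(args);
        return command != null;
    }
}

public static class CommandRunner
{
    // Runs queued commands strictly in the order they were enqueued.
    public static void RunAll(Queue<ISceneCommand> queue, SceneState scene)
    {
        while (queue.Count > 0) queue.Dequeue().Execute(scene);
    }
}
```

Because writers can only reach what is in the dictionary, adding control is a deliberate act: one new class and one new entry.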


^^

This, this so much.

I hadn’t thought about the GC aspect of this at all, that’s a really great consideration. It’s a bit less dire than it could be based on my system, but I think there’s a lot of room to make it leaner based on the examples you linked me to, so thank you!

If it’s at all useful or illustrative, the way I have my thing set up is through JSON files, which get imported as textassets. I’m working really hard to write a disciplined, data-driven approach to cutscene and content curation, so that once it’s finished we can write game content without tying ourselves to the engine, or having to use Unity’s editor as a cutscene/content creation system. I use Inky to actually write dialogue and cutscenes, and export them as a JSON file. Scripting commands are each marked by an escape character in Inky, so they aren’t included as part of the body text.

On Unity’s end I drag each cutscene/dialogue in as a textasset, and when the game starts up it checks the cutscene folder and builds a Dictionary<cutsceneID,TextAsset> of everything in said folder. When the cutscene manager is asked to play some given ID it parses the corresponding TextAsset into a custom Cutscene class which essentially contains a list of each line of text to display, a corresponding list of metadata/scripting commands (stored as strings), and optionally any additional data we’d want to match to the line reads in the future, like audio file refs.

Parsing the textasset into that cutscene file is by far the biggest string-handling operation this system does, and honestly I could probably save a lot of performance by pre-parsing the entire cutscene database into data at the start of the game and serializing it as part of the save file, instead of parsing it on-demand. Thus far I’m only not doing that because GC hasn’t become a problem yet, and I don’t want to bloat the save files as part of a premature optimization.

As a result of all of the above, when it actually comes time to parse strings into scripting commands, I’m actually handling a fairly small amount of text; in the test cutscenes we’ve written to give the system a thorough shakedown, we’re finding that a typical cutscene will contain 5-10 3-word strings to be parsed into scripting commands per page of display text.

This is definitely great advice, thank you! I should also definitely clarify, I’m not expecting to get high-fidelity, cinematic cutscenes out of this system; our game is extremely reading-heavy, the rough draft of the script is pushing 200,000 words, and I expect a bit more bloat even after the main editing pass. Most of this is delivered through talking to individual characters in the overworld, or through short cutscenes dotted throughout the game that are a lot more like visual novels than something like the Witcher; the vast majority of these scenes are spent in one or two cameras, and the only time we need anything super-complex, like groups of characters moving in tandem, is to open or close the scene. For the bulk of each sequence, you’re looking at idling characters who just need to emote on cue so it’s obvious who’s talking. From the writer’s perspective, this results in every line looking something like this, which I think is much more reasonable for a writer to produce in quantity:

Black Knight: Come back here, I’m not done with you! #CueAnimation BlackKnight Jump
King Arthur: You’ve got no legs left, you crazy jerk. #CueAnimation KingArthur Annoyed
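
For illustration only, a line shaped like the two above could be split into speaker, display text, and command with a couple of IndexOf calls. This is a sketch with made-up names (DialogueLine, DialogueLineParser) that assumes the raw "Speaker: text #Command args" shape shown here, with the first ':' ending the speaker name and the first '#' after it starting the command; the actual Inky JSON export is structured differently and is not modeled.

```csharp
public struct DialogueLine
{
    public string Speaker;
    public string Text;
    public string Command; // empty when the line carries no command
}

public static class DialogueLineParser
{
    public static bool TryParse(string line, out DialogueLine result)
    {
        result = default(DialogueLine);
        if (line == null) return false;

        // The speaker name is everything before the first ':'.
        int colon = line.IndexOf(':');
        if (colon <= 0) return false;

        // The command, if any, starts at the first '#' after the speaker name.
        int hash = line.IndexOf('#', colon + 1);
        int textEnd = hash >= 0 ? hash : line.Length;

        result.Speaker = line.Substring(0, colon).Trim();
        result.Text = line.Substring(colon + 1, textEnd - colon - 1).Trim();
        result.Command = hash >= 0 ? line.Substring(hash + 1).Trim() : "";
        return result.Speaker.Length > 0;
    }
}
```

Note that display text containing a literal '#' would be cut short under this assumption, so a real version would need an escaping rule.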

These days, if you’re doing cutscene-like things for a dialogue, I’d recommend using Timeline instead of some kind of custom command setup. So if you want to make things move when they’re talking, set up a timeline asset and link it to the dialogue file.


This may be an incredibly obvious question for anyone who’s spent a lot of time working with Timeline, but what’s the workflow to get individual dialogue lines and the timeline’s animations working together? The main reason I haven’t looked at using Unity’s internal tools for cutscenes is because I don’t want to get stuck in a logistics nightmare where I’m manually calling dozens of timeline-specific flags by name in each dialogue.

Edit: Oh, to clarify, my system is very classic JRPG text: the cutscene is moved forward by your mashing through dialogue, if you stop advancing and just sit on your current speech bubble, everybody in the scene continues to idle. Each time you advance one line, whatever new animation/sound/logic that is supposed to execute plays, then everyone goes back to idling.