I was working on a tool for Unity and the script was becoming fairly long (2000+ lines) and I was wondering how I should make code ‘maintainable’.
Let’s say I have 4 structures in my code.
Code that creates the editor UI for the tool
Code that creates a mesh
Code that processes the mesh
Code that saves the mesh
Now, all these structures are in the same script, would it be better to create separate scripts for them?
Do I create a namespace with 3 classes in them (create, process, save) and then in my main script (the editor UI script) I write a ‘using’ statement? Or how do people usually make there code more maintainable, is it good practice to create a new script for a big function?
I think it’s definitely good to separate code into different scripts along clear boundaries. For example when writing an editor extension I put the UI design information (colors, styles etc) in a separate static script. Otherwise it clutters up the script and obscures information about the functionality.
Unless your mesh generation is very tightly coupled to everything else leading up to it, I think it should definitely be separated.
Saving - yeah why not? The less irrelevant stuff in the main script, the clearer your view of what’s going on.
Personally, I don’t like to make scripts that go for more than 400-500 lines if I can help it, and I really dislike navigating scripts with several thousand lines, especially if it’s less than perfectly organised. The more modular and separable the code, the more enjoyable it is to work with.
In developing with such a language like C#, it’s generally acknowledged that it’s better if you design your class that it handles only a single, well defined responsibility or a role.
Then you can compose those small classes to handle larger and more complex tasks, not unlike how you attach various components to a game object to add more features in Unity.
Sometimes it’s called SRP(Single Responsibility Principle), and I think it’s quite a sane rule to follow even if you are not strictly following OOP.
2,000 lines is a lot. There are some extreme cases where 2k lines in a single class can still be correct, but they’re definitely extreme.
As @mysticfall says, it’s generally best to isolate code into discrete ‘responsibilities’ or ‘concerns’. It get’s a little tricky sometimes figuring out what exactly a ‘responsibility’ is, and there are really no concrete rules here.
In general, I like my files to be around 100 lines each. In general, this is enough code to really get work done, but not so much that it becomes hard to navigate.
Often, very math/data processing heavy components can get pretty verbose, Mesh generation/manipulation code is some of the worst. I have a single method that’s around ~150 lines for simply generating meshes from splines. I’ve seen others writing more complex mesh editing with methods that can hover around 200-300 lines (and are pretty reasonable given the context).
So in general, mesh gen code will tend to be more lines per class than almost any other kind of process, there’s simply a ton of processing you need to do. That said, it sounds like you’re frustrated with your code (and that’s the biggest sign that something is off).
At a certain point you simply need to use common sense. At absolute minimum, the ‘editor gui’ code should probably live elsewhere. I would also recommend considering the use of the ‘partial’ keyword to at least split code into different files.
You need to look up some coding principles. One core principle is SRP, Single Responsibility Principle
It basically says that your methods (and classes) should do one thing and one thing only and do it well. Adhering to this principle keeps code maintainable because it keeps logic that is not concerned with one another separate .
Also use name spaces properly. This can help organise your code into logical sections that help with ‘at a glance’ organisation. Name spaces give context to the code.
Take a look at this book:
It’s my go to book on code organisation. If you have 2000 lines of code in one .cs file, thats TOO much. And it will quickly become unmanageable.
That said, yeah, that’s also expert level code and fundamental GUI elements are their own unique beast. If you haven’t been writing code for at least a decade and you have a class that’s 2k lines, something is wrong.
MeshGen is also a kind of unique beast, but you should still look very critically at any method over 100 lines, and hyper critically at any method entering the 200-300 line range.
I don’t think 200-300 lines is too much of a problem. I would say 500-1000 is where it starts becoming an issue. I can fairly easily scroll down 400-500 lines of code and find whatever I want in a few seconds, as long as it’s formatted properly.
Also, mesh generation can easily become verbose. Especially when the shape is non-geometrical and contains a lot of non-mathematical design rules, you can easily find yourself having to hard-code verts/triangles/uvs in a non-reusable way - that’s probably a design issue as well though not an easy one to fix.
But in the end, I think it comes down to OOP principles and whether the function is doing multiple clearly separable operations, that’s when it would be best to separate it.
When functions are between 500-1000 we’ve left the realm of programming and entered the realm of straight up dirty hacking.
If you’re working solo on something and you’re cool with the code, then great, but if someone is asking for tips on how to better organize their work a 500+ line function means that something is very wrong.
If you can’t break down a 500 line method, then chances are very good that the problem is higher level and the question isn’t so much
Why exactly? Sure there’s a point where the readability starts to decrease (around the 400-500 mark imo), but I wouldn’t say that anything above 500 is ‘dirty hacking’. There are a lot of cases (probably mesh generation being one) where you’d like to have a lot of different operations in their own method, that would be too small to warrant a script of their own - and which might easily add up.
Imo it’s really a question of whether there’s multiple tasks going on in the same script that you wouldn’t need to debug at the same time, that’s when things start to clutter.
Thanks for all the answers! So it seems having scripts that are over a thousand lines is certainly not the way to go.
I have a script for an editorwindow, and I would say the code that makes the editor ‘pretty’ is taking up a good chunk of that space. The other chunk is used for mesh generation code. Currently I’m creating several small scripts for things like saving the mesh, exporting the mesh. I certainly need to work on a module/component based approach so that in the future, if I for example want to improve the saving system, I can just work in a 100 line script without having to worry about other things.
When you name your methods, name them in such a way as there can no mistake as to what they do. That will provide context for the code. If your code starts moving away from the method names remit, then you should think about creating a new method to help out.
Think of it like this. You go to McDonalds and order a BigMac. You have asked the guy taking your order that. He just puts the order into the computer, which is his responsibility, he doesn’t make the food. The creation of your BigMac is the responsibility of the kitchen staff, which in turn delegates responsibility to others, depending on the tasks required. Think about what steps are required and break down the logic into smaller parts, which are manageable and independent.
Good abstraction leads too neat and manageable code. However be careful not to over do it. Get to the point you are happy with the level of complexity and keep it that way.
So the idea of ‘single responsibility’ also applies to functions right. Different methods and functions should each have a single ‘concern’ and action they perform.
I’m the first one to say that ‘responsibility’ is pretty subjective and that it varies from subject to subject. But at the same time, there are certain thresholds where it’s hard to imagine a function is acting within ‘sane’ bounds.
To put this in perspective, on your average monitor using in 12 pt text.
500 lines is a function that is around 15 screens long.
1000 lines is a function that is around 30 screen lengths.
On a printed page, we’re talking about around 25 or 50 pages of text.
You can’t even use IDE functions to navigate functions of that length other than raw bookmarking features or text searching.
Forget about “good practices” as a raw matter of practicality, dealing with a single function that long is difficult and arduous. There are very, very few operations that one might need to perform that require that much code that makes sense to live in a single method.
If it seems that you can’t break it down, then most likely the problem is higher up - as in - the goal is probably too broad and needs to be divided into smaller chunks for practicality if nothing else.
If for example, if you need to hardcode 500 verts, the problem might be that there may be better ways to represent the data than 500 hardcoded verts inline with the processing. Maybe you can pack the verts elsewhere into a full mesh then just read the verts (or offsets) in. Maybe the vert declaration could be a function. Maybe just an array declaration.
Maybe the hardcoded verts should actually be it’s own class. Maybe with a set of methods that allow you to manipulate subsections or duplicate and offset.
There are always exceptions and corner cases, but there are so few where a method of 500+ lines is defensible. Honestly at 2-300 you should really be questioning your approach to the problem.
For the record, I am not against “Dirty Hacks” entirely. I have written plenty of my own, and I have spewed out plenty of garbage code. Filthy Hacks have their place, but that doesn’t mean we can’t aim to do better.
In ‘normal code’ anything over than 1 screen is generally a sign of a big big mistake.
In mesh gen I’ve seen functions at around 300 lines be defensible, as in, you can make a good argument for why he did that.
Generally though, even working with meshes, approaching 100 lines, in a function, you gotta seriously ask yourself “am I doing something dumb?” chances are, the answer is yes.
So what do you guys suggest for Editor GUI code? I use a lot of lines like ‘EditorGuiLayout.Color’ or ‘EditorGuiLayout.Vector3’ and it’s taking up tons of space, is this inevitable?
Editor code varies a lot. It’s much harder to generalize about than something more directed like mesh code.
It’s easy to read simple statements, so if you just have tons of simple one liner statements that’s easy to digest.
If you’re doing more complex stuff, or there’s a bunch of logic or loops or whatever, it’s harder to read and you’re generally better off splitting separate logical operations or areas into different methods or classes.
If you just have a list of 150 lines of EditorGuiLayout.Color one after another, that’s easy to understand right, it’s quick to read and figure out. Right? Sure that breaks the 1 screen rule and it breaks the 100 line rule, but man is that code easy to read and understand.
Imagine what that code looks like. It’s super flat. The indent on the left never changes, there are no curly braces. The margin on the right is really pretty flat too. Almost everything is the same length. The code looks like a rectangle. The stuff on screen perfectly matches the code in order. 150 lines of that is no problem to understand.
It’s when you have more and more different kinds of operations or different kinds of logic batched together things get weird.
Plus one for Mesh generation being a $&@#% in terms of lines of code required. All of my classes tend to top out at about 100 lines, with the exception of server interactions and Mesh generation.
Some random thoughts:
On splitting the class - This should be done if the logic can be effectively split into self contained units. At the same time a class should be big enough to encapsulate all of its logic. I find jumping up and down the screen less painful then jumping through several classes.
On long methods - The first guiding principle should be DRY. If you write the same code twice, make it its own method.
On the other hand I’m torn on breaking up long methods simply because they are long. On the face of it breaking up a long method into half a dozen shorter methods seems to make things more readable. But each time you create a new method, you create a new context. You can no longer gaurentee the state of your program when the method is called. Which ultimately adds more cognitive load onto the developer.
Like most things the answer is somewhere in the middle with a strong dose of ‘it depends’.
I actually go back and forth here too. Sometimes it absolutely makes sense to keep everything in a single method and just either wrap bits in regions or mark them with big fat comments.
The thing is, there is different kinds of cognitive load, like if you just want to “get the idea of what’s happening” it’s much nicer to see a couple method calls with nice friendly names that give you a quick overview.
If you need to get into the precise details and the nitty gritty, it’s easier to have it all in a single block. Having to ‘jump around’ to read code adds considerable fatigue.
Still, super hard to figure out sometimes. Regions are a nice middle ground, but that’s a matter of personal/project preference and regions within a method might not be everyone’s cup of tea.