I think a fairly common solution in Unity is to perform the selection in screen space.
There are at least a couple of ways to respond to mouse events; one would be to check for these events in an Update() function using Input.GetMouseButtonDown(), Input.GetMouseButtonUp(), and Input.mousePosition.
When a mouse ‘button down’ event is detected, set a ‘marquee’ variable to ‘true’ to indicate that you’re now in marquee selection mode, and save the current mouse position. When a mouse ‘button up’ event is detected, leave marquee selection mode.
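As a rough, engine-independent sketch of that state tracking (written in Python rather than as a Unity script): the `MarqueeState` class and its `update()` parameters are hypothetical names for illustration. In Unity, the two flags and the position would come from Input.GetMouseButtonDown(), Input.GetMouseButtonUp(), and Input.mousePosition inside Update().

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class MarqueeState:
    """Hypothetical helper that tracks whether a marquee drag is in progress."""
    active: bool = False
    start: Optional[Point] = None  # mouse position saved at 'button down'

    def update(self, button_down: bool, button_up: bool, mouse_pos: Point) -> None:
        """Call once per frame with this frame's mouse events and position."""
        if button_down:
            # Enter marquee selection mode and remember where the drag began.
            self.active = True
            self.start = mouse_pos
        if button_up:
            # Leave marquee selection mode; the saved start is kept but unused.
            self.active = False
```

Calling `update()` once per frame keeps `active` true for the whole drag, so the per-frame selection step only needs to check that flag and read `start`.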
While in marquee selection mode, for each update, create a 2D axis-aligned bounding box (AABB) from the saved mouse position and the current mouse position, taking the min and max of each coordinate, since the drag can go in any direction. Then, use Camera.WorldToScreenPoint() to find the screen-space positions of all objects of interest. Deselect all objects, and then select any for which the screen-space position is within the AABB.
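The per-frame selection test could look something like the sketch below, again in plain Python rather than as a Unity script. The function names and the dictionary of object names are hypothetical; the positions stand in for what Camera.WorldToScreenPoint() would return for each object. Skipping points with a non-positive z is an extra assumption on my part, meant to ignore objects behind the camera.

```python
from typing import Dict, Set, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def make_rect(a: Point2, b: Point2) -> Tuple[float, float, float, float]:
    """Build (min_x, min_y, max_x, max_y) from two corners in any order."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[0], b[0]), max(a[1], b[1]))


def select_in_marquee(start: Point2, current: Point2,
                      screen_positions: Dict[str, Point3]) -> Set[str]:
    """Return the names of objects whose screen position lies in the marquee.

    screen_positions maps a (hypothetical) object name to its screen-space
    (x, y, z), where z is assumed to be the distance in front of the camera.
    """
    min_x, min_y, max_x, max_y = make_rect(start, current)
    selected: Set[str] = set()  # starting empty is the 'deselect all' step
    for name, (x, y, z) in screen_positions.items():
        if z <= 0:
            continue  # assumed to be behind the camera, so never selectable
        if min_x <= x <= max_x and min_y <= y <= max_y:
            selected.add(name)
    return selected
```

Rebuilding the selected set from scratch each frame mirrors the 'deselect all, then select' step, which keeps the logic simple at the cost of some redundant work.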
There are various enhancements and possible optimizations you could introduce, but that’s the basic idea.
Like many others, I also came to Unity from a more traditional ‘low-level’ background, and although the Unity architecture does seem to throw some people, my own opinion is that there’s nothing particularly mysterious about it. Unity provides a fairly easy-to-understand framework that handles the game loop, update cycles, and so on (the kind of stuff you’d typically code by hand if you were, say, writing a game from scratch in C++), and then lets you hook into it using game object behaviors. Because the basic elements of the system (input, object transforms, etc.) are exposed, you can implement just about anything you can think of via script components (within the limits of the engine itself, of course, although that’s the case with any engine).
If you haven’t already seen it, here is a page from the Unity docs that summarizes some methods for accessing game objects and scripts attached to those objects (this seems to be one of the things that frequently throws people).