Linux editor stuck on loading because of bee_backend w/ workaround

So, after creating a new HDRP project with Unity 2021.1.19f1 on my recently installed Linux box, I found that the editor got stuck at random places in the launch process.

After a bit of investigating, I found that the culprit was bee_backend, and more specifically the --stdin-canary flag, which was preventing the process from exiting on completion.

The last line in the editor log is the following:
Starting: /......../unity/editors/2021.1.19f1/Editor/Data/bee_backend --dagfile="Library/Bee/2400b0aE.dag" --profile="Library/Bee/profiler.json" --continue-on-failure --stdin-canary ScriptAssemblies
WorkingDir: /......../unity/my-lovely-project

Forcefully killing the process caused build errors even though the tasks had completed, so instead I found a neat, temporary solution until the problem gets resolved: put a shell script in the middle that removes --stdin-canary and forwards all other args to the real bee_backend executable.

In case someone is having the exact same issue, here are the steps I took to make it work:
- go to your unity editor folder, then in the Data folder
- rename bee_backend to bee_backend_real
- create a bee_backend file, make it executable
- write the following in the newly created bee_backend file

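#!/bin/bash
# Wrapper: drop --stdin-canary and forward everything else to the real binary.

args=()
for arg in "$@"; do
    # Skip the flag that keeps bee_backend from exiting
    [[ "$arg" == "--stdin-canary" ]] && continue
    args+=("$arg")
done
# Quote the path so editor folders containing spaces still work;
# exec replaces this shell, so the exit code is the real backend's.
exec "${0}_real" "${args[@]}"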

(note, this is a shell script and will be executed on your machine. Please don't put random code you don't understand on your machine)
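
The manual steps above could also be scripted, roughly like this. It is only a sketch: the function name `install_canary_wrapper` is made up for illustration, and you have to pass the path to your own editor's Data folder.

```shell
# Sketch: install a wrapper that strips --stdin-canary (hypothetical helper name).
# Usage: install_canary_wrapper /path/to/Editor/Data
install_canary_wrapper() {
    data_dir="$1"
    # Refuse to run twice, otherwise the wrapper itself would get renamed
    if [ -e "$data_dir/bee_backend_real" ]; then
        echo "bee_backend_real already exists in $data_dir" >&2
        return 1
    fi
    # Step 1: keep the original binary under a new name
    mv "$data_dir/bee_backend" "$data_dir/bee_backend_real" || return 1
    # Step 2: write the wrapper script in its place
    cat > "$data_dir/bee_backend" <<'EOF'
#!/bin/bash
args=()
for arg in "$@"; do
    [[ "$arg" == "--stdin-canary" ]] && continue
    args+=("$arg")
done
exec "${0}_real" "${args[@]}"
EOF
    # Step 3: make the wrapper executable
    chmod +x "$data_dir/bee_backend"
}
```

To undo it, delete the wrapper and rename bee_backend_real back to bee_backend. The steps have to be repeated after every editor upgrade, since each editor version ships its own Data folder.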

Hey @neamtim. Thanks for the good, in-depth analysis and debugging!

The stdin-canary mechanism in the bee backend is used for the editor to communicate with the backend. In current versions of Unity, its only purpose is to cleanly stop the build when the user hits cancel on a progress bar. In the future we will communicate more data between the editor and the bee backend, but in doing so we also ran into issues on Linux, so we are replacing the whole mechanism with domain socket IPC, which will likely make this problem go away in the future.

That said, if we have problems running current Unity on Linux, we should fix those - we have not seen any such issues with the current setup in testing. Does this happen for you every time script compilation runs or just randomly on some occasions? Does it happen on any empty new project, or is it project specific? And can you share your system setup? Then we can have someone in our QA try to reproduce this.

I use game-ci + GitHub Actions with self-hosted runners to build my project and have a similar-looking issue with Unity 2021.2.x.

GitHub Actions fails the run with the message:

The hosted runner: Hosted Agent lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

The last lines written by Unity are:
```
....
2021-12-16T21:20:36.3536601Z Starting: /opt/unity/Editor/Data/bee_backend
--profile="Library/Bee/backend_profiler3.traceevents"
--stdin-canary --dagfile="Library/Bee/Player43e62af5.dag"
--continue-on-failure --dagfilejson="Library/Bee/Player43e62af5.dag.json" Player
2021-12-16T21:20:36.3539849Z WorkingDir: /github/workspace/TheGame

..next build step
2021-12-16T21:26:49.1462513Z Post job cleanup.
```

More than 6 minutes without any log records can make the build process look stuck.

I used a virtual machine with Ubuntu Linux to debug the issue and found that the bee command takes more than 13 minutes to run locally.

Looks like the GitHub runner watchdog is too strict :)

Anyway, 13 minutes without any activity in the logs looks strange. I guess a heartbeat line in the build log every minute could solve this :)
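
Until the editor prints such a heartbeat itself, a CI step could approximate it from the outside. A minimal sketch, assuming the build is launched from a shell step; `run_with_heartbeat` is a hypothetical helper, not part of Unity or game-ci.

```shell
# Sketch: run a command in the foreground and print a heartbeat line
# every <interval> seconds for as long as it is still running.
# Usage: run_with_heartbeat <interval-seconds> <command> [args...]
run_with_heartbeat() {
    interval="$1"; shift
    # Background loop that only writes log lines
    (
        elapsed=0
        while sleep "$interval"; do
            elapsed=$((elapsed + interval))
            echo "[heartbeat] still running after ${elapsed}s: $1"
        done
    ) &
    beat_pid=$!
    # Run the real command and remember its exit status
    "$@" && status=0 || status=$?
    # Stop the heartbeat loop once the command is done
    kill "$beat_pid" 2>/dev/null || true
    wait "$beat_pid" 2>/dev/null || true
    return "$status"
}
```

For example, `run_with_heartbeat 60 ./build.sh` (with `./build.sh` standing in for whatever starts your build) would add one log line per minute. This only keeps the log active; it does not make the build itself any faster.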

It happens when I launch the editor on an empty/new/template project. The editor is stuck on the splash screen, 100% of the time.
I am using openSUSE Tumbleweed (and I have the same issue on two different installs of openSUSE Tumbleweed).
It has happened with every version of the editor since the post. Each time I upgrade the editor version, I simply apply the same process. I have not had an issue with the script (but admittedly, my project is super small).

From what I remember of the issue, the editor closes the pipe and bee_backend doesn't exit? (But it's been a long time since I investigated the thing.)

I’ve got a successful build. The story is not about the watchdog; it is about bee_backend output buffering. I found clang++ hitting OOM in /var/log/syslog. GitHub-hosted runners have 7GB RAM - https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners. A self-hosted runner with 16GB RAM is a must-have in my case :)
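
If you want to check for the same thing on your own runner, something like this could be a starting point. It is a sketch only: `find_oom_lines` is a made-up helper name, and the search patterns are assumptions about how the kernel typically words OOM-killer messages, which can differ between kernel versions and distributions.

```shell
# Sketch: list log lines that look like kernel OOM-killer activity.
# Usage: find_oom_lines [logfile]   (defaults to /var/log/syslog)
find_oom_lines() {
    # Case-insensitive match on wording the OOM killer commonly uses (assumed)
    grep -Ei 'out of memory|oom-killer|oom_reaper' "${1:-/var/log/syslog}"
}
```

Reading /var/log/syslog may require root, and some distributions keep kernel messages elsewhere, so pass the right file for your system.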

Please periodically flush any buffered output from commands (like bee_backend) to the build log. It gives developers more information about build issues on remote machines.

You saved my life, I can’t thank you enough. Please someone give this man a medal.

I can confirm this bug is happening in 2021.3.4f1 LTS and in 2022.1.4f1. Please, Unity guys, we like you so much, solve this for us! We'll buy you a beer!

[quote=“jonas-echterhoff_1”, post:2, topic: 854480]
That said, if we have problems running current Unity on Linux, we should fix those - we have not seen any such issues with the current setup in testing. Does this happen for you every time script compilation runs or just randomly on some occasions? Does it happen on any empty new project, or is it project specific? And can you share your system setup? Then we can have someone in our QA try to reproduce this.
[/quote]

I encountered this bee freezing problem back when I used the openSUSE distro (yes, I know it's not officially supported). On Ubuntu 20.04 it worked fine, and on Alt Linux it worked fine for me too.


Same issue as @Sergey_OneUp_Games also using Game.ci

Can confirm that this is occurring when performing a build using GitHub Actions on a Linux runner when building for Android. It does not happen on my local machine when I perform the build via the editor on a Mac. This is on a "blank" project with nothing in it besides the default. No extra/special packages. Editor version: 2021.3.3f1

Local build is ~3mins. Cloud build: ~15mins

Log:

2022-07-03T03:03:56.2376560Z *** Tundra requires additional run (32.05 seconds), 161 items updated, 667 evaluated
2022-07-03T03:03:56.2377661Z *** Additional run caused by: contents change of /github/workspace/Library/Bee/artifacts/Android/il2cppOutput/cpp
2022-07-03T03:03:56.2379673Z Starting: /opt/unity/Editor/Data/Tools/netcorerun/netcorerun "/opt/unity/Editor/Data/PlaybackEngines/AndroidPlayer/AndroidPlayerBuildProgram.exe" "/opt/unity/Editor/Data/PlaybackEngines/AndroidPlayer/Bee:/opt/unity/Editor/Data/Tools/BuildPipeline" "Library/Bee/Player0857283b.dag.json" "Library/Bee/Player0857283b-inputdata.json" "Library/Bee/buildprogram2.traceevents"
2022-07-03T03:03:56.2381141Z WorkingDir: /github/workspace
2022-07-03T03:03:56.9272142Z ExitCode: 0 Duration: 0s712ms
2022-07-03T03:03:56.9286244Z Starting: /opt/unity/Editor/Data/bee_backend --profile="Library/Bee/backend_profiler3.traceevents" --stdin-canary --dagfile="Library/Bee/Player0857283b.dag" --continue-on-failure --dagfilejson="Library/Bee/Player0857283b.dag.json" Player
2022-07-03T03:03:56.9291787Z WorkingDir: /github/workspace
2022-07-03T03:11:20.8181532Z ExitCode: 0 Duration: 7m:23s
2022-07-03T03:11:20.8197839Z Finished compiling graph: 1352 nodes, 3586 flattened edges (3580 ToBuild, 8 ToUse), maximum node priority 814

Not great, considering every minute of stalled time in a cloud build is time wasted. @jonas-echterhoff_1
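
To spot stalls like the one in the log above without comparing timestamps by hand, something like this could help. A sketch, assuming GNU `date` and log lines that start with an ISO-8601 timestamp followed by a space, as in the GitHub Actions log; `largest_log_gap` is a hypothetical helper name.

```shell
# Sketch: print the largest gap (in whole seconds) between consecutive
# timestamped lines of a log file, and the line that ended the gap.
# Usage: largest_log_gap <logfile>
largest_log_gap() {
    prev=""; max=0; max_line=""
    while IFS= read -r line; do
        ts="${line%% *}"
        # Only consider lines starting like 2022-07-03T03:03:56...
        case "$ts" in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T*) ;;
            *) continue ;;
        esac
        # Drop the trailing Z and fractional seconds, then parse as UTC
        ts="${ts%Z}"; ts="${ts%%.*}"
        now="$(date -u -d "${ts}Z" +%s 2>/dev/null)" || continue
        if [ -n "$prev" ] && [ $((now - prev)) -gt "$max" ]; then
            max=$((now - prev))
            max_line="$line"
        fi
        prev="$now"
    done < "$1"
    echo "${max}s before: ${max_line}"
}
```

Because fractional seconds are dropped, the result can be off by up to a second, which does not matter for gaps of several minutes.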

I am using this automated build system: https://game.ci/
bee_backend is taking too much time, which is not OK on CI.

In headless builds, cancel button behavior should not result in a hanging build.

This has not been an issue for us in 2020 LTS, but since 2021 all our headless builds fail after a 60 minute timeout after this command.

Is there any chance this could be fixed in the next LTS update?

I am seeing the same thing!

I just upgraded a project from 2020.1.15f1 to the latest 2021.3.8f1 LTS and I am hitting this problem. I am using game.ci with a linux ubuntu runner and it stalls on bee_backend

*** Tundra requires additional run (42.88 seconds), 446 items updated, 2564 evaluated
*** Additional run caused by: contents change of /github/workspace/Library/Bee/artifacts/Android/il2cppOutput/cpp
Starting: /opt/unity/Editor/Data/Tools/netcorerun/netcorerun "/opt/unity/Editor/Data/PlaybackEngines/AndroidPlayer/AndroidPlayerBuildProgram.exe" "/opt/unity/Editor/Data/PlaybackEngines/AndroidPlayer/Bee:/opt/unity/Editor/Data/Tools/BuildPipeline" "Library/Bee/Player0857283b.dag.json" "Library/Bee/Player0857283b-inputdata.json" "Library/Bee/buildprogram2.traceevents"
WorkingDir: /github/workspace
ExitCode: 0 Duration: 1s169ms
Starting: /opt/unity/Editor/Data/bee_backend --profile="Library/Bee/backend_profiler3.traceevents" --stdin-canary --dagfile="Library/Bee/Player0857283b.dag" --continue-on-failure --dagfilejson="Library/Bee/Player0857283b.dag.json" Player
WorkingDir: /github/workspace
Error: The operation was canceled.

It works fine when building the player on my local windows 10 machine.

Try adding more RAM to your runner and try again.

+1 on this bug. It makes GitHub builds on their server unusable. :-/

We also seem to (again) have this on Apple Silicon Macs.

Also having this issue with game.ci on self-hosted GitHub Actions Linux runners when building for Windows & WebGL.

We have this problem as well, on Windows!

The log shows:
WorkingDir: G:/repos/photonjam
ExitCode: 0 Duration: 2m:34s
[BUSY 6s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 16s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 26s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 36s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 46s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 56s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 66s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 76s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 86s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 96s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 106s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 116s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 126s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 136s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[BUSY 146s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[465/474 154s] Csc Library/Bee/artifacts/1900b0aEDbg.dag/photonjam.dll (+2 others)
[467/474 0s] ILPostProcess Library/Bee/artifacts/1900b0aEDbg.dag/post-processed/photonjam.editor.dll (+pdb)
Our photonjam assembly contains just 13 files, and every change resulted in 2 minutes or more of compile time... that's ridiculous :(
Regenerating the lib folder helps for a few cycles, but the bug still comes back.
I do not understand why --stdin-canary is even needed... Canceling the compilation of scripts when tabbing back into Unity is not even a thing (no button on the dialog). If that's the only reason for the parameter, just get rid of it when automatically reloading scripts! When doing a build the two minutes are irrelevant, so there it's not a big problem...

Well, as I am on Windows right now, a shell script was not an option, so I resorted to a lil custom C# program... Blazingly fast compilation now :)

using System;
using System.Diagnostics;
using System.Linq;

namespace bee_backend
{
    class Program
    {
        private static int Main(string[] args)
        {
            // Drop the canary flag; re-quote arguments containing spaces,
            // because the quotes from the original command line are already gone.
            var arguments = string.Join(" ", args
                .Where(a => a != "--stdin-canary")
                .Select(a => a.Contains(" ") ? "\"" + a + "\"" : a));
            Console.WriteLine("Running bee with arguments: " + arguments);
            var p = new Process();
            // Output is not redirected, so the real backend writes
            // straight to this wrapper's stdout.
            p.StartInfo.UseShellExecute = false;
            p.StartInfo.FileName = "bee_backend_real.exe";
            p.StartInfo.Arguments = arguments;
            p.Start();
            // Wait for the real backend and hand its exit code back to the caller.
            p.WaitForExit();
            return p.ExitCode;
        }
    }
}

Bumping this because it's a big problem. I'm fine with "slow" builds, but a recent version of our app using gameCI and github hosted build runners takes 75 minutes. One of the bee_backend compilation steps takes 39 minutes with no meaningful output while it's running.

2022-10-12T17:39:15.1283083Z Starting: /opt/unity/Editor/Data/bee_backend --profile="Library/Bee/backend_profiler3.traceevents" --stdin-canary --dagfile="Library/Bee/Player0857283b.dag" --continue-on-failure --dagfilejson="Library/Bee/Player0857283b.dag.json" Player
2022-10-12T17:39:15.1284665Z WorkingDir: /github/workspace
2022-10-12T18:18:20.3971266Z ExitCode: 0 Duration: 39m:05s
2022-10-12T18:18:20.5102106Z Finished compiling graph: 2405 nodes, 6741 flattened edges (6733 ToBuild, 10 ToUse), maximum node priority 1637

CI builds fail on 2021.3.11f1 due to this Bee system. Is there any workaround to fix this on game-ci?

On Windows I am using the bee_backend from editor 2021.3.7f1 at the moment and it is blazingly fast. That might be an adaptable workaround on Linux as well...


Assembly compilation has always failed for me on Unity 2021.3.8f1 through 2021.3.15f1 so far... it looks like bee is causing it.