Raspberry Pi Zero Beowulf Cluster to Calculate Lighting?

Do you, like me, find that you're always waiting for those scene lighting calculations to run?

So I'm wondering: would it be possible to run the lighting calculations on a cheap supercomputer?

The Raspberry Pi Zero is out for £4 (1 GHz Broadcom BCM2835 CPU, 512 MB RAM).

And it’s possible to build them into a Beowulf cluster supercomputer.

So is it possible to offload the lighting calculations to a Linux-based supercomputer you can build for the price of a single mid-level graphics card or CPU?

Or are there hardware lighting calculation solutions that would be cheaper/faster?

I have a feeling you'd need very custom programming for this, because the device only has 512 MB of RAM.

For example, a Blender render farm would usually send one frame per machine, and to render a frame on one machine you'll need more memory than that.

You'd need to split the total task into small chunks that a single cluster node could chew through. Judging by the information on Beowulf clusters, the cluster software doesn't handle that for you, so you'd need to write highly parallel custom software. That's "very hard" programming difficulty.

You might not need highly parallel custom software: if the lighting calculations can be broken down or batched into 'GI voxels', then you simply need to pass the jobs out to each node in the cluster.

But the 512 MB memory limit per node might be a restriction, depending on how much data is needed, e.g. if the full scene/lighting data is needed to process one 'GI voxel'.

If you look at the lighting progress bar it tends to use the terms "clusters" or "batches", so this could greatly simplify the process.
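Something like this is what I'm picturing for "passing the jobs out", as a very rough sketch. Everything here is made up for illustration: the node count, the batch sizes, the 0.5 MB-per-voxel guess and the bake() stand-in aren't from any real lighting system.

```python
# Rough sketch of the "pass GI voxel batches out to nodes" idea.
# All numbers and names are hypothetical, purely for illustration.

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

NODE_RAM_MB = 512   # Pi Zero memory ceiling per node
NODE_COUNT = 16     # pretend cluster size

@dataclass
class VoxelBatch:
    batch_id: int
    estimated_mb: int   # scene + lighting data this batch needs resident

def make_batches(total_voxels: int, voxels_per_batch: int) -> list[VoxelBatch]:
    """Split the whole GI job into chunks small enough for one node."""
    batches = []
    for i in range(0, total_voxels, voxels_per_batch):
        # Guess at memory: pretend each voxel costs 0.5 MB of data.
        count = min(voxels_per_batch, total_voxels - i)
        batches.append(VoxelBatch(batch_id=i // voxels_per_batch,
                                  estimated_mb=int(count * 0.5)))
    return batches

def bake(batch: VoxelBatch) -> str:
    """Stand-in for the real lighting solve a node would run."""
    if batch.estimated_mb > NODE_RAM_MB:
        # This is the killer: if one batch needs the full scene resident,
        # no amount of splitting across nodes helps.
        return f"batch {batch.batch_id}: REJECTED, needs {batch.estimated_mb} MB"
    return f"batch {batch.batch_id}: baked ({batch.estimated_mb} MB)"

if __name__ == "__main__":
    batches = make_batches(total_voxels=10_000, voxels_per_batch=800)
    with ThreadPoolExecutor(max_workers=NODE_COUNT) as cluster:
        for result in cluster.map(bake, batches):
            print(result)
```

The whole idea lives or dies on that memory check: if a batch can't be solved without most of the scene resident, the 512 MB ceiling kills it before the network even matters.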

Also doesn’t the lighting system allow you to offload the baking to your office network?

I think it's probably better to leverage GPGPU for this.

2 Likes

A Raspberry Pi Zero isn't worth doing this with. It's incredibly slow for any real computation (it's 4-10 times slower than a Raspberry Pi 2), and the cost of the networking cables would outweigh the cheap price of a single unit. Plus, it's ARM, and I'm not sure Unity would even support that. And then there's the puny RAM size.

If Unity supports distributed baking (I don't even know), it's probably best to use mid-range CPUs with around 8 GB of RAM, or, if it supports GPGPU, a mid-range GPU with some cheap Pentium to keep the GPU fed.

1 Like

Doing that would be "highly parallel custom software". It'd probably also be more complex than it sounds, because the chunks aren't isolated from one another.

Why would you though? A mid-level graphics card has tremendous processing capabilities by comparison.

1 Like

Most lighting systems do, Enlighten included AFAIK… But you need a decent network setup to leverage it when you're spitting out lightmaps, and decent 10 Gigabit Ethernet switches can be a little costly…

Even if you could regulate the load across a cluster efficiently (going by the Lightmass client, it's good but not 100% efficient), I'd need 32 of them to match my desktop setup, and even then my desktop would have double the RAM. If GPGPU and/or CPU allocations were counted on top of that, I'm not sure how many you'd need to even make it worth my while…

If I were to distribute it to my laptop as well, with GPGPU clustering, then you're probably talking hundreds of the little beggars before they could keep up. Imagine the amount of wires!

Just buy a couple o’ decent computers, it would be far less messy if nothing else.

There’s too many unknowns.

Murphy's law says that the software you're using for lighting calculations will want a few gigabytes of memory shared across processes, or something similarly unpleasant.

Nope. Unity's lighting system can't do that, last time I checked. One computer only. Maybe Enlighten can do it, but that thing isn't integrated properly into the engine.
Unreal allows it, BUT it requires Windows machines, not a Beowulf cluster.

So, by the looks of it, attempting to use a Pi cluster will result in royal pain, because you'd need to grab an existing GI/lightmap solution that comes with source, tear it to pieces, and then make it work on a Beowulf architecture by splitting it into tasks that can be parallelized WHILE keeping the amount of RAM used by a single node sane.

I think it would be cheaper to just buy an extra PC and let it sit somewhere in a corner cooking lightmaps for you 24 hours a day, because hiring a programmer/engineer to make this thing work would cost more. I wouldn't bother with something like that unless someone's paying really well. Not worth it, and moving the lighting calculations to CUDA or OpenCL looks like a more reasonable idea.

To potentially get more computational power than you can get from a single graphics card.
Then again, in that case it might be more reasonable to just cluster computers with GPUs instead; clustering many small devices would probably only work well in theory.

Do you know how many of those you'd have to have, though? The original Raspberry Pi uses the same processor, albeit at 70% of the clock rate. It only achieved 0.041 GFLOPs (double precision), which according to Wikipedia is roughly on par with a Pentium II @ 300 MHz.

https://en.wikipedia.org/wiki/Raspberry_Pi#Performance_of_first_generation_models

By comparison, the weakest 900-series hardware with known statistics, the GTX 920M, achieves 18.4 GFLOPs, or essentially over 449 times the computational power of a single original Raspberry Pi.

https://en.wikipedia.org/wiki/GeForce_900_series#GeForce_900_.289xx.29_series

Alternatively, if you don't want to deal with a GPU, the Intel i7-6700K achieves 81.28 GFLOPs, or roughly 1,982 times.

http://techgage.com/article/intels-skylake-core-i7-6700k-a-performance-look/

Naturally these are very rough estimates of computing power, but nevertheless it wouldn't be cheaper to build and power such a cluster than to use a single modern desktop/laptop.
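For anyone who wants to check the arithmetic, here it is spelled out. The Pi Zero figure is just the original Pi's number scaled linearly with clock speed, so treat it as a guess:

```python
# Quick sanity check on the ratios above, using the FP64 GFLOPs figures quoted.
pi1 = 0.041        # original Raspberry Pi
gtx_920m = 18.4    # GTX 920M
i7_6700k = 81.28   # i7-6700K

# Rough guess: the Zero is the same chip clocked at ~1.43x the original Pi.
pi_zero = pi1 / 0.7

print(gtx_920m / pi1)      # ~449x one original Pi
print(i7_6700k / pi1)      # ~1982x one original Pi
print(i7_6700k / pi_zero)  # ~1388 Pi Zeros to match the i7 on paper, before any networking losses
```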

4 Likes

@neginfinity

Yes, Enlighten does support clustered node computations…

I’m sure it’ll come to Unity sooner or later…

I'd guess a few tens of thousands. That's a wild guess, though.

Eh, there's a catch though, so don't concentrate on FLOPs too much. GPU computation means things have to be done in a very specific fashion, and the restrictions are much tighter than with CPU programming. The basic example is when your code has a branch and a block of threads is working on it: they ALL should make the same decision on the branch, otherwise you'll be losing performance.
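To make the branching point concrete, here's a toy model of why divergence hurts. Purely illustrative, not any real GPU API; the warp size and cycle counts are made-up numbers.

```python
# Toy model of SIMT branch divergence: a warp of threads shares one
# instruction stream, so if lanes disagree on a branch, the hardware
# runs BOTH sides and masks lanes off. Numbers are illustrative only.

def warp_branch_cost(lane_conditions, cost_if=10, cost_else=10):
    """Cycles a warp spends on an if/else, given each lane's condition."""
    take_if = any(lane_conditions)
    take_else = not all(lane_conditions)
    # Coherent warp: only one side executes.
    # Divergent warp: both sides execute back to back.
    return cost_if * take_if + cost_else * take_else

coherent = [True] * 32                   # every lane agrees
divergent = [True] * 16 + [False] * 16   # lanes split down the middle

print(warp_branch_cost(coherent))    # 10 cycles: one side only
print(warp_branch_cost(divergent))   # 20 cycles: both sides run, half the lanes idle each time
```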

Either way, I don't think building a Pi cluster is a good idea; I just see where the OP is coming from.

If only Xeons didn't cost so much; next year's 40-core model :slight_smile: http://cdn.wccftech.com/wp-content/uploads/2015/12/Intel-Broadwell-EP-Xeon-E5-2698-V4_Cinebench.jpg

1 Like

Right. That's why I gave the statistics for Intel's i7-6700K. The Raspberry Pi simply doesn't make any sense for situations where you need considerable computational power. Its primary advantage is its small footprint and its extremely low power demands. The Raspberry Pi Zero consumes less than one watt.

This makes it ideal for situations where you need an embedded controller but aren’t able to satisfy your needs with a standard microcontroller. It may also work great for low demand servers such as a MUD VPS.

1 Like

I’ll just leave this here:

Global Illumination: Distributed LAN lighting builds
Polish the Hackweek X project on distributing lighting builds (precompute/baking) across machines on a LAN. The aim is to make a system that will automatically discover available machines and use them for time consuming lighting builds. No setup should be required. Video.

1 Like

What about Imagination's ray tracing hardware?

http://blog.imgtec.com/powervr-developers/powervr-gr6500-ray-tracing

You do realise that the Raspberry Pi GPU has a potential 24 GFLOPs of processing power, and its instruction set has been released?

This would entail GPU programming, but that's a lot of potential power per £4!

The PowerVR GR6500 has ratings for FP32 (single precision) and lower but nothing for FP64 (double precision). Those earlier statistics I gave were for FP64 (double precision). Generally GPU-based FP64 is far slower than FP32.

The PowerVR GR6500 is rated at 150 GFLOPs for FP32, the GTX 920M at 441.6, and the i7-6700K at 113.53.

OpenGL ES 2.0 does not support FP64, and 24 GFLOPs is very slow for FP32.

https://www.khronos.org/registry/gles/specs/2.0/es_cm_spec_2.0.24.pdf

Why, exactly, are you telling me this? I already said that I wouldn't bother with a Pi cluster for lightmap calculations.

“IN-PROGRESS, TIMELINES LONG OR UNCERTAIN”