How we replace most of our HPC# code with Rust code

I think the time is right to post something about an alternative way of writing performant code for Unity. For our use case, the problems with the current DOTS stack are its ecosystem and memory safety, and both significantly slow down our development. In the end, we have migrated most of what we had written in HPC#/C# to Rust, such as procedural mesh generation, procedural terrain generation, ray casting based on voxel marching, asynchronous file IO, and serialization (binary/JSON). More importantly, in the near future we could migrate the majority of the core game logic (both client-side and server-side), networking, and the job system. This might sound impossible or like too much effort to try, but an efficient workflow for writing performant Rust code for Unity has made it trivial for us.

Before I get started, let me clarify the goal of this thread: I will focus on technical things like what the development workflow looks like, what the bindings look like, how **unsafe** code in both languages is dealt with, and how it can be integrated with the existing DOTS ECS framework. I assume you know what you want for your development; this thread is only here to lower the marginal cost of using a little bit of Rust for Unity development. You are welcome to ask any questions and discuss this kind of workflow.

There are a lot of details, so I will divide them into parts and update regularly. Here is part one:

1) Automatic binding generation:

You may already have seen blog posts about how to call a Rust function from C#: a Rust function like the following, in a crate built with crate-type = ["cdylib"], can be compiled into a native plugin for Unity to use:

#[no_mangle]
pub unsafe extern "C" fn hello_world() {
    println!("hello world!");
}

On Unity’s C# side, all you have to do is write a binding for the Rust function:

[DllImport("my_rust_code", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true)]
public static extern void hello_world();

However, you probably do not want to rewrite the bindings every time you change your function signatures. To overcome this problem, we will introduce several code generators.

Starting with something simple, two code generators can be chained for now to generate bindings from unsafe Rust to unsafe C#:

  • cbindgen generates a C/C++ header from FFI-compatible unsafe Rust code.
  • ClangSharp generates unmanaged, unsafe C# bindings from a C/C++ header. Since the bindings are unmanaged, they are also Burst compatible.

Both tools are well documented; you can check their repositories here:
cbindgen: GitHub - mozilla/cbindgen: A project for generating C bindings from Rust code
ClangSharp: GitHub - dotnet/ClangSharp: Clang bindings for .NET written in C#

Chain them together with a build script as follows:

Rust code --(cbindgen)-> C/C++ Header --(ClangSharp)-> C# bindings

cbindgen first generates a C/C++ header file containing the declaration:

void hello_world();

The header is then fed into ClangSharp to generate a C# binding file ready for use.
Note: if your IDE complains about a missing NativeTypeNameAttribute, create a new file with the following content:

using System;
using System.Diagnostics;

[Conditional("DEBUG")]
[AttributeUsage(
    AttributeTargets.Property | AttributeTargets.Field | AttributeTargets.Parameter | AttributeTargets.ReturnValue,
    AllowMultiple = false, Inherited = true)]
internal sealed class NativeTypeNameAttribute : Attribute
{
    public NativeTypeNameAttribute(string name)
    {
        Name = name;
    }

    public string Name { get; }
}

You can also find this file in the ClangSharp repository.

Now, with the build script, whenever you edit some Rust code, both the bindings and the native plugin can be deployed to Unity automatically. However, Unity currently cannot unload and reload a native plugin. The best workaround I have found so far is to give the compiled plugin a new name on every build (for example, by incrementing a build number in the name). The generated bindings point to the newer plugin, and a reload in Unity forces the code to use it. This is sufficient for development purposes, since you do not have to restart Unity to pick up a new native plugin, but it has limitations:

  • Since the older plugin is never unloaded, everything in it is leaked without being used or freed; this is intended. But be careful with any global statics that allocate a lot of resources.
  • Burst-compiled code occasionally (I am not sure whether this is consistent) is not recompiled after a binding change, so Burst-compiled code with DllImport may still point to the earlier version of the native plugin. (This might be fixed in a later version of Burst, but I have not checked.)
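
The renaming workaround above could be handled by two small helpers in the build script. This is only a minimal sketch under assumptions: the base name my_rust_code, the <base>_<N>.dll naming scheme, and both helper functions are hypothetical, and it assumes the helper runs on the freshly generated ClangSharp output, whose DllImport looks like the example binding earlier. Copying the built library into the Unity project is left out.

```rust
/// Hypothetical build-script helper: find the highest build number among existing
/// plugin files named `<base>_<N>.dll` and return the next one (0 if none exist).
pub fn next_build_number(existing_files: &[&str], base: &str) -> u32 {
    let prefix = format!("{}_", base);
    existing_files
        .iter()
        .filter_map(|name| name.strip_prefix(prefix.as_str()))
        .filter_map(|rest| rest.strip_suffix(".dll"))
        .filter_map(|num| num.parse::<u32>().ok())
        .max()
        .map_or(0, |n| n + 1)
}

/// Hypothetical helper: point every DllImport in freshly generated C# bindings at
/// the versioned plugin name, so a reload in Unity picks up the new library.
pub fn retarget_bindings(bindings: &str, base: &str, build: u32) -> String {
    let old = format!("[DllImport(\"{}\"", base);
    let new = format!("[DllImport(\"{}_{}\"", base, build);
    bindings.replace(&old, &new)
}
```

Files that do not follow the naming scheme (or the unversioned base plugin) are simply ignored by next_build_number.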

I hope what I wrote here is enough for you to get started exploring. In the next part, I will probably discuss a few things regarding FFI safety so you don’t crash Unity. With this workflow, a crash can have the following causes:

  • Unity internal error
  • FFI boundary between Rust and C#
  • Rust code panics
  • Rust code aborts the process

We will deal with the second one through a lot of code generation. Since the third one triggers an unwind, which can be caught at the FFI boundary, we will catch it in every FFI function, again using code generation. The fourth one is unlikely unless something “catastrophic” happens. Stay tuned for part 2 if you are interested.
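
The code generator itself is not shown in this thread, but the wrapper it emits around each exported function could look roughly like the hand-written sketch below. ffi_guard and divide_ffi are hypothetical names; the only idea shown is that std::panic::catch_unwind turns a panic into a false return instead of letting it unwind into C#. Note that this only works with the default panic = "unwind" setting; with panic = "abort" there is nothing to catch.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical guard that generated code could wrap around every exported body:
/// a panic is caught here and reported as `false` instead of unwinding into C#.
pub fn ffi_guard<F: FnOnce()>(body: F) -> bool {
    panic::catch_unwind(AssertUnwindSafe(body)).is_ok()
}

/// Hypothetical export; in the plugin it would also carry #[no_mangle].
/// Writes `a / b` to `out` and returns true, or returns false on a null
/// pointer or a panic (division by zero, or i32::MIN / -1).
pub unsafe extern "C" fn divide_ffi(a: i32, b: i32, out: *mut i32) -> bool {
    if out.is_null() {
        return false;
    }
    ffi_guard(|| {
        let value = a / b; // panics on division by zero
        unsafe { *out = value };
    })
}
```

The panic message is still printed to stderr by the default panic hook; forwarding it to the Unity console is a separate problem.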

2) FFI safe data types:

In the last part, we established how unsafe Rust is compiled into a native plugin and how C# bindings are generated so that Unity can use it. I also gave a classic hello world as an example. Unfortunately, that function does not work as you might expect: it does not print anything to the Unity Console, and you have to look for its output in the Unity Editor log.

Note: the println! macro writes its output to stdout, which is not displayed in the Unity Console.

We will talk about logging/tracing from the native plugin to Unity in a future part; before that, we need some baseline knowledge about what is OK to pass between C# and Rust.

C# and Rust are different languages, and they handle function calls and data structures differently internally. The fundamental idea to know, however, is that both have features that allow interoperability with other languages. The thing we need to look out for is data structures: for a data structure to work in both languages, it must have the same memory layout in both. Since a managed object in C# is referenced through a pointer to data managed by the garbage collector, it would be an extremely bad idea to pass that to anything outside the C# runtime. The lifetime of that pointer is unpredictable (unless Rust uses some Mono API to operate on that data safely, which is not the focus of this post).

The only things we can pass are boring data types like pointers, integers, and floats. In C#, these are called blittable types.

In this workflow, these types will be equivalent in both languages.

| C# | Rust |
|---|---|
| byte | u8 |
| sbyte | i8 |
| short | i16 |
| ushort | u16 |
| int | i32 |
| uint | u32 |
| long | i64 |
| ulong | u64 |
| float | f32 |
| double | f64 |
| byte | bool (bool is not blittable in C#; ClangSharp converts it to byte when it generates the binding) |
| T* | *mut T / *const T (any pointer type) |
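
A struct built only from the types in this table keeps a predictable layout once it is marked #[repr(C)], and compile-time assertions can make the build fail if that layout ever drifts. RayHitFfi is a hypothetical example, not part of the original code; a C# struct with the same fields in the same order (C# structs are sequential by default) would mirror it. One caveat on the bool row: a Rust bool holding anything other than 0 or 1 is undefined behavior, so the C# side must only ever write 0 or 1 into that byte.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical FFI struct made only of types from the mapping table.
/// #[repr(C)] fixes the field order and padding, as C would lay them out.
#[repr(C)]
pub struct RayHitFfi {
    pub distance: f32, // float on the C# side
    pub voxel_id: u32, // uint on the C# side
    pub hit: bool,     // byte on the C# side; must only hold 0 or 1
}

// Compile-time checks: the build fails if these sizes ever change.
const _: () = assert!(size_of::<bool>() == 1);
const _: () = assert!(size_of::<*const u8>() == size_of::<usize>());
// 4 + 4 + 1 bytes, padded up to a multiple of the 4-byte alignment.
const _: () = assert!(size_of::<RayHitFfi>() == 12);
const _: () = assert!(align_of::<RayHitFfi>() == 4);
```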

One more thing you can pass between C# and Rust is structs. However, for compatibility with HPC#, this thread will not cover passing or returning structs by value (except for a struct whose only field is a pointer, which is compatible with HPC# and which I will talk about in future parts).
(See: https://docs.unity3d.com/Packages/com.unity.burst@1.6/manual/docs/CSharpLanguageSupport_BurstIntrinsics.html#dllimport-and-internal-calls) Instead, we will pass a pointer to the struct.

So, let's work on a function that prints hello world in the console by passing the string from Rust to C#. I will do it in several different ways and explain what works and what does not.

"How about just returning a Rust String?"

#[no_mangle]
pub unsafe extern "C" fn hello_world() -> String {
    String::from("Hello World!")
}

This will not work, since std::string::String does not have a stable, C-compatible layout. For a Rust struct to be FFI compatible, it must have the "#[repr(C)]" attribute.

"OK, what if I just pass a pointer to the string back?"

#[no_mangle]
pub unsafe extern "C" fn hello_world_by_ptr() -> *const u8 {
    let string = String::from("Hello World!");
    let bytes = string.as_bytes();
    bytes.as_ptr()
}

This will not work, and it is a big no-no. There are two reasons:
1. The pointer points into a local variable scoped to the function: at the end of the function, the string is dropped and its memory is freed, so the returned pointer becomes a dangling pointer.
2. Even if the pointee were not freed, it would be dangerous to pass back, since it is not a null-terminated string. We need a CString instead of a String for that.

Note: in C/C++ (and with unmanaged memory in C#), you have to remember to free memory when it goes out of scope.
In Rust, the drop is guaranteed to happen at the end of the scope, which sometimes creates unintended behavior like here: we do not want to drop the pointee, because its content is intentionally leaked to C#.

"Well, just leak it intentionally then"

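use std::ffi::CString;

#[no_mangle]
pub unsafe extern "C" fn hello_world_by_leaking_ptr() -> *const u8 {
    let string = String::from("Hello World!");
    // CString appends the null terminator that C# expects
    let cstring = CString::new(string).unwrap();
    let ptr = cstring.as_ptr();
    // skip the destructor so the buffer is not freed; the memory now leaks to C#
    std::mem::forget(cstring);
    ptr as *const u8
}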

This works, but it creates a memory leak unless the memory is freed from C#. Freeing it from C# means calling whatever allocated the memory, which in this case is Rust's internal allocator. A free function called from C# might not go to the same allocator that Rust is using, and that can crash Unity. Thus, this method is not recommended.
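
A common std-only variant of this idea is to leak with CString::into_raw instead of mem::forget and to expose a second function that hands the pointer back, so Rust frees it with the same allocator that created it. hello_world_owned and free_rust_string are hypothetical names for illustration; C# would read the string from the pointer and then call the free function exactly once.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical export (the plugin would also mark it #[no_mangle]).
/// Ownership of the returned buffer passes to the caller until `free_rust_string`.
pub extern "C" fn hello_world_owned() -> *mut c_char {
    CString::new("Hello World!").expect("no interior NUL").into_raw()
}

/// Hypothetical export: gives the pointer back so Rust frees it with the same
/// allocator that created it. Must be called at most once per pointer.
pub unsafe extern "C" fn free_rust_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuilding the CString and dropping it releases the allocation.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```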

From here, we can take two approaches: keep using a heap-allocated string, or use something that stores the string inline without an indirection, like FixedString32.

For the first approach, we need to expose a wrapper around Rust's allocator:

use std::alloc::Layout;
use std::ffi::c_void;

#[no_mangle]
pub unsafe extern "C" fn rust_deallocate(ptr: *mut c_void, size: usize, align: usize) {
    // size and align must match the layout the memory was allocated with
    let layout = Layout::from_size_align(size, align).expect("invalid layout");
    std::alloc::dealloc(ptr as *mut u8, layout);
}

This calls Rust's default global allocator, which is the same allocator a String uses (unless you are allocating with a custom allocator through Rust's allocator API).

We also need to pass the size and alignment of the string to C#, so that C# can call the deallocation function with them. I will leave it to you to figure out how to do that.
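
One possible way to hand the size and alignment to C# is an out-parameter struct, in the same style as the other functions in this thread. This is a sketch under assumptions: OwnedBytesFfi and hello_world_bytes are hypothetical, and rust_deallocate is repeated from above so the block compiles on its own. The key detail is that into_boxed_slice shrinks the allocation to exactly size bytes with alignment 1, so the layout passed back to rust_deallocate matches the one the memory was allocated with.

```rust
use std::alloc::Layout;
use std::ffi::{c_void, CString};

/// Hypothetical descriptor for a Rust-owned, null-terminated byte buffer.
#[repr(C)]
pub struct OwnedBytesFfi {
    pub ptr: *mut u8,
    pub size: usize,  // includes the trailing NUL
    pub align: usize, // alignment the buffer was allocated with
}

/// Hypothetical export (the plugin would also mark it #[no_mangle]).
pub unsafe extern "C" fn hello_world_bytes(out: *mut OwnedBytesFfi) -> bool {
    if out.is_null() {
        return false;
    }
    // Exact-size allocation: layout is (size, 1), which rust_deallocate must receive.
    let bytes: Box<[u8]> = CString::new("Hello World!")
        .expect("no interior NUL")
        .into_bytes_with_nul()
        .into_boxed_slice();
    let size = bytes.len();
    let ptr = Box::into_raw(bytes) as *mut u8;
    let align = std::mem::align_of::<u8>();
    unsafe { out.write(OwnedBytesFfi { ptr, size, align }) };
    true
}

/// Same wrapper as above, repeated so this block is self-contained.
pub unsafe extern "C" fn rust_deallocate(ptr: *mut c_void, size: usize, align: usize) {
    let layout = Layout::from_size_align(size, align).expect("invalid layout");
    unsafe { std::alloc::dealloc(ptr as *mut u8, layout) };
}
```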

For the second approach, we need a FixedString32 implemented in Rust that has the same layout and behaves exactly like Unity's. I happen to have implemented one, using a macro to generate FixedStrings with different capacities. In your own implementation, make sure the FixedString32 is null-terminated as well.

use std::mem::MaybeUninit;
use std::hash::{Hash, Hasher};
macro_rules! impl_fixed_string {
    ($name:ident, $array_len: expr, $cap: expr) => {
        #[derive(Clone, Debug)]
        #[repr(C)]
        pub struct $name {
            len: i16,
            bytes: [u8; $array_len],
        }

        impl Default for $name {
            fn default() -> Self {
                Self::from_u8_slice(&[])
            }
        }

        impl $name {
            const CAPACITY: usize = $cap;
            pub fn capacity(&self) -> usize {
                Self::CAPACITY
            }

            pub fn len(&self) -> usize {
                self.len as usize
            }

            pub fn is_empty(&self) -> bool {
                self.len == 0
            }

            pub fn clear(&mut self) {
                self.len = 0
            }

            // Private: the input must be valid UTF-8 (from_str guarantees it,
            // including when truncating), so that as_str stays sound.
            fn from_u8_slice<S: AsRef<[u8]>>(slice: S) -> Self {
                let slices = slice.as_ref();
                let len = slices.len().min($cap);
                // Zero-initialize: calling assume_init on an uninitialized
                // [u8; N] is undefined behavior.
                let mut bytes = [0u8; $array_len];
                bytes[..len].copy_from_slice(&slices[..len]);
                // must be null terminated
                bytes[len] = b'\0';
                Self { len: len as i16, bytes }
            }

            pub fn from_str<S: AsRef<str>>(string: S) -> Self {
                let string = string.as_ref();
                // Truncate on a char boundary so the stored bytes stay valid UTF-8.
                let mut end = string.len().min($cap);
                while !string.is_char_boundary(end) {
                    end -= 1;
                }
                Self::from_u8_slice(&string.as_bytes()[..end])
            }

            pub fn as_bytes(&self) -> &[u8] {
                &self.bytes[..self.len as usize]
            }
            pub fn as_str(&self) -> &str {
                // # Safety: every constructor goes through from_str, so the bytes are valid UTF-8
                unsafe { std::str::from_utf8_unchecked(self.as_bytes()) }
            }
        }

        impl From<Box<dyn std::any::Any + Send + 'static>> for $name {
            fn from(e: Box<dyn std::any::Any + Send + 'static>) -> Self {
                // The documentation suggests that it will *usually* be a str or String.
                if let Some(s) = e.downcast_ref::<&'static str>() {
                    $name::from_str(s)
                } else if let Some(s) = e.downcast_ref::<String>() {
                    $name::from_str(s)
                } else {
                    $name::from_str("Unknown panic!")
                }
            }
        }

        impl From<String> for $name {
            fn from(string: String) -> Self {
                $name::from_str(string)
            }
        }

        impl From<&str> for $name {
            fn from(from: &str) -> Self {
                Self::from_str(from)
            }
        }

        impl AsRef<str> for $name {
            fn as_ref(&self) -> &str {
                self.as_str()
            }
        }

        impl PartialEq for $name {
            #[inline]

            fn eq(&self, other: &Self) -> bool {
                self.as_str() == other.as_str()
            }
        }

        impl Eq for $name {}

        impl Hash for $name {
            #[inline]
            fn hash<H: Hasher>(&self, hasher: &mut H) {
                (*self.as_ref()).hash(hasher)
            }
        }
    };
}

impl_fixed_string!(FixedString32Ffi, 30, 29);
impl_fixed_string!(FixedString64Ffi, 62, 61);
impl_fixed_string!(FixedString128Ffi, 126, 125);
impl_fixed_string!(FixedString512Ffi, 510, 509);
impl_fixed_string!(FixedString4096Ffi, 4094, 4093);

Note: if a FixedStringFfi is created from a string longer than its capacity, the string is truncated. You could panic instead, depending on your needs.
We can also write a test to verify that FixedString32Ffi is exactly 32 bytes long.

#[cfg(test)]
mod test {
    use std::mem::size_of;

    use super::*;

    #[test]
    fn test_layout() {
        assert_eq!(size_of::<FixedString32Ffi>(), 32);
        assert_eq!(size_of::<FixedString64Ffi>(), 64);
        assert_eq!(size_of::<FixedString128Ffi>(), 128);
        assert_eq!(size_of::<FixedString512Ffi>(), 512);
        assert_eq!(size_of::<FixedString4096Ffi>(), 4096);
    }
}

After implementing FixedStringFfi, we can use it in our code:

#[no_mangle]
pub unsafe extern "C" fn hello_world_by_return_fixed_string() -> *const FixedString32Ffi {
    Box::leak(Box::new(FixedString32Ffi::from(
        "Hello World!",
    )))
}

Note: as I mentioned, we will not return a struct directly, but a pointer to it. That means we must box it (allocate it on the heap) and leak it to C#. This is less ideal than the following:

#[no_mangle]
pub unsafe extern "C" fn hello_world_by_out_fixed_string(string: *mut FixedString32Ffi) -> bool {
    *string = FixedString32Ffi::from("Hello World!");
    true
}

This pattern is similar to an out parameter in C#. It works better, since there is no heap allocation, and HPC# can call this function as well. The boolean only indicates whether the function succeeded.
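
A self-contained sketch of the same out-parameter pattern, using a hypothetical Int3Ffi struct and voxel_of_point function (neither is from the original code). It uses out.write(...) rather than *out = ...: the assignment drops the previous value first, which is harmless for plain data like FixedString32Ffi but wrong if out points at uninitialized memory of a type with a destructor.

```rust
/// Hypothetical plain-data result type (no destructor), like FixedString32Ffi.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Int3Ffi {
    pub x: i32,
    pub y: i32,
    pub z: i32,
}

/// Hypothetical export (the plugin would also mark it #[no_mangle]).
/// Writes the voxel coordinate containing (px, py, pz) into `out`.
/// The caller owns the memory, so nothing is heap-allocated or leaked.
pub unsafe extern "C" fn voxel_of_point(
    px: f32,
    py: f32,
    pz: f32,
    voxel_size: f32,
    out: *mut Int3Ffi,
) -> bool {
    // Reject a null pointer and a non-positive (or NaN) voxel size.
    if out.is_null() || !(voxel_size > 0.0) {
        return false;
    }
    let voxel = Int3Ffi {
        x: (px / voxel_size).floor() as i32,
        y: (py / voxel_size).floor() as i32,
        z: (pz / voxel_size).floor() as i32,
    };
    // write() neither reads nor drops the old contents of *out,
    // so `out` may point at uninitialized memory.
    unsafe { out.write(voxel) };
    true
}
```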

With the Rust side finished, it is a good time to look at the C# side. If you set up the binding-generation chain from part 1, the bindings for FixedStringFfi and hello_world_by_out_fixed_string are already generated:

public unsafe partial struct FixedString32Ffi
{
    [NativeTypeName("int16_t")]
    public short len;
    [NativeTypeName("uint8_t [30]")]
    public fixed byte bytes[30];
}

Here it uses a fixed byte array to store the string. The length is stored in the first 2 bytes, and the remaining bytes hold the string. Curiously, this is a more straightforward implementation than Unity.Collections.FixedString32.

It is also a good idea to write some helpers for these structs to make them easier to use:

public partial struct FixedString32Ffi
{
    public static implicit operator FixedString32(FixedString32Ffi d)
    {
        unsafe
        {
            return *(FixedString32*) &d;
        }
    }

    public static implicit operator FixedString32Ffi(FixedString32 d)
    {
        unsafe
        {
            return *(FixedString32Ffi*) &d;
        }
    }
}

After that, we can call the Rust function to get a "Hello World!" string from Rust in the form of a FixedString32. I hope this gives you a glimpse of how data is passed between C# and Rust in this workflow. If you have any questions, feel free to ask; we will talk about how to deal with panics in the next part.

…why though?
There’s memory safety. You don’t ever need to use pointers, not more so than in Rust.

I would much rather use C++20.
Get some constexpr if action going.

OK, but show me the benchmarks: what performance boost do I get from learning another programming language?

We need some data structures that are not in Unity.Collections. The simplest example: how do you write a nested array without using UnsafeList or pointers? That is where the memory problems come in. That is fine if you are more comfortable with C++, but I was not talking about replacing C++ with Rust; I was talking about replacing HPC# with Rust.

l33t_P4j33t wrote:
> get some constexpr if action going.

Not sure I understand what you mean.

Thanks for your reply, and I will keep updating.

It depends on what you want. If you benchmark the same functionality in HPC# and in Rust, I do not think there is any trick that pushes one far ahead of the other in most cases; both languages give you low-level control. The issue with HPC# was not performance but expressiveness, which is limited by all the problems I mentioned.

If you only need functionality provided by Unity, which is written in C++ but exposed to you via C#, I don’t think you would get much of a performance boost out of learning another programming language. But if you need to write, let’s say, a database that lets you stream your world state to the file system consistently because your world is too large to save at once, I would implement that in neither C# nor HPC#.

Memory safety built into a language is just an abstraction layer. Somebody, at some point, would've had to go there and build the data structures with raw pointers, as is the case with Unity.Collections. It's not difficult if it stays in one class and is backed up by unit tests. There is no "unsafety" anywhere.
I don't get why one would dogmatically avoid doing that when it is something every programmer should be capable of.
I also don't get why then, at the same time, people basically say "we can do it better!" and give up on a substantial amount of the work that has already been done by the Unity team. Especially performance will suffer in the long run, as combining Burst code with an IL2CPP executable will be done without an interop layer.

constexpr is a very powerful tool in C++ which lets you evaluate anything at compile time: not only if branches that are only ever evaluated at compile time (never at runtime), as the post suggested, but also things like functions containing loops for which the compiler cannot tell whether they terminate (halting problem). Burst does give us Unity.Burst.CompilerServices.Constant.IsConstantExpression<T>, which is also very powerful, but unfortunately it doesn't come close to the C++ feature.

Yes, I agree with you on that. But I chose to develop that data structure in Rust because I had the option. I am posting this thread so that other people who also enjoy Rust can have this option too.

Could you explain why you assume that what I have here will suffer more from the interop layer than “combining Burst code with an IL2CPP executable” will? Is it true that Burst code is statically linked into the final IL2CPP code?

Thank you for explaining constexpr as a C++ keyword, but would you care to explain what “get some constexpr if action going” means, rather than the keyword?
Anyway, I do not expect you to do it for him, but please stay on topic.

No, not yet: Burst-compiled code still results in a DLL, currently. But it is something they’re working on, or at the very least it is on their to-do list. That was an example for the long run, where the difference will be “huge” (200+ CPU cycles per function call). You’re already paying some performance costs, though, which could be avoided if you stuck to HPC#. Most of them are minor, but they do add up and could even cripple your performance. I do assume, however, that you don’t treat your DllImport functions as normal functions but rather as something like instantiating an IJob, so that’s probably not a performance issue.

In the end… if you’re not willing to write your code in HPC#, you’ll pay for it with less performance.

The dude most likely wanted to say that constexpr in C++ alone would make him want to write all of his code in C++ rather than in Rust if he were to choose a high-performance language to interop with. But as you said, you specifically targeted this post at people who enjoy Rust (which is a minority :stuck_out_tongue:).

Unity has a native logging api FYI.

I think it’s only on Windows/Xbox (not sure about Mac and Linux) where native plugins are dynamically linked. Everywhere else it becomes a static library and the PInvoke price goes away.

Anyway, Burst is nowhere near as expressive as C++ and Rust, and it needs quite a bit more boilerplate than vanilla C. Writing really complex stuff with it can become a chore, so I appreciate efforts to document different pipelines for working with native code in Unity.

Not until 2021.2, but I will also talk about using the Unity native plugin API in a later post.

I’d rather use C++ and Unreal than rust in Unity.

What is the point of people here complaining about Rust? If you don't want to use Rust, don't read his post. It's not like he is forcing anyone to do it.

While I mostly agree with you, it would be nice to have, right at the beginning of the thread, a list of objective/measurable pros and cons and some benchmarks, to know whether there is a performance price to pay or an actual gain.

“because we can” or “because we like it that way” are valid yet very dangerous reasons to act on software development (well, on anything productive I dare to say).
Sharing the experience is nice and all but it would be a lot more enriching if some more data was added so others can make informed decisions.

Exactly the same as yours when complaining about people complaining about Rust.

That and what the pain points are generally.

A big one I’ve seen with Rust is that the C/C++ library ecosystem is just so much better. Of the 5 C libraries we use, there are either no Rust equivalents or the equivalents are substantially inferior. This is especially true for things specific to games/high performance/concurrency.

We use Rust in one place, for some high-throughput IPC we do server-side. That’s an easy case because it’s all Rust and the platforms are limited: it will only ever be Linux or Windows.

I agree with you. A list of pros and cons backed up by some statistics would make the thread clearer for people who need to make a choice. In my defense, I did not include it from the beginning because performance was not the reason we switched the majority of our code from HPC# to Rust:

We had implemented some data structures in HPC#, but it was extremely hard for someone who comes from a .NET background and does not understand native memory. One mistake could easily crash Unity. We have seen no such crash since switching to the current workflow, because:

  1. panics from Rust are handled gracefully at the FFI boundary
  2. code generation eliminated most of the unsafe FFI boilerplate

Regardless, I shall include a more detailed list of pros and cons and statistics.

For anyone interested in the overhead of PInvoke from C# and Burst, I made this benchmark to try to estimate the impact:

I will use QueryPerformanceCounter from winapi (see: https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter), which has a resolution better than 1000ns.

#[no_mangle]
pub unsafe extern "C" fn get_instant_time_in_nanoseconds() -> u64 {
    match std::panic::catch_unwind(|| {
        let mut counter: u64 = 0;
        let mut freq: u64 = 0;
        winapi::um::profileapi::QueryPerformanceCounter(
            &mut counter as *mut u64 as *mut winapi::shared::ntdef::LARGE_INTEGER,
        );
        winapi::um::profileapi::QueryPerformanceFrequency(
            &mut freq as *mut u64 as *mut winapi::shared::ntdef::LARGE_INTEGER,
        );
        let instant_nsec = mul_div_u64(counter, 1_000_000_000, freq);

        instant_nsec
    }) {
        Ok(instant_nsec) => instant_nsec,
        Err(_) => 0,
    }
}

pub fn mul_div_u64(value: u64, numer: u64, denom: u64) -> u64 {
    let q = value / denom;
    let r = value % denom;
    // Decompose value as (value/denom*denom + value%denom),
    // substitute into (value*numer)/denom and simplify.
    // r < denom, so (denom*numer) is the upper bound of (r*numer)
    q * numer + r * numer / denom
}
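
The decomposition in mul_div_u64 is exact, not an approximation: with value = q*denom + r, the term q*numer is already an integer, so floor((value*numer)/denom) = q*numer + floor(r*numer/denom). A quick sketch that checks it against a reference that simply widens to u128 (mul_div_u64_wide is a hypothetical helper added only for the comparison). As an aside, on Windows std::time::Instant is itself backed by QueryPerformanceCounter.

```rust
/// Same decomposition as above: exact as long as the result and r * numer fit in u64.
pub fn mul_div_u64(value: u64, numer: u64, denom: u64) -> u64 {
    let q = value / denom;
    let r = value % denom;
    q * numer + r * numer / denom
}

/// Hypothetical reference version: widen to u128 so value * numer cannot overflow.
pub fn mul_div_u64_wide(value: u64, numer: u64, denom: u64) -> u64 {
    ((value as u128 * numer as u128) / denom as u128) as u64
}
```

With a 10 MHz counter frequency, for example, each tick is 100 ns, and a counter of 10^15 ticks would overflow a naive value * 1_000_000_000 in u64 while the decomposed version stays in range.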

Since the overhead of calling into a Burst or Rust function is well below 1000ns, I benchmark many (250,000) calls instead, and each call performs a vectorized integer multiplication. I run the benchmark from both C# (Mono) and a Burst job.

[BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
public static unsafe void CallBurstNative(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
{
    for (int i = 0; i < TEST_ARRAY_SIZE; i++)
    {
        var aPtr = aArrayPtr + i;
        var bPtr = bArrayPtr + i;
        var cPtr = cArrayPtr + i;
        *cPtr = *aPtr * *bPtr;
    }
}

and the Rust counterpart:

[BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
public static unsafe void CallRust(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
{
    for (int i = 0; i < TEST_ARRAY_SIZE; i++)
    {
        var aPtr = aArrayPtr + i;
        var bPtr = bArrayPtr + i;
        var cPtr = cArrayPtr + i;
        Extern.test_multiply((Point4Ffi<int>*)aPtr, (Point4Ffi<int>*)bPtr, (Point4Ffi<int>*)cPtr);
    }
}

The Rust functions it calls:

#[inline]
fn multiply_internal(a: &Point4Ffi<i32>, b: &Point4Ffi<i32>) -> Point4Ffi<i32> {
    Point4Ffi {
        x: a.x * b.x,
        y: a.y * b.y,
        z: a.z * b.z,
        w: a.w * b.w,
    }
}

#[no_mangle]
pub unsafe extern "C" fn test_multiply(
    a: *const Point4Ffi<i32>,
    b: *const Point4Ffi<i32>,
    out_result: *mut Point4Ffi<i32>,
) -> bool {
    *out_result = multiply_internal(&*a, &*b);
    true
}

#[derive(Copy, Clone, Eq, PartialOrd, PartialEq, Hash, Debug)]
#[repr(simd)]
pub struct Point4Ffi<I> {
    pub x: I,
    pub y: I,
    pub z: I,
    pub w: I,
}
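
Note that #[repr(simd)] is unstable: it needs a nightly compiler with #![feature(repr_simd)], and it can impose a stricter alignment than Unity.Mathematics.int4 (four 32-bit ints, 16 bytes, 4-byte alignment). If you want to reproduce the test on stable Rust, a sketch with #[repr(C)] matches int4 field for field. The numbers below were measured with the #[repr(simd)] version, so this variant is not guaranteed to perform the same.

```rust
use std::mem::{align_of, size_of};

/// Stable-Rust layout sketch: same fields as above, but #[repr(C)] instead of
/// #[repr(simd)], giving exactly the int4 layout (16 bytes, 4-byte alignment).
#[derive(Copy, Clone, Eq, PartialOrd, PartialEq, Hash, Debug)]
#[repr(C)]
pub struct Point4Ffi<I> {
    pub x: I,
    pub y: I,
    pub z: I,
    pub w: I,
}

const _: () = assert!(size_of::<Point4Ffi<i32>>() == 16);
const _: () = assert!(align_of::<Point4Ffi<i32>>() == 4);

/// Same contract as test_multiply above (the plugin would also mark it #[no_mangle]).
pub unsafe extern "C" fn test_multiply(
    a: *const Point4Ffi<i32>,
    b: *const Point4Ffi<i32>,
    out_result: *mut Point4Ffi<i32>,
) -> bool {
    let (a, b) = unsafe { (*a, *b) };
    let product = Point4Ffi { x: a.x * b.x, y: a.y * b.y, z: a.z * b.z, w: a.w * b.w };
    unsafe { out_result.write(product) };
    true
}
```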

I also include another Rust version that uses catch_unwind, which catches any unwinding triggered by a panic. This is crucial for Rust FFI safety, so I benchmark it as well.

[BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
public static unsafe void CallRustCatchUnwind(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
{
    for (int i = 0; i < TEST_ARRAY_SIZE; i++)
    {
        var aPtr = aArrayPtr + i;
        var bPtr = bArrayPtr + i;
        var cPtr = cArrayPtr + i;
        Extern.test_multiply_catch_unwind((Point4Ffi<int>*)aPtr, (Point4Ffi<int>*)bPtr,
            (Point4Ffi<int>*)cPtr);
    }
}

The Rust side:

#[inline]
fn multiply_internal_checked(a: &Point4Ffi<i32>, b: &Point4Ffi<i32>) -> Point4Ffi<i32> {
    Point4Ffi {
        x: a.x.checked_mul(b.x).unwrap(),
        y: a.y.checked_mul(b.y).unwrap(),
        z: a.z.checked_mul(b.z).unwrap(),
        w: a.w.checked_mul(b.w).unwrap(),
    }
}

#[no_mangle]
pub unsafe extern "C" fn test_multiply_catch_unwind(
    a:  *const Point4Ffi<i32>,
    b:  *const Point4Ffi<i32>,
    out_result: *mut Point4Ffi<i32>,
) -> bool {
    let error = std::panic::catch_unwind(|| multiply_internal_checked(&*a, &*b));

    match error {
        Ok(result) => {
            *out_result = result;
            true
        }
        Err(_) => false,
    }
}

Here are the average results after running the whole benchmark 10 times:

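| Caller | Function | Average time (250,000 calls) |
|---|---|---|
| Burst | CallBurstNative | 923,910 ns |
| Burst | CallRust | 1,007,560 ns |
| Burst | CallRustCatchUnwind | 1,635,790 ns |
| C# | CallBurstNative | 3,106,310 ns |
| C# | CallRust | 1,698,170 ns |
| C# | CallRustCatchUnwind | 3,334,940 ns |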

From the data, calling into a Rust function without catch_unwind performs close to the Burst implementation, with a gap of only 0.3346 ns per call (~1 CPU cycle). Meanwhile, catch_unwind costs around 2.51292 ns per call (~9 CPU cycles) in this case, on top of the plain Rust call from Burst.
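
The per-call figures are the difference between two averaged totals divided by the 250,000 calls. A small worked check of that arithmetic; the cycle estimate additionally assumes a clock speed of roughly 3 to 3.6 GHz, which the benchmark does not state, so the clock is an explicit parameter here.

```rust
/// Per-call overhead in nanoseconds: the difference between two averaged
/// totals, divided by the number of calls each total covers.
pub fn per_call_overhead_ns(total_ns: f64, baseline_ns: f64, calls: f64) -> f64 {
    (total_ns - baseline_ns) / calls
}

/// Rough cycle estimate; the clock speed is an assumption, not a measured value.
pub fn ns_to_cycles(ns: f64, clock_ghz: f64) -> f64 {
    ns * clock_ghz
}
```

For example, (1,007,560 - 923,910) / 250,000 = 0.3346 ns for Rust versus Burst, and (1,635,790 - 1,007,560) / 250,000 = 2.51292 ns for catch_unwind, which is about 9 cycles at 3.6 GHz.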

Without catch_unwind, calling Rust directly from C# is faster than calling Burst-compiled code from C# (1,698,170 ns vs 3,106,310 ns), but adding catch_unwind brings it down to roughly the level of the Burst call (3,334,940 ns).

A few things to note here:
1. All Burst-compiled code has DisableSafetyChecks set to true, while test_multiply_catch_unwind checks for integer overflow. The reason is that catch_unwind gets optimized away if the code inside it cannot panic. The checked multiplication deliberately gives the function a panic path on overflow. The inputs stay below 32768, so the panic never actually fires during the benchmark. I did not use checked multiplication on the Burst side, because setting DisableSafetyChecks to false also enables other Burst checks, which would distort the performance of the benchmarked Burst functions.
2. The Burst multiplication is inlined into the benchmark job, while the Rust function is called through a dynamically linked library.
3. The test runs in release mode inside the Editor, with Burst safety checks and leak detection disabled. Everything is written with raw pointer operations to avoid ENABLE_UNITY_COLLECTIONS_CHECKS and any other safety checks.
4. The Rust code is compiled with the release profile and profile.release.opt-level set to 3, the most aggressive speed optimization level. After benchmarking, I also noticed that profile.release.debug was set to true. I assume this does not affect performance, since that setting controls debug info rather than the optimization level.
5. Each benchmarked function is warmed up WARMING_UP_ITERATIONS (3) times before measurement, writing its results to a scratch NativeArray allocated with Allocator.Temp.
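Note 1 can be checked in isolation. The following sketch reduces multiply_internal_checked to a single `i32` lane, using the hypothetical names `mul_lane_checked` and `guarded_mul`. The checked multiply gives catch_unwind a real panic path. Inputs drawn from `0..32768` can never reach that path, because 32767 * 32767 = 1,073,676,289 is below `i32::MAX` (2,147,483,647).

```rust
use std::panic;

/// Exclusive upper bound of the random inputs in init_random_number_for_test.
const INPUT_UPPER_EXCLUSIVE: i32 = 32_768;

/// One lane of a checked multiply, a stand-in for multiply_internal_checked:
/// panics on i32 overflow instead of wrapping.
fn mul_lane_checked(a: i32, b: i32) -> i32 {
    a.checked_mul(b).unwrap()
}

/// Mirrors the catch_unwind wrapper: Some(product) on success, None if the multiply panicked.
fn guarded_mul(a: i32, b: i32) -> Option<i32> {
    panic::catch_unwind(move || mul_lane_checked(a, b)).ok()
}

/// True if the largest possible product of two benchmark inputs still fits in an i32,
/// i.e. the panic path exists but cannot be taken during the benchmark.
fn benchmark_inputs_cannot_overflow() -> bool {
    let max_input = INPUT_UPPER_EXCLUSIVE - 1;
    max_input.checked_mul(max_input).is_some()
}
```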

Here are the raw logs and the rest of the code:

TestPerformance (136.810s)
---
Iteration 0
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 2000ns
get_instant_time_in_nanoseconds overhead from Burst: 200ns
Burst CallBurstNative: 856000ns
Burst CallRust: 964600 ns
Burst CallRustCatchUnwind: 1557700 ns
get_instant_time_in_nanoseconds overhead from Burst: 1800ns
C# CallBurstNative: 3003900ns
C# CallRust: 1756900 ns
C# CallRustCatchUnwind: 3300200 ns
Iteration 1
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 200ns
get_instant_time_in_nanoseconds overhead from Burst: 100ns
Burst CallBurstNative: 836300ns
Burst CallRust: 1028000 ns
Burst CallRustCatchUnwind: 1567500 ns
get_instant_time_in_nanoseconds overhead from Burst: 500ns
C# CallBurstNative: 2918900ns
C# CallRust: 1672700 ns
C# CallRustCatchUnwind: 3116800 ns
Iteration 2
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 300ns
get_instant_time_in_nanoseconds overhead from Burst: 200ns
Burst CallBurstNative: 1092200ns
Burst CallRust: 1171300 ns
Burst CallRustCatchUnwind: 1681900 ns
get_instant_time_in_nanoseconds overhead from Burst: 100ns
C# CallBurstNative: 4519600ns
C# CallRust: 1653300 ns
C# CallRustCatchUnwind: 3028600 ns
Iteration 3
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 200ns
get_instant_time_in_nanoseconds overhead from Burst: 100ns
Burst CallBurstNative: 824200ns
Burst CallRust: 901400 ns
Burst CallRustCatchUnwind: 1740000 ns
get_instant_time_in_nanoseconds overhead from Burst: 500ns
C# CallBurstNative: 2899900ns
C# CallRust: 1650900 ns
C# CallRustCatchUnwind: 3011000 ns
Iteration 4
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 400ns
get_instant_time_in_nanoseconds overhead from Burst: 300ns
Burst CallBurstNative: 821900ns
Burst CallRust: 885700 ns
Burst CallRustCatchUnwind: 1536800 ns
get_instant_time_in_nanoseconds overhead from Burst: 400ns
C# CallBurstNative: 2899400ns
C# CallRust: 1663700 ns
C# CallRustCatchUnwind: 3029000 ns
Iteration 5
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 200ns
get_instant_time_in_nanoseconds overhead from Burst: 100ns
Burst CallBurstNative: 835000ns
Burst CallRust: 882400 ns
Burst CallRustCatchUnwind: 1621200 ns
get_instant_time_in_nanoseconds overhead from Burst: 200ns
C# CallBurstNative: 2897300ns
C# CallRust: 1664900 ns
C# CallRustCatchUnwind: 5548800 ns
Iteration 6
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 200ns
get_instant_time_in_nanoseconds overhead from Burst: 100ns
Burst CallBurstNative: 815000ns
Burst CallRust: 949900 ns
Burst CallRustCatchUnwind: 1597000 ns
get_instant_time_in_nanoseconds overhead from Burst: 300ns
C# CallBurstNative: 2910000ns
C# CallRust: 1721000 ns
C# CallRustCatchUnwind: 3013800 ns
Iteration 7
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 100ns
get_instant_time_in_nanoseconds overhead from Burst: 200ns
Burst CallBurstNative: 1265800ns
Burst CallRust: 1214500 ns
Burst CallRustCatchUnwind: 1754600 ns
get_instant_time_in_nanoseconds overhead from Burst: 500ns
C# CallBurstNative: 3153100ns
C# CallRust: 1796400 ns
C# CallRustCatchUnwind: 3145100 ns
Iteration 8
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 400ns
get_instant_time_in_nanoseconds overhead from Burst: 400ns
Burst CallBurstNative: 803300ns
Burst CallRust: 879000 ns
Burst CallRustCatchUnwind: 1592700 ns
get_instant_time_in_nanoseconds overhead from Burst: 500ns
C# CallBurstNative: 2899100ns
C# CallRust: 1637400 ns
C# CallRustCatchUnwind: 3032600 ns
Iteration 9
TEST_ARRAY_SIZE = 250000
WARMING_UP_ITERATIONS = 3
get_instant_time_in_nanoseconds overhead called from C#: 200ns
get_instant_time_in_nanoseconds overhead from Burst: 200ns
Burst CallBurstNative: 1089400ns
Burst CallRust: 1198800 ns
Burst CallRustCatchUnwind: 1708500 ns
get_instant_time_in_nanoseconds overhead from Burst: 700ns
C# CallBurstNative: 2961900ns
C# CallRust: 1764500 ns
C# CallRustCatchUnwind: 3123500 ns

        [Test]
        public static void TestPerformance()
        {
            for (int k = 0; k < 10; k++)
            {
                Debug.Log("Iteration " + k);
                Debug.Log($"TEST_ARRAY_SIZE = {TestJob.TEST_ARRAY_SIZE}");
                Debug.Log($"WARMING_UP_ITERATIONS = {TestJob.WARMING_UP_ITERATIONS}");

                var aArray = new NativeArray<int4>(TestJob.TEST_ARRAY_SIZE, Allocator.Persistent);
                var bArray = new NativeArray<int4>(TestJob.TEST_ARRAY_SIZE, Allocator.Persistent);
                var cArray0 = new NativeArray<int4>(TestJob.TEST_ARRAY_SIZE, Allocator.Persistent);
                var cArray1 = new NativeArray<int4>(TestJob.TEST_ARRAY_SIZE, Allocator.Persistent);
                var cArray2 = new NativeArray<int4>(TestJob.TEST_ARRAY_SIZE, Allocator.Persistent);
                var cArray3 = new NativeArray<int4>(TestJob.TEST_ARRAY_SIZE, Allocator.Persistent);
                var cArray4 = new NativeArray<int4>(TestJob.TEST_ARRAY_SIZE, Allocator.Persistent);
                var cArray5 = new NativeArray<int4>(TestJob.TEST_ARRAY_SIZE, Allocator.Persistent);


                {
                    var start = Extern.get_instant_time_in_nanoseconds();

                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"get_instant_time_in_nanoseconds overhead called from C#: {counter}ns");
                }

                unsafe
                {
                    var aArrayPtr = (Point4Ffi<int>*)aArray.GetUnsafePtr();
                    var bArrayPtr = (Point4Ffi<int>*)bArray.GetUnsafePtr();

                    for (int i = 0; i < TestJob.TEST_ARRAY_SIZE; i++)
                    {
                        var aPtr = aArrayPtr + i;
                        var bPtr = bArrayPtr + i;
                        Extern.init_random_number_for_test(aPtr, bPtr);
                    }

                    var job = new TestJob
                    {
                        aArray = aArray,
                        bArray = bArray,
                        cArray0 = cArray0,
                        cArray1 = cArray1,
                        cArray2 = cArray2,
                    };
                    job.Run();
                }

                {
                    // Despite the "from Burst" label, this measurement runs in managed C#, after the job.
                    var start = Extern.get_instant_time_in_nanoseconds();
                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"get_instant_time_in_nanoseconds overhead from Burst: {counter}ns");
                }

                // warming up
                for (int j = 0; j < TestJob.WARMING_UP_ITERATIONS; j++)
                {
                    var temp = new NativeArray<int4>(aArray.Length, Allocator.Temp);
                    unsafe
                    {
                        var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                        var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                        var cArrayPtr = (int4*)temp.GetUnsafePtr();
                        CallBurstNative(aArrayPtr, bArrayPtr, cArrayPtr);
                    }

                    temp.Dispose();
                }

                unsafe
                {
                    var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                    var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                    var cArrayPtr = (int4*)cArray3.GetUnsafePtr();
                    var start = Extern.get_instant_time_in_nanoseconds();
                    CallBurstNative(aArrayPtr, bArrayPtr, cArrayPtr);
                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"C# CallBurstNative: {counter}ns");
                }

                // warming up
                for (int j = 0; j < TestJob.WARMING_UP_ITERATIONS; j++)
                {
                    var temp = new NativeArray<int4>(aArray.Length, Allocator.Temp);
                    unsafe
                    {
                        var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                        var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                        var cArrayPtr = (int4*)temp.GetUnsafePtr();
                        CallRust(aArrayPtr, bArrayPtr, cArrayPtr);
                    }

                    temp.Dispose();
                }

                unsafe
                {
                    var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                    var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                    var cArrayPtr = (int4*)cArray4.GetUnsafePtr();
                    var start = Extern.get_instant_time_in_nanoseconds();
                    CallRust(aArrayPtr, bArrayPtr, cArrayPtr);
                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"C# CallRust: {counter} ns");
                }

                for (int j = 0; j < TestJob.WARMING_UP_ITERATIONS; j++)
                {
                    var temp = new NativeArray<int4>(aArray.Length, Allocator.Temp);
                    unsafe
                    {
                        var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                        var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                        var cArrayPtr = (int4*)temp.GetUnsafePtr();
                        CallRustCatchUnwind(aArrayPtr, bArrayPtr, cArrayPtr);
                    }

                    temp.Dispose();
                }

                unsafe
                {
                    var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                    var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                    var cArrayPtr = (int4*)cArray5.GetUnsafePtr();
                    var start = Extern.get_instant_time_in_nanoseconds();
                    CallRustCatchUnwind(aArrayPtr, bArrayPtr, cArrayPtr);
                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"C# CallRustCatchUnwind: {counter} ns");
                }


                for (int i = 0; i < TestJob.TEST_ARRAY_SIZE; i++)
                {
                    Assert.AreEqual(aArray[i] * bArray[i], cArray0[i]);
                    Assert.AreEqual(aArray[i] * bArray[i], cArray1[i]);
                    Assert.AreEqual(aArray[i] * bArray[i], cArray2[i]);
                    Assert.AreEqual(aArray[i] * bArray[i], cArray3[i]);
                    Assert.AreEqual(aArray[i] * bArray[i], cArray4[i]);
                    Assert.AreEqual(aArray[i] * bArray[i], cArray5[i]);
                }

                aArray.Dispose();
                bArray.Dispose();
                cArray0.Dispose();
                cArray1.Dispose();
                cArray2.Dispose();
                cArray3.Dispose();
                cArray4.Dispose();
                cArray5.Dispose();
            }
        }

        public static unsafe void CallBurstNative(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
        {
            for (int i = 0; i < TestJob.TEST_ARRAY_SIZE; i++)
            {
                var aPtr = aArrayPtr + i;
                var bPtr = bArrayPtr + i;
                var cPtr = cArrayPtr + i;
                *cPtr = *aPtr * *bPtr;
            }
        }

        public static unsafe void CallRust(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
        {
            for (int i = 0; i < TestJob.TEST_ARRAY_SIZE; i++)
            {
                var aPtr = aArrayPtr + i;
                var bPtr = bArrayPtr + i;
                var cPtr = cArrayPtr + i;
                Extern.test_multiply((Point4Ffi<int>*)aPtr, (Point4Ffi<int>*)bPtr, (Point4Ffi<int>*)cPtr);
            }
        }

        public static unsafe void CallRustCatchUnwind(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
        {
            for (int i = 0; i < TestJob.TEST_ARRAY_SIZE; i++)
            {
                var aPtr = aArrayPtr + i;
                var bPtr = bArrayPtr + i;
                var cPtr = cArrayPtr + i;
                Extern.test_multiply_catch_unwind((Point4Ffi<int>*)aPtr, (Point4Ffi<int>*)bPtr,
                    (Point4Ffi<int>*)cPtr);
            }
        }

        [BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
        public struct TestJob : IJob
        {
            public const int WARMING_UP_ITERATIONS = 3;
            public const int TEST_ARRAY_SIZE = 250_000;

            public NativeArray<int4> aArray;
            public NativeArray<int4> bArray;
            public NativeArray<int4> cArray0;
            public NativeArray<int4> cArray1;
            public NativeArray<int4> cArray2;

            [BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
            public void Execute()
            {
                {
                    var start = Extern.get_instant_time_in_nanoseconds();
                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"get_instant_time_in_nanoseconds overhead from Burst: {counter}ns");
                }

                // warming up
                for (int j = 0; j < WARMING_UP_ITERATIONS; j++)
                {
                    var temp = new NativeArray<int4>(aArray.Length, Allocator.Temp);
                    unsafe
                    {
                        var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                        var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                        var cArrayPtr = (int4*)temp.GetUnsafePtr();
                        CallBurstNative(aArrayPtr, bArrayPtr, cArrayPtr);
                    }

                    temp.Dispose();
                }

                unsafe
                {
                    var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                    var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                    var cArrayPtr = (int4*)cArray0.GetUnsafePtr();
                    var start = Extern.get_instant_time_in_nanoseconds();
                    CallBurstNative(aArrayPtr, bArrayPtr, cArrayPtr);
                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"Burst CallBurstNative: {counter}ns");
                }

                // warming up
                for (int j = 0; j < WARMING_UP_ITERATIONS; j++)
                {
                    var temp = new NativeArray<int4>(aArray.Length, Allocator.Temp);
                    unsafe
                    {
                        var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                        var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                        var cArrayPtr = (int4*)temp.GetUnsafePtr();
                        CallRust(aArrayPtr, bArrayPtr, cArrayPtr);
                    }

                    temp.Dispose();
                }

                unsafe
                {
                    var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                    var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                    var cArrayPtr = (int4*)cArray1.GetUnsafePtr();
                    var start = Extern.get_instant_time_in_nanoseconds();
                    CallRust(aArrayPtr, bArrayPtr, cArrayPtr);
                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"Burst CallRust: {counter} ns");
                }

                for (int j = 0; j < WARMING_UP_ITERATIONS; j++)
                {
                    var temp = new NativeArray<int4>(aArray.Length, Allocator.Temp);
                    unsafe
                    {
                        var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                        var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                        var cArrayPtr = (int4*)temp.GetUnsafePtr();
                        CallRustCatchUnwind(aArrayPtr, bArrayPtr, cArrayPtr);
                    }

                    temp.Dispose();
                }

                unsafe
                {
                    var aArrayPtr = (int4*)aArray.GetUnsafeReadOnlyPtr();
                    var bArrayPtr = (int4*)bArray.GetUnsafeReadOnlyPtr();
                    var cArrayPtr = (int4*)cArray2.GetUnsafePtr();
                    var start = Extern.get_instant_time_in_nanoseconds();
                    CallRustCatchUnwind(aArrayPtr, bArrayPtr, cArrayPtr);
                    var counter = Extern.get_instant_time_in_nanoseconds() - start;
                    Debug.Log($"Burst CallRustCatchUnwind: {counter} ns");
                }
            }

            [BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
            public static unsafe void CallBurstNative(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
            {
                for (int i = 0; i < TEST_ARRAY_SIZE; i++)
                {
                    var aPtr = aArrayPtr + i;
                    var bPtr = bArrayPtr + i;
                    var cPtr = cArrayPtr + i;
                    *cPtr = *aPtr * *bPtr;
                }
            }

            [BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
            public static unsafe void CallRust(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
            {
                for (int i = 0; i < TEST_ARRAY_SIZE; i++)
                {
                    var aPtr = aArrayPtr + i;
                    var bPtr = bArrayPtr + i;
                    var cPtr = cArrayPtr + i;
                    Extern.test_multiply((Point4Ffi<int>*)aPtr, (Point4Ffi<int>*)bPtr, (Point4Ffi<int>*)cPtr);
                }
            }

            [BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
            public static unsafe void CallRustCatchUnwind(int4* aArrayPtr, int4* bArrayPtr, int4* cArrayPtr)
            {
                for (int i = 0; i < TEST_ARRAY_SIZE; i++)
                {
                    var aPtr = aArrayPtr + i;
                    var bPtr = bArrayPtr + i;
                    var cPtr = cArrayPtr + i;
                    Extern.test_multiply_catch_unwind((Point4Ffi<int>*)aPtr, (Point4Ffi<int>*)bPtr,
                        (Point4Ffi<int>*)cPtr);
                }
            }
        }

use rand::Rng; // brings gen_range() into scope

#[no_mangle]
pub unsafe extern "C" fn init_random_number_for_test(
    a: *mut Point4Ffi<i32>,
    b: *mut Point4Ffi<i32>,
) -> bool {
    let outcome = std::panic::catch_unwind(move || {
        let mut rng = rand::thread_rng();
        // Inputs stay below 32768, so 32767 * 32767 still fits in an i32.
        (*a).x = rng.gen_range(0..32768);
        (*a).y = rng.gen_range(0..32768);
        (*a).z = rng.gen_range(0..32768);
        (*a).w = rng.gen_range(0..32768);

        (*b).x = rng.gen_range(0..32768);
        (*b).y = rng.gen_range(0..32768);
        (*b).z = rng.gen_range(0..32768);
        (*b).w = rng.gen_range(0..32768);
    });

    // true if the closure finished without panicking
    outcome.is_ok()
}
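The post does not show how get_instant_time_in_nanoseconds is implemented. The following is a minimal sketch of one way to write it, assuming it only needs a monotonic nanosecond counter. It measures elapsed time from a process-wide `Instant` captured on first use. The `START` static is a hypothetical detail. The real export would also carry `#[no_mangle]` like the other functions here.

```rust
use std::sync::OnceLock;
use std::time::Instant;

/// Process-wide reference point, captured on the first call.
static START: OnceLock<Instant> = OnceLock::new();

/// Hypothetical sketch of get_instant_time_in_nanoseconds: monotonic nanoseconds
/// elapsed since the first call (exported with #[no_mangle] in a real plugin).
pub extern "C" fn get_instant_time_in_nanoseconds() -> u64 {
    let start = START.get_or_init(Instant::now);
    // as_nanos() is u128; u64 nanoseconds still covers about 584 years.
    start.elapsed().as_nanos() as u64
}
```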

Thanks for sharing.
Which platforms did you try to build for so far, and what were the challenges?