OpenCL_Unity

Hello all.

I’ve done some minor work to get OpenCL up and running in Unity in a generic manner. I’ve open-sourced that work.

It’s available on GitHub here:

This is more for developers than designers and artists right now.

Anyhow, you can check the readme for details. I am going on vacation for a couple of weeks. I wanted to get this published so others can take a look. Even if it’s really very simple work.

It’s pretty easy to see where I’ve used a partial class to get the OpenCL linking broken out into its own file: Unity.CL.API.cs

Feel free to mess around with it. And if you track down the location of OpenCL on some device or another that I have not supported yet, please do a fork, hack and a pull request, and I’ll get to maintaining this thing when I get back from vacation.

FYI. Unity designers and artists: I have a second layer of tools built atop this layer, that makes it easy to use OpenCL in Unity. That layer of tools will probably end up in the Asset store some time in the coming weeks/months. So, soon.

Developers: it’s working well enough over here that I can confirm, this code gets OpenCL up and running. I have it rendering a texture. OpenCL can be really annoying to work with at this low level. But it does work.

Also, right now it doesn’t expose GLSharing. But it should be possible. That’s something I’ll work on when I get back. If anyone else wants to crack that nut before I do, feel free. On a prior bit of Unity<->OpenCL work, I was able to expose Apple’s calls for getting a shared context. It seemed to work. But that’s about as far as I got prior. Need to replicate, finish and test.

Have fun.

I’m very much an artist who attempts coding as best he can. I’m becoming very interested in a proper implementation of OpenCL, and in porting DirectCompute and CUDA code to it so that it can work on everything. I already know exactly what I wish to port. I wouldn’t know where to begin with implementing OpenCL, however, so I very much need to be told ‘this will do what you’d expect it to’ for, say, some kind of generic OpenCL task, and certainly when it comes to graphics-related things.

The end of your vacation is eagerly awaited! But I hope you had a nice time

Thanks! My vacation was great.

I did some housekeeping to the basic tools as they exist and made a quick preview screen-cap-video.

Obviously, in the end, it should do a little more than make a cube flash colors :mrgreen:

I guess what’s already useful here, is that you could append the program in the .cl file with something useful and you’ve already got a system that can render textures in OpenCL, per frame, with N number of color parameters.

But really, to do impressive stuff I need to now get to the task of implementing all the little bits.

For example:

- different ways of executing queues that make sense in Unity
- different kinds of arguments (integers, meshes and such)
- different ways of using work-products in Unity (meshes and such)

And everything I mentioned in the prior post.

Unfortunately I don’t have an ETA on a release of the artist tools just yet. The underlying OpenCL bindings being open-source, I hope they’re already useful to some. The artist tools need to reach a certain level of maturity before I’m ready to release them in the asset store. If anyone really wants to get a hold of them as is, and start hacking away, please contact me through e-mail or otherwise. My website http://www.fie.us has contact info.

I wouldn’t know where to begin at the moment, but I bookmarked your page as I think this is really neat, and I’m still very interested in it for one thing at least.

Actually, I really might have something for this. If you’re aware of Unity’s main PBL shader solutions, you may know that the IBL for them is provided via convolution using spherical harmonics, and it’s quite a slow process! A good example of how it’s done is here, in a great free PBR solution for Unity: GitHub - larsbertram69/Lux: Lux – open source physically based shader framework for unity. But as said, the convolution is slow. I’m hoping to be able to use a multiplatform compute language (so, not CUDA or DirectCompute) to perform this convolution so that it doesn’t take nearly so long. Then the convolved cubemaps might be blended into each other at run-time to get quasi-realtime updates for them.

I find this kind of thing is really needed to make PBR sensible with dynamic environments: weather, time of day, etc.

Do you think your OpenCL tools could help with this? If so, I’d definitely like to hear more (or if you think it’s not sensibly doable). That said, the maps to be generated are very small: a 64px edge length is the super-quality case, but 32 or even 16 is fine.

Tentatively, yes. But there’s more to look into here to get a real answer. And there’s a little work to do.

I think you are correct to look to OpenCL. CUDA and DirectCompute could, of course, do the job as well. But I assume you are frustrated with the platform fracturing associated with those two techs and are looking to OpenCL to be a standard on most (if not all) platforms, so you can focus on one implementation to support many platforms, rather than have to maintain 2-3 implementations.

I assume you are looking specifically to this file:

Which, if I’m getting it correctly by quickly glancing at it: It’s taking little cube-maps rendered (by Unity’s engine) at probe locations, and then pre-computing a generalized and sparsely sampled spherical harmonic (cubemap) to be used later, during shading. Probably two sets of probes. One for radiance and one for irradiance.

The slow part is going through all those rendered probes and building the spherical harmonics of the aforementioned render components. This process is a convolution. Usually convolves are executed by a convolution kernel, and that does seem to make it pretty clear that OpenCL is going to do it faster. Ideally, you want it to be fast enough to run on a per-frame basis, incorporating dynamic objects.
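To make the shape of that work concrete, here is a minimal CPU reference sketch of a diffuse (cosine-weighted) convolution. It is not the Lux code and it does not use spherical harmonics; it is a brute-force stand-in, and the class name, the Fibonacci-spiral sampling and the single-channel radiance array are all assumptions made for illustration. The point is only that every output direction is computed independently from read-only inputs, which is the pattern that maps onto one OpenCL work-item per output texel.

```csharp
using System;

// Brute-force diffuse convolution of an environment, as a CPU reference.
// All names here are hypothetical; nothing comes from Lux or OpenCL.Net.
public static class DiffuseConvolution
{
    // Evenly spread unit directions on a sphere (Fibonacci spiral).
    public static double[][] SphereDirections(int count)
    {
        var dirs = new double[count][];
        double goldenAngle = Math.PI * (3.0 - Math.Sqrt(5.0));
        for (int i = 0; i < count; i++)
        {
            double z = 1.0 - (2.0 * i + 1.0) / count;
            double r = Math.Sqrt(Math.Max(0.0, 1.0 - z * z));
            double phi = goldenAngle * i;
            dirs[i] = new[] { r * Math.Cos(phi), r * Math.Sin(phi), z };
        }
        return dirs;
    }

    // radiance[i] is the (single-channel) light arriving from inDirs[i].
    // On a GPU, each call of this function would be one work-item.
    public static double Irradiance(double[] normal, double[][] inDirs, double[] radiance)
    {
        double solidAngle = 4.0 * Math.PI / inDirs.Length; // equal-area samples
        double sum = 0.0;
        for (int i = 0; i < inDirs.Length; i++)
        {
            double cos = normal[0] * inDirs[i][0]
                       + normal[1] * inDirs[i][1]
                       + normal[2] * inDirs[i][2];
            if (cos > 0.0) sum += radiance[i] * cos * solidAngle;
        }
        // Divide by pi so that a constant environment maps to itself.
        return sum / Math.PI;
    }
}
```

The inner loop touches every input sample for every output direction, so the cost grows with (output texels × input samples). That is why the small 16 to 64 pixel maps mentioned earlier matter so much for feasibility.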

Also, when executed on the CPU, it involves pulling the data from the GPU’s memory, doing the convolve, and then pushing the results back into the GPU’s memory when you’re done.

Ideally, you would rather leave the data in GPU memory, and do the work on the GPU, removing the bus and memory controller bottlenecks, and getting the advantages of stream-processing.

I have not looked deeply into the convolution code to see if there’s anything going on that would hurt the stream-processing. But it certainly looks to be a good candidate to me. A lot of nested looping and texture lookups. Very limited memory access per iteration.

In the end, you may find that using the open-source OpenCL binding may be better for you (rather than the artist tools) because the sampling and convolving system you’re working on is really a closed system. There’s no need for anyone to “work with it” once it’s doing what it should. Exposing its inner workings to artists and designers in the editor may not provide any advantage.

But it’s a very good test case for me to think about. Will my “tools” be able to implement this system in a way that is as buttoned up as if you hadn’t based it on the tools? I will probably make an effort to do so. I’ll use this test case in my thinking. So thank you. It’s a great example of what I should be looking at, as I lock the tools down.

Your biggest problem however, has more to do with some of what I wrote in the first post.

Right now, the OpenCL bindings don’t do what’s known as “GL Sharing.” Which is: initializing OpenCL based on the running OpenGL context, such that they can pass objects back and forth in GPU memory. Right now you can’t pull the cube-map from OpenGL to OpenCL without copying it to system memory and then back into GPU memory, in order to get across the barrier. It’s likely your convolution kernel would still be faster than it is, even with this feature not yet implemented. Because, as it is implemented currently it’s already coming out and into GPU memory to be modified by the CPU. To get it being processed by OpenCL would be one extra round of that. But the advantage of stream processing may overcome that problem even without “GL Sharing.”

“OpenGL Sharing” should be possible to implement. It’s on my list of very high priority matters to attend to in the open-source bindings. In that scenario, you’d have Unity hand you a pointer to the cube-map (which, according to the documentation, it can do already), and then you’d tell OpenCL to acquire that texture in GPU memory (not copy it, just borrow it for a bit). Then you’d execute the OpenCL kernel on it. Then you’d have OpenCL relinquish control back to Unity’s OpenGL, which would likely be none-the-wiser for OpenCL having done anything to it.
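The borrow, run, hand-back sequence described above can be sketched as follows. This is only an illustration of the ordering: ISharedImageQueue, SharedImagePass and RecordingQueue are hypothetical names, not part of OpenCL.Net or of the bindings. In the C API of the GL sharing extension, the borrow and hand-back steps correspond to clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the three steps; none of these names
// come from OpenCL.Net or from the open-source bindings.
public interface ISharedImageQueue
{
    void Acquire(IntPtr image);   // borrow the GL texture for OpenCL
    void RunKernel(IntPtr image); // enqueue the kernel that works on it
    void Release(IntPtr image);   // hand the texture back to OpenGL
}

public static class SharedImagePass
{
    public static void Execute(ISharedImageQueue queue, IntPtr image)
    {
        queue.Acquire(image);
        try
        {
            queue.RunKernel(image);
        }
        finally
        {
            // Always give the texture back, even if the kernel call throws.
            queue.Release(image);
        }
    }
}

// Fake queue that only records the call order, for trying the pattern out.
public sealed class RecordingQueue : ISharedImageQueue
{
    public readonly List<string> Calls = new List<string>();
    public void Acquire(IntPtr image) { Calls.Add("acquire"); }
    public void RunKernel(IntPtr image) { Calls.Add("run"); }
    public void Release(IntPtr image) { Calls.Add("release"); }
}
```

Wrapping the kernel call in try/finally is a design choice of this sketch: it keeps OpenGL from being left without its texture if something goes wrong in between.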

The kernel would be the same either way. It’s just a matter of how you’d give it an image argument that would change. So I’d think a port of the kernel to OpenCL’s c99 language would be worth writing either way (GL Sharing or not.) And really, how you go about using it (open source bindings, or artist tools) is the thing I need to look at. Can the artist tools do it? Yes. Can I get them to do it in a way that’s buttoned up? I’ll get back to you on that.

Thank you for a very informative reply! I learnt a great deal just reading that (the white papers on the process induce brain explosion), and I guess yes, it’s a very good example of efficient parallelisation that is indeed not so much an artist’s tool as something that can provide material for the artist tools.

You’re quite right about the idea: there’s a very vague diffuse cubemap produced, and a much better defined (but still blurry) specular cubemap for reflection. (If you try moving Lux into a project, beware the looong shader compilation times, you can see the output of its convolution process itself.) This automatically makes it quite nice, however, as the blur translates to glossy reflections, with the original captured cubemap used for mirror-like ones.

These reflections wouldn’t be perfect, but box correction (which takes into account the dimensions of the space captured by the map) makes them much more useful ‘cheaply’, and I think it’s a good augmentation for screen space reflections, which have their own drawbacks.

Lately the guys making the Unity PBR implementations have been moving on to box correction, and therefore to an area being filled with boxes and matching cubemaps that visually approximate their locality. Dynamic objects would blend the cubemap relevant to them as they entered the influence of such a rectangular extent. For this, dynamic convolution, or near-dynamic (maybe a second or two), could get around the need to pre-bake many, many skyboxes around a complex area. But more important to me in particular are situations involving quickly changing environmental elements, which is something IBL + PBR kinda sucks at, and which really could benefit from a very fast cubemap process.
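For anyone unfamiliar with box correction, here is a minimal sketch of the usual idea, assuming an axis-aligned box and a shaded point that lies inside it. The class and parameter names are made up for illustration and are not taken from Lux or Skyshop. The reflection ray is followed until it leaves the box, and the cubemap is then looked up along the direction from the probe’s capture position to that exit point, instead of along the raw reflection vector.

```csharp
using System;

// Hypothetical helper; names are illustrative only.
public static class BoxProjection
{
    // Returns the (unnormalised) cubemap lookup direction for a reflection
    // ray that starts at 'pos' (assumed inside the box) and travels along
    // 'dir' (assumed non-zero).
    public static double[] Correct(double[] pos, double[] dir,
                                   double[] boxMin, double[] boxMax,
                                   double[] probeCenter)
    {
        // Find the smallest positive distance at which the ray exits the box.
        double tExit = double.PositiveInfinity;
        for (int axis = 0; axis < 3; axis++)
        {
            if (dir[axis] == 0.0) continue; // parallel to this pair of faces
            double plane = dir[axis] > 0.0 ? boxMax[axis] : boxMin[axis];
            double t = (plane - pos[axis]) / dir[axis];
            if (t < tExit) tExit = t;
        }

        // Direction from the probe's capture point to the exit point.
        var result = new double[3];
        for (int axis = 0; axis < 3; axis++)
        {
            double hit = pos[axis] + dir[axis] * tExit;
            result[axis] = hit - probeCenter[axis];
        }
        return result;
    }
}
```

For a point at the probe centre the corrected direction is parallel to the original reflection vector; the correction only shows up as the shaded point moves away from where the cubemap was captured.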

If a quick answer is found, then it’s feasible to have things like time of day and weather conditions integrate into this kind of shading and really overcome its current greatest shortcoming to me. So your interest is appreciated!

An update: I did just finish engineering the GL Sharing on the mac side. Documentation is poor so I had to go find header files and read them to get answers. Ugh.

It does work though. My texture is now shared between Unity and OpenCL. No more copying of the image colors in and out of GPU memory. Just hand the pointer to the CL Kernel and it does the job.

As expected, the frame-rate jumped from around 250fps to about 900fps with just that change. It’s old hardware. So the final numbers are not relevant. Rather, the speedup is what matters. CPU usage is the same with the CL Queue running or not. Around 57%. This means that in the profiler, the time it’s spending waiting in my “LateUpdate” is mostly the CPU idling, waiting for the CL to finish. I’ll confirm that with some profiler-fu shortly.
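Frame rates are easier to compare as time per frame. A tiny helper (hypothetical name) makes the conversion explicit:

```csharp
public static class FrameTime
{
    // Converts a frame rate into the time spent on one frame, in milliseconds.
    public static double MillisecondsPerFrame(double framesPerSecond)
    {
        return 1000.0 / framesPerSecond;
    }
}
```

With the numbers above, 250fps is 4.0 ms per frame and 900fps is about 1.11 ms per frame. Assuming nothing else changed between the two runs, that suggests roughly 2.9 ms of every frame was being spent copying the image in and out of GPU memory.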

Assuming that proves to be true, it’s not a problem. There’s nothing else for it to do in this test scene, after it kicks off the CL queue during “Update.” In a normal gaming scenario, the CPU would be busy running other Update logic while CL was doing its work. There would likely be little to no idling.

I need to look at how it works on Windows and Linux, implement a solution, and then clean it up such that it’s transparently the same to the Unity developers. But I’m very happy it worked relatively easily.

I was expecting it would take a good while, given how you talked about it, so the speed is very impressive. Good luck with the Windows and Linux approaches! One of the reasons I was so keen to investigate OpenCL was that the author of the shader framework I’m using (or principal author, as people contribute in great ways), which is Lux (although I’m usually using Skyshop’s tools for the convolution for now), is using a Mac for development. I’m on Windows.

It’s an unenjoyable situation: one could investigate a compute-centric answer to the problem with some success, but the developer of the framework cannot use it because it’s written in DirectCompute. As I see across the internet, there’s a lot of fond sentiment towards OpenCL and a hope that it succeeds, and this is a great example of how, were a solution found, it would be incredibly useful for videogames. And this is just one use of many. Convolution via spherical harmonics is traditionally and mathematically entirely disparate from videogames in terms of how it was initially employed (it’s extremely useful in cosmology?), but using it to aid an increasingly common shading approach is a lovely example of repurposing. I wonder how many further examples of ‘doing the impossible’ for videogames a widespread adoption of GPU computing will produce. FFT oceans are another example. I have a DirectCompute solution for this, not that I understand it a great deal, and it’s very pretty, but it’s unreasonable and unfair to keep any output Windows-only.

The adoption issues are more political than technical now.

OpenCL is a private framework on iOS. So it exists. It runs. But if you use it, your app will be rejected by the App Store. Not sure if Mono’s late dynamic binding would trick their tools. But even if so, it’s grounds for removal later. Apple wants you to use Grand Central. Which is Apple only.

On Android, the hardware vendor determines whether they will include OpenCL or not. Google dropped it from their Nexus line. Sony seems to be supporting it. Not sure about Samsung. Google wants you to use RenderScript. Which may be portable to other platforms. But in practice, no one seems to be trying.

OpenCL exists on all the primary platforms. It’s just those few political holdouts. But that’s MUCH better than all the alternatives, both technically and politically.

When Unity implemented their compute shaders in DirectCompute, it confused me. Seems like OpenCL has a much better chance of actually being useful to more Unity platforms. What we’ll likely be looking for from Unity is an ability to populate those objects with native pointers and such, so we can hook them into OpenCL. We’ll see how that goes. Clearly they already did it for textures. So buffers would be obvious.

I was asked off list (e-mail), for code snippets and examples of how to use a Unity texture in OpenCL. I thought I’d post the response here to make sure the basics are available to anyone.

The first part, is done in the open-source component of the project, referenced here:

Though, the changes I will describe here have not been committed yet. That’s because I have only engineered them on the Apple side as yet, and have not finished engineering the Windows and Linux(Android) side. Therefore, I am not willing to commit to the API. You can expect a commit once I engineer the other side and come up with a happy compromise for the general API.

The relevant code is as such:

		#if UNITY_STANDALONE_OSX || UNITY_IPHONE
		public const string Library = "/System/Library/Frameworks/OpenCL.framework/Versions/Current/OpenCL";
		public const string GLLibrary = "/System/Library/Frameworks/OpenGL.framework/Versions/Current/OpenGL";

		//CREATE CGL Context
		[DllImport(GLLibrary)]
		private static extern IntPtr CGLGetCurrentContext();
		
		public static AppleCGLContext AppleGetCurrentCGLContext()
		{
			return new AppleCGLContext(CGLGetCurrentContext());
		}

		//RELEASE CGL Context
		[DllImport(GLLibrary)]
		public static extern void CGLReleaseContext(IntPtr context);
		public static void AppleReleaseCGLContext(AppleCGLContext context)
		{
			CGLReleaseContext((context  as IHandleData).Handle);
		}

		[DllImport(Library)]
		private static extern IntPtr gcl_get_context();
		public static Context AppleGetCLContext()
		{
			return new Context(gcl_get_context());
		}

		//GET SHARE GROUP
		[DllImport(GLLibrary)]
		private static extern IntPtr CGLGetShareGroup(IntPtr context);
		
		public static AppleShareGroup AppleGetShareGroup(AppleCGLContext context)
		{
			return new AppleShareGroup(CGLGetShareGroup((context  as IHandleData).Handle));
		}

		//SET SHARE GROUP
		[DllImport(Library)]
		private static extern void gcl_gl_set_sharegroup(IntPtr shareGroup);
		
		public static void AppleSetCLShareGroup(AppleShareGroup shareGroup)
		{
			gcl_gl_set_sharegroup((shareGroup  as IHandleData).Handle);
		}

		public static Context AppleGetCLSharedContext(){
			// CGLGetCurrentContext returns the GL context current on the calling thread.
			var cglContext = AppleGetCurrentCGLContext();
			if ((cglContext as IHandleData).Handle == IntPtr.Zero){
				throw new UnityCLException("Attempt to get a shared CGL Context failed.  Returned null");
			}
			// Point OpenCL at that GL context's share group, then fetch the CL context.
			var shareGroup = AppleGetShareGroup(cglContext);
			AppleSetCLShareGroup(shareGroup);
			return AppleGetCLContext();
		}

		[DllImport(Library)]
		private static extern IntPtr gcl_gl_create_image_from_texture(TextureTarget target, IntPtr mip_level, IntPtr texture);
		public static IMem CreateCLImageFromGLTexture(TextureTarget target, int mipLevel, IntPtr texturePointer){
			return new Mem(gcl_gl_create_image_from_texture(target,new IntPtr(mipLevel),texturePointer));
		}

		#endif

also:

	public enum AppleGCLErrorCode : int // cl_int
	{
		NoError = 0,
		BadAttribute = 10000,
		BadProperty = 10001,
		BadPixelFormat = 10002,
		BadRendererInfo = 10003,
		BadContext = 10004,
		BadDrawable = 10005,
		BadDisplay = 10006,
		BadState = 10007,
		BadValue = 10008,
		BadMatch = 10009,
		BadEnumeration = 10010,
		BadOffScreen = 10011,
		BadFullScreen = 10012,
		BadWindow = 10013,
		BadAddress = 10014,
		BadCodeModule = 10015,
		BadAlloc = 10016,
		BadConnection = 10017,
	};

	public enum TextureTarget : int
	{
		GL_TEXTURE_1D = 0x0DE0,
		GL_TEXTURE_2D = 0x0DE1,
		GL_PROXY_TEXTURE_1D = 0x8063,
		GL_PROXY_TEXTURE_2D = 0x8064,
		GL_TEXTURE_3D_EXT = 0x806F,
		GL_PROXY_TEXTURE_3D_EXT = 0x8070,
	}

and

		[StructLayout(LayoutKind.Sequential)]
		public struct AppleCGLContext : IHandle, IHandleData
		{
			private readonly IntPtr _handle;
			
			internal AppleCGLContext(IntPtr handle)
			{
				_handle = handle;
			}
			
			#region IHandleData Members
			
			IntPtr IHandleData.Handle
			{
				get
				{
					return _handle;
				}
			}
			
			#endregion
			


			// Null handle of this struct's own type.
			public static readonly AppleCGLContext Zero = new AppleCGLContext(IntPtr.Zero);
		}

		
		[StructLayout(LayoutKind.Sequential)]
		public struct AppleShareGroup : IHandle, IHandleData
		{
			private readonly IntPtr _handle;
			
			internal AppleShareGroup(IntPtr handle)
			{
				_handle = handle;
			}
			
			#region IHandleData Members
			
			IntPtr IHandleData.Handle
			{
				get
				{
					return _handle;
				}
			}
			
			#endregion
			


			// Null handle of this struct's own type.
			public static readonly AppleShareGroup Zero = new AppleShareGroup(IntPtr.Zero);
		}

This extends OpenCL.Net to be able to call the Apple extensions required to get a shared OpenCL context from the running OpenGL context that is set up by Unity. It also exposes Apple’s ability to create an OpenCL image from an OpenGL texture.

As a result, the code in my tools to create the texture argument is very simple:

using UnityEngine;
using System.Collections;

public class UnitySharedTextureArgument : AbstractCLArgument {

	public Texture2D texture;
	private OpenCL.Net.IMem mem;


	#region implemented abstract members of AbstractCLContextConsumer
	protected override void OnContextDestroyed ()
	{
	}

	protected override void OnContextCreated (OpenCL.Net.Context context, OpenCL.Net.Device[] devices)
	{
		var ptr = texture.GetNativeTexturePtr();
		mem = OpenCL.Net.Cl.CreateCLImageFromGLTexture(OpenCL.Net.TextureTarget.GL_TEXTURE_2D,0,ptr);
	}
	#endregion
	#region implemented abstract members of AbstractCLArgument
	public override OpenCL.Net.IMem GetArgument ()
	{
		return mem;
	}
	#endregion
}

Whoops. Forgot the most important part.

So rather than construct an OpenCL context the usual way, you do something like this:

using UnityEngine;
using System.Collections;

public class CLSharedContext : AbstractCLContext {
	#region implemented abstract members of AbstractCLContext

	protected override void GetContext ()
	{
		#if UNITY_STANDALONE_OSX || UNITY_IPHONE
		context = OpenCL.Net.Cl.AppleGetCLSharedContext();
		#endif
	}

	#endregion


}

As a result, when you later create the shared image, and pass it to a kernel, there are no errors. It just works.

Good work again! As usual, I have self-interested things to chime in about, although in my opinion they’re interesting issues, particularly relating to PBR, which is becoming increasingly important (I’ve kind of fallen in love with it). Further research has offered up importance sampling as an alternative to spherical harmonics for convolution. So, without straying from the point of making an example of what can be achieved, would it be sensible to approach both spherical harmonics and importance sampling, and see how they compare in terms of the performance and efficiency gained through parallelisation? You’re probably fully aware of the example, but just in case: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch20.html

The work you’re doing is exactly the kind of thing I take a huge interest in. It’s relevant to massive aspects of computation, not just videogames, so I encourage people to get their feet wet in it if they are programming-inclined.

Since I was recently introduced by a pal to compute shaders, and found they weren’t so scary after all, I might have a go at the above link using DirectCompute for now (if this OpenCL thing works out, and I get it working in DirectCompute in the first place, I will be switching). But I’m pretty hopeless at programming, which is why I end up bothering folk like you.

This is quite awesome, especially with getting shared textures to work!

So, the PBR and IBL thing could take on a life of its own. But, it’s still a great set of test cases.

I used to create spherical harmonics in shaders when using mental ray. This was for pre-rendered game cinematics we were contracted to do for a console company I am not allowed to mention, of course. Worked like a charm. Much faster than running per-pixel irradiance with Monte Carlo sampled rays. Technically it was cheating. It wasn’t physically accurate. It was plenty good enough. Just like the probes are an approximation of the physical space and therefore not accurate.

What we were doing was a little simpler. It was pre-rendered and spatial, like with the PBR system you linked earlier. But the convolution was Monte Carlo based, as described by the NVIDIA paper. And if I ran it today, it would run with importance sampling (importons), because mental ray added importance sampling to their default shaders and tracing toolkit in the interim. The filtering issues they mention were and are not relevant. They created those problems themselves by insisting that they not do enough samples to hide their sampling pattern, because they want to pretend it’s fast enough to actually run in real time in the general case. Which it’s not really. So they blur the heck out of it.

I suspect, from reading the NVIDIA paper, that while you CAN do all the work in the shader and reduce your texture sampling via importance sampling, rather than the more wasteful Monte Carlo methods, it’s probably not realistic to expect it to be fast. It’s just faster than Monte Carlo. Which is good. But not production worthy. Maybe a few hardware generations down the line, we will have the horsepower.
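To show what “reducing your texture sampling via importance sampling” means in practice, here is a minimal sketch of the standard mapping for a Phong-style specular lobe. The class name is hypothetical and the choice of a Phong lobe is an assumption for illustration, not a statement of what any particular shader framework does. Instead of spreading samples evenly over the hemisphere, two uniform random numbers are turned into a direction that is more likely to land where the lobe is bright.

```csharp
using System;

// Hypothetical helper; illustrative only.
public static class PhongLobeSampling
{
    // Maps two uniform random numbers in [0,1] to a unit direction around +Z,
    // distributed proportionally to cos(theta)^exponent.
    public static double[] Sample(double u1, double u2, double exponent)
    {
        double cosTheta = Math.Pow(u1, 1.0 / (exponent + 1.0));
        double sinTheta = Math.Sqrt(Math.Max(0.0, 1.0 - cosTheta * cosTheta));
        double phi = 2.0 * Math.PI * u2;
        return new[] { sinTheta * Math.Cos(phi), sinTheta * Math.Sin(phi), cosTheta };
    }
}
```

The higher the exponent (the glossier the surface), the more tightly the samples cluster around the lobe axis, so fewer environment lookups are wasted on directions that contribute almost nothing. The directions are relative to +Z and would still need rotating into the frame of the actual reflection vector.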

What I think is clear though, is that if you are going to pepper the space with probes, you need to be able to cull probes no matter what. And that’s a complex issue in and of itself. Really, you need an acceleration structure like a BSP or octree. Because you have to do some limited raytracing and colliding to properly cull. Proximity is not enough. You need to have some way of keeping probes from behind walls from contributing, even though they may be very close by. In the long haul, it’s all going to move to more and more ray traces anyway. So perhaps the more general use-case to add, is acceleration structures and traces and collisions in those structures. That’s something that OpenCL should do very well too.

Interestingly, the Lux probes allow you to select which objects in the scene should be regarded by the probe. The probes in both Lux and Skyshop are treated, more or less, as Unity cameras rendering a spherical panorama. This does keep the creation of environment maps consistent without partitioning, but in this case you would be manually placing the probes around an area and adjusting their bounds and what influences them, especially over time. The idea of producing this automatically is interesting. I think one big improvement in speed regarding importance sampling in the second NVIDIA paper is the fact that it uses dual paraboloid maps rather than cubemaps for the initial grab, which is going to be faster (apparently Unity’s cubemap grabbing isn’t very fast at the moment): 2 grabs instead of 6. But yes, you’re going into even less familiar territory for me, though very interesting territory! It is quite a useful and practical set of cases to run OpenCL up against. It would be interesting to see OpenCL become the basis of a strong work tool in Unity rather than the curio it is at the moment. And if you’re aiming at fully realtime, reactive generation of environment maps for dynamic objects within a scene (as different objects could be using different probes at one time), not just dynamic elements (like lighting or weather), then that would be pretty amazing.
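A minimal sketch of why proximity alone is not enough, assuming walls are approximated by axis-aligned boxes. All names are hypothetical. The loop over walls here is brute force; an acceleration structure such as a BSP or octree would replace that loop with a query that only visits nearby walls.

```csharp
using System;

// Hypothetical helper; illustrative only.
public static class ProbeCulling
{
    // True if the straight segment from 'a' to 'b' passes through the box (slab test).
    public static bool SegmentHitsBox(double[] a, double[] b, double[] boxMin, double[] boxMax)
    {
        double tEnter = 0.0, tExit = 1.0;
        for (int axis = 0; axis < 3; axis++)
        {
            double d = b[axis] - a[axis];
            if (d == 0.0)
            {
                // Parallel to this slab: the segment must already lie within it.
                if (a[axis] < boxMin[axis] || a[axis] > boxMax[axis]) return false;
                continue;
            }
            double t0 = (boxMin[axis] - a[axis]) / d;
            double t1 = (boxMax[axis] - a[axis]) / d;
            if (t0 > t1) { double tmp = t0; t0 = t1; t1 = tmp; }
            tEnter = Math.Max(tEnter, t0);
            tExit = Math.Min(tExit, t1);
            if (tEnter > tExit) return false;
        }
        return true;
    }

    // A probe contributes only if it is within range AND no wall blocks it.
    public static bool ProbeContributes(double[] point, double[] probe, double maxDistance,
                                        double[][] wallMins, double[][] wallMaxs)
    {
        double dx = probe[0] - point[0], dy = probe[1] - point[1], dz = probe[2] - point[2];
        if (dx * dx + dy * dy + dz * dz > maxDistance * maxDistance) return false;
        for (int i = 0; i < wallMins.Length; i++)
        {
            if (SegmentHitsBox(point, probe, wallMins[i], wallMaxs[i])) return false;
        }
        return true;
    }
}
```

A probe two units away with a thin wall in between is rejected by the second test even though it passes the distance test, which is exactly the “behind walls” case.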

Good news. I’ve been able to get GL Sharing with Unity to work on Windows as well. The Linux implementation is in, but untested. The Linux implementation is nearly identical to the Windows method. So it should be easy to get running if there are issues.

This means a cleanup and publish to the GitHub repository, with proper sharing of textures with Unity on relevant platforms, is imminent. The most important thing to settle now is how to fail gracefully when OpenCL contexts cannot be created, for whatever reason, at runtime.

It was pretty horrible to engineer the Windows part of this. Of particular note: Windows seems to absolutely require that you “acquire” and “release” GL objects, where the Mac doesn’t care. And the error it throws when you don’t is not documented. In fact, it’s documented to mean something else. But I guessed it was programmers being lazy and re-using an error code poorly, because the documented error code made no sense. It became obvious they were misinterpreting the spec and throwing an error where they should not. It’s debatable whether this is a proper implementation of the OpenCL standard on the PC side. But de facto, because that’s how it works, that’s how it has to work. Also, they didn’t bother implementing the part of the extension spec that’s most important for getting your OpenCL context. So it had to be done a different way. Is this MS’s fault or Nvidia’s (whose card/drivers I was using)? I don’t know.

Suffice it to say, my time in Windows on this matter was just as horrible and frustrating as it could have been. Oh, also, in Nvidia-land you really need up-to-date drivers or GL Sharing won’t actually work, even though it’s implemented. It just throws errors telling you the device isn’t available. But it won’t be more specific. So, beware.
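One possible shape for failing gracefully, sketched under assumptions: try the preferred way of creating a context first, fall back to the next one if it throws, and report failure instead of crashing if nothing works. ContextFallback and TryCreate are hypothetical names and are not part of the bindings. The sketch relies on creation code signalling failure with an exception, as the shared-context helper shown earlier does.

```csharp
using System;

// Hypothetical helper; not part of the open-source bindings.
public static class ContextFallback
{
    // Calls each factory in order and returns the first result that is
    // produced without throwing. Returns false if every factory fails.
    public static bool TryCreate<T>(out T result, params Func<T>[] factories)
    {
        foreach (var factory in factories)
        {
            try
            {
                result = factory();
                return true;
            }
            catch (Exception)
            {
                // Treat any failure as "this path is unavailable" and move on.
            }
        }
        result = default(T);
        return false;
    }
}
```

A caller could pass a GL-shared context factory first and a plain, copying context factory second, and disable the OpenCL features entirely when the method returns false.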

Right now, it requires running the Windows editor (and Windows players) with the -force-opengl flag to make sure it’s not using DirectX. There are extensions for sharing OpenCL with DirectX. So it should be able to be modified to handle both seamlessly. But for now, I want to get the framework published with implementations that at least work on the two major editor platforms, OSX and Windows. That will at least allow developers to work with it and contribute to the open-source project. Further player/platform compatibility is something I’d hope the community could help with. Honestly, I don’t have enough types of hardware here to even think about testing and debugging half of them.

Windows seems heavily interwoven with politics and engineered awkwardness, hinting just subtly at the wondrously documented MS alternatives to things. So I tend to regard MS’s implementation or adoption of open standards with very disappointed dismay, given that it works so much against the quick evolution of personal computing, even when it’s ultimately futile.

I’m still in the land of being impressed with your efforts but having no idea what to do with these things myself (I wish I were a better coder than I am). But I am intrigued by possible art tools and where you could take this whole thing, so continue with the wordy updates!
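Since GL Sharing only makes sense when Unity is actually rendering with OpenGL, a guard like the following could be run before attempting to create a shared context. This is a sketch: the helper name is hypothetical, and it assumes the string comes from Unity’s SystemInfo.graphicsDeviceVersion and begins with the API name (for example “OpenGL …” or “Direct3D …”), which should be verified on each target platform.

```csharp
using System;

// Hypothetical helper; illustrative only.
public static class GraphicsApiCheck
{
    // 'deviceVersion' is expected to be the value Unity reports in
    // SystemInfo.graphicsDeviceVersion. The exact format is an assumption.
    public static bool LooksLikeOpenGL(string deviceVersion)
    {
        return deviceVersion != null
            && deviceVersion.StartsWith("OpenGL", StringComparison.OrdinalIgnoreCase);
    }
}
```

Note that this check would also accept “OpenGL ES …” strings, so mobile platforms would need their own handling.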

As promised, I’ve committed and published the GL Sharing updates to the open source project.

I intend to now focus some time on the artist tools.

Again, those who can test and extend platform support are welcome to have at it and submit pull requests.

Quick poke! Will have a look at the github soon, but how is this coming along? Abandoned to the mists of time or just going stealthy?