How to use the new visionOS inputs? (from beta 0.5.0)

I downloaded the latest beta 0.5.0 to use the new gaze and pinch detection inputs through the Input System package.
I've managed to use the Vector3 and int control inputs. However, I don't understand how to use the "VisionOS Spatial Pointer" control inputs. Is there any documentation on the subject?
I would also like to know whether it is possible to detect the position targeted by the gaze with these new inputs. According to my tests, the gaze target can only be detected at the moment of pinching.
Thank you in advance for your answers.

I second the request for a clear example of how to use this new data!
My understanding is that due to privacy policy, gaze direction is only given at the moment your fingers pinch (i.e. you can't get at it ahead of time to highlight a selection).

Hey there! Examples and documentation are coming soon. I wanted to get the base support out ASAP, but missed a few extra controls needed for XRI, and we still don’t have a VR sample.

At a basic level, you should be able to use the primary spatial pointer control as you've identified. You can either bind to the primarySpatialPointer control directly or to its individual sub-controls, depending on how you want to set up your bindings.

That is correct. We only get information about the Selection Ray in the first input event. Subsequent events will update the Input Device Pose but that’s it. You can poll the phase of the input event to determine whether this is the first frame of input (Began), an intermediate frame (Moved) or the final frame (Ended). The device position will represent the location of your pinched fingers.

Not exactly, since there is no target as far as visionOS is concerned. The system doesn’t know about any of the objects you are rendering in virtual reality, so there’s no interaction location like there is on the Mixed Reality side. Instead, you need to do a raycast in Unity to figure out which object was hit.

You can use this input device as follows:

  • Poll for a VisionOSSpatialPointerState in Update
  • Using the phase field, decide whether a pinch has occurred
  • When you detect a Began event, cast a ray into your scene using startRayOrigin and startRayDirection to define the ray.
  • If that ray hits a collider, send a “button down” or “grab” event as needed
  • While the pinch is held, use devicePosition and deviceRotation to track the pose of the user's pinched fingers to update the position of the grabbed object, slide a slider, etc.
  • When you detect an Ended event, send the “button up” event or release the grabbed object

Here’s a script that will be shipped with the example. It uses an action map similar to yours (via its generated C# script), which exposes the primarySpatialPointer control in an action called PrimaryPointer.

using UnityEngine;
using UnityEngine.XR.VisionOS.InputDevices;

namespace UnityEngine.XR.VisionOS.Samples.URP
{
    public class InputTester : MonoBehaviour
    {
        [SerializeField]
        Transform m_Device;

        [SerializeField]
        Transform m_Ray;

        [SerializeField]
        Transform m_Target;

        PointerInput m_PointerInput;

        void OnEnable()
        {
            m_PointerInput ??= new PointerInput();
            m_PointerInput.Enable();
        }

        void OnDisable()
        {
            m_PointerInput.Disable();
        }

        void Update()
        {
            var primaryTouch = m_PointerInput.Default.PrimaryPointer.ReadValue<VisionOSSpatialPointerState>();
            var phase = primaryTouch.phase;
            var active = phase == VisionOSSpatialPointerPhase.Began || phase == VisionOSSpatialPointerPhase.Moved;

            if (active)
            {
                // The device pose tracks the pinched fingers on every frame
                m_Device.position = primaryTouch.devicePosition;
                m_Device.rotation = primaryTouch.deviceRotation;

                // The selection ray is only updated on the first (Began) event
                var rayOrigin = primaryTouch.startRayOrigin;
                var rayDirection = primaryTouch.startRayDirection;
                m_Ray.position = rayOrigin;
                m_Ray.rotation = Quaternion.LookRotation(rayDirection);

                // Raycast into the scene to find the object targeted by the gaze ray
                var ray = new Ray(rayOrigin, rayDirection);
                if (Physics.Raycast(ray, out var hitInfo))
                    m_Target.position = hitInfo.point;
            }
        }
    }
}
Our next release will add controls for IsTracked and TrackingState which are needed to use this input device with an XRRayInteractor. Using those controls along with the existing controls for device position and rotation, you can hook up a ray interactor which uses a secondary Transform as its Ray Origin, driven by the ray origin and direction controls. We’ll include an example scene to show how this is set up.
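To make the "secondary Transform as Ray Origin" idea concrete, here's a minimal sketch of a component that could drive such a Transform. It assumes the same generated PointerInput action wrapper used in the script above; the component and field names here are hypothetical, not part of the package.

```csharp
using UnityEngine;
using UnityEngine.XR.VisionOS.InputDevices;

public class SpatialPointerRayOrigin : MonoBehaviour
{
    // Assign this same Transform as the XRRayInteractor's Ray Origin Transform
    [SerializeField]
    Transform m_RayOrigin;

    PointerInput m_PointerInput;

    void OnEnable()
    {
        m_PointerInput ??= new PointerInput();
        m_PointerInput.Enable();
    }

    void OnDisable()
    {
        m_PointerInput.Disable();
    }

    void Update()
    {
        var state = m_PointerInput.Default.PrimaryPointer.ReadValue<VisionOSSpatialPointerState>();

        // The selection ray is only valid for the first event of a pinch,
        // so cache its pose on Began and leave the Transform alone afterward
        if (state.phase == VisionOSSpatialPointerPhase.Began)
        {
            m_RayOrigin.SetPositionAndRotation(
                state.startRayOrigin,
                Quaternion.LookRotation(state.startRayDirection));
        }
    }
}
```

The interactor then casts from wherever the gaze ray was pointing at the moment of the pinch, even though the ray data itself never updates mid-gesture.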

If you need to implement a “poke” interaction, or anything that relies on tracking the hand continuously, you can use the XR Hands package.
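As a rough illustration of that approach, here's a sketch of a poke check built on the XR Hands package. It polls the right index tip each frame and does a sphere overlap test; getting the subsystem via the active XR loader, and the poke radius, are assumptions for the example (note that joint poses are reported in session space, so you may need to transform them by your XR Origin).

```csharp
using UnityEngine;
using UnityEngine.XR.Hands;
using UnityEngine.XR.Management;

public class IndexTipPoke : MonoBehaviour
{
    // Hypothetical poke radius; tune to taste
    [SerializeField]
    float m_PokeRadius = 0.01f;

    XRHandSubsystem m_Subsystem;

    void Update()
    {
        // Lazily grab the running hand subsystem from the active XR loader
        m_Subsystem ??= XRGeneralSettings.Instance?.Manager?.activeLoader?
            .GetLoadedSubsystem<XRHandSubsystem>();

        if (m_Subsystem == null || !m_Subsystem.rightHand.isTracked)
            return;

        var tip = m_Subsystem.rightHand.GetJoint(XRHandJointID.IndexTip);
        if (!tip.TryGetPose(out var pose))
            return;

        // Treat the fingertip as a small sphere and check for overlapping colliders
        foreach (var collider in Physics.OverlapSphere(pose.position, m_PokeRadius))
            Debug.Log($"Poked {collider.name}");
    }
}
```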


Thanks, this is saving me a ton of time!

Thank you for your answers, that's really helpful, but I'm a bit concerned. Does this mean that I can't give my user feedback on what they are currently looking at, because the gaze is only detected after their input?


Thanks for your answers! This piece of code was very useful for my understanding @mtschoen .
I echo @Cowrist's concerns, so there's no way to give our players feedback on what they're looking at?
Is this the fault of Apple's privacy restrictions, as @puddle_mike suspects?

FYI - if you are an app that’s able to use PolySpatial (and its RealityKit-based rendering), then there is a pathway for you to get gaze highlights on things. It’s just that the OS has to render them for you, since privacy restrictions only allow applications to receive the data when given “implicit permission” via a pinch.
But for fully-immersive apps that do their own rendering (relying on Metal-based rendering with URP or built-in render pipeline), we are sorta outta luck unless Apple decides to allow us to request permission for eye gaze data (like they do for hands). I’d bring this up with them if you have a direct line :slight_smile:


(I’ll note that I’m unsure if gaze highlights are possible with PolySpatial when in fully immersive mode, it might just be for mixed reality. Someone with more experience can clarify…)

You are correct. Gaze highlights are only available with PolySpatial in mixed reality when rendering with RealityKit. There is no way to highlight an object based on gaze when rendering with Metal. Please do share this feedback with Apple.


We would also like more ability to select objects players are looking at with their eyes.

In case this helps: we've found it useful to separate "gaze" into two buckets: "eye tracking" (where the player is looking) and "head tracking" (the general direction of the player's head pose). The answers to visionOS / PolySpatial questions differ based on which form of gaze you are referencing.
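The head-tracking bucket is worth calling out because, unlike eye gaze, it is available every frame. A minimal sketch of continuous head-gaze feedback, assuming the main camera tracks the head pose (component and logging are illustrative only):

```csharp
using UnityEngine;

public class HeadGazeHighlighter : MonoBehaviour
{
    Transform m_LastTarget;

    void Update()
    {
        // Cast a ray straight out of the head pose every frame
        var head = Camera.main.transform;
        if (Physics.Raycast(head.position, head.forward, out var hit))
        {
            if (hit.transform != m_LastTarget)
            {
                m_LastTarget = hit.transform;
                // Swap materials, scale up, etc. here for continuous feedback
                Debug.Log($"Head gaze over {m_LastTarget.name}");
            }
        }
        else
        {
            m_LastTarget = null;
        }
    }
}
```

It's coarser than eye tracking, but it's the only "gaze" signal a Metal-rendered app can use for pre-pinch highlighting under the current privacy rules.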

It seems that when one hand maintains the pinch, the other hand can only start a new gesture, but it does not stop again and does not update if starting another gesture. Is this known/expected behavior?
I’d like to send multiple secondary inputs while maintaining the primary gesture, is this possible with these OS provided inputs?

THANK YOU!!! This saved me so much time. You are a hero.

No, this is not expected. It sounds similar to this issue which I am looking into today. So far I’ve been testing with our InputDataVisualization sample scene and haven’t seen these types of issues. Do you have a specific scenario where you’re seeing this occur? Can you get it to happen in that sample scene (or any of our other samples)?

I managed to get the pinch gestures working in my app, but ended up having to configure the input actions in a way that seems wrong. I originally did the input action binding such that “Spatial Pointer #0” was left hand, and “Spatial Pointer #1” was right hand. Like this:

In the end I found that the right hand data only comes through if I bind the right hand to “Primary Spatial Pointer” like this:

Is this expected? I’m worried that there is some notion of primary/dominant hand in the OS, and that this will break down for lefties…

No, there is no concept of primary/dominant hand. Are you sure that it’s about your left/right hand? From my experience, Primary Spatial Pointer/Spatial Pointer #0 is the first hand you use, not left vs. right. This action map looks right to me, except for the labels -Left and -Right. At the moment, unfortunately, there is no concept of handedness for this input mechanism. We just get a list of SpatialEventCollection.Event structs. They have an ID value which is consistent for the lifetime of the event, but all we can do on the Unity side is assign them to #0 or #1 depending on which one came in first.

You can think of it similarly to a touch screen. You can detect >1 touch, and each touch has a “finger ID,” but there’s no way of knowing which finger (left index, right pinky, etc.) did the touch. Of course, the platform knows what hand you used to pinch, but that data is not exposed via this input API. The closest you could probably do to handed input is to also enable hand tracking (using an immersive space) and figure out which touch is closest to the index tip joint of the left or right hand.
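That "closest index tip" heuristic could be sketched like this, using the XR Hands package. The helper name and the distance comparison are assumptions for illustration, not part of any API:

```csharp
using UnityEngine;
using UnityEngine.XR.Hands;

public static class PointerHandedness
{
    // Guess which hand produced a pointer event by comparing its device
    // position to the tracked index tip joints of each hand
    public static Handedness Guess(Vector3 pointerDevicePosition, XRHandSubsystem hands)
    {
        var left = DistanceToIndexTip(hands.leftHand, pointerDevicePosition);
        var right = DistanceToIndexTip(hands.rightHand, pointerDevicePosition);

        if (float.IsInfinity(left) && float.IsInfinity(right))
            return Handedness.Invalid;

        return left < right ? Handedness.Left : Handedness.Right;
    }

    static float DistanceToIndexTip(XRHand hand, Vector3 position)
    {
        if (!hand.isTracked || !hand.GetJoint(XRHandJointID.IndexTip).TryGetPose(out var pose))
            return float.PositiveInfinity;

        return Vector3.Distance(pose.position, position);
    }
}
```

Keep in mind the pointer's device position and the hand joint poses need to be in the same space for the comparison to be meaningful.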

We’ve asked for a handedness property on touch events in the past, but as always I encourage you to submit feedback to Apple to amplify this signal.

Sorry I can’t see the InputDataVisualization scene anywhere in the template project. Is there another example/test project I have missed? Or perhaps you can share the scene somehow?

It's in the samples. If you look at the PolySpatial package in the Package Manager UI and click on Samples, you should see an option to import the PolySpatial Samples. You will find InputDataVisualization under Assets/Samples/PolySpatial/Scenes.

Oh interesting, I think I'll have to do some more testing. My recollection is that I was always seeing the left hand as Spatial Pointer #0 (though it's possible I always pinched first with my left). I never saw Spatial Pointer #1 produce any data, but I did receive data when I used Primary Spatial Pointer (and it was always the right hand in my limited testing). The thing I'm not sure about is what order I pinched in when I tested…

These are good clues though; I can look at the finger tips using the hand tracking data to determine handedness if I have to…

Yeah I’d be curious to know if this setup isn’t working. It looks close enough to our PolySpatialInputActions asset in the samples that it should work the same way. If you still have any issues, please test the InputDataVisualization scene in the package samples (Package Manager UI > PolySpatial > Samples) to verify that you can pinch on the test objects with both hands.

Cheers, found it. Has anyone ported this to VisionOSSpatialPointer already by any chance?
Edit: Nvm.