Use ARKit hand tracking, estimate which object the user is looking at from the pose (ARKit features require an unbounded app), and then calculate the swipe gesture yourself.
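The "calculate the swipe gesture yourself" part boils down to thresholding fingertip motion over a short window. A minimal sketch in Python for brevity (the `Sample` type, the threshold values, and the assumption that positions are already in view-aligned coordinates, x right and y up, are all illustrative, not an established API):

```python
from dataclasses import dataclass

@dataclass
class Sample:
    t: float  # timestamp in seconds
    x: float  # fingertip position in meters (e.g. index tip joint from hand tracking)
    y: float
    z: float

def detect_swipe(samples, min_distance=0.10, max_duration=0.5, axis_ratio=2.0):
    """Return 'left'/'right'/'up'/'down' if the fingertip moved far enough,
    fast enough, and dominantly along one axis; otherwise None."""
    if len(samples) < 2:
        return None
    start, end = samples[0], samples[-1]
    dt = end.t - start.t
    if dt <= 0 or dt > max_duration:
        return None  # too slow (or bad timestamps) to count as a swipe
    dx = end.x - start.x
    dy = end.y - start.y
    # require one axis to clearly dominate, so diagonal wiggles are rejected
    if abs(dx) >= abs(dy) * axis_ratio and abs(dx) >= min_distance:
        return 'right' if dx > 0 else 'left'
    if abs(dy) >= abs(dx) * axis_ratio and abs(dy) >= min_distance:
        return 'up' if dy > 0 else 'down'
    return None
```

In a real app you would feed this a rolling buffer of joint positions per frame and project them into the camera's view space first; both steps are omitted here.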
Set up a collider near the user that they can interact with; it registers the input, which you then translate to the distant object.
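The core of that proxy-collider approach is just remapping a small near-field drag onto the far object. A hedged sketch of the mapping (the function name and the `gain` factor are my own; positions are (x, y, z) tuples in meters):

```python
def map_drag_to_target(proxy_start, proxy_now, target_start, gain=5.0):
    """Translate a small drag on a nearby proxy collider into a larger
    movement of the distant target by scaling the drag delta."""
    return tuple(t + gain * (n - s)
                 for t, n, s in zip(target_start, proxy_now, proxy_start))
```

Each frame you would record the pointer position on the proxy when the interaction begins, then move the distant object to `map_drag_to_target(...)` as the drag updates; the gain controls how far the distant object travels per centimeter of hand movement.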
There is no "finger swipe gesture", and the OS requires an object with a collider to be interacted with (direct or indirect) in order to register input. There is also no way to know which object the user is looking at until input is registered.
I feel like a mid/high-level interaction framework is needed for XR hands. This is what the Leap Unity plugin (and MRTK) provided, but getting them to work with PolySpatial is nontrivial.