I’m writing a tool that relies on the API’s ability to list all existing exports (GET api/v2/projects/{UNITY_PROJECT_ID}/rawdataexports).
I assumed that the API would return all exports, even if there were a large number of them, since the documentation did not mention anything about pagination or limits.
Now that I have about 70 completed exports, I noticed that the API only returns the latest 50 exports.
Assuming this limitation is a feature, is there any workaround to retrieve older exports? I’d love to be able to pass a minimal date, or even page index parameters, but I’ll take any solution at this point.
This looks like an oversight in our documentation. Our system does currently limit you to the last 50 exports. However, you can still retrieve older jobs if you have the job-specific ID:
GET api/v2/projects/{UNITY_PROJECT_ID}/rawdataexports/{raw_data_export_id}
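For reference, a minimal Python sketch of that single-export lookup. The base URL and the HTTP Basic auth scheme (project ID as user name, API key as password) are assumptions here, and export_url and fetch_export are hypothetical helper names; only the path itself comes from this thread.

```python
import base64
import json
import urllib.request
from typing import Optional

# Assumption: base URL of the service; adjust if your endpoint differs.
BASE_URL = "https://analytics.cloud.unity3d.com"


def export_url(project_id: str, export_id: Optional[str] = None) -> str:
    """Build the list URL, or the single-export URL when an ID is given."""
    url = f"{BASE_URL}/api/v2/projects/{project_id}/rawdataexports"
    return url if export_id is None else f"{url}/{export_id}"


def fetch_export(project_id: str, api_key: str, export_id: str) -> dict:
    """Fetch one export by ID (performs a network call)."""
    # Assumption: HTTP Basic auth, project ID as user and API key as password.
    token = base64.b64encode(f"{project_id}:{api_key}".encode()).decode()
    request = urllib.request.Request(
        export_url(project_id, export_id),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

This only helps when the export ID is already known, for example because it was stored locally when the export was created.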
If you provide your project ID (in a PM) we can manually pull the full list of exports for you.
We have plans to improve this service, so any feedback you can provide would be helpful to that end.
What is your use case? You mentioned you are making a tool that relies on RDE, could you provide details on what that tool does?
What would be the ideal solution to this problem for you?
How far back do your exports go?
Any features you would like to see for this service?
Any other feedback you can provide would be very appreciated.
I wish this had been in the documentation before I started writing the tool
Sure, I’m using /rawdataexports/{raw_data_export_id} as well, but it won’t help to retrieve older jobs if you don’t have their job ID…
Thank you for the offer, but I’m looking to automate the process, see my scenario below.
I’m currently writing a server toolchain with the general aim to (1) keep a local, up-to-date stash of raw data files and (2) mine these files and provide custom analytics dashboards.
The tool I mentioned is concerned with (1). In my scenario, the server goes online and wants to (A) make sure it has downloaded all raw data since forever (project epoch), up to today. If it has already downloaded a bunch of exports up to some date in the past, it wants to (B) request and download continuations of the last export, up to today.
With the exception of the very first export ever to be created for a particular dataset, the tool only creates and downloads continuation exports (whose request.continueFrom != null), in order to maintain a trail of exports that guarantees no raw data event is missed.
In this scenario, when a server goes online for the first time and wants to determine (A) if an already created trail of exports is available, it relies on the ability to list all exports ever. It makes a graph from this pool of exports by following the request.continueFrom links. Then, it trims this graph until it finds the best trail of existing exports to download. Or, if none is found, it creates an entirely new trail of exports that spans from epoch to today.
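To make the trail search concrete, here is a rough Python sketch of the idea, not my actual tool. It assumes each export is a dict with an "id" field and a "request" object holding "continueFrom" (only request.continueFrom is confirmed above), and it takes "best trail" to mean the longest chain that goes all the way back to a first export; longest_trail is a hypothetical name.

```python
from typing import Optional


def parent_id(export: dict) -> Optional[str]:
    """Return the ID this export continues from, or None for a first export."""
    return export.get("request", {}).get("continueFrom")


def longest_trail(exports: list) -> list:
    """Return export IDs, oldest first, of the longest complete trail."""
    by_id = {e["id"]: e for e in exports}
    best = []
    for export in exports:
        trail, seen = [], set()
        current = export
        # Walk up the continueFrom links until the pool runs out.
        while current is not None and current["id"] not in seen:
            seen.add(current["id"])
            trail.append(current["id"])
            current = by_id.get(parent_id(current))
        # Keep the trail only if it really ends at a first export
        # (no parent), not at a parent missing from the pool.
        oldest = by_id[trail[-1]]
        if parent_id(oldest) is None and len(trail) > len(best):
            best = list(reversed(trail))
    return best
```

An empty result means no complete trail exists in the pool, which is the case where a new trail has to be created from epoch.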
As you can see, (A) requires the ability to list potentially very old exports. The last 50 exports are not enough, because in case (A), the server does not know any particular export ID to query. It could try to fetch the last 50 export IDs and then recursively fetch each parent of these, but this won’t give a complete view of all exports, since there could exist export tree leaves that are older than the last 50 exports. If this happens, in its current implementation, my tool will simply request a whole trail of exports to be created from epoch to today, which is kind of stupid since these exports have already been made and are already in the cloud’s storage space.
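The "fetch the last 50, then fetch each parent" workaround could look roughly like this in Python. fetch_export is a hypothetical callable wrapping GET /rawdataexports/{raw_data_export_id} that returns the export dict or None, and the "id" field name is an assumption.

```python
def collect_with_ancestors(listed: list, fetch_export) -> dict:
    """Return every export reachable from the listed ones via continueFrom."""
    known = {e["id"]: e for e in listed}
    pending = [e.get("request", {}).get("continueFrom") for e in listed]
    while pending:
        export_id = pending.pop()
        if export_id is None or export_id in known:
            continue  # first export of a trail, or already fetched
        export = fetch_export(export_id)
        if export is None:
            continue  # parent no longer retrievable
        known[export_id] = export
        pending.append(export.get("request", {}).get("continueFrom"))
    return known
```

The walk only ever moves towards parents, so a leaf that has dropped out of the last 50 is never visited, which is exactly the gap described above.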
Also, do notice that in case (B), where the server knows the ID of the last export that has been downloaded, the same problem can occur. Because the schema for exports only includes parents (request.continueFrom) and not children, it is impossible to construct a full tree of exports using only /rawdataexports/{raw_data_export_id}. In this scenario, the tool must list all exports anyway and build the tree from the bottom up. Again, if some leaves are older than the last 50 exports, it may become impossible to retrieve a huge number of exports that actually exist, which again leads to requesting brand new trails of exports.
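Since the schema only stores parents, the children have to be derived on the client by inverting the continueFrom links of whatever the list call returns. A small Python sketch under the same assumptions as before ("id" field name assumed; children_index and leaves_below are hypothetical names):

```python
from collections import defaultdict


def children_index(exports: list) -> dict:
    """Map each parent export ID to the IDs of exports that continue it."""
    children = defaultdict(list)
    for export in exports:
        parent = export.get("request", {}).get("continueFrom")
        if parent is not None:
            children[parent].append(export["id"])
    return dict(children)


def leaves_below(start_id: str, children: dict) -> list:
    """Return the leaf exports reachable from start_id through continuations."""
    leaves, stack, seen = [], [start_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        kids = children.get(current, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(current)  # nothing continues this export
    return leaves
```

Starting from the last downloaded export, leaves_below gives the candidate exports to continue from, but only among the exports the list call actually returned.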
In a way, I could imagine the problem does not really matter to you in terms of storage space, given that the monthly limit of 50 GB is particularly huge. But, as a side note, this does raise the question of export lifetime: what happens when Unity Analytics decides that an export has expired? Will the export disappear from the list? Will it get a different status (“expired”)? Will its continuation (child) exports be disposed of as well? Can a disposed export still be used as the base for a continuation (request.continueFrom)? I can see several things going wrong there, as far as my tool is concerned…
At the very least, if an export is still available to download, I’d want it to be retrievable using GET /rawdataexports. I would like to assume that if an export cannot be retrieved from this API, it simply does not exist anymore.
Ideally, GET /rawdataexports would return all exports, period. If response size matters, pagination (get page #n of all exports) or filtering (get all exports whose start and end date are between a min and max date) would be fine as well. But at some point, my tool would like to discover all exports that exist, so a bunch of big API requests will have to be made…
Also ideally, there would exist an API or export schema field that, given an export, lists all export IDs that continue this export.
And finally, some documentation or idea of how export expiry is handled would be nice :)
The tool and I only started creating exports recently (in the past few months), but some of them reach back to late 2015.
Maybe a DELETE API to completely destroy an export and its continuations? My online dashboard is filled with test exports that I don’t have any use for anymore. And deleting them would give the extra perk of reducing the size of the GET /rawdataexports response.
Apart from this huge block of text I just wrote (sorry!), I personally love this service! I’m really grateful to be able to benefit from it. I’m trusting you guys will find cool ways to improve it even more.