Extracting Data From OpenAir Without API Access
I recently needed to pull a lot of data out of OpenAir in order to audit some records specific to each employee of the organization.
Ordinarily this sort of task would come with API access to the system in question, and it would be fairly trivial to retrieve the required data and offload it to my workstation for the requisite processing.
Unfortunately, I do not have API access to the OpenAir instance in question. Furthermore, the instance is accessed through Okta, which adds another layer to the problem. Without the Okta layer in place, I might be able to hit OpenAir directly from a script.
So how do we access hundreds of pages of data on a website that sits behind another website, and which provides no documented API access?
Let’s try Selenium.
The Okta issue is actually pretty easy to solve. If we tell Selenium to navigate to the Okta login page, and feed the appropriate credentials to the relevant form elements, it’ll log us in to the Okta instance.
Please note that in the script below, we’re storing the credentials in a separate file.
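As a rough illustration of the approach described above (not the original script), here is a minimal Python sketch of reading credentials from a separate file and handing them to the Okta login form through Selenium. Everything specific in it is an assumption: the credentials.json file name and its "username"/"password" keys, the example.okta.com URL, and the element IDs (okta-signin-username, okta-signin-password, okta-signin-submit), which should be checked against your own login page with the browser's inspector.

```python
import json
from pathlib import Path


def load_credentials(path):
    """Read the Okta username and password from a JSON file kept outside the script."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail early with a readable message if either value is absent or empty.
    missing = [key for key in ("username", "password") if not data.get(key)]
    if missing:
        raise ValueError("credentials file is missing: " + ", ".join(missing))
    return data["username"], data["password"]


def okta_login(driver, login_url, username, password, timeout=20):
    """Open the Okta login page, fill in the form, and submit it."""
    # Imported here so load_credentials can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(login_url)
    wait = WebDriverWait(driver, timeout)

    # The element IDs below are assumptions; inspect your own login page.
    user_field = wait.until(
        EC.presence_of_element_located((By.ID, "okta-signin-username"))
    )
    user_field.send_keys(username)
    driver.find_element(By.ID, "okta-signin-password").send_keys(password)
    wait.until(EC.element_to_be_clickable((By.ID, "okta-signin-submit"))).click()


def main():
    from selenium import webdriver

    # Hypothetical file name and URL; substitute your own.
    username, password = load_credentials("credentials.json")
    driver = webdriver.Chrome()
    try:
        okta_login(driver, "https://example.okta.com/", username, password)
        # Navigation to OpenAir would continue here, in the same session.
    finally:
        driver.quit()
```

Calling main() runs the whole flow. Keeping the credentials in their own file means the script itself can be shared or committed without exposing a password, provided the credentials file is excluded from version control.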