Historical Reference Data with ArangoDB

When traveling from point A to point B you want an overall pleasant journey. Compared to driving, flying is much more complicated due to multiple factors that can delay your flight (weather or air traffic) or increase waiting times (security check, luggage check-in, flight delays, etc.). FlightStats lays the data-centric groundwork for a perfect journey.

All major airlines, airports, hotels, and leading search engines use FlightStats' products to help improve their services. Our data reaches more than 35% of global air travelers each day, and data requests to our systems exceed 300 million each month.

The Challenge

Aviation data is fragmented and does not follow a single standard. Flight status, weather, reference data, FAA airport delays, and many other data sources may affect the whole customer journey from point A (e.g. home) to the final destination B (e.g. hotel). Data from all these sources has to be harmonized, normalized, and transformed to get accurate analytical or predictive results.

Collection-Sources-Diagram

We maintain and use reference data for airlines, airports, and equipment. This data had been maintained across many tables in a relational database. We were running into multiple data-related challenges and decided we needed to do the following:

  1. Store reference data with temporal information
  2. Make it easier to change schema as future needs arise
  3. Improve our authoring web UI
  4. Reduce load on our relational database
  5. Provide API access to the reference data

We needed a document database that allowed us to easily query the data by effective date. Storing and processing data from more than 30k airports and airlines is a challenge in itself, but for our needs it is also necessary to store each modification of those entities, together with its effective date, as a separate document (i.e. key = id + effective_date). All changes and their effective dates have a significant impact on historical reports and thus on our predictive analytics.
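To make the "key = id + effective_date" idea concrete, here is a minimal in-memory sketch in Python. It is not our production code and does not talk to any database. The class name `TemporalStore`, the key separator, the field names, and the sample airport values are all assumptions made up for illustration.

```python
from bisect import bisect_right
from datetime import date


def make_key(entity_id: str, effective: date) -> str:
    # Hypothetical key layout: entity id and ISO effective date joined by "_".
    return f"{entity_id}_{effective.isoformat()}"


class TemporalStore:
    """Keeps one document per (entity id, effective date) pair."""

    def __init__(self):
        self._docs = {}   # key -> document
        self._dates = {}  # entity id -> sorted list of effective dates

    def put(self, entity_id: str, effective: date, fields: dict) -> None:
        key = make_key(entity_id, effective)
        self._docs[key] = {
            "_key": key,
            "id": entity_id,
            "effective_date": effective.isoformat(),
            **fields,
        }
        dates = self._dates.setdefault(entity_id, [])
        if effective not in dates:
            dates.append(effective)
            dates.sort()

    def as_of(self, entity_id: str, when: date):
        """Return the version in effect on `when`, or None if none exists yet."""
        dates = self._dates.get(entity_id, [])
        # Index of the first effective date strictly after `when`.
        i = bisect_right(dates, when)
        if i == 0:
            return None
        return self._docs[make_key(entity_id, dates[i - 1])]


# Illustrative data only.
store = TemporalStore()
store.put("XYZ", date(2010, 1, 1), {"name": "Example Airport (old name)"})
store.put("XYZ", date(2015, 6, 1), {"name": "Example Airport (new name)"})
```

A historical report for March 2012 would call `store.as_of("XYZ", date(2012, 3, 1))` and get the 2010 version. A change with a future effective date can be stored today without affecting lookups for earlier dates.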

Data-Services-Diagram

For our partners and data sources we needed a solution that enables easy access from multiple environments and provides a versatile, performant API layer. Because our company was growing in terms of API requests, external data sources, and quantity of data, we also needed a scalable solution.

Furthermore, our team was, and still is, growing fast. We therefore needed good documentation for the technology we use at FlightStats to get new team members up to speed, especially on advanced topics.

Our Solution

Our reference data is best suited to a document tree format, so using a dedicated document store was our first idea. We experimented with a variety of databases, some of them market leaders in NoSQL, but quickly ran into limitations such as expressing non-trivial queries in a readable way or pushing performance to the required level.

By chance we found ArangoDB from Germany, and it turned out the database was a perfect fit for us. ArangoDB is a multi-model database that lets you handle data as key/value pairs, as graphs, and of course as documents. It addressed all of our requirements:

  1. We tested ArangoDB intensively and performance was really good.
  2. It enabled us to easily access stored data from multiple environments through its HTTP APIs. Currently we are hitting ArangoDB from a Node.js web app as well as a Clojure server app.
  3. Thanks to Foxx, ArangoDB's framework for data-centric microservices, we were able to provide new, sophisticated APIs very quickly and run the logic directly in the database.
  4. The documentation was good enough to quickly get us up and going with ArangoDB; it has a good amount of depth for advanced topics; and we feel it could easily be consumed by other team members at FlightStats. It's also the best API documentation I've seen (Swagger API implementation).
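As a rough illustration of what access over HTTP can look like, the sketch below builds (but does not send) a request to ArangoDB's cursor endpoint, `POST /_db/{database}/_api/cursor`, carrying an AQL query that picks the latest version of an entity as of a given date. The collection name `airports`, the field names `id` and `effective_date`, the database name, and the host are assumptions for illustration, not our actual schema. Authentication is omitted.

```python
import json
import urllib.request

# AQL sketch: newest document for an entity whose effective date is not after @asOf.
# Assumes effective_date is stored as an ISO 8601 string, so string comparison
# matches chronological order.
AS_OF_QUERY = """
FOR doc IN @@collection
  FILTER doc.id == @id AND doc.effective_date <= @asOf
  SORT doc.effective_date DESC
  LIMIT 1
  RETURN doc
"""


def build_cursor_request(base_url: str, database: str, collection: str,
                         entity_id: str, as_of: str) -> urllib.request.Request:
    """Build a POST request for the cursor API without sending it."""
    body = {
        "query": AS_OF_QUERY,
        # Collection bind parameters are written @@name in AQL and keyed "@name" here.
        "bindVars": {"@collection": collection, "id": entity_id, "asOf": as_of},
    }
    return urllib.request.Request(
        url=f"{base_url}/_db/{database}/_api/cursor",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, database, and entity id.
request = build_cursor_request(
    "http://localhost:8529", "reference", "airports", "XYZ", "2012-03-01"
)
```

Passing `request` to `urllib.request.urlopen` would execute the query against a running server. Because the interface is plain HTTP and JSON, the same request can be issued from any environment.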

With all of these issues resolved, we could start our project and store the reference data for airlines, airports, weather, equipment, and locations with their effective dates.

Our Benefits

Reference data is now used for all of our data products that refer to airports (terminals and gates, referred to from trips and flights), airlines (referred to from trips, flight alerts, etc.), and equipment (the type of aircraft). It is vital that the data be accurate and up to date. ArangoDB makes it easy for us to add temporal effectiveness to the data, which is useful both for historical reporting and for making changes that we know will take effect in the near future.

A big bonus for us is the freedom to scale ArangoDB in line with FlightStats' needs. Other parts of our organization need graph models, and we can easily bring those teams up to speed using our experience with ArangoDB.

AQL (ArangoDB Query Language) is a powerful query language, and its intuitive nature made it easy to adopt. We work in a fast-paced environment where quick iterations and prototyping are very important. With the microservice framework Foxx we save a lot of time and effort and get basic environments up and running in hours. Our team is constantly growing, and thanks to the good documentation we are able to bring new members up to speed quickly.

Overall, ArangoDB is a sophisticated technology that meets our high quality requirements, and we are keen to extend the implementation to other use cases.