FlightStats Open Source Project: 3 Ways the Hub Makes Data Storage and Transport Easier for Any Business

According to IDC, the world’s data is doubling every two years and most big data applications need to be able to access data all the time. So what are businesses doing to store and transport all of this data that powers their apps and analytics?

Traditionally, companies have relied on databases; however, creating a database is hard and expensive. Additionally, managing a database can become burdensome for both the team that maintains the data and the team that needs to consume the data.

At FlightStats, we removed many of the risks and complexities associated with data management by creating an alternative to the typical, clunky database. It’s called the Hub, and it’s a data management platform that has helped the FlightStats team develop data products faster and more easily than many companies in our industry. The best part is that it can be used by any type of business that handles any type of data, whether or not its team is experienced in data management.

Here are three ways the Hub can streamline your data storage and transport:

1. Simplifies your data replication process

In the data industry, you have to be moving or getting data from someone else at all times, which means you have to have a process in place for replicating data so you can consume it or send it. At FlightStats, we often have multiple teams working with a common dataset at the same time. Everyone in our company needs to touch flight data and we needed a system that could allow them quick access.

Unfortunately, setting up data replication in a traditional database can cause it to break, and there are few protections in place to prevent unwanted modifications when more than one person is working with the data.

The Hub makes data replication easier with no risk of losing or damaging the data you’re working with. Once the data is in the Hub, it’s written in there forever. If you want to make a change to the data, then you have to replicate it as a brand new addition. The original version will always be there and so will each new version you created when you made a modification or change. This process provides safety rails so when you have many people touching the same data at the same time, you don’t have to worry about anyone making unwanted changes to anything. They will have to publish something new and build upon it, which leaves your version untouched.
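To make the write-once idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Hub’s actual API: the class name AppendOnlyChannel, its publish and read methods, and the sample flight payloads are all made up for illustration. It only shows the principle that a change is published as a new item while the original stays untouched.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Item:
    """One stored entry; frozen=True blocks reassignment of its fields."""
    sequence: int
    payload: str


class AppendOnlyChannel:
    """Hypothetical write-once store (illustration only, not the Hub's API)."""

    def __init__(self) -> None:
        self._items: List[Item] = []

    def publish(self, payload: str) -> Item:
        # Every write becomes a new item; there is no update or delete method.
        item = Item(sequence=len(self._items), payload=payload)
        self._items.append(item)
        return item

    def read(self, sequence: int) -> Item:
        return self._items[sequence]

    def __len__(self) -> int:
        return len(self._items)


channel = AppendOnlyChannel()
# Made-up sample payloads.
original = channel.publish("flight 123: scheduled")
# A "change" is published as a brand new item that sits alongside the original.
revised = channel.publish("flight 123: delayed")
```

After both calls, channel.read(0) still returns the original payload, and anyone who wants a different version has to publish another item rather than edit an existing one.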

2. Keeps your data organized across all teams

Whenever you have a big bucket of data, you have to organize it in some way. Every database is able to look at a file full of items and will allow you to arrange them in some order. You can even pick multiple ways of ordering. At first, this sounds like a good idea; however, organizing data in more than one way can quickly become very complex. In fact, it’s often too complex if you’re sharing data among multiple teams.

As you add more components, one team might organize its data one way and another team will organize it another way. In an ideal world, all teams would agree on one way to organize data, but that’s unlikely. As a result, sharing data becomes difficult for everyone involved.

We removed the complexity of organizing data by selecting a single way to organize it. All the data that flows through the Hub is sorted and organized by time. We picked time because it’s common to most industries. Everything happens in time. For example, in the aviation industry, a plane must taxi, take off, land and taxi again in a time sequence.

The Hub forces everyone on every team to speak the same language so they can get more done. It’s like going into a library where, instead of following the Dewey Decimal System, everything is organized by time. If you walked into a library and only had to know what time it was, it would be much simpler to find what you need.
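The time-based ordering described in this section can be sketched in a few lines of Python. Again, this is a toy model under stated assumptions rather than the Hub’s real interface: the TimeOrderedChannel class, its publish and between methods, and the timestamps are hypothetical. The point is that when time is the only ordering, "what happened in this window?" is the one question everyone asks the same way.

```python
import bisect
from datetime import datetime, timedelta
from typing import List, Tuple


class TimeOrderedChannel:
    """Hypothetical store that keeps items sorted by time (not the Hub's API)."""

    def __init__(self) -> None:
        self._times: List[datetime] = []
        self._payloads: List[str] = []

    def publish(self, when: datetime, payload: str) -> None:
        # Insert at the position that keeps both lists in time order.
        index = bisect.bisect_right(self._times, when)
        self._times.insert(index, when)
        self._payloads.insert(index, payload)

    def between(self, start: datetime, end: datetime) -> List[Tuple[datetime, str]]:
        # Return every item with start <= time < end, oldest first.
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._payloads[lo:hi]))


# Made-up timeline for a single flight, published out of order on purpose.
base = datetime(2024, 1, 1, 9, 0)
channel = TimeOrderedChannel()
channel.publish(base + timedelta(minutes=20), "take off")
channel.publish(base, "taxi out")
channel.publish(base + timedelta(minutes=150), "taxi in")
channel.publish(base + timedelta(minutes=140), "land")

first_hour = channel.between(base, base + timedelta(hours=1))
```

Here first_hour holds the "taxi out" and "take off" events in that order, regardless of the order in which they were published.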

3. Increases your scalability and flexibility

Most databases naturally grow and end up becoming large, slow-moving, monolithic structures as more data and more complex organization methods are applied to them. Typically, they’re hard to expand, expensive to maintain and inflexible. They’re not easy to run in a cloud environment, which translates into a complicated development process and a slow path to market. It is not uncommon for a typical database system to stretch product development out to an entire year.

At FlightStats, we have been known to take an idea and build a whole product within the timeframe of a week, which is considered alarmingly fast in the aviation industry. One of the main reasons we are able to do that is the flexibility offered by the Hub. Our Hub has a lot of data in it, but since it’s flexible, it can be scaled in more elastic ways. If you need more data, then you can add more without any physical limitations.

You don’t have to make additional investments

Whenever you introduce new technology into your company, you might be fearful about the amount of time and resources it will take to transition into using it. Will you have to hold training? Is it necessary to bring in a specialist?

With a system like the Hub, there is no master’s degree or specific experience level required to get to work. You can hire young engineers right out of college rather than veteran coders who are hard to find and costly to obtain. If you already have engineers on staff, the barrier to entry for understanding and using the Hub is significantly lower for them. They can learn how to use the Hub in about 30 minutes to an hour.

Interested in getting started? You can access the Hub through the FlightStats GitHub project. If you have any questions, please feel free to reach out to us.