- Big Data batch systems are 10x more complex than small data, while real-time systems are 15x more complex than small data.
- Programmers have the smallest increase in complexity when dealing with real-time systems, but they still need to learn the new system and any new APIs or concepts.
- Architects have the highest increase in complexity when dealing with real-time systems, as they need to create various real-time interactions and ensure the system handles errors and error conditions elegantly.
- Operations people have an increase in complexity in terms of risk mitigation, as they need to know what happens during a failure or outage and have a plan in place to handle them.
- All three functions (programming, architecture, and operations) need to consider what happens to real-time data during a failure or slowdown, an issue some technologies call back pressure.
In my seminal post On Complexity in Big Data I talked about the level of complexity increase with Big Data. The post itself focused on Big Data batch systems. I didn’t really cover real-time complexity increases when dealing with Big Data.
In the post, I argue that Big Data batch is 10x more complex than small data. I believe that real-time is 15x more complex than small data.
There are many reasons for this jump in complexity. For this post, I’m going to focus on how they affect specific job roles.
Programming
Programmers have the smallest increase in complexity. Keep in mind that there is still an increase in complexity between batch and real-time. A programmer will need to learn the new system and any new APIs or concepts.
Often, the same or similar distributed systems concepts are in play with real-time systems. However, the programmer will still need to understand how the real-time distributed system works.
If a programmer can barely handle batch, they will have a very difficult time with real-time.
Architecture
Architects have the highest increase in complexity. Creating the various real-time interactions is difficult. Making sure the system handles errors and error conditions elegantly is another level of complexity.
It is even more important that an architect understand the systems they’re using. Before they choose a real-time system, they should have looked at other real-time systems. That’s because each real-time system will cheat in a slightly different way. If an architect doesn’t understand these cheats or skips learning about them, that could render a vital part of the use case impossible.
Once the use case is understood and the technologies have been chosen, the architect needs to create a system that works. Not only that, they need to anticipate and design for what happens when there is an error. These could be errors from system failure, bad data, or code bugs. Assuming these will never happen leaves you figuring out failure scenarios while they’re happening.
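To make the data-error case concrete, here is a minimal sketch of one common way to plan for bad records: set them aside with the reason they failed instead of letting them stop the pipeline. This is an illustration, not a method prescribed in this post. The function name `process_stream`, the `dead_letters` list, and the `amount` field are all hypothetical, and a real system would write failed records to durable storage rather than to an in-memory list.

```python
import json


def process_stream(raw_records, handler):
    """Apply `handler` to each JSON record, setting aside any that fail.

    Returns (results, dead_letters). Both names are hypothetical and
    used only for this illustration.
    """
    results, dead_letters = [], []
    for raw in raw_records:
        try:
            record = json.loads(raw)          # malformed input raises JSONDecodeError
            results.append(handler(record))   # handler may raise on bad fields
        except (KeyError, TypeError, ValueError) as err:
            # JSONDecodeError is a subclass of ValueError, so it is caught here too.
            # Keep the original payload and the failure reason for later review.
            dead_letters.append({"raw": raw, "error": type(err).__name__})
    return results, dead_letters


def to_amount(record):
    """Hypothetical handler: read a numeric `amount` field."""
    return float(record["amount"])


incoming = ['{"amount": 5}', "not json", '{"total": 1}', '{"amount": "x"}']
results, dead_letters = process_stream(incoming, to_amount)
```

The design point is that the failure path is decided up front: one bad record does not halt the stream, and nothing is silently lost. System failures and code bugs need their own plans; this sketch only covers bad data.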
Note: On a data engineering team, the programming and architecture functions could be the same person or different people on the same team.
Operations
Operations people have an increase in complexity too, but not in the way most people think. With real-time, the SLA (service level agreement) or guarantees about downtime really come into play.
This downtime is one metric I use when evaluating teams. If a team says their real-time pipeline can be down for 6 hours, I postulate that they don’t even have a real-time use case. If their pipeline is so far from mission critical that 6 hours of downtime is acceptable, they’d be better off with a batch system.
To this end, the operational role for real-time systems is all about risk mitigation. What happens during a failure or outage? How well can you survive a failure? Some of these are handled, in part, by the real-time systems themselves. However, there are other failure scenarios that aren’t handled. The operations team needs to know about these cases and have a plan in place to handle them.
All Three
One complicated issue that relates to all three functions is what happens to real-time data during a failure or slowdown. Some technologies will call this back pressure. The design and plan will have to extend across programming, architecture, and operations.
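A small simulation can show why this needs a plan. The sketch below assumes a bounded in-memory buffer between a fast producer and a slow consumer. The names `offer` and `simulate` and the parameters `capacity` and `drain_every` are made up for illustration; real streaming systems implement back pressure in their own ways, so treat this as a toy model rather than how any particular technology behaves.

```python
import queue


def offer(buffer, event):
    """Try to enqueue an event; return False if the buffer is full."""
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        return False


def simulate(events, capacity, drain_every):
    """Feed events into a bounded buffer while a slow consumer removes
    one event per `drain_every` arrivals.

    Returns (processed, rejected). Rejected events are the ones the
    upstream side must deal with: retry, spill to disk, or drop.
    """
    buffer = queue.Queue(maxsize=capacity)
    processed, rejected = [], []
    for i, event in enumerate(events, start=1):
        if not offer(buffer, event):
            rejected.append(event)            # buffer full: pressure pushes back upstream
        if i % drain_every == 0 and not buffer.empty():
            processed.append(buffer.get_nowait())
    while not buffer.empty():                 # consumer catches up once input stops
        processed.append(buffer.get_nowait())
    return processed, rejected


processed, rejected = simulate(range(10), capacity=3, drain_every=2)
```

With these inputs the consumer falls behind, the buffer fills, and some events are rejected. What happens to those events is exactly the decision that programming, architecture, and operations have to make together.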
If you want your programming and architecture teams to get the best training on real-time systems, you can contact me about my Real-time Data Engineering in the Cloud class.