- Real-time Big Data is becoming increasingly important for organizations, teams, and individuals.
- However, in the past, we lacked the systems that could scale to the sizes and amounts of data needed for real-time processing.
- As a result, many organizations had to resort to batch processing over 24-hour windows, which didn't meet the needs of the business.
- Some teams tried to shrink their batch windows to smaller and smaller time spans, but this caused operational headaches and the systems couldn't keep up with the demand.
- The business wanted to be no more than a minute behind what's currently happening, which was impossible with batch processing.
One of the benefits of teaching and consulting is the sheer number of organizations, teams, and people I get to work with. Since I deal with so many different groups, I can see patterns emerge much faster than others.
One pattern I saw early on was real-time Big Data. Organizations wanted to do things in real-time. Teams had projects that required real-time. People had ideas that required real-time systems.
And we couldn’t do it.
We lacked the systems that could scale to the sizes and amounts of data needed. As a direct result, we had to resort to terrible workarounds.
As I work with my clients around the world, they’re moving from batch processing to real-time processing. They tell me stories about how they wanted to do real-time, but could only approximate it in batch.
Let me share one of their stories.
One large financial company was feeling the need for real-time processing. The use case required real-time, but the project was started at a time when real-time Big Data wasn’t feasible. As a result, they had to go with batch processing over 24-hour windows. This didn’t meet the needs of the business, but that was what was possible.
They tried to shrink their batch windows to smaller and smaller time spans. What started as a 24-hour batch window gradually decreased to 30-60 minutes. The business was all over the team to turn around the data faster and faster. It wasn’t acceptable to be 24 hours behind.
But the team couldn’t go any lower than 30-60 minutes. Going to a batch window that small caused all sorts of operational headaches. The systems just couldn’t keep up with the demand and the operations team crumbled.
The business wanted to be no more than a minute behind what’s currently happening. There was nothing more that the team could do. They had to move to a real-time system.
I mentored the team on their transition to real-time. They could finally accomplish their original use case and its requirements.
Now we have the systems that can scale and do real-time Big Data.
As I work with these teams on their moves to real-time, they’re able to circle back with the business and actually deliver. This is the part that I love. I love being able to remove the pain that a web of terrible workarounds causes and to produce a resilient real-time system.
This is why real-time is the future. The business and the use case wanted real-time processing all along. We as data engineers can deliver it now.