In a previous post, I showed how to use Beam’s Regex class to split up a string. In this post, I’m going to show some other features of the Regex class. The Regex class gives you a distributed way to work with strings. I tried to make the...
There’s a friendly game among Big Data frameworks: who can do WordCount in the fewest lines of code. I’m a committer on Apache Beam, and most of my time is dedicated to making Beam easier for developers to use. I also help...
We’re coming up on that time of year when many people set their goals for the next year. Before you do that, reflect on how you did this year. If you accomplished a goal, how did you do it? If you didn’t accomplish a goal, what happened? Many people wrote in...
Unit testing your Kafka code is incredibly important. It’s transporting your most important data. This is especially true for your Consumers. They are the endpoint where the data actually gets used, and there are often many different Consumers using the same data. You’ll want to...
Unit testing your Kafka code is incredibly important. It’s transporting your most important data. As of 0.9.0, there’s a new way to unit test with mock objects. Refactoring Your Producer: First of all, you’ll need to be able to change your Producer at...
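A minimal sketch of the mock-object approach, assuming the kafka-clients library is on the classpath: the class under test receives the `Producer` interface through its constructor instead of building a `KafkaProducer` itself, so a test can hand it a `MockProducer` and then inspect what was sent. `GreetingSender`, the `greetings` topic, and the message format are hypothetical names chosen only for illustration; this is not necessarily how the full post structures its code.

```java
import java.util.List;
import org.apache.kafka.clients.producer.MockProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Hypothetical class under test. It depends on the Producer interface,
// not on KafkaProducer, so tests can swap in a MockProducer.
public class GreetingSender {
    private final Producer<String, String> producer;
    private final String topic;

    public GreetingSender(Producer<String, String> producer, String topic) {
        this.producer = producer;
        this.topic = topic;
    }

    // Sends one record keyed by the name, with a greeting as the value.
    public void send(String name) {
        producer.send(new ProducerRecord<>(topic, name, "Hello, " + name));
    }

    public static void main(String[] args) {
        // autoComplete = true: each send() completes successfully immediately.
        MockProducer<String, String> mock =
                new MockProducer<>(true, new StringSerializer(), new StringSerializer());
        GreetingSender sender = new GreetingSender(mock, "greetings"); // hypothetical topic
        sender.send("Ada");

        // history() returns every record passed to send(), in order.
        List<ProducerRecord<String, String>> sent = mock.history();
        System.out.println(sent.size() + " " + sent.get(0).value());
    }
}
```

Running `main` prints `1 Hello, Ada`. In production the same class would be constructed with a real `KafkaProducer`; only the test wiring changes.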
I’m often asked what I think will happen to Big Data over the next five to ten years. Developers who ask this are really asking whether investing their time in becoming a Data Engineer will pay off. We’re going to see a continuing maturity of...