Writing your own distributed system shouldn’t be a task you undertake lightly. Too often, I see teams create their own distributed systems. In my experience, this is because they don’t know or think about all of the ramifications of creating...
You’re starting to learn about Big Data, or you want to learn more about it. You start off by googling ‘what is Big Data?’ You get an answer that doesn’t quite make sense. The site talks about 3 Vs or sometimes they’re 4...
As I’ve worked with software teams, I’ve found some interesting views on distributed systems. Some teams think they’re creators of distributed systems. They usually aren’t. I think there are three main groups of teams that interact with...
To achieve the scales of Big Data, you have to cheat in some way. Sometimes people call these tradeoffs. In Big Data, I prefer to call them cheats. A tradeoff makes it sound like a small thing, but the reality is that Big Data tradeoffs can make a use case possible or...
In my book, Data Engineering Teams, I talk about the right skills and people to have on a data engineering team. The right skills and people are incredibly important to the success, or failure, of a Big Data project. Sometimes it’s easier to understand this point...
We’re creating more and more complicated data pipelines and systems with Kafka. These interactions are becoming even more complex as we create microservices. As we create these complex systems, we aren’t thinking about how to test, debug, or fix them....