This post focuses on the key differences between Apache Kafka and Amazon Kinesis that a Data Engineer or Architect needs to know.

Cloud vs DIY

Some of the contenders for Big Data messaging systems are Apache Kafka, Amazon Kinesis, and Google Cloud Pub/Sub (discussed in...
When a Big Data project fails, there’s plenty of blame to go around. When I run retrospectives with teams that are failing or about to fail, I find their blame is often misplaced. They focus on blaming the technology. The more difficult considerations of...
Most companies aren’t experiencing Big Data or small data problems. They’re experiencing a witching hour of sorts. This is a point in their growth where their data is too big for small data and too small for Big Data. As I’m teaching at companies,...
There’s a common difficulty that companies are having in transitioning to Big Data, especially Kafka. They’re coming from systems where everything is exposed as an RPC-esque call (remote procedure call/REST call/etc). They’re transitioning to a data...
At Strata London, I premiered a new talk based on my Data Engineering Teams book. Companies are seeing great efficiency gains and ROI from using Big Data technologies. However, the vast majority of teams fail and never get something into production. I want to prevent...
Designing data for consumption in a Kafka topic requires more forethought. Instead of messages being consumed point to point, there are many different consumers. You will need to decide on: Name, Schema, Contents, Key/Ordering, Number of Partitions, Number of...
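
To make the key/ordering and partition-count decisions a little more concrete, here is a minimal Python sketch. It is not taken from the post: the `TopicDesign` record, the `orders` topic, the `customer_id` key, and the partition count of 6 are all hypothetical, and `zlib.crc32` is only a stand-in hash (Kafka's default partitioner uses murmur2). The one thing it illustrates is that messages sharing a key land in the same partition, and a partition is where ordering is preserved.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicDesign:
    """Hypothetical record of decisions made up front for one topic."""
    name: str
    schema: str          # name of the message schema (assumed)
    key_field: str       # message field whose value becomes the key
    num_partitions: int


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash.

    Stand-in only: crc32 is used for illustration, not Kafka's
    actual default partitioner hash (murmur2).
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# All values below are illustrative assumptions.
design = TopicDesign(
    name="orders",
    schema="OrderEvent",
    key_field="customer_id",
    num_partitions=6,
)

events = [
    {"customer_id": "c-1", "action": "created"},
    {"customer_id": "c-2", "action": "created"},
    {"customer_id": "c-1", "action": "paid"},
]

# Events with the same key always map to the same partition.
assignments = [
    partition_for(event[design.key_field], design.num_partitions)
    for event in events
]

for event, partition in zip(events, assignments):
    print(event["customer_id"], event["action"], "-> partition", partition)
```

In this sketch both `c-1` events share a partition, so any consumer of that partition sees "created" before "paid". Changing `num_partitions` later changes the mapping, which is one reason the count is worth deciding early.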