Most companies aren’t experiencing Big Data or small data problems. They’re experiencing a witching hour of sorts. This is a point in their growth where their data is too big for small data and too small for Big Data. As I’m teaching at companies,...
Apache Beam just had its first API-stable release. Now that we have an API-stable release, I want to give an update on what’s changed in the Beam ecosystem. I want to highlight the growth of Beam as a project and the increased usage of Beam in pre-production/development or...
There’s a common difficulty that companies are having in transitioning to Big Data, especially Kafka. They’re coming from systems where everything is exposed as an RPC-esque call (remote procedure call/REST call/etc). They’re transitioning to a data...
At Strata London, I premiered a new talk based on my Data Engineering Teams book. Companies are seeing great efficiency gains and ROI from using Big Data technologies. However, the vast majority of teams fail and never get something into production. I want to prevent...
Open source is a great way to solve problems. Mostly we focus on the open source project from technical and architectural points of view. In this post, I’m going to talk about it from a business point of view. Sometimes you’re looking through 3-10 different...
Designing data for consumption in a Kafka topic requires more forethought. Instead of the messages being consumed from point to point, there are many different consumers. You will need to decide on: Name, Schema, Contents, Key/Ordering, Number of Partitions, Number of...
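
To make the key/ordering and partition-count decisions a little more concrete, here is a minimal sketch in Python. It is not Kafka's actual partitioner (the default Java producer uses a murmur2 hash); the MD5-based hash, the `partition_for` helper, the topic name `orders.v1`, the key field `customer_id`, and the partition count of 6 are all assumptions made up for illustration. The only point it shows is that messages sharing a key land in the same partition, which is what gives per-key ordering.

```python
import hashlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Illustrative stand-in hash only; Kafka's default partitioner differs.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.md5(key).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % num_partitions


# Hypothetical topic design: every value here is an assumption.
topic = {"name": "orders.v1", "key_field": "customer_id", "num_partitions": 6}

events = [
    {"customer_id": "c-1", "action": "created"},
    {"customer_id": "c-2", "action": "created"},
    {"customer_id": "c-1", "action": "paid"},
]

# Compute the partition each event would be routed to under this sketch.
assignments = [
    partition_for(e[topic["key_field"]].encode("utf-8"), topic["num_partitions"])
    for e in events
]

for event, partition in zip(events, assignments):
    print(event["customer_id"], event["action"], "-> partition", partition)
```

Both `c-1` events map to the same partition, so a consumer of that partition sees them in the order they were produced. Changing `num_partitions` later changes where keys map, which is one reason the partition count is worth deciding up front.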