- Starting to write code or design a solution before receiving proper training is a bad idea, especially in Big Data.
- Making a mistake with small data isn't costly and can be fixed quickly, but making a mistake with Big Data is very costly and can take a while to fix.
- Companies that start coding before being trained waste an average of $100,000 to $200,000, and this number can go as high as $1,000,000 to $1,500,000 for companies that waited months before being trained.
- Training saves money by avoiding bad ideas or abuses of technology that can turn into major problems and wastes of money down the road.
- The problems a team would have hit later without training cost an average of $300,000 to $400,000, based on downtime estimates, extra operations time, and code rewrites.
Sometimes companies start writing code or designing a solution before I train them. This is usually a bad idea, and it shows the difference between Big Data and small data. Making a mistake with small data isn’t costly and doesn’t take long to fix. Making a mistake with Big Data is very costly and can take a while to fix.
Companies that start coding before they’ve been trained waste an average of $100,000 to $200,000. I’ve seen this number go as high as $1,000,000 to $1,500,000 for companies that waited months before being trained. For them, training was a way to get out of a deep hole.
These numbers are based on my conversations with the engineers about how much time they had already spent, how much time they’ll have to spend fixing things, and the opportunity cost. I’ve written extensively about how training saves you money.
The numbers you just read only cover the time wasted up to that point. They don’t cover the hypothetical “what if”: what would have happened had the team never received the training. While I’m training a team, I’m paying attention to any bad ideas or abuses of a technology. These are the genesis of major problems, and those problems turn into major wastes of money down the road. The average cost of these avoided problems is $300,000 to $400,000.
These “what if” numbers are based on downtime estimates, extra operations time, and rewrites of code.
What If
Let me give you an example of a company that avoided a “what if” scenario. I was training at a company on real-time distributed systems. They were going to do a real-time, non-time-bounded join. That means two streams would be joined in real time, but the two streams weren’t temporally in sync. It could take an hour or 12 hours for the matching message to come through the system. A join like this is possible, but their design was over-engineered and operationally fragile.
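To make the operational weight of a join with no time bound more concrete, here is a minimal Python sketch. It is not the company's code and it is not the simpler method I suggested to them. The class and method names (`UnboundedJoin`, `on_left`, `on_right`, `buffered`) and the sample keys are hypothetical, and a real distributed system would keep this state in a fault-tolerant store rather than in memory. The point it illustrates is that every unmatched message has to be held until its partner arrives, however long that takes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class UnboundedJoin:
    """Hypothetical keyed join of two streams with no time limit on matches.

    Simplification: at most one pending message per key on each side.
    """
    pending_left: Dict[str, str] = field(default_factory=dict)
    pending_right: Dict[str, str] = field(default_factory=dict)

    def on_left(self, key: str, value: str) -> Optional[Tuple[str, str, str]]:
        # If the right side already arrived, emit the joined record.
        match = self.pending_right.pop(key, None)
        if match is not None:
            return (key, value, match)
        # Otherwise hold the message indefinitely, since there is no time bound.
        self.pending_left[key] = value
        return None

    def on_right(self, key: str, value: str) -> Optional[Tuple[str, str, str]]:
        # Mirror image of on_left for the other stream.
        match = self.pending_left.pop(key, None)
        if match is not None:
            return (key, match, value)
        self.pending_right[key] = value
        return None

    def buffered(self) -> int:
        # Number of messages still waiting for their partner.
        return len(self.pending_left) + len(self.pending_right)


join = UnboundedJoin()
for i in range(1000):
    join.on_left(f"order-{i}", "created")

print(join.buffered())                 # 1000 messages held, none matched yet
print(join.on_right("order-7", "paid"))  # ('order-7', 'created', 'paid')
print(join.buffered())                 # 999 still waiting
```

With matches arriving anywhere from an hour to 12 hours later, the buffered state keeps growing with the message rate, and that state has to survive restarts and failures. That is where much of the operational fragility in a design like this tends to come from.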
In talking to the engineer, I found a much simpler and less operationally intense method. It still satisfied all of the requirements. The engineer had spent a solid month writing that code. The operations costs would have amounted to weeks of time, ranging from diagnosing weird problems to outright downtime from the system not working.
My $25,000 in training saved that company at least $400,000. Had they come to me before starting, the savings would have been at least $500,000. I’ll take ROI like that anytime.
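As a quick check of that arithmetic, the sketch below computes the return on the training spend using only the figures above. The function names (`savings_multiple`, `net_roi`) are just illustrative, and "net ROI" here is assumed to mean savings minus cost, divided by cost.

```python
training_cost = 25_000
savings_actual = 400_000       # saved after a month of coding had already happened
savings_if_earlier = 500_000   # estimated savings had training come first


def savings_multiple(savings: float, cost: float) -> float:
    """Dollars saved per dollar spent on training."""
    return savings / cost


def net_roi(savings: float, cost: float) -> float:
    """Net return: savings beyond the training cost, per dollar spent."""
    return (savings - cost) / cost


print(savings_multiple(savings_actual, training_cost))      # 16.0
print(net_roi(savings_actual, training_cost))               # 15.0
print(savings_multiple(savings_if_earlier, training_cost))  # 20.0
```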
If you’re still looking at those numbers and thinking it isn’t possible, you’re still thinking in small data terms. Because Big Data technologies are so complex, a mistake or outright misunderstanding of them is costly.
If you’re starting on a Big Data project or want to become a Data Engineer, I strongly urge you to get training. Otherwise, you’ll be risking hundreds of thousands of dollars.