- Big Data is defined as "can't" - when a technical limitation prevents you from doing something.
- Examples of Big Data problems include being unable to add new features, run reports, or execute SQL statements due to technical limitations.
- This definition of Big Data is more helpful than the traditional definition based on the 3 Vs (volume, velocity, variety).
- For management, it's critical to determine if a problem is truly a Big Data one before investing in a project, and to calculate the potential costs and benefits.
- For individual contributors, it's important to identify if a problem is technical and Big Data-related, and to acquire the necessary skills and training.
You’re starting to learn about Big Data, or you want to learn more about it. You start off by googling ‘what is Big Data?’ You get an answer that doesn’t quite make sense. The site talks about 3 Vs, or sometimes 4 Vs or even 5 Vs.
These 3 Vs are usually defined as:
- Volume
- Velocity
- Variety
That definition really isn’t helpful. How big is volume? How fast is velocity? How much variety do you need? For you, it all comes down to: do I need to use Big Data?
I don’t like this definition of Big Data. It leads to incorrect interpretations and abuse by marketers.
Can’t
I define Big Data as can’t. You can’t do something because of a technical limitation that says you can’t.
Say these next few bullet points out loud in your best Jeff Foxworthy imitation. You might have Big Data problems if:
- You’re a manager and you ask for a new feature or a report and the technical person says they can’t due to a technical limitation.
- You’re a developer and you can’t add new features because the database will fall over and die.
- You’re an analyst and you can’t do a report because it would take too long or process too much data.
- You’re a DBA and you can’t run a SQL statement because that kills off your production database.
- You’re a Data Warehouse Engineer and you still can’t do the most intensive queries because they take too much time and resources to run.
/end Jeff Foxworthy voice
I prefer this definition of Big Data because we’re diagnosing real and business-critical problems with a quantifiable definition. We don’t have to wonder if we really need Big Data because our ‘can’ts’ are real problems that can only be solved with Big Data technologies.
What to Do Now?
Now that you understand what Big Data is and how to identify where it applies, you’ll need to take the next step. That next step depends on whether you’re a manager, VP, or CxO, or an individual contributor.
Management
For management, it’s first critical to decide if a problem is truly a Big Data one. The costs for all parts of a project jump when you’re doing a Big Data project. Once you’ve decided it is a Big Data problem, you need to think about these questions:
- How much is it costing you to say or be told ‘can’t’?
- How much is it costing you to use a small data program for the wrong job?
- How much is the technology limiting you and how often are you saying ‘can’t’?
If you can calculate the amount of money you’re losing due to saying can’t, or how much more you could make by saying can more often, you can make an easy business case. For example, if you’re losing $5,000,000 by saying can’t and you could make another $5,000,000 by adding a new project, you could make an additional $10,000,000. To make that happen, you might have to spend an additional $2,000,000 on hiring new team members with Big Data skills, training the existing team, and increasing operational expenses such as licenses and hardware. Overall, you’re making an additional $8,000,000 by saying yes and using your data to its fullest potential.
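If you want to plug in your own numbers, the arithmetic in that example can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name `net_gain` and its parameter names are made up for this sketch, and the figures are the ones from the example above, not real data.

```python
def net_gain(loss_avoided, new_revenue, added_cost):
    """Return the net gain from turning a 'can't' into a 'can'.

    loss_avoided: money currently lost by saying can't
    new_revenue:  money a newly possible project could bring in
    added_cost:   extra spend on hiring, training, licenses, and hardware
    (All three names are hypothetical, chosen for this sketch.)
    """
    return loss_avoided + new_revenue - added_cost


# Figures from the example in the text.
gain = net_gain(
    loss_avoided=5_000_000,
    new_revenue=5_000_000,
    added_cost=2_000_000,
)
print(f"Net gain: ${gain:,}")
```

If the result is negative or small compared to the risk, that’s a hint the problem may not justify a Big Data project yet.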
Once you’ve defined the basics of the business value, you need to start looking at the use case, team, and plan. I’ve written an entire book on the preceding sentence. Saying any more on the subject would do this vast topic a disservice and I highly encourage you to read it.
Individual Contributors
For individual contributors such as Developers, Analysts, DBAs, and Data Warehouse Engineers, your path is focused on how the technology solves the problem.
You might be the canary in the coal mine saying can’t. You’re quickly realizing that you’re saying can’t too often. I’ve taken to nicknaming the data warehouse team the can’t team because they say can’t so often (see the story below).
First, you’ll need to identify if the problem is really technical and Big Data. Are you saying can’t because there’s too much data and the processing will take too long? Or are you saying can’t because you’ve hit a skills gap? The two answers lead in very different directions. Taking too long is a good hint that you’re hitting Big Data problems, while a skills gap requires a closer look.
Big Data isn’t like something you’ve dealt with before. You’re going to need brand new skills and training on the various technologies.
From there, you can make an educated decision on what to do. I’ve written an ultimate guidebook for people in your situation. I highly encourage you to read it. It goes into a level of detail that you’ve never seen before on how to switch careers into Big Data.
An Insurance Company’s Big Data Can’ts
I want to give you a concrete example of can’t. I worked with a large insurance company that was saying can’t. In this case, it was a data warehousing team that wasn’t able to keep up with the business requirements.
One of their biggest pains was a query that took 3 days to run. This was a query that updated their actuarial tables. The team was completely stymied with can’ts. When the business originally created the requirements, they wanted the query to run every month; the team said they can’t and it had to run every 3 months. The business wanted the query to run over 20 years of data; the team said they can’t and it had to run over 5 years of data.
All of these can’ts cost the business several million dollars a year. Worse yet, the data warehouse team had no ability to make any improvements on speed or amount of data. This was the can’t team.
I was brought in to train and mentor the team. I trained the team on the Big Data technologies they’d need to use. I showed the team the Big Data architectures that would turn them into the yes team.
How to Do It
Ways I’ve seen companies be successful with Big Data:
- Allowing enough time to have a sane project plan
- Having realistic expectations for what Big Data would do for the company
- Spending the money on excellent training
- Getting the team the mentoring and help they need
- Realizing Big Data is a complex animal
Ways I’ve seen companies fail:
- Thinking Big Data is the silver bullet that will save the company from itself
- Rushing through the process and not giving the team the time and resources to succeed
- Thinking the team can just read some books or watch some YouTube videos to learn Big Data
- Cheaping out on training and help for the team
- Having a team without the right skills
If you’re running a business that needs help with your Big Data strategy, you can read about my mentoring service.