Is Big Data Cheap?


Companies and individuals often come into Big Data thinking everything is cheap. After all, the entire stack is open source, right? Well, some things are cheap and some things are more expensive.

Software

One important distinction with Hadoop is that it isn’t an open source knockoff of a better closed source framework. Hadoop is the gold standard for both startups and enterprises.

That often stands in contrast to small data solutions. There, the open source alternative is sometimes a clone of the commonly used closed source solution, and it lacks the polish or features of the original.

Because Hadoop is used at both startups and enterprises, data engineers can learn one system and use it no matter what size of company they move to.

Hardware

Small data solutions often need a single computer. This computer can serve as the database, application layer, and webserver.

In Big Data, many different computers are needed. If you have true Big Data needs, a single computer won’t be able to handle all of the processing and storage necessary. You’ll need at least three computers just for storage and processing. Often, you’ll colocate the server daemons on these three computers and, as the cluster grows, you’ll move these daemons onto their own computers.

Depending on the SLA for the cluster, some small clusters have high availability (HA) from the beginning. These clusters run their server daemons on at least two additional computers. This means that even a starter cluster can need five or more computers.
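
The machine counts in this section come down to simple arithmetic. Here is a minimal sketch in Python, assuming the figures quoted above (three computers for storage and processing, two more for server daemons under HA). The function and constant names are hypothetical and only illustrate the count; they are not a sizing tool for any real cluster.

```python
# Hypothetical helper: the names and numbers below are taken from the counts
# quoted in this post, not from any official Hadoop sizing guide.
MIN_WORKERS = 3       # at least three computers for storage and processing
HA_DAEMON_HOSTS = 2   # at least two other computers for server daemons under HA


def starter_cluster_size(workers: int = MIN_WORKERS, high_availability: bool = False) -> int:
    """Return how many computers a starter cluster needs under these assumptions."""
    if workers < MIN_WORKERS:
        raise ValueError(f"need at least {MIN_WORKERS} computers for storage and processing")
    # Without HA, the server daemons are colocated on the worker computers,
    # so no extra machines are counted.
    dedicated_daemon_hosts = HA_DAEMON_HOSTS if high_availability else 0
    return workers + dedicated_daemon_hosts


print(starter_cluster_size())                        # 3
print(starter_cluster_size(high_availability=True))  # 5
```

Even at the smallest size, an HA requirement takes the hardware count from three computers to five, compared with one for a typical small data setup.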

People

Small data engineers tend to be more plentiful. Due to their lack of specialization, their salaries are lower in comparison to Big Data engineers. Some small data engineers simply won’t be able to make the leap to Big Data.

Even among the engineers who identify as Big Data or Data Engineers, there is a great deal of variation in abilities and experience. Finding a “Data Engineer” who is willing to work for the same amount as a small data engineer should be a red flag. I’ve taught these data engineers and they have a very low probability of success on any given project. The quality of people is one of the first things I look for when judging a team’s probability of success.

Is Big Data Cheaper?

Some things, like software, are cheaper. The rest of Big Data is more expensive. Cheaping out on the expensive parts leads to project failure. It’s one of the most common ways I’ve seen projects fail quickly.
