Q and A: Is a Data Engineer the same thing as a BI or DBA?

Today’s blog post comes from a question from a subscriber on my mailing list. The question comes from Alpesh D.:

I have been getting your emails and they all seem to make sense. However, did I understand it correctly that you believe all big data engineers need to be able to use Java? I come from a heavy SQL, MPP data warehousing and BI background. Having done shell scripting from my days when I was a DBA, I am able to pick up Python and move ahead, but Java seems like a little too much. What are your thoughts?

I think your question could be restated as two questions:

  • Is a Data Engineer the same thing as a BI or DBA?
  • Does a Data Engineer need to use Java?

Is a Data Engineer the same thing as a BI or DBA?

A Data Engineer is someone who has specialized their skills in creating software solutions around data. Their skills are predominantly based around Hadoop, Spark, and the open source Big Data ecosystem. They usually program in Java, Scala, or Python. They have an in-depth knowledge of creating data pipelines. A data pipeline is how data is brought in, processed, and turned into some kind of business value. That business value usually takes the form of reports, analytics, and dashboards. More advanced examples are fraud analytics or predictive analytics pipelines.
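To make the three stages concrete, here is a minimal sketch of a pipeline in plain Python. It is only an illustration: the sample data and the function names (`ingest`, `process`, `report`) are made up for this example, and a real pipeline would run on tools like Hadoop or Spark rather than in a single script.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input; a real pipeline would read from files, a queue, or a database.
RAW_ORDERS = """order_id,region,amount
1,east,20.00
2,west,35.50
3,east,14.50
4,west,not_a_number
"""


def ingest(raw_text):
    """Bring data in: parse CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Process: drop rows whose amount cannot be parsed, convert the rest to floats."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"region": row["region"], "amount": float(row["amount"])})
        except ValueError:
            continue  # skip malformed records
    return cleaned


def report(rows):
    """Create business value: total revenue per region, ready for a report or dashboard."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


revenue_by_region = report(process(ingest(RAW_ORDERS)))
print(revenue_by_region)
```

Even in a toy like this, the cleanup step is custom code: deciding what counts as a bad record and what to do with it is a programming decision, not a configuration setting.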

Data Engineers are not DBAs (Database Administrators), Business Intelligence developers, Data Analysts, or ETL Developers. That’s not to say a person with one of these titles couldn’t be a Data Engineer. Rather, people with these titles will need training and probably entirely new skills to become a Data Engineer. Usually, they’ll need more programming skills and Big Data skills than most people with these titles have.

Data Engineers are tasked with creating data pipelines and data products. Complex data pipelines are often outside the abilities of non-programmers because they require custom programming and code.

Does a Data Engineer need to use Java?

A Data Engineer’s primary language needs to be Java. They’ll also need to know SQL, and I highly recommend they know at least one more language, such as Python or Scala.

If you look around the Big Data ecosystem, virtually every one of the projects has a Java API. Some projects may support a Java API and another language. That doesn’t mean everything in a data pipeline is limited to Java. Some pipelines will be a mix of Java, SQL, and a dynamic language.
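As a rough illustration of mixing languages in one pipeline, the sketch below lets SQL do the set-based work and hands the result to code for a custom step. It uses Python and an in-memory SQLite database purely as stand-ins; it does not use the Java APIs or Big Data projects discussed here, and the table, data, and `label_user` rule are hypothetical.

```python
import sqlite3

# In-memory database standing in for a real data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", "/home"), ("u1", "/pricing"), ("u2", "/home"), ("u1", "/checkout")],
)

# SQL handles the set-based part: count events per user.
counts = conn.execute(
    "SELECT user_id, COUNT(*) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()


def label_user(event_count):
    """Hypothetical custom rule; in practice this is where logic that is awkward in SQL lives."""
    return "engaged" if event_count >= 3 else "casual"


# Code handles the custom part: apply the rule to each aggregated row.
labels = {user_id: label_user(n) for user_id, n in counts}
print(labels)
conn.close()
```

The rule here is deliberately simple. The point is the division of labor: SQL for what it does well, and a general-purpose language for everything else.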

I’ve trained at companies where the data team’s knowledge was limited to SQL. They were severely limited in what they could accomplish. You can do some interesting things with SQL, and I recommend using SQL for some operations. But when SQL is your only tool, you can’t use the other ecosystem tools that don’t have a SQL interface. For those teams, if SQL couldn’t do it, it simply wasn’t done. They had no way to build anything else.

Join my mailing list and I might answer your question next time.
