- There are two types of data engineering: SQL-focused and Big Data-focused.
- SQL-focused data engineering involves working with relational databases and processing data with SQL or a SQL-based language, sometimes using an ETL tool.
- Big Data-focused data engineering involves working with Big Data technologies like Hadoop, Cassandra, and HBase, and processing data with Big Data frameworks like MapReduce, Spark, and Flink, using programming languages like Java, Scala, and Python.
- There are also two types of data engineers: those who primarily use SQL and may have titles like DBA, SQL Developer, or ETL Developer, and those who are software engineers with specialized Big Data skills and extensive programming experience.
- It is important for managers to understand the differences between these two types of teams and to have a Big Data-focused data engineering team for Big Data projects.
There are two different types of data engineering. There are also two different types of jobs with the title data engineer. This is especially confusing to organizations and individuals who are just starting to learn about data engineering. This confusion leads to the failure of many teams’ Big Data projects.
Types of Data Engineering
The first type of data engineering is SQL-focused. The work is done in, and the data is primarily stored in, relational databases. All of the data processing is done with SQL or a SQL-based language. Sometimes, this data processing is done with an ETL tool.
The second type of data engineering is Big Data-focused. The work is done in, and the data is primarily stored in, Big Data technologies like Hadoop, Cassandra, and HBase. The data processing is done with Big Data frameworks like MapReduce, Spark, and Flink. While SQL is used, the primary processing is done with programming languages like Java, Scala, and Python.
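To make the contrast concrete, here is a minimal Python sketch that computes the same page-view counts two ways. The first function hands the work to SQL, using an in-memory `sqlite3` database as a stand-in for a relational database. The second writes the processing logic by hand as map, shuffle, and reduce steps, the model that MapReduce popularized. This is a single-process illustration under those assumptions, not Hadoop, Spark, or Flink code; the sample data and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical sample data: (user_name, page) click events.
EVENTS = [
    ("alice", "home"),
    ("bob", "pricing"),
    ("alice", "pricing"),
    ("carol", "home"),
    ("bob", "home"),
]


def count_with_sql(events):
    """SQL-focused style: load rows into a relational table and let SQL aggregate."""
    conn = sqlite3.connect(":memory:")  # stand-in for a relational database
    conn.execute("CREATE TABLE clicks (user_name TEXT, page TEXT)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", events)
    rows = conn.execute(
        "SELECT page, COUNT(*) FROM clicks GROUP BY page ORDER BY page"
    ).fetchall()
    conn.close()
    return dict(rows)


def map_phase(event):
    """Emit a (key, 1) pair for each event, like a MapReduce mapper."""
    _user_name, page = event
    yield (page, 1)


def shuffle_phase(pairs):
    """Group values by key; a real framework does this across machines."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one key, like a MapReduce reducer."""
    return key, sum(values)


def count_with_map_reduce(events):
    """Big Data-focused style: the engineer writes each processing step in code."""
    pairs = chain.from_iterable(map_phase(e) for e in events)
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(count_with_sql(EVENTS))         # {'home': 3, 'pricing': 2}
    print(count_with_map_reduce(EVENTS))  # {'home': 3, 'pricing': 2}
```

Both functions return the same counts. The difference is where the logic lives: in the SQL-focused version the database decides how to group and aggregate, while in the Big Data-focused version the engineer writes and controls each step, which is the kind of code a framework then distributes across a cluster.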
Types of Data Engineers
The two types of data engineers closely match the types of data engineering teams.
The first type of data engineer does their data processing with SQL. They may use an ETL tool. Sometimes, their titles are DBA, SQL Developer, or ETL Developer. These engineers have little to no programming experience.
The second type of data engineer is a software engineer who has specialized in Big Data. They have extensive programming skills and can write SQL queries too. The major difference is that these data engineers have both programming and SQL skills, so they can choose whichever one fits the task.
On my site and in my writing, I am always referring to the Big Data definition of data engineering and data engineer.
Why These Differences Matter To You
It’s crucial that managers know the differences between these two types of teams. Sometimes organizations will have their SQL-focused data engineering team attempt a Big Data project. These sorts of efforts are rarely successful. For Big Data projects, you will need the second type of data engineer and a data engineering team that is Big Data-focused.
I’ve written an entire book called Data Engineering Teams. I highly recommend all managers and leads read this book before starting on a Big Data project. It covers the core skills that are required for a Big Data-focused data engineering team.
For individuals, it’s important to understand the required starting skills for Big Data. While there are SQL interfaces for Big Data, you will need programming skills to get the data into a state that’s queryable. I’ve written a book for individuals who want to switch careers to Big Data. In it, I give specific advice on how people with SQL-focused skills can become Big Data-focused data engineers.
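As a rough illustration of getting data into a queryable state, the sketch below parses hypothetical raw log lines into structured rows, drops a malformed line, and only then loads the result into a table that SQL can query. The log format, the `parse_line`, `load_queryable_table`, and `purchase_totals` helpers, and the use of `sqlite3` as a stand-in for a Big Data SQL interface are all assumptions for illustration. At Big Data scale, this parsing would typically be written in a framework like Spark or Flink rather than in a single Python process.

```python
import re
import sqlite3

# Hypothetical raw log lines; this format is an assumption for illustration.
RAW_LOGS = [
    "2024-01-05T10:00:00 user=alice action=login",
    "2024-01-05T10:02:13 user=bob action=purchase amount=19.99",
    "garbled line that a SQL engine cannot use directly",
    "2024-01-05T10:05:42 user=alice action=purchase amount=5.00",
]

LINE_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<fields>.+)$")


def parse_line(line):
    """Turn one raw line into a row dict, or return None if it is malformed."""
    match = LINE_PATTERN.match(line)
    if not match:
        return None
    fields = {}
    for token in match.group("fields").split():
        if "=" not in token:
            return None  # not key=value pairs, so not a usable record
        key, value = token.split("=", 1)
        fields[key] = value
    if "user" not in fields or "action" not in fields:
        return None
    return {
        "ts": match.group("ts"),
        "user_name": fields["user"],
        "action": fields["action"],
        "amount": float(fields["amount"]) if "amount" in fields else None,
    }


def load_queryable_table(lines):
    """Parse raw lines and load only the valid rows into a SQL table."""
    conn = sqlite3.connect(":memory:")  # stand-in for a SQL interface
    conn.execute(
        "CREATE TABLE events (ts TEXT, user_name TEXT, action TEXT, amount REAL)"
    )
    rows = [row for row in (parse_line(l) for l in lines) if row is not None]
    conn.executemany(
        "INSERT INTO events VALUES (:ts, :user_name, :action, :amount)", rows
    )
    return conn


def purchase_totals(conn):
    """Once the data is structured, the query itself is ordinary SQL."""
    return conn.execute(
        "SELECT user_name, SUM(amount) FROM events "
        "WHERE action = 'purchase' GROUP BY user_name ORDER BY user_name"
    ).fetchall()


if __name__ == "__main__":
    conn = load_queryable_table(RAW_LOGS)
    print(purchase_totals(conn))  # [('alice', 5.0), ('bob', 19.99)]
    conn.close()
```

The final query is the part a SQL-focused engineer already knows. The parsing and validation code before it is where the programming skills come in.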
Only by knowing and understanding these two definitions can you be successful with Big Data projects. You absolutely have to have the right people for the job.