- A strong programming background is crucial for Data Engineers; those fresh out of school usually have a Master's degree or above in Computer Science with a focus on distributed systems or data.
- The best Data Engineers are not content with just programming and have started to cross-train into other fields, such as data science or marketing.
- Data Engineers are driven to build bigger and more complex systems so they can create data products that everyone can use.
- Great Data Engineers have a love of, or at least an interest in, data and are inherently curious about what is happening and why.
- Understanding systems and distributed systems is more important than knowing specific technologies, although knowledge of Big Data technologies and APIs is necessary.
I want to share with you some of the traits that I’ve found in especially good Data Engineers. Not every Data Engineer will have every one of these traits, but you will find several in each.
I can’t stress enough how important it is for a Data Engineer to have a strong programming background. Data Engineers are usually mid-level to senior in their careers. Those fresh out of school usually have a Master’s degree or above in Computer Science with a focus on distributed systems or data. That said, I have seen some especially bright junior engineers make great contributions to the team.
This will sound odd given how much I talked about the importance of programming, but the best Data Engineers are bored with just programming. That means that they’ve mastered or nearly mastered programming as a discipline. Writing another enterprise system or small data project doesn’t hold much interest for them.
As a result, they’ve started to cross-train into other fields. These could be related to programming, like data science, or unrelated, like marketing or analysis.
Data Engineers are bored with creating small data systems because those systems aren’t as complex. They want to create bigger and more complex systems. The main driver for this is their desire to create data products that can be used by everyone.
This desire to create data products comes out of a common love of data. You might have seen a Software Engineer who loves coding or maybe even loves a particular language. They are happiest when coding. Data Engineers love both coding and data. If there isn’t a love, there is at least an interest in data. I’ve found this distinguishes the great Data Engineers from the good Data Engineers.
They use this data because they are inherently curious about what is happening and why. They form a hypothesis about what they’re seeing, and then they use their data to either prove or disprove it.
I don’t focus on which technologies a Data Engineer knows. I focus on their understanding of systems and distributed systems. They obviously need to know some Big Data technologies and APIs. However, learning an API or another technology is much easier once you know the basic architectural and design patterns of Big Data systems. A Data Engineer who has shown they can learn some Big Data technologies is likely to be able to learn others.
I see this all the time when I train a team that is already working with Big Data technologies. They catch on to the concepts more quickly because there are similarities to the Big Data technologies they already use. The team learns more from the training because they’re not starting from scratch.