- Companies are not fully utilizing LLMs in data engineering, with 24.7% of teams not using them at all.
- Only 12% of teams are using LLMs for data processing, the most ideal use.
- Challenges in using LLMs include concerns about human-generated data, costs, and long response times.
- Data science, engineering, and operations are essential for successful data projects.
- Operations teams are present at only half of the respondents' organizations, indicating a need for better team representation.
In the spring of 2024, I ran a new survey to gather more data for my Data Teams book and update my 2023 and 2020 surveys. In total, we had 81 respondents.
This survey was designed to get information about how management uses data teams, the value they’re creating, and how they’re creating it. The survey asked about the best and worst practices that teams are using or experiencing. We rounded out the survey by asking about the economic effects on data teams. In this post, I’ll analyze the results and what they mean.
GenAI
One of the most significant changes in data engineering between 2023 and 2024 was the release of LLMs that could handle human language much better than ever before.
I think data teams aren’t using LLMs as much as they could, and the survey results bear this out. As we see, 24.7% of teams aren’t using LLMs. Those using LLMs primarily do so for code generation, ideation or copy creation, and code debugging.
The biggest and most ideal use of LLMs for data teams, data processing, is used by only 12% of teams, and 14% use LLMs behind an API endpoint. I think data teams hold back because they believe they don’t have human-generated data, or because of the cost of LLMs or the long response times when processing large amounts of data. There are ways of dealing with the costs and response times that many teams aren’t aware of.
Data Teams
The fundamental thesis of Data Teams is that companies need data science, engineering, and operations to be successful in their data projects. We start by asking some questions about each respondent’s data team.
We see that data science and data engineering are well-represented. However, operations teams are present at only half of the respondents’ organizations.
Merely saying you have a team doesn’t mean it’s the right team. The individual contributors must meet the criteria and definitions to represent the job title. We see well-represented responses for data scientists and data engineers with operations lagging similarly.
Overall, we see opportunities to improve operations. These improvements include the presence of operations and getting the right people in place.
Maturity and Success
It’s essential to gauge how far the respondents are in their big data journey.
73.4% of respondents said they are in production or further along, while 26.6% are in pre-production.
I’ve found perceptions of success to be highly varied. To get a fuller picture of success, I asked two questions: how successful respondents felt the project was, and how successful the business would say it was (the higher the number, the more successful the project). From the responses, individuals thought the business would rate the project as more successful than they would themselves.
To get a combined view of the business and personal opinions of success, I added the two numbers to get a range of 1 to 10. This combination showed predominantly 6 to 8 ratings from respondents.
Weakest and Strongest Points
Each year, I ask people to write responses to the questions “What do you think are the weakest points in your management of data teams?” and “What do you think you really nailed in your management of data teams?” In previous years, I created a word cloud to show commonalities. Now, we have LLMs, which can take large amounts of text and create meaningful summaries.
Reading through the summaries, we see the common themes from previous surveys. Data teams need help aligning and communicating with the business. Data teams do well with data products and technical practices.
Summary for “What do you think are the weakest points in your management of data teams?”
- Lack of Business Alignment: Many responses highlighted a disconnect between data teams and business needs, including understanding domain knowledge, communicating data value to stakeholders, and aligning team maturity with organizational needs.
- Team Management & Communication: Issues with delegation, long-term planning, resource allocation, and communicating business value were frequently mentioned.
- Data Strategy & Processes: Several weaknesses revolved around data governance, data ownership, data lineage, data quality, and establishing data management practices (DataOps).
- Other: Additional concerns included lack of budget, innovation, training, and managing a decentralized workforce.
Overall, the responses show a gap between data teams and the broader organization, with challenges in communication, strategy, and operational efficiency.
Here is the summary for “What do you think you really nailed in your management of data teams?”
Successful data team management involves building high-performing teams, aligning data with business goals, and leveraging modern tools and processes. Key elements include:
- Strong team culture with collaboration, learning, and psychological safety.
- Deep understanding of business needs and data consumer requirements.
- Effective use of data platforms, CI/CD, and data quality practices.
- Transparent communication and collaboration with stakeholders.
- Focus on delivering data products that drive business value.
How Did They Do It?
The respondents consistently selected friction as their biggest challenge. After that, the two most common issues revolved around needing more individual contributors and poor feedback.
Regarding best practices, teams value working with the business on data projects. Three other consistent best practices were continuous integration, having a qualified data engineering team, and leveraging automation to make tasks easier.
You’ll remember the questions about the value a person and the business created. I found it interesting to just look at what the highest and lowest value creation respondents said.
The highest value creation (combined score of 10) respondents selected having all the teams with the right skills as a best practice. They focused on creating velocity and using automation. For their write-ins, they added “listening and sharing responsibility” along with using CI/CD.
The lowest value creation (combined score of 3) respondents selected missing many, often all, of the data teams as a significant contributor to low-value creation. They also point to friction as getting in the way. Oddly enough, all of them “hired a consulting company, and they aren’t delivering,” which is a widespread way I see larger companies fail with data projects. For their write-ins, they added “shifting priorities,” “not being able to persuade the rest of the company that change is ok,” and “poor leadership.”
Remote Work, Economic Trends, and Data Teams
Many data teams are working remotely right now. I wanted to determine if this was negatively affecting them. There are several reasons that data teams could be negatively impacted, such as home distractions, lack of cluster access, or improper cluster setups. The survey respondents said that they weren’t being affected negatively or that it was neutral.
For some companies, the recent economic issues were a wake-up call for data. I asked survey respondents to tell me if the current economy affected their perception of the value of data (higher numbers strongly agree, and lower numbers strongly disagree). For most respondents, the economic impact was either neutral, or they agreed that their views changed.
The current economic downturn brought tremendous changes to companies worldwide. I asked if these business changes affected their data strategies. More than half of the respondents said no, while the rest were either partially or fully pivoting.
With all of the announcements of layoffs, I wanted to get an idea of how they directly affect the data teams. 56.8% of respondents said they are increasing the size of the team, and 25.9% are keeping things the same. Only 8.6% said there could be some decrease.
These numbers confirm what I’ve seen and heard anecdotally. Company-wide layoffs weren’t affecting the data teams because they were already understaffed, and any further decrease would compound issues. I highly recommend that data teams not take layoffs lightly and focus on creating value.
What Good And Bad Looks Like
A big part of my work and the goal of this survey is to establish best practices with data. This data allows us to see what the lowest- and highest-value-creating data teams are doing. I’ve also added methodology to the mix.
To establish the bad, I took all the responses with a total value creation score of 4 or lower. For the good, I took all the responses with a total value creation score of 9 or above.
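As a rough sketch of that split, here is how the two groups could be pulled out of the responses in Python. It assumes each response carries two numeric scores (the respondent's own rating and the rating they think the business would give) that are added into a combined score. The `Response` class, its field names, and the sample rows are all hypothetical and made up for illustration; they are not the survey data.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical fields; the real survey export may be shaped differently.
    respondent_id: int
    personal_score: int   # how successful the respondent says the project is
    business_score: int   # how successful they think the business would say it is

    @property
    def combined(self) -> int:
        # Combined view of success: the two scores added together.
        return self.personal_score + self.business_score


LOW_MAX = 4    # "bad": combined score of 4 or lower
HIGH_MIN = 9   # "good": combined score of 9 or above


def segment(responses):
    """Split responses into low- and high-value-creation groups."""
    low = [r for r in responses if r.combined <= LOW_MAX]
    high = [r for r in responses if r.combined >= HIGH_MIN]
    return low, high


# Made-up sample rows, for illustration only.
sample = [
    Response(1, 1, 2),
    Response(2, 3, 4),
    Response(3, 5, 5),
    Response(4, 4, 5),
    Response(5, 2, 2),
]

low, high = segment(sample)
print([r.respondent_id for r in low])   # [1, 5]
print([r.respondent_id for r in high])  # [3, 4]
```

Responses with a combined score between 5 and 8 fall into neither group, which keeps the comparison focused on the two extremes.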
There are some surprises about the breakdown of methodologies. I expected a good representation of DataOps, Data Mesh, and Center of Excellence. I am surprised to see how many people use no methodology or a homegrown one. Perhaps I’ll have to dig into what these homegrown methodologies are.
I wasn’t surprised about the lack of usage of Data Fabric. As much as Gartner is pushing it, I don’t see it in the field much. Despite Gartner saying Data Mesh is obsolete, 20.3% of respondents used it.
When comparing low- and high-value creation, we see they use some of the same methodologies. This shared usage shows that there are broadly applicable methodologies to help low-value creation teams improve their value creation. However, high-value creation teams only use some methodologies.
Some problems always persist, no matter how hard you try or how successful the team is. Both high-value and low-value teams share some of the same challenges. However, high-value creation teams are experiencing more advanced challenges.
The most significant difference between low-value creation teams and high-value creation teams is their use of best practices. The high-value creation teams use far more best practices than their low-value creation counterparts. While the challenges were similar, best practices set teams apart.
The final comparison and significant differences are in the team makeup and description. The low-value creation teams skew toward having one or two of the three teams, while the high-value creation teams skew toward two or three. This divide supports my thesis that all three teams are required to generate the highest possible value.
Demographic Data
Since the survey concerns management, we’ll start with the breakdown of positions. 55.5% have a management position. The other positions represented were data engineers, architects, consultants, and project managers.
Another critical question is the size of the companies represented. Companies of different sizes have different organizational needs, and we can see a wide range of company sizes represented.
Key Takeaways
The data clearly shows a correlation between value creation and having data teams. The highest-value producers credit their data teams, while the lowest-value producers lament their lack of data teams. The highest-value-creating teams also use the most best practices.
Management must look at friction and its impact on the data teams. For some companies, this means data projects go nowhere or underperform. Working well with the business side is equally important.
We can see that remote work isn’t affecting teams’ productivity. In some companies, the economy changes the perception of data within the company and causes them to pivot their data strategy. Management should look for productivity issues and verify that their data strategy doesn’t need to be slightly updated or pivoted to leverage data better.
If you’d like to learn more about these results or how to use them to accelerate your data team, I would be happy to talk to you further. Please reach out to me here.
Thanks to everyone who filled out the survey and helped me promote it. It represents a unique look at what’s happening in a vendor-neutral environment.
Frequently Asked Questions (AI FAQ by Summarizes)
What is the significance of the release of LLMs in 2024?
The release of LLMs in 2024 significantly improved handling human language in data engineering.
What percentage of data teams are not utilizing LLMs?
24.7% of teams are not using LLMs at all.
What is considered the most ideal use of LLMs?
Data processing is considered the most ideal use, yet only 12% of teams are using LLMs for it.
What are some challenges in using LLMs?
Challenges in using LLMs include concerns about human-generated data, costs, and long response times.
What are the key roles needed for successful data projects?
Companies need data science, engineering, and operations to succeed in data projects.
What percentage of respondents are in production or further along in their big data journey?
73.4% of respondents are in production or further along in their big data journey.
How does the perception of success vary in data projects?
The perception of success varies, with individuals often expecting the business to rate the project as more successful than they would themselves.
What is crucial for successful data projects?
Collaboration between data teams and the business side is crucial for successful data projects.
What is emphasized by companies increasing the size of their data teams?
Companies are increasing the size of their data teams rather than decreasing them, emphasizing the importance of data teams in creating value.