Big Data and Analytics in the COVID-19 Era

Jesse Anderson
March 15, 2020

Blog Summary: (AI Summaries by Summarizes)

Data teams should focus on solving real business problems rather than just storing data.
Creating models that optimize efficiency and save money is crucial, especially in the COVID-19 era.
High-quality data is essential for successful model deployment.
Organizations must ensure data security and proper handling of sensitive information, especially when working from home.
Models in production may need retraining to adapt to changing economic conditions.

Big Data and analytics are going to change in this COVID-19 era. I want to share with everyone the same messages that I’m giving my own clients. I’m hoping that this post will help those data teams that aren’t living up to their potential to start creating real value.

Focusing on Value Creation

Data teams need to stop looking for a problem to solve and start solving a real business problem. The days of just looking for a business problem to solve, or just storing data in the hope that the data team will eventually create value, are gone (this shouldn’t have been possible in the first place). If teams aren’t creating any business value, management needs to go back to the business to see how existing and emerging problems can be solved with data.

We need to be focusing on creating models that optimize or improve efficiency. These models should save the company money or improve a process within it. Some examples would be models for pricing optimization, inventory management, ad spend, or customer acquisition. Some organizations have been delaying the deployment of a model to production until it shows a larger improvement over an existing model, or over having no machine learning model at all. Before the COVID-19 era, delaying deployment until the improvements met the data science team’s approval was feasible. In this COVID-19 era, even marginal improvements could be the difference between a company staying afloat and laying off staff. The models could be the driver for some improvements that weren’t critical before but are vital now.

While deploying models is all well and good, the organization is still constrained by the quality of its data. If it hasn’t already, the data engineering team needs to be creating value with high-quality data products and correct infrastructure so the data scientists can create meaningful models.

Has COVID-19 Really Changed Things?

In some ways, the things you’re about to read are things that companies should have been doing all along. The good times glossed over our inefficiencies, but now we need to get back to more efficient data teams.

In a basic sense, it’s even more important to stick to our fundamentals. If you feel the team has mastered the fundamentals and is creating great value, keep up the good work. If the data teams lack these fundamentals, management should be scrambling to put them in place. This is a precarious position to be in.

The rest of this article looks at different aspects of data analytics affected by the sudden new situation we’re in.

Implications of Working From Home

As part of many organizations’ responses to COVID-19, employees are starting to work from home. For most IT-related employees, that means working with code on a laptop. Data teams deal with both code and data, so they need to be cognizant of the security and nature of the data that could be on their laptops. This data could range from public datasets all the way to PII (Personally Identifiable Information).

If organizations do decide to copy data to laptops, processes must be put into place to prevent uncontrolled dissemination of data that could hurt you as much as the spread of COVID-19. For example, employees should be clearly told that data and code can’t be copied onto a personal computer or laptop. Putting data on unsecured computers could put the organization at incredible risk. At a minimum, company data should be stored on encrypted disks with a strong key or password. The laptops should have antivirus software installed, up-to-date, and running. To combat weak or non-existent firewalls for the person’s internet connection, there should be a software firewall installed.

Working from home can expose an inadequate data infrastructure. If a member of the organization feels the need to copy data locally, apart from test data, that could be a sign that the organization doesn’t have an infrastructure that properly supported its data engineers even when they worked in the office. Ideally, the path of least resistance should be to use the organization’s existing infrastructure because it is easier than copying the data locally. For organizations without the right infrastructure, the path of least resistance, or even a requirement for getting the job done, is to download the data locally. Another reason that data teams will download locally is to circumvent security measures they perceive to be excessive or difficult to deal with. They might have to connect to a VPN, then SSH to another computer, then log in to another website just to get the data. The data engineering team should be watching usage patterns and what they say about why the staff is bypassing the infrastructure.

Changes in Models

The models in production will need to be retrained or have their parameters tweaked. Over the past decade, most models ran under good to great economic conditions. In the COVID-19 era, they’ll need to be trained for more pessimistic economic growth.

Ideally, the economy will recover quickly. We’ll need to save these currently running models and revert to them once the economy recovers.
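As a rough illustration of saving a running model so it can be restored later, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ModelArchive` class, the snapshot labels, and the parameter names and values are hypothetical, and a real team would more likely rely on whatever model registry or artifact store it already has.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ModelArchive:
    """Keeps labeled snapshots of model parameters so an older one can be restored."""
    snapshots: Dict[str, Any] = field(default_factory=dict)

    def save(self, label: str, params: Any) -> None:
        # Deep-copy so later retraining cannot mutate the archived snapshot.
        self.snapshots[label] = copy.deepcopy(params)

    def restore(self, label: str) -> Any:
        # Return a copy so the archive itself stays untouched.
        return copy.deepcopy(self.snapshots[label])


archive = ModelArchive()

# Hypothetical parameters of a model tuned during good economic conditions.
production_params = {"expected_growth": 0.03, "price_elasticity": -1.2}
archive.save("pre-covid", production_params)

# Retrain or tweak for a more pessimistic outlook (values are illustrative only).
production_params["expected_growth"] = -0.02
archive.save("covid-era", production_params)

# Once the economy recovers, revert to the archived snapshot.
production_params = archive.restore("pre-covid")
```

The deep copies are the important design choice here: without them, tweaking the live parameters would silently change the saved "pre-covid" snapshot as well.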

Focus on Efficiency

For some organizations, minimizing the spending on compute resources wasn’t a focus in the good times. I’ve seen as much as 50% of an organization’s cloud spend being either underutilized or completely wasted. Organizations should take a look through their current usage to see what could be shut down or utilized better. They may put new processes in place to quickly identify the person who spun up some resources or the type of workload running on a cluster.
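One way to picture such a review is the Python sketch below. It is only a sketch under stated assumptions: the `Resource` fields, the `flag_for_review` helper, the 20% utilization threshold, and the inventory itself are all made up for illustration. Real numbers would come from the organization's own billing and monitoring exports.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    owner: Optional[str]      # None means nobody is tagged as the owner
    workload: Optional[str]   # None means the workload type is unknown
    monthly_cost: float       # in whatever currency the bill uses
    avg_utilization: float    # 0.0 (idle) to 1.0 (fully used)


def flag_for_review(resources: List[Resource], min_utilization: float = 0.2) -> List[Resource]:
    """Return resources that are untagged or below the utilization threshold."""
    return [
        r for r in resources
        if r.owner is None or r.workload is None or r.avg_utilization < min_utilization
    ]


# Made-up inventory for illustration only.
inventory = [
    Resource("etl-cluster", "data-eng", "nightly-etl", 4000.0, 0.65),
    Resource("adhoc-cluster", None, None, 2500.0, 0.05),
    Resource("notebook-vm", "data-science", "exploration", 500.0, 0.10),
]

flagged = flag_for_review(inventory)
flagged_cost = sum(r.monthly_cost for r in flagged)
total_cost = sum(r.monthly_cost for r in inventory)

for r in flagged:
    print(f"{r.name}: owner={r.owner}, workload={r.workload}, utilization={r.avg_utilization:.0%}")
print(f"Share of spend to review: {flagged_cost / total_cost:.0%}")
```

Treating a missing owner or workload tag as a reason for review reflects the point above: if nobody can say who spun something up or what it runs, that is worth a look regardless of utilization.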

At some organizations, the move to a cloud has been put off. Moving to a cloud provider could allow for efficiency gains that aren’t possible with an on-premises cluster. Some organizations don’t understand what they could save because they looked at cloud efficiency gains purely from an IT perspective. From this perspective, most programs run 24/7 and can’t gain as much efficiency. For analytics, the demand can be spiky. During the workday, the cluster is used heavily. After the workday is over, the cluster is virtually unused. Use cases such as this are ripe for the efficiency gains that only the cloud can offer. In my experience, analytics and big data use cases can leverage the efficiencies gained from the cloud the most when compared to the rest of the organization.
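The arithmetic behind spiky demand can be sketched in a few lines of Python. All of the numbers are assumptions chosen for illustration (20 nodes at peak, 2 nodes off-peak, a 10-hour busy window, a flat hourly rate), and the sketch ignores weekends, storage, data transfer, and any price difference between on-premises and cloud nodes.

```python
HOURS_PER_DAY = 24


def always_on_cost(nodes: int, hourly_rate: float, days: int) -> float:
    """Cost of a cluster sized for peak demand that runs around the clock."""
    return nodes * hourly_rate * HOURS_PER_DAY * days


def elastic_cost(peak_nodes: int, off_peak_nodes: int, busy_hours: int,
                 hourly_rate: float, days: int) -> float:
    """Cost of a cluster that scales down outside the busy hours."""
    quiet_hours = HOURS_PER_DAY - busy_hours
    daily_node_hours = peak_nodes * busy_hours + off_peak_nodes * quiet_hours
    return daily_node_hours * hourly_rate * days


# Illustrative assumptions only, not measurements.
fixed = always_on_cost(nodes=20, hourly_rate=1.0, days=30)
elastic = elastic_cost(peak_nodes=20, off_peak_nodes=2, busy_hours=10,
                       hourly_rate=1.0, days=30)
savings = 1 - elastic / fixed

print(f"Always-on: {fixed:.0f}, elastic: {elastic:.0f}, savings: {savings:.1%}")
```

With a workload that runs 24/7, `busy_hours` would be 24 and the two costs would be identical, which is why a purely IT view of the cloud can miss the gain that analytics workloads see.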

Data teams should be careful about adding new technologies that aren’t giving a specific business value.

New technologies can bring both efficiency gains and losses. If an analytic is inefficient and the user spends large amounts of time idle while waiting for a query to finish, new technology could completely change the efficiency of that person or the entire team. However, adding a new technology can reduce efficiency when the team works to operationalize something that doesn’t address a specific business need. Teams shouldn’t add unneeded technologies, and management may want to reevaluate the project roadmap to establish the real business need for each new technology. Operationalizing a technology could even lead to downtime and loss of customer goodwill. The data engineering team should be cognizant of the potential pros and cons of adding new technologies.

Workforce Reductions

Some organizations may be forced to make the difficult choice to reduce their workforce. Data teams tend to hold large amounts of tribal knowledge, and a reduction in workforce could mean that knowledge of how something works, or of its known issues, is lost when the person holding it is let go. Managers should take into account how well a data pipeline is documented and functioning.

What would the business say if most or all of the data teams were fired?

In my forthcoming Data Teams book, I ask managers to think about a hypothetical situation where the entire data team was fired. What would be the reaction? This is a scenario that should be thought about in the best and worst of economic times. In the worst economic scenarios, this will be an actual discussion among high-level managers, although they may not find it necessary to lay off the entire team. In the best economic scenarios, this exercise provides a metric for the business value created by data teams.

In the book, I show that there are generally four levels of value created by data teams. Instead of asking the data teams how much value they create, I ask the business how much value data teams created for them. Here are the 4 general responses from the business:

1. The data team is creating the most value. The business leaders will give a vehement “No way!” The business is strongly opposed to making any changes to this lifeblood of data that’s creating incredible business value. Making a slight change or removing the teams altogether would affect their day-to-day usage of data products and, ideally, decision making. These projects and teams are creating extreme value for the business.
2. The project is creating minimal value. The business leader’s reaction is “meh”. Their ambivalence shows that the business isn’t really using the data products on a day-to-day basis.
3. A stagnated project that isn’t creating any value. The business’s reaction to a proposed cancellation is a snarky or pained “what project?” In such cases, managers promised the business that they could take advantage of data to make better decisions, but the data teams left this dream completely unrealized. The business has never had anything delivered into their hands and couldn’t ever achieve any value.
4. A project is in the planning stages. The business has been promised new features, analytics, and the fixing of previous can’ts. There is a huge amount of anticipation from the business to finally get what they’ve been asking for. Now it’s time for the data teams to deliver on these promises.

As you read through these scenarios, I invite you to take an honest look at the value created by your organization’s data teams. Any project that isn’t scenario #1 isn’t living up to its potential and faces a high risk of layoffs or cancellations.

I don’t want to see organizations cutting their data teams because of low value creation. If your project is in scenarios 2-4, I will do a free 1-hour consultation with the management team to see how you could be more effective or fix inherent issues. There isn’t any obligation to buy something and it won’t be a 1-hour sales call. I will treat the call as if you were paying me to consult and help the team. To take advantage of this, just go to my company’s contact us page and say that you want to take advantage of this consultation. From there, we will schedule a time to talk. I hope to hear from you!

Frequently Asked Questions (AI FAQ by Summarizes)

Why do data teams need to focus on solving real business problems rather than just storing data?

Data teams need to focus on solving real business problems to create value and ensure the survival of the company, rather than just storing data and hoping value will be created eventually.

Why is the quality of data essential for successful model deployment?

Models are constrained by the quality of the data behind them. Data scientists can only create meaningful models when the data engineering team provides high-quality data products and the right infrastructure.

How can organizations ensure data security and proper handling of sensitive information while working from home?

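Organizations can protect sensitive information while working from home by prohibiting copies of data and code on personal computers, storing company data on encrypted disks with a strong key or password, keeping antivirus software up to date, running a software firewall, and fixing the infrastructure gaps that push staff to copy data locally.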

Why should data teams be cautious about adopting new technologies that do not provide specific business value?

A technology that doesn’t address a specific business need can reduce efficiency while the team works to operationalize it, and operationalizing it could even lead to downtime and loss of customer goodwill.

How should managers evaluate the value created by data teams from a business perspective?

Managers should evaluate the four general levels of value created by data teams from a business perspective to prioritize projects that create the most value and preserve them, while considering the risk of layoffs or cancellations for stagnant projects.
