Amazon and Open Source Business Models and You

Blog Summary: (AI Summaries by Summarizes)
  • Open source companies like Cloudera and Confluent make money by selling support, training, consulting, and management tools for their free and open source projects.
  • Amazon Web Services (AWS) makes money by creating managed services that make it easy for customers to use open source projects like Kafka and Hadoop.
  • Apache licensed source code is very permissive, allowing companies like Amazon to create managed services without legal repercussions.
  • However, some open source projects have added clauses to their licenses to prohibit creating managed services with their code.
  • The community is upset with Amazon because they are not giving back to the open source projects they are using to create their managed services.

You might have seen some posts or tweets about Amazon using open source technology and creating managed services with it. You may not understand why people are upset about this. Let me explain the issues and the background as neutrally as possible.

But first, you’ll need to understand open source business models.

Full disclosure: I don’t own shares of Confluent, Cloudera, or Amazon. That should allow for as neutral a post as possible, though I worked at Cloudera for a while.

Most of these tweet screenshots are from this Twitter thread.

Open Source Business Models

How do you make money off something that is not only free but whose source code is also available? That’s the question that open source companies like Cloudera and Confluent are dealing with.

This meme is from Doug Cutting. The companies can’t sell and make money on just the technology; they have to sell other products. These other products are usually: support, training, consulting, and a console or manager that makes the operations of the product easier. They make money directly off these things.

The problem is that the open source companies have to continue to develop the original project – be it Kafka or Hadoop. Those are expensive software engineers and their salaries don’t directly make the company money. Not just that, all of their work goes into the common pool of code that makes the project better. All of the companies who use the project benefit from those contributions.

The value proposition to customers is that they will get support, training, easier operations, and bug fixes. This is because the company is actively developing and fixing the core project. Some companies really emphasize their contributions or the fact that the founders of a project work at the company.

Amazon Web Services’ Business Model

The Amazon Web Services (AWS) business model is different from that of a typical open source company. When Amazon creates a managed service, they focus on making it easy to start or deploy the technology. With Kafka, for example, AWS makes it easy to start and run a Kafka cluster.

The value proposition to customers is that they can easily start using a project. AWS’ continued support or contributions to projects like Kafka or Hadoop are not a key motivation for a customer to start using it.

Open Source Licensing

The Apache license is one of the more liberal licenses out there. There is nothing that legally prevents Amazon from creating a managed service using Kafka. To deal with this, some projects have started to add clauses to their licenses that expressly prohibit creating managed services with their open source project.

The motivator to give back to open source projects is more of a moral or social contract. The social pressure is that companies making extensive use of a project should give back to it.

Update: I’ve received a decent amount of pushback on the moral or social contract part.

Others have said there isn’t even a moral or social contract. There is absolutely no obligation for a company to give back to open source, no matter how you’re using it.

Why the Community Is Upset

With that background, we can talk about the issue at hand. Most of the discourse on the subject has been on blogs and Twitter. I think Neha sums up the issue best:

The community doesn’t like that Amazon is not complying with the social contract. As you saw in the business model, Amazon’s value prop isn’t around improving Kafka; it’s around making it easy to spin up a Kafka cluster.

That sounds like a trivial thing. It shouldn’t make a difference to anyone if Amazon decides to start using Kafka.

The issue is that companies like Cloudera and Confluent can’t compete with Amazon’s pricing. Amazon only needs to employ people who write managed service code. They don’t need to employ the others working on the project directly. Amazon’s costs are significantly lower.

The ability to commercialize the open source without having to pay direct project developers makes it very cheap for Amazon to create managed services with Hadoop and Kafka. When customers focus on the cost differences, they often choose the cheapest solution. This choice is directly affecting vendors like Cloudera.

What Should Be Done?

I asked what people think should be done. The response was that Amazon should start giving back at a higher level than it currently does.

Adrian Cockcroft responded that Amazon is giving back. Amazon is open sourcing certain projects. The community issue here is whether Amazon is giving back directly to projects like Hadoop or Kafka. Others would add that Amazon should give back at a level relative to its revenue from open source managed services.

Each project has a mailing list that records every commit (code change). Each email has information like who made the change, the change itself, and who reviewed the change. Using a Google search, you can quickly look for changes made by a given company. For example, a search of the Kafka commit mailing list for Amazon shows no commits by someone from Amazon. Likewise, a search of the Hadoop commit mailing list shows none. It’s possible that people from Amazon are contributing but aren’t using their @amazon.com email addresses.
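If you'd rather script this kind of check than rely on a search engine, here is a minimal Python sketch. It assumes you have already extracted commit records into a list of dictionaries; the sample records, field names, and the function name `count_commits_by_domain` are all hypothetical and not taken from any real mailing list. It is subject to the same caveat as the search: an email domain only tells you which address someone committed with, not who employs them.

```python
from collections import Counter

# Hypothetical commit records for illustration only. Real data would have
# to be extracted from a project's commit mailing list archive or git log.
SAMPLE_COMMITS = [
    {"author": "Alice", "email": "alice@confluent.io"},
    {"author": "Bob", "email": "bob@apache.org"},
    {"author": "Carol", "email": "carol@cloudera.com"},
    {"author": "Dave", "email": "Dave@Apache.org"},
]


def count_commits_by_domain(commits):
    """Count commits per author email domain (case-insensitive)."""
    counts = Counter()
    for commit in commits:
        # Everything after the last "@" is treated as the domain.
        _, _, domain = commit["email"].rpartition("@")
        counts[domain.lower()] += 1
    return counts


if __name__ == "__main__":
    for domain, total in count_commits_by_domain(SAMPLE_COMMITS).most_common():
        print(f"{domain}: {total}")
```

A domain that never appears simply counts as zero, which is why a zero for amazon.com in a tally like this does not prove that nobody at Amazon contributed.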

Felipe Hoffa did a more in-depth look at open source contributions by cloud vendors. The numbers from Amazon were quite low.

Update: Matt Wilson points out this gist with early Amazon contributions back to Hadoop. He also confirms that Amazon employees don’t use their “amazon.com” email address when committing. This is the same standard you’ll find at other open source companies. They’re encouraged to use their “apache.org” email address.

That leaves us with a more difficult question. Does Amazon need to give back? Should we even believe that Amazon should give back? There are diverse opinions on the subject. Some of them are deeper and go to the very nature and definition of open source.

Roman also wrote a post with his synopsis.

Edited: To update with more feedback from the thread.
