Engineer Your Core, But Only Your Core

When do you buy? When do you build?

This question of "buy vs. build" is at the heart of many debates in companies, not only inside engineering teams, but also between engineering, product management and executives.

Fact #1: Engineering is Hard

Engineering is very hard. Despite enormous advances over the years, and despite the many system tools, development frameworks and languages, each touted as a "silver bullet" for all of your really hard problems (warning: it isn't), engineering remains complex.

This very complexity helps us understand the psychology of engineers.

Fact #2: Engineers Take Pride in Solving Hard Problems

Want to see a happy engineer? Give them a really tough problem to solve, wait until they are done, and see the glow on their face.

Indeed, the drive to solve really complex problems is one of the primary reasons people choose a career in engineering.

Look at the introduction to software engineering careers at Facebook:

"Working at Facebook means doing what you love. We hire trailblazers, hackers and pioneers. We want people who can solve challenging problems, make a real impact and build something big."

Want to hire great engineers? Tell them how much you will challenge them!

But this tendency has a downside.

Fact #3: Engineers Look for Hard Problems to Solve

Because engineers love doing hard work, and take pride in cracking tough nuts, they often look for hard problems to solve. This can lead to solving problems that would be really cool to solve, but whose solutions the business as a whole would be better off acquiring.

A few years back, I was working with a company building a new service. The relatively young team lead wanted to build a key data processing component from scratch. Sure, there was an open-source product that could, more or less, do the same thing, but his would be "much better".

To the SVP's credit, despite knowing that it was a "fool's errand", he gave the team lead a lot of rope. The team lead failed on the project, but learned a valuable lesson that would last his whole career. The SVP knew that the team lead was trying to solve a non-core problem, but let him learn the lesson anyway. Not every company has that luxury of time.

Fact #4: The Hardest Part of Engineering Isn't the Engineering

The hardest part of engineering isn't modelling your business processes, or representing them in software.

The hardest part, actually, isn't the primary engineering at all.

The hardest part is building the systems so they can handle unforeseen problems. Whether those problems are bugs in your own code, or environmental issues like servers or data centres deciding that today is the day to go on an extended vacation, managing problems - and definitely not exacerbating them - is the hardest part of any system design.

There is a method to prepare, of course, and that is testing. But the more complex the system, the more potential combinations and permutations and edge cases, and the more difficult it becomes to actually test for the problems.
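To put a rough number on that growth, here is a minimal sketch. It assumes, purely for illustration, that each component is either "up" or "down" and that every combination is a distinct scenario to test; the function names and the component names are hypothetical, and real systems have far more than two states per component.

```python
from itertools import product


def failure_scenarios(components, states=("up", "down")):
    """Enumerate every combination of per-component states.

    Simplifying assumption: each component is independently in exactly
    one of the given states.
    """
    return list(product(states, repeat=len(components)))


def scenario_count(n_components, n_states=2):
    """Number of combinations without enumerating them: n_states ** n_components."""
    return n_states ** n_components


if __name__ == "__main__":
    # Three hypothetical components already give eight scenarios.
    print(len(failure_scenarios(["db", "cache", "queue"])))
    # Each added component doubles the number of scenarios.
    for n in (5, 10, 20):
        print(n, scenario_count(n))
```

Even under this generous two-state assumption, every additional component doubles the number of scenarios, which is why exhaustive testing stops being practical very quickly.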

I was reminded of this again by Adrian Colyer's ever-deep Morning Paper, which discusses methods of finding problems in the design of distributed systems. Systems, by and large, are designed in one of three major ways:

  • Single: If you just have one item, for example a database, you can be very confident about its consistent state. Either it is working, or it has died and your service is dead too. Of course, you really don't want that!
  • Failover: If you have one primary database and a backup, when the primary dies, you can make the backup the primary, and you know things are fine. However, this really does not scale well.
  • Distributed: You have 3, 4, 5, perhaps 100 databases running. Each of these connects and coordinates with the others, so that the loss of any one, or even several, has zero impact on your application. It just keeps running without any human intervention.

While the distributed solution seems to make a lot of sense, it turns out that it is very complex to keep all of the instances coordinated. What happens if, instead of losing 3 out of 10, those 3 merely lose connectivity to the other 7? Each side thinks the other is dead, and tries to become authoritative. Are you looking forward to the nightmare of synchronizing and resolving the data conflicts when it all comes back? Probably not.
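One common way distributed systems try to avoid this "both sides think they are in charge" situation is a majority quorum: a group of nodes may act as authoritative only if it can reach more than half of the cluster. The sketch below is an illustration of that general idea, not a description of any particular database; the function name is hypothetical.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if this side of a partition holds a strict majority.

    reachable_nodes counts the nodes on this side, including the caller.
    Only a strict majority may act as authoritative, so at most one side
    of any partition can ever qualify.
    """
    if not 0 <= reachable_nodes <= cluster_size:
        raise ValueError("reachable_nodes must be between 0 and cluster_size")
    return reachable_nodes > cluster_size // 2


if __name__ == "__main__":
    # The 3-versus-7 split from the text, in a cluster of 10.
    print(has_quorum(3, 10))  # minority side
    print(has_quorum(7, 10))  # majority side
    # An even 5-versus-5 split leaves neither side with a majority.
    print(has_quorum(5, 10))
```

The rule itself is one line. The hard part is everything around it: detecting who is reachable, agreeing on the cluster size, and catching the minority up after the partition heals. That surrounding machinery is exactly the complexity you take on when you build this yourself.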

A more complex solution solves some problems better. But as the complexity of the system as a whole grows, the number of ways it can misbehave and fail unrecoverably grows exponentially.

Fact #5: You Can Solve the Problem

So how do we get around the complexity?

Build what you must, buy the rest.

In every business, there are elements whose in-house expertise is crucial for the business (consultants like to call them "core competencies", another heavily abused term), and those that are not.

  • You must make payroll; you do not care how to do ACH transfers.
  • You must have an office; you do not care how to replace the lightbulbs.
  • You must enable customers to pay securely at checkout; you do not care how to do fraud checks and card verification.
  • You must offer online registration; you do not care how to check email blacklists.

Maybe you can build a better database or application server or firewall. However, the potential errors grow exponentially (or faster) with the complexity of the system.

If the design of a system is not core to your business and provides no competitive advantage, do not build it. Buy it.

How do you evaluate what you should buy and what you should build? There are three questions, which are simple to ask and hard to answer:

  1. Is it available to acquire?
  2. Is it crucial to you?
  3. Does it provide competitive advantage?

Obviously, if you cannot acquire it, whether from a commercial vendor or as open-source, then you have almost no choice but to build it. Of course, sometimes it looks like you need it, but an alternate design obviates the need for it.

If it is not crucial, buy it. It isn't worth wasting energy on a Web UI request routing mesh if 99% of your business runs as an iOS app.

Finally, does it provide competitive advantage? If it does not, do not waste your energy on it. Even crucial systems can be acquired.

Let's look at a few examples:

  • Payroll: Every company in the world must run payroll. If you do not pay your employees, they leave. Yet almost no one runs their own payroll service. Can you acquire it? Sure. Try PayCycle or SurePayroll or Paychex or ADP. Is it crucial? Definitely. Does it provide competitive advantage? No.
  • Heroku: Before Heroku was acquired, it ran a routing mesh (written in Erlang) to route requests from its front-end Web servers (nginx) to its back-end application instances. While they acquired most of their components and open-sourced almost everything else, they built their own mesh and kept it closed-source. Could it be acquired? To some degree. Was it crucial? Yes. Without it, they could not do all of the intelligent routing and scaling that is a key selling point. Did it provide competitive advantage? Definitely.
  • Cassandra: Cassandra is a specialized type of database, known as NoSQL. It was originally developed by Facebook to power one of their search features, yet they later open-sourced it. Could they acquire it? No. At the time it was developed, no serious alternative was available. Was it crucial? Yes. They needed it (or something like it) to power a key feature supporting growth. Did it provide competitive advantage? No. Nothing else was anywhere near as good or performant for Facebook's needs, but Facebook's competitive advantage comes from its social network, not its databases. So Facebook turned it around and open-sourced it. They made it available to acquire. They benefit from broader adoption and updates, and get "street cred" for contributing to the technology community.
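The three questions and the examples above can be sketched as a tiny decision function. This is only an illustration of the reasoning in this article: the names (Component, build_or_buy) are hypothetical, the yes/no answers are simplifications of judgment calls, and Heroku's "to some degree" is treated here as "available".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    available: bool              # 1. Is it available to acquire?
    crucial: bool                # 2. Is it crucial to you?
    competitive_advantage: bool  # 3. Does it provide competitive advantage?


def build_or_buy(c: Component) -> str:
    """Apply the three questions in order and return "build" or "buy"."""
    if not c.available:
        # Nothing to acquire: build it (or redesign so you do not need it).
        return "build"
    if not c.crucial:
        return "buy"
    if not c.competitive_advantage:
        # Even crucial systems can be acquired.
        return "buy"
    return "build"


if __name__ == "__main__":
    examples = [
        Component("payroll", available=True, crucial=True, competitive_advantage=False),
        Component("routing mesh", available=True, crucial=True, competitive_advantage=True),
        Component("Cassandra", available=False, crucial=True, competitive_advantage=False),
    ]
    for c in examples:
        print(f"{c.name}: {build_or_buy(c)}")
```

The code is trivial on purpose. The difficulty lies in answering the three questions honestly, especially when the team would enjoy building the thing.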

Summary

Choosing to build technology systems and components for your business has two costs:

  1. Build cost: It is usually more expensive to build it from scratch than to take it off the shelf.
  2. Complexity cost: The more complex the system, the more the testing burden and the number of edge cases grow, and they grow exponentially.

The complexity cost normally far exceeds the build cost. Only build that which is crucial to your business. Sometimes, you even build it and donate it, in a win-win scenario.

Do you find that your services suffer from high costs, slow deployment and lower-than-desired stability? Are you building more than you need to? Are there parts of your systems that would be better off open-sourced? Ask us to evaluate your systems. You may free up your staff, deliver faster, build your reputation and improve service availability: a win-win all around.