Should you move to serverless? Is GraphQL the answer to your API woes? Should you follow the latest DevOps playbook to increase your system reliability? In the world of tech tools, there’s a lot of buzz. But it doesn’t always reflect the daily reality of programmers.
As the founder of a developer tools startup, I’ve talked with hundreds, if not thousands, of software developers over the last few years in the course of routine user research. The common theme in these conversations, even bigger than the need for the product we were building, was an overarching need that is currently underserved: building for real developers, or what I like to call the 99% Developers.
These are developers who are getting work done outside of the hip companies and frameworks, who often get neglected in conversations about “what developers want.” There’s a huge gap between what “developer-influencers” are talking about, and the daily reality of most developers. When you look at what gets covered by the tech media, or the speakers at top tech conferences, it’s often people from high-growth darlings like Airbnb or Stripe, or established, highly profitable companies like the FAANGs.
In fact, there’s a longstanding assumption that companies, outside of a small number of Silicon Valley unicorns, should aspire to have the processes of a “baby FAANG.” But this is increasingly not true. Our users would tell us, often sheepishly, that their practices look nothing like what they are “supposed to.” But for these “dark matter developers,” as Microsoft’s Scott Hanselman calls them, the practices of a Facebook or a Pinterest don’t make sense. Their user needs are different and their team needs are different.
It matters to talk about the 99% Developers because these are the developers building the software that powers our lives — insurance, health care, retail, and banking, just to name a few. It’s not only small companies that can’t easily adopt the processes of modern, tech-first companies; it’s most companies that were not built around technology and that have decades of legacy software practices firmly in place. Many of these companies move around quite a bit of money. Many of these companies handle quite a bit of our personal data. If technology innovations are not benefiting these software teams, we’re losing out on a lot of meaningful improvements to everyone’s quality of life.
In this piece, I’ll present some truths that both enterprise software buyers and builders can embrace to dispel harmful myths and improve developer experience for all.
“Trickle-down” tooling is aspirational
Because a disproportionate amount of writing and tooling comes from companies like Facebook, Netflix, LinkedIn, Google, and Amazon, many people assume there’s a trickle-down effect: Great engineers at companies with money to burn come up with good solutions to problems everyone else will have someday. On this view, it’s simply a matter of time until your typical small-to-medium business or Fortune 500 company experiences the same issues as Amazon or Facebook.
A FAANG-like company is different from an SMB or your typical Fortune 500 company along many dimensions, including scale needs, stance on building vs. buying, and makeup of the engineering team. A small number of large, well-capitalized companies have entire teams with world experts dedicated to observability, testing, developer productivity, and more. On top of this, it’s worth noting that FAANGs are optimizing around a small set of products that are digital from the start, something that’s not true of most software shops out there.
Many non-FAANG teams have a small non-expert team, or even a fraction of a non-expert engineer, to do things that FAANG-like companies have multiple teams of experts to do. These organizations rely primarily on external tools and services that they have little bandwidth for customizing.
What to do about it
First, the tech industry needs to acknowledge that organizations operating at different scales and with different engineering budgets are going to have different needs. A company that serves user requests in the low millions per day doesn’t need to optimize its systems to the degree of a Netflix or a Google. Most companies do not have the latency, data storage, and other concerns that would lead them to write their own bespoke infrastructure components and tools, as Facebook did with its TAO data store and Hive data warehousing tool, and they likely don’t have concerns that would warrant even using such tools.
Acknowledging different demands will make space to talk about diverse needs across a wide range of organizations. For instance, companies with legacy systems that can’t afford to migrate to the newest architectures need to adopt new tools differently than newer companies, or companies that can dedicate a team to a large migration.
Recognizing that there isn’t a single aspirational company profile can help builders reach outside of the usual suspects when understanding users. This is crucial for bridging the gap between software needs and software tools in an achievable way.
Spot on by @jeanqasaur. Eg big tech invests big time in internal platform teams that are rarely present elsewhere.
Example: at Uber the mobile platform team identified slow-running tests *that my team wrote/owned* & pinged us on how we should fix it.
Let that reality sink in… https://t.co/gEZFVzvflW
— Gergely Orosz (@GergelyOrosz) November 3, 2021
There is no gold standard development environment
If you watch enough conference talks or read enough blog posts, it would appear there are many software teams out there with pristine coding standards, non-flaky unit tests, staging environments that reflect production environments, and/or smooth people processes for responding to incidents. The implication is that getting to this point just requires a strong directive from above and discipline across engineering teams.
Just like influencers in any other field, developer-influencers often describe a reality that is aspirational even for their own companies. It may be true that people writing about ideal processes live in an idealized situation where it’s possible, in which case they’re the exception that proves the rule. But most of the time, even if it’s true in one part of an organization or at one moment in time, this reality does not hold across their entire company or forever.
For instance, Spotify admitted that their lionized approach to DevOps did not scale once their team reached a certain size. We also see examples of companies adopting the new hot technology and then reverting when things didn’t work as well as planned — for instance, Segment switching from microservices back to a monolith.
What to do about it
As the audience powering all of this, developers should be more critical in asking for the truth. We should welcome posts about “real software process” as much as we are hungry for the idealized content. If someone is working in an idealized process, with a world-class ops team and whole teams of people who exist to support improved software quality, the audience should be made aware of this! And we should welcome more talks, blog posts, and books that give guidance for “real software environments”: what coding, testing, and shipping look like with short-staffed teams, teams without dedicated DevOps experts, and teams where everyone who originally built the system has left.
Another little secret: a lot of what most “developer influencers” say is fairly aspirational.
Their own companies don’t necessarily do things as smoothly as they preach to others.
This is especially true at larger companies where the culture might vary vastly between orgs/teams https://t.co/FW2aI8lppu
— Cindy Sridharan (@copyconstruct) November 4, 2021
The goal is progress, not perfection
Too many people believe that aiming for good software quality means you need to fully adopt that new technology, whether it’s microservices, GraphQL, or distributed tracing. The assumption is that you’re not done until you’ve switched fully over to the ideal technology.
Today, the mismatch between where “real developer” teams are and the mainstream advice given out means that many teams don’t know where to start when it comes to improving code quality or system reliability. For the 99% Developer, most of their code will never need to scale to an organization of thousands, or to billions of users. Many of these developers work on code bases older than the length of their entire careers. Most of these organizations don’t have dedicated developer tools or developer-productivity teams internally.
Pristine code is not the goal. Rather, the goal is code that is as reliable and secure as it makes sense to be, given other constraints. For instance, if your company does not operate across multiple clouds deploying hundreds of changes a day, a continuous delivery system like Netflix’s Spinnaker is likely unnecessary. Similarly, developers at a company with DevOps experts who know how to set up and maintain observability “power tools” will likely have a much better time with those tools than developers at a company without an expert team in place.
What to do about it
Recognize that there is progress to be made — there are lessons to be learned from companies that have devoted entire teams to perfecting their processes — but perfection in most cases is unrealistic. Rather than lifting processes wholesale, see which ones translate well to teams with fewer resources and teams with different goals.
For instance, this Google blog post provides the guideline of 60% test coverage as “acceptable,” 75% as “commendable,” and 90% as “exemplary.” When you’re a company of Google’s maturity, with the size and caliber of Google’s engineering team, this may make sense. But for most smaller, earlier-stage companies, the actual test coverage is going to be far less despite what the company might aim for. And with the rise of service-oriented architectures and external APIs — practices that are much more prevalent outside of Google — testing in production is becoming a viable alternative to the traditional unit- and integration-testing techniques where “code coverage” makes sense as a concept.
In many of these modern systems, as Honeycomb co-founder Charity Majors wrote, “Once you deploy, you aren’t testing code anymore, you’re testing systems — complex systems made up of users, code, environment, infrastructure, and a point in time.”
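To make the coverage guideline quoted above concrete, here is a minimal sketch that maps a measured coverage percentage to a label. The 60/75/90 thresholds are the ones cited in the text; the function name, the `SMALL_TEAM_TARGETS` table, and its numbers are hypothetical, included only to show that a team with fewer resources could substitute goals that fit its own situation.

```python
from typing import Dict, Optional

# Thresholds from the Google guideline cited in the text (percent coverage).
GOOGLE_GUIDELINE: Dict[str, float] = {
    "acceptable": 60.0,
    "commendable": 75.0,
    "exemplary": 90.0,
}

# Hypothetical targets a smaller team might pick for itself (illustrative only).
SMALL_TEAM_TARGETS: Dict[str, float] = {
    "getting started": 20.0,
    "healthy": 50.0,
}


def coverage_label(
    percent: float, thresholds: Dict[str, float] = GOOGLE_GUIDELINE
) -> Optional[str]:
    """Return the highest label whose threshold `percent` meets, or None."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("coverage must be between 0 and 100")
    best = None
    # Walk thresholds from lowest to highest; the last one met wins.
    for label, minimum in sorted(thresholds.items(), key=lambda kv: kv[1]):
        if percent >= minimum:
            best = label
    return best


if __name__ == "__main__":
    measured = 35.0
    print(coverage_label(measured))                      # judged by the Google guideline
    print(coverage_label(measured, SMALL_TEAM_TARGETS))  # judged by the team's own targets
```

The same measurement falls short of every label under one table and earns a label under the other, which is the point: the number matters less than whether the target behind it was chosen for your team.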
A good demo doesn’t show the Day 2 snags
It’s too easy to fall into the trap of assuming that demos and onboarding are indicative of the experience of using a product day in and day out. People have to make buying decisions about new products relatively quickly, so we’ve defaulted to judging a product by its demo.
Developer tool empires are built through slick gifs and video clips. Teams commit to tools — sometimes for years — after a few minutes of demo. Investors invest based on demos. Builders are told to focus on the demo — and, after that, the first 60 seconds of use — as the make-or-break parts of the product. But while being able to execute well on a demo might suggest that the team can execute on developer experience in day-to-day use, the two aren’t necessarily correlated.
It should not be a surprise that with most tools worth their price tags, most of what a developer experiences, and experiences pain around, occurs outside of the first minute of use. First, integration with developers’ day-to-day workflows (for instance, existing code review, CI/CD workflows, and modes of collaboration) is a better indicator of whether a product will be sticky than are initial delighters. It’s well-known among developer tools creators, for example, that integrating with GitHub and GitLab will help make your tool much more useful and appealing.
Second, there are whole classes of tools where the true test of their effectiveness doesn’t happen on Day 1. One example is debuggers. More than your development environment and more than your CI/CD environment, your debugging toolbox is the most crucial factor in your quality of life as a developer. When you fix an issue before you deploy, are you confident that the issue won’t appear in production? When you have to work late or on a day off because of a production incident, are you able to quickly identify the root cause and come up with potential fixes? Because of the complexity of these tools, and because they often don’t show their true colors until you have a major issue, debugging tools often get the least hype and the least reward for good developer experience.
What to do about it
Tool builders often focus disproportionately on the Day 1 experience because of how users evaluate tools. So I’m going to give a special callout to users here. Users need to:
- Refrain from hyping up tools based on demo or Day 1 experience alone.
- Push back on tools that get you to commit to big contracts before you or your developer team have spent time working with the tool day in and day out for some period of time.
- Recommend tools to other people based on the less “sexy” dimensions that are key to your team’s productivity, such as how they integrate with workflows, or how they reduce collaboration friction within and across teams.
Doing these things will create a lot more space for a better developer experience.
Heterogeneity is here to stay
There’s often the assumption that the hot new language or framework is all that exists in someone’s system. Developers and developer-influencers alike will evangelize new tools as if those are the only tools being used: for instance, microservice architectures, GraphQL, and OpenTelemetry-based tracing for observability. “One true framework” evangelism implicitly assumes that it’s possible for organizations to switch over to using entirely that new language, tool, or framework.
I’ve encountered so many teams who say that migration will happen “next quarter.” The reality is that, even when they manage to finally start, migrations have become continuous, rather than discrete, processes. A 99% Developer team with legacy code and a lean team is probably never going to convert their entire code base over to microservices or GraphQL. For most organizations, tech stacks and tool chains are heterogeneous, a combination of the layers of languages, frameworks, and tools that have been picked up over the years.
Many of the teams we encounter will tell us that they’re starting to adopt microservices or GraphQL or OpenTelemetry. When I ask them what share of their services is within the new framework already, the answer will often be quite small, especially for organizations older than a few years. Some of these organizations will tell me that they realistically don’t expect to convert their entire code base. (For example, organizations will expect to maintain their legacy monolith alongside microservices, or REST and gRPC endpoints along with GraphQL endpoints.) For many other organizations, when I check some quarters later, they are often less far along in the planned migration than expected, and living with the reality of maintaining software across a mix of frameworks and tools.
What to do about it
Software buyers, from individual developers to directors and CIOs, know heterogeneity exists. But fully accepting this means:
- Accepting slow migrations. I’ve come across many teams that assume their problems will be solved once they finish migrating from outdated tool X to hot new tool Y, each of which lives in its own solution ecosystem. Unfortunately, the migration to tool Y may not finish until the hot new tool is Z, and now you have the two-ecosystem problem again.
- Accepting legacy subsystems. I’ve come across many teams that focus on innovating their toolbox around the newer parts of their system. Unfortunately, the legacy subsystems aren’t going away — and having less tooling for them means you have longer triage and debug times when something goes wrong.
Accepting that your APIs are unlikely to converge on GraphQL will lead you to invest in more sustainable, multi-API protocol tooling. Recognizing that your org is probably not going to convert all of its legacy monolith to microservices will allow you to invest in tooling that does not neglect monitoring and debugging code in either the monolith or the microservices.
On the builder side, it’s conventional wisdom that you go hard after homogeneity for the “land” and embrace heterogeneity for the “expand.” Whether you plan for heterogeneity will greatly impact how quickly you can expand. Certain kinds of developer tools need to be custom-built for each new language or framework. For instance, a tool that provides insights only for GraphQL APIs may not easily expand to other kinds of APIs, especially because GraphQL contains richer information than REST or gRPC does.
Other kinds of developer tools can expand across languages and frameworks easily. A SaaS tool that simply needs to be able to be called from different programming languages supports language heterogeneity, as major core components do not have to be translated to support each new language.
To start shedding unattainable software standards, let’s:
🛑 Stop thinking of software as homogeneously represented by a small number of unrepresentative companies
🗯 Start being more honest about “real software process”
🛠 Demand more solutions to the real problems!!
— ✨ Jean Yang ✨ (@jeanqasaur) November 2, 2021