At work, there was some chatter going on around the topic of helping teams focus more. I decided to pull on that string for a bit, and this document was the result. It turned out a little longer than I expected, so my apologies if you weren’t planning to read quite this large a wall of text! But it was interesting to think about; I won’t claim that anything here is original, I’m mostly collecting existing ideas, but I enjoyed putting it all in one place. So I asked if it was okay to put it on my personal blog as well; many thanks to Sumo Logic for letting me do that. (And if you need a tool to help you store, analyze, and learn from your logs, give Sumo a look!)

Introduction

In this document, I’m talking about what it looks like when a team works with focus: what it looks like for a team to work with focus at longer time scales (weeks to months), what it looks like for a team to work with focus at smaller time scales (over the course of a day), and what some of the resulting tradeoffs would be. At larger time scales, my answer is: the team works on as few projects at the same time as possible; at smaller scales, my answer is: the team guards against uncontrolled interruptions by inserting a triage step into any new incoming requests. There’s a lot of subtlety behind those simple answers, however; most of this document is spent digging into those subtleties.

Focus at the Project Level

First, I want to talk about what focus means for teams at the level of their week-to-week, month-to-month, quarter-to-quarter work. At this level, teams have projects that they’re working on; so the question is what it means to focus when working on projects.

Focus Leads to Value

A thought experiment: imagine that a six-person team has six projects that it’s planning to work on over the next two quarters. Assume that each project takes six person-months to complete, and will save or earn $10k / month once it’s implemented, but it will be worth nothing until it is complete. Six months from now, all six projects will be complete, delivering a total value of $60k / month.

But the team has choices for how to get there, and the path they choose strongly affects the value delivered over the initial six months. If each team member works on a separate project, then no value will be delivered until the six months are up. If, however, we put two people on each project, then at the end of three months, three of them will be delivered, so for the 4th / 5th / 6th months we’ll get $30k / month, or a total of $90k of extra value. If we put three people on each project, then we’ll deliver two at the end of month 2, two more at the end of month 4, and the remaining two at the end of month six, so we’ll have 2 * 4 * $10k + 2 * 2 * $10k = $120k of extra value. And if the whole team swarms on each project, delivering one a month, then the extra value is $50k + $40k + $30k + $20k + $10k = $150k.
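The arithmetic above is easy to check with a few lines of code. This is only a minimal sketch of the thought experiment, not a planning tool: the function name and parameters are made up for illustration, and it bakes in the same unrealistic assumption as the example (work on a project parallelizes perfectly across however many people are assigned to it).

```python
def extra_value(people_per_project, team_size=6, projects=6,
                person_months_each=6, value_per_month=10, horizon_months=6):
    """Value (in $k) earned inside the horizon when the team works in batches.

    Assumes perfect parallelism: n people finish a project n times faster.
    """
    concurrent = team_size // people_per_project             # projects in flight at once
    batch_months = person_months_each / people_per_project   # time to finish one batch
    total = 0.0
    finished_at = 0.0
    remaining = projects
    while remaining > 0:
        finished_at += batch_months
        done = min(concurrent, remaining)
        # Each finished project earns money for the rest of the horizon.
        total += done * value_per_month * max(0.0, horizon_months - finished_at)
        remaining -= done
    return total


for people in (1, 2, 3, 6):
    print(people, extra_value(people))
# 1 0.0
# 2 90.0
# 3 120.0
# 6 150.0
```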

So, just by focusing, the team gets free extra value! Focusing more delivers more value, but even the starting level of extra focus (doubling up people on projects) delivers 60% of the potential extra value in this example.

I should instead say “just” by focusing, because in sentences like the above, the word “just” is usually a warning that something is being swept under the carpet; that’s certainly true in this example. Here are some ways in which this simplified example isn’t realistic:

  • Not all of the projects will deliver the same value.
  • Not all of the projects will take the same amount of time.
  • We can’t reliably predict the value of projects in advance.
  • We can’t reliably predict the time it will take to complete projects in advance.
  • Projects might deliver partial value along the way, instead of having the value only appear once the project is completed.
  • Projects won’t deliver a constant amount of value per month after being delivered.
  • It is difficult to break down tasks within a project in a way that will enable all team members to work on it productively in parallel.
  • Not all team members will be able to work on every project at the same efficiency.

I’ll go through those in subsequent sections. But still, there’s a basic takeaway from the thought experiment:

Lesson 1: Focus on fewer tasks in order to benefit from them earlier.

Lesson 1a: Total focus is great, but partial focus makes a big difference too.

Prioritization Between Projects of Different Value

Now let’s change up the example by assuming the projects deliver values of $25k, $15k, $10k, $5k, $3k, $2k, instead of all having the same value. The total value still adds up to $60k, but now the sequencing of the projects matters. If the team swarms on them, focusing on delivering one per month, then, in the optimal order, the value delivered in the first six months is 5 * $25k + 4 * $15k + 3 * $10k + 2 * $5k + 1 * $3k = $228k. If they deliver one a month in the least optimal order, then the value turns out to be only $72k. Both of those are a lot better than delivering no value in the first six months (and even the bad order is almost as good as the value in the “deliver in two batches” case of the first example), so focus is still good, but we get a factor of three difference by getting the sequence right. (The Pareto Principle suggests that real world examples of this will actually be significantly more extreme than this made-up example, for what it’s worth.)
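Here is the same calculation as a short sketch, in case you want to play with other numbers. The function name is hypothetical, and it assumes exactly what the example assumes: the whole team swarms, one project ships at the end of each month, and each project then earns its monthly value for the rest of the six months.

```python
def value_in_horizon(monthly_values, horizon_months=6):
    """Total $k earned inside the horizon when one project ships per month, in order."""
    total = 0
    for month, value in enumerate(monthly_values, start=1):
        # A project shipped at the end of `month` earns for the remaining months.
        total += value * max(0, horizon_months - month)
    return total


values = [25, 15, 10, 5, 3, 2]
best = value_in_horizon(sorted(values, reverse=True))  # highest value first
worst = value_in_horizon(sorted(values))               # lowest value first
print(best, worst)  # 228 72
```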

This is important enough that I almost called this document “On Prioritization” rather than “On Focus”. The two topics are certainly tightly linked: if you’re going to focus, you need to decide what you’re focusing on first, and you probably want to start by focusing on the most valuable items! (Or the items that are most important in some other sense.) Conversely, if you really care about prioritization, then you have to act on those priorities: this means that you do the high priority stuff before the low priority stuff, which means that you need to focus.

Lesson 2: Do the highest value tasks first.

Lesson 2a: Even when you get Lesson 2 wrong, Lesson 1 still holds.

Smaller and Larger Projects

Now let’s modify the first thought experiment in a different way: say that the different projects take 2.5 months, 1.5 months, 1 month, .5 months, .3 months, and .2 months respectively. And, let’s assume that they each deliver $10k/month in value when completed. (You may be objecting to that last sentence; I’ll say more about that below.)

Then, of course, for best value delivery, we want to start by focusing on the shortest project and end with the longest project; the excess value delivered over the course of the first 6 months in that situation is $10k * (5.8 + 5.5 + 5 + 4 + 2.5) = $228k. And, conversely, if the team delivers them in the least optimal order then the excess value is $10k * (3.5 + 2 + 1 + .5 + .2) = $72k. (Familiar numbers!)

Of course, to really figure out the correct order, you would ideally figure out both the time and value and prioritize by “the most bang for the buck”. (Or “the most buck for the effort”, I suppose.)

In a world where the value of projects was proportional to the time taken, that last point would actually mean that the order in which you did projects wouldn’t particularly matter. That isn’t the case, however: we can all dream up no end of low-value time-consuming projects! If I had to guess, those two variables are only very faintly correlated, closer to being uncorrelated than being strongly correlated.
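The “bang for the buck” ordering can be sketched by sorting on value per unit of time. This is a heuristic sketch under the thought experiment’s assumptions (the team swarms on one project at a time, and value only starts once a project is done); the function names are hypothetical, and in real life both inputs are guesses.

```python
def value_in_horizon(projects, horizon_months=6.0):
    """projects: (duration_months, value_per_month) pairs, done one at a time in order."""
    elapsed = 0.0
    total = 0.0
    for duration, value in projects:
        elapsed += duration
        total += value * max(0.0, horizon_months - elapsed)
    return total


def by_bang_for_buck(projects):
    """Order projects by monthly value per month of effort, highest ratio first."""
    return sorted(projects, key=lambda p: p[1] / p[0], reverse=True)


# The example from this section: different durations, $10k / month each.
projects = [(2.5, 10), (1.5, 10), (1, 10), (0.5, 10), (0.3, 10), (0.2, 10)]
longest_first = sorted(projects, key=lambda p: p[0], reverse=True)

print(round(value_in_horizon(by_bang_for_buck(projects)), 1),
      round(value_in_horizon(longest_first), 1))  # 228.0 72.0
```

When every project has the same value, this ordering reduces to “shortest first”; when every project has the same duration, it reduces to “highest value first”.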

Lesson 3: Do the fastest tasks first.

Lesson 3a: Even when you get Lesson 3 wrong, Lesson 1 still holds.

Uncertainty around Value

Using Lesson 2 to sequence the order of tasks only works if you can predict the value. Sometimes we can: we’ve had pretty good success at Sumo with predicting the value of cost savings projects. Sometimes it’s harder: maybe you can tie a feature to specific customer deals, but sometimes those deals don’t materialize even with that feature, sometimes we close a deal even without a feature that a customer wants, and of course we hope that a feature will appeal to customers in the future (whether by attracting new customers or keeping existing customers happy). And work around improving operations, making development easier, etc. is also hard to precisely value.

We can spend effort to try to get a better idea of the expected value for a piece of work, and sometimes we should do that. But we also shouldn’t be under the illusion that the estimates that come out of that work will be at all accurate. Frequently, the only way to get an estimate that’s significantly better than our gut feeling is to implement the task and see what happens. And, fortunately, as per Lesson 1 and the bad example in Lesson 2, even focusing on tasks in the wrong order is still better than implementing everything in parallel.

Also, even though we don’t really know how users will react to a feature until it’s out there, we can still use that fact to our advantage: if we can get something out there and see how users react, that will give us very valuable information about whether that’s an area to focus on in the future. (Value delivered doesn’t always show up in the form of immediate dollars.)

Lesson 4: Don’t spend too much time worrying about detailed estimates of value: gut feelings are okay.

Lesson 4a: Get something out there quickly to learn from.

Uncertainty around Time

A lot of software development processes have a step where teams estimate how much time a task will take, to support Lesson 3. And that’s useful!

But also, time estimation is hard. I would even say it’s really hard: it’s not as hard as estimating value, but what I mean by that is more “estimating value is extremely difficult to get right” rather than “estimating time is easy”, because estimating time is definitely not easy. One good survey of the topic is McConnell’s Software Estimation book; the main takeaway that I got from that book is: no matter how hard you work, you won’t be able to reliably estimate the duration of tasks better than a factor of two error bar, and even getting to that level requires both a significant amount of time spent developing skill at estimating accurately and a significant amount of time spent on each individual project that you want to estimate. Almost all the time, doing that level of estimation work isn’t worth it: the benefits aren’t there, and also people asking for estimates aren’t likely to have their actual needs met by providing estimates with a 2x error bound. (I’ll talk about actual needs more below.) Fortunately, as with Lesson 4, just focusing on getting something out there has value, even if your estimates are wrong.

Some other observations from my experience:

  • Estimates are a lot more likely to be too short rather than too long. (Unless they have a lot of disciplined practice at estimation, an engineer’s estimate will be how they envision the project going if almost everything goes well.)
  • The long tail of estimation errors is very real: a project can easily take not just 2x or 3x as long as you estimate but 5x or 10x or long enough that it gets canceled before it gets finished.
  • The smaller a task is, the more likely you are to be able to estimate it accurately. (I remember working on a team that estimated projects as 1, 2, or 3 points, and we quickly discovered that the 1 and 2 point estimates weren’t too far off but a 3 point task could easily take 15 times as long as a 2 point task instead of just 1.5 times as long, and there were many examples that took longer than that.)

Lesson 5: Don’t spend too much time worrying about detailed estimates of time, they’re only a little more likely to be accurate than estimates of value.

Lesson 5a: Small projects are much easier to estimate and much less likely to balloon out of control than larger projects.

Partial Delivery of Projects

In our examples so far, we’ve assumed that the value in each project only shows up when the project is completed. But frequently you can deliver a project in phases, where each phase delivers real value.

The good news is that our model here already accounts for that: just rename each phase as a project. Once you’ve done that, Lesson 1 applies immediately; Lessons 2 and 3 are harder, because you might have dependencies, but if you can break those dependencies, then you can apply Lessons 2 and 3 as well.

Lesson 4a is also key here. If you have a big project in your mind, your vision of the later parts of the project is probably pretty hazy: you might not know exactly what the later parts will look like, and you probably don’t know how customers will react to them. But if you can deliver the project in phases, then the early phases can help you guide the design and sequencing of later phases. In fact, the value of earlier phases frequently comes largely in the form of this sort of learning: it’s not so much that the early phases directly make a lot of money, their value is that they instead show where to focus your future work in order to increase value.

Lesson 6: Break down a large project into many smaller value-delivering projects.

Lesson 6a: Use the results of the earlier projects to inform the later projects.

Small Projects

Lessons 5a and 6 both push us in the direction of shrinking the size of projects, for different reasons. And there are other reasons why smaller projects are good as well: it’s easier to change course at the end of a project than when you’re in the middle of a project, so smaller projects give you more options for when to change course; and it just feels good to complete something, so smaller projects give you more frequent psychological boosts.

There are limits for how far you want to slice, but you can push them pretty far down: for example, having a team make a user-visible change to production every day is a reasonable goal in many circumstances. And usually, when there are barriers to that sort of frequent change (e.g. long build times, or untrusted test suites that need augmentation from manual testing), removing those barriers provides benefits beyond enabling smaller projects. (Trustworthy test suites are good no matter what!)

Lesson 7: Keep your projects small.

Lesson 7a: No, smaller than that.

Lesson 7b: Fix the bottlenecks that get in the way of making projects small.

Hard Deadlines

In Lesson 5, I claimed that accurately estimating is difficult. Nonetheless, leaders continue to ask for estimates from teams. Part of that is because of Lesson 3: the size of a task affects how high a priority it is. And part of that is to be able to predict in advance how much work will get done by a given point in time.

That latter case in turn has a couple of different rationales. Sometimes, management asks for time estimates because they feel uncomfortable about whether work is proceeding well: one way to get a handle on that is to see if people are making their estimates. And, if work isn’t proceeding well, why not: are people not focusing properly, or is there some piece of new information that we need to consider? Engineers often don’t like this branch of the analysis, because of course they’re working hard, it’s just that surprises happen; honestly, it’s not my favorite branch either. But I will ask people with that objection: do they really believe that managers don’t ever need to intervene with engineers’ work? Even in a magical world where engineers are always working with focus, sometimes engineers get stuck and need help, and a good team needs a way to detect and help with that. And asking “are the dates slipping?” is one way to try to detect that.

It’s not a great way to try to detect that, though: it doesn’t take into account the very low precision of even good estimates, and it’s a relatively laggy indicator. My favorite tool for this is to instead lean into Lesson 7 and see if there’s a regular hum of tasks and even projects getting completed. The more you follow Lesson 7a, the less laggy this is: if the normal case is checking in code every day that directly affects production, then you can detect hiccups after a day or two. If a team is functioning well enough in that state, people will even be glad to get the help: instead of reacting by saying “don’t you trust me to be working hard?”, people will respond “yeah, I’ve been banging my head against this for a day, I appreciate other people volunteering to help me figure out what I’m missing”.

So that’s the case of fitting estimates into artificial deadlines. But also sometimes there are hard deadlines: maybe there’s a trade show coming up and your team is definitely going to show something at the trade show. So the company wants to make sure that there’s something good there, and they need to figure out the shape of that thing to spin up large-scale marketing activities, say. There is real, tangible value in hitting the dates in situations like that.

But that doesn’t mean that Lesson 5 isn’t true, even though it’s inconvenient! The value of hitting your dates, though, means that Lesson 2 says that we should prioritize tasks associated with hard deadlines higher than we would otherwise. And Lesson 5 also implies that we should take Lesson 6 particularly seriously in this case: we’re going to show something at that trade show, but we can’t predict exactly what we’ll have done by then, so let’s give ourselves as many options as possible so that, no matter how much we get done, it will be a coherent amount of work with as many high-value components as possible.

In sum:

Lesson 8: Don’t create artificial deadlines and focus on meeting estimates associated with them: focus instead on producing a steady stream of value, and keeping the flow of that stream healthy.

Lesson 8a: If you have an actual hard deadline, prioritize projects associated with that deadline above projects that aren’t associated with that deadline, and break down / sequence your work so that the most important stuff gets done first.

How a Project Delivers Value Over Time

In my initial example, I assumed that a project delivered a fixed amount of value each month after delivery; and, aside from the caveat from Lesson 6, I kept that assumption for the rest of the examples. The problem is that this assumption is completely untrue: some projects launch small and then grow in value as they get more adoption; some projects launch with a big splash and then diminish in value soon after their launch; and nothing lasts forever.

Having said that, I’m not sure what focus-related lessons to draw from that? People who are experts at estimating value, who really want to do Lesson 2 well, need to come up with a richer model of value that better reflects the realities of value delivery. But no matter the shape of the value curve, moving that curve earlier in time is good (or, from a financial point of view, a dollar today is worth more than a dollar tomorrow), so Lesson 1 still holds. And, as per Lesson 2a, Lesson 1 still holds even if your value estimates are wrong.

So: no lessons in this section.

Working Productively in Parallel

Lesson 1 says that you should focus on as few projects as possible at once, which in turn suggests that you should want to parallelize the projects that you are currently focusing on. But what if the work on those projects is inherently sequential?

That does sometimes happen; in my experience, a creative team can find quite a lot of potential parallelization in most projects, however. Having a 5–10 person team work on two projects at once is pretty normal; and, as per Lesson 1a, a team working on two projects at a time in that example delivers about 80% of the extra value (compared to a no parallelization scenario) that you would get if the whole team worked in parallel on a single project, which is quite good.

Also, if there’s a single task that is blocking a bunch of other tasks, then that task is a very central one, so consider throwing multiple people at that task, even if you’re a team that doesn’t normally pair program: the more central a task is, the more benefits you’ll get from having multiple eyes on it and having multiple people be familiar with it. For example, I’ve seen a lot of instances where a single person does initial design for a project, and that usually leads to a situation where that person is the only person who feels like they really own the project; this in turn directly limits the effectiveness with which other people can work on that project. Having multiple people work on the initial design significantly helps with that risk.

Lesson 9: Try to figure out ways to parallelize the work within a single project, but don’t worry if you hit a limit.

Lesson 9a: Throw multiple people at a single task if it’s central enough.

Skill Differentials

If a team is used to working on a bunch of projects in parallel, then asking the whole team to focus on one or two projects is, honestly, pretty jarring. There’s often only one expert on the team in the area of your chosen initial project; people who aren’t that expert feel like they’re working at a small fraction of the speed that they’re capable of and are worried that they’re making mistakes, whereas the expert feels like they’re spending all of their time answering questions, so they’re slowed down too!

This is painful; it is also temporary: it doesn’t take very long for this to get a lot better. And going through that pain is worth it, because the comfort of working on a siloed team hides quite a lot of risk.

One of the risks is that, on a siloed team, it’s almost impossible to follow Lesson 2. Instead of asking the question “what’s most valuable for the team to work on”, the implicit question behind prioritization discussions turns into “what’s most valuable for person A to work on, what’s most valuable for person B to work on, what’s most valuable for person C to work on, etc.”. And that’s a much worse question to ask; I don’t think it’s an exaggeration to say that, by asking that latter question, you’re giving up the focus on value entirely, you’re instead prioritizing having people be busy over team-wide value. (The difficult thing in software development planning isn’t in finding something of value for engineers to do, it’s in choosing the most valuable tasks out of a large number of valuable tasks.)

But individual ownership creates other more direct risks, too. What if the person who owns a component goes on vacation or gets sick, and then a crisis arises where that person’s knowledge would be really helpful? (I’ve seen organizations where the answer to that is “pull them into outages even if they’re on vacation”; that is a really crappy answer.) What if that person quits? If one component is a frequent source of production issues, are you going to be paging the same person over and over? (Hopefully you’re on top of production issues enough that any production issues are rare; if so, congratulations, you don’t have to worry as much about that last one.)

Also, none of our design skills are perfect. If there’s a component in your system that only one person knows how to work well with, chances are that there are some rough edges in the design of that component that would be helped by another set of eyes.

I don’t want to go 100% down the route of “everybody should be able to work on anything”. I’ve been in organizations that tried to have a bunch of teams pull from the same backlog, to increase the flexibility with which the organization could respond to changes in priorities; that doesn’t turn out well either, so while I’m strongly against individuals owning components, I’m strongly for teams owning components. And, even within teams, chances are that some people will know more about some components or technologies than other people; that’s okay as long as it doesn’t get out of hand. The arguments in this section do make a case for cross-functional teams that I find pretty convincing, but I’m willing to bend on that up to a point. (It’s similar to Lesson 9: having the whole team work on one project is the best, but having the team work on two projects is fine too.)

But the mindset that you need is: whenever you find yourself thinking “work on area A would go faster if person X worked on it”, realize that what you’re saying is “area A is a source of risk for our team”, and react to that by spreading the knowledge. Don’t fall into the trap of always assigning a given task to the person who most recently worked on that code – they’re arguably exactly the wrong person to work on that task! Sure, have that person consult on the task (or pair on the task, if your team does pair programming), but have somebody else take the lead. It doesn’t take very much time doing that before most of your team is comfortable with working on most stuff that comes up in practice, and you’ll feel like you’re working as efficiently as you were before most of the time; I bet your team will feel like more of a team, too.

And if you really do find it difficult to cross-train team members, consider reacting to that by splitting the team: that’s frequently the right thing to do, especially if the team is a little on the large side either in terms of people or in terms of the number of domain areas that the team is covering. If the resulting teams can each work autonomously and if, based on past experience, both teams will have a steady stream of high-value work, then splitting the team is an excellent choice.

Lesson 10: Knowledge silos are a source of risk: respond to that risk by aggressively cross-training whenever you notice silos developing.

Lesson 10a: Structure your teams so that each team has a stream of high-value work to focus on that you can comfortably prioritize at the team level.

Focus at Smaller Scales

We started off by talking about multi-month projects; then, by focusing on fewer projects, we shrank those projects to a month or two. Later on, we broke down those projects into smaller sub-projects: that brings us down into the range of a week or two, or even to a day or two if we break them down enough. At this range, the distinction between tasks and projects starts blurring. (The “User” part of the term “User Story” isn’t there to tell you to phrase your tasks in a strange way, it’s to get you to think about how to craft your tasks to deliver user-visible value.)

But, now that we’re at the time scale of our day to day work, we’ll find a lot of other activities competing for our time: unplanned maintenance work of various sorts, Slack messages requesting help, production issues to deal with, etc. So, in the second half of this document, I want to think about what it looks like to focus on the day-to-day work of a team.

Planned Work

The projects that we talked about in the first half are going to translate into stories / tasks: chunks of work that will take an engineer a few hours to a few days. (But not as long as a few weeks.) And you might have other chunks of work that have a similar flavor to them: e.g. maybe some sort of global infrastructure upgrade has created tasks for your team. There might also be routine work that has to be done on a regular cadence. So you’ve got a collection of tasks that are known in advance, and you need to get them done.

As per Lesson 1 above, you’ll want to prioritize these tasks: don’t pick them at random, use Lesson 2 and/or Lesson 3 to give guidance as to how to order them. The goal of this is to get a single prioritized list of tasks: then your team can focus on knocking them off in order. (And hopefully you’ve been following Lesson 10, so the team can actually attack them in order, instead of having people on the team who can’t help out with the top tasks!) That does raise the question as to who prioritizes the list and exactly how they do it; for purposes of this document, though, all that matters is that the list gets prioritized in a way that’s consistent with Lessons 1 / 2 / 3.

Unless you’ve got quite unusual working norms or a tiny team, you won’t be able to apply Lesson 1 at full strength when working with tasks of this size, so the whole team won’t focus on the top item. A reasonable goal is to have the team only working on the top N items at any given time, where N is the number of team members, but that goal frequently starts off as an aspirational one: if teams haven’t taken Lesson 8 to heart, then one way that manifests itself is that the number of active tasks exceeds the number of team members, often significantly. So the flow of this prioritized list is a very useful diagnostic sign for how well the team is focusing on the most important tasks, and it’s one of the main ways that problems with Lessons 8 and 10 manifest themselves.

Lesson 11: Put your planned work in a single prioritized list.

Lesson 11a: Watch the flow of the list: people should be working on approximately the top N tasks where N is the size of the team, and should be completing those tasks regularly.
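If your prioritized list lives somewhere you can export it from, the check in Lesson 11a is simple enough to automate. The sketch below is purely illustrative: the Task structure, the in_progress flag, and the report fields are all assumptions of mine, not the format of any particular tracker.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    in_progress: bool = False


def focus_report(prioritized, team_size):
    """Compare what is actually in progress against the top N of the list."""
    active = [i for i, task in enumerate(prioritized) if task.in_progress]
    return {
        "active_count": len(active),
        # More active tasks than people suggests the team is spread too thin.
        "over_limit": max(0, len(active) - team_size),
        # Active tasks below the top N suggest the priority order is being bypassed.
        "outside_top_n": [prioritized[i].name for i in active if i >= team_size],
    }


backlog = [
    Task("fix flaky deploy", in_progress=True),
    Task("ship export feature", in_progress=True),
    Task("upgrade build image"),
    Task("refactor settings page", in_progress=True),
    Task("clean up old dashboards"),
]
report = focus_report(backlog, team_size=2)
print(report)
```

The second half of Lesson 11a (tasks getting completed regularly) needs completion dates as well, which this sketch leaves out.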

Interruptions

Your team will not, however, spend all of its time working on planned work: surprises happen, and those surprises can interrupt planned work. Some examples:

  • High priority PagerDuty alerts.
  • Low priority PagerDuty alerts.
  • Production outages discovered by our monitoring.
  • Customers reporting that something is going seriously wrong with them right now.
  • Customers reporting that something non-urgent is going wrong.
  • Customers having a question that support can’t handle.
  • Repeated build failures.
  • Sporadic build failures.
  • Somebody within your team asking for help with something that’s blocking them on their current task.
  • Somebody within your team asking about something that they’re curious about but that isn’t blocking them.
  • Those last two, but where the question comes from outside your team.
  • A random comment on your team’s Slack channel, of unclear importance.
  • Somebody filing a Jira that gets assigned to somebody on your team.
  • Random messages on Slack, random notifications from your computer, random notifications on your phone, somebody stopping by in person to say something.
  • The alarm on a team member’s phone going off reminding them that they need to drive to daycare to pick up their kid.

These are not all created equal; they all have the potential to interrupt your focus, and for some of them (e.g. production outages), allowing those interruptions to take over your focus is the correct choice! But for most of them, it isn’t.

So we want to get that list (or whatever the list looks like for your team) under control: we want to understand that list, and figure out how to respond to them appropriately without losing focus. But there’s a prerequisite to that: before your team can figure out how to respond to them appropriately, you need to figure out what “appropriately” means. Which interruptions are drop-everything priority, which interruptions are something that somebody should respond to in a couple of hours but they can wait until somebody reaches a natural break in their day, which ones can wait until tomorrow (and where you can talk about planning in your team’s daily standup, if you do that), which ones should turn into items on the prioritized list, which ones are useful but don’t lead to direct action, and which ones are just waste. (Keeping in mind that we’re not robots, we all need some slack in our schedule, our goal isn’t 100% on-focus time.)

Summing up that initial step:

Lesson 12: Gather data about the interruptions that your team encounters, and put them into buckets.

Lesson 12a: For each bucket, figure out the desired Service Level Objective (SLO): work on it immediately, work on it today, work on it tomorrow, work on it eventually, informational only, pure waste.
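One lightweight way to do this is to keep the taxonomy as data and tally a log of observed interruptions against it. The sketch below uses the six SLO levels from Lesson 12a; the bucket names and the SLO assigned to each bucket are hypothetical examples, since every team has to make those choices for itself.

```python
from collections import Counter
from enum import Enum


class Slo(Enum):
    IMMEDIATELY = "work on it immediately"
    TODAY = "work on it today"
    TOMORROW = "work on it tomorrow"
    EVENTUALLY = "work on it eventually"
    INFORMATIONAL = "informational only"
    WASTE = "pure waste"


# Hypothetical assignments for illustration; your team's answers will differ.
BUCKET_SLO = {
    "production outage": Slo.IMMEDIATELY,
    "blocking question from teammate": Slo.TODAY,
    "non-urgent customer report": Slo.TOMORROW,
    "sporadic build failure": Slo.EVENTUALLY,
    "random channel comment": Slo.INFORMATIONAL,
}


def summarize(log):
    """Count logged interruptions per SLO; unknown buckets are returned for triage."""
    counts = Counter()
    unknown = set()
    for bucket in log:
        slo = BUCKET_SLO.get(bucket)
        if slo is None:
            unknown.add(bucket)  # a new kind of interruption: decide on its SLO
        else:
            counts[slo] += 1
    return counts, unknown


log = ["production outage", "sporadic build failure",
       "sporadic build failure", "phone notification"]
counts, unknown = summarize(log)
print(counts[Slo.EVENTUALLY], unknown)  # 2 {'phone notification'}
```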

Funneling Incoming Interruptions

After the above, you’ve got a taxonomy of interruptions, and an SLO for each class in that taxonomy. The lesson to take from that seems simple: if an interruption has a “respond immediately” SLO, then one or more team members should switch their focus to the interruption, otherwise they should ignore the interruption and continue focusing on whatever task they were working on.

The problem is that, when an incoming message arrives, it’s not immediately clear which interruption class it falls into. A Slack message is unlikely to be something that demands a “drop everything” response (I’ll say more about that later), but it could be anything from a decently high priority customer escalation to pure noise.

One concept that I find very helpful here is the Getting Things Done concept of “Inbox”. You don’t act on incoming items immediately: instead, incoming items go into an inbox. And it has to be a trusted inbox: you need to have a process that allows your team to regularly and reliably go through all of the inbox items, triaging them appropriately to figure out the SLO for that item.

Concretely, there are three parts to this process:

  1. A list of inboxes that your team monitors.
  2. A process for monitoring each inbox that has it evaluated regularly enough to meet the most stringent SLO for any item that could enter that inbox.
  3. An agreement among your team members to not bypass the inbox process.

One implication of the second point is that almost none of the inboxes should ever get items that fall into the “drop everything” SLO, because otherwise people will be so busy being distracted by the inboxes that they won’t be able to focus on anything else. In practice, what this turns into is: if somebody needs an immediate response from somebody on the team, then they should page the appropriate person. In particular, Slack is not an appropriate location for “drop everything” events: it’s impossible for the team to monitor Slack at that fidelity, and trying to do so will cause a massive loss of focus. So make sure that the whole organization knows not to try to use Slack in that fashion, and how to page somebody on your team if there’s a true emergency.
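
The second part of the process can be sketched in a few lines of Python: the tightest SLO of anything that can land in an inbox sets an upper bound on how long the team can go between checks of that inbox. The inbox names, the durations, and the 15-minute paging threshold below are all assumptions chosen for illustration, not numbers from any real team.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Inbox:
    name: str
    # Response-time targets of the item classes that can arrive here.
    possible_slos: list[timedelta]


def required_check_interval(inbox: Inbox) -> timedelta:
    """The tightest SLO bounds how long the inbox can go unchecked."""
    return min(inbox.possible_slos)


# Assumed cutoff: anything tighter than this cannot be met by
# periodically checking an inbox, so it has to arrive as a page.
PAGE_THRESHOLD = timedelta(minutes=15)


def needs_paging(inbox: Inbox) -> bool:
    return required_check_interval(inbox) <= PAGE_THRESHOLD


# Hypothetical inboxes and SLOs.
pager = Inbox("pager", [timedelta(minutes=5)])
slack = Inbox("team Slack channel", [timedelta(hours=2), timedelta(days=1)])
jira = Inbox("new Jira tickets", [timedelta(days=1), timedelta(days=30)])
```

With these made-up numbers, the Slack channel needs a look every couple of hours and Jira once a day, and only the pager falls under the threshold. If an inbox other than the pager ever lands under the threshold, that is a sign that “drop everything” items are arriving in the wrong place.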

The third item is really important too: getting it right makes a huge difference but doing so will take practice and discipline. It’s a very human response to want to help people immediately when they ask for help; letting somebody else do the triage and having the result of that triage frequently lead to a response of “we’ll get to it later” both clash with that human response. And it’s also very human behavior to hit a stumbling block in your work, to switch over to Slack, and then to start answering questions or whatever once you’re there, and all of a sudden an hour has gone by. So you have to fight that; the first step is to minimize notifications (whether on your computer or your phone); in particular, nothing on Slack (not even a Slack Direct Message) should ever generate a notification, in my opinion.

Lesson 13: Come up with a fixed set of inboxes that your team will monitor.

Lesson 13a: All “drop everything” priority items should go to a single inbox.

Lesson 13b: Take inboxes seriously: don’t bypass them, don’t be distracted by them.

Who Does the Triaging?

If you’ve followed Lesson 13, your team has a set of inboxes. But then you need to actually monitor them, and to do that in a way that leads to appropriate action being taken on appropriate timescales.

There are a few choices that I’ve seen for who does the triaging for a given inbox. One is to have a rotating person doing the monitoring; this is basically the on-call strategy, it’s very common for the PagerDuty inbox, but you can use that person (or people, the primary and secondary) to monitor some of the other inboxes as well. One argument for this is that the on-call is already going to have their work interrupted unpredictably, so it’s hard for them to maintain focus; given that, you might as well lean into that and have them absorb interruptions for the team. In particular, it might make sense for that person to triage items that fall into your inboxes that need triaging multiple times a day.

Another candidate for a triage person is management: the engineering manager and/or the product manager. Lesson 11 said to put incoming work into a single prioritized list, but it didn’t say how the team maintains that list; a frequent answer is that either the engineering manager or the product manager owns that list. And if they own that list, then it makes sense for them to own triaging of at least some of the inboxes, since the result of that triage will frequently lead to items being put into that queue. This answer works well for new items that appear in Jira: those items generally won’t have an SLO that requires triaging within hours, so looking at those items once or twice a day should be fine. If you go this route, though, just make sure that you’ve got a backup plan for when that person is sick or on vacation.

And another possibility is that everybody on the team handles the non-drop-everything triaging. In particular, for items that need to be triaged within a day, part of your daily standup can involve looking at dashboards that represent your various inboxes, and you can decide as a group what to do with items that show up on those dashboards.

Lesson 14: On-calls can triage inboxes with tighter SLOs.

Lesson 14a: Managers or teams can triage inboxes with looser SLOs.

That’s Still Too Many Interruptions

Following Lesson 14 is better than having the whole team constantly interrupted, but still, you might not want the on-call to be interrupted much either. Also, corralling incoming new items into inboxes is a very useful step, but sometimes the result of that triage will be that somebody needs to start working on the triaged item fairly soon: maybe right now, frequently at some point today or tomorrow. And that sort of task switching can interfere with focus, too.

One strategy for dealing with this is to systematically go through your sources of interruptions and try to reduce them. Don’t treat production alerts or flaky test failures as a fact of life: treat each one as a learning opportunity. If you’re getting lots of questions from customers, figure out what the common questions are and how you can redesign the product to reduce the frequency of those questions. I’m not going to say that you’ll eliminate interruptions entirely, but if you focus on them, you can reduce them by a quite noticeable amount; once you’ve done that, you shouldn’t declare victory, you should instead work to maintain the new, improved level, but it should be quite a bit easier to maintain that level than it was to reach that level in the first place.

You can also attack this problem from the opposite direction: something is an interruption if it’s causing you to lose your mental state, making it harder to get back to what you had been doing. So the more easily you can gracefully stop work, the less you feel like you’re getting interrupted.

Lesson 7a helps a lot with that second approach. If you’re working on coherent tasks that take less than a day, and if you show up at standup and something has come up that somebody needs to work on today, then that’s not an interruption for you: it’s just the next thing on your plate. Test-Driven Development can also help with this in a couple of different ways. For short term interruptions (somebody taps on your shoulder and asks you a question), if you’re doing TDD, you know that your code was in a coherent working state a couple of minutes ago, you know that you’re working on some specific micro-problem right now, and you’ve got a list of tests that you want to implement next. And all of that makes it much easier to get right back into the flow of your work after you’ve answered the question: these claims that it takes 30 minutes or whatever to get productive again after an interruption don’t really apply in that case. And if the interruption requires more sustained action from you, TDD helps there too: you know that your code is in a working state, so you can generate a pull request and then switch to the new task: the pull request won’t completely solve the problem that you were working on, but the code will still work fine. There will still be a switching cost when you come back to this work, but that will at least let you avoid having an ongoing integration cost.

What both of those TDD benefits have in common is: if you work in small, coherent steps, then you’re much less likely to be thrown off guard by surprises. This is a true statement about many aspects of life; it certainly applies to software development.

Lesson 15: Systematically go through the sources of interruptions and reduce them.

Lesson 15a: If you work in small, coherent steps, then you’re much more resilient to potential interruptions.

The Call Is Coming From Inside the House

The strategies in the second half of this document are largely written with the point of view that interruptions are bad, that the resulting loss of focus isn’t worth it. And that’s true most of the time! Having said that, I still believe in Lesson 10: spreading knowledge across the team is good.

Sometimes that knowledge spreading can happen in ways that don’t create interruptions: talk about what you’re doing in your team’s Slack channel and meetings, make your pull requests easy for your team members to read, etc. That sort of knowledge can be consumed asynchronously, and it’s healthy. But sometimes you need a second person to help you think about some problem; and if your team members are going to be working widely across your code base, then you’ll run into situations like that more frequently, where you really could use help from somebody before proceeding with your work. Also, code reviews are an important special case to keep in mind here: they mean that tasks are blocked from flowing through the system if nobody is helping out on those tasks, so postponing those to the next day or even for several hours can seriously affect your team’s delivery.

If you don’t mind some delay in responses to those requests, then your team members can help each other out asynchronously. You can only stay working heads-down for so long; so every hour or two, people are going to want to take a break. And you can have a team working agreement that, when you’re taking a break, you check a Slack channel to see if a team member has asked for help. If so, help them; if not, just take your break. That way, it shouldn’t take too long to get a response to a request for help from your team members. (Especially if you’ve been following Lesson 10, and hence most team members can help with most requests.)

If you want to make it possible to get help immediately, the best solution that I’ve seen comes from eXtreme Programming. People work in pairs in a team room; if a pair is stuck, they go and ask somebody else (or the whole team!) for help. Pairing helps with this in two ways: if both members of a pair are stuck, it really is a problem that’s worth interrupting somebody else, it’s not that you just haven’t put in the work to try to figure it out yourself. And pairs are much better at quickly recovering from interruption than individuals: they’ve constantly been talking about what they’re doing, and that makes the state explicit in a way that makes it much easier to get back to work. (Though it’s also not a coincidence that eXtreme Programming has you diligently applying Lesson 15a as well.)

Of course, most of us don’t work in person today; that makes that technique require some adaptation. Remote pairing tools exist, so that’s one aspect of the XP solution. And you could try to mimic the ability to walk over to somebody’s desk to ask them a question by having an always-on Zoom room that people are listening to, or by setting up a system where messages in one specific Slack channel notify team members.

The main danger in any solution to this problem is that you start allowing interruptions from a wide range of sources. So make sure that you don’t do that: don’t help random people during your breaks in a way that bypasses your Inboxes, or if you have a special Slack channel that people can use to synchronously ping team members, make that Slack channel private to the team.

Waiting for an answer isn’t necessarily the end of the world; we all have to take breaks some time. So, if you’re blocked, post your question somewhere, and then you can go through your email or catch up on those Slack channels that you’ve been ignoring while you’ve been focusing or you can just go for a walk or have lunch. (And, if you’ve been following Lesson 15a, you can even package up your code nicely, to help you come to a graceful stop and to help make your question more specific.) And, during that break, you can also do your code reviews and answer questions that have come up from your team members. But sometimes it would be nice to get an answer quickly; and, ideally, it would be nice for that question and answer to be a learning opportunity for the entire team. Interruptions that make it harder for the team to get their work done aren’t so good; but interruptions that help the team are a different story.

Lesson 16: Have discussions within the team about how quickly you need to be able to respond to requests for help and code reviews that come from other team members.

Lesson 16a: If you decide that you need to respond at a low latency, find ways to treat interruptions from within the team differently from interruptions from outside the team.

Summary

In the first part of this document, the main lesson was: don’t have your team work on a bunch of projects at once, instead pick one or two projects at a time for your team to focus on. That then raised the question of how to choose those one or two projects; we talked about strategies for that choice, but we also ran into uncomfortable consequences of the large amount of variance in the inputs to those strategies. That discussion also led us to want to subdivide projects and then apply our lesson recursively; that ended up leading to questions about team norms, especially around knowledge silos.

That recursive division brought us down to the hour-by-hour level of work: at this level, interruptions and distractions are a significant threat to focus. The main tool that I propose there is to funnel potential interruptions through inboxes; making that work involves paying attention to SLOs both for initial triage of inbox items and for acting on the work that remains after triage. It requires significant discipline by team members to stick with that separation, but if you do, the potential focus benefits are large.
