
What We Measure and Why


At Multitudes, we’ve spent a lot of time thinking about indicators of team success. Everything we measure in our tool is based on a combination of research plus our team’s experience – we’ve worked as developers, data scientists, engineering leaders, and coaches for engineering teams. In addition, equity and inclusion is at the heart of all that we do; I ran a diversity, equity, and inclusion consultancy before starting Multitudes, and our whole team is committed to making equity the default at work.

Read on for an overview of some of our key metrics – what they are, why they matter, and how we measure them.

About our measures

Flow of Work

We have several analyses that look at the flow of the work – things like how quickly the team collaborates to deliver work, where delivery is flowing smoothly, and where it’s getting blocked.


Time to Merge

[Figure: stylized line graph showing Time to Merge at 18 hours, trending up.]


What it is: This metric shows how long it takes a team to get a piece of work production-ready. It’s an indicator of how long it takes to deliver value to customers.

Why it matters: Our Time to Merge metric is a subset of Lead Time (the time from first code commit until the code is running in production). Research from Google’s DORA (DevOps Research and Assessment) team shows that better Lead Time is correlated with better business outcomes – specifically, teams with a better Lead Time do work that is faster, more stable, and more secure. If you want to dive deeper into this, check out the Accelerate book.

How we calculate it: To calculate Time to Merge, we measure the number of hours from pull request (PR) creation to merge – this shows how long it takes the team to give feedback, make revisions, and then merge the PR. A few additional notes (a rough sketch of the calculation follows this list):

  • Our focus is on PRs that the team collaborated on, so we exclude bot merges and selfie-merges (merges by the PR author with no comments or reviews by other collaborators).
  • We exclude the time that the PR spends in a draft state, since people use that feature to pair and brainstorm on work that’s not yet completed. 
  • We also exclude non-working hours from our calculations; for example, we don’t include weekend hours in the Time to Merge, so a PR that gets created on a Friday and merged on a Monday won’t necessarily have a longer Time to Merge than a PR that was created on a Monday and merged on a Tuesday.
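
To make that concrete, here’s a minimal sketch of the calculation in Python. This is an illustration under our own assumptions, not Multitudes’ actual implementation – the PullRequest fields and the working_hours_between helper are hypothetical stand-ins for data you’d pull from the GitHub API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

@dataclass
class PullRequest:
    author: str                      # GitHub login of the PR author
    created_at: datetime
    merged_at: datetime
    is_bot: bool                     # authored or merged by a bot
    collaborators: List[str]         # people who reviewed or commented on the PR
    draft_intervals: List[Tuple[datetime, datetime]] = field(default_factory=list)

def working_hours_between(start: datetime, end: datetime) -> float:
    """Count only weekday, 9am-5pm hours between two timestamps (simplified)."""
    hours, cursor = 0.0, start
    while cursor < end:
        step = min(cursor + timedelta(hours=1), end)
        if cursor.weekday() < 5 and 9 <= cursor.hour < 17:
            hours += (step - cursor).total_seconds() / 3600
        cursor = step
    return hours

def time_to_merge(pr: PullRequest) -> Optional[float]:
    """Working hours from PR creation to merge, minus draft time; None if excluded."""
    # Exclude bot merges and selfie-merges (no feedback from other collaborators).
    others = [person for person in pr.collaborators if person != pr.author]
    if pr.is_bot or not others:
        return None
    total = working_hours_between(pr.created_at, pr.merged_at)
    # Subtract the time the PR spent in a draft state.
    for draft_start, draft_end in pr.draft_intervals:
        total -= working_hours_between(draft_start, draft_end)
    return max(total, 0.0)
```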

What good looks like: The DORA research showed that elite performers have a lead time of less than 1 day. Since Time to Merge is a subset of Lead Time, we also recommend that you aim to keep Time to Merge to less than a day. On our charts, a good Time to Merge would be less than 10 hours since we don’t count non-working hours.

Feedback Wait Time: PR Wait Time and Review Wait Time
[Figure: two stylized line graphs, one showing PR Wait Time at 21 hours and trending down, the other showing Review Wait Time at 15 hours and trending down.]


What it is: This shows how long people wait to get feedback on their PRs.

Why it matters: When people have to wait longer for feedback, it disrupts their workflow: they’re more likely to start a new piece of work while they wait. When the feedback finally arrives, they have to context-switch back, which makes it harder to remember what they did and often means both tasks take longer to complete (one study showed that it takes 10-15 minutes to get back into context after an interruption). Moreover, there’s bias in how long different groups of people have to wait for feedback; for example, this research showed that women had to wait longer than men for feedback.

How we calculate it: Different teams follow different workflows for getting feedback. To account for this, we have two metrics; which one you use will depend on how your team asks for feedback on PRs (a rough sketch of both follows the list):

  • PR Wait Time: This metric is for teams where people will ping each other directly (e.g., on Slack) when a PR is ready for feedback. Specifically, this measures the number of hours from PR creation until the PR gets feedback (either in a review or a comment by a collaborator). This measure excludes time that the PR spends in a draft state, since the draft state indicates that the PR author is still finishing the work. 
  • Review Wait Time: This metric is for teams where people use the “review requested” feature in GitHub to indicate that they’re ready for feedback. It is the same as PR Wait Time, except the start time is from the first review request on GitHub.
  • Like Time to Merge, both of the above measures exclude non-working hours, bot merges, and selfie-merges.
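
As a rough sketch (again, our assumptions rather than Multitudes’ actual code), the two metrics differ only in where the clock starts. The function below reuses the hypothetical working_hours_between helper from the Time to Merge sketch above.

```python
from datetime import datetime
from typing import Optional

def feedback_wait_time(start: Optional[datetime],
                       first_feedback_at: Optional[datetime],
                       draft_hours: float = 0.0) -> Optional[float]:
    """Working hours from a start point until the first comment or review by a collaborator.

    For PR Wait Time, pass the PR creation time as `start` and the working hours the PR
    spent in a draft state as `draft_hours`; for Review Wait Time, pass the time of the
    first review request and leave `draft_hours` at 0.
    """
    if start is None or first_feedback_at is None:
        return None  # no review request yet, or no collaborator feedback yet
    waited = working_hours_between(start, first_feedback_at)  # helper from the sketch above
    return max(waited - draft_hours, 0.0)
```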

What good looks like: Ideally, this should be less than half a day to ensure that the overall Time to Merge is less than a day. In the Multitudes data, this would be less than 5 hours, since we exclude non-working hours.

Wellbeing

In this group, we look at measures that reflect how well the people on a team are doing. Building a great product is a marathon, not a sprint, so we look at indicators of how sustainably people are working and how well the work environment supports people to be healthy and well. 


Out-of-Hours Work

[Figure: stylized line chart showing Out-of-Hours Work at 5 commits, holding steady.]


What it is: This measure shows how often people are doing work late at night or on weekends. Given that more and more people are working flexible hours, our metric specifically focuses on work done during “hours of concern” – work that people did in the wee hours of the morning or on a non-working day. 

Why it matters: Working long hours is a risk factor for burnout. Moreover, the longer someone works, the harder it is for them to solve challenging problems: a study from the Wharton School of Business and University of North Carolina demonstrated that our cognitive resources deplete over time, so we need breaks to refuel. At Multitudes, we’ve seen that the faster a team’s Time to Merge, the higher their Out-of-Hours Work is likely to be – so it’s important for teams and leaders to keep an eye on both metrics together.

How we calculate it: We look at the number of pull requests that people created outside of their usual work hours. We localize our analysis, adjusting for each pull request author's time zone, and we also take into account each person’s preferred working days and hours. In the near future, we will be improving this to include comments and/or commits that were made out of hours too.
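
Here’s a rough sketch of that logic, assuming each PR comes with a UTC creation timestamp and the author’s time zone; in practice, each person’s preferred working days and hours would replace the hard-coded defaults below.

```python
from datetime import datetime
from typing import Iterable, Tuple
from zoneinfo import ZoneInfo  # Python 3.9+

WORKING_DAYS = frozenset({0, 1, 2, 3, 4})   # Monday-Friday (defaults for this sketch)
WORKING_HOURS = range(9, 18)                # 9am-6pm local time

def is_out_of_hours(created_at_utc: datetime, author_timezone: str) -> bool:
    """True if a PR was created outside the author's working days or hours.

    The timestamp must be timezone-aware; it's converted to the author's
    local time zone before checking the day of week and the hour.
    """
    local = created_at_utc.astimezone(ZoneInfo(author_timezone))
    return local.weekday() not in WORKING_DAYS or local.hour not in WORKING_HOURS

def out_of_hours_count(prs: Iterable[Tuple[datetime, str]]) -> int:
    """Count PRs, given as (created_at_utc, author_timezone) pairs, created out of hours."""
    return sum(is_out_of_hours(created_at, timezone) for created_at, timezone in prs)
```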

What good looks like: This should ideally be 0, with people doing as little work out of hours as possible. If it does rise above 0, it’s important to make sure it doesn’t become a trend, so that people aren’t working long hours for sustained periods.


Collaboration

We also look at several indicators of collaboration. In this bucket, we’re examining who gets support and who’s not getting enough support. We also show the people who are doing a lot of work to support others, since this type of “glue” work is easy to miss but is critical for team success.


Comment Participation Gap


What it is: This looks at the comment ratio between the loudest and quietest voices on the team. 

Why it matters: This measure shows whether everyone on the team is participating equally; this is an indicator of psychological safety. Google’s Project Aristotle research showed that psychological safety is the number one determinant of team performance, and that equal share of voice is a behavioral indicator of psychological safety. Our metric looks at this in practice: Does everyone have an equal share of voice in code reviews?

How we calculate it: We count the number of comments that each person has written and then divide the highest count by the lowest count (a short sketch of this follows the notes below).

  • If one person on the team didn’t write any comments, then we set the gap equal to the highest number of comments that one person wrote (essentially setting the lowest number to 1, even if it was actually zero). This allows us to still show an indication of the magnitude of difference.
  • We can only calculate this for teams with at least 2 people; for a team of one person, there is no gap to calculate.
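
A minimal sketch of the calculation, under the assumptions above (treating a zero count as 1, and returning nothing for a team of one):

```python
from typing import Mapping, Optional

def comment_participation_gap(comments_per_person: Mapping[str, int]) -> Optional[float]:
    """Ratio between the loudest and quietest voices on the team."""
    if len(comments_per_person) < 2:
        return None  # no gap to calculate for a team of one
    counts = comments_per_person.values()
    highest = max(counts)
    lowest = max(min(counts), 1)  # treat zero comments as 1 to keep the ratio defined
    return highest / lowest

# Hypothetical data: prints 12.0, well above the rule-of-thumb target of 5 or below.
print(comment_participation_gap({"ana": 12, "ben": 3, "carla": 0}))
```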

What good looks like: The smaller this number is, the better – a smaller gap means that people are contributing more equally. Our rule of thumb is to try to get this down to 5 or below.


Feedback Flows

[Figure: stylized graph showing feedback flows.]


What it is: This graph shows how much feedback each person gave on other people’s PRs, how much feedback they got on their own PRs, and how feedback flows between people. 

Why it matters: Microsoft research showed that feedback in code reviews is important not only for improving the quality of the code, but also for knowledge transfer, greater team awareness, and problem-solving to identify other possible solutions. In addition, there’s unconscious bias in what kind of feedback people get – and in who receives actionable feedback. That’s why it’s important to visualize feedback, so that teams have clear data to help them make sure that they’re supporting everyone.

How we calculate it: We look at the number of comments and reviews each person gave on others’ PRs and received on their own. We then show how that feedback moves between people on the team.
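
As a sketch of how this could be aggregated (our assumed data shape, not Multitudes’ actual pipeline), each feedback event can be reduced to a directed (giver, receiver) pair, where the receiver is the PR author:

```python
from collections import Counter
from typing import Iterable, Tuple

def feedback_flows(events: Iterable[Tuple[str, str]]) -> Counter:
    """Count directed feedback flows, where each event is (reviewer_or_commenter, pr_author)."""
    flows = Counter()
    for giver, receiver in events:
        if giver != receiver:  # comments on your own PR aren't feedback to someone else
            flows[(giver, receiver)] += 1
    return flows

def given_and_received(flows: Counter) -> Tuple[Counter, Counter]:
    """Total feedback each person gave and received, derived from the directed flows."""
    gave, got = Counter(), Counter()
    for (giver, receiver), count in flows.items():
        gave[giver] += count
        got[receiver] += count
    return gave, got
```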

What good looks like: In the best teams, everyone is getting feedback and everyone is giving feedback, or at least asking questions about others’ work. In these teams, seniors give plenty of feedback to juniors and intermediates – and juniors and intermediates feel comfortable asking questions to seniors. 


Want to see more of what we measure?

That gives you a taste of what we measure and why. To learn about our other measures or see the tool in action, we invite you to try it out – you can sign up for our beta program here!

Contributor
Lauren Peate
Founder, CEO
