Merge Queue

Graphite, 2023 – 2024

Merge Queue is an automated system designed to manage the sequential merging of pull requests into a codebase, ensuring each one meets the necessary tests and checks before integration. Among the Graphite suite, this is one of the most technically complex products, one I became intimately familiar with while working on various settings.


This case study highlights the evolution of several key features that were developed over time. It also sparked a personal initiative where I created diagrams for our documentation to better explain intricate merge queue capabilities.

Impact

Instrumental in closing several enterprise deals, before our pricing model changed.

Role

Product design

Timeframe

Project has been ongoing for 8 months (Nov 2023 – June 2024).

Target customer

Large organisations, often with hundreds of engineers merging into the main branch, have unique and complex requirements for their merge queue.

Target user

Admins, CTOs and GitHub owners


It's also important to note that the person setting up the merge queue is different from those who use it.

Business goal

Meet the critical needs of enterprise customers by achieving feature parity with competitors.

Design challenge

Users setting up a merge queue need to quickly grasp the individual settings, their interactions, and their impact on the merge queue. The goal was to simplify the process to the point where users could set it up and forget about it, ensuring ease of use.


  • The settings for the merge queue are highly technical and interact in nuanced ways, requiring careful consideration.

  • Many potential customers were unfamiliar with merge queues and the associated settings, highlighting a need for better education and documentation.

  • During feature development, I encountered various implementation challenges and workarounds. Each solution had to be easy to understand, clearly communicated, and user-friendly.

  • My initial lack of knowledge in both merge queues and the broader domain of code review, pull request creation, and merging posed a challenge as I navigated these new areas.

My approach

  1. Review documentation from competing products.

  2. Study papers from Uber and other companies that have successfully developed merge queues with similar capabilities.

  3. Consult with the lead engineer to understand the concept of a merge queue and the implementation of various features. I engaged in detailed discussions, asked numerous questions about the system and different scenarios, and examined their sketches to gain a comprehensive understanding of the benefits, limitations, and functionality. This process also helped me identify user-facing requirements and key touch points.

Step 1: Redesign existing merge settings and third party merge queues

The initial settings for the merge queue were under a section called “repository” in Graphite settings.

Problems with original design

  1. The hierarchy was lacking, and relationships between different settings weren’t clear

  2. What the setting did wasn’t clear

  3. Too easy to turn the merge queue on and off (just a toggle)

Solution

The first step involved refining the hierarchy and copy. I explored several solutions here, playing with copy and text styles, keeping in mind that shortly afterwards we’d be completely overhauling merge queue settings to bring in new features like 3P merge queues and different CI settings. In the end, none of these options were ideal, so none were implemented. We moved directly to a complete overhaul of Merge Queue with 3P queues.

Next, to align with business goals, we needed to support third-party merge queues. Since organisations could use either our merge queue or a third-party solution, the workflows diverged and became more linear. It’s crucial to recognise that the merge queue is a top-down sales product; the product UI has limited influence in promoting our merge queue. The decision to use our merge queue is typically made before an admin reaches this screen.

Process

I started by writing some jobs-to-be-done for admins and engineers:

Admin –

  • Enable a new merge queue

  • Pick kind of MQ

  • Add relevant settings

  • View all repos that have active merge queues

  • Edit settings for a MQ

  • Disable a merge queue

  • For 3rd party MQ, understand the cause of slow merges or errors (and what to do about it)

  • View error or dependent settings that are blocking my use of MQs

Engineer –

  • View all repos that have active merge queues

  • View settings of all merge queues

  • View the most relevant info for using the merge queue

    • Labels, and possibly merge stats, for the Graphite MQ

Then I conducted some internal user research (we’re users of our own product) to answer questions about which critical information to show.

It's also essential for non-admins to know which labels to use for adding their PRs to the merge queue. They can visit the settings page for guidance, and the relevant labels are also provided as comments on the PRs. They can also visit the merge queue page for this and look for their repository.

Step 2: Parallel and batched merges

These features, present in competitor products and requested by enterprise organizations evaluating our merge queue, were both complex to build and challenging to design. Through extensive diagramming and sketching, I gained a thorough understanding of their functionality. A significant challenge was explaining the impact of selecting specific settings using clear language.


For implementation, we adopted a phased approach: we first released parallel CI, followed by the combined Parallel + Batching feature.

What are these settings?

  • Parallel CI: This setting allows CI to run simultaneously on individual PRs or stacks of PRs, saving time and accelerating the merge process.

  • Batched CI: This feature combines a group of PRs or stacks and runs CI for the entire batch at once, reducing the number and cost of CI runs.

  • Combined CI (Parallel + Batching): This approach groups PRs or stacks into batches and runs CI for several batches concurrently.
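To make the difference between the three settings concrete, here is a minimal sketch in TypeScript. It is an illustrative model only: `planCiRuns`, `CiSettings`, `batchSize`, and `parallelism` are hypothetical names and not Graphite's actual settings or API. The model also ignores how each CI run would be based on the entries ahead of it in the queue.

```typescript
// Hypothetical model: names and shapes are illustrative, not Graphite's API.
type QueueEntry = string; // a PR or a stack of PRs, identified by a label

interface CiSettings {
  batchSize: number;   // entries tested together in one CI run (1 = no batching)
  parallelism: number; // CI runs allowed at the same time (1 = sequential)
}

// Returns "waves": each wave is a list of CI runs started together,
// and each CI run is the list of queue entries it tests.
function planCiRuns(queue: QueueEntry[], settings: CiSettings): QueueEntry[][][] {
  const batchSize = Math.max(1, Math.floor(settings.batchSize));
  const parallelism = Math.max(1, Math.floor(settings.parallelism));

  // Step 1: group queue entries into batches, preserving queue order.
  const runs: QueueEntry[][] = [];
  for (let i = 0; i < queue.length; i += batchSize) {
    runs.push(queue.slice(i, i + batchSize));
  }

  // Step 2: group CI runs into waves that execute concurrently.
  const waves: QueueEntry[][][] = [];
  for (let i = 0; i < runs.length; i += parallelism) {
    waves.push(runs.slice(i, i + parallelism));
  }
  return waves;
}

const queue: QueueEntry[] = ["A", "B", "C", "D"];

// Parallel CI only: four CI runs, two at a time.
const parallelOnly = planCiRuns(queue, { batchSize: 1, parallelism: 2 });

// Batched CI only: two CI runs, one after the other.
const batchedOnly = planCiRuns(queue, { batchSize: 2, parallelism: 1 });

// Combined: two batched CI runs, both at the same time.
const combined = planCiRuns(queue, { batchSize: 2, parallelism: 2 });
```

In this toy model, parallel CI reduces the number of waves (time), batching reduces the number of CI runs (cost), and the combined setting reduces both.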

Merge queue page

Users merging PRs with Graphite often need to track the position and status of their stack in the queue, including error states. When I took on this project, the page was underutilized: only a few power users engaged with it, because it lacked essential information. Given our limited resources for a full redesign, I optimized the existing layout by reorganizing columns and merging cells to display all critical information. Additionally, I added a dropdown for settings to help users understand the context and rationale behind the displayed data.

Rejected approach

When we initially rolled out Parallel CI, the designs explicitly featured this capability. While this approach provided users with the information needed to activate the feature, I revised the final designs for two key reasons:


  1. Using terms like Parallel CI, Batching, and Speculative assumes users are familiar with these concepts. Although we link to documentation, I wanted the page to be self-explanatory, focusing on the settings and their functions rather than their names.

  2. With the introduction of Parallel + Batched, the interaction between these features resembled a Venn diagram. It was more intuitive for users to follow a logical workflow based on their desired CI operation, selecting settings that matched their needs, rather than deciphering how the features interrelate and fit into their understanding.

Failure handling

This feature had the fastest turnaround. When CI fails in any part of a batch, the entire stack is evicted, and this needed clear communication. To address this, I added tooltips for better clarity and organised the PRs effectively in the side panel to convey this information more transparently.
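The eviction rule can be sketched as follows. This is a simplified illustration, not the actual implementation: the `PullRequest` shape and `evictFailedStacks` are hypothetical, and the sketch assumes that only the stack containing a failing PR is evicted while other stacks in the batch stay queued.

```typescript
// Hypothetical shapes for illustration only.
interface PullRequest {
  id: number;
  stackId: string;   // PRs sharing a stackId belong to the same stack
  ciPassed: boolean; // CI result for this PR
}

interface EvictionResult {
  kept: PullRequest[];
  evicted: PullRequest[];
}

// Assumption: one failing PR evicts every PR in its stack,
// including PRs in that stack whose own CI passed.
function evictFailedStacks(batch: PullRequest[]): EvictionResult {
  const failedStacks = new Set(
    batch.filter((pr) => !pr.ciPassed).map((pr) => pr.stackId)
  );
  return {
    kept: batch.filter((pr) => !failedStacks.has(pr.stackId)),
    evicted: batch.filter((pr) => failedStacks.has(pr.stackId)),
  };
}

const batch: PullRequest[] = [
  { id: 1, stackId: "stack-a", ciPassed: true },
  { id: 2, stackId: "stack-a", ciPassed: false },
  { id: 3, stackId: "stack-b", ciPassed: true },
];

const result = evictFailedStacks(batch);
```

Here PR 1 is evicted even though its own CI passed, which is the behaviour the tooltips and side panel grouping needed to explain.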

Removing PR from merge queue

When a PR is removed from the merge queue, the removal can either succeed or fail based on the CI status, and it may take a few minutes for this to be reflected on the frontend. To address this, we needed an intermediate state to inform users about the process.
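One way to picture the intermediate state is as a small state machine. The sketch below is hypothetical: the state and event names (`queued`, `removing`, `removed`, `removal_failed`) are illustrative and not taken from the product.

```typescript
// Hypothetical state and event names, for illustration only.
type RemovalState = "queued" | "removing" | "removed" | "removal_failed";
type RemovalEvent = "remove_requested" | "removal_succeeded" | "removal_failed";

// Returns the next state; events that do not apply leave the state unchanged.
function nextState(state: RemovalState, event: RemovalEvent): RemovalState {
  if (state === "queued" && event === "remove_requested") {
    return "removing"; // intermediate state shown while the backend catches up
  }
  if (state === "removing" && event === "removal_succeeded") {
    return "removed";
  }
  if (state === "removing" && event === "removal_failed") {
    return "removal_failed";
  }
  return state;
}

// A removal request is only an attempt: it first enters "removing".
const afterRequest = nextState("queued", "remove_requested");
const afterSuccess = nextState(afterRequest, "removal_succeeded");
const afterFailure = nextState(afterRequest, "removal_failed");
```

Modelling `removing` as its own state is what gives the interface something to display during the minutes between the request and the final outcome.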


Users needed to understand that removing the entire stack would require a CI rerun, and that the removal was an attempt rather than a confirmation. The design therefore urged them to reconsider before taking this potentially destructive action.


We explored several placement options during office hours with the cross-functional team, considering locations such as the CI column, the right-hand side column, adding a new column, and even the queue position.


Note for later: Merging an incorrect PR is more detrimental than removing and re-enqueuing it.

Above and beyond

As a visual thinker, I relied heavily on visual aids when I began working in this domain, creating numerous sketches and diagrams to grasp the concepts. I was tempted to use interactive diagrams instead of text in the settings panel to guide users in selecting the correct options, but I resisted. Eventually, this effort evolved into a side project that contributed to our sales deck, customer calls, and the merge queue documentation page.

Beta Reception

tibrewala.kanika@gmail.com

Kanika Tibrewala
