This lesson is in the early stages of development (Alpha version)

Collaborative Git

Introduction

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What are the challenges for sharing code?

  • How can we ensure everyone has access to the latest version?

  • How can we enable equal contribution?

Objectives
  • Understand how git tools and processes address challenges for collaborative code development

You may have used git for your own projects, but how can it be used to work collaboratively on a software project?

There are a few challenges that come up when working on shared code. We want to make sure everyone has access to and knows what is the latest version, and we want to make sure everyone has an equal opportunity to contribute. It can also be difficult to keep organized while code is being actively developed and make sure everyone is on the same page about the state of the code.

We can use sites like GitHub, GitLab, BitBucket, and similar to share code with others. These sites also enable teams to contribute to software projects that they are working on together by hosting the code in a centralized location that everyone has access to. This allows everyone to view the latest version, and make updates as needed. It also provides a mechanism for outside collaborators to contribute to the project, say for open source software.

In this lesson we will be briefly going over some more advanced git tools that become essential when working collaboratively. These are branches and pull requests. In a collaborative environment, branches allow more than one person to work on the code independently, separate from the centralized main branch. Pull or merge requests provide a transparent mechanism to bring the changes made on a branch into the main branch, and encourage communication amongst the maintainers on whether the changes should be pulled in or if further changes are needed.

While technology helps, it is not the solution on its own. We will also discuss git workflows, which describe the strategy of how branches and pull requests should be used to stay organized and productive. We will show a few common workflows, and give general guidance on how to choose what will work well for your project and team.

Before diving in, we should review some basic terminology and commands. Note that while other options are available, we are focusing on GitHub for this lesson.

A note about git integrations: You may find that your IDE has git built in allowing you to use the GUI instead of running the commands we talk about here. In this lesson we are focusing on the command line git commands, since they should be universal across any system you use. After this lesson we encourage you to use what you are most comfortable with, and the commands we cover will also help you better understand the functionality of your IDE git integration.

Activity

Verify that you can access your GitHub account, have git installed on your computer, and can authenticate with GitHub from your computer, either using gh auth, ssh keys, or using your preferred GUI application.

In groups: have one person in your group create a repository from the provided template. Give access to the others in your group. Everyone should create a local clone with git clone. We will use this repository for the remainder of the exercises in this section.

Outline

TODO

Key Points

  • Challenges for sharing code include making sure everyone has access to the latest version, everyone can contribute equally, keeping organized, and making sure everyone has equal opportunity to contribute.

  • Branches and pull/merge requests provide mechanisms for individuals to work on the code independently and then integrate those changes into the main codebase.

  • Git workflows are used to stay organized and productive.


Branches

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • Why use branches?

  • How do we use branches?

Objectives
  • Be able to create and merge a branch

About Branches

Branches are independent development lines. Working independently, you can likely get away with using git without creating any branches, as in the first diagram in the figure below. However, when you begin to work with others, branches allow team members to work on the code without impacting the work of other developers or users.

Figure showing three graph diagrams representing different repository states. Each is a graph consisting of vertices (circles) and directed edges (arrows) where each horizontal path is a branch. The first shows a single path representing a repository with a single main branch, the second has two branches, a main and a feature branch, and the third has three branches, a main and two feature branches. These graph diagrams show repositories with different numbers of branches. The vertices, or circles, in these graphs show different commits, and each horizontal path is a branch. The first shows a repository with 1 main branch, the second a repository with 1 main and 1 feature branch, and the third repository 1 main and 2 feature branches.

Branches can have a few different purposes. Branches created to develop a specific feature are referred to as feature branches, and are often short-lived. Branches can also be used for longer-term purposes, such as having separate development and production branches. How you organize and use branches in your project is referred to as a branching strategy, which we will cover more when we talk about git workflows.

Changes made in one branch can be brought into another branch with a merge. For example, when a feature branch is completed it may be merged into the main branch. And, if while you are working on a feature branch the main branch changes, you can merge the changes from the main branch into your feature branch while you are still working.

How to Work with Branches

Creating Branches

You can create a branch on your local clone of the repository. If didn’t just clone the repository it is always good to make sure you have any recent changes to the main branch by checking out the main branch and running “git pull”:

git checkout main
git pull

Then, to create a branch we can run:

git checkout -b my_new_branch

This will create the branch my_new_branch and move you to the new branch. If you run git status at this point it will display On branch my_new_branch. Making and committing any changes will only update my_new_branch.

Figure showing the process of creating a branch. First a graph diagram of a repository is shown with two commits on the main branch. To the left is an arrow with the text "git checkout -b my_branch" followed by the same repository with the new feature branch added. A smiley face is used on each diagram to show where you are in the repository, the first has it on the second commit in the main branch, the second on the feature branch. Creating a new branch. When you run git checkout -b my_branch your new branch gets created and checked out, meaning you are now on your new branch (represented by the smiley face). Any commits you make will be on this branch until you checkout another one.

Figure creating commits on a branch. First a graph diagram of a repository is shown with two commits on the main branch and one on a feature branch. To the left is an arrow with the text "git commit (x2)", followed by the same repository with to additional commits on the feature branch. A smiley face is used on each diagram to show where you are in the repository, each on the last commit on the feature branch. Every time you run the git commit command the commit will be added to your current branch.

The first time you push your branch to the remote repository, you will need to publish the branch by running run:

git push --set-upstream origin my_new_branch

After this any commits can be pushed with a simple git push.

Changing Branches

If you need to switch to another existing branch you can use the git checkout command. For example, to switch back to the main branch you can run:

git checkout main

Remember, if you aren’t sure which branch you are on you can run git status. It is good practice before you start making changes to any of your files to check that you are on the right branch with git status, particularly if you haven’t touched the code recently.

Figure showing the process of changing, or checking out a branch. First a graph diagram of a repository is shown with two commits on the main branch and three on the feature branch. A smiley face is on the third commit of the feature branch to show the current location. To the left is an arrow with the text "git checkout main" followed by the same repository with the smiley face now on the second commit in the main branch. Switching branches using git checkout.

Merging Branches

Bring the changes from one branch into another branch by merging them. When you merge branches, changes from the specified branch are merged into the branch that you currently have checked out. For example,

git checkout my_new_branch
git merge main

will merge any changes introduced to main into the my_new_branch. Merging in this direction is especially useful when you’ve been working on my_new_branch for a while and main has changed in the meantime.

Figure showing the process of merging the main branch into the feature branch. First a graph diagram of a repository is shown with a main and feature branch. The feature branch with three commits splits off the main branch after two commits, and the main branch has an additional two commits after the split. A smiley face is on the third commit of the feature branch to show the current location. Below is an arrow with the text "git merge main" followed by the same repository with a new arrow from the last commit on the main branch to a new commit on the feature branch, showing the changes from the main branch have been merged into the changes on the feature branch. Merging new commits from the main branch into the feature branch with git merge main.

When development is complete on my_new_branch, it would be merged into main:

git checkout main
git merge my_new_branch

Figure showing the process of merging the feature branch into the main branch. First a graph diagram of a repository is shown with a main and feature branch. The feature branch with three commits splits off the main branch after two commits. A smiley face is on the second commit of the main branch to show the current location. Below is an arrow with the text "git merge my_branch" followed by the same repository a new commit on the main branch with two arrows from the main and feature branches, showing the changes from the feature branch have been merged into main branch. Merging new commits the feature branch into the main branch with git merge my_branch.

Git will do its best to complete the merge automatically. For example, if none of the changes have happened in the same lines of code, things will usually merge cleanly. If the merge can’t complete automatically, this is called a merge conflict. Any conflicts in the files must be resolved before the merge can be completed.

Merge vs Rebase

There is another way to introduce changes from one branch to another: rebasing. A rebase rewrites history. For example, if there are new commits on the main branch while you are working on a feature branch, you could merge those commits into your feature branch, as we describe above. This creates a new commit with two parent commits: one in your feature branch, one in the main branch. Alternatively, you can rebase your feature branch onto the new end of the main branch with git rebase main. Rebasing “replays” your feature branch commits onto the new commits of the main branch, as if you started your branch there.

Figure showing the process of rebasing the feature branch onto new commits in the main branch. First a graph diagram of a repository is shown with a main and feature branch. The feature branch with three commits splits off the main branch after two commits. A smiley face is on the third commit of the feature branch to show the current location. Below is an arrow with the text "git rebase main" followed by the same repository, but the feature branch now splits off the main branch at the last commit. Rebasing the feature branch onto new commits in the main branch with git rebase main.

Rebasing creates a cleaner history, without the extra merge commit. However, rebases never be done on public branches that others might be using or even looking at. It will result in your repository and everyone else’s having different history, which can be confusing and difficult to fix. If no one else is looking at your branch, especially if you haven’t published it yet, rebasing is safe.

Activity

Everyone:

Now, all but one should run the following:

git pull
git checkout main
git merge branch_name
git push

Did anyone run into merge conflicts? How might this have been prevented?

There should be one team member who still has an unmerged branch at the end of this exercise.

Additional Resources

Key Points

  • Branches allow team members to work on code without impacting the work of other developers or users

  • Create a new branch with the git checkout -b command, for example: git checkout -b my_branch

  • Change to another branch with the git checkout command, for example: git checkout my_other_branch

  • Merge my_branch into the current branch with with the git merge command, for example: git merge my_branch

  • Rebasing rewrites history and can sometimes be used instead of merge, where appropriate

  • Never rebase on a public branch


Pull Requests

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • What is a Pull Request?

  • Why use Pull Requests?

  • How do we use Pull Requests?

Objectives
  • Understand the Pull Request process

  • Be able to create and merge a Pull Request

What are Pull Requests?

A Pull Request is a GitHub a process to request that a branch be merged into another branch. It’s called a “Pull Request” because you are requesting that the change introduced be pulled into the other branch. In GitLab it is called a “Merge Request”. Pull requests provide a process and an opportunity for code review before merging branches you don’t own or manage by yourself, such as the main branch. They can also be a mechanism for proposing and discussing changes, and requesting additional changes be made before the branch is merged.

Working with Pull Requests

Creating a Pull Request

First, make sure you don’t have any commits you haven’t pushed to GitHub. Go to your local repository and run git status. If you have any commits you haven’t yet pushed, push them now with git push. Remember, if you haven’t pushed from this branch to the remote repository yet, you will be prompted to publish your branch with git push --set-upstream origin my_new_branch.

Now navigate to your branch in GitHub. If you’ve pushed any commits recently you may see a callout that says your branch has recent pushes and provides a button that says “Compare and Pull Request”. Pressing this will bring you to the form to create your Pull Request to the main branch. If you don’t have this button, you can select “Pull Requests” at the top of the page and select “New Pull Request”. From there select your branch under the “compare” dropdown. If you would like to merge your branch into a branch other than main, select that branch under the “base” dropdown. Then select “Create Pull Request”.

A screenshot of a branch on GitHub. A button with the text "Compare and Pull Request" is at the top fo the page. Creating a Pull Request: If you recently pushed changes to your branch you may see a button that says “Compare and Pull Request” on GitHub.

At this point you will see a form. Fill out the form with a title and description for the Pull Request, and then click “Create Pull Request”. We will go more in depth on how to make good pull requests in a later section.

A screenshot of a Pull Request form on GitHub. Creating a Pull Request: Give your Pull Request a title and description.

Anatomy of a Pull Request

Once the Pull Request is created there are a few different tabs on the page. The first is the conversation tab, which shows a timeline of comments and commits, starting with the initial description that you gave when creating the Pull Request. At the bottom will be a box that shows the result of any Continuous Integration that has been defined, such as automated tests (the one in the image doesn’t have any set up), as well as whether there are any merge conflicts for the Pull Request.

A screenshot of the conversation tab of a Pull Request. The conversation tab of a Pull Request.

At the very bottom of the page is a box where you can leave a comment. GitHub supports markdown so you can include formatting with your comment.

A screenshot of the bottom of the conversation tab of a Pull Request. At the bottom of the page is a box to leave a comment. The conversation tab of a Pull Request. Scroll to the bottom of the page to leave a comment.

There is also a tab that shows all the commits in the Pull Request (the Commits tab). The last Files Changed tab shows the diff between the base branch and the Pull Request branch. This is useful in reviewing the Pull Request, and it allows you to leave in-line comments. If you see a line of code that needs to be updated or changed, you can click the + next to that line to leave a comment.

A screenshot of the Files Changed tab of a Pull Request. The page shows one line of code has been updated. The Files Changed tab of a Pull Request, showing what the proposed changes are for the Pull Request. You can click the + next to a line number to leave an in-line comment.

Updating a Pull Request

Once the Pull Request is created you and anyone with access to the repository can make comments on the request. These comments might be clarifying questions or requests for additional changes. If additional changes are needed, make those changes in your branch and push them. The commits will be added to the Pull Request automatically and appear on the main Conversation page.

Accepting and Merging a Pull Request

Once a Pull Request has been reviewed and ready to be accepted, it can be merged. First verify that the branch has no conflicts with the base branch (the one that will be merged into). Any conflicts should be addressed with additional commits. When ready to merge, click “Merge Pull Request”, then click “Confirm merge”. You will be given the option to delete the branch. Whether you delete the branch depends on its purpose and how you and your group want to handle branches that have been merged. Deleting feature branches helps keep clutter down and makes it easier to find relevant branches. However, if you have any longer term branches, such as a development or production branch, you wouldn’t want to delete that.

A screenshot of the bottom of the conversation tab of a Pull Request after clicking "Merge Pull Request". There is a box with the original Pull Request title and description, with a button that says "Confirm Merge" below that. After clicking the “Merge Pull Request” button on the conversation, click “Confirm Merge” to merge the pull request.

Activity

One team member should still have an unmerged branch from the previous exercise. If not, have one of you create a branch, introduce a small change, commit, push, then publish the branch.

Log into GitHub. The team member with the branch should create a pull request. Once it is created, everyone should look at the different parts of the PR and comment whether the branch looks okay to merge. The branch owner should make changes requested. If there are merge conflicts, those should be addressed (you can work together to address them). Once it is ready to merge, someone else on the team should merge the Pull Request.

In both this activity and the last one your branch was eventually merged into the main branch. How does this experience compare to the previous activity?

If you have extra time, the remaining team members can create their own Pull Requests with new branches.

Key Points

  • A Pull Request is a GitHub a process to request that a branch be merged into another branch.

  • Pull requests provide a process and an opportunity for code review before merging branches and can be a mechanism for proposing, discussing, and requesting changes before the branch is merged.


Git Workflows

Overview

Teaching: 30 min
Exercises: 10 min
Questions
  • What is a git workflow?

  • Why use a git workflow

Objectives
  • Know why to use a git workflow

  • Understand some of the most common git workflows, and when you should deviate

  • Be able to choose or design a workflow that meets your group’s needs

Introduction

When you are working with others in the same git repository, you’ll find you’ll need to come to an agreement on how you are going to work in the repository. This can include:

These questions are often answered by what we will call a git workflow. A git workflow helps you keep organized when working on a team, helps everyone stay on the same page about the state of the code, and helps enable communication. If you’ve ever emailed around documents you were working on with a team, you might understand the value of everyone on the team always having access to both the latest version, and any material still in progress.

When we talk about git workflows often we are really discussing is branching strategies. The branching strategy is how you are going to use branches to work with the code. This includes the purposes of each branch (ex: a feature branch vs a release branch), where a new branch is created, and how, when, and where branches are merged. Different branches and branch types will have different lifetimes as well. For example, once a feature branch has been merged it can often be deleted, however a dedicated development branch might exist for the entire project.

The git workflow should also dictate what is stable under what conditions, what is tested, and may also include testing strategies.

Some Common Workflows

There are a few common workflows that are talked about the most. One might be better in some cases, another in others, but no one is necessarily the best in every case than any of the others. There will likely be one that best fits your application and your team’s goals, but you may find you will still need to make adjustments, and that is okay.

Centralized or Truck-Based Workflow

With a centralized workflow you only have one main branch (or trunk, borrowing terminology from SVN, another version control system). Team members work in their local clone and push commits less frequently. The central repository is meant to be “clean” and should be stable. Git will also prevent you from pushing any changes that cause conflicts. If someone else as pushed a change that conflicts with one of your local commits, git will not let you push those changes and will require you resolve the conflict locally before pushing.

A centralized workflow with a single main branch is not the best for collaborative development, although it may work okay in small teams. We mention it to give a name to a process you might already be familiar with, and as a starting point for other workflows.

Feature Branching Workflows and GitHub Flow

Since we’ve covered branching, a simple extension to the centralized workflow is a Feature Branching Workflow. The main branch is like the central repository in the centralized workflow, it is meant to be clean and stable. Instead in working directly in clones of the main branch, all development is done in feature branches, where each branch is created from the main branch to develop a specific feature. The branch is published soon after it is created in the remote repository and commits are regularly pushed. This allows other developers to see the branch and any changes early. Using feature branches also allows an individual developer to work on more than one feature at a time if needed. Say you are working on developing a feature that requires some time and thought, but in the process find a bug that should be fixed immediately. You can create a new branch, fix the bug, and merge it back into the main branch without introducing any of the partially developed code you are still working on in the other feature branch.

Once the feature is complete and tested, feature branches are merged back into the main branch through Pull Requests. Any merge conflicts should be handled in the feature branch before merging. This can be done by merging any new commits from the main branch into the feature branch, handling any conflicts that emerge. The changes are discussed and reviewed in the Pull Request, and once accepted hey are merged. Once merged, the lifecycle of a feature branch is general complete and it can be deleted if it is no longer needed.

This is also known as “GitHub Flow”, as it is used by the developers of GitHub for sections of their website, including their documentation. GitHub even has a dedicated page in their documentation describing the workflow. It is most useful for projects that have a rolling development cycle without specific versioned releases.

Git Flow

Git Flow is one of the most commonly used workflows. It is used for software projects that require robust, production-ready releases. It is suitable for larger teams where more than one person might work on the same feature.

Git Flow builds on a feature branching workflow, where instead of creating feature branches from the main branch, they are created from a separate permanent development branch. When a feature is complete, its branch is merged into the development branch. Individual release branches are created from the development branch, and once production-ready are merged into the main branch upon release and tagged. The main branch therefore contains only fully tested, stable, production code associated with a particular version. If the code has a bug in the main branch that needs to be fixed immediately, hotfix branches can be created off the main branched and merged directly back into the main branch without being added to the development branch or a release branch.

Git Flow was introduced with a blog post in 2010, which you can see here if you are interested.

GitLab Flow

Similar to GitHub Flow, GitLab Flow gets its name because it is the workflow used by GitLab. GitLab is a bit closer to the simplicity of GitHub Flow, but is more appropriate in cases where the rolling releases of GitHub Flow are not possible. GitLab Flow is similar to Git Flow in that feature branches are not merged directly into the production branch, however instead it has a separate named production branch and the feature branches are merged into the main branch, which is more like the development branch of Git Flow. Release branches are optional, and only recommended only when software is released to the outside world.

For more information, see GitLab’s article on GitLab Flow.

Forking Workflows

In each of these example workflows we have assumed that anyone contributing to a repository has the ability to directly push commits to a repository. However, sometimes you want to enable developers to contribute to a project without giving them access to the repository itself. In this case, a forking workflow might be suitable. Any of the workflows we’ve talked about already could incorporate forking. With a forking workflow individual developers create their own fork of a repository and do their development within their fork. When a feature is complete, the developer creates a Pull Request into the appropriate branch of the original repository. The owners of the repository can then approve the Pull Request at their discretion. This is how many open source projects encourage contributions.

With a forking workflow it can be easy to end up with an old version of the code, especially if it changes frequently. It can get confusing whether we are attempting to pull from or push to the fork or the original repository. Setting up remotes with easy to remember aliases makes this much easier. A “remote” is a named connection to a remote repository. You can add and remove remotes with the git remote add and git remote rm commands, respectively. For example:

git remote add upstream https://github.com/project_team/project.git
git remote add my-fork https://github.com/me/project.git

Here we are adding a remote called “upstream” for our central team repository and another called “my-fork” for our own fork. We can view our remotes with git remote -v. Now when you want to retrieve any changes from the upstream repository you can run git fetch upstream. To update your main branch with these changes, you would merge them into your local forked repository:

git checkout main # this is the main branch for your fork that you have cloned locally
git merge upstream/main

Using remotes, fetching and merging upstream changes frequently, and running commands like git status and git remote -v can help you be successful using a forking workflow. Merging upstream changes directly before making a Pull Request and ensuring there are no merge conflicts is good etiquette and will make it more likely that your Pull Request will be accepted.

Communicating with your Collaborators

Now that you’ve chosen a workflow for your project, how do you communicate it? This is most often done with a CONTRIBUTING document. The CONTRIBUTING document describes the git workflow you are using, any conventions for branch naming that you have, and anything else a collaborator should know in order to contribute. It should be tailored to your project and team, and so it might also include how to run the tests for the project, requirements before creating Pull Requests, or perhaps detailed instructions for how follow the workflow if your team is less familiar with git. One of the goals of a CONTRIBUTING document is to lower the barrier to entry to contributing to a project and a good one will encourage collaboration.

Enforcing your Workflow

There are tools that you can use to make sure everyone follows the agreed upon workflow, which can make it easier to collaborate with others. First, you can add branch protections, which will enforce certain rules for certain branches. A common one is to not allow commits directly to the main branch, requiring changes to go through a Pull Request. There are many options for branch protection that you can read about on the GitHub web page. Similarly, you can limit who on the team is able to push or merge changes into a particular branch, or in the entire repository. It is also a good idea to have testing and review requirements, particularly for production branches. These requirements and branch protections should be documented in the CONTRIBUTING document as well.

We will end with a quick note if your team works on shared systems, including clusters and HPC systems. Be careful with clones on shared drives, you never want to share a cloned repository. Each person should work within their own clone. You might also consider having additional “production” clone that is used for running. No one edits the production clone directly, it is only updated by pulling changes from the repository.

Activity

We provide a base repository to each group that each group member creates a local copy of the repository (either by forking or cloning). Students should decide which workflow to use and set up their repository accordingly. Start a contributor guide that explains the layout and rules.

Additional Resources

Key Points

  • A git workflow describes how a team or group of contributors is to interact with a repository

  • Git workflows keep a team organized and on the same page by helping them understand what state the code is in