Clean Git Tree Practices

As software developers, git is deeply integrated into our day-to-day workflows. Just as we need to learn and maintain “clean code”, we need to do the same for our git history: clean git. But what does this mean?

What is “Clean Git”?

My definition of a clean git history might be different from other people's. In a nutshell:

Atomic commits: each commit leaves the repository in a “deployable” state. This is a requirement if we want to use any CI/CD approach in our repos.

Smallest possible changes in each commit: for a big feature, we might need to do several refactors before we can even start the feature implementation. Each of these refactors should be separated into its own commit. This makes the changes easier to review and roll back, even if it is a big merge request.

No “dirty” commits: this is the complementary definition of the previous two points. We should avoid leaving:

Commits with WIP (work in progress) changes
Commits with partial fixes
Commits that break the build or tests
Commits to try fixing the CI/CD pipelines that are not successful
… etc

Why?

The idea is that by following these guidelines, we would have:

Meaningful commits that are easier to review

Commits that are easy to deploy and revert

Git history that is easy to follow and understand in the future. We can also leverage a good commit history to autogenerate CHANGELOGs and release notes.

How?

Maintaining a clean history is challenging because we are humans: we make mistakes, and we multitask naturally (not that we should). Both of these things go against leaving a clean and atomic git history.

However, there are several techniques that can help you even in the most chaotic scenarios.

Stage only what you need

It is common to just do a git add . and commit straight away. However, git gives you a lot of flexibility in what can be added (staged) for a commit. You can stage whole files, or only a section or chunk of the changes in a file. This will help you create smaller and more atomic commits even if you were multitasking refactors.

I always do it using graphical tools, such as my editor's version control functionality. In VS Code, you can stage/un-stage a range of lines by opening the command palette (cmd + shift + p) and searching for “stage/un-stage selected ranges”.

Of course, you can do it with the command line too: https://git-scm.com/book/en/v2/Git-Tools-Interactive-Staging. However, I always find it more intuitive to use the visualization the IDE provides.
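For the command-line route, here is a minimal sketch of selective staging. It runs in a throwaway repository, and the file names, contents, and identity are made up for illustration. It stages a whole file; for part of a file, git add -p walks through the hunks interactively.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the example does not touch real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

# Baseline commit with two files (hypothetical names).
printf 'def area(w, h):\n    return w * h\n' > geometry.py
echo 'Project notes' > notes.txt
git add geometry.py notes.txt
git commit -q -m "chore: initial commit"

# Two unrelated edits made while multitasking.
printf 'def area(width, height):\n    return width * height\n' > geometry.py
echo 'TODO: unfinished idea' >> notes.txt

# Stage only the refactor; notes.txt stays out of the commit.
# For part of a file, `git add -p <file>` asks hunk by hunk (interactive).
git add geometry.py
git diff --cached --name-only     # review what is staged
git commit -q -m "refactor: use descriptive parameter names"
git status --short                # notes.txt is still listed as modified
```

The unrelated edit to notes.txt remains in the working directory, ready for its own commit later.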

Once you have committed only what you need, you can push your changes and let the CI/CD pipelines run the tests and the build to verify that your changes are correct. Even better, stash the rest of your changes to get back to a clean working directory, and run the build and tests locally (almost always faster and cheaper).

So, what happens if my commit is not correct and the pipeline breaks?
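The stash-then-test step could look roughly like this. It is a sketch in a throwaway repository; run_checks.sh is a hypothetical stand-in for whatever build and test command your project uses.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

# Hypothetical check script standing in for the real build and tests.
echo 'echo "all checks passed"' > run_checks.sh
git add run_checks.sh
git commit -q -m "test: add check script"

# Leftover work that is not part of the commit under test.
echo 'half-finished idea' > scratch.txt         # untracked file
echo '# tweak in progress' >> run_checks.sh     # unstaged edit

# Put the leftovers aside, including untracked files.
git stash push -q --include-untracked -m "leftovers"

# The working tree now matches HEAD, so this checks the commit itself.
sh run_checks.sh

# Bring the leftovers back and keep working.
git stash pop -q
```

Running the checks against a tree that matches HEAD is what tells you the commit stands on its own, rather than passing only because of uncommitted changes.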

Correct previous commits

There are a lot of strategies to fix changes already committed. I am going to explain the ones I use the most on a daily basis.

Amend Commits

When you want to change just the previous commit, use git commit --amend.

Basically, make your changes as usual and stage them. Then, instead of creating a standard new commit, use the --amend option. Normally your editor or IDE will have this option built in when you commit.
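As a small sketch of the amend flow on the command line (throwaway repository, made-up file and values):

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

echo 'timeout = 30' > settings.ini        # hypothetical file
git add settings.ini
git commit -q -m "feat: add default timeout"

# The value was wrong: fix it, stage it, and fold it into the previous commit.
echo 'timeout = 60' > settings.ini
git add settings.ini
git commit -q --amend --no-edit           # drop --no-edit to reword the message

git log --oneline                         # still a single commit
```

The amended commit replaces the old one and gets a new hash, which matters if the old one was already pushed.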

Reset Commits

If you need to fix a whole set of commits, or you just want to go back to a previous state in a branch, use git reset.

The idea is that you are “deleting” the commits from the branch. The way I use it is by resetting a specified number of commits back from where the branch currently is. For example, the full command to reset 5 commits is: git reset --soft HEAD~5. With --soft, the changes from those commits stay staged, so you can commit them again in a cleaner shape.
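Here is a sketch of that workflow with three commits instead of five, again in a throwaway repository with made-up content. It relies on --soft moving only the branch pointer, so the files and the staging area keep the combined changes.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

echo 'v0' > app.txt                       # hypothetical file
git add app.txt
git commit -q -m "feat: add app"

# Three messy commits on top.
for i in 1 2 3; do
  echo "attempt $i" >> app.txt
  git commit -q -am "wip: attempt $i"
done

# Move the branch back three commits; --soft keeps the changes staged.
git reset -q --soft HEAD~3
git status --short                        # app.txt is listed as staged

# Commit the combined result once, with a meaningful message.
git commit -q -m "feat: handle retries in app"
git log --oneline
```

The three WIP commits collapse into one, and the file content is the same as before the reset.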

Remember to `--force`

Both of these approaches are “destructive” operations, meaning that you are deleting or rewriting parts of the git history that can’t easily be recovered. This is not something you want to do in your normal workflow, so git will not let you push these changes to the origin by default.

In fact, your IDE might complain and tell you to do a pull first. This is because it detects commits in the origin branch that are no longer in your local branch. That is expected in this case (we just deleted or rewrote commits on purpose). Be aware that doing a git pull here might mess up your local branch.

In these particular situations, we really do want to rewrite the history, so it is fine to do a git push --force. Just make sure that you understand what is happening.
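The rejection and the forced push can be reproduced locally. In this sketch a bare repository in a temporary directory stands in for the origin. It uses --force-with-lease, a variant of --force that refuses to overwrite the remote branch if someone else pushed to it since you last fetched. That choice is my suggestion, not a requirement; plain --force works the same way here.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
git init -q --bare "$work/remote.git"     # stands in for the shared origin
git init -q "$work/local"
cd "$work/local"
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
git remote add origin "$work/remote.git"
branch="$(git symbolic-ref --short HEAD)" # master or main, depending on config

echo 'first draft' > README.txt
git add README.txt
git commit -q -m "docs: add readme"
git push -q origin "$branch"

# Rewrite the commit that is already on the remote.
echo 'final text' > README.txt
git commit -q -a --amend --no-edit

# A plain push is rejected because local and remote histories diverged.
if git push -q origin "$branch" 2>/dev/null; then
  echo "unexpected: plain push succeeded"
else
  echo "plain push rejected, as expected"
fi

# Overwrite the remote branch, but only if it is still where we last saw it.
git push -q --force-with-lease origin "$branch"
```

On a branch that other people also work on, agree on the rewrite first, since their local copies will still contain the old commits.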

Other Useful Tools

Maintaining a standard for commit messages is something I have found super useful. Using something like the conventionalcommits.org specification should be considered best practice. However, it can be a challenge to enforce it.

In the past, we have used tools like pre-commit in combination with commitizen to enforce the standard on commit messages. A minimal .pre-commit-config.yaml for this setup would be:
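To give a feel for what such a check does, here is a deliberately simplified sketch of a commit subject validator. The pattern and the list of types are my assumptions (the specification itself only defines feat and fix and allows others); this is not the rule set of any particular tool.

```shell
#!/usr/bin/env bash

# Simplified pattern: type, optional (scope), optional !, colon, space, description.
# The type list below is an assumption made for this example.
pattern='^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9_-]+\))?!?: .+'

is_conventional() {
  # Only the first line (the subject) of the message is checked.
  local subject
  subject="$(printf '%s\n' "$1" | head -n 1)"
  [[ "$subject" =~ $pattern ]]
}

for msg in "feat(api): add retry header" "fix: handle empty payload" "WIP stuff"; do
  if is_conventional "$msg"; then
    echo "ok:       $msg"
  else
    echo "rejected: $msg"
  fi
done
```

A dedicated tool also covers the body, footers, and breaking-change markers, so treat this only as an illustration of the idea.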


default_install_hook_types: [commit-msg, pre-commit, pre-push]
repos:
  - repo: https://github.com/commitizen-tools/commitizen
    rev: v3.13.0
    hooks:
      - id: commitizen
        stages: [commit-msg]
      - id: commitizen-branch
        stages: [pre-push]

.pre-commit-config.yaml

We also use it in GitLab CI to ensure consistency:


check-commits:
  stage: lint
  image: commitizen/commitizen:latest
  before_script:
    - git remote set-branches origin master
    - git fetch --depth 2 origin master
  script:
    - cz check --rev-range origin/master..HEAD