Mono Repo vs Micro Repo – Micro Repo wins in a landslide

A while back, I had to decide between a Mono Repo and Micro Repos for my Rhyous libraries.
I chose Micro Repos because it seemed to be “common sense” better. I’m so glad I did.

What is a Mono Repo?

1 Repo for all your projects and libraries

What is a Micro Repo? (i.e. Poly Repo or Multi Repo)

1 Repo for 1 library. Sometimes Micro Repos are called Poly Repos or Multi Repos, but those terms don’t really imply the 1 to 1 ratio or the level of smallness that Micro Repo implies.

Why My Gut Chose Micro Repos

I do not have the amount of code a large company like Microsoft or Google has. However, compared to the average developer, I have far more open source projects on my GitHub account.

Despite choosing micro repos over a mono repo for my open source, I have more experience with mono repos because the companies I worked for have had mono repos. Also, I chose a mono repo for a couple of my open source projects. I have zero regrets for any of my poly repos, but I regret all of my mono repos.

My employers had mono repos because they grew that way. It wasn’t a conscious, well-informed decision. As far as I last knew, they continue to have mono repos only because they are stuck. It is too hard (they think) to break up their mono repo.

I do not enjoy working with my mono repos.

It might sound awesome to have everything in one place. It isn’t. We’ve already proven this time and again.

Stop and think about what it looks like when you put all your classes and methods in one file. Is it good?
Think about that for a minute. Smaller is almost always better.

If this article sounds biased toward micro repos, it probably is, because I long ago came to see micro repos as the clear winner of this argument and I struggle to find reasons to support a mono repo.

The goal of this article is to show that the move to microservices isn’t the only recent movement. We are well into the age of microlibraries, and the move to microservices and microlibraries also includes a move to micro repos.

GitHub won the source control world and dominates the market. They won for many reasons, but one of those reasons is how easy they made working with micro repos.

The Dev world is better with micro repos. Your source code will be better with micro Repos.

I am writing a book called “Think Smaller: A guide to writing your best code” and before I unequivocally declare micro libraries as the way to go, I need to do an analysis on it because gut feelings can be wrong. The goal of this analysis is to investigate if my gut was wrong. It pains me to say it, but my gut has been wrong before. This time it wasn’t. Now here is the analysis of why my gut was right.

Mono Repo

Mono Repo with:

  • Direct project references (instead of use of package management)
  • Automated CI/CD Pipelines

Pros of Mono Repos

If a pro is shared with micro repos, it is not listed.

  •  Atomic Changes/Large-Scale Code Refactoring – For a given set of code openable by an IDE as one group of code (often called a solution) you can do large scale refactoring using the IDE. There is a lot of tooling in the IDE around this.
    – However, when a mono repo has multiple solutions, you don’t get that for the other solutions. After that you have to write scripts, in which case, you get no benefit over micro repos.

Yes, it is true. I found only 1 pro. If you have a pro that is truly a pro of mono repo that can’t be replicated with micro repos, please comment.

Pros from other blog posts (Most didn’t stand up to scrutiny)

I did a survey of the first five sites I found on a Google search that list Mono Repo pros. Most of these turned out not to be pros:

  1. https://betterprogramming.pub/the-pros-and-cons-monorepos-explained-f86c998392e1
  2. https://fossa.com/blog/pros-cons-using-monorepos/
  3. https://kinsta.com/blog/monorepo-vs-multi-repo/
  4. *https://semaphoreci.com/blog/what-is-monorepo
  5. https://circleci.com/blog/monorepo-dev-practices/

* Only site that had real-world use cases.

Now, remember, just because someone writes in a blog (including this one that you are reading) that something is a pro or a con, you shouldn’t trust it without evidence and argument to back it up. My survey of these pros found that most of the data is “made up” for clickbait. Most of the so-called pros and cons in these articles don’t hold up to scrutiny.

I will tag these blog-post-listed benefits after my analysis.

True = It is a benefit of only a mono repo
False = It is not a benefit of a mono repo at all. It is a con listed as a pro.
Shared = You get this with both Mono Repos and Micro Repos

  • One source of truth — Instead of having a lot of repositories with their own configs, we can have a single configuration to manage all the projects, making it easier to manage.
    (False)
    Why false? Micro libraries are actually more of a single source of truth for any given piece of code. With a mono repo, every branch has a copy of a library even if there are no plans to edit that library. Many teams end up using many branches. Teams end up with dozens of branches and no one ever knows which ones have changes or not. Devs often don’t know which branch has the latest changes. There is nothing further from one source of truth.
  • Code reuse/Simplified Dependency Management — If there is a common code or a dependency that has to be used in different projects, can it be shared easily?
    (Shared or False)
    Why Shared? Sharing code is just as easy with Micro Repos. Publish the code to a package management system and anybody can share your code.
    Why False? There are huge burdens to sharing code as files as opposed to using a package manager such as npm, maven, nuget, etc. If 10 separate projects share code, and you need to update something simple such as the folder layout of that code, you can’t change the folder layout without breaking all 10. You have to find every piece of code in the entire repo that references the code and update all of them. You might not even have access to them all, as they may be owned by other teams. This means it takes bureaucracy to make a change to reused code. If a design (mono repo) leads to a state where doing something as simple as moving files and folders breaks the world, how can you call that design a pro and not a con?
  • Transparency — It gives us visibility of code used in every project. We will be able to check all the code in a single place.
    (Shared)
    Why Shared? Well, with Micro Libraries, just because they are separate repos doesn’t mean they aren’t in one place. Whether you are creating your repos in public GitHub, GitHub Enterprise, BitBucket, Amazon, Azure, or wherever, you still have your code in one place.
  • Atomic changes/Large-Scale Code Refactoring — We can make a single change and reflect the changes in all the packages, thus making development much quicker.
    (True)
    This is true. If you want to change something that affects an entire repo, or even a handful of projects in a repo, you can do it faster in a mono repo. Careful, however. While this is true, this breaks the O in SOLID (open for extension, closed for modification). If a library has to update all its consumers, it probably isn’t doing something right in the first place. This is a warning sign that your architecture is bad. A second issue is that this ability also means you can make sweeping breaking changes.
  • Better Visibility and Collaboration Across Teams
    (Shared)
    Why Shared? Because with Micro Repos everyone can still have read-only access to all repos. They can still know what other teams are doing. Tooling is what matters here. With GitHub, I can search for code across multiple repos. A dev doesn’t have to be in a mono repo to see if code already exists. In fact, repo names give you one more item to search on that mono repos don’t have, which can help search results be better in micro repos than in mono repos.
  • Lowers Barriers of Entry/Onboarding – When new staff members start working for a company, they need to download the code and install the required tools to begin working on their tasks
    (False)
    Why False? A mono repo does not do a new developer any favors. This is actually more of a con than a pro. There is no evidence that this is a pro, while there is evidence that it is a con. A new dev often has to check out gigs of code. The statement “they need to download the code and install the required tools to begin working” implies they need to download all the code. If you have 100 GB, or even 10 GB, is checking all that out easier when onboarding someone? What about overwhelming a new dev? With a micro library, a new dev can download one micro repo, which is smaller, making it quicker to see, read, understand, run tests against, and code against. A new dev can be productive in an hour with a micro repo. With a mono repo, they might not even have the code downloaded in an hour, or even in the first week. I’ve seen mono repos that take three weeks to set up a running environment.

  • Easy to run the project locally
    (Shared)
    This usually requires a script. In a mono repo, the script will be part of the mono repo. In a poly repo, you can have that script in a separate repo that a new dev can check out in minutes (not hours or days) and run quickly.
    -This is about tooling, and isn’t a pro or con of either.
  • Unified CI/CD – Shared pipelines for build, test, release, deploy, etc.
    (False/Shared)
    Why False? Because sharing a pipeline isn’t a good thing. That is a con. That breaks DevOps best practices of developers managing their own builds. How can a dev have autonomy to change their pipeline if it affects every other pipeline?
    Why Shared? This is about tooling, and really is not a pro or con of either mono repos or micro repos. You can do this with either. However, it is far easier to get CI/CD working with micro repos.

Cons of Mono Repos

I was surprised by how the cons have piled up. However, it is important not just to list a con, but also a potential solution to the con. If there is an easy solution, you can overlook the con. If there is not an easy solution, the con should carry more negative weight.

  1. Fails to prevent coupling – Nothing in a mono repo prevents tight coupling by default.
    Solution: There is no solution in mono repos to this except using conventions.

    Note: Requiring conventions is a problem. I call them uphill processes. Like water, people take the easiest path. When you make a convention, you are making an uphill process, and people are likely not to follow it. Downhill processes are easier to follow. So conventions require constant training and costly oversight.

    Because coupling is only prevented by convention, it is easier to fall into the trap of these coupling issues.

    There are many forms of coupling

    1. Solution coupling
    2. Project coupling
    3. File system coupling
      1. Folder coupling – Many projects can reference other files and folders. With mono repos you can’t even change file and folder organization without breaking the world.
      2. File coupling  – Other projects can share not just your output, but your actual file, which means what you think is encapsulated in private or internal methods, might not be encapsulated.
    4. Build coupling – Break one tiny thing and the entire build system can be held up. Also, you can spend processor power building thousands of projects that never changed every build.
    5. Test coupling – Libraries can easily end up with crazy test dependencies.
    6. Release coupling – You can spend more money on storage because you have to store the build output of every library every time.
  2. Fails to Prevent Monoliths – By doing nothing to prevent coupling, it does nothing to prevent monolithic code
    Solution: There is no solution in mono repos to this except using conventions.
    Monoliths are not exactly a problem; they are a state of a code base. However, Monolith has come to mean a giant piece of coupled software that has to be built, released, and deployed together because it is too big to break up.

    Note: About doing nothing. Some will argue that it isn’t the repo’s job to do the above. I argue that doing nothing to help is a con. If someone is about to accidentally run over a child with their car, and you can stop it easily and safely, but you don’t, would you argue that doing nothing is fine because that kid isn’t your responsibility? Of course not. Doing nothing to help is a con. While a repository usually doesn’t have life and death consequences, the point is that failing to prevent issues is a con.
  3. All Changes Are Major – A change can have major consequences. It can break the world. In a large mono repo, you could spend days trying to figure out what your change impacted and often end up having to revert code.
    Solution: None really. You can change the way you reference projects, using package management instead, which essentially means you have micro repos in your mono repo.
  4. Builds take a long time
    Solution: None, really. If you change code, every other piece of code that depends on that code must build.
    – Builds can take a long time because you have to build the world every time.
  5. Mono repos cost more – Even a tiny change can cause the entire world to rebuild, which can cost a lot of money in processor power and cloud build agent time.
  6. Releases with no changes – Much of your released code will get a new version yet have no change from the prior version.
  7. Not SOLID – Does NOT promote any SOLID programming, in fact, it makes it easier to break SOLID practices
    Breaks the S in SOLID. MonoRepos are not single responsibility. You don’t think SOLID only applies to the actual code, right? It applies to everything around the code, too.

    1. Because a repo has many responsibilities, it is constantly changing, breaking the O in SOLID.
  8. Increases Onboarding Complexity – It is just harder to work with mono repos as a new developer. One repo does nothing to ease a new developer’s burdens. In fact, it increases them.
    Solution: Train on conventions. Train on how to do partial check-outs, though dependencies often prevent this.
    – Developers often have to download gigs and gigs of data. With the world-wide work-anywhere workplace, this can take days for some offsite developers, and may never fully succeed.
    – Overwhelming code base.
  9. Security – Information disclosure
    Solution: Some repo tools can solve this, but only if the code is not coupled.
    – Easy to give a new user access to all the code. In fact, it is expected that new users have access to all the code.
    – Often, you have to give access to the entire code base when only access to a small portion is needed.
  10. Ownership confusion
    Solution: None.
    – Who owns what part of the mono repo? How do you know what part of a mono-repo belongs to your team?
    – Does everyone own everything?
    – Does each team own pieces?
    – This becomes very difficult to manage in a mono repo.
  11. Requires additional teams – Another team slows down build and deploy changes
    Solution: None, really.
    Team 1 – Build Team
    Tends toward requiring a completely separate Build team or teams.
    – A dev has to go through bureaucracy to make changes, which . . .
    – Prevents proper DevOps.

    Note: DevOps Reminder – Remember DevOps means that developers of the code (not some other team) do their own Ops. If you have a Build team, or a Deploy team, you are NOT practicing DevOps even if you call such a team a DevOps team. If I name my cat “Fish” the cat is still a cat, not a fish. A build team, a deploy team; even if they are called DevOps, they aren’t. In proper DevOps the only DevOps team is the DevOps enablement team. This team doesn’t do the DevOps for the developers; the team does work that enables coding developers to do their own DevOps more easily. If the same developers that write the code also write CI/CD pipelines (or use already written ones) for both build and deploy automation, and the developers of the code don’t need to submit a ticket to the DevOps team to change it, then you are practicing DevOps.

    Team 2 – Repo Management Team
    – No this is NOT the same as the build team.
    – Many large companies are paying developers to fix issues with their repo software to deal with 100 GB sized repos.
    – Companies who use mono repos often need a team to fix limitations with the software they use to manage their mono repo

Notice the list of cons piling up against mono repos. I’m just baffled that anyone who creates a pro/con list wouldn’t see this.
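
To make the build cons above (builds take a long time, mono repos cost more) a bit more concrete, here is a minimal Python sketch. It assumes a mono repo with direct project references, where a change to one project forces every project that depends on it, directly or indirectly, to rebuild. The project names and the `affected_projects` function are hypothetical and only for illustration.

```python
from collections import defaultdict, deque


def affected_projects(depends_on, changed):
    """Return the set of projects that must rebuild when `changed` changes.

    `depends_on` maps each project to the projects it references directly.
    """
    # Invert the graph: for each project, who consumes it?
    dependents = defaultdict(set)
    for project, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(project)

    # Breadth-first walk over direct and indirect consumers.
    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in dependents[current]:
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


# Hypothetical dependency graph, not taken from any real code base.
depends_on = {
    "Core": [],
    "Logging": ["Core"],
    "Data": ["Core", "Logging"],
    "Web": ["Data"],
    "Reports": ["Data"],
}

print(sorted(affected_projects(depends_on, "Core")))
# ['Core', 'Data', 'Logging', 'Reports', 'Web']
```

With package references instead of project references, a change to `Core` would only build `Core` and publish a new package version. The consumers rebuild when they choose to take the new version.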

Conclusion

The pros of mono repos are small. The cons of mono repos are huge. How anyone can talk them up with a straight face baffles me.

Warning signs that mono repos aren’t all they are cracked up to be:

  • The most touted examples of success are massive companies with massive budgets (Google, Microsoft, etc)
    • Some of those examples show newer technology moving away from mono repos
      •  Microsoft Windows is a Mono Repo
        • dotnet core has 218 repositories and clearly shows that Microsoft’s new stuff is going to polyrepo
  • A lot of the blogs for mono repos failed to back up their pros with facts and data
  • Some of the sites are biased (they sell mono repo management tools)

Micro Repo

Poly Repo with:

– Microlibraries in a single git repo system, each repo with only one code project and its matching test project
– Each releases to a package management system
– Automated CI/CD pipelines
– Shared repo container (i.e. all repos in the same place, such as GitHub)

Warning signs it isn’t all it’s cracked up to be:
– Big O(n) in regards to repos – you need n repos
Note: Yep, that is the only warning sign, and it is one you can script around.

Pros

Again, we will only list pros that aren’t shared with a mono repo

  • Promotes Microservices and Microlibraries – Poly Repos promote microservices and microlibraries as a downhill process. Downhill means it is the natural easiest way to flow, and the natural direction leads to decoupling.
    • A microservice builds a small process or web service that can be deployed independently
    • A microlibrary builds a small shareable library to a package management system for consumption.
  • Easy to pass Joel Test #2 – Can you make a build in one step? Every microlibrary can make a build in one step. And if one of them stops doing it, it is often a 1 minute fix for that one microlibrary.
  • Small repeatable CI/CD yaml pipelines as code
    • Because the projects are micro, the CI/CD pipelines can be their smallest.
      Note: This isn’t shared with a mono repo, as their CI/CD pipelines have to build everything.
    • They are also more likely to be reusable.
      • You can use the same CI/CD automation files on all microlibraries
      • Almost every project can share the exact same yaml code, with a few variables
    • Easy to find repeatable processes with tiny blocks
    • Add a CI/CD pipeline to automatically update NuGet packages in your micro repos. This can also benefit your security, as you will always have the latest packages. When you use the correct solution, you start to see synergies like this.
  • Prevents coupling (non-code)
    • Prevents solution coupling
    • Prevents project coupling
    • Prevents file system coupling
      • Prevents file coupling. You can’t easily reference a file in another repo. You can copy it and have duplicates.
    • Prevents build coupling
    • Prevents test coupling
    • Prevents release coupling – New releases of libraries go out as a new package to your favorite package management system without breaking anyone. (see npm, maven, nuget, etc.)
    • (Only doesn’t prevent code coupling)
  • Builds are extremely tiny and fast –  Building a microlibrary can take as little as a minute
    • You can quickly create a new build for a microlibrary at any time
    • You often spend more time downloading packages than building.
  • Breaking a Microlibrary doesn’t break the world
    • It creates a new version of a package, and the rest of the world doesn’t have to rely on that version
    • With proper use of SemVer, you can notify your subscribers of breaking changes for those who do need to update your package
  • Completed microservices and microlibraries can stay completed
    • A microservice or microlibrary that is working might never need to update to a new version of a package
  • Promotes SOLID coding practices for tooling around the code
    – It follows the S in SOLID. Your repo has limited responsibilities and has only one reason to change.
    – O in SOLID. Once a project is stable, it may never change, and may never need to be built/released again.
  • Simplifies Onboarding – A new dev can be productive on day 1 (and possibly even in the first hour)
    – A new developer can check out a single repo, run its unit tests, and get a debugger to hit a break point in about 5 minutes.
    – Promotes staggered onboarding, where a developer can join, be productive on day one for any given repo, and then expand their knowledge to other repos.
    – Any single micro repo will not overwhelm a new developer
  • Security – you can give a new developer access to only the repos they need access to.
  • Single Source of Truth – A microlibrary is a single source of truth. The code exists nowhere else. Because it is a microlibrary (micro implying that it is very small), there will usually be no more than one or two feature branches at a time where code is quickly changed and merged.
  • Promotes Proper DevOps – Devs can easily manage their own build, testing, and releasing to a package management system.
  • Transitioning to Open Source is Easy – If one micro repo needs to go to open source, you just make it open source and nothing else is affected. (Be aware of open source licensing, that is a separate topic.)
  • Ownership Clarity – Each repo has a few owners and it is easy to know who are the owners
  • New Releases only when changed – A new release goes out only when the micro repo itself changes.
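
The SemVer point in the list above can be sketched in a few lines of Python. This is a simplified illustration, not a full SemVer implementation: it assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release or build metadata, and it ignores the special rules for `0.x` versions. The function names are hypothetical.

```python
def parse_version(version):
    """Split a plain MAJOR.MINOR.PATCH string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_update(current, candidate):
    """Classify what moving from `current` to `candidate` means for a consumer."""
    cur = parse_version(current)
    new = parse_version(candidate)
    if new <= cur:
        return "none"       # nothing newer to take
    if new[0] != cur[0]:
        return "breaking"   # major bump: the consumer may need code changes
    if new[1] != cur[1]:
        return "feature"    # minor bump: new functionality, backward compatible
    return "fix"            # patch bump: backward compatible bug fix


print(classify_update("1.4.2", "2.0.0"))  # breaking
print(classify_update("1.4.2", "1.5.0"))  # feature
print(classify_update("1.4.2", "1.4.3"))  # fix
```

A consumer that is finished and working can see "breaking" and simply stay on the version it already has.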

Pros that are shared

  • Single Place for all your code – Storing all your repos in one repository system, such as GitHub can give you many of the benefits of a Mono repo without the cons.
  • Code reuse/Simplified Dependency Management – Each micro repo hosts a microlibrary that publishes itself to a package management system for easy code sharing
  • Better Visibility and Collaboration Across Teams – It is so easy to see when and by whom a change was made to a microlibrary.
  • Easy to run a project locally

Cons

  • Atomic Changes/Large-Scale Code Refactoring – You can’t change your repos in bulk without scripting it, though sweeping changes are always hard to make anyway.
    Solution: You can script these changes. You often can still do this, just not with IDE tools.
    – What if you need to change all your repos in bulk? Two things:
    1. This might not be a con. You will likely never have to do this. I almost put this in ‘Cons that aren’t actually cons’.
    2. If you do need to do this, you can script this pretty easily. But you have to script it. So that is why I left it here in cons.
  • Doesn’t prevent code coupling – Just because you consume a dependency using package management, doesn’t automatically make your code decoupled.
    Solution: None. Mono repo had no solution either, but at least all other coupling (folder, file, solution, project, etc.) is prevented.
    – You still need to practice SOLID coding practices.
    – However, because the repo is separate, it becomes much more obvious when you introduce coupling.
  • Big O(n) Repos – You need a repo for every microlibrary.
    – Can be overwhelming for a new developer to look at the number of repos.
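
As a rough idea of what "you can script it" looks like for the bulk-change con above, here is a small Python sketch. It assumes all your micro repos are already cloned side by side under one local folder, and that the change is a simple text replacement in a file with the same name in each repo. The function name, folder layout, and file name are hypothetical. Committing and pushing each repo would be a separate step that is not shown.

```python
from pathlib import Path


def replace_in_repos(root, filename, old, new):
    """Replace `old` with `new` in every `filename` found in each repo under `root`.

    Returns the list of files that were actually changed.
    """
    changed = []
    for repo in sorted(Path(root).iterdir()):
        if not repo.is_dir():
            continue  # skip stray files next to the cloned repos
        for path in sorted(repo.rglob(filename)):
            text = path.read_text(encoding="utf-8")
            if old in text:
                path.write_text(text.replace(old, new), encoding="utf-8")
                changed.append(path)
    return changed
```

A call might look like `replace_in_repos("C:/dev/repos", "build.yml", "old-setting", "new-setting")`, where the path and setting names are made up for the example.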

Domain-based Repos

Domain-based Repos are another option. These are neither micro repos nor mono repos. If you have 100 libraries and 5 of them are extremely closely related and often, when coding, you edit those five together, you can put those in a single repo. For those 5 libraries, it behaves as a mono repo.

It is very easy to migrate from Micro Repos to Domain-based Repos. You will quickly learn which microlibraries change together. Over time, you may merge two or more microlibraries into a Domain-based Repo to get the benefits of atomic changes at a smaller level in a few domain-related micro libraries.
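
One way to learn which microlibraries change together is to count how often pairs of them are touched in the same piece of work. The Python sketch below assumes you can collect, for each work item or pull request, the list of libraries it touched. How you gather that list is up to your tooling and is not shown. The library names and the `co_change_counts` function are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(change_sets):
    """Count how often each pair of libraries is touched in the same change set."""
    counts = Counter()
    for libraries in change_sets:
        # Sort and de-duplicate so each pair is always counted under the same key.
        for pair in combinations(sorted(set(libraries)), 2):
            counts[pair] += 1
    return counts


# Hypothetical history: each inner list is one work item.
change_sets = [
    ["LibA", "LibB"],
    ["LibA", "LibB", "LibC"],
    ["LibC"],
    ["LibB", "LibA"],
]

for pair, count in co_change_counts(change_sets).most_common():
    print(pair, count)
```

Pairs with a high count are the candidates for a domain-based repo. Libraries that mostly change alone are better left as micro repos.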

My recommendation is that you move to micro libraries and then over time, convert only the libraries most touched together into domain-based repos.

Example: My Rhyous.Odata libraries are a domain-based repo. However, I almost always only touch one library at a time now, so I’ve been considering breaking them up for two years now. It made sense during initial development for them to be a domain-based repo, but now that they are in maintenance mode, it no longer makes sense. Needs change over time and that is the norm.

Git SubModules

The only feature where micro libraries don’t compete with Mono Repos is atomic changes. With technology like git submodules, you may be able to have atomic changes, which is really only needed for a monolith. Everything is a microlibrary in a micro repo, but then you could have a meta repo that takes a large group of micro libraries and bundles them together using git submodule technology. That repo can store a script that puts the compiled libraries together and creates an output of your monolith, ready to run and test.
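
To show what such a meta repo records, here is a Python sketch that renders the contents of a `.gitmodules` file for a list of micro repos. The library names, base URL, and `libs` folder are hypothetical. In practice you would run `git submodule add <url> <path>` for each library rather than write this file by hand, because git also has to record the pinned commit for each submodule. The sketch only illustrates the shape of the result.

```python
def render_gitmodules(libraries, base_url, folder="libs"):
    """Render .gitmodules text with one submodule section per micro repo."""
    sections = []
    for name in libraries:
        path = f"{folder}/{name}"
        sections.append(
            f'[submodule "{path}"]\n'
            f"\tpath = {path}\n"
            f"\turl = {base_url}/{name}.git\n"
        )
    return "".join(sections)


# Hypothetical micro repos and hosting URL.
print(render_gitmodules(["LibA", "LibB"], "https://example.com/my-org"))
```

Each section points the meta repo at one micro repo, so the micro repos stay independent and the meta repo only bundles them.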

Conclusion

Micro Repo is the clear winner and it isn’t even close. Choose Micro Repos every time.

Once you move to micro libraries, allowing a small handful of Domain-based repos is totally acceptable.

Every Project is Different

There may be projects where a mono repo is a better solution, I just haven’t seen it yet. Analyze the needs of your project, as that is more important than this article or any other article out there.