Building a better Perforce
JamHub lets you collaborate on game projects with your team. It's like Perforce, but free and open-source.
It's time for a free and open-source collaboration system for game developers. Current closed-source systems, like Perforce and Plastic/Unity DevOps, are expensive and complex. Current open-source solutions, like Git or SVN, are difficult to use and do not scale to large projects containing large files. Game developers want to focus on their game, not their version control.
Jam is an AGPL-licensed version control system that efficiently tracks changes to large files and large projects. You can try it out by logging in, or check out the AGPL-licensed source on GitHub.
Join the Discord for updates or to get support.
Problem
Third-generation version control systems have enabled code collaboration and sharing on a massive scale. Most notably, Git, in combination with hosting platforms like GitHub, has improved our ability to reuse and share code. However, game developers have mostly been left out of this movement because of Git's poor support for large files and large projects. There are also several problems that will continue to get worse.
- Files will get larger
Game developers are continuing to create larger files with more data. Developers want to version their data and assets along with their source code and not have to interact with another layer on top of their current system.
- Repositories will get larger
Game developers are continuing to create more complex projects with more files, distributed across their teams. Monorepos have become the de facto way to manage large amounts of code in other industries, but most version control systems have little support for them. Large companies like Google and Facebook have built their own internal solutions to monorepo problems (Google's Piper and Facebook's Sapling), but there are almost no options for companies outside these two.
- Closed-source systems restrict sharing and innovation
Perforce is widely used in the professional games industry, but it's expensive, closed-source, and still has many issues. Game developers, as well as developers in other industries, are looking for easy ways to collaborate with other people without having to go through a salesperson.
- Build and deploy times will approach zero
Long feedback cycles and deploy times crush engineering productivity. Eventually, the most productive companies will find ways to minimize the time to get feedback and deploy -- companies like Vercel have already begun this process for the web, but we're still a few years away from having truly "instant" previews for other industries. When build and deploy times approach zero, there will be little separation between local and remote development environments. Developers will begin to expect to deploy and collaborate in "realtime", rather than waiting to commit and push their changes.
- Networks will get much faster and more reliable
Current version control systems do not take advantage of how fast our networks can be. Downloading an entire monorepo with years of history for each file no longer makes sense when developers are nearly always online.
Fourth Generation Version Control
- Seamless large file support
Git and other systems currently rely on an additional layer to support large files. Ultimately, large files are no different than small files, and our version control system should be able to scale efficiently between the two.
- Monorepo support
Versioning and deploying a monorepo efficiently should be built into the version control system. Changes in one project (like adding a large binary) should not slow down the rest of the company. There needs to be support for a large number of files (>100 million), built-in permissions, file locking, and dependency resolution to enable instant deploys and collaboration.
- Efficient realtime syncing between local and remote
If build times approach zero, developers should be able to collaborate and ship code in real time, rather than waiting for other developers to manually commit and push their code. With syncing built into the version control system, we'll be able to detect merge conflicts as they occur and continuously merge remote changes into the local version.
- Fully-featured virtual file system
To enable efficient local development and fast deploys, our version control system should be able to fetch files as needed over a network, rather than cloning an entire repository every time.
- Direct API Access
Many systems, such as game engines, CAD software, or file storage solutions, need ways to store and version large amounts of data. We should enable these systems to integrate into the version control system through an open API.
JamHub
JamHub is an in-development fourth-generation version control system built for game developers. You can currently do most things that you would expect from a current version control system, like pulling, pushing, and merging, but it's not quite ready for production use. The core algorithm has been implemented, but more work remains to build out the features that developers expect from a full collaboration platform. Please join the Discord and star the repo on GitHub to follow future development!
Terminology
Since JamHub works a little differently than most version control systems, it's necessary to define some terms, which may have slightly different meanings than in other systems.
- Mainline - The production history of the project. Made up of a series of "commits" that represent good versions of the project.
- Workspace - A space for developers to make changes in. Developers will make "changes" in their workspace and merge into the "mainline" when approved/ready. "Changes" will be tracked while in the workspace, but will be squashed into a single "commit" when merged into the mainline. Eventually, changes will be able to be synced live between a developer's local machine and their workspace.
- Change - A snapshot of a workspace while developers are working on their project, made by doing a `jam push`.
- Commit - A snapshot of the production version of the project, made by merging in a "workspace".
- Merge - Occurs when a workspace is squashed and committed to the "mainline".
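The relationship between these terms can be sketched in Go. This is only an illustration of the vocabulary above, not JamHub's actual data model: every type, field, and method name here is hypothetical, a change is assumed to record only the paths it touched, and deletions are ignored for brevity.

```go
package main

import "fmt"

// Change is a snapshot taken in a workspace (what `jam push` produces).
// Files maps each path touched by the change to a content hash (hypothetical layout).
type Change struct {
	ID    int
	Files map[string]string
}

// Commit is a snapshot of the production version of the project.
type Commit struct {
	ID    int
	Files map[string]string
}

// Workspace holds the changes a developer has pushed but not yet merged.
type Workspace struct {
	Name    string
	Changes []Change
}

// Mainline is the production history: an ordered series of commits.
type Mainline struct {
	Commits []Commit
}

// Merge squashes every change in the workspace into a single commit
// on top of the latest mainline commit, then empties the workspace.
func (m *Mainline) Merge(w *Workspace) Commit {
	files := map[string]string{}
	if n := len(m.Commits); n > 0 {
		for path, hash := range m.Commits[n-1].Files {
			files[path] = hash
		}
	}
	// Later changes overwrite earlier ones for the same path.
	for _, change := range w.Changes {
		for path, hash := range change.Files {
			files[path] = hash
		}
	}
	commit := Commit{ID: len(m.Commits) + 1, Files: files}
	m.Commits = append(m.Commits, commit)
	w.Changes = nil
	return commit
}

func main() {
	mainline := &Mainline{}
	ws := &Workspace{Name: "new-level", Changes: []Change{
		{ID: 1, Files: map[string]string{"level.map": "hash-a"}},
		{ID: 2, Files: map[string]string{"level.map": "hash-b", "hero.png": "hash-c"}},
	}}
	commit := mainline.Merge(ws)
	fmt.Printf("commit %d contains %d files\n", commit.ID, len(commit.Files))
}
```

Two pushed changes end up as one commit on the mainline, which is the squashing behavior described in the Workspace and Merge entries.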
Benchmarks
This section compares JamHub's upload and download speeds against Git's for a single directory. Note that these numbers are not final and future features will give JamHub ways to make typical workflows faster, like directory mounting over NFS. Also, these are raw file measurements, meaning no previous versions are uploaded or downloaded (which is to Git's advantage).
- Git Source - 4287 files, 77MB

| | Upload | Download |
|---|---|---|
| Git | 19.583s | 47.616s |
| JamHub | 8.357s | 4.265s |

- Linux Source - 78351 files, 1.4G

| | Upload | Download |
|---|---|---|
| Git | 8m32.365s | 5m18.194s |
| JamHub | 56.401s | 28.531s |

- Celeba Dataset - 202599 files, 1.8G

| | Upload | Download |
|---|---|---|
| Git | 1hr6m46s | 1m52.665s |
| JamHub | 11m0.868s | 4m39.101s |
Algorithm
The idea behind JamHub is based on the rsync algorithm and Content-Defined Chunking (CDC). If you haven't read about these, I would highly recommend them!
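For readers new to CDC, here is a minimal sketch of the general technique in Go. It is not JamHub's chunker: it uses a simple gear-style rolling hash, and the names (`Chunk`, `gearTable`) and the size parameters are assumptions picked for illustration. The point is that chunk boundaries are chosen from the bytes themselves rather than from fixed offsets, so an insertion near the start of a file tends to disturb only nearby chunks.

```go
package main

import "fmt"

const (
	minSize = 16  // never cut a chunk shorter than this (illustrative value)
	maxSize = 256 // always cut once a chunk reaches this size (illustrative value)
	// Cut when the top 6 bits of the hash are zero: roughly once per 64 bytes.
	cutMask = uint64(0x3F) << 58
)

// gearTable maps each byte value to a pseudo-random 64-bit number.
var gearTable = buildGearTable()

func buildGearTable() [256]uint64 {
	var table [256]uint64
	x := uint64(0x9E3779B97F4A7C15)
	for i := range table {
		// xorshift64 step, used only to fill the table deterministically.
		x ^= x << 13
		x ^= x >> 7
		x ^= x << 17
		table[i] = x
	}
	return table
}

// Chunk splits data at content-defined boundaries.
// The returned slices share memory with data.
func Chunk(data []byte) [][]byte {
	var chunks [][]byte
	start := 0
	var h uint64
	for i, b := range data {
		// Gear hash: older bytes are shifted out after 64 steps.
		h = (h << 1) + gearTable[b]
		size := i - start + 1
		if (size >= minSize && h&cutMask == 0) || size >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	data := make([]byte, 2000)
	for i := range data {
		data[i] = byte((i*31 + i/7) % 251)
	}
	chunks := Chunk(data)
	fmt.Printf("%d bytes split into %d chunks\n", len(data), len(chunks))
}
```

Production chunkers typically use larger average chunk sizes and more careful hash and mask choices; the small values here just keep the example easy to experiment with.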
How JamHub uses Rsync and CDC
The main idea behind JamHub is that we can store the operations sent by the sender in an rsync-like stream to track changes to a file. This means we treat rsync operations like a delta chain that we can use later to regenerate the file. The storage of deltas and their usage to regenerate a file is similar to the Mercurial concept of a Revlog. However, the advantage of using rsync blocks is that we can efficiently store changes to, and regenerate, arbitrarily large files since these blocks can be streamed and regenerated independently.
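To make the delta-chain idea concrete, here is a small Go sketch. It is an interpretation of the paragraph above, not JamHub's storage format: the `Op` and `Version` types, the `Ref` convention, and the one-block-per-operation layout are all assumptions. Each version of a file is stored as a list of operations, where an operation either carries new bytes or says "reuse block N of the previous version".

```go
package main

import "fmt"

// Op is one stored rsync-like operation (hypothetical layout).
// If Ref < 0 the op carries literal bytes in Data;
// otherwise it reuses block Ref of the previous version.
type Op struct {
	Data []byte
	Ref  int
}

// Version is the list of operations that describes one version of a file.
type Version []Op

// resolve follows references back through earlier versions until it
// reaches an op that carries literal data. Version 0 is assumed to
// contain only literal ops.
func resolve(chain []Version, v, i int) []byte {
	for {
		op := chain[v][i]
		if op.Ref < 0 {
			return op.Data
		}
		v, i = v-1, op.Ref
	}
}

// Regenerate rebuilds version v of the file block by block.
// Each block is resolved independently, so output could be streamed.
func Regenerate(chain []Version, v int) []byte {
	var out []byte
	for i := range chain[v] {
		out = append(out, resolve(chain, v, i)...)
	}
	return out
}

func main() {
	chain := []Version{
		{{Data: []byte("hello "), Ref: -1}, {Data: []byte("world"), Ref: -1}},
		{{Ref: 0}, {Data: []byte("there"), Ref: -1}},
		{{Data: []byte("oh, "), Ref: -1}, {Ref: 0}, {Ref: 1}},
	}
	for v := range chain {
		fmt.Printf("v%d: %q\n", v, Regenerate(chain, v))
	}
}
```

Only the new bytes of each version are stored, yet any version can be rebuilt one block at a time without holding the whole file in memory.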
Data pointers
In each block, we can store the location of the last data block so that the file can be regenerated efficiently. By using blocks instead of an xdelta approach, we can store pointers in each block to find the last actual data block to use in the file, rather than regenerating the file through a delta chain as Mercurial does. Mercurial essentially caches the entire file at certain points and uses these snapshots later to shorten the regeneration chain.
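One way to picture data pointers is the Go sketch below. It is a guess at the shape of the idea, not JamHub's on-disk format: `Loc`, `Block`, `Store`, and `Op` are hypothetical names. When a new version reuses a block, it copies the pointer already stored in the previous version's block, so every block points straight at the version and index that hold the real bytes.

```go
package main

import "fmt"

// Loc identifies a block by version number and position (hypothetical).
type Loc struct {
	Version int
	Index   int
}

// Block either carries bytes itself or points at the block that does.
// For a data block, Ptr points at the block's own location.
type Block struct {
	Data []byte
	Ptr  Loc
}

// Op is an incoming operation: literal data when Ref < 0,
// otherwise "reuse block Ref of the previous version".
type Op struct {
	Data []byte
	Ref  int
}

// Store keeps every version of one file as a list of blocks.
type Store struct {
	versions [][]Block
}

// Append records a new version and returns its version number.
// The first version is assumed to contain only literal ops.
func (s *Store) Append(ops []Op) int {
	v := len(s.versions)
	blocks := make([]Block, len(ops))
	for i, op := range ops {
		if op.Ref < 0 {
			blocks[i] = Block{Data: op.Data, Ptr: Loc{Version: v, Index: i}}
			continue
		}
		// Copy the pointer from the previous version: it already
		// leads directly to the last actual data block.
		blocks[i] = Block{Ptr: s.versions[v-1][op.Ref].Ptr}
	}
	s.versions = append(s.versions, blocks)
	return v
}

// Regenerate rebuilds version v with one lookup per block,
// no matter how many versions ago the data was written.
func (s *Store) Regenerate(v int) []byte {
	var out []byte
	for _, b := range s.versions[v] {
		src := s.versions[b.Ptr.Version][b.Ptr.Index]
		out = append(out, src.Data...)
	}
	return out
}

func main() {
	s := &Store{}
	s.Append([]Op{{Data: []byte("hello "), Ref: -1}, {Data: []byte("world"), Ref: -1}})
	s.Append([]Op{{Ref: 0}, {Data: []byte("there"), Ref: -1}})
	v := s.Append([]Op{{Data: []byte("oh, "), Ref: -1}, {Ref: 0}, {Ref: 1}})
	fmt.Printf("v%d: %q\n", v, s.Regenerate(v))
}
```

Under these assumptions the cost of regenerating a file depends on the number of blocks in that version, not on the length of its history.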
Workspaces
A chain of changes, formed by the process above, can be used to regenerate every file in a project. Workspaces can be automatically rebased on top of the mainline. This means that every workspace will always be up-to-date. If conflicts occur during the rebase, a workspace will need manual merging.
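The rebase rule can be illustrated with a file-level three-way comparison in Go. This is a simplified sketch, not how JamHub implements rebasing: it assumes each snapshot is a map from path to content hash (the hypothetical `Tree` type), treats a missing path as a deleted or not-yet-created file, and reports a conflict only when the mainline and the workspace changed the same path in different ways.

```go
package main

import (
	"fmt"
	"sort"
)

// Tree maps a file path to a content hash (hypothetical representation).
type Tree map[string]string

// Rebase replays workspace edits on top of a newer mainline.
// base is the mainline snapshot the workspace started from.
// It returns the rebased tree and the sorted paths needing manual merging.
func Rebase(base, mainline, workspace Tree) (Tree, []string) {
	paths := map[string]bool{}
	for _, tree := range []Tree{base, mainline, workspace} {
		for p := range tree {
			paths[p] = true
		}
	}

	merged := Tree{}
	var conflicts []string
	for p := range paths {
		b, m, w := base[p], mainline[p], workspace[p]
		var pick string
		switch {
		case w == b:
			pick = m // workspace did not touch it: take the mainline version
		case m == b || m == w:
			pick = w // only the workspace changed it, or both made the same change
		default:
			conflicts = append(conflicts, p) // both changed it differently
			continue
		}
		if pick != "" { // an empty hash means the file is absent
			merged[p] = pick
		}
	}
	sort.Strings(conflicts)
	return merged, conflicts
}

func main() {
	base := Tree{"hero.png": "v1", "level.map": "v1"}
	mainline := Tree{"hero.png": "v2", "level.map": "v2"}
	workspace := Tree{"hero.png": "v1", "level.map": "v3", "boss.png": "v1"}

	merged, conflicts := Rebase(base, mainline, workspace)
	fmt.Println("merged:", merged)
	fmt.Println("needs manual merging:", conflicts)
}
```

In this example `hero.png` picks up the mainline's version automatically, `boss.png` is carried over from the workspace, and `level.map` is flagged for manual merging.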
Limitations
The goal is to be able to handle over 100M files, and individual files over 1TB in size, in a single repository. We're not there yet in the current implementation (~1M files with 16GB-sized files) but should be there in the next couple of months.
Implementation
JamHub is being written from scratch in Go and uses mattn/go-sqlite3 to store projects and change information. gRPC and Protocol Buffers are used for service definitions and data serialization.
Acknowledgements
This awesome site theme is made by @panr and adapted to this site.