
Understanding the KLM

· 6 min read
Benjamin Ingberg

When running a remote cache and execution cluster based on Buildbarn, the Key Location Map (KLM) is a term you will run into, and it is important to size the KLM and the number of KLM attempts properly.

If you are just looking for a ballpark number to get you started, set the number of get attempts to 16 and the number of put attempts to 64 and use the following table.

|                     | CAS        | AC        |
| ------------------- | ---------- | --------- |
| Average Object Size | 125 KB     | 1 KB      |
| Storage Size        | 500 GB     | 1 GB      |
| KLM Entries         | 16 000 000 | 4 000 000 |
| KLM Size            | 1056 MB    | 264 MB    |

These are arbitrarily chosen values which are unlikely to match your actual workload. I recommend reading the rest of the article to understand these settings and how to reason about them. You can then use the Prometheus metrics at the end to validate whether your settings are a good match for your workload.
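
As a rough sketch of where the table's numbers come from: take the expected number of objects (storage size divided by average object size), give the KLM about four entries per object, and multiply by the size of a KLM entry. The four-entries-per-object factor and the roughly 66 bytes per entry are assumptions implied by the table above; adjust them for your own deployment.

objects = (500 * 10**9) // (125 * 10**3)   # CAS column: 500 GB storage, 125 KB objects
entries = 4 * objects                      # ~4 KLM entries per stored object
size_mb = entries * 66 // 10**6            # ~66 bytes per KLM entry
print(objects, entries, size_mb)           # 4000000 16000000 1056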

How does the KLM work?

The KLM is a hash table that maps each object's key to the position in your storage layer where the data is written; it is indexed by hashing the key.

Given a limited key space, hash functions will have collisions. It is therefore important that the KLM is significantly larger than what is required to fit a key for every object. A naive implementation would need an enormous hash table to keep the likelihood of a collision low, but with a technique called Robin Hood hashing the table only needs to be a small factor larger than the key set itself.

For example, in a blobstore that can fit n objects and has a KLM that can fit 2n entries, every hash has a 50% chance of landing in an already occupied slot. With Robin Hood hashing we can repeat the process by incrementing an attempts counter, giving us multiple possible locations for the same object.

When querying for an object we search up to the maximum number of allowed attempts to find it in one of these slots. Insertion works similarly: we increment the number of attempts whenever we encounter a collision, taking care to keep the younger of the colliding objects in the contested slot and push the older object forward.
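
As a rough illustration of this scheme, here is a minimal toy sketch in Python (not Buildbarn's actual implementation): every record carries the attempt number at which it was stored, and on a collision the younger record keeps the contested slot while the older one is pushed forward.

import hashlib

class ToyKeyLocationMap:
    """Toy model of the KLM probing scheme described above."""

    def __init__(self, slots, max_get_attempts=16, max_put_attempts=64):
        self.slots = [None] * slots   # each slot holds [key, location, age, attempt]
        self.max_get = max_get_attempts
        self.max_put = max_put_attempts

    def _slot(self, key, attempt):
        # Hash the key together with the attempt number to get one of
        # several candidate slots for the same object.
        digest = hashlib.sha256(repr((attempt, key)).encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def get(self, key):
        for attempt in range(self.max_get):
            record = self.slots[self._slot(key, attempt)]
            if record is not None and record[0] == key:
                return record[1]
        return None   # not found within the get budget

    def put(self, key, location, age):
        # 'age' is a monotonically increasing write counter; larger is younger.
        record = [key, location, age, 0]
        while record[3] < self.max_put:
            index = self._slot(record[0], record[3])
            occupant = self.slots[index]
            if occupant is None or occupant[0] == record[0]:
                self.slots[index] = record
                return True
            if occupant[2] < record[2]:     # occupant is older:
                self.slots[index] = record  # keep the younger record in this slot...
                record = occupant           # ...and push the older one forward
            record[3] += 1                  # try this record's next candidate slot
        return False   # budget exhausted; the oldest displaced record is dropped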

The number of attempts the KLM is allowed to use when looking for a slot is controlled by the two parameters key_location_map_maximum_get_attempts and key_location_map_maximum_put_attempts in the LocalBlobAccessConfiguration.

So, how big should the KLM be?

Given a utilization rate r, the chance of finding an object within k attempts is 1 - r^k. To improve this we can therefore either decrease the utilization rate (by increasing the size of the KLM) or increase the number of attempts.
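
A quick back-of-the-envelope calculation, with arbitrary example values, gives a feeling for the trade-off:

# Probability of *not* finding a present object within k attempts is r**k.
for r in (0.5, 0.8, 0.9):
    for k in (8, 16, 32):
        print(f"utilization {r:.0%}, {k:2d} attempts: miss probability {r**k:.1e}")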

Due to its random access pattern, the KLM benefits greatly from being small enough to fit in memory, even if it is disk backed. Should it be too big to fit in memory, it will constantly be paged in and out, hurting system performance.

Similarly, there needs to be a maximum number of attempts: in the degenerate case where the storage holds more objects than the KLM can index, every slot would be occupied and the algorithm would otherwise never terminate.

A KLM that is too small for the number of attempts used will push entries out even though the corresponding data is still present in storage.

This is somewhat mitigated by the insertion order, where the oldest entries get pushed out first, since they are less likely to be relevant. This gives graceful degradation when your KLM is too small. You should size the KLM so that the number of times you reach the maximum number of attempts is acceptably low.

How rare should hitting the maximum number of attempts be?

It should be rare, but most objects discarded because the KLM is full tend to be old and unused. There is, however, a point where a larger KLM is no longer meaningful.

Ultimately, any time you read or write to a disk there is a risk of failure. Popularly this is described as happening due to cosmic radiation, but more realistically it is due to random failures caused by imperfections in the hardware.

Picking k and r values that give a risk of data loss below the Uncorrectable Bit Error Rate (UBER) of a disk is simply wasteful; if you want to reduce the risk below that level you need to look at mirroring your data.

Western Digital advertises that their Gold enterprise NVMe disks have a UBER of 1 in 10^17, i.e. about one error per 10 petabytes of read data, which will serve as a decent baseline.

For a random CAS object of 125KB this corresponds to a failure rate of about 1 in 10^11 reads, giving us this neat graph.

diagram

That is, for a KLM using the recommended 16 attempts, giving it more than 5 entries per object in storage is a waste, since you are then just as likely to fail to read the object due to disk errors as due to the KLM accidentally pushing it out.

Similarly for 32 iterations there is no point in having more than 2 entries per object, and for 8 iterations there is no point in having more than 20 entries per object.
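
These break-even points follow from setting r^k equal to the UBER-derived failure rate of roughly 1 in 10^11 reads and solving for the utilization. A small sanity check, assuming the 125 KB object size used above:

target = 1e-11   # per-read loss rate comparable to the disk's UBER for a 125 KB object
for k in (8, 16, 32):
    r = target ** (1 / k)   # utilization where the KLM miss rate equals 'target'
    print(f"{k:2d} attempts: no point in more than ~{1 / r:.1f} entries per object")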

As for the number of put attempts, just keep it at 4x the number of get attempts. There is no fancy math here: it just needs to be bigger than the number of get attempts, and it is cheap since puts happen only a minuscule fraction as often as gets.

The thought of data randomly getting lost might upset you, but you can comfort yourself with the fact that you are far more likely to lose data due to an AWS engineer tripping on a cable in the datacenter.

How do I verify if my KLMs are properly sized?

Buildbarn exposes the behavior of the hashing strategy through the following Prometheus metrics:

  • hashing_key_location_map_get_attempts
  • hashing_key_location_map_get_too_many_attempts_total
  • hashing_key_location_map_put_iterations
  • hashing_key_location_map_put_too_many_iterations_total

These metrics expose the number of get and put attempts that were required, as well as how many times the maximum number of attempts was exceeded. From the ratio between the attempt counts you can figure out how full the KLM is: if 2 attempts are needed half as often as 1 attempt, this implies the KLM is half full.
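
As a small illustration of that reasoning (with made-up attempt counts rather than real metric values), consecutive ratios of the attempt counts approximate the utilization:

# Hypothetical number of gets that needed exactly n attempts, e.g. scraped
# from the Prometheus metrics above; the values are made up for illustration.
attempts = {1: 1_000_000, 2: 520_000, 3: 270_000, 4: 140_000}

ratios = [attempts[n + 1] / attempts[n] for n in sorted(attempts)[:-1]]
print(f"estimated KLM utilization: {sum(ratios) / len(ratios):.0%}")   # ~52%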

There are ready-made Grafana dashboards in bb-deployments which visualize these metrics.

Introducing Meroton’s New Course: Buildbarn Fundamentals

· 2 min read
Benjamin Ingberg

At Meroton, we’ve long provided managed Buildbarn environments to help development teams streamline their build processes. Now, we’re excited to take it a step further with the introduction of our latest offering: Buildbarn Fundamentals.

This new course is designed to empower your team with the knowledge and practical skills needed to manage and operate your own Buildbarn environment. Whether you're just getting started with remote build execution or looking to take control of your infrastructure, this hands-on course will provide you with the tools to succeed.

Unlock the Power of Buildbarn with Expert Guidance

Buildbarn Fundamentals is more than just a training session; it’s an opportunity to set up a production-ready Buildbarn reference cluster in your own AWS environment. By participating in the course, your team will not only understand what makes Buildbarn tick but also walk away with a fully functional Remote Build Environment (RBE) cluster, which you can continue to use or adapt to your organization’s needs.

At Meroton, we believe that mastering the management of remote build environments is crucial for modern development workflows. This course is designed to be both comprehensive and practical, offering a deep dive into Buildbarn while equipping your team with the operational knowledge to effectively maintain and scale your infrastructure.

What You'll Learn

Buildbarn Fundamentals is a hands-on course that teaches participants how to:

  • Set up a fully operational Buildbarn cluster in AWS
  • Manage and operate Buildbarn clusters independently
  • Integrate third-party tools to enhance your build environment and understand the needs of consumers
  • Optimize caching, remote execution, and troubleshooting with Buildbarn

By the end of the course, your team will have the skills and confidence to self-manage a Buildbarn environment, enabling you to scale and optimize your development processes independently.

More Information

If you're ready to empower your team with the skills to manage your own Buildbarn infrastructure, get in touch! Contact us at sales@meroton.com to learn more and secure your spot. Or read more at Buildbarn Fundamentals

Summer Buildbar

· One min read
Benjamin Ingberg

Before everyone heads out to summer adventures, I'd like to invite you all to a cool summer Buildbar. At the Buildbar we'll eat good food and talk about interesting technical problems and new developments with Bazel and Buildbarn.

And also have a few beers.

Feel welcome to come over on Wednesday 19 June 2024, from 16 to 20.

Directions

You'll find us at our Linköping offices at Fridtunagatan 33. Currently there is some ongoing construction but follow the red lines and you'll be fine.

Fridtunagatan 33 Linköping

Automatically Reformat all Commits on a Branch

· 4 min read
Nils Wireklint

If you have a formatter tool that can rewrite your code you can run it automatically on all unmerged commits. This will show you how to script git-rebase to do so without any conflicts.

There are two ways to do it manually: forward or backward. The forward pass amends each commit and deals with the conflicts when stepping to the next commit. The backward pass formats each commit starting from the end, which avoids conflicts, but for long commit chains it can be almost as tedious.

This pattern comes up when working with long-lived feature branches, or tasks that were almost done, and then pre-empted by other prioritized work. Here are a few oneliners you can run to tidy up your commits.

See also the full technical guide for developing this git-rebase workflow in our documentation, which contains more details on rebasing with git, using a scriptable editor to automate the git-rebase todo-list, and the squashed commit messages.

Example commits

Say you have three unmerged commits:

21cc7b5 My amazing feature
e05fd9f Other complimentary work
acb9fae Fix annoying bug

They contain important work, but you forgot to run some linters, or the main branch added more lint requirements after the feature work was started. The following steps run linters that can automatically fix issues on each commit through a scripted git-rebase.

Rebase algorithm

We have a three-step process to update each commit.

  • 1: Create a fixup commit with the applied lint suggestions, which we immediately revert so the next commit still applies

    #!/bin/sh

    # Formatters and fixers go here.
    # Replace with your tools of choice! rustfmt, gofmt, black, ...
    ./run-all-linters-and-autofixers.sh

    # Add a new commit with the changes and revert it again.
    git add -u
    git commit --allow-empty --fixup HEAD
    # 'git-revert' does not support '--allow-empty'.
    git revert --no-commit HEAD
    git commit --allow-empty --no-edit
  • 2: Squash the fixup commit into the original feature commit

  • 3: Squash the revert down into the next feature commit

The listing below shows how the commits evolve and are squashed; the extra commits are grouped to indicate the target commit. The revert of the first commit is grouped with the second feature commit, and so on. We discard the final revert.

21cc7b5 My amazing feature
01900c5 fixup! My amazing feature

55feaba Revert "fixup! My amazing feature"
e05fd9f Other complimentary work
d122da7 fixup! Other complimentary work

249b0d3 Revert "fixup! Other complimentary work"
acb9fae Fix annoying bug
50e426a fixup! Fix annoying bug

7e84259 Revert "fixup! Fix annoying bug"

Oneliners

Git allows us to set any editor for the todo-list ($GIT_SEQUENCE_EDITOR) and the commit message ($EDITOR). We choose vim as it is often available and easier to use than sed and awk; it is nice to have a scriptable interactive editor to tweak the workflow and try out the commands.

See the full technical guide for details and more tips on git-rebase and vim.

Reformat:

$ env                          \
GIT_SEQUENCE_EDITOR="true" \
git rebase -i --exec ./reformat.sh origin/main

Fixup (autosquash):

# More robust autosquash, that handles duplicated commit messages.
# If your commit messages are all unique you can use '--autosquash' instead.
# See the technical guide for more details.
$ env \
GIT_SEQUENCE_EDITOR="vim +'g/^\w* \w* fixup!/s/^pick/fixup/'" \
git rebase -i origin/main

Squash:

$ env                                                                                               \
EDITOR="sed -i '1,9d'" \
GIT_SEQUENCE_EDITOR="vim +'g/^#/d' +'normal! Gdk' +'g/^pick \w* Revert \"fixup!/normal! j0ces'" \
git rebase -i origin/main

We have not yet developed the incantation, the git-rebase command, that preserves the author date from the original commits. We will address that next!

Improved Chroot in Buildbarn

· One min read
Nils Wireklint

We have just started a documentation series describing the Buildbarn chroot runners and how they can be used for hermetic input roots that contain all the required tools. It includes implementation notes for a "mountat" functionality built on the new Linux mount API, how you can use this under-documented API, its shortcomings, and how this can and will be integrated into Buildbarn, with technical descriptions of the workers and runners.

The first sections are already available, with more to come!

Sections:

Reference code repository:

Updates to Buildbarn as of November 2023

· 2 min read
Benjamin Ingberg

This is a continuation of the previous update article and is a high level summary of what has happened in Buildbarn from 2023-02-16 to 2023-11-14.

Added support for JWTs signed with RSA

Support for JWTs signed with RSA has been added. The following JWT signing algorithms are now supported:

  • HS256
  • HS384
  • HS512
  • RS256
  • RS384
  • RS512
  • EdDSA
  • ES256
  • ES384
  • ES512

Generalized tuneables for Linux BDI options

Linux 6.2 added a sysfs attribute for toggling BDI_CAP_STRICTLIMIT on FUSE mounts. If you are using the FUSE-backed virtual file system on Linux 6.2, adding { "strict_limit": "0" } to linux_backing_dev_info_tunables will remove the BDI_CAP_STRICTLIMIT flag from the FUSE mount.

This may improve filesystem performance, especially when running build actions which use mmap'ed files extensively.

Add support for injecting Xcode environment variables

Remote build with macOS may call into locally installed copies of Xcode. The path to the local copy of Xcode may vary and Bazel assumes that the remote execution service is capable of processing Xcode specific environment variables.

See the proto files for details.

Add a minimum timestamp to ActionResultExpiringBlobAccess

A misbehaving worker may have polluted the action cache; after fixing the misbehaving worker we would rather not throw away the entire action cache.

A minimum timestamp in ActionResultExpiringBlobAccess allows us to mark a timestamp in the past before which the action should be considered invalid.

Add authentication to HTTP servers

Much like the gRPC servers, the HTTP servers can now also be configured to require authentication.

This allows the bb_browser and bb_scheduler UI to authenticate access using OAuth2 without involving any other middleware.

This also allows us to add authorization configuration for administrative tasks such as draining workers or killing jobs.

Authentication using a JSON Web Key Set

JSON Web Key Set (JWKS) is a standard format that allows us to specify multiple different keys that may have been used to sign our JWT tokens.

Buildbarn can load the JWKS specification, either inline or as a file, when specifying trusted signing keys.

This allows us to rotate signing keys with an overlap.

Memory Adventure

· 8 min read
Nils Wireklint

An adventure in finding a memory thief in Starlark-land

This is a summary and follow-up to my talk at BazelCon 2023, with abridged code examples; the full instructions are available together with the code.

Problem Statement

First, we lament Bazel's out-of-memory errors, and point out that the often useful Starlark stacktrace does not always show up. Some allocation errors just crash Bazel without giving any indication of which allocation failed.

allocation

This diagram illustrates a common problem with memory errors: the allocation that fails may not be the culprit, it is just the straw that breaks the camel's back, and the real thief may already have allocated its memory.

We have seen many such errors when working with clients, and they typically hide in big corporate code bases, which complicates troubleshooting, discussion and error reporting. So we created a synthetic repository to illustrate the problem and have something to discuss. The code and instructions are available here.

Errors and poor performance in the analysis phase are particularly painful, because analysis must always finish before the actions can start building. With big projects the number of configurations to build for can be very large, so one cannot rely on CI runners building the same configuration over and over to retain the analysis cache. Instead, analysis is on the critical path for all builds, especially if the actions themselves are cached remotely.

To illustrate some of the problem, we have a reproduction repository with an example code base containing some Python and C programs. To introduce memory problems, and make it a little more complex, we add two rules: one CPU-intensive rule ("spinlock") and one memory-intensive aspect ("traverse"). The "traverse" aspect encodes the full dependency tree of all targets and writes it to a file with ctx.actions.write, so the allocations are tied to the Action object.

Toolbox

We have a couple of tools available; many are discussed in the memory optimization guide, but we find that some problems can slip through the cracks.

First off, there are the post-build analysis tools in bazel:

  • bazel info
  • bazel dump --rules
  • bazel aquery --skyframe_state

These are a good starting point and have served us well on many occasions, but with this project they seem to miss some allocations; we will return to that later. Additionally, these tools will not give any information if the Bazel server crashes: you will need to increase the memory and run the same build again.

Then one can use Java tools to inspect what the JVM is doing:

The best approach here is to ask Bazel to save the heap if it crashes, so it can be analyzed post-mortem: bazel --heap_dump_on_oom

And lastly, use Bazel's profiling information:

  • bazel --profile=profile.gz --generate_json_trace_profile --noslim_profile

This contains structured information and is written continuously to disk, so if Bazel crashes we can still parse it, we just need to discard partially truncated events.

Expected Memory consumption

As the two rules write their string allocations to output files we get a clear picture of the expected RAM usage (or at least a lower bound).

$ bazel clean
$ bazel build \
--aspects @example//memory:eat.bzl%traverse \
--output_groups=default,eat_memory \
//...
# Memory intensive tree traversal (in KB)
$ find bazel-out/ -name '*.tree' | xargs du | cut -f1 | paste -sd '+' | bc
78504
# CPU intensive spinlocks (in KB)
$ find bazel-out/ -name '*.spinlock' | xargs du | cut -f1 | paste -sd '+' | bc
3400

Here is a table with the data:

|                  | Memory for each target | Total  |
| ---------------- | ---------------------- | ------ |
| Memory intensive | 0-17 MB                | 79 MB  |
| CPU intensive    | 136 KB                 | 3.4 MB |

Reported Memory Consumption

Next, we check with the diagnostic tools.

$ bazel version
Bazelisk version: development
Build label: 6.4.0

Bazel dump --rules

$ bazel $STARTUP_FLAGS --host_jvm_args=-Xmx"10g" dump --rules
Warning: this information is intended for consumption by developers
only, and may change at any time. Script against it at your own risk!

RULE COUNT ACTIONS BYTES EACH
cc_library 4 17 524,320 131,080
native_binary 1 4 524,288 524,288
cc_binary 6 54 262,176 43,696
toolchain_type 14 0 0 0
toolchain 74 0 0 0
...

ASPECT COUNT ACTIONS BYTES EACH
traverse 85 81 262,432 3,087
spinlock14 35 66 524,112 14,974
spinlock15 35 66 0 0
...

First there are some common rules that we do not care about here, then we have the aspects: traverse is the memory-intensive aspect, which is applied on the command line, and spinlock<N> are the CPU-intensive rules, with identical implementations that are just numbered (there are 25 of them).

It is a little surprising that only one of them has allocations, and the action count for each aspect does not make sense either, as this is not a transitive aspect: it just runs a single action each time the rule is instantiated. The hypothesis is that this is a display problem, with code shared between rules. There are 25 rules with 25 distinct implementation functions, but they in turn call the same function that registers the action. So the "count" and "actions" columns are glued together, while the "bytes" are reported for just one of the rules (it would be bad if they were double-counted).

Either way, the total number of bytes does not add up to what we expect. Compare the output to the lower-bound determined before:

|                  | Memory for each target | Total  | Reported Total |
| ---------------- | ---------------------- | ------ | -------------- |
| Memory intensive | 0-17 MB                | 79 MB  | 262 kB         |
| CPU intensive    | 136 KB                 | 3.4 MB | 524 kB         |

Skylark Memory Profile


This is not part of the video.

The skylark memory profiler is much more advanced, and can be dumped after a successful build.

$ bazel $STARTUP_FLAGS --host_jvm_args=-Xmx"$mem" dump \
--skylark_memory="$dir/memory.pprof"
$ pprof manual/2023-10-30/10g-2/memory.pprof
Main binary filename not available.
Type: memory
Time: Oct 30, 2023 at 12:16pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2816.70kB, 73.34% of 3840.68kB total
Showing top 10 nodes out of 19
flat flat% sum% cum cum%
512kB 13.33% 13.33% 512kB 13.33% impl2
256.16kB 6.67% 20.00% 256.16kB 6.67% traverse_impl
256.11kB 6.67% 26.67% 256.11kB 6.67% _add_linker_artifacts_output_groups
256.09kB 6.67% 33.34% 256.09kB 6.67% alias
256.09kB 6.67% 40.00% 256.09kB 6.67% rule
256.08kB 6.67% 46.67% 256.08kB 6.67% to_list
256.06kB 6.67% 53.34% 256.06kB 6.67% impl7
256.04kB 6.67% 60.01% 256.04kB 6.67% _is_stamping_enabled
256.04kB 6.67% 66.67% 256.04kB 6.67% impl18
256.03kB 6.67% 73.34% 768.15kB 20.00% cc_binary_impl

Here the memory-intensive aspect shows up with 256kB, which is in line with the output from bazel dump --rules, but does not reflect the big allocations we know it makes.

Eclipse Memory Analyzer

The final tool we have investigated is the Java heap analysis tool Eclipse Memory Analyzer, which can easily be used with Bazel's --heap_dump_on_oom flag. On the other hand, it is a little trickier to get a heap dump from a successful build.

eclipse-analysis

Here we see the (very) big allocation clear as day, but have no information about its provenance.

We have not found how to track this back to a Skylark function, Skyframe evaluator or anything that could be cross-referenced with the profiling information.

Build Time

The next section of the talk shows the execution time of the build with varying memory limits.

combined

This is benchmarked with 5 data points for each memory limit, and the plot shows failure if there was at least one crash among the data points. There is a region where the build starts to succeed more and more often, but sometimes crashes, so the crash and no-crash graphs overlap a little; you want some leeway to avoid flaky builds from occasional out-of-memory crashes.

We see that the Skymeld graph requires a lot less memory than a regular build, that is because our big allocations are all tied to Action objects. Enabling Skymeld lets Bazel start executing Actions as soon as they are ready, so the resident set of Action objects does not grow so large, and the allocations can be freed much sooner.

Pessimization with limited memory

pessimization

We saw a hump in the build time for the Skymeld graph, where the builds did succeed in the 300 - 400 MB range, but the build speed gradually increased, reaching a plateau at around 500 MB. This is a pattern we have seen before, where more RAM, or more efficient rules can improve build performance.

This is probably because memory pressure and the Java garbage collector interfere with the Skyframe work. See Benjamin Peterson's great talk about Skyframe for more information.

Future work

example profile

This section details future work on more tools and signals that we can extract from Bazel's profile information (--profile=profile.gz --generate_json_trace_profile --noslim_profile). Written in the standard chrome://tracing format, it is easy to parse for both successful and failed builds.

This contains events for the garbage collector, and all executed Starlark functions.

These can be correlated to find which functions are active during, or before, garbage collection events. Additionally, one could collect this information for all failed builds, and see if some functions are overrepresented among the last active functions for each evaluator in the build.
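
As a starting point, a small sketch of such an analysis could look like the following; the event filters are assumptions that will need to be adapted to the exact names and categories your Bazel version writes to the profile.

import gzip
import json

# Load the gzipped Chrome trace profile. A crashed build may leave the JSON
# truncated, so cut back to the last complete event before parsing.
with gzip.open("profile.gz", "rt") as f:
    raw = f.read()
if not raw.rstrip().endswith("}"):
    raw = raw[: raw.rfind("},") + 1] + "]}"
events = json.loads(raw).get("traceEvents", [])

# Illustrative filters: adjust the substrings to match the garbage collection
# and Starlark function events in your own profiles.
gc = [e for e in events if "garbage" in e.get("name", "").lower()]
starlark = [e for e in events if "starlark" in e.get("cat", "").lower()]
print(f"{len(gc)} GC events, {len(starlark)} Starlark events")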

BazelCon 2023

· One min read
Fredrik Medley

Meroton visited BazelCon 2023 in Munich October 24-25, 2023. During the conference, we held three talks:

Other talks that mentioned Buildbarn were:

We are thankful for all amazing chats with the community and are looking forward to BazelCon 2024.

Buildbarn Block Sizes

· 3 min read
Benjamin Ingberg

When starting out with remote caching, an error you are likely to run into is:

java.io.IOException: com.google.devtools.build.lib.remote.ExecutionStatusException:
INVALID_ARGUMENT: Failed to store previous blob 1-<HASH>-<LARGE_NUM>:
Shard 1: Blob is <LARGE_NUM> bytes in size,
while this backend is only capable of storing blobs of up to 238608384 bytes in size

This is because your storage backend is too small. You are attempting to upload a blob larger than the largest blob accepted by your storage backend.

How do I fix it?

The largest blob you can store is the size of your storage device divided by the number of blocks in your device.

To store larger blobs, either increase the size of your storage device or decrease the number of blocks it is split into. A larger storage device will take more disk, while fewer blocks will decrease the granularity with which your cache works.

In bb-deployments this setting is found in storage.jsonnet.

{
  // ...
  contentAddressableStorage: {
    backend: {
      'local': {
        // ...
        oldBlocks: 8,
        currentBlocks: 24,
        newBlocks: 1,
        blocksOnBlockDevice: {
          source: {
            file: {
              path: '/storage-cas/blocks',
              sizeBytes: 8 * 1024 * 1024 * 1024,  // 8GiB
            },
          },
          spareBlocks: 3,
        },
        // ...
      },
    },
  },
  // ...
}

To facilitate getting started, bb-deployments emulates a block device by using an 8 GiB file. This file is big enough to fit most builds while not completely taking over the disk of a developer's machine.

The device is then split into 36 blocks (8+24+1+3), where each block can store a maximum of 238608384 bytes (8 GiB / 36, minus some alignment).
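
The arithmetic behind that limit is simply the device size divided by the total block count; as a quick check:

size_bytes = 8 * 1024**3        # the 8 GiB backing file from bb-deployments
blocks = 8 + 24 + 1 + 3         # old + current + new + spare
print(size_bytes // blocks)     # 238609294, i.e. ~238 MB per block before alignment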

In production it is preferable to use a large raw block device for this purpose.

What does new/old/current/spare mean?

In-depth documentation about all the settings is available in the configuration proto files.

In essence, the storage works as a ring buffer where the role of each block rotates. Consider a 5-block configuration with 1 old, 2 current, 1 new and 1 spare block.

diagram

As data in an old block is referenced, it gets copied into a new block. When the new block is full, the roles rotate.

diagram

There are some tradeoffs in behaviour to consider when choosing your block layout. Fewer blocks will allow larger individual blobs at the cost of granularity. Here is a quick summary of the meaning of the different fields.

  • Old - Region where read data is actively copied over to new blocks; too small a value and your device behaves more like a FIFO than an LRU cache, too large and your device does a lot of unnecessary copying.
  • Current - Stable region, should be the majority of your device.
  • New - Region for writing new data to, must be 1 for AC and should be 1-4 for CAS. Having a couple of new blocks allows data to be better spread out over the device so as to not expire at the same time.
  • Spare - Region for giving ongoing reads some time to finish before data starts getting overwritten.

Updates to Buildbarn deployment repo as of February 2023

· 4 min read
Benjamin Ingberg

The example configuration project for Buildbarn, bb-deployments, has received updates.

This is a continuation of last year's update article and is a high-level summary of what has happened from April 2022 up to 2023-02-16.

Let ReferenceExpandingBlobAccess support GCS

ReferenceExpandingBlobAccess already supports S3, and support has now been extended to Google Cloud Storage buckets.

Support for prefetching Virtual Filesystems

Running workers with FUSE allows inputs for an action to be downloaded on demand. This significantly reduces the amount of data that has to be sent to run overspecified actions, but it leads to poor performance for actions that read a lot of their inputs synchronously.

With the prefetcher most of these actions can be recognized and data which is likely to be needed can be downloaded ahead of time.

Support for sha256tree

Buildbarn has added support for sha256tree, which applies SHA-256 hashing over a tree structure similar to BLAKE3.

This algorithm allows large CAS objects to be chunked and decomposed with guaranteed data integrity while still using SHA-256 hardware instructions.

Completeness checking now streams REv2 Tree objects

This change introduces a small change to the configuration schema. If you previously had this:

backend: { completenessChecking: ... },

You will now need to write something along these lines:

backend: {
  completenessChecking: {
    backend: ...,
    maximumTotalTreeSizeBytes: 64 * 1024 * 1024,
  },
},

See also the bb-storage commit 1b84fa8.

Postponed healthy service status

The healthy and serving status, i.e. HTTP /-/healthy and grpc_health_v1.HealthCheckResponse_SERVING, are now postponed until the whole service is up and running. Before, the healthy status was potentially reported before starting to listen to the gRPC ports. Kubernetes will now wait until the service is up before forwarding connections to it.

Server keepalive parameter options

The option buildbarn.configuration.grpc.ServerConfiguration.keepalive_parameters can be used for L4 load balancing, to control when to ask clients to reconnect. For default values, see keepalive.ServerParameters.

Graceful termination of LocalBlobAccess

When SIGTERM or SIGINT is received, LocalBlobAccess now synchronizes data to disk before shutting down. Deployments using persistent storage will no longer observe data loss when restarting the bb_storage services.

Non-sector Aligned Writes to Block Device

Using sector aligned storage is wasteful for the action cache where the messages are typically very small. Buildbarn can now fill all the gaps when writing, making storage more efficient.

DAG Shaped BlobAccess Configuration

Instead of a tree shaped BlobAccess configuration, the with_labels notation allows a directed acyclic graph. See also the bb-storage commit cc295ad.

NFSv4 as worker filesystem

The bb_worker can now supply the working directory for bb_runner using NFSv4. Previously, FUSE and hard linking files from the worker cache were the only two options. This addition was mainly done to overcome the poor FUSE support on macOS.

The NFSv4 server in bb_worker only supports macOS at the moment. No effort has been spent to write custom mount logic for other systems yet.

Specify forwardMetadata with a JMESPath

Metadata forwarding is now more flexible; the JMESPath expressions can, for example, add authorization result data. The format is described in grpc.proto.

A common use case is to replace

{
  forwardMetadata: ["build.bazel.remote.execution.v2.requestmetadata-bin"],
}

with

{
  addMetadataJmespathExpression: '{
    "build.bazel.remote.execution.v2.requestmetadata-bin":
      incomingGRPCMetadata."build.bazel.remote.execution.v2.requestmetadata-bin"
  }',
}

Tracing: Deprecate the Jaeger collector span exporter

This option is deprecated, as Jaeger 1.35 and later provide native support for the OpenTelemetry protocol.

bb-deployments Ubuntu 22.04 Example Runner Image

The rbe_autoconfig in bazel-toolchains has been deprecated. In bb-deployments it has been replaced by the Act image ghcr.io/catthehacker/ubuntu:act-22.04, distributed by catthehacker, used for running GitHub Actions locally under Ubuntu 22.04.

bb-deployments Integration Tests

The bare deployment and the Docker Compose deployment now have test scripts that build and test @abseil-hello//:hello_test remotely, shut down, and then check for a 100% cache hit after restart. Another CI test checks for minimal differences between the Docker Compose deployment and the Kubernetes deployment.

If there are any other changes you feel deserve a mention, feel free to submit a pull request on GitHub using the link below.