
Bazel 6 Errors when using Build without the Bytes

· 4 min read
Benjamin Ingberg

**UPDATE:** Bazel 6.1.0 ships a workaround that prevents the permanent build failure loop, and Bazel 7 has a proper fix with the introduction of --experimental_remote_cache_ttl.


Starting from v6.0.0, Bazel crashes when building without the bytes, because it sets --experimental_action_cache_store_output_metadata when using --remote_download_minimal or --remote_download_toplevel.

Effectively, this leads to Bazel getting stuck in a build failure loop when your remote cache evicts an item you need.

```
developer@machine:~$ bazel test @abseil-hello//:hello_test --remote_download_minimal

[0 / 6] [Prepa] BazelWorkspaceStatusAction stable-status.txt
ERROR: /home/developer/.cache/bazel/_bazel_developer/139b99b96c4ab6cba51221931a36e346/external/abseil-hello/BUILD.bazel:26:8: Linking external/abseil-hello/hello_test failed: (Exit 34): 42 errors during bulk transfer:
java.io.FileNotFoundException: /home/developer/.cache/bazel/_bazel_developer/139b99b96c4ab6cba51221931a36e346/execroot/cache_test/bazel-out/k8-fastbuild/bin/external/com_google_absl/absl/base/_objs/base/spinlock.pic.o (No such file or directory)
...
Target @abseil-hello//:hello_test failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3.820s, Critical Path: 0.88s
INFO: 5 processes: 4 internal, 1 remote.
FAILED: Build did NOT complete successfully
@abseil-hello//:hello_test FAILED TO BUILD
```

The key here is (Exit 34): xx errors during bulk transfer; 34 is Bazel's exit code for Remote Error.

The recommended solution is to set the flag explicitly to false, with --experimental_action_cache_store_output_metadata=false. To quickly solve the issue on your local machine, you can run bazel clean. However, this will just push the error into the future.
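If you are on an affected Bazel 6 release, a minimal sketch of a workspace-wide workaround is to put the flags in your project's .bazelrc (adjust to your own setup):

```
# .bazelrc
# Keep building without the bytes ...
build --remote_download_minimal
# ... but opt out of the persistent existence cache that causes the failure loop.
build --experimental_action_cache_store_output_metadata=false
```

On Bazel 6.1.0 and later the workaround mentioned in the update above applies, and Bazel 7 adds --experimental_remote_cache_ttl as a proper fix.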

The bug is independent of which remote cache system you use and is tracked at GitHub.

Background

When performing an analysis of what to build, Bazel asks the remote cache which items have already been built and only schedules build actions for items that do not already exist in the cache. When running a build without the bytes¹, the intermediate results are not downloaded to the client.

Should the cached items be evicted, Bazel runs into an unrecoverable error: it wants the remote system to perform an action using inputs from the cache, but they have disappeared, and Bazel cannot upload them, as they were never downloaded to the client. The build then dutifully crashes (some work has been put into trying to resolve this on the Bazel side, but it has not been considered a priority).

This puts an implicit requirement on the remote cache implementation: artifacts need to be kept for as long as Bazel needs them. The problem is that this is an undefined period of time. Bazel will not proactively check whether an item still exists, nor inform the cache in any other way that it will need the item in the future.

Before v6.0.0

Bazel tied the lifetime of its record of which items already exist in the cache (the existence cache) to the analysis cache. Whenever the analysis cache was purged, the existence cache was dropped with it.

The analysis cache is purged quite frequently, so in practice it was rare for the existence cache to be out of date. Furthermore, since the existence cache was an in-memory cache, a Bazel crash would forcefully evict it, thereby fixing the issue.

After v6.0.0

With the --experimental_action_cache_store_output_metadata flag enabled by default, the existence cache is instead committed to disk and never dropped during normal operation.

This means two things:

  1. The implied retention requirement on the remote cache is effectively infinite.
  2. Should this requirement not be met, the build will fail, and since the existence cache is committed to disk, Bazel will just fail again the next time you run it.

Currently the only user-facing way of purging the existence cache is to run bazel clean, which is generally considered an anti-pattern.

If you are using bb-clientd with --remote_output_service to run builds without the bytes (an alternative strategy to --remote_download_minimal), this issue does not affect you.

Footnotes

  1. When using Bazel with remote execution, builds are run in a remote server cluster, so there is no need for each developer to download the partial results of the build. Bazel calls this feature Remote Builds Without the Bytes. The progress of the feature can be tracked at GitHub.

BuildBar at the Meroton Office

· One min read
Benjamin Ingberg

After BazelCon we've all gotten a bit giddy, and you might be excited about how the information presented there could impact your development workflow. For that purpose, we're inviting everyone in the wider Bazel community to an open BuildBar at the Meroton offices.

Come over and digest BazelCon with high-level technical discussions of the talks, great company, a pleasant atmosphere, and also beer.

Feel welcome to come over this Thursday, the first of December, from 16:00 to 20:00.

Directions

You'll find us at our Linköping offices at Fridtunagatan 33. Currently there is some construction, but follow the red lines and you'll be fine.

Fridtunagatan 33 Linköping

Remote Executors for the Free Environment

· 2 min read
Benjamin Ingberg

We've performed some updates to the free tier and our pricing model.

Shared Cache

The free tier has been upgraded into an environment with a 1 TB cache. This means that you can use the free tier without worrying about hitting any limits and without any setup process.

Do note that while the cached items will only be accessible with the correct API keys, the storage area can be reclaimed by anyone. That is, as the cache becomes full, your items might be dropped.

With the current churn we expect any items in the cache to last at least a week. However, you should always treat items in the cache as something that might be dropped at any moment. The purpose and design of the cache is to maximize build performance, not to provide storage.

Shared executors

New for the free environment is the introduction of remote execution, which was previously only available to paying customers.

If you're using Bazel you can simply add the --remote_executor=<...> flag and your builds will be done remotely.
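As a rough sketch, assuming a hypothetical endpoint and authentication header (substitute the values from your own environment), the flags can go in a .bazelrc:

```
# .bazelrc -- the endpoint and header below are placeholders, not real values.
build --remote_executor=grpcs://remote.example.com
build --remote_cache=grpcs://remote.example.com
build --remote_header=x-api-key=<your-api-key>
```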

The free environment has access to 64 shared executors.

Starter tier

If you need dedicated storage for evaluation purposes we also offer a starter tier. The starter tier is a streamlined environment with 100 GB of dedicated cache and access to the same 64 executors as the free tier.

This allows you to try out remote execution and caching without being disturbed by others. While the 100 GB dedicated cache is only used by you, and therefore never overwritten by anyone else's build, it is still a cache: the system will still drop entries if it determines that doing so is useful for maximizing performance.

Need more executors?

At the moment the starter tier has a fixed number of shared executors. If you need a scalable solution, contact us to set up a custom Buildbarn environment of any size.

Tips, Tricks & Non-Deterministic Builds

· 2 min read
Benjamin Ingberg

When you have a remote build and cache cluster, it can sometimes be hard to track down what exactly is using all of your build resources. To help with this we have started a tips and tricks section in the documentation, where we will share methods we use to debug and resolve slow builds.

The first section is about build non-determinism. Ideally, your build actions should produce the same output when run with the same input; in practice this is sometimes not the case. If you are lucky, a non-deterministic action won't even be noticed: as long as its inputs are unchanged, it won't be rebuilt.

If you're not so lucky, the non-determinism stems from a bug in the implementation and you should definitely pay attention to it. But how do you know which actions, if any, are non-deterministic?

This is not trivial, but we have added a server-side feature which allows detection of non-determinism with virtually no effort on your part.

Once activated, it reruns a configured fraction of your actions and automatically flags those that produce different outputs. The scheduling is done outside of your Bazel invocation, so your build throughput is unaffected, at the cost of some additional resource consumption. We suggest 1%, which increases your resource use by only a trivial amount, but you could of course set it to 100%, which would double the cost of your builds.
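If you want a rough manual spot check on the client side instead, here is a sketch using Bazel's execution log; this is not the server-side feature described above, and the target pattern and file paths are just examples:

```
# Build the same targets twice and record execution logs.
# --noremote_accept_cached forces the actions to actually re-execute
# instead of being served from the remote cache.
bazel clean
bazel build //... --noremote_accept_cached --execution_log_json_file=/tmp/execlog1.json
bazel clean
bazel build //... --noremote_accept_cached --execution_log_json_file=/tmp/execlog2.json

# Actions whose output digests differ between the two logs are non-deterministic.
# Note: entries are not guaranteed to appear in the same order, so a careful
# comparison may need to match entries by target and command first.
diff /tmp/execlog1.json /tmp/execlog2.json
```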

Updates to Buildbarn deployment repo as of April 2022

· 2 min read
Benjamin Ingberg

The sample configuration project for Buildbarn was recently updated after a long hiatus. To help people understand which changes have been made, here is a high-level summary.

April 2022 Updates

This includes updates to Buildbarn since December 2020.

Authorizer Overhaul

Authorizers have been overhauled to be more flexible; authorization is now configured as part of each individual cache and execution configuration.

JWT authorization bearer tokens have been added as an authorization method.

Hierarchical Blob Access

Hierarchical blob access allows blobs in instance name foo/bar to be accessed from instance foo/bar/baz, but not from instance foo or foo/qux.
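As an illustration of the rule above (the instance names are just the examples from this paragraph), a client selects its instance name with a standard Bazel flag:

```
# Builds against instance name foo/bar/baz may reuse blobs stored under foo/bar ...
bazel build //... --remote_instance_name=foo/bar/baz
# ... while builds against foo (or foo/qux) cannot see blobs stored under foo/bar.
bazel build //... --remote_instance_name=foo
```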

Action Result Expiration

An expiry can be added to action results, which lets the action cache purge the result of an execution that was performed too far in the past. This can be used to ensure that all targets are rebuilt periodically, even if they are accessed frequently enough to otherwise never be purged from the cache.

Read Only Cache Replicas

Cache read traffic can now be sent to a read-only replica which is periodically probed for availability.

Concurrency Limiting Blob Replication

Limit the number of concurrent replications to prevent network starvation.

Run Commands as Another User

Allows commands to be run as a different user. On most platforms this means the bb-runner instance must run as root.

Size Class Analysis

Allows executors of different size classes to be used. The scheduler will attempt to utilize executors efficiently, but there is an inherent tradeoff between throughput and latency. Once configured, the scheduler automatically keeps track of which actions are best run on which executors.

Execution Routing Policy

The scheduler accepts an execution routing policy configuration that allows it to determine how to route builds to different executors.

If you see any other changes you feel should get a mention, feel free to submit a pull request on GitHub using the link below.

Purpose of the Articles

· One min read
Benjamin Ingberg

The purpose of these articles is to have a freeform area for discussing ideas, technical issues, solutions, and news in an in-depth, relaxed manner. It is not to serve as reference material; structured reference material should be available in the documentation section.

The article format allows more in-depth discussions of recurring subjects. In contrast to the documentation, published articles aren't changed; if a subject requires a revisit in the future, we publish a new post and add references to the old one.

If you see any errors, feel free to submit a pull request on GitHub using the link below.