Go IPFS v0.12.0 Release Notes

Release Date: 2022-02-17
  • We're happy to announce go-ipfs 0.12.0. This release switches the storage of IPLD blocks to be keyed by multihash instead of CID.

    As usual, this release includes important fixes, some of which may be critical for security. Unless the fix addresses a bug being exploited in the wild, the fix will not be called out in the release notes. Please make sure to update ASAP. See our release process for details.

    BREAKING CHANGES

    • ipfs refs local will now list all blocks as if they were raw CIDv1 instead of with whatever CID version and IPLD codecs they were stored with. All other functionality should remain the same.

    Note: This change also affects ipfs-update, so if you use that tool to manage your go-ipfs installation, grab ipfs-update v1.8.0 from dist.

    Keep reading to learn more details.

    Highlights

    There is only one change since 0.11:

    Blockstore migration from full CID to Multihash keys

    We are switching the default low-level datastore to be keyed only by the Multihash part of the CID, deduplicating some blocks in the process. The blockstore will become codec-agnostic.

    Rationale

    The blockstore/datastore layers are not concerned with data interpretation, only with storage of binary blocks and verification that the Multihash they are addressed with (which comes from the CID) matches the block. In fact, different CIDs with different codec prefixes may carry the same multihash and reference the same block. Carrying the CID abstraction so low on the stack means potentially fetching and storing the same blocks multiple times just because they are referenced by different CIDs. Prior to this change, a CIDv1 with a dag-cbor codec and a CIDv1 with a raw codec, both containing the same multihash, would result in two identical blocks being stored. A CIDv0 and a CIDv1 both referring to the same dag-pb block would also result in two copies.
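
To make the duplication concrete, here is a minimal sketch (not go-ipfs code) that builds two CIDv1 strings by hand for the same block, one with the raw codec and one with dag-cbor. It assumes a sha2-256 multihash and base32 multibase encoding; the helper names (`multihash_sha256`, `cidv1`, `multihash_of`) are made up for this illustration.

```python
import base64
import hashlib

# Multicodec codes for the codecs mentioned above, plus sha2-256.
RAW, DAG_PB, DAG_CBOR = 0x55, 0x70, 0x71
SHA2_256 = 0x12


def multihash_sha256(block: bytes) -> bytes:
    """Return <hash-code><digest-length><digest> for a sha2-256 digest."""
    digest = hashlib.sha256(block).digest()
    return bytes([SHA2_256, len(digest)]) + digest


def cidv1(codec: int, mh: bytes) -> str:
    """Build a base32 CIDv1 string: multibase 'b' + <version><codec><multihash>."""
    # The version and these codec codes are below 0x80, so each varint is one byte.
    raw = bytes([0x01, codec]) + mh
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def multihash_of(cid: str) -> bytes:
    """Recover the multihash from a CIDv1 string produced by cidv1() above."""
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the base32 padding
    return base64.b32decode(body)[2:]  # drop the version and codec bytes


block = b"hello"
mh = multihash_sha256(block)
cid_raw = cidv1(RAW, mh)
cid_cbor = cidv1(DAG_CBOR, mh)

print(cid_raw != cid_cbor)                                # True: two distinct CIDs
print(multihash_of(cid_raw) == multihash_of(cid_cbor))    # True: one multihash
```

A store keyed by full CID would hold this block twice, while a store keyed by multihash holds it once.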

    How migration works

    In order to perform the switch, and start referencing all blocks by their multihash, a migration will occur on update. This migration will take the repository version from 11 (current) to 12.

    One thing to note is that any content addressed with CIDv0 (all the hashes that start with Qm..., the current default in go-ipfs) does not need any migration, as CIDv0s are raw multihashes already. This means the migration will be very lightweight for the majority of users.

    The migration process will take care of re-keying any CIDv1 block so that it is only addressed by its multihash. Large nodes with lots of CIDv1-addressed content will need to go through a heavier process as the migration happens. This is how the migration works:

    1. Phase 1: The migration script will perform a pass over every block in the datastore and will add all CIDv1s found to a file named 11-to-12-cids.txt, in the go-ipfs configuration folder. Nothing is written to the datastore in this first phase; it only serves to identify keys that will be migrated in phase 2.

    2. Phase 2: The migration script will perform a second pass where every CIDv1 block will be read and re-written with its raw multihash as key. There is 1 worker performing this task, although more can be configured. Every 100MiB-worth of blocks (this is configurable), each worker will trigger a datastore "sync" (to ensure all written data is flushed to disk) and delete the CIDv1-addressed blocks that were just renamed. This provides a good compromise between speed and resources needed to run the migration.

    At every sync, the migration emits a log message showing how many blocks need to be rewritten and how far the process is.
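
The two phases can be sketched against an in-memory dictionary standing in for the datastore. This is a simplified, single-worker illustration and not the code of the migration itself: keys are binary CIDs, every multihash is assumed to be sha2-256, the list `cid_log` stands in for 11-to-12-cids.txt, and all function names are hypothetical.

```python
import hashlib

SYNC_SIZE_BYTES = 100 * 1024 * 1024  # 100MiB, the default sync size


def multihash(block: bytes) -> bytes:
    digest = hashlib.sha256(block).digest()
    return bytes([0x12, len(digest)]) + digest


def cidv1_key(codec: int, block: bytes) -> bytes:
    return bytes([0x01, codec]) + multihash(block)


def is_cidv1(key: bytes) -> bool:
    # Simplification: a binary CIDv1 starts with the version byte 0x01,
    # while a bare sha2-256 multihash (CIDv0) starts with 0x12.
    return key[:1] == b"\x01"


def phase1(store: dict, cid_log: list) -> None:
    """Read-only pass over the store: record every CIDv1 key."""
    for key in store:
        if is_cidv1(key):
            cid_log.append(key)  # append only, never overwrite the log


def sync_and_delete(store: dict, pending: list) -> None:
    # A real datastore would flush written data to disk before deleting.
    for old_key in pending:
        store.pop(old_key, None)
    pending.clear()


def phase2(store: dict, cid_log: list, sync_size: int = SYNC_SIZE_BYTES) -> None:
    """Re-key every logged block by multihash, deleting old keys at each sync."""
    pending, written = [], 0
    for key in cid_log:
        if key not in store:  # already re-keyed by an earlier attempt
            continue
        block = store[key]
        store[key[2:]] = block  # drop <version><codec>, keep the multihash
        pending.append(key)
        written += len(block)
        if written >= sync_size:
            sync_and_delete(store, pending)
            written = 0
    sync_and_delete(store, pending)  # final sync for the remainder


store = {
    multihash(b"a"): b"a",        # CIDv0-style key, left untouched
    cidv1_key(0x55, b"b"): b"b",  # raw CIDv1
    cidv1_key(0x71, b"b"): b"b",  # dag-cbor CIDv1 for the same block
}
cid_log = []
phase1(store, cid_log)
phase2(store, cid_log, sync_size=1)
print(len(cid_log), len(store))  # 2 2
```

The two CIDv1 entries collapse into a single multihash-keyed block, which is the deduplication described earlier.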

    FlatFS specific migration

    For those using a single FlatFS datastore as their backing blockstore (i.e. the default behavior), the migration (but not reversion) will take advantage of the ability to easily move/rename the blocks to improve migration performance.

    Unfortunately, other common datastores do not support renames, which is what makes this FlatFS specific. If you are running a large custom datastore that supports renames you may want to consider running a fork of fs-repo-11-to-12 specific to your datastore.

    If you want to disable this behavior, set the environment variable IPFS_FS_MIGRATION_11_TO_12_ENABLE_FLATFS_FASTPATH to false.

    Migration configuration

    For those who want to tune the migration more precisely for their setups, there are two environment variables to configure:

    • IPFS_FS_MIGRATION_11_TO_12_NWORKERS : an integer describing the number of migration workers - defaults to 1
    • IPFS_FS_MIGRATION_11_TO_12_SYNC_SIZE_BYTES : an integer describing the number of bytes after which migration workers will sync - defaults to 104857600 (i.e. 100MiB)
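
As an illustration, the variables can be assembled into an environment for the process that runs the migration. The helper `migration_env` is hypothetical; only the variable names and defaults come from these notes. How the migration is started is also an assumption here (for example, `ipfs daemon --migrate` launched with this environment).

```python
import os


def migration_env(nworkers=1, sync_size_bytes=104857600,
                  flatfs_fastpath=True, base=None):
    """Return a copy of the environment with the migration variables set."""
    if nworkers < 1:
        raise ValueError("nworkers must be at least 1")
    if sync_size_bytes < 1:
        raise ValueError("sync_size_bytes must be positive")
    env = dict(os.environ if base is None else base)
    env["IPFS_FS_MIGRATION_11_TO_12_NWORKERS"] = str(nworkers)
    env["IPFS_FS_MIGRATION_11_TO_12_SYNC_SIZE_BYTES"] = str(sync_size_bytes)
    if not flatfs_fastpath:
        # Only needed to opt out of the FlatFS rename fast path.
        env["IPFS_FS_MIGRATION_11_TO_12_ENABLE_FLATFS_FASTPATH"] = "false"
    return env


# Example: 4 workers, syncing every 50MiB.
env = migration_env(nworkers=4, sync_size_bytes=50 * 1024 * 1024, base={})
for name, value in sorted(env.items()):
    print(f"{name}={value}")
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when launching the command that triggers the migration.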

    Migration caveats

    Large repositories with very large numbers of CIDv1s should be mindful of the migration process:

    • We recommend ensuring that IPFS runs with an appropriate (high) file-descriptor limit, particularly when Badger is used as the datastore backend. Badger is known to open many tables when experiencing a high number of writes, which may trigger "too many open files" errors during the migration. If this happens, the migration can be retried with a higher FD limit (see below).
    • Migrations using the Badger datastore may not immediately reclaim the space freed by the deletion of migrated blocks, thus space requirements may grow considerably. A periodic Badger-GC is run every 2 minutes, which will reclaim space used by deleted and de-duplicated blocks. The last portion of the space will only be reclaimed after go-ipfs starts (the Badger-GC cycle will trigger after 15 minutes).
    • While there is a revert process detailed below, we recommend keeping a backup of the repository, particularly for very large ones, in case an issue happens, so that the revert can happen immediately and cases of repository corruption due to crashes or unexpected circumstances are not catastrophic.
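
For the file-descriptor caveat, a small Unix-only sketch can check the current limit and raise the soft limit toward a chosen minimum. The function name `ensure_fd_limit` and the idea of doing this from Python are assumptions for illustration. The change applies only to the calling process and its children, so the migration would have to be launched from that same process (or the limit raised in the shell with `ulimit -n` instead).

```python
import resource


def ensure_fd_limit(minimum: int) -> int:
    """Raise the soft open-files limit to at least `minimum`, capped by the hard limit.

    Returns the resulting soft limit (resource.RLIM_INFINITY if unlimited).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < minimum:
        # An unprivileged process may raise its soft limit up to the hard limit.
        target = minimum if hard == resource.RLIM_INFINITY else min(minimum, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"current open-files limit: soft={soft} hard={hard}")
```

Calling `ensure_fd_limit(8192)` before spawning the migration is one option; the value 8192 is only an example, not a recommendation from these notes.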

    Migration interruptions and retries

    If a problem occurs during the migration, it is possible to simply restart and retry it:

    1. Phase 1 will never overwrite the 11-to-12-cids.txt file, but only append to it (so that a list of things we were supposed to have migrated during our first attempt is not lost - this is important for reverts, see below).

    2. Phase 2 will continue re-keying blocks that were not re-keyed during previous attempts.

    Migration reverts

    It is also possible to revert the migration after it has succeeded, for example to go to a previous go-ipfs version (<=0.11), even after starting and using go-ipfs in the new version (>=0.12). The revert process works as follows:

    1. The 11-to-12-cids.txt file is read, which has the list of all the CIDv1s that had to be rewritten for the migration.

    2. A CIDv1-addressed block is written for every item on the list. This work is performed by 1 worker (configurable), syncing every 100MiB (configurable).

    3. It is ensured that every CIDv1 pin and every CIDv1 reference in MFS are also written as CIDv1-addressed blocks, regardless of whether they were part of the original migration or were added later.

    The revert process does not delete any blocks; it only makes sure that blocks that were accessible with CIDv1s before the migration are again keyed with CIDv1s. This may result in a datastore becoming twice as large (i.e. if all the blocks were CIDv1-addressed before the migration). It is done this way to cover corner cases: users can add CIDv1s after the migration, which may reference blocks that existed as CIDv0 before the migration. The revert aims to ensure that no data becomes unavailable on downgrade.

    While go-ipfs will auto-run the migration for you, it will not run the reversion. To do so you can download the latest migration binary or use ipfs-update.
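
The size effect of a revert can be seen in a small sketch, again using a dictionary as a stand-in datastore with binary keys and sha2-256 multihashes. The `revert` function and the `extra_cids` parameter (standing in for CIDv1 pins and MFS references) are hypothetical names, not the migration tool's API.

```python
import hashlib


def multihash(block: bytes) -> bytes:
    digest = hashlib.sha256(block).digest()
    return bytes([0x12, len(digest)]) + digest


def revert(store: dict, cid_log: list, extra_cids=()) -> int:
    """Write a CIDv1-keyed copy for every listed CID. Nothing is deleted."""
    restored = 0
    for cid in [*cid_log, *extra_cids]:
        block = store.get(cid[2:])  # look up by multihash (version and codec stripped)
        if block is not None and cid not in store:
            store[cid] = block
            restored += 1
    return restored


RAW = 0x55
blocks = [b"x", b"y"]
store = {multihash(b): b for b in blocks}  # state after the migration
cid_log = [bytes([0x01, RAW]) + multihash(b) for b in blocks]

before = len(store)
restored = revert(store, cid_log)
print(before, len(store), restored)  # 2 4 2
```

Because every block here was CIDv1-addressed before the migration, the store doubles in size, matching the worst case described above.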

    Custom datastores

    As with previous migrations, if you work with custom datastores and want to leverage the migration you can run a fork of fs-repo-11-to-12 specific to your datastore. The repo includes instructions on building for different datastores.

    For this migration, if your datastore has fast renames you may want to consider writing some code to leverage the particular efficiencies of your datastore similar to what was done for FlatFS.

    Changelog

    Contributors

    Contributor        Commits   Lines ±      Files Changed
    Gus Eggert         10        +333/-321    24
    Steven Allen       7         +289/-190    13
    Hector Sanjuan     9         +134/-109    18
    Adin Schmahmann    11        +179/-55     21
    Raúl Kripalani     2         +152/-42     5
    Daniel Martí       1         +120/-1      1
    frrist             1         +95/-13      2
    Alex Trottier      2         +22/-11      4
    Andrey Petrov      1         +32/-0       1
    Lucas Molas        1         +18/-7       2
    Marten Seemann     2         +11/-7
    whyrusleeping      1         +10/-0       1
    web3-bot           3         +9/-0        3
    postables          1         +5/-3        1
    Dr Ian Preston     1         +4/-0        1