Ouroboros DB Dev Journal: Erasure Coding

This is a Dev Journal for the Ouroboros DB Project.
I try to write down my thoughts and ideas in a somewhat structured way so I have a reference for later, and I publish it to give this information a chance to help others and to get feedback from the community.
If you have feedback, you can write to me on my Mastodon.

Please note that this is only a dev journal and not a blog post, so it may be a bit unstructured and not as polished, and it may contain typos and other errors.

TL;DR:

Today I worked on refactoring the architecture and on the feasibility of erasure coding for Ouroboros DB. I had some concerns regarding the potential index size and overhead that would result from it; as it turns out, it is not as bad as I thought, but I need a DHT for the index.

Architecture

I think it would be super handy to be able to run parts of data pipelines (e.g. storing files as chunked, compressed, encrypted and then erasure-coded blocks) in a distributed manner. For safety, and because I want to be able to run trustless nodes that can't see the raw data, the file chunking, compression and encryption would have to happen on the server the client currently talks to. It would also be possible to pre-chunk the file on the client and upload it to different nodes, but then the question is how easy it is to port the chunking algorithm to browser JS or WASM.
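
As a rough sketch of how I picture the split (none of these names exist in the codebase, it is just an illustration), the pipeline could be modeled as composable stages so a run can be cut at any stage boundary between the ingest node and trustless nodes:

package pipeline

// Stage transforms one representation of the data (e.g. raw chunks,
// compressed chunks, ciphertext, parity blocks) into the next.
type Stage func(in [][]byte) ([][]byte, error)

// Compose chains stages so the whole pipeline can run on one node or be
// split across nodes at any stage boundary.
func Compose(stages ...Stage) Stage {
	return func(in [][]byte) ([][]byte, error) {
		var err error
		for _, s := range stages {
			if in, err = s(in); err != nil {
				return nil, err
			}
		}
		return in, nil
	}
}

// Example split (chunk, compress, encrypt, erasureCode are placeholders for
// the real implementations): everything up to encryption has to stay on the
// node the client talks to, while erasure coding could be handed to trustless
// nodes since they only ever see ciphertext.
//
//	ingestNode := Compose(chunk, compress, encrypt)
//	trustlessNode := Compose(erasureCode)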

Modules

Refactoring of the code is needed to implement erasure coding; maybe we can even get rid of storing the chunk itself entirely, since the data is already contained in the parity blocks.

Maybe best to add the erasure coding to the StoreDataPipeline and add the needed erasure coding metadata to ChunkData.
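
Roughly, the ChunkData extension could look like this (a sketch with assumed field names, not the actual structs from the codebase):

package types

// ParityBlockRef points at one of the n parity blocks of a chunk.
type ParityBlockRef struct {
	ParityHash string // hash of the parity block, also the KV key
	SizeByte   int64
}

// ErasureCoding is the metadata the erasure coding step would attach.
type ErasureCoding struct {
	DataShards   int              // k
	TotalShards  int              // n
	ParityBlocks []ParityBlockRef // n entries, order matters for reconstruction
}

// ChunkData as it could look after the refactoring; the first two fields
// stand in for whatever ChunkData already contains today.
type ChunkData struct {
	ChunkHash string
	SizeByte  int64
	Erasure   ErasureCoding // new: filled in by the erasure coding pipeline step
}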

ASCII Art of the new architecture (click to get the .txt file):


Erasure Coding

For now I think it is okay to make the erasure coding a simple n=6, k=3, which would result in an average parity block size of 327,680 bytes or 0.31MB. A chunk that is 1.25MB on average would then take up about 1.875MB, which means a 50% overhead for the erasure coding; I think that is quite okay.
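
As a sketch of what this step could look like in Go, assuming the klauspost/reedsolomon library (just my assumption, nothing is decided) and reading n=6, k=3 as 3 data + 3 parity shards, so any 3 of the 6 blocks can be lost:

package main

import (
	"crypto/rand"
	"log"

	"github.com/klauspost/reedsolomon"
)

// encodeChunk turns one chunk into 6 blocks (3 data + 3 parity); any 3 of
// them are enough to reconstruct the chunk.
func encodeChunk(chunk []byte) ([][]byte, error) {
	enc, err := reedsolomon.New(3, 3) // 3 data shards, 3 parity shards
	if err != nil {
		return nil, err
	}
	// Split pads the chunk and returns 6 equally sized shards
	// (3 filled with data, 3 still empty).
	shards, err := enc.Split(chunk)
	if err != nil {
		return nil, err
	}
	// Encode fills the 3 parity shards.
	if err := enc.Encode(shards); err != nil {
		return nil, err
	}
	return shards, nil
}

func main() {
	chunk := make([]byte, 1310720) // ~1.25MB average chunk
	if _, err := rand.Read(chunk); err != nil {
		log.Fatal(err)
	}
	blocks, err := encodeChunk(chunk)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%d blocks of %d bytes each", len(blocks), len(blocks[0]))
}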

For the goal of storing 100TB, which is about 400M chunks, we would need 2,400,000,000 parity blocks, i.e. 2.4 billion.

If we consider the following overhead for the erasure coding:

Parity Meta Data
This is the metadata for a `parity block`:
{
"parityHash": "aeae379a6e857728e44164267fdb7a0e27b205d757cc19899586c89dbb221930f1813d02ff93a661859bc17065eac4d6edf3c38a034e6283a84754d52917e5b0",
"chunkHash": "aeae379a6e857728e44164267fdb7a0e27b205d757cc19899586c89dbb221930f1813d02ff93a661859bc17065eac4d6edf3c38a034e6283a84754d52917e5b0",
"sizeByte": 433000,
"lastChecked": 1724760126282,
"reblanceLog": [
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    }
],
"userLog": [
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    },
    {
    "time": 1724760126282,
    "from": "2deb000b57bfac9d72c14d4ed967b572"
    }
],
"this is for the KV Key": "aeae379a6e857728e44164267fdb7a0e27b205d757cc19899586c89dbb221930f1813d02ff93a661859bc17065eac4d6edf3c38a034e6283a84754d52917e5b0"
}
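
For reference, a Go struct that this JSON could map to (just a sketch mirroring the example above, nothing of this exists in the codebase yet):

package meta

// LogEntry is one rebalance or user log entry.
type LogEntry struct {
	Time int64  `json:"time"` // unix milliseconds
	From string `json:"from"` // node ID
}

// ParityBlockMeta mirrors the JSON above; the parityHash doubles as the KV key.
type ParityBlockMeta struct {
	ParityHash   string     `json:"parityHash"`
	ChunkHash    string     `json:"chunkHash"`
	SizeByte     int64      `json:"sizeByte"`
	LastChecked  int64      `json:"lastChecked"` // unix milliseconds
	RebalanceLog []LogEntry `json:"rebalanceLog"`
	UserLog      []LogEntry `json:"userLog"`
}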

We see that the parity block metadata adds under 2373 bytes of overhead, which is about 0.72% of the parity block size (very good). If one node had to store the entire hash table, we would be looking at a size of 5.18TB, which means we need a DHT for this to work. The same applies to the Events, which introduce more storage overhead (more in the DHT section).

DHT

We need additional DHT Metadata like this:

DHT Meta Data
{
  "parityHash": "aeae379a6e857728e44164267fdb7a0e27b205d757cc19899586c89dbb221930f1813d02ff93a661859bc17065eac4d6edf3c38a034e6283a84754d52917e5b0",
  "storingNodes": [
    {
      "node": "2deb000b57bfac9d72c14d4ed967b572",
      "lastValidated": 1724760126282
    },
    {
      "node": "2deb000b57bfac9d72c14d4ed967b572",
      "lastValidated": 1724760126282
    },
    {
      "node": "2deb000b57bfac9d72c14d4ed967b572",
      "lastValidated": 1724760126282
    },
    {
      "node": "2deb000b57bfac9d72c14d4ed967b572",
      "lastValidated": 1724760126282
    },
    {
      "node": "2deb000b57bfac9d72c14d4ed967b572",
      "lastValidated": 1724760126282
    }
  ]
}
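
And the corresponding sketch for the DHT entry (again only mirroring the JSON above, not an existing type):

package meta

// StoringNode records one node that claims to hold the parity block.
type StoringNode struct {
	Node          string `json:"node"`          // node ID
	LastValidated int64  `json:"lastValidated"` // unix milliseconds
}

// DHTEntry mirrors the JSON above; the parityHash is the DHT key.
type DHTEntry struct {
	ParityHash   string        `json:"parityHash"`
	StoringNodes []StoringNode `json:"storingNodes"`
}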

This example metadata has a size of 674 bytes, which is about 0.21% of the parity block size. The DHT metadata would require an additional 1.47TB of storage.

Overhead

The DHT metadata in combination with the parity block metadata comes to a total metadata overhead of 0.93% relative to the parity block size. At 100TB and 2.4 billion parity blocks, the entire metadata overhead would be 6.65TB.

Adding this to the 50% overhead from the erasure coding, we get a total overhead of 56.65% relative to the raw data size, assuming each chunk is fully utilized at 1,310,720 bytes, which has been achieved with 10MB files of random binary data. This results in an HDD-to-raw-data ratio of 43.35% (which is pretty good for an n=6, k=3 config with self-healing capabilities) for the following configuration (a small calculation sketch follows the list):

  • 100TB of raw data
  • data deduplication via BuzzHash into chunks
  • chunk compression via Zstd
  • chunk encryption via AES256
  • each chunk becomes 6 parity blocks of which 3 can be lost without data loss
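
To make these numbers easy to recompute for other configurations (which is what the spreadsheet in the conclusion is for), here is a minimal Go sketch of the calculation; the 1024^4 byte-to-TB conversion is my assumption, chosen because it reproduces the 5.18TB and 1.47TB figures above:

package main

import "fmt"

func main() {
	const (
		parityBlocks = 2.4e9            // 400M chunks * 6 blocks each
		parityMetaB  = 2373.0           // parity block metadata per block, bytes
		dhtMetaB     = 674.0            // DHT metadata per block, bytes
		ecOverhead   = 0.50             // 50% from the erasure coding itself
		rawTB        = 100.0            // raw data to store
		tb           = 1099511627776.0  // 1024^4, reproduces the 5.18/1.47 figures
	)

	parityMetaTB := parityBlocks * parityMetaB / tb // ~5.18 TB
	dhtMetaTB := parityBlocks * dhtMetaB / tb       // ~1.47 TB
	metaTB := parityMetaTB + dhtMetaTB              // ~6.65 TB

	totalOverhead := ecOverhead + metaTB/rawTB // ~56.65%
	fmt.Printf("metadata: %.2f TB, total overhead: %.2f%%\n", metaTB, totalOverhead*100)
}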

Conclusion

I have written a Google Calc Spreadsheet for it, which you can find here: ouroboros-db Overhead Calculator.

Though this spreadsheet is neat, I discovered that I need some surface plotting to find the best configuration for the erasure coding.
Sadly there is no way to do this in Google Calc, so I will need to write a small browser app for it… maybe I'll use Svelte, I haven't used it in a while now.

Further Reading

A good overview of what erasure coding is about: Erasure Coding for Distributed Systems
Also see the HN comments

Technology Distributed Systems Data Storage Erasure Coding Software Development Mia's Projects FLOSS Open Source Golang OuroborosDB Data Integrity Data Engineering CRC