Improving Data Availability in Decentralized Storage Systems
Doctoral thesis

View/ Open
Date
2023Metadata
Show full item recordCollections
- PhD theses (TN-IDE) [24]
Original version
Improving Data Availability in Decentralized Storage Systems by Racin Nygaard, Stavanger : University of Stavanger, 2023 (PhD thesis UiS, no. 693)Abstract
Preserving knowledge for future generations has been a primary concern for humanity since the dawn of civilization. State-of-the-art methods have included stone carvings, papyrus scrolls, and paper books. With each advance in technology, it has become easier to record knowledge. In the current digital age, humanity may preserve enormous amounts of knowledge on hard drives with the click of a button.
The aggregation of several hard drives into a computer forms the basis for a storage system. Traditionally, large storage systems have comprised many distinct computers operated by a single administrative entity.
With the rise in popularity of blockchain and cryptocurrencies, a new type of storage system has emerged. This new type of storage system is fully decentralized and comprises a network of untrusted peers cooperating to act as a single storage system. During upload, files are split into chunks and distributed across a network of peers. These storage systems encode files using Merkle trees, a hierarchical data structure that provides integrity verification and lookup services.
While decentralized storage systems are popular and have a user base in the millions, many technical aspects are still in their infancy. As such, they have yet to prove themselves viable alternatives to traditional centralized storage systems.
In this thesis, we contribute to the technical aspects of decentralized storage systems by proposing novel techniques and protocols. We make significant contributions with the design of three practical protocols that each improve data availability in different ways.
Our first contribution is Snarl and entangled Merkle trees. Entangled Merkle trees are resilient data structures that decrease the impact hierarchical dependencies have on data availability. Whenever a chunk loss is detected, Snarl uses the entangled Merkle trees to find parity chunks to repair the lost chunk. Our results show that by encoding data as an entangled Merkle tree and using Snarl’s repair algorithm, the storage utilization in current systems could be improved by over five times, with improved data availability.
Second, we propose SNIPS, a protocol that efficiently synchronizes the data stored on peers to ensure that all peers have the same data. We designed a Proof of Storage-like construction using a Minimal Perfect Hash Function. Each peer uses the PoS-like construction to create a storage proof for those chunks it wants to synchronize. Peers exchange storage proofs and use them to efficiently determine which chunks they are missing. The evaluation shows that by using SNIPS, the amount of synchronization data can be reduced by three orders of magnitude in current systems.
Lastly, in our third contribution, we propose SUP, a protocol that uses cryptographic proofs to check if a chunk is already stored in the network before doing wasteful uploads. We show that SUP may reduce the amount of data transferred by up to 94 % in current systems.
The protocols may be deployed independently or in combination to create a decentralized storage system that is more robust to major outages. Each of the protocols has been implemented and evaluated on a large cluster of 1,000 peers.
Description
PhD thesis in Information technology
Has parts
Paper 1: Nygaard, R., Estrada-Galiñanes, V., Meling, H. (2021) Snarl: Entangled Merkle Trees for Improved File Availability and Storage Utilization. Proceedings of the 22nd International Middleware Conference. Middleware ’21. Québec city, Canada: Association for Computing Machinery, 2021, pp. 236–247. URL: https://doi.org/10.1145/3464298.3493397Paper 2: Nygaard, R., Meling, H. SNIPS: Succinct Proof of Storage for Efficient Data Synchronization in Decentralized Storage Systems. To be submitted.
Paper 3: Nygaard, R., Meling, H., Olsen, J.I. (2023) Cost-effective Data Upkeep in Decentralized Storage Systems. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing. Tallinn, Estonia, 2023. URL: https://doi.org/10.1145/3555776.3577728
Paper 4: Nygaard, R. (2022) Lessons Learned from a Bare-metal Evaluation of Erasure Coding Algorithms in P2P Networks. in: arXiv abs/2208.12360 (2022). URL: https://arxiv.org/abs/2208.12360
Publisher
University of Stavanger, NorwaySeries
PhD thesis UiS;;693