Optimized Data Deduplication Strategy with Distributed Bloom Filters for Efficient Routing and Load Balancing in Clustered Environments

Teja Chalikanti; Bobbili Sreeja Reddy

Publication Date: 2023/09/13

Abstract: This paper presents a data routing strategy for clustered deduplication systems based on a distributed Bloom filter. Data deduplication reduces storage requirements and improves resource utilization, but the storage and computing capacity of a single node is limited. Cluster deduplication overcomes this limit, yet it introduces two new challenges: a declining deduplication rate and load imbalance among storage nodes. To address them, the study proposes a routing strategy built on distributed Bloom filters. The strategy uses the "super chunk" as the basic data routing unit, which improves overall system throughput. Following Broder's theorem, the k smallest fingerprints are selected as the super chunk's features and sent to the storage nodes. Bloom filter comparisons then determine the best routing node, taking into account node storage capacity and memory maintenance. System prototypes were designed and implemented, experiments were used to determine suitable parameters for the various routing strategies, and the strategies were then tested with those parameters. The results support the feasibility of the proposed strategy both theoretically and empirically.
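
The routing idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `BloomFilter`, `Node`, `super_chunk_features`, and `route`, the SHA-1 fingerprints, the filter size, and in particular the scoring rule (feature hits weighted by the fraction of free capacity) are all assumptions made for illustration; the abstract does not specify them.

```python
import hashlib
import heapq


class BloomFilter:
    """Minimal Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def fingerprint(chunk):
    """Chunk fingerprint; SHA-1 is an assumption, not taken from the paper."""
    return hashlib.sha1(chunk).digest()


def super_chunk_features(chunks, k):
    """Return the k smallest distinct fingerprints of a super chunk."""
    return heapq.nsmallest(k, {fingerprint(c) for c in chunks})


class Node:
    """Hypothetical storage node holding a Bloom filter of stored fingerprints."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.used = 0
        self.bloom = BloomFilter()

    def store(self, chunks):
        for c in chunks:
            fp = fingerprint(c)
            # A real system would confirm a Bloom hit against an exact index,
            # since Bloom filters can report false positives.
            if fp not in self.bloom:
                self.bloom.add(fp)
                self.used += len(c)


def route(features, nodes):
    """Pick the node with the most feature hits, weighted by free capacity.

    The weighting is an assumed stand-in for the paper's capacity-aware choice.
    """

    def score(node):
        hits = sum(1 for f in features if f in node.bloom)
        free = 1.0 - node.used / node.capacity
        return (hits * free, free)  # ties fall back to the emptier node

    return max(nodes, key=score)


nodes = [Node("n1", 10**6), Node("n2", 10**6)]
super_chunk = [b"chunk-%d" % i for i in range(8)]
nodes[0].store(super_chunk)
target = route(super_chunk_features(super_chunk, 4), nodes)
print(target.name)
```

In this sketch a super chunk whose features already appear in a node's filter is routed to that node, which preserves the deduplication rate, while a super chunk with no matches falls back to the node with the most free capacity, which supports load balancing.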

Keywords: Data Routing, Load Balancing, Clustered Deduplication, Distributed Bloom Filters, Super Chunk, Deduplication Rate, Communication Overhead, Storage System, Cloud Computing, System Throughput.

DOI: https://doi.org/10.5281/zenodo.8340645

PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT23SEP024.pdf
