Anonymization over Bigdata Using MapReduce Algorithm with Secured Storage Service in DataCloud
Cloud computing provides a promising IT infrastructure for big data processing in various sectors. Data sets often contain sensitive information, so privacy must be ensured when sharing or releasing this data to a third party. A traditional technique for satisfying such a privacy model is to anonymize the data via generalization, but this technique fails for large-scale data processing. At the same time, information distribution over the Internet has increased drastically, along with users' demand for information. Internet traffic has become more varied, ranging from multimedia and web pages to real-time streaming, and the existing Internet architecture cannot adapt well to these changes. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model and model the problem of local recoding as a proximity-aware clustering problem. To address this problem, we propose a scalable two-phase clustering approach consisting of a t-ancestor clustering algorithm (similar to k-means) and a proximity-aware agglomerative clustering algorithm. We also include a concept of metadata that holds information about the public data while anonymizing the sensitive areas. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in the cloud. We also plan an approach that improves the capability of defending against proximity privacy breaches, as well as the scalability and time efficiency of local-recoding anonymization. In addition, we propose DataClouds, a novel architecture for the future Internet based on information-centric networking (ICN), to better accommodate data-centric services.
Unlike existing ICN-based architectures, we take the sharing nature of data-centric services under the IoT into consideration and introduce logically and physically formed communities as the basic building blocks of the network, so that data can be shared and disseminated more efficiently among interested users. We also elaborate on several fundamental design challenges for the Internet under this new architecture and show that DataClouds can offer more efficient and flexible solutions than traditional ICN-based architectures.
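
To make the idea of clustering-based local recoding in a MapReduce style more concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a single numeric quasi-identifier (`age`), fixed hypothetical seed values in place of t-ancestors, and one assignment pass; the proximity-aware agglomerative phase is omitted. All function names, field names, and records are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(records, seeds):
    """Map step: emit (seed_index, record), assigning each record to its nearest seed."""
    for rec in records:
        nearest = min(range(len(seeds)), key=lambda i: abs(rec["age"] - seeds[i]))
        yield nearest, rec


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: generalize the quasi-identifier of each cluster to a shared range."""
    anonymized = []
    for key in sorted(groups):
        recs = groups[key]
        lo = min(r["age"] for r in recs)
        hi = max(r["age"] for r in recs)
        for r in recs:
            # The sensitive value is kept; only the quasi-identifier is recoded.
            anonymized.append({"age": f"{lo}-{hi}", "disease": r["disease"]})
    return anonymized


def anonymize(records, seeds):
    return reduce_phase(shuffle(map_phase(records, seeds)))


# Hypothetical toy data, for illustration only.
records = [
    {"age": 23, "disease": "flu"},
    {"age": 27, "disease": "asthma"},
    {"age": 61, "disease": "diabetes"},
    {"age": 65, "disease": "flu"},
]
result = anonymize(records, seeds=[25, 60])
for row in result:
    print(row)
```

Because each cluster is generalized independently, two records with similar ages in different clusters may receive different ranges, which is what distinguishes local recoding from a single global generalization. In a real deployment the map and reduce functions would run as distributed tasks over data partitions rather than in one process.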