CHARM: A Cost-efficient Multi-cloud Data Hosting Scheme with High Availability
Abstract:
More and more enterprises and organizations are hosting their data in
the cloud in order to reduce IT maintenance cost and enhance data
reliability. However, facing numerous cloud vendors and their
heterogeneous pricing policies, customers may well be perplexed about which
cloud(s) are suitable for storing their data and which hosting strategy is
cheaper. The general status quo is that customers usually put their data into a
single cloud (which is subject to the vendor lock-in risk) and then simply
trust to luck. Based on a comprehensive analysis of various state-of-the-art
cloud vendors, this paper proposes a novel data hosting scheme (named CHARM)
that integrates two desired key functions. The first is selecting several
suitable clouds and an appropriate redundancy strategy to store data with
minimized monetary cost and guaranteed availability. The second is triggering a
transition process to re-distribute data according to variations in the data
access pattern and the pricing of clouds. We evaluate the performance of CHARM
using both trace-driven simulations and prototype experiments. The results show
that, compared with the major existing schemes, CHARM not only saves around 20%
of monetary cost but also exhibits sound adaptability to data and price
adjustments.
Existing System
In existing industrial data hosting systems, data availability (and
reliability) is usually guaranteed by replication or erasure coding. In the
multi-cloud scenario, we also use them to meet different availability
requirements, but the implementation is different. For replication, replicas
are put into several clouds, and a read access is served only by the
“cheapest” cloud, i.e., the one that charges the least for outgoing bandwidth
and GET operations (unless that cloud is unavailable at the time). For erasure
coding, data is encoded into n blocks, including m data blocks and n - m
coding blocks, and these blocks are put into n different clouds. In this case,
although data availability can be guaranteed with less storage space (compared
with replication), a read access has to be served by multiple clouds that
store the corresponding data blocks. Consequently, erasure coding cannot make
full use of the cheapest cloud as replication does. Worse still, this
shortcoming is amplified in the multi-cloud scenario, where bandwidth is
generally (much) more expensive than storage space.
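
To make the trade-off above concrete, the following minimal sketch compares the monthly cost of the two mechanisms for a single file. The `Cloud` class, the function names, and all prices are made-up assumptions for illustration, and GET operation charges are ignored for brevity; this is not the cost model of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cloud:
    name: str
    storage_price: float    # $ per GB per month (hypothetical value)
    bandwidth_price: float  # $ per GB of outgoing traffic (hypothetical value)


def replication_cost(clouds, size_gb, reads):
    """One full replica per cloud; every read is served by the cheapest cloud."""
    storage = size_gb * sum(c.storage_price for c in clouds)
    cheapest = min(c.bandwidth_price for c in clouds)
    return storage + reads * size_gb * cheapest


def erasure_cost(clouds, m, size_gb, reads):
    """n = len(clouds) blocks of size_gb / m each; a read fetches m blocks."""
    block_gb = size_gb / m
    storage = block_gb * sum(c.storage_price for c in clouds)
    # Assumption: a read pulls its m blocks from the m cheapest clouds.
    read_prices = sorted(c.bandwidth_price for c in clouds)[:m]
    return storage + reads * block_gb * sum(read_prices)


cloud_a = Cloud("A", storage_price=0.020, bandwidth_price=0.05)
cloud_b = Cloud("B", storage_price=0.030, bandwidth_price=0.12)
cloud_c = Cloud("C", storage_price=0.025, bandwidth_price=0.10)

for reads in (0, 10):
    rep = replication_cost([cloud_a, cloud_b], size_gb=1.0, reads=reads)
    ec = erasure_cost([cloud_a, cloud_b, cloud_c], m=2, size_gb=1.0, reads=reads)
    print(f"reads={reads}: replication={rep:.4f}, erasure coding={ec:.4f}")
```

With these made-up prices, erasure coding is cheaper for a file that is never read, while replication is cheaper once the file is read ten times a month, because each read under erasure coding also pays the bandwidth price of a more expensive cloud.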
Proposed System:
In this paper, we propose a novel cost-efficient data hosting scheme with high
availability in heterogeneous multi-cloud, named “CHARM”. It intelligently puts
data into multiple clouds with minimized monetary cost and guaranteed
availability. Specifically, we combine the two widely used redundancy
mechanisms, i.e., replication and erasure coding, into a uniform model to meet
the required availability in the presence of different data access patterns.
Next, we design an efficient heuristic-based algorithm to choose proper data
storage modes (involving both clouds and redundancy mechanisms). Moreover, we
implement the necessary procedure for storage mode transition (for efficiently
re-distributing data) by monitoring the variations of data access patterns and
pricing policies. We evaluate the performance of CHARM using both trace-driven
simulations and prototype experiments. The traces are collected from two online
storage systems, both of which possess hundreds of thousands of users. In the
prototype experiments, we replay samples from the two traces for a whole month
on top of four mainstream commercial clouds: Amazon S3, Windows Azure, Google
Cloud Storage, and Aliyun OSS. Evaluation results show that, compared with the
major existing schemes (which are elaborated in § VII-B), CHARM not only saves
around 20% (more precisely, 7% to 44%) of monetary cost but also exhibits sound
adaptability to data and price adjustments.
Advantages:
The replication mechanism is favored when the file size is small. That is why
gray level 4 extends into the region of lower read count and smaller file
size. The storage mode table depends only on the prices of the available
clouds and the required availability. If the prices change, the table changes
accordingly.
Problem Statement
Nevertheless, in the multi-cloud scenario people still encounter two critical
problems:
- How to choose appropriate clouds to minimize monetary cost in the presence of
  heterogeneous pricing policies?
- How to meet the different availability requirements of different services?
As to monetary cost, it mainly depends on data-level usage, particularly
storage capacity consumption and network bandwidth consumption.
As to the availability requirement, the major concern lies in which redundancy
mechanism (i.e., replication or erasure coding) is more economical for specific
data access patterns. In other words, the fundamental challenge here is:
- How to combine the two mechanisms elegantly so as to greatly reduce monetary
  cost while guaranteeing the required availability?
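
The availability requirement in the problem statement can be made concrete with a small calculation. The sketch below estimates the probability that at least m of n clouds are reachable, which covers erasure coding (any m of the n blocks suffice) and replication (m = 1). It assumes that clouds fail independently, which is a simplifying assumption of this example; the function name and the availability figures are illustrative only.

```python
from itertools import combinations
from math import prod


def availability(cloud_availabilities, m):
    """Probability that at least m clouds are up, assuming independent failures."""
    n = len(cloud_availabilities)
    total = 0.0
    for k in range(m, n + 1):
        # Enumerate every set of exactly k clouds that are up.
        for up in combinations(range(n), k):
            up_set = set(up)
            total += prod(
                a if i in up_set else 1.0 - a
                for i, a in enumerate(cloud_availabilities)
            )
    return total


# Two replicas (m = 1) versus a 2-of-3 erasure code, with hypothetical values.
print(availability([0.99, 0.99], m=1))
print(availability([0.99, 0.99, 0.99], m=2))
```

A data hosting decision would then keep only those combinations of clouds and redundancy mechanism whose computed availability reaches the level the service requires.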
Scope
In a holistic storage system, several other factors also have to be
considered, such as cache strategies and geographical data consistency.
However, we focus only on the data hosting strategy that minimizes monetary
cost while meeting flexible availability requirements. Although we have
considered complexity and feasibility when designing this strategy, the system
design is out of the scope of this paper, and we leave the detailed system
design of multi-cloud data hosting to future work.
The complexity of the transition algorithm is dominated by its first loop, and
the worst-case complexity is O(Fn), where Fn is the number of files. To reduce
the complexity further, files with similar access patterns can be classified
into groups and the transition can be performed per group. This is also out of
the scope of this paper.
Implementation of modules
Architecture:

Many data centers are distributed around the world, and one region, such as
America or Asia, usually has several data centers belonging to the same or
different cloud providers. Technically, all the data centers can be accessed
by a user in a certain region, but the user would experience different
performance. The latency of some data centers is very low, while that of
others may be intolerably high. CHARM chooses clouds for storing data from all
the available clouds that meet the performance requirement, that is, clouds
that offer acceptable throughput and latency when they are not in outage. The
storage mode transition does not impact the performance of the service. Since
it is not a latency-sensitive process, we can lower the priority of transition
operations and perform the transition in batches when the proxy has a low
workload.

In this section, we elaborate a cost-efficient data hosting model with high
availability in heterogeneous multi-cloud, named “CHARM”. The architecture of
CHARM is shown in Figure 3. The whole model is located in the proxy in Figure
1. There are four main components in CHARM: Data Hosting, Storage Mode
Switching (SMS), Workload Statistic, and Predictor. Workload Statistic keeps
collecting and processing access logs to guide the placement of data. It also
sends statistical information to Predictor, which guides the action of SMS.
Data Hosting stores data using replication or erasure coding, according to the
size and access frequency of the data. SMS decides whether the storage mode of
certain data should be changed from replication to erasure coding or vice
versa, according to the output of Predictor. The change of storage mode runs in
the background so as not to impact the online service. Predictor is used to
predict the future access frequency of files. The time interval for prediction
is one month, that is, we use the former months to predict the access frequency
of files in the next month. However, we do not put emphasis on the design of
the predictor, because there are already many good prediction algorithms.
Moreover, a very simple predictor, which uses the weighted moving average
approach, works well in our data hosting model. Data Hosting and SMS are the
two important modules in CHARM. Data Hosting decides the storage mode and the
clouds that the data should be stored in. This is a complex integer programming
problem, demonstrated in the following subsections. Then we illustrate how SMS
works in detail in § V, that is, when and how many times the transition should
be performed.
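
The weighted moving average predictor mentioned above could look like the following sketch. The function name, the three-month window, and the weights are assumptions made for illustration; the source does not specify them.

```python
def predict_next_month(monthly_reads, weights=(0.5, 0.3, 0.2)):
    """Weighted moving average over the most recent months.

    monthly_reads is ordered oldest to newest. weights[0] applies to the most
    recent month. Both the window length and the weights are hypothetical.
    """
    recent = list(reversed(monthly_reads))[:len(weights)]
    if not recent:
        return 0.0
    used = weights[:len(recent)]
    # Normalize so that short histories still yield a proper average.
    return sum(w * r for w, r in zip(used, recent)) / sum(used)


history = [10, 20, 30]  # reads of one file in the three former months
print(predict_next_month(history))
```

SMS could compare such a predicted access frequency against the current storage mode of a file to decide whether a transition is worthwhile.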

Cloud storage services have become increasingly popular. Because of the
importance of privacy, many cloud storage encryption schemes have been proposed
to protect data from those who do not have access. All such schemes assume that
cloud storage providers are safe and cannot be hacked; however, in practice,
some authorities (i.e., coercers) may force cloud storage providers to reveal
user secrets or confidential data on the cloud, thus altogether circumventing
storage encryption schemes. In this paper, we present our design for a new
cloud storage encryption scheme that enables cloud storage providers to create
convincing fake user secrets to protect user privacy. Since coercers cannot
tell whether the obtained secrets are true or not, the cloud storage providers
ensure that user privacy is still securely protected.

Most of the proposed schemes assume that cloud storage service providers, or
the trusted third parties handling key management, are trusted and cannot be
hacked; however, in practice, some entities may intercept communications
between users and cloud storage providers and then compel storage providers to
release user secrets by using government power or other means. In this case,
encrypted data are assumed to be known, and storage providers are requested to
release user secrets.

We aimed to build an encryption scheme that could help cloud storage providers
avoid this predicament. In our approach, we offer cloud storage providers a
means to create fake user secrets. Given such fake user secrets, outside
coercers can only obtain forged data from a user’s stored ciphertext. Once
coercers think the received secrets are real, they will be satisfied and, more
importantly, cloud storage providers will not have revealed any real secrets.
Therefore, user privacy is still protected. This concept comes from a special
kind of encryption scheme called deniable encryption.

The Owner module lets owners upload their files under an access policy. The
owner first obtains the public key for the file to be uploaded, then requests
the secret key for that file, and finally uploads the file using that secret
key.

This module helps the client search for a file using the file ID and file
name. If the file ID or name is incorrect, the file is not returned; otherwise,
the server asks for the public key and returns the encrypted file. To obtain
the decrypted file, the user must have the secret key.
Algorithm:
The key idea of this heuristic algorithm can be described as follows:
We first assign each cloud a value, calculated from four factors (i.e.,
availability, storage, bandwidth, and operation prices), to indicate the
preference of a cloud. We choose the n most preferred clouds, and then
heuristically exchange a cloud in the preferred set with a cloud in the
complementary set to search for a better solution. This is similar to the idea
of the Kernighan-Lin heuristic algorithm, which is applied to effectively
partition graphs so as to minimize the sum of the costs on all edges cut.
The preference of a cloud is impacted by the four factors, and they have
different weights. For availability, higher is better; for price, lower is
better.
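
The idea above can be sketched as a simple swap-based local search. The linear form of the preference value, the equal weights, the `cost` callback, and all prices below are assumptions made for illustration; the source does not give the exact formula, and this is a simplified sketch rather than the algorithm of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cloud:
    name: str
    availability: float
    storage_price: float
    bandwidth_price: float
    operation_price: float


def preference(cloud, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical linear score: availability raises it, prices lower it."""
    w_avail, w_storage, w_bandwidth, w_operation = weights
    return (w_avail * cloud.availability
            - w_storage * cloud.storage_price
            - w_bandwidth * cloud.bandwidth_price
            - w_operation * cloud.operation_price)


def choose_clouds(clouds, n, cost):
    """Start from the n most preferred clouds, then swap while cost improves."""
    ranked = sorted(clouds, key=preference, reverse=True)
    chosen, rest = ranked[:n], ranked[n:]
    best = cost(chosen)
    improved = True
    while improved:
        improved = False
        for i in range(len(chosen)):
            for j in range(len(rest)):
                candidate = chosen[:i] + [rest[j]] + chosen[i + 1:]
                candidate_cost = cost(candidate)
                if candidate_cost < best:
                    # Exchange chosen[i] with rest[j] and restart the search.
                    rest[j] = chosen[i]
                    chosen, best = candidate, candidate_cost
                    improved = True
                    break
            if improved:
                break
    return chosen, best


def demo_cost(selected, reads=10):
    """Toy cost: one replica per cloud, reads served by the cheapest cloud."""
    storage = sum(c.storage_price for c in selected)
    return storage + reads * min(c.bandwidth_price for c in selected)


clouds = [
    Cloud("A", 0.999, 0.020, 0.12, 0.01),
    Cloud("B", 0.999, 0.030, 0.05, 0.01),
    Cloud("C", 0.995, 0.025, 0.10, 0.01),
]
chosen, best = choose_clouds(clouds, n=2, cost=demo_cost)
print(sorted(c.name for c in chosen), round(best, 4))
```

In this toy run the initial preferred set is B and C, and one exchange (C for A) lowers the cost. The search stops as soon as no single exchange improves the cost, so it may settle on a local optimum.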
Conclusion:
Cloud services are experiencing rapid development, and services based on
multi-cloud are also becoming prevalent. One of the main concerns when moving
services into clouds is capital expenditure. So, in this paper, we design a
novel storage scheme, CHARM, which guides customers to distribute data among
clouds cost-effectively. CHARM makes fine-grained decisions about which storage
mode to use and which clouds to place data in. The evaluation demonstrates the
efficiency of CHARM.