A Time Efficient Approach for Detecting Errors in Big Sensor Data on Cloud
Abstract
Big sensor data is prevalent in both industry and scientific research applications, where the
data is generated with such high volume and velocity that it is difficult to process using
on-hand database management tools or traditional data processing applications. Cloud
computing provides a promising platform to support the addressing of this
challenge as it provides a flexible stack of massive computing, storage, and
software services in a scalable manner at low cost. Some techniques have been
developed in recent years for processing sensor data on cloud, such as
sensor-cloud. However, these techniques do not efficiently support the
fast detection and location of errors in big sensor data sets. In this paper,
we develop a novel data error detection approach for big sensor data sets
that exploits the full computation potential of the cloud
platform and the network features of the wireless sensor network (WSN). First, a set of sensor data error
types are classified and defined. Based on that classification, the network
feature of a clustered WSN is introduced and analyzed to support fast error
detection and location. Specifically, in our proposed approach, error
detection is based on the scale-free network topology, and most detection
operations can be conducted in limited temporal or spatial data blocks instead
of the whole big data set. Hence the detection and location process can be
dramatically accelerated. Furthermore, the detection and location tasks can be
distributed to the cloud platform to fully exploit its computation power and
massive storage. Experiments on our cloud computing platform,
U-Cloud, demonstrate that our proposed approach can significantly reduce
the time for error detection and location in big data sets generated by large
scale sensor network systems with acceptable error detection accuracy.
Error and Abnormality Classification
Algorithm
Error detection and location:
In our algorithm, the conditions for error detection and localization are
not ideal, and it is hard to solve the problem directly with a single MapReduce job.
Existing System
Big data is a collection of data sets so large and complex
that it becomes difficult to process with on-hand database management systems
or traditional data processing applications. It represents the progress of human
cognitive processes and usually includes data sets with sizes beyond the ability
of current technology, methods and theory to capture, manage, and process the
data within a tolerable elapsed time.
A WSN with cloud can be categorized as a kind of complex network
system. In complex network systems such as WSN and social networks, data abnormality
and error become an annoying issue for real network applications.
Some work has been done on big data analysis and error
detection in complex networks, including intelligent sensor networks. There
is also some work on data error detection and debugging in complex network systems
using online data processing techniques. Since these techniques were
not designed and developed to deal with big data on cloud, they are unable to
cope with the current dramatic increase in data size. For example, when big data
sets are encountered, previous offline methods for error detection and
debugging on a single computer may take a long time and lose real-time
feedback. Because those offline methods are normally based on learning or
mining, they often introduce a high time cost during data set
training and pattern matching.
Disadvantages:
1. No big data analysis and error detection.
2. Increased packet loss ratio.
3. Network failure.
Proposed System
In our proposed approach, error detection is based on the
scale-free network topology, and most detection operations can be conducted
in limited temporal or spatial data blocks instead of the whole big data set.
Hence the detection and location process can be dramatically accelerated.
Furthermore, the detection and location tasks can be distributed to the cloud platform
to fully exploit its computation power and massive storage. Experiments
on our cloud computing platform, U-Cloud, demonstrate that
our proposed approach can significantly reduce the time for error detection and
location in big data sets generated by large scale sensor network systems with
acceptable error detection accuracy.
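The idea of working on limited temporal or spatial data blocks can be illustrated with a small sketch. This is only an assumption about how such blocks might be formed, not the partitioning used in the original approach: the record layout `(cluster_id, timestamp, value)`, the function name `partition_into_blocks`, and the fixed `window_seconds` are all hypothetical.

```python
from collections import defaultdict


def partition_into_blocks(readings, window_seconds):
    """Group readings into blocks keyed by (cluster, time window).

    Each reading is a hypothetical (cluster_id, timestamp, value) tuple.
    The cluster id stands in for the spatial dimension and the window
    index for the temporal dimension.
    """
    blocks = defaultdict(list)
    for cluster_id, timestamp, value in readings:
        window_index = int(timestamp // window_seconds)
        blocks[(cluster_id, window_index)].append((timestamp, value))
    return dict(blocks)


# Illustrative readings only; the values are made up.
readings = [
    ("c1", 0, 20.1),
    ("c1", 30, 20.3),
    ("c1", 70, 95.0),
    ("c2", 10, 18.7),
]
blocks = partition_into_blocks(readings, window_seconds=60)
```

Each block can then be checked independently, so a detector only needs to scan the block that contains a suspicious reading rather than the whole data set.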
We aim to develop a novel error detection approach by exploiting the massive
storage, scalability and computation power of cloud to detect errors in big
data sets from sensor networks.
Fast detection of data errors in big data with cloud remains
challenging. In particular, how to use the computation power of cloud to quickly
find and locate errors of nodes in WSN needs to be explored.
Advantages:
- NodeSide/EdgeSide error detection.
- Improved network performance.
- Big data analysis and error detection
Future Work
In the future, building on error detection for big data
sets from sensor network systems on cloud, issues such as error correction,
big data cleaning and recovery will be further explored.
Modules:
1. Cloud Computing
2. Big Data Processing on Cloud
3. Error Definition and Modeling
4. Error Detection and Localization
Modules Descriptions:
1. Cloud Computing:
Cloud computing infrastructure is becoming popular because it provides an open,
flexible, scalable and reconfigurable platform. The proposed error detection
approach in this paper is based on the classification of error types.
Specifically, nine types of numerical data abnormalities/errors are listed and
introduced in our cloud error detection approach. The defined error model
triggers the error detection process. Compared to previous error detection for sensor
network systems, our approach on cloud is designed and developed to
utilize the massive data processing capability of cloud to enhance error
detection speed and real-time reaction. In addition, the architectural features of
complex networks are analyzed so that they can be combined with cloud computing in
a more efficient way. Based on a review of the current research literature, we divide
complex network systems into scale-free and non-scale-free types. A sensor
network is a kind of scale-free complex network system, which matches the
scalability of cloud. Our proposed error detection approach on cloud is
specifically tailored for finding errors in big data sets of sensor networks.
The main contribution of our proposed detection approach is a significant time
performance improvement in error detection without compromising error detection
accuracy.
2. Big Data Processing On Cloud:
Big data has become a fundamental and critical challenge for modern society. Cloud computing
provides an ideal platform for big data storage, dissemination and interpretation
with its massive computation power. MapReduce has been widely revised from a
batch processing framework into a more incremental one to analyze huge volumes
of incremental data on cloud. It is a framework for processing parallelizable problems
across big data sets using a large number of computers (nodes), collectively
referred to as a cluster if all nodes are on the same local
network and use similar hardware, or a grid if the nodes are shared
across geographically and administratively distributed systems. It can sort a
petabyte of data in only a few hours. The parallelism also provides some
possibility of recovering from partial failure of servers or storage during the
operation.
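The map, shuffle and reduce steps described above can be sketched in plain Python, without a real MapReduce framework. This is a single-process illustration under stated assumptions: the `(node_id, value)` record layout and the names `map_phase`, `combine` and `run_mapreduce` are hypothetical, and a real deployment would run the same logic across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map one (node_id, value) record to a key and a partial summary."""
    node_id, value = record
    # Partial summary: (count, sum, maximum) for a single reading.
    return node_id, (1, value, value)


def combine(left, right):
    """Merge two partial summaries; associative, so order does not matter."""
    return (left[0] + right[0], left[1] + right[1], max(left[2], right[2]))


def run_mapreduce(records):
    """Run map, shuffle (group by key) and reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        key, partial = map_phase(record)
        groups[key].append(partial)  # shuffle: group partials by node
    return {key: reduce(combine, partials) for key, partials in groups.items()}


# Illustrative readings only; the values are made up.
records = [("n1", 20.0), ("n1", 22.0), ("n2", 19.0), ("n1", 90.0)]
summary = run_mapreduce(records)
```

Because `combine` is associative, partial summaries can be merged in any grouping, which is what allows the work to be split across nodes and recomputed for a single partition after a partial failure.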
3. Error Definition and Modeling:
4. Error Detection and Localization:
We propose a two-phase approach to conduct the computation required in the whole process of
error detection and localization. In the error detection phase, there are
three inputs to the error detection algorithm: the graph of the
network, the total collected data set D, and the
defined error patterns p. The output of the error detection algorithm is the
error set D’.
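A minimal sketch of the detection phase follows, using the three inputs named above: the network graph, the data set D, and the error patterns p. It is not the original Algorithm 1. The adjacency-dict graph, the per-node reading lists, and the two example patterns (`out_of_range` and `flat_line`) are assumptions chosen for illustration; the source does not name its nine error types.

```python
def out_of_range(low, high):
    """Hypothetical pattern: flag readings outside [low, high]."""
    def pattern(series):
        return [i for i, v in enumerate(series) if not (low <= v <= high)]
    return pattern


def flat_line(min_run):
    """Hypothetical pattern: flag runs of at least min_run identical readings."""
    def pattern(series):
        hits = []
        run_start = 0
        for i in range(1, len(series) + 1):
            # A run ends at the end of the series or when the value changes.
            if i == len(series) or series[i] != series[run_start]:
                if i - run_start >= min_run:
                    hits.extend(range(run_start, i))
                run_start = i
        return hits
    return pattern


def detect_errors(graph, data, patterns):
    """Return the error set D' as (node_id, reading_index, pattern_name) tuples."""
    errors = []
    for node_id in graph:
        series = data.get(node_id, [])
        for name, pattern in patterns.items():
            for index in pattern(series):
                errors.append((node_id, index, name))
    return errors


# Illustrative inputs only; topology and values are made up.
graph = {"head": ["a", "b"], "a": ["head"], "b": ["head"]}
data = {"head": [20.0, 21.0, 20.5], "a": [20.0, 150.0, 21.0], "b": [5.0, 5.0, 5.0]}
patterns = {"out_of_range": out_of_range(-40.0, 85.0), "flat_line": flat_line(3)}
errors = detect_errors(graph, data, patterns)
```

Here each node's series is scanned independently, so the loop over nodes could be distributed as separate tasks.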
After error pattern matching and error detection, it is important to locate the
position and source of the detected error in the original WSN graph G(V, E).
The inputs of Algorithm 2 are the original graph of a scale-free network, G(V,
E), and the error data set D’ from Algorithm 1. The output of Algorithm 2 is
G’(V’, E’), a subset of G that indicates the error location and
source.
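One simple way to read this step is as extracting the subgraph of G induced by the nodes that appear in D’. The sketch below takes that reading as an assumption; it is not the original Algorithm 2 and does not attempt to identify the source of an error. The function name `locate_errors` and the `(node_id, reading_index, pattern_name)` layout of the error set are hypothetical.

```python
def locate_errors(vertices, edges, error_set):
    """Return (V', E'): the subgraph of G induced by nodes with detected errors."""
    faulty = {entry[0] for entry in error_set}
    sub_vertices = [v for v in vertices if v in faulty]
    # Keep an edge only when both of its endpoints reported errors.
    sub_edges = [(u, v) for u, v in edges if u in faulty and v in faulty]
    return sub_vertices, sub_edges


# Illustrative inputs only; topology and errors are made up.
vertices = ["head", "a", "b", "c"]
edges = [("head", "a"), ("head", "b"), ("a", "b"), ("b", "c")]
error_set = [("a", 1, "out_of_range"), ("b", 0, "flat_line")]
sub_vertices, sub_edges = locate_errors(vertices, edges, error_set)
```

Keeping the edges between faulty nodes makes it possible to see whether errors are isolated or concentrated in one neighbourhood of the network.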