Abstract—Graph partition quality affects the overall performance of parallel graph computation systems. The quality of a graph
partition is measured by its balance factor and edge cut ratio. A balanced graph partition with a small edge cut ratio is generally
preferred since it reduces the expensive network communication cost. However, an empirical study on Giraph shows that the
performance on a well-partitioned graph can be up to two times worse than on a simple random partition. This is because these
systems optimize only for simple partition strategies and cannot efficiently handle the increased workload of local message processing
when a high-quality graph partition is used. In this paper, we propose a novel partition-aware graph computation engine named PAGE,
which is equipped with a new message processor and a dynamic concurrency control model. The new message processor handles
local and remote messages concurrently in a unified way. The dynamic model adaptively adjusts the concurrency of the processor
based on online statistics. The experimental evaluation demonstrates the superiority of PAGE on graph partitions of various qualities.
INTRODUCTION
Massive graphs are prevalent nowadays. Prominent examples include web graphs, social networks, and
other interaction networks in bioinformatics. Today's web graph contains billions of nodes and trillions of edges.
Graph structures can represent various relationships between objects and better model complex data scenarios.
Graph-based processing facilitates many important applications, such as link analysis [8], [18], community
discovery [20], pattern matching [22], and machine-learning factorization models [3].
With the stunning growth of large graphs and diverse applications, parallel processing has become the
de facto paradigm for large-scale graph analysis. Many parallel graph computation systems
have been introduced, e.g., Pregel [23], Giraph [1], GPS [27], and GraphLab [21]. These systems follow the vertex-centric
programming model, in which a graph algorithm is split into several supersteps by synchronization barriers.
In a superstep, each active vertex updates its status based on the messages received from its neighbors in the previous
superstep, and then sends its new status as messages to its neighbors. Since the number of workers (computing nodes) is
limited in practice, a worker usually stores a subgraph, not a single vertex, locally and executes its local active
vertices sequentially, while the workers themselves run in parallel.
Graph partitioning is therefore one of the key components
that affect graph computing performance. It splits the
original graph into several subgraphs such that the
subgraphs are of roughly equal size and few edges cross
between them. A high-quality graph partition has few edges
connecting different subgraphs while keeping the subgraphs
similar in size. The ratio of edges crossing different subgraphs
to the total number of edges is called the edge cut (ratio). A well-balanced
partition (or high-quality partition) usually has a
small edge cut and helps improve system performance,
because the small edge cut reduces the expensive
communication cost between subgraphs, while the
balance property generally ensures that each subgraph
carries a similar computation workload.
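The two quality metrics just defined are straightforward to compute. The sketch below is a hypothetical illustration (the function names and the `partition` mapping are ours, not PAGE's), using the common convention that the balance factor is the largest subgraph size divided by the average subgraph size, so 1.0 means a perfectly balanced partition.

```python
from collections import Counter

# Sketch of the two partition-quality metrics from the text.
# `partition` maps each vertex to a subgraph (worker) id.

def edge_cut_ratio(edges, partition):
    """Fraction of edges whose endpoints land in different subgraphs."""
    cut = sum(1 for u, v in edges if partition[u] != partition[v])
    return cut / len(edges)

def balance_factor(partition, num_parts):
    """Largest subgraph size over the average size; 1.0 = perfectly balanced."""
    sizes = Counter(partition.values())
    avg = len(partition) / num_parts
    return max(sizes.values()) / avg

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
partition = {"a": 0, "b": 0, "c": 1, "d": 1}
print(edge_cut_ratio(edges, partition))   # 2 of the 4 edges cross parts
print(balance_factor(partition, 2))       # both parts hold 2 of the 4 vertices
```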
In practice, however, a well-balanced graph partition
can even degrade the overall performance of existing
systems. Fig. 1 shows the performance of the PageRank
algorithm on six different partition schemes of a large web
graph dataset: the overall cost of PageRank per iteration
increases as the quality of the graph partition improves.
For example, when the edge cut ratio is about 3.48 percent
(METIS), the performance is about two times worse than
under the simple random partition scheme, whose edge cut
ratio is 98.52 percent. This indicates that a parallel graph
system may not benefit from a high-quality graph partition.
Fig. 1b also breaks down the local communication cost and the sync
remote communication cost (explained in Section 2). As the
edge cut ratio decreases, the sync remote communication cost
drops as expected, but the local communication cost unexpectedly
grows fast, which directly causes the degradation of overall
performance. This abnormal outcome implies that local message
processing becomes a bottleneck in the system and
dominates the overall cost when its workload increases.
Many existing parallel graph systems are unaware of
this effect of the underlying partitioned subgraphs and
ignore the increased workload of local message processing
when the partition quality improves. These systems
therefore treat local and remote messages unequally and
optimize only the processing of remote messages. Although
a simple extension with a centralized message buffer can
process local and remote incoming messages together [27],
existing graph systems still cannot effectively exploit the
benefit of high-quality graph partitions.
In this paper, we present a novel graph computation
engine, the partition-aware graph computation engine (PAGE).
To efficiently support computation tasks over partitions of
different qualities, we develop several unique components
in this framework. First, the communication module of each
PAGE worker is extended with a new dual concurrent message
processor, which handles both local and remote incoming
messages concurrently in a unified way, thus accelerating
message processing. Moreover, the concurrency of the message
processor is tunable according to the online statistics of the
system. Second, a partition-aware module is added to each
worker to monitor partition-related characteristics and
adaptively adjust the concurrency of the message processor
to fit the online workload.
To estimate a reasonable concurrency for the dual
concurrent message processor, we introduce the Dynamic
Concurrency Control Model (DCCM). Since the message
processing pipeline follows the producer-consumer model,
several heuristic rules are derived from the producer-consumer
constraints. With the help of the DCCM, PAGE provides
enough message process units to handle the current workload,
and each message process unit receives a similar share of it.
As a result, PAGE adapts to integrated graph partitions of
various qualities.
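A producer-consumer heuristic in this spirit can be sketched as follows. This is our own illustrative formula, not PAGE's exact DCCM: it provisions just enough process units that their aggregate service capacity covers the observed message arrival rate, then splits them between local and remote messages in proportion to their workloads. All names and rates are assumptions for the example.

```python
import math

# Illustrative concurrency heuristic (not PAGE's actual model):
# enough consumers to keep up with the producers, split so that the
# local and remote units see similar per-unit workloads.

def estimate_concurrency(local_rate, remote_rate, service_rate, max_units):
    """Rates in messages/sec; service_rate is per message process unit."""
    total = local_rate + remote_rate
    # Capacity constraint: units * service_rate >= total arrival rate.
    units = min(max_units, max(1, math.ceil(total / service_rate)))
    # Balance constraint: split units proportionally to workload,
    # keeping at least one unit on each message type.
    local_units = max(1, round(units * local_rate / total))
    remote_units = max(1, units - local_units)
    return local_units, remote_units
```

For example, with a high-quality partition most messages are local, so most units are assigned to local processing; under a random partition the split shifts toward remote units, which is exactly the adaptivity the text calls for.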
A prototype of PAGE has been built on top of Giraph
(version 0.2.0). The experimental results demonstrate that the
optimizations in PAGE improve the performance of
both stationary and non-stationary graph algorithms on
graph partitions of various qualities.
The main contributions of our work are summarized as
follows:
- We identify the problem that existing graph computation systems cannot efficiently exploit the benefit of high-quality graph partitions.
- We design a new partition-aware graph computation engine, called PAGE. It can effectively harness the partition information to guide parallel processing resource allocation and improve computation performance.