Abstract:
Video streaming is gaining popularity among mobile
users. The latest mobile devices, such as smart phones and
tablets, are equipped with multiple wireless network interfaces.
How to efficiently and cost-effectively utilize multiple links to
improve video streaming quality needs investigation. In order to
maintain high video streaming quality while reducing the wireless
service cost, in this paper, the optimal video streaming process
with multiple links is formulated as a Markov Decision Process
(MDP). The reward function is designed to consider the quality of
service (QoS) requirements for video traffic, such as the startup
latency, playback fluency, average playback quality, playback
smoothness and wireless service cost. To solve the MDP in real
time, we propose an adaptive, best-action search algorithm to
obtain a sub-optimal solution. To evaluate the performance of
the proposed adaptation algorithm, we implemented a testbed
using the Android mobile phone and the Scalable Video Coding
(SVC) codec. Experiment results demonstrate the feasibility and
effectiveness of the proposed adaptation algorithm for mobile
video streaming applications, which outperforms the existing
state-of-the-art adaptation algorithms.
INTRODUCTION
VIDEO streaming is gaining popularity among mobile
users recently. Considering that the mobile devices have
limited computational capacity and energy supply, and the
wireless channels are highly dynamic, it is very challenging
to provide high quality video streaming services for mobile
users consistently. It is a promising trend to use multiple wireless
network interfaces with different wireless communication
techniques for mobile devices. For example, smart phones and
tablets are usually equipped with cellular, WiFi and Bluetooth
interfaces. Utilizing multiple links simultaneously can improve
video streaming in several aspects: the aggregated higher
bandwidth can support video of higher bit rate; when one
wireless link suffers poor link quality or congestion, the others
can compensate for it.
High resilience to bandwidth variation and easy deployment
are both important requirements for video streaming
applications. Currently, progressive download, one of the most
popular and widely deployed streaming techniques, buffers
a large amount of video data to absorb the variations of
bandwidth. Meanwhile, as video data are transmitted over
HTTP protocols, the video streaming service can be deployed
on any web server. However, the video quality version can only be manually selected by users and such decision can be
error-prone. Since the smart phones only have limited storage
space, it is impractical to maintain a very large buffer size.
In addition, the buffered unwatched video may be wasted if
the user turns off the video player or switches to other videos.
Furthermore, progressive download typically does not support
transmitting video data over multiple links.
To overcome the above disadvantages of progressive download,
dynamic adaptive streaming over HTTP (DASH) [1]
has been proposed. In a DASH system, multiple copies of
pre-compressed videos with different resolution and quality
are stored in segments. The rate adaptation decision is made
at the client side. For each segment, the client can request
the appropriate quality version based on its screen resolution,
current available bandwidth, and buffer occupancy status.
This pull-based DASH scheme can be extended to support
multiple links, i.e., we can let the client request different
parts of one segment over different links. How to optimize
this rate adaptation process for video streaming over multiple
wireless links, considering the video quality of service (QoS)
requirements, the wireless channel profiles, and the wireless
service costs of multiple links is an open issue.
In this paper, we formulate the multi-link video streaming
process as a reinforcement learning task. For each streaming
step, we define a state to describe the current situation,
including the index of the requested segment, the current
available bandwidth and other system parameters. A finitestate
Markov Decision Process (MDP) can be modeled for this
reinforcement learning task. The reward function is carefully
designed to consider the video QoS requirements, such as
the interruption rate, average playback quality, and playback
smoothness, as well as the service costs. To make a trade-off
between different QoS metrics and the cost, we can adjust
the parameters of the reward function. To solve the MDP
in real time, we proposed an adaptive best-action search
algorithm to obtain a sub-optimal solution. A realistic testbed
is implemented to better evaluate the performance of our
solution.
The main contributions of this paper are threefold. First, we
formulate the video streaming process over multiple links as
an MDP problem. To achieve smooth and high quality video
streaming, we define several actions and reward functions for
each state. Second, we propose a depth-first real-time search
algorithm. The proposed adaptation algorithm will take several
future steps into consideration to avoid playback interruption
and achieve better smoothness and quality. Last, we implement
a realistic testbed using an Android phone and Scalable Video
Coding (SVC) encoded videos to evaluate the performance.
The experiment results show that the proposed adaptation algorithm
is feasible for video streaming over multiple wireless
access networks, and it outperforms the existing state-of-theart
algorithms.
The rest of the paper is organized as follows. In Section II,
the related work is summarized. Section III describes the
system model and the problem formulation. We present the
real-time adaptive streaming algorithm in Section IV. The performance
evaluation by experiments is presented in Section V,
followed by the concluding remarks and further research issues
in Section VI.
II. BACKGROUND AND RELATED WORK
DASH has been a hot topic in recent years. There are
many commercial products which have implemented DASH
in different ways, such as Apple HTTP Live Streaming and
Microsoft Smooth Streaming. Since the clients may have
different available bandwidth and display size, each video
will be encoded several times with different quality, bit rate
and resolution. All the encoded videos will be chopped into
small segments and stored on the server, which can be a
typical web server.
These small segments will be downloaded
to the browsers’ cache and played by the client (browser or
browser plug-in). The video rate adaptation is performed at
the client side, which is also called the pull-based approach.
The client will determine the quality version of the requested
video segment according to its current available bandwidth,
resolution and the number of buffered unwatched segments.
After the current segment is completely downloaded, the
rate adaptation algorithm will be invoked again for the next
segment.
There is extensive work covering this topic [2]–[6]. The
authors in [2] proposed to estimate the bandwidth by a
statistical method, and they took both the quality contribution
and decoding time of each segment into consideration. K.P.
Mok et al. presented a QoE aware DASH system [3].
Their
algorithm estimates the available bandwidth by probing with
the video data. In order to keep the quality level as smooth as
possible, their algorithm will switch the video quality version
gradually and will try to maintain the buffer level being
stable. S. Akhshabi designed an evaluation method in [7] to
test the performance of several existing commercial DASH
products, such as Smooth Streaming, Netflix, and OSMF.
In [6], T. Kupka proposed to evaluate the performance of live
DASH under on/off traffic and tested four different methods to
improve the performance. In [5], how to reduce unnecessary
video quality variations using a probing method to identify the
effective available bandwidth was given.
In [4], [8], the authors
designed the optimal rate adaptation algorithm for streaming
scalable video coding (SVC) over HTTP using MDP. With
SVC, each video frame is encoded into a base layer and several
enhancement layers. Higher video quality can be achieved
when more layers are received. These works only considered
the single-link case, and in this work, we consider the more
challenging case with multiple access links.
How to efficiently deliver video in heterogeneous wireless
networks has been extensively studied, and approaches ranging
from the physical layer to the application layer have been
proposed [9]–[13]. Recently, a few approaches have appeared
to extend the DASH technique to support multiple links.
In [14], the authors summarized three typical schemes of
utilizing multiple links. They compared the performance of
these schemes through extensive simulations. Kaspar et al.
proposed an approach to implementing DASH over multiple
links [15].
In their algorithm, each segment will be transmitted
over one link. Thus multiple segments can be transmitted
at the same time. To reduce the overhead, they used HTTP
pipelining to improve the performance. This approach may
lead to the “last-segment problem” due to the link transmission
speed difference. To overcome this disadvantage, in [16]–[18],
Evensen et al. suggested to divide each segment into small
sub-segments, and these sub-segments can be downloaded
through different links. Their algorithm estimates the available
bandwidth according to the throughput of the previous
segment, and selects the video quality version most close to
the estimated bandwidth. In the evaluation section, we will
compare the performance of our proposed solution with the
above state-of-the-art one [18].
III. SYSTEM MODEL AND STREAMING PROCESS
FORMULATION
A. System Model
We consider how to utilize multiple wireless access networks
together for video streaming, e.g., using a combination
of cellular, WiFi, and/or Bluetooth simultaneously. Here, as an
example, Bluetooth and WiFi access networks are considered
as we do not have end-to-end control over cellular links,
and our work can be extended when other types of wireless
access networks or more than two wireless access networks are
used1. Since a wireless channel may suffer from time-varying
fading, shadowing, interference and congestion, the available
bandwidth of a wireless link may vary all the time. In addition,
different smart phones or tablets may have different screen size
and resolution. Taking these two aspects into consideration,
the server should store several copies of video with different
quality. The videos are encoded with SVC into a base layer
and several enhancement layers, and chopped into segments
and each segment can be played with a fixed duration.
We design a pull-based algorithm for video streaming, as
shown in Fig. 1. After initialization, the client will request
the video information which includes video resolutions, bitrates
and qualities from the server through both the WiFi and
Bluetooth links. The rate adaptation agent will request a video
segment of appropriate quality version based on the current
queue length and estimated available bandwidth. Once the
request decision is made, HTTP requests over both WiFi and
Bluetooth will be issued to download the video segment. This
process will continue until the completion of downloading the
last segment or the termination of the video streaming by the
user.
1A promising multi-link scenario is to use both WiFi and 3G/LTE celluar
links for video streaming. In our testbed, as the Android OS restricted the
utilization of WiFi and 3G to access Internet simultaneously, we use WiFi
and Bluetooth to test our multi-link streaming solution.
Streaming Process Formulation
The video streaming process can also be considered as the
interaction between two modules. As shown in Fig. 1, the
downloading and estimation steps in the top grey rectangle
can be viewed as an integrated environment module, and the
rate adaptation agent can be viewed as an agent module. The
video streaming process can be formulated as a reinforcement
learning task [19]. The environment sends a state signal
for each video segment to the agent, and the agent will
determine the best action correspondingly. For each action,
the environment replies a reward to the agent. Considering
the Markov property of the system states, a Markov Decision
Process (MDP) can be formulated for the streaming process,
and the state transition model of the Markov process needs to
be devised.
We define step n as to download segment n, so
the total number of steps equals the number of segments.
For each step n, we define the state as sn =
{qn, Δqn, vn, Δvn, tn, Δtn, bwn, btn, dn}, with the parameters
defined as follows. qn represents the number of unplayed
queued segments, which is also called the buffer level (or
queue length), with the range between 0 and qL (which is the
maximum queue length). vn is the SVC video layer index of
the n-th segment. In our work, there are L SVC video layers
in total, and a larger number represents a higher-quality layer.
Δqn and Δvn are the variations of qn and vn respectively, i.e.,
Δqn = qn − qn−1, and Δvn = vn − vn−1. The total traffic of
Bluetooth used to download the previous segment is recorded
by tn, and Δtn = tn − tn−1. The current bandwidth states of
the WiFi link and the Bluetooth link are described by bwn and
btn, respectively. dn indicates the current requested segment
index. As the total number of video segments is NT , dn is in
the range of [0, NT ].
When the rate adaptation agent receives input of state sn
from the environment, it will make the decision of which
action should the environment take. We define four types
of actions, Ab, Au, Aw, and As. The first two actions are
for downloading the next segment at the quality determined
by v, and Δv determines whether to upgrade, downgrade or
maintain the current quality. Once the quality v is determined,
the base layer and all enhancement layers belong to quality
v will be combined together to be downloaded. Considering
that drastic video quality variation will significantly decrease
the perceived video quality, we avoid those drastic layer
change actions by restricting the video layer change level
Δv to one. The difference of Ab and Au is that, with Ab,
both the WiFi and Bluetooth links are used to download
the segment simultaneously, while with Au, only the WiFi
link is used. As a short distance wireless communication
technology, Bluetooth can only help the client connect to the
web server indirectly via tethering cellular data services. Since
cellular services are charged at much higher rates than WiFi
services, it is better to limit the usage of the Bluetooth link to
reduce the cost. In addition, Bluetooth connection is not quite
reliable as it is easy to be interfered. Therefore, sometimes
we prefer to use WiFi only. For Ab actions, the download
load of WiFi and Bluetooth will be determined based on their
current available bandwidth. In other words, assuming that
the available bandwidth of WiFi and Bluetooth are bw and
bt, the segment size is SZ, then the downloading load of
WiFi is SZ · bw/(bw + bt). Aw represents the waiting action,
and the client will wait for W seconds, equal to the duration
of one segment. Since SVC video is encoded into different
layers and chopped into segments to be stored on the web
server, as long as the segment has not been played yet, the
higher enhancement layers can be requested to improve the
perceived streaming quality and smoothness.
Therefore, the
smooth action As is introduced in our work.
There are three principles to follow for the smooth action.
First, when the queue length is low, the smooth action will not
be taken as there is a high probability to experience playback
freeze soon. Thus the smooth action will be invoked only when
the queue length is sufficiently large, such as when the queue
length q is larger than a certain threshold Ts. Second, for
the segment which will be played soon, we will not take the
smooth action, because the requested enhancement layers may
miss the playback deadline.
Therefore we only take the smooth
action for a few number of buffered segments which are stored
in the tail part of the buffer. A smooth window with size Ls is
defined in our work to determine how many buffered segments
will be the candidate segments for the smooth action. In our
approach, the last Ls buffered segments are in the smooth
window as they are least likely to miss the playback deadline.
The last principle of the smooth action is to ensure that it will
not bring in any additional layer variation. As shown in Fig. 2,
generally, we will take the smooth action in four different
scenarios. Assume the smooth window size Ls = 6, and thus
all segments in are in the smooth window.
If only
one segment in the smooth window has the lowest number
of the enhancement layer, then we will smooth that segment
by requesting the same number of enhancement layers as the
other segments, which is shown in Fig. 2-(a). When there are
Comments
Post a Comment