A Real-Time Adaptive Algorithm for Video Streaming over Multiple Wireless Access Networks

Abstract:

Video streaming is gaining popularity among mobile users. The latest mobile devices, such as smart phones and tablets, are equipped with multiple wireless network interfaces. How to efficiently and cost-effectively utilize multiple links to improve video streaming quality needs investigation. In order to maintain high video streaming quality while reducing the wireless service cost, in this paper, the optimal video streaming process with multiple links is formulated as a Markov Decision Process (MDP). The reward function is designed to consider the quality of service (QoS) requirements for video traffic, such as the startup latency, playback fluency, average playback quality, playback smoothness and wireless service cost. To solve the MDP in real time, we propose an adaptive, best-action search algorithm to obtain a sub-optimal solution. To evaluate the performance of the proposed adaptation algorithm, we implemented a testbed using the Android mobile phone and the Scalable Video Coding (SVC) codec. Experiment results demonstrate the feasibility and effectiveness of the proposed adaptation algorithm for mobile video streaming applications, which outperforms the existing state-of-the-art adaptation algorithms.

INTRODUCTION

VIDEO streaming is gaining popularity among mobile users recently. Considering that the mobile devices have limited computational capacity and energy supply, and the wireless channels are highly dynamic, it is very challenging to provide high quality video streaming services for mobile users consistently. It is a promising trend to use multiple wireless network interfaces with different wireless communication techniques for mobile devices. For example, smart phones and tablets are usually equipped with cellular, WiFi and Bluetooth interfaces. Utilizing multiple links simultaneously can improve video streaming in several aspects: the aggregated higher bandwidth can support video of higher bit rate; when one wireless link suffers poor link quality or congestion, the others can compensate for it. High resilience to bandwidth variation and easy deployment are both important requirements for video streaming applications. Currently, progressive download, one of the most popular and widely deployed streaming techniques, buffers a large amount of video data to absorb the variations of bandwidth. Meanwhile, as video data are transmitted over HTTP protocols, the video streaming service can be deployed on any web server. However, the video quality version can only be manually selected by users and such decision can be error-prone. Since the smart phones only have limited storage space, it is impractical to maintain a very large buffer size. In addition, the buffered unwatched video may be wasted if the user turns off the video player or switches to other videos. Furthermore, progressive download typically does not support transmitting video data over multiple links. To overcome the above disadvantages of progressive download, dynamic adaptive streaming over HTTP (DASH) [1] has been proposed. In a DASH system, multiple copies of pre-compressed videos with different resolution and quality are stored in segments. The rate adaptation decision is made at the client side. For each segment, the client can request the appropriate quality version based on its screen resolution, current available bandwidth, and buffer occupancy status. This pull-based DASH scheme can be extended to support multiple links, i.e., we can let the client request different parts of one segment over different links. How to optimize this rate adaptation process for video streaming over multiple wireless links, considering the video quality of service (QoS) requirements, the wireless channel profiles, and the wireless service costs of multiple links is an open issue. In this paper, we formulate the multi-link video streaming process as a reinforcement learning task. For each streaming step, we define a state to describe the current situation, including the index of the requested segment, the current available bandwidth and other system parameters. A finitestate Markov Decision Process (MDP) can be modeled for this reinforcement learning task. The reward function is carefully designed to consider the video QoS requirements, such as the interruption rate, average playback quality, and playback smoothness, as well as the service costs. To make a trade-off between different QoS metrics and the cost, we can adjust the parameters of the reward function. To solve the MDP in real time, we proposed an adaptive best-action search algorithm to obtain a sub-optimal solution. A realistic testbed is implemented to better evaluate the performance of our solution. The main contributions of this paper are threefold. First, we formulate the video streaming process over multiple links as an MDP problem. To achieve smooth and high quality video streaming, we define several actions and reward functions for each state. Second, we propose a depth-first real-time search algorithm. The proposed adaptation algorithm will take several future steps into consideration to avoid playback interruption and achieve better smoothness and quality. Last, we implement a realistic testbed using an Android phone and Scalable Video Coding (SVC) encoded videos to evaluate the performance.

The experiment results show that the proposed adaptation algorithm is feasible for video streaming over multiple wireless access networks, and it outperforms the existing state-of-theart algorithms. The rest of the paper is organized as follows. In Section II, the related work is summarized. Section III describes the system model and the problem formulation. We present the real-time adaptive streaming algorithm in Section IV. The performance evaluation by experiments is presented in Section V, followed by the concluding remarks and further research issues in Section VI. II. BACKGROUND AND RELATED WORK DASH has been a hot topic in recent years. There are many commercial products which have implemented DASH in different ways, such as Apple HTTP Live Streaming and Microsoft Smooth Streaming. Since the clients may have different available bandwidth and display size, each video will be encoded several times with different quality, bit rate and resolution. All the encoded videos will be chopped into small segments and stored on the server, which can be a typical web server.

These small segments will be downloaded to the browsers’ cache and played by the client (browser or browser plug-in). The video rate adaptation is performed at the client side, which is also called the pull-based approach. The client will determine the quality version of the requested video segment according to its current available bandwidth, resolution and the number of buffered unwatched segments. After the current segment is completely downloaded, the rate adaptation algorithm will be invoked again for the next segment. There is extensive work covering this topic [2]–[6]. The authors in [2] proposed to estimate the bandwidth by a statistical method, and they took both the quality contribution and decoding time of each segment into consideration. K.P. Mok et al. presented a QoE aware DASH system [3].

Their algorithm estimates the available bandwidth by probing with the video data. In order to keep the quality level as smooth as possible, their algorithm will switch the video quality version gradually and will try to maintain the buffer level being stable. S. Akhshabi designed an evaluation method in [7] to test the performance of several existing commercial DASH products, such as Smooth Streaming, Netflix, and OSMF. In [6], T. Kupka proposed to evaluate the performance of live DASH under on/off traffic and tested four different methods to improve the performance. In [5], how to reduce unnecessary video quality variations using a probing method to identify the effective available bandwidth was given.

In [4], [8], the authors designed the optimal rate adaptation algorithm for streaming scalable video coding (SVC) over HTTP using MDP. With SVC, each video frame is encoded into a base layer and several enhancement layers. Higher video quality can be achieved when more layers are received. These works only considered the single-link case, and in this work, we consider the more challenging case with multiple access links. How to efficiently deliver video in heterogeneous wireless networks has been extensively studied, and approaches ranging from the physical layer to the application layer have been proposed [9]–[13]. Recently, a few approaches have appeared to extend the DASH technique to support multiple links. In [14], the authors summarized three typical schemes of utilizing multiple links. They compared the performance of these schemes through extensive simulations. Kaspar et al. proposed an approach to implementing DASH over multiple links [15].

In their algorithm, each segment will be transmitted over one link. Thus multiple segments can be transmitted at the same time. To reduce the overhead, they used HTTP pipelining to improve the performance. This approach may lead to the “last-segment problem” due to the link transmission speed difference. To overcome this disadvantage, in [16]–[18], Evensen et al. suggested to divide each segment into small sub-segments, and these sub-segments can be downloaded through different links. Their algorithm estimates the available bandwidth according to the throughput of the previous segment, and selects the video quality version most close to the estimated bandwidth. In the evaluation section, we will compare the performance of our proposed solution with the above state-of-the-art one [18].

III. SYSTEM MODEL AND STREAMING PROCESS FORMULATION

A. System Model We consider how to utilize multiple wireless access networks together for video streaming, e.g., using a combination of cellular, WiFi, and/or Bluetooth simultaneously. Here, as an example, Bluetooth and WiFi access networks are considered as we do not have end-to-end control over cellular links, and our work can be extended when other types of wireless access networks or more than two wireless access networks are used1. Since a wireless channel may suffer from time-varying fading, shadowing, interference and congestion, the available bandwidth of a wireless link may vary all the time. In addition, different smart phones or tablets may have different screen size and resolution. Taking these two aspects into consideration, the server should store several copies of video with different quality. The videos are encoded with SVC into a base layer and several enhancement layers, and chopped into segments and each segment can be played with a fixed duration. We design a pull-based algorithm for video streaming, as shown in Fig. 1. After initialization, the client will request the video information which includes video resolutions, bitrates and qualities from the server through both the WiFi and Bluetooth links. The rate adaptation agent will request a video segment of appropriate quality version based on the current queue length and estimated available bandwidth. Once the request decision is made, HTTP requests over both WiFi and Bluetooth will be issued to download the video segment. This process will continue until the completion of downloading the last segment or the termination of the video streaming by the user. 1A promising multi-link scenario is to use both WiFi and 3G/LTE celluar links for video streaming. In our testbed, as the Android OS restricted the utilization of WiFi and 3G to access Internet simultaneously, we use WiFi and Bluetooth to test our multi-link streaming solution.

Streaming Process Formulation The video streaming process can also be considered as the interaction between two modules. As shown in Fig. 1, the downloading and estimation steps in the top grey rectangle can be viewed as an integrated environment module, and the rate adaptation agent can be viewed as an agent module. The video streaming process can be formulated as a reinforcement learning task [19]. The environment sends a state signal for each video segment to the agent, and the agent will determine the best action correspondingly. For each action, the environment replies a reward to the agent. Considering the Markov property of the system states, a Markov Decision Process (MDP) can be formulated for the streaming process, and the state transition model of the Markov process needs to be devised.

We define step n as to download segment n, so the total number of steps equals the number of segments. For each step n, we define the state as sn = {qn, Δqn, vn, Δvn, tn, Δtn, bwn, btn, dn}, with the parameters defined as follows. qn represents the number of unplayed queued segments, which is also called the buffer level (or queue length), with the range between 0 and qL (which is the maximum queue length). vn is the SVC video layer index of the n-th segment. In our work, there are L SVC video layers in total, and a larger number represents a higher-quality layer. Δqn and Δvn are the variations of qn and vn respectively, i.e., Δqn = qn − qn−1, and Δvn = vn − vn−1. The total traffic of Bluetooth used to download the previous segment is recorded by tn, and Δtn = tn − tn−1. The current bandwidth states of the WiFi link and the Bluetooth link are described by bwn and btn, respectively. dn indicates the current requested segment index. As the total number of video segments is NT , dn is in the range of [0, NT ]. When the rate adaptation agent receives input of state sn from the environment, it will make the decision of which action should the environment take. We define four types of actions, Ab, Au, Aw, and As. The first two actions are for downloading the next segment at the quality determined by v, and Δv determines whether to upgrade, downgrade or maintain the current quality. Once the quality v is determined, the base layer and all enhancement layers belong to quality v will be combined together to be downloaded. Considering that drastic video quality variation will significantly decrease the perceived video quality, we avoid those drastic layer change actions by restricting the video layer change level Δv to one. The difference of Ab and Au is that, with Ab, both the WiFi and Bluetooth links are used to download the segment simultaneously, while with Au, only the WiFi link is used. As a short distance wireless communication technology, Bluetooth can only help the client connect to the web server indirectly via tethering cellular data services. Since cellular services are charged at much higher rates than WiFi services, it is better to limit the usage of the Bluetooth link to reduce the cost. In addition, Bluetooth connection is not quite reliable as it is easy to be interfered. Therefore, sometimes we prefer to use WiFi only. For Ab actions, the download load of WiFi and Bluetooth will be determined based on their current available bandwidth. In other words, assuming that the available bandwidth of WiFi and Bluetooth are bw and bt, the segment size is SZ, then the downloading load of WiFi is SZ · bw/(bw + bt). Aw represents the waiting action, and the client will wait for W seconds, equal to the duration of one segment. Since SVC video is encoded into different layers and chopped into segments to be stored on the web server, as long as the segment has not been played yet, the higher enhancement layers can be requested to improve the perceived streaming quality and smoothness.

Therefore, the smooth action As is introduced in our work. There are three principles to follow for the smooth action. First, when the queue length is low, the smooth action will not be taken as there is a high probability to experience playback freeze soon. Thus the smooth action will be invoked only when the queue length is sufficiently large, such as when the queue length q is larger than a certain threshold Ts. Second, for the segment which will be played soon, we will not take the smooth action, because the requested enhancement layers may miss the playback deadline.

Therefore we only take the smooth action for a few number of buffered segments which are stored in the tail part of the buffer. A smooth window with size Ls is defined in our work to determine how many buffered segments will be the candidate segments for the smooth action. In our approach, the last Ls buffered segments are in the smooth window as they are least likely to miss the playback deadline. The last principle of the smooth action is to ensure that it will not bring in any additional layer variation. As shown in Fig. 2, generally, we will take the smooth action in four different scenarios. Assume the smooth window size Ls = 6, and thus all segments in are in the smooth window.

If only one segment in the smooth window has the lowest number of the enhancement layer, then we will smooth that segment by requesting the same number of enhancement layers as the other segments, which is shown in Fig. 2-(a). When there are

SPRING SOURCE TECHNOLOGIES

Search This Blog

A Real-Time Adaptive Algorithm for Video Streaming over Multiple Wireless Access Networks

Comments

Post a Comment

Popular posts from this blog

Jio

Enabling Cloud Storage Auditing with Verifiable Outsourcing of Key Updates

PUNCHING MACHINE