1040 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 6, JUNE 2013

Distributed Wireless Visual Communication With Power Distortion Optimization

Xiaopeng Fan, Member, IEEE, Feng Wu, Fellow, IEEE, Debin Zhao, and Oscar C. Au, Fellow, IEEE

Abstract: This paper proposes a novel framework called DCast for distributed video coding and transmission over wireless networks, which is different from existing distributed schemes in three aspects. First, coset quantized DCT coefficients and motion data are directly delivered to the channel coding layer without syndrome or entropy coding. Second, transmission power is directly allocated to coset data and motion data according to their distributions and magnitudes without forward error correction. Third, these data are transformed by Hadamard and then directly mapped using a dense constellation (64K-QAM) for transmission without Gray coding. One of the most important properties in this framework is that the coding and transmission rate is fixed and distortion is minimized by allocating the transmission power. Thus, we further propose a power distortion optimization algorithm to replace the traditional rate distortion optimization. This framework avoids the annoying cliff effect caused by the mismatch between transmission rate and channel condition. In multicast, each user can get approximately the best quality matching its channel condition. Our experimental results show that the proposed DCast outperforms the typical solution using H.264 over 802.11 by up to 8 dB in video PSNR in video broadcast. Even in video unicast, the proposed DCast is still comparable to the typical solution.

Index Terms: Distributed video coding (DVC), softcast, wireless visual communication.

I. Introduction

DISTRIBUTED video coding (DVC) [1]-[4] is an attractive scheme for video compression that has emerged in the past decade. Different from conventional video coding schemes, it utilizes cross-frame correlation only at the decoder. This has several unique advantages.
First, DVC can shift intensive computation from encoder to decoder, which is appealing for low complexity video encoding applications. Second, the DVC framework is robust to transmission errors, which is desirable for wireless applications. Although it has been proven that, for some typical sources, the theoretical coding performance is the same whether the source correlation is utilized at the encoder or at the decoder [5], [6], the actual coding performance of DVC is still far inferior to that of the conventional H.264 standard [7].

Manuscript received June 4, 2012; revised October 18, 2012 and December 15, 2012; accepted January 26, 2013. Date of publication February 25, 2013; date of current version May 31, 2013. This work was supported in part by the Major State Basic Research Development Program of China (973 Program 2009CB320905), the Program for New Century Excellent Talents in University (NCET) of China (NCET-11-0797), the National Science Foundation of China under Grant 61100095, and the Fundamental Research Funds for the Central Universities under Grant HIT.BRETIII.201221. This paper was recommended by Associate Editor E. Magli.
X. Fan and D. Zhao are with the Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China (e-mail: fxp@hit.edu.cn).
F. Wu is with Microsoft Research Asia, Beijing 100080, China.
O. C. Au is with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2013.2249019
1051-8215/$31.00 © 2013 IEEE

In DVC, quantized transform coefficients are converted to bit planes and compressed to bits by syndrome or entropy coding [2], [4], [8]. The syndrome coding is implemented via channel codes (e.g., low-density parity-check codes). These channel codes are also typically applied for error protection in the physical (PHY) layer. Therefore, Xiong et al.
[9], [10] propose a joint source-channel coding (JSCC) framework for distributed video transmission based on their previous work on JSCC of binary sources. Except for these JSCC works, the transmission of distributed coded video is still similar to that of conventional coded video.

Recently, a joint video coding and transmission scheme, named Softcast [11], [12], has been proposed for wireless video multicasting. The key idea in Softcast is that transform coefficients are not compressed by entropy coding. Instead, they are directly transmitted through a dense constellation after allocating a certain power, such that the received data can be decoded under any channel condition. The decoded data is not error free and its signal-to-noise ratio (SNR) depends on the channel condition for a given transmission power. Although the video coding layer of Softcast is simply done through 2-D or 3-D transformation, Softcast still outperforms the typical solution using H.264 over 802.11 in video multicast.

The current Softcast only adopts 3-D DCT to exploit the cross-frame correlation. Research in scalable video coding has demonstrated that this is inefficient due to the lack of motion alignment among frames [13]-[15]. However, motion compensation (MC) as in H.264 is difficult to adopt in Softcast, because in Softcast the reconstructed frames are determined by channel noise and the encoder can hardly obtain the same reconstructed frames as the decoder.

Thus this paper proposes a novel framework called DCast, which not only utilizes the cross-frame correlation by motion alignment but also retains the nice properties provided by Softcast. In the proposed DCast, transformed coefficients are first coset quantized and then transmitted as in Softcast. Similar to other DVC frameworks, DCast utilizes the cross-frame correlation at the decoder. The proposed DCast has two different approaches to process motion vectors (MVs). Like most traditional DVC schemes, in the first approach motion vectors are estimated at the decoder. It does not need reference frames at the encoder, and greatly reduces the encoding complexity. But the side information may not be accurate, thus leading to low coding efficiency. In the second approach, motion vectors are estimated at the encoder and then transmitted to the decoder. Several other DVC schemes also propose to estimate motion vectors at the encoder and transmit them to the decoder to improve the quality of the side information [16], [17]. The initial results of these two approaches have been reported in [18] and [19]. In this paper we focus our study on the second approach, but both of them are evaluated.

Fig. 1. Compression of X when its side information S is available at the decoder.

The key technical contribution of this paper is the proposed power distortion optimization. In the proposed DCast, each pair of quantized DCT coefficients or transformed motion vectors is transmitted in one time slot and thus the transmission rate is fixed. The distortion is minimized by optimally allocating transmission power. This paper evaluates the impact of the channel noise on the distortion of the motion vectors, and then the impact of this distortion on the distortion of the reconstructed video, via the power spectrum approach [20]. Furthermore, a joint power optimization among coefficients and motion data is derived.

Our experimental results show that the proposed DCast can outperform Softcast by up to 2 dB in video PSNR as it can better utilize the cross-frame correlation. Compared with the typical solution using H.264 over 802.11, the proposed DCast can gain up to 8 dB in video PSNR in multicast. Even in unicast, it is still comparable to the typical solution of H.264 over 802.11.

The rest of this paper is organized as follows. Section II briefly reviews the related work on distributed video coding and transmission. Section III introduces the proposed DCast including both encoder and decoder.
Section IV discusses the proposed power distortion optimization. Section V presents our experimental results and compares them with Softcast and H.264 over 802.11. Finally, Section VI concludes this paper.

II. Related Works

A. Distributed Video Coding

Compressing a source whose prediction is only available at the decoder is a typical problem in distributed source coding (DSC). As shown in Fig. 1, X is the source to be compressed (possibly representing the source video), and S is its side information (possibly representing the predicted frame). The theoretical foundations of DSC, the Slepian-Wolf theorem [5] and the Wyner-Ziv theorem [6], show that the source X can be efficiently compressed with its predictor S available only at the decoder. In practice, efficient DSC can be achieved by coset coding, turbo coding and LDPC coding [21], [22].

Accompanied by advances in practical DSC solutions, DVC has emerged over the past decade. Puri et al. [3], [4] propose a DVC framework called PRISM, which implements DVC by coset coding and supports motion estimation (ME) at the decoder. The main attributes of PRISM include the increased robustness to channel losses and more flexible sharing of computational complexity between encoder and decoder. Another DVC work is the low complexity framework proposed in [1] and [2]. In this framework, DVC is implemented by turbo codes, while the motion estimation at the decoder is based on motion compensated interpolation (MCI) and motion compensated extrapolation (MCE).

Although DVC has shown unique advantages in visual communication, its compression efficiency is much lower than that of the conventional framework [2], [23]. In recent years, much research has focused on improving the performance of DVC. Enabling transform coding [24], [25] and intra/inter mode selection [26]-[28] allows DVC to exploit not only inter but also intra frame redundancy. Hash based DVC lets the encoder send a hash code to the decoder to improve the accuracy of ME and the side information quality [29].
Successive refinement schemes [30]-[33] perform ME and DVC decoding alternately and recursively, such that the MVs and the reconstructed frame are successively refined during the decoding process. More accurate correlation estimation in DVC improves the utilization of the side information [34]-[37].

Different from these DVC schemes, the proposed DCast directly delivers coset quantized coefficients and motion vectors to the channel coding layer. Furthermore, when coefficients and motion vectors are transmitted from encoder to decoder, they are allowed to be corrupted by channel noise. It is clear from our results that DVC is robust to noise embedded in the received data.

B. Distributed Video Transmission Over Wireless Network

The transmission of distributed coded video is usually similar to the transmission of conventional coded video in the PHY layer of a wireless network. Coded binary data is first protected by channel coding and then mapped to a constellation for transmission.

When syndrome coding is adopted, DVC coding and channel coding can be jointly optimized. Xu et al. [9] made the first attempt to study DVC from a JSCC perspective. It is a layered coding scheme, where the enhancement layer uses a Raptor code for both video compression and data protection. In another frame-based JSCC scheme [16], the functionality of both DVC and channel coding is implemented universally by one error correction code.

In these JSCC schemes, distributed video transmission is actually processed as data transmission. The transmission errors are expected to be corrected in the JSCC decoder. Thus many bits are spent in channel coding to correct transmission errors. In the proposed DCast, quantized coefficients and transformed motion vectors are directly transmitted after allocating a certain power. Although the received data after decoding may still contain a certain amount of channel noise, it is more efficient in power consumption because some received noise can be tolerated by DVC.

Fig. 2. DCast server (for inter frames).

C. Softcast

Softcast is a simple but comprehensive design for wireless video multicast, covering the functionality of video compression, data protection and transmission in one scheme [11]. The Softcast encoder consists of the following steps: 1) DCT transform, 2) power allocation, 3) Hadamard transform, and 4) direct dense modulation. The transform removes the spatial redundancy of a video frame. Power allocation minimizes the total distortion by optimally scaling the transform coefficients. The Hadamard transform is in some sense a precoding to make packets with equal power and equal importance. After that, the data is directly mapped into the wireless symbols by a very dense QAM. The decoder uses a linear least square estimator (LLSE) to reconstruct the signal.

Almost all the steps in Softcast are linear operations and thus the channel noise is directly transformed into reconstruction noise of the video. Therefore, Softcast is asymptotically robust in the sense that each user can get the visual quality matching its channel condition. However, Softcast exploits the intra-frame correlation only and thus is not very efficient in terms of video compression.

Recently, Aditya et al. [38] proposed another video coding and transmission scheme called Flexcast. It removes entropy coding from conventional video coding and adopts rateless channel coding to handle channel variation. Thus, it has better coding efficiency. However, Flexcast is a unicast approach and can hardly multicast or broadcast video to users with different SNRs simultaneously because of motion compensation.

In a recent improved version of Softcast, the utilization of 3-D DCT partially enables inter frame compression [12]. However, without motion alignment the inter frame correlation is still not fully exploited. The proposed DCast not only fully utilizes the cross-frame correlation but also retains the good properties of Softcast.
DCast enables inter frame coding by DVC rather than conventional motion compensation. Instead of transmitting a video frame itself like Softcast, DCast transmits the coset codes of the video frame such that the frame can be reconstructed by utilizing the prediction frame as side information at the decoder. This saves transmission power (or equivalently increases the SNR) because the coset data typically have much smaller magnitude than the original data.

Recently, we also noticed that Kochman et al. [39] have studied the utilization of coset coding in the Wyner-Ziv Dirty-Paper problem and proved its optimality and asymptotic robustness in multicast applications. It can be considered in general as the theoretical foundation to support the proposed DCast.

III. Proposed DCast Framework

DCast divides input video sequences into groups of pictures (GOP). In each GOP, the first frame is an intra (coded) frame, while the following frames are inter frames. The compression and transmission of the intra frame in DCast is the same as in Softcast, which consists of DCT, power allocation and Hadamard transform. In the rest of this paper, we focus on the compression and transmission of inter frames. For simplicity, we mainly discuss the case with motion vectors estimated at the encoder.

Fig. 2 depicts the server side of DCast. DCast first transforms the current frame into the DCT domain. Meanwhile, DCast performs ME and MC on the original video sequence to get the encoder predictions and MVs. Then DCast applies coset coding on the transform coefficients of the original image to get the coset data for each DCT coefficient. The quantization step size of the coset coding is determined at the encoder according to the estimated decoder prediction noise. The MVs of the current frame, in the form of a matrix, are also transformed by DCT. The coset data and the motion data are then scaled for power distortion optimization (PDO). The scaling factors and other metadata are transmitted by using a conventional scheme consisting of variable length coding (VLC), forward error correction (FEC), and BPSK modulation.
The scaled coefficients are transformed by Hadamard as precoding to make packets with equal power and equal importance. After that, the resulting coefficients are mapped to complex symbols directly by a very dense constellation (64K-QAM). Each coefficient is quantized into an 8-bit integer and every two integers compose one complex number with 64K possible values. At last, these complex numbers are passed into a raw OFDM module, undergoing inverse FFT and D/A conversion for transmission.

The receiver side of DCast is depicted in Fig. 3. The raw OFDM module performs A/D conversion and FFT to reconstruct the modulated data including both the scaled coefficients and the metadata. The metadata is demodulated and decoded first. Then the scaled coefficients are reconstructed by inverse 64K-QAM and inverse Hadamard transform. The inverse 64K-QAM here does nothing but split each complex value back into two real values. Each real value here is actually the 8-bit integer plus channel noise. After the inverse Hadamard transform, linear minimum mean square error (LMMSE) estimation of the residue coefficients and the MV coefficients is performed. Then the MVs are transformed back to the spatial domain by inverse DCT. After this, the MC module generates the predicted frame from the MVs and the reference frame. The predicted frame is transformed into the frequency domain by DCT. Then, with the coset residues and the predictors, the coset decoding module recovers the DCT coefficients of the current frame. At last, the signals are transformed back to the spatial domain, and are linearly combined with the predicted signals by LMMSE to generate the final reconstruction.

A. Coset Coding

Coset coding is a typical technique used in DSC. It partitions the set of possible input source values into several cosets and transmits the coset index to the decoder. With the coset index and the predictor, the decoder can recover the source value by choosing the one in the coset closest to the predictor. Coset coding achieves compression because the coset index typically has lower entropy than the source value.

Let $X$ be the DCT coefficients of the original video frame. DCast encodes $X$ to get coset values $C$. DCast divides the coefficients into 64 subbands according to the frequency. Let $X_i$ be the $i$th subband of $X$, and $C_i$ be the $i$th subband of $C$. For each $i$, DCast quantizes the $i$th subband of $X$ by a uniform scalar quantizer $Q_i(\cdot)$ with step $q_i$ and gets the residue value [39] by

$$C_i = X_i - Q_i(X_i) = X_i - \left\lfloor \frac{X_i}{q_i} + \frac{1}{2} \right\rfloor q_i \tag{1}$$

This coset coding actually throws away the main part of $X_i$. In some sense $C$ represents the detail of $X$.

At the client side, with the side information $S$ (i.e., the predicted DCT coefficients) and the received coset value $\hat{C}$, the receiver reconstructs the DCT coefficients by coset decoding. Let $S_i$ be the $i$th subband of $S$, and $\hat{C}_i$ be the $i$th subband of $\hat{C}$.
Since $S_i$ is close to $X_i$, $S_i - \hat{C}_i$ is around $X_i - C_i$. Thus $S_i - \hat{C}_i$ is around $Q_i(X_i)$ from (1). The quantizers are carefully designed such that applying the quantization $Q_i(\cdot)$ on $S_i - \hat{C}_i$ yields $Q_i(X_i)$, i.e.,

$$Q_i(X_i) = Q_i(S_i - \hat{C}_i) \tag{2}$$

with high probability. Therefore, each subband of coefficients is decoded by

$$\hat{X}_i = Q_i(S_i - \hat{C}_i) + \hat{C}_i \tag{3}$$

where $\hat{X}$ is the reconstruction of $X$, and each $\hat{X}_i$ is the $i$th subband of $\hat{X}$. When the coset decoding is successful, i.e., $Q_i(X_i) = Q_i(S_i - \hat{C}_i)$, the reconstruction noise is

$$\hat{X}_i - X_i = \hat{C}_i - C_i. \tag{4}$$

B. Estimation of Coset Quantization Step

The value of each coset step $q_i$ is crucial to the coding performance of DCast. If $q_i$ is too small, the coset decoding may fail. On the other hand, if $q_i$ is too large, the coset value $C_i$ in (1) will be large and will consume a lot of transmission power to keep the distortion small. The value of each $q_i$ is determined as follows. Injecting (1) into (2), we get

$$Q_i(X_i) = Q_i\big(S_i - \hat{C}_i + C_i - X_i + Q_i(X_i)\big) \tag{5}$$

$$\phantom{Q_i(X_i)} = Q_i(X_i) + Q_i\big(S_i - \hat{C}_i + C_i - X_i\big) \tag{6}$$

To guarantee successful coset decoding, the last term should be 0. This means the quantization step $q_i$ should satisfy

$$q_i > 2\,\big|S_i - X_i + C_i - \hat{C}_i\big| \tag{7}$$

In this equation, $S_i - X_i$ is the prediction noise at the decoder and $C_i - \hat{C}_i$ is the reconstruction noise of the coset value $C_i$ due to transmission. In this paper, we assume they are independent Gaussian sources. We let each $q_i$ be $2n$ times the standard deviation of $S_i - X_i + C_i - \hat{C}_i$, i.e.,

$$q_i^2 = 4n^2 \sigma^2_{S_i - X_i + C_i - \hat{C}_i} \tag{8}$$

and this guarantees that condition (7) is satisfied with probability

$$\Pr = \operatorname{erf}\!\left(n/\sqrt{2}\right) \tag{9}$$

Under the same assumption, the variance of $S_i - X_i + C_i - \hat{C}_i$ is the sum of the variances of $S_i - X_i$ and $C_i - \hat{C}_i$, i.e.,

$$\sigma^2_{S_i - X_i + C_i - \hat{C}_i} = \sigma^2_{S_i - X_i} + \sigma^2_{C_i - \hat{C}_i} \tag{10}$$

and each $q_i$ can be calculated by

$$q_i^2 = 4n^2\left(\sigma^2_{S_i - X_i} + \sigma^2_{C_i - \hat{C}_i}\right) \tag{11}$$

In our implementation, we let $n = 3$ such that the coset decoding is successful for more than 99.7% of the coefficients. In (11), $\sigma^2_{S_i - X_i}$ is the variance of the hypothetical residue between the source and the side information, and it is estimated by simulating at the encoder a receiver with the target channel SNR. $\sigma^2_{C_i - \hat{C}_i}$ is the distortion of the coset value $C_i$ due to transmission. It is also the distortion of the source $X_i$ according to (4).
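The coset relations (1)-(4) and the step rule (11) can be illustrated with a minimal Python sketch. The function names, the NumPy dependency, and the synthetic Gaussian signals are assumptions made for this example only; they are not taken from the DCast implementation.

```python
import numpy as np

def coset_step(var_pred, var_chan, n=3.0):
    """Step size q from (11): q^2 = 4 n^2 (var_pred + var_chan)."""
    return 2.0 * n * np.sqrt(var_pred + var_chan)

def quantize(x, q):
    """Uniform scalar quantizer Q(x) = floor(x/q + 1/2) * q, as in (1)."""
    return np.floor(np.asarray(x) / q + 0.5) * q

def coset_encode(x, q):
    """Coset value C = X - Q(X), eq. (1)."""
    return x - quantize(x, q)

def coset_decode(s, c_hat, q):
    """Reconstruction X_hat = Q(S - C_hat) + C_hat, eq. (3)."""
    return quantize(s - c_hat, q) + c_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 50.0, 10000)          # hypothetical source subband
    s = x + rng.normal(0.0, 2.0, x.size)      # side information, noise variance 4
    q = coset_step(4.0, 1.0)                  # n = 3 as in the text
    c = coset_encode(x, q)
    c_hat = c + rng.normal(0.0, 1.0, x.size)  # coset value after the channel, noise variance 1
    x_hat = coset_decode(s, c_hat, q)
    ok = np.isclose(x_hat - x, c_hat - c)     # eq. (4) holds exactly when decoding succeeds
    print("fraction of successful decodes:", ok.mean())
```

In this sketch a decoding failure shows up as an error of one or more full steps q, which is why the step is tied to the combined noise variance rather than to the source variance.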
$\sigma^2_{C_i - \hat{C}_i}$ is related to both the residue $\sigma^2_{S_i - X_i}$ and the channel SNR. The explicit expression of $\sigma^2_{C_i - \hat{C}_i}$ is given in Section IV.

C. Power Allocation

DCast transmits both the coset values and the motion information. Thus, it has two levels of power allocation. The first level is the allocation between MV data and coset data. The second level is the allocation within MV coefficients or coset coefficients. The optimal power allocation between MV data and coset data is given in Section IV. The optimal power allocation within coset coefficients and the optimal power allocation within MV coefficients are as follows.

Let $P_{coset}$ be the total power of the coset data, and $g_{C_i}$ be the gain (scaling factor) of $C_i$. The problem is how to minimize the reconstruction distortion of $X$ by optimally allocating power among the $C_i$. Under the assumption that the coset decoding is successful with high probability, the reconstruction distortion of $X$ is equal to the reconstruction distortion of $C$ according to (4). This means that the problem now becomes how to minimize the reconstruction distortion of $C$ by optimally allocating power among the $C_i$. Thus the solution has a similar form to the one in Softcast [12], i.e.,

$$\tilde{C}_i = g_{C_i} C_i, \qquad g_{C_i} = \sigma_{C_i}^{-1/2} \left( \frac{P_{coset}}{\sum_j \sigma_{C_j}} \right)^{1/2} \tag{12}$$

where $\tilde{C}$ is the coset value after power allocation, $\tilde{C}_i$ is the $i$th subband of $\tilde{C}$, and $\sigma_{C_i}$ is the standard deviation of $C_i$. This power allocation tends to scale down large coefficients to get better performance under the constrained total power. The encoder calculates the variance $\sigma^2_{C_i}$ for each subband and transmits it to the decoder. With $\sigma^2_{C_i}$, both the encoder and the decoder calculate the gain $g_{C_i}$ for each $C_i$ by (12).

Fig. 3. DCast receiver (for inter frames).

On MV data, DCast also performs power allocation. To apply power allocation, the encoder performs 2-D DCT on the MVs (the whole MV field) and gets transform coefficients $M$. Note that each MV contains horizontal and vertical components, and the transform is actually applied to both components separately. Each coefficient $M_i$ is then considered as a subband. The encoder applies a similar optimal power allocation over $M$, i.e.,

$$\tilde{M}_i = g_{M_i} M_i, \qquad g_{M_i} = \sigma_{M_i}^{-1/2} \left( \frac{P_{mv}}{\sum_j \sigma_{M_j}} \right)^{1/2} \tag{13}$$

where $\tilde{M}$ is the MV data after power allocation, $\tilde{M}_i$ is the $i$th subband of $\tilde{M}$, $\sigma_{M_i}$ is the standard deviation of $M_i$, and $P_{mv}$ is the total power for motion data.

Since each subband of $M$ contains only one coefficient, it is not efficient to transmit the variance of each subband. In this light, DCast only transmits the average variance $\sigma_M^2 = \frac{1}{n}\sum_i \sigma^2_{M_i}$, where $n$ is the number of subbands. As shown in our previous work [19], $\sigma^2_{M_i}$ and $g_{M_i}$ are calculated by using $\sigma_M^2$. Under the assumption that the motion field is a random Markov field where the correlation coefficient between two neighboring MVs is $\rho$, each $\sigma^2_{M_i}$ can be calculated by

$$\sigma^2_{M_i} = \sigma_M^2 V_{M_i} \tag{14}$$

where $V_{M_i}$ is the $i$th element of the matrix $V_M$, and

$$V_M = \operatorname{diag}\!\big(\text{2D-DCT}(R^{(h)})\big)\, \operatorname{diag}\!\big(\text{2D-DCT}(R^{(w)})\big)^T \tag{15}$$

is a constant matrix for given $\rho$.
Here the function $\operatorname{diag}(\cdot)$ produces the diagonal elements of the input matrix in the form of a column vector, and $\text{2D-DCT}(\cdot)$ denotes the 2-D DCT transform. $w$ and $h$ are the width and height of the motion field, respectively, and

$$R^{(k)} = \begin{bmatrix} 1 & \rho & \cdots & \rho^{k-1} \\ \rho & 1 & \cdots & \rho^{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{k-1} & \rho^{k-2} & \cdots & 1 \end{bmatrix} \tag{16}$$

The value of $\sigma_M^2$ is calculated at the encoder and is transmitted to the decoder as mentioned in the previous section. Both the encoder and the decoder calculate the value of each $\sigma^2_{M_i}$ by (14)-(16). In our experiments we let $\rho = 0.7$ according to statistics over several different video sequences. With each $\sigma^2_{M_i}$, the optimal power allocation gain $g_{M_i}$ for each subband is calculated at both encoder and decoder by (13). The decoder needs the value of $g_{M_i}$ in (13) to reconstruct the signal.

D. Packaging and Transmission

Similar to Softcast [12], DCast transmits a small amount of binary symbols but mainly real-valued symbols. The organization of the symbol stream is as follows. The symbol stream consists of a header and a following data stream.

symbol stream = {header bitstream, data stream} (17)

The header bitstream contains the metadata, including the coset variances $\sigma^2_{C_i}$, the quantization steps $q_i$, the average MV variance $\sigma_M^2$ and other useful parameters.

header bitstream = {coset variances, quantization steps, average MV variance, parameters} (18)

The header information is coded in a conventional way. The encoder applies 8-bit scalar quantization on $\sigma_{C_i}$, $q_i$ and $\sigma_M$, respectively. Then the quantization results are compressed by variable length coding (VLC). The VLC is the universal one used for coding motion vectors in H.264 [7]. The compressed header bitstream is transmitted by the standard 802.11 PHY layer at the lowest speed, i.e., by using a rate-1/2 convolutional code and BPSK modulation. This is to make sure the header bits are decoded correctly when the channel SNR is in the typical working range (5-25 dB) of 802.11. Note that the size of the header is very small compared to the whole data of one frame. According to our experiments, the proportion of the bandwidth required by headers is less than 3%.

The data stream contains the coset data $\tilde{C}$ and the MV data $\tilde{M}$. Similar to Softcast [12], DCast applies the Hadamard transform on the coset data $\tilde{C}$ and the MV data $\tilde{M}$ to create packets with equal energy. Coset data and MV data are mixed together and then every 64 numbers are grouped for the Hadamard transform. This forms the data stream

data stream = H{coset data, MV data}. (19)

Note that the data stream consists of real values rather than binary values. In the PHY layer, these real values are mapped to complex symbols directly by a 64K-QAM constellation [12]. This constellation is a typical N-QAM constellation with N equal to 65536 (256 by 256). Each input real value is quantized into an 8-bit integer by a uniform scalar quantizer. The dynamic range of the quantizer is formed by the minimal and maximal input value. It is calculated for each frame at the encoder and sent to the decoder as a parameter in (18). After this quantization, every two integers compose one complex number as the output of the 64K-QAM constellation.

An inverse FFT is computed on each packet of symbols, giving a set of complex time-domain samples. These samples are then quadrature-mixed to passband in the standard way. The real and imaginary components are first converted into the analogue domain using D/A converters. The analogue signals are then used to modulate cosine and sine waves at the carrier frequency, respectively. These signals are then summed to give the transmission signal.

In DCast, both MV data and coset data are transmitted by the aforementioned direct source channel mapping.
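The direct mapping described in this subsection (Hadamard mixing of 64-value groups, 8-bit uniform quantization over the frame's dynamic range, and pairing of integers into complex symbols) can be sketched as follows. This is an illustrative sketch only: the function names, the 0 to 255 level convention, and the use of an orthonormal Sylvester-Hadamard matrix are assumptions of this example, and the OFDM stage is omitted.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Sylvester-Hadamard matrix; n is assumed to be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def to_symbols(values, block=64):
    """Mix groups of `block` reals, quantize to 8 bits, pair into complex symbols."""
    mixed = (values.reshape(-1, block) @ hadamard(block)).ravel()
    lo, hi = mixed.min(), mixed.max()            # dynamic range, sent as a header parameter
    levels = np.round((mixed - lo) / (hi - lo) * 255.0)
    return levels[0::2] + 1j * levels[1::2], (lo, hi)

def from_symbols(symbols, dyn_range, block=64):
    """Inverse mapping: split complex values into reals, rescale, undo the Hadamard mix."""
    lo, hi = dyn_range
    levels = np.empty(2 * symbols.size)
    levels[0::2], levels[1::2] = symbols.real, symbols.imag
    mixed = levels / 255.0 * (hi - lo) + lo
    # The orthonormal Sylvester-Hadamard matrix is symmetric, so it is its own inverse.
    return (mixed.reshape(-1, block) @ hadamard(block)).ravel()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=128)                  # hypothetical scaled coset/MV values
    sym, rng_info = to_symbols(data)
    print(sym.size, np.abs(from_symbols(sym, rng_info) - data).max())
```

Because every step except the 8-bit rounding is linear, noise added to the symbols would pass straight through `from_symbols` into the recovered values, which is the behaviour the receiver-side estimators rely on.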
This makes the system adaptive to fluctuations of the channel SNR. Given a transmitter, high SNR users receive accurate MVs and coset values and reconstruct high quality video. Meanwhile, low SNR users receive noisy MVs and coset values, and derive a noisy prediction frame based on the noisy MVs. However, the coset decoding in DCast has good tolerance to the noise of the prediction. Thus, the low SNR users can still reconstruct the video.

E. LMMSE at Decoder

The proposed approach contains two LMMSE estimators, operating in the transform domain and the spatial domain, respectively.

The purpose of the first LMMSE estimator is to reconstruct the coset data $C$ and the MV data $M$ in the transform domain with minimum distortion. Let $Y$ be the received signal after the inverse Hadamard transform. $Y$ contains the noisy version of the coset data and the MV data. $Y$ can be written as

$$Y = \begin{bmatrix} \dot{C} \\ \dot{M} \end{bmatrix} \tag{20}$$

where $\dot{C}$ is the noisy version of the coset data and $\dot{M}$ is the noisy version of the MV data. Let $W^{(C)}$ and $W^{(M)}$ be the channel noise in $\dot{C}$ and $\dot{M}$, respectively. Let $\dot{C}_i$, $\dot{M}_i$, $W^{(C)}_i$ and $W^{(M)}_i$ be the $i$th subband of $\dot{C}$, $\dot{M}$, $W^{(C)}$ and $W^{(M)}$, respectively. We model each element in $W^{(C)}$ and $W^{(M)}$ as an i.i.d. Gaussian source with variance $N_0$. Each subband of $\dot{C}$ and $\dot{M}$ can be expressed as

$$\dot{C}_i = g_{C_i} C_i + W^{(C)}_i, \qquad \dot{M}_i = g_{M_i} M_i + W^{(M)}_i. \tag{21}$$

Therefore, the LMMSE reconstruction of the original signals is

$$\hat{C}_i = \frac{\sigma^2_{C_i} g_{C_i}}{\sigma^2_{C_i} g^2_{C_i} + N_0}\, \dot{C}_i, \qquad \hat{M}_i = \frac{\sigma^2_{M_i} g_{M_i}}{\sigma^2_{M_i} g^2_{M_i} + N_0}\, \dot{M}_i. \tag{22}$$

And the reconstruction distortion of each subband is

$$E\{(\hat{C}_i - C_i)^2\} = \frac{\sigma^2_{C_i} N_0}{\sigma^2_{C_i} g^2_{C_i} + N_0}, \tag{23}$$

$$E\{(\hat{M}_i - M_i)^2\} = \frac{\sigma^2_{M_i} N_0}{\sigma^2_{M_i} g^2_{M_i} + N_0}. \tag{24}$$

The purpose of the second LMMSE estimator is to reconstruct each pixel $x$ in the spatial domain with minimum distortion. The DCast decoder applies the inverse DCT transform on the coset reconstruction $\hat{X}$ and gets a pixel-domain preliminary reconstruction $\hat{x}$. $\hat{x}$ is considered as the first noisy version of $x$. DCast also has the predicted pixel $s$ as the second noisy version of $x$. With $\hat{x}$ and $s$, the optimal LMMSE estimate $\tilde{x}$ is given by

$$\tilde{x} = \theta s + (1 - \theta)\hat{x} \tag{25}$$

where

$$\theta = \frac{\sigma^2_{\hat{x}-x}}{\sigma^2_{s-x} + \sigma^2_{\hat{x}-x}}. \tag{26}$$

$\sigma^2_{\hat{x}-x}$ is the variance of $\hat{x} - x$, and $\sigma^2_{s-x}$ is the variance of $s - x$. In DCast, the prediction noise variance $\sigma^2_{s-x}$ is estimated at block level. Since $\hat{x}$ is close to $x$, $\sigma^2_{s-x}$ is estimated by calculating $E\{(s - \hat{x})^2\}$. The variance $\sigma^2_{\hat{x}-x}$ is calculated as follows. According to Parseval's theorem and (4), we have

$$\sigma^2_{\hat{x}-x} = E\{(\hat{x} - x)^2\} = E\{(\hat{X} - X)^2\} = E\{(\hat{C} - C)^2\} \tag{27}$$

where $E\{(\hat{C} - C)^2\}$ is directly calculated by summing (23) over the subbands.

IV. Power-Distortion Optimization

In DCast, both the MVs and the coset values require power to transmit. Thus it is necessary to investigate the optimal power allocation between the MVs and the coset values. Let $D$ be the reconstruction distortion, and $P$ be the transmission power. Let $P_{coset}$ and $P_{mv}$ be the transmission power for the coset values and the MVs, respectively. The optimal power allocation is the one minimizing the reconstruction distortion $D$ for given power $P$, i.e., the optimization problem is

$$\min D, \quad \text{s.t. } P_{mv} + P_{coset} \le P. \tag{28}$$
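Before the derivation, the quantities that feed the objective $D$ can be made concrete. The sketch below evaluates the subband gains of (12)/(13), the transform-domain LMMSE estimate and distortion of (22)-(24), and the pixel-domain combination of (25)-(26) from Section III. The function names and the NumPy dependency are assumptions of this example, not part of the described system.

```python
import numpy as np

def allocation_gains(sigma, total_power):
    """Gains of (12)/(13): g_i = sigma_i^(-1/2) * sqrt(P / sum_j sigma_j)."""
    sigma = np.asarray(sigma, dtype=float)
    return np.sqrt(total_power / sigma.sum()) / np.sqrt(sigma)

def lmmse_subband(y, sigma, g, n0):
    """Transform-domain estimate of (22) from y = g*c + w, with Var(w) = n0."""
    return sigma**2 * g / (sigma**2 * g**2 + n0) * y

def lmmse_distortion(sigma, g, n0):
    """Per-subband distortion of (23)/(24)."""
    return sigma**2 * n0 / (sigma**2 * g**2 + n0)

def combine_pixels(s, x_hat, var_s, var_xhat):
    """Pixel-domain estimate of (25) with the weight theta of (26)."""
    theta = var_xhat / (var_s + var_xhat)
    return theta * s + (1.0 - theta) * x_hat

if __name__ == "__main__":
    sigma = np.array([4.0, 1.0, 0.25])           # hypothetical subband standard deviations
    g = allocation_gains(sigma, total_power=10.0)
    print("power used:", np.sum(g**2 * sigma**2))
    print("distortions:", lmmse_distortion(sigma, g, n0=1.0))
```

The gains spend exactly the power budget, since the sum of $g_i^2 \sigma_i^2$ equals $P$, and the weight in `combine_pixels` leans toward whichever of the two noisy versions has the smaller error variance.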

1046 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 6, JUNE 2013 A. Relatonshp Between Varables The dstorton D s drectly related to both the decoder predcton nose varance σs X 2, and the coset transmsson power P coset. Intutvely, usng larger transmsson power P coset decreases the varance of the coset error Ĉ C at decoder. Ths means smaller D snce the reconstructon error ˆX X equals to the coset error Ĉ C accordng to (4). Meanwhle, larger σs X 2 means lower qualty of sde nformaton (SI), and lower qualty SI leads to larger reconstructon dstorton. Therefore, the dstorton D should be a decreasng functon of the coset power P coset and an ncreasng functon of the predcton nose varance σs X 2. Furthermore, the predcton nose varance σs X 2 s related to the MV transmsson power P mv. We use a two dmensonal random vector N (0,σ 2 I 2 2) to model MV error, whle σ 2 = 1 2 E{ T } s the dstorton of MV. Usng larger transmsson power P mv decreases the MV dstorton σ 2 and ths means more accurate MVs. More accurate MVs produces hgher qualty of decoder SI S, and hence a smaller predcton nose varance σs X 2. Thus the predcton nose varance σ2 S X decreases n the MV transmsson power P mv. However, due to the power constrant, gvng more power to coset (.e. usng larger P coset ) means less power to MV (.e., usng smaller P mv ), and vce versa. Ths s why we need power dstorton optmzaton. In the followng part of ths secton, before solvng (20) we wll derve the relatonshp between: 1) MV transmsson power P mv and MV dstorton σ 2 ; 2) MV dstorton σ 2 and predcton nose varance σ2 S X ; 3) Dstorton D, coset power P coset and predcton nose varance σs X 2. B. MV transmsson Power P mv and MV Dstorton σ 2 Ths subsecton focuses on the relatonshp between MV transmsson power P mv and MV dstorton σ 2. Accordng to Parseval s theorem, the MV dstorton σ 2 n spatal doman equals to the MV dstorton n DCT doman,.e. 
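$$\sigma_\Delta^2 = \frac{1}{n_{mv}} E\{(\hat{M} - M)^2\} \tag{29}$$

where $n_{mv}$ is the number of MV coefficients. From (23), we get

$$\sigma_\Delta^2 = \frac{1}{n_{mv}} \sum_i \frac{\sigma_{M_i}^2 N_0}{\sigma_{M_i}^2 g_{M_i}^2 + N_0} \approx \frac{1}{n_{mv}} \sum_i \frac{N_0}{g_{M_i}^2} \tag{30}$$

where the sums run over the transmitted MV coefficients $M_i$ and the approximation is accurate when $P_{mv} \gg N_0$. Substituting (13) into (30), we get

$$\sigma_\Delta^2 \approx \frac{N_0 \left(\sum_i \sigma_{M_i}\right)^2}{n_{mv} P_{mv}}. \tag{31}$$

Then using (14) we get

$$\sigma_\Delta^2 \approx \frac{N_0 \sigma_M^2 \left(\sum_i V_{M_i}^{1/2}\right)^2}{n_{mv} P_{mv}}. \tag{32}$$

By defining

$$\alpha_{mv} = \left(\frac{1}{n_{mv}} \sum_i V_{M_i}^{1/2}\right)^2 \tag{33}$$

we can rewrite (32) as

$$\sigma_\Delta^2 \approx \frac{n_{mv} N_0 \sigma_M^2 \alpha_{mv}}{P_{mv}} = \alpha_{mv} \sigma_M^2 \left(\frac{P_{mv}}{n_{mv} N_0}\right)^{-1}. \tag{34}$$

In this equation, $\sigma_M^2$ is the variance of the MV signal to transmit, and $P_{mv}/(n_{mv} N_0)$ is the SNR of the MV signal. Thus $\alpha_{mv}$ can be considered as the extra gain owing to the power allocation in (13). From this equation, the MV distortion $\sigma_\Delta^2$ is inversely proportional to the MV transmission power $P_{mv}$.

C. MV Distortion $\sigma_\Delta^2$ and Prediction Noise Variance $\sigma^2_{S-X}$

This subsection focuses on the relationship between the MV distortion $\sigma_\Delta^2$ and the prediction noise variance $\sigma^2_{S-X}$. Let $\dot{S}$ be the original decoder prediction when the MVs are perfectly received. The practical decoder prediction noise $S - X$ consists of two components: the original prediction noise $\dot{S} - X$, and the additional prediction noise $S - \dot{S}$ caused by erroneous MVs. In this paper, we assume they are independent of each other, and therefore

$$\sigma^2_{S-X} = \sigma^2_{\dot{S}-X} + \sigma^2_{S-\dot{S}}. \tag{35}$$

Given that $\dot{S}$ is a phase-shifted version of $S$, $\sigma^2_{S-\dot{S}}$ can be analyzed by using the power density. Similar to the derivation in [20], we have

$$\sigma^2_{S-\dot{S}} = \frac{1}{4\pi^2} \int_{-\pi}^{\pi}\!\!\int_{-\pi}^{\pi} 2\,\Phi_{ss}(\omega)\left(1 - E\{\cos(\omega^T \Delta)\}\right) d\omega \tag{36}$$

where $\Phi_{ss}(\cdot)$ is the power density function of the side information, $\omega$ is the two-dimensional frequency (in radians), and $\Delta \sim \mathcal{N}(0, \sigma_\Delta^2 I_{2\times 2})$ is the MV error. For small $\sigma_\Delta^2$, we have

$$1 - E\{\cos(\omega^T \Delta)\} \approx \frac{1}{2} E\{(\omega^T \Delta)^2\} = \frac{1}{2} \sigma_\Delta^2\, \omega^T \omega, \tag{37}$$

and thus

$$\sigma^2_{S-\dot{S}} \approx \frac{1}{4\pi^2}\, \sigma_\Delta^2 \int_{-\pi}^{\pi}\!\!\int_{-\pi}^{\pi} \Phi_{ss}(\omega)\, \omega^T \omega \, d\omega. \tag{38}$$

We define

$$\gamma = \frac{1}{4\pi^2} \int_{-\pi}^{\pi}\!\!\int_{-\pi}^{\pi} \Phi_{ss}(\omega)\, \omega^T \omega \, d\omega \tag{39}$$

where $\gamma$ is a constant for a given video frame. Then we get

$$\sigma^2_{S-\dot{S}} \approx \gamma \sigma_\Delta^2. \tag{40}$$

Substituting (40) into (35), we get

$$\sigma^2_{S-X} = \sigma^2_{\dot{S}-X} + \gamma \sigma_\Delta^2. \tag{41}$$

Therefore, the prediction noise variance $\sigma^2_{S-X}$ is linear in the MV distortion $\sigma_\Delta^2$.

D. Distortion $D$ as a Function of $P_{coset}$ and $\sigma^2_{S-X}$

The distortion $D$ is derived as follows. First, from (4) we have $\hat{X} - X = \hat{C} - C$ with high probability. Thus the distortion $D$ approximately equals the distortion of the coset value, that is,

$$D = \sigma^2_{\hat{X}-X} \approx \sigma^2_{\hat{C}-C}. \tag{42}$$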
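The two relations derived so far, (34) and (41), together with the small-error approximation (37), can be sketched in Python as follows. This is only an illustrative sketch: the function names and all numeric values are hypothetical and not taken from the paper. The exact cosine term uses the standard fact that, for a zero-mean Gaussian variable $Z$ with variance $v$, $E\{\cos Z\} = e^{-v/2}$.

```python
import math


def mv_distortion(p_mv, n_mv, n0, sigma_m2, alpha_mv):
    """MV distortion sigma_Delta^2 as in (34): inversely proportional to p_mv."""
    return alpha_mv * sigma_m2 * n_mv * n0 / p_mv


def prediction_noise_variance(sigma_orig2, gamma, sigma_delta2):
    """Prediction noise variance sigma_{S-X}^2 as in (41): linear in sigma_Delta^2."""
    return sigma_orig2 + gamma * sigma_delta2


def cosine_term(omega_norm, sigma_delta2):
    """Return (exact, approx) values of 1 - E{cos(omega^T Delta)}.

    For Delta ~ N(0, sigma_delta2 * I), omega^T Delta is Gaussian with
    variance sigma_delta2 * |omega|^2, so the exact value is 1 - exp(-v / 2).
    The approximation is the right-hand side of (37).
    """
    v = sigma_delta2 * omega_norm ** 2
    return 1.0 - math.exp(-0.5 * v), 0.5 * v


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the formulas.
    for p_mv in (10.0, 20.0, 40.0):
        d_mv = mv_distortion(p_mv, n_mv=100, n0=1.0, sigma_m2=2.0, alpha_mv=0.5)
        s_x = prediction_noise_variance(sigma_orig2=4.0, gamma=3.0, sigma_delta2=d_mv)
        print(p_mv, d_mv, s_x)
    print(cosine_term(omega_norm=0.5, sigma_delta2=0.01))
```

Doubling `p_mv` halves the MV distortion, and the approximation in (37) slightly overestimates the exact cosine term, with a gap that shrinks as $\sigma_\Delta^2$ decreases.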

Similar to Section IV-B, we can derive and express the coset distortion as

$$\sigma^2_{\hat{C}-C} \approx \alpha_{coset}\, \sigma_C^2 \left(\frac{P_{coset}}{n_{coset} N_0}\right)^{-1} \tag{43}$$

where $\alpha_{coset}$ is the coding gain of the power allocation, $\sigma_C^2$ is the variance of $C$, and $n_{coset}$ is the number of coset subbands.

In general, DCast transmits the coset values of the source $X$ over a Gaussian channel, with the side information $S$ at the receiver side. Therefore, each subband forms a typical Wyner-Ziv dirty-paper problem, in which transmitting the coset values has been proven to be as efficient as transmitting the residue $S - X$ over the same channel (in the case that $S - X$ is available to the encoder) [39]. Indeed, according to the theorem in [39] (the existence of good lattices), the coset value $C$ of each subband has the same variance as the prediction residue $S - X$ of that subband, that is,

$$\sigma_C^2 = E\{C^2\} = E\{(S - X)^2\}. \tag{44}$$

Thus the coset value and the prediction residue have the same variance at the frame level, that is,

$$\sigma_C^2 = E\{(S - X)^2\} = \sigma^2_{S-X}. \tag{45}$$

Therefore, (42), (43), and (45) imply

$$D = \sigma^2_{\hat{C}-C} \approx \alpha_{coset}\, \sigma^2_{S-X} \left(\frac{P_{coset}}{n_{coset} N_0}\right)^{-1}. \tag{46}$$

This means that $D$ is proportional to the prediction noise variance $\sigma^2_{S-X}$ and inversely proportional to the coset power $P_{coset}$.

E. Solution

Substituting (34) and (41) into (46), we get

$$D = \left(\sigma^2_{\dot{S}-X} + \gamma \alpha_{mv} \sigma_M^2 n_{mv} N_0 P_{mv}^{-1}\right) \alpha_{coset}\, n_{coset} N_0 P_{coset}^{-1}. \tag{47}$$

Then, substituting (47) into problem (28) and solving it, we get

$$P_{mv} = \left[(A^2 + A)^{1/2} - A\right] P, \qquad A = \frac{\gamma \alpha_{mv} \sigma_M^2 n_{mv} N_0 P^{-1}}{\sigma^2_{\dot{S}-X}}. \tag{48}$$

Although $A$ appears to contain many variables, there is a straightforward way to estimate it. In $A$, $\sigma_M^2$ is the variance of the MV signal to transmit, $P/(n_{mv} N_0)$ is the SNR when all power is allocated to the MVs, and $\alpha_{mv}$ is the coding gain of the power allocation. This means that, if all power is allocated to the MVs, the MV distortion $\sigma_\Delta^2$ will be $\alpha_{mv} \sigma_M^2 n_{mv} N_0 P^{-1}$ according to (34).
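The closed form in (48) can be checked numerically. The Python sketch below assumes that the constraint in problem (28), which is stated earlier in the paper and not reproduced here, is $P_{mv} + P_{coset} = P$. Under that assumption, setting $dD/dP_{mv} = 0$ in (47) gives the quadratic $\sigma^2_{\dot{S}-X} P_{mv}^2 + 2bP_{mv} - bP = 0$ with $b = \gamma \alpha_{mv} \sigma_M^2 n_{mv} N_0$, whose positive root is (48). The lumped constants `b` and `c`, the function names, and the numeric values are hypothetical and serve only as an illustration.

```python
import math


def distortion(p_mv, p_total, sigma_orig2, b, c=1.0):
    """Distortion D as in (47), assuming p_coset = p_total - p_mv.

    b lumps gamma * alpha_mv * sigma_M^2 * n_mv * N0 (MV-related constants);
    c lumps alpha_coset * n_coset * N0 and does not affect the minimizer.
    """
    p_coset = p_total - p_mv
    return (sigma_orig2 + b / p_mv) * c / p_coset


def optimal_mv_power(p_total, sigma_orig2, b):
    """Closed-form MV power as in (48), with A = b / (p_total * sigma_orig2)."""
    a = b / (p_total * sigma_orig2)
    return (math.sqrt(a * a + a) - a) * p_total


if __name__ == "__main__":
    # Hypothetical values: total power 100, original prediction noise 4, b = 50.
    p_total, sigma_orig2, b = 100.0, 4.0, 50.0
    p_star = optimal_mv_power(p_total, sigma_orig2, b)

    # Brute-force search over a fine grid of feasible MV powers.
    grid = [p_total * k / 10000 for k in range(1, 10000)]
    p_grid = min(grid, key=lambda p: distortion(p, p_total, sigma_orig2, b))
    print(p_star, p_grid)
```

With these values $A = 0.125$, so the closed form allocates a quarter of the total power to the MVs, and the grid search lands on the same point to within the grid spacing.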
Furthermore, (34) together with (40) implies that $\gamma \alpha_{mv} \sigma_M^2 n_{mv} N_0 P^{-1}$ is the variance of the additional prediction noise caused by erroneous MVs when all transmission power is allocated to the MVs. Therefore, the parameter $A$ is estimated as follows. For each frame, DCast simulates the transmission and decoding process to obtain a hypothetical side information $S$, which is the side information when all transmission power is allocated to the MV data. DCast also calculates for each frame another hypothetical side information $\dot{S}$, which is the side information assuming that the transmission of the MVs is lossless. Since $S - \dot{S}$ is the additional prediction noise caused by erroneous MVs, we have

$$\sigma^2_{S-\dot{S}} = \gamma \alpha_{mv} \sigma_M^2 n_{mv} N_0 P^{-1}. \tag{49}$$

With (49), the solution (48) is rewritten as

$$P_{mv} = \left[(A^2 + A)^{1/2} - A\right] P, \qquad A = \frac{\sigma^2_{S-\dot{S}}}{\sigma^2_{\dot{S}-X}}. \tag{50}$$

Therefore, for power distortion optimization, the encoder first estimates $\sigma^2_{S-\dot{S}}$ and $\sigma^2_{\dot{S}-X}$, and then calculates the optimal MV transmission power $P_{mv}$ by (50).

V. Experiments

In our experiments, we evaluate the performance of the proposed DCast in video streaming applications, including both unicast and multicast. We compare DCast with Softcast [11], [12] and with conventional frameworks. We have implemented two versions of Softcast, based on 2-D DCT and 3-D DCT respectively, i.e., Softcast2-D [11] and Softcast3-D [12]. We also implement two conventional frameworks. One uses H.264 as the video encoder and the other uses a DVC codec named the Witsenhausen-Wyner Video Codec (WWVC) [17]. Both frameworks use the standard 802.11 PHY layer with FEC and QAM modulation.

We use the JM14.2 software as the H.264 codec. For error resilience, the intra MB refresh rate is set to 10%. Each video slice is packed into one RTP packet. We set the maximal slice size to 1192 bytes so that the length of an RTP packet is no greater than 1200 bytes. The WWVC coded bitstream is also packed into RTP packets of maximal length 1200 bytes. We append a 32-bit CRC to each RTP packet, and then encode each packet separately.
Similar to the experiments in [12], for error protection we apply to each packet an outer Reed-Solomon code with the same parameters (188/204) used for digital TV [40]. Each packet is individually interleaved between the outer Reed-Solomon code and the inner FEC in accordance with the same recommendation. For the inner FEC, we generate the rate-1/2 convolutional code with polynomials {133, 171} and puncture it to get the rate-2/3 and rate-3/4 convolutional codes. The FEC coded bits are mapped to complex symbols by BPSK, QPSK, 16QAM, or 64QAM. The complex symbols are then transmitted over OFDM. We assume that the channel noise is Gaussian and the channel bandwidth is 1.15 MHz. The FEC decoding is done by the soft Viterbi algorithm.

After the FEC decoding and RS decoding, the decoder performs a CRC check on each RTP packet and forwards the error-free packets to the video decoder. The WWVC decoder performs Wyner-Ziv decoding and is able to reconstruct the video frames when the reference frames contain some errors. The H.264 decoder can also tolerate a small percentage of RTP packet loss by using error concealment. In our test, we have configured the H.264 decoder to use the most complex error concealment method in JM14.2, motion copy, to get the best reconstruction quality.

The test video sequences are standard CIF sequences (352 × 288, 30 Hz), including Akiyo, Bus, Coastguard, Crew, Flower, Football, Foreman, Harbour, Husky, Ice, News, Soccer,

Stefan, Tempete, Tennis, and Waterfall. To evaluate the average performance of each framework, we also create a monochrome 512-frame test video sequence, called all_seq, by combining the first 32 frames of the above 16 test sequences.

For DCast, H.264, and WWVC, the GOP structure is IPPP and the GOP length is 32. In the following tests, all the PSNR results are for all the frames, including both intra and inter frames. The number of reference frames for an inter frame is 1. In DCast, the intra frame coding is exactly the same as in Softcast2-D and the inter frame coding uses the proposed framework. The transmission power allocated to an intra frame is set to 4 times the power of an inter frame. According to our experiments, this gives intra and inter frames approximately the same video PSNR. The search range of ME is 32 × 32 and the MV precision is 1/4 pixel. In ME, DCast uses only the 8 × 8 block size, while H.264 and WWVC use all 7 block sizes from 4 × 4 to 16 × 16. Table I gives a summary of the techniques and configurations of these frameworks.

Fig. 4. Verification of the models of power distortion optimization in Section IV. $P_{coset}$ and $P_{MV}$ are the transmission power of the coset data and the MV data, respectively. $D$ is the reconstruction distortion.

Fig. 5. Unicast performance comparison. Both the encoder and the decoder are assumed to know the channel SNR.

A. PDO Model Verification

This test verifies the models of power distortion optimization (PDO) in Section IV. We use all_seq as the test sequence. In the first test, we fix the coset transmission power $P_{coset}$ and let the MV transmission power $P_{MV}$ change. The channel noise power $N_0$ is set to 1. The results are given in Fig. 4. Fig. 4(a) shows the relation between the MV transmission power $P_{MV}$ and the MV distortion $\sigma_\Delta^2$. According to the result, the MV distortion is proportional to the inverse of $P_{MV}$. This confirms equation (34). Fig. 4(b) shows the linear relation between the MV distortion $\sigma_\Delta^2$ and the prediction noise variance $\sigma^2_{S-X}$.
This verifies the model of equation (41). Fig. 4(c) shows the relation between the MV transmission power $P_{MV}$ and the reconstruction distortion $D$. They are approximately linearly related, as shown in equation (47). In the second test, we fix the MV transmission power $P_{MV}$ and let the coset transmission power $P_{coset}$ change. The channel noise power $N_0$ is set to 1. The result is given in Fig. 4(d). The reconstruction distortion $D$ is proportional to the inverse of the coset transmission power $P_{coset}$. This verifies the models in equations (46) and (47).

B. Unicast Performance

This test compares the unicast performance of all the above frameworks. In this test the input video is all_seq and the channel SNR ranges from 5 to 20 dB. Both the encoder and the decoder are assumed to know the channel SNR. For each channel SNR, the parameters of DCast are optimally tuned. The total transmission power is optimally allocated to coset data and motion data as explained in Section IV. The conventional framework is assumed to be able to choose, according to the channel SNR, the best combination of the FEC and QAM methods recommended by 802.11, to get the maximal bitrate for the source coding layer. The RS coding is skipped in this unicast test. The source coding layer, i.e., the H.264 codec or the WWVC codec, performs rate control to use as much of the available bitrate as possible.

The experimental result is given in Fig. 5. This figure compares the reconstruction quality of six frameworks at different channel SNRs. The reconstruction quality is measured by video PSNR. DCast is uniformly 4 dB better in video PSNR than Softcast2D at all channel SNRs, mainly because it enables inter frame prediction. DCast gains about 1.5 dB in video PSNR over Softcast3D, which mainly comes from motion alignment. Compared with the H.264 based framework, DCast is about 0.8 dB worse in video PSNR at low channel SNR but about 2.9 dB better at high channel SNR. The WWVC based framework performs slightly worse than the H.264 based framework.

In this test, we also implement another version of DCast in which the ME is performed at the decoder by motion compensated extrapolation [2]. Compared with the conventional framework, DCast with decoder ME is about

1.6 dB worse in video PSNR at low channel SNR but is 1.7 dB better in video PSNR at high channel SNR.

Note that the result in Fig. 5 does not mean that DCast can outperform H.264 in compression efficiency. H.264 is a video coding standard, while DCast is a wireless video transmission framework. H.264 has very high compression efficiency, but its bitstream is not very robust to errors. This is why an H.264 bitstream needs additional FEC bits for protection. DCast may not be as efficient as H.264 in video compression, but it is robust to channel noise. Thus, it can skip FEC and use a very dense 64K-QAM modulation, and so achieves high transmission efficiency.

TABLE I. Summary of the Four Frameworks

Framework         | Softcast2D      | Softcast3D      | DCast           | H.264/WWVC+802.11
GOP               | IIII...         | -               | IPPP...         | IPPP...
Reference frames  | 0               | -               | 1               | 1
ME                | N               | N               | Y               | Y
ME block size     | -               | -               | fixed           | variable
ME search range   | -               | -               | 32 × 32         | 32 × 32
MV precision      | -               | -               | 1/4             | 1/4
DCT               | 2-D             | 3-D             | 2-D             | 2-D
Coding delay      | 1 frame         | 4 frames        | 1 frame         | 1 frame
Modulation        | OFDM            | OFDM            | OFDM            | OFDM
Constellation     | 64K-QAM, BPSK   | 64K-QAM, BPSK   | 64K-QAM, BPSK   | BPSK, QPSK, 16-QAM, 64-QAM
FEC rate          | 1/2 (BPSK only) | 1/2 (BPSK only) | 1/2 (BPSK only) | 1/2, 2/3, 3/4
RS rate           | -               | -               | -               | 188/204

C. Evaluation of Each Module

DCast has several modules, such as coset coding, motion estimation (ME), and power distortion optimization (PDO). In the following test, we incrementally turn off these modules in DCast to evaluate their contributions. In this test the input video is all_seq, and the channel SNR ranges from 5 to 15 dB. The test results are given in Fig. 6. In this figure, "PDO off" means that there is no PDO and the encoder uses an ad hoc power allocation in which the total transmission power is equally allocated between motion data and coset data, i.e., $P_{mv}/N_{mv} = P_{coset}/N_{coset}$. "ME off" means that there is no ME and the decoder uses the previous reconstructed frame directly as side information.

Note that there are dependencies between the three modules (coset, ME, and PDO). When ME is disabled, the PDO must be off because there is no MV to transmit. When coset coding is disabled, the ME should also be disabled because the decoder no longer needs side information. Furthermore, when all three modules (coset, ME, and PDO) are off, DCast becomes the same as Softcast2-D. According to the result in Fig. 6, the contributions of coset coding, ME, and PDO are about 2.7 dB, 0.8 dB, and 0.5 dB in video PSNR, respectively.

Fig. 6. Evaluation of each module. The contributions of coset coding, ME, and PDO are about 2.7 dB, 0.8 dB, and 0.5 dB in video PSNR, respectively.

Fig. 7. Robustness test. DCast is configured to optimize for target channel SNRs of 5 dB, 10 dB, and 15 dB respectively, and then tested under different channel SNRs.

D. Robustness Test

In practical wireless applications, the channel SNR may not be perfectly known to the encoder. In the following tests, we evaluate the performance of DCast in this situation. The input video is all_seq and the channel SNR ranges from 5 to 15 dB. We let DCast optimize for target channel SNRs of 5 dB, 10 dB, and 15 dB respectively. The video PSNR values are compared in Fig. 7. According to the result, each of the three encoders performs best when the practical channel

SNR matches its optimization target, but performs slightly worse than the best one when the practical channel SNR does not match the target. The one optimized for a 15 dB channel performs 1.2 dB lower in video PSNR than the optimal one when the practical channel SNR is 5 dB. This indicates that DCast should optimize for a lower channel SNR for more robustness in multicast.

Fig. 8. Robustness comparison between DCast and (a) H.264 and (b) another DVC framework, WWVC. The channel SNR is unknown to all the encoders. The DCast encoder is optimized for a channel SNR of 5 dB.

Fig. 9. Multicast performance on different video sequences.

Fig. 10. Multicast to three receivers.

Fig. 11. Serving a group of receivers with diverse channel SNRs. The average channel SNR of each group is 14 dB.

We then compare DCast with the conventional frameworks based on H.264 and WWVC. We still assume that only the decoder knows the channel SNR. DCast is optimized for a target channel SNR of 5 dB in the following tests. For the conventional framework, we implement all eight recommended combinations of channel coding and modulation in 802.11. We calculate the corresponding bitrates according to the bandwidth, and set them as bitrate constraints for rate control in the H.264 encoder and the WWVC encoder. Both the video bitrate and the channel bitrate (the bitrate after RS coding and FEC) under the eight transmission approaches are given in Table II. (Note that WWVC and H.264 have the same bitrate constraints.) For DCast, there is no bitrate but only a channel

symbol rate. Note that all the frameworks consume the same bandwidth and transmission power.

The video PSNR of each framework under different channel SNRs is given in Fig. 8. In Fig. 8(a), all eight conventional transmission approaches suffer a very serious cliff effect. For example, the approach "H.264, 1/2 FEC, 16QAM" performs well when the channel SNR is between 13 dB and 14 dB, but not outside this range. When the channel SNR rises above 14 dB, the reconstruction quality does not increase. When the channel SNR drops to 12 dB, the reconstruction quality falls very quickly. When the channel SNR becomes even lower, the video decoder cannot work, since almost all received RTP packets have bit errors. Note that the cliff effect can be partially mitigated by a layered approach [41] that combines the scalable video extension of H.264 with a hierarchical modulation PHY layer. However, as shown in [12], the layered approach needs a higher channel SNR than the single-layer approach to achieve the same PSNR.

Fig. 8(b) shows the performance of the WWVC based framework. In the presence of errors, WWVC benefits from Wyner-Ziv decoding and gains 1 to 2 dB in video PSNR over H.264. This is consistent with the results in [17]. However, it still suffers a very serious cliff effect.

In contrast, the three all-in-one frameworks do not suffer the cliff effect. When the channel SNR increases, the reconstruction PSNR increases accordingly, and vice versa. DCast is still the best of the three all-in-one frameworks. At low channel SNR, DCast is still 1.5 dB and 4 dB better in video PSNR than Softcast3D and Softcast2D, respectively. However, when the channel SNR increases, the gain of DCast decreases. When the channel SNR is 25 dB, DCast performs similarly to Softcast3D and gains only about 2.5 dB in video PSNR over Softcast2D. Compared with the unicast result in Fig. 5, the performance of DCast becomes 1.5 dB worse in video PSNR at high channel SNR. This is mainly because the optimization of DCast (including both the PDO and the coset quantization step) targets a 5 dB channel SNR in this test. Fig. 9 gives the performance comparison on different video sequences.

E. Multicast Performance

Next, we let all the frameworks serve a group of three receivers with diverse channel SNRs. The channel SNRs of the receivers are 6 dB, 12 dB, and 18 dB, respectively. The test result is given in Fig. 10. In the conventional frameworks based on H.264 and WWVC, the server transmits the video stream using 3/4 FEC and BPSK. It cannot use a higher transmission rate because in that case the 6 dB user would not be able to decode the video. As a result, although the other two receivers have better channel conditions, they also receive the low-speed 802.11 signal and reconstruct low-quality video. In Softcast and DCast, the server can accommodate all the receivers simultaneously. Using DCast, the 6 dB user gets slightly lower reconstruction quality than with the H.264 or WWVC based conventional frameworks. However, the 12 dB and 18 dB users get 4 dB and 8 dB better reconstruction quality, respectively, by using DCast rather than the conventional frameworks.

Fig. 12. Visual quality comparison; the channel SNR is 5 dB. (a) Original frame. (b) Softcast2D. (c) Softcast3D. (d) DCast.

Fig. 11 compares the multicast performance of four frameworks with respect to the range of receiver SNR. The range of receiver SNR is defined as the difference between the maximal and minimal channel SNR of the users in the group. The average channel SNR of the users in the group is 14 dB. When the channel SNR range is 0 dB, i.e., the channel SNR of every user is 14 dB, DCast, Softcast3D, and the H.264 framework perform similarly. However, when the users' channel SNRs become diverse, the performance of the H.264 framework drops quickly.

The visual quality comparison is given in Fig. 12. The channel SNR is set to 5 dB. DCast has clearly better visual quality than both Softcast2-D and Softcast3-D.

In all the tests, including unicast and multicast, DCast performs better than both Softcast2-D and Softcast3-D. Moreover, DCast does not introduce frame delays as Softcast3-D does, and, like Softcast2-D, is applicable to real-time video multicast.

F. Complexity and Bitrate

The proposed DCast allows the ME to be performed at the encoder. Therefore the encoder has high complexity while the decoder has low complexity. Table II shows the average encoding time and decoding time per frame in milliseconds. The test machine has a Pentium(R) Dual-Core CPU E5300 @ 2.60 GHz, 2 GB of memory, and Microsoft Windows XP Professional 5.1.2600 with Service Pack 3. The input video is all_seq of CIF size at 30 frames per second.

TABLE II. Comparison of Complexity and Bitrate

Framework          | Encode time | Decode time | Video bit rate | Channel bit rate | Channel symbol rate
H.264+1/2FEC+BPSK  | 387 ms      | 7 ms        | 530 Kb/s       | 1.15 Mb/s        | 1.15 M/s
H.264+3/4FEC+BPSK  | 387 ms      | 8 ms        | 795 Kb/s       | 1.15 Mb/s        | 1.15 M/s
H.264+1/2FEC+QPSK  | 406 ms      | 9 ms        | 1060 Kb/s      | 2.3 Mb/s         | 1.15 M/s
H.264+3/4FEC+QPSK  | 389 ms      | 10 ms       | 1590 Kb/s      | 2.3 Mb/s         | 1.15 M/s
H.264+1/2FEC+16QAM | 381 ms      | 11 ms       | 2120 Kb/s      | 4.6 Mb/s         | 1.15 M/s
H.264+3/4FEC+16QAM | 385 ms      | 14 ms       | 3180 Kb/s      | 4.6 Mb/s         | 1.15 M/s
H.264+2/3FEC+64QAM | 371 ms      | 15 ms       | 4240 Kb/s      | 6.9 Mb/s         | 1.15 M/s
H.264+3/4FEC+64QAM | 427 ms      | 16 ms       | 4770 Kb/s      | 6.9 Mb/s         | 1.15 M/s
DCast              | 304 ms      | 10 ms       | -              | -                | 1.15 M/s
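The bit rates listed in Table II for the H.264 configurations follow from the channel symbol rate, the number of bits per constellation symbol, the inner FEC rate, and the 188/204 Reed-Solomon rate. The Python sketch below reproduces that arithmetic; the function and constant names are illustrative, and the bits-per-symbol mapping is the standard one for these constellations rather than something stated explicitly in the text.

```python
from fractions import Fraction

SYMBOL_RATE = 1.15e6            # channel symbols per second (1.15 MHz bandwidth)
RS_RATE = Fraction(188, 204)    # outer Reed-Solomon code rate
BITS_PER_SYMBOL = {"BPSK": 1, "QPSK": 2, "16QAM": 4, "64QAM": 6}


def channel_bitrate(modulation):
    """Channel bit rate in b/s: symbol rate times bits per symbol."""
    return SYMBOL_RATE * BITS_PER_SYMBOL[modulation]


def video_bitrate(modulation, fec_rate):
    """Video bit rate in b/s left after the inner FEC and the outer RS code."""
    return channel_bitrate(modulation) * float(fec_rate) * float(RS_RATE)


if __name__ == "__main__":
    configs = [
        ("BPSK", Fraction(1, 2)), ("BPSK", Fraction(3, 4)),
        ("QPSK", Fraction(1, 2)), ("QPSK", Fraction(3, 4)),
        ("16QAM", Fraction(1, 2)), ("16QAM", Fraction(3, 4)),
        ("64QAM", Fraction(2, 3)), ("64QAM", Fraction(3, 4)),
    ]
    for modulation, fec in configs:
        print(modulation, fec, round(video_bitrate(modulation, fec) / 1e3), "Kb/s")
```

The computed values agree with the video bit rates in Table II to within 1 Kb/s; the table appears to round to the nearest 5 or 10 Kb/s.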
DCast has a shorter encoding time than the H.264 codec (JM14.2), possibly because DCast has no mode decision and no entropy coding. As for the decoding time, DCast is comparable to the H.264 codec.

Table II also shows the video bitrate and channel bitrate of the H.264 solutions. For example, when the modulation is BPSK, the channel bitrate is equal to the channel symbol rate, i.e., 1.15 M/s. If the FEC is the rate-1/2 convolutional code and the RS code is 188/204, then the video bitrate is $1.15\,\text{M} \times \frac{1}{2} \times \frac{188}{204} \approx 530$ Kb/s. When the modulation is QPSK and the FEC is the rate-3/4 convolutional code, the channel bitrate is 2.3 Mb/s and the video bitrate is 1590 Kb/s. The decoding time of the H.264 codec depends on the video bitrate: the decoding time becomes longer as the bitrate increases. The DCast framework has no bitrate but a universal channel symbol rate. Its decoding time is fixed and is similar to the decoding time of the H.264 decoder at a bitrate of 1590 Kb/s.

VI. Conclusion

In this paper, we proposed a novel framework called DCast for distributed video coding and transmission over wireless networks. DCast first presented a new design for efficiently transmitting distributed coded video data over a Gaussian channel. Furthermore, we proposed a new power distortion optimization for DCast. DCast avoided the annoying cliff effect of conventional frameworks caused by the mismatch between transmission rate and channel condition. In multicast, a single DCast server accommodated multiple users with diverse channel SNRs simultaneously, approximately without sacrificing any user's coding performance. As shown in the experiments, DCast performed competitively with the H.264 framework in unicast and gained up to 8 dB in video PSNR in multicast.

DCast, as a unique DVC framework, did not use some sophisticated video coding tools such as variable block ME, intra mode, or mode decision. How to enable these tools to further improve the performance of DCast is one possible direction for future work. Furthermore, the DCast in this paper was mainly designed and optimized for a Gaussian channel.
Another opportunity for future work is to extend the proposed DCast to fading channels, which may require more complicated channel estimation and power distortion optimization.

Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable comments that greatly improved this paper.

References

[1] A. Aaron, R. Zhang, and B. Girod, "Wyner-Ziv coding of motion video," in Proc. 36th Asilomar Conf. Signals Syst. Comput., 2002.
[2] B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed video coding," Proc. IEEE, vol. 93, no. 1, pp. 71-83, Jan. 2005.
[3] R. Puri and K. Ramchandran, "PRISM: A new robust video coding architecture based on distributed compression principles," in Proc. Annu. Allerton Conf. Commun. Control Comput., 2002.
[4] R. Puri, A. Majumdar, and K. Ramchandran, "PRISM: A video coding paradigm with motion estimation at the decoder," IEEE Trans. Image Process., vol. 16, no. 10, pp. 2436-2448, Oct. 2007.
[5] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471-480, Jul. 1973.
[6] A. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. 22, no. 1, pp. 1-10, Jan. 1976.
[7] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560-576, Jul. 2003.

[8] Q. Xu and Z. Xiong, "Layered Wyner-Ziv video coding," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3791-3803, Dec. 2006.
[9] Q. Xu, V. Stankovic, and Z. Xiong, "Distributed joint source-channel coding of video using Raptor codes," IEEE J. Select. Areas Commun., vol. 25, no. 4, pp. 851-861, May 2007.
[10] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Joint source-channel coding of binary sources with side information at the decoder using IRA codes," in Proc. IEEE Workshop Multimedia Signal Process., 2002, pp. 53-56.
[11] S. Jakubczak and D. Katabi, "SoftCast: One-size-fits-all wireless video," in Proc. ACM SIGCOMM Comput. Commun. Rev., 2010.
[12] S. Jakubczak and D. Katabi, "A cross-layer design for scalable mobile video," in Proc. 17th Annu. Int. Conf. Mobile Comput. Networking, 2011.
[13] S. J. Choi and J. W. Woods, "Motion-compensated 3-D subband coding of video," IEEE Trans. Image Process., vol. 8, no. 2, pp. 155-167, Feb. 1999.
[14] A. Secker and D. Taubman, "Lifting-based invertible motion adaptive transform (LIMAT) framework for highly scalable video compression," IEEE Trans. Image Process., vol. 12, no. 12, pp. 1530-1542, Dec. 2003.
[15] R. Xiong, J. Xu, F. Wu, and S. Li, "Barbell-lifting based 3-D wavelet coding scheme," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1256-1269, Sep. 2007.
[16] Y. Zhang, C. Zhu, and K. H. Yap, "A joint source-channel video coding scheme based on distributed source coding," IEEE Trans. Multimedia, vol. 10, no. 8, pp. 1648-1656, Dec. 2008.
[17] M. Guo, Z. Xiong, F. Wu, D. Zhao, X. Ji, and W. Gao, "Witsenhausen-Wyner video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 8, pp. 1049-1060, Aug. 2011.
[18] X. Fan, F. Wu, and D. Zhao, "D-cast: DSC based soft mobile video broadcast," in Proc. 10th Int. Conf. Mobile Ubiquitous Multimedia, 2011.
[19] X. Fan, F. Wu, D. Zhao, O. C. Au, and W. Gao, "Distributed soft video broadcast (DCAST) with explicit motion," in Proc. Data Compression Conf., 2012.
[20] A. Secker and D. Taubman, "Highly scalable video compression with scalable motion coding," IEEE Trans. Image Process., vol. 13, no. 8, pp. 1029-1041, Aug. 2004.
[21] J. Garcia-Frias, "Compression of correlated binary sources using turbo codes," IEEE Commun. Lett., vol. 5, no. 10, pp. 417-419, Oct. 2001.
[22] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Commun. Lett., vol. 6, no. 10, pp. 440-442, Oct. 2002.
[23] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, "The DISCOVER codec: Architecture, techniques and evaluation," in Proc. Picture Coding Symp., vol. 6, 2007, pp. 14496-10.
[24] A. Aaron, S. Rane, E. Setton, B. Girod, et al., "Transform-domain Wyner-Ziv codec for video," in Proc. SPIE Visual Commun. Image Process., 2004.
[25] X. Guo, Y. Lu, F. Wu, D. Zhao, and W. Gao, "Wyner-Ziv-based multiview video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 6, pp. 713-724, Jun. 2008.
[26] M. Tagliasacchi, A. Trapanese, S. Tubaro, J. Ascenso, C. Brites, and F. Pereira, "Intra mode decision based on spatio-temporal cues in pixel domain Wyner-Ziv video coding," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2006.
[27] J. Slowack, S. Mys, J. Skorupa, N. Deligiannis, P. Lambert, A. Munteanu, and R. Van de Walle, "Rate-distortion driven decoder-side bitplane mode decision for distributed video coding," Signal Processing: Image Commun., vol. 25, no. 9, pp. 660-673, 2010.
[28] S. Benierbah and M. Khamadja, "Generalized hybrid intra and Wyner-Ziv video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 12, pp. 1929-1934, 2011.
[29] A. Aaron, S. Rane, and B. Girod, "Wyner-Ziv video coding with hash-based motion compensation at the receiver," in Proc. Int. Conf. Image Process., 2004.
[30] R. Martins, C. Brites, J. Ascenso, and F. Pereira, "Refining side information for improved transform domain Wyner-Ziv video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 9, pp. 1327-1341, Sep. 2009.
[31] B. Macchiavello, D. Mukherjee, and R. L. De Queiroz, "Iterative side-information generation in a mixed resolution Wyner-Ziv framework," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 10, pp. 1409-1423, Oct. 2009.
[32] X. Fan, O. C. Au, N. M. Cheung, Y. Chen, and J. Zhou, "Successive refinement based Wyner-Ziv video compression," Signal Process. Image Commun., vol. 25, no. 1, pp. 47-63, 2010.
[33] W. Liu, L. Dong, and W. Zeng, "Motion refinement based progressive side-information estimation for Wyner-Ziv video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1863-1875, Dec. 2010.
[34] C. Brites and F. Pereira, "Correlation noise modeling for efficient pixel and transform domain Wyner-Ziv video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 9, pp. 1177-1190, Sep. 2008.
[35] X. Fan, O. C. Au, and N. M. Cheung, "Transform-domain adaptive correlation estimation (TRACE) for Wyner-Ziv video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 11, pp. 1423-1436, Nov. 2010.
[36] N. Deligiannis, J. Barbarien, M. Jacobs, A. Munteanu, A. Skodras, and P. Schelkens, "Side-information dependent correlation channel estimation in hash-based distributed video coding," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1934-1949, Apr. 2012.
[37] G. R. Esmaili and P. C. Cosman, "Wyner-Ziv video coding with classified correlation noise estimation and key frame coding mode selection," IEEE Trans. Image Process., vol. 20, no. 9, pp. 2463-2474, Sep. 2011.
[38] S. Aditya and S. Katti, "FlexCast: Graceful wireless video streaming," in Proc. 17th Annu. Int. Conf. Mobile Comput. Networking, 2011.
[39] Y. Kochman and R. Zamir, "Joint Wyner-Ziv/dirty-paper coding by modulo-lattice modulation," IEEE Trans. Inform. Theory, vol. 55, no. 11, pp. 4878-4889, Nov. 2009.
[40] ETSI. (2009). Digital Video Broadcasting (DVB) [Online]. Available: http://www.etsi.org/deliver/etsi_en/300700_300799/300744/01.06.01_60/en_300744v010601p.pdf
[41] T. Kratochvíl, "Hierarchical modulation in DVB-T/H mobile TV transmission," in Multi-Carrier Systems and Solutions. The Netherlands: Springer, 2009, pp. 333-341.
Xiaopeng Fan (S'07-M'09) received the B.S. and M.S. degrees from the Harbin Institute of Technology (HIT), Harbin, China, in 2001 and 2003, respectively, and the Ph.D. degree from the Hong Kong University of Science and Technology (HKUST), Kowloon, Hong Kong, in 2009. In 2009, he joined the Department of Computer Science, HIT, where he is currently an Associate Professor. From 2003 to 2005, he was with the Intel China Software Laboratory as a Software Engineer. He has authored or co-authored over 50 technical journal and conference papers. His current research interests include image/video coding and processing, video streaming, and wireless communication.

Feng Wu (F'12) received the B.S. degree in electrical engineering from XIDIAN University in 1992, and the M.S. and Ph.D. degrees in computer science from the Harbin Institute of Technology, Harbin, China, in 1996 and 1999, respectively. In 1999, he joined Microsoft Research Asia, Beijing, China, where he is currently a Senior Researcher/Research Manager. He has authored or co-authored over 200 publications, including 50 journal papers. He has had 13 of his techniques adopted into international video coding standards. His current research interests include image and video compression, media communication, and media analysis and synthesis. Dr. Wu serves as an Associate Editor for various publications, such as the IEEE Transactions on Circuits and Systems for Video Technology and IEEE Transactions on Multimedia. He served as the Technical Program Committee (TPC) Chair of MMSP 2011, VCIP 2010, and PCM 2009, the TPC Track Chair of ICME 2013, ICIP 2012, ICME 2012, ICME 2011, and ICME 2009, and the Special Sessions Chair of ISCAS 2013 and ICME 2010. He was the recipient of the Best Paper Award in IEEE T-CSVT 2009, PCM 2008, and SPIE VCIP 2007.

Debin Zhao (M'11) received the B.S., M.S., and Ph.D. degrees in computer science from the Harbin Institute of Technology (HIT), Harbin, China, in 1985, 1988, and 1998, respectively. He is now a Professor at the Department of Computer Science, HIT. He has published over 200 journal and conference papers.

Oscar C. Au (F'11) received the B.A.Sc. degree from the University of Toronto, Toronto, Canada, in 1986, and the M.A. and Ph.D. degrees from Princeton University, Princeton, NJ, USA, in 1988 and 1991, respectively. He is a Professor at the Department of Electronic and Computer Engineering, HKUST, Hong Kong, China. He has published 300+ papers and 70+ contributions to international standards.