CARNEGIE MELLON UNIVERSITY


ERROR-RESILIENT RATE SHAPING FOR VIDEO STREAMING OVER PACKET-LOSS NETWORKS

PH.D. DISSERTATION IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN ELECTRICAL AND COMPUTER ENGINEERING

BY TRISTA PEI-CHUN CHEN

PITTSBURGH, PENNSYLVANIA
MAY 2003

Abstract

Video streaming over packet-loss networks faces several challenges: the networks are error-prone, transmission bandwidth is limited and fluctuating, user device capabilities vary, and networks are heterogeneous. These challenges call for smart adaptation of the precoded video. The focus of this thesis is error-resilient rate shaping for streaming precoded video over packet-loss networks. Given the packet-loss characteristic of the networks, the precoded video consists of channel-coded as well as source-coded bits. Error-resilient rate shaping is a filtering process that adapts the bit rates of the precoded video in order to deliver the best video quality given the network condition at the time of delivery.

We first illustrate baseline rate shaping (BRS), the baseline form of the proposed error-resilient rate shaping. Having introduced BRS, which makes coarse decisions in rate adaptation, we propose more sophisticated error-resilient rate shaping for layer-coded videos, namely the enhancement layer video and the base layer video. Fine-grained rate shaping (FGRS) is proposed for streaming the enhancement layer video, and error-concealment aware rate shaping (ECARS) is proposed for streaming the base layer video. FGRS and ECARS are formulated as rate-distortion (R-D) optimization problems. A two-stage R-D optimization approach is proposed to solve the R-D optimization problem in a fast and accurate manner. FGRS makes use of the fine granularity property of the MPEG-4 fine-granularity-scalability bitstream and outperforms ad-hoc unequal packet-loss protection methods. ECARS takes into account error concealment (EC) performed at the receiver in order to deliver the part of the precoded video that cannot be reconstructed well by EC. Frame dependency due to predictive coding and/or temporal EC is also considered in ECARS by means of feedback from the receiver. Experiments are conducted under various channel conditions and for various types of video to demonstrate the effectiveness of the proposed scheme.

Finally, we see that network conditions are needed in optimizing the streaming performance. In the last part of the thesis, we focus on modeling the video traffic so that we may use the synthetic traffic to probe the network, determine the network condition, and optimize the proposed error-resilient rate shaping accordingly.

Acknowledgements

I am extremely grateful to my advisor, Prof. Tsuhan Chen, for being a source of inspiration and a great mentor during my Ph.D. study. Through his guidance and challenge, I learned to define my research direction and to incorporate innovative research ideas. I have also benefited from his encouragement to interact with researchers in the field, to present research ideas effectively, and to contribute to the societies. I cannot but be deeply thankful to have him as my advisor.

I would like to thank Prof. Jose Moura, Prof. Rohit Negi, and Dr. Mihaela van der Schaar for being my thesis committee members. Their insightful feedback has been very precious to the improvement of the work.

I am grateful to be part of the Electrical and Computer Engineering department, and to be stimulated and surrounded by excellent faculty members and motivated students. I would like to thank Prof. B.V.K. Vijaya Kumar for being a great mentor.

I am thankful to all my group mates and friends during my study at Carnegie Mellon University. My group mates, Ta-chen Lin, Deepak Turaga, Howard Leung, Fu Jie Huang, Xiaoming Liu, Cha Zhang, Fang Fang, Ed Lin, Jesse Hsu, Sam Chen, Wende Zhang, Simon Lucey, Kate Shim, Jack Yu, Avinash Baliga, and Michael Kaye, have spent time on my work and provided valuable suggestions. In addition, they are the best friends. My friends Senan E Guran, Pinky Pongpaibool, and Yu-nu Hsu have always been there for me. Thanks to Elie Shammas, Frances Ning, Iuliana Tanase, and James Bruce for bringing the happy moments.

Finally, I would like to express my earnest gratitude to my parents and brothers for their love and support. In particular, I am deeply indebted to my dear Mom. I would not have reached the accomplishment of the current stage without their strong support. The thesis is dedicated to my parents and brothers.

Table of Contents

1. Introduction
   Error-Resilient Rate Shaping: Baseline Rate Shaping (BRS)
   Rate Shaping for Enhancement Layer Video: Fine-Grained Rate Shaping (FGRS)
   Rate Shaping for Base Layer Video: Error Concealment Aware Rate Shaping (ECARS)
   Modeling of Video Traffic
   Organization of Thesis
2. Rate Shaping for Error-Resilient Video Streaming
   Conventional Rate Shaping
   Rate Shaping by Selective Transmission of Transform Coefficients
   Rate Shaping by Block Dropping
   Video Transport over Packet-Loss Networks
   Error-Resilient Rate Shaping: Baseline Rate Shaping (BRS)
   System Description of Video Transport with BRS
   Algorithms for BRS
3. Rate Shaping for Enhancement Layer Video
   Rate Shaping for Enhancement Layer Video: Fine-Grained Rate Shaping (FGRS)
   Algorithms for FGRS
   Problem Formulation
   Two-Stage R-D Optimization: Stage 1
   Two-Stage R-D Optimization: Stage 2
   Experiment
   Conclusion
4. Rate Shaping for Base Layer Video
   4.1. Rate Shaping for Base Layer Video: Error Concealment Aware Rate Shaping (ECARS)
   Background for ECARS
   Error Concealment
   Error Concealment Aware Precoding
   Timely Feedback
   Algorithms for ECARS
   ECARS without Feedback
   ECARS with Feedback
   Experiment
   Rate Shaping vs. UPPRS
   ECARS vs. Non-ECARS
   ECARS with Feedback vs. ECARS without Feedback
   ECARS with EC Method Known vs. ECARS without EC Method Known
   All Methods
   Conclusion
5. Modeling of Video Traffic
   Introduction
   Punctured Autoregressive Modeling
   Variable Bit Rate Video Traffic Modeling
   Experiment
   Conclusion
6. Summary and Future Directions
Appendix A. Second-Generation Error Concealment
   A.1. Adaptive Mixture of Principal Component Analysis (AMPCA)
   A.2. Adaptive Probabilistic Principal Component Analysis (APPCA)
Appendix B. Finite-State Markov Model for Bit Error Simulation
   B.1. K-State Markov Chain Model
   B.2. Simulation
Bibliography

List of Tables

Table 1. One-way transmission time
Table 2. Summary of performance comparison between modeling methods Method 1 and Method 2

List of Illustrations

Figure 1. Rate shaping for error-resilient video streaming
Figure 2. Geometric-structure-based error concealment for 50% block loss: (a) without error concealment; (b) with error concealment
Figure 3. Packet loss rate as a function of the transition probability and the packet size
Figure 4. A general video transport system
Figure 5. System diagram of the precoding process: scalable encoding followed by FEC encoding
Figure 6. Transport of the precoded video with BRS
Figure 7. System diagram of the decoding process: FEC decoding followed by scalable decoding
Figure 8. (a) All four segments of the precoded video and (b) to (g) available states for BRS: (b) state (0,0), (c) state (1,0), (d) state (1,1), (e) state (2,0), (f) state (2,1), and (g) state (2,2)
Figure 9. R-D maps of: (a) Frame 1, (b) Frame 2, and so on
Figure 10. Discrete R-D combination algorithm: (a)(b) elimination of states inside the convex hull of each frame, and (c) allocation of rate to the frame m that utilizes the rate more efficiently
Figure 11. Dependency graph of the FGS base layer and enhancement layer. Base layer allows for temporal prediction with P and B frames. Enhancement layer is encoded with reference to the base layer only
Figure 12. System diagram of the precoding process: FGS encoding followed by FEC encoding
Figure 13. Transport of the precoded bitstreams: (a) transport of the FEC coded FGS enhancement layer bitstream with rate shaper via the wireless network, and (b) transport of the base layer bitstream via the secure channel
Figure 14. System diagram of the decoding process: FEC decoding followed by FGS decoding
Figure 15. Precoded video: (a) FGS enhancement layer bitstream in sublayers, and (b) FEC coded FGS enhancement layer bitstream

Figure 16. Bandwidth adaptation with (a) random dropping; (b) UPPRS1; (c) UPPRS2; and (d) FGRS
Figure 17. Intersection of the model-based hyper-surface (dark surface) and the bandwidth constraint (gray plane), illustrated with h =
Figure 18. Pseudo-codes of the hill-climbing algorithm
Figure 19. Test video sequences in CIF: (a) akiyo, (b) foreman, and (c) stefan
Figure 20. Sample BER traces of the wireless channel: (a) mobile unit at 2 km/h; (b) mobile unit at 6 km/h; (c) mobile unit at 10 km/h
Figure 21. Sublayer bit allocations of all methods at 10 km/h and SNR=20 dB for Sequence foreman
Figure 22. Performance (PSNR of the Y component) of all methods at 10 km/h and SNR=20 dB for Sequences akiyo, foreman, and stefan
Figure 23. Performance (PSNR of the U component) of all methods at 10 km/h and SNR=20 dB for Sequences akiyo, foreman, and stefan
Figure 24. Performance (PSNR of the V component) of all methods at 10 km/h and SNR=20 dB for Sequences akiyo, foreman, and stefan
Figure 25. Performance (PSNR of the Y component) of all methods at various wireless channel conditions for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 26. Performance (PSNR of the U component) of all methods at various wireless channel conditions for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 27. Performance (PSNR of the V component) of all methods at various wireless channel conditions for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 28. Frame-by-frame PSNR of the Y component of all methods at 10 km/h and SNR=20 dB for Sequence foreman
Figure 29. Frame-by-frame PSNR of the U component of all methods at 10 km/h and SNR=20 dB for Sequence foreman

Figure 30. Frame-by-frame PSNR of the V component of all methods at 10 km/h and SNR=20 dB for Sequence foreman
Figure 31. A sample frame of (a) upprs1 and (b) fgrs at 10 km/h and SNR=20 dB for Sequence stefan
Figure 32. System diagram of the precoding process: source encoding (which can be EC aware) followed by FEC encoding
Figure 33. Transport of the precoded video with ECARS
Figure 34. System diagram of the decoding process: FEC decoding followed by source decoding
Figure 35. EC example by spatial interpolation: (a) the corrupted frame without EC, and (b) the reconstructed frame with EC
Figure 36. EC example by temporal interpolation: (a) the corrupted frame without EC, and (b) the reconstructed frame with EC
Figure 37. (a) Frame n-1, (b) Frame n, and (c) MB indices. EC aware MB prioritization: MB (1,1) has higher priority than MB (0,3)
Figure 38. Precoded video: (a) MB prioritized bitstream, (b) MB prioritized bitstream in sublayers, and (c) FEC coded MB prioritized bitstream
Figure 39. (a) Precoded video in sublayers and (b) ECARS decision on which symbols to send
Figure 40. Performance of Methods upprs1 and n-ecars at various wireless channel conditions with Case (0, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 41. Frame-by-frame PSNR of Methods upprs1 and n-ecars at 10 km/h and SNR=20 dB with Case (0, 1) for Sequence foreman
Figure 42. Performance of Methods upprs1 and n-ecars at various wireless channel conditions with Case (1, 2) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 43. Frame-by-frame PSNR of Methods upprs1 and n-ecars at 10 km/h and SNR=20 dB with Case (1, 2) for Sequence foreman
Figure 44. Performance of Methods upprs1 and n-ecars at various wireless channel conditions with Case (1, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 45. Frame-by-frame PSNR of Methods upprs1 and n-ecars at 10 km/h and SNR=20 dB with Case (1, 1) for Sequence foreman
Figure 46. Performance of Methods n-ecars and ecars-nf at various wireless channel conditions with Case (0, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 47. Frame-by-frame PSNR of Methods n-ecars and ecars-nf at 10 km/h and SNR=20 dB with Case (0, 1) for Sequence foreman
Figure 48. Performance of Methods n-ecars and ecars-nf at various wireless channel conditions with Case (1, 2) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 49. Frame-by-frame PSNR of Methods n-ecars and ecars-nf at 10 km/h and SNR=20 dB with Case (1, 2) for Sequence foreman
Figure 50. Performance of Methods n-ecars and ecars-nf at various wireless channel conditions with Case (1, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 51. Frame-by-frame PSNR of Methods n-ecars and ecars-nf at 10 km/h and SNR=20 dB with Case (1, 1) for Sequence foreman
Figure 52. Performance of Methods ecars-nf, ecars-loc, and ecars-mean at various wireless channel conditions with Case (0, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 53. Frame-by-frame PSNR of Methods ecars-nf, ecars-loc, and ecars-mean at 10 km/h and SNR=20 dB with Case (0, 1) for Sequence foreman
Figure 54. Performance of Methods ecars-nf, ecars-loc, and ecars-mean at various wireless channel conditions with Case (1, 2) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 55. Frame-by-frame PSNR of Methods ecars-nf, ecars-loc, and ecars-mean at 10 km/h and SNR=20 dB with Case (1, 2) for Sequence foreman
Figure 56. Performance of Methods ecars-nf, ecars-loc, and ecars-mean at various wireless channel conditions with Case (1, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 57. Frame-by-frame PSNR of Methods ecars-nf, ecars-loc, and ecars-mean at 10 km/h and SNR=20 dB with Case (1, 1) for Sequence foreman
Figure 58. Performance of Methods ecars-loc and ecars-ideal at various wireless channel conditions with Case (0, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 59. Frame-by-frame PSNR of Methods ecars-loc and ecars-ideal at 10 km/h and SNR=20 dB with Case (0, 1) for Sequence foreman
Figure 60. Performance of Methods ecars-loc and ecars-ideal at various wireless channel conditions with Case (1, 2) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 61. Frame-by-frame PSNR of Methods ecars-loc and ecars-ideal at 10 km/h and SNR=20 dB with Case (1, 2) for Sequence foreman
Figure 62. Performance of Methods ecars-loc and ecars-ideal at various wireless channel conditions with Case (1, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 63. Frame-by-frame PSNR of Methods ecars-loc and ecars-ideal at 10 km/h and SNR=20 dB with Case (1, 1) for Sequence foreman
Figure 64. Sublayer bit allocations of all methods at 10 km/h and SNR=20 dB with Case (1, 1) for Sequence foreman
Figure 65. Performance of all methods at various wireless channel conditions with Case (0, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 66. Frame-by-frame PSNR of all methods at 10 km/h and SNR=20 dB with Case (0, 1) for Sequence foreman
Figure 67. Performance of all methods at various wireless channel conditions with Case (1, 2) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 68. Frame-by-frame PSNR of all methods at 10 km/h and SNR=20 dB with Case (1, 2) for Sequence foreman
Figure 69. Performance of all methods at various wireless channel conditions with Case (1, 1) for Sequence foreman: (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR
Figure 70. Frame-by-frame PSNR of all methods at 10 km/h and SNR=20 dB with Case (1, 1) for Sequence foreman
Figure 71. Performance of all methods at 10 km/h and SNR=20 dB with Case (0, 1) for Sequences akiyo, foreman, and stefan
Figure 72. Performance of all methods at 10 km/h and SNR=20 dB with Case (1, 2) for Sequences akiyo, foreman, and stefan
Figure 73. Performance of all methods at 10 km/h and SNR=20 dB with Case (1, 1) for Sequences akiyo, foreman, and stefan
Figure 74. A sample frame of (a) n-ecars and (b) ecars-mean at 10 km/h and SNR=20 dB with Case (1, 1) for Sequence foreman
Figure 75. Two interleaved autoregressive processes $x_n$ and $y_n$: (a) the interleaved process; (b) autoregressive process $x_n$; (c) autoregressive process $y_n$
Figure 76. Two interleaved autoregressive processes $x_n$ and $y_n$: (a) the interleaved process; (b) punctured autoregressive process $x_n$; (c) punctured autoregressive process $y_n$
Figure 77. Models for VBR video traffic: (a) Method 1: doubly Markov modulated AR process; (b) Method 2: doubly Markov modulated punctured AR process
Figure 78. Test videos: (a) news; (b) talk show
Figure 79. Sample traces from the TV program news: (a) a 200 second trace; (b) a 20 second trace

Figure 80. First and second order statistics of the synthetic traces generated by Method 1 and Method 2 with respect to the real video trace of the clip news: (a) first order statistics: Q-Q plot; (b) second order statistics: ACF
Figure 81. LRD properties of three traces by Hurst parameter from the R/S plots: (a) real video trace; (b) synthetic trace by Method 1; (c) synthetic trace by Method 2
Figure 82. Queuing behavior of the real video trace: (a) packet loss rate; (b) queuing delay
Figure 83. End system multicast (ESM) with simulcast
Figure 84. End system multicast (ESM) with rate shaping
Figure 85. Non-stationary data at (a) time n (b) time n
Figure 86. APCA for non-stationary data at (a) time n (b) time n
Figure 87. Updated means and eigenvectors at time instants 20, 22, and
Figure 88. Sample reconstructed frames of Intra-coded Interview with: (a) no concealment; (b) concealment with spatial interpolation; or (c) concealment with APCA
Figure 89. (a) Probabilistic PCA (PPCA) (b) PCA
Figure 90. PPCA at (a) time n (b) time n
Figure 91. Sample reconstructed frames of Intra-coded Interview with: (a) no concealment; (b) concealment with APCA; or (c) concealment with APPCA
Figure 92. Two-state Markov chain for bit error simulation
Figure 93. BER traces of wireless channel with units moving at different speeds
Figure 94. BER traces of wireless channel with different SNR

1. Introduction

Video streaming over packet-loss networks faces several challenges: the networks are error-prone, transmission bandwidth is limited and fluctuating, user device capabilities vary, and networks are heterogeneous. These challenges call for smart adaptation of the precoded video. In this thesis we propose an error-resilient rate shaping framework (Figure 1) for video streaming over packet-loss networks.

[Figure 1. Rate shaping for error-resilient video streaming. Diagram labels: source/channel encoding, precoded video, bandwidth, error rate, rate shaping, wireless network.]

Rate shaping is a technique that selectively drops part of the precoded (source- and/or channel-coded) bitstream before the bitstream is sent to the network. To ensure that the shaped bitstream can best survive hostile network conditions, rate shaping takes into account network information, such as the channel error rate and the available bandwidth, as well as the video source statistics. In addition to rate shaping at the sender, post-processing error concealment can be performed at the receiver to recover the decoded video quality. Furthermore, knowing that the receiver can replenish the video data by error concealment, error concealment aware rate shaping can take place at the sender to deliver better quality video than rate shaping that is not error concealment aware.

The proposed error-resilient rate shaping has many advantages over other error-resilient video transport mechanisms [59], namely error-resilient video coding and joint source-channel coding.

Error-resilient rate shaping vs. error-resilient video coding. In many situations, the video encoder and decoder are fixed and cannot be modified to include error-resilient video coding features such as reversible variable length coding and independent segment prediction. The proposed error-resilient rate shaping does not need to alter the original video encoder and decoder, and can thus be adopted by systems, for example commercial video-on-demand systems, in which modifying the video coders would require a tremendous amount of work.

Error-resilient rate shaping vs. joint source-channel coding. By varying the source and channel encoder parameters, joint source-channel coding [10][29][55][63] allocates the bits between the source and channel encoders to achieve the best video quality given the current network condition. Joint source-channel coding techniques are limited in that they provide end-to-end optimization only at the time of encoding, and so are not suitable for streaming precoded video. The encoded bitstream may not be optimal for transmission along a different path, or along the same path at a later time. Moreover, rate adaptation for each link might be needed in a heterogeneous network. Rate shaping can optimize the video streaming performance for each link.

1.1. Error-Resilient Rate Shaping: Baseline Rate Shaping (BRS)

Dynamic rate shaping (DRS) [16][17][27][65][66] was first introduced in 1995 to adapt the rates of pre-source-coded (pre-compressed) video to dynamically varying bandwidth constraints. On the other hand, to protect the video from losses in packet-loss networks, the source-coded video bitstream is often protected by forward error correction (FEC) codes [46][61]: redundant information, known as parity bits, is added to the original source-coded bits. Conventional DRS does not consider shaping the parity bits in addition to the source-coding bits, that is, shaping the pre-source- and channel-coded bitstream. We propose a new framework of rate shaping, error-resilient rate shaping, which adapts the rates of the pre-source- and channel-coded bitstream to the conditions of packet-loss networks. We first introduce the baseline approach of the proposed error-resilient rate shaping, which we call baseline rate shaping (BRS). BRS optimizes the video streaming performance in a rate-distortion (R-D) sense. Two R-D optimization algorithms are presented: BRS by mode decision and BRS by discrete R-D combination.

1.2. Rate Shaping for Enhancement Layer Video: Fine-Grained Rate Shaping (FGRS)

The baseline approach, BRS, makes decisions at a coarse level. To incorporate finer granularity, we propose fine-grained rate shaping (FGRS) for streaming the enhancement layer video and error-concealment aware rate shaping (ECARS) for streaming the base layer video. We adopt MPEG-4 fine granularity scalability (FGS) [32] for source coding and erasure codes [46][61] for FEC coding. Unlike conventional scalability techniques such as SNR scalability, the MPEG-4 FGS video bitstream is partially decodable over a wide range of bit rates: the more bits of the FGS bitstream are received, the better the video quality. In addition, it is known that a partial FEC coded bitstream is still decodable within the error correction capability if erasure codes are used. Thus, both FGS and erasure codes provide fine-granularity properties, in video quality and in packet-loss protection respectively. Given the FEC coded FGS bitstream as the precoded video, FGRS adapts the rates of the precoded video considering the current packet-loss rate. There are conceptually infinitely many possible combinations of dropping a portion of the FGS bitstream and a portion of the FEC codes. FGRS seeks the optimal solution in the R-D sense. A new two-stage R-D optimization approach is proposed to select which part of the precoded video to drop.

1.3. Rate Shaping for Base Layer Video: Error Concealment Aware Rate Shaping (ECARS)

To make finer-grained decisions than the coarse decisions made by BRS, error-concealment aware rate shaping (ECARS) is proposed for streaming the base layer video, in addition to FGRS for streaming the enhancement layer video. Taking into account that the receiver may perform error concealment (EC) if any video data is lost during transmission, ECARS makes rate-shaping decisions accordingly. Frame dependency is usually inherent in the video bitstream due to predictive coding.
In addition, temporal EC might introduce extra frame dependency. Feedback from the receiver to the sender might be helpful in addressing the frame dependency problem in rate shaping. We therefore introduce two types of ECARS algorithms: without feedback and with feedback. Both evaluate the gain of sending some part of the precoded video as opposed to not sending it. The gain metrics are then included in the R-D optimization formulation. Finally, the two-stage R-D optimization approach is adopted to solve the R-D optimization problem. In the case of no feedback, ECARS evaluates the gain considering the particular EC method used at the receiver. In order to incorporate the frame dependency into the rate shaping process, we propose to send the location (and mean) of each corrupted macroblock back to the sender, and to use this feedback information to determine the gain in the R-D optimized ECARS.

1.4. Modeling of Video Traffic

For both video service providers and network designers, it is important to have a good model of the video traffic. A good model allows for better admission control, scheduling, network resource allocation policies, etc., that guarantee a desired quality of service (QoS) as well as better utilization of the network resources. A good model captures the essential characteristics of real video traffic, and the synthetic trace generated by such a model can be used to test the network. In this thesis, video traffic modeling is useful in helping to gather network conditions.

We present a new stochastic process called the punctured autoregressive (AR) process and use it to model the video traffic. Specifically, we propose to use punctured autoregressive processes modulated by a doubly Markov process. The doubly Markov process models the state of a video frame, while the autoregressive process describes the number of bits of a frame in one particular state. The punctured autoregressive process considers the timing information between frames of the same state and thus gives better modeling performance. The model captures the long-range dependency (LRD) characteristics as well as the short-range dependency (SRD) characteristics of the video traffic. The queuing behavior of the punctured autoregressive process is also closer to that of real video traffic than that of the conventional autoregressive process.

1.5. Organization of Thesis

This thesis is organized as follows.
Chapter 2 describes the fundamentals of the proposed error-resilient rate shaping. Conventional rate shaping is introduced first, followed by a description of the characteristics of packet-loss networks. The baseline approach of the proposed error-resilient rate shaping is then presented.

Chapter 3 describes one of the main ideas of the proposed error-resilient rate shaping: fine-grained rate shaping (FGRS) for streaming the enhancement layer video. The proposed two-stage R-D optimization approach is detailed as well.

Chapter 4 describes the other main idea of the proposed error-resilient rate shaping: error concealment aware rate shaping (ECARS) for streaming the base layer video. The focus of this chapter is determining the gain used for R-D optimization. Two cases of ECARS are discussed: ECARS without feedback and ECARS with feedback.

Chapter 5 discusses the use of punctured AR processes for video traffic modeling. The synthetic traffic generated can be used to probe the network conditions, which are eventually fed to the error-resilient rate shaping system.

We summarize the study in Chapter 6 with the conclusion, the contributions of the thesis, and future directions.

2. Rate Shaping for Error-Resilient Video Streaming

Error-resilient rate shaping is a filtering process that, given a precoded bitstream and the network condition, generates an alternative bitstream that adapts to the network condition. For video streaming over packet-loss networks, the precoded bitstream should be both source- and channel-coded in order to be error resilient. We illustrate the baseline approach of the proposed error-resilient rate shaping as the foundation for the more advanced rate shaping in later chapters.

In this chapter, we first introduce conventional rate shaping, where rate adaptation is performed on pre-source-coded video only. We then briefly describe some characteristics of packet-loss networks, given that we aim to solve the problem of error-resilient video streaming. Finally, we introduce the baseline approach of our proposed error-resilient rate shaping, which is called baseline rate shaping (BRS).

2.1. Conventional Rate Shaping

Rate shaping is a filtering process that, given a pre-source-coded bitstream and a target bandwidth, generates an alternative bitstream that satisfies the bandwidth constraint. If the bandwidth constraint varies over time, the process is called dynamic rate shaping (DRS) [16][17][27][65][66]. Without rate shaping, the part of the bitstream that exceeds the bandwidth constraint will be discarded indiscriminately by the network, and the resulting video quality will degrade unpredictably.

In a broad sense, the format of the video bitstream (e.g., from MPEG-1 to MPEG-4), the resolution of the video, and the frame rate of the video may all be manipulated to achieve the target bit rate. This kind of rate shaping is usually called transcoding [48][49][56]. Another kind of rate shaping is done by re-quantization of the discrete cosine transform (DCT) coefficients. In re-quantization [4][41][60], the entire set of already quantized DCT coefficients is mapped to new values at a coarser level of quantization, resulting in a rate reduction. In [41], local and global activity criteria are used to determine the re-quantization step size.
In [60], optimal selection of the re-quantization step size is examined for Intra-coded frames. In [4], optimal selection of the re-quantization step size is analyzed for all frame types.

The focus in this section is rate shaping in a strict sense. That is, rate shaping drops the part of the pre-source-coded video that is considered less critical to the quality of the decoded video, without changing the content of the bitstream. In [16][17][27], selective transmission of transform (DCT) coefficients is presented. In [65][66], rate shaping is achieved by block dropping with additional error concealment at the receiver. We describe these two types of strict-sense rate shaping in the following.

2.1.1. Rate Shaping by Selective Transmission of Transform Coefficients

The DRS presented in [16][17][27] selectively sends some of the transform coefficients to satisfy the bandwidth constraints. There are two cases of DRS, constrained and general (or unconstrained). In constrained DRS, the number of DCT run-length codes within each block that will be kept is called the breakpoint. All DCT coefficients above the breakpoint are eliminated from the bitstream. In general DRS, the breakpoint becomes a 64-element binary vector indicating which coefficients within each block will be kept.

1) Constrained DRS of Intra-Coded Pictures

If the video is Intra-coded, there is no dependency between frames, so the rate shaping errors in the current frame will not propagate to the following frames. The problem formulation is as follows:

$$\text{minimize } \sum_{i=1}^{N} D_i(b_i), \quad \text{where } D_i(b_i) = \sum_{k \ge b_i} \left[ E_i(k) \right]^2 \tag{2.1}$$

$$\text{subject to } \sum_{i=1}^{N} R_i(b_i) \le B \tag{2.2}$$

where $b_i \in \{1, 2, \ldots, 64\}$ is the breakpoint for block $i$, $N$ is the number of blocks considered, $E_i(k)$ is the DCT coefficient at the $k$th position, and $R_i(b_i)$ is the rate required to send block $i$ up to the breakpoint. This problem can be converted to a linear programming/integer programming problem with Lagrange multipliers as follows:

$$\min_{b_i} \left\{ \sum_{i=1}^{N} D_i(b_i) + \lambda \sum_{i=1}^{N} R_i(b_i) \right\} \tag{2.3}$$
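For a fixed multiplier λ, the Lagrangian form decouples across blocks: each block can pick the breakpoint that minimizes its own D + λR, and λ can then be tuned by bisection until the total rate meets the budget B. The following Python sketch illustrates that idea under simplifying assumptions that are not taken from the thesis: a flat, hypothetical cost of `bits_per_coeff` bits per kept coefficient stands in for the real run-length code lengths, coefficients at positions k ≥ b are treated as dropped, and the helper names (`block_table`, `choose`, `total_rate`, `shape`) are illustrative only.

```python
from typing import List, Sequence, Tuple

# (distortion, rate) tables for one block, indexed by breakpoint b - 1
Table = Tuple[List[float], List[float]]


def block_table(coeffs: Sequence[float], bits_per_coeff: float) -> Table:
    """Distortion and rate for breakpoints b = 1..len(coeffs).

    Assumed convention: coefficients at positions k >= b are dropped,
    so b - 1 coefficients are sent at a flat (hypothetical) cost each.
    """
    n = len(coeffs)
    dist = [sum(e * e for e in coeffs[b - 1:]) for b in range(1, n + 1)]
    rate = [bits_per_coeff * (b - 1) for b in range(1, n + 1)]
    return dist, rate


def choose(tables: Sequence[Table], lam: float) -> List[int]:
    """For a fixed multiplier, each block minimises D + lam * R on its own."""
    picks = []
    for dist, rate in tables:
        # Ties are broken in favour of the lower rate.
        idx = min(range(len(dist)),
                  key=lambda j: (dist[j] + lam * rate[j], rate[j]))
        picks.append(idx + 1)  # report 1-based breakpoints
    return picks


def total_rate(tables: Sequence[Table], picks: Sequence[int]) -> float:
    """Summed rate of the chosen breakpoints."""
    return sum(rate[b - 1] for (_, rate), b in zip(tables, picks))


def shape(tables: Sequence[Table], budget: float, iters: int = 50) -> List[int]:
    """Bisection on lam until the summed rate fits the budget."""
    if sum(min(rate) for _, rate in tables) > budget:
        raise ValueError("budget is below the minimum achievable rate")
    picks = choose(tables, 0.0)
    if total_rate(tables, picks) <= budget:
        return picks  # the unconstrained optimum already fits
    lo, hi = 0.0, 1.0
    while total_rate(tables, choose(tables, hi)) > budget:
        hi *= 2.0  # grow the bracket until hi is feasible
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_rate(tables, choose(tables, mid)) > budget:
            lo = mid  # still over budget: penalise rate more
        else:
            hi = mid  # feasible: try a smaller penalty
    return choose(tables, hi)


if __name__ == "__main__":
    toy = [block_table([10, 5, 1, 0], 4), block_table([8, 1, 0, 0], 4)]
    print(shape(toy, budget=12))  # -> [3, 2]
```

A Lagrangian search of this kind only visits operating points on the convex hull of each block's R-D curve, so it may leave part of the budget unused when the budget falls between hull points.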

The problem can be solved by an iterative bisection algorithm [19][42][43].

2) Constrained DRS of Inter-Coded Pictures

If the video is Inter-coded with I, P, and B pictures, the dependency between frames adds complexity to the problem. The rate shaping errors are the sum of the accumulated errors (the motion-compensated accumulated errors from the previous frames) and the errors from the current frame (the errors from shaping the DCT coefficients of the current frame). The problem formulation is as follows:

minimize $\sum_{i=1}^{N} \hat{D}_i(b_i)$, where $\hat{D}_i(b_i) = \sum_{k} A_i(k)^2 + \sum_{k > b_i} \left[ 2 A_i(\xi(k)) E_i(k) + E_i(k)^2 \right]$  (2.4)

subject to $\sum_{i=1}^{N} R_i(b_i) \le B$  (2.5)

where $A_i(k)$ represents the accumulated error and $\xi(k)$ maps the run-length position to the zigzag scan position. The problem can be solved with the same algorithms as in the case of constrained DRS of Intra-coded pictures, with the new definition of distortion $\hat{D}_i(b_i)$.

3) Unconstrained DRS

In unconstrained DRS, the breakpoint is denoted as a vector with binary elements, $\mathbf{b}_i = \{ b_i^k \}$, where $i \in \{1, 2, \ldots, N\}$ and $k \in \{1, 2, \ldots, K\}$. With $b_i^k = 1$ indicating a kept coefficient, only the dropped coefficients contribute to the distortion. The problem formulation for Intra-coded pictures is:

minimize $\sum_{i=1}^{N} D_i(\mathbf{b}_i)$, where $D_i(\mathbf{b}_i) = \sum_{k} (1 - b_i^k) E_i(k)^2$  (2.6)

subject to $\sum_{i=1}^{N} R_i(\mathbf{b}_i) \le B$  (2.7)

And the problem formulation for Inter-coded pictures is:

minimize $\sum_{i=1}^{N} D_i(\mathbf{b}_i)$, where $D_i(\mathbf{b}_i) = \sum_{k} A_i(k)^2 + 2 \sum_{k} (1 - b_i^k) A_i(\xi(k)) E_i(k) + \sum_{k} (1 - b_i^k) E_i(k)^2$  (2.8)

subject to $\sum_{i=1}^{N} R_i(\mathbf{b}_i) \le B$  (2.9)

The problems can be solved by Lagrange multipliers with a bisection-based algorithm [19] or a descent-based algorithm [20].

Rate Shaping by Block Dropping

Instead of dropping DCT coefficients, rate shaping can also be achieved by block dropping and additional error concealment at the receiver [65][66]. Rate shaping decides which DCT blocks to drop depending on how large the distortion is if these blocks are reconstructed with error concealment ([64] and Figure 2) at the receiver.

Figure 2. Geometric-structure-based error concealment for 50% block loss: (a) without error concealment; (b) with error concealment

The problem formulation for Intra-coded pictures is:

minimize $D(X, \tilde{X})$  (2.10)

subject to $R(\tilde{X}) \le B$  (2.11)

where $X$ is the original image, $\hat{X}$ is the MPEG-quantized image with bit rate $R(\hat{X})$, and $\tilde{X}$ is the shaped-followed-by-concealed version of $\hat{X}$, with bit rate $R(\tilde{X})$. The problem can be solved by a tree pruning algorithm [11]. Similar to Section 2.1.1, the $i$-th reconstructed frame $\tilde{X}_i$ consists of the motion-compensated result from the previous frame, $M(\tilde{X}_{i-1})$, where $M(\cdot)$ denotes motion compensation, and the reconstructed coefficients from the current frame, $e_i$:

$\tilde{X}_i = M(\tilde{X}_{i-1}) + e_i$  (2.12)

After the modification of (2.12), the rest of the rate-distortion optimization follows.

The above two types of conventional rate shaping drop either some of the transform coefficients or some of the blocks in order to adapt the bit rate of the pre source-coded (not channel-coded) video. Conventional rate shaping, however, is not suitable for video streaming over packet-loss networks, where the precoded video is both source- and channel-coded. We describe some characteristics of packet-loss networks in the next section.

Video Transport over Packet-Loss Networks

Packet-loss networks generally have a time-varying packet loss rate and fluctuating bandwidth. The bandwidth can be the bandwidth of circuit-switched networks, or it can be provided by the network management layer. It can also be the estimated effective bandwidth. In this study, we regard the bandwidth constraint as the target bit rate that the output bitstream tries to satisfy. Packet losses have two causes: the packets never arrive (or arrive later than a certain threshold), or the arrived packets contain bit errors. In this study, we focus on the second packet-loss scenario, since the first one usually results from router queue overflows, packet reordering, etc., which are beyond the scope of forward error correction (FEC) codes.

The sources of bit errors in a wireless channel are noise, shadowing, fading, intersymbol interference, etc. We adopt a simple finite-state Markov chain for wireless channel bit error simulation (detailed in Appendix B). To derive some interesting results about how the packet size $s$ and the transition probability (equivalently, the burstiness) affect the packet loss rate $e_p$, we simplify the model by setting $e_1 = e_G = 0$, $e_0 = e_B = 1$, $t_{1,0} = p$, and $t_{0,1} = q$. The mean bit error rate (BER) $e_b$ is related to the transition probabilities $p$ and $q$ by $e_b = p / (p + q)$.
With bit error rate $e_b$, transition probability $p$, and packet size $s$, the packet loss rate of the $s$-bit packet is

$e_p = 1 - (1 - e_b)(1 - p)^{s-1}$  (2.13)
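As a quick numerical check of (2.13), the short Python sketch below evaluates the packet loss rate for the simplified two-state model. The function names are hypothetical, and the helper that derives $q$ from $e_b = p/(p+q)$ is added only to verify that a chosen $(e_b, p)$ pair corresponds to a valid transition probability.

```python
def packet_loss_rate(e_b: float, p: float, s: int) -> float:
    """Packet loss rate of an s-bit packet, equation (2.13).

    The packet survives only if its first bit falls in the good state
    (probability 1 - e_b) and the chain stays there for the next s - 1 bits.
    """
    return 1.0 - (1.0 - e_b) * (1.0 - p) ** (s - 1)


def bad_to_good_probability(e_b: float, p: float) -> float:
    """Solve e_b = p / (p + q) for q and check that it is a probability."""
    q = p * (1.0 - e_b) / e_b
    if not 0.0 <= q <= 1.0:
        raise ValueError("(e_b, p) does not give a valid transition probability q")
    return q


if __name__ == "__main__":
    e_b = 1e-4
    for p in (1e-5, 5e-5):
        for s in (100, 280):
            print(f"p={p:g}  s={s}  e_p={packet_loss_rate(e_b, p, s):.6f}")
```

For a fixed $e_b$, the printed values grow with both $p$ and $s$, which is the behaviour that the two properties stated after (2.13) describe.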

We observe two properties from (2.13) for a given bit error rate $e_b$: (i) the smaller the transition probability $p$, the smaller the packet loss rate $e_p$, and (ii) the smaller the packet size $s$, the smaller the packet loss rate $e_p$. These two properties are shown in Figure 3 with $e_b = 10^{-4}$. We will see the use of these properties in the later chapters.

Figure 3. Packet loss rate as a function of the transition probability and the packet size (axes: transition probability $p$, packet size in bits $s$, packet loss rate $e_p$; $e_b = 10^{-4}$)

Besides the two properties we have just seen, it is also known that, to detect the loss of packets, some information such as the packet number has to be added to each packet. The smaller the packet is, the heavier the overhead is. Therefore, there is a trade-off between the packet size and the resulting packet loss rate. We use $s = 280$ bits herein. Users can select the packet size $s$ according to practical system considerations.

Error-Resilient Rate Shaping: Baseline Rate Shaping (BRS)

To protect the video from transmission errors, the source-coded video bitstream is often protected by forward error correction (FEC) codes [46][61]. Redundant information, known as parity bits, is added to the original source-coded bits. Parity bits are included in the precoded video because FEC encoding at the time of transmission may not be feasible given the capability of the node that is transporting the video. On the other hand, this node should be able to perform rate shaping on the source- and channel-coded bitstream, since rate shaping has less complexity than full decoding. This node is able to perform full decoding if it wants to view the content of the video.


Error-resilient rate shaping is needed to adapt the bit rate of the pre source- and channel-coded video. Adapting the bit rate is natural for wireless transmission, such as in a wireless LAN, given the fluctuating channel rates. It is also known that the devices used as clients of streaming applications vary considerably in their computation power, connection bandwidth, etc. The proposed error-resilient rate shaping can be performed at the source (the video server), at an application-aware network node (the proxy), or at the receiver, as shown in Figure 4.

It is worth noting that, unlike joint source-channel techniques that allocate the bits between the source and channel coders to achieve the best video quality, the proposed error-resilient rate shaping performs the rate adaptation on the precoded video at the time of delivery. The decision as to which part of the precoded video to drop varies from time to time. There is no need to reassign bits to the source and channel coders as in the joint source-channel techniques. In addition, rate shaping can be applied to adapt to the network condition of each link along the transmission path. This is particularly suitable for wireless video transport, since wireless networks are heterogeneous in nature. One single joint source-channel coded bitstream cannot be optimal for all links along the transmission path from the sender to the receiver. Rate shaping, on the other hand, can optimize the video transport over each link.

Figure 4. A general video transport system (video server, backbone network, proxy, and access networks)

We start introducing error-resilient rate shaping with the baseline approach, baseline rate shaping (BRS). More advanced error-resilient rate shaping is introduced in the following chapters. The BRS system is illustrated first, followed by the rate-distortion (R-D) optimization algorithms for BRS.

System Description of Video Transport with BRS

There are three stages for transmitting the video from the sender to the receiver: (i) precoding, (ii) streaming with rate shaping, and (iii) decoding, as shown in Figure 5 to Figure 7.

Figure 5. System diagram of the precoding process: scalable encoding followed by FEC encoding

Figure 6. Transport of the precoded video with BRS

Figure 7. System diagram of the decoding process: FEC decoding followed by scalable decoding

The precoding process (Figure 5) consists of source coding using scalable video coding [22][40][52] and FEC coding. Scalable video coding provides the prioritized bitstream for rate shaping. The concept of rate shaping works for any prioritized video bitstream in general^1. Without loss of generality, we consider signal-to-noise-ratio (SNR) scalability. We use Reed-Solomon codes [61] as the FEC codes.

In Figure 6, the pre source- and channel-coded bitstream is passed through BRS to adjust the bit rate before being sent to the wireless network. BRS seeks to perform the best bandwidth adaptation at the given packet loss rate. The distortion here is described in terms of peak signal-to-noise ratio (PSNR). Packet loss rate, instead of bit error rate, is considered since the shaped precoded video will be transmitted in packets. In summary, considering the packet loss rate and the bandwidth, BRS reduces the bit rate of the precoded video in an R-D optimized manner (elaborated later). The decoding process (Figure 7) consists of FEC decoding followed by scalable decoding.

1 For example, in DRS, a prioritized video bitstream from high to low priority is offered by the low to high frequency DCT coefficients. Data partitioning of a single-layered, non-scalable coded bitstream can also give a prioritized bitstream.

Algorithms for BRS

BRS uses rate-distortion (R-D) optimization algorithms to deliver the best video quality. In the following we describe two R-D optimization algorithms, BRS by mode decision and BRS by discrete R-D combination, which differ in how much delay rate shaping allows.

BRS by Mode Decision

Let us consider the case in which the video sequence is scalable coded into two layers: one base layer and one enhancement layer. These two layers are FEC coded with unequal packet loss protection (UPP) capabilities. Therefore, there are four segments in the precoded video. The first segment consists of the bits of the base layer video bitstream (upper left segment of Figure 8 (a)). The second segment consists of the bits of the enhancement layer video bitstream (upper right segment of Figure 8 (a)). The third segment consists of the parity bits for the base layer video bitstream (lower left segment of Figure 8 (a)). The fourth segment consists of the parity bits for the enhancement layer video bitstream (lower right segment of Figure 8 (a)).

BRS decides which subset of the four segments to send. Some constraints apply for a combination to be valid. For example, if the segment that consists of the parity bits for the base layer video bitstream is selected, the segment that consists of the bits of the base layer video bitstream must be selected as well. In this case with two layers of video bitstream, there are six valid combinations, shown in Figure 8 (b)~(g). We call each valid combination a state.
Each state is represented by a pair of integers $(x, y)$, where $x$ is the number of segments selected counting from the segment consisting of the bits of the base layer, and $y$ is the number of segments selected counting from the segment consisting of the parity bits for the base layer. The two integers $x$ and $y$ satisfy $x \ge y$.
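The state representation can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the segment sizes in the usage example are hypothetical, and the generalization to more than two layers is an assumption that simply keeps the rule $x \ge y$.

```python
from typing import List, Sequence, Tuple

State = Tuple[int, int]  # (x source segments kept, y parity segments kept)


def brs_states(num_layers: int = 2) -> List[State]:
    """Enumerate all valid states (x, y) with 0 <= y <= x <= num_layers."""
    return [(x, y) for x in range(num_layers + 1) for y in range(x + 1)]


def state_rate(state: State, source_bits: Sequence[int], parity_bits: Sequence[int]) -> int:
    """Bits sent in a state: the first x source segments plus the first y parity segments."""
    x, y = state
    return sum(source_bits[:x]) + sum(parity_bits[:y])


if __name__ == "__main__":
    # Hypothetical segment sizes in bits: [base, enhancement].
    source_bits, parity_bits = [100, 200], [30, 40]
    for state in brs_states(2):
        print(state, state_rate(state, source_bits, parity_bits))
```

For two layers the enumeration returns the six states (0,0), (1,0), (1,1), (2,0), (2,1), and (2,2), matching the six valid combinations described above.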

Figure 8. (a) All four segments of the precoded video and (b)~(g) available states for BRS: (b) state (0,0), (c) state (1,0), (d) state (1,1), (e) state (2,0), (f) state (2,1), and (g) state (2,2)

Each state of a frame has its R-D performance represented by a dot in the R-D map shown in Figure 9 (a) or (b), where $B$ represents the bandwidth constraint. The constellations of state R-D performances differ from frame to frame because of variations in the video source and the packet loss rate. If the bandwidth requirement $B$ of each frame is given, BRS performs mode decision by selecting, among the states whose rate does not exceed $B$, the state that gives the least distortion. For example, in Figure 9, state (1, 1) of Frame 1 and state (2, 0) of Frame 2 are chosen.

Figure 9. R-D maps of: (a) Frame 1, (b) Frame 2, and so on

BRS by Discrete R-D Combination

By allowing some delay in making the rate shaping decision, BRS can deliver the precoded video with better quality. By allowing delay, we mean accumulating the total bandwidth budget for a group of pictures (GOP) and allocating the bandwidth intelligently among the frames in the GOP. A video bitstream is typically coded with variable bit rate in order to maintain constant video quality. Therefore, we want to allocate different numbers of bits to different frames in a GOP to utilize the total bandwidth more efficiently.

Assume that there are $F$ frames in a GOP and the total bandwidth budget for these $F$ frames is $C$. Let $x(i)$ be the state (represented by a pair of integers as described in the last subsection) chosen for frame $i$, and let $D_{i,x(i)}$ and $R_{i,x(i)}$ be the resulting distortion and the rate allocated at frame $i$, respectively. The goal of the rate shaper is to:

minimize $\sum_{i=1}^{F} D_{i,x(i)}$  (2.14)

subject to $\sum_{i=1}^{F} R_{i,x(i)} \le C$  (2.15)

The discrete R-D combination algorithm [7][43] finds the solution by first eliminating, for each frame, the states that are inside the convex hull of states (Figure 10 (a) and (b)). The algorithm then allocates the rate step by step to the frame that can utilize the rate more efficiently. That is, between frame $m$ and frame $n$, if frame $m$ gives a better ratio of distortion decrease over rate increase than frame $n$ by moving from the current state $u(m)$ to the next state $u(m)+1$, then the rate is allocated to frame $m$ (the next state $u(m)+1$ of frame $m$ is circled in Figure 10 (c)) from the available total bandwidth budget. The allocation process continues until the total bandwidth budget has been consumed completely.

Figure 10. Discrete R-D combination algorithm: (a)(b) elimination of states inside the convex hull of each frame, and (c) allocation of rate to the frame $m$ that utilizes the rate more efficiently

To summarize, in this chapter we introduced the conventional rate shaping that is applied to pre source-coded video, briefly described the characteristics of packet-loss networks, and presented the baseline approach, BRS, of the proposed error-resilient rate shaping.
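As an illustration of the discrete R-D combination algorithm described in this chapter, the following Python sketch prunes each frame's states to the lower convex hull and then spends the GOP budget greedily on the steepest available step. It is a sketch under assumptions rather than the algorithm of [7][43] verbatim: each state is given as a hypothetical (rate, distortion) pair, every frame starts at its lowest-rate hull state, and the loop stops as soon as no further hull step fits in the remaining budget.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (rate, distortion) of one state


def lower_hull(points: List[Point]) -> List[Point]:
    """Keep the states on the lower convex hull, ordered by increasing rate."""
    hull: List[Point] = []
    for r, d in sorted(set(points)):
        if hull and d >= hull[-1][1]:
            continue  # costs more rate without lowering the distortion
        while len(hull) >= 2:
            (r1, d1), (r2, d2) = hull[-2], hull[-1]
            # Drop the middle point if it lies on or above the chord to (r, d).
            if (d2 - d1) * (r - r2) >= (d - d2) * (r2 - r1):
                hull.pop()
            else:
                break
        hull.append((r, d))
    return hull


def allocate(frames: List[List[Point]], budget: float) -> List[Point]:
    """Choose one hull state per frame by steepest-slope greedy allocation."""
    hulls = [lower_hull(f) for f in frames]
    idx = [0] * len(hulls)
    used = sum(h[0][0] for h in hulls)
    if used > budget:
        raise ValueError("budget is below the minimum rate of the GOP")
    while True:
        best, best_slope = None, 0.0
        for m, h in enumerate(hulls):
            if idx[m] + 1 < len(h):
                (r0, d0), (r1, d1) = h[idx[m]], h[idx[m] + 1]
                slope = (d0 - d1) / (r1 - r0)  # distortion decrease per unit rate
                if used + (r1 - r0) <= budget and slope > best_slope:
                    best, best_slope = m, slope
        if best is None:
            break  # no remaining step fits in the budget
        h = hulls[best]
        used += h[idx[best] + 1][0] - h[idx[best]][0]
        idx[best] += 1
    return [h[i] for h, i in zip(hulls, idx)]


frame1 = [(0, 100), (10, 40), (15, 70), (20, 30)]
frame2 = [(0, 80), (10, 60), (20, 20)]
print(allocate([frame1, frame2], budget=30))
```

Restricting each frame to hull states keeps the slopes of successive steps decreasing within a frame, which is what lets a greedy choice across frames work. The price is that states inside the hull are never selected, even when one of them would fit a particular budget better.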

3. Rate Shaping for Enhancement Layer Video

The baseline approach BRS of the proposed error-resilient rate shaping makes decisions at a coarse level: one out of six states is selected by BRS for streaming the precoded video. To incorporate a finer granularity, we propose fine-grained rate shaping (FGRS) for streaming the enhancement layer video and error-concealment aware rate shaping (ECARS) for streaming the base layer video. We discuss FGRS in this chapter and ECARS in the next chapter.

We adopt MPEG-4 fine granularity scalability (FGS) [32] for source coding, and erasure codes [46][61] for FEC coding. Unlike conventional scalability techniques such as SNR scalability, MPEG-4 FGS provides a video bitstream that is partially decodable over a wide range of bit rates. The more bits of the FGS bitstream are received, the better the video quality is. In addition, it is known that a partial FEC coded bitstream is still decodable within the error correction capability if erasure codes are used. Thus, FGS and erasure codes provide fine-granularity properties in video quality and in packet-loss protection, respectively.

Given the FEC coded FGS bitstream as the precoded video, fine-grained rate shaping (FGRS) is proposed for bandwidth adaptation considering the current packet-loss rate. There are conceptually infinitely many possible combinations of dropping a portion of the FGS bitstream and a portion of the FEC codes. FGRS seeks the optimal solution in the R-D sense. A new two-stage R-D optimization is proposed to select the part of the precoded video to drop. The proposed two-stage R-D optimization aims for both efficiency and optimality by using a model-based hyper-surface and hill-climbing based refinement. In Stage 1, a model-based hyper-surface is first trained with a set of rate and gain pairs. We then find the solution that lies in the intersection of the hyper-surface and the bandwidth constraint. In Stage 2, the solution from Stage 1, which is only near-optimal because the model can only approximate the true relationship between rate and gain, is refined with the hill-climbing based approach.
We can see that Stage 1 aims to find the optimal solution globally with the model-based hyper-surface, and Stage 2 refines the solution locally.

This chapter is organized as follows. We first introduce the system of FGRS. Background material on MPEG-4 FGS and Reed-Solomon codes is also covered. We then elaborate on the algorithms for FGRS, with the R-D problem formulation followed by the two-stage R-D optimization. Experiments are carried out to show the superior performance of the proposed FGRS over naïve unequal packet-loss protection methods. Finally, concluding remarks are given.

Rate Shaping for Enhancement Layer Video: Fine-Grained Rate Shaping (FGRS)

As mentioned, BRS performs the bandwidth adaptation for the precoded video by selecting the best state of each frame at the given packet-loss rate. Since the packet loss rate and the bandwidth at any given time can take any value over a wide range, we would like to extend the notion of BRS to allow for finer-grained decisions. This prompts the need for source and channel coding techniques that offer fine granularity in terms of video quality and packet loss protection, respectively.

Fine granularity scalability (FGS) has been proposed to provide bitstreams that are still decodable when truncated. That is, the FGS enhancement layer bitstream is decodable at any bit rate over a wide range of values. With such a property, FGS was adopted by MPEG-4 for streaming applications [32]. Through FGS encoding, two layers of bitstream are generated: one base layer and one enhancement layer (Figure 11). The base layer is predictive coded, while the enhancement layer only uses the corresponding base layer as the reference.

Figure 11. Dependency graph of the FGS base layer and enhancement layer. The base layer allows for temporal prediction with P and B frames. The enhancement layer is encoded with reference to the base layer only

On the other hand, it is also known that erasure codes provide fine-grained packet-loss protection as more and more symbols^2 are received at the FEC decoder [46][61]. The shaped erasure code is still decodable if the number of erasures/losses from the transmission is no more than $d_{min} - 1 - (\text{number of unsent symbols})$. An erasure code can successfully decode the message with up to $d_{min} - 1$ erasures, counting both the unsent symbols and the losses that take place in the transmission. Therefore, the more symbols are sent, the better the sent bitstream can cope with the losses. We use Reed-Solomon codes as the erasure codes. In Reed-Solomon codes, $d_{min} - 1$ equals $n - k$, where $k$ is the message size in symbols and $n$ is the code size in symbols. Thus, the partial code of size $r \le n$ is still decodable if the number of losses from the transmission is no more than $r - k$.

2 Symbols are used instead of bits since the Reed-Solomon codes used in the study use a symbol as the encoding/decoding unit. We use 14 bits to form one symbol. The selection of the symbol size in bits is up to the user.

After this background on MPEG-4 FGS and Reed-Solomon codes, let us introduce the system for streaming the precoded video. As with BRS, there are three stages for transmitting the video from the sender to the receiver: (i) precoding, (ii) streaming with rate shaping, and (iii) decoding, as shown in Figure 12 to Figure 14.

Figure 12. System diagram of the precoding process: FGS encoding followed by FEC encoding

Figure 13. Transport of the precoded bitstreams: (a) transport of the FEC coded FGS enhancement layer bitstream with the rate shaper via the wireless network, and (b) transport of the base layer bitstream via the reliable channel

Figure 14. System diagram of the decoding process: FEC decoding followed by FGS decoding

Through FGS encoding, two layers of bitstream are generated: one base layer and one enhancement layer (Figure 11). Hereafter we consider the bandwidth adaptation and packet loss resilience for the FGS enhancement layer bitstream only, assuming that the base layer bitstream is reliably transmitted as shown in Figure 13 (b).

Let us look at the FGS enhancement layer bitstream for a frame. The FGS enhancement layer bitstream consists of the bits of all the bit-planes for this frame. The most significant bit-plane (MSB plane) is coded before the less significant bit-planes, down to the least significant bit-plane (LSB plane). In addition, since the data in each bit-plane is variable length coded (VLC), if some part of the bit-plane is corrupted (due to packet losses), the remaining part of the bit-plane becomes undecodable. The importance of the bits of the enhancement layer therefore decreases from the beginning to the end.

Before appending the parity symbols to the FGS enhancement layer bitstream, we first divide all the symbols for this frame into several sublayers (Figure 15 (a)). The way to divide the symbols into sublayers is arbitrary, except that the later sublayers are longer than the previous ones, $k_1 < k_2 < \cdots < k_h$, since we want to achieve unequal packet loss protection (UPP). A natural way of division is to let Sublayer 1 consist of the symbols of the MSB plane, Sublayer 2 consist of the symbols of the MSB-1 plane, ..., and Sublayer $h$ consist of the symbols of the LSB plane. Each sublayer is then FEC encoded with erasure codes to the same length $n$ (Figure 15 (b)). The precoded video is stored and can be used at the time of delivery.

Figure 15. Precoded video: (a) FGS enhancement layer bitstream in sublayers, and (b) FEC coded FGS enhancement layer bitstream

At the transport stage, the FEC coded FGS bitstream is passed through FGRS for bandwidth adaptation under the current packet loss rate. Note again that FGRS is different from joint source-channel coding based approaches, which perform FEC encoding of the FGS bitstream at the time of delivery with a bit allocation scheme that achieves certain objectives, as proposed by van der Schaar and Radha [55] and Yang et al. [63]. Packetization is performed after error-resilient rate shaping.

Algorithms for FGRS

With the precoded video, bandwidth adaptation can be achieved by the methods shown in Figure 16. The dark bars in Figure 16 (a) and Figure 16 (d) are selected to be sent. Figure 16 (a) shows how to adapt the bandwidth by randomly dropping part of the precoded video (or, equivalently, randomly keeping part of the precoded video). Bandwidth adaptation can also be achieved by naïvely dropping the symbols in the order shown in Figure 16 (b). Given a certain bandwidth requirement for this frame, Sublayer 1 has more parity symbols kept than Sublayer 2, and so on. The bitstream shaped with such a naïve bandwidth adaptation scheme provides UPP to the sublayers. We refer to this method as UPPRS1 hereafter. In addition, bandwidth adaptation can be achieved by first dropping the symbols from the higher sublayers as shown in Figure 16 (c), which we refer to as UPPRS2 hereafter. However, none of the above methods is optimal. We propose FGRS (Figure 16 (d)) for bandwidth adaptation given the current network condition.

Figure 16. Bandwidth adaptation with (a) random dropping; (b) UPPRS1; (c) UPPRS2; and (d) FGRS

Let us start from the problem formulation and continue with the two-stage R-D optimization that solves the FGRS problem.

Problem Formulation

An FGS enhancement layer bitstream provides better and better video quality as more and more sublayers are correctly decoded. In other words, the total distortion decreases as more sublayers are correctly decoded. With Sublayer 1 correctly decoded, we reduce the total distortion by $G_1$ (the accumulated gain is $G_1$); with Sublayer 2 correctly decoded, we reduce the total distortion further by $G_2$ (the accumulated gain is $G_1 + G_2$); and so on. If Sublayer $i$ is corrupted, the following Sublayers $i+1$, $i+2$, etc., become undecodable. Note that $G_i$ of Sublayer $i$ can either (1) be calculated given the FGS bitstream, after performing partial decoding in order to get the values of gain; or (2) be embedded in the bitstream as meta-data. $G_i$ of Sublayer $i$ is different for every frame.

Since the precoded video is transmitted over error-prone wireless networks, sublayers are subject to loss and have certain recovery rates given a particular rate shaping decision. The expected accumulated gain is then:

$G = \sum_{i=1}^{h} G_i \prod_{j=1}^{i} v_j$  (3.1)

where $h$ is the number of sublayers of this frame, and $v_j$ is the recovery rate of Sublayer $j$, which is a function of $r_j$ as shown later. Sublayer $j$ is recoverable (or successfully decodable) if the number of erasures resulting from the lossy transmission is no more than $r_j - k_j$, where $k_j$ is the message size (the number of symbols from the FGS bitstream) in Sublayer $j$, and $r_j$ is the number of symbols selected to be sent in Sublayer $j$. With Reed-Solomon codes, $r_j \ge k_j$ with the exception of the last sublayer sent (not necessarily Sublayer $h$; it can be a sublayer before that), and the whole sublayer is considered lost if the number of erasures is beyond the error-correction capability $r_j - k_j$. The recovery rate $v_j$ is the sum of the probabilities that no erasure occurs, one erasure occurs, and so on until $r_j - k_j$ erasures occur:

$v_j = \sum_{l=0}^{r_j - k_j} p\{l \text{ erasures occur}\}$, $j = 1 \sim h$  (3.2)

If each erasure occurs as a Bernoulli trial with probability $e_m$, the probability of having $l$ erasures out of $r_j$ symbols is

$p\{l \text{ erasures occur}\} = \binom{r_j}{l} (e_m)^{l} (1 - e_m)^{r_j - l}$  (3.3)
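Equations (3.1) to (3.3) translate directly into a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the symbol loss rate $e_m$ is taken as an input, and a sublayer with $r_j < k_j$ is given recovery rate zero, which is what the empty sum in (3.2) yields.

```python
from math import comb
from typing import Sequence


def recovery_rate(r: int, k: int, e_m: float) -> float:
    """Equations (3.2)-(3.3): probability of at most r - k erasures among r symbols."""
    if r < k:
        return 0.0  # empty sum in (3.2): the sublayer cannot be decoded
    return sum(comb(r, l) * e_m ** l * (1.0 - e_m) ** (r - l)
               for l in range(r - k + 1))


def expected_gain(gains: Sequence[float], k: Sequence[int],
                  r: Sequence[int], e_m: float) -> float:
    """Equation (3.1): Sublayer i counts only if Sublayers 1..i are all recovered."""
    total, all_recovered = 0.0, 1.0
    for g_i, k_i, r_i in zip(gains, k, r):
        all_recovered *= recovery_rate(r_i, k_i, e_m)
        total += g_i * all_recovered
    return total


if __name__ == "__main__":
    # Hypothetical frame with two sublayers of four message symbols each.
    gains, k = [10.0, 5.0], [4, 4]
    for r in ([4, 4], [6, 4], [5, 5], [6, 6]):
        print(r, round(expected_gain(gains, k, r, e_m=0.1), 4))
```

Running the example shows how shifting symbols between sublayers changes the expected gain for the same total rate (compare `[6, 4]` with `[5, 5]`), which is the trade-off that the rate shaping decision has to resolve.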

The symbol loss rate can be derived from the packet loss rate as $e_m = 1 - (1 - e_p)^{m/s}$, where $s$ is the packet size and $m$ is the symbol size in bits.

If the erasures come from a finite-state Markov model, for example, a two-state Markov model with symbol loss rates $e_{m,1}$ and $e_{m,2}$, the probability of having $l$ erasures out of $r_j$ symbols is:

$p\{l \text{ erasures occur}\} = \sum_{l_1=0}^{l} \binom{r_{j,1}}{l_1} (e_{m,1})^{l_1} (1 - e_{m,1})^{r_{j,1} - l_1} \binom{r_{j,2}}{l_2} (e_{m,2})^{l_2} (1 - e_{m,2})^{r_{j,2} - l_2}$, with $l_2 = l - l_1$ and $r_{j,2} = r_j - r_{j,1}$  (3.4)

where $r_{j,1}$ is the number of symbols out of the $r_j$ symbols that originate from State 1 of the two-state Markov model, and $r_{j,2}$ is the number of symbols out of the $r_j$ symbols that originate from State 2 of the two-state Markov model. We can see that (3.4) is the convolution of two binomial distributions.

By choosing different combinations of the number of symbols for each sublayer, the expected accumulated gain will be different. The rate-shaping problem can be formulated as follows:

maximize $G = \sum_{i=1}^{h} G_i \prod_{j=1}^{i} v_j$  (3.5)

subject to $\sum_{i=1}^{h} r_i \le B$  (3.6)

where $B$ is the bandwidth constraint of this frame.

To solve this problem, we propose a new two-stage R-D optimization approach. The two-stage R-D optimization first finds the near-optimal solution globally. The near-optimal global solution is then refined by a hill-climbing approach. Prior work on R-D optimization includes [12][43][45][50]. The proposed two-stage R-D optimization differs from [12][43][45][50] in three ways. First, the model-based Stage 1 allows us to examine fewer samples from all the operational R-D states. Second, the proposed distortion measure (or expected accumulated gain, in the terminology of this thesis) accounts for the effects of packet loss as well as the channel codes by means of recovery rates. Finally, the proposed two-stage R-D optimization approach can avoid the potential problem that the solution is trapped in a local maximum or reaches the local maximum very slowly.

Packetization is performed after error-resilient rate shaping. That is, symbols are grouped into packets after the decision $\mathbf{r} = [r_1\ r_2\ \cdots\ r_h]$ has been made. Small packets are desirable to make use of the fine-grained decisions resulting from FGRS. For example, a big packet that contains all the symbols from a frame could be unrecoverable if it is dropped by the lower layers (for example, if the link layer detects a CRC error in this big packet).

Two-Stage R-D Optimization: Stage 1

We can see from (3.1) to (3.4) that the expected accumulated gain $G$ is related to $\mathbf{r} = [r_1\ r_2\ \cdots\ r_h]$ implicitly through the recovery rates $\mathbf{v} = [v_1\ v_2\ \cdots\ v_h]$. We can instead find a model-based hyper-surface that explicitly relates $\mathbf{r}$ and $G$. The model parameters can be trained from a set of training data $(\mathbf{r}, G)$, where the $\mathbf{r}$ values are chosen by the user and the $G$ values can be computed from (3.1) to (3.4). The optimal solution is the feasible solution within the intersection of the hyper-surface and the bandwidth constraint, as illustrated in Figure 17.

A complex model, with many parameters, can be used to describe the true distribution of the R-D states as closely as possible. The solution obtained from the intersection will then be as close to optimal as possible. However, the number of $(\mathbf{r}, G)$ pairs needed to train the model-based hyper-surface increases with the number of parameters.

Figure 17. Intersection of the model-based hyper-surface (dark surface) and the bandwidth constraint $r_1 + r_2 = B$ (gray plane), illustrated with $h = 2$

In the study, we use a quadratic equation to describe the relation between $\mathbf{r}$ and $G$ as follows:

$\hat{G} = \sum_{i=1}^{h} a_i r_i^2 + \sum_{i=1, j=1, i \ne j}^{h} b_{ij} r_i r_j + \sum_{i=1}^{h} c_i r_i + d$  (3.7)
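One way to obtain the parameters of a quadratic surface such as (3.7) from sample pairs $(\mathbf{r}, G)$ is an ordinary least-squares fit. The Python sketch below does this with the normal equations and a small Gaussian elimination, using only the standard library. It rests on several assumptions that are not part of the original formulation: the function names are hypothetical, the cross terms are parameterized once per unordered pair $i < j$ (the products $r_i r_j$ and $r_j r_i$ are identical, so separate coefficients for both could not be told apart by a fit), and the sample points are assumed to be varied enough for the system to be non-singular.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def features(r: Sequence[float]) -> List[float]:
    """One row of the design matrix: squares, cross products (i < j), linear terms, 1."""
    squares = [x * x for x in r]
    cross = [r[i] * r[j] for i, j in combinations(range(len(r)), 2)]
    return squares + cross + list(r) + [1.0]


def solve_linear(a: List[List[float]], y: List[float]) -> List[float]:
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_surface(samples: List[Tuple[Sequence[float], float]]) -> List[float]:
    """Least-squares parameters from the normal equations (R^T R) x = R^T G."""
    rows = [features(r) for r, _ in samples]
    g = [value for _, value in samples]
    n = len(rows[0])
    rtr = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    rtg = [sum(row[i] * gi for row, gi in zip(rows, g)) for i in range(n)]
    return solve_linear(rtr, rtg)


def predict(params: Sequence[float], r: Sequence[float]) -> float:
    """Evaluate the fitted quadratic surface at r."""
    return sum(p * f for p, f in zip(params, features(r)))


# Synthetic check with h = 2: samples drawn from a known quadratic surface.
true_params = [-1.0, -2.0, 0.5, 3.0, 4.0, 7.0]
grid = [(float(r1), float(r2)) for r1 in range(1, 5) for r2 in range(1, 5)]
samples = [(r, predict(true_params, r)) for r in grid]
fitted = fit_surface(samples)
print([round(p, 6) for p in fitted])
```

In practice the $G$ values of the samples would come from evaluating the expected accumulated gain at the chosen $\mathbf{r}$ values rather than from a known quadratic, so the fitted surface would only approximate them.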

To distinguish the hyper-surface model $\hat{G}$ from the real expected gain $G$, we denote the former with a hat. The model parameters $a_i$, $b_{ij}$, $c_i$, and $d$ are trained separately for each frame. They can be solved by surface fitting with a set of training data $(\mathbf{r}, G)$ obtained by (3.1)-(3.4). For example, the parameters can be computed by:

$\left[ a_i\text{'s} \;\; b_{ij}\text{'s} \;\; c_i\text{'s} \;\; d \right]^T = (R^T R)^{-1} R^T \left[ {}^{1}G \;\; {}^{2}G \;\; \cdots \;\; {}^{\Xi}G \right]^T$  (3.8)

where the left superscript of $G$ is the index of the training data, and $R$ is a matrix consisting of $\Xi$ rows of $(r_i^2\text{'s},\ r_i r_j\text{'s},\ r_i\text{'s},\ 1)$. Using (3.8), the complexity of computing the $a_i$'s, $b_{ij}$'s, $c_i$'s, and $d$ relates to the number of parameters, $h^2 + h + 1$, and the number of training data, $\Xi$. Note that the number of training data $\Xi$ is in general much greater than the number of parameters $h^2 + h + 1$. Thus, a more complex model, such as a third-order model with $h^3 + h^2 + h + 1$ parameters, is not suitable since it requires much more training data. In addition, a second-order Taylor expansion generally approximates a smooth function well, and (3.7) can be seen as a second-order approximation to (3.1). To reduce the computational complexity in practice, we can also choose a smaller $h$.

With (3.7), the near-optimal solution can be obtained by the use of a Lagrange multiplier as follows:

$J = \sum_{i=1}^{h} a_i r_i^2 + \sum_{i=1, j=1, i \ne j}^{h} b_{ij} r_i r_j + \sum_{i=1}^{h} c_i r_i + d + \lambda \left( \sum_{i=1}^{h} r_i - B \right)$  (3.9)

By setting $\partial J / \partial r_i = 0$, we get:

$r_i = -\frac{1}{2 a_i} \left( \sum_{j=1, j \ne i}^{h} b_{ij} r_j + c_i + \lambda \right)$  (3.10)

where $\lambda$ is:

$\lambda = -\dfrac{2B + \sum_{i=1}^{h} \frac{1}{a_i} \left( \sum_{j=1, j \ne i}^{h} b_{ij} r_j + c_i \right)}{\sum_{i=1}^{h} \frac{1}{a_i}}$  (3.11)

The near-optimal solution can be solved recursively using (3.10) and (3.11), starting from the initial condition that all sublayers are allocated an equal number of symbols, $r_1 = r_2 = \cdots = r_h = B/h$.

Two-Stage R-D Optimization: Stage 2

Stage 1 of the two-stage R-D optimization gives a near-optimal solution. The solution can be refined by a hill-climbing based approach (Figure 18). The solution from Stage 1 is perturbed in order to yield a larger expected accumulated gain. The process is iterated until the solution meets a stopping criterion such as convergence.

    While (stop == false)
        For (j = 1; j <= h; j++)
            z_k = r_k for all k = 1~h
            z_k = z_k + delta             for k == j    // Increase sublayer j
            z_k = z_k - delta/(h - 1)     for k != j    // Decrease others
            Evaluate G_j at z
        End-for
        Find the j* with the largest G_j*
        For (i = 1; i <= h; i++)
            r_i = r_i + delta             for i == j*
            r_i = r_i - delta/(h - 1)     for i != j*
        End-for
        Calculate the stop criterion
    End-while

Figure 18. Pseudo-code of the hill-climbing algorithm

The idea of allocating bandwidth optimally among sublayers can be extended to a higher level to allocate bandwidth efficiently among the frames in a GOP. The problem formulation differs slightly from the original (3.5)-(3.6), as follows:

maximize $G = \sum_{m=1}^{F} \sum_{i=1}^{h} G_{mi} \prod_{j=1}^{i} v_{mj}$  (3.12)

subject to

\[
\sum_{m=1}^{F} \sum_{i=1}^{h} r_{mi} \le C
\tag{3.13}
\]

where $F$ is the number of frames in a GOP. FGRS will incur a delay of $F$ frames if it allows for optimization among frames in a GOP.

To summarize, the proposed FGRS achieves the best streaming performance for the FEC coded FGS bitstream with the two-stage R-D optimization. The two-stage R-D optimization obtains the optimal solution by first finding the near-optimal solution globally, then refining the solution with the hill-climbing based approach.

3.3. Experiment

We will show in this section the effectiveness of FGRS in streaming the precoded video over packet-loss networks. Four methods (mentioned in Figure 16) will be compared side-by-side: random dropping (with legend "rand"), UPPRS1 (with legend "upprs1"), UPPRS2 (with legend "upprs2"), and FGRS (with legend "fgrs").

The test video sequences are "akiyo", "foreman", and "stefan" in common intermediate format (CIF) (Figure 19 (a)-(c)). Sequence "akiyo" represents a video sequence with lower bit rate due to simpler texture and less motion. Sequence "foreman" represents a video sequence with medium bit rate with regular texture and motion. Sequence "stefan" represents a video sequence with higher bit rate with complex texture and faster motion. The frame rate of MPEG-4 FGS coding is three frames/sec. The source-coding rates of the FGS enhancement layer bitstream of the three sequences are [...] kbits/sec, [...] kbits/sec, and [...] kbits/sec. The FEC coded bitstreams (before being shaped) of these three sequences have rates [...] kbits/sec, [...] kbits/sec, and [...] kbits/sec, respectively.

Figure 19. Test video sequences in CIF: (a) akiyo, (b) foreman, and (c) stefan

The bandwidth of the simulated networks fluctuates between 200 kbits/sec and 1100 kbits/sec. The bit error rate (BER) of the channel also fluctuates according to the two-state Markov chain model detailed in Appendix B. The wireless channel simulation parameters can be found in B.2. Some of the BER traces are shown in Figure 20. Under the same network condition (the same BER trace and the same bandwidth trace), the results shown in the following are tested with 10 different seeds for the pseudo-random simulations. That is, the overall PSNR result shown is the average of 10 different tests. The frame-by-frame PSNR result is one instance of the 10 tests.

Figure 20. Sample BER traces of the wireless channel: (a) mobile unit at 2 km/h; (b) mobile unit at 6 km/h; (c) mobile unit at 10 km/h

Given the gain embedded in the bitstream, FGRS consumes on average <0.01% of the original precoded video (the denominator is the bit rate of the source-coded bitstream) to carry the sublayer gain information ("meta-data"). The performance improvement of FGRS in PSNR over non-rate-shaping based methods is on average 8 dB. On the other hand, if the gain is not embedded in the bitstream for rate shaping, no extra bits are needed to carry the sublayer gain information, but partial decoding is required to obtain it.

In the following experiment results, we first show an example of how each method allocates the rates among sublayers (Figure 21). We then show the performance in terms of the overall PSNR of different sequences (from Figure 22 to Figure 24), and the performance in terms of the overall PSNR at various wireless channel conditions (from Figure 25 to Figure 27). Finally, we show the performance in terms of the frame-by-frame PSNR for sequence "foreman" (from Figure 28 to Figure 30).

Figure 21 shows that with the bandwidth constraint specified, Method "rand" allocates the rates randomly among the nine sublayers; Method "upprs1" allocates the rates equally among the nine sublayers; Method "upprs2" allocates all the rates to the first sublayer; and Method "fgrs" allocates the rates smartly among the nine sublayers (some sublayers are not allocated any rate at all). The bit allocation of FGRS happens automatically through the proposed two-stage R-D optimization, which considers the current network condition.

Figure 21. Sublayer bit allocations of all methods at 10 km/h and SNR=20 dB for Sequence "foreman"

From Figure 22 to Figure 24, the performance in terms of the overall PSNR of the Y, U, and V components of different sequences is shown. We can see that for each sequence, for all Y, U, and V components, "fgrs" performs the best among all four methods. Given the same network condition, Sequence "akiyo" has higher PSNR than "foreman", and Sequence "foreman" has higher PSNR than "stefan". A sequence with more complex texture and faster motion, such as "stefan", gives a smaller PSNR value given the same bandwidth budget. Results are consistent for the Y, U, and V components.

Figure 22. Performance (PSNR of the Y component) of all methods at 10 km/h and SNR=20 dB for Sequences "akiyo", "foreman", and "stefan"

Figure 23. Performance (PSNR of the U component) of all methods at 10 km/h and SNR=20 dB for Sequences "akiyo", "foreman", and "stefan"

Figure 24. Performance (PSNR of the V component) of all methods at 10 km/h and SNR=20 dB for Sequences "akiyo", "foreman", and "stefan"

The performance in terms of the overall PSNR of the Y, U, and V components at various wireless channel conditions is shown from Figure 25 to Figure 27. Figure 25 (a), Figure 26 (a), and Figure 27 (a) show the 3-D plots of the overall PSNR. Figure 25 (b), Figure 26 (b), and Figure 27 (b) show the top views (seen from the top of the z-axis) of the 3-D plots. The color shown in the top view represents the color of the method that outperforms the others. At all wireless channel conditions, "fgrs" outperforms all other methods.

Figure 25 (c), Figure 26 (c), and Figure 27 (c) show the overall PSNR at various speeds at SNR = 10 dB. A fixed SNR value gives the same bit error rate (BER) of the wireless channel. The higher the speed is, the more bursty the bit errors of the wireless channel are; in other words, the larger the transition probability is. From the results, we see that the PSNR drops as the speed increases. This matches what we mentioned in Section 2.2: the higher the transition probability is, the higher the packet-loss rate is, given the same bit error rate. A higher packet-loss rate requires more parity bits in the shaped bitstream and raises the probability of corrupting the packets that carry the shaped bitstream; thus, the PSNR value is lower.

Figure 25 (d), Figure 26 (d), and Figure 27 (d) show the overall PSNR at various SNR at speed = 10 km/h. A fixed speed gives the same burstiness of the bit errors of the wireless channel. The larger the SNR is, the smaller the BER is. We see from the results that the PSNR value increases with SNR. Also from Section 2.2, we know that the smaller the BER is, the smaller the packet-loss rate is, given the same burstiness. A smaller packet-loss rate then leads to a higher PSNR. Results are consistent for the Y, U, and V components in all the figures shown from Figure 25 to Figure 27.

Figure 25. Performance (PSNR of the Y component) of all methods at various wireless channel conditions for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 26. Performance (PSNR of the U component) of all methods at various wireless channel conditions for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 27. Performance (PSNR of the V component) of all methods at various wireless channel conditions for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Finally, we show the performance in terms of the frame-by-frame PSNR of the Y, U, and V components for sequence "foreman" (from Figure 28 to Figure 30). We see that "fgrs" performs the best among all. Sample frames of Methods "upprs1" and "fgrs" are also shown in Figure 31 to demonstrate visually the merit of fine-grained rate shaping.

Figure 28. Frame-by-frame PSNR of the Y component of all methods at 10 km/h and SNR=20 dB for Sequence "foreman"

Figure 29. Frame-by-frame PSNR of the U component of all methods at 10 km/h and SNR=20 dB for Sequence "foreman"

Figure 30. Frame-by-frame PSNR of the V component of all methods at 10 km/h and SNR=20 dB for Sequence "foreman"

Figure 31. A sample frame of (a) upprs1 and (b) fgrs at 10 km/h and SNR=20 dB for Sequence "stefan"

3.4. Conclusion

To incorporate finer scalability into error-resilient rate shaping, we proposed fine-grained rate shaping (FGRS) for streaming the enhancement layer video, given that the base layer video is reliably transmitted. FGRS uses the proposed two-stage R-D optimization approach to adapt the rates of the FEC coded FGS enhancement layer bitstream, given the network condition. The two-stage R-D optimization first obtains the near-optimal solution that sits at the intersection of the model-based hyper-surface and the bandwidth constraint. The near-optimal solution is then refined by a hill-climbing based approach. The two-stage R-D optimization aims for both efficiency and optimality. The proposed FGRS outperforms the other naïve methods.
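The hill-climbing refinement summarized above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `hill_climb`, the callable `gain` (standing in for the expected accumulated gain), the toy objective, the rule that skips perturbations producing a negative allocation, and the choice to stop when no perturbation improves the gain are all assumptions made for the example.

```python
from typing import Callable, List


def hill_climb(r: List[float],
               gain: Callable[[List[float]], float],
               delta: float = 1.0,
               max_iter: int = 1000) -> List[float]:
    """Refine an allocation r (symbols per sublayer) without changing its sum.

    Each trial raises one sublayer by delta and lowers every other sublayer
    by delta / (h - 1), so the bandwidth budget is preserved.
    """
    h = len(r)
    if h < 2:
        return list(r)
    r = list(r)
    best = gain(r)
    for _ in range(max_iter):
        best_trial, best_trial_gain = None, best
        for j in range(h):
            # Perturb a fresh copy of the current solution for each candidate j.
            z = [x + delta if k == j else x - delta / (h - 1)
                 for k, x in enumerate(r)]
            if min(z) < 0:          # assumption: allocations stay non-negative
                continue
            g = gain(z)
            if g > best_trial_gain:
                best_trial, best_trial_gain = z, g
        if best_trial is None:      # assumed stop criterion: no improving move
            break
        r, best = best_trial, best_trial_gain
    return r


# Toy concave objective (hypothetical), standing in for the expected gain.
weights = [9.0, 4.0, 1.0]


def toy_gain(alloc: List[float]) -> float:
    return sum(w * x ** 0.5 for w, x in zip(weights, alloc))


start = [10.0, 10.0, 10.0]          # equal allocation, as in the Stage 1 start
refined = hill_climb(start, toy_gain, delta=0.5)
```

In practice the Stage 1 solution, not an equal split, would be passed in as the starting point, and `gain` would evaluate the true expected accumulated gain for the current channel condition.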

4. Rate Shaping for Base Layer Video

To make finer-grained decisions than the coarse decisions made by BRS, we propose error-concealment aware rate shaping (ECARS) for streaming the base layer video, in addition to FGRS for streaming the enhancement layer video illustrated in the last chapter. Taking into account that the receiver may perform error concealment (EC) if any video data is lost during transmission, ECARS makes rate shaping decisions accordingly. Related work that utilizes EC information for rate shaping of precoded bitstreams that are source-coded only can be found in [66].

Frame dependency is usually inherent in the video bitstream due to predictive coding. In addition, temporal EC might introduce extra frame dependency. Feedback from the receiver to the sender might be helpful in addressing the frame dependency problem in rate shaping. We therefore introduce two types of ECARS algorithms: without feedback and with feedback. Both evaluate the gains of sending some parts of the precoded video as opposed to not sending them. The gain metrics are then included in the R-D optimization formulation. Finally, the two-stage R-D optimization approach is adopted to solve the R-D optimization problem.

In the case of no feedback, ECARS evaluates the gains considering a particular EC method used at the receiver. In order to incorporate the frame dependency into the rate shaping process, we propose to send the location (and mean) of the corrupted macroblock back to the sender, and to use such feedback information to determine the gains used in the R-D optimized ECARS.

This chapter is organized as follows. We first introduce the system of ECARS. Background material on EC methods and timely feedback follows. We then elaborate on the algorithms for both types of ECARS: without feedback and with feedback. Experiments are carried out to show the performance of the proposed ECARS. Finally, concluding remarks are given.

Rate Shaping for Base Layer Video: Error Concealment Aware Rate Shaping (ECARS)

There are three stages for transmitting the video from the sender to the receiver: (i) precoding, (ii) streaming with rate shaping, and (iii) decoding, as shown from Figure 32 to Figure 34.

Figure 32. System diagram of the precoding process: source encoding (which can be EC aware) followed by FEC encoding

Figure 33. Transport of the precoded video with ECARS

Figure 34. System diagram of the decoding process: FEC decoding followed by source decoding

In the precoding process (shown in Figure 32), video is encoded by both the source encoder and the FEC encoder. The precoding process is done before the time of delivery. The precoding process may be aware of the EC method used at the receiver, which we will describe later. In the streaming stage (shown in Figure 33), ECARS takes the network conditions, such as bandwidth and packet-loss rate, and possibly the feedback from the receiver, into account. The decoding process (shown in Figure 34) consists of FEC decoding followed by scalable decoding.

Background for ECARS

We will briefly describe error concealment (EC) methods, EC aware precoding, and timely feedback in this section.

Error Concealment

EC relies on some a priori knowledge to reconstruct the lost video content. Such a prior can come from spatial or temporal neighbors. For example, we can assume that the pixel values are smooth across the boundary of the lost and retained regions. To recover lost data with the smoothness assumption, interpolation or optimization based on certain objective functions are

often used. Figure 35 and Figure 36 show corrupted frames and the corresponding reconstructed frames. The black regions in Figure 35 (a) and Figure 36 (a) indicate losses of the video data. Figure 35 shows an EC method using spatial interpolation from the neighboring pixels. Figure 36 shows an EC method using temporal interpolation. That is, if some pixel values are lost, the decoder copies the pixel values from the previous frame at the corresponding locations to the current frame. The EC method using temporal interpolation can be extended to copying the pixel values from the previous frame at the motion-compensated locations. The motion vectors used for motion compensation are either assumed error-free or estimated at the decoder [3][31]. We use the simple temporal interpolation method in this study. Future extensions include using motion-compensated temporal interpolation, or more sophisticated EC methods as mentioned in [8][9].

Figure 35. EC example by spatial interpolation: (a) the corrupted frame without EC, and (b) the reconstructed frame with EC

Figure 36. EC example by temporal interpolation: (a) the corrupted frame without EC, and (b) the reconstructed frame with EC

Error Concealment Aware Precoding

In addition to ECARS, the precoding process can be EC aware to prioritize the precoded video based on the gain. We present an example EC aware precoding process by means of macroblock

(MB) prioritization. An MB in a frame is ranked according to its gain, which depends on how well this MB can be reconstructed by the EC method used at the receiver. The gain of sending an MB is large if the EC method used at the receiver cannot reconstruct this MB very well.

Let us consider that a simple temporal interpolation based EC method is adopted. Figure 37 provides an illustration of EC aware MB prioritization. If MB (1,1) is lost in Frame $n$, it cannot be well reconstructed by MB (1,1) from Frame $n-1$. On the other hand, if MB (0,3) is lost in Frame $n$, it can be well reconstructed by MB (0,3) from Frame $n-1$. Therefore, we should rank MB (1,1) with higher priority than MB (0,3).

We can use the sum of squared pixel differences between the original MB and the EC-reconstructed MB as the measure for priority. The larger the sum is, the larger the gain for this MB is, and thus the higher the priority of this MB is. Assuming that the neighboring MBs of the MB considered are decoded without errors, the MB gain $g_j$ is defined as follows:

\[
g_j = \sum_{u=0}^{255} \left( c_{ju} - p_{ju} - s_{ju} \right)^2, \quad j = 1 \sim \text{number of MBs in a frame}
\tag{4.1}
\]

where $u$ (see footnote 3) is the coefficient index in an MB, $c_{ju}$ is the coefficient of the EC-reconstructed MB, $p_{ju}$ is the prediction value of this MB, and $s_{ju}$ is the residue value of this MB. $p_{ju} + s_{ju}$ is the ideal value without any transmission error or rate adaptation by rate shaping. $c_{ju} - (p_{ju} + s_{ju})$ measures how far the EC value is from the ideal value. The assumption that the neighboring MBs are decoded without errors is valid if the packet losses are not too bursty.

Footnote 3: We consider only the Y components in the MB without loss of generality. Thus, there are four 8×8 blocks, or 256 coefficients, inside.
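As a small illustration of (4.1), the MB gain is the sum of squared differences between the EC-reconstructed values and the ideal values (prediction plus residue). The sketch below uses plain Python lists; the function name `mb_gain` and the sample values are hypothetical.

```python
from typing import Sequence


def mb_gain(ec_recon: Sequence[float],
            prediction: Sequence[float],
            residue: Sequence[float]) -> float:
    """MB gain per (4.1): sum over u of (c_ju - p_ju - s_ju)^2.

    ec_recon   -- c_ju, coefficients of the EC-reconstructed MB
    prediction -- p_ju, prediction values of the MB
    residue    -- s_ju, residue values of the MB
    """
    if not (len(ec_recon) == len(prediction) == len(residue)):
        raise ValueError("all three inputs must have the same length")
    return sum((c - (p + s)) ** 2
               for c, p, s in zip(ec_recon, prediction, residue))


# Hypothetical MB with 256 Y coefficients: EC is off by 1 everywhere.
poorly_concealed = mb_gain([3.0] * 256, [1.0] * 256, [1.0] * 256)
# Hypothetical MB that EC reconstructs perfectly.
well_concealed = mb_gain([2.0] * 256, [1.0] * 256, [1.0] * 256)
```

An MB that EC reconstructs perfectly has zero gain, so sending it buys nothing; the larger the mismatch, the more the MB is worth sending.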

Figure 37. (a) Frame $n-1$, (b) Frame $n$, and (c) MB indices. EC aware MB prioritization: MB (1,1) has higher priority than MB (0,3)

An observation to make is that conventional video coding can be considered a special case of the proposed EC aware MB prioritization. Let us consider the case where no motion vector is used in video coding. An MB with large residues is encoded and transmitted, while an MB with small residues does not need to be transmitted since the small residues will become zero after quantization. This case translates to the case of EC aware MB prioritization using temporal interpolation with zero motion vectors. Let us consider another case where motion vectors are included in video coding. This then translates to the case of EC aware MB prioritization using temporal interpolation with motion vectors. We can see that the proposed EC aware MB prioritization is more general since it is not limited to any specific error concealment method.

The source-coded bitstream with EC aware MB prioritization can be appended with parity bits from the FEC coding. First, the bits of the highest priority MB are placed, followed by the bits of the second highest priority MB, and so on, as shown in Figure 38 (a). To label the MBs after they are ordered by their priorities, 446 bytes of complementary information for the MB labels are needed if the video is in common intermediate format (CIF). The bits are then divided into sublayers as shown in Figure 38 (b). Sublayer $i+1$ has more bits than Sublayer $i$ since we want to achieve UPP for the sublayers when they are appended with the parity bits. For example, we can let Sublayer 1 consist of bits from the first 10 highest priority MBs, Sublayer 2 consist of bits from the following 20 highest priority MBs, and so on. Each sublayer is then appended with parity bits from the FEC coding as shown in Figure 38 (c).
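The ordering and grouping described above can be sketched as below. The text only fixes the first two sublayer sizes (10 and 20 MBs); growing each subsequent sublayer by a further 10 MBs is an assumption for illustration, as are the function names. The per-sublayer sum corresponds to the sublayer gain the chapter defines in (4.2).

```python
from typing import List, Sequence


def prioritize_and_group(mb_gains: Sequence[float],
                         first_size: int = 10,
                         step: int = 10) -> List[List[int]]:
    """Order MB indices by decreasing gain and split them into sublayers.

    Sublayer sizes are first_size, first_size + step, ... (assumed growth
    rule); the last sublayer takes whatever MBs remain.
    """
    order = sorted(range(len(mb_gains)),
                   key=lambda j: mb_gains[j], reverse=True)
    sublayers, start, size = [], 0, first_size
    while start < len(order):
        sublayers.append(order[start:start + size])
        start += size
        size += step
    return sublayers


def sublayer_gains(mb_gains: Sequence[float],
                   sublayers: Sequence[Sequence[int]]) -> List[float]:
    """Sublayer gain: sum of the gains of the MBs in each sublayer."""
    return [sum(mb_gains[j] for j in layer) for layer in sublayers]


# A CIF frame (352x288) has 22 x 18 = 396 macroblocks; gains are made up.
gains = [float(j) for j in range(396)]
layers = prioritize_and_group(gains)
layer_gains = sublayer_gains(gains, layers)
```

With 396 MBs and this growth rule the frame splits into nine sublayers, the first holding the ten MBs that EC reconstructs worst.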

Figure 38. Precoded video: (a) MB prioritized bitstream, (b) MB prioritized bitstream in sublayers, and (c) FEC coded MB prioritized bitstream

Also, with the MB gain defined, we can define the sublayer gain correspondingly as:

\[
G_i = \sum_{j \,\in\, \{\text{indices of MBs that belong to Sublayer } i\}} g_j, \quad i = 1 \sim \text{number of sublayers in a frame}
\tag{4.2}
\]

Note again that ECARS can perform rate adaptation with or without EC aware precoding, as long as the precoded video is provided with sublayer gains. The sublayer gain will be used later in the R-D optimized ECARS.

Timely Feedback

In the system of ECARS with feedback, feedback information from the receiver is utilized. Feedback can be carried by RTCP messages [44]. To understand whether the feedback information is timely, i.e., arrives within one frame interval (see footnote 4), let us examine the one-way transmission time [25].

Footnote 4: The ITU-T recommends the following limits for one-way transmission time according to ITU-T Recommendation G.131 [26]. However, the limit in the rate shaping system with feedback is a frame interval, since the rate shaping decision is made on each frame interval.
0 to 150 ms: Acceptable for most user applications.
150 to 400 ms: Acceptable provided that administrations are aware of the transmission time impact on the transmission quality of user applications.
Above 400 ms: Unacceptable for general network planning purposes; however, it is recognized that in some exceptional cases this limit will be exceeded.

The transmission time is the aggregate of several components, e.g., group delay in cables and equipment processing times. In addition, the one-way transmission time of the national extension circuits and the international circuits must be taken into account. The transmission time for the national extension circuits can be estimated as follows:

a) In purely analogue networks, the transmission time will probably not exceed:

\[
12 + (0.004 \times \text{distance in kilometers}) \ \text{ms}
\tag{4.3}
\]

Here the factor 0.004 is based on the assumption that national trunk circuits will be routed over high-velocity plant (250 km/ms). The 12 ms constant term makes allowance for terminal equipment and for the probable presence in the national network of a certain quantity of loaded cables (e.g., three pairs of channel translating equipments plus about 160 km of H 88/36 loaded cables). For an average-size country, the one-way propagation time will be less than 18 ms.

b) In mixed analogue/digital networks, the transmission time can generally be estimated by the equation given for purely analogue networks. However, under certain unfavorable conditions, increased delay may occur compared with the purely analogue case. This occurs in particular when digital exchanges are connected with analogue transmission systems through PCM/FDM equipment in tandem, or through trans-multiplexers. With the growing degree of digitalization, the transmission time will gradually approach the condition of purely digital networks.

c) In purely digital networks between local exchanges, based on optical fiber systems (e.g., an IDN), the transmission time will probably not exceed:

\[
3 + (0.005 \times \text{distance in kilometers}) \ \text{ms}
\tag{4.4}
\]

The 3 ms constant term makes allowance for one pair of PCM coder and decoder and for five digitally switched exchanges. The value 0.005 is a mean value for optical fiber systems; for coaxial cable systems and radio-relay systems 0.004 is to be used.

d) In purely digital networks between subscribers (e.g., an ISDN), the delay of c) above has to be increased by up to 3.6 ms if burst-mode (time compression multiplexing) transmission is used on 2-W local subscriber lines.
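As a quick check of (4.3) and (4.4) against the one-frame-interval requirement, the bounds can be evaluated for a given distance. The function names and the 1500 km example distance below are illustrative assumptions; only the two formulas come from the text.

```python
def analogue_national_ms(distance_km: float) -> float:
    """Upper estimate (4.3) for purely analogue national circuits, in ms."""
    return 12.0 + 0.004 * distance_km


def digital_national_ms(distance_km: float) -> float:
    """Upper estimate (4.4) for purely digital, optical-fiber based circuits, in ms."""
    return 3.0 + 0.005 * distance_km


def feedback_is_timely(one_way_ms: float, frame_rate: float = 30.0) -> bool:
    """True if the one-way time fits within one frame interval."""
    return one_way_ms < 1000.0 / frame_rate


distance_km = 1500.0                                # hypothetical hop length
analogue_ms = analogue_national_ms(distance_km)     # 12 + 6 ms
digital_ms = digital_national_ms(distance_km)       # 3 + 7.5 ms
```

Both estimates stay well below the 33 ms frame interval at 30 frames/sec for this distance, which is the sense in which feedback is called timely here.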
The transmission time for the international circuits can use the values of Table 1 below.

Table 1. One-way transmission time

Transmission or processing system | Contribution to one-way transmission time | Remarks
Terrestrial coaxial cable or radio-relay system: FDM and digital transmission | 4 µs/km | Allows for delay in repeaters and regenerators
Optical fiber cable system, digital transmission | 5 µs/km (Note 1) | Allows for delay in repeaters and regenerators
Submarine coaxial cable system | 6 µs/km | Allows for delay in repeaters and regenerators
Submarine optical fiber system: transmit terminal | 13 ms | Worst case
Submarine optical fiber system: receive terminal | 10 ms | Worst case
Satellite system: 400 km altitude | 12 ms | Propagation through space only (between earth stations)
Satellite system: [...] km altitude | 110 ms | Propagation through space only (between earth stations)
Satellite system: [...] km altitude | 260 ms | Propagation through space only (between earth stations)
FDM channel modulator or demodulator | 0.75 ms (Note 2) | Half the sum of transmission times in both directions of transmission
PLMS (Public Land Mobile System) | [...] ms, objective 40 ms |
H.260-series video coders and decoders | Further study (Note 3) |
DCME per pair: for speech, VBD, and non-remodulated fax | 30 ms |
DCME per pair: for speech, VBD, and non-remodulated fax | 30 ms |
DCME (in conjunction with ITU-T Rec. G.763 or ITU-T Rec. G.767) per pair: for remodulated fax | 200 ms |
PCME per pair: with speech and non-remodulated VBD | 35 ms |
PCME per pair: with remodulated VBD | 70 ms |
Transmultiplexer | 1.5 ms (Note 4) |
Digital transit exchange, digital-digital | 0.45 ms (Note 5) |
Digital local exchange, analogue-analogue | 1.5 ms (Note 5) |
Digital local exchange, analogue subscriber line-digital junction | [...] ms (Note 5) |
Digital local exchange, digital subscriber line-digital junction | [...] ms (Note 5) |
Echo cancellers | 0.5 ms (Note 6) |
ATM (CBR using AAL1) | 6.0 ms (Note 7) |

NOTE 1: This value is provisional and is under study.
NOTE 2: These values allow for group-delay distortion around frequencies of peak speech energy and for delay of intermediate higher order multiplex and through-connecting equipment.
NOTE 3: Further study required. Delay for these devices is usually non-constant, and the range varies by implementation. Current implementations are on the order of several hundred milliseconds, and considerable delay is added to audio channels to achieve lip-synchronization. Manufacturers are encouraged to reduce their contribution to transmission time, in accordance with this ITU-T Recommendation.
NOTE 4: For satellite digital communications where the transmultiplexer is located at the earth station, this value may be increased to 3.3 ms.
NOTE 5: These are mean values: depending on traffic loading, higher values can be encountered, e.g., [...] ms (1.950 ms, [...] ms, or [...] ms) with 0.95 probability of not exceeding.
NOTE 6: This is averaged for both directions of transmission.
NOTE 7: This is the cell formation delay of a 64 kbit/s stream when it completely fills the cell (one voice channel per VC). In practical applications, additional delay will result, e.g., from cell loss detection and buffering. Other delays may be applicable to other AALs and cell mapping arrangements, and are for further study.

Since rate shaping is performed at each link, the transmission time from one hop to the other is considered. We can see from (4.3), (4.4), and Table 1 that the feedback in general takes less than 33 ms (one frame interval, assuming the video frame rate is 30 frames/sec) to get back to the sender.

Algorithms for ECARS

To explain the algorithms for ECARS, let us start from a simple example as an extension to BRS. Let us consider that the precoded video consists of two layers of video bitstream, namely, the base layer and the enhancement layer. Each layer is protected by some parity bits from the FEC coding. The setting is shown earlier in Figure 8 (a). The rate shaper is extended to make a finer decision on how many symbols to send (or how many symbols to drop) for each layer, instead of deciding which segment(s) to drop. Since the rate shaper is aware of the EC method used at the receiver, it can evaluate how much distortion will result if it decides to send a certain number of symbols for each layer. In other words, the rate shaper can evaluate how much gain it will get if it decides to send this number of symbols for each layer. In general, the base layer can be reconstructed well with EC since the base layer consists of coarse information of the video that can be easily reconstructed. On the other hand, the enhancement layer, which consists of fine details of the video, cannot be easily reconstructed. The EC aware rate shaper may assign a higher gain to sending symbols in the enhancement layer than to sending symbols in the base layer. Notice that the example given here is just for understanding. We are not going to prioritize the bitstream in terms of base and enhancement layers. Instead, all the discussions

hereafter will occur in the base layer, as said at the very beginning of the chapter. The prioritization takes place when we order the macroblocks (MB) by their MB gains.

ECARS without Feedback

Suppose ECARS is given the precoded video with sublayers. Each sublayer consists of symbols from source coding, shown as the upper portion of each stripe in Figure 39 (a), and symbols from channel coding, shown as the lower portion of each stripe in Figure 39 (a). The darkened bars in Figure 39 (b) represent the symbols to be sent by ECARS.

Figure 39. (a) Precoded video in sublayers and (b) ECARS decision on which symbols to send

The problem formulation for ECARS is as follows. The total gain is increased (or the total distortion is decreased) as more sublayers are correctly decoded. With Sublayer 1 correctly decoded, the total gain is increased by $G_1$ (the accumulated gain is $G_1$); with Sublayer 2 correctly decoded, the total gain is increased further by $G_2$ (the accumulated gain is $G_1 + G_2$); and so on. Note that $G_i$ of Sublayer $i$, which is different for every frame, can either (1) be calculated given the bitstream and the EC method used by the receiver, after performing partial decoding in order to get the values of the gain; or (2) be embedded in the bitstream as meta-data. ECARS is EC aware because the gain depends on the EC method used by the receiver. The expected accumulated gain is then:

\[
G = \sum_{i=1}^{h} G_i v_i
\tag{4.5}
\]

where $h$ is the number of sublayers of this frame, and $v_i$ is the recovery rate of Sublayer $i$. The definition of $v_i$ and the rest of the R-D optimization follow what is stated and proposed in Section 3.2, except for the change in the definition of the expected accumulated gain shown above in (4.5).

ECARS with Feedback

As we have discussed, the goal of rate shaping is to achieve the maximal expected accumulated gain. We should consider the frame dependency in the process of R-D optimization, since the reconstructed result of the previous frame will affect the following frames if the video is predictively coded and/or the EC method performed at the receiver utilizes temporal information. We propose to use feedback from the receiver to carry information about the previous reconstructed frame for use in rate shaping the current frame.

Notice that in our setup, the forward and feedback transmissions share the same bandwidth. The use of feedback in ECARS does not imply an increase in channel capacity; the total bandwidth stays the same. Feedback informs the rate shaper so that it transmits more useful data (with a larger gain) rather than less useful data (with a smaller gain).

If the EC method used at the receiver is precisely known by the rate shaper, then with the location of the corrupted macroblock being sent back, the rate shaper can imitate what the decoder gets. Knowing what the decoder gets, the rate shaper can calculate the MB gain for R-D optimization. In the later experiments, we will use "ecars-ideal" to denote the case where the location of the corrupted macroblock is sent back and the EC method used at the receiver is known.

If the EC method used at the receiver is not precisely known by the rate shaper, we can try to approximate what the gain should be by sending back (i) the location of the corrupted macroblock; or (ii) the mean of the corrupted macroblock in addition to its location. In the later experiments, we will use "ecars-nf" to denote that no feedback is used and the gain information is embedded in the bitstream, "ecars-loc" to denote that feedback with the location of the corrupted macroblock is used, and "ecars-mean" to denote that feedback with the location and mean of the corrupted macroblock is used. Note that none of the

rate shaping methods "ecars-nf", "ecars-loc", and "ecars-mean" knows the precise EC method at the sender.

For "ecars-loc", we will explain how to determine the gain $g'_j$ of each MB with respect to three cases where frame dependency can occur. The three cases are denoted as (0,1), (1,2), and (1,1). A "1" in the first field represents Inter-coding, and a "0" in the first field represents Intra-coding. A "1" in the second field represents an EC method that utilizes temporal information, and a "2" in the second field represents an EC method that utilizes only spatial information from the neighbors. Likewise, we will explain how to determine the gain with respect to (0,1), (1,2), and (1,1) for "ecars-mean". In general, the MB gain $g'_j$ of each MB remains the same as $g_j$ if the corresponding MB of the previous frame is successfully decoded. We then see how to determine the MB gain $g'_j$ of each MB if the corresponding MB of the previous frame is corrupted.

ECARS with Feedback: ECARS-LOC

(0,1): If an MB of the previous frame is corrupted, we want to increase the MB gain of the corresponding MB of the current frame. Temporal EC will use information from the corrupted MB of the previous frame if the corresponding MB of the current frame cannot be decoded successfully. Thus, we want to make sure that the MB of the current frame is sent with good protection. A natural way is to double the value of the MB gain $g_j$ as:

\[
g'_j = 2 g_j, \quad j = 1 \sim \text{number of MBs in a frame}
\tag{4.6}
\]

(1,2) and (1,1): If an MB of the previous frame is corrupted, we want to decrease the MB gain of the corresponding MB of the current frame. We do not want to use the corrupted MB of the previous frame for predictive coding. Sending the residues of the MB of the current frame is useless if the prediction for this MB is already erroneous. A natural way is to set the MB gain to zero as:

\[
g'_j = 0, \quad j = 1 \sim \text{number of MBs in a frame}
\tag{4.7}
\]

Once the MB gains are determined, they are grouped together to form the sublayer gain $G'_i$, where $i = 1 \sim$ number of sublayers in a frame. Again, the rest of the R-D optimization follows what is stated and proposed in Section 3.2, except for the change in the definition of the expected accumulated gain.
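The ECARS-LOC rules (4.6) and (4.7) amount to a small per-MB decision, sketched below. The function and parameter names are hypothetical. Leaving the gain unchanged for Intra-coded MBs with spatial-only EC is an assumption, based on that combination not appearing among the three frame-dependency cases listed.

```python
def ecars_loc_gain(g: float,
                   prev_mb_corrupted: bool,
                   inter_coded: bool,
                   temporal_ec: bool) -> float:
    """Adjust the MB gain g using location feedback about the previous frame.

    inter_coded -- first field of the case label (1 = Inter, 0 = Intra)
    temporal_ec -- True for temporal EC (second field 1),
                   False for spatial-only EC (second field 2)
    """
    if not prev_mb_corrupted:
        return g                 # no corruption reported: gain unchanged
    if inter_coded:
        return 0.0               # cases (1,1) and (1,2): rule (4.7)
    if temporal_ec:
        return 2.0 * g           # case (0,1): rule (4.6)
    return g                     # Intra + spatial EC: assumed no dependency


# Hypothetical MB with original gain 120.0, previous-frame MB corrupted.
intra_temporal = ecars_loc_gain(120.0, True, inter_coded=False, temporal_ec=True)
inter_temporal = ecars_loc_gain(120.0, True, inter_coded=True, temporal_ec=True)
inter_spatial = ecars_loc_gain(120.0, True, inter_coded=True, temporal_ec=False)
not_corrupted = ecars_loc_gain(120.0, False, inter_coded=True, temporal_ec=True)
```

The adjusted gains would then be summed per sublayer before the R-D optimization, exactly as the unadjusted gains are.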

ECARS with Feedback: ECARS-MEAN

The method proposed in the previous section is heuristic. We know that the MB gain $g'_j$ remains the same as $g_j$ if the corresponding MB of the previous frame is successfully decoded. If the corresponding MB of the previous frame is corrupted, we can determine the MB gain $g'_j$ from $g_j$ by examining how different the corrupted corresponding MB of the previous frame is from its successfully decoded counterpart. The further apart the corrupted MB is from its successfully decoded counterpart, the more we should change the gain of the MB of the current frame.

The corrupted MB of the previous frame will affect either the prediction of the corresponding MB of the current frame, the EC reconstruction of the corresponding MB of the current frame, or both. We propose to send the mean of the prediction of the MB for the current frame, $p'_j$, and the mean of the EC reconstruction of the MB for the current frame, $c'_j$, both affected by the corrupted MB of the previous frame, back to the sender for the calculation of $g'_j$.

Recall that the original definition of the MB gain $g_j$ of the current frame is $g_j = \sum_{u=0}^{255} (c_{ju} - p_{ju} - s_{ju})^2$ from (4.1). Now that the MB of the previous frame is corrupted, the MB gain $g'_j$ of the current frame should be:

\[
\begin{aligned}
g'_j &= \sum_{u=0}^{255} \left[ \left( c'_{ju} - p_{ju} - s_{ju} \right)^2 - \left( p'_{ju} + s_{ju} - (p_{ju} + s_{ju}) \right)^2 \right] \\
     &= \sum_{u=0}^{255} \left[ \left( c'_{ju} - p_{ju} - s_{ju} \right)^2 - \left( p'_{ju} - p_{ju} \right)^2 \right],
\quad j = 1 \sim \text{number of MBs in a frame}
\end{aligned}
\tag{4.8}
\]

since the gain is defined as the decrease in distortion obtained by sending the MB of the current frame, compared with not sending it and reconstructing the MB by EC.

With the EC method known and the location of the corrupted MB, the rate shaper can calculate the exact values of $p'_{ju}$ and $c'_{ju}$, where $u = 0 \sim 255$. Those values $p'_{ju}$ and $c'_{ju}$ are to be used by (4.8). However, if the EC method is not precisely known at the rate shaper (which is usually the case since the receiver

66 mght not want to dsclose ts own EC method), we can nstead consder only the means c j back to the sender to approxmate g j as follows: 2 2 ( c j p j s j ) ( p j p j ) g j ( c p s ), g j = 2 j j j j =1 ~ number of MBn a frame p j and (4.9) Wthout subscrpt u, the above values n the numerator and the denomnator represent means. Smlarly, havng determned the MB gans, they are grouped together to form the sublayer gan G, where =1 ~ number of sublayersn a frame. The rest of the R-D optmzaton follows what s stated and proposed n Secton 3.2 except the change of defnton n the expected accumulated gan. The dea of allocatng bandwdth optmally for sublayers can be extended to a hgher level to allocate bandwdth effcently among frames n a GOP. The problem formulaton s then: maxmze G = ( ) F h m= 1 = 1 G m v m (4.10) subject to F rm C m h =1 (4.11) where F s the number of frames n a GOP. ECARS wll ncur delay wth duraton of F frames f t allows for optmzaton among frames n a GOP. Note agan the packetzaton s performed after error-reslent rate shapng. That s, symbols are grouped nto packets after the decson of = [ r r L ] 1 2 r h r has been made. Small packet s desrable to make use of the fne-graned decson resulted from ECARS. For example, a bg packet that contans all the symbols from a frame could be unrecoverable f t s decded to be dropped by the lower layers (for example, the lnk layer detects a CRC check error for ths bg packet) Experment We wll show n ths secton the effectveness of ECARS n streamng the precoded vdeo over packet-loss networks. Seven methods wll be compared sde-by-sde: random droppng (wth legend rand ), UPPRS1 (wth legend upprs1 ), UPPRS2 (wth legend upprs2 ), and non- ECARS (wth legend n-ecars ), ECARS wthout feedback (wth legend ecars-nf ), ECARS 52

67 wth locaton feedback (wth legend ecars-loc ), and ECARS wth mean and locaton feedback (wth legend ecars-mean ). One deal method (as the performance bound) where EC method at the recever s precsely known wll be shown as well. The test vdeo sequences are akyo, foreman, and stefan n CIF format. Sequence akyo represents a vdeo sequence wth lower bt rate due to smpler texture and less moton. Sequence foreman represents a vdeo sequence wth medum bt rate wth regular texture and moton. Sequence stefan represents a vdeo sequence wth hgher bt rate wth complex texture and faster moton. The frame rate s 30 frames/sec. The bandwdth of the smulated networks fluctuates between 2 Mbts/sec and 11 Mbts/sec. The bandwdth of the forward channel s subtracted by the amount of bts the feedback channel requres f there s some feedback sent back from the recever. The bt error rate (BER) of the channel also fluctuates accordng to the two-state Markov chan model detaled n Appendx B. The wreless channel smulaton parameters can be found n B.2. Under the same network condton (the same BER trace and the same bandwdth trace), the results shown n the followng are tested for 10 dfferent seeds for pseudo-random smulatons. That s, the overall PSNR result shown s the average of 10 dfferent tests. The frame-by-frame PSNR result s an nstance of the 10 tests. Gven the gan embedded n the btstream, ECARS consumes on the average <1% (the denomnator s the bt rates of the source-coded btstream) of the orgnal precoded vdeo to carry the gan nformaton ( meta-data ). The performance mprovement of ECARS n PSNR over non- rate shapng based methods s on the average 8 db. In the followng, we wll present the results of: Rate shapng vs. non- rate shapng (.e., rate shapng vs. UPPRS) ECARS vs. Non-ECARS ECARS wth feedback vs. ECARS wthout feedback ECARS wth EC method known vs. 
ECARS wthout EC method known All seven methods The reference that the results of all methods compared to n computng the PSNR, s the result of a vdeo btstream that s transmtted wth no packet loss and wth unlmted bandwdth. 53
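The evaluation protocol described above (PSNR against a loss-free reference, averaged over 10 seeds) can be sketched as follows. This is a minimal illustration, assuming 8-bit samples given as flat sequences; the function names `psnr` and `overall_psnr` are hypothetical, and the text does not say whether the average is taken over per-run PSNR values or over mean squared errors, so the plain average of PSNR values used here is an assumption.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], received: Sequence[float],
         peak: float = 255.0) -> float:
    """PSNR in dB of `received` against the loss-free `reference`."""
    if len(reference) != len(received) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def overall_psnr(per_seed_psnr: Sequence[float]) -> float:
    """Average the PSNR obtained with each pseudo-random seed (assumed rule)."""
    return sum(per_seed_psnr) / len(per_seed_psnr)
```

In such a setup, each of the 10 seeds would drive one simulated transmission over the same BER and bandwidth traces, and `overall_psnr` would be called on the 10 resulting values.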

Rate Shaping vs. UPPRS

We compare here the results of the rate shaping based method "n-ecars" with the non-rate-shaping based method "upprs1". Results of Case (0, 1) are shown in Figure 40 and Figure 41; results of Case (1, 2) are shown in Figure 42 and Figure 43; and results of Case (1, 1) are shown in Figure 44 and Figure 45.

The performance in terms of the overall PSNR at various wireless channel conditions is shown in Figure 40, Figure 42, and Figure 44. Figure 40 (a), Figure 42 (a), and Figure 44 (a) show the 3-D plots of the overall PSNR. Figure 40 (b), Figure 42 (b), and Figure 44 (b) show the top views (seen from the top of the z-axis) of the 3-D plots. The color shown in the top view is the color of the method that outperforms the others. At all wireless channel conditions, "n-ecars" outperforms "upprs1". Even though "n-ecars" is not EC aware, it still performs rate shaping by R-D optimization.

Figure 40 (c), Figure 42 (c), and Figure 44 (c) show the overall PSNR at various speeds at SNR = 10 dB. A fixed SNR value gives the same bit error rate (BER) of the wireless channel. The higher the speed is, the more bursty the bit errors of the wireless channel are; in other words, the larger the transition probability is. The higher the transition probability is, the higher the packet-loss rate is, given the same BER. On the other hand, the EC performance degrades as the errors become more bursty, because EC relies on spatial or temporal neighbors, and neighbors are usually corrupted if the errors are bursty. As a result, we do not see a correlation between the overall PSNR and the speed.

Figure 40 (d), Figure 42 (d), and Figure 44 (d) show the overall PSNR at various SNR at speed = 10 km/h. A fixed speed gives the same burstiness of the bit errors of the wireless channel. The larger the SNR is, the smaller the BER is. We see from the results that the PSNR value increases with SNR. Also, from Section 2.2, we know that the smaller the BER is, the smaller the packet-loss rate is, given the same burstiness. A smaller packet-loss rate then leads to a higher PSNR.

Frame-by-frame PSNR performance is shown in Figure 41, Figure 43, and Figure 45. We also see that "n-ecars" outperforms "upprs1".

Figure 40. Performance of Methods "upprs1" and "n-ecars" at various wireless channel conditions with Case (0, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 41. Frame-by-frame PSNR of Methods "upprs1" and "n-ecars" at 10 km/h and SNR = 20 dB with Case (0, 1) for Sequence "foreman"

Figure 42. Performance of Methods "upprs1" and "n-ecars" at various wireless channel conditions with Case (1, 2) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 43. Frame-by-frame PSNR of Methods "upprs1" and "n-ecars" at 10 km/h and SNR = 20 dB with Case (1, 2) for Sequence "foreman"

Figure 44. Performance of Methods "upprs1" and "n-ecars" at various wireless channel conditions with Case (1, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 45. Frame-by-frame PSNR of Methods "upprs1" and "n-ecars" at 10 km/h and SNR = 20 dB with Case (1, 1) for Sequence "foreman"

ECARS vs. Non-ECARS

We compare here the results of the EC aware rate shaping "ecars-nf" with the non-EC aware rate shaping "n-ecars". Results of Case (0, 1) are shown in Figure 46 and Figure 47; results of Case (1, 2) are shown in Figure 48 and Figure 49; and results of Case (1, 1) are shown in Figure 50 and Figure 51.

The performance in terms of the overall PSNR at various wireless channel conditions is shown in Figure 46, Figure 48, and Figure 50. Figure 46 (a), Figure 48 (a), and Figure 50 (a) show the 3-D plots of the overall PSNR. Figure 46 (b), Figure 48 (b), and Figure 50 (b) show the top views (seen from the top of the z-axis) of the 3-D plots. The color shown in the top view is the color of the method that outperforms the others. At all wireless channel conditions, "ecars-nf" outperforms "n-ecars".

Figure 46 (c), Figure 48 (c), and Figure 50 (c) show the overall PSNR at various speeds at SNR = 10 dB. A fixed SNR value gives the same bit error rate (BER) of the wireless channel. The higher the speed is, the more bursty the bit errors of the wireless channel are; in other words, the larger the transition probability is. The higher the transition probability is, the higher the packet-loss rate is, given the same BER. On the other hand, the EC performance degrades as the errors become more bursty, because EC relies on spatial or temporal neighbors, and neighbors are usually corrupted if the errors are bursty. As a result, we do not see a correlation between the overall PSNR and the speed.

Figure 46 (d), Figure 48 (d), and Figure 50 (d) show the overall PSNR at various SNR at speed = 10 km/h. A fixed speed gives the same burstiness of the bit errors of the wireless channel. The larger the SNR is, the smaller the BER is. We see from the results that the PSNR value increases with SNR. Also, from Section 2.2, we know that the smaller the BER is, the smaller the packet-loss rate is, given the same burstiness. A smaller packet-loss rate then leads to a higher PSNR.

Frame-by-frame PSNR performance is shown in Figure 47, Figure 49, and Figure 51. We also see that "ecars-nf" outperforms "n-ecars".

Figure 46. Performance of Methods "n-ecars" and "ecars-nf" at various wireless channel conditions with Case (0, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 47. Frame-by-frame PSNR of Methods "n-ecars" and "ecars-nf" at 10 km/h and SNR = 20 dB with Case (0, 1) for Sequence "foreman"

Figure 48. Performance of Methods "n-ecars" and "ecars-nf" at various wireless channel conditions with Case (1, 2) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 49. Frame-by-frame PSNR of Methods "n-ecars" and "ecars-nf" at 10 km/h and SNR = 20 dB with Case (1, 2) for Sequence "foreman"

Figure 50. Performance of Methods "n-ecars" and "ecars-nf" at various wireless channel conditions with Case (1, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 51. Frame-by-frame PSNR of Methods "n-ecars" and "ecars-nf" at 10 km/h and SNR = 20 dB with Case (1, 1) for Sequence "foreman"

ECARS with Feedback vs. ECARS without Feedback

We compare here the results of the EC aware rate shaping with feedback, "ecars-loc" and "ecars-mean", with the EC aware rate shaping without feedback, "ecars-nf". Results of Case (0, 1) are shown in Figure 52 and Figure 53; results of Case (1, 2) are shown in Figure 54 and Figure 55; and results of Case (1, 1) are shown in Figure 56 and Figure 57.

The performance in terms of the overall PSNR at various wireless channel conditions is shown in Figure 52, Figure 54, and Figure 56. Figure 52 (a), Figure 54 (a), and Figure 56 (a) show the 3-D plots of the overall PSNR. Figure 52 (b), Figure 54 (b), and Figure 56 (b) show the top views (seen from the top of the z-axis) of the 3-D plots. The color shown in the top view is the color of the method that outperforms the others. "ecars-mean" outperforms "ecars-loc" and "ecars-nf" at most of the channel conditions, with small margins.

Figure 52 (c), Figure 54 (c), and Figure 56 (c) show the overall PSNR at various speeds at SNR = 10 dB. A fixed SNR value gives the same bit error rate (BER) of the wireless channel. The higher the speed is, the more bursty the bit errors of the wireless channel are; in other words, the larger the transition probability is. The higher the transition probability is, the higher the packet-loss rate is, given the same BER. On the other hand, the EC performance degrades as the errors become more bursty, because EC relies on spatial or temporal neighbors, and neighbors are usually corrupted if the errors are bursty. As a result, we do not see a correlation between the overall PSNR and the speed.

Figure 52 (d), Figure 54 (d), and Figure 56 (d) show the overall PSNR at various SNR at speed = 10 km/h. A fixed speed gives the same burstiness of the bit errors of the wireless channel. The larger the SNR is, the smaller the BER is. We see from the results that the PSNR value increases with SNR. Also, from Section 2.2, we know that the smaller the BER is, the smaller the packet-loss rate is, given the same burstiness. A smaller packet-loss rate then leads to a higher PSNR.

Frame-by-frame PSNR performance is shown in Figure 53, Figure 55, and Figure 57. We also see that "ecars-mean" outperforms "ecars-loc" and "ecars-nf" at most of the channel conditions, with small margins.

Figure 52. Performance of Methods "ecars-nf", "ecars-loc", and "ecars-mean" at various wireless channel conditions with Case (0, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 53. Frame-by-frame PSNR of Methods "ecars-nf", "ecars-loc", and "ecars-mean" at 10 km/h and SNR = 20 dB with Case (0, 1) for Sequence "foreman"

Figure 54. Performance of Methods "ecars-nf", "ecars-loc", and "ecars-mean" at various wireless channel conditions with Case (1, 2) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 55. Frame-by-frame PSNR of Methods "ecars-nf", "ecars-loc", and "ecars-mean" at 10 km/h and SNR = 20 dB with Case (1, 2) for Sequence "foreman"

Figure 56. Performance of Methods "ecars-nf", "ecars-loc", and "ecars-mean" at various wireless channel conditions with Case (1, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 57. Frame-by-frame PSNR of Methods "ecars-nf", "ecars-loc", and "ecars-mean" at 10 km/h and SNR = 20 dB with Case (1, 1) for Sequence "foreman"

To understand why the improvement of "ecars-mean" over "ecars-nf" is marginal, let us consider that the pixel value can be modeled as an AR(1) Gauss-Markov process [51]:

$$ x_n = \rho x_{n-1} + \sigma \sqrt{1 - \rho^2} \, e_n \tag{4.12} $$

Using the pixel value of the previous frame to conceal the error in the current frame, that is, taking $c_j$ from the previous frame, results in the following distortion:

$$ E\left[ ( x_n - x_{n-1} )^2 \right] = 2 \sigma^2 ( 1 - \rho ) \tag{4.13} $$

If the previous frame is corrupted, we use the frame before the previous frame to conceal the error in the current frame. That is, $c_j$ is taken from the frame before the previous frame. The resulting distortion is:

$$ E\left[ ( x_n - x_{n-2} )^2 \right] = 2 \sigma^2 ( 1 - \rho^2 ) \tag{4.14} $$

If $\rho \approx 1$ (which is reasonable for natural images), both $1 - \rho$ and $1 - \rho^2$ are close to zero, so the distortions in (4.13) and (4.14) are almost identical. Therefore, we conclude that the gain $g_j$ does not change a lot with the feedback. In addition to this analysis, let us examine "ecars-ideal" with respect to "ecars-loc" in the following subsection, to see the limitation of the best ECARS method with feedback.

ECARS with EC Method Known vs. ECARS without EC Method Known

We compare here the results of ECARS with feedback knowing exactly the EC method, "ecars-ideal", with ECARS with feedback without knowing exactly the EC method, "ecars-loc". In general, the rate shaper does not know exactly the EC method used at the receiver. Thus, "ecars-ideal" is usually not applicable, since the receiver might not want to disclose its own EC method. The comparison in this subsection is provided for information only.

Results of Case (0, 1) are shown in Figure 58 and Figure 59; results of Case (1, 2) are shown in Figure 60 and Figure 61; and results of Case (1, 1) are shown in Figure 62 and Figure 63.

The performance in terms of the overall PSNR at various wireless channel conditions is shown in Figure 58, Figure 60, and Figure 62. Figure 58 (a), Figure 60 (a), and Figure 62 (a) show the 3-D plots of the overall PSNR. Figure 58 (b), Figure 60 (b), and Figure 62 (b) show the top views (seen from the top of the z-axis) of the 3-D plots. The color shown in the top view is the color of the method that outperforms the others. Across the wireless channel conditions, "ecars-ideal" outperforms "ecars-loc" on average. Figure 58 (c), Figure 60 (c), and Figure 62 (c) show the overall PSNR at various speeds at SNR = 10 dB. Figure 58 (d), Figure 60 (d), and Figure 62 (d) show the overall PSNR at various SNR at speed = 10 km/h.

Frame-by-frame PSNR performance is shown in Figure 59, Figure 61, and Figure 63. We also see that "ecars-ideal" outperforms "ecars-loc".

Figure 58. Performance of Methods "ecars-loc" and "ecars-ideal" at various wireless channel conditions with Case (0, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 59. Frame-by-frame PSNR of Methods "ecars-loc" and "ecars-ideal" at 10 km/h and SNR = 20 dB with Case (0, 1) for Sequence "foreman"

Figure 60. Performance of Methods "ecars-loc" and "ecars-ideal" at various wireless channel conditions with Case (1, 2) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 61. Frame-by-frame PSNR of Methods "ecars-loc" and "ecars-ideal" at 10 km/h and SNR = 20 dB with Case (1, 2) for Sequence "foreman"

Figure 62. Performance of Methods "ecars-loc" and "ecars-ideal" at various wireless channel conditions with Case (1, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 63. Frame-by-frame PSNR of Methods "ecars-loc" and "ecars-ideal" at 10 km/h and SNR = 20 dB with Case (1, 1) for Sequence "foreman"

We can see that "ecars-ideal" on average outperforms "ecars-loc" by 0.1~0.3 dB, which falls in the same range by which "ecars-mean" outperforms "ecars-loc". We conclude that "ecars-mean" is an almost ideal ECARS method with feedback that does not require knowledge of the exact EC method used at the receiver.

All Methods

Sample results of the methods for which exact EC methods are not required are shown here. Figure 64 shows an example of how each method allocates the rates among sublayers. With the bandwidth constraint specified, Method "rand" allocates the rates randomly among the 27 sublayers; Method "upprs1" allocates the rates equally among the 27 sublayers; Method "upprs2" allocates the rates to the earlier sublayers; and Methods "n-ecars", "ecars-nf", "ecars-loc", and "ecars-mean" allocate the rates smartly among the 27 sublayers (some sublayers are not allocated any rate at all), depending on the different definitions of the MB gain. The bit allocation of "n-ecars", "ecars-nf", "ecars-loc", and "ecars-mean" happens automatically through the proposed two-stage R-D optimization, which considers the current network condition.
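To make the contrast between the naive allocation strategies concrete, the following Python sketch shows one possible reading of them. It is an assumption-laden illustration, not the thesis's implementation: the function names, the `sizes` list (the full rate of each sublayer), and the interpretation of "random" as filling sublayers in a random order are all hypothetical.

```python
import random
from typing import List, Sequence


def allocate_equal(budget: float, sizes: Sequence[float]) -> List[float]:
    """upprs1-like: the same share for every sublayer, capped at its size."""
    share = budget / len(sizes)
    return [min(share, s) for s in sizes]


def allocate_front(budget: float, sizes: Sequence[float]) -> List[float]:
    """upprs2-like: fill the earlier sublayers first until the budget runs out."""
    rates, remaining = [], budget
    for s in sizes:
        r = min(s, remaining)
        rates.append(r)
        remaining -= r
    return rates


def allocate_random(budget: float, sizes: Sequence[float],
                    seed: int = 0) -> List[float]:
    """rand-like: fill sublayers in a random order (assumed interpretation)."""
    rng = random.Random(seed)
    order = list(range(len(sizes)))
    rng.shuffle(order)
    rates, remaining = [0.0] * len(sizes), budget
    for i in order:
        rates[i] = min(sizes[i], remaining)
        remaining -= rates[i]
    return rates
```

None of these three rules looks at the MB gains, which is the difference from the R-D optimized methods: those may leave a sublayer with zero rate precisely because its gain is low.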

Figure 64. Sublayer bit allocations of all methods at 10 km/h and SNR = 20 dB with Case (1, 1) for Sequence "foreman"

We then recap the performance of all seven methods (shown in Figure 65 to Figure 70). We can see that "ecars-mean" outperforms all the others most of the time. The rate shaping based methods "n-ecars", "ecars-nf", "ecars-loc", and "ecars-mean" outperform the naïve methods "rand", "upprs1", and "upprs2" at all times.

Figure 65. Performance of all methods at various wireless channel conditions with Case (0, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 66. Frame-by-frame PSNR of all methods at 10 km/h and SNR = 20 dB with Case (0, 1) for Sequence "foreman"

Figure 67. Performance of all methods at various wireless channel conditions with Case (1, 2) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 68. Frame-by-frame PSNR of all methods at 10 km/h and SNR = 20 dB with Case (1, 2) for Sequence "foreman"

Figure 69. Performance of all methods at various wireless channel conditions with Case (1, 1) for Sequence "foreman": (a) 3-D view of PSNR at various speeds and SNR; (b) top view of PSNR at various speeds and SNR; (c) PSNR at various speeds; (d) PSNR at various SNR

Figure 70. Frame-by-frame PSNR of all methods at 10 km/h and SNR = 20 dB with Case (1, 1) for Sequence "foreman"

Figure 71 to Figure 73 show the performance in overall PSNR for the different sequences. Given the same network condition, Sequence "akiyo" has higher PSNR than "foreman", and Sequence "foreman" has higher PSNR than "stefan". A sequence with more complex texture and faster motion, such as "stefan", gives a smaller PSNR value given the same bandwidth budget.

Figure 71. Performance of all methods at 10 km/h and SNR = 20 dB with Case (0, 1) for Sequences "akiyo", "foreman", and "stefan"

Figure 72. Performance of all methods at 10 km/h and SNR = 20 dB with Case (1, 2) for Sequences "akiyo", "foreman", and "stefan"

Figure 73. Performance of all methods at 10 km/h and SNR = 20 dB with Case (1, 1) for Sequences "akiyo", "foreman", and "stefan"

Finally, sample frames of Methods "n-ecars" and "ecars-mean" are shown in Figure 74 to demonstrate visually the merit of ECARS with location and mean information as feedback.

Figure 74. A sample frame of (a) "n-ecars" and (b) "ecars-mean" at 10 km/h and SNR = 20 dB with Case (1, 1) for Sequence "foreman"

4.5. Conclusion

We proposed in this chapter error concealment aware rate shaping (ECARS) for video transport over wireless networks. ECARS is applied to pre source- and channel-coded video. ECARS first evaluates the gain of sending the MB of the precoded video, as opposed to not sending it but reconstructing it by EC. Then, given a certain packet loss rate, the expected accumulated gain can be derived and included in the R-D optimization problem formulation. Finally, ECARS performs R-D optimization by the proposed two-stage R-D optimization approach.

Two types of ECARS algorithms, without feedback and with feedback from the receiver, were proposed to account for the frame dependency problem in rate shaping. In the case of no feedback, ECARS evaluates the MB gain considering a particular EC method used at the receiver. ECARS with feedback is needed if the video is predictively coded and/or the EC method performed at the receiver utilizes temporal information. In order to incorporate the frame dependency into the rate shaping process, we propose to send the location (and mean) of the corrupted MB back to the sender, and to use such feedback information to determine the MB gain in the R-D optimized ECARS. Experiments have shown that ECARS is better than the other, naïve methods. Moreover, ECARS has improved performance with the aid of the feedback information.

The way the MBs are grouped into sublayers in this chapter is fixed and is not part of the ECARS R-D optimization, since how MBs are grouped should be considered in the precoding process and not in the rate shaping stage. In the future, we can consider R-D optimization of the way MBs are grouped into sublayers (that is, the number of source-coded symbols that go to each sublayer), given that the rate shaping problem is solved.

5. Modeling of Video Traffic

We present a new stochastic process called the punctured autoregressive (AR) process, and use it to model variable bit rate (VBR) video traffic. To model the VBR video traffic, we propose to use punctured autoregressive processes modulated by a doubly Markov process. The doubly Markov process models the state of each video frame, while the autoregressive processes describe the number of bits of each frame at each state. The punctured autoregressive process considers the timing information between frames of the same state and thus gives better modeling performance. The model captures the long-range dependency (LRD) characteristics as well as the short-range dependency (SRD) characteristics of the video traffic. The queuing behavior of the punctured autoregressive process is also closer to the real video traffic than that of the conventional autoregressive process.

This chapter is organized as follows. We first introduce some prior work on VBR traffic modeling. We then introduce the punctured AR process as opposed to the conventional AR process. The proposed doubly Markov process modulated punctured AR process and the conventional non-punctured version are both described for their usage in traffic modeling. Experiments are conducted to compare the proposed punctured process with the non-punctured process. Finally, concluding remarks are given.

Introduction

To both video service providers and network designers, it is important to have a good model for the video traffic. A good model for video traffic allows for better admission control, scheduling, network resource allocation policies, etc., that guarantee a desired quality of service (QoS) as well as a better utilization of the network resources. A good model captures essential characteristics of the real video traffic. The synthetic trace generated by such a model can be used to test the network performance under, for example, a certain admission control policy. Therefore, network designers can design a network that is friendlier to the video traffic and thus delivers a better video service.

Because of the importance of both VBR video traffic modeling and wireless channel dynamic modeling, many models have been proposed. To evaluate different models, there are generally three criteria to be taken into account:

(1) A good model should capture the statistical properties of the real trace. A trace is defined as a sequence of data we intend to model. In the case of VBR video traffic modeling, a trace is a sequence of numbers, each representing the number of bits used to encode a Group of Blocks (GOB), a frame, or a Group of Pictures (GOP). In the case of wireless channel modeling, a trace is a sequence of numbers, each representing the channel bit error rate (BER) at a different time instant. The statistical properties should include those that are related to the long-range dependency (LRD) of the trace as well as those that are related to the short-range dependency (SRD) of the trace [1].

(2) The synthetic video trace should be similar to the real video trace in terms of the queuing behavior. Likewise, the synthetic trace of the wireless channel should be similar to the real trace in terms of the QoS behavior.

(3) The model should be simple and easy to analyze.

Related discussion on how to evaluate the performance of the models can be found in [28][37][47].

Video traffic modeling is challenging in several aspects. First, video is usually encoded with VBR to adapt to the video content. Second, depending on the coding scheme, the video trace has very different properties. Popular video coding schemes include H.263 [52] and MPEG-4 [40]. A video frame can be Intra (I), Predictive (P), or Bidirectionally predictive (B) encoded. The GOP structure, which is defined as the frames enclosed by two I frames, including the leading I frame, can be fixed or dynamic. A commonly used fixed GOP structure is IBBPBBPBBPBB.

Existing work on video traffic modeling includes DAR [24], which fails to capture the LRD property of the video traffic. Models such as [14][36][58][67] are constrained by a fixed GOP structure. Non-statistical methods such as [33][62] are usually more difficult to analyze. The wavelet-based model [38] and the autoregressive (AR) process based model [35] do not capture the dynamic nature of the video traffic well. In general, it is preferred to use a Markov-chain-like process to model the dynamic nature of the video traffic. Models such as [30] are too complex. We propose to build a model based on the work of [53], which models the video traffic as a doubly Markov process with AR processes inside each Markov state. However, the model of [53] does not use the timing information between frames of the same Markov state. Here, we explicitly use the timing information between frames of the same Markov state and refer to the new model as a doubly Markov process modulated punctured AR process. It is shown that the proposed model outperforms the model of [53] in terms of both SRD and LRD statistics and queuing behavior, and has the same complexity in terms of the number of model parameters.

5.2. Punctured Autoregressive Modeling

A conventional autoregressive (AR) process $x_n$ is defined as follows:

$$ x_n = \rho x_{n-1} + \sigma \sqrt{1 - \rho^2} \, e_n, \qquad n = 1, 2, \ldots \tag{5.1} $$

where $n$ is the time index of the process, $\rho$ describes the dependency of the sample at time $n$ on the previous sample at time $n-1$, $\sigma^2$ is the variance of the process $x_n$, and $e_n$ is a Gaussian random variable with mean 0 and variance 1 that characterizes the random nature of the process.

Consider the case where more than one AR process is interleaved, as shown in Figure 75 (a). At time instant 1, AR process $x_n$ takes place; at time instant 2, AR process $y_n$ takes place; and so on. The conventional method to train the two sets of AR parameters of sequences $x_n$ and $y_n$ is to split the single process in Figure 75 (a) into two separate processes, as shown in Figure 75 (b) and Figure 75 (c). Each of the processes represents the training sequence of $x_n$ or $y_n$ regardless of the time index associated with each sample. For example, the sequence in Figure 75 (b) is used as if the samples were:

$$ \tilde{x}_1 \;\; \tilde{x}_2 \;\; \tilde{x}_3 \;\; \tilde{x}_4 \;\; \tilde{x}_5 \;\; \tilde{x}_6 \;\; \tilde{x}_7 \;\; \tilde{x}_8 \;\; \tilde{x}_9 \tag{5.2} $$

where $\tilde{x}_1 = x_1$, $\tilde{x}_2 = x_3$, and so on.

(a) x1 y2 x3 x4 x5 x6 y7 y8 x9 y10 x11 x12 x13 y14
(b) x1 x3 x4 x5 x6 x9 x11 x12 x13
(c) y2 y7 y8 y10 y14

Figure 75. Two interleaved autoregressive processes $x_n$ and $y_n$: (a) the interleaved process; (b) autoregressive process $x_n$; (c) autoregressive process $y_n$.

To synthesize samples using this model, two separate AR processes are generated with the parameters trained on $\tilde{x}_n$ and $\tilde{y}_n$. Synthetic samples generated with the parameters trained on $\tilde{x}_n$ are taken one after the other to form the final synthetic process. Synthetic samples generated with the parameters trained on $\tilde{y}_n$ are formed in a similar way. In brief, no synthetic samples are skipped.

It can be seen that the conventional method does not take into account the timing information of the two processes in either the training or the synthesis stage. To consider the timing information, we propose to train and synthesize the AR processes as follows (Figure 76). First, split the single process in Figure 76 (a) into two separate processes, as shown in Figure 76 (b) and Figure 76 (c). Notice that the timing information is retained by leaving the sample of $x_n$ blank if at some particular time instance the original process is with the other process $y_n$ (Figure 76 (b)). Similarly, the sample of $y_n$ is blank if at some time instance the original process is with the other process $x_n$ (Figure 76 (c)).

(a) x1 y2 x3 x4 x5 x6 y7 y8 x9 y10 x11 x12 x13 y14
(b) x1 _ x3 x4 x5 x6 _ _ x9 _ x11 x12 x13 _
(c) _ y2 _ _ _ _ y7 y8 _ y10 _ _ _ y14

Figure 76. Two interleaved autoregressive processes $x_n$ and $y_n$: (a) the interleaved process; (b) punctured autoregressive process $x_n$; (c) punctured autoregressive process $y_n$.

To train the punctured AR processes, we first find the value $\tilde{\rho}_X$ of the process $\tilde{x}_1 \, \tilde{x}_2 \cdots \tilde{x}_9$. We also construct the histogram of the sample spacing. For example, in Figure 76 (b), the histogram has values [5, 2, 1] at spacings [1, 2, 3]. Recall that in the simplest case, where all the samples are adjacent to each other, the histogram has values [8, 0, 0] at spacings [1, 2, 3]. The $\rho_X$ of the process in Figure 76 (b) can be found by solving the following equation:

$$ \frac{5 \rho_X + 2 \rho_X^2 + \rho_X^3}{5 + 2 + 1} = \tilde{\rho}_X \tag{5.3} $$

The value of $\rho_Y$ in Figure 76 (c) can be solved for in the same way.

To synthesize samples using this model, two separate AR processes are generated with $\rho_X$ and $\rho_Y$. The final synthetic process is formed by some means of multiplexing/modulation of the two synthetic processes. Samples are not taken one after the other, but with consideration of how far apart the samples of the same type are. In brief, some synthetic samples are skipped.
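The training step for a punctured AR process (build the spacing histogram, then solve for the underlying correlation coefficient) can be sketched in Python. This is a minimal sketch under assumptions: the function names are hypothetical, the measured coefficient of the compacted sequence is assumed to lie in [0, 1], and bisection is simply one convenient root finder, not necessarily the one used in the thesis.

```python
from collections import Counter
from typing import Dict, Sequence


def spacing_histogram(times: Sequence[int]) -> Dict[int, int]:
    """Count the gaps between consecutive time indices of one process."""
    return dict(Counter(b - a for a, b in zip(times, times[1:])))


def solve_rho(rho_tilde: float, hist: Dict[int, int],
              tol: float = 1e-12) -> float:
    """Solve sum_k h_k * rho**k / sum_k h_k = rho_tilde for rho in [0, 1].

    The left-hand side increases monotonically on [0, 1], from 0 to 1,
    so bisection is sufficient when 0 <= rho_tilde <= 1.
    """
    if not 0.0 <= rho_tilde <= 1.0:
        raise ValueError("rho_tilde is assumed to lie in [0, 1]")
    total = sum(hist.values())

    def residual(rho: float) -> float:
        return sum(h * rho ** k for k, h in hist.items()) / total - rho_tilde

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With the time indices of the x samples in the interleaved example, 1, 3, 4, 5, 6, 9, 11, 12, 13, `spacing_histogram` gives 5 gaps of length 1, 2 of length 2, and 1 of length 3. When every gap has length 1, the equation reduces to the conventional case and `solve_rho` returns the measured coefficient unchanged.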

97 Punctured AR processes can be modulated by dfferent processes. In ths study, we specfcally consder the Markov process Varable Bt Rate Vdeo Traffc Modelng Before we proceed to the proposed model for VBR vdeo traffc, let us brefly descrbe the method proposed by [53] wth whch we wll compare (Fgure 77). The model comprses of two layers of Markov process. Wthout loss of generalty, we consder two frame types, I and P. The doubly Markov process models I and P frame transtons, as well as dfferent frame actvtes. The outer Markov process descrbes how I and P frames transt. The frame type can be further categorzed nto dfferent actvty levels. Frames of hgher actvty level consume more number of bts, whle frames of lower actvty level consume less number of bts. The nner Markov process descrbes how the frames of dfferent actvty levels transt. Ths model s not constraned by a fxed GOP structure. There are sx states n total, namely, I frame n hgh actvty level, I frame n medum actvty level, I frame n low actvty level, P frame n hgh actvty level, P frame n medum actvty level, and P frame n low actvty level. Each state s modeled as an AR process wth dfferent AR parameters. We can consder ths as a slghtly more complex process than the one descrbed n Fgure 75. There are two states n Fgure 75 (a) whle there are sx states n ths model. The AR parameters are traned n the way descrbed n Fgure 75 of n a conventonal non-punctured manner. The synthetc trace s generated by procedures descrbed n the last secton. To smply the model, the nner Markov process of I frames s characterzed by ntal probabltes of three actvty levels only. Snce I frames are usually far apart n a vdeo sequence, they do not need to modeled as a Markov process. Such smplfcaton wll have smlar performance as the full model. We call ths model Method 1, as shown n Fgure 77 (a), for later dscusson. We propose to model the VBR vdeo traffc as a doubly Markov modulated punctured AR process. 
This new model explicitly considers the timing information between two frames of the same state. Again, this is a slightly more complex process than the one described in Figure 76. There are two states in Figure 76 (a) while there are six states in this model. The AR parameters are trained in the way described in Figure 76, in a punctured manner. The synthetic trace is generated by the procedures described in the last section. We call this model Method 2, as shown in Figure 77 (b), for later discussion.
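The synthesis of a Markov modulated punctured AR process can be sketched as follows. This is only an illustration under stated assumptions: it interprets "skipped" samples as every state's AR(1) process advancing at every time step while only the active state's sample is emitted, it uses a single-layer Markov chain instead of the doubly Markov structure, and all state names, transition probabilities, and AR parameters below are hypothetical.

```python
import random


def synthesize_punctured(transition, ar_params, length, seed=0):
    """Generate (state sequence, trace) from a Markov modulated punctured AR(1) model.

    transition: dict mapping state -> list of (next_state, probability)
    ar_params:  dict mapping state -> (mean, rho, noise_std) of its AR(1) process
    """
    rng = random.Random(seed)
    states = list(transition)
    state = states[0]
    # Deviation from the mean, tracked separately for every state's AR(1) process
    deviation = {s: 0.0 for s in states}
    state_seq, trace = [], []
    for _ in range(length):
        # Assumption: every AR process advances at every time step ...
        for s in states:
            _, rho, std = ar_params[s]
            deviation[s] = rho * deviation[s] + rng.gauss(0.0, std)
        # ... but only the active state's sample is emitted; the others are skipped
        state_seq.append(state)
        trace.append(ar_params[state][0] + deviation[state])
        # Markov transition to the next state
        next_states, probs = zip(*transition[state])
        state = rng.choices(next_states, weights=probs)[0]
    return state_seq, trace


# Hypothetical two-state example (I and P frames, bits per frame)
transition = {"I": [("I", 0.1), ("P", 0.9)], "P": [("I", 0.2), ("P", 0.8)]}
ar_params = {"I": (8000.0, 0.9, 500.0), "P": (2000.0, 0.9, 200.0)}
state_seq, trace = synthesize_punctured(transition, ar_params, 100)
```

With this design, two consecutive emitted samples of the same state that are k steps apart have correlation ρ^k, which is the timing behavior the punctured training assumes.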


Figure 77. Models for VBR video traffic (each with I-state and P-state activity levels H, M, and L): (a) Method 1: doubly Markov modulated AR process; (b) Method 2: doubly Markov modulated punctured AR process.

Experiment

We now compare the performance of the non-punctured Method 1 with the proposed punctured Method 2. To evaluate the performance of the models, we consider four performance metrics: (1) first-order statistics by means of the quantile-quantile (Q-Q) plot; (2) second-order statistics by means of the autocorrelation function (ACF); (3) LRD property by means of the Hurst parameter from the range/standard deviation (R/S) plot; and (4) queuing behavior by means of the packet loss rate and the queuing delay. Definitions of the performance metrics can be found in [28][37][47]. The experiment setting is as follows. Two different types of TV programs are recorded: news as shown in Figure 78 (a) and talk show as shown in Figure 78 (b). The two TV programs are encoded using the H.263 video compression codec to generate the real video traces. Both are encoded at a frame rate of 15 frames/sec, with a duration of 30 minutes each. The video trace of the clip news is shown in Figure 79 (a) and (b) at different scales. The video traces are then fed into both models, Method 1 and Method 2, to generate synthetic traces. The performances of the two models are evaluated.

Figure 78. Test videos: (a) news; (b) talk show.

Figure 79. Sample traces from the TV program news (bit rate in bits/frame versus time in seconds): (a) a 200-second trace; (b) a 20-second trace.

The performance comparison is summarized in Table 2. MSE refers to the mean square error compared to the real trace. It can be seen that the proposed Method 2 outperforms Method 1 in all five aspects. The MSE improvement is computed as ((MSE of Method 1) − (MSE of Method 2)) / (MSE of Method 1). Detailed discussion associated with each performance metric will be presented later.

Table 2. Summary of performance comparison between modeling methods Method 1 and Method 2

                | MSE of Q-Q plot | MSE of ACF  | Hurst parameter                | MSE of packet loss rate | MSE of queuing delay
Real            |                 |             | [illegible]                    |                         |
Method 1        | [illegible]     | [illegible] | [illegible] (98e-4 in SE)      | [illegible]             | [illegible]
Method 2        | [illegible]     | [illegible] | [illegible] (3.8025e-4 in SE)  | [illegible]             | [illegible]
MSE improvement | 85.50%          | 63.56%      | 96.12%                         | 1.88%                   | 26.87%

Figure 80 (a)(b) shows the performance of both models in terms of first and second order statistics. The first-order statistics in Figure 80 (a) are shown by the Q-Q plot. The Q-Q plot is constructed from a pair of cumulative distribution functions (CDFs). The closer one CDF is to the other CDF in a pair, the more the curve will look like the straight line y = x. In Figure 80 (a), a dotted straight line is plotted as a reference. We have two pairs of CDFs to compare: the synthetic

trace by Method 1 with respect to the real video trace, and the synthetic trace by Method 2 with respect to the real video trace. The dash-dotted curve in Figure 80 (a) refers to the first pair. The solid curve in Figure 80 (a) refers to the second pair. It is shown that the curve of Method 2 is closer to the reference dotted straight line than the curve of Method 1. The second-order statistics in Figure 80 (b) are shown by the ACF. The dotted curve in Figure 80 (b) refers to the ACF of the real video trace. The dash-dotted curve in Figure 80 (b) refers to the ACF of the synthetic trace by Method 1. The solid curve in Figure 80 (b) refers to the ACF of the synthetic trace by Method 2. It is shown that the curve of Method 2 is closer to the reference curve than the curve of Method 1.

Figure 80. First and second order statistics of the synthetic traces generated by Method 1 and Method 2 with respect to the real video trace of the clip news (curves: Real, Method 1, Method 2). (a) First order statistics: Q-Q plot; (b) Second order statistics: ACF versus lag.

The LRD property in Figure 81 is shown by means of the Hurst parameter, which is the slope of the linear regression line of the points in an R/S plot. Figure 81 (a) shows the R/S plot of the real video trace. The linear regression line is shown as a dotted line. Figure 81 (b) shows the R/S plot of the synthetic trace by Method 1. Figure 81 (c) shows the R/S plot of the synthetic trace by Method 2. It is shown that the synthetic trace by Method 2 has a Hurst parameter value closer to that of the real video trace than the synthetic trace by Method 1.
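The second-order comparison described above can be sketched as follows. This is a minimal illustration, assuming the biased sample ACF estimator and an MSE taken over lags 0 through a chosen maximum lag; the exact estimator and lag range used in the experiment are not stated here, and the function names are hypothetical.

```python
def autocorrelation(trace, max_lag):
    """Biased sample ACF for lags 0..max_lag (the trace must not be constant)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [v - mean for v in trace]
    variance = sum(c * c for c in centered) / n
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        acf.append(cov / variance)
    return acf


def mean_square_error(a, b):
    """Mean of squared differences between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def acf_mse(real_trace, synthetic_trace, max_lag):
    """MSE between the ACF of a synthetic trace and the ACF of the real trace."""
    return mean_square_error(autocorrelation(real_trace, max_lag),
                             autocorrelation(synthetic_trace, max_lag))
```

A smaller `acf_mse` value means the synthetic ACF curve lies closer to the reference curve of the real trace.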

Figure 81. LRD properties of three traces by Hurst parameter from the R/S plots (R/S in log scale versus N in log scale): (a) real video trace; (b) synthetic trace by Method 1; (c) synthetic trace by Method 2.

The queuing behavior of the traces is evaluated by means of the packet loss rate and the queuing delay. Packet loss rate and queuing delay are measured at different drain rates and buffer sizes. The network is a leaky bucket with drain rate R and time to drain M/R, where M is the buffer size. The queuing performance of the real trace is shown in Figure 82. The queuing performances of the synthetic traces by both Method 1 and Method 2 look similar. However, the synthetic trace by Method 2 has a smaller MSE in both the packet loss rate and the delay than the synthetic trace by Method 1 (Table 2).

Figure 82. Queuing behavior of the real video trace: (a) packet loss rate; (b) queuing delay.

Conclusion

We proposed new punctured AR processes to model the video traffic. The punctured AR processes are modulated by Markov processes. The punctured AR processes explicitly consider

the timing information between samples of each state. Thus, they outperform the conventional approach in VBR video traffic modeling. A good set of performance metrics is evaluated, showing the strengths of the proposed model in different aspects, especially in the queuing behavior.
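The leaky-bucket evaluation used for the queuing metrics can be sketched as below. This is a simplified illustration, assuming a fluid model in which the trace is given in bits per frame, the drain rate R is in bits per frame interval, loss is measured as the fraction of bits that overflow a buffer of size M, and the delay of each frame is its backlog divided by R. The function name and these conventions are assumptions, not the exact experimental setup.

```python
def leaky_bucket(trace, drain_rate, buffer_size):
    """Feed a trace (bits per frame) through a finite buffer drained at a constant rate.

    Returns (loss_rate, mean_delay): the fraction of bits dropped on overflow and
    the average backlog / drain_rate, measured in frame intervals.
    """
    backlog = 0.0
    lost = 0.0
    delays = []
    for bits in trace:
        backlog += bits
        if backlog > buffer_size:
            # Bits that do not fit into the buffer are dropped
            lost += backlog - buffer_size
            backlog = buffer_size
        # Time needed to drain what is queued, at most buffer_size / drain_rate
        delays.append(backlog / drain_rate)
        backlog = max(0.0, backlog - drain_rate)
    return lost / sum(trace), sum(delays) / len(delays)
```

Running this for a grid of drain rates and buffer sizes on the real and synthetic traces yields loss and delay surfaces that can then be compared through their MSE.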

6. Summary and Future Directions

This thesis provides an error-resilient rate shaping framework for streaming video over packet-loss networks. The challenges in transmitting multimedia data over packet-loss networks call for closer collaboration between the application layer and the network layer. The proposed error-resilient rate shaping acts as a filtering process to adapt the precoded video from the application layer according to the network conditions given by the network layer. After introducing the fundamentals in Chapter 2, Chapter 3 and Chapter 4 constitute the proposed error-resilient rate shaping system for streaming the enhancement layer video and base layer video. FGRS is applied to streaming enhancement layer video and ECARS is applied to streaming base layer video. The contribution of the thesis lies in:

Error-resilient rate shaping for pre source- and channel-coded video
None of the prior rate shaping work considers rate adaptation of pre source- and channel-coded video. Pre source- and channel-coded video is useful for streaming over packet-loss networks. This thesis provides an R-D optimized solution for streaming the pre source- and channel-coded video.
o Given the gain embedded in the bitstream, FGRS and ECARS consume on average <0.01% and <1% (the denominator is the bit rate of the source-coded bitstream), respectively, of the original precoded video to carry the gain information ("meta-data"). The performance improvement of FGRS and ECARS in PSNR over non rate shaping based methods is on average 8 dB. On the other hand, if the gain is not embedded in the bitstream for rate shaping, no extra bits are needed to carry the gain information, but partial decoding is required to obtain it.
o FGRS provides an error-resilient rate shaping scheme for pre channel-coded MPEG-4 FGS bitstream.
o ECARS provides an error-resilient rate shaping scheme for source- and channel-coded bitstream that is aware of the error concealment method used at the receiver.

Two-stage R-D optimization algorithm

The first stage of the two-stage R-D optimization algorithm provides a model-based approach to find the near-optimal solution. With the refinement of the second stage, the proposed two-stage R-D optimization algorithm finds the solution quickly and accurately.

Error-resilient rate shaping vs. error-resilient video coding
The proposed error-resilient rate shaping does not need to alter the original video encoder and decoder, and thus can be adopted by systems in which a tremendous amount of work would be needed to modify the video coders.

Error-resilient rate shaping vs. joint source-channel coding
Joint source-channel coding techniques are limited in that they provide the optimization only at the time of encoding, and are not suitable for streaming the precoded video. The encoded bitstream may not be optimal for transmission along a different path or along the same path at a later time. Rate shaping can optimize the video streaming performance adaptively for each link.

Future work includes replacing simulcast-based streaming with rate shaping. Simulcast is adopted in current video streaming applications. Multiple streams of different qualities are sent concurrently to satisfy the needs of different users with different device capabilities and access bandwidths. We can see that the bandwidth utilization is not efficient given the concurrent transmissions of multiple streams. Moreover, simulcast is not only inflexible in adapting the bit rates according to the available bandwidths, but also intolerant of packet losses. The proposed error-resilient rate shaping provides a solution to overcome the shortcomings of current video streaming with simulcast. The precoded video can be adapted to any bit rate to make use of the available bandwidth. Hence, the video quality will not be constrained to a limited number of quality steps. The R-D optimized decision of rate shaping also guarantees that the shaped bitstream has the best error resiliency given the current network condition.
For example, we can modify End System Multicast (ESM) [13], which currently adopts a simulcast approach for multicasting (Figure 83), to incorporate the proposed rate shaping (Figure 84). With this change, ESM does not need to transmit two video bitstreams concurrently to the host computers. The rate shaping mechanism residing in the parent host computer adapts the rates for the child host computer. Each host computer can enjoy video with fine granular quality.

Figure 83. End system multicast (ESM) with simulcast (diagram labels: Analog Source, Encoder, Audio, Video L, Video H, Host computer, ESM, Quicktime).

Figure 84. End system multicast (ESM) with rate shaping (diagram labels: Analog Source, Encoder, Audio, Video H, Host computer, ESM, Quicktime, Rate Shaping).

Appendix A. Second-Generation Error Concealment

When transmitting video data over error-prone channels, the video data may suffer from losses or errors. Error concealment is an effective way to recover the lost information at the decoder. Compared to other error control mechanisms such as FEC [61] and automatic retransmission request (ARQ) [34], error concealment has the advantages of not consuming extra bandwidth, as FEC does, and not introducing retransmission delay, as ARQ does. On the other hand, error concealment can be used to supplement FEC and ARQ when both FEC and ARQ fail to overcome the transmission errors [5]. Error concealment is performed after error detection. That is, error concealment needs to be preceded by some error detection mechanism to know where the errors in the decoded video are located. For example, error detection provides information as to which part of the received video bitstream is corrupted. Various methods, such as checking the video bitstream syntax, monitoring the packet numbers of the received video data, etc., can be applied [2][23]. In this work, we assume that the errors are located and such information is available to us. We focus on the reconstruction of the lost video.

All error concealment methods reconstruct the lost video content by making use of some a priori knowledge about the video content. Most existing error concealment methods, which we refer to as first-generation error concealment, build such a priori knowledge in a heuristic manner by assuming smoothness or continuity of the pixel values, etc. The proposed second-generation error concealment methods train context-based models as the a priori knowledge. Methods of such a framework have advantages over first-generation error concealment, as the context-based model is created specifically for the video content and hence can capture the statistical variations of the content more effectively. It is important for a second-generation error concealment approach to choose a model that can represent the video content effectively.
Principal component analysis (PCA) has long been used to model the visual content of images. The most well-known example is using eigenfaces to represent human faces [54]. In this work, we introduce two new adaptive models: adaptive

mixture principal component analysis (AMPCA, previously named updating mixture of principal components (UMPC)) [8][9] and adaptive probabilistic principal component analysis (APPCA), for second-generation error concealment. AMPCA and APPCA are very suitable for error concealment applications in that they update with non-stationary video data.

A.1. Adaptive Mixture of Principal Component Analysis (AMPCA)

We consider the single-component case of AMPCA in the following, named APCA. Interested readers can refer to [8][9] for AMPCA (previously named updating mixture of principal components (UMPC)), a more general case of APCA with the number of mixture components greater than one. Given a set of data, we try to model the data with minimum representation error. The data given can be non-stationary, i.e., the stochastic properties of the data are time-varying as shown in Figure 85. For example, at time instant n, the data are distributed as shown in Figure 85 (a). At time instant n', the data are distributed as shown in Figure 85 (b). We see that the mean of the data is shifting and the most representative axes of the data are also rotating.

Figure 85. Non-stationary data at (a) time n (b) time n'

At any time instant, we attempt to represent the data as a weighted sum of the mean and principal axes. As time proceeds, the model changes its mean and principal axes as shown in Figure 86 from Figure 86 (a) to Figure 86 (b), so that it always models the current data effectively. To accomplish this, the representation/reconstruction error of the model evaluated at time instant n should have less contribution from the data that are further away in time from the current time instant n.

Figure 86. APCA for non-stationary data at (a) time n, with mean m^(n) and principal axis u_1^(n); (b) time n', with mean m^(n') and principal axis u_1^(n')

The optimization objective function at time instant n, which tries to minimize the sum of weighted reconstruction errors of all data, can be written as:

\min_{m^{(n)},\, U^{(n)}} \sum_{i=0}^{\infty} \alpha^{i} \left\| x_{n-i} - \hat{x}_{n-i} \right\|^{2}, \qquad \hat{x}_{n-i} = m^{(n)} + \sum_{k=1}^{P} \left[ \left( x_{n-i} - m^{(n)} \right)^{T} u_k^{(n)} \right] u_k^{(n)}    (A.1)

The notations are organized as follows:

n : Current time index
D : Dimension of the data vector
P : Number of eigenvectors
x_{n-i} : Data vector at time n − i, where i represents how far away the data are from the current time instant
m^(n) : Mean at time n
u_k^(n) : k-th eigenvector at time n
U^(n) : Matrix with P columns of u_k^(n), k = 1 ~ P
x̂_{n-i} : Reconstruction of x_{n-i}
α : Decay factor, 0 < α < 1

The reconstruction errors contributed by previous data are weighted by powers of the decay factor α. The powers are determined by how far away the data sample is from the current time instant. At any time instant n, we try to re-estimate or update the parameter (mean or eigenvector) given the parameter estimated at the previous time instant n − 1 and the new data x_n, by minimizing (A.1). The solution of the mean m^(n) that minimizes (A.1) at time n is:

m^{(n)} = \alpha\, m^{(n-1)} + (1 - \alpha)\, x_n    (A.2)

We can see that m^(n) is obtained from the previous estimate m^(n−1) and the current input x_n. The decay factor α tells how fast the new estimate m^(n) adapts to the new data x_n. The smaller the decay factor, the faster the estimated m^(n) adapts to the new data. Similarly, the covariance matrix C^(n) that minimizes (A.1) at time n is:

C^{(n)} = \alpha\, C^{(n-1)} + (1 - \alpha) \left[ \left( x_n - m^{(n)} \right) \left( x_n - m^{(n)} \right)^{T} \right]    (A.3)

Again, C^(n) is obtained from the previous estimate C^(n−1) and the current input x_n. The decay factor α controls how fast the eigenvectors adapt to the new data x_n.

The experimental results of using APCA for error concealment are shown in Figure 87 and Figure 88. Figure 87 shows the updated APCA model at different time instances. Figure 88 shows the concealment result compared with the spatial interpolation method.
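The recursive updates (A.2) and (A.3) can be sketched in plain Python as follows. This is a minimal illustration with hypothetical function names; vectors are lists and matrices are lists of rows. Obtaining the eigenvectors from the updated covariance matrix requires an eigen-decomposition, which is left out of this sketch.

```python
def update_mean(prev_mean, x, alpha):
    """(A.2): exponentially weighted mean update with decay factor alpha."""
    return [alpha * m + (1.0 - alpha) * xi for m, xi in zip(prev_mean, x)]


def update_covariance(prev_cov, x, mean, alpha):
    """(A.3): covariance update using the already updated mean m^(n)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    size = len(x)
    return [[alpha * prev_cov[r][c] + (1.0 - alpha) * diff[r] * diff[c]
             for c in range(size)]
            for r in range(size)]


# One update step for 2-dimensional data with alpha = 0.5
alpha = 0.5
mean = update_mean([0.0, 0.0], [2.0, 4.0], alpha)
cov = update_covariance([[0.0, 0.0], [0.0, 0.0]], [2.0, 4.0], mean, alpha)
```

A smaller `alpha` puts more weight on the newest sample, so both the mean and the covariance track non-stationary data faster at the cost of noisier estimates.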

Figure 87. Updated means and eigenvectors at time instants 20, 22, and 60 (panels: mean and 1st through 6th eigenvectors at Time 20, Time 22, and Time 60)

Figure 88. Sample reconstructed frames of Intra-coded Interview with: (a) no concealment; (b) concealment with spatial interpolation; or (c) concealment with APCA

A.2. Adaptive Probabilistic Principal Component Analysis (APPCA)

In the APCA/AMPCA approach, the APCA/AMPCA model is merely a subspace. There is no probability information associated with the model. We propose a new probabilistic and non-stationary model: adaptive probabilistic principal component analysis (APPCA).

Figure 89. (a) Probabilistic PCA (PPCA) (b) PCA

Given a set of training data in real time, we try to describe them with a statistical model. The data can be non-stationary, that is, the statistical properties of the data are time-varying. With such non-stationary data, we want the statistical model to be trained based on the recent data more than the older data, so as to describe the recent data better. Such a statistical model adjusts its model parameters to adapt to the incoming data. Let us represent the set of training data as Y = (y_n, y_{n-1}, ..., y_{n-i}, ...), where each y_{n-i} is a d-dimensional vector. The indices n, n − 1, ..., n − i, etc., indicate the time, where n is the current time instant, n − 1 is the previous time instant, and so on.

Before we proceed further, let us introduce 1) the weighted sample mean m^(n); and 2) the eigenanalysis of the weighted sample covariance matrix S^(n), of the training data Y, at time instant n. These two results will be used later. The weighted sample mean m^(n) is defined as:

m^{(n)} = \frac{\sum_{i=0}^{\infty} \alpha^{i}\, y_{n-i}}{1 + \alpha + \alpha^{2} + \cdots}    (A.4)

It can be expressed in a recursive form as:

m^{(n)} = \alpha\, m^{(n-1)} + (1 - \alpha)\, y_n    (A.5)

The weighted sample covariance matrix S^(n) is defined as:

S^{(n)} = \frac{\sum_{i=0}^{\infty} \alpha^{i} \left( y_{n-i} - m^{(n-i)} \right) \left( y_{n-i} - m^{(n-i)} \right)^{T}}{1 + \alpha + \alpha^{2} + \cdots} = (1 - \alpha)\, Y_m Y_m^{T}    (A.6)

where Y_m is defined as:

Y_m = \left( \left( y_n - m^{(n)} \right) \;\; \alpha^{1/2} \left( y_{n-1} - m^{(n-1)} \right) \;\; \cdots \;\; \alpha^{i/2} \left( y_{n-i} - m^{(n-i)} \right) \;\; \cdots \right)    (A.7)

Similarly, S^(n) can be written in a recursive form as:

S^{(n)} = \alpha\, S^{(n-1)} + (1 - \alpha) \left( y_n - m^{(n)} \right) \left( y_n - m^{(n)} \right)^{T}    (A.8)

The rank of S^(n) is rank(S^(n)) = r, where r ≤ d. The eigenanalysis result of S^(n) is S^(n) = E Λ E^T, where E = (e_1 e_2 ... e_r), e_i^T e_j = δ_ij, and

\Lambda = \mathrm{diag}\left( \lambda_1, \lambda_2, \ldots, \lambda_r \right)    (A.9)

Now we are ready to introduce the statistical model and the adaptive estimators to obtain the model parameters. The Gaussian latent variable model is:

y = \mu^{(n)} + W^{(n)} x + e    (A.10)

x \sim N(0, I_p)    (A.11)

e \sim N(0, R^{(n)}) = N(0, \varepsilon^{(n)} I_d)    (A.12)

where y is the observed data, μ^(n) is the mean of the data, x is the hidden variable, and e is the noise. To illustrate, the model is shown in Figure 90.

Figure 90. PPCA at (a) time n (b) time n'

To solve for the model parameters, we propose an adaptive maximum likelihood (ML) estimator:

\hat{\theta} = \arg\max_{\theta} \sum_{i=0}^{\infty} \alpha^{i} \ln p\left( y_{n-i} \mid \theta \right) = \arg\max_{\theta} L(\theta)    (A.13)

where we define L(θ) = Σ_{i=0}^{∞} α^i ln p(y_{n-i} | θ). To perform the adaptive ML estimation, let us first write out p(y | θ), where θ = {μ, W, ε}. p(y | θ) can be derived from the pdfs of x and e, which are listed in (A.11) and (A.12):

p\left( y \mid \mu, W, \varepsilon \right) = N\left( \mu,\; W W^{T} + R \right)    (A.14)

The sum of the weighted log-likelihood functions L(θ) can then be expressed as:

L(\theta) = \sum_{i=0}^{\infty} \alpha^{i} \ln p\left( y_{n-i} \mid \theta \right) = -\frac{1}{2} \sum_{i=0}^{\infty} \alpha^{i} \left\{ d \ln(2\pi) + \ln \left| W W^{T} + R \right| + \left( y_{n-i} - \mu \right)^{T} \left( W W^{T} + R \right)^{-1} \left( y_{n-i} - \mu \right) \right\}    (A.15)

To find μ̂, let us take the derivative of L(θ) with respect to μ and set the derivative to zero. μ̂ is therefore:

\hat{\mu} = (1 - \alpha) \sum_{i=0}^{\infty} \alpha^{i}\, y_{n-i} = m^{(n)}    (A.16)

With (A.16), we can express (A.15) as:

L(\theta) = -\frac{1}{2} \cdot \frac{1}{1 - \alpha} \left\{ d \ln(2\pi) + \ln \left| W W^{T} + R \right| + \mathrm{tr}\left[ \left( W W^{T} + R \right)^{-1} S^{(n)} \right] \right\}    (A.17)

To find Ŵ, let us take the derivative of L(θ) with respect to W and set the derivative to zero. Ŵ is:

\hat{W} = E_{(p)} \left( \Lambda_{(p)} - \varepsilon I_p \right)^{1/2}    (A.18)

where E_(p) represents the eigenvectors of S^(n) up to the p-th and Λ_(p) represents the eigenvalues of S^(n) up to the p-th, where r ≥ p. To find ε̂, let us take the derivative of L(θ) with respect to ε and set the derivative to zero. ε̂ is:

\hat{\varepsilon} = \frac{1}{d - p} \sum_{j=p+1}^{r} \lambda_j    (A.19)

The experimental result of using APPCA for error concealment is shown in Figure 91, which compares the error concealment result using APPCA with the result using APCA.

Figure 91. Sample reconstructed frames of Intra-coded Interview with: (a) no concealment; (b) concealment with APCA; or (c) concealment with APPCA
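Given the eigenanalysis of S^(n), the estimates (A.18) and (A.19) reduce to a few arithmetic steps, sketched below. This is a minimal illustration that assumes the eigenvalues are supplied in descending order together with their unit-norm eigenvectors and that d > p. The function name and the list-based representation are hypothetical.

```python
import math


def ppca_ml_estimates(eigenvalues, eigenvectors, d, p):
    """Return (columns of W, epsilon) from the eigenanalysis of S^(n).

    eigenvalues:  the r nonzero eigenvalues, in descending order
    eigenvectors: matching list of d-dimensional unit vectors
    """
    # (A.19): sum of the discarded eigenvalues divided by (d - p)
    eps = sum(eigenvalues[p:]) / (d - p)
    # (A.18): column j of W is the j-th eigenvector scaled by sqrt(lambda_j - eps)
    w_columns = [[math.sqrt(lam - eps) * v for v in vec]
                 for lam, vec in zip(eigenvalues[:p], eigenvectors[:p])]
    return w_columns, eps


# Hypothetical 4-dimensional example with p = 2 retained components
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
w_columns, eps = ppca_ml_estimates([4.0, 2.0, 1.0, 1.0], identity, d=4, p=2)
```

Because the eigenvalues are sorted, each retained eigenvalue is at least as large as the estimated noise variance, so the square root in (A.18) stays real.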
