A High Definition Motion JPEG Encoder Based on ePUMA Platform

Procedia Engineering 29 (2012) 2371-2375, available online at www.sciencedirect.com
2012 International Workshop on Information and Electronics Engineering (IWIEE)

Yanjun Zhang (a), Wenbiao Zhou (a), Zhenyu Liu (a), Siye Wang (a, *), Dake Liu (a, b)
(a) School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
(b) Department of Electrical Engineering, Linköping University, Linköping 51583, Sweden
* Corresponding author. Tel.: +86-10-6891-8279. E-mail address: boyew@bit.edu.cn.

Abstract

ePUMA is a novel parallel DSP platform based on a master-multi-SIMD architecture. Its key technique is to separate data access kernels from algorithm kernels and to run the two types of kernels in parallel, which minimizes the communication overhead of parallel processing. This paper presents a high definition motion JPEG encoder based on the ePUMA platform. The ePUMA processor is reconfigured to act as the main processing unit of the encoder, and the encoder is implemented on an FPGA development board. Results show that the encoder can process high definition video at 1920×1080 and 30 fps.

© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Harbin University of Science and Technology. Open access under CC BY-NC-ND license. ISSN 1877-7058. doi:10.1016/j.proeng.2012.01.317

Keywords: Motion JPEG; DSP; ePUMA

1. Introduction

ePUMA [1], [2] is a domain-specific embedded heterogeneous 9-core chip multiprocessor designed for low power and high silicon efficiency in high-throughput DSP and image processing computations for emerging telecommunication and multimedia applications [3]. The platform is based on a master-multi-SIMD architecture whose key technique is to separate data access from data processing: because data access kernels and algorithm kernels run in parallel, no explicit time is spent on data access. As a result, the ePUMA platform can achieve high processing performance for some applications.

Motion JPEG is a widely used video compression format that compresses each frame of a video sequence separately in JPEG format [4], [5]. This paper presents a high definition motion JPEG encoder as a case study for the ePUMA platform, with the goal of demonstrating the platform's performance. Because ePUMA is scalable, it is reconfigured to meet the requirements of the motion JPEG encoder: besides the master core, only four of the eight SIMD cores are used. Results show that the reconfigured ePUMA fully meets the requirements of motion JPEG encoding.

The rest of the paper is organized as follows. Section 2 introduces the architecture of the ePUMA platform. Sections 3 and 4 give an overview and the detailed design of the motion JPEG encoder. Section 5 presents the implementation of the encoder, and Section 6 concludes the paper.

2. Architecture of ePUMA

The ePUMA master-multi-SIMD architecture is illustrated in Fig. 1(a). It consists of one master controller, eight SIMD coprocessors, and a memory subsystem for on-chip communication. The master processor executes the sequential tasks of an application algorithm, while the SIMD cores run the parallelizable portions. Each SIMD core has a local program memory (PM) and data memory (DM). The DM is a vector memory that exchanges data with the main memory through the central DMA controller. Vector data from one SIMD core can also be sent to any other SIMD core(s) over the packet-based interconnection network, which has eight switching nodes [2].

Fig. 1. (a) Architecture of ePUMA; (b) memory hierarchy.

Fig. 1(b) shows the memory hierarchy of ePUMA, which consists of three layers. The highest layer is the off-chip main memory, which runs at a low clock rate and has the longest access latency from the processing cores. The second layer is the local memory, which serves as the computing buffer and includes both data memory and program memory.
The master controller uses two data memories, plus a cache as its local program memory. The SIMD processors use an eight-bank vector memory as the local data buffer and a simple scratchpad memory for program storage. The lowest layer of the hierarchy is the register files in the master and SIMD processors [2].

3. Overview of the motion JPEG encoder

The complete motion JPEG encoding system is composed of a video capture unit, a data processing unit, and a result storage unit. Fig. 2 shows the block diagram of the encoder, which combines a high-performance processor with several dedicated accelerators to improve performance. The video signals are captured by a video camera and then pre-processed by dedicated circuits, such as a synchronizing unit, a frame counter, and a format converter that translates the Bayer format to RGB. After pre-processing, the video streams are stored in DDR memory. ePUMA is the main processing unit of the encoder: it fetches video data from the DDR memory, processes it according to the motion JPEG specification, and then sends the results to a buffer for Huffman encoding. The Huffman encoder is another accelerator, which performs the entropy coding. The compressed data are then stored on a USB disk by the USB controller, so the compressed video stream can be replayed on a computer.

Fig. 2. Block diagram of the motion JPEG encoder.

4. Detailed design of the motion JPEG encoder

4.1. Data preparation

Before being compressed by the ePUMA processor, the video signals must first be captured and pre-processed. A 5-megapixel video camera from Terasic Co. Ltd was selected in this project to capture the video signal. A camera controller, made up of a clock generator and a parameter configuration unit, was designed to control the camera so that it delivers high image quality at a proper resolution. The clock generator produces the correct clock signal for the camera, and the parameter configuration unit provides the parameters needed for a satisfactory image, including the resolution and the gain adjustment for each of the three colours.

The captured data from the video camera are in Bayer format. To simplify the processing in the ePUMA processor, the Bayer data are translated into RGB format before being sent to ePUMA. Fig. 3 shows an example of an image in Bayer format, in which only one colour value is sampled for each pixel. A bilinear interpolation algorithm is used to construct the RGB image from the Bayer data: each missing colour value of a pixel is the average of the same-colour values in the adjacent pixels. For example:

$$G_{34} = (G_{33} + G_{24} + G_{35} + G_{44})/4$$
$$B_{34} = (B_{43} + B_{45} + B_{23} + B_{25})/4$$
$$R_{35} = (R_{34} + R_{36})/2$$

Fig. 3. Bayer format (R: red, G: green, B: blue; the digits after each colour give the row and column coordinates of the pixel).

As these equations show, the interpolation uses data from three lines of the image.
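Because every output pixel needs samples from three image lines, the interpolation can be computed on a raster stream with a short delay line, which is what the three-line buffer described in the next paragraph (Fig. 4) does in hardware. The Python sketch below is a behavioural model under stated assumptions, not the authors' RTL. It assumes a GRBG mosaic with 0-based coordinates (the paper's 1-based labels shifted by one, so R34 sits at row 2, column 3), truncating integer averages as a shift-based divider would, and it skips border pixels, whose handling the paper does not describe. The function names `bayer_color`, `demosaic_window`, `stream_windows` and `demosaic_stream` are hypothetical. A single deque of 2W + 3 samples stands in for the line buffers and tap registers: it is a functionally similar delay line, not the exact three-buffer structure of Fig. 4.

```python
from collections import deque


def bayer_color(row, col):
    """Colour sampled at (row, col), 0-based; assumes a GRBG mosaic.

    With the paper's 1-based labels shifted by one this matches Fig. 3,
    e.g. R34 -> (2, 3), G33 -> (2, 2), B23 -> (1, 2).
    """
    if row % 2 == 0:
        return "G" if col % 2 == 0 else "R"
    return "B" if col % 2 == 0 else "G"


def demosaic_window(window, row, col):
    """Bilinear demosaicing of the centre of a 3x3 window centred at (row, col).

    The colour sampled at the centre is copied; each missing colour is the
    truncated mean of the same-colour samples in the window (2 or 4 of them).
    """
    rgb = []
    for colour in "RGB":
        if bayer_color(row, col) == colour:
            rgb.append(window[1][1])
            continue
        samples = [window[i][j]
                   for i in range(3) for j in range(3)
                   if bayer_color(row - 1 + i, col - 1 + j) == colour]
        rgb.append(sum(samples) // len(samples))
    return tuple(rgb)


def stream_windows(pixels, width):
    """Yield (row, col, window) for each interior pixel of a raster stream.

    One sample enters per step. The deque delays the stream by two lines plus
    two samples, so taps 0-2, W..W+2 and 2W..2W+2 hold the 3x3 neighbourhood.
    """
    taps = deque(maxlen=2 * width + 3)
    for index, pixel in enumerate(pixels):
        taps.append(pixel)
        if len(taps) < taps.maxlen:
            continue
        # The window centre lags the newest sample by one line and one pixel.
        row, col = divmod(index - width - 1, width)
        if col == 0 or col == width - 1:
            continue  # window would wrap across a line boundary
        window = [[taps[r * width + c] for c in range(3)] for r in range(3)]
        yield row, col, window


def demosaic_stream(pixels, width):
    """Yield (row, col, (R, G, B)) for every interior pixel."""
    for row, col, window in stream_windows(pixels, width):
        yield row, col, demosaic_window(window, row, col)
```

As in the hardware, one RGB result is produced per incoming sample once the delay line has filled. Feeding a linear ramp such as `value = 10 * row + col` is a quick sanity check: bilinear interpolation reproduces linear gradients exactly, so every interior pixel comes out with R = G = B equal to its ramp value.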
Therefore, a three-line buffer is designed to pipeline the data, as shown in Fig. 4. It contains three consecutive line buffers, each as long as one image row, and two registers that buffer the output of each line buffer. The outputs of the line buffers and registers form a 3×3 matrix corresponding to a 3×3 block of the image, from which the three colour values of the central pixel are generated. In each clock cycle, one datum is fed into the line buffer and the RGB values of one pixel are computed.

Fig. 4. Three-line buffer architecture.

4.2. ePUMA configuration and programming

ePUMA is the main processor of the video encoder and is reconfigured according to the encoder's requirements: only four SIMD cores, together with the master processor, are used in this project. The master core acts as the management unit. It controls the DMA to move the video stream from DDR memory to the local memory of each SIMD processor, and to move the results from the local memories to the buffer of the Huffman encoder. The master also starts each of the four SIMD processors at the right time.

Fig. 5. (a) Data flow architecture of the encoder; (b) partitioning of an image into slices.

To compress the video in parallel, each image is partitioned in the vertical direction into several slices, as shown in Fig. 5(b). Following the motion JPEG specification, each slice consists of 16 lines, so it can be divided into 8×8 blocks. A slice is the smallest unit of work processed by a SIMD core. During compression, slices are sent to the SIMD processors in turn: the first slice goes to SIMD1, the second to SIMD2, and so on, and after the fourth slice is sent to SIMD4, the fifth slice goes to SIMD1 again. The SIMD programs compress each input slice, performing RGB to YUV conversion, DCT, quantization, and zig-zag scanning. In this architecture the DMA must both prepare data for four SIMD processors and read out their results, so the required DMA throughput is very high.
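For reference, the per-block work that each SIMD core performs (colour conversion, DCT, quantization and zig-zag scan) can be written as a short floating-point model. The sketch below is plain Python, not ePUMA SIMD code, and several of its choices are assumptions rather than details from the paper: it uses the JFIF full-range RGB-to-YCbCr equations for the "YUV" conversion, a direct (unoptimised) 8×8 DCT-II as defined in ITU-T T.81, the example luminance quantization table from Annex K of that standard, and Python's `round` for quantization. Chroma subsampling is omitted, and Huffman coding is left out because the encoder performs it in a separate accelerator. The names `rgb_to_ycbcr`, `dct_8x8`, `encode_block`, `LUMA_QTABLE` and `ZIGZAG` are hypothetical.

```python
import math

# Example luminance quantization table (ITU-T T.81, Annex K, Table K.1).
# Assumption: the paper does not say which tables the encoder uses.
LUMA_QTABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Zig-zag order: walk the anti-diagonals (row + col constant), alternating
# direction, which yields (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else -p[0]))


def rgb_to_ycbcr(r, g, b):
    """JFIF full-range conversion of one 8-bit RGB pixel (assumed variant)."""
    def clamp(x):
        return max(0, min(255, round(x)))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return clamp(y), clamp(cb), clamp(cr)


def dct_8x8(block):
    """Direct 2-D DCT-II of an 8x8 block of level-shifted samples."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs


def encode_block(samples, qtable=LUMA_QTABLE):
    """Level-shift, DCT, quantize and zig-zag scan one 8x8 block of one channel."""
    shifted = [[s - 128 for s in row] for row in samples]
    coeffs = dct_8x8(shifted)
    quantized = [[round(coeffs[u][v] / qtable[u][v]) for v in range(8)]
                 for u in range(8)]
    return [quantized[r][c] for r, c in ZIGZAG]
```

A flat block of value 136 gives a DC coefficient of 8 × (136 - 128) = 64, which quantizes to 64 / 16 = 4, followed by 63 zeros: the long zero run that the downstream Huffman stage can code compactly. A real SIMD kernel would normally use a fast, fixed-point DCT instead of this O(n^4) loop; the loop is kept here only because it mirrors the definition.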
A software pipelining technique is adopted to reduce the data access time. The key idea is to hide the data access time behind the data processing time. As shown in Fig. 6, SIMD1 begins working on the first slice while the DMA prepares data for SIMD2; once those data are ready, SIMD2 begins working and the DMA starts preparing data for SIMD3, and so on. In this way the data access time is hidden behind the processing time, and the four SIMD processors can work continuously without interruption.
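The staggered schedule of Fig. 6 can be explored with a small event-level model. The Python sketch below is illustrative only: time is in abstract units, one DMA channel loads slices in order, slice i goes to SIMD core i mod 4 as described above, and result read-out is ignored. Because the paper does not specify the buffering, it also assumes that a core can receive its next slice once it has started computing the previous one (double buffering in its local data memory). The function names `pipeline_schedule` and `core_idle_time` are hypothetical.

```python
def pipeline_schedule(num_slices, num_simd, load_time, compute_time):
    """Simulate one DMA channel feeding slices round-robin to SIMD cores.

    Slice i goes to core i % num_simd, and the DMA loads slices one at a time.
    Assumption: each core holds two slice buffers, so slice i may be loaded
    once slice i - num_simd (same core) has started computing.
    Returns (slice, core, load_start, compute_start, compute_end) tuples.
    """
    dma_free = 0
    core_free = [0] * num_simd
    schedule = []
    for i in range(num_slices):
        core = i % num_simd
        buffer_free = schedule[i - num_simd][3] if i >= num_simd else 0
        load_start = max(dma_free, buffer_free)
        load_end = load_start + load_time
        dma_free = load_end
        # A core starts once its data are loaded and its previous slice is done.
        compute_start = max(load_end, core_free[core])
        compute_end = compute_start + compute_time
        core_free[core] = compute_end
        schedule.append((i, core, load_start, compute_start, compute_end))
    return schedule


def core_idle_time(schedule, num_simd):
    """Total time cores wait between consecutive slices (initial fill excluded)."""
    idle = 0
    last_end = [None] * num_simd
    for _, core, _, start, end in schedule:
        if last_end[core] is not None:
            idle += start - last_end[core]
        last_end[core] = end
    return idle
```

In this model the cores never stall once the pipeline has filled, provided the DMA can load four slices in the time a core needs to compute one (`compute_time >= num_simd * load_time`); with a shorter compute time the cores wait for data. For scale, assuming 16-line slices and padding 1080 lines up to 68 slices per frame (the paper does not say how the last partial slice is handled), a 100 MHz clock at 30 fps leaves about 3.33 million cycles per frame. That is about 49,000 cycles per slice load for the DMA and, with 17 slices per core per frame, about 196,000 cycles of compute per slice, or roughly 6.4 cycles per pixel for a 1920 × 16 slice.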

Fig. 6. Software pipeline architecture.

5. Implementation results

The motion JPEG encoder is implemented on an FPGA board from Terasic Co. Ltd [6]. Table 1 lists the FPGA resource utilization. The results show that with a 100 MHz clock, the encoder can compress high definition video at 1920×1080 and 30 fps.

Table 1. Resource utilization in FPGA

Item                          Value
Logic utilization             41%
Combinational ALUTs           125,017 / 424,960 (29%)
Memory ALUTs                  1,636 / 212,480 (1%)
Dedicated logic registers     58,464 / 424,960 (14%)
Total registers               58,896
Total block memory bits       11,538,124 / 21,233,664 (54%)
DSP block 18-bit elements     134 / 1,024 (13%)

6. Conclusion

In this paper, a high definition motion JPEG encoder is designed based on a reconfigured ePUMA platform. An architecture that combines processors with accelerators is used to improve performance: the main processing is performed in the ePUMA processor, while the data preparation and Huffman encoding are implemented as accelerators. When programming the ePUMA processor, software pipelining is used to hide the data access time behind the data processing time. The encoder is implemented on an FPGA board, and the results show that with a 100 MHz clock it can compress high definition video at 1920×1080 and 30 fps.

References

[1] D. Liu, J. Sohl, J. Wang: Parallel Programming and its Architectures Based on Data Access Separated Algorithm Kernels. International Journal of Embedded and Real-Time Communication Systems, 1(1), 64-84, January-March 2010.
[2] J. Wang, J. Sohl, O. Kraigher, D. Liu: ePUMA: A Novel Multi-core DSP Platform for Predictable Computing. International Conference on Information and Electronics Engineering, 2010.
[3] E. Hansson, J. Sohl, C. Kessler, D. Liu: Case Study of Efficient Parallel Memory Access Programming for the Embedded Heterogeneous Multicore DSP Architecture ePUMA. 2011 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), pp. 624-629, 2011.
[4] D. T. Vo, T. Q. Nguyen: Quality Enhancement for Motion JPEG Using Temporal Redundancies. IEEE Transactions on Circuits and Systems for Video Technology, 18(5), 609-619.
[5] G. K. Wallace: The JPEG Still Picture Compression Standard. IEEE Transactions on Consumer Electronics, 38(1), xviii-xxxiv, 1992.
[6] Terasic, development board provider, http://www.terasic.com.