On Optimization of Finite-Difference Time-Domain (FDTD) Computation on Heterogeneous and GPU Clusters


Ramtin Shams and Parastoo Sadeghi
College of Engineering and Computer Science (CECS), The Australian National University, Canberra ACT 0200, Australia

Preprint submitted to Elsevier, May 2010

Abstract

A model for the computational cost of the finite-difference time-domain (FDTD) method, irrespective of implementation details or the application domain, is given. The model is used to formalize the problem of optimal distribution of computational load to an arbitrary set of resources across a heterogeneous cluster. We show that the problem can be formulated as a minimax optimization problem and derive analytic lower bounds for the computational cost. The work provides insight into optimal design of FDTD parallel software. We also propose an efficient algorithm for partitioning of the computational domain for load-balancing FDTD computations across an arbitrary cluster. We demonstrate that significant performance gains, as much as 75%, can be achieved by proper load distribution.

Key words: Finite-Difference Time-Domain (FDTD), Heterogeneous Computing, Parallel Processing, Graphics Processing Unit (GPU), Optimization

1 Introduction

The finite-difference time-domain (FDTD) method, since its introduction by Yee [1], has been widely used to obtain numerical solutions of Maxwell's equations for a broad range of problems. The applications of FDTD in electrodynamics include antenna and radar design, electronic and photonic circuit design, microwave tomography, cellular and wireless network simulation, mobile phone safety studies, and many more [2]. The method is not limited to electrodynamics and can be used to solve other spatiotemporal partial differential equations such as those occurring in acoustics (e.g. see [3]). The explicit nature of the FDTD formulation, its simplicity, accuracy and robustness, together with a well established theoretical framework, have contributed to a seemingly unending popularity of the method.

Realistic FDTD simulations involve fine discretization of the spatial domain as well as the temporal domain. It is not uncommon to use spatial grids of $10^{9}$ and spatiotemporal grids of $10^{13}$ or more cells. As a result, FDTD simulations require significant computational resources both in terms of memory and execution time. It is inevitable that many realistic simulations exceed the memory limitations of a single computer and have to be divided across a cluster of computers. The additional incentive to distribute FDTD computation is to minimize execution time where resources are available. This, however, introduces the additional cost of intercommunication between compute nodes in order to execute FDTD simulations on the entire computational domain of the problem.

Parallelization and acceleration of FDTD has been an active area in recent years. In particular, there have been several examples of FDTD acceleration on field programmable gate array (FPGA) hardware and graphics processing units (GPUs). A list of recent contributions in this area is given later in Section 2.2.

Traditionally, computer clusters have been built exclusively from homogeneous compute nodes. With the introduction of accelerator technologies and their rise in popularity for general purpose computing, this is no longer the case. The current generation of accelerators requires a traditional computer host and hence by definition creates hybrid nodes when clustered. A notable example of one such cluster is IBM's Roadrunner supercomputer, which comprises 13,824 Opteron cores and 116,640 Cell processor cores. According to technical staff at Los Alamos (footnote 1), the applications are typically designed for execution on Cell processors and, except for trivial housekeeping and data transfer to and from Cell processors, the Opteron cores remain idle most of the time. This represents a significant computing capacity that remains under-utilized.
Our motivation is to maximize use of available computational resources, whether across an organization's network or on purpose-built heterogeneous clusters, towards solving larger computational problems. A heterogeneous cluster is defined as a group of computational resources with differing technology, execution capability, memory size, and speed. A heterogeneous cluster may comprise accelerators (e.g. GPUs, Cell processors, FPGA boards), desktop computers, and server computers. While heterogeneous resources are commonly found on any modern network, they are rarely used as heterogeneous clusters, particularly across technology boundaries (e.g. x86 and PowerPC). The two main impediments to effective use of heterogeneous clusters are (a) the additional effort involved in development of heterogeneous applications and (b) the need to design an optimal load distribution scheme across heterogeneous resources. In this work, we assume the reader is sufficiently motivated to tackle the former problem and focus on the latter in the context of FDTD parallelization, where we look at the problem of optimal distribution of FDTD computation across a heterogeneous cluster.

Footnote 1: According to discussions at the Path to Petascale Workshop, April 2009, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign.

1.1 Contributions

This work provides insight into the optimal design of a heterogeneous FDTD application by
(1) modeling the cost of FDTD computation on a heterogeneous cluster (Section 3.1);
(2) formalizing the load distribution problem as a minimax optimization problem (Section 3.2);
(3) deriving analytic lower bounds for the execution cost of FDTD on a heterogeneous cluster (Section 3.4.1); and
(4) proposing a heuristic algorithm for efficient distribution of load to an arbitrary cluster of computational resources (Section 4).

We would like to clarify from the outset that this work is not a software development effort or a specific parallel implementation of FDTD. It is an implementation-agnostic analysis of the inherent computational limitations of FDTD on the most general class of computational clusters (i.e. heterogeneous clusters). It also proposes a near optimal load distribution algorithm for FDTD implementation for heterogeneous clusters in general and for homogeneous clusters as a subset.

We note that a myriad of parallel implementations of FDTD on homogeneous clusters exist, in the form of commercial packages, open source libraries, and scholarly research, that can benefit from this work. These parallelization efforts cover a wide range of hardware from supercomputers and general purpose programmable CPUs to GPU, application-specific integrated circuit (ASIC) and FPGA clusters. We would particularly like to emphasize the lack of FDTD software that can efficiently run across such technology boundaries in a true heterogeneous fashion.

2 Concepts

2.1 An Overview of FDTD Method

In this section, we provide a brief overview of the FDTD method based on the 3D Maxwell electromagnetic equations. We delve into the subject only to the extent needed for the reader to appreciate the computational model of FDTD presented in Section 3.1. A careful treatment of the subject is outside the scope of this paper and the reader is referred to [2] for detailed discussions.

Maxwell's curl equations for linear, isotropic, lossy and non-dispersive media are given by

$$-\mu \frac{\partial \mathbf{H}}{\partial t} = \nabla \times \mathbf{E} + \sigma_m \mathbf{H} + \mathbf{M}, \tag{1}$$

$$-\epsilon \frac{\partial \mathbf{E}}{\partial t} = -\nabla \times \mathbf{H} + \sigma_e \mathbf{E} + \mathbf{J}, \tag{2}$$

where $\mathbf{H}$ is the magnetic field, $\mathbf{E}$ is the electric field, $\mu$ is the magnetic permeability, $\epsilon$ is the electric permittivity, $\mathbf{M}$ is the equivalent magnetic current density, $\mathbf{J}$ is the electric current density, $\sigma_m$ is the equivalent magnetic conductivity, and $\sigma_e$ is the electric conductivity. For the purpose of FDTD simulations the $\mathbf{H}$ and $\mathbf{E}$ fields are unknown and all other quantities are given at each point in space. The equations embody 6 partial differential equations; for example, the derivative of the x-component of the electric field with respect to time is given by

$$-\epsilon \frac{\partial E_x}{\partial t} = \frac{\partial H_y}{\partial z} - \frac{\partial H_z}{\partial y} + \sigma_e E_x + J_x. \tag{3}$$

The equations are discretized in space and time to derive an explicit solution for the next time step. Due to the mutual dependence of the $\mathbf{E}$ and $\mathbf{H}$ components, it is best to interleave values of $\mathbf{E}$ and $\mathbf{H}$ in time with a $\Delta t/2$ time difference between them. For example, $\mathbf{H}$ can be computed at $n\Delta t$ and $\mathbf{E}$ computed with a half interval shift at $n\Delta t + \Delta t/2$. Similarly, the $\mathbf{E}$ and $\mathbf{H}$ components are staggered in space according to an arrangement known as the Yee cell [1,2]. For convenience we denote a function $u(i\Delta x, j\Delta y, k\Delta z, n\Delta t)$ by $u|^{n}_{i,j,k}$. Using this notation, (3) can be discretized on a cuboid grid of $(\Delta x, \Delta y, \Delta z)$ using second order accurate central differences as

$$-\epsilon_{i,j,k}\,\frac{E_x|^{n+\frac{1}{2}}_{i,j+\frac{1}{2},k+\frac{1}{2}} - E_x|^{n-\frac{1}{2}}_{i,j+\frac{1}{2},k+\frac{1}{2}}}{\Delta t} = \frac{H_y|^{n}_{i,j+\frac{1}{2},k+1} - H_y|^{n}_{i,j+\frac{1}{2},k}}{\Delta z} - \frac{H_z|^{n}_{i,j+1,k+\frac{1}{2}} - H_z|^{n}_{i,j,k+\frac{1}{2}}}{\Delta y} + \sigma_{e\,i,j,k}\, E_x|^{n}_{i,j+\frac{1}{2},k+\frac{1}{2}} + J_x|^{n}_{i,j,k}. \tag{4}$$

Since $E_x$ is not computed at $n\Delta t$, it can be approximated by

$$E_x|^{n}_{i,j+\frac{1}{2},k+\frac{1}{2}} \approx \frac{E_x|^{n+\frac{1}{2}}_{i,j+\frac{1}{2},k+\frac{1}{2}} + E_x|^{n-\frac{1}{2}}_{i,j+\frac{1}{2},k+\frac{1}{2}}}{2}, \tag{5}$$

and we have

$$E_x|^{n+\frac{1}{2}}_{i,j+\frac{1}{2},k+\frac{1}{2}} = \left(\frac{\epsilon_{i,j,k}/\Delta t - \sigma_{e\,i,j,k}/2}{\epsilon_{i,j,k}/\Delta t + \sigma_{e\,i,j,k}/2}\right) E_x|^{n-\frac{1}{2}}_{i,j+\frac{1}{2},k+\frac{1}{2}} - \left(\frac{1}{\epsilon_{i,j,k}/\Delta t + \sigma_{e\,i,j,k}/2}\right)\left[\frac{H_y|^{n}_{i,j+\frac{1}{2},k+1} - H_y|^{n}_{i,j+\frac{1}{2},k}}{\Delta z} - \frac{H_z|^{n}_{i,j+1,k+\frac{1}{2}} - H_z|^{n}_{i,j,k+\frac{1}{2}}}{\Delta y} + J_x|^{n}_{i,j,k}\right]. \tag{6}$$

Based on (6), $E_x$ at each grid point and at time $n\Delta t + \Delta t/2$ can be computed from values of $\mathbf{E}$ and $\mathbf{H}$ at previous times. Similar equations can be derived for the other components of the $\mathbf{E}$ and $\mathbf{H}$ fields. These equations allow the field values to be computed explicitly at an arbitrary time index by marching through all previous time indices. We note that the rather unusual notation involving half indices such as $j + \frac{1}{2}$ is due to the arrangement of the field components in the Yee cell. We refer the reader to [2] if a detailed explanation of the notation is desired.

The discrete grid discussed above, also known as the computational domain, represents a finite, discrete and bounded model of some real domain of interest. If the simulation is sufficiently long, the wavefronts reach the boundaries of the computational domain. With no information about the medium outside the boundary, the wavefronts cannot naturally progress and are reflected back into the computational domain. In effect, the boundary of the computational domain acts as a reflective barrier, which in most circumstances is undesirable and a source of error and clutter. A significant body of work has been dedicated to designing absorbing boundary conditions (ABCs) to address this problem [2]. Commonly used ABCs include Mur's ABC [4], the perfectly matched layer (PML) [5] and its variants such as the uniaxial PML (UPML) [6,7] and the convolutional PML (CPML) [8]. Without getting into the details of different types of ABCs and their characteristics, we note that implementation of the boundary conditions involves additional computation at one or several layers of cells on the border of the computational domain. This makes the computation of boundary cells more expensive than regular cells.

As previously noted, use of FDTD is not limited to the solution of Maxwell's equations or to two or three dimensions. In the following sections, a general d-dimensional ($d > 1$) Cartesian computational domain is assumed.

2.2 Parallelization of FDTD

From the discussion of the previous section it should be clear that the FDTD method naturally lends itself to parallelization. At each time step, updating a cell, for example with (6), requires values of the field components of the given cell and its neighboring cells in the previous time step. In a more general setting where higher order discretization or non-linear wave equations are involved, one may require field values from several previous time steps. Regardless of the complexity of the wave equations, FDTD ensures that each cell update is independent of its neighbors in the current time step. This makes FDTD computations well suited to parallelization.

Several parallel implementations of FDTD have appeared in the literature. The efforts cover a wide range of parallel architectures and hardware including symmetric multiprocessor (SMP) clusters, FPGA hardware, GPUs, and distributed shared memory (DSM) systems. A non-exhaustive summary (post 2000) is given in Table 1, which serves to demonstrate the degree of interest in this problem. SMP clusters typically use a combination of OpenMP [9] for parallelization on a single node and the message passing interface (MPI) [10] for parallelization across multiple nodes.

There has been more interest in using GPUs for acceleration of FDTD in recent years. FDTD is memory intensive and standard caching mechanisms on CPUs are not well suited to FDTD memory access patterns. The latest generation of GPUs, on the other hand, is specifically suited to this task as it gives programmers control over loading data into a small but efficient shared memory that can significantly boost memory access.
In addition, the bandwidth of device memory on a GPU may exceed 100 GB/s, which is at least an order of magnitude higher than standard host memory.
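As a minimal sketch of the leapfrog update described in Section 2.1, the following Python fragment advances a lossless 1D grid in normalized units. The function name `fdtd_1d`, the Courant number of 0.5, the Gaussian soft source and the fixed end cells are illustrative assumptions and are not taken from this paper; no absorbing boundary condition is implemented.

```python
import math

def fdtd_1d(n_cells=400, n_steps=150, courant=0.5, src=200):
    """Leapfrog (Yee) update of a lossless 1D grid in normalized units."""
    e = [0.0] * n_cells          # E sampled at integer grid points
    h = [0.0] * (n_cells - 1)    # H sampled at half-integer grid points
    for n in range(n_steps):
        # H update: every cell reads only E values from the previous half step,
        # so the iterations of this loop are independent of each other.
        for i in range(n_cells - 1):
            h[i] += courant * (e[i + 1] - e[i])
        # E update: reads only the H values just computed. The two end cells
        # are never updated, so they act as a reflecting boundary.
        for i in range(1, n_cells - 1):
            e[i] += courant * (h[i] - h[i - 1])
        # Assumed soft source: a Gaussian pulse added at one cell.
        e[src] += math.exp(-((n - 30) / 10.0) ** 2)
    return e, h

e_field, h_field = fdtd_1d()
```

Within one time step, each inner loop touches only values from the previous half step, which is the independence property that makes the cell updates parallelizable.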

Table 1
A sample list of parallel FDTD contributions in the literature.

Application | ABC (1) | Platform | Perf (2) | Group | Year
3D FDTD | PML | Cray T3E (6 300 MHz), MPI | 8.8 | Guiffaut [11] | 2001
2D FDTD | Mur | Custom, 100 MHz | 6.3 | Kawaguchi [12] |
D FDTD | Mur | Firebird FPGA board (@ 70 MHz) | 3.8 | Chen [13] |
D FDTD | PML | Xilinx Virtex-II 8000 FPGA | 30 | Durbano [14] |
D FDTD | Mur (3) | GeForce FX 5800 Ultra, OpenGL | 82 | Krakiwsky [15] |
D FDTD | PML | IBM RS/6000 SP (8 nodes/ MHz), OpenMP/MPI | 36.1 | Hughes [16] |
3D FDTD | None | GTX 8800 (16 MP/128 cores), OpenGL/Cg | | Adams [17] |
D FDTD | Mur | GTX 280 (30 MP/240 cores), CUDA (4) | | Stefanski [18] |
D ADI-FDTD | Mur | GTX 280 (30 MP/240 cores), CUDA | 40 | Stefanski [18] |
D FDTD | PEC | GTX 280 (30 MP/240 cores), CUDA | - | Liuge [19] |
D FDTD | None | GTX 280 (30 MP/240 cores), CUDA | 795 | Takada [20] | 2009

(1) Absorbing Boundary Condition
(2) Performance in MCells/s
(3) A combination of 1st order Mur and periodic boundary conditions is used.
(4) Compute Unified Device Architecture [21]

3 Method

3.1 Modeling Computational Cost of FDTD on a Heterogeneous Cluster

Consider a hyper-rectangular FDTD computational domain $\Omega$, such as the 2-rectangle shown in Fig. 1, partitioned into a number of non-overlapping hyper-rectangular sub-domains, each denoted by $\Omega_i$ with a boundary given by $\partial\Omega_i$. The sub-domains are non-overlapping (except on the boundary), i.e.

$$\bigcup_i \Omega_i = \Omega, \qquad \Omega_i \cap \Omega_j = \emptyset \ \text{ for } i \neq j. \tag{7}$$

Each partition is mapped to a computational resource such as a multi-core CPU, a GPU or a Cell processor, and the number of partitions equals the number of available computational resources. We denote the set of all such partitions with n elements by $\Gamma_n$. For each partition the cost of a single time-marching step is broken down into three components:

(1) the cost of updating regular cells, which is proportional to the size of the partition $|\Omega_i|$,
(2) the additional cost of updating cells on the inner boundary of the partition with neighboring partitions, due to the need to load information from neighboring partitions through an interconnection between the associated resources, which depends on the size of the common boundary between a given partition and its neighboring partitions $|\partial\Omega_i \cap \partial\Omega_j|$, and
(3) the additional cost of updating cells on the outer boundary of the partition with the boundary of the domain (typically on an absorbing layer), which is proportional to the size of the common boundary between the partition and the computational domain $|\partial\Omega_i \cap \partial\Omega|$.

Fig. 1. A 2D computational domain divided into 5 partitions. The boundary of the third partition is shown with a thicker border for emphasis.

Ignoring the additional cost of handling source cells (which typically comprise only a small number of cells), the cost of the time-marching algorithm (in terms of execution time) for the i-th partition can be written as

$$t_i = \alpha_i |\Omega_i| + \sum_{j \neq i} \beta_{ij} |\partial\Omega_i \cap \partial\Omega_j| + \gamma_i |\partial\Omega_i \cap \partial\Omega|, \tag{8}$$

where $|\Omega_i|$ is the size of the partition, $|\partial\Omega_i|$ is the size of the partition boundary, and $\alpha_i$, $\beta_{ij}$ and $\gamma_i$ are constants of proportionality that relate the size of partitions and boundaries to the respective cost of execution. Note that these constants are determined by the computational capability of each resource and the throughput of the interconnect between resources i and j, and are independent of the domain partitioning. The first term in (8) captures the cost of executing one time-marching step on the interior of the partition, where all the information needed to compute the next time instance for a given cell resides within the computational resource. The second term represents the additional cost associated with the transfer of data from resource j to resource i to enable computation of the next time step for boundary cells. The third term represents the additional cost of computing absorbing boundary conditions on the boundary of the domain.
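The cost model in (8) can be sketched as a small Python function. The function name `partition_cost`, its argument layout and the numbers in the usage line are hypothetical and serve only to exercise the formula.

```python
def partition_cost(alpha, size, neighbours, gamma, outer_boundary):
    """Per-step cost t_i of one partition, following (8).

    alpha          -- cost per regular cell (alpha_i)
    size           -- number of cells in the partition (|Omega_i|)
    neighbours     -- list of (beta_ij, shared_boundary_size) pairs,
                      one pair per neighbouring partition j
    gamma          -- cost per cell on the outer (absorbing) boundary (gamma_i)
    outer_boundary -- size of the boundary shared with the domain boundary
    """
    compute = alpha * size                                   # first term of (8)
    transfer = sum(beta_ij * shared for beta_ij, shared in neighbours)
    absorbing = gamma * outer_boundary                       # third term of (8)
    return compute + transfer + absorbing

# Hypothetical numbers, chosen only to exercise the formula.
t_i = partition_cost(alpha=2.0, size=50,
                     neighbours=[(0.5, 10), (1.0, 5)],
                     gamma=3.0, outer_boundary=15)
```

With these inputs the three terms are 100, 10 and 45, so `t_i` evaluates to 155.0.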

3.2 Problem formulation

All resources need to complete their computation of the current time step before they can proceed to the next step. In other words, a barrier synchronization primitive is required at the end of each time step iteration. This requires faster resources to wait for slower resources to complete their task, and hence the cost of a single iteration is given by

$$t_m = \max_i \Big\{ \alpha_i |\Omega_i| + \sum_{j \neq i} \beta_{ij} |\partial\Omega_i \cap \partial\Omega_j| + \gamma_i |\partial\Omega_i \cap \partial\Omega| \Big\}. \tag{9}$$

We are now ready to formalize the problem: we seek the optimal distribution of load to a heterogeneous cluster that minimizes (9). This is a minimax problem over disjoint n-element partitions of the computational domain $\Omega$:

$$t_{opt} = \min_{\Gamma_n} \max_i \Big\{ \alpha_i |\Omega_i| + \sum_{j \neq i} \beta_{ij} |\partial\Omega_i \cap \partial\Omega_j| + \gamma_i |\partial\Omega_i \cap \partial\Omega| \Big\}. \tag{10}$$

Finding the optimal partitions based on (10) is far from trivial. This is because the geometry and position of the partitions need to be known in order to compute the overlap between neighboring partitions and between partitions and the exterior of the domain. However, as shown in the following sections, it is possible to find analytic lower bounds for $t_{opt}$ (under certain conditions) that are independent of the partitioning scheme and hence provide insight into achievable performance levels without the need to directly solve (10). The problem needs to be relaxed in order to achieve these goals.

3.3 Relaxing the Problem

Let $\beta_i = \min_j \{\beta_{ij}\}$ for $j \neq i$; by replacing $\beta_{ij}$ with $\beta_i$ in (10) and using

$$\sum_{j \neq i} |\partial\Omega_i \cap \partial\Omega_j| = |\partial\Omega_i| - |\partial\Omega_i \cap \partial\Omega|, \tag{11}$$

we have

$$t_{opt} \geq \min_{\Gamma_n} \Big\{ \max_i \big\{ \alpha_i |\Omega_i| + \beta_i |\partial\Omega_i| + (\gamma_i - \beta_i) |\partial\Omega_i \cap \partial\Omega| \big\} \Big\}. \tag{12}$$
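For readers who want the intermediate step, the passage from (10) to (12) can be spelled out per partition as follows. This is only a worked restatement using the symbols defined above, with $\beta_i = \min_{j \neq i} \beta_{ij}$.

```latex
\begin{align*}
\sum_{j \neq i} \beta_{ij}\,|\partial\Omega_i \cap \partial\Omega_j|
  &\geq \beta_i \sum_{j \neq i} |\partial\Omega_i \cap \partial\Omega_j|
   = \beta_i \big( |\partial\Omega_i| - |\partial\Omega_i \cap \partial\Omega| \big), \\
t_i
  &\geq \alpha_i |\Omega_i| + \beta_i |\partial\Omega_i|
   + (\gamma_i - \beta_i)\,|\partial\Omega_i \cap \partial\Omega| .
\end{align*}
```

Taking the maximum over $i$ and then the minimum over $\Gamma_n$ on both sides preserves the inequality, which gives the bound with the three terms $\alpha_i |\Omega_i|$, $\beta_i |\partial\Omega_i|$ and $(\gamma_i - \beta_i)|\partial\Omega_i \cap \partial\Omega|$.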

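Assuming the third term (involving $|\partial\Omega_i \cap \partial\Omega|$) in the above equation is nonnegative ($\gamma_i \geq \beta_i$ for all i), we can write

$$t_{opt} \geq \min_{\Gamma_n} \Big\{ \max_i \big\{ \alpha_i |\Omega_i| + \beta_i |\partial\Omega_i| \big\} \Big\}. \tag{13}$$

This is not an unreasonable assumption, given that computing boundary conditions is typically a more expensive task. Finding a solution for the right-hand side of (13) provides a lower bound for $t_{opt}$. This is also equivalent to solving a special case of (10) where the normalized cost of data transfer between a resource and all other resources is the same (i.e. $\beta_{ij} = \beta_i$) and the normalized cost of computing boundary conditions is assumed to be the same as the cost of transferring boundary information between the resources (i.e. $\gamma_i = \beta_i$).

We also relax the partitioning condition (7) such that we only require the sum of partition sizes to be equal to the size of the domain. This means that we ignore the need to properly pack the partitions in the given domain, at least for now. We denote this reduced optimization problem by

$$t_{opt} = \min_{\Gamma_n} \Big\{ \max_i \big\{ \alpha_i |\Omega_i| + \beta_i |\partial\Omega_i| \big\} \Big\}, \qquad \sum_i |\Omega_i| = |\Omega|. \tag{14}$$

Lemma: If $\Omega = \{\Omega_1, \ldots, \Omega_n\}$ is a solution of (14), so is $\tilde{\Omega} = \{\tilde{\Omega}_1, \ldots, \tilde{\Omega}_n\}$ where $|\tilde{\Omega}_i| = |\Omega_i|$ and the partitions of $\tilde{\Omega}$ are hyper-cubes.

Proof: Out of all hyper-rectangles of a given size, the hyper-cube has the least boundary size, hence $|\partial\tilde{\Omega}_i| \leq |\partial\Omega_i|$ and $\alpha_i |\tilde{\Omega}_i| + \beta_i |\partial\tilde{\Omega}_i| \leq \alpha_i |\Omega_i| + \beta_i |\partial\Omega_i|$. So the cost of computation for no resource under $\tilde{\Omega}$ is higher than under $\Omega$, and $\tilde{\Omega}$ must be a solution as well.

For a d-dimensional hyper-cube we have $|\partial\tilde{\Omega}_i| = 2d\,|\tilde{\Omega}_i|^{\frac{d-1}{d}}$ and we can now focus our search on finding a hyper-cubic solution by solving

$$t_{opt} = \min_{\Gamma_n} \Big\{ \max_i \Big\{ |\Omega_i| \big( \alpha_i + 2d\,\beta_i\, |\Omega_i|^{-\frac{1}{d}} \big) \Big\} \Big\}, \qquad \sum_i |\Omega_i| = |\Omega|. \tag{15}$$

3.4 Lower Bounds for the Minimax Load Distribution Problem

In this section we derive analytic lower bounds for (15).

3.4.1 Bound 1

Let us denote the execution time of a given partitioning by $t_m = \max_i \{t_i\}$. Then, for every i,

$$t_m \geq \alpha_i \Big( |\Omega_i| + 2d\, \frac{\beta_i}{\alpha_i}\, |\Omega_i|^{\frac{d-1}{d}} \Big), \tag{16}$$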

$$\frac{t_m}{\alpha_i} \geq |\Omega_i| + 2d\, \frac{\beta_i}{\alpha_i}\, |\Omega_i|^{\frac{d-1}{d}}, \tag{17}$$

and summing over all i gives

$$t_m \geq \Big( \sum_i \frac{1}{\alpha_i} \Big)^{-1} \Big( |\Omega| + 2d \sum_i \frac{\beta_i}{\alpha_i}\, |\Omega_i|^{\frac{d-1}{d}} \Big), \tag{18}$$

$$t_{opt} = \min_{\Gamma_n} \{ t_m \} \geq \min_{\Gamma_n} \Big( \sum_i \frac{1}{\alpha_i} \Big)^{-1} \Big( |\Omega| + 2d \sum_i \frac{\beta_i}{\alpha_i}\, |\Omega_i|^{\frac{d-1}{d}} \Big). \tag{19}$$

Minimizing the right-hand side of (19) gives a lower bound for $t_{opt}$. This requires minimizing the term involving $|\Omega_i|^{\frac{d-1}{d}}$. Let q be the index of the partition with the smallest ratio of the normalized transfer cost to the normalized computation cost (i.e. $q = \arg\min_i \beta_i/\alpha_i$). Then

$$\sum_i \frac{\beta_i}{\alpha_i}\, |\Omega_i|^{\frac{d-1}{d}} \geq \frac{\beta_q}{\alpha_q} \sum_i |\Omega_i|^{\frac{d-1}{d}} \tag{20}$$

$$\geq \frac{\beta_q}{\alpha_q} \Big( \sum_i |\Omega_i| \Big)^{\frac{d-1}{d}} = \frac{\beta_q}{\alpha_q}\, |\Omega|^{\frac{d-1}{d}}, \tag{21}$$

where we use $x^p + y^p \geq (x + y)^p$ for $x, y \geq 0$ and $0 \leq p \leq 1$ to derive (21). Therefore,

$$t_{opt} \geq \Big( \sum_i \frac{1}{\alpha_i} \Big)^{-1} \Big( 1 + 2d\, \frac{\beta_q}{\alpha_q}\, |\Omega|^{-\frac{1}{d}} \Big) |\Omega|. \tag{22}$$

The right-hand side of (22) gives a lower bound for $t_{opt}$. A solution close to this lower bound can be found when the additional cost of computing boundary cells is significantly lower than the cost of time-marching regular cells ($\beta_i/\alpha_i \ll 1$), or ideally when $\beta_i = 0$, in which case $t_{opt}$ from (18) is given by

$$t_{opt} = \Big( \sum_i \frac{1}{\alpha_i} \Big)^{-1} |\Omega|. \tag{23}$$

We argue that this is possible when the partition sizes are given by

$$|\Omega_j| = \frac{1}{\alpha_j} \Big( \sum_i \frac{1}{\alpha_i} \Big)^{-1} |\Omega| = \frac{t_{opt}}{\alpha_j}. \tag{24}$$

The proof is by contradiction. First, suppose that for some j we have $|\Omega_j| > t_{opt}/\alpha_j$. This results in $t_j > t_{opt}$, which violates the condition that $t_{opt}$ is the maximum cost of execution of any partition for a given partitioning scheme. Conversely, suppose that for some j we have $|\Omega_j| < t_{opt}/\alpha_j$. We have already established that $|\Omega_i| \leq t_{opt}/\alpha_i$ for $i \neq j$. Summing up the inequalities for all i we have

$$|\Omega_j| + \sum_{i \neq j} |\Omega_i| < \frac{t_{opt}}{\alpha_j} + \sum_{i \neq j} \frac{t_{opt}}{\alpha_i}. \tag{25}$$

Using (23) and noting that $\sum_i |\Omega_i| = |\Omega|$, both sides of the above inequality reduce to $|\Omega|$, which is a contradiction, and the proof is complete.

According to (24), where the computational costs associated with boundary conditions and data transfers are low, the optimal partitioning scheme is one that ensures all computational resources take the same amount of time to complete one iteration of the algorithm. This is consistent with the desire to ensure computational resources will not be idle when possible.

3.4.2 Bound 2

The bound given in the previous section is tight where $\alpha_i \gg \beta_i$ and becomes less tight as the cost of data transfers increases. We derive a tighter bound under such conditions in this section. Assuming that $|\Omega_i| > 1$ and using (15), we can write

$$t_{opt} \geq \min_{\Gamma_n} \Big\{ \max_i \Big\{ |\Omega_i|^{\frac{d-1}{d}} (\alpha_i + 2d\,\beta_i) \Big\} \Big\}, \tag{26}$$

where we replaced $\alpha_i |\Omega_i|$ with the smaller term $\alpha_i |\Omega_i|^{\frac{d-1}{d}}$. In a manner similar to the proof given in the previous section, it can be shown that the right-hand side of (26) is minimized when, for all i and j,

$$|\Omega_i|^{\frac{d-1}{d}} (\alpha_i + 2d\,\beta_i) = |\Omega_j|^{\frac{d-1}{d}} (\alpha_j + 2d\,\beta_j), \tag{27}$$

and $|\Omega_j|$ is given by

$$|\Omega_j| = |\Omega| \Big[ (\alpha_j + 2d\,\beta_j)^{\frac{d}{d-1}} \sum_i (\alpha_i + 2d\,\beta_i)^{-\frac{d}{d-1}} \Big]^{-1}. \tag{28}$$

A new lower bound is then given by

$$t_{opt} \geq |\Omega|^{\frac{d-1}{d}} \Big( \sum_i (\alpha_i + 2d\,\beta_i)^{-\frac{d}{d-1}} \Big)^{-\frac{d-1}{d}}. \tag{29}$$

The tightness of the bound improves as the number of dimensions increases. We also note that the bound given in (29) is loose when the cost of data transfer is not significant compared to the cost of computations, but improves as data transfer becomes the bottleneck. This trend is opposite to that of the bound derived in the previous section. Therefore, it makes sense to combine the two bounds and use their maximum as the lower bound.

3.5 Numerical Optimization

The problem given in (15) can be solved using constrained numerical optimization methods. The objective function

$$f(|\Omega_1|, \ldots, |\Omega_n|) = \max_i \Big\{ |\Omega_i| \big( \alpha_i + 2d\,\beta_i\, |\Omega_i|^{-\frac{1}{d}} \big) \Big\}, \qquad \sum_i |\Omega_i| = |\Omega| \tag{30}$$

is nonlinear and non-convex with linear constraints. Alternatively, one can parameterize in the partition dimension size $x_i = |\Omega_i|^{\frac{1}{d}}$, where the cost function will be convex in $x_i$ but subject to nonlinear constraints. Either way, standard convex optimization methods cannot be used. We use a nonlinear constrained optimization method based on sequential quadratic programming (SQP) and a quasi-Newton algorithm to solve the problem [22] (footnote 2). As usual, initialization close to the global minimum is an important element for improving the success of the optimization algorithm. We initialize the algorithm with an initial guess in accordance with (24) or (26). We use the equation that corresponds to the tighter of the two bounds. In practice, for a range of experiments, the optimization converges quickly (typically in less than 100 iterations) and, given the simplicity of the cost function, the computations are efficient. We defer further discussion of the experiments to Section 4.

Footnote 2: An implementation of an active-set method based on SQP and a quasi-Newton algorithm is given by the MATLAB Optimization Toolbox function fmincon.

3.6 A Heuristic Algorithm for Load Distribution

Up to now, we have discussed methods to determine the size and dimensions of partitions subject to the relaxed constraint that the sum of partition sizes equals the size of the computational domain. Once the solution of the relaxed problem is found, the partitions need to be fitted into the computational domain under the constraint that they must cover its entire volume. There are several reasons that an exact fit may not be possible. In practice, the dimensions of the computational domain and its partitions belong to the set of positive integers. This inherently means that the optimal (footnote 3) partitioning size and dimensions cannot be exactly met except for carefully engineered dimensions. We also note that even where optimal partition sizes and dimensions are feasible, partitioning of the computational domain into an exact set of partitions may not be possible (e.g. try partitioning a square into two squares). Intuitively, as the number of partitions increases these limitations become less of an issue and a close to optimal partitioning can be achieved.

Footnote 3: In this section, the term optimal refers to the results obtained from the numerical optimization algorithm. We realize that, given the nonlinear nature of the problem, strict optimality of the optimization algorithm is not guaranteed.

We propose a heuristic algorithm for partitioning. Our intuition is that the algorithm should be faithful to the original partition sizes and shapes (which are hyper-cubes) to the extent possible. The method is called the balanced partitioning algorithm and is given as follows. Let $\Omega$ be a computational domain to be distributed to n computational resources.

(1) Measure the normalized computational cost $\alpha_i$ and transfer cost $\beta_i$ of each resource.
(2) Compute the set of optimal partition sizes $S = \{|\Omega_1|, |\Omega_2|, \ldots, |\Omega_n|\}$ using a properly initialized nonlinear optimization algorithm.
(3) Partition $S$ into two sets $S_1$ and $S_2$ such that the difference between the sum of elements of $S_1$ and $S_2$ is minimal.
(4) Partition $\Omega$ along the largest dimension into two partitions whose sizes are given by the sums of elements of $S_1$ and $S_2$. Adjust any integer round-off errors introduced as a result.
(5) Replace $S$ with $S_1$ and $S_2$ and $\Omega$ with the newly created partitions, and continue steps 3-5 until $S_1$ and $S_2$ cannot be further partitioned.

The intuition behind partitioning the domain along its largest dimension is to maintain the lowest possible aspect ratio (the ratio of the largest dimension to the smallest dimension). This is an attempt to make the partitions closer to hyper-cubes as the partitioning algorithm progresses.

The third step of the algorithm is known as the number partitioning problem. The problem is whether a set of numbers can be partitioned into two halves of equal sum, or more generally, finding two partitions that minimize the maximum partition sum.
The number partitioning problem is NP-complete [23]; however, there are simple heuristic algorithms that can, in many instances, solve the problem optimally or near optimally in less than $O(n^2)$ time. The best heuristic algorithm is the differencing algorithm and is given in [24]. Briefly, the differencing algorithm reduces the size of the set by one in each iteration by replacing the two largest numbers with their absolute difference. This is equivalent to deciding that the two largest numbers will go into different sets without actually committing which set receives which number at this time. The forward leg of the algorithm terminates when the set is reduced to a single number. The last number standing represents the discrepancy of the two sets (the absolute difference of their sums). The algorithm then backtracks and at each step replaces one difference number with its components in such a way that the discrepancy of the two sets remains constant. For a detailed discussion and an example of the algorithm refer to [24].

An example of partitioning a 2D computational domain of 10 × 10 cells across 5 resources, where the partition sizes are given as S = {4, 17, 22, 26, 31}, is given in Table 2. Using the balanced partitioning algorithm results in slight adjustments to the partition sizes at the end, with the final partition sizes being {5, 15, 20, 25, 35} as shown in the last rows of the table.

Table 2
Partitioning a sample computational domain

Partitions | Balanced Partitions | Adjusted Sums
{4, 17, 22, 26, 31} | {4, 22, 26}, {17, 31} | 50, 50
{4, 22, 26} | {4, 22}, {26} | 25, 25
{17, 31} | {17}, {31} | 15, 35
{4, 22} | {4}, {22} | 5, 20

We compare the performance of the balanced partitioning algorithm with the stripe partitioning algorithm, where the computational domain is simply partitioned along a single axis. For the stripe algorithm, we choose partition sizes proportional to the resource's performance (i.e. in accordance with (24)). This is similar to what is typically used in FDTD parallelization on homogeneous clusters today. Fig. 2 shows one simple stripe partitioning of the previous example. The first thing to notice is that the discrepancy between achievable partitions and desired partitions is higher. This is the result of larger round-off errors due to the integral dimensions of the computational domain, which in turn translates into an even less optimal distribution of the computational load.
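The balanced partitioning steps and the differencing heuristic described above can be sketched in Python as follows. The function names `differencing_split` and `balanced_partition` are hypothetical. Instead of an explicit backtracking leg, this variant carries the two sides of each difference along with it, which yields the same kind of split. The round-off rule (round the extent of the smaller group, clamp it, and give the remainder to the larger group) is an assumption made for illustration; the text above only states that round-off errors are adjusted.

```python
import heapq
from itertools import count

def differencing_split(sizes):
    """Two-way split of `sizes` by the differencing heuristic."""
    tie = count()
    # Heap entry: (-difference, tie-breaker, larger-sum side, smaller-sum side).
    heap = [(-s, next(tie), [s], []) for s in sizes]
    heapq.heapify(heap)
    while len(heap) > 1:
        d1, _, a1, b1 = heapq.heappop(heap)   # largest difference
        d2, _, a2, b2 = heapq.heappop(heap)   # second largest difference
        # Put the two entries on opposite sides, so their differences subtract.
        heapq.heappush(heap, (d1 - d2, next(tie), a1 + b2, b1 + a2))
    _, _, larger, smaller = heap[0]
    return larger, smaller

def balanced_partition(shape, sizes):
    """Recursively bisect a box of `shape` among the target `sizes`.

    Returns a list of (target_size, sub_shape) pairs.
    """
    if len(sizes) == 1:
        return [(sizes[0], tuple(shape))]
    big, small = differencing_split(sizes)
    axis = max(range(len(shape)), key=lambda k: shape[k])   # largest dimension
    slab = 1
    for k, extent in enumerate(shape):
        if k != axis:
            slab *= extent            # cells per unit length along `axis`
    # Assumed round-off rule: round the smaller group's extent, keep both
    # sides non-empty, and give the remainder to the larger group.
    cut = min(max(round(sum(small) / slab), 1), shape[axis] - 1)
    small_shape = list(shape)
    small_shape[axis] = cut
    big_shape = list(shape)
    big_shape[axis] = shape[axis] - cut
    return (balanced_partition(big_shape, big)
            + balanced_partition(small_shape, small))

parts = balanced_partition((10, 10), [4, 17, 22, 26, 31])
```

Applied to target sizes {4, 17, 22, 26, 31} on a 10 × 10 domain, the first split is {4, 22, 26} against {17, 31}, and the resulting sub-domains contain 5, 15, 20, 25 and 35 cells.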

16 (a) Strpe (b) Balanced Fg. 2. Comparson of the strpe and balanced parttonng algorthms. The balanced parttonng results n less dscrepancy compared wth the desred partton szes. 4 Results In ths secton, we present a number of smulaton results for heterogeneous and homogeneous clusters and compare the performance of the balanced and strpe parttonng methods n respect to the derved bounds. We fnd t more ntutve to show the results n terms of achevable throughput rather than the computaton tme. The throughput s defned as the rato of the number of cells to the processng tme (.e. computaton or transfer tme). Ths has the added advantage of havng the results normalzed to the sze of the computatonal doman. The measurements wll be gven as recprocals of α and β n mega cells per second (MCells/s). We also note that n the context of throughput we wll be talkng about upper bounds whch are nversely related to computaton tme lower bounds. Example : Homogeneous GPU Cluster Doman: cells Cluster: homogeneous, 2-8 NVIDIA GT200 GPUs on a sngle host (certan motherboards allow up to 8 GT200 GPUs to be nstalled), 3D FDTD performance on a GT200 GPU α = 493 MCells/s Lnk: PCI-E x6, nomnal bandwdth: 8 GB/s, actual bandwdth measured at 2.4 GB/s and 4.3 GB/s on two dfferent motherboards, combned throughput s β = 50 and 93 MCells/s respectvely Notes: The throughput measurements were performed wth host memory allocated as standard page-able memory. The throughput can be mproved by usng page-locked (pnned) memory. However, pnned memory s a scarce resource and not sutable for typcally large memory demands of FDTD applcatons. The total bandwdth s constant and as we ncrease the number of GPUs the data transfer rate per GPU decreases. In Fg. 3, we show the performance of the cluster as a functon of the number of GPUs on two dfferent hosts wth dfferent actual PCI-E bandwdths. A number of observatons can be made from Fg. 
3: (a) the scalablty of the cluster mproves wth ncreased throughput; (b) the performance of the balanced par- 6

17 Performance (MCells/s) Example : Homogeneous Cluster of GPUs Bandwdth: 2.40 GB/s Upper Bound Optmal Parttonng Balanced Parttonng Strpe Parttonng Performance (MCells/s) Example : Homogeneous Cluster of GPUs Bandwdth: 4.30 GB/s Upper Bound Optmal Parttonng Balanced Parttonng Strpe Parttonng GPUs (a) Bandwdth: 2.4 GB/s GPUs (b) Bandwdth: 4.3 GB/s Fg. 3. Comparson of balanced and strpe parttonng algorthms. The balanced parttonng results n up to 47% mprovement n performance and s close to the performance of the optmal soluton. ttonng s close to the optmal soluton; (c) the balanced parttonng method outperforms the stpe method by up to 47% for the slower host and up to 38% for the faster host; (d) as the number of GPUs ncreases, the transfer rate per GPU decreases and the advantage of better parttonng s more emphatcally demonstrated; (e) the strpe method exhbts poor scalablty and the performance plateaus wth 5 or 6 GPUS whereas the balanced parttonng contnues to scale. Example 2: Heterogeneous GPU Cluster: Doman: cells Cluster: heterogeneous, (a) -4 NVIDIA GT200 GPUs, 3D FDTD performance measured at α = 493 MCells/s, (b) -4 NVIDIA GT80 GPUs, 3D FDTD performance measured at α = 40 MCells/s Lnk: PCI-E x6, nomnal bandwdth: 8 GB/s, actual bandwdth measured at 2.4 GB/s and 4.3 GB/s on two dfferent motherboards, combned throughput s β = 50 and 93 MCells/s respectvely In ths example, we look at a heterogeneous cluster of GPUs nstalled n a sngle host. The cluster comprses equal numbers of GT80 and GT200 GPUs. The results depcted n Fg. 4 demonstrate the superor performance and scalablty of the balanced parttonng for a heterogeneous cluster. The results are more or less consstent wth earler observatons n the prevous example. Example 3: Homogeneous CPU Cluster: Doman: cells Cluster: homogeneous, 4-32 nodes each wth Quad-core Intel Core GHz CPUs, 3D FDTD performance measured at α = 72.8 MCells/s Lnk: (a) Ggabt Ethernet, actual bandwdth: 0. GB/s, β = 2. 7

18 Performance (MCells/s) Example 2: Heterogeneous Cluster of GPUs Bandwdth: 2.40 GB/s Upper Bound Optmal Parttonng Balanced Parttonng Strpe Parttonng Performance (MCells/s) Example 2: Heterogeneous Cluster of GPUs Bandwdth: 4.30 GB/s Upper Bound Optmal Parttonng Balanced Parttonng Strpe Parttonng GPUs (a) Bandwdth: 2.4 GB/s GPUs (b) Bandwdth: 4.3 GB/s Fg. 4. Comparson of balanced and strpe parttonng algorthms on a heterogeneous cluster. The balanced parttonng results n up to 34% mprovement n performance and s close to the performance of the optmal soluton. MCells/s (b) 0 Ggabt Ethernet throughput, actual bandwdth: 0.6 GB/s, β = 2.5 MCells/s Note: OpenMP s used to parallelze the FDTD code on each node. In ths example, a larger computatonal doman s dstrbuted to a cluster of PCs. We compare the scalablty and performance of the cluster over a Ggabt and 0 Ggabt network. The smulatons predct that for a Ggabt network the cluster saturates wth 8 nodes when balanced parttonng s used. Usng the strpe parttonng method saturates the cluster wth only 4 nodes. Fg. 5(a) also demonstrates that usng the balanced parttonng method results n more than 3% mprovement n peak performance compared to the strpe method. In Fg. 5(b) the network bandwdth s ncreased by a factor of 6. Ths has a sgnfcant mpact on the performance of the cluster. The balanced parttonng method scales up to 32 nodes now and acheves a peak performance of 669 MCells/s compared to 80 MCells/s on the Ggabt network. Also note that the peak performance of the balanced parttonng s almost 76% hgher than the peak performance of the strpe parttonng method. The advantage of PC clusters over GPU clusters s ther larger memory sze. Ths makes smulaton of larger computatonal domans possble, albet at the cost of lower performance. As shown n prevous examples, a cluster of GPUs on a sngle host exceeds a performance level of 000 MCells/s. 
Perhaps to resolve this dilemma, one can build a cluster of multi-GPU nodes to address both the memory capacity and the performance problems. However, such a cluster will hardly scale unless one is prepared to invest in higher-bandwidth technologies such as quad data rate (QDR) InfiniBand.

Fig. 5. Comparison of balanced and stripe partitioning algorithms on a homogeneous cluster of PCs for different network bandwidths (Example 3: homogeneous cluster of CPUs). Panels: (a) bandwidth 0.1 GB/s, (b) bandwidth 0.6 GB/s. Axes: performance (MCells/s) versus cluster size (horizontal axis labelled "GPUs"). Curves: upper bound, optimal partitioning, balanced partitioning, stripe partitioning. The network bandwidth is the main bottleneck. Increasing network bandwidth improves the scalability of the cluster.

Example 4: Heterogeneous CPU Cluster

Domain: cells
Cluster: heterogeneous, (a) 2-16 nodes, each with quad-core Intel Core i7 CPUs, 3D FDTD performance measured at α = 72.8 MCells/s; (b) 2-16 nodes, each with quad-core Intel Core Duo 2.66 GHz CPUs, 3D FDTD performance measured at α = 39. MCells/s
Link: (a) Gigabit Ethernet, actual bandwidth 0.1 GB/s, β = 2.1 MCells/s; (b) 10 Gigabit Ethernet, actual bandwidth 0.6 GB/s, β = 12.5 MCells/s

In our last example, we look at results from a heterogeneous cluster of up to 32 nodes. The cluster comprises equal numbers of quad-core Core i7 and quad-core Core Duo nodes. Despite the disparity in performance between the nodes, the balanced partitioning achieves reasonable scalability with the faster network.

Fig. 6. Comparison of balanced and stripe partitioning algorithms on a heterogeneous cluster of PCs for different network bandwidths (Example 4: heterogeneous cluster of CPUs). Panels: (a) bandwidth 0.1 GB/s, (b) bandwidth 0.6 GB/s. Axes: performance (MCells/s) versus cluster size (horizontal axis labelled "GPUs"). Curves: upper bound, optimal partitioning, balanced partitioning, stripe partitioning. The network bandwidth is the main bottleneck. Increasing network bandwidth improves the scalability of the cluster.
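As a rough illustration of why size-aware partitioning helps on a heterogeneous cluster, the following Python sketch evaluates the relaxed per-node cost α_i|Ω_i| + β_i|∂Ω_i| and searches for partition sizes that equalise it across nodes. It is a minimal sketch under stated assumptions and is not the heuristic algorithm proposed in the paper: each partition is assumed to be a d-dimensional cube (so its boundary has 2d·|Ω_i|^((d-1)/d) cells), α and β are assumed to be expressed as seconds per cell (the reciprocal of a throughput), and the function names and all numbers are hypothetical and are not the measurements reported above.

```python
def node_cost(size, alpha, beta, d=3):
    """Relaxed per-node cost: compute time plus boundary-exchange time."""
    # Assumption: the partition is a d-cube, so its surface has 2*d*size^((d-1)/d) cells.
    boundary = 2 * d * size ** ((d - 1) / d)
    return alpha * size + beta * boundary


def size_for_cost(t, alpha, beta, d=3):
    """Largest partition size whose relaxed cost does not exceed t (bisection)."""
    lo, hi = 0.0, t / alpha  # cost(hi) >= t because the boundary term is non-negative
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if node_cost(mid, alpha, beta, d) <= t:
            lo = mid
        else:
            hi = mid
    return lo


def balanced_sizes(total, alphas, betas, d=3):
    """Sizes that equalise the relaxed cost on every node, and that common cost."""
    nodes = list(zip(alphas, betas))
    lo = 0.0
    hi = max(node_cost(total, a, b, d) for a, b in nodes)  # any node could hold it all
    for _ in range(100):
        t = 0.5 * (lo + hi)
        if sum(size_for_cost(t, a, b, d) for a, b in nodes) < total:
            lo = t  # cost budget too small to cover the whole domain
        else:
            hi = t
    return [size_for_cost(hi, a, b, d) for a, b in nodes], hi


# Hypothetical cluster: two fast and two slow nodes sharing identical links.
total = 256 ** 3                              # domain size in cells (illustrative)
alphas = [1 / 400e6] * 2 + [1 / 100e6] * 2    # seconds per cell (illustrative)
betas = [1 / 50e6] * 4                        # seconds per boundary cell (illustrative)

sizes, t_bal = balanced_sizes(total, alphas, betas)
t_equal = max(node_cost(total / len(alphas), a, b) for a, b in zip(alphas, betas))
print(f"equal split: {t_equal:.4f} s, balanced: {t_bal:.4f} s")
```

Because the per-node cost grows monotonically with the partition size, equalising it across nodes minimises the maximum cost in this simplified model, which is why the faster nodes end up with the larger partitions. A real partitioner must also produce partitions that tile the actual grid, which this sketch ignores.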

5 Discussion

For ease of reference, the main results of the paper are summarized here:

The problem of distributing the FDTD load across a heterogeneous cluster can be formulated as a minimax problem over n-element partitions of the computational domain $\Omega$:

$$t_{\mathrm{opt}} = \min_{\Gamma_n} \max_i \Big\{ \alpha_i |\Omega_i| + \beta_i \sum_j |\partial\Omega_i \cap \partial\Omega_j| + \gamma_i |\partial\Omega_i \cap \partial\Omega| \Big\}.$$

A simpler problem is formulated by relaxing the conditions of the original problem as

$$t_{\mathrm{opt}} = \min_{\Gamma_n} \max_i \big\{ \alpha_i |\Omega_i| + \beta_i |\partial\Omega_i| \big\}, \qquad \sum_i |\Omega_i| = |\Omega|.$$

Two lower bounds can be found for the relaxed problem. Their combination gives a lower bound of the form $t_{\mathrm{opt}} \ge \max(\cdot, \cdot)$, whose arguments depend on $|\Omega|$, the dimension $d$ and the coefficients $\alpha_i$ and $\beta_i$.

The results set an upper bound on achievable performance improvements that can be used to predict the extent to which parallelization is practically beneficial. In a dynamic cluster, where resources may become available during the lifetime of a computation, one may have to decide whether it is beneficial to repartition the problem in order to use the newly available resources. Redistribution of the problem may incur significant traffic, so an algorithm may defer redistribution until enough computational resources are available to justify the overhead, or may even determine that redistribution is detrimental to the overall performance. One can simply use the bounds or, better still, estimate the performance of the redistributed configuration before making such decisions.

We proposed a heuristic algorithm for partitioning the computational domain and showed by experiment that the algorithm achieves performance levels close to those of the ideal partition sizes obtained by the numerical optimization algorithm. The algorithm is simple and efficient.

The results show that significant performance gains can be achieved simply by using a better partitioning algorithm. Optimal partitioning also improves scalability, which means that computational resources can be used more efficiently. The burden of optimizing partitions as described in this paper is negligible compared to the
effort of parallelizing FDTD code. A properly parallelized FDTD code should in principle be able to run with non-equal partitions and should benefit from the method presented in this paper with minimal effort. As an additional benefit, existing FDTD applications can run efficiently on heterogeneous clusters of similar technology. Fully heterogeneous applications that can run across technology boundaries will be a natural extension.

The question remains whether the numerical optimization finds the global minimum of (5). Based on the experiments, we believe the numerical optimization results are optimal or very close to optimal. This claim could be asserted more confidently if one were able to derive tighter bounds or indeed prove the optimality of the solution.

We suspect the methodology presented in this work is not limited to FDTD and can lend itself to similar analyses in other computational problem domains.

6 Acknowledgements

This work was supported in part by the Australian Research Council (ARC) Discovery Project DP09349 and in part by the ARC/Microsoft Linkage Project LP. The views expressed herein are those of the authors and are not necessarily those of the funding organizations.
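The repartitioning decision described in the Discussion (redistribute only when the gain justifies the traffic) can be sketched as a simple break-even test. This is a minimal sketch, assuming that the per-step time of the current and of the candidate configuration has already been estimated, for example from the cost model or its bounds; the function name, its parameters and the numbers are hypothetical and do not come from the paper.

```python
def should_repartition(remaining_steps, step_time_current, step_time_new,
                       redistribution_cost):
    """Return True if the projected saving outweighs the one-off overhead.

    remaining_steps      -- time steps still to be simulated
    step_time_current    -- estimated seconds per step with the current partitions
    step_time_new        -- estimated seconds per step after repartitioning
    redistribution_cost  -- estimated seconds spent moving data between nodes
    """
    saving = remaining_steps * (step_time_current - step_time_new)
    return saving > redistribution_cost


# Early in a run the saving accumulates over many steps; late in a run it may not.
decision_early = should_repartition(10_000, 0.045, 0.018, 30.0)
decision_late = should_repartition(500, 0.045, 0.018, 30.0)
print(decision_early, decision_late)
```

The same test also covers the case where adding resources makes things worse: if the new per-step estimate is not smaller than the current one, the saving is non-positive and the function returns False.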

Dynamic Optimization. Assignment 1. Sasanka Nagavalli January 29, 2013 Robotics Institute Carnegie Mellon University

Dynamic Optimization. Assignment 1. Sasanka Nagavalli January 29, 2013 Robotics Institute Carnegie Mellon University Dynamc Optmzaton Assgnment 1 Sasanka Nagavall snagaval@andrew.cmu.edu 16-745 January 29, 213 Robotcs Insttute Carnege Mellon Unversty Table of Contents 1. Problem and Approach... 1 2. Optmzaton wthout

More information

PRACTICAL, COMPUTATION EFFICIENT HIGH-ORDER NEURAL NETWORK FOR ROTATION AND SHIFT INVARIANT PATTERN RECOGNITION. Evgeny Artyomov and Orly Yadid-Pecht

PRACTICAL, COMPUTATION EFFICIENT HIGH-ORDER NEURAL NETWORK FOR ROTATION AND SHIFT INVARIANT PATTERN RECOGNITION. Evgeny Artyomov and Orly Yadid-Pecht 68 Internatonal Journal "Informaton Theores & Applcatons" Vol.11 PRACTICAL, COMPUTATION EFFICIENT HIGH-ORDER NEURAL NETWORK FOR ROTATION AND SHIFT INVARIANT PATTERN RECOGNITION Evgeny Artyomov and Orly

More information

Passive Filters. References: Barbow (pp ), Hayes & Horowitz (pp 32-60), Rizzoni (Chap. 6)

Passive Filters. References: Barbow (pp ), Hayes & Horowitz (pp 32-60), Rizzoni (Chap. 6) Passve Flters eferences: Barbow (pp 6575), Hayes & Horowtz (pp 360), zzon (Chap. 6) Frequencyselectve or flter crcuts pass to the output only those nput sgnals that are n a desred range of frequences (called

More information

Understanding the Spike Algorithm

Understanding the Spike Algorithm Understandng the Spke Algorthm Vctor Ejkhout and Robert van de Gejn May, ntroducton The parallel soluton of lnear systems has a long hstory, spannng both drect and teratve methods Whle drect methods exst

More information

Comparative Analysis of Reuse 1 and 3 in Cellular Network Based On SIR Distribution and Rate

Comparative Analysis of Reuse 1 and 3 in Cellular Network Based On SIR Distribution and Rate Comparatve Analyss of Reuse and 3 n ular Network Based On IR Dstrbuton and Rate Chandra Thapa M.Tech. II, DEC V College of Engneerng & Technology R.V.. Nagar, Chttoor-5727, A.P. Inda Emal: chandra2thapa@gmal.com

More information

A Comparison of Two Equivalent Real Formulations for Complex-Valued Linear Systems Part 2: Results

A Comparison of Two Equivalent Real Formulations for Complex-Valued Linear Systems Part 2: Results AMERICAN JOURNAL OF UNDERGRADUATE RESEARCH VOL. 1 NO. () A Comparson of Two Equvalent Real Formulatons for Complex-Valued Lnear Systems Part : Results Abnta Munankarmy and Mchael A. Heroux Department of

More information

IEE Electronics Letters, vol 34, no 17, August 1998, pp ESTIMATING STARTING POINT OF CONDUCTION OF CMOS GATES

IEE Electronics Letters, vol 34, no 17, August 1998, pp ESTIMATING STARTING POINT OF CONDUCTION OF CMOS GATES IEE Electroncs Letters, vol 34, no 17, August 1998, pp. 1622-1624. ESTIMATING STARTING POINT OF CONDUCTION OF CMOS GATES A. Chatzgeorgou, S. Nkolads 1 and I. Tsoukalas Computer Scence Department, 1 Department

More information

Uncertainty in measurements of power and energy on power networks

Uncertainty in measurements of power and energy on power networks Uncertanty n measurements of power and energy on power networks E. Manov, N. Kolev Department of Measurement and Instrumentaton, Techncal Unversty Sofa, bul. Klment Ohrdsk No8, bl., 000 Sofa, Bulgara Tel./fax:

More information

To: Professor Avitabile Date: February 4, 2003 From: Mechanical Student Subject: Experiment #1 Numerical Methods Using Excel

To: Professor Avitabile Date: February 4, 2003 From: Mechanical Student Subject: Experiment #1 Numerical Methods Using Excel To: Professor Avtable Date: February 4, 3 From: Mechancal Student Subject:.3 Experment # Numercal Methods Usng Excel Introducton Mcrosoft Excel s a spreadsheet program that can be used for data analyss,

More information

熊本大学学術リポジトリ. Kumamoto University Repositor

熊本大学学術リポジトリ. Kumamoto University Repositor 熊本大学学術リポジトリ Kumamoto Unversty Repostor Ttle Wreless LAN Based Indoor Poston and Its Smulaton Author(s) Ktasuka, Teruak; Nakansh, Tsune CtatonIEEE Pacfc RIM Conference on Comm Computers, and Sgnal Processng

More information

Calculation of the received voltage due to the radiation from multiple co-frequency sources

Calculation of the received voltage due to the radiation from multiple co-frequency sources Rec. ITU-R SM.1271-0 1 RECOMMENDATION ITU-R SM.1271-0 * EFFICIENT SPECTRUM UTILIZATION USING PROBABILISTIC METHODS Rec. ITU-R SM.1271 (1997) The ITU Radocommuncaton Assembly, consderng a) that communcatons

More information

High Speed ADC Sampling Transients

High Speed ADC Sampling Transients Hgh Speed ADC Samplng Transents Doug Stuetzle Hgh speed analog to dgtal converters (ADCs) are, at the analog sgnal nterface, track and hold devces. As such, they nclude samplng capactors and samplng swtches.

More information

Latency Insertion Method (LIM) for IR Drop Analysis in Power Grid

Latency Insertion Method (LIM) for IR Drop Analysis in Power Grid Abstract Latency Inserton Method (LIM) for IR Drop Analyss n Power Grd Dmtr Klokotov, and José Schutt-Ané Wth the steadly growng number of transstors on a chp, and constantly tghtenng voltage budgets,

More information

Adaptive Modulation for Multiple Antenna Channels

Adaptive Modulation for Multiple Antenna Channels Adaptve Modulaton for Multple Antenna Channels June Chul Roh and Bhaskar D. Rao Department of Electrcal and Computer Engneerng Unversty of Calforna, San Dego La Jolla, CA 993-7 E-mal: jroh@ece.ucsd.edu,

More information

Walsh Function Based Synthesis Method of PWM Pattern for Full-Bridge Inverter

Walsh Function Based Synthesis Method of PWM Pattern for Full-Bridge Inverter Walsh Functon Based Synthess Method of PWM Pattern for Full-Brdge Inverter Sej Kondo and Krt Choesa Nagaoka Unversty of Technology 63-, Kamtomoka-cho, Nagaoka 9-, JAPAN Fax: +8-58-7-95, Phone: +8-58-7-957

More information

NETWORK 2001 Transportation Planning Under Multiple Objectives

NETWORK 2001 Transportation Planning Under Multiple Objectives NETWORK 200 Transportaton Plannng Under Multple Objectves Woodam Chung Graduate Research Assstant, Department of Forest Engneerng, Oregon State Unversty, Corvalls, OR9733, Tel: (54) 737-4952, Fax: (54)

More information

Define Y = # of mobiles from M total mobiles that have an adequate link. Measure of average portion of mobiles allocated a link of adequate quality.

Define Y = # of mobiles from M total mobiles that have an adequate link. Measure of average portion of mobiles allocated a link of adequate quality. Wreless Communcatons Technologes 6::559 (Advanced Topcs n Communcatons) Lecture 5 (Aprl th ) and Lecture 6 (May st ) Instructor: Professor Narayan Mandayam Summarzed by: Steve Leung (leungs@ece.rutgers.edu)

More information

A MODIFIED DIFFERENTIAL EVOLUTION ALGORITHM IN SPARSE LINEAR ANTENNA ARRAY SYNTHESIS

A MODIFIED DIFFERENTIAL EVOLUTION ALGORITHM IN SPARSE LINEAR ANTENNA ARRAY SYNTHESIS A MODIFIED DIFFERENTIAL EVOLUTION ALORITHM IN SPARSE LINEAR ANTENNA ARRAY SYNTHESIS Kaml Dmller Department of Electrcal-Electroncs Engneerng rne Amercan Unversty North Cyprus, Mersn TURKEY kdmller@gau.edu.tr

More information

Traffic balancing over licensed and unlicensed bands in heterogeneous networks

Traffic balancing over licensed and unlicensed bands in heterogeneous networks Correspondence letter Traffc balancng over lcensed and unlcensed bands n heterogeneous networks LI Zhen, CUI Qme, CUI Zhyan, ZHENG We Natonal Engneerng Laboratory for Moble Network Securty, Bejng Unversty

More information

Low Switching Frequency Active Harmonic Elimination in Multilevel Converters with Unequal DC Voltages

Low Switching Frequency Active Harmonic Elimination in Multilevel Converters with Unequal DC Voltages Low Swtchng Frequency Actve Harmonc Elmnaton n Multlevel Converters wth Unequal DC Voltages Zhong Du,, Leon M. Tolbert, John N. Chasson, Hu L The Unversty of Tennessee Electrcal and Computer Engneerng

More information

THE GENERATION OF 400 MW RF PULSES AT X-BAND USING RESONANT DELAY LINES *

THE GENERATION OF 400 MW RF PULSES AT X-BAND USING RESONANT DELAY LINES * SLAC PUB 874 3/1999 THE GENERATION OF 4 MW RF PULSES AT X-BAND USING RESONANT DELAY LINES * Sam G. Tantaw, Arnold E. Vleks, and Rod J. Loewen Stanford Lnear Accelerator Center, Stanford Unversty P.O. Box

More information

Harmonic Balance of Nonlinear RF Circuits

Harmonic Balance of Nonlinear RF Circuits MICROWAE AND RF DESIGN Harmonc Balance of Nonlnear RF Crcuts Presented by Mchael Steer Readng: Chapter 19, Secton 19. Index: HB Based on materal n Mcrowave and RF Desgn: A Systems Approach, nd Edton, by

More information

TECHNICAL NOTE TERMINATION FOR POINT- TO-POINT SYSTEMS TN TERMINATON FOR POINT-TO-POINT SYSTEMS. Zo = L C. ω - angular frequency = 2πf

TECHNICAL NOTE TERMINATION FOR POINT- TO-POINT SYSTEMS TN TERMINATON FOR POINT-TO-POINT SYSTEMS. Zo = L C. ω - angular frequency = 2πf TECHNICAL NOTE TERMINATION FOR POINT- TO-POINT SYSTEMS INTRODUCTION Because dgtal sgnal rates n computng systems are ncreasng at an astonshng rate, sgnal ntegrty ssues have become far more mportant to

More information

Network Reconfiguration in Distribution Systems Using a Modified TS Algorithm

Network Reconfiguration in Distribution Systems Using a Modified TS Algorithm Network Reconfguraton n Dstrbuton Systems Usng a Modfed TS Algorthm ZHANG DONG,FU ZHENGCAI,ZHANG LIUCHUN,SONG ZHENGQIANG School of Electroncs, Informaton and Electrcal Engneerng Shangha Jaotong Unversty

More information

Approximating User Distributions in WCDMA Networks Using 2-D Gaussian

Approximating User Distributions in WCDMA Networks Using 2-D Gaussian CCCT 05: INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATIONS, AND CONTROL TECHNOLOGIES 1 Approxmatng User Dstrbutons n CDMA Networks Usng 2-D Gaussan Son NGUYEN and Robert AKL Department of Computer

More information

High Speed, Low Power And Area Efficient Carry-Select Adder

High Speed, Low Power And Area Efficient Carry-Select Adder Internatonal Journal of Scence, Engneerng and Technology Research (IJSETR), Volume 5, Issue 3, March 2016 Hgh Speed, Low Power And Area Effcent Carry-Select Adder Nelant Harsh M.tech.VLSI Desgn Electroncs

More information

antenna antenna (4.139)

antenna antenna (4.139) .6.6 The Lmts of Usable Input Levels for LNAs The sgnal voltage level delvered to the nput of an LNA from the antenna may vary n a very wde nterval, from very weak sgnals comparable to the nose level,

More information

Throughput Maximization by Adaptive Threshold Adjustment for AMC Systems

Throughput Maximization by Adaptive Threshold Adjustment for AMC Systems APSIPA ASC 2011 X an Throughput Maxmzaton by Adaptve Threshold Adjustment for AMC Systems We-Shun Lao and Hsuan-Jung Su Graduate Insttute of Communcaton Engneerng Department of Electrcal Engneerng Natonal

More information

Optimal Sizing and Allocation of Residential Photovoltaic Panels in a Distribution Network for Ancillary Services Application

Optimal Sizing and Allocation of Residential Photovoltaic Panels in a Distribution Network for Ancillary Services Application Optmal Szng and Allocaton of Resdental Photovoltac Panels n a Dstrbuton Networ for Ancllary Servces Applcaton Reza Ahmad Kordhel, Student Member, IEEE, S. Al Pourmousav, Student Member, IEEE, Jayarshnan

More information

Optimal Placement of PMU and RTU by Hybrid Genetic Algorithm and Simulated Annealing for Multiarea Power System State Estimation

Optimal Placement of PMU and RTU by Hybrid Genetic Algorithm and Simulated Annealing for Multiarea Power System State Estimation T. Kerdchuen and W. Ongsakul / GMSARN Internatonal Journal (09) - Optmal Placement of and by Hybrd Genetc Algorthm and Smulated Annealng for Multarea Power System State Estmaton Thawatch Kerdchuen and

More information

A NSGA-II algorithm to solve a bi-objective optimization of the redundancy allocation problem for series-parallel systems

A NSGA-II algorithm to solve a bi-objective optimization of the redundancy allocation problem for series-parallel systems 0 nd Internatonal Conference on Industral Technology and Management (ICITM 0) IPCSIT vol. 49 (0) (0) IACSIT Press, Sngapore DOI: 0.776/IPCSIT.0.V49.8 A NSGA-II algorthm to solve a b-obectve optmzaton of

More information

Efficient Large Integers Arithmetic by Adopting Squaring and Complement Recoding Techniques

Efficient Large Integers Arithmetic by Adopting Squaring and Complement Recoding Techniques The th Worshop on Combnatoral Mathematcs and Computaton Theory Effcent Large Integers Arthmetc by Adoptng Squarng and Complement Recodng Technques Cha-Long Wu*, Der-Chyuan Lou, and Te-Jen Chang *Department

More information

Resource Allocation Optimization for Device-to- Device Communication Underlaying Cellular Networks

Resource Allocation Optimization for Device-to- Device Communication Underlaying Cellular Networks Resource Allocaton Optmzaton for Devce-to- Devce Communcaton Underlayng Cellular Networks Bn Wang, L Chen, Xaohang Chen, Xn Zhang, and Dacheng Yang Wreless Theores and Technologes (WT&T) Bejng Unversty

More information

A MODIFIED DIRECTIONAL FREQUENCY REUSE PLAN BASED ON CHANNEL ALTERNATION AND ROTATION

A MODIFIED DIRECTIONAL FREQUENCY REUSE PLAN BASED ON CHANNEL ALTERNATION AND ROTATION A MODIFIED DIRECTIONAL FREQUENCY REUSE PLAN BASED ON CHANNEL ALTERNATION AND ROTATION Vncent A. Nguyen Peng-Jun Wan Ophr Freder Computer Scence Department Illnos Insttute of Technology Chcago, Illnos vnguyen@t.edu,

More information

Review: Our Approach 2. CSC310 Information Theory

Review: Our Approach 2. CSC310 Information Theory CSC30 Informaton Theory Sam Rowes Lecture 3: Provng the Kraft-McMllan Inequaltes September 8, 6 Revew: Our Approach The study of both compresson and transmsson requres that we abstract data and messages

More information

MTBF PREDICTION REPORT

MTBF PREDICTION REPORT MTBF PREDICTION REPORT PRODUCT NAME: BLE112-A-V2 Issued date: 01-23-2015 Rev:1.0 Copyrght@2015 Bluegga Technologes. All rghts reserved. 1 MTBF PREDICTION REPORT... 1 PRODUCT NAME: BLE112-A-V2... 1 1.0

More information

NATIONAL RADIO ASTRONOMY OBSERVATORY Green Bank, West Virginia SPECTRAL PROCESSOR MEMO NO. 25. MEMORANDUM February 13, 1985

NATIONAL RADIO ASTRONOMY OBSERVATORY Green Bank, West Virginia SPECTRAL PROCESSOR MEMO NO. 25. MEMORANDUM February 13, 1985 NATONAL RADO ASTRONOMY OBSERVATORY Green Bank, West Vrgna SPECTRAL PROCESSOR MEMO NO. 25 MEMORANDUM February 13, 1985 To: Spectral Processor Group From: R. Fsher Subj: Some Experments wth an nteger FFT

More information

Parameter Free Iterative Decoding Metrics for Non-Coherent Orthogonal Modulation

Parameter Free Iterative Decoding Metrics for Non-Coherent Orthogonal Modulation 1 Parameter Free Iteratve Decodng Metrcs for Non-Coherent Orthogonal Modulaton Albert Gullén Fàbregas and Alex Grant Abstract We study decoder metrcs suted for teratve decodng of non-coherently detected

More information

DUE TO process scaling, the number of devices on a

DUE TO process scaling, the number of devices on a IEEE TRANSACTIONS ON COMPONENTS, PACKAGING AND MANUFACTURING TECHNOLOGY, VOL. 1, NO. 11, NOVEMBER 011 1839 Latency Inserton Method (LIM) for DC Analyss of Power Supply Networks Dmtr Klokotov, Patrck Goh,

More information

A Novel Optimization of the Distance Source Routing (DSR) Protocol for the Mobile Ad Hoc Networks (MANET)

A Novel Optimization of the Distance Source Routing (DSR) Protocol for the Mobile Ad Hoc Networks (MANET) A Novel Optmzaton of the Dstance Source Routng (DSR) Protocol for the Moble Ad Hoc Networs (MANET) Syed S. Rzv 1, Majd A. Jafr, and Khaled Ellethy Computer Scence and Engneerng Department Unversty of Brdgeport

More information

Analysis of Time Delays in Synchronous and. Asynchronous Control Loops. Bj rn Wittenmark, Ben Bastian, and Johan Nilsson

Analysis of Time Delays in Synchronous and. Asynchronous Control Loops. Bj rn Wittenmark, Ben Bastian, and Johan Nilsson 37th CDC, Tampa, December 1998 Analyss of Delays n Synchronous and Asynchronous Control Loops Bj rn Wttenmark, Ben Bastan, and Johan Nlsson emal: bjorn@control.lth.se, ben@control.lth.se, and johan@control.lth.se

More information

Discussion on How to Express a Regional GPS Solution in the ITRF

Discussion on How to Express a Regional GPS Solution in the ITRF 162 Dscusson on How to Express a Regonal GPS Soluton n the ITRF Z. ALTAMIMI 1 Abstract The usefulness of the densfcaton of the Internatonal Terrestral Reference Frame (ITRF) s to facltate ts access as

More information

Distributed Uplink Scheduling in EV-DO Rev. A Networks

Distributed Uplink Scheduling in EV-DO Rev. A Networks Dstrbuted Uplnk Schedulng n EV-DO ev. A Networks Ashwn Srdharan (Sprnt Nextel) amesh Subbaraman, och Guérn (ESE, Unversty of Pennsylvana) Overvew of Problem Most modern wreless systems Delver hgh performance

More information

UNIT 11 TWO-PERSON ZERO-SUM GAMES WITH SADDLE POINT

UNIT 11 TWO-PERSON ZERO-SUM GAMES WITH SADDLE POINT UNIT TWO-PERSON ZERO-SUM GAMES WITH SADDLE POINT Structure. Introducton Obectves. Key Terms Used n Game Theory.3 The Maxmn-Mnmax Prncple.4 Summary.5 Solutons/Answers. INTRODUCTION In Game Theory, the word

More information

EE 508 Lecture 6. Degrees of Freedom The Approximation Problem

EE 508 Lecture 6. Degrees of Freedom The Approximation Problem EE 508 Lecture 6 Degrees of Freedom The Approxmaton Problem Revew from Last Tme Desgn Strategy Theorem: A crcut wth transfer functon T(s) can be obtaned from a crcut wth normalzed transfer functon T n

More information

MASTER TIMING AND TOF MODULE-

MASTER TIMING AND TOF MODULE- MASTER TMNG AND TOF MODULE- G. Mazaher Stanford Lnear Accelerator Center, Stanford Unversty, Stanford, CA 9409 USA SLAC-PUB-66 November 99 (/E) Abstract n conjuncton wth the development of a Beam Sze Montor

More information

A TWO-PLAYER MODEL FOR THE SIMULTANEOUS LOCATION OF FRANCHISING SERVICES WITH PREFERENTIAL RIGHTS

A TWO-PLAYER MODEL FOR THE SIMULTANEOUS LOCATION OF FRANCHISING SERVICES WITH PREFERENTIAL RIGHTS A TWO-PLAYER MODEL FOR THE SIMULTANEOUS LOCATION OF FRANCHISING SERVICES WITH PREFERENTIAL RIGHTS Pedro Godnho and oana Das Faculdade de Economa and GEMF Unversdade de Combra Av. Das da Slva 65 3004-5

More information

Revision of Lecture Twenty-One

Revision of Lecture Twenty-One Revson of Lecture Twenty-One FFT / IFFT most wdely found operatons n communcaton systems Important to know what are gong on nsde a FFT / IFFT algorthm Wth the ad of FFT / IFFT, ths lecture looks nto OFDM

More information

MODEL ORDER REDUCTION AND CONTROLLER DESIGN OF DISCRETE SYSTEM EMPLOYING REAL CODED GENETIC ALGORITHM J. S. Yadav, N. P. Patidar, J.

MODEL ORDER REDUCTION AND CONTROLLER DESIGN OF DISCRETE SYSTEM EMPLOYING REAL CODED GENETIC ALGORITHM J. S. Yadav, N. P. Patidar, J. ABSTRACT Research Artcle MODEL ORDER REDUCTION AND CONTROLLER DESIGN OF DISCRETE SYSTEM EMPLOYING REAL CODED GENETIC ALGORITHM J. S. Yadav, N. P. Patdar, J. Sngha Address for Correspondence Maulana Azad

More information

Priority based Dynamic Multiple Robot Path Planning

Priority based Dynamic Multiple Robot Path Planning 2nd Internatonal Conference on Autonomous obots and Agents Prorty based Dynamc Multple obot Path Plannng Abstract Taxong Zheng Department of Automaton Chongqng Unversty of Post and Telecommuncaton, Chna

More information

A study of turbo codes for multilevel modulations in Gaussian and mobile channels

A study of turbo codes for multilevel modulations in Gaussian and mobile channels A study of turbo codes for multlevel modulatons n Gaussan and moble channels Lamne Sylla and Paul Forter (sylla, forter)@gel.ulaval.ca Department of Electrcal and Computer Engneerng Laval Unversty, Ste-Foy,

More information

A High-Sensitivity Oversampling Digital Signal Detection Technique for CMOS Image Sensors Using Non-destructive Intermediate High-Speed Readout Mode

A High-Sensitivity Oversampling Digital Signal Detection Technique for CMOS Image Sensors Using Non-destructive Intermediate High-Speed Readout Mode A Hgh-Senstvty Oversamplng Dgtal Sgnal Detecton Technque for CMOS Image Sensors Usng Non-destructve Intermedate Hgh-Speed Readout Mode Shoj Kawahto*, Nobuhro Kawa** and Yoshak Tadokoro** *Research Insttute

More information

ANNUAL OF NAVIGATION 11/2006

ANNUAL OF NAVIGATION 11/2006 ANNUAL OF NAVIGATION 11/2006 TOMASZ PRACZYK Naval Unversty of Gdyna A FEEDFORWARD LINEAR NEURAL NETWORK WITH HEBBA SELFORGANIZATION IN RADAR IMAGE COMPRESSION ABSTRACT The artcle presents the applcaton

More information

Rejection of PSK Interference in DS-SS/PSK System Using Adaptive Transversal Filter with Conditional Response Recalculation

Rejection of PSK Interference in DS-SS/PSK System Using Adaptive Transversal Filter with Conditional Response Recalculation SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol., No., November 23, 3-9 Rejecton of PSK Interference n DS-SS/PSK System Usng Adaptve Transversal Flter wth Condtonal Response Recalculaton Zorca Nkolć, Bojan

More information
