Fast and Scalable Eigensolvers for Multicore and Hybrid Architectures
|
|
- Carmel Davis
- 5 years ago
- Views:
Transcription
1 Fast and Scalable Eigensolvers for Multicore and Hybrid Architectures Paolo Bientinesi AICES, RWTH Aachen 40th SPEEDUP Workshop on High-Performance Computing February 6 7, 2012 University of Basel, Switzerland Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
2 1 The Problem 2 Architectures and Libraries 3 Multicore Processors: MR 3 -SMP 4 Distributed Memory Architectures: PMRRR 5 GPUs 6 Conclusions Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
3 Symmetric Dense Eigenproblem AX = XΛ STDEIG AX = XBΛ GENEIG Input: A C n n, A H =A B C n n, SPD k, 1 k n #eigenpairs Output: X C n k, Λ R k k, eigenvectors eigenvalues Accuracy: AX XΛ, X H X I, residual orthogonality Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
4 6-stage approach GENEIG AX = XBΛ 1 LL H = B Cholesky factorization O(n 3 ) 2 M L 1 AL H Reduction to standard form O(n 3 ) 3 T = Q H MQ Reduction to tridiagonal form O(n 3 ) 4 T Z = ZΛ Tridiagonal eigenproblem O(kn) O(n 3 ) 5 Y = QZ Backtransformation #1 O(kn 2 ) 6 X = L H Y Backtransformation #2 O(kn 2 ) Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
5 Nested Eigensolvers GENEIG STDEIG TRDEIG 1 LL H = B Cholesky factorization O(n 3 ) 2 M L 1 AL H Reduction to standard form O(n 3 ) 3 T = Q H MQ Reduction to tridiagonal form O(n 3 ) 4 T Z = ZΛ Tridiagonal eigenproblem O(kn) O(n 3 ) 5 Y = QZ Backtransformation #1 O(kn 2 ) 6 X = L H Y Backtransformation #2 O(kn 2 ) Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
6 Algorithms Stage 4: TRDEIG 1958 Bisection + Inverse Iteration (BI) subsets O(kn 2 ) 1961 QR high-accuracy O(n 3 ) 1981 Divide & Conquer (DC) BLAS3, accurate O(n 3 ) 1997 MRRR subsets, no re-orth. O(kn) Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
7 Algorithms Stage 4: TRDEIG 1958 Bisection + Inverse Iteration (BI) subsets O(kn 2 ) 1961 QR high-accuracy O(n 3 ) 1981 Divide & Conquer (DC) BLAS3, accurate O(n 3 ) 1997 MRRR subsets, no re-orth. O(kn) Stage 3: Reduction to TRDEIG 1-stage Householder Successive Banded Reduction Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
8 1 The Problem 2 Architectures and Libraries 3 Multicore Processors: MR 3 -SMP 4 Distributed Memory Architectures: PMRRR 5 GPUs 6 Conclusions Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
9 Numerical Libs Development Cycle (?) (0) New architecture Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
10 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
11 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
12 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
13 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B LINPACK benchmark Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
14 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B LINPACK benchmark (2) BLAS3, factorizations, AX=B Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
15 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B LINPACK benchmark (2) BLAS3, factorizations, AX=B (2) factorizations, AX=B Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
16 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B LINPACK benchmark (2) BLAS3, factorizations, AX=B (2) factorizations, AX=B (2) factorizations, AX=B Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
17 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B LINPACK benchmark (2) BLAS3, factorizations, AX=B (2) factorizations, AX=B (2) factorizations, AX=B (3) factorizations, AX=B, matrix operations Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
18 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B LINPACK benchmark (2) BLAS3, factorizations, AX=B (2) factorizations, AX=B (2) factorizations, AX=B (3) factorizations, AX=B, matrix operations. Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
19 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B LINPACK benchmark (2) BLAS3, factorizations, AX=B (2) factorizations, AX=B (2) factorizations, AX=B (3) factorizations, AX=B, matrix operations.. (4) Eigenproblems Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
20 Numerical Libs Development Cycle (?) (0) New architecture (1) GEMM ( peak performance), FFT (2) BLAS3, factorizations, AX=B LINPACK benchmark (2) BLAS3, factorizations, AX=B (2) factorizations, AX=B (2) factorizations, AX=B (3) factorizations, AX=B, matrix operations.. (4) Eigenproblems HPC Linear solvers Eigensolvers Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
21 History Eigensolvers? : Cell GEMM: 99% FFT Linear systems: HPL 2008: Roadrunner > 1 PetaFLOP 2009: discontinued Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
22 History Eigensolvers? : GPGPUs CUBLAS (*) HPL, Top500 CULA FLAME, MAGMA Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
23 History: Eigensolvers?? : multicores GEMM mt BLAS HPL, Top500 FLAME, PLASMA Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
24 Our contributions MR 3 -SMP multithreaded Matthias Petschow RWTH Aachen PMRRR, EleMRRR hybrid MPI + MT Matthias Petschow RWTH Aachen Jack Poulson UT Austin GPUs Christian Lessig University of Toronto Enrique Quintana-Ortí Universidad Jaume I Francisco Igual Universidad Jaume I Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
25 1 The Problem 2 Architectures and Libraries 3 Multicore Processors: MR 3 -SMP 4 Distributed Memory Architectures: PMRRR 5 GPUs 6 Conclusions Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
26 Multi-threaded BLAS Xeon, 32 physical cores 1 Efficiency of GEMM 0.8 Efficiency Matrix dimension 1 thread 2 threads 4 threads 8 threads 16 threads 32 threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
27 Multi-threaded BLAS for TRDEIG? Tridiagonal eigensolvers. Matrix size=4289, from DFT. Time in seconds MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Time in seconds MRRR (MKL) MRRR (LAPACK) DC (MKL) Number of threads Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
28 Multi-threaded BLAS for TRDEIG? Tridiagonal eigensolvers. Matrix size=4289, from DFT. Time in seconds MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Time in seconds MRRR (MKL) MRRR (LAPACK) DC (MKL) Number of threads Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
29 Multi-threaded BLAS for TRDEIG? Tridiagonal eigensolvers. Matrix size=4289, from DFT. Time in seconds MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Time in seconds MRRR (MKL) MRRR (LAPACK) DC (MKL) Number of threads Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
30 Multi-threaded BLAS for TRDEIG? Tridiagonal eigensolvers. Matrix size=4289, from DFT. Time in seconds MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Time in seconds MRRR (MKL) MRRR (LAPACK) DC (MKL) Number of threads Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
31 Multi-threaded BLAS for TRDEIG? Tridiagonal eigensolvers. Matrix size=4289, from DFT. Time in seconds MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Time in seconds MRRR (MKL) MRRR (LAPACK) DC (MKL) Number of threads Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
32 More motivation? MR3 is O(n 2 ) anyway... Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
33 More motivation? MR3 is O(n 2 ) anyway... Fraction of execution time N = 4,289 Backtransformation Sequential MRRR Reduction Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
34 MRRR Dhillon & Parlett (1998) Multiple Relatively Robust Representations first stable algorithm to compute k eigenpairs in O(nk) ops no reorthogonalization Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
35 MRRR Dhillon & Parlett (1998) Multiple Relatively Robust Representations first stable algorithm to compute k eigenpairs in O(nk) ops no reorthogonalization 1) eigenvalues 2) eigenvectors + eigenvalues eigenvalues: dqds or Bisection eigenvectors: Compute 1-(λ, z) Scan λ s sep. cluster Shift New RRR Refine λ s Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
36 MRRR Dhillon & Parlett (1998) Multiple Relatively Robust Representations first stable algorithm to compute k eigenpairs in O(nk) ops no reorthogonalization 1) eigenvalues 2) eigenvectors + eigenvalues eigenvalues: dqds or Bisection eigenvectors: Compute 1-(λ, z) Scan λ s sep. cluster Shift New RRR Refine λ s Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
37 Representation Tree Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
38 MR 3 -SMP: the work queue Tasks: a) Singleton S: Eigenvector computation b) Cluster C: Shift + new representation (RRR) c) New RRR R: Eigenvalues refinement Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
39 Example trace: 16 cores eigenvectors Matrix size: Execution time: 3.3s Sequential: 49.3s (LAPACK) Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
40 MR 3 -SMP: Timings Matrix size=4289, from DFT. Time in seconds MR 3 SMP MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
41 MR 3 -SMP: Timings Matrix size=4289, from DFT. Time in seconds MR 3 SMP MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Time in seconds MR 3 SMP MRRR (MKL) MRRR (LAPACK) DC (MKL) Number of threads Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
42 A larger example Matrix size=16023; frequency response analysis of automobiles. 600 N = Time in minutes MR 3 SMP MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
43 A larger example Matrix size=16023; frequency response analysis of automobiles. 600 N = N = Time in minutes MR 3 SMP MRRR (MKL) DC (MKL) QR (MKL) BI (MKL) Time in seconds MR 3 SMP MRRR (MKL) MRRR (LAPACK) DC (MKL) Number of threads Number of threads From 9+ hours to 8.3 seconds. Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
44 Speedups Time in seconds Eigenvalues 4 Eigenvectors LAPACK Number of threads Speedup Ideal Eigenvalues (bisection) Eigenvectors (bisection) Eigenvectors(dqds) Total Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
45 3 stages: before and after Fraction of execution time N = 4,289 Backtransformation Sequential MRRR Reduction Number of threads Fraction of execution time N = 4,289 Backtransformation MR 3 SMP Reduction Number of threads Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
46 1 The Problem 2 Architectures and Libraries 3 Multicore Processors: MR 3 -SMP 4 Distributed Memory Architectures: PMRRR 5 GPUs 6 Conclusions Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
47 Distributed memory PMRRR, EleMRRR Static assignment of eigenpairs to nodes Multithreading Node-node communication: only eigenvalues PMRRR + Elemental EleMRRR Generalized, standard and tridiagonal hybrid eigensolvers Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
48 TRDEIG: PMRRR matrix Wilkinson matrix Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
49 GENEIG: Weak & strong scaling Weak scalability Strong scalability, n=20000 Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
50 GENEIG: Efficiency Matrix size 20k 40k 80k 1 Parallel efficiency EleMRRR ScaLAPACK s DC ScaLAPACK s MRRR Number of cores Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
51 1 The Problem 2 Architectures and Libraries 3 Multicore Processors: MR 3 -SMP 4 Distributed Memory Architectures: PMRRR 5 GPUs 6 Conclusions Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
52 TRDEIG NVIDIA Tesla mrrr_dp = data-parallel MRRR rand(0,1) rand(-1,1) n LAPACK mrrr_dp LAPACK mrrr_dp Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
53 STDEIG Nehalem, 8 cores Reduction to tridiagonal form n LAPACK SBR SBR + GPU Reduction + backtransformation n LAPACK SBR SBR + GPU Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
54 1 The Problem 2 Architectures and Libraries 3 Multicore Processors: MR 3 -SMP 4 Distributed Memory Architectures: PMRRR 5 GPUs 6 Conclusions Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
55 Conclusions Multi-threaded BLAS for eigensolvers: not THAT good MR 3 -SMP, PMRRR, EleMRRR eigensolvers tailored for multi-core, distributed, hybrid architectures faster than LAPACK, MKL, ScaLAPACK almost perfect speedups software is available Financial support from the Deutsche Forschungsgemeinschaft (German Research Association) through grant GSC 111 is gratefully acknowledged. Paolo Bientinesi (AICES, RWTH Aachen) Fast and Scalable Eigensolvers February 6, / 34
Application of Maxwell Equations to Human Body Modelling
Application of Maxwell Equations to Human Body Modelling Fumie Costen Room E, E0c at Sackville Street Building, fc@cs.man.ac.uk The University of Manchester, U.K. February 5, 0 Fumie Costen Room E, E0c
More informationDesign of Parallel Algorithms. Communication Algorithms
+ Design of Parallel Algorithms Communication Algorithms + Topic Overview n One-to-All Broadcast and All-to-One Reduction n All-to-All Broadcast and Reduction n All-Reduce and Prefix-Sum Operations n Scatter
More informationChallenges in Transition
Challenges in Transition Keynote talk at International Workshop on Software Engineering Methods for Parallel and High Performance Applications (SEM4HPC 2016) 1 Kazuaki Ishizaki IBM Research Tokyo kiszk@acm.org
More informationGPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links
DLR.de Chart 1 GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links Chen Tang chen.tang@dlr.de Institute of Communication and Navigation German Aerospace Center DLR.de Chart
More informationSignals and Systems. A signal is the representation of a physical wave
Signals and Systems A signal is the representation of a physical wave Expressed as a variable in time-space, for instance x(t) Signals that might vary are the voltage or current of a circuit, the force
More informationEarly Adopter : Multiprocessor Programming in the Undergraduate Program. NSF/TCPP Curriculum: Early Adoption at the University of Central Florida
Early Adopter : Multiprocessor Programming in the Undergraduate Program NSF/TCPP Curriculum: Early Adoption at the University of Central Florida Narsingh Deo Damian Dechev Mahadevan Vasudevan Department
More informationContribution to the Smecy Project
Alessio Pascucci Contribution to the Smecy Project Study some performance critical parts of Signal Processing Applications Study the parallelization methodology in order to achieve best performances on
More informationWhat can POP do for you?
What can POP do for you? Mike Dewar, NAG Ltd EU H2020 Center of Excellence (CoE) 1 October 2015 31 March 2018 Grant Agreement No 676553 Outline Overview of codes investigated Code audit & plan examples
More information6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS
6 TH INTERNATIONAL CONFERENCE ON APPLIED INTERNET AND INFORMATION TECHNOLOGIES 3-4 JUNE 2016, BITOLA, R. MACEDONIA PROCEEDINGS Editor: Publisher: Prof. Pece Mitrevski, PhD Faculty of Information and Communication
More informationDocument downloaded from:
Document downloaded from: http://hdl.handle.net/1251/64738 This paper must be cited as: Reaño González, C.; Pérez López, F.; Silla Jiménez, F. (215). On the design of a demo for exhibiting rcuda. 15th
More informationRecent Advances in Simulation Techniques and Tools
Recent Advances in Simulation Techniques and Tools Yuyang Li, li.yuyang(at)wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download Abstract: Simulation refers to using specified kind
More informationA GNU Radio-based Full Duplex Radio System
A GNU Radio-based Full Duplex Radio System Adam Parower The Aerospace Corporation September 13, 2017 2017 The Aerospace Corporation Agenda Theory: Full-Duplex What is Full Duplex? The Problem The Solution
More informationCP2K PERFORMANCE FROM CRAY XT3 TO XC30. Iain Bethune Fiona Reid Alfio Lazzaro
CP2K PERFORMANCE FROM CRAY XT3 TO XC30 Iain Bethune (ibethune@epcc.ed.ac.uk) Fiona Reid Alfio Lazzaro Outline CP2K Overview Features Parallel Algorithms Cray HPC Systems Trends Water Benchmarks 2005 2013
More informationMUMPS overview. MUMPS group, CERFACS, CNRS, ENS-Lyon, INRIA, INPT, Université Bordeaux 1
MUMPS overview Patrick Amestoy (INPT(ENSEEIHT)-IRIT), Abdou Guermouche (Univ. de Bordeaux), Jean-Yves L Excellent (Inria-LIP-ENS Lyon) MUMPS group, CERFACS, CNRS, ENS-Lyon, INRIA, INPT, Université Bordeaux
More informationParallel Dynamic and Selective Community Detection in Massive Streaming Graphs
Parallel Dynamic and Selective Community Detection in Massive Streaming Graphs European Conference on Data Analysis 2013, Luxembourg July 11, 2013 Christian L. Staudt, Yassine Marrakchi, Aleksejs Sazonovs
More informationPRACE PATC Course: Intel MIC Programming Workshop & Scientific Workshop: HPC for natural hazard assessment and disaster mitigation, June 2017,
PRACE PATC Course: Intel MIC Programming Workshop & Scientific Workshop: HPC for natural hazard assessment and disaster mitigation, 26-30 June 2017, LRZ CzeBaCCA Project Czech-Bavarian Competence Team
More informationGPUs: what are they good for?
GPUs: what are they good for? Mike Giles mike.giles@maths.ox.ac.uk Oxford e-research Centre University of Oxford Fujitsu Research Laboratories: Feb 1, 2011 GPUs p. 1 Outline CPUs and GPUs: comparison,
More informationComputational Simulations of The World s Biggest Eye on GPUs
Computational Simulations of The World s Biggest Eye on GPUs Hatem Ltaief Extreme Computing Research Center King Abdullah University of Science and Technology, Saudi Arabia NVIDIA GTC at San Jose, CA April
More informationThe Bump in the Road to Exaflops and Rethinking LINPACK
The Bump in the Road to Exaflops and Rethinking LINPACK Bob Meisner, Director Office of Advanced Simulation and Computing The Parker Ranch installation in Hawaii 1 Theme Actively preparing for imminent
More informationHybrid QR Factorization Algorithm for High Performance Computing Architectures. Peter Vouras Naval Research Laboratory Radar Division
Hybrid QR Factorization Algorithm for High Performance Computing Architectures Peter Vouras Naval Research Laboratory Radar Division 8/1/21 Professor G.G.L. Meyer Johns Hopkins University Parallel Computing
More informationCTWatch. the Promise and Perils of the Coming Multicore Revolution. and its Impact JACK DONGARRA ISSN VOLUME 3 NUMBER 1 FEBRUARY 2007
CTWatch ISSN 1555-9874 VOLUME 3 NUMBER 1 FEBRUARY 2007 the Promise and Perils of the Coming Multicore Revolution GUEST EDITOR and its Impact JACK DONGARRA CTWatch Quarterly February 2007 Introduction Over
More informationVirtual EM Prototyping: From Microwaves to Optics
Virtual EM Prototyping: From Microwaves to Optics Dr. Frank Demming, CST AG Dr. Avri Frenkel, Anafa Electromagnetic Solutions Virtual EM Prototyping Efficient Maxwell Equations solvers has been developed,
More informationWhen Should You Apply 3D Planar EM Simulation?
When Should You Apply 3D Planar EM Simulation? Agilent EEsof EDA IMS 2010 MicroApps Andy Howard Agilent Technologies 1 3D planar EM is now much more of a design tool Solves bigger problems and runs faster
More informationSCAI SuperComputing Application & Innovation. Sanzio Bassini October 2017
SCAI SuperComputing Application & Innovation Sanzio Bassini October 2017 The Consortium Private non for Profit Organization Founded in 1969 by Ministry of Public Education now under the control of Ministry
More informationPerformance Evaluation Of OFDM Based Wireless Communication Systems Using Graphics Processing Unit (GPU) Based High Performance Computing.
Performance Evaluation Of OFDM Based Wireless Communication Systems Using Graphics Processing Unit (GPU) Based High Performance Computing. A Thesis submitted in partial fulfillment of the Requirements
More informationLecture 8: Introduction to Hybrid FEM IE
Lecture 8: Introduction to Hybrid FEM IE 2015.0 Release ANSYS HFSS for Antenna Design 1 2015 ANSYS, Inc. Hybrid FEM-IE Solution Using HFSS and HFSS-IE Advantages of Hybrid Solution Leverage the strength
More informationA PageRank Algorithm based on Asynchronous Gauss-Seidel Iterations
Simulation A PageRank Algorithm based on Asynchronous Gauss-Seidel Iterations D. Silvestre, J. Hespanha and C. Silvestre 2018 American Control Conference Milwaukee June 27-29 2018 Silvestre, Hespanha and
More informationDr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar. Data programming model for an operation based parallel image processing system
Name: Affiliation: Field of research: Specific Field of Study: Proposed Research Topic: Dr Myat Su Hlaing Asia Research Center, Yangon University, Myanmar Information Science and Technology Computer Science
More informationSolving Large Multi-Scale Problems in CST STUDIO SUITE
Solving Large Multi-Scale Problems in CST STUDIO SUITE An Aircraft Application M. Kunze, Z. Reznicek, I. Munteanu, P. Tobola, F. Wolfheimer Motivation I New A/C concepts (fly-by-wire, all electric aircraft,
More informationOptimization of Tile Sets for DNA Self- Assembly
Optimization of Tile Sets for DNA Self- Assembly Joel Gawarecki Department of Computer Science Simpson College Indianola, IA 50125 joel.gawarecki@my.simpson.edu Adam Smith Department of Computer Science
More informationCUDA-Accelerated Satellite Communication Demodulation
CUDA-Accelerated Satellite Communication Demodulation Renliang Zhao, Ying Liu, Liheng Jian, Zhongya Wang School of Computer and Control University of Chinese Academy of Sciences Outline Motivation Related
More informationDevelopment of a parallel, tree-based neighbour-search algorithm
Mitglied der Helmholtz-Gemeinschaft Development of a parallel, tree-based neighbour-search algorithm for the tree-code PEPC 28.09.2010 Andreas Breslau Outline 1 Motivation 2 Short introduction to tree-codes
More informationCUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads
Terminology CUDA Threads Bedrich Benes, Ph.D. Purdue University Department of Computer Graphics Streaming Multiprocessor (SM) A SM processes block of threads Streaming Processors (SP) also called CUDA
More informationProgramming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Boot Camp
Programming and Optimization with Intel Xeon Phi Coprocessors Colfax Developer Training One-day Boot Camp Abstract: Colfax Developer Training (CDT) is an in-depth intensive course on efficient parallel
More informationNVIDIA GPU Computing Theater
NVIDIA GPU Computing Theater The theater will feature talks given by experts on a wide range of topics on high performance computing. Open to all attendees of SC10, the theater is located in the NVIDIA
More informationMultiple Clock and Voltage Domains for Chip Multi Processors
Multiple Clock and Voltage Domains for Chip Multi Processors Efraim Rotem- Intel Corporation Israel Avi Mendelson- Microsoft R&D Israel Ran Ginosar- Technion Israel institute of Technology Uri Weiser-
More informationModule 3 Greedy Strategy
Module 3 Greedy Strategy Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Introduction to Greedy Technique Main
More informationVampir Getting Started. Holger Brunst March 4th 2008
Vampir Getting Started Holger Brunst holger.brunst@tu-dresden.de March 4th 2008 What is Vampir? Program Monitoring, Visualization, and Analysis 1. Step: VampirTrace monitors your program s runtime behavior
More informationAccelerated Impulse Response Calculation for Indoor Optical Communication Channels
Accelerated Impulse Response Calculation for Indoor Optical Communication Channels M. Rahaim, J. Carruthers, and T.D.C. Little Department of Electrical and Computer Engineering Boston University, Boston,
More informationPRACE PATC Course Intel MIC Programming Workshop. February, 7-8, 2017, IT4Innovations, Ostrava, Czech Republic
PRACE PATC Course Intel MIC Programming Workshop February, 7-8, 2017, IT4Innovations, Ostrava, Czech Republic LRZ in the HPC Environment Bavarian Contribution to National Infrastructure HLRS@Stuttgart
More informationA Parallel Monte-Carlo Tree Search Algorithm
A Parallel Monte-Carlo Tree Search Algorithm Tristan Cazenave and Nicolas Jouandeau LIASD, Université Paris 8, 93526, Saint-Denis, France cazenave@ai.univ-paris8.fr n@ai.univ-paris8.fr Abstract. Monte-Carlo
More informationScalable Multi-Precision Simulation of Spiking Neural Networks on GPU with OpenCL
Scalable Multi-Precision Simulation of Spiking Neural Networks on GPU with OpenCL Dmitri Yudanov (Advanced Micro Devices, USA) Leon Reznik (Rochester Institute of Technology, USA) WCCI 2012, IJCNN, June
More informationDecember 10, Why HPC? Daniel Lucio.
December 10, 2015 Why HPC? Daniel Lucio dlucio@utk.edu A revolution in astronomy Galileo Galilei - 1609 2 What is HPC? "High-Performance Computing," or HPC, is the application of "supercomputers" to computational
More informationBen Baker. Sponsored by:
Ben Baker Sponsored by: Background Agenda GPU Computing Digital Image Processing at FamilySearch Potential GPU based solutions Performance Testing Results Conclusions and Future Work 2 CPU vs. GPU Architecture
More informationTrend of Software R&D for Numerical Simulation Hardware for parallel and distributed computing and software automatic tuning
SCIENCE & TECHNOLOGY TRENDS 4 Trend of Software R&D for Numerical Simulation Hardware for parallel and distributed computing and software automatic tuning Takao Furukawa Promoted Fields Unit Minoru Nomura
More informationRe-Visiting Power Measurement for the Green500
Re-Visiting Power Measurement for the Green500 Thomas R. W. Scogland (LLNL/CASC, Green500) The Green500 List and its Continuing 1 Evolution BoF, November 2014 Level 1 Requirements Workload phase: Measure
More informationEuropean Exascale Software Initiative: Numerical Libraries, Solvers and Algorithms
European Exascale Software Initiative: Numerical Libraries, Solvers and Algorithms Iain Duff STFC Rutherford Appleton Laboratory and CERFACS HPSS 2011. EuroPar 2011, Bordeaux, France. August 29 2011 2
More informationHardware Software Science Co-design in the Human Brain Project
Hardware Software Science Co-design in the Human Brain Project Wouter Klijn 29-11-2016 Pune, India 1 Content The Human Brain Project Hardware - HBP Pilot machines Software - A Neuron - NestMC: NEST Multi
More informationPetascale Quantum Simulations of Nano Systems and Biomolecules
Petascale Quantum Simulations of Nano Systems and Biomolecules J. Bernholc, E. Briggs, W. Lu,Y. Li and M. Hodak North Carolina State University, Raleigh I. RMG petascale, open-source electronic structure
More informationNRC Workshop on NASA s Modeling, Simulation, and Information Systems and Processing Technology
NRC Workshop on NASA s Modeling, Simulation, and Information Systems and Processing Technology Bronson Messer Director of Science National Center for Computational Sciences & Senior R&D Staff Oak Ridge
More informationHalf- and Full-Duplex FDD Operation in Cellular Multi-Hop Mobile Radio Networks
5 th FFV Workshop Half- and Full-Duplex FDD Operation in Cellular Multi-Hop Mobile Radio Networks Arif Otyakmaz, Rainer Schoenen Department of Communication Networks RWTH Aachen University, Germany FFV
More informationBuilding a Cell Ecosystem. David A. Bader
Building a Cell Ecosystem David A. Bader Acknowledgment of Support National Science Foundation CSR: A Framework for Optimizing Scientific Applications (06-14915) CAREER: High-Performance Algorithms for
More informationEnergy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture
Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture Jingwen Leng Yazhou Zu Vijay Janapa Reddi The University of Texas at Austin {jingwen, yazhou.zu}@utexas.edu,
More informationJoint MT/CSEM Anisotropic Inversion Olympic Dam
Joint MT/CSEM Anisotropic Inversion Olympic Dam T.J. Ritchie* P.A. Rowston* Practical 1 Day Workshop Geophysical Inversion for Mineral Explorers * Geophysical Resources and Services Pty. Ltd. Brisbane
More informationLS-DYNA Performance Enhancement of Fan Blade Off Simulation on Cray XC40
LS-DYNA Performance Enhancement of Fan Blade Off Simulation on Cray XC40 Ting-Ting Zhu, Cray Inc. Jason Wang, LSTC Brian Wainscott, LSTC Abstract This work uses LS-DYNA to enhance the performance of engine
More informationPerformance Metrics, Amdahl s Law
ecture 26 Computer Science 61C Spring 2017 March 20th, 2017 Performance Metrics, Amdahl s Law 1 New-School Machine Structures (It s a bit more complicated!) Software Hardware Parallel Requests Assigned
More informationProc. IEEE Intern. Conf. on Application Specific Array Processors, (Eds. Capello et. al.), IEEE Computer Society Press, 1995, 76-84
Proc. EEE ntern. Conf. on Application Specific Array Processors, (Eds. Capello et. al.), EEE Computer Society Press, 1995, 76-84 Session 2: Architectures 77 toning speed is affected by the huge amount
More informationCOTSon: Infrastructure for system-level simulation
COTSon: Infrastructure for system-level simulation Ayose Falcón, Paolo Faraboschi, Daniel Ortega HP Labs Exascale Computing Lab http://sites.google.com/site/hplabscotson MICRO-41 tutorial November 9, 28
More informationSno Projects List IEEE. High - Throughput Finite Field Multipliers Using Redundant Basis For FPGA And ASIC Implementations
Sno Projects List IEEE 1 High - Throughput Finite Field Multipliers Using Redundant Basis For FPGA And ASIC Implementations 2 A Generalized Algorithm And Reconfigurable Architecture For Efficient And Scalable
More informationModule 3 Greedy Strategy
Module 3 Greedy Strategy Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Introduction to Greedy Technique Main
More informationApplication-Specific Node Clustering of IR-UWB Sensor Networks with Two Classes of Nodes
Application-Specific Node Clustering of IR-UWB Sensor Networks with Two Classes of Nodes Daniel Bielefeld 1, Gernot Fabeck 2, Rudolf Mathar 3 Institute for Theoretical Information Technology, RWTH Aachen
More informationCellSpecks: A Software for Automated Detection and Analysis of Calcium
Biophysical Journal, Volume 115 Supplemental Information CellSpecks: A Software for Automated Detection and Analysis of Calcium Channels in Live Cells Syed Islamuddin Shah, Martin Smith, Divya Swaminathan,
More informationComputational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs and GPUs
5 th International Conference on Logic and Application LAP 2016 Dubrovnik, Croatia, September 19-23, 2016 Computational Efficiency of the GF and the RMF Transforms for Quaternary Logic Functions on CPUs
More informationThe end of Moore s law and the race for performance
The end of Moore s law and the race for performance Michael Resch (HLRS) September 15, 2016, Basel, Switzerland Roadmap Motivation (HPC@HLRS) Moore s law Options Outlook HPC@HLRS Cray XC40 Hazelhen 185.376
More informationDecentralized Data Detection for Massive MU-MIMO on a Xeon Phi Cluster
Decentralized Data Detection for Massive MU-MIMO on a Xeon Phi Cluster Kaipeng Li 1, Yujun Chen 1, Rishi Sharan 2, Tom Goldstein 3, Joseph R. Cavallaro 1, and Christoph Studer 2 1 Department of Electrical
More informationApplication of Multi-core and GPU Architectures on Signal Processing: Case Studies
Application of Multi-core and GPU Architectures on Signal Processing: Case Studies Alberto Gonzalez 1, José A. Belloch 1, Gema Piñero 1, Jorge Lorente 1, Miguel Ferrer 1, Sandra Roger 1, Carles Roig 1,
More informationAutomatic Domain Decomposition for a Black-Box PDE Solver
Automatic Domain Decomposition for a Black-Box PDE Solver Torsten Adolph and Willi Schönauer Forschungszentrum Karlsruhe Institute for Scientific Computing Karlsruhe, Germany torsten.adolph@iwr.fzk.de
More informationExploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs
Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Michael Gordon, William Thies, and Saman Amarasinghe Massachusetts Institute of Technology ASPLOS October 2006 San Jose,
More informationMonte Carlo integration and event generation on GPU and their application to particle physics
Monte Carlo integration and event generation on GPU and their application to particle physics Junichi Kanzaki (KEK) GPU2016 @ Rome, Italy Sep. 26, 2016 Motivation Increase of amount of LHC data (raw &
More informationGPU ACCELERATED DEEP LEARNING WITH CUDNN
GPU ACCELERATED DEEP LEARNING WITH CUDNN Larry Brown Ph.D. March 2015 AGENDA 1 Introducing cudnn and GPUs 2 Deep Learning Context 3 cudnn V2 4 Using cudnn 2 Introducing cudnn and GPUs 3 HOW GPU ACCELERATION
More informationLecture 20: Combinatorial Search (1997) Steven Skiena. skiena
Lecture 20: Combinatorial Search (1997) Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena Give an O(n lg k)-time algorithm
More informationArchitecting Systems of the Future, page 1
Architecting Systems of the Future featuring Eric Werner interviewed by Suzanne Miller ---------------------------------------------------------------------------------------------Suzanne Miller: Welcome
More informationR and the Message Passing Interface on the Little Fe Cluster
the Little Fe October 3, 2012 O Discussion Topics Overview Little Fe BCCD Parallel Programming MPI R with MPI Results R with CUDA Conclusion O Overview At SuperComputing 2011, the University of Houston
More informationReal-Time Software Receiver Using Massively Parallel
Real-Time Software Receiver Using Massively Parallel Processors for GPS Adaptive Antenna Array Processing Jiwon Seo, David De Lorenzo, Sherman Lo, Per Enge, Stanford University Yu-Hsuan Chen, National
More informationST Tool. A CASE tool for security aware software requirements analysis
ST Tool A CASE tool for security aware software requirements analysis Paolo Giorgini Fabio Massacci John Mylopoulos Nicola Zannone Departement of Information and Communication Technology University of
More informationSpectrum Requirements for 4G Wireless Systems
Spectrum Requirements for 4G Wireless Systems Tim Irnich ComNets, RWTH Aachen University FFV Workshop, 30.3.2007 1 Outline Introduction Radio Spectrum Management Why? The ITU framework for spectrum management
More informationCombining Differential/Integral Methods and Time/Frequency Domain Analysis to Solve Complex Antenna Problems
Combining Differential/Integral Methods and Time/Frequency Domain Analysis to Solve Complex Antenna Problems IEEE Long Island Section MTT-S Jan. 27, 20 Overview of Presentation Antenna design challenges
More informationAlgorithmic-Technique for Compensating Memory Errors in JPEG2000 Standard
Algorithmic-Technique for Compensating Memory Errors in JPEG2000 Standard M. Pradeep Raj 1, E.Dinesh 2 PG Student, Dept of ECE, M. Kumarasamy College of Engineering, Karur, Tamilnadu, India 1 Asst. Professor,
More informationFAST RADIX 2, 3, 4, AND 5 KERNELS FOR FAST FOURIER TRANSFORMATIONS ON COMPUTERS WITH OVERLAPPING MULTIPLY ADD INSTRUCTIONS
SIAM J. SCI. COMPUT. c 1997 Society for Industrial and Applied Mathematics Vol. 18, No. 6, pp. 1605 1611, November 1997 005 FAST RADIX 2, 3, 4, AND 5 KERNELS FOR FAST FOURIER TRANSFORMATIONS ON COMPUTERS
More information2017 by Bilge Acun. All rights reserved.
2017 by Bilge Acun. All rights reserved. MITIGATING VARIABILITY IN HPC SYSTEMS AND APPLICATIONS FOR PERFORMANCE AND POWER EFFICIENCY BY BILGE ACUN DISSERTATION Submitted in partial fulfillment of the requirements
More informationPROGRESSIVE CHANNEL ESTIMATION FOR ULTRA LOW LATENCY MILLIMETER WAVE COMMUNICATIONS
PROGRESSIVECHANNELESTIMATIONFOR ULTRA LOWLATENCYMILLIMETER WAVECOMMUNICATIONS Hung YiCheng,Ching ChunLiao,andAn Yeu(Andy)Wu,Fellow,IEEE Graduate Institute of Electronics Engineering, National Taiwan University
More informationAdvanced Computer Architecture - Baylor University The World s Most Advanced Technology For Solving The Nuix...
Advanced Parallel Processing Technologies 9th International Symposium Appt 2011 Shanghai China September 26 27 2011 Proceedings Lecture Notes In Computer Science ADVANCED PARALLEL PROCESSING TECHNOLOGIES
More informationMACHINE LEARNING Games and Beyond. Calvin Lin, NVIDIA
MACHINE LEARNING Games and Beyond Calvin Lin, NVIDIA THE MACHINE LEARNING ERA IS HERE And it is transforming every industry... including Game Development OVERVIEW NVIDIA Volta: An Architecture for Machine
More information1) Evolução das velocidades de processamento, de acesso a memória e ao disco e das interfaces de rede - Um apanhado histórico até os dias de hoje
2010 1) Evolução das velocidades de processamento, de acesso a memória e ao disco e das interfaces de rede - Um apanhado histórico até os dias de hoje 2) Green Computing - Qual a perspectiva em organização
More informationCourse Overview. Dr. Edmund Lam. Department of Electrical and Electronic Engineering The University of Hong Kong
Course Dr. Edmund Lam Department of Electrical and Electronic Engineering The University of Hong Kong ELEC8601: Advanced Topics in Image Processing (Second Semester, 2013 14) http://www.eee.hku.hk/ work8601
More informationParallelized Benchmark-Driven Performance Evaluation of SMPs and Tiled Multi-Core Architectures for Embedded Systems
Parallelized Benchmark-Driven Performance Evaluation of SMPs and Tiled Multi-Core Architectures for Embedded Systems Arslan Munir Department of Electrical and Computer Engineering Rice University, Houston,
More informationEESI Presentation at IESP
Presentation at IESP San Francisco, April 6, 2011 WG 3.1 : Applications in Energy & Transportation Chair: Philippe RICOUX (TOTAL) Vice-Chair: Jean-Claude ANDRE (CERFACS) 1 WG3.1 Scientific and Technical
More informationKnow your Algorithm! Architectural Trade-offs in the Implementation of a Viterbi Decoder. Matthias Kamuf,
Know your Algorithm! Architectural Trade-offs in the Implementation of a Viterbi Decoder Matthias Kamuf, 2009-12-08 Agenda Quick primer on communication and coding The Viterbi algorithm Observations to
More informationExtreme Scale Computational Science Challenges in Fusion Energy Research
Extreme Scale Computational Science Challenges in Fusion Energy Research William M. Tang Princeton University, Plasma Physics Laboratory Princeton, NJ USA International Advanced Research 2012 Workshop
More informationThreading libraries performance when applied to image acquisition and processing in a forensic application
Threading libraries performance when applied to image acquisition and processing in a forensic application Carlos Bermúdez MSc. in Photonics, Universitat Politècnica de Catalunya, Barcelona, Spain Student
More informationProgramming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Labs CDT 102
Programming and Optimization with Intel Xeon Phi Coprocessors Colfax Developer Training One-day Labs CDT 102 Abstract: Colfax Developer Training (CDT) is an in-depth intensive course on efficient parallel
More informationIMPLEMENTATION OF SOFTWARE-BASED 2X2 MIMO LTE BASE STATION SYSTEM USING GPU
IMPLEMENTATION OF SOFTWARE-BASED 2X2 MIMO LTE BASE STATION SYSTEM USING GPU Seunghak Lee (HY-SDR Research Center, Hanyang Univ., Seoul, South Korea; invincible@dsplab.hanyang.ac.kr); Chiyoung Ahn (HY-SDR
More informationArchitectural and Technology Influence on the Optimal Total Power Consumption
Architectural and Technology Influence on the Optimal Total Power Consumption Schuster Christian 1, Nagel Jean-Luc 1, Piguet Christian, Farine Pierre-André 1 1 IMT, University of Neuchâtel, Switzerland
More informationWavelet-based image compression
Institut Mines-Telecom Wavelet-based image compression Marco Cagnazzo Multimedia Compression Outline Introduction Discrete wavelet transform and multiresolution analysis Filter banks and DWT Multiresolution
More informationParallel Computing in the Multicore Era
Parallel Computing in the Multicore Era Mikel Lujan & Graham Riley 21 st September 2016 Combining the strengths of UMIST and The Victoria University of Manchester MSc in Advanced Computer Science Theme
More informationCenter for Hybrid Multicore Productivity Research (CHMPR)
A CISE-funded Center University of Maryland, Baltimore County, Milton Halem, Director, 410.455.3140, halem@umbc.edu University of California San Diego, Sheldon Brown, Site Director, 858.534.2423, sgbrown@ucsd.edu
More informationB(, ) + + / = B(, ) B( +, ) B(, ) B( +, ) B( + +, ) B( +, ) B( +, ) B( +, ) B( +, ) = --xoptflags="-g -xmic-avx512 -O3 -mp2opt_hpo_vec_remainder=f" --with-memalign=64 = = ( + + [ + + + + ] ) + + σ +
More informationParallel Computing in the Multicore Era
Parallel Computing in the Multicore Era Prof. John Gurd 18 th September 2014 Combining the strengths of UMIST and The Victoria University of Manchester MSc in Advanced Computer Science Theme on Routine
More informationIMPULSIVE NOISE MITIGATION IN OFDM SYSTEMS USING SPARSE BAYESIAN LEARNING
IMPULSIVE NOISE MITIGATION IN OFDM SYSTEMS USING SPARSE BAYESIAN LEARNING Jing Lin, Marcel Nassar and Brian L. Evans Department of Electrical and Computer Engineering The University of Texas at Austin
More informationDisclaimer. Primer. Agenda. previous work at the EIT Department, activities at Ericsson
Disclaimer Know your Algorithm! Architectural Trade-offs in the Implementation of a Viterbi Decoder This presentation is based on my previous work at the EIT Department, and is not connected to current
More information