22nd December Dear Sir/Madam:


Jose Renau
Siebel Center for Computer Science
N. Goodwin
Urbana, IL
Homepage
Phone (217) (mobile) (217) (work)

22nd December 2003

Dear Sir/Madam:

Please find enclosed my application for a tenure-track Assistant Professor position in your department. I will finish my Ph.D. in computer science at the University of Illinois at Urbana-Champaign this coming summer, and would be available to start working in the fall.

My research area is Computer Architecture. I have been working under the supervision of Professor Josep Torrellas. I have broad expertise in computer architecture, which includes chip and processor architecture, multiprocessor systems architecture, low power design, Thread Level Speculation (TLS), and processor-in-memory systems. I have also worked on compilation support for new architectures and Linux kernel development. Finally, I have developed substantial software systems, including a simulator for computer architectures and a TLS compiler.

Please find enclosed my resume, statement of research and teaching, and samples of my publications. All these documents are also available at my website.

I am very excited at the opportunity of contributing to your department. I am sure that I can enhance its visibility and reputation with my work. I look forward to hearing from you about an interview.

Yours sincerely,
Jose Renau

Personal Information

Jose Renau
Citizenship: Spain
Siebel Center for Computer Science
201 N. Goodwin
Urbana, IL
Homepage
Phone (217) (mobile) (217) (work)

Research Interests

Computer architecture, chip multiprocessors, energy/performance trade-offs, thread level speculation, interaction between architecture and compilers, Linux kernel.

Education

University of Illinois at Urbana-Champaign (Advisor: Professor Josep Torrellas)

2004 (expected) Ph.D. Computer Science
Thesis: Chip Multiprocessor with Thread Level Speculation: Performance and Energy
The thesis challenges, for the first time, the commonly-held view that Thread Level Speculation (TLS) consumes excessive energy. It also proposes novel micro-architectural mechanisms to support out-of-order task spawning in Chip Multiprocessors (CMP) with TLS. The experimental work included the development of a full TLS compiler.

M.S. Computer Science
Thesis: Memory Hierarchies in Intelligent Memories: Energy/Performance Design
The thesis describes the FlexRAM architecture, focusing on energy, performance, and complexity issues. FlexRAM is a processor-in-memory architecture.

Ramon Llull University, Spain

1997 M.S. Computer Science
Thesis: Linux Kernel IEEE1284 Implementation
The thesis consisted of building TCP/IP over IEEE1284 and SCSI in Linux. The implementation also included drivers.

B.S. Computer Science
Final project: ILZR, a New Data Compression Algorithm

Awards

IBM Graduate Research Fellowship ( )
J. Poppelbaum Memorial Award, University of Illinois (2003). Given to one graduate student every year for academic merit and creativity in computer architecture.

Publications

Conferences and Journals

[1] Speculative Multithreading Does Not (Necessarily) Waste Energy, Jose Renau, Smruti Sarangi, James Tuck, Karin Strauss, Luis Ceze, Wei Liu, and Josep Torrellas, Submitted to International Symposium on Computer Architecture (ISCA), November
[2] TLS Chip Multiprocessors: Micro-Architectural Mechanisms for Tasking with Out-of-Order Spawn, Jose Renau, James Tuck, Wei Liu, Luis Ceze, Karin Strauss, and Josep Torrellas, Submitted to International Symposium on Computer Architecture (ISCA), November
[3] Managing Multiple Low-Power Adaptation Techniques: The Positional Approach, Michael Huang, Jose Renau, and Josep Torrellas, Sidebar, IEEE Computer Magazine, December
[4] Programming the FlexRAM Parallel Intelligent Memory System, Basilio Fraguela, Jose Renau, Paul Feautrier, David Padua, and Josep Torrellas, International Symposium on Principles and Practice of Parallel Programming (PPoPP), June
[5] Positional Adaptation of Processors: Application to Energy Reduction, Michael Huang, Jose Renau, and Josep Torrellas, International Symposium on Computer Architecture (ISCA), June
[6] Cherry: Checkpointed Early Resource Recycling in Out-of-order Microprocessors, José F. Martínez, Jose Renau, Michael Huang, Milos Prvulovic, and Josep Torrellas, International Symposium on Microarchitecture (MICRO), November
[7] Energy-Efficient Hybrid Wakeup Logic, Michael Huang, Jose Renau, and Josep Torrellas, International Symposium on Low Power Electronics and Design (ISLPED), August
[8] A Framework for Dynamic Energy Efficiency and Temperature Management, Wei Huang, Jose Renau, and Josep Torrellas, Journal on Instruction Level Parallelism (JILP), October
[9] Cache Decomposition for Energy-Efficient Processors, Michael Huang, Jose Renau, Seung-Moon Yoo, and Josep Torrellas, International Symposium on Low Power Electronics and Design (ISLPED), August
[10] A Framework for Dynamic Energy Efficiency and Temperature Management, Wei Huang, Jose Renau, Seung-Moon Yoo, and Josep Torrellas, International Symposium on Microarchitecture (MICRO), December

Workshops

[11] Profile-Based Energy Reduction for High Performance, Wei Huang, Jose Renau, and Josep Torrellas, ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO), December
[12] Energy/Performance Design of Memory Hierarchies for Processor-In-Memory Chips, Wei Huang, Jose Renau, Seung-Moon Yoo, and Josep Torrellas, Workshop on Intelligent Memory Systems, November. It also appeared in Lecture Notes in Computer Science (Vol. 2107) by Springer-Verlag.
[13] Memory Hierarchies in Intelligent Memories: Energy/Performance Design, Wei Huang, Jose Renau, Seung-Moon Yoo, and Josep Torrellas, Ninth Workshop on Scalable Shared Memory Multiprocessors, June

Technical Reports and Theses

[14] CFlex: A Programming Language for the FlexRAM Intelligent Memory Architecture, Basilio Fraguela, Jose Renau, Paul Feautrier, David Padua, and Josep Torrellas, Technical Report UIUCDCS-R , July
[15] FlexRAM Architecture Design Parameters, Seung-Moon Yoo, Jose Renau, Wei Huang, and Josep Torrellas, Technical Report 1584, October
[16] Memory Hierarchies in Intelligent Memories: Energy/Performance Design, Jose Renau, M.S. Thesis, University of Illinois, December
[17] Linux Kernel IEEE1284 Implementation, Jose Renau, M.S. Thesis, Ramon Llull University, June

Talks

As Presenter at Conferences/Workshops

Cherry: Checkpointed Early Resource Recycling in Out-of-order Microprocessors, International Symposium on Microarchitecture (MICRO), November
Cache Decomposition for Energy-Efficient Processors, International Symposium on Low Power Electronics and Design (ISLPED), August
Memory Hierarchies in Intelligent Memories: Energy/Performance Design, The Ninth Workshop on Scalable Shared Memory Multiprocessors, June

As Invited Speaker

Architectural Support for Hierarchical Thread-Level Speculation, IBM T.J. Watson Research Center, New York, August

As Presenter in DARPA PI Meeting

Morphable Multithreaded Memory Tiles (M3T) Architecture, IBM T.J. Watson Research Center, New York, April

Software Created

Designed and implemented a new simulator of computer architectures (Sesc). It is used by several research groups at the University of Illinois, University of Rochester, North Carolina State University, Georgia Institute of Technology, and Cornell University. It models a variety of architectures, including dynamic superscalar processors, CMPs, processor-in-memory, and TLS architectures.

Created a fully automatic TLS compiler pass using GCC. It generates tasks with software value prediction. This is the compiler used to evaluate the architecture proposed in my Ph.D. thesis.

Made some extensions to CACTI, a widely used cache power model. The extensions have been used at the University of Illinois, University of Rochester, North Carolina State University, U.C. Davis, U.C. Irvine, U.C. Riverside, and University of Arizona.

Contributed to official Shared Memory Multiprocessors (SMP) Linux patches to support SMP boards. These patches are included in all the Linux kernel distributions since

Co-developed the IEEE 1284 (parallel port) implementation in Linux. This implementation is included in all Linux kernels since

Developed official GCC patches, which are included in the main distribution (2002).

Developed TCP/IP over SCSI boards, which involved several modifications to the Linux kernel to support a high-performance interconnection system between Linux machines.

Invented a new data compression algorithm (ILZR), a variant of Lempel-Ziv Ross Williams, distributed as public domain for Amiga computers in Aminet-CD (1993).

Developed the superscalar simulation infrastructure used by the Architecture Group at the Computer Science Department of Ramon Llull University ( ).

Teaching Experience

Substitute teacher for some senior- and graduate-level computer architecture classes at the University of Illinois (2002, 2003).

Tutored graduate students at the University of Illinois (2002, 2003).

Created and taught a course for system administrators at Ramon Llull University, Spain. The course was 4 hours a week for 10 weeks (1997).

Professional Experience

Jan 1999-Aug 2003: Research Assistant. University of Illinois at Urbana-Champaign.
Aug 1998-Dec 1998: System Administrator. University of Illinois at Urbana-Champaign. Worked for the Computing and Communications Services Office.
Jan 1998-Jul 1998: Computer Network Specialist. FIHOCA, S.A. (Spain).
Sep 1996-Sep 1997: System Administrator. Asertel, S.A. (Spain). In charge of the computer infrastructure. Specialized in network security.
May 1995-Sep 1996: Systems Manager. Ramon Llull University (Spain). In charge of the administration of the UNIX machines, PCs, and the network of the University.

Professional Activities and Memberships

Reviewer of papers for conferences and journals in computer architecture (ISCA, MICRO, HPCA, ICS, CAL, and IPDPS).
ACM member since

References

Josep Torrellas (advisor)
Professor & Willet Faculty Scholar
Department of Computer Science
University of Illinois at Urbana-Champaign
Siebel Center for Computer Science
201 North Goodwin
Urbana, IL
(217)

David Padua
Professor
Department of Computer Science
University of Illinois at Urbana-Champaign
Siebel Center for Computer Science
201 North Goodwin
Urbana, IL
(217)

Wen-Mei Hwu
Franklin W. Woeltge Professor
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
215 Coordinated Science Laboratory
1308 West Main
Urbana, IL
(217)

Mark Snir
Faiman/Muroga Professor
Head, Department of Computer Science
University of Illinois at Urbana-Champaign
Siebel Center for Computer Science

Sarita Adve
Associate Professor
Department of Computer Science
University of Illinois at Urbana-Champaign
Siebel Center for Computer Science

Appendix: Abstracts of my Conference Papers

[1] Speculative Multithreading Does Not (Necessarily) Waste Energy (Submitted to ISCA 2004)

While Chip Multiprocessors (CMP) with Speculative Multithreading (SM) have been gaining momentum, experienced processor designers in industry have reservations about their practical implementation. In particular, it is felt that SM is too energy-inefficient to compete against conventional superscalars. This paper challenges the commonly-held view that SM consumes excessive energy. We show a CMP with SM support that is not only faster but also more energy efficient than a state-of-the-art wide-issue superscalar. We demonstrate it with a new energy-efficient CMP micro-architecture. In addition, we identify the additional sources of energy consumption in SM, and propose energy-centric optimizations that mitigate them. Experiments with the SpecInt 2000 codes show that a CMP with two 4-issue cores and support for SM delivers a speedup of 1.08 over an 8-issue superscalar and consumes only 54% of its power. Alternatively, for the same average power in both chips, the SM CMP is 1.6 times faster than the superscalar on average.

[2] TLS Chip Multiprocessors: Micro-Architectural Mechanisms for Tasking with Out-of-Order Spawn (Submitted to ISCA 2004)

Chip Multiprocessors (CMP) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism, including any nesting levels of both subroutines and loop iterations. Unfortunately, these environments are hard to support in decentralized CMP hardware: since tasks are spawned out-of-order and unpredictably, maintaining key TLS basics such as task ordering and efficient resource allocation is challenging. This paper is the first one to propose micro-architectural mechanisms that, taken together, fundamentally enable fast TLS with out-of-order spawn in a CMP. These simple mechanisms are: Splitting Timestamp Intervals, the Immediate Successor List, and Dynamic Task Merging. To evaluate them, we develop a TLS compiler with out-of-order spawn. With our mechanisms, a TLS CMP with two 4-issue processors increases the average speedup of full SpecInt 2000 applications from 1.15 (no out-of-order spawn) to 1.25 (with out-of-order spawn). Moreover, the resulting CMP outperforms a very aggressive 8-issue superscalar. Specifically, with the same clock frequency, the CMP delivers an average speedup of 1.14 over the 8-issue processor.

[4] Programming the FlexRAM Parallel Intelligent Memory System (PPoPP 2003)

In an intelligent memory architecture, the main memory of a computer is enhanced with many simple processors. The result is a highly-parallel, heterogeneous machine that is able to exploit computation in the main memory. While several instantiations of this architecture have been proposed, the question of how to effectively program them with little effort has remained a major challenge. In this paper, we show how to effectively hand-program an intelligent memory architecture at a high level and with very modest effort. We use FlexRAM as a prototype architecture. To program it, we propose a family of high-level compiler directives inspired by OpenMP called CFlex. Such directives enable the processors in memory to execute the program in cooperation with the main processor. In addition, we propose libraries of highly-optimized functions called Intelligent Memory Operations (IMOs). These functions program the processors in memory through CFlex, but make them completely transparent to the programmer. Simulation results show that, with CFlex and IMOs, a server with 64 simple processors in memory runs on average 10 times faster than a conventional server. Moreover, a set of conventional programs with 240 lines on average are transformed into CFlex parallel form with only 7 CFlex directives and 2 additional statements on average.

[5] Positional Adaptation of Processors: Application to Energy Reduction (ISCA 2003)

Although adaptive processors can exploit application variability to improve performance or save energy, effectively managing their adaptivity is challenging. To address this problem, we introduce a new approach to adaptivity: the Positional approach. In this approach, both the testing of configurations and the application of the chosen configurations are associated with particular code sections. This is in contrast to the currently-used Temporal approach to adaptation, where both the testing and application of configurations are tied to successive intervals in time.

We propose to use subroutines as the granularity of code sections in positional adaptation. Moreover, we design three implementations of subroutine-based positional adaptation that target energy reduction in three different workload environments: embedded or specialized server, general purpose, and highly dynamic. All three implementations of positional adaptation are much more effective than temporal schemes. On average, they boost the energy savings of applications by 50% and 84% over temporal schemes in two experiments.

[6] Cherry: Checkpointed Early Resource Recycling in Out-of-order Microprocessors (MICRO 2002)

This paper presents CHeckpointed Early Resource RecYcling (Cherry), a hybrid mode of execution based on ROB and checkpointing that decouples resource recycling and instruction retirement. Resources are recycled early, resulting in a more efficient utilization. Cherry relies on state checkpointing and rollback to service exceptions for instructions whose resources have been recycled. Cherry leverages the ROB to (1) not require in-order execution as a fallback mechanism, (2) allow memory replay traps and branch mispredictions without rolling back to the Cherry checkpoint, and (3) quickly fall back to conventional out-of-order execution without rolling back to the checkpoint or flushing the pipeline. We present a Cherry implementation with early recycling at three different points of the execution engine: the load queue, the store queue, and the register file. We report average speedups of 1.06 and 1.26 in SPECint and SPECfp applications, respectively, relative to an aggressive conventional architecture. We also describe how Cherry and speculative multithreading can be combined and complement each other.

[7] Energy-Efficient Hybrid Wakeup Logic (ISLPED 2002)

The instruction window is a critical component and a major energy consumer in out-of-order superscalar processors. An important source of energy consumption in the instruction window is the instruction wakeup: a completing instruction broadcasts its result register tag and an associative comparison is performed with all the entries in the window. This paper shows that a very large fraction of the completing instructions have to wake up no more than a single instruction currently in the window. Consequently, we propose to save energy by using indexing to only enable the comparator at the single instruction to wake up. Only in the rare case when more than one instruction needs to wake up does our scheme revert to enabling all the comparators or a subset of them. For this reason, we call our scheme Hybrid. Overall, our scheme is very effective: for a processor with a 96-entry window, the number of comparisons performed by the average completing instruction is reduced to 1.1. The exact magnitude of the energy savings will depend on the specific instruction window implementation. Furthermore, in the Hybrid schemes, the application suffers no performance penalty.

[9] Cache Decomposition for Energy-Efficient Processors (ISLPED 2001)

The L1 data cache is a time-critical module and, at the same time, a major source of energy consumption. To reduce its energy-delay product, we apply two principles of low power design: specialize part of the cache structure and break down the cache into smaller caches. To this end, we propose an L1 cache that combines new designs of a stack cache and a PSA cache. Individually, our stack and PSA cache designs have a lower energy-delay product than previously proposed designs. In addition, their combined operation is very effective. Relative to a conventional 2-way 32KB data cache, our design containing a 4-way 32KB PSA cache and a 512B stack cache reduces the energy-delay product of several applications by an average of 44%.
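As an illustration of the Hybrid wakeup idea in abstract [7], the following is a minimal sketch, not the implementation from the paper. The names `WindowEntry` and `wakeup` are hypothetical, and the sketch finds the consumers of a tag by scanning the window, whereas a hardware design would presumably keep an index to the single consumer. It only counts how many comparators would be enabled for one completing instruction.

```python
from dataclasses import dataclass


@dataclass
class WindowEntry:
    """One waiting instruction and the register tags it still needs (hypothetical model)."""
    name: str
    pending_tags: set


def wakeup(window, completed_tag, hybrid=True):
    """Broadcast completed_tag; return (names woken, comparators enabled)."""
    # Assumption: the scan below stands in for the index a real design would store.
    consumers = [e for e in window if completed_tag in e.pending_tags]
    if hybrid and len(consumers) <= 1:
        # Common case: at most one consumer, so enable only its comparator.
        enabled = consumers
    else:
        # Rare case (or conventional scheme): compare against every window entry.
        enabled = list(window)
    woken = []
    for entry in enabled:
        if completed_tag in entry.pending_tags:
            entry.pending_tags.discard(completed_tag)
            woken.append(entry.name)
    return woken, len(enabled)


def make_window():
    """Small example window with four waiting instructions."""
    return [
        WindowEntry("add", {"r1"}),
        WindowEntry("mul", {"r2", "r3"}),
        WindowEntry("sub", {"r2"}),
        WindowEntry("ld", {"r4"}),
    ]
```

With this example window, a tag with a single consumer enables one comparator under the hybrid setting and four under the conventional one, while a tag with two consumers falls back to all four in both cases.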
[10] A Framework for Dynamic Energy Efficiency and Temperature Management (MICRO 2000)

While technology is delivering increasingly sophisticated and powerful chip designs, it is also imposing alarmingly high energy requirements on the chips. One way to address this problem is to manage the energy dynamically. Unfortunately, current dynamic schemes for energy management are relatively limited. In addition, they manage energy either for energy efficiency or for temperature control, but not for both simultaneously. In this paper, we design and evaluate for the first time an energy-management framework that tackles both energy efficiency and temperature control in a unified manner. We call this general approach Dynamic Energy Efficiency and Temperature Management (DEETM). Our framework combines many energy-management techniques and can activate them individually or in groups in a fine-grained manner according to a given policy. The goal of the framework is two-fold: maximize energy savings without extending application execution time beyond a given tolerable limit, and guarantee that the temperature remains below a given limit while minimizing any resulting slowdown. The framework successfully meets these goals. For example, it delivers a 40% energy reduction with only a 10% application slowdown.

Jose Renau
Research Statement

Research Interests

I am a computer architect with broad interdisciplinary research interests and experience. I have made contributions to chip-level architectures for Thread Level Speculation (TLS) [1,2]; superscalar processor microarchitecture [6]; low-power architectures [2,7,9] and adaptive processors [3,5,8,10,11]; programmability, energy, and performance of processor-in-memory architectures [4,12,13,14,15,16]; and compilation support for emerging architectures [1,2,4]. I feel that interdisciplinary research is required to push the envelope in computer architecture.

Past and Present Research

Thread Level Speculation

Most of my research has been on performance and energy trade-offs in chip-level architectures. My thesis focuses on improving the performance and minimizing the energy consumption of TLS architectures [1,2]. The thesis challenges for the first time the commonly-held view that TLS consumes excessive energy. This is an important issue because energy and power are arguably the main design constraints in current processors. My thesis describes the architecture of a Chip Multiprocessor (CMP) with TLS support that is both faster and more energy efficient than a state-of-the-art wide-issue superscalar processor. Additionally, it identifies the sources of energy waste in TLS and proposes novel energy-centric optimizations. My thesis is also the first one to propose detailed microarchitectural mechanisms to enable speculative tasking with out-of-order task spawn in a TLS CMP. Out-of-order task spawn unlocks higher performance. To evaluate my proposals, I built a detailed simulator and a novel TLS compiler on top of GCC. The compiler generates energy-efficient tasks with out-of-order spawning. Experiments with SPECint codes show that a TLS CMP with 2 narrow-issue cores delivers significant speedups over a wider-issue superscalar, while consuming a fraction of its average power. Therefore, I claim that TLS CMPs are highly-promising platforms for next-generation processors.

Processor Checkpointing

As part of my TLS work, I reused TLS's support for program state checkpointing and rollback recovery to improve superscalar pipeline design [6]. Current superscalar pipelines are sub-optimal in that instructions retain their resources (registers or load/store queue entries) well past their completion until they retire. In our proposal, called Cherry, we decouple resource recycling and instruction retirement. Registers and load/store queue entries are recycled before instruction retirement, boosting pipeline utilization and, as a result, processor performance. Cherry relies on TLS's support for register and cache state checkpointing and rollback to service exceptions for instructions whose resources have been recycled. The resulting higher resource utilization enabled by Cherry leads to substantial speedups for SPECint and SPECfp codes. Overall, this work is significant in that it proposes enhancing the performance of processors through aggressive resource recycling rather than through adding more resources, which is not scalable.

Low-Power Adaptive Processors

Before working on TLS and Cherry, I worked on low-power adaptive (or reconfigurable) processor architectures. Run-time adaptation of hardware structures such as caches or pipelines is a promising approach to partially solve the problem of high energy and power requirements in current processors. We proposed a hardware and software algorithm that controls energy consumption and temperature in a unified manner [8,10]. We also designed a novel approach to processor adaptation called Positional adaptation [3,5,11]. In positional adaptation, a processor remembers the best configuration when it executes a code section; it then uses the same configuration when the code section is invoked again. This is in contrast to the conventional adaptation schemes. In such schemes, which we call temporal, the configuration is chosen based on the behavior of the code section immediately preceding the current one. In our work, we show that positional adaptation is more effective than temporal at saving processor energy with little performance impact. In addition to this work, I have also proposed new energy-efficient cache [9] and instruction window [7] organizations.

Processor-in-Memory Architectures

I have also worked on the FlexRAM project, which proposes a new processor-in-memory architecture. A FlexRAM chip includes up to 64 simple processors and 64 Mbytes of DRAM. Several such chips can be placed in the memory system of a workstation, resulting in a very versatile computing platform. For example, highly-parallel or memory-intensive tasks can be off-loaded to the memory processors, which can execute in parallel with the main processor. The original FlexRAM architecture did not focus on energy or complexity issues. For my M.S. thesis, I redesigned the FlexRAM chip (on paper) to make it energy-efficient [12,13,15,16]. The resulting architecture has energy and performance advantages over conventional workstations. However, it is quite complex to program. To mitigate this problem, we proposed OpenMP-based extensions to a high-level language to help program FlexRAM. These extensions, called CFlex [4,14], substantially enhance the programmability of FlexRAM and similar processor-in-memory architectures.

Tool Development

During my Ph.D., I have developed a large number of software tools that my colleagues and I have used for research. Specifically, I have designed and implemented a simulator of computer architectures (Sesc). It is used by several research groups at the Univ. of Illinois, Univ. of Rochester, North Carolina State Univ., Cornell Univ., and Georgia Institute of Technology. It models a variety of architectures, including dynamic superscalar processors, CMPs, processor-in-memory, and TLS architectures. To evaluate the TLS architecture proposed in my thesis, I built, together with three other graduate students, a TLS compiler pass using GCC. The pass automatically generates tasks with software value prediction. In addition, to improve the task selection quality, we built a profiler pass. I made some extensions to CACTI, a widely-used tool that models power consumption in caches. The extensions have been used at the Univ. of Illinois, Univ. of Rochester, North Carolina State Univ., U.C. Davis, U.C. Irvine, U.C. Riverside, and Univ. of Arizona. Finally, I also made several open source contributions. I co-developed the IEEE 1284 implementation and some multiprocessor patches for Linux. They are included in all Linux kernels since

I also contributed some official patches for GCC in

Future Research

In the short-to-medium term, I plan to keep investigating TLS architectures. I think that TLS has great potential for future processors. However, there are many issues that need to be solved before we can see commercial processors supporting TLS. I have observed that processor designers in companies such as IBM and Intel have reservations about TLS. They are especially concerned about power consumption and design complexity. I plan to focus on making TLS a viable alternative. I will systematically address the open questions and problems in TLS, starting with the impact of TLS on the chip temperature. This research is challenging because it requires interdisciplinary expertise in energy, performance, and compilation support.

I plan to contribute novel ideas to improve the performance and complexity trade-offs in Chip Multiprocessors (CMPs) with out-of-order superscalars. I believe that Cherry-style checkpointing in modern pipelines is a promising approach to boost performance while limiting complexity. For these architectures, I also want to make fundamental advances in energy, power, and temperature issues, which I consider the true constraints in future CMP designs.

Current microprocessors and multiprocessor systems are very complex. In computer architecture, more complexity implies harder-to-test designs and longer time to market. I believe that microarchitectural proposals to reduce design complexity will be the next big thing in computer architecture. Therefore, in the next 5 years, I would like to open new research areas in complexity management. I plan to make contributions to simplify the design of hardware and software.

Finally, I have worked on many areas because I truly enjoy working in groups. Group work is a very gratifying experience, and I want to keep doing it as I build my research team of graduate students. I plan to build an interdisciplinary research team, with graduate students performing research on microarchitecture, multiprocessor systems architecture, energy and temperature issues, compilers, and performance evaluation.

Jose Renau
Teaching Statement

I consider teaching one of the most effective ways to make the world better. Teachers had a strong influence in my life, second only to my family. While research can affect a large group of people, I feel that teaching has a much larger effect on a small group of people. In my life, some professors have had a bigger impact on me than any paper that I have ever read. I would like my teaching to have this kind of impact.

As a senior student in a large research group at the University of Illinois, I have had the pleasure of coordinating the research of several younger Ph.D. and M.S. students in the group. I have always liked helping new members by suggesting lists of papers to read and research problems to examine. In addition, I have coordinated the work of several group members on our research tool infrastructure.

While working on my M.S. degree in Spain, I instructed a group of 20 system administrators. I prepared and taught a course on networking and security that lasted a couple of months. Moreover, at the University of Illinois, I have been a substitute instructor several times. When my advisor has been out of town, I have volunteered to teach his classes. This has given me the opportunity to interact with students on multiple occasions.

When I teach, I like to balance an abstract global view against real industrial examples. I like to impart a solid understanding with some additional insights that would be difficult to find in a book. I extract most of these insights from conferences or recent news.

Given my background in computer science, I am comfortable teaching any computer architecture class at the graduate and undergraduate level. I would like to teach the following subjects: single processor architectures, multiprocessor architectures, energy and performance issues, and emerging architectural approaches like processors-in-memory and thread level speculation. At the senior undergraduate level, I can also teach compiler and operating systems courses. At the undergraduate level, I can teach VLSI and networking courses.

Aside from already established courses, I would like to create new interdisciplinary courses. I feel that the emerging research topics in computer architecture are at the intersection between multiple areas. I would like to teach courses analyzing the interaction between architectures and compilers, and the interaction between performance and energy optimizations.

University professors are particularly fortunate in that they interact with groups of smart graduate students. I fondly remember becoming interested in computer architecture by participating in small reading groups. As a professor, I would like to create reading groups where junior students can discover their interests in computer architecture.

High Performance Computing Systems and Scalable Networks for. Information Technology. Joint White Paper from the

High Performance Computing Systems and Scalable Networks for. Information Technology. Joint White Paper from the High Performance Computing Systems and Scalable Networks for Information Technology Joint White Paper from the Department of Computer Science and the Department of Electrical and Computer Engineering With

More information

Instruction Scheduling for Low Power Dissipation in High Performance Microprocessors

Instruction Scheduling for Low Power Dissipation in High Performance Microprocessors Instruction Scheduling for Low Power Dissipation in High Performance Microprocessors Abstract Mark C. Toburen Thomas M. Conte Department of Electrical and Computer Engineering North Carolina State University

More information

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei Institute of Microelectronics Tsinghua University The 45th International

More information

Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems

Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems Eric Rotenberg Center for Embedded Systems Research (CESR) Department of Electrical & Computer Engineering North

More information

Research Statement. Sorin Cotofana

Research Statement. Sorin Cotofana Research Statement Sorin Cotofana Over the years I ve been involved in computer engineering topics varying from computer aided design to computer architecture, logic design, and implementation. In the

More information

Final Report: DBmbench

Final Report: DBmbench 18-741 Final Report: DBmbench Yan Ke (yke@cs.cmu.edu) Justin Weisz (jweisz@cs.cmu.edu) Dec. 8, 2006 1 Introduction Conventional database benchmarks, such as the TPC-C and TPC-H, are extremely computationally

More information

Low Power and High Performance Level-up Shifters for Mobile Devices with Multi-V DD

Low Power and High Performance Level-up Shifters for Mobile Devices with Multi-V DD JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.17, NO.5, OCTOBER, 2017 ISSN(Print) 1598-1657 https://doi.org/10.5573/jsts.2017.17.5.577 ISSN(Online) 2233-4866 Low and High Performance Level-up Shifters

More information

Project 5: Optimizer Jason Ansel

Project 5: Optimizer Jason Ansel Project 5: Optimizer Jason Ansel Overview Project guidelines Benchmarking Library OoO CPUs Project Guidelines Use optimizations from lectures as your arsenal If you decide to implement one, look at Whale

More information

Efficiently Exploiting Memory Level Parallelism on Asymmetric Coupled Cores in the Dark Silicon Era

Efficiently Exploiting Memory Level Parallelism on Asymmetric Coupled Cores in the Dark Silicon Era 28 Efficiently Exploiting Memory Level Parallelism on Asymmetric Coupled Cores in the Dark Silicon Era GEORGE PATSILARAS, NIKET K. CHOUDHARY, and JAMES TUCK, North Carolina State University Extracting

More information

SCALCORE: DESIGNING A CORE

SCALCORE: DESIGNING A CORE SCALCORE: DESIGNING A CORE FOR VOLTAGE SCALABILITY Bhargava Gopireddy, Choungki Song, Josep Torrellas, Nam Sung Kim, Aditya Agrawal, Asit Mishra University of Illinois, University of Wisconsin, Nvidia,

More information

Kosuke Imamura, Assistant Professor, Department of Computer Science, Eastern Washington University

Kosuke Imamura, Assistant Professor, Department of Computer Science, Eastern Washington University CURRICULUM VITAE Kosuke Imamura, Assistant Professor, Department of Computer Science, Eastern Washington University EDUCATION: PhD Computer Science, University of Idaho, December

More information

4202 E. Fowler Ave., ENB118, Tampa, Florida kose

4202 E. Fowler Ave., ENB118, Tampa, Florida kose Department of Electrical Engineering, 813.974.6636 (phone), kose@usf.edu 4202 E. Fowler Ave., ENB118, Tampa, Florida 33620 http://www.eng.usf.edu/ kose Research Interests Research interests: On-chip voltage

More information

shangupt 2260 Hayward St. #4861, Ann Arbor, MI 48105, Ph:

shangupt 2260 Hayward St. #4861, Ann Arbor, MI 48105, Ph: Shantanu Gupta www.eecs.umich.edu/ shangupt 2260 Hayward St. #4861, Ann Arbor, MI 48105, Ph: 734-276-3331 shangupt@umich.edu RESEARCH INTERESTS Architecture and Compiler level solutions for Fault Tolerance

More information

Ramon Canal NCD Master MIRI. NCD Master MIRI 1

Ramon Canal NCD Master MIRI. NCD Master MIRI 1 Wattch, Hotspot, Hotleakage, McPAT http://www.eecs.harvard.edu/~dbrooks/wattch-form.html http://lava.cs.virginia.edu/hotspot http://lava.cs.virginia.edu/hotleakage http://www.hpl.hp.com/research/mcpat/

More information

Parallelism Across the Curriculum

Parallelism Across the Curriculum Parallelism Across the Curriculum John E. Howland Department of Computer Science Trinity University One Trinity Place San Antonio, Texas 78212-7200 Voice: (210) 999-7364 Fax: (210) 999-7477 E-mail: jhowland@trinity.edu

More information

SATSim: A Superscalar Architecture Trace Simulator Using Interactive Animation

SATSim: A Superscalar Architecture Trace Simulator Using Interactive Animation SATSim: A Superscalar Architecture Trace Simulator Using Interactive Animation Mark Wolff Linda Wills School of Electrical and Computer Engineering Georgia Institute of Technology {wolff,linda.wills}@ece.gatech.edu

More information

Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing

Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing Radu Teodorescu, Jun Nakano, Abhishek Tiwari and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture

Overview. 1 Trends in Microprocessor Architecture. Computer architecture. Computer architecture Overview 1 Trends in Microprocessor Architecture R05 Robert Mullins Computer architecture Scaling performance and CMOS Where have performance gains come from? Modern superscalar processors The limits of

More information

PhD Student Mentoring Committee Department of Electrical and Computer Engineering Rutgers, The State University of New Jersey

PhD Student Mentoring Committee Department of Electrical and Computer Engineering Rutgers, The State University of New Jersey PhD Student Mentoring Committee Department of Electrical and Computer Engineering Rutgers, The State University of New Jersey Some Mentoring Advice for PhD Students In completing a PhD program, your most

More information

Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing *

Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing * Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing * Radu Teodorescu, Jun Nakano, Abhishek Tiwari and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

Instruction-Driven Clock Scheduling with Glitch Mitigation

Instruction-Driven Clock Scheduling with Glitch Mitigation Instruction-Driven Clock Scheduling with Glitch Mitigation ABSTRACT Gu-Yeon Wei, David Brooks, Ali Durlov Khan and Xiaoyao Liang School of Engineering and Applied Sciences, Harvard University Oxford St.,

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Out-of-Order Schedulers Data-Capture Scheduler Dispatch: read available operands from ARF/ROB, store in scheduler Commit: Missing operands filled in from bypass Issue: When

More information

The Intel Science and Technology Center for Pervasive Computing

The Intel Science and Technology Center for Pervasive Computing The Intel Science and Technology Center for Pervasive Computing Investing in New Levels of Academic Collaboration Rajiv Mathur, Program Director ISTC-PC Anthony LaMarca, Intel Principal Investigator Professor

More information

Educational Experiment on Generative Tool Development in Architecture

Educational Experiment on Generative Tool Development in Architecture Educational Experiment on Generative Tool Development in Architecture PatGen: Islamic Star Pattern Generator Birgül Çolakoğlu 1, Tuğrul Yazar 2, Serkan Uysal 3 1,2-3 Yildiz Technical University, Computational

More information

EE 382C EMBEDDED SOFTWARE SYSTEMS. Literature Survey Report. Characterization of Embedded Workloads. Ajay Joshi. March 30, 2004

EE 382C EMBEDDED SOFTWARE SYSTEMS. Literature Survey Report. Characterization of Embedded Workloads. Ajay Joshi. March 30, 2004 EE 382C EMBEDDED SOFTWARE SYSTEMS Literature Survey Report Characterization of Embedded Workloads Ajay Joshi March 30, 2004 ABSTRACT Security applications are a class of emerging workloads that will play

More information

1 Educational Experiment on Generative Tool Development in Architecture PatGen: Islamic Star Pattern Generator

1 Educational Experiment on Generative Tool Development in Architecture PatGen: Islamic Star Pattern Generator 1 Educational Experiment on Generative Tool Development in Architecture PatGen: Islamic Star Pattern Generator Birgül Çolakoğlu 1, Tuğrul Yazar 2, Serkan Uysal 3. Yildiz Technical University, Computational

More information

Energy Efficiency of Power-Gating in Low-Power Clocked Storage Elements

Energy Efficiency of Power-Gating in Low-Power Clocked Storage Elements Energy Efficiency of Power-Gating in Low-Power Clocked Storage Elements Christophe Giacomotto 1, Mandeep Singh 1, Milena Vratonjic 1, Vojin G. Oklobdzija 1 1 Advanced Computer systems Engineering Laboratory,

More information

Computer Architecture A Quantitative Approach

Computer Architecture A Quantitative Approach Computer Architecture A Quantitative Approach Fourth Edition John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With Contributions by Andrea C. Arpaci-Dusseau

More information

Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications

Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications Seongsoo Lee Takayasu Sakurai Center for Collaborative Research and Institute of Industrial Science, University

More information

Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System

Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Performance Evaluation of Multi-Threaded System vs. Chip-Multi-Processor System Ho Young Kim, Robert Maxwell, Ankil Patel, Byeong Kil Lee Abstract The purpose of this study is to analyze and compare the

More information

Conventional 4-Way Set-Associative Cache

Conventional 4-Way Set-Associative Cache ISLPED 99 International Symposium on Low Power Electronics and Design Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption Koji Inoue, Tohru Ishihara, and Kazuaki Murakami

More information

Self-Aware Adaptation in FPGAbased

Self-Aware Adaptation in FPGAbased DIPARTIMENTO DI ELETTRONICA E INFORMAZIONE Self-Aware Adaptation in FPGAbased Systems IEEE FPL 2010 Filippo Siorni: filippo.sironi@dresd.org Marco Triverio: marco.triverio@dresd.org Martina Maggio: mmaggio@mit.edu

More information

An Area Efficient Decomposed Approximate Multiplier for DCT Applications

An Area Efficient Decomposed Approximate Multiplier for DCT Applications An Area Efficient Decomposed Approximate Multiplier for DCT Applications K.Mohammed Rafi 1, M.P.Venkatesh 2 P.G. Student, Department of ECE, Shree Institute of Technical Education, Tirupati, India 1 Assistant

More information

PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs

PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs Li Zhou and Avinash Kodi Technologies for Emerging Computer Architecture Laboratory (TEAL) School of Electrical Engineering and

More information

CS 6135 VLSI Physical Design Automation Fall 2003

CS 6135 VLSI Physical Design Automation Fall 2003 CS 6135 VLSI Physical Design Automation Fall 2003 1 Course Information Class time: R789 Location: EECS 224 Instructor: Ting-Chi Wang ( ) EECS 643, (03) 5742963 tcwang@cs.nthu.edu.tw Office hours: M56R5

More information

DEVELOPMENT OF A ROBOID COMPONENT FOR PLAYER/STAGE ROBOT SIMULATOR

DEVELOPMENT OF A ROBOID COMPONENT FOR PLAYER/STAGE ROBOT SIMULATOR Proceedings of IC-NIDC2009 DEVELOPMENT OF A ROBOID COMPONENT FOR PLAYER/STAGE ROBOT SIMULATOR Jun Won Lim 1, Sanghoon Lee 2,Il Hong Suh 1, and Kyung Jin Kim 3 1 Dept. Of Electronics and Computer Engineering,

More information

Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance

Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance Michael D. Powell, Arijit Biswas, Shantanu Gupta, and Shubu Mukherjee SPEARS Group, Intel Massachusetts EECS, University

More information

Hardware/Software Codesign of Real-Time Systems

Hardware/Software Codesign of Real-Time Systems ARTES Project Proposal Hardware/Software Codesign of Real-Time Systems Zebo Peng and Anders Törne Center for Embedded Systems Engineering (CESE) Dept. of Computer and Information Science Linköping University

More information

Outline Simulators and such. What defines a simulator? What about emulation?

Outline Simulators and such. What defines a simulator? What about emulation? Outline Simulators and such Mats Brorsson & Mladen Nikitovic ICT Dept of Electronic, Computer and Software Systems (ECS) What defines a simulator? Why are simulators needed? Classifications Case studies

More information

Recent Advances in Simulation Techniques and Tools

Recent Advances in Simulation Techniques and Tools Recent Advances in Simulation Techniques and Tools Yuyang Li, li.yuyang(at)wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download Abstract: Simulation refers to using specified kind

More information

ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική

ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική Υπολογιστών Presentation of UniServer Horizon 2020 European project findings: X-Gene server chips, voltage-noise characterization, high-bandwidth voltage measurements,

More information

Big versus Little: Who will trip?

Big versus Little: Who will trip? Big versus Little: Who will trip? Reena Panda University of Texas at Austin reena.panda@utexas.edu Christopher Donald Erb University of Texas at Austin cde593@utexas.edu Lizy Kurian John University of

More information

Fast Placement Optimization of Power Supply Pads

Fast Placement Optimization of Power Supply Pads Fast Placement Optimization of Power Supply Pads Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of Illinois at Urbana-Champaign

More information

Hybrid QR Factorization Algorithm for High Performance Computing Architectures. Peter Vouras Naval Research Laboratory Radar Division

Hybrid QR Factorization Algorithm for High Performance Computing Architectures. Peter Vouras Naval Research Laboratory Radar Division Hybrid QR Factorization Algorithm for High Performance Computing Architectures Peter Vouras Naval Research Laboratory Radar Division 8/1/21 Professor G.G.L. Meyer Johns Hopkins University Parallel Computing

More information

Design Trade-offs for Memory Level Parallelism on an Asymmetric Multicore System

Design Trade-offs for Memory Level Parallelism on an Asymmetric Multicore System Design Trade-offs for Memory Level Parallelism on an Asymmetric Multicore System George Patsilaras, Niket K. Choudhary, James Tuck Department of Electrical and Computer Engineering North Carolina State

More information

James P. Millan. Citizenship. Education

James P. Millan. Citizenship. Education James P. Millan 13 Merasheen Pl. St.John s, Newfoundland Canada A1E 5P5 T (709)-772-2472 B jim.millan@nrc-cnrc.gc.ca http:// www.nrc.ca/ iot http:// www.engr.mun.ca/ ~millan Citizenship Canadian and Irish.

More information

Second Workshop on Pioneering Processor Paradigms (WP 3 )

Second Workshop on Pioneering Processor Paradigms (WP 3 ) Second Workshop on Pioneering Processor Paradigms (WP 3 ) Organizers: (proposed to be held in conjunction with HPCA-2018, Feb. 2018) John-David Wellman (IBM Research) o wellman@us.ibm.com Robert Montoye

More information

Performance Evaluation of Recently Proposed Cache Replacement Policies

Performance Evaluation of Recently Proposed Cache Replacement Policies University of Jordan Computer Engineering Department Performance Evaluation of Recently Proposed Cache Replacement Policies CPE 731: Advanced Computer Architecture Dr. Gheith Abandah Asma Abdelkarim January

More information

Instructor: Dr. Mainak Chaudhuri. Instructor: Dr. S. K. Aggarwal. Instructor: Dr. Rajat Moona

Instructor: Dr. Mainak Chaudhuri. Instructor: Dr. S. K. Aggarwal. Instructor: Dr. Rajat Moona NPTEL Online - IIT Kanpur Instructor: Dr. Mainak Chaudhuri Instructor: Dr. S. K. Aggarwal Course Name: Department: Program Optimization for Multi-core Architecture Computer Science and Engineering IIT

More information

Adaptive Modulation with Customised Core Processor

Adaptive Modulation with Customised Core Processor Indian Journal of Science and Technology, Vol 9(35), DOI: 10.17485/ijst/2016/v9i35/101797, September 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Adaptive Modulation with Customised Core Processor

More information

CREATING A MINDSET FOR INNOVATION Paul Skaggs, Richard Fry, and Geoff Wright Brigham Young University /

CREATING A MINDSET FOR INNOVATION Paul Skaggs, Richard Fry, and Geoff Wright Brigham Young University / CREATING A MINDSET FOR INNOVATION Paul Skaggs, Richard Fry, and Geoff Wright Brigham Young University paul_skaggs@byu.edu / rfry@byu.edu / geoffwright@byu.edu BACKGROUND In 1999 the Industrial Design program

More information

Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs

Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Michael Gordon, William Thies, and Saman Amarasinghe Massachusetts Institute of Technology ASPLOS October 2006 San Jose,

More information

CS4617 Computer Architecture

CS4617 Computer Architecture 1/26 CS4617 Computer Architecture Lecture 2 Dr J Vaughan September 10, 2014 2/26 Amdahl s Law Speedup = Execution time for entire task without using enhancement Execution time for entire task using enhancement

More information

www.ixpug.org @IXPUG1 What is IXPUG? http://www.ixpug.org/ Now Intel extreme Performance Users Group Global community-driven organization (independently ran) Fosters technical collaboration around tuning

More information

System Level Analysis of Fast, Per-Core DVFS using On-Chip Switching Regulators

System Level Analysis of Fast, Per-Core DVFS using On-Chip Switching Regulators System Level Analysis of Fast, Per-Core DVFS using On-Chip Switching s Wonyoung Kim, Meeta S. Gupta, Gu-Yeon Wei and David Brooks School of Engineering and Applied Sciences, Harvard University, 33 Oxford

More information

Chapter 16 - Instruction-Level Parallelism and Superscalar Processors

Chapter 16 - Instruction-Level Parallelism and Superscalar Processors Chapter 16 - Instruction-Level Parallelism and Superscalar Processors Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 16 - Superscalar Processors 1 / 78 Table of Contents I 1 Overview

More information

MECHATRONICS Master study program. St. Kliment Ohridski University in Bitola Faculty of Technical Sciences Bitola.

MECHATRONICS Master study program. St. Kliment Ohridski University in Bitola Faculty of Technical Sciences Bitola. MECHATRONICS Master study program St. Kliment Ohridski University in Bitola Faculty of Technical Sciences Bitola www.tfb.edu.mk 1 2 Contents Mechatronics - an interdisciplinary approach Competences / Invest

More information

ABOUT COMPUTER SCIENCE

ABOUT COMPUTER SCIENCE ABOUT COMPUTER SCIENCE MOST COMMON CS JOB TITLES Computer Programmer Computer System Analyst Software Developers Computer and Information Research 2 COMPUTER PROGRAMMERS What they do: Write programs in

More information

Leading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005]

Leading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005] Leading by design: Q&A with Dr. Raghuram Tupuri, AMD Chris Hall, DigiTimes.com, Taipei [Monday 12 December 2005] AMD s drive to 64-bit processors surprised everyone with its speed, even as detractors commented

More information

Programming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Boot Camp

Programming and Optimization with Intel Xeon Phi Coprocessors. Colfax Developer Training One-day Boot Camp Programming and Optimization with Intel Xeon Phi Coprocessors Colfax Developer Training One-day Boot Camp Abstract: Colfax Developer Training (CDT) is an in-depth intensive course on efficient parallel

More information

MULTISCALAR PROCESSORS

MULTISCALAR PROCESSORS MULTISCALAR PROCESSORS THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE MULTISCALAR PROCESSORS by Manoj Franklin University of Maryland, US.A. SPRINGER SCIENCE+BUSINESS MEDIA, LLC Library

More information

Computer & Information Science & Engineering (CISE)

Computer & Information Science & Engineering (CISE) Computer & Information Science & Engineering (CISE) Mitra Basu, PhD mbasu@nsf.gov Computer and Information Science and Engineering http://www.nsf.gov/cise Advanced Cyberinfrastructure Computing & Communication

More information

Memory-Level Parallelism Aware Fetch Policies for Simultaneous Multithreading Processors

Memory-Level Parallelism Aware Fetch Policies for Simultaneous Multithreading Processors Memory-Level Parallelism Aware Fetch Policies for Simultaneous Multithreading Processors STIJN EYERMAN and LIEVEN EECKHOUT Ghent University A thread executing on a simultaneous multithreading (SMT) processor

More information

COTSon: Infrastructure for system-level simulation

COTSon: Infrastructure for system-level simulation COTSon: Infrastructure for system-level simulation Ayose Falcón, Paolo Faraboschi, Daniel Ortega HP Labs Exascale Computing Lab http://sites.google.com/site/hplabscotson MICRO-41 tutorial November 9, 28

More information

Multiple Clock and Voltage Domains for Chip Multi Processors

Multiple Clock and Voltage Domains for Chip Multi Processors Multiple Clock and Voltage Domains for Chip Multi Processors Efraim Rotem- Intel Corporation Israel Avi Mendelson- Microsoft R&D Israel Ran Ginosar- Technion Israel institute of Technology Uri Weiser-

More information

Distributed Vision System: A Perceptual Information Infrastructure for Robot Navigation

Distributed Vision System: A Perceptual Information Infrastructure for Robot Navigation Distributed Vision System: A Perceptual Information Infrastructure for Robot Navigation Hiroshi Ishiguro Department of Information Science, Kyoto University Sakyo-ku, Kyoto 606-01, Japan E-mail: ishiguro@kuis.kyoto-u.ac.jp

More information

Evaluation of CPU Frequency Transition Latency

Evaluation of CPU Frequency Transition Latency Noname manuscript No. (will be inserted by the editor) Evaluation of CPU Frequency Transition Latency Abdelhafid Mazouz Alexandre Laurent Benoît Pradelle William Jalby Abstract Dynamic Voltage and Frequency

More information

Durham Research Online

Durham Research Online Durham Research Online Deposited in DRO: 29 August 2017 Version of attached le: Accepted Version Peer-review status of attached le: Not peer-reviewed Citation for published item: Chiu, Wei-Yu and Sun,

More information

Dynamic MIPS Rate Stabilization in Out-of-Order Processors

Dynamic MIPS Rate Stabilization in Out-of-Order Processors Dynamic Rate Stabilization in Out-of-Order Processors Jinho Suh and Michel Dubois Ming Hsieh Dept of EE University of Southern California Outline Motivation Performance Variability of an Out-of-Order Processor

More information

SCIENTIFIC LITERACY FOR SUSTAINABILITY

SCIENTIFIC LITERACY FOR SUSTAINABILITY SCIENTIFIC LITERACY FOR SUSTAINABILITY Karen Murcia: BAppSc., GradDipEd., M Ed. Submitted in total fulfilment of the requirements of the Degree of Doctor of Philosophy. November 2006 Division of Arts School

More information

Computer Logical Design Laboratory

Computer Logical Design Laboratory Division of Computer Engineering Computer Logical Design Laboratory Tsuneo Tsukahara Professor Tsuneo Tsukahara: Yukihide Kohira Senior Associate Professor Yu Nakajima Research Assistant Software-Defined

More information

Yutaka Hori Web:

Yutaka Hori   Web: Email: yhori@appi.keio.ac.jp Web: http://bi.appi.keio.ac.jp/~yhori/ Full CV is available upon request. Education March 2013 Ph.D. Department of Information Physics and Computing, Graduate School of Information

More information

An Evaluation of Speculative Instruction Execution on Simultaneous Multithreaded Processors

An Evaluation of Speculative Instruction Execution on Simultaneous Multithreaded Processors An Evaluation of Speculative Instruction Execution on Simultaneous Multithreaded Processors STEVEN SWANSON, LUKE K. McDOWELL, MICHAEL M. SWIFT, SUSAN J. EGGERS and HENRY M. LEVY University of Washington

More information

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads Terminology CUDA Threads Bedrich Benes, Ph.D. Purdue University Department of Computer Graphics Streaming Multiprocessor (SM) A SM processes block of threads Streaming Processors (SP) also called CUDA

More information

Teaching digital control of switch mode power supplies

Teaching digital control of switch mode power supplies Teaching digital control of switch mode power supplies ABSTRACT This paper explains the methodology followed to teach the subject Digital control of power converters. The subject is focused on several

More information

Information Technology Fluency for Undergraduates

Information Technology Fluency for Undergraduates Response to Tidal Wave II Phase II: New Programs Information Technology Fluency for Undergraduates Marti Hearst, Assistant Professor David Messerschmitt, Acting Dean School of Information Management and

More information

On the Rules of Low-Power Design

On the Rules of Low-Power Design On the Rules of Low-Power Design (and Why You Should Break Them) Prof. Todd Austin University of Michigan austin@umich.edu A long time ago, in a not so far away place The Rules of Low-Power Design P =

More information

SDR Applications using VLSI Design of Reconfigurable Devices

SDR Applications using VLSI Design of Reconfigurable Devices 2018 IJSRST Volume 4 Issue 2 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Science and Technology SDR Applications using VLSI Design of Reconfigurable Devices P. A. Lovina 1, K. Aruna Manjusha

More information

WEI HUANG Curriculum Vitae

WEI HUANG Curriculum Vitae 1 WEI HUANG Curriculum Vitae 4025 Duval Road, Apt 2538 Phone: (434) 227-6183 Austin, TX 78759 Email: wh6p@virginia.edu (preferred) https://researcher.ibm.com/researcher/view.php?person=us-huangwe huangwe@us.ibm.com

More information

2009 Brian L. Greskamp

2009 Brian L. Greskamp 2009 Brian L. Greskamp IMPROVING PER-THREAD PERFORMANCE ON CMPS THROUGH TIMING SPECULATION BY BRIAN L. GRESKAMP B.S. Clemson University, 2003 M.S. University of Illinois at Urbana-Champaign, 2005 DISSERTATION

More information

II. ROBOT SYSTEMS ENGINEERING

II. ROBOT SYSTEMS ENGINEERING Mobile Robots: Successes and Challenges in Artificial Intelligence Jitendra Joshi (Research Scholar), Keshav Dev Gupta (Assistant Professor), Nidhi Sharma (Assistant Professor), Kinnari Jangid (Assistant

More information

Using Variability Modeling Principles to Capture Architectural Knowledge

Using Variability Modeling Principles to Capture Architectural Knowledge Using Variability Modeling Principles to Capture Architectural Knowledge Marco Sinnema University of Groningen PO Box 800 9700 AV Groningen The Netherlands +31503637125 m.sinnema@rug.nl Jan Salvador van

More information

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Rachata Ausavarungnirun Joshua Landgraf Vance Miller Saugata Ghose Jayneel Gandhi Christopher J. Rossbach Onur

More information

Low Power VLSI Circuit Synthesis: Introduction and Course Outline

Low Power VLSI Circuit Synthesis: Introduction and Course Outline Low Power VLSI Circuit Synthesis: Introduction and Course Outline Ajit Pal Professor Department of Computer Science and Engineering Indian Institute of Technology Kharagpur INDIA -721302 Agenda Why Low

More information

An Integrated Modeling and Simulation Methodology for Intelligent Systems Design and Testing

An Integrated Modeling and Simulation Methodology for Intelligent Systems Design and Testing An Integrated ing and Simulation Methodology for Intelligent Systems Design and Testing Xiaolin Hu and Bernard P. Zeigler Arizona Center for Integrative ing and Simulation The University of Arizona Tucson,

More information

DIGF 6B21 Ubiquitous Computing

DIGF 6B21 Ubiquitous Computing DIGF 6B21 Ubiquitous Computing NUMBER OF CREDITS: 1.5 Day and Time: Tuesdays 18:30 21:30, beginning October 30th Location: Room 7301, 205 Richmond Professor: Nick Puckett Email: npuckett@faculty.ocadu.ca

More information

A High Definition Motion JPEG Encoder Based on Epuma Platform

A High Definition Motion JPEG Encoder Based on Epuma Platform Available online at www.sciencedirect.com Procedia Engineering 29 (2012) 2371 2375 2012 International Workshop on Information and Electronics Engineering (IWIEE) A High Definition Motion JPEG Encoder Based

More information

David Daly. IBM T. J. Watson Research Center P.O. Box 218 Yorktown Heights, NY

David Daly. IBM T. J. Watson Research Center P.O. Box 218 Yorktown Heights, NY David Daly IBM T. J. Watson Research Center P.O. Box 218 Yorktown Heights, NY 10598 http://researcher.ibm.com/person/us-dmdaly Education University of Illinois, Urbana-Champaign Ph.D. in Electrical Engineering

More information

Invitation for involvement: NASA Frontier Development Lab (FDL) 2018

Invitation for involvement: NASA Frontier Development Lab (FDL) 2018 NASA Frontier Development Lab 189 N Bernardo Ave #200, Mountain View, CA 94043, USA www.frontierdevelopmentlab.org January 2, 2018 Invitation for involvement: NASA Frontier Development Lab (FDL) 2018 Dear

More information

Architecture ISCA 16 Luis Ceze, Tom Wenisch

Architecture ISCA 16 Luis Ceze, Tom Wenisch Architecture 2030 @ ISCA 16 Luis Ceze, Tom Wenisch Mark Hill (CCC liaison, mentor) LIVE! Neha Agarwal, Amrita Mazumdar, Aasheesh Kolli (Student volunteers) Context Many fantastic community formation/visioning

More information
