Architectural Support for Efficient Large-Scale Automata Processing


Hongyuan Liu, Mohamed Ibrahim, Onur Kayiran, Sreepathi Pai, and Adwait Jog
College of William & Mary; Advanced Micro Devices, Inc.; University of Rochester
Email: {hliu08,mibrahim}@email.wm.edu, onur.kayiran@amd.com, sree@cs.rochester.edu, ajog@wm.edu

Abstract: The Automata Processor (AP) accelerates applications from domains ranging from machine learning to genomics. However, as a spatial architecture, it is unable to handle larger automata programs without repeated reconfiguration and re-execution. To achieve high throughput, this paper proposes for the first time architectural support for AP to efficiently execute large-scale applications. We find that a large number of existing and new Non-deterministic Finite Automata (NFA) based applications have states that are never enabled but are still configured on the AP chips, leading to their underutilization. With the help of careful characterization and profiling-based mechanisms, we predict which states are never enabled and hence need not be configured on AP. Furthermore, we develop SparseAP, a new execution mode for AP to efficiently handle the mis-predicted NFA states. Our detailed simulations across 26 applications from various domains show that our newly proposed execution model for AP can obtain 2.1× geometric mean speedup (up to 47×) over the baseline AP execution.

I. INTRODUCTION

Many applications from domains such as genomics, malware detection, machine learning, and data analytics exhibit high levels of parallelism and are being accelerated through the use of spatial architectures that can exploit higher levels of parallelism than CPUs and can also significantly reduce data movement [1]–[9]. Spatial architectures usually consist of many interconnected processing elements that expose a very high degree of parallelism. Field-programmable gate arrays (FPGAs) are a classic example; the systolic-array-based Matrix Multiply Unit in Google's Tensor Processing Unit [10] is also a spatial architecture.
One of the fundamental challenges with spatial architectures is that program size is a first-order concern: there are a fixed number of states available, and a spatial program must fit completely to begin execution. Otherwise, execution may be impossible, or in the best case multiple rounds of reconfiguration and re-execution may be required, which can incur significant performance penalties [11]. On traditional von Neumann architectures, these issues can typically be handled by traditional mechanisms such as context switching and virtualization. However, the large size of the spatial program state means that these techniques do not transfer directly. Some of these issues also affect traditional architectures like Graphics Processing Units (GPUs), whose massive parallelism also means that the amount of state is often prohibitively large to support efficient multitasking [12]–[15].

In this paper, we focus on providing architectural support for executing large-scale tasks on a special class of spatial architectures, known as automata processors (APs) [16]. These architectures accelerate the processing of Non-deterministic Finite Automata (NFA), a widely used representation of Finite State Machines (FSMs). FSMs are foundational in a wide range of application domains such as DNA sequence matching, network intrusion detection, and machine learning [17]–[21]. Although many existing approaches [22]–[26] accelerate NFA processing on CPUs or GPUs, none of them completely solves the problem of data movement caused by irregular accesses due to NFA transition table lookups. In comparison, the AP executes NFAs natively and achieves significant performance speedup [27], [28] primarily because of: a) the AP's massive parallelism, where NFA states are mapped to columns in DRAM and can be activated independently and simultaneously in a given cycle; and b) the AP's in-memory processing capability, which handles NFA transitions without data movement between processor and memory.

An AP half-core (the basic processing unit of the AP) can hold up to 24K states. However, in the future, we expect NFA-based applications to scale both in the number of NFAs per application and in the number of states per NFA.
We expect this scaling from at least two aspects. First, in the era of big data, new applications will likely be mining even larger databases. For example, ClamAV [29], an anti-virus application, uses a variant of regular expressions to specify each virus signature in an ever-enlarging database. The number of NFA states constructed from these signature regular expressions is consequently larger, and state-of-the-art AP chips can no longer hold all the states at once. Second, a number of existing and newly proposed techniques enhance the throughput of FSM processing, but only by increasing the number of states. For example, the existing AP supports duplicating NFAs to run multiple input symbol streams in parallel [30]; the newly proposed Parallel Automata Processor [31] duplicates NFAs for parallel enumeration; and the multi-stride NFA transformation [32], [33] increases the number of transitions in order to process multiple symbols in one step. Current AP chips execute these applications with a large number of NFAs/states by making independent batches of NFAs and executing each batch on the entire input, reconfiguring the AP between batches.

To address the performance inefficiencies from repeated re-executions, we propose hardware and software support for large-scale NFA-based applications that currently do not fit in the AP chips. Our mechanisms are based on our key observation that not all states of an NFA are enabled during execution, and hence need not be configured to the AP. Specifically, a large fraction of states unnecessarily take space in the AP chip but are not part of any state transitions. We refer to such never-enabled states as cold states and to the remaining (enabled) states as hot states. Figure 1 quantitatively shows our observation across 26 diverse applications [17], [34] sorted in increasing order of their percentage of hot states (across all NFAs in an application). We find that on average 59% of states are cold, and the fraction can be up to 99% in some applications.

Fig. 1: A large portion of NFA states are cold (never-enabled) but are still configured on the AP, leading to its underutilization. (Bar chart; y-axis: percentage of states, split into hot (enabled) and cold (never-enabled), one bar per application.)

These observations can be explained by revisiting the way NFAs process inputs. NFA behavior is highly input dependent. A state can attempt to match a symbol of input only if it is enabled. In the most general case, a state is enabled only if at least one of its predecessor states matched a symbol of input (the exceptions being starting states, which are always enabled). A match indicates that the current input string is plausibly still a valid prefix of the regular language recognized by the NFA. States stop matching as soon as the input string is definitely not in the language. However, the AP must still process all input symbols as long as there is one state enabled (which is always true for an NFA with at least one starting state that is always enabled), thus leaving many states never enabled. Section III shows that this is indeed the case for the NFAs running on the AP.

Based on the above key insight, we first develop a software-based mechanism to predict which states are cold and hence need not be configured on the AP. Next, we propose changes in the AP hardware to efficiently execute the mis-predicted cold states. To the best of our knowledge, this is the first work that proposes architectural support for efficiently executing large-scale NFA-based applications on the AP.
In summary, this paper makes the following contributions:
• We demonstrate that a large number of NFA states are cold during execution but are still configured on the AP. This leads to its severe underutilization.
• We develop a prediction mechanism to classify the NFA states into predicted hot and predicted cold sets. We use properties of NFA execution to develop a simple and effective partitioning scheme based on a state's topological order and profiling information.
• We develop efficient hardware mechanisms to execute predicted cold states using a new sparse execution mode for the AP (called SparseAP).
• Our detailed evaluation shows that we can achieve 2.1× geometric mean speedup (up to 47×) over the baseline AP execution across a wide range of 26 applications.

II. BACKGROUND AND TERMINOLOGY

In this section, we provide a brief background on NFAs and their processing on the AP.

A. NFA-based Pattern Matching

An NFA is represented by a 5-tuple (Q, Σ, δ, q0, F), where Q is a set of states, Σ is the alphabet (set of input symbols), δ is a transition function that maps Q × Σ pairs to a new set of states, q0 is the set of starting states, and F is a set of accepting or reporting states. Because there can be more than one possible state on a transition, such an FSM is called non-deterministic. The NFAs used by APs are homogeneous. These NFAs can be visualized as a directed graph where each node represents a state and each edge represents a state transition. Each state in the NFA has a symbol-set that represents what symbols can be accepted by this state. Each state has one or multiple successors connected by directed edges.

Fig. 2: A homogeneous NFA that accepts the regular expression a((bc)|(dc)+)f; the double circle represents the starting state and the hexagon represents the reporting state.

In each step, the NFA has a number of enabled states. The starting states are enabled prior to the execution. The matching process is driven by a stream of input symbols. Each cycle, an enabled state compares the input symbol with its symbol-set for matching; when the symbol matches, the state is activated, and all its successor states are enabled in the next cycle.
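The enable/activate step just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: the state names, symbol-sets, and edges are reconstructed from the description of the Figure 2 NFA (regular expression a((bc)|(dc)+)f), the function name `run_nfa` is hypothetical, and the starting state is assumed to be re-enabled every cycle ("always enabled").

```python
from typing import Dict, List, Set, Tuple

# Homogeneous NFA: every state carries its own symbol-set.
# Reconstructed (assumption) from the Figure 2 description.
SYMBOLS: Dict[str, Set[str]] = {
    "S1": {"a"}, "S2": {"b"}, "S3": {"c"},
    "S4": {"d"}, "S5": {"c"}, "S6": {"f"},
}
SUCCESSORS: Dict[str, List[str]] = {
    "S1": ["S2", "S4"], "S2": ["S3"], "S3": ["S6"],
    "S4": ["S5"], "S5": ["S4", "S6"], "S6": [],
}
START: Set[str] = {"S1"}
REPORTING: Set[str] = {"S6"}


def run_nfa(stream: str) -> Tuple[List[Tuple[int, str]], Set[str]]:
    """Process one symbol per cycle.

    Returns the reports as (input position, state) pairs and the set of
    states that were enabled at least once (the "hot" states).
    """
    enabled = set(START)
    ever_enabled = set(START)
    reports: List[Tuple[int, str]] = []
    for pos, symbol in enumerate(stream):
        # An enabled state is activated when the symbol is in its symbol-set.
        activated = {s for s in enabled if symbol in SYMBOLS[s]}
        reports += [(pos, s) for s in sorted(activated) if s in REPORTING]
        # Successors of activated states are enabled for the next cycle;
        # the starting state is assumed to stay enabled throughout.
        enabled = {t for s in activated for t in SUCCESSORS[s]} | START
        ever_enabled |= enabled
    return reports, ever_enabled


reports, hot = run_nfa("abcf")
cold = set(SYMBOLS) - hot  # states that were never enabled on this input
```

On the input `abcf`, state S5 is never enabled, which is a small instance of the hot/cold distinction the paper builds on.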
When a reporting state is activated, it generates a report showing that a relevant pattern has been observed in the input symbol stream. Figure 2 shows the NFA of the regular expression a((bc)|(dc)+)f. At first, the starting state S1 is enabled. Let abcf be the input symbol stream. The symbol a activates state S1, so the successors of S1 (i.e., S2 and S4) are enabled in the next cycle. The symbol b activates state S2 (S4 is not activated since it does not accept symbol b), and then the successor of S2 (i.e., S3) is enabled. The process repeats until all input symbols are consumed. In this case, since reporting state S6 is activated by input symbol f, a report is generated indicating a successful match.

Footnote: In homogeneous NFAs [16], [35], [36], all incoming transitions to any given state must accept the same set of input symbols (symbol-set). In the rest of this paper, we treat homogeneous NFA as synonymous with NFA, because they have the same computational ability and time complexity.

B. Baseline Automata Processor (AP)

Figure 3 shows a schematic of the considered baseline AP chip. The AP is a DRAM-based spatial architecture in which each state of an NFA is stored in a memory column of the DRAM, called a state transition element (STE). A bit in the column represents whether the STE can accept the input symbol that corresponds to the bit's row. The maximum size of the alphabet is 256, as this is the width of the address decoder in the current AP architecture. Therefore, there are 256 rows in total. An AP chip consists of two half-cores. State transitions cannot cross half-cores due to a limitation of the interconnect. The state transitions are compiled to a reconfigurable interconnecting network called the routing matrix.

Fig. 3: The first execution cycle of an AP configured with the NFA shown in Figure 2. S1 is enabled when input symbol a arrives, which activates S1 and enables S2 and S4 in the next cycle. Downward arrows represent the enable signal being fed to the routing matrix in the current cycle. Upward arrows enable successor states for the next cycle. The physical connections between STEs and the routing matrix are bi-directional and are represented by the dashed arrows. (Schematic components: input symbol, 8-to-256 decoder, selected 97th row, state bits of S1 to S6, routing matrix, state vector.)

The entire input stream is processed sequentially at the rate of one symbol per cycle. Each cycle, one input symbol is fed into the address decoder, which selects a whole row (out of 256) of the DRAM (orange shaded part in Figure 3). Each STE column has a bit, called the state bit, that represents whether the STE is enabled or not. The state bits of all STEs are combined into the state vector. This information is available from the previous cycle. An AND operation is performed between the selected row (e.g., the shaded part) and the state vector, resulting in a vector that determines the activated states. This activation information is sent to the routing matrix, which updates the state vector with the states enabled for processing the next symbol. This process is repeated until the entire input symbol stream is processed.

To understand the working of the AP, we illustrate the execution of the previously considered NFA (Figure 2) via Figure 3. We previously observed in Figure 2 that S1 accepts symbol a. Accordingly, the bit stored in the 97th row (corresponding to the ASCII code of a) and in the column of the STE that stores S1 is set to 1, and the others remain 0. The state bit of S1 is 1 and a is in the symbol-set of S1; therefore, S1 is activated and it broadcasts the enable signals to the successor states (S2, S4) via the routing matrix (upward arrows in Figure 3).

III. MOTIVATION AND ANALYSIS

In this section, we analyze why a high percentage of states are cold, which states are more likely to be cold, and how avoiding configuring these states on the AP can improve performance.

A. Topological Order and Normalized Depth

In general, it is hard to predict which states will be enabled in NFAs [37]. Clearly, all starting states will be enabled at least once, and this does not depend on the input. Whether states that are further away from the starting state are enabled, however, depends on the input. Each subsequent state transition in a homogeneous NFA must match a symbol of input (homogeneous NFAs do not have ε-transitions [38]). Intuitively, a state that is further away from the starting state is less likely to be enabled, since each additional state on the path to it increases the chances of a mismatch.

To verify whether this intuition holds on NFAs from real-world applications executing on the AP, we study whether states are hot or cold with respect to their depths in the NFAs. For simplicity of exposition, we first consider only NFAs that are also directed acyclic graphs (DAGs). In this case, the depth of a state is simply its topological order (i.e., the maximum number of steps from the starting state to the state in the matching process). Thus, the matching process goes from states with lower topological order to states with higher topological order but cannot go back, as DAGs do not have cycles. Such an NFA can be viewed as a graph with layers, where all starting states are in the first layer (i.e., their topological order is one), states in the second layer (i.e., states with topological order of two) are reachable from the first layer, states in the third layer are reachable from the first and second layers, and so on.

However, NFAs are not always DAGs, because they can contain back edges (i.e., from a later layer to an earlier layer) and cycles. For example, the NFA in Figure 4 contains a cycle between states S4 and S5. Topological sort cannot be performed on such graphs. Therefore, we pre-process an NFA by identifying all its strongly connected components (SCCs) [39].
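A minimal Python sketch of this pre-processing follows: it numbers the SCCs with Tarjan's algorithm, treats each SCC as a single node of a condensed DAG, and assigns every state the longest-path layer of its SCC. The example graph is the NFA of Figure 4 as reconstructed from the text; the function names are illustrative, and the sketch assumes that every state is reachable from a starting state and that starting states have no incoming edges from other SCCs.

```python
from typing import Dict, List, Set


def scc_ids(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Tarjan's algorithm: map every state to the number of its SCC."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    comp: Dict[str, int] = {}

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            cid = max(comp.values(), default=-1) + 1
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = cid
                if w == v:
                    break

    for v in graph:
        if v not in index:
            visit(v)
    return comp


def topo_order(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Topological order (layer) of each state; states in one SCC share it."""
    comp = scc_ids(graph)
    # Predecessors in the condensed DAG: edges inside an SCC are dropped.
    preds: Dict[int, Set[int]] = {c: set() for c in set(comp.values())}
    for u, succs in graph.items():
        for v in succs:
            if comp[u] != comp[v]:
                preds[comp[v]].add(comp[u])
    depth: Dict[int, int] = {}

    def layer(c: int) -> int:
        # Layer 1 for SCCs without predecessors, else 1 + deepest predecessor.
        if c not in depth:
            depth[c] = 1 + max((layer(p) for p in preds[c]), default=0)
        return depth[c]

    return {s: layer(comp[s]) for s in graph}


# NFA of Figure 4 (reconstructed): S4 and S5 form a cycle.
FIG4 = {
    "S1": ["S2", "S4"], "S2": ["S3"], "S3": ["S6"],
    "S4": ["S5"], "S5": ["S4", "S6"], "S6": [],
}
orders = topo_order(FIG4)
```

The recursive formulation is chosen for brevity; for NFAs with very long chains an iterative traversal would avoid Python's recursion limit.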
Each state s is marked with a connected component number SCC(s), such that the states belonging to the same SCC are marked with the same number. We construct a graph G' from the directed graph G (i.e., the NFA) by treating each SCC in G as a single node in G' (e.g., in Figure 4, the SCC that includes states S4 and S5 is considered a single node in G'). For each edge (u, v) in G, an edge (SCC(u), SCC(v)) is added to G' if nodes u and v are in different SCCs. The resulting G' is a DAG on which we can run topological sort. Figure 4 shows the results of identifying SCCs and of the topological sort. The topological order of each state is indicated as a number to the right of the state. Since S4 and S5 belong to the same SCC, they are assigned the same topological order.

Fig. 4: Illustration of topological ordering and normalized depth.

The absolute topological order or depth of a state is uninformative, as different NFAs can have different numbers of layers, even within the same application. Therefore, we normalize the depth of a state to the maximum depth in the NFA it belongs to, resulting in the normalized depth. For example, in Figure 4, because the maximum topological order is 4 (S6), the normalized depth of each state s is topoorder(s)/4 (e.g., for S4 and S5, it is 2/4 or 0.5), where topoorder is a function that returns the topological order of a state. A normalized depth closer to 1 indicates that the state is at the bottom of the NFA (or relatively deep), while a value closer to 0 indicates that the state is closer to the top (or relatively shallow).

B. Analysis of Normalized Depth and Enabled NFA States

Figure 5(a) shows the normalized depth distribution of enabled (hot) states for our evaluated applications. Each application comprises many NFAs, each representing a different pattern. We find that for the majority of applications, the hot states have low normalized depth (i.e., they are closer to the starting state of the NFAs). Furthermore, for the same set of applications, Figure 5(b) shows the normalized depth distribution of cold (never-enabled) states. We observe that the cold states in the majority of the applications have high normalized depth (i.e., they are in deeper regions of the NFAs). To confirm this conclusion further, we also find that there is a significant negative correlation (average correlation coefficient is 0.8) between normalized depth and percentage of hot states for almost all applications.

Fig. 5: Distribution of normalized depth for NFA states: (a) hot (enabled) states; (b) cold (never-enabled) states. For presentation purposes only, normalized depth is classified as: i) shallow ([0, 0.3)), ii) medium ([0.3, 0.6)), and iii) deep ([0.6, 1]). (Stacked bar charts; y-axis: percentage of states, one bar per application.)

We conclude that whether a state is hot or cold is highly correlated with its normalized depth. Overall, shallow states are more likely to be hot while deep states are more likely to be cold.

C. Analysis of Performance Benefits

We analyze the ideal performance benefits when we completely eliminate the cold states from being configured on the AP. We show the potential benefits using a performance model that assumes oracular knowledge of which states are cold and hence not configured on the AP.

Performance Model. Consider the case of the baseline AP execution, where the application has S states (across all NFAs) and the number of states the AP half-core can hold (capacity) is $C_{AP}$. Without loss of generality, we only discuss the case of one AP half-core. If the number of states (S) is larger than the size of the AP ($C_{AP}$), it is not possible to configure the entire application on the AP at once, and the AP must be configured multiple times. Each configuration places a set of NFAs that can collectively fit in the AP. Suppose the size of each NFA in the application is less than the size of the AP; then the number of configurations of the AP would be $N_{config} = \lceil S / C_{AP} \rceil$, under the assumption that individual NFAs can be split at state granularity. In the current AP architecture, batches (partitions) usually contain whole NFAs, so the number of configurations may be even higher. To maintain semantics, each configuration batch must see the same input stream. The matching process finishes after all batches of NFAs are executed on the same input stream. Thus, the total number of cycles spent on the same input stream is $N_{config} \times n$, where n is the length of the input stream and $N_{config}$ is the number of batches.

Under a perfect scenario where we can identify cold states ($S_{cold}$) with 100% accuracy, we can reduce $N_{config}$ by not configuring the cold states on the AP. We define the resource saving $p = S_{cold} / S$. Therefore, the speedup over the baseline case is $\lceil S / C_{AP} \rceil / \lceil (1 - p) S / C_{AP} \rceil$. If the number of states is sufficiently large, the speedup we can get is proportional to $1 / (1 - p)$. Thus, the larger the proportion of cold states that can be correctly identified and eliminated, the more speedup we can obtain over the baseline execution scenario.

Illustrative Example.
To illustrate the benefits of configuring the AP with only hot states, Figure 6 shows two scenarios: a) the baseline AP execution, and b) the AP that only executes hot states. The execution in both cases considers the same application (A). In the baseline scenario, if the number of total states is more than the AP capacity, the execution needs to be done in batches as discussed before. In this example, the compiler partitions the application into two batches, where each batch can individually fit in the AP (B). Hence, the same input stream is executed twice in a sequential manner (D). However, with the oracular knowledge of cold states, the compiler can generate a perfect partition of the application with only the hot states (C). If this perfect partition fits in the AP, it can execute on it by consuming the same input stream only once (E), resulting in significant savings in execution cycles.

Fig. 6: An illustrative figure showing that by not configuring cold states on the AP, all the hot states can fit onto an AP at the same time, reducing the number of re-executions over the input and hence saving time.

In summary, significant speedup can be achieved if cold states are not configured on the AP. In the next section, we propose a simple and effective profiling-based mechanism to identify such states in realistic scenarios and then leverage the profiling information to efficiently partition them from the NFAs.

IV. DESIGN AND IMPLEMENTATION OF NFA PARTITIONING

Any realistic implementation that eliminates cold states from NFAs (i.e., partitions NFAs into cold and hot states, and only configures the AP with hot states) has to deal with at least three challenges. First, although it is not possible to predict cold states with 100% accuracy in general, we need to develop low-overhead techniques to improve the accuracy of prediction as much as possible. Second, in the case of a mis-prediction, some transitions may require states that are not configured on the AP. To this end, we need a mechanism that works as a safety net to handle a transition from a state on the AP to a state that is not on the AP. Third, to minimize the cost of such mis-predictions, transitions should be unidirectional to avoid re-executions of inputs on the AP.

Our proposed partitioning scheme systematically addresses these challenges. First, we use a profiling-based scheme to identify the topological layer that acts as the partition layer for each NFA in the application. Second, our proposed scheme handles transitions out of the AP by adding intermediate reporting states that piggyback on existing AP reporting hardware. Finally, to ensure unidirectional transitions, we partition the NFA at a specific topological order. Since the matching always proceeds from lower to higher topological order, edges that cross partitions go only in one direction.

A. Profiling-based Hot/Cold State Prediction

We use a small portion of the input of each application as the profiling input. At compile time, we run the profiling input on the NFAs of the application and determine whether each state is hot or cold. We assume that this profiling information holds true during the actual execution and hence are able to predict which states will be hot or cold. In the remainder of this sub-section, we evaluate the effectiveness of our profiling-based prediction.

Profiling and Testing Inputs. Each application that we evaluate has a 1MB input. We divide this 1MB input into two equal parts of 512KB.
The first 512KB of input is used for creating profiling inputs of different sizes and the last 512KB is used as the testing input. We create the profiling inputs by using the first 0.2%, 2%, 20%, and 100% of the symbols of the first 512KB portion, which is essentially 0.1%, 1%, 10%, and 50% of the entire input.

Methodology for Evaluating the Effectiveness of Profiling. In our evaluation, we treat hot as positive (P) and cold as negative (N). Therefore, true positives (TP) are states that are hot under both the profiling input and the testing input. Similarly, false positives (FP) are states that are hot under the profiling input but actually cold under the testing input. True negatives (TN) and false negatives (FN) are defined similarly. We define: a) $accuracy = (TP + TN) / (P + N)$, which measures how good the profiling-based prediction is overall; b) $recall = TP / (TP + FN)$, which measures how complete our prediction is in terms of predicting hot states; and c) $precision = TP / (TP + FP)$, which measures how well the prediction can realize the resource saving scope (p).

TABLE I: The effectiveness of profile-based prediction

Percentage of the entire input | 0.1% | 1%  | 10% | 50%
Accuracy                       | 87%  | 90% | 9%  | 97%
Recall                         | 64%  | 76% | 87% | 97%
Precision                      | 94%  | 9%  | 90% | 9%

Effectiveness of Profiling. Table I shows the average numbers for accuracy, recall, and precision when we use profiling inputs of different sizes. We evaluate all applications except two (see the footnote below). Specifically, using only the 2% prefix of the first 512KB (i.e., 1% of the entire input) can achieve 76% recall, which means that 76% of the states that are hot under the testing input are also hot under the small profiling input. The results are consistent across the 24 applications (recall varies from 49% to 100%). In addition, the prediction also has good results in terms of accuracy and precision. To conclude, a small profiling input suffices to identify most of the states that are hot during the actual execution. Therefore, we use 0.1% and 1% of the entire input for profiling and the remainder for the actual evaluation (Section VII).

B. Where to Partition?

In the current AP architecture, the application is split at NFA granularity into batches. In contrast, we partition the NFAs at topological-order granularity. There are two reasons why we use topological order as our partition granularity. First, our previous analysis (Section III-B) shows that there is a correlation between normalized depth and percentage of hot states. Second, partitioning at topological-order granularity guarantees unidirectional transitions between predicted hot and predicted cold states. In this subsection, we show how we obtain a partition layer $k_U$ for each NFA U of the application. We show how to partition each NFA at topological-order granularity in Section IV-C.

Footnote: For the two applications excluded from the profiling evaluation, we use the entire input for the actual execution because their starting states are only enabled at position 0 (start-of-data in ANML configuration).
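The accuracy, recall, and precision used in Section IV-A to evaluate profiling reduce to set arithmetic over per-state outcomes. The sketch below is only an illustration under stated assumptions: `prediction_metrics` and its parameters are hypothetical names, states are identified by arbitrary hashable IDs, and at least one state is assumed to be hot under each input (true whenever a starting state exists), so the denominators are non-zero.

```python
from typing import Dict, Hashable, Set


def prediction_metrics(
    profile_hot: Set[Hashable],
    test_hot: Set[Hashable],
    all_states: Set[Hashable],
) -> Dict[str, float]:
    """Treat hot as positive and cold as negative, as in the text."""
    tp = len(profile_hot & test_hot)   # hot under profiling and testing input
    fp = len(profile_hot - test_hot)   # predicted hot, actually cold
    fn = len(test_hot - profile_hot)   # predicted cold, actually hot
    tn = len(all_states) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(all_states),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Toy data (made up for illustration): ten states numbered 1..10.
metrics = prediction_metrics(
    profile_hot={1, 2, 3, 5},
    test_hot={1, 2, 3, 4},
    all_states=set(range(1, 11)),
)
```

A false negative here is a state that was predicted cold but turns out hot, which is exactly the mis-prediction case that the intermediate reporting states and the SpAP mode exist to handle.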

Choosing Partition Layer. At compile time, we functionally simulate all NFAs of the application using the profiling input and predict whether each state is hot or cold. After simulation, for each NFA U, we set $k_U = \max\{topoorder(s)\}$, where s ranges over the states of NFA U that are hot under the profiling input. We define the predicted hot set as $\{s \mid s \in U \wedge topoorder(s) \le k_U, \forall U\}$. Accordingly, the predicted cold set is $\{s \mid s \in U \wedge topoorder(s) > k_U, \forall U\}$. We divide the predicted hot set at NFA level into batches that can fit in the AP and configure each batch sequentially.

Optimization. As an optimization, to make each batch fill the AP completely, we assign additional states from the predicted cold set to the predicted hot set. This is achieved by incrementing $k_U$, which adds the states of the subsequent partition layers for each NFA U. This process terminates when the capacity of the AP is met for each batch.

C. How to Partition?

In this sub-section, we demonstrate how to partition an NFA into two parts at a given partition layer k, calculated as described in Section IV-B, and how to handle state transitions when the partitioning is imperfect. For brevity, we describe our partitioning scheme for a single NFA; it can then be applied separately to each NFA in the application.

Figure 7 illustrates NFA partitioning at a partition layer k: we cut the edges that connect states with topological order at most k to states with topological order greater than k (indicated as dashed lines in Figure 7). However, the prediction may not be perfect: a state in the predicted cold set could end up being enabled during matching. Since only states in the predicted hot set are present on the AP, the matching process must transition out of the AP.

Fig. 7: Partitioning an NFA by the partition layer. (Panels: the original NFA with its cut edges; the predicted hot set with intermediate reporting states P1 to P4; the translation table; the predicted cold set.)

To handle such cases, for each edge (u, v) that we cut in the original NFA, we introduce an intermediate reporting state v' and an edge (u, v'). The state v' matches exactly the same input symbols (symbol-set) as v but is also a reporting state.
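The choice of partition layer and the edge-cutting step can be sketched as follows. This is a minimal illustration under several assumptions rather than the authors' compiler pass: the NFA, its topological orders, and the profiled hot set are hard-coded example data (the NFA is the one reconstructed from Figure 2, not the NFA of Figure 7), one intermediate reporting state is created per cut edge, and the names `partition_nfa`, `P1`, `P2` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

SYMBOLS: Dict[str, Set[str]] = {
    "S1": {"a"}, "S2": {"b"}, "S3": {"c"},
    "S4": {"d"}, "S5": {"c"}, "S6": {"f"},
}
SUCCESSORS: Dict[str, List[str]] = {
    "S1": ["S2", "S4"], "S2": ["S3"], "S3": ["S6"],
    "S4": ["S5"], "S5": ["S4", "S6"], "S6": [],
}
# Topological order per state (S4 and S5 share an SCC, hence the same order).
ORDER: Dict[str, int] = {"S1": 1, "S2": 2, "S3": 3, "S4": 2, "S5": 2, "S6": 4}


def partition_nfa(
    profiled_hot: Set[str],
) -> Tuple[int, Set[str], Set[str], Dict[str, List[str]], Dict[str, Set[str]], Dict[str, str]]:
    # Partition layer: deepest topological order seen hot during profiling.
    k = max(ORDER[s] for s in profiled_hot)
    hot = {s for s in SUCCESSORS if ORDER[s] <= k}   # predicted hot set
    cold = set(SUCCESSORS) - hot                     # predicted cold set

    # NFA configured on the AP: hot states plus intermediate reporting states.
    ap_succ = {u: [v for v in SUCCESSORS[u] if v in hot] for u in hot}
    ap_symbols = {s: set(SYMBOLS[s]) for s in hot}
    translation: Dict[str, str] = {}  # intermediate state -> cold state to enable

    for u in sorted(hot):
        for v in SUCCESSORS[u]:
            if v in cold:  # cut edge (u, v): replace it with (u, v')
                v_prime = f"P{len(translation) + 1}"
                ap_symbols[v_prime] = set(SYMBOLS[v])  # same symbol-set as v
                ap_succ[v_prime] = []                  # reporting state, no successors
                ap_succ[u].append(v_prime)
                translation[v_prime] = v
    return k, hot, cold, ap_succ, ap_symbols, translation


# Suppose profiling only ever enabled S1, S2, and S4.
k, hot, cold, ap_succ, ap_symbols, translation = partition_nfa({"S1", "S2", "S4"})
```

In this example S5 was never enabled during profiling but still lands in the predicted hot set, because it shares an SCC (and therefore a topological order) with S4.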
During execution, the AP contains these intermediate reporting states along with the predicted hot set. Therefore, when the matching process tries to enable a state that is not on the AP (i.e., one in the predicted cold set), it activates the corresponding intermediate reporting state instead. Consequently, an intermediate report is generated that notifies a handler (Section V). The handler enables the corresponding states in the predicted cold set to continue the matching process. Since we use topological order to partition, once the matching process continues in the predicted cold set, it never goes back to the predicted hot set. In Figure 7, the intermediate reporting states are P1 through P4. When activated, these states enable their corresponding states in the predicted cold set, as indicated in the translation table of Figure 7.

Fig. 8: Constrained states are cold states that are nevertheless configured on the AP due to the constraints of our topological-order-based partitioning scheme. Consequently, some AP resources are underutilized for a few applications. (Stacked bar chart; y-axis: percentage of states, split into hot, constrained, and cold states, one bar per application.)

D. Discussion

The use of SCCs and topological-order-based partitioning imposes constraints that lead to more states than necessary being added to the predicted hot set. Specifically, (1) even if only one state in an SCC is hot, the whole SCC must be included in the predicted hot set, and (2) a cold state with topological order less than the partition layer k is still included in the predicted hot set. This might reduce the AP resource savings. To study the extent of this underutilization, Figure 8 shows that across all 26 evaluated applications, our topological-order-based perfect partitioning adds on average only 4% more states to the predicted hot set (states that in reality are never enabled), compared with a perfect partitioning that can cut NFAs at arbitrary edges. Two applications, one of which is LV, are exceptions: their large SCCs prevent effective partitions. In summary, we still have a significant opportunity for resource savings if we can accurately identify the partition layer for each NFA.

V. HARDWARE SUPPORT FOR INTERMEDIATE REPORT HANDLING AND PARTITIONED NFA PROCESSING

In this section, we discuss how to efficiently handle the intermediate reports generated from the execution of the predicted hot set. To this end, we propose to: a) enable the states that an intermediate reporting state directs to, and b) continue the matching process from the cycle (i.e., the input position) at which the intermediate report was generated. Although both steps can be performed on a CPU, doing so incurs a significant performance slowdown (Section VII); therefore, we propose a new execution mode for the AP.

A. Analysis of New Modes for AP

In order to support the aforementioned steps, we propose an augmented AP that supports two modes: BaseAP mode and SparseAP (SpAP) mode. BaseAP mode execution is similar to the baseline AP execution; however, the AP in this mode is configured with only the predicted hot set. Once the execution of BaseAP mode finishes, the generated intermediate reports are handled in SpAP mode. In SpAP mode, the AP is configured with the predicted cold set. The AP in this mode not only consumes input symbols but is also driven by the intermediate reports. In this context, we develop two major operations for SpAP mode: enable and jump. The enable operation allows each intermediate report to enable the appropriate state in the predicted cold set. The jump operation skips over the input symbols that are not necessary for handling the intermediate reports. Since no back edge exists from predicted cold states to predicted hot states (discussed in Section IV), no back-and-forth switching between BaseAP and SpAP modes is required.

Fig. 9: Illustration of performance benefits under realistic partitioning: because of the jump operation, only a portion of the input symbols are executed in SpAP mode.

Each intermediate report in the list of intermediate reports (L) is represented by a tuple of input position and state ID, (c, sid), denoting that the intermediate report was generated at input position c (i.e., cycle c of the BaseAP mode execution) and that the state to be enabled is sid. Algorithm 1 shows the pseudo code for SpAP mode execution. In each cycle, if no state is enabled (Line 4), it performs a jump operation, setting the current input position i to the input position at which the next intermediate report was generated. The enable operation (Line 9 to Line 11) is performed in either of two scenarios: the current input position i reaches the input position of the next intermediate report, or the current input position i was just set to L[j].c by the jump operation. The remaining functionality of SpAP mode is the same as that of BaseAP mode.
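The SpAP-mode loop described above (Algorithm 1) can be sketched in Python as follows. This is an illustrative software model, not the hardware: the function name `run_spap`, the count of processed symbols, and the tiny cold-set NFA in the usage lines are assumptions made for the example, and the intermediate reports are assumed to be sorted by input position.

```python
from typing import Dict, List, Set, Tuple


def run_spap(
    reports_in: List[Tuple[int, str]],   # L: (input position c, state ID sid)
    stream: str,
    symbols: Dict[str, Set[str]],        # symbol-set of each predicted cold state
    successors: Dict[str, List[str]],
    reporting: Set[str],
) -> Tuple[List[Tuple[int, str]], int]:
    """Return the final reports and the number of symbols actually processed."""
    out: List[Tuple[int, str]] = []
    enabled: Set[str] = set()
    i = j = 0
    processed = 0
    while i < len(stream):
        if not enabled:
            if j < len(reports_in):
                i = reports_in[j][0]      # jump operation
            else:
                break                     # nothing left to handle
        # Enable operation: apply every report generated at position i.
        while j < len(reports_in) and reports_in[j][0] == i:
            enabled.add(reports_in[j][1])
            j += 1
        activated = {s for s in enabled if stream[i] in symbols[s]}
        out += [(i, s) for s in sorted(activated) if s in reporting]
        enabled = {t for s in activated for t in successors[s]}
        i += 1
        processed += 1
    return out, processed


# Hypothetical predicted cold set: S3 accepts 'c' and leads to reporting state S6.
COLD_SYMBOLS = {"S3": {"c"}, "S6": {"f"}}
COLD_SUCC = {"S3": ["S6"], "S6": []}
final_reports, processed = run_spap(
    [(4, "S3")], "xxabcfxx", COLD_SYMBOLS, COLD_SUCC, {"S6"}
)
```

The bounds check `j < len(reports_in)` is evaluated before the report is read, so the loop never indexes past the end of the report list.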
We describe next how these operations are used to handle realistic partitioning scenarios with the help of an illustrative example.

Illustrative Example. Figure 6 earlier discussed the performance benefits of perfect partitioning. Under realistic partitioning, inaccurate predictions of cold states require intermediate report handling. Figure 9 shows an illustrative example demonstrating the benefits of executing the AP in BaseAP and SpAP modes. The execution starts in the BaseAP mode, which is configured with the predicted hot set. During its execution, two intermediate reports are generated at input position 5 and input position 4, respectively, and are stored (b). Once all the input symbols are consumed, the SpAP mode begins, which is driven by both the input stream and the intermediate reports. If no state is enabled, SpAP mode jumps to the input position where the next intermediate report was generated. In this example, it initially jumps directly to input position 5 of the first intermediate report (c). During the execution, when there is no enabled state (at input position 8), the SpAP mode jumps to the input position (4) of the next intermediate report. Therefore, under SpAP, only a portion of the input symbols is executed (the green shaded part in the figure).

Algorithm 1: Functionality of SpAP mode
Input: L, the list of intermediate reports. Each element in L contains (c, sid), giving the input position where the report was generated and the state ID to be enabled.
Input: input, the input symbol stream.
Output: out_list, the list of reports.
 1: i ← 0
 2: j ← 0    ▷ i is the index (input position) of input; j is the index of L.
 3: while i < input.length do
 4:   if E is ∅ then    ▷ E is the set of enabled states.
 5:     if j < L.length then
 6:       i ← L[j].c    ▷ Jump operation.
 7:     else
 8:       break
 9:   while j < L.length and L[j].c = i do
10:     enable L[j].sid    ▷ Enable operation.
11:     j ← j + 1
12:   A ← {states in E that accept input[i]}
13:   ▷ A is the set of activated states.
14:   E ← ∅
15:   for all s in A do
16:     if s is reporting state then
17:       append (i, s.id) to out_list
18:     E ← E ∪ {successors of s}
19:   i ← i + 1

B. Implementation Details

We describe the required hardware implementation supporting SpAP mode by implementing the jump and enable operations on top of the current AP architecture. We start with the implementation of the SpAP operations. Then we estimate the execution time overhead of these operations. Finally, we demonstrate the storage requirements for the intermediate reports.

Jump Operation. The jump operation modifies a register that tracks the current input position. Specifically, if no STE is enabled, the jump operation updates the register value with the input position from the next intermediate report. Since no state configured in SpAP mode is always enabled, the enabled states in the next cycle are determined only by the activated states in the current cycle. Therefore, given that the routing matrix routes the enable signal from the activated states, we assume that the routing matrix provides a flag that is set if no STE is enabled.

Enable Operation. Given an intermediate report, we use the state ID information to enable the corresponding STE. Since STEs are connected to the routing matrix, and the routing matrix follows a hierarchical design (blocks, rows, and STEs) [6], we utilize this hierarchy to perform the enable operation. The routing matrix consists of 96 blocks per half-core. Each block is a group of 16 rows, and each row is a group of 16 STEs. Since a state ID is represented by 16 bits, we divide these bits to enable the required STE in a hierarchical manner. We use the first 8 bits to select the block, the middle 4 bits to select the row, and the last 4 bits to select the required STE within the row. We use a total of three decoders to select the required block, row, and STE, respectively. Specifically, a 7-to-128 decoder is used to select the block. Then, a 4-to-16 decoder selects the row. Finally, a 4-to-16 decoder enables the required STE. The enable operation works in parallel with the processing of input symbols during SpAP mode.

Enable Operation Overhead. We can overlap the enable operation of only one intermediate report with the processing of the input symbols in SpAP mode. Thus, if multiple intermediate reports were generated at the same input position during BaseAP mode, the input processing is stalled until all the states in the simultaneous intermediate reports are enabled.
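The hierarchical selection used by the enable operation can be illustrated with a small bit-field sketch. It is only an illustration under stated assumptions: a 16-bit state ID laid out as 8 block bits, 4 row bits, and 4 STE bits (most significant first), 96 valid blocks per half-core, and the hypothetical helper names `decode_state_id` and `one_hot`. The actual bit ordering in hardware is not given in the text.

```python
BLOCKS_PER_HALF_CORE = 96   # from the text
ROWS_PER_BLOCK = 16         # assumed: selected by a 4-bit field
STES_PER_ROW = 16           # assumed: selected by a 4-bit field


def decode_state_id(state_id):
    """Split a 16-bit state ID into (block, row, ste) indices."""
    if not 0 <= state_id < (1 << 16):
        raise ValueError("state ID must fit in 16 bits")
    block = state_id >> 8           # first 8 bits
    row = (state_id >> 4) & 0xF     # middle 4 bits
    ste = state_id & 0xF            # last 4 bits
    if block >= BLOCKS_PER_HALF_CORE:
        raise ValueError("block index exceeds the half-core")
    return block, row, ste


def one_hot(index, width):
    """Model a decoder output: a width-bit word with only bit `index` set."""
    if not 0 <= index < width:
        raise ValueError("index out of range for decoder width")
    return 1 << index


block, row, ste = decode_state_id(0x5A3C)
row_select = one_hot(row, ROWS_PER_BLOCK)
```

With this layout, 96 blocks of 16 rows of 16 STEs give 24,576 addressable STEs per half-core.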
To detect simultaneous intermediate reports in SpAP mode, we compare the input position of the head intermediate report with the next input position (current input position + 1). Similarly, we compare the input position of the second intermediate report with the next input position. If both of these comparisons are set, we pause the processing of the input symbols. After enabling the states in all simultaneous intermediate reports, the input processing resumes. The cycles spent enabling the simultaneous intermediate reports are considered overhead to the overall SpAP mode execution and are accounted for in our evaluation methodology.

Intermediate Reports Storage Overhead. The list of intermediate reports is stored in the off-chip device memory. Only a portion of the reports is loaded into the on-chip memory to be consumed during the SpAP mode. We use a queue of 8 entries to store the loaded intermediate reports. Because each intermediate report is an (input position, state ID) tuple, we need 6 bytes per intermediate report (4 bytes for the input position and 2 bytes for the state ID). Thus, the overall storage required for the intermediate reports queue is 8 × 6 bytes.

VI. EVALUATION METHODOLOGY

A. Applications

We evaluate our mechanisms with all applications in the ANMLZoo benchmark suite [7] and the Regex benchmark suite [4]. Table II shows that these applications have states ranging from approximately K to 00K, and several of them have more than 24K states, which is the size of our baseline AP half-core. In order to evaluate applications with an even larger number of states, we generate multiple applications based on three sources: ClamAV [9], Hamming [40], and Snort [4].

ClamAV4k (). We convert the regular expressions in main.cvd of the Q 08 ClamAV Virus Database to ANML format. We select the first 4,000 patterns from the virus database. We use the same input as ClamAV in ANMLZoo [7].

Hamming. We generate Hamming automata using the same approach as the ANMLZoo benchmark suite [40]. To stay consistent with Hamming in ANMLZoo, we also create the automata in the BMIA (Bounded Mismatch Identification Automaton) format.
We create three different workloads from Hamming that contain different numbers of NFAs. For each workload we generate, we create a mix of different expected pattern lengths (8,, 0, 0), each with a distance of to 0% of the pattern length (e.g., 0. 0 = 6). Similar to Hamming in ANMLZoo [7], we generate the inputs randomly.

L. Our Snort application includes,6 rules from both the community rules and the registered rules of the Snort network intrusion detector [4]. We convert the regular expressions to ANML format. We use the same network traffic input as the Snort application in ANMLZoo.

We consider a total of 26 applications and divide them into three groups based on the number of states they contain. The high resource requirement (high) group contains applications with more states than the capacity of an AP chip (49K). The medium resource requirement (medium) group contains applications with more states than the capacity of an AP half-core (24K). The rest of the applications are grouped into the low resource requirement (low) group.

B. Experimental Setup

We build our mechanisms on top of the open-source virtual automata simulator VASim [4]. As we mentioned in Section V, we evaluate both AP-CPU and BaseAP/SpAP execution. In the AP-CPU execution, the states that are executed in the SpAP mode are instead executed on the CPU. Table III shows a summary of the evaluated scenarios. We model different timing mechanisms for AP-CPU and BaseAP/SpAP in the simulator, as detailed below.

Timing AP-CPU. We record the total amount of time that the CPU spends handling the intermediate reports by using std::chrono from the C++ standard library. Therefore, we use the real time when we calculate the speedup in the AP-CPU execution. We run our experiments on a machine with an Intel(R) Xeon(R) CPU E5-68 v. We use 7.5 ns as the cycle time per symbol [] for the BaseAP execution.
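The text does not spell out how measured CPU time and simulated AP cycles are combined, so the following Python sketch shows one plausible reading only. It assumes that AP-side time is the cycle count multiplied by the 7.5 ns symbol time and that the AP-CPU execution time is the BaseAP time plus the measured CPU time; the function names are hypothetical.

```python
AP_CYCLE_SECONDS = 7.5e-9   # cycle time per input symbol, from the text


def ap_time_seconds(cycles):
    """Convert simulated AP cycles into seconds."""
    return cycles * AP_CYCLE_SECONDS


def ap_cpu_speedup(baseline_cycles, baseap_cycles, cpu_seconds):
    """Speedup of AP-CPU execution over the baseline AP (assumed formula)."""
    baseline = ap_time_seconds(baseline_cycles)
    combined = ap_time_seconds(baseap_cycles) + cpu_seconds
    return baseline / combined


# Usage with made-up numbers: 4M baseline cycles, 1M BaseAP cycles,
# and 2.5 ms of CPU time spent on intermediate reports.
example = ap_cpu_speedup(4_000_000, 1_000_000, 0.0025)
```

A value below 1 from this function would correspond to a slowdown relative to the baseline.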

TABLE II: List of evaluated applications. RStates stands for reporting states and MaxTopo stands for maximum topological order across NFAs. Grp. stands for resource requirement groups: High (H), Medium (M), Low (L).

| Application | Abbr. | Grp. | #States | #NFAs | MaxTopo | #RStates |
|---|---|---|---|---|---|---|
| ClamAV4000 [9] | | H | | | | |
| Hamming500 [40] | | H | | | | |
| Hamming000 [40] | | H | | | | |
| big [4] | L | H | | | | |
| Hamming500 [40] | | H | | | | |
| [7] | | H | | | | |
| Dotstar [7] | | H | | | | |
| EntityResolution [7] | | H | | | | |
| RandomForest [7] | | H | | | | |
| [7] | | H | | | | |
| ClamAV [7] | | H | | | | |
| [7] | | M | | | | |
| Protomata [7] | | M | | | | |
| [7] | | M | | | | |
| PowerEN [7] | | M | | | | |
| RandomForest [7] | | M | | | | |
| TCP [4] | TCP | L | | | | |
| Dotstar06 [4] | 06 | L | | | | |
| Ranges05 [4] | Rg05 | L | | | | |
| Ranges [4] | Rg | L | | | | |
| ExactMath [4] | EM | L | | | | |
| Dotstar09 [4] | 09 | L | | | | |
| Dotstar0 [4] | 0 | L | | | | |
| Hamming [7] | HM | L | | | | |
| Levenshtein [7] | LV | L | | | | |
| Bro7 [4] | Bro7 | L | | | | |

TABLE III: Summary of Scenarios

| System | Software | Hardware: entire NFAs | Hardware: predicted hot set | Hardware: predicted cold set |
|---|---|---|---|---|
| AP | Partition (at NFA granularity) | BaseAP mode | N/A | N/A |
| AP-CPU | Partition (hot/cold set) | N/A | BaseAP mode | CPU |
| BaseAP/SpAP | Partition (hot/cold set) | N/A | BaseAP mode | SpAP mode |

Recording the Cycles in BaseAP/SpAP. In the BaseAP/SpAP execution, we record the execution cycles via the simulator. The number of cycles in BaseAP/SpAP execution is the sum of the cycles spent in BaseAP mode and in SpAP mode. Therefore,

\[
\text{Speedup}_{\text{BaseAP/SpAP}} = \frac{\text{Number of cycles on AP baseline execution}}{\text{Number of cycles on BaseAP mode} + \text{Number of cycles on SpAP mode}}
\]

Performance per STE. We define a metric called performance per STE to show how much throughput each STE can provide on average. Specifically,

\[
\text{performance per STE} = \frac{\text{throughput}}{C_{AP}}, \qquad \text{throughput} = \frac{\text{number of input symbols}}{\text{number of cycles}}
\]

This allows us to compare APs with different capacities while also considering techniques that improve performance solely by increasing the AP size. Because each STE in the AP occupies die area, we can also consider this metric a proxy for performance/area.

Overheads. In this paper, we focus on reducing the re-execution overhead, as we found it to be the major performance bottleneck in the AP. The new SpAP mode incurs stall cycles due to simultaneous intermediate reports (Section V-B). Our final results include these stall cycles.
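The two metrics defined above reduce to simple ratios. The sketch below restates them in Python with made-up cycle counts; the function names are hypothetical, and `ap_capacity` is assumed to be the number of STEs on the AP (the symbol written C_AP in the text).

```python
def baseap_spap_speedup(baseline_cycles, baseap_cycles, spap_cycles):
    """Baseline AP cycles divided by the sum of BaseAP and SpAP cycles."""
    return baseline_cycles / (baseap_cycles + spap_cycles)


def performance_per_ste(num_input_symbols, num_cycles, ap_capacity):
    """Throughput (symbols per cycle) divided by AP capacity in STEs."""
    throughput = num_input_symbols / num_cycles
    return throughput / ap_capacity


# Usage with illustrative numbers only (not results from the paper).
s = baseap_spap_speedup(baseline_cycles=4000, baseap_cycles=1000, spap_cycles=250)
p_small = performance_per_ste(num_input_symbols=1000, num_cycles=2000, ap_capacity=100)
p_large = performance_per_ste(num_input_symbols=1000, num_cycles=2000, ap_capacity=200)
```

The second and third calls show the property the metric is meant to expose: at equal throughput, a larger AP yields lower performance per STE.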
Two more generic overheads relate to output and reconfiguration. In our evaluations, we do not include the output overhead [0] and rely on existing work [4] that proposes both hardware and software techniques to address it. We also do not include the reconfiguration overhead (50 ms [44], [45] for reconfiguring a full AP board) in our results, as we believe it can be amortized over AP execution, especially when it executes very large inputs.

VII. EXPERIMENTAL RESULTS

Effect on Performance. To show the benefits of our schemes, we evaluate the speedup for applications in the high and medium groups. Our mechanisms do not change the throughput of the AP for applications in the low category, since these applications are smaller than our baseline AP with 24K STEs. Figure 10(a) shows the performance results of our proposal, from which we draw four major observations. First, the AP-CPU execution shows a significant geometric mean slowdown of 9.8 and .9 under 0.% and % profiling input, respectively. However, five of the 26 applications achieve a 4. geometric mean speedup at no cost of hardware modification. Second, we find that BaseAP/SpAP execution shows a speedup in the majority of the evaluated applications. It can achieve a .8 and . geometric mean speedup using 0.% and % of the input as profiling input, respectively. Third, BaseAP/SpAP execution can be slower than the AP in a few applications, since these applications generate many simultaneous intermediate reports, leading to lengthy enable stalls in the SpAP mode (shown in Table IV). Fourth, in applications with large SCCs that prevent efficient partitioning (see Figure 8), our scheme configures all the states in the BaseAP mode execution with no change in execution time.

Effect on Performance per STE. In order to evaluate the efficiency of our schemes across a wider set of system sizes and configurations, we show performance per STE in Figure 11, from which we draw two major observations. First, although different sizes of AP chips can execute the same application with the same performance (e.g., an application in the low group fits and runs on both an AP chip and an AP half-core), larger AP chips have lower performance/STE, because fewer STEs in the larger AP are utilized for the same application size. Such underutilization leads to lower performance/STE. Second, on average, our scheme not only increases performance/STE by .% under the scenario of an AP half-core and % profiling input, but also consistently achieves better performance/STE under different sizes of AP. There are two major reasons: (1) we predict cold states and eliminate them from being configured, which increases AP utilization; (2) we use fewer cycles in the SpAP mode for mis-prediction handling than re-execution by batches, hence increasing the throughput.

Resource Savings and Speedup. We show the results of resource savings in Figure 10(b). By comparing it with Figure 10(a), we make three observations. First, the applications with high resource savings generally also have good speedups. Second, one application shows a slowdown although it has good resource savings. This is because its SpAP mode execution has many enable stalls due to a large number of simultaneous intermediate reports (Table IV). Third, although the resource savings may be the same under different profiling inputs, the speedup may be different. This is because the original size of the predicted hot set was different, but due to the optimization in Section IV-B, each batch was extended with part of the predicted cold states to match the capacity of the AP. Consequently, this leads to the same resource savings.

Fig. 10: Speedup and Resource Savings on AP. (a) Speedup with AP-CPU and BaseAP/SpAP execution using 0.% and % profiling input (capacity = 24K). (b) Resource savings (i.e., the portion of states that are not configured in the BaseAP mode).

Fig. 11: Performance per STE of various AP sizes with BaseAP/SpAP execution considering % profiling input.

TABLE IV: Runtime statistics for AP and BaseAP/SpAP (under % profiling input). The first three columns show the number of executions on the AP, BaseAP mode, and SpAP mode, respectively. EStalls stands for the stalls caused by enable operations for handling simultaneous intermediate reports. JumpRatio is defined as the proportion of cycles skipped in the SpAP mode. Columns: App, #Baseline AP executions, #BaseAP mode executions, #SpAP mode executions, #Intermediate Reports, #EStalls, JumpRatio.

However, since a larger profiling input has higher recall for hot states (Section IV-A), the speedup is also higher. To conclude, the speedup is generally related to resource savings, as we explained in Section III-C, but the speedup also depends on other factors such as the quality of prediction and the number of enable stalls.

Intermediate Reporting States. The addition of intermediate reporting states increases the total number of states, which could increase the total number of configurations and executions (see Table IV). Figure 12 shows the effect on the number of reporting states in BaseAP mode, normalized to that of the baseline. In the BaseAP/SpAP mode, the total number of reporting states includes both original reporting states and intermediate reporting states (stacked bars in the figure). We make two observations.
First, the total number of reporting states in BaseAP mode could be larger than in the baseline AP execution, which only contains original reporting states. For example, the number of reporting states in one application increases by .6, because it has a large number of crossing edges between the predicted hot set and the predicted cold set. Second, the number of reporting states could decrease in the BaseAP mode execution, because the number of crossing edges is smaller than the number of original reporting states. Although our scheme may increase the number of reporting states, we are aware that an effective software-based reporting state compression technique [4] could be applied on top of our scheme.

Fig. 12: Comparison of the number of reporting states (normalized to baseline). IM stands for intermediate reporting states. True stands for original reporting states in BaseAP mode. P stands for profiling.

Effect of Jump Operations. In Table IV, although for some applications the number of executions of BaseAP/SpAP may be greater than or equal to the baseline, we still obtain speedups on them because the SpAP mode execution can reduce the total number of cycles thanks to the jump operations. To show the effect of jump operations, we define JumpRatio as the proportion of cycles skipped in the SpAP mode. Formally,

\[
\text{JumpRatio} = 1 - \frac{\text{Total cycles on SpAP mode}}{\text{Number of batches on SpAP mode} \times \text{Length of input stream}}
\]

A higher JumpRatio indicates a better effect of jump operations. We show JumpRatio in Table IV for the applications that use SpAP mode. To conclude, the majority of the applications execute only a few percent of the input symbols with the help of jump operations.
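JumpRatio can be computed directly from simulator counters. The sketch below assumes the form "one minus executed cycles over the cycles a jump-free run would need", which is inferred from the definition "proportion of cycles skipped"; the function and parameter names are hypothetical and the numbers are illustrative.

```python
def jump_ratio(spap_cycles, spap_batches, input_length):
    """Fraction of SpAP-mode cycles skipped by jump operations.

    Without jumps, each SpAP batch would consume the whole input stream,
    i.e. spap_batches * input_length cycles (assumed baseline).
    """
    if spap_batches <= 0 or input_length <= 0:
        raise ValueError("batches and input length must be positive")
    return 1.0 - spap_cycles / (spap_batches * input_length)


# Usage: two SpAP batches over a 1000-symbol stream, 50 cycles executed.
ratio = jump_ratio(spap_cycles=50, spap_batches=2, input_length=1000)
```

A ratio close to 1 means that almost the entire input stream was skipped in SpAP mode.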

Fig. 13: Sensitivity to the different capacities of the AP chip. (a) Speedup on a small AP (capacity = K). (b) Speedup on an AP chip (capacity = 49K).

Sensitivity of speedup to the capacity of the AP. The applications in the low resource requirement group require fewer states than the capacity of an AP half-core. Figure 13(a) shows the speedup achieved by our schemes when the capacity of the AP is K. Observations similar to those discussed for Figure 10(a) still hold. Specifically, BaseAP/SpAP achieves a .9 and . speedup using 0.% and % profiling input, respectively. In addition, we demonstrate another sensitivity study on an AP with 49K STEs for the applications in the high group. Figure 13(b) shows that BaseAP/SpAP execution achieves a .9 and . speedup using 0.% and % profiling input for the applications of this group.

VIII. RELATED WORK

To the best of our knowledge, this is the first work that designs efficient architectural support for large-scale NFA applications on the AP.

Spatial Architectures. Multitasking on spatial architectures is usually carried out through the use of multiple contexts [46], which can consume extra memory. In contrast, our BaseAP/SpAP proposal relies on the ability to eliminate dynamically unused states from NFAs to improve AP utilization. We rely on a mechanism to transfer control to a spatially distinct partition to accommodate larger-than-device NFAs, though these could be implemented as multiple contexts. Recently, gate removal has been proposed to eliminate unused logic gates from general-purpose processor IPs to customize processors to specific applications [47]. In our approach, we only eliminate states from the NFA (i.e., the program), and not the hardware. There are also alternative implementations of the AP [48]-[50]. For example, the cache automaton [49] re-purposes the last-level cache for automata processing.
We believe our techniques are complementary, as we propose hardware/software mechanisms to make the automata processing itself more efficient.

DFA and NFA Acceleration. Deterministic finite automata (DFA) have been characterized previously with respect to implementing special machines [5] and for parallelization [7], [5]-[55]. Parallel execution of NFAs on the AP has been proposed by trading AP resources for higher throughput []. However, our characterization of the dynamic execution properties of NFAs specific to the AP execution model is, to our knowledge, the first of its kind. Our elimination of dynamically unused states can free up AP resources to complement parallel execution.

FSM Decomposition. FSM decompositions [56]-[59] could reduce the complexity of placement and routing in the routing matrix by simplifying the layout. While cascade decompositions are the closest to our studies, they are often static, apply to deterministic machines only, and are mostly not based on dynamic state behavior (i.e., predicted hot vs. predicted cold states). In contrast, our proposed approach (which uses graph-theoretic techniques rather than sequential machine theory) is focused on increasing the AP throughput by allowing only predicted hot states to be configured on the AP. We believe both approaches are complementary and can be applied to different bottlenecks in the AP execution pipeline. For example, FSM decomposition can make the reconfiguration process efficient, while our technique can accelerate NFA execution on the AP by reducing the number of re-executions of the input symbol stream.

IX. CONCLUSIONS

Automata processors (APs) are very efficient at executing Non-deterministic Finite Automata (NFAs). However, like other types of spatial architectures, the AP faces major challenges in its execution model when executing very large tasks efficiently. In this paper, we make use of the inherent properties of NFAs to avoid spending compute resources on states that are never used during execution, by means of a low-cost software/hardware-coordinated approach.
Consequently, this results in a new execution model for APs that enables efficient and high-performance processing for large-scale tasks. We believe this work will be helpful towards a wider adoption of APs and will open up new research directions for enabling efficient NFA processing.

ACKNOWLEDGMENT

The authors thank the anonymous reviewers and members of the Insight Computer Architecture Lab at the College of William and Mary for their feedback. This material is based upon work supported by the National Science Foundation (NSF) grants (#6576, #775, and #750667). This work was performed in part using computing facilities at the College of William and Mary, which were provided by contributions from the NSF, the Commonwealth of Virginia Equipment Trust Fund, and the Office of Naval Research. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.


More information

On the Description of Communications Between Software Components with UML

On the Description of Communications Between Software Components with UML On the Description of Communictions Between Softwre Components with UML Zhiwei An Dennis Peters Fculty of Engineering nd Applied Science Memoril University of Newfoundlnd St. John s NL A1B 3X5 zhiwei@engr.mun.c

More information

Study on SLT calibration method of 2-port waveguide DUT

Study on SLT calibration method of 2-port waveguide DUT Interntionl Conference on Advnced Electronic cience nd Technology (AET 206) tudy on LT clibrtion method of 2-port wveguide DUT Wenqing Luo, Anyong Hu, Ki Liu nd Xi Chen chool of Electronics nd Informtion

More information

Use of compiler optimization of software bypassing as a method to improve energy efficiency of exposed data path architectures

Use of compiler optimization of software bypassing as a method to improve energy efficiency of exposed data path architectures Guzm et l. EURASIP Journl on Emedded Systems 213, 213:9 RESEARCH Open Access Use of compiler optimiztion of softwre ypssing s method to improve energy efficiency of exposed dt pth rchitectures Vldimír

More information

Geometric quantities for polar curves

Geometric quantities for polar curves Roerto s Notes on Integrl Clculus Chpter 5: Bsic pplictions of integrtion Section 10 Geometric quntities for polr curves Wht you need to know lredy: How to use integrls to compute res nd lengths of regions

More information

Synchronous Machine Parameter Measurement

Synchronous Machine Parameter Measurement Synchronous Mchine Prmeter Mesurement 1 Synchronous Mchine Prmeter Mesurement Introduction Wound field synchronous mchines re mostly used for power genertion but lso re well suited for motor pplictions

More information

CASCADED MODEL ANALYSIS OF PIXELATED SCINTILLATOR IMAGING DETECTORS

CASCADED MODEL ANALYSIS OF PIXELATED SCINTILLATOR IMAGING DETECTORS Biomeicl echtronics Lb SCIN 7 CSCE OEL NLYSIS OF PIXELE SCINILLOR IGING EECORS Ho Kyung Kim, Seung n Yun n Chng Hwy Lim June, 7, 7 School of echnicl Engineering Pusn Ntionl University Republic of Kore

More information

Adaptive Geometric Features Based Filtering Impulse Noise in Colour Images

Adaptive Geometric Features Based Filtering Impulse Noise in Colour Images Aptive Geometric Fetures Bse Filtering Impulse Noise in Colour Imges Zhengy Xu #1, Bin Qiu *, Hong Ren Wu #3 Xinghuo Yu #4 # School of Electricl n Computer Engineering, Pltform Technologies Reserch Institute,

More information

EET 438a Automatic Control Systems Technology Laboratory 5 Control of a Separately Excited DC Machine

EET 438a Automatic Control Systems Technology Laboratory 5 Control of a Separately Excited DC Machine EE 438 Automtic Control Systems echnology bortory 5 Control of Seprtely Excited DC Mchine Objective: Apply proportionl controller to n electromechnicl system nd observe the effects tht feedbck control

More information

Interference Cancellation Method without Feedback Amount for Three Users Interference Channel

Interference Cancellation Method without Feedback Amount for Three Users Interference Channel Open Access Librry Journl 07, Volume, e57 ISSN Online: -97 ISSN Print: -9705 Interference Cncelltion Method without Feedbc Amount for Three Users Interference Chnnel Xini Tin, otin Zhng, Wenie Ji School

More information

ECE 274 Digital Logic. Digital Design. Datapath Components Shifters, Comparators, Counters, Multipliers Digital Design

ECE 274 Digital Logic. Digital Design. Datapath Components Shifters, Comparators, Counters, Multipliers Digital Design ECE 27 Digitl Logic Shifters, Comprtors, Counters, Multipliers Digitl Design..7 Digitl Design Chpter : Slides to ccompny the textbook Digitl Design, First Edition, by Frnk Vhid, John Wiley nd Sons Publishers,

More information

Y9.ET1.3 Implementation of Secure Energy Management against Cyber/physical Attacks for FREEDM System

Y9.ET1.3 Implementation of Secure Energy Management against Cyber/physical Attacks for FREEDM System Y9.ET1.3 Implementtion of Secure Energy ngement ginst Cyber/physicl Attcks for FREED System Project Leder: Fculty: Students: Dr. Bruce cillin Dr. o-yuen Chow Jie Dun 1. Project Gols Develop resilient cyber-physicl

More information

CS2204 DIGITAL LOGIC & STATE MACHINE DESIGN fall 2008

CS2204 DIGITAL LOGIC & STATE MACHINE DESIGN fall 2008 CS224 DIGITAL LOGIC & STATE MACHINE DESIGN fll 28 STAND ALONE XILINX PROJECT 2-TO- MULTIPLEXER. Gols : Lern how to develop stnd lone 2-to- multiplexer () Xilinx project during which the following re introduced

More information

University of North Carolina-Charlotte Department of Electrical and Computer Engineering ECGR 4143/5195 Electrical Machinery Fall 2009

University of North Carolina-Charlotte Department of Electrical and Computer Engineering ECGR 4143/5195 Electrical Machinery Fall 2009 Problem 1: Using DC Mchine University o North Crolin-Chrlotte Deprtment o Electricl nd Computer Engineering ECGR 4143/5195 Electricl Mchinery Fll 2009 Problem Set 4 Due: Thursdy October 8 Suggested Reding:

More information

Network Sharing and its Energy Benefits: a Study of European Mobile Network Operators

Network Sharing and its Energy Benefits: a Study of European Mobile Network Operators Network Shring nd its Energy Benefits: Study of Europen Mobile Network Opertors Mrco Ajmone Mrsn Electronics nd Telecommunictions Dept Politecnico di Torino, nd Institute IMDEA Networks, mrco.jmone@polito.it

More information

Information-Coupled Turbo Codes for LTE Systems

Information-Coupled Turbo Codes for LTE Systems Informtion-Coupled Turbo Codes for LTE Systems Lei Yng, Yixun Xie, Xiowei Wu, Jinhong Yun, Xingqing Cheng nd Lei Wn rxiv:709.06774v [cs.it] 20 Sep 207 Abstrct We propose new clss of informtion-coupled

More information

Chapter 2 Literature Review

Chapter 2 Literature Review Chpter 2 Literture Review 2.1 ADDER TOPOLOGIES Mny different dder rchitectures hve een proposed for inry ddition since 1950 s to improve vrious spects of speed, re nd power. Ripple Crry Adder hve the simplest

More information

Implementation of Different Architectures of Forward 4x4 Integer DCT For H.264/AVC Encoder

Implementation of Different Architectures of Forward 4x4 Integer DCT For H.264/AVC Encoder Implementtion of Different Architectures of Forwrd 4x4 Integer DCT For H.64/AVC Encoder Bunji Antoinette Ringnyu, Ali Tngel, Emre Krulut 3 Koceli University, Institute of Science nd Technology, Koceli,

More information

Synchronous Machine Parameter Measurement

Synchronous Machine Parameter Measurement Synchronous Mchine Prmeter Mesurement 1 Synchronous Mchine Prmeter Mesurement Introduction Wound field synchronous mchines re mostly used for power genertion but lso re well suited for motor pplictions

More information

Three-Phase Synchronous Machines The synchronous machine can be used to operate as: 1. Synchronous motors 2. Synchronous generators (Alternator)

Three-Phase Synchronous Machines The synchronous machine can be used to operate as: 1. Synchronous motors 2. Synchronous generators (Alternator) Three-Phse Synchronous Mchines The synchronous mchine cn be used to operte s: 1. Synchronous motors 2. Synchronous genertors (Alterntor) Synchronous genertor is lso referred to s lterntor since it genertes

More information

Lecture 20. Intro to line integrals. Dan Nichols MATH 233, Spring 2018 University of Massachusetts.

Lecture 20. Intro to line integrals. Dan Nichols MATH 233, Spring 2018 University of Massachusetts. Lecture 2 Intro to line integrls Dn Nichols nichols@mth.umss.edu MATH 233, Spring 218 University of Msschusetts April 12, 218 (2) onservtive vector fields We wnt to determine if F P (x, y), Q(x, y) is

More information

Kirchhoff s Rules. Kirchhoff s Laws. Kirchhoff s Rules. Kirchhoff s Laws. Practice. Understanding SPH4UW. Kirchhoff s Voltage Rule (KVR):

Kirchhoff s Rules. Kirchhoff s Laws. Kirchhoff s Rules. Kirchhoff s Laws. Practice. Understanding SPH4UW. Kirchhoff s Voltage Rule (KVR): SPH4UW Kirchhoff s ules Kirchhoff s oltge ule (K): Sum of voltge drops round loop is zero. Kirchhoff s Lws Kirchhoff s Current ule (KC): Current going in equls current coming out. Kirchhoff s ules etween

More information

A Novel Back EMF Zero Crossing Detection of Brushless DC Motor Based on PWM

A Novel Back EMF Zero Crossing Detection of Brushless DC Motor Based on PWM A ovel Bck EMF Zero Crossing Detection of Brushless DC Motor Bsed on PWM Zhu Bo-peng Wei Hi-feng School of Electricl nd Informtion, Jingsu niversity of Science nd Technology, Zhenjing 1003 Chin) Abstrct:

More information

Area-Time Efficient Digit-Serial-Serial Two s Complement Multiplier

Area-Time Efficient Digit-Serial-Serial Two s Complement Multiplier Are-Time Efficient Digit-Seril-Seril Two s Complement Multiplier Essm Elsyed nd Htem M. El-Boghddi Computer Engineering Deprtment, Ciro University, Egypt Astrct - Multipliction is n importnt primitive

More information

CSI-SF: Estimating Wireless Channel State Using CSI Sampling & Fusion

CSI-SF: Estimating Wireless Channel State Using CSI Sampling & Fusion CSI-SF: Estimting Wireless Chnnel Stte Using CSI Smpling & Fusion Riccrdo Crepldi, Jeongkeun Lee, Rul Etkin, Sung-Ju Lee, Robin Krvets University of Illinois t Urbn-Chmpign Hewlett-Pckrd Lbortories Emil:{rcrepl,rhk}@illinoisedu,

More information

B inary classification refers to the categorization of data

B inary classification refers to the categorization of data ROBUST MODULAR ARTMAP FOR MULTI-CLASS SHAPE RECOGNITION Chue Poh Tn, Chen Chnge Loy, Weng Kin Li, Chee Peng Lim Abstrct This pper presents Fuzzy ARTMAP (FAM) bsed modulr rchitecture for multi-clss pttern

More information

10.4 AREAS AND LENGTHS IN POLAR COORDINATES

10.4 AREAS AND LENGTHS IN POLAR COORDINATES 65 CHAPTER PARAMETRIC EQUATINS AND PLAR CRDINATES.4 AREAS AND LENGTHS IN PLAR CRDINATES In this section we develop the formul for the re of region whose oundry is given y polr eqution. We need to use the

More information

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad Hll Ticket No Question Pper Code: AEC009 INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigl, Hyderd - 500 043 MODEL QUESTION PAPER Four Yer B.Tech V Semester End Exmintions, Novemer - 2018 Regultions:

More information

Simulation of Transformer Based Z-Source Inverter to Obtain High Voltage Boost Ability

Simulation of Transformer Based Z-Source Inverter to Obtain High Voltage Boost Ability Interntionl Journl of cience, Engineering nd Technology Reserch (IJETR), olume 4, Issue 1, October 15 imultion of Trnsformer Bsed Z-ource Inverter to Obtin High oltge Boost Ability A.hnmugpriy 1, M.Ishwry

More information

Lecture 16: Four Quadrant operation of DC Drive (or) TYPE E Four Quadrant chopper Fed Drive: Operation

Lecture 16: Four Quadrant operation of DC Drive (or) TYPE E Four Quadrant chopper Fed Drive: Operation Lecture 16: Four Qudrnt opertion of DC Drive (or) TYPE E Four Qudrnt chopper Fed Drive: Opertion The rmture current I is either positive or negtive (flow in to or wy from rmture) the rmture voltge is lso

More information

This is a repository copy of Effect of power state on absorption cross section of personal computer components.

This is a repository copy of Effect of power state on absorption cross section of personal computer components. This is repository copy of Effect of power stte on bsorption cross section of personl computer components. White Rose Reserch Online URL for this pper: http://eprints.whiterose.c.uk/10547/ Version: Accepted

More information

High-speed Simulation of the GPRS Link Layer

High-speed Simulation of the GPRS Link Layer 989 High-speed Simultion of the GPRS Link Lyer J Gozlvez nd J Dunlop Deprtment of Electronic nd Electricl Engineering, University of Strthclyde 204 George St, Glsgow G-lXW, Scotlnd Tel: +44 4 548 206,

More information

Spiral Tilings with C-curves

Spiral Tilings with C-curves Spirl Tilings with -curves Using ombintorics to Augment Trdition hris K. Plmer 19 North Albny Avenue hicgo, Illinois, 0 chris@shdowfolds.com www.shdowfolds.com Abstrct Spirl tilings used by rtisns through

More information

A New Stochastic Inner Product Core Design for Digital FIR Filters

A New Stochastic Inner Product Core Design for Digital FIR Filters MATEC Web of Conferences, (7) DOI:./ mtecconf/7 CSCC 7 A New Stochstic Inner Product Core Design for Digitl FIR Filters Ming Ming Wong,, M. L. Dennis Wong, Cishen Zhng, nd Ismt Hijzin Fculty of Engineering,

More information

Fuzzy Logic Controller for Three Phase PWM AC-DC Converter

Fuzzy Logic Controller for Three Phase PWM AC-DC Converter Journl of Electrotechnology, Electricl Engineering nd Mngement (2017) Vol. 1, Number 1 Clusius Scientific Press, Cnd Fuzzy Logic Controller for Three Phse PWM AC-DC Converter Min Muhmmd Kml1,, Husn Ali2,b

More information

(CATALYST GROUP) B"sic Electric"l Engineering

(CATALYST GROUP) Bsic Electricl Engineering (CATALYST GROUP) B"sic Electric"l Engineering 1. Kirchhoff s current l"w st"tes th"t (") net current flow "t the junction is positive (b) Hebr"ic sum of the currents meeting "t the junction is zero (c)

More information

Design and Modeling of Substrate Integrated Waveguide based Antenna to Study the Effect of Different Dielectric Materials

Design and Modeling of Substrate Integrated Waveguide based Antenna to Study the Effect of Different Dielectric Materials Design nd Modeling of Substrte Integrted Wveguide bsed Antenn to Study the Effect of Different Dielectric Mterils Jgmeet Kour 1, Gurpdm Singh 1, Sndeep Ary 2 1Deprtment of Electronics nd Communiction Engineering,

More information

Postprint. This is the accepted version of a paper presented at IEEE PES General Meeting.

Postprint.   This is the accepted version of a paper presented at IEEE PES General Meeting. http://www.div-portl.org Postprint This is the ccepted version of pper presented t IEEE PES Generl Meeting. Cittion for the originl published pper: Mhmood, F., Hooshyr, H., Vnfretti, L. (217) Sensitivity

More information

Mixed CMOS PTL Adders

Mixed CMOS PTL Adders Anis do XXVI Congresso d SBC WCOMPA l I Workshop de Computção e Aplicções 14 20 de julho de 2006 Cmpo Grnde, MS Mixed CMOS PTL Adders Déor Mott, Reginldo d N. Tvres Engenhri em Sistems Digitis Universidde

More information

A Slot-Asynchronous MAC Protocol Design for Blind Rendezvous in Cognitive Radio Networks

A Slot-Asynchronous MAC Protocol Design for Blind Rendezvous in Cognitive Radio Networks Globecom 04 - Wireless Networking Symposium A Slot-Asynchronous MAC Protocol Design for Blind Rendezvous in Cognitive Rdio Networks Xingy Liu nd Jing Xie Deprtment of Electricl nd Computer Engineering

More information

Section 17.2: Line Integrals. 1 Objectives. 2 Assignments. 3 Maple Commands. 1. Compute line integrals in IR 2 and IR Read Section 17.

Section 17.2: Line Integrals. 1 Objectives. 2 Assignments. 3 Maple Commands. 1. Compute line integrals in IR 2 and IR Read Section 17. Section 7.: Line Integrls Objectives. ompute line integrls in IR nd IR 3. Assignments. Red Section 7.. Problems:,5,9,,3,7,,4 3. hllenge: 6,3,37 4. Red Section 7.3 3 Mple ommnds Mple cn ctully evlute line

More information

Power-Aware FPGA Logic Synthesis Using Binary Decision Diagrams

Power-Aware FPGA Logic Synthesis Using Binary Decision Diagrams Power-Awre FPGA Logic Synthesis Using Binry Decision Digrms Kevin Oo Tinmung, Dvid Howlnd, nd Russell Tessier Deprtment of Electricl nd Computer Engineering University of Msschusetts Amherst, MA 01003

More information

2. Self-tapping screws as tensile reinforcements perpendicular to the grain

2. Self-tapping screws as tensile reinforcements perpendicular to the grain Reinforcements perpeniculr to te grin using self-tpping screws Univ.-Prof. Dr.-Ing. Hns Jocim Blss Dipl.-Ing. Ireneusz Bejtk Lerstul für Ingenieurolzbu un Bukonstruktionen University of Krlsrue 7618 Krlsrue,

More information

Timing Constraint-driven Technology Mapping for FPGAs Considering False Paths and Multi-Clock Domains

Timing Constraint-driven Technology Mapping for FPGAs Considering False Paths and Multi-Clock Domains Timing Constrint-driven Technology Mpping for FPGAs Considering Flse Pths nd Multi-Clock Domins Lei Cheng, Deming Chen, Mrtin D.F. Wong Univ. of Illinois t UC, Chmpign, IL USA {lcheng1,dchen,mdfwong}@uiuc.edu

More information

High Speed On-Chip Interconnects: Trade offs in Passive Termination

High Speed On-Chip Interconnects: Trade offs in Passive Termination High Speed On-Chip Interconnects: Trde offs in Pssive Termintion Rj Prihr University of Rochester, NY, USA prihr@ece.rochester.edu Abstrct In this pper, severl pssive termintion schemes for high speed

More information

MEASURE THE CHARACTERISTIC CURVES RELEVANT TO AN NPN TRANSISTOR

MEASURE THE CHARACTERISTIC CURVES RELEVANT TO AN NPN TRANSISTOR Electricity Electronics Bipolr Trnsistors MEASURE THE HARATERISTI URVES RELEVANT TO AN NPN TRANSISTOR Mesure the input chrcteristic, i.e. the bse current IB s function of the bse emitter voltge UBE. Mesure

More information

Joanna Towler, Roading Engineer, Professional Services, NZTA National Office Dave Bates, Operations Manager, NZTA National Office

Joanna Towler, Roading Engineer, Professional Services, NZTA National Office Dave Bates, Operations Manager, NZTA National Office . TECHNICA MEMOANDM To Cc repred By Endorsed By NZTA Network Mngement Consultnts nd Contrctors NZTA egionl Opertions Mngers nd Are Mngers Dve Btes, Opertions Mnger, NZTA Ntionl Office Jonn Towler, oding

More information

ECE 274 Digital Logic

ECE 274 Digital Logic ECE - Digitl Logic (Textbook - Required) ECE Digitl Logic Instructor: Romn Lysecky, rlysecky@ece.rizon.edu Office Hours: TBA, ECE F Lecture: MWF :-: PM, ILC Course Website: http://www.ece.rizon.edu/~ece/

More information

Convolutional Networks. Lecture slides for Chapter 9 of Deep Learning Ian Goodfellow

Convolutional Networks. Lecture slides for Chapter 9 of Deep Learning Ian Goodfellow Convolutionl Networks Lecture slides for Chpter 9 of Deep Lerning In Goodfellow 2016-09-12 Convolutionl Networks Scle up neurl networks to process very lrge imges / video sequences Sprse connections Prmeter

More information

Improving Iris Identification using User Quality and Cohort Information

Improving Iris Identification using User Quality and Cohort Information Improving Iris Identifiction using User Qulity nd Cohort Informtion Arun Pssi, Ajy Kumr Biometrics Reserch Lbortory Deprtment of Electricl Engineering, Indin Institute of Technology Delhi Huz Khs, New

More information

Digital Design. Chapter 1: Introduction

Digital Design. Chapter 1: Introduction Digitl Design Chpter : Introduction Slides to ccompny the textbook Digitl Design, with RTL Design, VHDL, nd Verilog, 2nd Edition, by, John Wiley nd Sons Publishers, 2. http://www.ddvhid.com Copyright 2

More information

CS2204 DIGITAL LOGIC & STATE MACHINE DESIGN SPRING 2005

CS2204 DIGITAL LOGIC & STATE MACHINE DESIGN SPRING 2005 CS2204 DIGITAL LOGIC & STATE MACHINE DESIGN SPRING 2005 EXPERIMENT 1 FUNDAMENTALS 1. GOALS : Lern how to develop cr lrm digitl circuit during which the following re introduced : CS2204 l fundmentls, nd

More information

Student Book SERIES. Fractions. Name

Student Book SERIES. Fractions. Name D Student Book Nme Series D Contents Topic Introducing frctions (pp. ) modelling frctions frctions of collection compring nd ordering frctions frction ingo pply Dte completed / / / / / / / / Topic Types

More information

Digital Design. Sequential Logic Design -- Controllers. Copyright 2007 Frank Vahid

Digital Design. Sequential Logic Design -- Controllers. Copyright 2007 Frank Vahid Digitl Design Sequentil Logic Design -- Controllers Slides to ccompny the tetook Digitl Design, First Edition, y, John Wiley nd Sons Pulishers, 27. http://www.ddvhid.com Copyright 27 Instructors of courses

More information

Soft switched DC-DC PWM Converters

Soft switched DC-DC PWM Converters Soft switched DC-DC PWM Converters Mr.M. Prthp Rju (), Dr. A. Jy Lkshmi () Abstrct This pper presents n upgrded soft switching technique- zero current trnsition (ZCT), which gives better turn off chrcteristics

More information

DYE SOLUBILITY IN SUPERCRITICAL CARBON DIOXIDE FLUID

DYE SOLUBILITY IN SUPERCRITICAL CARBON DIOXIDE FLUID THERMAL SCIENCE, Yer 2015, Vol. 19, No. 4, pp. 1311-1315 1311 DYE SOLUBILITY IN SUPERCRITICAL CARBON DIOXIDE FLUID by Jun YAN, Li-Jiu ZHENG *, Bing DU, Yong-Fng QIAN, nd Fng YE Lioning Provincil Key Lbortory

More information

SOLVING TRIANGLES USING THE SINE AND COSINE RULES

SOLVING TRIANGLES USING THE SINE AND COSINE RULES Mthemtics Revision Guides - Solving Generl Tringles - Sine nd Cosine Rules Pge 1 of 17 M.K. HOME TUITION Mthemtics Revision Guides Level: GCSE Higher Tier SOLVING TRIANGLES USING THE SINE AND COSINE RULES

More information

Student Book SERIES. Patterns and Algebra. Name

Student Book SERIES. Patterns and Algebra. Name E Student Book 3 + 7 5 + 5 Nme Contents Series E Topic Ptterns nd functions (pp. ) identifying nd creting ptterns skip counting completing nd descriing ptterns predicting repeting ptterns predicting growing

More information

Polar Coordinates. July 30, 2014

Polar Coordinates. July 30, 2014 Polr Coordintes July 3, 4 Sometimes it is more helpful to look t point in the xy-plne not in terms of how fr it is horizontlly nd verticlly (this would men looking t the Crtesin, or rectngulr, coordintes

More information

Section 2.2 PWM converter driven DC motor drives

Section 2.2 PWM converter driven DC motor drives Section 2.2 PWM converter driven DC motor drives 2.2.1 Introduction Controlled power supply for electric drives re obtined mostly by converting the mins AC supply. Power electronic converter circuits employing

More information