Ambiguity Measure Feature Selection Algorithm

Saket S.R. Mengle and Nazli Goharian
Information Retrieval Lab, Illinois Institute of Technology, Chicago, IL

With the increasing number of digital documents, the ability to automatically classify those documents both efficiently and accurately is becoming more critical and difficult. One of the major problems in text classification is the high dimensionality of the feature space. We present the ambiguity measure (AM) feature-selection algorithm, which selects the most unambiguous features from the feature set. Unambiguous features are those features whose presence in a document indicates a strong degree of confidence that the document belongs to only one specific category. We apply AM feature selection on a naïve Bayes text classifier. We favorably show the effectiveness of our approach in outperforming eight existing feature selection methods, using five benchmark datasets, with a statistical significance of at least 95% confidence. The support vector machine (SVM) text classifier is shown to perform consistently better than the naïve Bayes text classifier. The drawback, however, is the time complexity of training a model. We further explore the effect of using the AM feature-selection method on an SVM text classifier. Our results indicate that the training time for the SVM algorithm can be reduced by more than 50%, while still improving the accuracy of the text classifier. We favorably show the effectiveness of our approach by demonstrating that it statistically significantly (99% confidence) outperforms eight existing feature-selection methods using four standard benchmark datasets.

Introduction

There is an overflow of unorganized digital data in today's world. Vast volumes of digital text are available via the World Wide Web (WWW), news feeds, electronic mail, corporate databases, medical patient records and digital libraries. The problem of classifying and storing these documents poses a significant challenge. Large companies filter incoming e-mails and store them in folders or route them to the concerned departments. News agencies also use classification tools for filtering or routing news from different sources to the appropriate client. Other applications of text classification are in the fields of knowledge-base extraction, e-commerce and information extraction. Companies spend significant resources on classifying documents manually. The feasibility of manual classification decreases as the number of documents increases over time. As the number of documents grows large, a fast and scalable automatic classifier is needed to classify the existing and incoming documents accurately and efficiently. We propose, design, develop and evaluate one such classifier.

Text classification involves scanning through text documents and assigning categories to them to reflect their content (Yang, 1999). One of the major characteristics of text classification is the high dimensionality of the feature set (Mladenić & Grobelnik, 1998). The feature set for a dataset consists of the unique terms in the training documents. However, the number of features in a text classification dataset is prohibitively high for many learning algorithms. Hence, it is highly desirable to reduce the feature set without sacrificing categorization accuracy. Feature selection is formally defined in Galavotti & Sebastiani (2000) as the activity of selecting, from the set of r distinct features (i.e., words) occurring in the collection, the subset of r′ ≪ r features that are most useful for compactly representing the meaning of the documents. Feature-selection methods are used to achieve two objectives:
1. To reduce the size of the feature set to optimize classification efficiency.
2. To reduce noise in the feature set to optimize classification effectiveness.

Most existing feature selection algorithms, such as odds ratio (Mladenić & Grobelnik, 1998), information gain (Quinlan, 1986), Chi-Squared (Yang & Pedersen, 1997), bi-normal separation (Forman, 2003) and tficf (Chih & Kulathuramaiyer, 2004), calculate a score based on the probability that a feature belongs to a given category and the probability that a feature does not belong to the other categories. These algorithms perform poorly on unbalanced text classification datasets. The nature of unbalanced datasets is such that a few categories have significantly more training documents than most of the categories; hence, the term frequency of many features appearing in these few categories is significantly higher than their frequency in other categories. Moreover, if such a term has the same term frequency in two or more categories, the feature cannot confidently point to a given category. Thus, such terms should not be considered important in a single-labeled text classification process and should be filtered.

However, algorithms such as odds ratio, information gain, chi-squared, bi-normal separation and tficf assign a higher weight to these terms even if they appear in more than one category. We call these terms ambiguous terms. To tackle this problem, we present a feature selection method called Ambiguity Measure (AM; Mengle, Goharian & Platt, 2007) that assigns a high score to a term if it appears consistently in only one specific category. The intuition is that a term that appears in only one category points more strongly to that specific category and thus is a better indicator in a single-labeled classification decision. We apply AM on a single-labeled naïve Bayes text classifier and compare AM with eight feature selection algorithms on five standard datasets from various subject domains, namely news feeds, web pages, and bio-medical text. Our results indicate that AM feature selection achieves statistically significant improvements on unbalanced datasets such as OHSUMED (20%) and Genomics (7.5%), and on balanced datasets such as WebKB (2.6%), 20NG (2.14%) and Reuters (0.25%), when compared to the best performing of the eight feature selection methods. However, the improvements on the unbalanced datasets are larger than the improvements on the balanced datasets.

Furthermore, we also explore the effects of the AM feature selection method when applied on the single-labeled Support Vector Machine (SVM) algorithm (Cortes & Vapnik, 1995; Joachims, 1999; Yang, Zhang & Kisiel, 2003). The SVM algorithm is one of the widely used text classification algorithms. Prior work (Joachims, 1998) indicates that SVM performs consistently better than naïve Bayes, kNN, C4.5 and Rocchio text classifiers. However, one of the limitations of SVM is its training time complexity. Yang, Zhang & Kisiel (2003) show that SVM has a higher time complexity for training a model than other text classification algorithms. To overcome this limitation of SVM, feature selection methods are used as a preprocessing step before training SVM (Wenqian et al., 2007; Novovicova & Malik, 2005; Yan et al., 2005). Many well-known feature selection algorithms are used with SVM to improve its accuracy and efficiency. We use the AM feature selection method as a pre-processing step for the Support Vector Machine classifier (Mengle & Goharian, 2008). The features whose AM scores are below a given threshold, i.e., the more ambiguous terms, are purged, while the features whose AM scores are above the threshold are used for the SVM learning phase. We favorably compare the results of the AM feature selection algorithm with the same eight feature selection algorithms reported in (Wenqian et al., 2007; Yan et al., 2005) on four of the standard benchmark datasets. We also empirically show that using AM feature selection with SVM reduces the training time by more than 50%, while maintaining the accuracy of the classifier.

Prior Work

Various techniques are used for finding an "optimal" subset of features from a larger set of possible features. Exhaustively trying all the subsets is not computationally feasible. Hence, automatic feature selection algorithms are used to find the most important features in the feature set. In this section, we present the commonly used feature selection algorithms.

Odds Ratio

The basic idea of odds ratio (Mladenić & Grobelnik, 1998) is to calculate the odds of a term occurring in the positive class (the category a term is related to) normalized by the odds of that term occurring in the negative class (the categories a term is not related to). The odds ratio of a term t for a category c_i is defined using Equation 1:

OddsRatio(t, c_i) = [P(t | c_i) (1 − P(t | ¬c_i))] / [(1 − P(t | c_i)) P(t | ¬c_i)]    (1)

Odds ratio is known to work well with the naïve Bayes text classifier algorithm (Mladenić et al., 2004; Mladenić & Grobelnik, 1998).
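To make the computation concrete, the following minimal Python sketch (our own illustration, not code from the paper) scores one term against every category from per-category document counts; the argument names and the smoothing constant are assumptions.

```python
def odds_ratio_scores(docs_with_term, docs_in_cat, smoothing=0.5):
    """Odds ratio of a single term for every category (Equation 1).

    docs_with_term[c] -- number of category-c training documents containing the term
    docs_in_cat[c]    -- total number of training documents in category c
    A small smoothing constant keeps the estimated probabilities away from 0 and 1.
    """
    total_docs = sum(docs_in_cat.values())
    total_with_term = sum(docs_with_term.values())
    scores = {}
    for c, n_c in docs_in_cat.items():
        # P(t | c): term presence estimated from the positive class.
        p_pos = (docs_with_term.get(c, 0) + smoothing) / (n_c + 2 * smoothing)
        # P(t | not c): term presence estimated from all remaining (negative) documents.
        n_neg = total_docs - n_c
        with_term_neg = total_with_term - docs_with_term.get(c, 0)
        p_neg = (with_term_neg + smoothing) / (n_neg + 2 * smoothing)
        scores[c] = (p_pos * (1.0 - p_neg)) / ((1.0 - p_pos) * p_neg)
    return scores
```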
Information Gain

Information gain (Quinlan, 1986) is commonly used as a surrogate for approximating a conditional distribution in text classification. In information gain, class membership and the presence/absence of a particular term in a given category are seen as random variables; one computes how much information about the class membership is gained by knowing the presence/absence statistics. If the class membership is interpreted as a random variable C with two values, positive (c) and negative (¬c), and a word is likewise seen as a random variable T with two values, present (t) and absent (¬t), then information gain is defined as Equation 2:

IG(t, c) = Σ_{c′ ∈ {c, ¬c}} Σ_{t′ ∈ {t, ¬t}} P(t′, c′) log₂ [ P(t′, c′) / (P(t′) P(c′)) ]    (2)
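As an illustration (our own sketch, not the authors' code), the information gain of one term for one category can be computed directly from a 2×2 contingency table of document counts:

```python
import math

def information_gain(n_tc, n_t_notc, n_nott_c, n_nott_notc):
    """Information gain of a term for a category (Equation 2) from document counts:
    n_tc        -- documents in the category that contain the term
    n_t_notc    -- documents outside the category that contain the term
    n_nott_c    -- documents in the category that lack the term
    n_nott_notc -- documents outside the category that lack the term
    (hypothetical argument names)"""
    n = n_tc + n_t_notc + n_nott_c + n_nott_notc
    cells = [
        (n_tc,        n_tc + n_t_notc,        n_tc + n_nott_c),         # (t, c)
        (n_t_notc,    n_tc + n_t_notc,        n_t_notc + n_nott_notc),  # (t, not c)
        (n_nott_c,    n_nott_c + n_nott_notc, n_tc + n_nott_c),         # (not t, c)
        (n_nott_notc, n_nott_c + n_nott_notc, n_t_notc + n_nott_notc),  # (not t, not c)
    ]
    ig = 0.0
    for joint, marg_t, marg_c in cells:
        if joint > 0:
            # P(t', c') * log2( P(t', c') / (P(t') P(c')) ), one cell of the double sum
            ig += (joint / n) * math.log2((joint * n) / (marg_t * marg_c))
    return ig
```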

Chi-Squared

The χ² test is used in statistics to test the independence of two events. In text classification, χ² (Yang & Pedersen, 1997; Galavotti & Sebastiani, 2000; Wu & Flach, 2001) is used to measure the association between a category and features. The χ² measure of a term t for a category c_i is defined using Equation 3:

χ²(t, c_i) = [P(t, c_i) P(¬t, ¬c_i) − P(t, ¬c_i) P(¬t, c_i)]² / [P(t) P(¬t) P(c_i) P(¬c_i)]    (3)

Thus, the χ²(t, c_i) score indicates the weight of term t with respect to category c_i. If a term is close to more categories, then the score of that term is higher. The score of each term t is calculated using Equation 4:

χ²(t) = Σ_{i=1}^{|C|} P(c_i) χ²(t, c_i)    (4)

Bi-Normal Separation

In the bi-normal separation (BNS) feature selection method (Forman, 2003; Forman, 2008), the occurrence of a given term in each document is modeled by a random normal variable that exceeds a hypothetical threshold. The prevalence rate is calculated with respect to both the positive and negative classes. The prevalence rate can be defined as the area under the curve past a certain threshold. Thus, if a term consistently appears in the positive class, the threshold is farther from the tail of the curve than that of the negative class. BNS is calculated based on the separation between these two thresholds. Thus, if a term appears more consistently in the positive class than in the negative class, it is assigned a higher BNS score. BNS is calculated using Equation 5:

BNS = | F⁻¹(tp / (tp + fn)) − F⁻¹(fp / (fp + tn)) |    (5)

where tp is the number of positive cases containing the word, fp is the number of negative cases containing the word, fn is the number of positive cases that do not contain the word, tn is the number of negative cases that do not contain the word, and F⁻¹ is the inverse cumulative probability function of the standard normal distribution. As reported in (Forman, 2003), BNS+F1 yields the best performance on most of the tasks in comparison with odds ratio, information gain and Chi-Squared. The F1 metric (Equation 6) is the harmonic mean of precision (Equation 7) and recall (Equation 8).

F1 = 2 · Precision · Recall / (Precision + Recall)    (6)

Precision = tp / (tp + fp)    (7)

Recall = tp / (tp + fn)    (8)
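A small Python sketch of the BNS score defined in Equation 5 (our own illustration; clipping the extreme rates is a common practical choice rather than something stated above):

```python
from statistics import NormalDist

def bns_score(tp, fp, fn, tn, eps=0.0005):
    """Bi-normal separation of one term (Equation 5).
    tp / fn -- positive-class documents with / without the term
    fp / tn -- negative-class documents with / without the term
    eps clips the rates away from 0 and 1, where the inverse normal CDF is undefined."""
    inv_cdf = NormalDist().inv_cdf                    # F^-1 of the standard normal
    tpr = min(max(tp / (tp + fn), eps), 1.0 - eps)    # prevalence in the positive class
    fpr = min(max(fp / (fp + tn), eps), 1.0 - eps)    # prevalence in the negative class
    return abs(inv_cdf(tpr) - inv_cdf(fpr))
```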
Improved Gini Index

In the Gini index (Breiman, Friedman and Olshen, 1984), if a term appears in every document of class c_i, then it receives a high Gini index score, regardless of the term's occurrence in other classes. When a term is distributed evenly over the documents of various categories, the term is assigned a lower Gini index score. The Gini index of a term t can be calculated using Equation 9:

Gini(t) = P(t) [1 − Σ_i P(c_i | t)²] + P(¬t) [1 − Σ_i P(c_i | ¬t)²]    (9)

However, the Gini index fails to consider the frequency of documents where the term occurs within larger categories. The categories are generally unbalanced with respect to the number of training documents. Hence, the Gini index score is biased with respect to categories that have a large number of training documents. Wenqian et al. (2007) constructed a new function called the improved Gini index that considers a term's conditional probability and combines the posterior probability and conditional probability to avoid the effects of unbalanced classes in datasets. The improved Gini index of a given term t is defined using Equation 10:

IGini(t) = Σ_{i=1}^{|C|} P(t | c_i)² P(c_i | t)²    (10)

Tficf

In tficf (Chih & Kulathuramaiyer, 2004), tf refers to the term frequency of a term in a given category and icf refers to the inverse category frequency, i.e., the ratio of the total number of categories in a dataset to the number of categories a term falls into. The tficf scheme does not discriminate between terms that occur frequently in a small subset of documents in a category and terms that are present in a large number of documents throughout a category. Thus, tficf considers that the less a term occurs across categories, the higher its score. The tficf of a term t in category c_i is defined using Equation 11:

tficf(t, c_i) = tf(t, c_i) · log(|C| / cf(t))    (11)

where |C| refers to the total number of categories in a dataset, tf(t, c_i) is the term frequency of term t in category c_i, and cf(t) refers to the number of categories in which term t appears.

Tfidf

In tfidf (Chih & Kulathuramaiyer, 2004), tf refers to the term frequency of a term in a given document. idf is defined as the inverse document frequency, i.e., the ratio of the total number of documents present in a dataset to the number of documents a given term appears in.

A higher idf of a term indicates that the term appears in relatively few documents and may be more important during the process of text classification. tfidf is a commonly used term-weighting technique in the field of information retrieval (Grossman & Frieder, 2004) and is also used in text classification (Lavelli, Sebastiani & Zanoli, 2004; Debole & Sebastiani, 2003). The tfidf of a term t in document d_j is defined using Equation 12:

tfidf(t, d_j) = tf(t, d_j) · log(|D| / df(t))    (12)

where |D| refers to the total number of documents in the dataset, tf(t, d_j) is the term frequency of term t in document d_j, and df(t) refers to the number of documents in which term t appears.

Orthogonal Centroid Feature Selection (OCFS)

Orthogonal centroid feature selection (OCFS) (Yan et al., 2005) selects features optimally according to the objective function implied by the orthogonal centroid algorithm. The centroid of each class (m_j) and of the entire dataset (m) are calculated from the training data. The score of a term t_i is calculated using Equation 13:

OCFS(t_i) = Σ_{j=1}^{|C|} (n_j / n) (m_j^i − m^i)²    (13)

where m_j^i and m^i denote the i-th components of the class-j centroid and of the overall centroid, respectively, n_j is the number of training samples that belong to category j, and n is the total number of training samples. The feature set is pruned by selecting only the features whose scores are higher than a threshold. OCFS is not greedy in nature like odds ratio or information gain. Hence, the OCFS algorithm can be optimized based on the objective function that is implied by the orthogonal centroid algorithm and has been shown to improve over traditional algorithms.
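A compact NumPy rendering of Equation 13 (our own sketch; the dense term-document matrix input is an assumption):

```python
import numpy as np

def ocfs_scores(X, y):
    """OCFS feature scores (Equation 13).
    X -- (n_documents, n_features) dense array of term weights
    y -- category label for each document
    Returns one score per feature; features above a chosen threshold are kept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    m = X.mean(axis=0)                      # centroid of the entire training set
    scores = np.zeros(X.shape[1])
    for label in np.unique(y):
        members = X[y == label]
        m_j = members.mean(axis=0)          # centroid of class j
        scores += (members.shape[0] / n) * (m_j - m) ** 2
    return scores
```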
Methodology

In this section, we first describe the motivation behind our AM feature selection algorithm and formally define AM. Second, we discuss the differences between the AM measure and various feature selection algorithms. Finally, we define a methodology for using the feature selection algorithms with text classification algorithms such as NB and SVM.

AM Feature Selection Algorithm

Initially, we describe the intuitive motivation behind our AM feature selection approach and then provide a formal definition. First, we consider the human perception of the topic of a document by glancing at the document and capturing its keywords. Instead of using all the terms in a document to determine its subject, one normally bases the decision on the most unambiguous words that the eye captures. The person then has an idea of the topic of the document. Some words can easily suggest the category a document falls into. For example, if the document has phrases like "Chicago White Sox" and "MLB World Series Champion", then one can suggest that the document relates to baseball in particular and sports in general. The sample text below is taken from Wikipedia¹. By glancing at this text, the reader can guess the category.

"Metallica is a Grammy Award-winning American heavy metal/thrash metal band formed in 1981 and has become one of the most commercially successful musical acts of recent decades. They are considered one of the "Big Four" pioneers of thrash metal, along with Anthrax, Slayer, and Megadeth. Metallica has sold more than 90 million records worldwide, including 57 million albums in the United States alone."

The text seems to be about Music. Our human perception is based on our knowledge of the domain or on what we hear or read on various subjects in daily life. Thus, without reading this specific text completely, one can confidently claim that the text belongs to Music rather than Terrorism or Politics. Some terms may be stronger indicators than others that a given text belongs to a certain category. Thus, we can assign a score to how strongly a term suggests a particular category. We clarify this with the following hypothetical example.

"Carolina Panthers lost the Superbowl title to Chicago Bears due to a last minute touchdown."

In the above sentence, we have the terms Bears and Panthers, which are related to wildlife. On the other hand, they are also the names of famous NFL football teams. Here we notice uncertainty in classifying the text into the Wildlife or the Sports category. Terms such as Superbowl and touchdown in the same text suggest with more certainty that the text is about Sports.

¹ Wikipedia.

Table 1. Ambiguity Measure (AM) example (per-category counts and AM scores for the terms Metallica, Anthrax and Records across the categories Medicine, Music, Sports and Politics).

We define an ambiguity measure, AM, for each term t with respect to category c_i using Equation 14. The maximum AM score of term t over all categories is assigned as the AM score of term t (Equation 15).

AM(t, c_i) = tf(t, c_i) / tf(t)    (14)

AM(t) = max_i AM(t, c_i)    (15)

where tf(t, c_i) is the term frequency of term t in category c_i and tf(t) is the term frequency of term t in the entire collection. We thus assign a higher score to unambiguous terms. In the above example, the term touchdown has a higher AM than the terms Bears and Panthers. The AM score is close to 1 if the term is unambiguous. Conversely, if AM is closer to 0, the term is considered more ambiguous and may point to more than one category. The AM score of the feature Metallica, for the sample text, is 0.99, which indicates that Metallica is an unambiguous feature and should be kept and not filtered (Table 1). Anthrax is related to the Medicine category; Anthrax is also the name of a famous music band of the 1980s and hence also appears in the category Music. Thus, the AM of Anthrax is lower than that of Metallica. In some cases the AM score of a feature is low because it appears consistently in multiple categories. An example is the term Records, which may appear in all three (sports, music, and medicine) categories. Thus, the AM score of such a term is low (0.33), and it is desirable to filter out such features. This reduction in the dimensionality of the feature set increases the accuracy by avoiding the terms that have lower AM scores. We empirically determine a threshold and filter out the features whose AM scores are below that threshold.
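Equations 14 and 15 translate directly into code. The sketch below (our own illustration, not the authors' implementation) computes AM scores from per-category term frequencies and drops the terms that fall below a threshold; the container layout and the example threshold are assumptions.

```python
def ambiguity_measures(term_freq_by_cat):
    """term_freq_by_cat[term][category] -> tf(term, category).
    Returns {term: AM(term)} with AM(t) = max over categories of tf(t, c) / tf(t)."""
    am = {}
    for term, per_cat in term_freq_by_cat.items():
        total = sum(per_cat.values())         # tf(t) over the whole training collection
        am[term] = max(per_cat.values()) / total
    return am

def select_unambiguous(term_freq_by_cat, threshold=0.4):
    """Keep only the terms whose AM score reaches the (empirically tuned) threshold."""
    return {t: score for t, score in ambiguity_measures(term_freq_by_cat).items()
            if score >= threshold}

# Toy usage: 'records' is spread over three categories, so it is filtered out.
counts = {
    "touchdown": {"sports": 98, "music": 2},
    "records":   {"sports": 30, "music": 35, "medicine": 35},
}
print(select_unambiguous(counts))   # {'touchdown': 0.98}
```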
Differences

The feature selection methods odds ratio, information gain, BNS+F1 and Chi-Squared assign a high score to a term even if it appears in more than one category. Using such features does not assist a single-labeled text classifier in distinguishing between categories. The AM feature selection method assigns a high score to a term if it appears consistently in only one specific category. Such terms can then point the classifier to that specific category. For example, consider a term t1 with half of its occurrences in one category c1 and the other half distributed uniformly across the other categories. Term t1 confidently points to category c1 and hence is assigned an AM score of 0.5. Consider another term t2 with 49% of its occurrences in category c1 and the other 51% of its occurrences concentrated in two other categories c2 and c3. An AM score of 0.49 is assigned to term t2. As our goal is single-labeled classification, AM assigns a higher score to term t1 than to term t2, as t1 points more confidently to category c1. However, algorithms such as information gain, odds ratio, BNS+F1 and Chi-Squared assign a score to a term that is inversely proportional to the number of categories that term appears in. Hence, term t2 (which occurs in three categories) is assigned a higher score than t1 (which occurs in all categories). However, term t2 may mislead a single-labeled classifier, as it also points to categories c2 and c3, each with a lower probability (25.5%). Term t1 confidently points to only category c1 and hence should be assigned a higher score than t2.

In the improved Gini index method, the probabilities of a term with respect to all the categories are considered. If a term t appears in many documents of category c_i, then it is assigned a high score. In a situation where the term frequency of term t is the same in categories c_i and c_j, and t also appears in every document of both categories c_i and c_j, t is assigned a high score. However, as term t belongs to two different categories, it is ambiguous. Our proposed AM feature selection method avoids such situations and assigns a low score to features like t.

Using the tfidf and tficf methods, terms that appear with a low frequency in only a single category are purged during the feature selection process. However, such terms are unambiguous and point to a single category. Another problem is that some terms have a similar distribution in more than one category (low idf or icf) but have a high term frequency. These terms are selected during the process of feature selection because their term frequency is high. Such terms are ambiguous, as they do not point strongly to only a single category. The AM feature selection method avoids both situations by considering only the ratio between the number of occurrences of a term in a given category and the total number of occurrences of that term in the training set.

Figure 1. Block diagram for using a feature selection method on a text classifier.

In OCFS, the training and the testing time are quadratic, as the centroids of each class and of the entire dataset are calculated. The AM feature selection method, in contrast, trains and tests in linear time (this is discussed later in the paper).

Using feature selection algorithms on SVM and NB text classifiers

We evaluate our feature selection algorithm on SVM and naïve Bayes text classifiers. SVM is commonly used, as it was shown to perform better in terms of effectiveness than other text classifiers such as naïve Bayes, kNN, C4.5 and Rocchio (Joachims, 1998). The naïve Bayes algorithm is, however, more efficient and scalable than other algorithms (Yang, Zhang & Kisiel, 2003). We present the methodology for applying feature selection algorithms on SVM and NB text classifiers (Figure 1). This process is divided into four phases.

Phase 1. Calculating feature selection scores. In the pre-processing step, the feature selection score of each feature in the training documents is calculated.

Phase 2. Filtering terms with lower feature scores. We only keep a feature in the training documents if its feature selection score is above a certain empirically determined threshold. We determine these thresholds by exhaustively optimizing the results of each algorithm on the testing documents. The choice of the testing set rather than a separate validation set is made to be consistent with the prior works (Wenqian et al., 2007; Yan et al., 2005; Chih & Kulathuramaiyer, 2004) that we compare our work with. We compare AM with both local and global feature selection algorithms. We globalize the local feature selection algorithms by selecting the terms with the highest local scores. Additionally, we also experiment with the round robin method (Forman, 2004) to convert local feature selection scores into global scores.

Phase 3. Training the text classifier. Pruned documents from Phase 2 are used by the NB and SVM classifiers to train a text classification model. For NB, we use the traditional NB classifier as explained in (McCallum and Nigam, 1998) to create a text classification model. We use the linear SVM kernel, as the non-linear versions gain very little in terms of performance (Mladenić et al., 2004). For training and testing the SVM model, we use the LibSVM software², which is commonly used for classifying documents into binary or multi-labeled categories.

Phase 4. Classifying documents. In the testing phase, the trained text classification model is used to classify the testing documents by predicting a category for each. Unlike the traditional naïve Bayes text classifier, we, as in (Rennie, Teevan & Karger, 2003), do not consider the prior probability while predicting the category for a testing document. As SVM only classifies documents into two classes (a binary classifier), we use the one-against-all technique (Yi & Zheng, 2005) to run SVM on multiclass datasets. We use single-labeled classification in this work to classify documents. Hence, only one category is predicted for each testing document by the text classifier.

² Chang C.C., Lin C.J., LIBSVM: a library for support vector machines, 2001.
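The four phases can be sketched end to end as follows (our own illustration using scikit-learn; the paper itself uses its own naïve Bayes implementation and LibSVM, and the threshold of 0.3 is only an example). The score function stands for any of the measures above, e.g., AM, returning one score per vocabulary term.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def train_with_feature_selection(train_texts, train_labels, score_fn, threshold=0.3):
    """score_fn(X, y) -> NumPy array with one feature-selection score per vocabulary term."""
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(train_texts)                      # Phase 1 input: term-count matrix
    keep = np.flatnonzero(score_fn(X, train_labels) >= threshold)  # Phase 2: retained feature indices
    clf = LinearSVC().fit(X[:, keep], train_labels)                # Phase 3: linear SVM, one-vs-rest
    return vectorizer, keep, clf

def classify(texts, vectorizer, keep, clf):
    """Phase 4: classify new documents with the same pruned vocabulary."""
    return clf.predict(vectorizer.transform(texts)[:, keep])
```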

Time and Space Complexity Analysis

AM scores are computed in linear time as training documents arrive. However, the scalability of using AM depends on the text classifier. The comparison of the time and space complexity of applying AM on naïve Bayes and SVM is given in Table 2, and is discussed in the subsections that follow.

Table 2. Time and space complexity for applying AM on naïve Bayes and SVM
Classifier | Training time | Testing time per document | Space complexity
Naïve Bayes using AM | O(N·L_d + M·V) | O(M·L_v) | O(M·V)
SVM using AM | O(N·L_d + M·V + M·N^c), c ≈ 1.2–1.5 | O(M·L_v) | O(M·V + N·L_v + q²)
(N: number of training documents; L_d: average document length; M: number of categories; L_v: average number of unique terms per document; V: size of the vocabulary (features); q: a constant that depends on the iterations needed.)

Analysis of the time complexity of applying AM on naïve Bayes

The term frequency of each term per category is calculated. Thus, naïve Bayes parses N·L_d terms during the training phase. For every term in the vocabulary, M different AM scores are calculated, which takes O(M·V) time. Thus, the training time for naïve Bayes using AM is O(N·L_d + M·V), which equates to O(N·L_d) (as M·V ≪ N·L_d). During the testing phase, we calculate the product of the AM scores of the terms present in the testing document with respect to each category, which takes O(M·L_v). A lexicon of all the terms in the vocabulary (V) and their AM scores with respect to all M categories is stored as the NB model. Many of the features are filtered during the feature selection process, so only some of the features and their AM scores are stored. The space needed by naïve Bayes using AM is O(M·V).
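The prediction rule just described, the product of the AM scores of a document's retained terms taken per category with no prior, can be sketched as follows (our own illustration; the log-space accumulation, the smoothing floor for unseen term-category pairs, and returning None for documents with no retained terms are assumptions).

```python
import math

def predict_category(doc_terms, am_by_term, categories, floor=1e-6):
    """am_by_term[t][c] -> AM(t, c) for the terms kept after feature selection.
    Returns the category whose product of AM scores is largest, or None ("uncertain")
    when the document contains no retained terms."""
    kept = [t for t in doc_terms if t in am_by_term]
    if not kept:
        return None
    scores = {}
    for c in categories:
        # product of AM(t, c) over the retained terms, accumulated in log space
        scores[c] = sum(math.log(max(am_by_term[t].get(c, 0.0), floor)) for t in kept)
    return max(scores, key=scores.get)
```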
Analysis of the time complexity of applying AM on SVM

As shown, the training time for naïve Bayes using AM is O(N·L_d + M·V). Thus, AM for all the features in the training set can be found in linear time. SVM, however, trains in quadratic time. The algorithms used in LibSVM train in O(M·N^c), where c ≈ 1.2–1.5 (Yang, Zhang & Kisiel, 2003). Hence, the total time taken for training a model using AM as a preprocessing step for SVM is O(N·L_d + M·V + M·N^c). However, as N·L_d and M·V are much smaller than M·N^c, we consider the training time for using AM with SVM to be O(M·N^c). The space taken for storing the SVM model is O(N·L_v + q²), where q is a constant that depends on the iterations needed. Hence, the space complexity of using AM with SVM is O(M·V + N·L_v + q²).

Experimental Setup

We empirically evaluated the effectiveness of the AM feature selection algorithm using five benchmark data sets (Reuters 21578, 20 Newsgroups, WebKB, OHSUMED, Genomics), which are commonly used in text classification evaluation. The details of these data sets are given in Table 3. We intentionally chose these datasets, which consist of news articles, web pages and bio-medical documents, to show the effects of AM on different domains. Although we observe different accuracies across different domains, AM consistently outperforms the other feature selection algorithms over all domains. To show the scalability of our AM feature selection approach, using the NB classifier, we also report an effectiveness and efficiency analysis on the TREC 2005 Genomics dataset, which contains 4.5 million documents.

We do not show the results for TREC Genomics 05 on the SVM classifier, as SVM is not scalable for use on very large datasets. (The training time of an SVM model for TREC 05 Genomics is almost 4 days.) In all our experiments, we use a single computer with an AMD Athlon 2.16 GHz processor and 1 GB of RAM. A brief explanation of the benchmark datasets used in our experiments is given below.

Table 3. Benchmark datasets used in our experiments
Dataset | No. of documents | No. of categories | Size of dataset | Domain
Reuters 21578 | 21,578 | Top 10 categories | 28 MB | News articles
20 Newsgroups | 20,000 | 20 categories | 61 MB | News articles
WebKB | 8,282 | 7 categories | 43 MB | Web pages (university websites)
OHSUMED | 54,710 (total), 39,320 (subset) | Top 50 categories | 382 MB | Bio-medical documents
GENOMICS (TREC 05) | 4.5 million (total), 591,689 (subset) | Top 50 categories | 15.5 GB | Bio-medical documents

Reuters Dataset. The Reuters corpus³ contains Reuters news articles from 1987. These documents may be multi-labeled, single-labeled, or not labeled. The average document length in the Reuters dataset is 200 (non-unique) terms per document. The Reuters dataset consists of a total of 135 categories (labels), ten of which have significantly more documents than the rest of the categories. Thus, commonly the top 10 categories are used to evaluate the accuracy of the classification results. The top 10 categories of Reuters are earn, acq, money-fx, grain, trade, crude, interest, wheat, corn and ship.

20 Newsgroups (20NG) Dataset. 20 Newsgroups⁴ (20NG) consists of a total of 20,000 documents that are categorized into twenty different categories. Each category contains 1,000 documents. The average document length in the 20NG dataset is 311 terms per document. Thus, the average size of the documents is much larger than in the Reuters dataset. Some of the newsgroup categories are very closely related to each other (e.g., comp.sys.ibm.pc.hardware and comp.sys.mac.hardware), while others are highly unrelated (e.g., misc.forsale and soc.religion.christian). This characteristic contributes to the difficulty of categorizing documents that belong to very similar categories.

WebKB Dataset. The WebKB dataset⁵ is a collection of web pages from four different college websites, namely Cornell, Texas, Washington and Wisconsin, plus some miscellaneous web pages. These web pages are pre-classified into seven categories: student, faculty, staff, department, course, project and other. WebKB contains 8,282 web pages. The average document length in the WebKB dataset is 130 terms.

OHSUMED Dataset. OHSUMED (Hersh, Buckley, Leone & Hickam, 1994) is a collection of Medline documents, i.e., medical citations, from 1987 to 1991, and is commonly used for bio-medical literature search evaluation and classification. We use only the top (largest) 50 categories. The average document length in the OHSUMED dataset is 63 terms per document. The distribution of documents in the OHSUMED dataset is uneven. The largest category contains 2,415 documents, while the smallest category contains 873 documents. Hence, more training data are available for some categories than for others.

TREC 2005 Genomics Dataset. TREC 05 GENOMICS is a collection of 4.5 million bio-medical documents and is 15.5 GB in size. This is the largest publicly available benchmark dataset that contains categorized (labeled) documents in the domain of bioinformatics. The average document length is 183 terms per document. We are not aware of any prior text classification efforts on the TREC 05 GENOMICS⁶ data set. Thus, for this dataset no comparison with prior efforts was possible. We used the data processed by (Urbain, Goharian & Frieder, 2007). They use a pre-processing model that breaks up gene names and is shown to perform well. Acronyms and their long forms are identified during preprocessing using the Schwartz and Hearst algorithm (Schwartz & Hearst, 2003).

³ Lewis D., Reuters-21578, resources/testcollections/reuters21578.
⁴ Lang K., Original 20 Newsgroups dataset, people.csail.mit.edu/jrennie/20newsgroups.
⁵ WebKB dataset, cs.cmu.edu/project/theo-20/www/data/.
An example of such a long-short form is "immuno deficiency enzyme (IDE)", and a short-long form is "IDE (immuno deficiency enzyme)". The algorithm works backwards through the long-form text and attempts to identify the corresponding letters in the acronym. All terms are tokenized, stop words are removed, and lexical variants are generated. Porter stemming (Porter, 1997) is used on each token with the following exceptions: gene names (as defined by the Entrez Gene database); all upper case, mixed case, and alpha-numeric terms; and non-gene terms that would become a gene name after being stemmed. Similar to the OHSUMED dataset, the top (largest) 50 categories, i.e., those containing the highest numbers of documents, are chosen for the GENOMICS dataset; the categories are ranked based on the number of documents. This subset of the Genomics dataset contains 591,589 documents. The category with the highest number of documents contains 295,773 documents, while the category among the top 50 with the fewest documents has 8,049 documents. Hence, if we chose categories beyond the top 50, the number of training documents in those categories would be very low, leading to lower classification accuracy.

⁶ TREC 2005 Genomics dataset.

Evaluation Metrics

To evaluate the effectiveness of our approach and compare it to state-of-the-art feature selection research results, we use the commonly used evaluation metrics precision, recall and F1 measure.

Precision (P) = True Positive / (True Positive + False Positive)    (16)

Precision (Equation 16) is defined as the ratio of correct classifications of documents into categories to the total number of attempted classifications.

Recall (R) = True Positive / (True Positive + False Negative)    (17)

Recall (Equation 17) is defined as the ratio of correct classifications of documents into categories to the total number of labeled data in the testing set.

F1 Measure = 2 · Precision · Recall / (Precision + Recall)    (18)

The F1 measure (Equation 18) is defined as the harmonic mean of precision and recall. Hence, a good classifier is assumed to have a high F1 measure, which indicates that the classifier performs well with respect to both precision and recall. We present micro-averaged results for precision, recall and F1 measure. Micro-averaging considers the sum of all the true positives, false positives and false negatives that are generated in the ten runs of 10-fold cross validation (Lewis, 1991).

Results

We organize the results into two subsections. In the first subsection, we present the results for the naïve Bayes classifier using the AM feature selection method. In the second subsection, the results for AM feature selection with the SVM classifier are presented.

Naive Bayes using AM

We evaluated the experimental results using the Reuters 21578, 20NG, WebKB, OHSUMED and TREC 05 Genomics datasets. We present the comparison of the AM feature selection algorithm with the eight feature selection algorithms explained earlier in the prior work section. We varied the threshold to identify the optimal F1 measure for each feature selection method. The results show that AM statistically significantly outperforms the others with a confidence level of at least 95%. We demonstrate the effects of using the round robin method, which is used for globalizing localized feature selection scores. We also present the effects of AM on the training and testing time of the naïve Bayes classifier.

Comparison with other feature selection algorithms using the naïve Bayes classifier

We used stratified 10-fold cross validation for all the datasets except WebKB. We used a standard 4-1 split for WebKB, where the data for three universities are used for training and the data for one university are used as a testing set. We varied thresholds to observe the best results with respect to F1.
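For concreteness, the micro-averaged precision, recall and F1 reported below can be computed from the summed confusion counts of the cross-validation runs (a generic sketch, not the authors' evaluation script):

```python
def micro_averaged_metrics(fold_counts):
    """fold_counts: iterable of (true_positives, false_positives, false_negatives),
    one tuple per run; the counts are summed before the ratios are taken."""
    tp = sum(c[0] for c in fold_counts)
    fp = sum(c[1] for c in fold_counts)
    fn = sum(c[2] for c in fold_counts)
    precision = tp / (tp + fp) if tp + fp else 0.0                                      # Equation 16
    recall = tp / (tp + fn) if tp + fn else 0.0                                         # Equation 17
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0   # Equation 18
    return precision, recall, f1
```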
Our results show that AM performs better than the next best performing feature selection algorithm by 20%, 7.5%, 0.25%, 2.14%, and 2.6% on the OHSUMED, TREC 05 Genomics, Reuters 21578, 20 Newsgroups and WebKB datasets, respectively. Figure 2 shows the comparison of the eight feature selection algorithms on the Reuters dataset with respect to F1 measure. Our experimental results show that AM (precision: 92.36%, recall: 85.72%, F1: 88.92%) performs better than tfidf (precision: 90.78%, recall: 86.69%, F1: 88.69%) and BNS+F1 (precision: 88.13%, recall: 88.01%, F1: 88.07%), which are the next best performing algorithms. As all the feature selection algorithms perform well on the Reuters dataset, the F1 improvement when using the AM measure is only 0.25% (95% confidence). The statistical significance of AM with respect to the other feature selection algorithms on the various datasets is reported in Table 4.

Table 4. Statistical comparison of AM and the other feature selection algorithms on naïve Bayes with respect to F1 measure (paired t-test), over the Reuters, 20 Newsgroups, WebKB, OHSUMED and Genomics datasets, for odds ratio, BNS+F1, tfidf, tficf, information gain, Chi-Squared, OCFS and improved Gini (+: AM is statistically significantly better than the feature selection algorithm with 95% confidence; ++: AM is statistically significantly better than the feature selection algorithm with 99% confidence).

Figure 2. Comparison of AM with other feature selection methods in terms of F1 measure on the Reuters dataset for naïve Bayes.
Figure 3. Comparison of AM with other feature selection methods in terms of F1 measure on the 20 Newsgroups dataset for naïve Bayes.
Figure 4. Comparison of AM with other feature selection methods in terms of F1 measure on the WebKB dataset for naïve Bayes.
Figure 5. Comparison of AM with other feature selection methods in terms of F1 measure on the OHSUMED dataset for naïve Bayes.
Figure 6. Comparison of AM with other feature selection methods in terms of F1 measure on the TREC 05 Genomics dataset for naïve Bayes.
Figure 7. Effect of feature selection on the training and testing time of naïve Bayes using AM on the TREC 05 Genomics dataset.

For 20 Newsgroups (Figure 3), AM (precision: 91.68%, recall: 91.69%, F1: 91.72%) performs significantly better than the next best feature selection algorithm, improved Gini index (precision: 91.69%, recall: 87.97%, F1: 89.79%), by 2.14%. Although the improvement is marginal, the results are statistically significant with at least 95% confidence. The results on the WebKB dataset, which are given in Figure 4, show that AM (precision: 74.34%, recall: 73.76%, F1: 74.05%) performs better than the second best performing algorithm, improved Gini index (precision: 71.74%, recall: 72.56%, F1: 72.15%), by 2.6%. The WebKB dataset consists of web pages, which contain images, tables and other anchor text.

Classifying such documents is more difficult than classifying plain documents from the Reuters and 20 Newsgroups datasets. Hence, the classification effectiveness for the WebKB dataset is lower than for the Reuters and 20 Newsgroups datasets. On the bio-medical datasets, our results indicate that AM (precision: 65.93%, recall: 54.84%, F1: 59.88%) statistically significantly improves (20%) over improved Gini index (precision: 53.83%, recall: 46.54%, F1: 49.92%) on the OHSUMED dataset (Figure 5). AM (precision: 61.71%, recall: 60.54%, F1: 61.12%) also shows a statistically significant improvement of 7.5% over improved Gini index (precision: 61.71%, recall: 52.64%, F1: 56.82%) for the TREC Genomics 05 dataset (Figure 6). Improved Gini index is the second best performing algorithm on both of these datasets.

Discussion

The motivation for using AM feature selection is to select terms that belong to only one category. As mentioned in the introduction, ambiguous features lead to wrong classification predictions on unbalanced datasets. Our results indicate that AM performs better than odds ratio, information gain, tficf, tfidf, BNS+F1 and Chi-Squared on the OHSUMED and Genomics datasets by more than 30% (comparative gain). The OHSUMED and Genomics datasets are unbalanced, and a large number of training documents belong to the top two categories. Feature selection methods such as odds ratio, information gain, tficf, tfidf, BNS+F1 and Chi-Squared use both positive and negative examples to assign scores to the features. A high score is assigned to a feature even if it appears evenly in only 2 or 3 categories out of 50. As the number of training documents in the top two or three categories is large, many features appear only in the top two or top three categories. Such features are assigned high scores. These features mislead the text classifier and hence many false positives are generated during the testing phase. Such features are assigned a low AM score and are filtered during the process of feature selection.

Figure 8. Comparison between AM with/without the round robin method.

Improved Gini index nullifies the effects of unbalanced classes in a dataset by combining the posterior probabilities and conditional probabilities for each term. OCFS is optimized based on the number of documents available in each class. Hence, improved Gini index and OCFS perform comparatively better than the odds ratio, information gain, tficf, tfidf, BNS+F1 and Chi-Squared algorithms. However, our results indicate that improved Gini index and OCFS perform statistically significantly worse than AM on unbalanced datasets such as OHSUMED and Genomics.

Globalizing Feature Selection Scores

As feature selection algorithms such as tficf, odds ratio, information gain, chi-squared, BNS+F1 and AM are local feature selection algorithms, we have used the traditional method (selecting the terms with the highest local scores) to convert their local scores into global feature selection scores. Additionally, similar to (Forman, 2004), we used the round robin method to convert the local feature selection scores into global scores. The round robin method selects the top n features from each category. Thus, the categories with a low number of training documents are also represented by the same number of features in the feature set.
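Round-robin globalization as described above can be sketched in a few lines (our own illustration; the data layout and the value of n are assumptions): each category contributes its top-n terms by local score, and the union forms the global feature set.

```python
def round_robin_select(local_scores, n_per_category):
    """local_scores[c] -> {term: local feature-selection score of the term for category c}.
    Returns the global feature set: the union of every category's top-n terms."""
    selected = set()
    for term_scores in local_scores.values():
        top_n = sorted(term_scores, key=term_scores.get, reverse=True)[:n_per_category]
        selected.update(top_n)   # every category is represented by the same number of features
    return selected
```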
The round robin method improves the effectiveness in identifying the documents that belong to categories with fewer training documents and leads to an improvement in macro-F1, which is the average of the F1 measures of all categories.

Figure 9. Comparison of AM with other feature selection methods in terms of F1 measure on the Reuters dataset for SVM.
Figure 10. Comparison of AM with other feature selection methods in terms of F1 measure on the 20 Newsgroups dataset for SVM.
Figure 11. Comparison of AM with other feature selection methods in terms of F1 measure on the WebKB dataset for SVM.
Figure 12. Comparison of AM with other feature selection methods in terms of F1 measure on the OHSUMED dataset for SVM.

However, the classification accuracy of the categories with a large number of training documents decreases. As we are using stratified splits for each dataset, the number of training documents that belong to a category is directly related to the number of testing documents that belong to that category. Hence, the micro-F1 measure decreases when the round robin method is used. We provide the results of AM versus AM with the round robin method in Figure 8. We observed that using the round robin method improves the macro-F1 measure by 1.7% while decreasing the micro-F1 by 5.2% for the Reuters dataset.

Tradeoff of accuracy and time with respect to AM thresholds for naïve Bayes

We now present the effects of the AM threshold on the training and testing time of naïve Bayes using the TREC 05 Genomics dataset (Figure 7). We performed similar experiments on the other datasets and observed the same trends. As the TREC 05 Genomics dataset is relatively large, the trends with respect to training and testing time are observed clearly. Hence, we only report the results for the TREC 05 Genomics dataset. The training time complexity of naïve Bayes using AM is O(N·L_d + M·V), where N is the number of documents, L_d is the average document length, M is the number of categories and V is the number of terms in the vocabulary. As N, L_d, M and V are all constant during the training phase, the training time of our algorithm is nearly constant (Figure 7). The features whose AM is above the threshold are kept. The space complexity of our naïve Bayes using AM is O(M·V). As the size of V decreases when the threshold increases, there is a slight drop in the training time. Though there is only a marginal decrease in training time during the feature selection phase, the time complexity of applying AM on the naïve Bayes classifier is linear and is faster than other commonly used algorithms such as SVM. The time complexity in the testing phase is O(M·L_v), where L_v is the number of unique terms per testing document. As we start selecting fewer features (i.e., increase the threshold), the value of M remains constant while the value of L_v decreases. This is because fewer features are available in the feature set and hence fewer unique terms are used from each document in the testing set.

Hence, as we increase the threshold, the testing time consistently decreases. It is also observed that as the threshold increases up to 0.4, the F1 measure increases while there is a reduction in testing time.

SVM using AM

In this section, we favorably compare our results of applying AM feature selection with SVM to the results using the same eight feature-selection algorithms. We varied the threshold to identify the optimal F1 measure for each feature selection method. We demonstrate how AM feature selection reduces the training time while improving the F1 measure. We also explain the effects of the AM threshold score on the classification results.

Comparison with other feature selection algorithms for the SVM classifier

SVM trains with a time complexity of O(N·L_d + M·V + M·N^c), where N is the number of documents, L_d is the average document length, M is the number of categories, V is the number of terms in the vocabulary and c is a constant (c ≈ 1.2–1.5). SVM is not a scalable algorithm by nature. We use the ModApte split for the Reuters dataset and a 9-1 split for the 20 Newsgroups and OHSUMED datasets, as given on the LibSVM dataset website. We use a standard 4-1 split for WebKB, where the data for three universities are used for training and the data for one university are used as a testing set. We use these splits as they are readily available and commonly used in the prior works (Wenqian et al., 2007; Yan et al., 2005). AM performs statistically significantly better than the eight feature selection algorithms with a confidence of 99% (Table 5).

Table 5. Statistical comparison of AM and the other feature selection algorithms with respect to F1 measure (paired t-test), over the Reuters, 20 Newsgroups, WebKB and OHSUMED datasets, for odds ratio, BNS+F1, tfidf, tficf, information gain, Chi-Squared, OCFS and improved Gini (++: AM is statistically significantly better than the feature selection algorithm with 99% confidence).

Improved Gini index is the second best performing algorithm on all four datasets. Therefore, we present the comparison of AM with improved Gini index. Our experimental results on the Reuters dataset (Figure 9) indicate that AM (F1: 89.1%) performs better than improved Gini index (F1: 88.6%) by 0.56%. For the 20 Newsgroups dataset (Figure 10), which is another dataset that contains news articles, AM (F1: 78.74%) outperforms improved Gini index (F1: 77.3%) by 1.8%. The results on the WebKB dataset (Figure 11), which contains web pages, indicate that AM (F1: 76.14%) outperforms improved Gini index (F1: 75.54%) by 0.8%. For the OHSUMED dataset (Figure 12), which contains bio-medical documents, AM (F1: 60.74%) outperforms improved Gini index (F1: 58.23%) by 4.3%.

Discussion

Our results for SVM using AM also indicate that the improvements on OHSUMED, which is a very unbalanced dataset, are larger than on the other datasets. OHSUMED has the majority of its documents in the first few (2-3) categories and fewer documents in the other categories. This improvement is achieved due to the selection of the features that point to only one category (unambiguous features). SVM classification is based on the entire set of terms in the testing document and not only on unambiguous features. Hence, the improvements observed using SVM are smaller than those observed using the naïve Bayes classifier. All features from the testing documents are used for classifying a document. LibSVM always predicts one category for each document. When a category is wrongly predicted, a false positive is generated; a false negative is also generated because the true prediction is not made. Hence, precision and recall for all the runs using LibSVM are the same.
Precision and recall vary for naïve Bayes because, when the AM threshold is high, the set of keywords is sparse and some documents do not contain any terms that are above the threshold. Such documents are predicted as uncertain, and only a false negative is generated in such cases. As we filter more features from the feature set, the number of uncertain cases increases and recall decreases.

Figure 13. Effect of feature selection on the training and testing time of SVM using AM on the OHSUMED dataset.

Tradeoff of accuracy and time with respect to AM thresholds for SVM

We now report the effect of the AM threshold on the F1 measure and the corresponding time taken to train the model and classify the documents using the SVM classifier, as depicted in Figure 13, which shows results for the OHSUMED dataset. The other datasets show the same trends. The x-axis represents different threshold values and the y-axis represents the micro-F1 measure and time. The threshold value indicates that all features whose scores are above that threshold are selected and the remaining features are filtered. As we apply AM feature selection, micro-F1 increases (Figure 13). We obtain the best micro-F1 when the threshold is set to 0.2. As the threshold is increased further, the micro-F1 starts to decrease. This indicates that when the threshold is less than 0.2, most of the features that are filtered are ambiguous, which leads to an improvement in F1 measure. When the threshold is above 0.2, most of the features that are filtered contain relevant information; thus, the F1 measure of the classifier decreases. The training time includes the feature selection time and the time taken to train the SVM model. The testing time is the time taken by LibSVM to classify the testing data. Figure 13 demonstrates that when no feature selection is used, i.e., when the threshold is equal to zero, the time taken for training on the OHSUMED dataset is 3,356 seconds. When we reduce the dimensionality of the feature set by setting the threshold to 0.2, the training time decreases to 1,623 seconds. This shows that even though the learning time is reduced by more than 50%, we still obtain a better F1 measure than when we do not apply any feature selection.

One of the limitations of using a feature selection algorithm with SVM is that a proper threshold must be found for a given dataset. We found the best threshold for the Reuters and WebKB datasets to be 0.2, and for the 20 Newsgroups and OHSUMED datasets the threshold was 0.3. To further investigate this problem, we experimented on two additional standard datasets from the Statlog collection (Michie, Spiegelhalter & Taylor, 1994): the DNA dataset (3 categories; 2,000 training documents; 1,186 testing documents) and the Vehicle dataset (4 categories; 761 training documents; 85 testing documents). Similarly, we found that a threshold between 0.2 and 0.3 yields the best results on all four datasets we used in our experiments.

Conclusion

We presented a new feature selection algorithm called ambiguity measure (AM). The underlying premise behind the AM approach is the quick identification of unambiguous terms. We define unambiguous terms as features that belong to only one category. We showed how AM is used with the naïve Bayes classifier. The most unambiguous terms (keywords) from the training documents are selected using AM and a classification model is built. Based on this model, the documents that are to be classified are scanned to identify the keywords, and the ambiguity measures (AM) of the keywords are used to calculate the probability that the document falls into a specific category. The category with the highest probability is selected as the category for that document. We empirically evaluated the performance of our methodology for using AM with the naïve Bayes classifier on five standard benchmark data sets (Reuters 21578, 20 Newsgroups, WebKB, OHSUMED and the TREC 05 Genomics collection). Our experimental results demonstrate that AM performs statistically significantly better than eight existing feature selection algorithms on the five benchmark datasets with a confidence of at least 95%.
We also applied AM as a preprocessing step for the SVM classification algorithm. We showed that AM feature selection reduces the training time of the SVM classifier while maintaining its effectiveness. Experiments were performed on four standard benchmark datasets. Our results indicated that AM performs statistically significantly better than the current published state-of-the-art feature selection algorithms on the SVM classifier. Our results also indicated that AM feature selection improved over odds ratio, information gain, Chi-Squared, BNS+F1 and tficf on unbalanced datasets like OHSUMED and Genomics, where the majority of documents belong to only 2-3 categories. Our analysis showed that selecting features that point to only one category performs better than selecting features that point to more than one category. Words that point to more than one category may mislead a single-labeled text classifier.


More information

The Performance Improvement of BASK System for Giga-Bit MODEM Using the Fuzzy System

The Performance Improvement of BASK System for Giga-Bit MODEM Using the Fuzzy System Int. J. Communcatons, Network and System Scences, 10, 3, 1-5 do:10.36/jcns.10.358 Publshed Onlne May 10 (http://www.scrp.org/journal/jcns/) The Performance Improvement of BASK System for Gga-Bt MODEM Usng

More information

Ensemble Evolution of Checkers Players with Knowledge of Opening, Middle and Endgame

Ensemble Evolution of Checkers Players with Knowledge of Opening, Middle and Endgame Ensemble Evoluton of Checkers Players wth Knowledge of Openng, Mddle and Endgame Kyung-Joong Km and Sung-Bae Cho Department of Computer Scence, Yonse Unversty 134 Shnchon-dong, Sudaemoon-ku, Seoul 120-749

More information

IEE Electronics Letters, vol 34, no 17, August 1998, pp ESTIMATING STARTING POINT OF CONDUCTION OF CMOS GATES

IEE Electronics Letters, vol 34, no 17, August 1998, pp ESTIMATING STARTING POINT OF CONDUCTION OF CMOS GATES IEE Electroncs Letters, vol 34, no 17, August 1998, pp. 1622-1624. ESTIMATING STARTING POINT OF CONDUCTION OF CMOS GATES A. Chatzgeorgou, S. Nkolads 1 and I. Tsoukalas Computer Scence Department, 1 Department

More information

{webb, cc.gatech.edu. {yorke3, ece.gatech.edu

{webb, cc.gatech.edu. {yorke3, ece.gatech.edu A Dscrmnatve Classfer Learnng Approach to Image Modelng and Identfcaton Byungk Byun, Chn-Hu Lee, 2 Steve Webb, and 2 Calton Pu School of Electrcal & Computer Engr. Georga Insttute of Technology Atlanta,

More information

CS345a: Data Mining Jure Leskovec and Anand Rajaraman Stanford University

CS345a: Data Mining Jure Leskovec and Anand Rajaraman Stanford University CS345a: Data Mnng Jure Leskovec and Anand Rajaraman Stanford Unversty HW3 s out Poster sesson s on last day of classes: Thu March 11 at 4:15 Reports are due March 14 Fnal s March 18 at 12:15 Open book,

More information

Equity trend prediction with neural networks

Equity trend prediction with neural networks Res. Lett. Inf. Math. Sc., 2004, Vol. 6, pp 15-29 15 Avalable onlne at http://ms.massey.ac.nz/research/letters/ Equty trend predcton wth neural networks R.HALLIDAY Insttute of Informaton & Mathematcal

More information

Application of Linear Discriminant Analysis to Doppler Classification

Application of Linear Discriminant Analysis to Doppler Classification Applcaton of Lnear Dscrmnant Analyss to Doppler Classfcaton M. Jahangr QnetQ St Andrews Road, Malvern WORCS, UK, WR14 3PS Unted Kngdom mjahangr@qnetq.com ABSTRACT In ths wor the author demonstrated a robust

More information

Parameter Free Iterative Decoding Metrics for Non-Coherent Orthogonal Modulation

Parameter Free Iterative Decoding Metrics for Non-Coherent Orthogonal Modulation 1 Parameter Free Iteratve Decodng Metrcs for Non-Coherent Orthogonal Modulaton Albert Gullén Fàbregas and Alex Grant Abstract We study decoder metrcs suted for teratve decodng of non-coherently detected

More information

Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS Processors

Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS Processors Hard Real-me Schedulng for Low-Energy Usng Stochastc Data and DVS Processors Flavus Gruan Department of Computer Scence, Lund Unversty Box 118 S-221 00 Lund, Sweden el.: +46 046 2224673 e-mal: Flavus.Gruan@cs.lth.se

More information

Appendix E: The Effect of Phase 2 Grants

Appendix E: The Effect of Phase 2 Grants Appendx E: The Effect of Phase 2 Grants Roughly a year after recevng a $150,000 Phase 1 award, a frm may apply for a $1 mllon Phase 2 grant. Successful applcants typcally receve ther Phase 2 money nearly

More information

Estimating Mean Time to Failure in Digital Systems Using Manufacturing Defective Part Level

Estimating Mean Time to Failure in Digital Systems Using Manufacturing Defective Part Level Estmatng Mean Tme to Falure n Dgtal Systems Usng Manufacturng Defectve Part Level Jennfer Dworak, Davd Dorsey, Amy Wang, and M. Ray Mercer Texas A&M Unversty IBM Techncal Contact: Matthew W. Mehalc, PowerPC

More information

Generalized Incomplete Trojan-Type Designs with Unequal Cell Sizes

Generalized Incomplete Trojan-Type Designs with Unequal Cell Sizes Internatonal Journal of Theoretcal & Appled Scences 6(1): 50-54(2014) ISSN No. (Prnt): 0975-1718 ISSN No. (Onlne): 2249-3247 Generalzed Incomplete Trojan-Type Desgns wth Unequal Cell Szes Cn Varghese,

More information

UNIT 11 TWO-PERSON ZERO-SUM GAMES WITH SADDLE POINT

UNIT 11 TWO-PERSON ZERO-SUM GAMES WITH SADDLE POINT UNIT TWO-PERSON ZERO-SUM GAMES WITH SADDLE POINT Structure. Introducton Obectves. Key Terms Used n Game Theory.3 The Maxmn-Mnmax Prncple.4 Summary.5 Solutons/Answers. INTRODUCTION In Game Theory, the word

More information

Webinar Series TMIP VISION

Webinar Series TMIP VISION Webnar Seres TMIP VISION TMIP provdes techncal support and promotes knowledge and nformaton exchange n the transportaton plannng and modelng communty. DISCLAIMER The vews and opnons expressed durng ths

More information

A MODIFIED DIFFERENTIAL EVOLUTION ALGORITHM IN SPARSE LINEAR ANTENNA ARRAY SYNTHESIS

A MODIFIED DIFFERENTIAL EVOLUTION ALGORITHM IN SPARSE LINEAR ANTENNA ARRAY SYNTHESIS A MODIFIED DIFFERENTIAL EVOLUTION ALORITHM IN SPARSE LINEAR ANTENNA ARRAY SYNTHESIS Kaml Dmller Department of Electrcal-Electroncs Engneerng rne Amercan Unversty North Cyprus, Mersn TURKEY kdmller@gau.edu.tr

More information

EEG Channel Selection Using Decision Tree in Brain-Computer Interface

EEG Channel Selection Using Decision Tree in Brain-Computer Interface EEG Channel Selecton Usng Decson ree n Bran-Computer Interface Mahnaz Arvaneh * Cunta Guan Ka Keng Ang and Hok Cha Quek * * School of Computer Engneerng Nanyang echnologcal Unversty Sngapore Insttute for

More information

A NSGA-II algorithm to solve a bi-objective optimization of the redundancy allocation problem for series-parallel systems

A NSGA-II algorithm to solve a bi-objective optimization of the redundancy allocation problem for series-parallel systems 0 nd Internatonal Conference on Industral Technology and Management (ICITM 0) IPCSIT vol. 49 (0) (0) IACSIT Press, Sngapore DOI: 0.776/IPCSIT.0.V49.8 A NSGA-II algorthm to solve a b-obectve optmzaton of

More information

New Applied Methods For Optimum GPS Satellite Selection

New Applied Methods For Optimum GPS Satellite Selection New Appled Methods For Optmum GPS Satellte Selecton Hamed Azam, Student Member, IEEE Department of Electrcal Engneerng Iran Unversty of Scence &echnology ehran, Iran hamed_azam@eee.org Mlad Azarbad Department

More information

Test 2. ECON3161, Game Theory. Tuesday, November 6 th

Test 2. ECON3161, Game Theory. Tuesday, November 6 th Test 2 ECON36, Game Theory Tuesday, November 6 th Drectons: Answer each queston completely. If you cannot determne the answer, explanng how you would arrve at the answer may earn you some ponts.. (20 ponts)

More information

Chaotic Filter Bank for Computer Cryptography

Chaotic Filter Bank for Computer Cryptography Chaotc Flter Bank for Computer Cryptography Bngo Wng-uen Lng Telephone: 44 () 784894 Fax: 44 () 784893 Emal: HTwng-kuen.lng@kcl.ac.ukTH Department of Electronc Engneerng, Dvson of Engneerng, ng s College

More information

RC Filters TEP Related Topics Principle Equipment

RC Filters TEP Related Topics Principle Equipment RC Flters TEP Related Topcs Hgh-pass, low-pass, Wen-Robnson brdge, parallel-t flters, dfferentatng network, ntegratng network, step response, square wave, transfer functon. Prncple Resstor-Capactor (RC)

More information

Digital Transmission

Digital Transmission Dgtal Transmsson Most modern communcaton systems are dgtal, meanng that the transmtted normaton sgnal carres bts and symbols rather than an analog sgnal. The eect o C/N rato ncrease or decrease on dgtal

More information

FEATURE SELECTION FOR SMALL-SIGNAL STABILITY ASSESSMENT

FEATURE SELECTION FOR SMALL-SIGNAL STABILITY ASSESSMENT FEAURE SELECION FOR SMALL-SIGNAL SABILIY ASSESSMEN S.P. eeuwsen Unversty of Dusburg teeuwsen@un-dusburg.de Abstract INRODUCION hs paper ntroduces dfferent feature selecton technques for neural network

More information

Exploiting Dynamic Workload Variation in Low Energy Preemptive Task Scheduling

Exploiting Dynamic Workload Variation in Low Energy Preemptive Task Scheduling Explotng Dynamc Worload Varaton n Low Energy Preemptve Tas Schedulng Lap-Fa Leung, Ch-Yng Tsu Department of Electrcal and Electronc Engneerng Hong Kong Unversty of Scence and Technology Clear Water Bay,

More information

Optimal Placement of PMU and RTU by Hybrid Genetic Algorithm and Simulated Annealing for Multiarea Power System State Estimation

Optimal Placement of PMU and RTU by Hybrid Genetic Algorithm and Simulated Annealing for Multiarea Power System State Estimation T. Kerdchuen and W. Ongsakul / GMSARN Internatonal Journal (09) - Optmal Placement of and by Hybrd Genetc Algorthm and Smulated Annealng for Multarea Power System State Estmaton Thawatch Kerdchuen and

More information

Networks. Backpropagation. Backpropagation. Introduction to. Backpropagation Network training. Backpropagation Learning Details 1.04.

Networks. Backpropagation. Backpropagation. Introduction to. Backpropagation Network training. Backpropagation Learning Details 1.04. Networs Introducton to - In 1986 a method for learnng n mult-layer wor,, was nvented by Rumelhart Paper Why are what and where processed by separate cortcal vsual systems? - The algorthm s a sensble approach

More information

Control Chart. Control Chart - history. Process in control. Developed in 1920 s. By Dr. Walter A. Shewhart

Control Chart. Control Chart - history. Process in control. Developed in 1920 s. By Dr. Walter A. Shewhart Control Chart - hstory Control Chart Developed n 920 s By Dr. Walter A. Shewhart 2 Process n control A phenomenon s sad to be controlled when, through the use of past experence, we can predct, at least

More information

Resource Allocation Optimization for Device-to- Device Communication Underlaying Cellular Networks

Resource Allocation Optimization for Device-to- Device Communication Underlaying Cellular Networks Resource Allocaton Optmzaton for Devce-to- Devce Communcaton Underlayng Cellular Networks Bn Wang, L Chen, Xaohang Chen, Xn Zhang, and Dacheng Yang Wreless Theores and Technologes (WT&T) Bejng Unversty

More information

Performance Analysis of the Weighted Window CFAR Algorithms

Performance Analysis of the Weighted Window CFAR Algorithms Performance Analyss of the Weghted Wndow CFAR Algorthms eng Xangwe Guan Jan He You Department of Electronc Engneerng, Naval Aeronautcal Engneerng Academy, Er a road 88, Yanta Cty 6400, Shandong Provnce,

More information

A High-Sensitivity Oversampling Digital Signal Detection Technique for CMOS Image Sensors Using Non-destructive Intermediate High-Speed Readout Mode

A High-Sensitivity Oversampling Digital Signal Detection Technique for CMOS Image Sensors Using Non-destructive Intermediate High-Speed Readout Mode A Hgh-Senstvty Oversamplng Dgtal Sgnal Detecton Technque for CMOS Image Sensors Usng Non-destructve Intermedate Hgh-Speed Readout Mode Shoj Kawahto*, Nobuhro Kawa** and Yoshak Tadokoro** *Research Insttute

More information

Safety and resilience of Global Baltic Network of Critical Infrastructure Networks related to cascading effects

Safety and resilience of Global Baltic Network of Critical Infrastructure Networks related to cascading effects Blokus-Roszkowska Agneszka Dzula Przemysław Journal of Polsh afety and Relablty Assocaton ummer afety and Relablty emnars, Volume 9, Number, Kołowrock Krzysztof Gdyna Martme Unversty, Gdyna, Poland afety

More information

NETWORK 2001 Transportation Planning Under Multiple Objectives

NETWORK 2001 Transportation Planning Under Multiple Objectives NETWORK 200 Transportaton Plannng Under Multple Objectves Woodam Chung Graduate Research Assstant, Department of Forest Engneerng, Oregon State Unversty, Corvalls, OR9733, Tel: (54) 737-4952, Fax: (54)

More information

Distributed Fault Detection of Wireless Sensor Networks

Distributed Fault Detection of Wireless Sensor Networks Dstrbuted Fault Detecton of Wreless Sensor Networs Jnran Chen, Shubha Kher, and Arun Soman Dependable Computng and Networng Lab Iowa State Unversty Ames, Iowa 50010 {jrchen, shubha, arun}@astate.edu ABSTRACT

More information

Priority based Dynamic Multiple Robot Path Planning

Priority based Dynamic Multiple Robot Path Planning 2nd Internatonal Conference on Autonomous obots and Agents Prorty based Dynamc Multple obot Path Plannng Abstract Taxong Zheng Department of Automaton Chongqng Unversty of Post and Telecommuncaton, Chna

More information

A Novel Optimization of the Distance Source Routing (DSR) Protocol for the Mobile Ad Hoc Networks (MANET)

A Novel Optimization of the Distance Source Routing (DSR) Protocol for the Mobile Ad Hoc Networks (MANET) A Novel Optmzaton of the Dstance Source Routng (DSR) Protocol for the Moble Ad Hoc Networs (MANET) Syed S. Rzv 1, Majd A. Jafr, and Khaled Ellethy Computer Scence and Engneerng Department Unversty of Brdgeport

More information

A MODIFIED DIRECTIONAL FREQUENCY REUSE PLAN BASED ON CHANNEL ALTERNATION AND ROTATION

A MODIFIED DIRECTIONAL FREQUENCY REUSE PLAN BASED ON CHANNEL ALTERNATION AND ROTATION A MODIFIED DIRECTIONAL FREQUENCY REUSE PLAN BASED ON CHANNEL ALTERNATION AND ROTATION Vncent A. Nguyen Peng-Jun Wan Ophr Freder Computer Scence Department Illnos Insttute of Technology Chcago, Illnos vnguyen@t.edu,

More information

Space Time Equalization-space time codes System Model for STCM

Space Time Equalization-space time codes System Model for STCM Space Tme Eualzaton-space tme codes System Model for STCM The system under consderaton conssts of ST encoder, fadng channel model wth AWGN, two transmt antennas, one receve antenna, Vterb eualzer wth deal

More information

Inverse Halftoning Method Using Pattern Substitution Based Data Hiding Scheme

Inverse Halftoning Method Using Pattern Substitution Based Data Hiding Scheme Proceedngs of the World Congress on Engneerng 2011 Vol II, July 6-8, 2011, London, U.K. Inverse Halftonng Method Usng Pattern Substtuton Based Data Hdng Scheme Me-Y Wu, Ja-Hong Lee and Hong-Je Wu Abstract

More information

A Simple Satellite Exclusion Algorithm for Advanced RAIM

A Simple Satellite Exclusion Algorithm for Advanced RAIM A Smple Satellte Excluson Algorthm for Advanced RAIM Juan Blanch, Todd Walter, Per Enge Stanford Unversty ABSTRACT Advanced Recever Autonomous Integrty Montorng s a concept that extends RAIM to mult-constellaton

More information

Document Comparison with a Weighted Topic Hierarchy

Document Comparison with a Weighted Topic Hierarchy Document Comparson wth a Weghted Topc Herarchy A. Gelbukh, G. Sdorov, and A. Guzmán-Arenas Natural Language Laboratory, Center for Computng Research (CIC), Natonal Polytechnc Insttute (IPN), Mexco Cty

More information

Comparison of Two Measurement Devices I. Fundamental Ideas.

Comparison of Two Measurement Devices I. Fundamental Ideas. Comparson of Two Measurement Devces I. Fundamental Ideas. ASQ-RS Qualty Conference March 16, 005 Joseph G. Voelkel, COE, RIT Bruce Sskowsk Rechert, Inc. Topcs The Problem, Eample, Mathematcal Model One

More information

Fast Code Detection Using High Speed Time Delay Neural Networks

Fast Code Detection Using High Speed Time Delay Neural Networks Fast Code Detecton Usng Hgh Speed Tme Delay Neural Networks Hazem M. El-Bakry 1 and Nkos Mastoraks 1 Faculty of Computer Scence & Informaton Systems, Mansoura Unversty, Egypt helbakry0@yahoo.com Department

More information

MULTIPLE LAYAR KERNEL-BASED APPROACH IN RELEVANCE FEEDBACK CONTENT-BASED IMAGE RETRIEVAL SYSTEM

MULTIPLE LAYAR KERNEL-BASED APPROACH IN RELEVANCE FEEDBACK CONTENT-BASED IMAGE RETRIEVAL SYSTEM Proceedngs of the Fourth Internatonal Conference on Machne Learnng and Cybernetcs, Guangzhou, 18-21 August 2005 MULTIPLE LAYAR KERNEL-BASED APPROACH IN RELEVANCE FEEDBACK CONTENT-BASED IMAGE RETRIEVAL

More information

ROBUST IDENTIFICATION AND PREDICTION USING WILCOXON NORM AND PARTICLE SWARM OPTIMIZATION

ROBUST IDENTIFICATION AND PREDICTION USING WILCOXON NORM AND PARTICLE SWARM OPTIMIZATION 7th European Sgnal Processng Conference (EUSIPCO 9 Glasgow, Scotland, August 4-8, 9 ROBUST IDENTIFICATION AND PREDICTION USING WILCOXON NORM AND PARTICLE SWARM OPTIMIZATION Babta Majh, G. Panda and B.

More information

The Impact of Spectrum Sensing Frequency and Packet- Loading Scheme on Multimedia Transmission over Cognitive Radio Networks

The Impact of Spectrum Sensing Frequency and Packet- Loading Scheme on Multimedia Transmission over Cognitive Radio Networks Ths artcle has been accepted for publcaton n a future ssue of ths journal, but has not been fully edted. Content may change pror to fnal publcaton. The Impact of Spectrum Sensng Frequency and Pacet- Loadng

More information

STATISTICS. is given by. i i. = total frequency, d i. = x i a ANIL TUTORIALS. = total frequency and d i. = total frequency, h = class-size

STATISTICS. is given by. i i. = total frequency, d i. = x i a ANIL TUTORIALS. = total frequency and d i. = total frequency, h = class-size STATISTICS ImPORTANT TERmS, DEFINITIONS AND RESULTS l The mean x of n values x 1, x 2, x 3,... x n s gven by x1+ x2 + x3 +... + xn x = n l mean of grouped data (wthout class-ntervals) () Drect method :

More information

A study of turbo codes for multilevel modulations in Gaussian and mobile channels

A study of turbo codes for multilevel modulations in Gaussian and mobile channels A study of turbo codes for multlevel modulatons n Gaussan and moble channels Lamne Sylla and Paul Forter (sylla, forter)@gel.ulaval.ca Department of Electrcal and Computer Engneerng Laval Unversty, Ste-Foy,

More information

[Type text] [Type text] [Type text] Wenjing Yuan Luxun Art Academy of Yan an University Xi an, , (CHINA)

[Type text] [Type text] [Type text] Wenjing Yuan Luxun Art Academy of Yan an University Xi an, , (CHINA) [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 19 BoTechnology 2014 An Indan Journal FULL PAPER BTAIJ, 10(19, 2014 [10873-10877] Computer smulaton analyss on pano tmbre ABSTRACT Wenjng

More information

Research Article Indoor Localisation Based on GSM Signals: Multistorey Building Study

Research Article Indoor Localisation Based on GSM Signals: Multistorey Building Study Moble Informaton Systems Volume 26, Artcle ID 279576, 7 pages http://dx.do.org/.55/26/279576 Research Artcle Indoor Localsaton Based on GSM Sgnals: Multstorey Buldng Study RafaB Górak, Marcn Luckner, MchaB

More information

A Novel UWB Imaging System Setup for Computer- Aided Breast Cancer Diagnosis

A Novel UWB Imaging System Setup for Computer- Aided Breast Cancer Diagnosis A Novel UWB Imagng System Setup for Computer- Aded Breast Cancer Dagnoss Xang He, Ja L, Chenxng Wu Electrcal and Computer Engneerng Oakland Unversty, OU Rochester, I 48309, U.S.A xhe2@oakland.edu, l4@oakland.edu,

More information

Passive Filters. References: Barbow (pp ), Hayes & Horowitz (pp 32-60), Rizzoni (Chap. 6)

Passive Filters. References: Barbow (pp ), Hayes & Horowitz (pp 32-60), Rizzoni (Chap. 6) Passve Flters eferences: Barbow (pp 6575), Hayes & Horowtz (pp 360), zzon (Chap. 6) Frequencyselectve or flter crcuts pass to the output only those nput sgnals that are n a desred range of frequences (called

More information

THEORY OF YARN STRUCTURE by Prof. Bohuslav Neckář, Textile Department, IIT Delhi, New Delhi. Compression of fibrous assemblies

THEORY OF YARN STRUCTURE by Prof. Bohuslav Neckář, Textile Department, IIT Delhi, New Delhi. Compression of fibrous assemblies THEORY OF YARN STRUCTURE by Prof. Bohuslav Neckář, Textle Department, IIT Delh, New Delh. Compresson of fbrous assembles Q1) What was the dea of fbre-to-fbre contact accordng to van Wyk? A1) Accordng to

More information

Guidelines for CCPR and RMO Bilateral Key Comparisons CCPR Working Group on Key Comparison CCPR-G5 October 10 th, 2014

Guidelines for CCPR and RMO Bilateral Key Comparisons CCPR Working Group on Key Comparison CCPR-G5 October 10 th, 2014 Gudelnes for CCPR and RMO Blateral Key Comparsons CCPR Workng Group on Key Comparson CCPR-G5 October 10 th, 2014 These gudelnes are prepared by CCPR WG-KC and RMO P&R representatves, and approved by CCPR,

More information

Classification of Satellite Images by Texture-Based Models Modulation Using MLP, SVM Neural Networks and Nero Fuzzy

Classification of Satellite Images by Texture-Based Models Modulation Using MLP, SVM Neural Networks and Nero Fuzzy Internatonal Journal of Electroncs and Electrcal Engneerng Vol. 1, No. 4, December, 2013 Classfcaton of Satellte Images by Texture-Based Models Modulaton Usng MLP, SVM Neural Networks and Nero Fuzzy Gholam

More information

Time-frequency Analysis Based State Diagnosis of Transformers Windings under the Short-Circuit Shock

Time-frequency Analysis Based State Diagnosis of Transformers Windings under the Short-Circuit Shock Tme-frequency Analyss Based State Dagnoss of Transformers Wndngs under the Short-Crcut Shock YUYING SHAO, ZHUSHI RAO School of Mechancal Engneerng ZHIJIAN JIN Hgh Voltage Lab Shangha Jao Tong Unversty

More information

A Patent Quality Classification System Using a Kernel-PCA with SVM

A Patent Quality Classification System Using a Kernel-PCA with SVM ADVCOMP 05 : The nth Internatonal Conference on Advanced Engneerng Computng and Applcatons n Scences A Patent Qualty Classfcaton System Usng a Kernel-PCA wth SVM Pe-Chann Chang Innovaton Center for Bg

More information

A New Type of Weighted DV-Hop Algorithm Based on Correction Factor in WSNs

A New Type of Weighted DV-Hop Algorithm Based on Correction Factor in WSNs Journal of Communcatons Vol. 9, No. 9, September 2014 A New Type of Weghted DV-Hop Algorthm Based on Correcton Factor n WSNs Yng Wang, Zhy Fang, and Ln Chen Department of Computer scence and technology,

More information

Secure Transmission of Sensitive data using multiple channels

Secure Transmission of Sensitive data using multiple channels Secure Transmsson of Senstve data usng multple channels Ahmed A. Belal, Ph.D. Department of computer scence and automatc control Faculty of Engneerng Unversty of Alexandra Alexandra, Egypt. aabelal@hotmal.com

More information

Ensemble of Distributed Learners for Online Classification of Dynamic Data Streams

Ensemble of Distributed Learners for Online Classification of Dynamic Data Streams 1 Ensemble of Dstrbuted Learners for Onlne Classfcaton of Dynamc Data Streams Luca Canzan, Member, IEEE, Yu Zhang, and Mhaela van der Schaar, Fellow, IEEE arxv:138.5281v1 [cs.lg] 24 Aug 213 Abstract We

More information

Discussion on How to Express a Regional GPS Solution in the ITRF

Discussion on How to Express a Regional GPS Solution in the ITRF 162 Dscusson on How to Express a Regonal GPS Soluton n the ITRF Z. ALTAMIMI 1 Abstract The usefulness of the densfcaton of the Internatonal Terrestral Reference Frame (ITRF) s to facltate ts access as

More information

Low Switching Frequency Active Harmonic Elimination in Multilevel Converters with Unequal DC Voltages

Low Switching Frequency Active Harmonic Elimination in Multilevel Converters with Unequal DC Voltages Low Swtchng Frequency Actve Harmonc Elmnaton n Multlevel Converters wth Unequal DC Voltages Zhong Du,, Leon M. Tolbert, John N. Chasson, Hu L The Unversty of Tennessee Electrcal and Computer Engneerng

More information

Optimizing a System of Threshold-based Sensors with Application to Biosurveillance

Optimizing a System of Threshold-based Sensors with Application to Biosurveillance Optmzng a System of Threshold-based Sensors wth Applcaton to Bosurvellance Ronald D. Frcker, Jr. Thrd Annual Quanttatve Methods n Defense and Natonal Securty Conference May 28, 2008 What s Bosurvellance?

More information

Analysis of Time Delays in Synchronous and. Asynchronous Control Loops. Bj rn Wittenmark, Ben Bastian, and Johan Nilsson

Analysis of Time Delays in Synchronous and. Asynchronous Control Loops. Bj rn Wittenmark, Ben Bastian, and Johan Nilsson 37th CDC, Tampa, December 1998 Analyss of Delays n Synchronous and Asynchronous Control Loops Bj rn Wttenmark, Ben Bastan, and Johan Nlsson emal: bjorn@control.lth.se, ben@control.lth.se, and johan@control.lth.se

More information

Tile Values of Information in Some Nonzero Sum Games

Tile Values of Information in Some Nonzero Sum Games lnt. ournal of Game Theory, Vot. 6, ssue 4, page 221-229. Physca- Verlag, Venna. Tle Values of Informaton n Some Nonzero Sum Games By P. Levne, Pars I ), and ZP, Ponssard, Pars 2 ) Abstract: The paper

More information

AIR FORCE INSTITUTE OF TECHNOLOGY

AIR FORCE INSTITUTE OF TECHNOLOGY MULTIPLE ACCESS INTERFERENCE CHARACTERIZATION FOR DIRECT-SEQUENCE SPREAD-SPECTRUM COMMUNICATIONS USING CHIP WAVEFORM SHAPING THESIS Matthew G. Glen, Captan, USAF AFIT/GE/ENG/04-10 DEPARTMENT OF THE AIR

More information

Figure.1. Basic model of an impedance source converter JCHPS Special Issue 12: August Page 13

Figure.1. Basic model of an impedance source converter JCHPS Special Issue 12: August Page 13 A Hgh Gan DC - DC Converter wth Soft Swtchng and Power actor Correcton for Renewable Energy Applcaton T. Selvakumaran* and. Svachdambaranathan Department of EEE, Sathyabama Unversty, Chenna, Inda. *Correspondng

More information

Review: Our Approach 2. CSC310 Information Theory

Review: Our Approach 2. CSC310 Information Theory CSC30 Informaton Theory Sam Rowes Lecture 3: Provng the Kraft-McMllan Inequaltes September 8, 6 Revew: Our Approach The study of both compresson and transmsson requres that we abstract data and messages

More information

Optimal Sizing and Allocation of Residential Photovoltaic Panels in a Distribution Network for Ancillary Services Application

Optimal Sizing and Allocation of Residential Photovoltaic Panels in a Distribution Network for Ancillary Services Application Optmal Szng and Allocaton of Resdental Photovoltac Panels n a Dstrbuton Networ for Ancllary Servces Applcaton Reza Ahmad Kordhel, Student Member, IEEE, S. Al Pourmousav, Student Member, IEEE, Jayarshnan

More information

Artificial Intelligence Techniques Applications for Power Disturbances Classification

Artificial Intelligence Techniques Applications for Power Disturbances Classification Internatonal Journal of Electrcal and Computer Engneerng 3:5 28 Artfcal Intellgence Technques Applcatons for Power Dsturbances Classfcaton K.Manmala, Dr.K.Selv and R.Ahla Abstract Artfcal Intellgence (AI)

More information

Latency Insertion Method (LIM) for IR Drop Analysis in Power Grid

Latency Insertion Method (LIM) for IR Drop Analysis in Power Grid Abstract Latency Inserton Method (LIM) for IR Drop Analyss n Power Grd Dmtr Klokotov, and José Schutt-Ané Wth the steadly growng number of transstors on a chp, and constantly tghtenng voltage budgets,

More information

Evaluate the Effective of Annular Aperture on the OTF for Fractal Optical Modulator

Evaluate the Effective of Annular Aperture on the OTF for Fractal Optical Modulator Global Advanced Research Journal of Management and Busness Studes (ISSN: 2315-5086) Vol. 4(3) pp. 082-086, March, 2015 Avalable onlne http://garj.org/garjmbs/ndex.htm Copyrght 2015 Global Advanced Research

More information

Performance Testing of the Rockwell PLGR+ 96 P/Y Code GPS receiver

Performance Testing of the Rockwell PLGR+ 96 P/Y Code GPS receiver Performance Testng of the Rockwell PLGR+ 96 P/Y Code GPS recever By Santago Mancebo and Ken Chamberlan Introducton: The Rockwell PLGR (Precson Lghtweght GPS Recever) + 96 s a Precse Postonng Servce P/Y

More information

TECHNICAL NOTE TERMINATION FOR POINT- TO-POINT SYSTEMS TN TERMINATON FOR POINT-TO-POINT SYSTEMS. Zo = L C. ω - angular frequency = 2πf

TECHNICAL NOTE TERMINATION FOR POINT- TO-POINT SYSTEMS TN TERMINATON FOR POINT-TO-POINT SYSTEMS. Zo = L C. ω - angular frequency = 2πf TECHNICAL NOTE TERMINATION FOR POINT- TO-POINT SYSTEMS INTRODUCTION Because dgtal sgnal rates n computng systems are ncreasng at an astonshng rate, sgnal ntegrty ssues have become far more mportant to

More information

Graph Method for Solving Switched Capacitors Circuits

Graph Method for Solving Switched Capacitors Circuits Recent Advances n rcuts, ystems, gnal and Telecommuncatons Graph Method for olvng wtched apactors rcuts BHUMIL BRTNÍ Department of lectroncs and Informatcs ollege of Polytechncs Jhlava Tolstého 6, 586

More information

Cod and climate: effect of the North Atlantic Oscillation on recruitment in the North Atlantic

Cod and climate: effect of the North Atlantic Oscillation on recruitment in the North Atlantic Ths appendx accompanes the artcle Cod and clmate: effect of the North Atlantc Oscllaton on recrutment n the North Atlantc Lef Chrstan Stge 1, Ger Ottersen 2,3, Keth Brander 3, Kung-Sk Chan 4, Nls Chr.

More information

1 GSW Multipath Channel Models

1 GSW Multipath Channel Models In the general case, the moble rado channel s pretty unpleasant: there are a lot of echoes dstortng the receved sgnal, and the mpulse response keeps changng. Fortunately, there are some smplfyng assumptons

More information

Evaluation of short-time speech-based intelligibility metrics

Evaluation of short-time speech-based intelligibility metrics Communcaton: 9th Internatonal Congress on Nose as a Publc Health Problem (ICBEN) 8, Foxwoods, CT Evaluaton of short-tme speech-based ntellgblty metrcs Karen L Payton*, Mona Shrestha Unversty of Massachusetts

More information