International Journal of Database Theory and Application
Vol. 3, No. 1, March, 2010

Rough Set Approach for Categorical Data Clustering

Tutut Herawan*, Rozaida Ghazali, Iwan Tri Riyadi Yanto, and Mustafa Mat Deris

Department of Mathematics Education, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
Faculty of Information Technology and Multimedia, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
Department of Mathematics, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
tutut81@uad.ac.id* (corresponding author), rozaida@uthm.edu.my, iwan015@gmail.com, mmustafa@uthm.edu.my

Abstract

Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we focus our discussion on rough set theory for categorical data clustering. We propose MADE (Maximal Attributes DEpendency), an alternative technique for categorical data clustering using rough set theory that takes into account the maximum attributes dependencies degree in categorical-valued information systems. Experimental results on two benchmark UCI datasets show that the MADE technique performs better than the baseline categorical data clustering technique with respect to computational complexity and cluster purity.

Keywords: Clustering; Categorical data; Information system; Rough set theory; Attributes dependencies.

1. Introduction

Clustering a set of objects into homogeneous classes is a fundamental operation in data mining. The operation is required in a number of data analysis tasks, such as unsupervised classification and data summation, as well as in the segmentation of large heterogeneous datasets into smaller homogeneous subsets that can be easily managed, modeled separately and analyzed. Recently, much attention has been paid to categorical data clustering [1,2], where data objects are made up of non-numerical attributes. For categorical data clustering, several new trends have emerged for techniques that handle uncertainty in the clustering process. One of the popular approaches for handling uncertainty is based on rough set theory [3]. The main idea of rough clustering is that the clustering dataset is mapped as a decision table. This can be done by introducing a decision attribute; consequently, a divide-and-conquer method can be used to partition/cluster the objects.
The first attempt at a rough set-based technique for selecting a clustering attribute was proposed by Mazlack et al. [4]. They proposed two techniques, i.e., Bi-Clustering (BC) and TR, which are based on the bi-valued attribute and the maximum total roughness in each attribute, respectively. One of the most successful pioneering rough clustering techniques is Minimum-Minimum Roughness (MMR), proposed by Parmar et al. [5]. The technique is based on the lower, upper and quality of approximations of a set [6]. However, since the application of rough set theory in categorical data clustering is relatively new, the focus of MMR is still on the evaluation of its performance. Computational complexity and cluster purity are still outstanding issues, since all attributes are considered for selection and objects of different classes appear in the same cluster, respectively.

In this paper, we propose MADE (Maximal Attributes DEpendency), an alternative technique for categorical data clustering. The technique differs from the baseline method in that the rough attributes dependencies in categorical-valued information systems are used to select the clustering attribute based on the maximum degree. Further, we use a divide-and-conquer method to partition/cluster the objects. We have succeeded in showing that the proposed technique is able to achieve lower computational complexity with higher purity as compared to MMR.

The rest of this paper is organized as follows. Section 2 describes rough set theory. Section 3 describes the analysis and comparison of Mazlack's TR and MMR techniques. Section 4 describes the Maximum Attributes Dependency (MADE) technique. Comparison tests of MADE with MMR based on the Soybean and Zoo datasets are described in Section 5. Finally, the conclusion of this work is given in Section 6.

An early version of this paper appeared in the Proceedings of the International Conference DTA 2009, held as part of the Future Generation Information Technology Conference, FGIT 2009, Jeju Island, Korea, December 10-12, 2009, CCIS 64, Springer-Verlag, pp. 179-186, 2009.

2. Rough Set Theory

The syntax of information systems is very similar to relations in relational databases. Entities in relational databases are also represented by tuples of attribute values. An information system is a 4-tuple (quadruple) $S = (U, A, V, f)$, where $U = \{u_1, u_2, \ldots, u_{|U|}\}$ is a non-empty finite set of objects, $A = \{a_1, a_2, \ldots, a_{|A|}\}$ is a non-empty finite set of attributes, $V = \bigcup_{a \in A} V_a$, where $V_a$ is the domain (value set) of attribute $a$, and $f : U \times A \to V$ is an information function such that $f(u, a) \in V_a$ for every $(u, a) \in U \times A$, called the information (knowledge) function. An information system is also called a knowledge representation system or an attribute-valued system and can be intuitively expressed in terms of an information table (see Table 1).

Table 1. An information system

\[
\begin{array}{c|cccccc}
U & a_1 & a_2 & \cdots & a_k & \cdots & a_{|A|} \\
\hline
u_1 & f(u_1, a_1) & f(u_1, a_2) & \cdots & f(u_1, a_k) & \cdots & f(u_1, a_{|A|}) \\
u_2 & f(u_2, a_1) & f(u_2, a_2) & \cdots & f(u_2, a_k) & \cdots & f(u_2, a_{|A|}) \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
u_{|U|} & f(u_{|U|}, a_1) & f(u_{|U|}, a_2) & \cdots & f(u_{|U|}, a_k) & \cdots & f(u_{|U|}, a_{|A|})
\end{array}
\]

The time complexity for computing an information system $S = (U, A, V, f)$ is $|U| \times |A|$, since there are $|U| \times |A|$ values of $f(u_i, a_j)$ to be computed, where $i = 1, 2, \ldots, |U|$ and $j = 1, 2, \ldots, |A|$. Note that $f$ induces a set of maps $t = f(u, a) : U \times A \to V$. Each map is a tuple $t_i = (f(u_i, a_1), f(u_i, a_2), \ldots, f(u_i, a_{|A|}))$, where $i = 1, 2, \ldots, |U|$. Note that the tuple $t$ is not necessarily associated with an entity uniquely (see Table 7). In an information table, two distinct entities could have the same tuple representation (duplicated/redundant tuple), which is not permissible in relational databases. Thus, the concept of information systems is a generalization of the concept of relational databases.

Definition 1. Two elements $x, y \in U$ are said to be $B$-indiscernible (indiscernible by the set of attributes $B \subseteq A$ in $S$) if and only if $f(x, a) = f(y, a)$ for every $a \in B$.

Obviously, every subset of $A$ induces a unique indiscernibility relation. Notice that the indiscernibility relation induced by the set of attributes $B$, denoted by $IND(B)$, is an equivalence relation. The partition of $U$ induced by $IND(B)$ is denoted by $U/B$, and the equivalence class in the partition $U/B$ containing $x \in U$ is denoted by $[x]_B$. The notions of lower and upper approximations of a set are defined as follows.

Definition 2. (See [6].) The $B$-lower approximation of $X$, denoted by $\underline{B}(X)$, and the $B$-upper approximation of $X$, denoted by $\overline{B}(X)$, are defined by

\[
\underline{B}(X) = \{x \in U \mid [x]_B \subseteq X\} \quad \text{and} \quad \overline{B}(X) = \{x \in U \mid [x]_B \cap X \neq \emptyset\}.
\]

It is easily seen that the upper approximation of a subset $X \subseteq U$ can be expressed using set complement and lower approximation by $\overline{B}(X) = U - \underline{B}(\neg X)$, where $\neg X$ denotes the complement of $X$ relative to $U$.

The accuracy of approximation (accuracy of roughness) of any subset $X \subseteq U$ with respect to $B \subseteq A$, denoted $\alpha_B(X)$, is measured by

\[
\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|}, \tag{1}
\]

where $|X|$ denotes the cardinality of $X$. For the empty set, we define $\alpha_B(\emptyset) = 1$. Obviously, $0 \le \alpha_B(X) \le 1$. If $X$ is a union of some equivalence classes, then $\alpha_B(X) = 1$, and the set $X$ is crisp with respect to $B$. Otherwise, if $\alpha_B(X) < 1$, the set $X$ is rough with respect to $B$.

The accuracy of roughness in equation (1) can also be interpreted using the well-known Marczewski-Steinhaus (MZ) metric [7]. By applying the Marczewski-Steinhaus metric to the lower and upper approximations of a subset $X \subseteq U$ in information system $S$, we have

\[
D(\underline{B}(X), \overline{B}(X)) = 1 - \frac{|\underline{B}(X) \cap \overline{B}(X)|}{|\underline{B}(X) \cup \overline{B}(X)|} = 1 - \frac{|\underline{B}(X)|}{|\overline{B}(X)|} = 1 - \alpha_B(X). \tag{2}
\]

The notion of the dependency of attributes in information systems is given in the following definition.

Definition 3. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. Attribute $D$ is said to depend totally on attribute $C$, denoted $C \Rightarrow D$, if all values of attributes $D$ are uniquely determined by values of attributes $C$.
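The indiscernibility partition, the lower and upper approximations, and the accuracy of approximation defined above can be illustrated with a short Python sketch. The toy table, its attribute names (`color`, `size`) and the function names are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import defaultdict

def partition(table, attrs):
    """Return U/B: blocks of objects that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def lower_upper(table, attrs, X):
    """Return the B-lower and B-upper approximations of the set X."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= X:      # block lies entirely inside X
            lower |= block
        if block & X:       # block overlaps X
            upper |= block
    return lower, upper

def accuracy(table, attrs, X):
    """Accuracy of approximation; defined as 1 for the empty set."""
    lower, upper = lower_upper(table, attrs, X)
    return 1.0 if not upper else len(lower) / len(upper)

# Hypothetical toy information table (not from the paper)
table = {
    1: {"color": "red", "size": "big"},
    2: {"color": "red", "size": "big"},
    3: {"color": "red", "size": "small"},
    4: {"color": "blue", "size": "small"},
}
X = {1, 3}
lower, upper = lower_upper(table, ["color", "size"], X)
print(lower, upper, accuracy(table, ["color", "size"], X))
```

Here objects 1 and 2 are indiscernible, so the set {1, 3} cannot be described exactly: only object 3 is certainly in it, while objects 1, 2 and 3 are possibly in it, which makes the set rough with respect to the chosen attributes.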
In other words, attribute $D$ depends totally on attribute $C$ if there exists a functional dependency between the values of $D$ and $C$. The notion of generalized attributes dependency is given in the following definition.

Definition 4. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. The degree of dependency of attribute $D$ on attributes $C$, denoted $C \Rightarrow_k D$, is defined by

\[
k = \frac{\sum_{X \in U/D} |\underline{C}(X)|}{|U|}. \tag{3}
\]

Obviously, $0 \le k \le 1$. Attribute $D$ is said to depend totally (in a degree of $k$) on attribute $C$ if $k = 1$. Otherwise, $D$ depends partially on $C$. Thus, attribute $D$ depends totally (partially) on attribute $C$ if all (some) elements of the universe $U$ can be uniquely classified to equivalence classes of the partition $U/D$, employing $C$.

In the following section, we analyze and compare the Total Roughness (TR) and Min-Min Roughness (MMR) techniques for selecting a clustering attribute.

3. TR and MMR Techniques

3.1. The TR Technique

The TR technique is based on the notion of an information system as stated in Section 2. Suppose that attribute $a_i \in A$ has $n$ different values, say $\beta_k$, $k = 1, 2, \ldots, n$. Let $X(a_i = \beta_k)$, $k = 1, 2, \ldots, n$, be the subset of the objects having the $k$-th value of attribute $a_i$. The roughness in the TR technique of the set $X(a_i = \beta_k)$, $k = 1, 2, \ldots, n$, with respect to $a_j$, where $i \neq j$, denoted by $R_{a_j}(X \mid a_i = \beta_k)$, is defined by

\[
R_{a_j}(X \mid a_i = \beta_k) = \frac{|\underline{X_{a_j}}(a_i = \beta_k)|}{|\overline{X_{a_j}}(a_i = \beta_k)|}, \quad k = 1, 2, \ldots, n. \tag{4}
\]

In the TR technique, the mean roughness of attribute $a_i \in A$ with respect to attribute $a_j \in A$, where $i \neq j$, denoted $Rough_{a_j}(a_i)$, is evaluated as follows

\[
Rough_{a_j}(a_i) = \frac{\sum_{k=1}^{|V(a_i)|} R_{a_j}(X \mid a_i = \beta_k)}{|V(a_i)|}, \tag{5}
\]

where $V(a_i)$ is the set of values of attribute $a_i \in A$. The total roughness of attribute $a_i \in A$ with respect to attribute $a_j \in A$, where $i \neq j$, denoted $TR(a_i)$, is obtained by the following formula

\[
TR(a_i) = \frac{\sum_{j=1,\, j \neq i}^{|A|} Rough_{a_j}(a_i)}{|A| - 1}. \tag{6}
\]

As stated in Mazlack et al. [4], the attribute with the highest value of TR is the best selection of partitioning attribute.

3.2. The MMR Technique

The MMR technique is likewise based on the notion of an information system as stated in Section 2. Suppose that attribute $a_i \in A$ has $n$ different values, say $\beta_k$, $k = 1, 2, \ldots, n$. Let $X(a_i = \beta_k)$, $k = 1, 2, \ldots, n$, be the subset of the objects having the $k$-th value of attribute $a_i$. The roughness in the MMR technique of the set $X(a_i = \beta_k)$, $k = 1, 2, \ldots, n$, with respect to $a_j$, where $i \neq j$, denoted by $MMR_{a_j}(X \mid a_i = \beta_k)$, is defined by

\[
MMR_{a_j}(X \mid a_i = \beta_k) = 1 - \frac{|\underline{X_{a_j}}(a_i = \beta_k)|}{|\overline{X_{a_j}}(a_i = \beta_k)|}, \quad k = 1, 2, \ldots, n. \tag{7}
\]

It is clear that the MMR technique uses the MZ metric to measure the roughness of the set $X(a_i = \beta_k)$, $k = 1, 2, \ldots, n$, with respect to $a_j$, where $i \neq j$. The mean roughness of the MMR technique is defined by

\[
MMRough_{a_j}(a_i) = \frac{\sum_{k=1}^{|V(a_i)|} MMR_{a_j}(X \mid a_i = \beta_k)}{|V(a_i)|}. \tag{8}
\]

According to Parmar et al. [5], the attribute with the least mean roughness is the best selection of partitioning attribute.

3.3. Comparison of TR and MMR techniques

Proposition 5. The value of roughness of the MMR technique is the complement of that of the TR technique.

Proof. Since the MMR technique uses the MZ metric to measure the roughness of the set $X(a_i = \beta_k)$, $k = 1, 2, \ldots, n$, with respect to $a_j$, where $i \neq j$, from (4) and (7) we have

\[
MMR_{a_j}(X \mid a_i = \beta_k) = 1 - R_{a_j}(X \mid a_i = \beta_k). \tag{9}
\]

Thus, the value of mean roughness of the MMR technique is also the complement of that of the TR technique (5), i.e.,

\[
MMRough_{a_j}(a_i) = \frac{\sum_{k=1}^{|V(a_i)|} MMR_{a_j}(X \mid a_i = \beta_k)}{|V(a_i)|}
= \frac{\sum_{k=1}^{|V(a_i)|} \left(1 - R_{a_j}(X \mid a_i = \beta_k)\right)}{|V(a_i)|}
= \frac{|V(a_i)|}{|V(a_i)|} - \frac{\sum_{k=1}^{|V(a_i)|} R_{a_j}(X \mid a_i = \beta_k)}{|V(a_i)|}
= 1 - Rough_{a_j}(a_i), \quad \text{for } i \neq j. \tag{10}
\]

The MMR technique is based on the minimum value of mean roughness in (10), without calculating the total roughness (6). This analysis and comparison shows that the TR and MMR techniques provide the same result when used to determine the clustering attribute. To illustrate that MMR and Mazlack's techniques provide the same results, we consider the following example.

Example 6. We consider the dataset of the illustrative example in [5].

Table 2. An information system in [5]

U    a1      a2      a3        a4          a5       a6
1    Big     Blue    Hard      Indefinite  Plastic  Negative
2    Medium  Red     Moderate  Smooth      Wood     Neutral
3    Small   Yellow  Soft      Fuzzy       Plush    Positive
4    Medium  Blue    Moderate  Fuzzy       Plastic  Negative
5    Small   Yellow  Soft      Indefinite  Plastic  Neutral
6    Big     Green   Hard      Smooth      Wood     Positive
7    Small   Yellow  Hard      Indefinite  Metal    Positive
8    Small   Yellow  Soft      Indefinite  Plastic  Positive
9    Big     Green   Hard      Smooth      Wood     Neutral
10   Medium  Green   Moderate  Smooth      Plastic  Neutral

In Table 2, there are ten objects with six categorical-valued attributes: $a_1, a_2, a_3, a_4, a_5$ and $a_6$. Each attribute has more than two values ($|V(a_i)| > 2$, $i = 1, 2, \ldots, 6$). Since in this case there are no bi-valued attributes, we cannot employ Mazlack's BC technique. The calculation of the TR and MMR techniques must be applied to all of the attribute values to obtain the clustering attribute. The calculation of the TR value is based on formulas (4), (5) and (6).

The TR and MMR techniques are implemented in MATLAB version 7.6.0.324 (R2008a). They are executed sequentially on a machine with Intel Core Duo CPUs, main memory in the gigabyte range, and the Windows XP Professional operating system. The results of TR and MMR are given in Tables 3 and 4, respectively.

Table 3. The TR of all attributes of Table 2 (columns: attribute; mean roughness with respect to each of the other attributes; TR)

Table 4. The MMR of all attributes of Table 2 (columns: attribute; mean roughness with respect to each of the other attributes; MMR)

Based on Figure 1, one attribute has a higher TR than the other attributes $a_i$, $i = 1, 2, \ldots, 6$, and is therefore selected as the clustering attribute. Meanwhile, based on Figure 2, two attributes have equal minimum MMR ($a_1$ and $a_3$, i.e., 0.5238). However, the second value corresponding to attribute $a_1$, i.e., 0.6111, is lower than that of $a_3$, i.e., 0.9074. Therefore, attribute $a_1$ is selected as the clustering attribute.
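The complement relation between the TR and MMR mean roughness values can be checked numerically on the ten-object dataset of the example above. The following Python sketch is an illustration only and is not the authors' MATLAB implementation; the function names are hypothetical, and attributes are addressed by zero-based index (0 for the first attribute, 5 for the sixth).

```python
from collections import defaultdict

# The ten objects of the example dataset, one tuple per object (six attributes)
ROWS = [
    ("Big", "Blue", "Hard", "Indefinite", "Plastic", "Negative"),
    ("Medium", "Red", "Moderate", "Smooth", "Wood", "Neutral"),
    ("Small", "Yellow", "Soft", "Fuzzy", "Plush", "Positive"),
    ("Medium", "Blue", "Moderate", "Fuzzy", "Plastic", "Negative"),
    ("Small", "Yellow", "Soft", "Indefinite", "Plastic", "Neutral"),
    ("Big", "Green", "Hard", "Smooth", "Wood", "Positive"),
    ("Small", "Yellow", "Hard", "Indefinite", "Metal", "Positive"),
    ("Small", "Yellow", "Soft", "Indefinite", "Plastic", "Positive"),
    ("Big", "Green", "Hard", "Smooth", "Wood", "Neutral"),
    ("Medium", "Green", "Moderate", "Smooth", "Plastic", "Neutral"),
]

def blocks(attr):
    """Equivalence classes of objects (numbered from 1) induced by one attribute."""
    groups = defaultdict(set)
    for obj, row in enumerate(ROWS, start=1):
        groups[row[attr]].add(obj)
    return list(groups.values())

def roughness_tr(X, j):
    """TR roughness of X w.r.t. attribute j: |lower| / |upper|."""
    lower = sum(len(b) for b in blocks(j) if b <= X)
    upper = sum(len(b) for b in blocks(j) if b & X)
    return lower / upper

def mean_roughness_tr(i, j):
    """Mean TR roughness of attribute i with respect to attribute j."""
    classes = blocks(i)
    return sum(roughness_tr(X, j) for X in classes) / len(classes)

def mean_roughness_mmr(i, j):
    """Mean MMR roughness of attribute i with respect to attribute j."""
    classes = blocks(i)
    return sum(1 - roughness_tr(X, j) for X in classes) / len(classes)

for i in range(6):
    for j in range(6):
        if i != j:
            # The two mean roughness values always sum to 1
            assert abs(mean_roughness_tr(i, j) + mean_roughness_mmr(i, j) - 1) < 1e-12

print(round(mean_roughness_mmr(0, 2), 4))  # first attribute w.r.t. the third
```

Because the two quantities always sum to one, ranking attributes by the largest TR mean roughness and by the smallest MMR mean roughness orders every attribute pair in exactly the same way.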
[Figure 1. The TR value of all attributes of Table 2]

[Figure 2. The MMR value of all attributes of Table 2]

Table 5. The computation and response time of TR and MMR (columns: Computation; Response time (sec); rows: TR, MMR)

[Figure 3. The computation of TR and MMR]

[Figure 4. The response time of TR and MMR]

Based on the results of selecting the clustering attribute in Figures 1, 2, 3 and 4, it is easily seen that the decision, computational complexity and processing time of the TR and MMR techniques are exactly the same. Thus, based on Proposition 5, the statement in the comparison example of [5] that MMR is an extension of the approach proposed by Mazlack et al. is considered incorrect.

On the other hand, to achieve lower computational complexity in selecting the partitioning attribute using MMR, Parmar et al. suggested that the measurement of the roughness be based on the relationship between an attribute $a_i \in A$ and the set defined as $A - \{a_i\}$, instead of calculating the maximum with respect to all $\{a_j\}$ where $a_j \neq a_i$ [5]. We have observed that this technique can only be applied to very special datasets. To illustrate this problem, we consider the following example.
Example 7. In Table 2, if we measure the roughness of attribute $a_i \in A$ with respect to the set of attributes $A - \{a_i\}$, then we get the values of the modified MMR as in Table 6.

Table 6. The modified MMR of all attributes of the dataset in [5] (columns: attribute $a_i$ with respect to $A - \{a_i\}$; mean roughness; MMR)

Based on Table 6, we are not able to select a clustering attribute. Thus, the suggested technique leads to a problem: after calculating the mean roughness of attribute $a_i \in A$ with respect to the set of attributes $A - \{a_i\}$, the value of MMR usually cannot preserve the original decision. This modified technique is therefore not applicable to all types of datasets.

To overcome the problem of the computational complexity of MMR, in Section 4 we introduce the Maximum Attributes Dependencies (MADE) technique to deal with the problem of categorical data clustering.

4. Maximum Attributes DEpendencies (MADE) Technique

4.1. MADE technique

The MADE technique for selecting the partitioning attribute is based on the maximum degree of dependency of attributes. The justification that a higher degree of dependency of attributes implies more accuracy in selecting the partitioning attribute is stated in Proposition 8.

Proposition 8. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. If $D$ depends totally on $C$, then $\alpha_D(X) \le \alpha_C(X)$ for every $X \subseteq U$.

Proof. Let $D$ and $C$ be any subsets of $A$ in information system $S = (U, A, V, f)$. From the hypothesis, we have $IND(C) \subseteq IND(D)$. Furthermore, the partition $U/C$ is finer than $U/D$; thus, it is clear that any equivalence class induced by $IND(D)$ is a union of some equivalence classes induced by $IND(C)$. Therefore, for every $x \in X \subseteq U$, we have $[x]_C \subseteq [x]_D$. Hence, for every $X \subseteq U$, we have

\[
\underline{D}(X) \subseteq \underline{C}(X) \subseteq X \subseteq \overline{C}(X) \subseteq \overline{D}(X).
\]

Consequently,

\[
\alpha_D(X) = \frac{|\underline{D}(X)|}{|\overline{D}(X)|} \le \frac{|\underline{C}(X)|}{|\overline{C}(X)|} = \alpha_C(X).
\]

4.2. Complexity

Suppose that in an information system $S = (U, A, V, f)$ there are $|A|$ attributes. For MADE, the number of computations of the dependency degree of attribute $a_i$ on attribute $a_j$, where $i \neq j$, is $|A| \times (|A| - 1)$. Thus, the computational complexity of the MADE technique is of the polynomial order $O(|A| \times (|A| - 1))$. The MADE algorithm for selecting the clustering attribute is given in Figure 5.

Algorithm: MADE
Input: Dataset without clustering attribute
Output: Clustering attribute
Begin
Step 1. Compute the equivalence classes using the indiscernibility relation on each attribute.
Step 2. Determine the dependency degree of attribute $a_i$ with respect to all $a_j$, where $i \neq j$.
Step 3. Select the maximum dependency degree of each attribute.
Step 4. Select the clustering attribute based on the maximum degree of dependency of attributes.
End

Figure 5. The MADE algorithm

Following the same tie-breaking procedure as in MMR, when using the MADE technique it is recommended to look at the next highest dependency degree of the attributes that are tied, and so on until the tie is broken.

4.3. Example

The dataset is an animal dataset from Hu [8]. In Table 7, there are nine animals with nine categorical-valued attributes ($|A| = 9$): Hair, Teeth, Eye, Feather, Feet, Eat, Milk, Fly and Swim. The attributes Hair, Eye, Feather, Milk, Fly and Swim have two values. Attribute Teeth has three values, and the other attributes have four values.
a. To obtain the dependency degrees of all attributes, the first step of the technique is to obtain the equivalence classes induced by the indiscernibility relation of singleton attributes, i.e., disjoint classes of indiscernible objects.
b. By collecting the equivalence classes, a partition of objects can be obtained. The partitions are shown in Figure 6.
c. The dependency degree of attributes can be obtained using formula (3). For attribute Hair depending on attributes Teeth, Eye, Feather, Feet, Eat, Milk, Fly and Swim, we have the degrees shown in Figure 7.

Table 7. Animal world dataset from [8]

No.  Animal     Hair  Teeth    Eye      Feather  Feet  Eat    Milk  Fly  Swim
1    Tiger      Y     Pointed  Forward  N        Claw  Meat   Y     N    Y
2    Cheetah    Y     Pointed  Forward  N        Claw  Meat   Y     N    Y
3    Giraffe    Y     Blunt    Side     N        Hoof  Grass  Y     N    N
4    Zebra      Y     Blunt    Side     N        Hoof  Grass  Y     N    N
5    Ostrich    N     N        Side     Y        Claw  Grain  N     N    N
6    Penguin    N     N        Side     Y        Web   Fish   N     N    Y
7    Albatross  N     N        Side     Y        Claw  Grain  N     Y    Y
8    Eagle      N     N        Forward  Y        Claw  Meat   N     Y    N
9    Viper      N     Pointed  Forward  N        N     Meat   N     N    N

a. X(Hair = yes) = {1,2,3,4}, X(Hair = no) = {5,6,7,8,9}; U/Hair = {{1,2,3,4}, {5,6,7,8,9}}.
b. X(Teeth = pointed) = {1,2,9}, X(Teeth = blunt) = {3,4}, X(Teeth = no) = {5,6,7,8}; U/Teeth = {{1,2,9}, {3,4}, {5,6,7,8}}.
c. X(Eye = forward) = {1,2,8,9}, X(Eye = side) = {3,4,5,6,7}; U/Eye = {{1,2,8,9}, {3,4,5,6,7}}.
d. X(Feather = no) = {1,2,3,4,9}, X(Feather = yes) = {5,6,7,8}; U/Feather = {{1,2,3,4,9}, {5,6,7,8}}.
e. X(Feet = claw) = {1,2,5,7,8}, X(Feet = hoof) = {3,4}, X(Feet = web) = {6}, X(Feet = no) = {9}; U/Feet = {{1,2,5,7,8}, {3,4}, {6}, {9}}.
f. X(Eat = meat) = {1,2,8,9}, X(Eat = grass) = {3,4}, X(Eat = grain) = {5,7}, X(Eat = fish) = {6}; U/Eat = {{1,2,8,9}, {3,4}, {5,7}, {6}}.
g. X(Milk = yes) = {1,2,3,4}, X(Milk = no) = {5,6,7,8,9}; U/Milk = {{1,2,3,4}, {5,6,7,8,9}}.
h. X(Fly = no) = {1,2,3,4,5,6,9}, X(Fly = yes) = {7,8}; U/Fly = {{1,2,3,4,5,6,9}, {7,8}}.
i. X(Swim = yes) = {1,2,6,7}, X(Swim = no) = {3,4,5,8,9}; U/Swim = {{1,2,6,7}, {3,4,5,8,9}}.

Figure 6. The partitions using singleton attributes
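The steps above, together with the selection rule of the MADE algorithm, can be sketched in Python on the animal dataset. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and ties are broken by comparing each attribute's dependency degrees in descending order, which is one way to read the recommendation to look at the next degree among tied attributes.

```python
from collections import defaultdict

ATTRS = ["Hair", "Teeth", "Eye", "Feather", "Feet", "Eat", "Milk", "Fly", "Swim"]
ANIMALS = {
    "Tiger":     ("Y", "Pointed", "Forward", "N", "Claw", "Meat",  "Y", "N", "Y"),
    "Cheetah":   ("Y", "Pointed", "Forward", "N", "Claw", "Meat",  "Y", "N", "Y"),
    "Giraffe":   ("Y", "Blunt",   "Side",    "N", "Hoof", "Grass", "Y", "N", "N"),
    "Zebra":     ("Y", "Blunt",   "Side",    "N", "Hoof", "Grass", "Y", "N", "N"),
    "Ostrich":   ("N", "N",       "Side",    "Y", "Claw", "Grain", "N", "N", "N"),
    "Penguin":   ("N", "N",       "Side",    "Y", "Web",  "Fish",  "N", "N", "Y"),
    "Albatross": ("N", "N",       "Side",    "Y", "Claw", "Grain", "N", "Y", "Y"),
    "Eagle":     ("N", "N",       "Forward", "Y", "Claw", "Meat",  "N", "Y", "N"),
    "Viper":     ("N", "Pointed", "Forward", "N", "N",    "Meat",  "N", "N", "N"),
}

def partition(attr):
    """Equivalence classes of animals induced by a single attribute."""
    k = ATTRS.index(attr)
    groups = defaultdict(set)
    for name, row in ANIMALS.items():
        groups[row[k]].add(name)
    return list(groups.values())

def dependency(d, c):
    """Degree to which attribute d depends on attribute c."""
    d_blocks = partition(d)
    # A class of c counts when it fits inside a single class of d
    covered = sum(len(cb) for cb in partition(c)
                  if any(cb <= db for db in d_blocks))
    return covered / len(ANIMALS)

def made_select():
    """Pick the attribute whose descending list of degrees is largest."""
    ranked = {d: sorted((dependency(d, c) for c in ATTRS if c != d), reverse=True)
              for d in ATTRS}
    return max(ATTRS, key=lambda a: ranked[a]), ranked

best, ranked = made_select()
print(best, [round(v, 3) for v in ranked[best]])
```

Comparing the sorted lists element by element means that attributes sharing the same first maximum are separated by their second maximum, then the third, and so on. Hair and Milk induce the same partition and therefore remain tied; `max` simply returns the first of them in the order of `ATTRS`.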
Teeth $\Rightarrow_k$ Hair, where $k = |\{3,4,5,6,7,8\}| / |U| = 6/9 = 0.666$.
Eye $\Rightarrow_k$ Hair, where $k = |\emptyset| / |U| = 0$.
Feather $\Rightarrow_k$ Hair, where $k = |\{5,6,7,8\}| / |U| = 4/9 = 0.444$.
Feet $\Rightarrow_k$ Hair, where $k = |\{3,4,6,9\}| / |U| = 4/9 = 0.444$.
Eat $\Rightarrow_k$ Hair, where $k = |\{3,4,5,6,7\}| / |U| = 5/9 = 0.555$.
Milk $\Rightarrow_k$ Hair, where $k = |\{1,2,3,4,5,6,7,8,9\}| / |U| = 1$.
Fly $\Rightarrow_k$ Hair, where $k = |\{7,8\}| / |U| = 2/9 = 0.222$.
Swim $\Rightarrow_k$ Hair, where $k = |\emptyset| / |U| = 0$.

Figure 7. The attributes dependencies

Similar calculations are performed for all the attributes. These calculations are summarized in Table 8.

Table 8. The dependency degrees of all attributes from Table 7 (each row gives the degree to which the row attribute depends on each column attribute)

Attribute  Hair   Teeth  Eye    Feather  Feet   Eat    Milk   Fly    Swim
Hair       -      0.666  0      0.444    0.444  0.555  1      0.222  0
Teeth      0      -      0      0.444    0.444  0.555  0      0.222  0
Eye        0      0.555  -      0        0.444  1      0      0      0
Feather    0.444  1      0      -        0.444  0.555  0.444  0.222  0
Feet       0      0.222  0      0        -      0.555  0      0.222  0
Eat        0      0.555  0.444  0        0.444  -      0      0      0
Milk       1      0.666  0      0.444    0.444  0.555  -      0.222  0
Fly        0.444  0.555  0      0.555    0.444  0.333  0.444  -      0
Swim       0      0.222  0      0        0.444  0.333  0      0      -

[Figure 8. The maximal attributes dependencies]

With the MADE technique, the first maximum degree of dependency of attributes, i.e., 1, occurs in attributes Hair (Milk), Eye and Feather, as Figure 8 shows. The second maximum degree of dependency of attributes, i.e., 0.666, occurs in attribute Hair. Thus, based on Figure 8, attribute Hair is selected as the clustering attribute.

4.4. Objects splitting

For objects splitting, we use a divide-and-conquer method. For example, in Table 7 we can cluster (partition) the animals based on the decision attribute selected, i.e., Hair/Milk. Notice that the partition of the set of animals induced by attribute Hair/Milk is {{1,2,3,4}, {5,6,7,8,9}}. Based on this, we can split the animals using the hierarchical tree as follows.

The objects: {Tiger, Cheetah, Giraffe, Zebra, Ostrich, Penguin, Albatross, Eagle, Viper}
1st possible clusters: {Tiger, Cheetah, Giraffe, Zebra}, {Ostrich, Penguin, Albatross, Eagle, Viper}
2nd possible clusters: {Tiger, Cheetah}, {Giraffe, Zebra}, {Ostrich, Penguin, Albatross, Eagle}, {Viper}

Figure 9. The objects splitting

The technique is applied recursively to obtain further clusters. At subsequent iterations, the leaf node having more objects is selected for further splitting. The algorithm terminates when it reaches a pre-defined number of clusters. This number is subjective and is decided in advance based on either user requirements or domain knowledge.

5. Comparison Tests

In order to test MADE and compare it with MMR, we use two datasets obtained from the benchmark UCI Machine Learning Repository: the Soybean and Zoo datasets, with 47 and 101 objects, respectively. The purity of clusters was used as a measure to test the quality of the clusters [5]. The purity of a cluster and the overall purity are defined as

\[
\mathrm{Purity}(i) = \frac{\text{the number of data occurring in both the } i\text{th cluster and its corresponding class}}{\text{the number of data in the data set}}
\]

\[
\text{Overall Purity} = \frac{\sum_{i=1}^{\#\text{ of clusters}} \mathrm{Purity}(i)}{\#\text{ of clusters}}
\]

According to this measure, a higher value of overall purity indicates a better clustering result, with perfect clustering yielding a value of 1 [5]. The MMR and MADE algorithms for the Soybean and Zoo datasets are implemented in MATLAB version 7.6.0.324 (R2008a). They are executed sequentially on a machine with Intel Core Duo CPUs, main memory in the gigabyte range, and the Windows XP Professional operating system.

5.1. Soybean dataset

The Soybean dataset contains 47 objects on diseases in soybeans. Each object can be classified as one of four diseases, namely Diaporthe Stem Canker (D1), Charcoal Rot (D2), Rhizoctonia Root Rot (D3), and Phytophthora Rot (D4), and is described by 35 categorical attributes [9]. The dataset comprises 17 objects for the Phytophthora Rot disease and 10 objects for each of the remaining diseases. Since there are four possible diseases, the objects are split into four clusters. The results are summarized in Table 9. All 47 objects belong to the majority class label of the cluster in which they are classified. Thus, the overall purity of the clusters is 100%.

Table 9. The purity of clusters (columns: cluster number; D1; D2; D3; D4; purity; last row: overall purity)

5.2. Zoo dataset

The Zoo dataset comprises 101 objects, where each data point represents information about an animal in terms of 18 categorical attributes [10]. Each animal data point is classified into one of seven classes. Therefore, for MADE, the splitting is set at seven clusters. Table 10 summarizes the results of running the MADE algorithm on the Zoo dataset.

Table 10. The purity of clusters (columns: cluster number; C1; C2; C3; C4; C5; C6; C7; purity; last row: overall purity)

All 101 objects belong to the majority class label of the cluster in which they are classified. Thus, the overall purity of the clusters is 100%.

5.3. Comparison

The comparison of overall purity, computation and response time of MADE and MMR on the Soybean and Zoo datasets is given in Figures 10, 11 and 12, respectively. Based on Table 11, the MADE technique provides a better solution than the MMR technique on both the Soybean and Zoo datasets.

Table 11. The overall improvement of MMR by MADE (columns: clusters purity; computation; response time; rows: Soybean, Zoo)

[Figure 10. The comparison of overall purity]
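The purity measure used in the comparison tests can be sketched in Python as follows. This is an illustration under an explicit assumption: the purity of a cluster is computed relative to the size of that cluster, the reading under which a perfect clustering yields an overall purity of 1, as the text states. The cluster contents and function names below are hypothetical and are not results from the paper.

```python
from collections import Counter

def cluster_purity(labels):
    """Share of a cluster's objects that carry its majority class label.

    Assumption: the denominator is the size of the cluster itself.
    """
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)

def overall_purity(clusters):
    """Mean of the per-cluster purities."""
    return sum(cluster_purity(c) for c in clusters) / len(clusters)

# Hypothetical clusters, each given as the list of true class labels it contains
clusters = [
    ["D1", "D1", "D1"],
    ["D2", "D2", "D1"],
    ["D3"],
]
print(round(overall_purity(clusters), 4))

# A perfect clustering, where every cluster holds a single class, scores 1
perfect = [["D1", "D1"], ["D2"], ["D3", "D3", "D3"]]
assert overall_purity(perfect) == 1.0
```

Averaging per-cluster purities gives every cluster the same weight regardless of its size, so a small impure cluster lowers the overall purity as much as a large one.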
[Figure 11. The comparison of computation]

[Figure 12. The comparison of response time]

6. Conclusion

Categorical data clustering has emerged as a new trend in techniques for handling uncertainty in the clustering process. In this paper, we have proposed MADE, an alternative technique for categorical data clustering using rough set theory based on attributes dependencies. We have proven that the MADE technique is a generalization of the MMR technique which is able to achieve lower computational complexity and higher cluster purity. With this approach, we believe that MADE will be applicable to some applications, such as decision making and clustering very large datasets.

Acknowledgement

This work was supported by the grant of Universiti Tun Hussein Onn Malaysia.
References

[1] Huang, Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery 2 (3) (1998) 283-304.
[2] Kim, D., Lee, K., Lee, D. Fuzzy clustering of categorical data using fuzzy centroids. Pattern Recognition Letters 25 (11) (2004) 1263-1271.
[3] Pawlak, Z. Rough sets. International Journal of Computer and Information Science 11, 1982, 341-356.
[4] Mazlack, L.J., He, A., Zhu, Y., Coppock, S. A rough set approach in choosing partitioning attributes. Proceedings of the ISCA 13th International Conference, CAINE-2000, 2000, 1-6.
[5] Parmar, D., Wu, T. and Blackhurst, J. MMR: An algorithm for clustering categorical data using rough set theory. Data and Knowledge Engineering 63, 2007, 879-893.
[6] Pawlak, Z. and Skowron, A. Rudiments of rough sets. Information Sciences 177 (1), 2007, 3-27.
[7] Yao, Y.Y. Two views of the theory of rough sets in finite universes. Approximate Reasoning 15 (4), 1996, 291-317.
[8] Hu, X. Knowledge discovery in databases: An attribute oriented rough set approach. PhD thesis, University of Regina, 1995.
[9] http://archive.ics.uci.edu/ml/datasets/Soybean+%28Small%29
[10] http://archive.ics.uci.edu/ml/datasets/Zoo
Authors

Tutut Herawan
He is a Ph.D. candidate in Data Mining at Universiti Tun Hussein Onn Malaysia (UTHM). His research area includes Data Mining, KDD and Real Analysis.

Rozaida Ghazali
She received her B.Sc. (Hons) degree in Computer Science from Universiti Sains Malaysia, and her M.Sc. degree in Computer Science from Universiti Teknologi Malaysia. She obtained her Ph.D. degree in Higher Order Neural Networks at Liverpool John Moores University, UK. She is currently a member of the teaching staff at the Faculty of Information Technology and Multimedia, Universiti Tun Hussein Onn Malaysia (UTHM). Her research area includes neural networks, fuzzy logic, financial time series prediction and physical time series forecasting.

Iwan Tri Riyadi Yanto
He is an M.Sc. candidate in Data Mining at Universiti Tun Hussein Onn Malaysia (UTHM). His research area includes Data Mining, KDD and Real Analysis.

Mustafa Mat Deris
He received the B.Sc. from University Putra Malaysia, the M.Sc. from University of Bradford, England, and the Ph.D. from University Putra Malaysia. He is a professor of computer science in the Faculty of Information Technology and Multimedia, UTHM, Malaysia. His research interests include distributed databases, data grid, database performance issues and data mining. He has published more than 8 papers in journals and conference proceedings. He was appointed as one of the editorial board members for the International Journal of Information Technology, World Enformatika Society, and as a reviewer for a special issue of the International Journal of Parallel and Distributed Databases (Elsevier), a special issue of the International Journal of Cluster Computing (Kluwer), the IEEE conference on Cluster and Grid Computing held in Chicago in April, and the Malaysian Journal of Computer Science. He has served as a program committee member for numerous international conferences/workshops including Grid and Peer-to-Peer Computing (GP2P), Autonomic Distributed Data and Storage Systems Management (ADSM), WSEAS, International Association of Science and Technology, IASTED on Database, etc.