the newsletter of the International Computer Science Institute
The ICSI GAZETTE
volume three, issue two, march 2005

in this issue: As I See It by Nelson Morgan, Director; News Briefs; Featured Alum; Visiting Scholars; Publications

featured research: bioinformatics

Ever since Watson and Crick discovered the double-helix structure of DNA, the field of genetics has faced new challenges. Now, the mapping of the human genome holds vast potential to influence or change our lives. But how do we leap from decoding sequences of genetic data to recognizing a predisposition to develop cancer, for example? The task begins with the sheer number of nucleotide bases (commonly known as A, C, G, and T to high school biology students). It is complicated by the fact that every chromosome comes in two copies (one from the mother and one from the father), which creates ambiguity as to which of the two sequences a base belongs to. Together these amount to a seemingly impossible decoding task for genetics researchers. Not only do they need to count and sort billions of nucleotide bases, they also need to extract the sequence on each of the two chromosomes (the haplotypes) from the combined information gleaned from genetic sequencing technology.

This task, though daunting, is no longer impossible, thanks to computational algorithms capable of accurately processing millions of pieces of data per second. The Algorithms Group is increasingly focused on solving questions in biology through the development of algorithms specifically designed to process genetic data. This relatively new field, known as Bioinformatics or Computational Biology, is rapidly changing the way biological research is done. Professor Richard M. Karp and Dr. Eran Halperin are particularly interested in the analysis of genetic regulatory networks, genetic variation, and haplotyping.

[Photo: Eran Halperin. Credit: Diane Starr]
The research currently being conducted by these two talented scientists is providing biologists and genetic disease researchers with valuable information that was previously unavailable.

Halperin coauthored a recent study in Science in which human genetic variation across the whole genome was mapped for individuals from three populations. This was the first study of its kind completed at such a large scope. The results will enable researchers to study the relationship between genetics and disease in humans, and are expected to be used in the development of individually tailored drug treatments for genetically influenced diseases.

The study was a collaboration among Halperin, Eleazar Eskin of UCSD, and scientists at Perlegen Sciences, Inc. Researchers at Perlegen sequenced the single-letter variations (called single-nucleotide polymorphisms, or SNPs) in the DNA of 71 individuals of European American, African American, and Han Chinese American ancestry. Subsequently, scientists at the California Institute for Telecommunications and Information Technology (Calit2) at the University of California, San Diego, and the UC Berkeley-affiliated International Computer Science Institute (ICSI) helped analyze the set of over 100 million genotypes from the over 1.5 million SNPs sequenced in each sample by Perlegen.

[Photo: Richard M. Karp. Credit: Diane Starr]
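The haplotype ambiguity described in the cover story can be made concrete with a small sketch. The function below is purely illustrative and is not how HAP works: it simply enumerates every pair of haplotypes consistent with one person's unphased genotype. The function name and the input format (one unordered allele pair per SNP site) are assumptions made for this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate haplotype pairs consistent with an unphased genotype.

    `genotype` is a list with one unordered allele pair per SNP site,
    e.g. [("A", "G"), ("C", "C"), ("T", "C")]. This input format is an
    assumption of the sketch, not a format used by any real tool.
    """
    # Only heterozygous sites (two different alleles) are ambiguous.
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]

    pairs = set()
    # Fix the orientation of the first heterozygous site so that mirrored
    # pairs (h1, h2) and (h2, h1) are not counted twice.
    for rest in product([False, True], repeat=max(len(het_sites) - 1, 0)):
        flips = (False,) + rest if het_sites else ()
        flip_at = dict(zip(het_sites, flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(("".join(h1), "".join(h2)))
    return sorted(pairs)


# Two heterozygous sites give two possible explanations of the same data:
# [('ACC', 'GCT'), ('ACT', 'GCC')]
print(compatible_haplotype_pairs([("A", "G"), ("C", "C"), ("T", "C")]))
```

With k heterozygous sites there are 2^(k-1) candidate pairs, so exhaustive enumeration stops being practical after a few dozen SNPs. That growth is why dedicated haplotyping algorithms are needed for data sets with more than a million SNPs per individual.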

as i see it
by Nelson Morgan, Director

[Photo credit: Howell Shaw]

I recently had the chance to see the science fiction film I, Robot. I had read quite a bit of Asimov as a child, and so I was curious how the film would treat the series of stories that Asimov wrote starting in the 1950s. I wasn't too shocked that they made an awful mess of it. The movie was entertaining, but not only did it miss the little things from the original work (plot, character development, tone), the major theme was lost as well. In his robot stories, Asimov consistently opposed what he called the Frankenstein complex. He used this term to characterize the perspective dating back to such legends as Prometheus and the Tower of Babel, in which those who would dare to achieve godly heights would be punished for their hubris. Whether it was building tall structures, inventing manmade fire, or developing intelligent systems like robots, such creative acts were often viewed as sacrilege, invariably leading to a terrible fate.

To Asimov, these legends represented Humankind's fear of technology, and he was determined to move past them. His stories were intellectual puzzles, not moral allegories, but if they had a point beyond the cleverness it was that technology is what we make of it. His robots were designed to protect humans, first and foremost (check your favorite search engine for the Three Laws of Robotics for further reading on this). I, Robot goes for a cheap copout by replacing the intellectual problem with loud and heated battle scenes, with robots once again becoming villains. (Incidentally, a fantastic 1977 screenplay by Harlan Ellison represented Asimov's perspective very well, and was recently released in book form.)

There's a deeper point here than just dissing Hollywood. Any scientist or technologist with a social conscience must consider the potential effects, for good or ill, of the science being discovered or the technology being developed.
Most scientists I know (in particular in Berkeley) would not knowingly take part in a project to build better land mines, or more effective nuclear weapons. On the other hand, many of us work on projects that will undoubtedly be used by governments, including their militaries, potentially causing loss of life. The commercial world is no more isolated from Humanity's Inhumanity. Any technological advance of significant importance will be used for purposes that its inventors would not approve of.

Do I then endorse the Frankenstein vision? As the reader might guess, I do not. If we only had clubs to work with, we would still undoubtedly pound each other's heads from time to time. Fire, the wheel, electricity, calculus, quantum theory, computers, and the Internet are all used for good and ill, with the distinction being fuzzy, context sensitive, and subjective. We simply play for bigger stakes now: we have the chance to have a peaceful, prosperous, and creative world, and we also have the opportunity to hit each other with bigger clubs. Expanding knowledge is no replacement for expanding wisdom, but there is not a good case to be made for endorsing ignorance.

One of the key areas of our expanding knowledge is biology. At ICSI, we have been gradually increasing our efforts in a key aspect of this science, bioinformatics. The research group working in this area is featured in this issue of the Gazette. Our work is a small part of the amazing international growth in this topic. The effort is being strengthened in particular by the addition of a new key member of our scientific staff: Eran Halperin. Eran previously visited ICSI as a postdoctoral researcher, and has auspiciously begun his tenure at ICSI with a publication in Science.
Along with Dick Karp, he will be spearheading the further growth of our bioinformatics effort.

Our featured alumnus this issue is Oliver Guenther, the Chair of Information Systems at Humboldt University in Berlin. Oliver is the head of the Förderverein, the German organization that was first organized in the 1980s to fund ICSI and its German visitor program. He is working hard to develop new collaborative arrangements with German industry to supplement the continuing program funded by the German government. He was one of our first postdoctoral Fellows, and has come back in this new capacity to help ICSI once again.

As I've done in previous columns, I am taking the opportunity to announce the birth of another ICSI baby. Congratulations to Mark Allman and his wife Meredith on the birth of their new Networking researcher, Leah Madison Allman!

Finally, I have a sad announcement to make. Ben Gold, a Lincoln Labs researcher who was a biennial ICSI visitor since the early 1990s, passed away on January 15th. Ben was one of the most important pioneers in digital signal processing and speech engineering. I was fortunate in having the opportunity to co-teach a Berkeley course with him, and also to collaborate on our 1999 textbook. Most of all, I was fortunate to know him and to share a friendship. He was a wonderful guy and will be missed.

news briefs

PROFESSOR RICHARD M. KARP, head of the Algorithms Group, will speak at the Institute for Systems Biology's Fourth International Symposium in Seattle, Washington, in April. Karp is also an invited speaker at the 2nd Brazilian Symposium on Graphs, Algorithms and Combinatorics (GRACO 2005), also in April, in Rio de Janeiro, Brazil.

On Friday, February 18, Science published the study "WHOLE-GENOME PATTERNS OF COMMON DNA VARIATION IN THREE HUMAN POPULATIONS." The collaborative study by researchers at ICSI, Perlegen Sciences, and Calit2 developed a map of the human genome for 71 individuals, which is expected to be a major medical tool in fighting genetic diseases. (See cover story for more information.)

[Photo: HAP developer Eran Halperin explains haplotyping]

The American Association for the Advancement of Science released an article entitled "Map of human genetic variation across populations may promise improved disease treatments" on Thursday, February 17. The article is available online from the AAAS newsroom. DR. ERAN HALPERIN of ICSI's Algorithms Group is a co-author of the featured study and is one of the developers of HAP, the haplotyping software used in the study. (See cover story for more information.)

PROFESSOR RICHARD M. KARP (Algorithms) and RODED SHARAN, a former postdoc at ICSI, participated in a study comparing the DNA of baker's yeast, a worm, and fruit flies, which was published in the Proceedings of the National Academy of Sciences on February 8. Karp and Sharan collaborated with researchers from UCSD and the Institute of Genetics in Karlsruhe, Germany, on this study. (See cover story for more information.)

On January 17, 2005, the Spanish Ministry for Education and Science (MEC) posted the 2005 CALL FOR PROPOSALS FOR ICSI'S SPANISH VISITOR PROGRAM. This program provides support for postdoctoral scholars from Spain to participate in research visits to ICSI. The call is now closed, and this year's visitors have been selected.
A new version of HAP, a haplotyping software program designed to aid in the identification of genetic factors in human disease, will be released. ERAN HALPERIN (Algorithms) and Eleazar Eskin of UC San Diego developed this software, which has so far aided hundreds of researchers worldwide in decoding more than 4,000 sets of genetic data. More information on HAP is available online at hap/ as well as in a recent article available on the California Institute for Telecommunications and Information Technology website.

CLARISSA, a spoken dialogue system to be used by astronauts, was delivered to the International Space Station (ISS) by a Russian rocket on December 25. ICSI researcher MANNY RAYNER worked with NASA's Beth Ann Hockey to develop the system, which will read procedures to astronauts and can also answer simple questions, take notes, and display pictures. CLARISSA is fully voice-operated so that astronauts can have their hands free at all times. More information about CLARISSA is available on the NASA website.

[Photo: CLARISSA developer Manny Rayner. Credit: Diane Starr]

JARON LANIER, an inventor, musician and researcher affiliated with ICSI, was featured in The Register on December 27th. He discusses his views on technology, religion and (continued)

featured alum: Oliver Günther

In January 1988, with a newly received PhD in Computer Science from UC Berkeley, Oliver Guenther was looking for a postdoctoral appointment. Having heard about the German government's postdoctoral program with ICSI, he contacted Ron Kay (the acting director), who hired him, making him ICSI's second postdoc. Another ICSI alum, Jeff Bilmes, then an undergraduate at UC Berkeley, served as Guenther's research assistant. Together they continued work based on Guenther's PhD thesis on spatial databases. At the time, relational databases contained only numbers and strings as data. Guenther worked on incorporating non-standard data into relational databases, focusing in particular on geometric objects.

Following his postdoctoral work at ICSI, he accepted an assistant professorship at UC Santa Barbara, working with the National Center for Geographic Information and Analysis to adapt his database work for use with geographic information systems. But Guenther was particularly interested in applying his computer science work to social issues. The opportunity presented itself when Guenther accepted a post at FAW in Ulm, Germany, to head a group using information systems for environmental management and protection, in close cooperation with the state and federal governments and their environmental information systems program.

In 1993, he accepted a full professorship at Humboldt University in Berlin. In the aftermath of WWII, communism, and finally the tearing down of the Berlin Wall in 1989, the venerable university felt the need to move away from political ties to the old regime, so it entered a restructuring period that involved hiring a lot of new staff. Guenther was hired in the Business and Economics department as the Chair of Information Systems, a position he still holds today.
Guenther wrote a textbook on Environmental Information Systems (EIS), which gives a conceptual framework for EIS by structuring the data flow into four phases: data capture, storage, analysis, and metadata management.

[Photo: Oliver Guenther. Courtesy of Oliver Guenther]

In the late 1990s, his focus shifted to privacy and security issues on the Internet and the pricing of IT services. He is currently studying the implications of new technology for personal privacy. A major privacy issue is how Internet users can use online services without divulging too much personal information. One very successful method involves executing the services on encrypted data.

"Data encryption is routinely used to protect communication on the Internet. If you are communicating with your bank, the communication is encrypted (using HTTPS, 'hypertext transfer protocol secure'), so if somebody gets hold of your communication on the way (e.g., by listening in on a router), they won't be able to decipher the actual information," says Guenther. "But this does not protect you from potential abuse on the part of your bank, your retailer, or service provider - the party you're communicating with. They receive the data in the clear and if there is abuse on their part, there is not much you can do about it. You have to trust them. What I am currently working on are mechanisms to keep the data encrypted on the service provider side and then have the service being executed on the encrypted data. This way not even the service provider has access to the data in the clear, which is a major improvement to the user's privacy."

Consumers state that privacy is very important in online transactions, but their behavior does not back this up. A recent study conducted by Guenther and his colleagues on e-commerce focused on consumer privacy issues on the Internet. In the study, people were asked to shop online for digital cameras. In order to find their ideal camera, they were asked questions by a friendly "digital agent". People became so comfortable with this "agent" that they divulged very personal information, despite their stated views on the importance of privacy. The results of this study have been accepted for publication in Communications of the ACM.

Another current privacy issue involves the use of RFID chips, which are used to track purchases, but could also be used to track the people who purchase items. This raises the question: when does tracking cross an ethical line? How can technology be put to use without infringing on consumers' personal privacy?
One proposed solution involves the use of passwords: the consumer assigns a personal password to the chip when making a purchase, so that information cannot be extracted from the chip without the password.

Guenther has strong ties to ICSI today. Although he has not recently been involved in ICSI research, he is working hard to revive the ICSI-FV visitor program. This ten-year program was ICSI's foundation and major source of financing in the late 1980s and early 1990s, with about 200 German visitors and an annual budget of two to three million dollars. The program also had a positive impact on German computer science, because of the quality of the research and of the research staff at ICSI. Today, the German Academic Exchange Service (DAAD) provides funding for a small number of postdoctoral fellowships, but Guenther proposes to expand the program and provide more support for the postdoctoral fellows by finding industrial matching funds. The industrial partners would have the incentive of selecting specific candidates to work on specific projects of interest to the funding company, making the program particularly attractive to companies with offices in the US as well as Germany. With Guenther's commitment and enthusiasm, ICSI hopes the visitor program will be restored to its once flourishing state.
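Guenther's description of a service executed on encrypted data can be illustrated with a toy example. The sketch below is not his mechanism, which the profile does not specify. It uses the textbook Paillier cryptosystem, a well-known additively homomorphic scheme, to show a provider adding two numbers it cannot read. The function names and the tiny primes are assumptions chosen for readability. Keys of this size offer no real security.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key generation with tiny, insecure primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, valid because g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Provider-side step: combine ciphertexts without decrypting them."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    """Recover the plaintext using the private key."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


# The client encrypts, the provider adds, and only the client can decrypt.
n, private_key = keygen()
total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
print(decrypt(n, private_key, total))  # 42
```

Multiplying two ciphertexts adds the hidden plaintexts, so the provider can compute a sum while seeing only random-looking numbers. A scheme like this supports addition only. Services that need richer operations on encrypted data require more elaborate techniques.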

news briefs (continued)

metaphysics with reporter Andrew Orlowski in the article.

ICSI's newly redesigned WEBSITE was launched in February. The new site features a searchable database of ICSI publications and talks, as well as easy-to-find information on current research, news, sponsors, and visitor programs.

DR. ERAN HALPERIN, a former postdoc with Professor Richard M. Karp of the Algorithms Group, returned to ICSI in January to be a full-time research scientist. Dr. Halperin's focus is bioinformatics, a field that uses computational algorithms in biological research. He is particularly interested in genetics, haplotypes and disease association.

SRINI NARAYANAN, leader of ICSI's AI group, started a new position as Adjunct Professor of Cognitive Science at UC Berkeley in January.

ICSI's XORP project was featured by Alex Salkever in Business Week on November 29. The article, which discusses XORP's viability as an alternative to commercial routers, can be found online. The article points out that while the current version of XORP is not at all threatening to router companies such as Cisco, future modifications made possible through funding from Intel, Microsoft and the National Science Foundation could create an attractive alternative to more expensive routers.

[Photo: Atanu Ghosh gives a talk about XORP. Credit: Howell Shaw]

DR. YANG LIU, a researcher in ICSI's Speech Group, received her Ph.D. in Electrical and Computer Engineering from Purdue University on December 19. Dr. Liu defended her thesis, "Structural Event Detection for Rich Transcription of Speech," on December 3rd before committee members Mary Harper, Elizabeth Shriberg (also of ICSI's Speech Group), Leah Jamieson, and Jack Gandour. Her thesis research was conducted at ICSI as part of the DARPA EARS program. Congratulations, Dr. Liu!

[Photo: Leah Madison Allman]

Congratulations to MARK ALLMAN of the Networking Group and his wife Meredith. Their daughter Leah Madison was born on November 30, 2004 at 10:05 pm.
Leah weighs 8 pounds 9 ounces and is 21 inches long. Congratulations also to recent visitors from Spain JAVIER MACIAS AND SIRA PALAZUELOS on the birth of their son Javier, and to TUOMO PIRINEN, a recent visitor from Finland, and his wife, on the birth of their son Onni-Veikko.

BOARD OF TRUSTEES MEETING

ICSI's annual BOARD OF TRUSTEES MEETING was held on October 1. At the meeting, ICSI Founders RON KAY AND NORBERT SZYPERSKI received Distinguished Service Awards, and former EECS Chair Shankar Sastry was elected Chairman of the Board. The Distinguished Service Awards were given for the recipients' seminal roles in the creation of ICSI. Ron Kay was present and thanked the Board and Director Nelson Morgan for the honor, while Norbert Szyperski was unable to pick up his award in person because he was receiving an award from the President of Germany at the same time.

[Photos: Chairman Shankar Sastry; Ron Kay receives a Distinguished Service Award; Norbert Szyperski receives an award from the President of Germany]

2004 Chairman of the Board CLIFF HIGGERSON nominated PROFESSOR SHANKAR SASTRY to succeed him as the new Chairman. Professor Sastry accepted the nomination, and the Board voted unanimously to elect him. ICSI is grateful to Mr. Higgerson for his service as Chairman of the Board, and looks forward to the contributions Professor Sastry will make as the new Chairman.

ICSI TOWN MEETING

The ICSI TOWN MEETING was held on October 12. Professor Nelson Morgan, Director of ICSI, gave a presentation to ICSI staff on the State of the Institute. The presentation covered current research at ICSI, the financial state of the Institute, and several announcements regarding new staff and changes to the Board of Trustees. During the town meeting, Professor Morgan announced that DR. SRINI NARAYANAN would be taking over the responsibility of leading the Artificial Intelligence group. Dr. Narayanan was selected by Professor Feldman to succeed him in this leadership role at ICSI after Professor Feldman was chosen to be the Project Director of Cognitive Science at UC Berkeley, a position he has held since July.

bioinformatics: HAP and protein paths (continued)

This analysis was made possible by the haplotyping program HAP, which was developed by Halperin and Eskin. Over 190 million data points were processed, which would have taken months using other haplotyping software. With HAP running on a cluster of computers at UCSD's Calit2, the data was processed in less than twelve hours. HAP's speed, without loss of accuracy, is what sets it apart from other haplotyping programs such as the widely used PHASE program. This speed enables researchers to process much more data than was previously possible, thus reducing the cost of research on genetic variation. In addition, HAP offers the convenience of being hosted on a webserver, which means that researchers don't have to download any software in order to process their data. More than 4,000 data sets were processed using HAP in the past year, representing a few hundred users. Halperin and Eskin are currently testing a new version of HAP with increased accuracy and speed, as well as new algorithms to process various types of data. The HAP webserver is currently located at http://

Sequences of genetic data provide a code for the creation of proteins. Proteins, and molecular machines composed of interacting proteins, carry out much of the work of the living cell. If DNA is a blueprint, proteins are the foundation built from the blueprint. In order to fully understand cellular functions, it is necessary to understand the protein-protein interactions that occur within cells. Richard M. Karp, the head of the Algorithms Group, recently collaborated on a comparative study of protein-protein interaction in three species: yeast, worm and fly.
Karp, along with former ICSI postdoc Roded Sharan and scientists from UCSD and Germany, developed a widely applicable computational method that provides strong statistical evidence for hundreds of protein complexes and pathways, and thousands of protein functions and protein-protein interactions, that had not previously been observed. This evidence could not have been gleaned from protein interactions in a single species; it requires supporting evidence of conserved interactions across all three. This analysis, "Conserved Patterns of Protein Interaction in Multiple Species," published in the February 8 issue of Proceedings of the National Academy of Sciences, shows that at least seventy-one network regions are conserved across all three species. Many of the predicted functions and interactions would not have been identified from genetic sequence similarity alone, demonstrating that network comparisons provide essential biological information beyond what is gleaned from the genome.

The interpretation of large-scale protein network data depends on the ability to identify significant substructures present in the data, a task that requires computational power. Karp and his collaborators designed linear-time algorithms to find paths in networks under several biologically motivated constraints, and applied their methodology to find protein pathways in the yeast protein-protein interaction network. Their algorithm is capable of reconstructing known pathways and identifying functionally enriched paths.

As these two studies show, computational algorithms are powerful tools in answering biological questions. The ability of computers to carry out millions of computations quickly and accurately allows researchers to study data that they couldn't possibly process manually in a lifetime.
The whole-genome map gives the medical field a valuable tool for determining the relationship of human genetics to disease and, ultimately, for providing better treatments and preventative measures against genetically influenced diseases. The protein-protein interaction study provides valuable insight into the functioning of cellular machinery, and evidence that interactions between proteins provide additional biological information not available through genetic sequences alone.

visiting scholars

Since its inception, ICSI has had a strong international program consisting primarily of ties with specific countries. Current formal agreements exist with Finland, Germany, Spain, and Switzerland. In addition, we often have visitors associated with specific research and projects.

FROM FINLAND: Konsta Koppinen (Speech), Jusso Rantala (Haas), Pauli Ristola, Pasi Sarolahti (Networking), Pertti Tormala
FROM GERMANY: Mesut Guenes (Networking), Rene Beier (Algorithms)
FROM SPAIN: Alberto Amengual (AI), Carmen Pelaez (Speech), Pedro Ruiz (Networking), Carlos Subirats (AI), Francisco Valverde (AI)
FROM SWITZERLAND (IM2): Vincenzo Pallotta (AI), Matthias Zimmerman (Speech)
AMI (EUROPEAN UNION): Mateo Aguilo (Speech), Xavier Anguera (Speech), Marc Ferras (Speech), Frantisek Grezl (Speech), Rosa Martinez (Speech), Michael Pucher (Speech)
XORP: Bruce Simpson (Networking), Marko Zec (Networking)

[Photo, left to right: Xavier Anguera, Frantisek Grezl, Marc Ferras, Michael Pucher. Photos on this page by Diane Starr]

ICSI open house

The ICSI Open House took place on Thursday, February 10. This year's event included an introduction to the Institute, a feature presentation on XORP: eXtensible Open Router Platform, and demonstrations of the following technological developments at ICSI: Rapid Speech Prototyping: The Tamil Recognizer; The ICSI Meeting Project; HAP: Haplotype Resolution using Imperfect Phylogeny; CoPe: Community of Practice Environment; and FrameNet.

[Photos, clockwise from top left: Eran Halperin demonstrates haplotyping software, Adam Janin demonstrates speech technology for meetings, Chuck Fillmore demonstrates new FrameNet developments, Collin Baker discusses FrameNet with a guest, Chuck Wooters demonstrates the Tamil Recognizer]

publications (continued)

R. SOMMER AND V. PAXSON. Exploiting Independent State for Network Intrusion Detection. Technical Report TUM-10420, Technische Universitaet Muenchen, November.
S. STANIFORD, D. MOORE, V. PAXSON, AND N. WEAVER. The Top Speed of Flash Worms. Proceedings of the ACM CCS WORM, October.
J. STRIBLING, I.G. COUNCILL, J. LI, M. FRANS KAASHOEK, D.R. KARGER, R. MORRIS, AND S. SHENKER. OverCite: A Cooperative Digital Research Library. 4th International Workshop on Peer-to-Peer Systems (IPTPS'05), Ithaca, NY, February.
T. TANTAU. Ueber Strukturelle Gemeinsamkeiten der Aufzaehlbarkeitsklassen von Turingmaschinen und endlichen Automaten. In Ausgezeichnete Informatikdissertationen 2003, Lecture Notes in Informatics, Springer-Verlag.
M. WALFISH, J. STRIBLING, M. KROHN, H. BALAKRISHNAN, R. MORRIS, AND S. SHENKER. Middleboxes No Longer Considered Harmful. Proceedings of USENIX OSDI, San Francisco, CA, December.
J. WANG, A. MAJUMDAR, K. RAMCHANDRAN, AND H. GARUDADRI. Robust Video Transmission over a Lossy Network Using a Distributed Source Coded Auxiliary Channel. In Proceedings of Picture Coding Symposium (PCS).
N. WEAVER, I. HAMADEH, G. KESIDIS, AND V. PAXSON. Preliminary Results Using Scale-Down to Explore Worm Dynamics. Proceedings of the ACM CCS WORM, October.
C. WENDELKEN AND L. SHASTRI. Multiple Instantiation and Rule Mediation in SHRUTI. Connection Science, 16.
E.P. XING, W. WU, M.I. JORDAN, AND R.M. KARP. LOGOS: A Modular Bayesian Model for de novo Motif Detection. Journal of Bioinformatics and Computational Biology 2.

A SEARCHABLE DATABASE OF ICSI PUBLICATIONS IS AVAILABLE ON OUR WEBSITE AT CIG-BIN/PUBS/INDEX.PL

tamil recognizer

Researchers at ICSI develop speech recognition technology for India with UC Berkeley's TIER Project (Technology and Infrastructure for Emerging Regions).

Members of ICSI's Speech Group are working to provide speech recognition technology to UC Berkeley's TIER project.
Technologies developed for the affluent world and imported to developing regions often fail to address key challenges in cost, deployment, power consumption, and support for semi-literate and illiterate users. This issue prompted Chuck Wooters and Madelaine Plauché to begin developing a speech recognizer for Tamil, a language spoken by over 50 million people in Southeast India, where illiteracy rates hover around 50 percent for men and between 60 and 80 percent for women. Speech recognition, especially in combination with text-to-speech and visual user interfaces, may be key to increasing access to technology for those with limited or no literacy.

One of the many challenges these researchers face is how to develop speech recognition that can support different dialects or accents. Tamil, like most languages, comprises several mutually intelligible dialects that vary by geography, social factors (caste), and register (formal vs. informal). Tamil is itself one of dozens of languages spoken in India, suggesting that for many users of the technology, Tamil may not be their primary language. An ideal speech recognition system would support the small but significant variations in pronunciation due to these factors.

UC Berkeley researchers have designed a data collection system for a Tamil speech recognition system, and collected data on 30 words in Tamil from 8 speakers on the UCB campus and 22 speakers in India. In February, Plauché traveled to three different sites in Tamil Nadu, India, to collect data from more native speakers saying digits and command words in Tamil. She has sampled the speech of both uneducated and educated speakers, urban and rural speakers, and speakers of three different geographical dialects, to further investigate how dialect and geographic location may affect recognition error rates.
Plauché and Wooters have also created a sample speech recognition application called Tamil Market, a simulated toll-free telephone number that allows farmers and other rural community members to get information on market prices for agricultural crops, local weather, and agricultural innovations over the telephone. Tamil Market is designed for information inquiry, relies exclusively on speech recognition for its user interface, and runs on a vocabulary of only 30 Tamil words: digits, some crop names, and selected command words. By allowing Tamil speakers in both urban and rural areas of Tamil Nadu to test this application, they hope to learn how speech recognition can be successfully integrated into useful applications for developing regions.

BOARD OF TRUSTEES
Charlie Bass
Hervé Bourlard
Beth Burnside
Jordan Cohen
Adele Goldberg
Greg Heinzinger
Clifford Higgerson
Richard Karp
Pedro Lizcano
Jitendra Malik
Nelson Morgan
David Nagel
Ilpo Reitmaa
Shankar Sastry
Scott Shenker
Wolfgang Wahlster