R&D PROGRAMME FOR EXCHANGE OF ICT RESEARCHERS & ENGINEERS

FINAL PROJECT REPORT

Research on IGOS Linux Voice Command in Bahasa Indonesia to Aid People with Different Abilities and Illiteracy

REPORTED BY: MUSTAPA WANGSAATMADJA / CHIEF RESEARCHER
July 24th, 2008

R&D CENTER, PT TELEKOMUNIKASI INDONESIA
2008
1 - Executive Summary

This project aims to support information access for difable (differently abled) people, especially those with visual impairments or physical handicaps that impair their ability to use a keyboard, and for illiterate people who need a jump start in accessing information. Implementing voice command in the operating system will help people in these segments interact and work with a computer. With easier access to computers, we hope that they can improve their quality of life and support their daily activities through ICT.

This research project proposes to build a voice command engine for the Linux operating system. The system incorporates Bahasa Indonesia Automatic Speech Recognition (ASR). A spoken command in Bahasa Indonesia is recognized by the voice command engine and subsequently triggers a relevant action in the operating system environment. Actions range from starting and stopping an application and writing a document to connecting to the Internet to send or retrieve e-mail and managing the operating system.

The chosen Linux distribution is the one endorsed by the government of Indonesia for use in homes and offices: IGOS Nusantara Linux. IGOS stands for Indonesia Goes Open Source. By choosing Linux as the research platform, we hope to advance the Indonesian government's effort to reduce the use of illegal software and to provide a cheap, affordable system that still offers rich features such as the voice command system proposed in this project.

This final report presents the progress of the APT project from initiation through the final phase. According to the agreed time schedule, the project is scheduled to finish at the end of July 2008. The report gives a brief project summary, followed by a more detailed description of each project phase.
With the support of the Advanced Telecommunication Research (ATR) Spoken Language Research Lab. in Japan, we arrived at a better solution and shared experience in developing a voice command system on Linux. Budget allocation and costs related to team activities are summarized in the financial report, with more detail in the attachments.

In June 2008, the whole team gathered at ATR Japan to demonstrate the IGOS Linux Voice Command (ILVC) application that had been developed. The demonstration was followed by discussions on specific ASR-related topics such as Screen Reader and Corpus-based Text to Speech, and on the design of the next version of ILVC, which is intended to be a more robust system.
1.1 Introduction

This project was selected by the Ministry of Internal Affairs and Communications of Japan and the APT committee as one of the recipients of the HRD program for exchange of ICT researchers and engineers award. The project commenced in September 2007 and is expected to be completed by July 2008. The total budget required for the project is $46,897. For the interim project term, the funds transferred were $28,138.

This research & development project involves 10 (ten) Indonesian researchers/engineers and 2 (two) ATR researchers/engineers, distributed as follows:

1. TELKOM R&D CENTER: 6 researchers/engineers
2. Badan Pengkajian dan Penerapan Teknologi (BPPT): 4 researchers/engineers
3. Advanced Telecommunication Research (ATR) Japan: 2 researchers/engineers

With recent progress in the development of Bahasa Indonesia speech processing technology, we hope that this project can initiate an effort to apply the technology in support of our government's effort to reduce the digital divide and to provide equal opportunity in accessing ICT, and hence information, with people with handicaps, visual problems and illiteracy as the special target.

It is also expected that by acquiring the IGOS Linux Voice Command in Bahasa Indonesia, Telkom R&D will be able to implement the technology in Indonesia's national telecommunication system. This is likely to shift the telecommunication paradigm in Indonesia, as people start to make use of speech recognition technology in areas such as voice-to-SMS, digit collection by voice, telecommunication security by voice, and many more.

2 - Project Objective

The objectives of this project are:

a. To implement speech processing technology in a voice command engine as an alternative user interface for IGOS Linux, so that this official Linux distribution becomes accessible to difables and illiterates.

b. To provide a way for difables and illiterates to interact with a computer, so that they can have far better access to ICT. This will hopefully contribute to:
   o Reducing the digital divide
   o Providing equal opportunity for ICT access
   o Providing equal opportunity for information access
   o Providing a jump start in accessing information to illiterates, so that they can start learning to read and write faster
   o Eventually helping difables and illiterates to improve their quality of life
3 - Overview of IGOS Linux Voice Command in Bahasa Indonesia System

Indonesia, as an emerging market, provides a great opportunity for ICT growth. This growth is supported by the development of ICT infrastructure and the introduction of new technologies that offer not only easy access to information but also the possibility of rich content deployment. While this development is mostly centered in big cities, the government, with help from the private sector, makes a continuous effort to increase access to ICT in suburban areas and eastern Indonesia in order to reduce the digital divide and to provide equal opportunity for ICT, and hence information, access.

However, this effort may face problems in the majority of suburban and underdeveloped areas in eastern Indonesia because of the poor literacy level in these areas. Another concern is the relatively high number of difable people. According to a WHO estimate, the percentage of difable people could reach between 7% and 10% in developing countries, including Indonesia. Of that number, a recent survey stated that around 57% fall within the handicap and blind categories: 40% have a handicap and 17% are blind. With a total population of 220 million, this accounts for roughly 10 million people with a handicap or blindness (57% of 7% to 10% of 220 million is about 8.8 to 12.5 million). This is a rough number.

As noted in the introduction, this project applies recent progress in Bahasa Indonesia speech processing technology to help reduce the digital divide, with people with handicaps, visual problems and illiteracy as the special target.

The voice command engine consists of three main components:

1. The voice command interface. This component receives the voice command spoken by the user, performs end-point detection (EPD) to detect the end of speech, creates an appropriate form of the speech signal, and transfers the speech signal to the ASR for recognition.

2. Bahasa Indonesia ASR. This component performs the general speech recognition tasks: speech feature extraction, acoustic modeling and decoding. It outputs a word hypothesis in the form of text, which is transferred to the voice command processor for the next step.

3. Voice command processor. This component validates the command and makes the function or procedure calls that implement the relevant actions in applications or in the operating system.
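The role of the voice command processor described above can be illustrated with a minimal sketch. It assumes the ASR hands over its word hypothesis as a plain text string; the function names (make_commands, normalize, process) and the action descriptions are hypothetical and are not taken from the ILVC source. The command phrases are the English examples mentioned in this report, and the actions only record what would be run instead of calling into the desktop.

```python
from typing import Callable, Dict, List


def make_commands(log: List[str]) -> Dict[str, Callable[[], None]]:
    """Build a hypothetical command table.

    Each action only appends a description to `log`; a real processor
    would make a function or procedure call into the desktop instead.
    """
    return {
        "open email": lambda: log.append("launch mail client"),
        "new email": lambda: log.append("compose new message"),
        "reply email": lambda: log.append("reply to selected message"),
    }


def normalize(hypothesis: str) -> str:
    """Lower-case the ASR text output and collapse repeated whitespace."""
    return " ".join(hypothesis.lower().split())


def process(hypothesis: str, commands: Dict[str, Callable[[], None]]) -> bool:
    """Validate a word hypothesis and run its action.

    Returns False when the text is not a known command, so the caller
    can ignore it or ask the user to repeat.
    """
    action = commands.get(normalize(hypothesis))
    if action is None:
        return False
    action()
    return True
```

Keeping validation (the table lookup) separate from the action itself means that unrecognized or misrecognized text never reaches the operating system, which matters when the input comes from a speech recognizer rather than a keyboard.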
Fig. 1. Voice Command Engine Structure

This system will become the basis for further improvements such as Text-to-Speech, which would be very helpful for reading text aloud in various applications such as word processing documents, web content and e-mail content.

Part of the application uses the JULIAN/JULIUS engine as an open source library. These modules will then be delivered as a free distribution for IGOS Linux. Another version of the software, intended for commercial use, will be developed using ATRIUMS (ATRASR technology), so many parts of our application require collaboration with researchers at ATR Labs. This collaboration will help all team members gain programming experience in developing with the ATR library.

Fig. 2. Linux Voice Command Information Flow Illustration: Opening/Sending E-Mail

We plan to hold a system demo and to invite several concerned bodies to see it and to give their assessment and feedback on the system. The assessment results and feedback will hopefully become useful input for future system enhancements.
4 - Summary of Project Cost

As defined in the project milestones in our proposal, we have now finished the project and finalized the Building the Application phase, which focused on the development of a speech & text corpus specific to the Linux command domain, development of the ASR subsystem, Linux voice command design and development, system functional tests, and documentation.

The financial statement and the related receipts are provided as separate documents in the following files:

1. Accounting Report IGOS Linux Voice Command
   Filename: Accounting_IGOS_LVC.pdf
2. Receipts on Accounting Report
   Filename: Receipts_IGOS_LVC.pdf

The accounting report is summarized as follows:
5 - Summary of the Project Activities

At the beginning, all team members agreed to develop the voice command engine on the Linux platform. Because the IGOS Linux distro ships with many built-in applications, we had to decide which applications to support and which related words to record as voice commands.

As reported in the Interim Report, both the Indonesian researchers/engineers and their Japanese counterparts actively supported the project. Some activities were held virtually via e-mail, instant messenger and telephone. The steps taken to complete the project are described below.

5.1 Steps of Project Activities

As described earlier, the whole team agreed on the scope of work and the project timeline directed by the project manager. Through the opportunity for joint development with ATR, we expected a technology transfer in speech processing implementation specific to the Linux operating system. ATR is a leading institution in Japan in speech processing research, development and implementation (particularly on Linux), and we believe it can be a source of knowledge for further research and development of the system proposed in this project.

Project activities that had been completed by July 28th, 2008 are as follows:

1. Project Planning
   o Planning, including project time schedule, project resource allocation and management
   o Analysis & design of the Linux voice engine system
   o Preparing text data, lexicon and phoneme list
   o Speaker list preparation and recording

2. Hardware Procurement & Facilities Development
   o Computer equipment for speech analysis
   o Recording equipment such as an A/D converter and a sensitive microphone

3. System Development
   o Workshop on IGOS Linux Voice Command
   o Linux voice command system design
   o Development of a speech & text corpus specific to the GNOME Linux, Firefox and Thunderbird mail client command domain
   o System functional test
   o Documentation (including interim reporting)

4. ATRASR Transfer Knowledge and Benchmark Visit to Japan
   o Inter Process Communications (IPC)
   o ATRASR Software
   o Review of Next Research and Development Plan
   o Voice Recognition Service on Mobile Telco Operator

5. Workshop and Field Trial

6. Documentation (including final report)
5.2 Detail Project Activities

As described earlier, the whole team agreed on the scope of work and the project timeline directed by the project manager. Through the opportunity for joint development with ATR, we expected a technology transfer in speech processing implementation specific to the Linux operating system.

5.3 Detail Research Activities within Interim Report

The detailed research activities completed within the interim project term are:

5.3.1 Project Planning

The first task was for all team members to become familiar with the IGOS Nusantara Linux distro. This was a prerequisite for every team member before stepping into the next development process. The team then gathered for an envisioning process to discuss planning and design as part of the project steps. Although this discussion was held in Bandung, Indonesia, we also collaborated with experts and researchers at the ATR Spoken Language Translation Research Laboratories in Kyoto, Japan.

All team members then agreed to develop the voice command engine on the Linux platform. In the in-depth discussion we also chose the specific applications that most people use on their desktop, such as an e-mail client and an Internet browser. These applications must be common and widely used, especially by difables. After choosing the applications, we moved on to defining the words that are appropriate and often used, such as open email, new email, reply email and so on.

At this point, we coordinated between team members and the Japanese counterparts regarding the detailed concept, related joint research and development activities, and resource sharing. This was the first activity, and it guided the team through the rest of the project steps.

5.3.2 Hardware Requirement and Facilities Development

The important step before the team entered the development step was the recording process, specifically collecting the voices of up to 50 male and 50 female speakers.
These voices are an important source for extracting voice characteristics and for acoustic modeling. The hardware requirements and facilities we set up for the IGOS Linux Voice Command recording phase are:

1. The number of speakers is 100. Each speaker utters 367 sentences (specifically for the GNOME desktop, e-mail client and Internet browser application domain), that is, up to 36,700 utterances in total. Speakers must be able to speak standard Indonesian (EYD), must be between 15 and 60 years old, and must be balanced by gender (male and female).

2. The soundproof room for speech recording must meet the following technical requirements:
   - Sound insulation level: approx. 30 dB to 50 dB at 500 Hz
   - Background noise level: approx. 20 dB
   - Reverberation time: approx. 0 to 0.3 s
   - Reverberation time measurement follows ISO 140-4:1998.

3. A lexicon list for each corresponding word will be developed. Along with the known phoneme list, the text data and the lexicon list will be used in building the phonetically balanced text data.

The recording environment, that is, the configuration of the infrastructure used in the development of IGOS Linux Voice Command in Bahasa Indonesia, can be described as follows:

(Figure: IGOS Linux Voice Command Configuration System)

The configuration above is the standard development environment the team used for the IGOS Linux Voice Command project. This recording configuration is one of the topics for our team's discussion with researchers at ATR Japan. At the time of the interim report, this technology transfer session was planned to be held in Japan around April 2008.

5.3.3 System Development

To reach a shared understanding of the IGOS Linux Voice Command project, we held a workshop on 17th January 2008 that was attended not only by team members but also by education institutions in Bandung, such as UNPAD and STT TELKOM. Documentation of the workshop is attached as attachment #2 of this document.
The workshop aimed to gather information and to brainstorm among all team members on solutions for the IGOS Linux Voice Command project, in the hope of arriving at better and more cost-efficient solutions for each activity. The workshop also focused on the libraries and mechanisms used by IGOS to carry out commands triggered by voice: the BSD socket API, X11 signals/events, and library/function calls in GLib, GTK+ and GDK.

The development kits and APIs (Application Programming Interfaces) the team used in ILVC are GTK+, GDK, GLib, wnck, SPI, ATK, Xmu and X11. In detail:

GLib:
   g_spawn_command_line_async()
wnck (Window Navigator Construction Kit):
   wnck_screen_get_default()
   wnck_screen_get_active_window(screen)
   wnck_window_minimize(window)
SPI:
   SPI_generateKeyboardEvent()
GDK:
   gdk_display_get_pointer()
Xmu/X11:
   XKeysymToKeycode()

Also at this stage, all team members agreed on the words to be used as voice commands. These are the words used in the recording step. Details of the words are given in attachment #4.

5.4 Detail Research Activities after Interim Report

Research activities after the Interim Report focused on continuing the development phase of the IGOS Linux Voice Command system. The application was targeted to be finished at the end of May 2008, so that we could prepare an application demonstration at ATR Labs in Japan. The detailed research activities completed are:

5.4.1 ATRASR Transfer Knowledge and Benchmark Visit to Japan

A technology transfer session was held with ATR Japan researchers/engineers. The session ran for 10 days in June 2008 at ATR headquarters in Kyoto and covered the details of the Automatic Speech Recognition (ASR) system built by ATR and the techniques for developing a speech database/corpus that is valid for use in the ASR system. Among the topics discussed with ATR were:

1. ASR system details and the preparation for speech database development on the Linux platform.
2. Techniques for designing, collecting/recording speech and managing a large-vocabulary speech database.
3. Quality checking of the speech database/corpus.
4. Implementation of the End Point Detection technique in the ASR system.

A follow-up discussion with ATR researchers on the progress achieved, the problems identified during speech recording, possible solutions and the strategy for the next activities was scheduled for mid-July 2008 in Bandung, Indonesia.
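To make the End Point Detection topic (item 4 above) more concrete, the following is a minimal sketch of one common approach, short-time energy thresholding with a trailing-silence counter. It is an illustration under stated assumptions and is not the technique implemented in ATRASR or ILVC, which this report does not describe. The function names, the frame length of 160 samples (10 ms if the audio is sampled at 16 kHz), the energy threshold and the hangover count are all hypothetical values chosen for the example.

```python
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_endpoints(
    samples: List[float],
    frame_len: int = 160,      # assumed: 10 ms at a 16 kHz sampling rate
    threshold: float = 0.01,   # assumed energy threshold
    hangover: int = 3,         # quiet frames required to declare end of speech
) -> Optional[Tuple[int, int]]:
    """Return (start_sample, end_sample) of the first utterance, or None."""
    start_frame = None
    last_speech_frame = None
    quiet_run = 0
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy > threshold:
            if start_frame is None:
                start_frame = i          # first frame above threshold
            last_speech_frame = i
            quiet_run = 0
        elif start_frame is not None:
            quiet_run += 1
            if quiet_run >= hangover:
                break                    # enough trailing silence: end of speech
    if start_frame is None:
        return None
    return start_frame * frame_len, (last_speech_frame + 1) * frame_len
```

For example, a signal made of 320 silent samples, 480 loud samples and 800 silent samples would be trimmed to the loud segment only. The hangover counter prevents a short pause inside a command from being mistaken for the end of speech; in a noisy room the fixed threshold would need to be replaced by one estimated from the background level.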
Several interesting topics also came up in our discussion as a continuation of the APT IGOS Linux Voice Command project, and these became our ideas for future research & development proposals to APT. They include Indonesian Speech Dictation, an Indonesian Screen Reader and a Voice Portal.

Regarding those topics, we also held a meeting in Tokyo with NTT DoCoMo, the biggest telco operator in Japan. The discussion covered delivering new voice-based application services to the market. Through the discussion we understood that in some situations voice-based application services may need larger bandwidth and new network equipment. On the trip back to Keihanna, we stopped by one of the larger Mitsubishi factories in Osaka, which produces high-quality fiber optic infrastructure, so that there would be no barrier to delivering high-bandwidth services to end users' mobile terminals. This will encourage the management of PT TELKOM to be more confident in creating new voice-based application services.

Voice-based application services are already operating at NTT: an expert at ATR Labs showed us a voice recognition application combined with a map through the Global Positioning System (GPS) on a mobile device. We hope this will encourage researchers in Indonesia to come up with the next commercial release of a voice-based application service. The minutes of meeting for the visit are attached to the report as a separate file: MoM TELKOM-ATR June 28, 2008.pdf.

5.4.2 Workshop and Field Trial

As part of the Joint Cooperation Program, and as stated in the minutes of meeting, we invited Japanese researchers to attend a workshop in Indonesia on preparing the IGOS Linux Voice Command implementation. The workshop was divided into 2 sessions. The first session was held from July 14 to July 17, 2008 in Jakarta, with Ms Sakriani Sakti from ATR invited to transfer knowledge on ATRASR.
The second session was held on July 18, 2008 in Bandung to trial the application with difable people.

At the workshop in Jakarta, most team members attended and received a knowledge transfer from ATR on how to use the ATRASR v.3.31 product. The team was also encouraged to use the voices recorded earlier and to plan building an acoustic model for speech recognition. The workshop closed on Thursday, July 17, 2008, once the team had a basic understanding of the ATRASR technology.

The field trial of the IGOS Linux Voice Command (IGOS LVC) project was held in Bandung on July 18, 2008. We invited more than nine difable people to try the voice-based command application and to give comments and an evaluation of the service. The event was also attended by the news media, for which we had prepared a press release on IGOS LVC.
5.4.3 Documentation (including final report)

The last step of the project was to document all the effort and processes we went through from the beginning of the project, including the photos, and to finalize the financial report. With the completion of the project, the related documents are compiled in separate files.

6 - Recommendations

Having finished the development of the IGOS Linux Voice Command project, we are very grateful to the Asia Pacific Telecommunity (APT) for funding the project and hope that it will continue to fund upcoming joint research. We also encourage people who are interested in voice command recognition to continue our research, specifically on the following topics:

1. Indonesian Speech Dictation
   A speech recognition application that performs text dictation into an application (e.g. e-mail, Word). It must focus on a specific domain (medical, business).

2. Indonesian Screen Reader
   A text-to-speech application that acts as a screen reader and also responds to voice commands.

3. Voice Portal
   A telecommunication service that delivers up-to-date information from a news portal by voice.

The topics above are also our recommendation to APT. As TELKOM is a public company, we are also making an effort to implement the IGOS Linux Voice Command application as part of our Corporate Social Responsibility (CSR) program. We hope that illiterate people in villages far from the cities can also benefit from the application.