Working Paper No. 20
15 May 2003
ENGLISH ONLY

UN STATISTICAL COMMISSION and UN ECONOMIC COMMISSION FOR EUROPE
STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)
CONFERENCE OF EUROPEAN STATISTICIANS

Joint ECE-EUROSTAT Work Session on Population and Housing Censuses
(Ohrid, The former Yugoslav Republic of Macedonia, 21-23 May 2003)

Session I
Supporting paper

EXPERIENCES IN SSORM IN NEW TECHNOLOGIES IN THE PROCESS OF CONDUCTING, PROCESSING AND EDITING THE DATA FROM THE CENSUS OF POPULATION, HOUSEHOLDS AND DWELLINGS 2002 [1]

Submitted by The former Yugoslav Republic of Macedonia [2]

I. INTRODUCTION

1. Since the end of the Second World War, eight censuses of the population, households and dwellings have been taken in the country, i.e. in 1948, 1953, 1961, 1971, 1981, 1991, 1994 and 2002. All censuses taken in the period between 1961 and 1991 were conducted in accordance with the relevant UN recommendations, with their content extended with each census. The last census conducted with a 10-year periodicity, in 1991, did not have complete coverage. That is why another census was conducted in the same decade, in 1994.

2. In all previous censuses, as well as in the last one, which was carried out from 1 to 15 November 2002 with the reference date 31 October 2002 at 24:00 hrs, the data were collected in the traditional way, on printed forms, through interviews conducted by trained enumerators.

[1] This paper has been reproduced in the form in which it was received, because the ECE secretariat received it too late to subject it to any form of formal editing. It is being made available to all participants in this form so that they will be able to read it prior to coming to the meeting. The designations of all terms used in the paper are those of the authors and are not necessarily those of either the United Nations or the European Commission.
[2] Paper prepared by Slobodan Karajovanovic, Mira Deleva, Biljana Ristevska, Valentina Dzukeska and Aleksa Petreski, State Statistical Office of the Republic of Macedonia.

II. THE DATA PROCESSING IN THE PREVIOUS CENSUSES

3. The data processing in the previous censuses was done in accordance with the technology, equipment and human resources available in the State Statistical Office. The last of these censuses, in 1994, was processed on a mainframe (UNISYS A12, MCP/AS operating system). Data entry, controls, derivation of data for tabulation and report generation were implemented in the COBOL 74 programming language. The data were processed on the batch processing principle. In the data entry phase, limited on-line controls for the accuracy of the data were applied. Errors found by the contingency and logical controls were printed on paper, checked against the source material (the questionnaires) from the field, and the corrections were then entered into the computer. This way of processing was acceptable in the IT environment of the SSORM at that time, but under the new conditions, with modern technologies and tools available, a new way of data processing has to be implemented.

III. THE DATA PROCESSING IN THE PILOT CENSUS IN THE YEAR 1999 AND TEST CENSUS IN THE YEAR 2000

4. Within the framework of the Census of Population, Households and Dwellings in the country, 2002, the SSORM conducted two trial censuses: a pilot census in October 1999 and a test census in April 2000.

5. The SSORM paid special attention to investigating the possibilities of applying new technologies or instruments for data collection and data processing that could shorten some phases or eliminate an entire census phase. For data collection in the Test Census 2000 we used handheld computers, and for data processing we used an approach different from previous censuses (development of applications with visual tools). The data processing was tested and carried out in a client-server environment:
- RISC UNIX (AIX) operating system
- DB2 relational database
- PC with Windows NT and Access 97
- ODBC (Open Database Connectivity) connection between client and server

6. Because of the complex organization required (field support, data collection, data protection, training of the enumerators) and the high price of the handheld computers, this design was not accepted.

IV. FOLLOW UP OF THE CENSUS FIELD ACTIVITIES AND FIRST RESULTS

7. For the conduct of the Census, 39 census regions were established, covering 123 municipalities. Two laptops and one printer were installed in each census region for the needs of the state instructors and the commissions of the census regions in carrying out the census and communicating with the State Statistical Office (SSO). To this end, the following activities were undertaken:
- Two direct telephone lines were established in the SSO for communication between the census regions and the SSO
- RAS and FTP servers were set up in the SSO in a Windows 2000 Server environment
- E-mail addresses were opened for each census region for communication from the SSO to the census regions and among the census regions
- An application for finance was developed
- An application for monitoring the daily and cumulative flow of the census was developed
- An application for automatic transfer of the data from the census regions to the SSO was established
- An application for automatic gathering of the data was established (each day by 9 o'clock in the morning)
- An application for input and control of the first results and for generating reports was developed
- On-line technical support was provided 24 hours a day

8. After the census field activities were finished, and while the material was being collected in the census regions, first results were prepared at census district level. Using the application, the first results were entered from the P-4 control form at census district level:
- Total population
  i. Total population present in the country
  ii. Absent abroad up to one year
  iii. Foreign citizens present in the country less than a year, refugees, persons under humanitarian care and others
- Total households
- Total dwellings

9. The data entered in this way were sent directly to the SSO over a modem line, where they were checked and the first results were prepared in the appropriate form.

10. Part of the equipment used was a donation from USAID, i.e. 38 Compaq Presario 2800 laptops and 38 Epson C42 printers (20 of them will be in SSO ownership); the other laptops, IBM ThinkPad R31, are owned by the SSO.

V. DATA PROCESSING

11. The processing of the census data is carried out in two phases: manual processing and electronic processing.

12. The aim of the manual processing is to prepare the data for the electronic processing. It comprises the following processes:
- Sorting and checking the material from the census
- Establishing the number of enumerated units
- Linking of the data in the census forms (visual control)
- Control of the households and families
- Logical control of the range of the data and of the connections between the data in the census forms
- Coding of certain data (ethnic characteristics - ethnic affiliation, religion, mother tongue; economic characteristics - occupation, main activity; educational characteristics - the highest completed school; territorial characteristics - address)

13. After the manual processing of a municipality is completed, the census material is sent to the department for electronic processing. The first phase of the electronic processing is the entry of the data from the census forms. This takes place on 45 computers in two shifts.
For this purpose the State Bureau of Statistics received a donation from USAID: 100 Dell personal computers, one server and 2 HP LaserJet printers.

14. The electronic processing of the data is carried out through a data entry program written in Visual Basic. The data are organized in a DB2 relational database. The database was set up on a separate IBM pSeries 620 server.

15. The data are organized in the following tables:
- P1: data from the P1 and P3 enumeration forms for persons (primary key: municipality, district, apartment, household, person)
- P2: data from the P2 questionnaire for households and dwellings (primary key: municipality, district, apartment, household)
- PD1: data from the PD1 additional form for persons (primary key: municipality, district, apartment, household, person)
- P1G: data from the P1 and P3 enumeration forms for persons (primary key: municipality, district, apartment, household, person, time-date)
- P2G: data from the P2 questionnaire for households and dwellings (primary key: municipality, district, apartment, household, time-date)
- All reference tables (census districts, municipalities, settlements, states, streets, main activity, occupations, schools, nationality, religion, language)

16. The data entry application enables on-line control of the data according to the rulebook. Correct data are written to the tables P1, P2 and PD1. Data that did not pass the on-line control are written to the tables P1G and P2G with the indicator gr - error and the type of error. These tables also record the history of the data. Each table has an indicator field recording the status of the data, i.e. what change was made to the data (vn - entry, ko - correction, and br - deletion, only for data with incorrect identification). The data also include information about the operator who made the change and the time of the change.

17. During data entry the program enables control of the personal data (EMBG - the unique citizen identification number, name, surname, name of the parent and sex) using the statistical population register. The program also enables coding of the data for settlements and states. The program was designed to follow the order in which the census forms are filled out.

18. Several auxiliary applications were deployed with the aim of increasing the quality of the data and considerably reducing errors: search through the codes for main activity, occupations and education, and through the statistical population register, the statistical business register and the statistical territorial register. Applications were also deployed for monitoring the work of the data entry operators, their daily and cumulative output, and the coverage of the entered material by municipality and by census district.

19. After the data entry phase is finished for a municipality, the second phase of the electronic processing, batch processing, begins.
Batch processing proceeds in the following established order:
- Contingency control - in this phase the following work has to be done:
  o Control of the logical completeness
  o Control of whether data for the household or dwelling are present
  o Control of the accuracy of the data (completion statistics)
  o Control of the families
- Logical control of the data - control of the range of the given values and of the connections between the data in the census questionnaires P1, P2 and PD1
- Other additional ad hoc controls, defined by the subject matter department, for checking and analysing the data and searching for double enumerations
- Matching the data - generating additional variables which are used during the tabulation phase
- Preparation for the tabulation phase - aggregating the data at settlement and municipality level
- Generating output tables for domestic and foreign users and meeting the demands and recommendations of the international community

VI. DATA PROTECTION

20. Because of the extreme sensitivity of the census data and their vital significance for the state, specific measures have been taken to protect the data from illegal access, illegal modification and loss. The following steps have been taken:
- The entire census system is organized in a separate network, totally isolated from the other networks in the office
- Each data entry operator and each other authorized user has his/her own user name and password in a Windows domain strictly dedicated to the Census
- Each operator has his/her own user name and a randomly generated password for accessing the data in the database
- A procedure for automatic daily backup of the database
- A procedure for archiving the data on other media as backup
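
The random generation of database passwords mentioned in paragraph 20 could be sketched as follows. This is only an illustration in Python, not the procedure actually used in the SSO: the password length, the character set, the function names and the operator names are all assumptions made for the example.

```python
import secrets
import string

# Assumed character set and length; the paper does not specify either.
ALPHABET = string.ascii_letters + string.digits
DEFAULT_LENGTH = 12


def random_password(length: int = DEFAULT_LENGTH) -> str:
    """Return a password drawn from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def issue_credentials(operators):
    """Map each (hypothetical) operator user name to its own random password."""
    return {name: random_password() for name in operators}


if __name__ == "__main__":
    credentials = issue_credentials(["operator01", "operator02"])
    for user in credentials:
        # Passwords are deliberately not printed.
        print(user, "-> password issued")
```

The `secrets` module is used instead of `random` because it is intended for security-sensitive values such as passwords.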

VII. DEMANDS FROM THE MONITORING MISSION

21. According to the Memorandum of collaboration concluded between the government and the European Commission, all phases of conducting and editing of the census data are observed by a monitoring mission. The mission's demands involve the following actions:
- An audit file system should be designed and installed for recording every change made to the census data from data entry to the completion of the cleaning and editing of the data. This makes it possible to study and assess the volume and the content of the data editing.
- Steps should be taken to ensure that a copy of the entire data set is taken at the point of data entry, for matching with the final edited data set. The purpose of this would be to gain another measure of the volume and the nature of the changes made to the census data during the editing phase.

22. All of these proposals were accepted with pleasure and implemented by the next monitoring mission meeting, which was planned for and held in March.

VIII. CONCLUSIONS

23. One of the most important improvements brought by the new technologies was the modem connection of the Regional Census Commissions with the SSO, which allows much faster communication, data flow and follow-up of the census. The use of computers and applications also allows much faster preparation of the first results. The client-server architecture and the organization of the data in a DB2 relational database (RDBMS) allow fast access to the data using software tools such as Visual Basic, SAS, DB2, Access and Excel for data processing.
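
As an illustration of the second demand in paragraph 21, the matching of the data set copied at data entry with the final edited data set could be sketched as follows. This is a minimal Python sketch under stated assumptions, not the system actually built: records are assumed to be identified by the P1 primary key from paragraph 15, and the field names `age` and `occupation`, the function names and the sample values are hypothetical.

```python
from collections import Counter

# Composite primary key of table P1, as listed in paragraph 15.
KEY_FIELDS = ("municipality", "district", "apartment", "household", "person")


def index_by_key(records):
    """Index person records by their composite primary key."""
    return {tuple(r[f] for f in KEY_FIELDS): r for r in records}


def compare_snapshots(entry_records, final_records):
    """Measure the volume (record counts) and nature (fields) of the edits."""
    before = index_by_key(entry_records)
    after = index_by_key(final_records)
    summary = {
        "deleted": len(before.keys() - after.keys()),
        "inserted": len(after.keys() - before.keys()),
        "changed": 0,
    }
    field_changes = Counter()
    for key in before.keys() & after.keys():
        old, new = before[key], after[key]
        # Compare every field that occurs in either version of the record.
        changed = [f for f in set(old) | set(new) if old.get(f) != new.get(f)]
        if changed:
            summary["changed"] += 1
            field_changes.update(changed)
    return summary, field_changes


def person(household, number, age, occupation):
    """Build a hypothetical person record in municipality 1, district 1."""
    return {"municipality": 1, "district": 1, "apartment": 1,
            "household": household, "person": number,
            "age": age, "occupation": occupation}


if __name__ == "__main__":
    at_entry = [person(1, 1, 34, "2310"), person(1, 2, 250, "5220"),
                person(2, 1, 61, "6110")]
    after_editing = [person(1, 1, 34, "2310"), person(1, 2, 25, "5220")]
    summary, fields = compare_snapshots(at_entry, after_editing)
    print(summary)
    print(dict(fields))
```

Counting changes per field, and not only per record, gives a measure of the nature as well as the volume of the editing.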