Data Mining In the Prediction of Impacts of Ambient Air Quality Data Analysis in Urban and Industrial Area

S. Christy
Research Scholar, Dept. of C.S.E., BIHER University, Chennai, Tamil Nadu, India
christymelwyn@gmail.com

Dr. V. Khanaa
Dean Information, BIHER University, Chennai, Tamil Nadu, India
Drvkannan6@yahoo.com

Abstract

Air pollution is caused by the introduction of particulate matter, biological molecules and other harmful materials into the Earth's atmosphere. Pollution causes serious disease and death in humans and damages other living organisms such as vegetation and animals, as well as the natural and built environment. Data mining is concerned with finding hidden patterns in large volumes of available data, so that the information retrieved can be transformed into usable knowledge. The Air Quality Index is an indicator of air quality standards around Chennai. It is based on air pollutants that have harmful effects on human health and the environment. Growing use of vehicles in the city and growing industrial activity on its outskirts cause more air pollution, which is becoming a major concern for the health of the population. The ambient air quality data were collected from the websites of the Central Pollution Control Board and the Tamil Nadu Pollution Control Board. Air quality in Chennai is monitored by air quality monitoring stations using wireless sensors deployed in large numbers around the city and the industrial areas. Four years of data, from the year 1 to , were collected from various monitoring stations and processed. A data mining tool is used for prediction, forecasting and support in making effective decisions. The data were analyzed with artificial neural network techniques, using Feed Forward Neural Network and Multilayer Perceptron models.
The patterns obtained from these models could serve as an important reference for Government policy makers in devising future air pollution standards.

Keywords: Data mining; Data analysis; Monitoring stations; Decision support

I. INTRODUCTION

Data mining, also known as knowledge discovery in databases (KDD), is the process of discovering useful knowledge from large amounts of data stored in databases, data warehouses, or other information repositories []. Data understanding starts with data collection and proceeds with activities to identify data quality problems and to discover missing values in the data. Data preparation constructs the data to be modelled from the collected data. The modelling phase applies various modelling techniques and determines the optimal values for the model parameters. The evaluation phase evaluates the model against the problem requirements [4].

Data mining technology is used to identify the air quality distribution of Chennai, whose hourly air quality data are continuously collected and archived through a network of several stations. The major air pollutants are suspended particulate matter (PM10, PM2.5), sulphur dioxide (SO2), oxides of nitrogen (NOx), carbon monoxide (CO), volatile organic compounds, sulphur trioxide (SO3) and lead (Pb). Four years of data collected from CPCB and TNPCB are processed and analyzed with data mining techniques to provide decision support to policy makers.

II. AIR POLLUTION MONITORING SYSTEM

Air pollution monitoring is a very complex but very important task. Traditionally, data collectors were used to collect data periodically, which was time consuming and expensive. Wireless Sensor Networks can make air pollution monitoring less complex and provide more immediate readings [7]. Currently, the Air Monitoring Unit in Chennai lacks resources and makes use of bulky instruments.
This reduces the flexibility of the system and makes it difficult to ensure proper control and monitoring.

Air quality modelling is an attempt to predict or simulate the ambient concentrations of contaminants in the atmosphere. These models are used as quantitative tools to correlate cause and effect of the concentration levels found in an area. They are also used to support laws and regulations designed to protect air quality. The models have been the subject of extensive evaluation to determine their performance under a variety of meteorological conditions.

The wireless sensor network air pollution monitoring system comprises an array of sensor nodes and a communications system that allows the data to reach a server. The sensor nodes gather data autonomously, and the data network passes the data to one or more base stations, which forward it to a sensor network server. The system sends commands to the nodes to get the data, and also sends out data whenever required.

III. MINING EPA DATA

The EPA (Environmental Protection Administration) of Chennai runs the Chennai Air Quality Monitoring Network (CAQMN), which is composed of several air quality monitoring stations that automatically collect and monitor air quality every week. More stations are set up in the industrial area, which is likely to have higher air pollution. Five priority pollutants are recorded: PM (suspended particulate), SO2 (sulphur dioxide), NO2 (nitrogen dioxide), CO (carbon monoxide), and O3 (ozone). The EPA maintains a Web site for publishing archived and real-time pollutant information and forecasts. The homogeneous regions can vary when the scale of the temporal data is changed from small (e.g., hourly, daily) to large (e.g., monthly, seasonal, or annual). The selection of an appropriate scale depends on the purpose of the application. The data are collected from the online CPCB and TNPCB websites.

IV. ARTIFICIAL NEURAL NETWORK DATA MINING

Artificial neural networks have found various applications in the field of environmental engineering. Models have also been developed for air pollution data, optimizing the prediction of vehicular emissions. The most popular ANN is the feed-forward back-propagation multi-layer perceptron (MLP) neural network. The development of an ANN model consists of six steps: variable selection, formation of the training data set, formation of the testing data set, formation of the validation data set, network modelling, and neural network training [].

V. ARFF FILE FORMAT

The data obtained online from CPCB and TNPCB are stored in a Microsoft Excel sheet in FILENAME.CSV format. The data set contains more than 16 instances. The pollutants are used as the field names. The file can be opened in the WEKA tool for further processing and analysis. The data have to be pre-processed and then stored from the Weka Explorer in FILENAME.ARFF format.
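As a rough illustration of the CSV-to-ARFF step described above, the sketch below writes ARFF text by hand using only the Python standard library. The function name `csv_to_arff`, the relation name, and the sample columns and readings are hypothetical, and every column is assumed to be numeric. It is not the authors' procedure; WEKA can also perform this conversion itself.

```python
import csv
import io


def csv_to_arff(csv_text, relation="air_quality"):
    """Convert CSV text (one header row, numeric data rows) to ARFF text.

    Assumption: every column is numeric; an empty cell is a missing value.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, data = rows[0], rows[1:]

    lines = [f"@relation {relation}", ""]
    for name in header:
        lines.append(f"@attribute {name.strip()} numeric")
    lines += ["", "@data"]
    for row in data:
        # ARFF marks a missing value with '?'.
        lines.append(",".join(cell.strip() or "?" for cell in row))
    return "\n".join(lines) + "\n"


# Hypothetical readings, for illustration only (not data from the paper).
sample = """PM10,SO2,NO2,CO
61.0,12.5,20.1,0.8
,11.0,18.4,0.7
"""

arff = csv_to_arff(sample)
print(arff)
```

The pollutant names become the `@attribute` declarations, which mirrors the use of pollutants as field names in the CSV file.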
This data file can then be accessed by the WEKA tool for further analysis. The data are available from the year 1 to . The large volume of data can be accessed and processed using the WEKA tool.

VI. FEED FORWARD NEURAL NETWORKS (FFNN)

The simplest feed forward neural network (FFNN) consists of three layers: an input layer, a hidden layer and an output layer. Each layer has one or more processing elements. A processing element receives inputs from the outside world or from the previous layer. The connections between the processing elements of the layers each have an associated weight (parameter), which is adjusted during training. Information travels only in the forward direction through the network; there are no feedback loops [6]. A feed-forward back-propagation MLP was used to develop the ANN model for predicting daily maximum pollutant concentrations in Chennai.

VII. BACK PROPAGATION ALGORITHM

Back propagation is a common method of teaching artificial neural networks how to perform a given task. In the back propagation algorithm, artificial neurons are organized in layers and send their signals forward, and then the errors are propagated backward. The algorithm uses supervised learning: the network output is computed and then the error is calculated. The output of the MLP model was the daily maximum 1-hr pollutant level. All input data were normalized to provide values between . and .9 using the following formula:

$$P = 0.9\,\frac{P_i - P_{\min}}{P_{\max} - P_{\min}} + c$$

where $P$ is the transformed value, $P_i$ is the actual observed value, $P_{\min}$ and $P_{\max}$ are the minimum and maximum of the observed values, and $c$ is the additive constant, equal to the lower bound of the normalized range. Normalization of the input data was performed for two reasons: to provide a commensurate data range, so that the models were not dominated by any variable that happened to be expressed in large numbers, and to avoid the asymptotes of the sigmoid function.
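The min-max scaling described above can be sketched as follows. Because the exact bounds of the target range are not fully legible in this copy, they are left as parameters; the defaults `lo=0.05` and `hi=0.95` are an assumption chosen only for illustration (they are consistent with the factor .9 in the formula), and the function names and sample readings are hypothetical. The second function is simply the algebraic inverse of the first, included to check the round trip.

```python
def normalize(values, lo=0.05, hi=0.95):
    """Min-max scale `values` into [lo, hi].

    lo and hi are assumed bounds, not values confirmed by the source.
    Returns the scaled list together with the observed minimum and maximum,
    which are needed later to undo the scaling.
    """
    p_min, p_max = min(values), max(values)
    span = p_max - p_min
    if span == 0:
        raise ValueError("all observations are identical; cannot scale")
    scaled = [lo + (hi - lo) * (p - p_min) / span for p in values]
    return scaled, p_min, p_max


def denormalize(scaled, p_min, p_max, lo=0.05, hi=0.95):
    """Algebraic inverse of `normalize`: map scaled values back to raw units."""
    return [p_min + (p_max - p_min) * (s - lo) / (hi - lo) for s in scaled]


# Hypothetical daily maximum readings, for illustration only.
readings = [18.0, 42.5, 30.0, 77.0]
scaled, p_min, p_max = normalize(readings)
restored = denormalize(scaled, p_min, p_max)
print(scaled)
print(restored)
```

Keeping the scaled values away from 0 and 1 is what keeps a sigmoid unit out of its flat asymptotic regions, which is the second reason given for normalizing.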
Once the best network is found, all the transformed data are transformed back into their original values by the formula:

$$P_i' = P_{\min} + \frac{(P_{\max} - P_{\min})(P_i - c)}{0.9}$$

where $P_i$ here denotes the transformed value, $P_i'$ the recovered original value, and $c$ the additive constant of the normalization formula.

Before an MLP model can be used for prediction, the number of hidden layers and hidden nodes and the connection weights between the neurons of the MLP network were determined by an iterative process in the training (learning) stage, using the training dataset of 361 patterns, until the training error, measured by statistical performance indicators, fell below the given error. The initial values of the weights are selected randomly and can be either negative or positive. The activation functions used in the hidden and output layers were determined by the degree of accuracy required by the problem under study: logistic sigmoid for the hidden layers and linear for the output layer. The number of hidden layers and hidden nodes was increased systematically, checking each time whether the network reached a stable performance error in the performance plot. The best MLP network was the optimum found by this iterative process. The trained MLP network was then evaluated on the testing dataset of 1 patterns. The resulting predictions were compared with the observed data, and statistical performance indicators were calculated.

International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 31-8169

VIII. MULTIVARIATE REGRESSION MODEL

Multivariate regression, fitted by ordinary least squares, is the most popular technique for obtaining a linear input-output model for a given data set. The preliminary regression model has the general form:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$$

where $Y$ is the predictand variable (e.g., daily maximum pollution level), $\beta_i$, $i = 0, 1, 2, \dots, k$, are the regression coefficients (parameters), $X_i$ are the $k$ predictor variables with matching $\beta$ coefficients, and $\varepsilon$ is the residual error. To further assess the accuracy of the developed MLP network, its predictions were compared with those of a linear regression (LR) model. An LR model between the eight input variables and the output (domain peak pollutants) was fitted using stepwise regression analysis on the first dataset to determine the coefficients of the above equation. A least-squares analysis was carried out to find the best linear equation fitting the dataset. The developed regression model was also evaluated on the testing dataset.

IX. LINEAR REGRESSION MODEL

The stepwise regression procedure on the first dataset showed that PM10, PM2.5, SO2, NO2, CO and O3 were important for predicting daily maximum pollutant levels. The best single variable among the six independent variables was nitrogen dioxide. The second-best single variable was maximum SO2. Each step of the forward stepwise regression procedure is shown in Table 1. There are two factors that contribute to the strength of the correlation between PM10 and PM2.5.
High air temperature is an excellent indicator of environmental conditions conducive to pollutant formation and accumulation. In addition, photochemical reaction rates are highly temperature dependent. The linear regression (LR) model that gave the best fit had a mean absolute error (MAE) of 1.67 ppb, a root mean square error (RMSE) of . ppb, a coefficient of determination (R²) of .9, and an index of agreement (d) of .74. Scatter plots of predicted versus actual pollutant concentrations for this model on the training and testing sets are given in Figure 1 and Figure 2.

Based on the results of the iterative process in the training stage, the architecture of the best MLP network contains 6 input layer neurons, hidden neurons for the first hidden layer, 14 hidden neurons for the second hidden layer and 1 output layer neuron. The mean absolute error (MAE) and the root mean square error (RMSE) for the training dataset were .3 and .1 ppbv, respectively. The corresponding errors for the testing dataset were 17.4 and .14 ppbv, respectively. To further check the accuracy of the developed MLP model, plots of predicted versus observed pollutant concentrations for the training and testing sets are shown in Figures 3 and 4. The predicted values are in good agreement with the recorded pollutant concentrations, indicating that the maximum pollutant levels are captured fairly well by the MLP model.

Figure 1: Training dataset. Scatter plot of observed versus predicted pollutant levels for the regression model.

Table 1: Forward stepwise regression results

Step   Set of variables                  Coefficient of correlation, R
1      NO2                               .
2      NO2, SO2                          .73
3      NO2, SO2, PM10                    .3
4      NO2, SO2, PM10, PM2.5             .31
5      NO2, SO2, PM10, PM2.5, CO         .371
6      NO2, SO2, PM10, PM2.5, CO, O3     .396

Figure 2: Testing dataset. Scatter plot of observed versus predicted pollutant levels for the regression model.
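The statistical indicators quoted in this section (MAE, RMSE, R² and d) can be computed along the following lines. This is a minimal sketch under stated assumptions: the paper does not give its formulas, so R² is taken here as 1 minus the ratio of residual to total sum of squares, and d is assumed to be Willmott's index of agreement. The function names and the observed and predicted values are hypothetical.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def index_of_agreement(obs, pred):
    """Index of agreement d, assumed here to follow Willmott's definition."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical observed and predicted levels, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(mae(observed, predicted))
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
print(index_of_agreement(observed, predicted))
```

MAE and RMSE are in the units of the pollutant concentration, while R² and d are dimensionless, which is why only the first two carry units in the reported results.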

Figure 3: Comparison of observed and predicted pollutants for the training dataset of the MLP model.

X. COMPARATIVE ANALYSIS OF THE DEVELOPED MODELS

The relative effectiveness of the models in predicting pollutant levels is examined using the testing data set. The performance of the developed models was evaluated using statistical indicators and graphical comparisons.

Figure 4: Comparison of observed and predicted pollutants for the testing dataset of the MLP model.

Table 2: Performance statistical indicators for the developed models

Indicator     MLP Training   MLP Testing   LR Training   LR Testing
MAE (ppb)     .3             7.4           1.67          1.6
RMSE (ppb)    .1             .14           .             14.3
R²            .134           .11           .9            .31
d             .9             .89           .74           .68

It can be seen that the MLP model clearly gave the better results according to all statistical indicators. In terms of the MAE and RMSE values, the MLP model performs better than the regression model on both datasets. Figure 4 shows that the linear regression model performed significantly less well at predicting high pollutant concentrations. The reason for the underestimation is that the regression coefficients are fitted using a least-squares criterion. A direct consequence is that the LR model, by nature, does not distinguish between low and high levels of the values. Regression analysis aims at modelling the average behaviour of the predictand (output) variable, whereas with regard to air quality standards, the prediction of extreme pollutant levels is much more important from the health perspective. Despite the strongly nonlinear character of the phenomena, the MLP gives rather good predictions. The data are processed using a data mining tool, and the results help the policy maker take effective decisions to control air pollution in various parts of Chennai.

XI. CONCLUSION

Air pollution poses a hazard to the health of humans and plants. The effects of air pollution on health are very complex, as there are many different sources and their individual effects vary from one to the other. The ambient air quality is assessed in various parts of Chennai and its industrial area. The online ambient air quality data were collected from the Central Pollution Control Board (CPCB) and the Tamil Nadu Pollution Control Board (TNPCB) for the past four years, from 1 to . The data are pre-processed and can be further processed by a data mining tool, so that proper decision support can be given to the policy makers. The government has since adopted an array of measures to combat this problem. The prediction of air pollution in the urban and industrial areas of Chennai using data mining could serve as an important reference for the policy maker in formulating future policies. The NAAQ (National Ambient Air Quality) standards of 9, which superseded the earlier standard, have more stringent values. The trend analysis shows that the norms are adhered to and maintained so as to meet the new standards. This work paves the way for the formation of new standards in the future so as to enhance sustainable development. In future, this research can be extended to predict air pollution outside Chennai and in other states.

ACKNOWLEDGMENT

The authors would like to thank the Central Pollution Control Board and the Tamil Nadu Pollution Control Board for the online data.

REFERENCES

[1] Sarah N. Kohail, Alaa M. El-Halees, Implementation of Data Mining Techniques for Meteorological Data Analysis, International Journal of Information and Communication Technology Research, Volume 1, No. 3, July 11.
[2] Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 7 34.

[3] Li S., and Shue L., "Data mining to aid policy making in air pollution management," Expert Systems with Applications, vol. 7, pp. 331-34, 4.
[4] Pyle, D. (1999). Data preparation for data mining. Los Altos, CA: Morgan Kaufmann.
[5] Amrender Kumar, Artificial Neural Networks for Data Mining, New Delhi.
[6] Fayyad, Usama, Ramakrishna, Evolving Data Mining into Solutions for Insights, Communications of the ACM 4, no. 8.
[7] Kavi K. Khedo, Rajiv Perseedoss and Avinash Mungur, A Wireless Sensor Network Air Pollution Monitoring System, International Journal of Wireless and Mobile Network, Vol, issue,.
[8] Haykin, S., Neural Networks, Prentice Hall International Inc., 1999.
[9] Khajanchi, Amit, Artificial Neural Networks: The next intelligence.
[10] Agrawal, R., Imielinski, T., Swami, A., Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Engineering, pp. 914-9, December 1993.
[11] Berry, J. A., Lindoff, G., Data Mining Techniques, Wiley Computer Publishing, 1997 (ISBN -471-1798-9).
[12] Berson, Data Warehousing, Data-Mining & OLAP, TMH.
[13] Bhavani Thura-is-ingham, Data-mining Technologies, Techniques, Tools & Trends, CRC Press.
[14] http://www.cpcb.gov.in/caaqm/frmcurrent
[15] http://www.tnpcb.gov.in/ambient_airquality
[16] Dr. Yashpal Singh, Alok Singh Chauhan, Neural Networks in Data Mining, Bundelkhand Institute of Engineering & Technology, Jhansi, Institute of Management, Allahabad, India, 9.