Please download to get full document.

View again

of 15

An Optimized In-Network Aggregation Scheme for Data Collection in Periodic Sensor Networks

An Optimized In-Network Aggregation Scheme for Data Collection in Periodic Sensor Networks Jacques Bahi, Abdallah Makhoul, Maguy Medlej To cite this version: Jacques Bahi, Abdallah Makhoul, Maguy Medlej.
0 views15 pages
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Documenttranscript
An Optimized In-Network Aggregation Scheme for Data Collection in Periodic Sensor Networks Jacques Bahi, Abdallah Makhoul, Maguy Medlej To cite this version: Jacques Bahi, Abdallah Makhoul, Maguy Medlej. An Optimized In-Network Aggregation Scheme for Data Collection in Periodic Sensor Networks. ADHOC-NOW 2012, 11-th Int. Conf. on Ad Hoc Networks and Wireless, Jan 2012, Serbia. pp , hal HAL Id: hal Submitted on 31 Jan 2014 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. An Optimized In-Network Aggregation Scheme for Data Collection in Periodic Sensor Networks Jacques M. Bahi, Abdallah Makhoul, and Maguy Medlej FEMTO-ST Laboratory, DISC departement University of Franche-Comté Rue Engel-Gros, Belfort, France Abstract. In-network data aggregation is considered an effective technique for conserving energy communication in wireless sensor networks. It consists in eliminating the inherent redundancy in raw data collected from the sensor nodes. Prior works on data aggregation protocols have focused on the measurement data redundancy. In this paper, our goal in addition of reducing measures redundancy is to identify near duplicate nodes that generate similar data sets. We consider a tree based bi-level periodic data aggregation approach implemented on the source node and on the aggregator levels. We investigate the problem of finding all pairs of nodes generating similar data sets such that similarity between each pair of sets is above a threshold t. We propose a new frequency filtering approach and several optimizations using sets similarity functions to solve this problem. To evaluate the performance of the proposed filtering method, experiments on real sensor data have been conducted. The obtained results show that our approach offers significant data reduction by eliminating in network redundancy and outperforms existing filtering techniques. 1 Introduction Data collection from sensor networks can be made on demand or by data streaming. The first category is done by bi-directional dialogs between the sensor nodes and the base station. A request for data is sent from the end user via the sink to the sensor nodes which, in return, send back the data to the user via multi hop communications. On the other side, in data streaming, data flows primarily from the sensor node to the sink. In this category we distinguish the periodic sampling and the event driven data models. In this paper we are interested in periodic sampling data model in sensor networks, where the acquisition of sensor data from a number of remote sensor nodes are forwarded to the gateway on a periodic basis. This data model is appropriate for applications where certain conditions or processes need to be monitored constantly, such as the temperature in a conditioned space or pressure in a process pipeline. There are couple of important design considerations associated with the periodic sampling data model. The most critical design issue is the phase relation among multiple sensor nodes. If two neighbor nodes operate with identical or similar sampling rates, redundant packets from the two nodes are likely to happen repeatedly. It is essential for sensor networks to be able to detect and clean redundant transfered data from the nodes to the sink. In-network data aggregation has been proven as an effective technique for eliminating redundancy and forwarding only the extracted information from the raw data. Furthermore, by doing so data aggregation can often reduce the communication cost and extend the whole network lifetime. In this paper we present a hierarchical multilevel data aggregation scheme aiming to optimize the volume of data transmitted thus saving energy consumption and reducing bandwidth on the network level. A first level in-sensor process is done by the nodes themselves. Instead of sending each sensor node s raw data to a base station, the data is cleaned periodically by the sensor node itself before sending it to an aggregator node for a second level of aggregation. At this level, we are interested in exploring a new part of the filtering aggregation problem, by focusing on identifying the similarity between data sets generated by neighboring nodes and sent to the same aggregator. Our objective is to identify similarities between near sensor nodes, and integrate their captured data into one record while preserving information integrity. In this paper, we provide a new prefix filtering method to study the sets similarity in sensor networks. We propose frequency filtering optimization techniques, which exploits the ordering of measurements according to their frequencies. A frequency of a measure is defined by the number of occurrences of this measure in the set defined at the first aggregation level. Furthermore, we provide a new optimization method for early termination of sets similarity computing. To evaluate our approach we conducted extensive experimental study using real data measurements. The obtained results compared to the existing algorithms show the effectiveness of our method which significantly reduces the number of duplicate data. The rest of the paper is organized as follows, Section 2 gives an overview on related works reported on data aggregation in sensor networks. Section 3 describes our periodic data aggregation scheme. The local aggregation level is presented in section 4. Review on similarity functions and our proposed frequency filtering techniques are presented in Section 5. Experimental results are given in Section 6. Section 7 concludes the paper with some directions to a future work. 2 Previous Data Aggregation Work Data aggregation in wireless sensor networks has been well studied in recent years [1] [2] [3]. It means computing and transmitting partially aggregated data to the end user rather than transmitting raw data in networks to reduce the energy consumption [4]. There are vast amount of extant works on in-network data aggregation in the literature. Some of the methods reported recently are query based methods [5] [6]. A query is generated at the sink and then broadcasted through the network. Some nodes just process the query, while others propagate it, receive partial results, aggregate results, and send them back to the sink. Various algorithmic techniques have been proposed to allow efficient aggregation without increasing the message size [7]. Some works, such as [8] [9] [10], use the clustering methods for aggregating data packets in each cluster separately. Among these methods, the LEACH protocol [11] [12]. In [9], the authors propose a self-organizing method for aggregating data based on the architecture CODA (Cluster-based self-organizing Data Aggregation), based on the Kohonen Self-Organizing Map to aggregate sensor data in cluster. In a first step before deployment, the nodes are trained to have the ability to classify the sensor data. Thus, it increases the quality of data and reduces data traffic as well as energy-conserving. An adaptive data aggregation (ADA) scheme for clustered sensor networks has been proposed in [10]. In this scheme, a time based as well as spatial aggregation degrees are introduced. They are controlled by the reporting frequency at sensor nodes and by the aggregation ratio at cluster heads (CHs) respectively. The function of the ADA scheme is mainly performed at the sink node, with a little function at CHs and sensor nodes. In a tree based network as our presented work, sensor nodes are organized into a tree where data aggregation is performed at aggregators along the tree to arrive to the sink. Tree based data aggregation approaches are suitable for in-network data aggregation. The authors in [13] [14], have proposed Tree on DAG (ToD) for data aggregation, a semistructured approach that uses Dynamic Forwarding on an implicitly constructed structure composed of multiple shortest path trees to support network scalability. The key principle behind ToD was that adjacent nodes in a graph will have low stretch in one of these trees in ToD, thus resulting in early aggregation of packets. In our previous work [3], we have shown that existing prefix filtering methods are very complex and not suitable for sensor networks and we proposed a heuristic based on the frequency ordering. In this paper, we propose two optimization techniques based on frequency filtering extention which can be integrated with our previous prefix method [3] to find similar data sets efficiently. Furthermore we provide a new and faster technique for sets similarity computation. 3 Periodic Data Aggregation Due to resource restricted sensor nodes, it is important to minimize the amount of data transmission among sensor networks so that the average network lifetime and the overall bandwidth utilization are improved. To reduce the amount of sending data, an aggregation approach can be applied along the path from sensors to the sink. Sensor nodes collect information from the region of interest and send it to aggregators. Each aggregator then condenses the data prior to sending it on. Our data aggregation method works in two phases, the first one at the nodes level, which we call local aggregation and the second at the aggregators level. At each period p each node sends its aggregated data set to its proper aggregator which subsequently aggregates all data sets coming from different sensor nodes and sends them to the sink. 4 Local aggregation In periodic sensor networks, we consider that each sensor node i at each slot s takes a new measurementy is. Then nodeiforms a new set of captured measurementsm i with periodp, and sends it to the aggregator. It is likely that a sensor node takes the same (or very similar) measurements several times especially when s is too short. In this phase of aggregation, we are interested in identifying locally duplicate data measurements in order to reduce the size of the setm i. Therefore, to identify the similarity between two measures, we provide the two following definitions: Definition 1 (link function). We define the link function between two measurements as: { 1 if yis1 y link(y is1,y is2 ) = is2 δ, 0 otherwise. where δ is a threshold determined by the application. Furthermore, two measures are similar if and only if their link function is equal to 1. Definition 2 (Measure s frequency). The frequency of a measurement y is is defined as the number of the subsequent occurrence of the same or similar (according to the link function) measurements in the same set. It is represented by f(y is ). Using the notations defined above the local aggregation algorithm is done as follows [3]. For each new sensed measurement (at each slot), a sensor node i searches for the similar measure already captured. If a similar measurement is found, it deletes the new one while incrementing the corresponding frequency by 1, else it adds the new measure to the set and initialize its frequency to 1. At the end of the periodp, each node i will possess a local aggregated set M i and send it to its aggregator. 5 Duplicate data sets aggregation At this level of aggregation, each aggregator has received k sets of measurements and their frequencies. The idea here is to identify all pairs of sets whose similarities are above a given threshold t. For this reason we use a similarity function which measures the degree of similarity between the two sets and returns a value in [0,1]. A higher similarity value indicates that the sets are more similar. Thus we can treat pairs of sets with high similarity value as duplicates and reduce the size of the final data set that will be sent to the sink. 5.1 Similarity Functions A variety of similarity functions have been used in the literature such as overlap threshold, Jaccard similarity and Cosine similarity [15 17]. We denote M i as the number of elements (measures) in the setm i. The following functions can be used to measure the similarity between two sets of measurements M i and M j : Overlap similarity: O(M i,m j ) = M i M j Jaccard similarity: J(M i,m j ) = Mi Mj M i M j Cosine similarity: C(M i,m j ) = Mi Mj Mi M j Dice similarity: D(M i,m j ) = 2 Mi Mj M i + M j All these functions are commutative and can be transformed to the Overlap similarity easily. For instance, we can present the Jaccard similarity function as follows: J(M i,m j ) = O(M i,m j ) M i + M j O(M i,m j ) In our approach, we will focus on the Jaccard similarity. It is one of the most widely accepted function because it can support many other similarity functions [16]. In our application, two given setsm i and M j are considered similar if and only if: J(M i,m j ) t where t is a threshold given by the application itself. This equation can be transformed as: J(M i,m j ) t O(M i,m j ) α (1) where, α = t.( M i + M j ). In order to study the similarity functions for data aggregation in sensor networks, we define a new function for overlapping s between two sets of measurements as follows: Definition 3 (Overlap function). Consider two sets of measurements M 1 and M 2, then we define: M 1 s M 2 = {(y 1,y 2 ) M1 M2 such that link(y 1,y 2 ) = 1}; ando s (M 1,M 2 ) = M 1 s M 2. To evaluate the similarity between two sets we obtain: J(M i,m j) t M i s M j α = t.( Mi + Mj ) (2) 5.2 Sets similarity computation In this section we provide techniques for computing the similarity between the received sets. A naïve solution to find all similar sets is to enumerate and compare every pair of sets. This method is obviously prohibitively expensive for large data sets (such the case of sensor networks), as total number of comparison iso(n 2 ). To reduce the number of comparisons between sets a prefix filtering method has been proposed. Several approaches for traditional similarity join between sets are based on the prefix filtering principle [15] [17] [3]. This method is based on the intuition that if all sets of measures are sorted by a global ordering, some fragments of them must share several common tokens with each other in order to meet the threshold similarity. An inverted index maps a given measurementmto a list of identifiers of sets that containm i such that link(m i,m) = 1. After inverted indices for all measures in the set are built, we can scan each one, probe the indices using every measure in the setm, and obtain a set of candidates; merging these candidates together gives us their actual overlap with the current set M; final results can be extracted by removing sets whose overlap with M is less than t.( M i + M j ) (Equation 1). This intuition is formalized by the following Lemma inspired from [17]: Lemma 1. Consider two sets of sensor measuresm i andm j, such that their elements are ordered by a global defined ordering. Let thep-prefix be the first p elements ofm i. If M i s M j α, then the( M i α+1)-prefix ofm i and the( M j α+1)-prefix of M j must share at least one element. Proof. Lemma 1 can be proven similarly to the lemma of page 6 in [17]. To ensure the prefix filtering based approach does not miss any similarity set result, as shown in Lemma 1 we need a prefix of length M i t. M i + 1 for every set M i [3]. The algorithm for finding similarity sets based on prefix filtering technique is given in Algorithm 1. It takes as input a collection of datasets coming from different sensor nodes already sorted according to a defined ordering. It scans sequentially each set M i, selects the candidates that intersects with its prefix. Afterwards, M i and all its candidates will be verified against the jaccard similarity threshold to finally return the set of correct similar measurements sets. Algorithm 1 Prefix-filtering based algorithm. Require: Set of measures setsm = {M 1,M 2...M n}, and a threshold t. Ensure: All pairs of sets (M i,m j), such that J(M i,m j) t. 1: S 2: I i (1 i total number of measures) 3: for each setm i M do 4: p M i t M i +1 5: X empty map from set id to int 6: for k 1 topdo 7: w M i[k] 8: if (I ws exists such that link(w,w s) = 1) then 9: for each Measurement (M j[l]),f(m j[l]) I ws do 10: X[M j] X[M j]+1 11: end for 12: I ws I ws {M i} 13: else 14: create I w 15: I w I w {M i} 16: end if 17: end for 18: for each M j such that X[M j] 0 do 19: if O s(m i,m j) α then 20: (S {(M i,m j)}) 21: end if 22: end for 23: end for 24: returns Prefix filtering algorithm helps prune out unfeasible sets of measures, however, in practice the number of non-similar sets surviving after this technique is still quadratic growth [18]. Following the prefix filtering, many optimization methods [18] [19] were proposed to prune out further the unfeasible non-similar sets. A trade-off of these prefix filtering optimizations is that usually require more computational efforts which is unsuitable by heavy resources sensor networks. In our approach, we provide some opti- mizations for prefix filtering techniques based on measures frequency while taking into account this trade-off. 5.3 Frequency filtering approach In this section, we present our frequency filtering method based on prefix extension. We begin by introducing some definitions and notations which will be the basis of what follows. In periodic sensor networks, two data sets are similar if their measurements overlap with each other, and especially the ones having higher frequencies values. Definition 4 (Ordering O). We define an ordering O which arranges the measurements of a given set by the decreasing order of their frequencies. For two similar measures m i and m j such that link(m i,m j ) = 1, we denote f min (m i,m j ) = Min(f(m i ),f(m j )) the minimum value of the frequency of these measures. Definition 5 (f s (M i,m j )). Consider two sets of measures M i and M j, we define f s (M i,m j ) = O s(m i,m j) (f min ((m i,m j ) M i s M j )). In this paper, we consider that all sensor nodes operate with the same sampling rate, and every node captures τ measures with each period p. Thus we can deduce that for every received setm i from node i we have: M i (f(m k M i ) = τ. Using the Jaccard similarity function, two sets M i and M j are similar if and only if: O s (M i,m j ) α where α = t.( M i + M j ) (Equation (2)). Supposing that the sets were sent to the aggregators without applying the first aggregation phase and without computing measures frequencies, thus we can observe that: M i = M j = τ and f s (M i,m j ) = O s (M i,m j ). (3) Hence, from Equation (2) and Equation (3) we can deduce that: M i and M j are similar iff:f s (M i,m j ) 2 t τ. (4) Frequency filter principle Lemma 1 states that the prefixes of two sets of measures must share at least one measure in order to satisfy the prefix filtering condition (PFC). Nevertheless, in sensor networks this condition is easily satisfied. In this section, we will present an extension of the prefix filtering technique making the PFC condition more difficult to be satisfied. Lemma 2. Assume that all the measures in the setsm i andm j are ordered according to the global orderingo. Let thep-prefix be the first p elements ofm i. Iff s (M i,m j ) 2 t τ, then f s (p-m i,p-m j ) p-m i (f(m k p-m i )) 1 t τ. Proof. We denote by p-m i the prefix of the set M i and r-m i the set of reminder measures where M i = {p-m i +r-m i }. We have: f s (M i,m j ) = f s (p-m i,m j )+f s (r-m i,m j ) = f s (p-m i,p-m j )+f s (p-m i,r-m j )+ f s (r-m i,m j ) = f s (p-m i,p-m j )+f s (r-m i,m j ) r-m i f s (p-m i,p-m j )+ (f(m k r-m i )) In the second line we can omit the term f s (p-m i,r-m j ) because we have assumed that it is negligible compared to th
Advertisement
MostRelated
View more
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks
SAVE OUR EARTH

We need your sign to support Project to invent "SMART AND CONTROLLABLE REFLECTIVE BALLOONS" to cover the Sun and Save Our Earth.

More details...

Sign Now!

We are very appreciated for your Prompt Action!

x