Data Aggregation Based on Fuzzy Logic for VANETs (Machine Learning and Intelligence)

Abstract

Data aggregation is mainly used to combine similar or equal information sent by different nodes of a network before forwarding it with the aim of reducing the number of messages. This is particularly important in Vehicular Ad-hoc NETworks (VANETs) where vehicles broadcast information about the road traffic situation, what can lead to overload the network. In this paper we propose a system for data aggregation based on the primary computational intelligence approach of fuzzy logic. The scheme is automatically launched in case of incident, without any fixed structure on the road, but through the spontaneous formation of aggregation groups. Fuzzy logic is mainly applied in decision-making regarding aggregation and diffusion of aggregated information, while data aggregation is based on digital signature combination. The proposed protocol also uses probabilistic verification of aggregated data in order to reduce computation overhead and delay.

Keywords: Data aggregation; fuzzy logic; vehicular ad-hoc network.

Introduction

Vehicular Ad-hoc Networks (VANETs) are self-organized networks formed by vehicles and used to provide drivers with advance notification of traffic congestion or hazard events using wireless communication. In this type of networks, warning messages affect decisions taken by drivers so that any wrong message could lead to higher transportation times, fuel consumption, environmental contamination and impact of road constructions, and, in the worst-case scenario, more traffic accidents. In the near future, this type of networks will allow the reduction of the number of deaths due to car accidents, and the provision of real-time information on traffic and on roads. For example, drivers will be able to exchange information with their neighbors and with the road so that they can receive warnings about potentially dangerous events such as accidents, obstacles on the road, etc. Other practical applications of VANETs are, for instance, to provide the ability to find free parking spaces or to disseminate traffic information.


VANETs are composed of two different types of nodes: On-Board Units (OBUs), which are wireless devices on vehicles, and Road-Side Units (RSUs), which form the network infrastructure on the road. While in other resource-constrained wireless networks such as sensor networks, data aggregation is used mainly to save energy, in VANETs it may be used both to ensure that the transmitted information is reliable and to reduce the number of repeated warnings.

This paper is organized as follows. Related works are summarized in Section 2. Basic concepts about the fuzzy logic approach that is followed in the scheme are described in Section 3. The probabilistic verification included in the scheme is presented in Section 4. Finally, conclusions are presented in Section 5.

Related Works

In the literature we can find many trust models applied to VANETs domain. Many of them are cited in the recent survey on trust management for VANETs [13]. With respect to practical cryptographic solutions, there are several papers proposing the use of Public-Key Infrastructures (PKI) in VANETs so that thanks to the use of digital signatures, the source and integrity of messages can be verified [6]. However, none of those proposals protect against malicious attacks such as false content generation where an adversary could try to inject false information that does not correspond to what it is really detecting.

Data aggregation for VANETs has been analyzed in several papers. [8] presents a protocol for relaying information in VANETs under the assumption that vehicles form clusters, and details about speed and traffic information are exchanged within nodes in the same cluster. Such a mechanism reduces the amount of data transmitted in a cluster, but the paper does not include any specific scheme to combine aggregated data. VESPA [3] proposes that not only fresh data are considered for warning, but also aggregated data histories are maintained for disseminating knowledge among vehicles. Another proposal can be found in [5], where the aggregation of multiple messages describing the same event and the use of revocation messages allowing vehicles to report false information are proposed. However, such a mechanism has an important weakness because real messages can be revoked through it. In [11] the proposed solution is based on the use of a tamper-proof device and consists in asking an aggregator vehicle about a random aggregated record. The main disadvantage of this method is the dependency on a tamper-proof device since an attacker could easily skip this service in order to compose malicious aggregated data. Finally, [4] proposes another mechanism to provide security through aggregation in a scheme where streets are divided into fixed size segments corresponding to Wi-Fi signal coverage. However, this aggregation criterion uses a fixed segmentation of the road, what has been shown that does not work properly with a high number of vehicles in large areas, like for example in big traffic jams covering kilometers.

In order to overcome all the aforementioned problems, the use of fuzzy logic combined with a probabilistic verification scheme is here proposed for the definition of a data aggregation protocol. The idea of using fuzzy aggregation in trust models has been modeled and proposed by several authors [1] [2]. A close and recent paper [9] proposed an algorithm for hierarchical aggregation in VANETs based on a probabilistic approximation to data. However, in our proposal the probabilistic approach is applied on the verification of fuzzily aggregated data.

Various proposals have been made to promote cooperation and enhance security in VANETs through the use of authentication, key management and pseudonyms [10]. However, to the best of our knowledge no tools that allow ensuring that the generated information is true have been proposed till now. A quite logical initial approach to address this problem is by providing nodes of a mechanism to verify the packet content accuracy. In this paper nodes will be called agents from now on, according to the Artificial Intelligence literature.

When data aggregation is possible because the vehicle has received several packets informing about the same event, various known schemes based on clusters may be used [7] [12]. In those proposals, the so-called cluster-heads are responsible for adding the information submitted by their cluster members, and also for forwarding it. Consequently, those schemes can be seen as proposals based on a hierarchy, which increases the amount of necessary communications for cluster management, especially in highly dynamic networks such as VANETs. In this paper we propose a new scheme that replaces such a static hierarchical point of view by a fuzzy-logic based dynamic approach.

Fuzzy Logic Approach

Aggregation schemes are usually based on three sequential phases. The first phase is decision-making, when the system must allow deciding whether two pieces of information are similar enough to add them or not. The second phase is data aggregation. Once the system has concluded that the received packets must be added, it has to apply an aggregation method for combining them. The third and last step is the aggregated data delivery. After the two previous steps, the system must allow the dissemination of the aggregated data.

In the literature we can find several proposals for the two last stages. However, schemes for making decision are normally based on predefined clusters, trees or fixed structures, which are not suitable for their use in VANETs. Thus, a new aggregation scheme is here proposed where decisions regarding whether two received pieces of information are similar enough to be added are based on the content of such packets. Such decisions take into account possible inaccuracies or approximations regarding space and time data to define an event. Therefore, our proposal for the application of fuzzy logic is focused especially on the description of the decision criteria of the first phase of the aggregation scheme.

In the proposed scheme information is aggregated based mainly on two dimensions: space and time. On the one hand, the approximate location of an event in a road or a map is a fundamental parameter. On the other hand, a time interval in which the persistence of an announced event must be taken into account. When applying a fuzzy approach on such parameters the best decision on aggregation is made because it allows a flexible reasoning that takes into account all possible values in packets announcing the same event. This enables a dynamic approach when considering clusters for aggregation.

Fuzzification Function of SD and TD

Fig. 1. Fuzzification Function of SD and TD

Before the implementation of the system, and depending on the particular application, it is possible to identify other possible parameters apart from spatial and temporal location, with possible influence on the decision of aggregation, such as distance between agents or speeds. The input values corresponding to all influential parameters are real numbers, which the system has to assign to adjectives. The next step is to formulate rules that express the combination of the chosen influential parameters. In particular, according to such rules, the degrees of the influential parameters adjectives are combined using fuzzy Boolean operators such as AND, OR and NOT, defined as the minimum, maximum, and complement, respectively. The output of the application of every rule is defined as two possible values: YES and NO. This fuzzy decision making process allows combining all possible influential parameters in order to conclude whether two received warnings refer to the same event because the complexity of the original real values of such parameters are hidden behind the mapping to adjectives.

As an example of fuzzy decision making based on rules we propose to choose and combine the influential parameters of spatial and temporal location denoted Space-Distance (SD) and Time-Distance (TD). Fig. 1 exemplifies this process for both variables representing the influence of both parameters considering as x-coordinate respectively SD in meters or TD in minutes, and the y-coordinate is the probability corresponding to the adjectives LOW, MEDIUM and HIGH. Each of these adjectives is described by a membership function that maps the real valued input value of the corresponding influence factor to a membership degree corresponding to the adjective described by the function. In the depicted example, according to the fact that a typical error in normal GPS is about 23 meters of ambiguity, an input SD of less than 3 meters is fuzzified as being LOW with a degree of 1. For example, the output of the function for a SD of about 9 meters is classified at the same time as LOW and as MEDIUM with a degree of 0.5. From 27 meters on, SD is considered HIGH with probability 1. The same is considered for TD in minutes.

There can be more than one rule assigning values to the aggregation decision. In this case, the assignments to the aggregation decision are combined by an implicit AND, so that the corresponding probability of the Aggregation-Decision is given by the minimum between both input probabilities for SD and LD. After all rules have been evaluated, the decision will be either YES or NO depending on which of the two has got the higher assigned probability by the rules.

For example, if the SD is 10 m and so it is fuzzified to LOW with a degree of 0.32 and to MEDIUM with a degree of 0.68, and for the same pair of packets the TD is 20 min and so the TD is fuzzified to MEDIUM with a degree of 0.58 and to HIGH with a degree of 0.42, the Aggregation-Decision would be YES with a degree 0.58 and NO with a degree 0.42, so the final decision is YES and we aggregate both packets because we concluded they refer to the same event.

Data received by an agent must be stored in a local database with the objective not only of aggregation but also because the channel could get overload, restricting the amount of data that can be sent, and in this case, the agent is allowed to send only a subset of its database. In order to solve the problem of selection, we might use an approach also based on fuzzy logic to take into account relevant parameters for spreading the more adequate and accurate information that is currently available in the database. Then, each agent should carry out a qualification process of the data in its database in order to conclude which are the most relevant data stored that must be sent according to the restrictions of the channel. Several factors, such as severity or antiquity of an event, can play an important role in this selection. As of the aggregation decision, rating relevance of different parameters of a piece of information often depends on the specific application. Relevance ranking provides an order on the parameters to be considered and on data stored so that according to this order, the agent always knows which packages should be sent without collapsing the channel.

Table 1. Fuzzy Rules

HIGH

NO

NO

NO

MEDIUM

YES

YES

NO

LOW

YES

YES

YES

SD/TD

LOW

MEDIUM

HIGH

Probabilistic Verification

Message verification only applies to vehicles that are unable to check directly the received information. That is to say, when a vehicle receives a warning message about an incident that is not under the coverage of its antenna, and wants to confirm the authenticity of the received message, it has two options depending on its location with respect to the source of the warning. First we consider the case when the receiver cannot confirm the information directly, but it is so close to the supposed event that it has to make a decision quickly because in a short period of time it will be too near to avoid it. In this case, if a vehicle receives an aggregated packet signed by several vehicles, it should use a verification mechanism fast enough to verify the signatures of the packet. As we already mentioned, on the one hand it is inefficient to verify all the signatures contained in a packet, but on the other hand it is necessary to verify the information before accepting it as valid. In order to fix this problem, only a few signatures are proposed to be verified according to a probabilistic scheme. Secondly, we consider the case when agents are far enough from the hazard so they behave according to the store-and-carry paradigm, collecting evidences about the hazard in the form of aggregated packets. In this case, the vehicle has enough time to collect and check several aggregated messages before it has to make a decision. In particular, it has to check some signatures for each received aggregated packet as in the previous case. However, in this case the vehicle may perform more complete verifications that will provide a higher level of certainty.

As aforementioned, probabilistic verification will be only used by vehicles that are unable to verify directly the information that reaches them. The proposed verification algorithm uses threads, which are lightweight processes that allow a concurrent execution for a faster execution of the whole protocol. In the algorithm shown below, H[i] denotes a thread for the variable i that takes an integer value between 1 and n, where n denotes the number of aggregated signatures. When an agent receives a message and decides to verify its signatures, the main process launches as many threads as signatures the message contains. Before the main process launches the threads, it checks whether the message contains enough signatures to determine whether the message has been signed by a significant number of agents. This minimal number of signatures to validate a message shows the degree of closeness between data packets referring to the message, and is usually called intimacy level in trust literature. In our probabilistic verification scheme, such an intimacy level will be given by a combination of two factors: the average number of authenticated vehicles and the space-distance between the locations of the receiving agent and the event announced in the aggregated packet. Thus, in order to calculate the threshold for the minimal number of required signatures, for example the agent could compute the average number of authenticated users per minute in the current session and the SD between both aforementioned locations. Then, the intimacy level might be given by the average of users when the distance is below 30 m. For distances between 30 m and 1000 m the level can decrease linearly with the distance, until reaching the minimal value of 2, which might be the intimacy level for any distance greater than 1000 m.


Each thread H [i] determines whether to verify the signature with a verification probability p. If H [i] defines a verification, and the signature is proved to be valid, H[i] returns a true value informing that it is a valid signature. Otherwise, it returns a false value. The result of all those threads are stored in a structure P. If all fields in the structure P are proved to be valid, it is interpreted as evidence that all the verified signatures are correct so the message is accepted as valid. On the other hand, if P contains some thread results that is invalid, this could be interpreted as false message. If most threads indicate that the message is false, it is taken as invalid message, otherwise it is valid. If there is a tie or a questionable amount of false signatures, the reputation information stored by the receiving agent about the different agents that have signed the message is checked. In this case, only those agents that have a good reputation due to their active and correct participation in the network are trusted and accepted.

Algorithm 2. Probabilistic Verification of Signatures

Algorithm 2. Probabilistic Verification of Signatures

Conclusions

This paper proposes a new scheme for data aggregation with the aim of reducing the number of sent messages in VANETs. Different ideas have been combined in a new data aggregation method so that those agents who agree with the generated information sign the packet. The proposed aggregation system is based on fuzzy logic mainly during the phase of decision-making regarding aggregation. When an aggregated packet reaches a vehicle, this may verify the information by checking the attached signatures. In order to avoid the delay produced by signature verification, a probabilistic scheme is proposed according to which only a number of signatures given by the intimacy level are chosen to be checked. The implementation of the proposal is part of a work in progress.

Next post:

Previous post: