A Secure and Efficient Searching Scheme for Trusted Nodes in a Peer-to-Peer Network (Network Security)

Abstract

The existing search mechanisms for peer-to-peer (P2P) networks have several problems such as fake content distribution, free riding, whitewashing and poor search scalability. Although, researchers have proposed several trust management and semantic community-based mechanisms for combating free riding and distribution of malicious contents, most of these schemes lack scalability due to their high computational, communication and storage overhead. This paper presents a trust management scheme for P2P networks that utilizes topology adaptation by constructing an overlay of trusted peers where the neighbors are selected based on their trust ratings and content similarities. It also increases the search efficiency by taking advantage of the implicit semantic community structures formed as a result of topology adaptation since most of the queries are resolved within the community of trustworthy peers. Simulation results demonstrate that the proposed scheme provides efficient searching to good peers while penalizing the malicious peers by increasing their search times.

Keywords: P2P network, topology adaptation, trust management, semantic community, malicious peer.

Introduction

The term peer-to-peer (P2P) system encompasses a broad set of distributed applications which allow sharing of computer resources by direct exchange between systems. The goal of a P2P system is to aggregate resources available at the edge of Internet and to share it cooperatively among users. Specially, the file sharing P2P systems have become popular as a new paradigm for information exchange among large number of users in Internet. They are more robust, scalable, fault tolerant and offer better availability of resources. Depending on the presence of a central server, P2P systems can be classified as centralized or decentralized [1]. In decentralized architecture, both resource discovery and download are distributed. Decentralized P2P application may be further classified as structured or unstructured network. In structured network, there is a restriction on the placement of content and network topology. In unstructured P2P network, however, placement of content is unrelated to topology. Unstructured P2P networks perform better than their structured counterpart in dynamic environment. However, they need efficient search mechanisms and suffer from fake content distribution, free riding (peers who do not share, but consume resources), whitewashing (peers who leave and rejoin the system in order to avoid penalties) and search scalability problems. Open and anonymous nature of P2P applications lead to complete lack of accountability of the content that a peer puts in the network. The malicious peers often use these networks to do content poisoning and to distribute harmful programs such as Trojan Horses and viruses [2].


To combat inauthentic downloads as well as to improve search scalability, this paper proposes an adaptive trust-aware scalable algorithm. The proposed scheme constructs an overlay of trusted peers where neighbors are selected based on their trust ratings and content similarities. It increases search efficiency by taking advantage of implicit semantic community structures formed as a result of topology adaptation. The novel contribution of the work is that it combines the functionalities of trust management and semantic community formation. While the trust management scheme segregates honest peers from malicious peers, the semantic communities adapt topology to form cluster of peers sharing similar contents.

The rest of the paper is organized as follows. Section 2 discusses some related work. Section 3 presents the proposed algorithm for trust management. Section 4 introduces various metrics to measure performance of the proposed algorithm, and presents the simulation results. Finally, Section 5 concludes the paper while highlighting some future scope of work.

Related Work

In [3], a searching mechanism is proposed that is based on discovery of trust paths among the peers in a peer-to-peer network. A global trust model based on distance-weighted recommendations has been proposed in [4] to quantify and evaluate the peers in a P2P network. In [5], a protocol named adaptive peer-to-peer technologies (APT) for the formation of adaptive topologies has been proposed to reduce spurious file download and free riding in a peer-to-peer network. The scheme follows a defensive strategy for punishment where a peer equally punishes both malicious peers as well as neighbors through which it receives response from malicious peers. This strategy is relaxed in the reciprocal capacity-based adaptive topology protocol (RC-ATP), where a peer connects to others which have higher reciprocal capacity [6]. Reciprocal capacity is defined based on the capacity of providing good files and of recommending source of download of the peers. While RC-ATP provides better network connectivity than APT, it has a large overhead of topology adaptation.

There are some significant differences between the proposed algorithm in this paper and APT and RC-ATP. First, in the proposed scheme, to avoid network partitioning, the links in the original overlays are never deleted. Second, the robustness of the proposed protocol in presence of malicious peers is higher than that of APT and RC-ATP protocols. Third, as APT and RC-ATP both use flooding to locate resources, they have poor search scalability. The proposed scheme, on the other hand, exploits the advantages of the formation of semantic communities to improve the quality of service (QoS) of search. Finally, unlike APT and RC-ATP, the proposed scheme punishes malicious peers by blocking queries initiated from them.

The Proposed Trust-Aware Algorithm

In this section, we describe the factors that are taken into account for designing the proposed scheme, and then present the details of the scheme.

(1) Network topology and load: Using the approach described in [5] and [6], the network has been modeled as a power law graph. In a power law network, degree distribution of nodes follows power law distribution, i.e. fraction of nodes having degree L is L’k where k is a network dependent constant. Prior to each simulation cycle, a fixed fraction of peers chosen randomly is marked as malicious. As the algorithm proceeds, the peers adjust topology locally to connect those peers which have better chance to provide good files and drop malicious peers from their neighborhood. The network links are categorized into two types: connectivity link and community link. The connectivity links are the edges of the original power law network which provide seamless connectivity among the peers. To prevent the network from being fragmented they are never deleted. On the other hand, community links are added probabilistically between the peers who know each other. A community link may be deleted when perceived trustworthiness of a peer falls in the perception of its neighbors. A limit is put on the additional number of edges that a node can acquire to control bandwidth usage and query processing overhead in the network.

(2) Content distribution: The dynamics of a P2P network are highly dependent on the volume and variety of files each peer chooses to share. Hence a model reflecting real-world P2P networks is required. It has been observed that the peers are in general interested in a subset of the content on the P2P network [7]. Also, the peers are often interested only in files from a few content categories. Among these categories, some are more popular than others. It has been shown that Gnutella content distribution follows zipf distribution [8]. Keeping this in mind, both content categories and file popularity within each category is modeled with zipf distribution with a = 0.8.

Content distribution model: The content distribution model in [8] is followed for simulation purpose. In this model, each distinct file fc ,r is abstractly represented by the tuple (c, r), where c represents the content category to which the file belongs, and r represents its popularity rank within a content category c. Let content categories be C = [cj, c2,...,c32}. Each content category is characterized by its popularity rank. For example, if cj = 1, c2 = 2 and c3 = 3, then cj is more popular than c2 and hence it is more replicated than c2 and so on. Also there are more files in category cj than c2.

Table 1. Hypothetical content distribution in peer nodes

Peers

Content categories

Pi

{Ci, C2, C3}

P2

{C2, C4, Cf„ C7}

P3

{C2, C4, C7, C8}

P4

{Ci, C2}

P5

{Ci C5 C6}

Each peer randomly chooses three to six content categories to share files and shares more files in more popular categories. Table 1 shows an illustrative content distribution among five peers. The category cj is more replicated as it is most popular. The Peer 1(Pj) shares files in three categories: cj, c2, c3 where it shares maximum number of files in category cj, followed by category c2 and so on. On the other hand, Peer 3 (P3) shares maximum number of files in category c2 as it's the most popular among the categories chosen by it, followed by c4 and so on.

(3) Query initiation model: The peers usually query for files that exist on the network and are in the content category of their interest [8]. In each cycle of simulation, active peers issue queries. However number of queries a peer issues may vary from peer to peer, modeled by Poisson distribution as follows. If M is the total number of queries to be issued in each cycle of simulation and N is the number of peers present in the network, query ratetmp18-215_thumbis the mean of the Poisson process. The expressiontmp18-216_thumbgives the probability that a peer issues K queries in a cycle. The probability that a peer issues query for the file fc,r depends on the peer’s interest level in category c and rank r of the file within that category.

(4) Trust management engine: The trust management engine helps a peer to compute trust rating of other peer from past transaction history as well as recommendation from its neighbor. It allows a peer to join the network with default trust level and gradually build its reputation by providing good files to other peers. In the proposed scheme, each peer maintains a least recently used (LRU) data structure to keep track of recent transactions with almost 32 peers at a time. Each time peer i downloads a file from peer j, it rates the transaction as positive (trij=1) or negativetmp18-217_thumbi depending on whether downloaded file is authentic or fake.tmp18-218_thumbis the fraction of successful downloads peer i had made from peer j, where TD is the total number of downloads. Peer i considers peer j as trustworthy if Sij > 0.5, and malicious if Sij < 0. If 0 < Sv < 0.5, peer i considers peer j as average trustworthy. Peer i seeks recommendations from other peers about peer j if the information is not locally available.

The Proposed Trust-Aware Algorithm

The proposed scheme uses three steps for its operation: (i) search, (ii) trust checking and (iii) topology adaptation. These steps are discussed in the rest of this section.

Search

The time to live (TTL) bound search is used which evolves along with topology. At each hop, query is forwarded to a fraction of neighbors, the number of neighbors is decided based on the local estimate of network connectivity, known as probability of community formation, Probcom,. For node x, Probcom is defined as in (1):

tmp18-223_thumb

When Probcom for a node is low, the peer has the capacity to accept new community edges. Higher the value of Probcom, lesser the neighbors choose to disseminate queries. As simulation proceeds, connectivity of good nodes increases and reaches a maximum value. So, they focus on directing queries to appropriate community which may host the specific file rather than expanding communities. For example, if peer i can contact at most 10 neighbors and Probcom of j is 0.6, it forwards query to 10 x (1 – 0.6) = 4 neighbors only. The search strategy modifies itself from initial TTL limited BFS to directed DFS with the restructuring of the network. The search is carried out in two steps- query initiation and query forward. These are explained in the following.

Neighbor selection at P for query string (c2, f4). Community edges and connectivity edges are drawn with solid and dotted lines respectively. Nodes that dispatch query are shaded.

Fig. 1. Neighbor selection at P for query string (c2, f4). Community edges and connectivity edges are drawn with solid and dotted lines respectively. Nodes that dispatch query are shaded.

Query initiation: The initiating peer forms a query packet containing the name of the file (c, r) and forwards it to a certain fraction of neighbors along with the Probcom and the TTL values. The query is disseminated as follows. The neighbors are ranked based on both trustworthiness and the similarity of interest. Preference is given to the trusted neighbors who share similar contents. If there are insufficient community links, query is forwarded through connectivity links also. The various cases of neighbor selection rule are illustrated in Fig. 1. It is assumed that in each case only two neighbors are selected. When the query (c2, f4) reaches node P, following cases may occur. In first case, P has adequate community neighbors sharing file in category c2, hence they are chosen. In Case 2, there are insufficient community neighbors sharing file in the requested category, the community neighbors sharing c2 and c6 preferred to the connectivity neighbor c2 to forward query. In Case 3, only one community neighbor (c2) shares the file. Hence it is chosen. From the remaining connectivity neighbors, most trusted c6 is selected. In Case 4, only connectivity neighbor are present. Assuming all of them at the same trust level, the matching neighbor c2 is chosen and from the rest c5 is selected randomly. When a query reaches from peer j to i, the latter performs the query forward action, which is described in the following.

Query forward: (i) Check trust level of peer j: Peer i checks trust rating of peer j through check trust rating algorithm (explained later). Accordingly, the decision regarding further propagation of the query is taken. (ii) Check the availability of file: If the requested file is found, response is sent to peer j. If TTL value has not expired, the following steps are executed. (iii) Calculate the number of messages to be sent: It is calculated based on the value of Probcom.(iv) Choose neighbors: Neighbors are chosen in using neighbor selection rule. The search process is shown in Fig. 2. It is assumed that the query is forwarded at each hop to two neighbors. The matching community links are preferred over connectivity links to dispatch query.

The breadth first search (BFS) tree for the search procedure initiated by peer 1

Fig. 2. The breadth first search (BFS) tree for the search procedure initiated by peer 1

Peer 1 initiates a query and forwards it to two community neighbors 3 and 4. The query reaches peer 8 via peer 4. However, peer 8 knows that peer 4 is malicious from previous transactions. Hence it blocks the query. The query forwarded by peer 5 is also blocked by peer 10 and 11 as both of them know that peer 5 is malicious. The query is matched at four peers: 4, 6, 9 and 13. The search process is shown in Fig. 2.

Topology Adaptation: Responses are sorted by the initiating peer i based on the reputation of resource providers, and the peer having highest reputation is selected as source of download. The requesting peer checks the authenticity of downloaded file. If the file is found to be fake, peer i attempts to download from other sources. It updates the trust rating and possibly adapts topology. The restructuring of network is controlled by a parameter known as degree of rewiring which is the probability with which a link is formed between two peers. Topology adaptation consists of the following operations: (i) link deletion: Peer i deletes the existing community link with peer j if it finds peer j as malicious. (ii) link addition: Peer i probabilistically forms community link with peer j if resource is found to be authentic.

In the example shown in Fig. 3, peer 1 downloads the file from peer 4 and finds that the file is spurious. It reduces the trust score of peer 4 and deletes the community link 1–4. It then downloads the file from peer 6 and gets an authentic file. Peer 1 now sends a request to peer 6, and the latter grants the request and adds the community edge 1–6. The malicious peer 4 loses one community link and peer 6 gains one community edge. However, the network still remains connected by connectivity edges.

Topology adaptation on the search in Figure 2. Malicious nodes are shaded in gray.

Fig. 3. Topology adaptation on the search in Figure 2. Malicious nodes are shaded in gray. 

Performance Evaluation

To analyze the performance of the proposed scheme, three metrics are defined: attempt ratio, closeness centrality, and largest connected component. These metrics are discussed below and they are evaluated by simulation.

(1) Attempt ratio (AR): A peer keeps on downloading files from various sources based on their trust rating till it gets the authentic file. AR is the probability that the authentic file is downloaded in the first attempt. A high value of AR is desirable.

(2) Closeness centrality (CC): Since the topology adaptation brings the good peers closer to each other, the length of the shortest path between a pair of good peers decreases. The peers with higher CC values are topologically better positioned. If Pij is the length of the shortest path between i and j through community edges and if V denotes the set of peers, then CC for peer i is given by:

tmp18-227_thumb

(3) Largest connected component (LCC): The community edges form a trust-aware community overlay with the peers sharing similar contents. However, it will be highly probable that the trust-aware overly graph will be a disconnected graph. LCC is the largest connected component of this disconnected graph. LCC of the network can be taken as a measure of the goodness of the community structure.

A discrete time simulator written in C is used for simulation. In simulation, 6000 peer nodes, 18000 connectivity edges, 32 content categories are chosen. To make their detection difficult, the malicious peers occasionally provide authentic files with a probability known as the degree of deception. The value of this probability is taken as 0.1. The value of degree of rewiring is taken as 0.3. Due to formation of the semantic edges, the degrees of the peers are allowed to increase maximum to the extent of 30%. The TTL values for BFS and DFS are taken as 5s and 10 s respectively. Bara-basi-Alabert generator is used to generate initial power law graphs with 6000 nodes and 18000 edges. The number of searches per generation and the number of generations per cycle are taken as 5000 and 100 respectively.

To check the robustness of the algorithm against attack from malicious peers, the percentage of malicious peers is gradually increased. Fig. 4 illustrates the cost incurred by each type of peers (honest and malicious) to download authentic files. It is evident that with the increase in the percentage of malicious peers, cost for malicious peers to download authentic files decreases. The reverse is the case for honest peers.

AR vs. percentage of malicious nodes. In (a) 10%, in (b) 20% nodes are malicious.

Fig. 4. AR vs. percentage of malicious nodes. In (a) 10%, in (b) 20% nodes are malicious.

Fig. 5 depicts the size of LCC for each of the 32 content categories. It may be observed that the average size of LCC for all content categories remains the same even if the percentage of malicious peers increases. This shows that the community formation among the honest peers is not affected by the presence of malicious nodes.

Largest connected components for peers having different content categories

Fig. 5. Largest connected components for peers having different content categories 

Fig. 6 presents how the closeness centrality (CC) of good and malicious peers varies in the community topology. It may be observed that the steady state value of CC for honest peers is around 0.12, irrespective of the percentage of malicious peers in the network. However, for the malicious peers, the CC value is found to lie between 0.03 to 0.07. This implies that the malicious peers are effectively driven to the fringe of the network while the good peers are rewarded.

Closeness centrality for (a) 20% and (b) 40% Probcom malicious nodes in the network

Fig. 6. Closeness centrality for (a) 20% and (b) 40% Probcom malicious nodes in the network

 Avg. shortest path length vs generations of search at the step of ten for various percentages of malicious peers. In (a) 30% and in (b) 40% nodes are malicious.

Fig. 7. Avg. shortest path length vs generations of search at the step of ten for various percentages of malicious peers. In (a) 30% and in (b) 40% nodes are malicious.

Higher values of CC also indicate that good peers have smaller average shortest path length between them. In the simulation, the diameter of the initial network is taken as 5. At the end of one simulation run, if there is no path between a pair of peers using community edges, then the length of the shortest path between that pair is assumed to be arbitrarily long, say 15 (used in Fig. 7). The average shortest path distance (ASPD) decreases for both honest and malicious nodes. However, the rate and the extent of decrease for honest peers are much higher due to the formation of semantic communities. For malicious peers, after an initial fall, the value of ASPD increases consistently and finally reaches the maximum value of 15. On the other hand, the average value of ASPD for honest peers is observed to be around 6. Since the honest nodes are connected with shorter paths, the query propagations will also faster for these nodes. This justifies the smaller value of TTL used in the simulation.

Conclusion

In this paper, a mechanism is proposed that solves multiple problems in peer-to-peer network e.g., inauthentic download, and poor search scalability and combating free riders. It is shown that by topology adaptation, it is possible to isolate the malicious peers while providing topologically advantageous positions to the good peers so that good peers get faster and authentic responses to their queries. Simulation results have demonstrated that the protocol is robust in presence of a large percentage of malicious peers. Analysis of message overhead of the protocol constitutes a future plan of work.

Next post:

Previous post: