Scalable P2P architectures
Oscar Boykin
Electrical Engineering, UCLA
Joint work with:
Jesse Bridgewater, Joseph Kong, Kamen Lozev, Behnam Rezaei, Vwani
Roychowdhury, Nima Sarshar
Outline
* Introduction to P2P models: DHT and
Unstructured Query Systems
* Routing Packets on a “Small World”
* Properties of real P2P systems (e.g. Gnutella).
* A model for Power-law graphs
* Percolating Messages on a Graph
* Design of a new P2P system: Brunet
DHT Systems
* If each node does not have a pointer to every
other node, routing schemes are introduced.
* Each node knows about k other nodes.
* All queries are routed through these k nodes.
* The query should be resolved in the fewest
number of hops.
* Most academic work has focused on DHT
systems.
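The bullets above can be made concrete with a small sketch. It assumes, purely for illustration, that the nodes sit on a binary hypercube, so each node knows k = (number of address bits) other nodes and a query is resolved by correcting one address bit per hop. The function name `route` is hypothetical and is not taken from any particular DHT.

```python
def route(source: str, dest: str) -> list[str]:
    """Return the hop-by-hop path from source to dest on a binary hypercube.

    Each node is a bit string; its neighbors differ from it in exactly one
    bit. At every hop we correct the leftmost bit that still differs from
    the destination, so the path length equals the number of differing bits.
    """
    if len(source) != len(dest):
        raise ValueError("addresses must have the same length")
    path = [source]
    current = list(source)
    for i, bit in enumerate(dest):
        if current[i] != bit:
            current[i] = bit  # hop to the neighbor that fixes bit i
            path.append("".join(current))
    return path


if __name__ == "__main__":
    for start in ("000", "010", "011"):
        print(" > ".join(route(start, "101")))
```

With n nodes the addresses have log2(n) bits, so in this sketch both the number of neighbors per node and the worst-case hop count are log2(n).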
Hyperspace Routing (Pastry and Tapestry)
[Figure: a 3-bit hypercube with nodes 000, 001, 010, 011, 100, 101, 110, 111]
Examples:
* Routing to 101 starting at 000: 000 > 100 > 101
* Routing to 101 starting at 010: 010 > 110 > 100 > 101
* Routing to 101 starting at 011: 011 > 111 > 101
* Messages are routed by matching the prefix of the destination to the current node, and sending to the neighbor which matches the next element.
* Nodes need O(M (log n)/log M) neighbors for an alphabet of size M, which gives O(log n / log M) distance.
Distance Based Routing
* A distance metric is defined on the key space.
* Nodes are connected to their nearest neighbors in the space and usually to remote nodes.
* Messages are routed to the neighbor which is closest to the destination.
* Examples:

System     Space                 Latency        Connections
CAN        M-dimensional torus   M N^{1/M}      Neighbors: M
Chord      Ring                  log N          Neighbors exponentially increasing: log N
Symphony   Ring                  (log^2 N)/k    Neighbors and k remote
Viceroy    log N stacked rings   log N          Neighbors

A Routable Small World
[Figure: nodes on a ring; the red nodes have connections to distance L with P(L) ~ 1/L]
How can we show it is routable?
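One way to build intuition for the question above is a small simulation. The sketch below assumes a ring of n nodes where every node keeps its two ring neighbors plus a few shortcuts whose length L is drawn with probability proportional to 1/L, and it routes greedily by always forwarding to the neighbor closest to the destination. The function names and parameters (`build_small_world`, `shortcuts_per_node`, and so on) are hypothetical choices for this illustration.

```python
import random


def ring_distance(a: int, b: int, n: int) -> int:
    """Shortest distance between positions a and b on a ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)


def build_small_world(n: int, shortcuts_per_node: int, rng: random.Random):
    """Ring of n nodes; each node also gets shortcuts with P(L) ~ 1/L."""
    lengths = list(range(1, n // 2 + 1))
    weights = [1.0 / length for length in lengths]
    links = {}
    for u in range(n):
        neighbors = {(u - 1) % n, (u + 1) % n}  # structured ring links
        for length in rng.choices(lengths, weights=weights, k=shortcuts_per_node):
            direction = rng.choice((-1, 1))
            neighbors.add((u + direction * length) % n)
        links[u] = neighbors
    return links


def greedy_route(links, src: int, dst: int, n: int) -> int:
    """Forward to the neighbor closest to dst; return the number of hops.

    A ring neighbor is always one step closer, so the loop terminates.
    """
    hops = 0
    current = src
    while current != dst:
        current = min(links[current], key=lambda v: ring_distance(v, dst, n))
        hops += 1
    return hops


if __name__ == "__main__":
    rng = random.Random(42)
    n = 4096
    links = build_small_world(n, shortcuts_per_node=3, rng=rng)
    trials = [(rng.randrange(n), rng.randrange(n)) for _ in range(500)]
    mean_hops = sum(greedy_route(links, s, t, n) for s, t in trials) / len(trials)
    print(f"n={n}: mean greedy hops = {mean_hops:.1f}")
```

Running it for several values of n and comparing the mean hop count against log^2 n is a quick empirical check of routability, not a proof.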
Greedy Routing Works
* The probability of a connection going a distance d:
  - $P(d) = \frac{1}{d \log N}$
* With the source at distance d from the destination, what is the probability that a connection takes us to a distance less than $\epsilon d$, for a fixed fraction $0 < \epsilon < 1$? Such a connection has length between $(1-\epsilon)d$ and $d$:
  $P_\epsilon = \int_{(1-\epsilon)d}^{d} \frac{dx}{x \log N} = \frac{-\log(1-\epsilon)}{\log N}$
Greedy Routing Works
* How many such connections (L) are needed to get close, that is, within distance log N:
  $\epsilon^L d = \log N$
  $L = \frac{\log(\log N / d)}{\log \epsilon}$
* How many nodes (M) do we need to visit to get lucky L times:
  $M P_\epsilon = L$
  $M = L \div \frac{-\log(1-\epsilon)}{\log N}$
  $M = \frac{\log N \, (\log d - \log\log N)}{\log(1-\epsilon) \, \log \epsilon}$
Since we must be prepared for d = N, then: M = O(log^2 N)
Broadcast Query Systems
* In a broadcast query system, each node has some records. To query the network, the node sends a query to ALL neighbors.
* Each query has an identifying number; responses are routed back the way the query came.
* To query the entire system, a query will need to cross all edges (E), thus query cost is O(E), and E >= N - 1 for all connected networks.
How do we make scalable query systems?
* Gnutella is a popular protocol for file sharing which uses the unstructured query model.
* To attempt to solve the scalability problems, they introduced “UltraPeers”, which are nodes that keep copies of all the records of their “LeafPeers”.
* Now each query costs O(U), if U is the number of UltraPeers. But if U is a constant fraction of N, then query costs are still O(N); only the constant has changed.
Can we do better if we take advantage of network structure?
Scale Free Networks
* Many large networks with interacting nodes are so-called “scale free” networks, or power-law networks.
* Many mechanisms have been suggested which can account for such degree distributions.
* Power-law distributions are called scale free because of the following feature: rescaling k by a constant factor a leaves the form of the distribution unchanged.
  $P(k) = c \, k^{-\gamma} \propto 1/k^{\gamma}$
  $P(a k) = c \, (a k)^{-\gamma} = (c / a^{\gamma}) \, k^{-\gamma} \propto 1/k^{\gamma}$
Preferential Attachment
* A simple model which gives rise to a power-law degree distribution was proposed by Barabási and Albert (1999).
* At each time step, a new node joins and selects a node to connect to. The target node is selected with a probability proportional to its degree. With $n_k$ the fraction of nodes with degree k, the probability we select a node of degree k:
  $q_k = \frac{k \, n_k}{2}$
* Assuming a steady state solution, we want to write a difference equation for the fraction of nodes with degree k:
  $n_k = q_{k-1} - q_k + \delta_{k,1}$
  $n_k = \frac{(k-1) \, n_{k-1} - k \, n_k}{2} + \delta_{k,1}$
  $(k+2) \, n_k = (k-1) \, n_{k-1} + 2 \delta_{k,1}$
  $n_k = \frac{4}{k (k+1) (k+2)} \propto 1/k^3$
(Bond) Percolation Problem:
* If we have a graph and we delete each edge with
probability (1-p), as a function of p, what is the
size of the largest connected component?
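The question above can be explored numerically. The following is a minimal sketch, assuming a graph given as a list of undirected edges over nodes 0..n-1: each edge is kept with probability p, the surviving edges are merged with a union-find structure, and the fraction of nodes in the largest component is returned. The helper `random_graph_edges` is a hypothetical stand-in that produces a uniformly random edge list for experimentation.

```python
import random


def largest_component_fraction(n, edges, p, rng):
    """Keep each edge with probability p; return largest component size / n."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if rng.random() < p:  # the edge survives with probability p
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru  # union by size
                size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n


def random_graph_edges(n, m, rng):
    """Hypothetical test input: m random edges between distinct nodes."""
    edges = []
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.append((u, v))
    return edges


if __name__ == "__main__":
    rng = random.Random(7)
    n = 5000
    edges = random_graph_edges(n, 2 * n, rng)  # mean degree about 4
    for p in (0.1, 0.2, 0.3, 0.5, 1.0):
        frac = largest_component_fraction(n, edges, p, rng)
        print(f"p={p:.1f}: largest component holds {frac:.3f} of the nodes")
```

Sweeping p in this way shows the largest component staying tiny below some threshold and then growing to a constant fraction of the graph above it.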
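Bond Percolation on Random Graphs (with generating functions)
* Suppose we have a random graph with a constrained degree distribution: p(k). Each node has a degree selected according to this distribution, but its edges are randomly connected.
* We use a generating function to represent this distribution:
  $P(x) = \sum_k x^k p_k$
* If the random variable Z is the sum of m independent random variables, each with generating function P(x), Z = K_1 + K_2 + ... + K_m, then the generating function of Z is the product:
  $Q(x) = \sum_z x^z p_z = \prod_{i=1}^{m} P(x) = [P(x)]^m$
We can put this together to compute expected cluster sizes!
* The mean is the first derivative at x=1:
  $P'(x) = \sum_k k \, x^{k-1} p_k$
  $P'(1) = \sum_k k \, p_k = \langle k \rangle$
Percolation Thresholds for Example Graphs
In each case the threshold is $q_c = \frac{\langle k \rangle}{\langle k^2 \rangle - \langle k \rangle}$.
* $p_k = \frac{1}{\zeta(3) \, k^3}$: $\langle k \rangle = \zeta(2)/\zeta(3) \approx 1.37$, $\langle k^2 \rangle = \zeta(1)/\zeta(3) = \infty$, so $q_c = 0$.
* $p_k = \frac{1}{\zeta(4) \, k^4}$: $\langle k \rangle = \zeta(3)/\zeta(4) \approx 1.11$, $\langle k^2 \rangle = \zeta(2)/\zeta(4) \approx 1.52$, so $q_c \approx 2.71$ (greater than 1, so no p <= 1 percolates).
* $p_k = \frac{1}{\zeta(3.5) \, k^{3.5}}$: $\langle k \rangle = \zeta(2.5)/\zeta(3.5) \approx 1.19$, $\langle k^2 \rangle = \zeta(1.5)/\zeta(3.5) \approx 2.32$, so $q_c \approx 1.06$ (again greater than 1).
* $p_k = (\lambda - 1) \, \lambda^{-k}$: $\langle k \rangle = \frac{\lambda}{\lambda - 1}$, $\langle k^2 \rangle = \frac{\lambda (1 + \lambda)}{(\lambda - 1)^2}$, so $q_c = \frac{\lambda - 1}{2}$.
What does this mean? We can predict how many edges need to pass a packet to reach a constant fraction of the nodes!
Percolation in P2P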
(due to Nima Sarshar)
* With probability p we send the query to each
neighbor.
* Each node that gets the query responds with any
matches, and sends the query to each of its
neighbors with probability p.
* How small can p be?
* It must be bigger than q_c!
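The forwarding rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the protocol implementation: the graph is an adjacency dictionary, each node forwards a query only the first time it receives it, and `critical_probability` estimates q_c from a degree list using the threshold formula q_c = <k> / (<k^2> - <k>). Both function names are hypothetical.

```python
import random
from collections import deque


def critical_probability(degrees):
    """Estimate q_c = <k> / (<k^2> - <k>) from a list of node degrees.

    Raises ZeroDivisionError when every node has degree 1, since then
    <k^2> equals <k>.
    """
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k / (mean_k2 - mean_k)


def percolation_query(adj, source, p, rng):
    """Probabilistic flood: each reached node sends to each neighbor w.p. p.

    Returns the set of nodes that saw the query and the number of messages
    sent (copies that arrive at an already reached node are still counted).
    """
    reached = {source}
    queue = deque([source])
    messages = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if rng.random() < p:
                messages += 1
                if v not in reached:
                    reached.add(v)
                    queue.append(v)
    return reached, messages


if __name__ == "__main__":
    # A small star graph: hub 0 connected to leaves 1..4.
    adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    degrees = [len(neighbors) for neighbors in adj.values()]
    print("estimated q_c:", critical_probability(degrees))
    reached, messages = percolation_query(adj, 1, 0.8, random.Random(3))
    print("nodes reached:", sorted(reached), "messages sent:", messages)
```

On a large graph, comparing `messages` with the total number of edges for values of p just above the estimated q_c gives a feel for how much cheaper the percolation query is than a full broadcast.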
Getting Polylog Scaling in Unstructured Query Systems
* Assume we have a random network of N nodes, and a degree distribution ~ 1/k^2. There is a maximum degree k_max (which is O(N)).
* We can get such a network using the protocol from Sarshar, Roychowdhury (PRE 2004).
* What is the cost of a percolation query at the threshold?
  $C = q_c E = q_c \frac{\langle k \rangle N}{2}$
  $p_k = A/k^2$, $\langle k \rangle \approx A \log k_{max}$, $\langle k^2 \rangle \approx A \, k_{max}$
  $q_c = \frac{\log k_{max}}{k_{max} - \log k_{max}}$
  Dropping the constant factor A/2:
  $C = \frac{\log k_{max}}{k_{max} - \log k_{max}} \, N \log k_{max}$
  $C = \frac{\log^2 k_{max}}{k_{max}/N - \log k_{max}/N}$
  $k_{max} = O(N) \Rightarrow C \approx \log^2 N$
Hence we get only O(log^2 N) cost for each query!
Simulation Results
[Figure: Performance of Percolation Search on CRAWLS Network with TTL=15. Curves: Hit Rate, Fraction of Edges Used, Fraction of Nodes Used. Horizontal axis: Percolation Probability.]
* A percolation search protocol on a Gnutella network of size
39,730. The network structure was obtained by Limewire's
Crawler
Brunet: A Hybrid P2P System
* DHTs cannot resolve general queries.
* Unstructured systems (usually) require large routing tables to return query hits.
* Brunet is a new P2P protocol which combines the advantages of both DHTs and unstructured power-law networks.
* Brunet offers a general P2P foundation on which a wide variety of protocols and applications can build.
Brunet: A Hybrid P2P System
* Each node has a 160 bit address which can also be thought of as a 160 bit
positive integer. A distance metric is defined using the integer representation.
* Each node is situated on a routable small world ring with “structured”
connections to its neighbors on the ring, and shortcuts to remote locations.
* Each node also is on an “unstructured” network and has “unstructured”
connections to nodes on that power-law network.
[Figure: Structured Subgraph (small world); Unstructured Subgraph (1/k^2)]
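The addressing scheme described above can be illustrated with a short sketch. The slides do not give Brunet's exact distance metric or how addresses are assigned, so everything here is an assumption made for illustration: addresses are derived from a SHA-1 hash (which happens to be 160 bits), distance is the shorter way around a ring of size 2^160, and the greedy step picks the known neighbor closest to the target. All names are hypothetical.

```python
import hashlib

ADDRESS_BITS = 160
RING_SIZE = 1 << ADDRESS_BITS  # number of points on the address ring


def address_from_name(name: str) -> int:
    """Assumed address assignment: the SHA-1 digest read as a 160 bit integer."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    """Assumed metric: the shorter way around the ring between two addresses."""
    d = abs(a - b) % RING_SIZE
    return min(d, RING_SIZE - d)


def closest_neighbor(neighbors, target: int) -> int:
    """Greedy step: pick the known neighbor nearest to the target address."""
    return min(neighbors, key=lambda addr: ring_distance(addr, target))


if __name__ == "__main__":
    me = address_from_name("node-a")
    known = [address_from_name(f"node-{c}") for c in "bcdef"]
    target = address_from_name("some-key")
    next_hop = closest_neighbor(known, target)
    print("forwarding makes progress:",
          ring_distance(next_hop, target) < ring_distance(me, target))
```

A node would forward a packet only when the chosen neighbor is strictly closer to the target than the node itself; otherwise it is the closest known node and handles the packet locally.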
Brunet Implementation
* The first implementation of the Brunet protocol is being completed at UCLA's Complex Networks Group.
* The code is developed using GNU/Linux and the Mono C# development environment.
* In addition to a programming library which implements the Brunet protocol, we have developed other tools:
  - Netmodeler: a general C++ network modeling package
  - Brunet Verifier: a protocol debugger for Brunet implementations
Open Problems
* Can the DHT or unstructured systems be used to build an improved model of distributed computing (e.g. how can these P2P models help in mapping task graphs onto resources)?
* What common primitives can be implemented using P2P systems? (e.g. what kinds of communications costs are incurred building a P2P Database?)
* What results can be obtained about protocol security? Can bad nodes ruin the network?
Summary
* Using models inspired from social contexts (such as small world and power-law networks) we see how some computer networking systems and architectures can be improved.
* Statistical Mechanics tools (percolation) allow us to analyze some novel networking conditions.
* By engineering previously ignored structural details of P2P systems, polylog scaling is achieved.
* The Brunet P2P system puts the DHT model together with the percolation search to get state of the art scaling properties.