Рабочие листы
к вашим урокам
Скачать
1 слайд
1
CS 525
Advanced Distributed Systems
Spring 2011
Indranil Gupta (Indy)
Membership Protocols (and Failure Detectors)
March 31, 2011
All Slides © IG
2 слайд
2
Target Settings
Process ‘group’-based systems
Clouds/Datacenters
Replicated servers
Distributed databases
Crash-stop/Fail-stop process failures
3 слайд
3
Group Membership Service
Application Queries
e.g., gossip, overlays, DHT’s, etc.
Membership
Protocol
Group
Membership List
joins, leaves, failures
of members
Unreliable
Communication
Application Process pi
Membership List
4 слайд
4
Two sub-protocols
Dissemination
Failure Detector
Application Process pi
pj
Group
Membership List
Unreliable
Communication
Almost-Complete list (focus of this talk)
Gossip-style, SWIM, Virtual synchrony, …
Or Partial-random list (other papers)
SCAMP, T-MAN, Cyclon,…
5 слайд
5
Large Group: Scalability A Goal
this is us (pi)
Unreliable Communication
Network
1000’s of processes
Process Group
“Members”
6 слайд
6
pj
I pj crashed
Group Membership Protocol
Unreliable Communication
Network
pi
Some process
finds out quickly
Failure Detector
II
Dissemination
III
Crash-stop Failures only
7 слайд
7
I. pj crashes
Nothing we can do about it!
A frequent occurrence
Common case rather than exception
Frequency goes up at least linearly with size of datacenter
8 слайд
8
II. Distributed Failure Detectors: Desirable Properties
Completeness = each failure is detected
Accuracy = there is no mistaken detection
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
9 слайд
9
Distributed Failure Detectors: Properties
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
Impossible together in
lossy networks [Chandra
and Toueg]
If possible, then can
solve consensus!
10 слайд
10
What Real Failure Detectors Prefer
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
Guaranteed
Partial/Probabilistic
guarantee
11 слайд
11
Failure Detector Properties
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
Time until some
process detects the failure
Guaranteed
Partial/Probabilistic
guarantee
12 слайд
12
Failure Detector Properties
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
Time until some
process detects the failure
Guaranteed
Partial/Probabilistic
guarantee
No bottlenecks/single
failure point
13 слайд
13
Failure Detector Properties
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
In spite of
arbitrary simultaneous
process failures
14 слайд
14
Centralized Heartbeating
…
pi, Heartbeat Seq. l++
pi
Hotspot
pj
Heartbeats sent periodically
If heartbeat not received from pi within
timeout, mark pi as failed
15 слайд
15
Ring Heartbeating
pi, Heartbeat Seq. l++
Unpredictable on
simultaneous multiple failures
pi
…
…
pj
16 слайд
16
All-to-All Heartbeating
pi, Heartbeat Seq. l++
…
Equal load per member
pi
pj
17 слайд
17
Gossip-style Heartbeating
Array of
Heartbeat Seq. l
for member subset
Good accuracy properties
pi
18 слайд
18
Gossip-Style Failure Detection
1
2
4
3
Protocol:
Nodes periodically gossip their membership list
On receipt, the local membership list is updated
Current time : 70 at node 2
(asynchronous clocks)
Address
Heartbeat Counter
Time (local)
Fig and animation by: Dongyun Jin and Thuy Ngyuen
19 слайд
19
Gossip-Style Failure Detection
If the heartbeat has not increased for more than Tfail seconds,
the member is considered failed
And after Tcleanup seconds, it will delete the member from the list
Why two different timeouts?
20 слайд
20
Gossip-Style Failure Detection
What if an entry pointing to a failed node is deleted right after Tfail seconds?
Fix: remember for another Tfail
1
2
4
3
Current time : 75 at node 2
21 слайд
21
Multi-level Gossiping
Network topology is hierarchical
Random gossip target selection => core routers face O(N) load (Why?)
Fix: Select gossip target in subnet I, which contains ni nodes, with probability 1/ni
Router load=O(1)
Dissemination time=O(log(N))
Why?
What about latency for multi-level topologies?
[Gupta et al, TPDS 06]
Router
N/2 nodes in a subnet
N/2 nodes in a subnet
22 слайд
22
Analysis/Discussion
What happens if gossip period Tgossip is decreased?
A single heartbeat takes O(log(N)) time to propagate. So: N heartbeats take:
O(log(N)) time to propagate, if bandwidth allowed per node is allowed to be O(N)
O(N.log(N)) time to propagate, if bandwidth allowed per node is only O(1)
What about O(k) bandwidth?
What happens to Pmistake (false positive rate) as Tfail ,Tcleanup is increased?
Tradeoff: False positive rate vs. detection time
23 слайд
23
Simulations
As # members increases, the detection time increases
As requirement is loosened, the detection time decreases
As # failed members increases, the detection time increases significantly
The algorithm is resilient to message loss
24 слайд
24
Failure Detector Properties …
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
25 слайд
25
…Are application-defined Requirements
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
Guarantee always
Probability PM(T)
T time units
26 слайд
26
…Are application-defined Requirements
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
Guarantee always
Probability PM(T)
T time units
N*L: Compare this across protocols
27 слайд
27
All-to-All Heartbeating
pi, Heartbeat Seq. l++
…
pi
Every T units
L=N/T
28 слайд
28
Gossip-style Heartbeating
Array of
Heartbeat Seq. l
for member subset
pi
Every tg units
=gossip period,
send O(N) gossip
message
T=logN * tg
L=N/tg=N*logN/T
29 слайд
29
Worst case load L*
as a function of T, PM(T), N
Independent Message Loss probability pml
(proof in PODC 01 paper)
What’s the Best/Optimal we can do?
30 слайд
30
Heartbeating
Optimal L is independent of N (!)
All-to-all and gossip-based: sub-optimal
L=O(N/T)
try to achieve simultaneous detection at all processes
fail to distinguish Failure Detection and Dissemination components
Key:
Separate the two components
Use a non heartbeat-based Failure Detection Component
31 слайд
31
SWIM Failure Detector Protocol
Protocol period
= T’ time units
X
K random
processes
pi
ping
ack
ping-req
ack
random pj
X
ack
ping
random K
pj
32 слайд
32
SWIM versus Heartbeating
Process Load
First Detection
Time
Constant
Constant
O(N)
O(N)
SWIM
For Fixed :
False Positive Rate
Message Loss Rate
Heartbeating
Heartbeating
33 слайд
33
SWIM Failure Detector
34 слайд
34
Accuracy, Load
PM(T) is exponential in -K. Also depends on pml (and pf )
See paper
for up to 15 % loss rates
35 слайд
35
Prob. of being pinged in T’=
E[T ] =
Completeness: Any alive member detects failure
Eventually
By using a trick: within worst case O(N) protocol periods
Detection Time
36 слайд
36
pj crashed
III. Dissemination
Unreliable Communication
Network
pi
Dissemination
HOW ?
Failure Detector
Some process
finds out quickly
37 слайд
37
Dissemination Options
Multicast (Hardware / IP)
unreliable
multiple simultaneous multicasts
Point-to-point (TCP / UDP)
expensive
Zero extra messages: Piggyback on Failure Detector messages
Infection-style Dissemination
38 слайд
38
Infection-style Dissemination
Protocol period
= T time units
X
pi
ping
ack
ping-req
ack
random pj
X
ack
ping
random K
pj
Piggybacked
membership
information
K random
processes
39 слайд
39
Infection-style Dissemination
Epidemic style dissemination
After protocol periods, processes would not have heard about an update
Maintain a buffer of recently joined/evicted processes
Piggyback from this buffer
Prefer recent updates
Buffer elements are garbage collected after a while
After protocol periods; this defines weak consistency
40 слайд
40
Suspicion Mechanism
False detections, due to
Perturbed processes
Packet losses, e.g., from congestion
Indirect pinging may not solve the problem
e.g., correlated message losses near pinged host
Key: suspect a process before declaring it as failed in the group
41 слайд
41
Suspicion Mechanism
Alive
Suspected
Failed
Dissmn (Suspect pj)
Dissmn (Alive pj)
Dissmn (Failed pj)
pi :: State Machine for pj view element
FD:: pi ping failed
Dissmn::(Suspect pj)
Time out
FD::pi ping success
Dissmn::(Alive pj)
Dissmn
FD
pi
42 слайд
42
Suspicion Mechanism
Distinguish multiple suspicions of a process
Per-process incarnation number
Inc # for pi can be incremented only by pi
e.g., when it receives a (Suspect, pi) message
Somewhat similar to DSDV
Higher inc# notifications over-ride lower inc#’s
Within an inc#: (Suspect inc #) > (Alive, inc #)
Nothing overrides a (Failed, inc #)
See paper
43 слайд
43
Time-bounded Completeness
Key: select each membership element once as a ping target in a traversal
Round-robin pinging
Random permutation of list after each traversal
Each failure is detected in worst case 2N-1 (local) protocol periods
Preserves FD properties
44 слайд
44
Results from an Implementation
Current implementation
Win2K, uses Winsock 2
Uses only UDP messaging
900 semicolons of code (including testing)
Experimental platform
Galaxy cluster: diverse collection of commodity PCs
100 Mbps Ethernet
Default protocol settings
Protocol period=2 s; K=1; G.C. and Suspicion timeouts=3*ceil[log(N+1)]
No partial membership lists observed in experiments
45 слайд
45
Per-process Send and Receive Loads
are independent of group size
46 слайд
46
Time to First Detection of a process failure
T1
T1+T2+T3
47 слайд
47
T1
Time to First Detection of a process failure
apparently uncorrelated to group size
T1+T2+T3
48 слайд
48
Membership Update Dissemination Time
is low at high group sizes
T2
+
T1+T2+T3
49 слайд
49
Excess time taken by
Suspicion Mechanism
T3
+
T1+T2+T3
50 слайд
50
Benefit of Suspicion Mechanism:
Per-process 10% synthetic packet loss
51 слайд
51
More discussion points
It turns out that with a partial membership list that is uniformly random, gossiping retains same properties as with complete membership lists
Why? (Think of the equation)
Partial membership protocols
SCAMP, Cyclon, TMAN, …
Gossip-style failure detection underlies
Astrolabe
Amazon EC2/S3 (rumored!)
SWIM used in
CoralCDN/Oasis anycast service: http://oasis.coralcdn.org
Mike Freedman used suspicion mechanism to blackmark frequently-failing nodes
52 слайд
Reminder – Due this Sunday April 3rd at 11.59 PM
Project Midterm Report due, 11.59 pm [12pt font, single-sided, 8 + 1 page Business Plan max]
Wiki Term Paper - Second Draft Due (Individual)
Reviews – you only have to submit reviews for 15 sessions (any 15 sessions) from 2/10 to 4/28. Keep track of your count! Take a breather!
53 слайд
53
Questions
Рабочие листы
к вашим урокам
Скачать
6 666 268 материалов в базе
Настоящий материал опубликован пользователем Кузнецова Ирина Николаевна. Инфоурок является информационным посредником и предоставляет пользователям возможность размещать на сайте методические материалы. Всю ответственность за опубликованные материалы, содержащиеся в них сведения, а также за соблюдение авторских прав несут пользователи, загрузившие материал на сайт
Если Вы считаете, что материал нарушает авторские права либо по каким-то другим причинам должен быть удален с сайта, Вы можете оставить жалобу на материал.
Удалить материалВаша скидка на курсы
40%Курс профессиональной переподготовки
500/1000 ч.
Курс профессиональной переподготовки
600 ч.
Курс профессиональной переподготовки
300/600 ч.
Курс повышения квалификации
72/180 ч.
Оставьте свой комментарий
Авторизуйтесь, чтобы задавать вопросы.