Advanced Distributed

Скачать материал

Скачать материал "Advanced Distributed"

Описание презентации по отдельным слайдам:

1 слайд

1
CS 525
Advanced Distributed Systems
Spring 2011
Indranil Gupta (Indy)

Membership Protocols (and Failure Detectors)
March 31, 2011
All Slides © IG
2 слайд

2
Target Settings
Process ‘group’-based systems
Clouds/Datacenters
Replicated servers
Distributed databases

Crash-stop/Fail-stop process failures
3 слайд

3
Group Membership Service
Application Queries
e.g., gossip, overlays, DHT’s, etc.
Membership
Protocol
Group
Membership List
joins, leaves, failures
of members
Unreliable
Communication
Application Process pi
Membership List
4 слайд

4
Two sub-protocols
Dissemination
Failure Detector
Application Process pi
pj
Group
Membership List
Unreliable
Communication
Almost-Complete list (focus of this talk)
Gossip-style, SWIM, Virtual synchrony, …
Or Partial-random list (other papers)
SCAMP, T-MAN, Cyclon,…
5 слайд

5
Large Group: Scalability A Goal
this is us (pi)
Unreliable Communication
Network
1000’s of processes
Process Group
“Members”
6 слайд

6
pj
I pj crashed
Group Membership Protocol
Unreliable Communication
Network
pi
Some process
finds out quickly
Failure Detector
II
Dissemination
III
Crash-stop Failures only
7 слайд

7
I. pj crashes
Nothing we can do about it!
A frequent occurrence
Common case rather than exception
Frequency goes up at least linearly with size of datacenter
8 слайд

8
II. Distributed Failure Detectors: Desirable Properties
Completeness = each failure is detected
Accuracy = there is no mistaken detection
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
9 слайд

9
Distributed Failure Detectors: Properties
Completeness
Accuracy

Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load

Impossible together in
lossy networks [Chandra
and Toueg]

If possible, then can
solve consensus!
10 слайд

10
What Real Failure Detectors Prefer
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
Guaranteed
Partial/Probabilistic
guarantee
11 слайд

11
Failure Detector Properties
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load

Time until some
process detects the failure
Guaranteed
Partial/Probabilistic
guarantee
12 слайд

12
Failure Detector Properties
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load

Time until some
process detects the failure
Guaranteed
Partial/Probabilistic
guarantee
No bottlenecks/single
failure point
13 слайд

13
Failure Detector Properties
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load

In spite of
arbitrary simultaneous
process failures
14 слайд

14
Centralized Heartbeating
…
pi, Heartbeat Seq. l++
pi
 Hotspot
pj
Heartbeats sent periodically
If heartbeat not received from pi within
timeout, mark pi as failed
15 слайд

15
Ring Heartbeating
pi, Heartbeat Seq. l++
 Unpredictable on
simultaneous multiple failures
pi
…
…
pj
16 слайд

16
All-to-All Heartbeating
pi, Heartbeat Seq. l++
…
 Equal load per member

pi
pj
17 слайд

17
Gossip-style Heartbeating
Array of
Heartbeat Seq. l
for member subset
 Good accuracy properties
pi
18 слайд

18
Gossip-Style Failure Detection
1
2
4
3
Protocol:
Nodes periodically gossip their membership list
On receipt, the local membership list is updated
Current time : 70 at node 2
(asynchronous clocks)
Address
Heartbeat Counter
Time (local)
Fig and animation by: Dongyun Jin and Thuy Ngyuen
19 слайд

19
Gossip-Style Failure Detection
If the heartbeat has not increased for more than Tfail seconds,
the member is considered failed
And after Tcleanup seconds, it will delete the member from the list
Why two different timeouts?
20 слайд

20
Gossip-Style Failure Detection
What if an entry pointing to a failed node is deleted right after Tfail seconds?

Fix: remember for another Tfail
1
2
4
3
Current time : 75 at node 2
21 слайд

21
Multi-level Gossiping
Network topology is hierarchical
Random gossip target selection => core routers face O(N) load (Why?)

Fix: Select gossip target in subnet I, which contains ni nodes, with probability 1/ni
Router load=O(1)
Dissemination time=O(log(N))
Why?
What about latency for multi-level topologies?
[Gupta et al, TPDS 06]

Router
N/2 nodes in a subnet
N/2 nodes in a subnet
22 слайд

22
Analysis/Discussion
What happens if gossip period Tgossip is decreased?
A single heartbeat takes O(log(N)) time to propagate. So: N heartbeats take:
O(log(N)) time to propagate, if bandwidth allowed per node is allowed to be O(N)
O(N.log(N)) time to propagate, if bandwidth allowed per node is only O(1)
What about O(k) bandwidth?
What happens to Pmistake (false positive rate) as Tfail ,Tcleanup is increased?
Tradeoff: False positive rate vs. detection time
23 слайд

23
Simulations
As # members increases, the detection time increases
As requirement is loosened, the detection time decreases
As # failed members increases, the detection time increases significantly
The algorithm is resilient to message loss
24 слайд

24
Failure Detector Properties …
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load
25 слайд

25
…Are application-defined Requirements
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load

Guarantee always
Probability PM(T)
T time units
26 слайд

26
…Are application-defined Requirements
Completeness
Accuracy
Speed
Time to first detection of a failure
Scale
Equal Load on each member
Network Message Load

Guarantee always
Probability PM(T)
T time units
N*L: Compare this across protocols
27 слайд

27
All-to-All Heartbeating
pi, Heartbeat Seq. l++
…
pi
Every T units
L=N/T
28 слайд

28
Gossip-style Heartbeating
Array of
Heartbeat Seq. l
for member subset
pi
Every tg units
=gossip period,
send O(N) gossip
message
T=logN * tg
L=N/tg=N*logN/T
29 слайд

29

Worst case load L*
as a function of T, PM(T), N
Independent Message Loss probability pml

(proof in PODC 01 paper)

What’s the Best/Optimal we can do?
30 слайд

30
Heartbeating
Optimal L is independent of N (!)
All-to-all and gossip-based: sub-optimal
L=O(N/T)
try to achieve simultaneous detection at all processes
fail to distinguish Failure Detection and Dissemination components

Key:
Separate the two components
Use a non heartbeat-based Failure Detection Component
31 слайд

31
SWIM Failure Detector Protocol
Protocol period
= T’ time units
X
K random
processes
pi
ping
ack
ping-req
ack
random pj
X
ack
ping
random K
pj
32 слайд

32
SWIM versus Heartbeating
Process Load
First Detection
Time
Constant
Constant
O(N)
O(N)
SWIM
For Fixed :
False Positive Rate
Message Loss Rate
Heartbeating
Heartbeating
33 слайд

33
SWIM Failure Detector
34 слайд

34
Accuracy, Load

PM(T) is exponential in -K. Also depends on pml (and pf )
See paper

for up to 15 % loss rates
35 слайд

35
Prob. of being pinged in T’=

E[T ] =

Completeness: Any alive member detects failure
Eventually
By using a trick: within worst case O(N) protocol periods

Detection Time
36 слайд

36
pj crashed
III. Dissemination
Unreliable Communication
Network
pi
Dissemination
HOW ?
Failure Detector
Some process
finds out quickly
37 слайд

37
Dissemination Options
Multicast (Hardware / IP)
unreliable
multiple simultaneous multicasts
Point-to-point (TCP / UDP)
expensive
Zero extra messages: Piggyback on Failure Detector messages
Infection-style Dissemination
38 слайд

38
Infection-style Dissemination
Protocol period
= T time units
X
pi
ping
ack
ping-req
ack
random pj
X
ack
ping
random K
pj
Piggybacked
membership
information
K random
processes
39 слайд

39
Infection-style Dissemination
Epidemic style dissemination
After protocol periods, processes would not have heard about an update
Maintain a buffer of recently joined/evicted processes
Piggyback from this buffer
Prefer recent updates
Buffer elements are garbage collected after a while
After protocol periods; this defines weak consistency
40 слайд

40
Suspicion Mechanism
False detections, due to
Perturbed processes
Packet losses, e.g., from congestion
Indirect pinging may not solve the problem
e.g., correlated message losses near pinged host
Key: suspect a process before declaring it as failed in the group
41 слайд

41
Suspicion Mechanism
Alive
Suspected
Failed
Dissmn (Suspect pj)
Dissmn (Alive pj)
Dissmn (Failed pj)
pi :: State Machine for pj view element
FD:: pi ping failed
Dissmn::(Suspect pj)
Time out
FD::pi ping success
Dissmn::(Alive pj)
Dissmn
FD
pi
42 слайд

42
Suspicion Mechanism
Distinguish multiple suspicions of a process
Per-process incarnation number
Inc # for pi can be incremented only by pi
e.g., when it receives a (Suspect, pi) message
Somewhat similar to DSDV
Higher inc# notifications over-ride lower inc#’s
Within an inc#: (Suspect inc #) > (Alive, inc #)
Nothing overrides a (Failed, inc #)
See paper
43 слайд

43
Time-bounded Completeness
Key: select each membership element once as a ping target in a traversal
Round-robin pinging
Random permutation of list after each traversal
Each failure is detected in worst case 2N-1 (local) protocol periods
Preserves FD properties
44 слайд

44
Results from an Implementation
Current implementation
Win2K, uses Winsock 2
Uses only UDP messaging
900 semicolons of code (including testing)
Experimental platform
Galaxy cluster: diverse collection of commodity PCs
100 Mbps Ethernet
Default protocol settings
Protocol period=2 s; K=1; G.C. and Suspicion timeouts=3*ceil[log(N+1)]
No partial membership lists observed in experiments
45 слайд

45
Per-process Send and Receive Loads
are independent of group size
46 слайд

46
Time to First Detection of a process failure

T1
T1+T2+T3
47 слайд

47
T1
Time to First Detection of a process failure
apparently uncorrelated to group size
T1+T2+T3
48 слайд

48
Membership Update Dissemination Time
is low at high group sizes
T2
+
T1+T2+T3
49 слайд

49
Excess time taken by
Suspicion Mechanism
T3
+
T1+T2+T3
50 слайд

50
Benefit of Suspicion Mechanism:
Per-process 10% synthetic packet loss
51 слайд

51
More discussion points
It turns out that with a partial membership list that is uniformly random, gossiping retains same properties as with complete membership lists
Why? (Think of the equation)
Partial membership protocols
SCAMP, Cyclon, TMAN, …
Gossip-style failure detection underlies
Astrolabe
Amazon EC2/S3 (rumored!)
SWIM used in
CoralCDN/Oasis anycast service: http://oasis.coralcdn.org
Mike Freedman used suspicion mechanism to blackmark frequently-failing nodes
52 слайд

Reminder – Due this Sunday April 3rd at 11.59 PM
Project Midterm Report due, 11.59 pm [12pt font, single-sided, 8 + 1 page Business Plan max]
Wiki Term Paper - Second Draft Due (Individual)

Reviews – you only have to submit reviews for 15 sessions (any 15 sessions) from 2/10 to 4/28. Keep track of your count! Take a breather!
53 слайд

53
Questions

Скачать материал

Найдите материал к любому уроку, указав свой предмет (категорию), класс, учебник и тему:

6 666 268 материалов в базе

Найти материалы

Скачать материал

Другие материалы

pptx

Монополия

17.12.2020
874
0

pptx

УСТОЙЧИВОСТЬ ЛИОФОБНЫХ ДИСПЕРСНЫХ СИСТЕМ

11.12.2020
978
8

pptx

Древневосточная философия

07.12.2020
1090
2

pptx

КИБЕРНЕТИКА И ИНФОРМАТИКА

25.10.2020
500
0

pptx

АДСОРБЦИЯ ИЗ РАСТВОРОВ НА РАЗНЫХ поверхностях фаз

15.10.2020
762
1

pptx

Political Science

16.07.2020
405
5

pptx

ТЕРМОДИНАМИЧЕСКИ УСТОЙЧИВЫЕ (ЛИОФИЛЬНЫЕ) ДИСПЕРСНЫЕ СИСТЕМЫ

05.07.2020
795
2

pptx

МОДЕЛИРОВАНИЕ

25.06.2020
204
2

Вам будут интересны эти курсы:

Оставьте свой комментарий

Авторизуйтесь, чтобы задавать вопросы.

Скачать материал
- 16.11.2020 216
- PPTX 1 мбайт
- Оцените материал:
Настоящий материал опубликован пользователем Кузнецова Ирина Николаевна. Инфоурок является информационным посредником и предоставляет пользователям возможность размещать на сайте методические материалы. Всю ответственность за опубликованные материалы, содержащиеся в них сведения, а также за соблюдение авторских прав несут пользователи, загрузившие материал на сайт

Если Вы считаете, что материал нарушает авторские права либо по каким-то другим причинам должен быть удален с сайта, Вы можете оставить жалобу на материал.
Удалить материал
Автор материала

Кузнецова Ирина Николаевна
- На сайте: 3 года и 4 месяца
- Подписчики: 0
- Всего просмотров: 112981
- Всего материалов: 235