Security and Fault-tolerance in Distributed Systems (2012)

Course at ETH Zurich, Department of Computer Science, Spring Semester 2012

251-0470-00L

Description

This course presents methods for building dependable, secure, and highly available distributed systems. The emphasis is on replication as the means to tolerate faults. Applications to cluster computing and cloud computing services will be presented. The course presents principles and fundamental methods, and shows how they are applied to real-world systems.

Organization

Lecturer. Dr. Christian Cachin, IBM Research - Zurich.
Teaching Assistant. Pavel Raykov, Information Security & Cryptography, ETH Zürich.

Dates.

The lecture on 24. February is canceled.

Lecture: Friday, 13:15-15:00, ML F 34, starting 2. March 2012.
Exercise: Friday, 15:15-16:00, ML F 34, starting 2. March 2012.

Web page. http://cachin.com/cc/sft12/

The course is part of the Information Security Master Track.

Prerequisites. Knowledge of information security and/or network security, cryptology, and distributed systems. In particular, this course uses concepts from public-key cryptology (RSA, Diffie-Hellman) and reliability in asynchronous message-passing systems. This corresponds to the ETHZ D-INFK courses "Information Security" and "Verteilte Systeme" ("Distributed Systems") or equivalent.

Topics (tentative)

  1. Dependability
  2. Communication primitives
  3. Reliable broadcast
  4. Distributed storage
  5. Consensus
  6. (intentionally left blank)
  7. Distributed cryptography and proactive recovery
  8. System examples, use in cloud platforms for storage and services

Schedule

Exercises are usually due one week after assignment.

Date | Lecture notes | Assigned exercise
2. March | (1) Introduction and (2) Dependability | Exercise 1
9. March | Primitives (communication, failure detectors, cryptography); [CGR11] 1; 2.1, 2.2, 2.4.1, 2.4.4, 2.5, 2.6.1, 2.6.2 | Exercise 2
16. March | (3) Reliable broadcast, FIFO broadcast, causal broadcast; [CGR11] 3.1, 3.2, 3.3 (excl. 3.3.2), 3.4 (excl. 3.4.3), 3.9 (3.9.1-3.9.3, Alg. 3.24) | Exercise 3
23. March | (3ff.) Byzantine broadcasts; [CGR11] 3.10 (excl. 3.10.4), 3.11 | Exercise 4
30. March | (4) Shared memory (safe, regular, atomic registers); [CGR11] 4.1, 4.2, 4.3.1, 4.3.3 | Exercise 5
20. April | (4ff.) Byzantine shared memory; [CGR11] 4.6, 4.7.1, 4.7.2 | Exercise 6
27. April | (7) Distributed cryptography (slides) and intercloud storage (slides) [BCE+12]; guest lecture by Alessandro Sorniotti | Exercise 7
4. May | (7ff.) Distributed cryptography (cont.), proactive security; [Handout] 7.1, 7.2, 7.3, 7.6 | Exercise 8
11. May | (5) Consensus and atomic broadcast; [CGR11] 5.1.1, 5.2.1, 5.2.2, 6.1 | Exercise 9
18. May | (5ff.) Leader election and fail-noisy uniform consensus (Paxos algorithm); [CGR11] 2.6.4, 2.6.5, 5.3 | Exercise 10
25. May | (5ff.) Randomized and Byzantine consensus; [CGR11] 5.5.1-5.5.3, 5.6.1; [Handout] 7.5 | Exercise 11
1. June | (8) Practical systems: ZooKeeper [HKJR10,JRS11], Windows Azure Storage [CWO11], Sintra/DNS [CP02,CS04] | -

Literature

Main reference

[CGR11] Christian Cachin, Rachid Guerraoui, and Luís Rodrigues. Introduction to Reliable and Secure Distributed Programming (Second Edition). Springer, 2011.

Online at springerlink.com

The notes at the end of every chapter provide background literature. Chapter 7 points to related and more advanced literature.

Further references

Assessment

Exercises and Exam. The exercises are an integral part of the course. We encourage you to attend the exercise classes, to participate actively, and to hand in your solutions. The main reference textbook [CGR11] also contains many exercises with solutions.

There will be an oral exam, held during the ETHZ exam session. The exam will cover the material presented in class and also some material presented in the exercises.


Last updated by Christian Cachin.