Security and Fault-tolerance in Distributed Systems (2012)

Course at ETH Zurich, Department of Computer Science, Spring Semester 2012

Description

This course presents methods for building dependable, secure, and highly available distributed systems, with an emphasis on replication as the means of tolerating faults. It covers principles and fundamental methods, shows how they are applied in real-world systems, and discusses applications to cluster computing and cloud computing services.

Organization

Lecturer. Dr. Christian Cachin, IBM Research - Zurich.
Teaching Assistant. Pavel Raykov, Information Security & Cryptography, ETH Zürich.

Dates.

The lecture of February 24 has to be canceled.

Lecture: Friday, 13:15-15:00, ML F 34, starting 2. March 2012.

Exercise: Friday, 15:15-16:00, ML F 34, starting 2. March 2012.

Web page. http://cachin.com/cc/sft12/

The course is part of the Information Security Master Track.

Prerequisites. Knowledge of information security and/or network security, cryptology, and distributed systems. In particular, this course uses concepts from public-key cryptology (RSA, Diffie-Hellman) and reliability in asynchronous message-passing systems. This background corresponds to the ETHZ D-INFK courses "Information Security" and "Verteilte Systeme" ("Distributed Systems") or equivalent.

Topics (tentative)

Dependability
Communication primitives
Reliable broadcast
Distributed storage
Consensus
Distributed cryptography and proactive recovery
System examples, use in cloud platforms for storage and services

Schedule

Exercises are usually due one week after assignment.

Date Lecture notes Assigned exercise

2. March (1) Introduction and (2) Dependability
Exercise 1

9. March Primitives (communication, failure detectors, cryptography)
[CGR11] 1; 2.1, 2.2, 2.4.1, 2.4.4, 2.5, 2.6.1, 2.6.2
Exercise 2

16. March (3) Reliable broadcast, FIFO broadcast, causal broadcast
[CGR11] 3.1, 3.2, 3.3 (excl. 3.3.2), 3.4 (excl. 3.4.3), 3.9 (3.9.1-3.9.3, Alg. 3.24)
Exercise 3

23. March (3ff.) Byzantine broadcasts
[CGR11] 3.10 (excl. 3.10.4), 3.11
Exercise 4

30. March (4) Shared memory (safe, regular, atomic registers)
[CGR11] 4.1, 4.2, 4.3.1, 4.3.3
Exercise 5

20. April (4ff.) Byzantine shared memory
[CGR11] 4.6, 4.7.1, 4.7.2
Exercise 6

27. April (7) Distributed cryptography (slides) and intercloud storage (slides) [BCE+12]
Guest lecture by Alessandro Sorniotti
Exercise 7

4. May (7ff.) Distributed cryptography (cont.), proactive security
[Handout] 7.1, 7.2, 7.3, 7.6
Exercise 8

11. May (5) Consensus and atomic broadcast
[CGR11] 5.1.1, 5.2.1, 5.2.2, 6.1
Exercise 9

18. May (5ff.) Leader election and fail-noisy uniform consensus (Paxos algorithm)
[CGR11] 2.6.4, 2.6.5, 5.3
Exercise 10

25. May (5ff.) Randomized and Byzantine consensus
[CGR11] 5.5.1-5.5.3, 5.6.1; [Handout] 7.5
Exercise 11

1. June (8) Practical systems: ZooKeeper [HKJR10,JRS11], Windows Azure Storage [CWO11], Sintra/DNS [CP02,CS04]

Literature

Main reference

[CGR11] Christian Cachin, Rachid Guerraoui, and Luís Rodrigues. Introduction to Reliable and Secure Distributed Programming (Second Edition). Springer, 2011.

Online at springerlink.com

The notes at the end of every chapter provide background literature. Chapter 7 points to related and more advanced literature.

Further references

Background on distributed systems

[AW04] Hagit Attiya and Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics. Wiley, second edition, 2004.

[TVS07] Andrew Tanenbaum and Maarten Van Steen. Distributed Systems: Principles and Paradigms. Pearson Prentice Hall, 2nd edition, 2007.

[CDK05] George Coulouris, Jean Dollimore, and Tim Kindberg. Distributed Systems: Concepts and Design. Addison-Wesley, 4th edition, 2005.

[CPS10] Bernadette Charron-Bost, Fernando Pedone, and André Schiper, editors. Replication: Theory and Practice, volume 5959 of Lecture Notes in Computer Science. Springer, 2010.

Cryptography references

[PP10] Christof Paar and Jan Pelzl. Understanding Cryptography: A Textbook for Students and Practitioners. Springer, 2009.

[S10] Nigel Smart. Cryptography, An Introduction (Third Edition). Available online, 2010.

Systems

[BCE+12] Cristina Basescu, Christian Cachin, Ittay Eyal, Robert Haas, Alessandro Sorniotti, Marko Vukolic, and Ido Zachevsky. Robust data sharing with key-value stores. In Proc. Intl. Conference on Dependable Systems and Networks (DSN), June 2012.

[B12] Eric Brewer. CAP twelve years later: How the "rules" have changed. IEEE Computer, pages 23-29, February 2012.

[CWO11] Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, et al. Windows Azure Storage: A highly available cloud storage service with strong consistency. In Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP), 2011.

[CP02] Christian Cachin and Jonathan A. Poritz. Secure intrusion-tolerant replication on the Internet. In Proc. International Conference on Dependable Systems and Networks (DSN-2002), pages 167-176, June 2002.

[CS04] Christian Cachin and Asad Samar. Secure distributed DNS. In Proc. Intl. Conference on Dependable Systems and Networks (DSN), pages 423-432, 2004.

[HKJR10] Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In Proc. USENIX Annual Technical Conference, 2010.

[JRS11] Flavio Junqueira, Benjamin Reed, and Marco Serafini. Zab: High-performance broadcast for primary-backup systems. In Proc. 41st International Conference on Dependable Systems and Networks (DSN), 2011.

Assessment

Exercises and Exam. The exercises are an integral part of the course. We encourage you to attend the exercise classes, to participate actively, and to hand in your solutions. The main reference textbook [CGR11] also contains many exercises with solutions.

There will be an oral exam, held during the ETHZ exam session. The exam will cover the material presented in class and also some material presented in the exercises.

Last updated by Christian Cachin.
