Saturday, January 24, 2009

Living safely [Part 2]

Starting from where left off last time, this blog entry tries to understand liveness and safety in computer science and open room for discussion. As I mentioned in the previous blog, every (I emphasize the word "every") problem specification is an intersection of liveness and safety properties.

What exactly is a liveness property or a safety property? Lamport introduced the general notions in his "Proving the Correctness of Multiprocess Programs" (1977) paper. We can model the solution for a problem specification as a sequence of atomic actions. We use the term "execution" to refer to a sequence of actions. An execution could be a set of transactions executed on a database system, or a program you have written using a language of your choice, or a set of file operations (read, write, delete, open, close, etc.) performed on objects (files, folders) in a file system, etc. A safety property stipulates that some bad thing does not happen during execution. A liveness property stipulates that a good thing happens (eventually) during execution. [85, Alpern and Schneider]

Safety - in general, what ought not happen
Liveness - in general, what out to happen

If an execution is not a safety property, at some point its execution some bad thing must happen and that bad thing must be irremediable as a safety property never give way for a bad thing to happen. Also note that it does not require that something happen sometime and hence any execution is safe so long as we don't execute that execution. Therefore, we need something else to complement safety property. That something else is liveness property. (more on it later)

Some classical examples:
1. mutual exclusion (bad thing - two processes executing in a critical section at the same time)
2. deadlock freedom (bad thing - deadlock)
3. E1 not happening before E2 where E1 and E2 are two events (bad thing - E1 happening before E2)

Most of the time, when QAing what we test for is safety bugs. The unit test cases we have written assert something is not false. (side question: What about liveness bugs?) In other words, a safety property for a sequential program stipulates that it will never produce a wrong result. We some time call this "partial correctness". Why? As mentioned above, safety property alone is not sufficient to validate a program. (Safety properties do not stipulate that programs are loop free and terminate)

A key point about safety properties is that the violation of a safety property has a finite witness. (In terms of execution sequences, there is a prefix of the execution where we can spot the safety violation irrespective of whether the sequence is finite or infinite)

Now you should be convinced that safety property alone is not sufficient. That's where liveness properties come to our rescue.

As the definition above, liveness properties specify desirable system bahviours which must be satisfied eventually, but not always satisfied. Notice the terms "eventually" and "not always". We may not be able to observe the liveness property with a partial execution, but it remains possible for the required good thing to occur in the future. Theoritically speaking we need to monitor an infinite execution to tell if an execution violates a liveness property. It is a hard thing to do. The fundamental issue is that "eventually" does not specify any time limit. Therefore liveness is always an approximation to the property that we would like to see in the system in question.

Some classical examples:
1. Starvation freedom (good thing - making progress - each process gets to enter the critical section eventually)
2. Reliable unicast (good thing - guaranteed service, every message sent is eventually received; this actually has safety properties associated with it as well)

In practice, we need approaches where we detect liveness violations in finite executions. Using hueristics is one possibility; we may not be able to detect all the violations, but if we can make sure there no false positives at least it could be an agreeable solution. I am sure there should be some research work in this area, but I haven't looked at the research literature yet.

In addition to the questions put forward in my previous note, there are many other interesting questions as well. If something is not a liveness property is it always a safety property (or vise versa)? What is the connection between temporal logic and liveness/safety? Safety and liveness under benign failures?..Feel free to comment on the subject.

1 comment:

Blogger said...

Did you know that you can generate dollars by locking selected sections of your blog or site?
All you need to do is join AdscendMedia and implement their Content Locking plugin.