The SETI@Home Problem
by David MolnarIntroduction
In 1999, a group of astrophysicists at Berkeley discovered first-hand the need for security in distributed computations. As part of the massive Search For Extraterrestrial Intelligence (SETI), the researchers had collected gigabytes of raw data from radio telescopes around the world. Unable to process this data themselves, they came up with the idea of distributing the computation. A machine at Berkeley acts as a server and sends data to client computers, which check the data for signals and then send it back, in a project called SETI@Home [13]. As the researchers soon discovered, the SETI@Home project brings with it the SETI@Home Problem: how do you ensure that the client machines are doing the "right thing?"
The SETI@Home problem is an interesting example of the distributed computing problems lurking just beyond the horizon. Today, astrophysics researchers and cryptographic crackers distribute their computations. Tomorrow, "selling spare cycles" may be a source of income for home computer users and a vital resource for companies interested in large-scale computation. Before this revolution in computing resources can occur, several problems (foremost the SETI@Home problem) must be solved. It is also important to create mechanisms to pay clients and also to preserve data secrecy and integrity.
The SETI@Home Problem
The SETI@Home project succeeded beyond the wildest dreams of its creators. At present, over one million machines run the SETI@Home client software [10]. Each client machine downloads a work unit (WU) consisting of raw radio telescope data from Berkeley headquarters, runs signal processing algorithms on the work unit, then reports the results. With SETI@Home, not only can the researchers process new data as fast as it comes in, but there is not enough data to feed all the machines! With this success came fame; with fame came "olli."
Enter Olli
"Olli" is the pseudonym of a poster to the sci.astro.seti newsgroup who believed the SETI@Home client software too slow and in need of improvement. Olli released a patch to the original Berkeley client implementing these "improvements." Whereas Olli claims that the patch makes the client faster, SETI@Home researchers claim that the patch causes the client to return answers which are different in unpredictable ways from the original client, thus ruining the results of the experiment. Compounding the problem, patched clients and unpatched clients report themselves in the same way to the SETI@Home server. There is no obvious way to tell one from the other.
SETI@Home Meets Microsoft
Olli was not the only one who tried to "improve" the SETI@Home client. The sci.astro.seti FAQ [6] tells the story of another illicit patch. Microsoft wrote their own version of SETI, highly optimized for certain Windows hardware. They wanted to turn in the fastest WU times, to prove how fast Windows is. The SETI people discovered MS's cheating, and told them they must run the original SETI software, and threatened to dissolve the MS team, and said they would refuse results from any WU processed on a non-official SETI client. SETI had obvious concerns, that their algorithms might be programmed incorrectly.
The incidents involving Olli and Microsoft created the SETI@Home Problem :
- How can the researchers tell the difference between a work unit processed by one of their clients, and one processed by a patched client?
- How does the server know that the clients are playing by the rules?"
While there are efforts to make it socially unacceptable to run a patched client, such as the "Patch-Free Processing" web page [9], it is uncertain how well these efforts work. It would be preferable to develop a way to exclude patched clients and their tainted work units from the SETI@Home computation.
Verifying Distributed Computation
The SETI@Home problem can be thought of as a special case of the distributed computation verification problem: "given a large amount of computation divided among many computers, how can malicious participating computers be prevented from doing damage?" This is not a new problem. Distributed computation is a venerable research topic, and the idea of "selling spare CPU cycles" has been a science fiction fixture for years.
In real life, distributed computation has been used since at least the late 1980's to create "farms" of machines for rendering 3-D images. Farms allow graphic artists to create large images without needing to buy a supercomputer. More recently, the needs of scientific computation have led to the creation of frameworks such as Parallel Virtual Machine (PVM) [8] and Beowulf [2], which make it easier to distribute computations across many machines. The machines involved are usually owned by the same entity and a machine is either "good" or "bad" if it is operating or malfunctioning. There are no blatantly malicious machines.
The Internet makes it possible for computation to be distributed to many more machines. However, distributing computing around the internet requires developers to consider the possibility of malicious clients. The first large-scale distributed computation to face this problem was directed at brute-force cracking of cryptographic algorithms. In 1995, the Cypherpunks list organized several cracking "challenges" aimed at brute-force search of the RC-4 cipher with a key of 40 bits (then the only exportable key length under U.S. law). During the challenges, people began to raise questions such as "what if a machine says it will check a particular key, but doesn't?" and "what happens if the machine dividing up the work is temporarily disabled?" Today, the cracking effort 'distributed.net' faces the same problems.
More recently, the startups entropia.com and centrata.com aim to sell spare cycles. Details are currently lacking on Centrata's website, but the company recently placed second in the MIT 50K Entrepreneurship Competition with a plan to harness the spare storage and disk space of large numbers of computers over the Internet [3].
Methods for Protecting SETI@Home
The Current Strategy
Currently, the SETI@Home server looks for "suspicious" responses to work units and then re-runs those work units on trusted SETI@Home machines [6]. This approach works, but it depends heavily on recognizing "suspicious" responses. In addition, if many blocks suddenly look suspicious, this can cause a bottleneck with the trusted SETI@Home machines. It is worth exploring other strategies.
Remove Incentives to Cheat
The SETI@Home project compiles and publishes computation "rankings," showing which machines have done the most work. The computation competition is aimed at encouraging clients to improve their rank by devoting more machines to the problem. Unfortunately, users have instead resorted to patched clients, like Olli's, to increase their standing.
In some sense, the SETI@Home project is "lucky," because people who run patched clients for this reason (for instance, Microsoft), are not actually trying to disrupt the SETI@Home computation. Instead, a faster, patched client is a way to rise in the rankings. Remove the rankings, and the incentive to cheat goes away.
This approach has the advantage of being simple, easy to understand, and easy to implement. It has at least three drawbacks.
- It doesn't detect or stop clients which have already installed a patch.
- If a client has other motives for running a patch, this approach will not work.
- Removing the rankings also removes the incentive that some clients have to participate in the SETI@Home project.
A variant of this approach is to retain rankings, but to penalize patched clients by immediately removing them from the rankings. This solution does not directly address clients who do not care about rankings. However, it offers a way to penalize malicious clients who do care about rankings while retaining the friendly competition involved with SETI@Home. Unfortunately, patched clients must first be identified. The methods below describe ways to detect and deal with patched clients.
Using Clients to Check Other Clients
The SETI@Home project is lucky in another way. The project has many more clients than it has work units. With over one million clients participating in the project, there are enough clients to check each work unit several times. In fact, any "suspicious" looking work units are already double-checked at the SETI@Home server [6].
The large number of clients means that the same work unit can be assigned to many clients to form a voting bloc. A majority vote would be taken at the end of computation by checking the clients' answers against each other. Clients in the minority of a bloc are patched, since the results of the majority vote will correctly tell what the result of computing on the work unit should be.
Or will it? In order for this to be true, the following has to hold.
- Patched clients must be in the minority of each bloc. This depends on how voting blocs are constructed. The most straightforward way is to randomly assign clients to voting blocs; then the success depends on how many of the clients picked by this random assignment are patched. Another method makes use of persistent client identities; if the server can identify clients which it knows are running the real client, it can place them in a voting bloc with unknown or untrusted servers as a means to check the results of those servers. These can be clients which are under the server's direct control, or clients which have consistently reported good results that have cross-checked with machines under the server's direct control.
- The responses from patched clients and the responses from the real clients must be different when both are run on the same data. For instance, if the only response sent back by a SETI@Home client were the message "Yes, aliens found!" or "Nope, no aliens here!" then the responses of the two types of clients will be indistinguishable almost all the time. On the other hand, if the response consists of a complete transcript of the entire computation performed by the client, down to the instructions executed and the contents of registers at each step, then any disagreement can be quickly noticed just by comparing transcripts line by line. For a sufficiently detailed transcript, a complete agreement in transcripts means that both clients executed the same algorithm.
This approach works well for SETI@Home for at least two reasons. First, it requires lots of extra clients, which are present in SETI@Home. Second, patched clients in SETI@Home are not known to coordinate with each other, and there is no evil mastermind choosing which clients are patched. If there were an adversary capable of watching the server and then afterwards corrupting the clients the server placed into the same voting bloc, then this approach would fail miserably.
Setting Traps and Testing Clients
Suppose that the precise differences between a patched client and a real client on a particular work unit can be identified and pinned down. In the case of Olli's patch, the source code for the patch is not available, but patched clients are; it is possible to step through the two clients side by side and see where they first differ. For instance, maybe an intermediate step T of a calculation in the real client assigns a variable "x" the value "4.243", but the patched client assigns "x" the value "4.200."
Now there is a method for determining whether a client is patched or not. Lay a trap! Simply give it the test work unit and ask it to report the value of "x" at step T. If the client reports "4.200", then it's running a patched client.
The upside to this approach is that when it works, it gives proof that a client has been patched. The downside is that if a patched client suspects it is being tested, it can run the real client, figure out what "x" should be, and then report "4.243" while running the patched client on all other instances. A more serious drawback is that this approach requires the SETI@Home server to know what kind of patch the client might run, or have a transcript that establishes positively that the correct algorithm is being used.
Committed Transcripts and Random Sampling
In a sci.crypt thread in October 1999, David Wagner, Paul Crowley, and others suggested an approach which avoids some of the problems with clients which know they are being tested [11]. Instead of picking clients first, and then sending test data, ask all clients to send a collision-resistant hashed transcript of computation with every response.
The phrase "collision-resistant hashed transcript of computation" is something of a mouthful, but it becomes straightforward when examined piece by piece. A transcript of computation is something already referred to in the approach of using clients to test other clients; it is an output of such things as intermediate steps, internal state, register contents, and so on, sufficient to establish that the client was running the correct algorithm and not a patch.
Because such transcripts are likely to be quite long, clients send only a collision-resistant hash function of the transcript. A hash function is a function which takes an arbitrary length string and returns a string of some fixed, usually small length such as 128 or 160 bits. A collision-resistant hash function has the extra property that it's hard to find two strings which hash to the same output. This property is necessary because otherwise a client might be able to find two transcripts which had the same result after being run through the hash function; the idea is that the hash of a transcript is a succint replacement for the transcript itself.
Now the server picks a random set of hashed transcripts to verify after the clients have computed the transcripts and sent in the hashes. That is, it picks a work unit, computes what the work unit response "should" look like by itself, and then challenges the client to come up with a transcript which matches the hash it submitted in the first place. Because the server picks which transcripts to audit after the computation is done, the client has no way of knowing whether it is going to be tested or not!
Now, in order to avoid detection in the long run, a patched client must generate a "good" transcript for each and every response and submit the hash of this good transcript with its response. If generating this transcript takes at least as much time as running the real SETI@Home client, then a patched client has no efficiency advantage over a real client.
Quickly Checkable Proofs of Correctness
The randomized nature of the previous approach leaves some room for error; even though this error can be precisely quantified, there may be situations in which it is not desired. In addition, the approaches described to this point do not do much to address an adversary which is simply trying to disrupt the SETI@Home computation and doesn't care about rankings. Another approach might be to have clients submit proofs of correctness with each response. Then the server can act as a proof verifier and check each and every single proof before deciding whether to keep a response or not.
The transcripts of computation used in the previous approach could be considered proofs of correctness for a client's computation. This leads to the question "are there ways to build transcripts which are short and quickly checkable?" After all, if it takes as long to check the computation as it did to perform the computation, then this approach gains nothing.
While no such quickly-checkable proofs are known for the specific case of the SETI@Home client, their study is closely related to areas of research called program checking, self-testing and self-correcting. These areas study programs which are "buggy"; they fail on some fraction of inputs (the same way a patched client returns "failed" results for the "program" consisting of all SETI@Home clients). The idea is to exploit structure present in the problems to allow a buggy program to notice when it has made a mistake, or even correct its own mistakes. The field was initiated by Manuel Blum [3], and an overview can be found at Hal Wasserman's web page [12].
Also related to these notions is the notion of a probabilistically checkable proof, which is a proof in which a verifier need only look at a very few, randomly chosen locations in the proof to tell if the proof is correct or not [1]. Probabilistically checkable proofs are encouraging in that they can be constructed for every problem which might ever want to be distributed [1], but they are discouraging in that they are usually impractical to implement. For one thing, the length of such proofs is enormous, even though a verifier need only look at a few bits. Silvio Micali introduced yet another notion, that of a computationally sound or CS-Proof, which is easier to verify given some assumptions on the construction of the proof [7]. Research in these areas is ongoing. An introduction to the field of secure multiparty computation can be found in Canetti's thesis [5].
What Strategies Would Work For SETI@Home?
The "clients checking clients" approach can be used in conjunction with these trusted machines in order to build "levels of trust" in various clients, which can then be used to affect which clients are assigned to voting blocs. For instance, a client can be chosen and its output compared to a completely trusted SETI@Home machine. If it passes often, then it is probably running the real version of the client. Then it can be included in a voting bloc with a mix of other partially trusted clients and unknown clients. This all happens without needing to modify any client software and in effect stretches the verification resources of the completely trusted SETI@Home machines.
Random sampling of hashed transcripts and laying traps would certainly work for the SETI@Home project. The problem is that a transcript format has not yet been set. In addition, both client and server software would have to be rewritten to use this approach, which means that client software must be redistributed. While the SETI@Home project does have a procedure for phasing out old clients, this is a slow process in order to allow clients time to upgrade to the newest versions.
The quickly-checkable proof approach does not seem to be currently workable for SETI@Home because there is no obvious way to construct a quickly-checkable proof of correctness, and no one has yet looked for un-obvious ways of doing so. This could merit further work, or the other approaches might be enough. The crucial point is discovering what the malicious clients want and how many of them exist. In SETI@Home, it seems likely that there are few such clients and most of them want better rankings. Other distributed computations may not be so lucky.
Conclusion
The general study of secure multiparty computation has produced much interesting work over the last two decades [5]. Less well studied, unfortunately, are the tools and techniques required to move the theoretical results to the real world. The old dream of massively distributed computations is finally coming true, and yet our tools for building and analysing real systems still seem primitive. The challenge of the next few years will be to bridge this gap.
Acknowledgements
Guy Macon raised the topic in a sci.crypt thread. David Wagner gave permission to cover the commited transcript method. This article began as a discussion at the MIT Applied Security Reading Group; the author is grateful to Kevin Fu for the opportunity to speak, and to the attendees, including David Andersen, Trevor Schroeder, and William Ricker for their comments. David Alpert, Joev Dubach, Andy Oram, and William Josephson provided comments and moral support. Finally, the author is indebted to the anonymous reviewers and the Crossroads editors who made the column possible.
References
- 1
- Arora, S. and Safra, M. "Probabilistic Checking of Proofs: A New Characterization of NP" in Journal of the ACM , 45(1):70--122, 1998 online at ftp://ftp.cs.princeton.edu/pub/people/arora/as.ps
- 2
- The Beowulf Project. http://cesdis.gsfc.nasa.gov/linux-web/beowulf.html
- 3
- Blum, Manuel and Wasserman, Hal Software Reliability via Run-Time Result-Checking Journal of the ACM vol. 44, no. 6 pp.826-849 November 1997 http://http.cs.berkeley.edu/~halw/fourier.ps
- 4
- Centrata, Inc. http://www.centrata.com/
- 5
- Canetti, Ran Studies in Secure Multiparty Computation and Applications PhD Thesis, MIT ftp://theory.lcs.mit.edu/pub/people/canetti/thesis.ps.Z
- 6
- Knutar, Jan The SETI@Home FAQ. http://home.hccnet.nl/a.alfred/FAQ.html
- 7
- Micali, Silvio CS-proofs (extended abstract) in 35th Symposium on Foundations of Computer Science, pp. 436-453, Santa Fe, New Mexico, 20-22 November 1994. IEEE.
- 8
- Parallel Virtual Machine Project. http://www.epm.ornl.gov/pvm/pvm_home.html
- 9
- The Patch-Free Processing Page. http://home.hccnet.nl/a.alfred/p-free-p1pfp.html
- 10
- SETI@Home Statistics. http://setiathome.ssl.berkeley.edu/totals.html
- 11
- Wagner, D. "Re: Can the SETI@Home Client be Protected" 11/09/1999 post to sci.crypt
- 12
- Wasserman, Hal "Result-Checking" Web Page. http://http.cs.berkeley.edu/~halw/
- 13
- W.T. Sullivan, III , D. Werthimer, S. Bowyer, J. Cobb, D. Gedye, and D. Anderson in "Astronomical and Biochemical Origins and the Search for Life in the Universe, Proc. of the Fifth Intl. Conf. on Bioastronomy IAU Colloq. No. 161 eds. C.B. Cosmovici, S. Bowyer, and D. Werthemer. Editrice Compositori, Bologna, Italy http://setiathome.ssl.berkeley.edu woody_paper.html
Biography
David Molnar (dmolnar@hcs.harvard.edu) began using PGP in 1993. He became interested (obsessed?) with figuring out "why it worked" and has been studying cryptography ever since. Now an undergraduate at Harvard University, he keeps up with security issues by attending courses, reading newsgroups, mailing lists, and conference papers, and attending DEF CON in his home city of Las Vegas. David is an ACM Student Member and a member of the International Association for Cryptologic Research.