ACMCrossroads / Xrds12-1 / Using Software Watermarking to Discourage Piracy

Article Glyph

Using Software Watermarking to Discourage Piracy

by Ginger Myles

Introduction

Software piracy and copyright infringement are rapidly growing.Historically, the spread of pirated software required the transfer of aphysical copy (i.e. a disk), limiting the rate of illegal softwaredistribution. However, recent increases in network transfer rates andease of access have eliminated the need for physical media based piracy.To compound the problem, software is being legally distributed inplatform independent formats, such as Java bytecode and Microsoft'sIntermediate Language (MSIL). These formats closely resemble sourcecode, which can easily be reverse engineered and manipulated. Thus it ismuch easier for software pirates to bypass license checks. In addition,unscrupulous programmers can steal algorithmic secrets, which decreasestheir own production time and allows them to gain an edge on thecompetition.

There are legal ramifications associated with software piracy, suchas statutory damages of up to $150,000 for each program copied [1 ]. However, these fines are often targeted at anunsuspecting end user and not at the person responsible for the piracy.When a person unknowingly purchases and uses an illegal piece ofsoftware it is often difficult to trace this software back to the guiltyparty. In addition, it is also hard to detect and prove that a dishonestprogrammer has taken advantage of a trade secret. The focus of thisarticle is on software watermarking, a software-based techniquedeveloped to aid in piracy prevention and identification of the guiltyparty.

Piracy Prevention Techniques

Organizations such as the Business Software Alliance (BSA) [1] perform audits to verify that corporations are not using illegalsoftware. Unfortunately, auditing does not identify an unknown softwarepirate or unethical programmer. In an attempt to curb software piracy, avariety of hardware and software techniques have been proposed. Thehardware based approaches typically provide a higher level ofprotection; however, they are more cumbersome for the user and moreexpensive for the software vendor. Two such examples are tamper-proofhardware and dongles. Software-based solutions, such as codeobfuscation, software tamper-proofing, and software watermarking,are cheaper but provide a lower level of protection.

Tamper-proof hardware aids in piracy prevention by providing a securecontext and/or secure data storage. By executing the software in asecure environment the pirate is unable to gain access to the software.This technique prevents the attacker from observing the behavior of thesoftware. The obvious drawback to this technique is the additional costof requiring all users to have tamper-proof hardware. The secondhardware based technique is a dongle which is a device distributed withthe software. Possession of the the device proves ownership of thesoftware. A dongle typically connects to an I/O port and computes theoutput of a secret function. Periodically the software queries thedongle. If the result of the query is the wrong output, the softwarereacts appropriately. There are two drawbacks to the use of dongles: (1)cost (a single dongle can cost at least $10) and (2) distribution of adongle with software over the Internet is impractical.

Code obfuscation, a software-based solution, is a technique whichaids in the prevention of reverse engineering through transformationsthat make the application more difficult to understand while preservingthe original functionality. The idea is to obscure the readability andunderstandability of the program to such a degree that it is more costlyfor the attacker to reverse engineer the program than to simply recreateit. This particular technique was the focus of the Crossroads article"Protecting Java Code via Code Obfuscation" by Douglas Low [6]. The second software-based technique is softwaretamper-proofing. In this technique methods are employed to prevent thealteration of the program. For example, many programs contain licensechecks that prevent the user from using the software after a specificdate. To prevent an attacker from removing the license check,tamper-proofing techniques are used that prevent the alteration. If anattacker does remove the license check then the tamper-proofingtechnique causes the software to fail. A third software-based techniqueis software watermarking, which we will discuss in detail.

Software Watermarking

Software watermarking is a technique used to protect software frompiracy. Unfortunately, watermarking alone does not prevent piracy.Instead it is used to discourage a user from illegally redistributingcopies of the software. The general idea of software watermarking isvery similar to media watermarking in which a unique identifier isembedded in images, audio, or videos through the introduction of errorswhich are undetectable by the human auditory system. Because thefunctionality of software is crucial to its success, the watermark mustbe embedded through techniques other than the introduction of errors.

Software watermarking embeds a unique identifierw (the "watermark") into a program P. Ifw uniquely establishes the author of P thenw is considered a copyright notice. On the other hand, ifw uniquely identifies the legal purchaser of Pthen w is a fingerprint. One of the most important aspectsof a watermarking system is the use of a secret key. Through the use ofthe key the watermark is incorporated into the program producing a newprogram.

Currently, watermarking algorithms are described as either beingstatic or dynamic. Static watermarking algorithms only make useof the features of an application that are available at compile-time,such as the instruction sequence or the constant pool table in a Javaapplication, to embed a watermark. On the other hand, a dynamicwatermarking algorithm relies on information gathered during theexecution of the application to both embed and recognize the watermark.There are three different dynamic techniques: easter egg watermarks,data structure watermarks, and execution trace watermarks [3]. Some of the current research is focused on thestudy of static versus dynamic algorithms in order to determine if oneis better than the other. As of now all that can be said is that thealgorithms make use of different information to embed the watermark.

An easter egg watermark is a piece of code that can only be executedby a highly unusual input sequence. For example, in Adobe Acrobat Reader4.0 select Help -> About Plug-ins -> Acrobat Forms and then holdControl+Alt+Shift while clicking on the credits button. This specialinput sequence will reveal the easter egg in Figure1 that consists of three features: a dog bark, a credit button face thatchanges to say "woof", and an Adobe emblem that becomes a dog paw[12]. The drawback to using easter eggs to embed awatermark is that once they have been discovered, simple debuggingtechniques make it easy to locate and remove the watermark.

Figure 1: Easter Egg Watermark found in Adobe Acrobat Reader 4.0Figure 1: Easter Egg Watermark found in Adobe Acrobat Reader 4.0

Figure 1: Easter Egg Watermark found in Adobe Acrobat Reader 4.0

A data structure watermark embeds the watermark in the state of aprogram (such as the global, heap, and stack data) as the program isexecuted with a particular input. This technique is far more stealthythan an easter egg watermark since no output is produced. The thirdtechnique, execution trace watermarking, embeds the watermark within thetrace of the application as it is executed with a particular input. Thistechnique differs from the data structure watermark in that thewatermark is embedded in the application's instructions or addressesinstead of the application's state.

Examples

Embedding of a copyright notice or a fingerprint serve differentpurposes. The original creator of the software uses a copyright noticeto prove ownership. This unique identifier is extremely helpful inproving that a program or a piece of a program is stolen. For example,suppose a programmer from Company B steals a secret module from CompanyA to decrease his own production time. If Company A can demonstrate thatCompany B's software contains their watermark then company A can provethat Company B is illegally profiting.Unfortunately, this type of watermark only proves that Company B wasusing the secret and not that they were the ones that stole it.

Tracking the source of illegal distribution requires a fingerprint,rather than a copyright notice, to link the particular copy to theoriginal purchaser. To illustrate, suppose Alice sells Bob a copy of hersoftware. Before she gives Bob the copy she embeds his credit cardnumber. When Alice obtains a copy of her software, which she believes ispirated, she uses the recognize function with her secret key to extractthe watermark. Since the watermark is Bob's credit card number, whichuniquely identifies him, she can prove that Bob is the guilty pirate. Figure 2 illustrates this scenario.

Figure 2: Alice embeds Bob's credit card number in orderto identify Bob as the guilty pirate.
Figure 2: Alice embeds Bob's credit card number in order to identify Bobas the guilty pirate.

Companies create and distribute beta versions of their software. Thedistribution is targeted at a select handful of users and designed tosolicit feedback. Generally, the release of the beta version to a useris contingent on the software remaining confidential. Although thisagreement is made, a leak frequently occurs. The embedding of a uniquefingerprint tied to the user permits the company to take legal actionagainst the identified party.

Creators embed unique identifiers to assert ownership and/or tracethe pirate after the fact. Software watermarking does not preventpiracy, but discourages the user from illegally copying the software byincreasing the possibility that the pirate will get caught. One simplewatermarking technique that illustrates the general idea is to embed thewatermark as a field in the application. Suppose the string "wildcat" isembedded in class C below using this technique.

Before:
class C{
int a = 1;
void m1(){...}
void m2(){...}
}

After:
class C{
int a = 1;
int wildcat$ = 5;
void m1(){...}
void m2(){...}
}

In this particular example a special character, '$', was also addedto aid in the recognition of the watermark. While this technique issimple to implement it is not effective in preventing piracy. Since manyapplications are distributed in formats that are easy to reverseengineer, an attacker could simply examine the source code, identifythis as the watermark, and remove it. A second attack on thiswatermarking technique would be to use a simple code obfuscation thatrenames all of the identifiers in the application.

A variety of different software watermarking algorithms has beenproposed in the past. One of the simplest techniques was proposed byMonden et al. [7]. This particular techniqueembeds the watermark in a dummy method that is added to the application.The embedding is accomplished through a specially constructed sequenceof instructions. Because the inserted method is never executed there isflexibility in how the instructions are constructed which permits theembedding of any watermark. Stern et al. [11]also consider instruction sequences for embedding the watermark. Theirtechnique modifies the frequency of the instructions through out theapplication to represent the watermark.

Instead of embedding the watermark at the instruction level, awatermark can be embedded by manipulating the control flow graph (CFG)of a method. Davidson and Myhrvold [4] proposed atechnique which does just that. By rearranging the basic blocks of theCFG and then redirecting the control flow to maintain the correctfunctionality the watermark is embedded. Venkatesan et al. [13] also use a CFG to embed a watermark. In thisalgorithm a subgraph is inserted which represents the watermark.

Another aspect of a method which conveys static information is theinterference graph. An interference graph represents which variables ina method are live simultaneously. If two variables are live at the sametime then an edge is drawn between them. This graph is used tofacilitate register allocation for the method because two variables thatare live simultaneously should not be assigned to the same register. Quand Potkonjak [9] make use of the interference graphand the graph coloring problem to embed a watermark in the registerallocation of an application. Graph coloring is an NP-complete problemfor which many heuristic algorithms have been developed that work wellfor register allocation. The problem is defined as follows: Given agraph G(V,E) color the graph with as few colors as possible such that notwo adjacent vertices are colored with the same color. To embed awatermark using the QP algorithm edges are added between chosen verticesin the graph based on the value of the message. The vertices are nowconnected and when the graph is colored the vertices will be highlightedwith different colors yielding a new register allocation. So thealgorithm embeds the watermark by adding fake interferences betweenvariables forcing them to be assigned to different registers.

All of the previously described watermarking techniques areclassified as static watermarking algorithms. The first dynamicwatermark algorithm, CT, was proposed by Collberg et al. [3]. This technique builds a graph structure at runtimewhich is used to embed the watermark.

Of these proposed algorithms very little has been published on theirimplementation and evaluation. There are a few existing implementationsof the CT algorithm, such as the one within the SandMark framework andthat by Palsberg et al. [8]. A recentdissertation by Hachez on software protection tools [6] provides an implementation of the Stern algorithm.Other than these publications little work has been done in this area.

SandMark

SandMark [2, 10] is a research tool currently under developmentat the University of Arizona. The tool is designed for the study ofsoftware protection techniques such as code obfuscation, softwarewatermarking, and tamper-proofing of Java bytecode. One of the goals ofthe project is to implement and evaluate all known software watermarkingalgorithms.

To aid in the evaluation of the watermarking algorithms a variety oftools have been incorporated into the system:

  • An obfuscation tool which permits the study of algorithm resiliencyby applying semantics preserving code transformations.
  • A watermarking tool to study additive attacks in which an adversaryadds a new watermark in an attempt to cast doubt on the original.
  • A bytecode diff tool used to launch a collusive attack in which theadversary compares two differently watermarked programs.
  • Astatic statistics tool to examine how well the watermark blends in withthe code around it.

Through the use of the SandMark tool users are able to develop andtest new watermarking algorithms. In addition, the tool is available forusers to watermark any Java application which they have created and wishto distribute.

Conclusion

Platform independence and the use of the Internet are both valuableaspects of the computer industry. Unfortunately, they have made iteasier for software pirates to illegally distribute software and forunscrupulous programmers to steal algorithmic secrets. Both of theseissues are of ethical concern and have been the focus or recentresearch. Through this research a variety of prevention techniques havebeen developed using both hardware and software. Unfortunately, nosingle solution is currently strong enough to prevent piracy. However,through a combination of techniques, such as software watermarking,application developers can better protect their products.

References

1
Business Software Alliance. http://www.bsa.org/.
2
Collberg, C., Myles, G., andHuntwork, A. Sandmark -- A Tool for Software Protection Research.IEEE Security and Privacy, 1, 4 (July/August 2003), pp. 40-49.
3
Collberg C., and Thomborson,C. Software Watermarking: Models and Dynamic Embeddings.ConferenceRecord of POPL '99: The 26th ACM SIGPLAN-SIGACT Symposium on Priciplesof Programming Languages, 1999.
4
Davidson, R.L., and Myhrvold,N. Method and System for Generating and Auditing a Signature for aComputer Program. US Patent 5,559,884, Assignee: Microsoft Corporation,1996.
5
Hachez, G. A ComparativeStudy of Software Protection Tools Suited for E-Commerce withContributions to Software Watermarking and Smart Cards. PhD thesis,Universite Catholique de Louvain, 2003.
6
Low, D. Protecting Java CodeVia Code Obfuscation. ACM Crossroads, 4, 3 (Spring 1998).
7
Monden, A., Iida, H.,Matsumoto, K., Inoue, K., and Torii, K. A Practical method forWatermarking Java Programs. Compsac 2000, 24th Computer Software andApplications Conference, 2000.
8
Palsberg, J.,Krishnaswamy, S., Kwon, M., Ma, D., Shao, Q., and Zhang, Y. Experienceswith Software Watermarking. Proceedings of ACSAC'00, 16th AnnualComputer Security Applications Conference, 2000.
9
Qu, G., and Potkonjak, M.Hiding Signatures in Graph Coloring Solutions. InformationHiding, 1999, pp. 308-316.
10
Sandmark. http://www.cs.arizona.edu/sandmark.
11
Stern, J.P., Hachez, G.,Koeune, F., and Quisquater, J. Robust Object Watermarking: Applicationsto Code. Information Hiding, 1999, pp. 368-378.
12
The Easter Egg Archive. http://www.eeggs.com/.
13
Venkatesan, R., Vazirani,V., and Sinha, S. A Graph Theoretic Approach to Software Watermarking.4th International Information Hiding Workshop, 2001.

Biography

Ginger Myles (mylesg@cs.arizona.edu) is a PhDstudent in the Computer Science Department at the University of Arizonaunder Christian Collberg. Her research interests include softwareprotection, in particular software watermarking, and issues concerningprivacy and security in ubiquitous computing environments. Currently,she is working on the SandMark project.

Copyright 2004, The Association for Computing Machinery, Inc.