Digital Rights Protection Management of Web Portals Content

INTRODUCTION

Without doubt one of the most important factors that contributed to the wide acceptance and popularity of Web portals is the potential for users to access a broad spectrum of information from a single access point, the Web portal itself. A Web portal, in such a way, aggregates information from multiple sources and makes that information available to various users. Regardless of whether the offered assets are hosted within the Web portal or whether the latter serves as a gateway to information services and resources located on the rest of the Internet, a Web portal is simultaneously an all-in-one Web site and a browsing guide to all available Internet information worldwide. Even though there is no definite taxonomy of portals, relevant labels such as government, community, enterprise, general and others are offered aiming at defining the Web portal with respect to its content and its target group. Summarizing, it could be assumed that a Web portal offers centralized access to all relevant content and applications (Tatnall, 2005).

On the other hand, the ability to create, host and distribute digital material, one of the key features of digital technology that a Web portal utilizes and derives its huge success from, proved to be a double-edged sword since it allowed zero-cost reproduction of the digital material for purposes of piracy. Piracy of digital material offered by a Web portal is common today and has caused significant financial losses to the owners of the digital content offered through it. This explains why the owners of digital content hesitate to place their work on a Web portal, where it may be illegally copied and distributed.

Nonetheless, the advantages of adopting this idea and applying it securely in practice are considerable from both customer and owner perspectives. That is why effective copyright protection techniques must be employed in order to convince the owners of digital material to allow their assets to be hosted on the Internet and especially on Web portals. The latter are highly attractive to the new world and are considered to be the meeting point for all technologically oriented people with a mind to purchase something.

This is where digital rights management (DRM) comes into play, employing a set of technical means to control illegal distribution of the aforementioned material and to protect the intellectual property of the original owners (Guenette, Gussin, & Trippe, 2001). Furthermore DRM aims to protect the rights of the users who legitimately purchased the digital material from the original owners.

This article surveys the most effective watermarking techniques available for every multimedia and database entry that requires copyright protection within a Web portal. Subsequently, the most commonly encountered code obfuscation methods for software objects will be discussed. The conclusion will present views for the future of DRM in the territory of Web portal applications.

BACKGROUND

While copyright infringement existed in the predigital era, the digital age may have increased the ease and scope with which copyright material can be copied.

DRM is often confused with the term access control. The difference lies in the domain where the content is being protected. Access control techniques apply when the content resides in the copyright owner’s space, whereas DRM techniques apply when the content is located in the customer’s space, where it can be freely accessed and examined extensively. This is why copyright protection through DRM is considered much more complicated and harder to achieve.

Apple Computer introduced the iTunes Music Store on April 28, 2003, an online music service that, by January 2006, had sold over 850 million songs worldwide, accounting for over 80% of digital music sales (Drmwatch.com, 2005). The service has attracted the interest of many companies with respective Web portals, which either included it as part of their array of services, such as America Online, or designed a service of their own to counter it, such as Microsoft’s MSN music service (Music.msn.com, 2006). America Online’s music service through the iTunes Music Store utilizes a technique called FairPlay (Music.aol.com, 2006). FairPlay allows a protected track to be copied exclusively onto Apple Computer’s iPod portable music players. In addition, the protected track may be played on up to five authorized computers simultaneously and may be copied onto a standard CD audio track any number of times.

This raises another question related to the DRM issue, concerning the boundaries between content owner rights and customer rights when they trample on each other. This is due to the fact that the respective parties interpret the term “rights” as conditional. As a result, some DRM techniques employed by content owners, such as limiting the number of times a sound track can be duplicated, even for backup purposes, or restricting the portable multimedia players on which the content may be played, have caused serious protests on the part of customers, with the latter arguing that these technical measures seriously threaten end user rights and stifle productivity and innovation. These open disagreements were taken into consideration and, as a result, some of the respective DRM techniques were recently (Drmwatch.com, 2005) declared illegal in France, whereas the European Community is expected to rule on a ban of these methods.

DIGITAL RIGHTS PROTECTION

From an evaluation of the resulting situation, it is certain that the intellectual property of content owners who deposit their work in a Web portal must not be left unprotected and the end user rights should be simultaneously preserved. A correct approach to fulfilling this end implies designing effective DRM techniques and examining any immediately following possible conflicts with the customer. Consequently, methods that simultaneously offer copyright protection and do not apply usability restrictions on the objects they aim to protect should be adopted.

Taking the first step toward this end, it is observed that the digital content provided by Web portals can be divided into two broad categories: data and software (Atallah, Prabhakar, Frikken, & Sion, 2004). The first one consists of all possible forms of multimedia, including images, videos and digital sounds, as well as digital documents, e-books and text structured information. Moreover, this category includes data hosted in a database that are either queried out or exported in large chunks. The second category features software only, as indicated by the division of categories above.

Due to the fact that the above classification is decided according to the digital object’s data properties, DRM techniques follow the same path and implement different techniques when trying to protect the specific objects. Two approaches are offered by DRM in order to protect the digital material hosted in Web portals: digital watermarking and code obfuscation. The first one aims to protect objects provided by Web portals that fall into the first category, whereas the second one involves exclusively software objects.

Digital Watermarking

Digital watermarking (“watermarking” for short) and the related information hiding techniques are not products of our age; on the contrary, they are inherently related to the habits and tendencies of every time period (Rosenblatt, Trippe, & Mooney, 2001). However, the hazards mentioned above, in combination with the adoption of digital technology by the modern world at the individual and social levels, hastened the transfer of watermarking techniques to the digital world (Cox, Miller, & Bloom, 2002). This leads to the term digital watermarking which, in the present context, will be considered equivalent to the term watermarking. Watermarking is employed to protect the digital rights of various kinds of content by enabling provable ownership over it. This is accomplished by performing relatively minor modifications on the object, which encode the identification information (Watermarkingworld.org, 2005). The embedding of these modifications, which are called marks, is determined by a publicly known algorithm and a secret key. This combination deterministically defines the segments of the object that will be altered as well as the alteration itself. The watermarking procedure is considered symmetric because the detection and verification process uses, most of the time, the same combination of algorithm and key to locate the alterations that were applied during the embedding process.

With respect to the perceptibility (visual or audible) of the watermark, there are two categories: visible and invisible. Even though in most cases a visually undetectable watermark is preferred, there are some cases where a detectable watermark is used. For instance, in a situation where the content owner desires an ownership mark that is visually apparent but does not prevent the object from being used for some purposes such as scholarly research, a visible watermarking scheme could be employed. An example of an image with a visible watermark is depicted in Figure 1.

A crucial feature of any watermarking procedure is that it modifies the object it aims to protect. Taking this for granted, it is of utmost importance that the modifications enforced by the watermark not only comply with the initial requirement of being detectable but also have a marginal impact on the object with respect to its usage. A member of the information hiding family and a close relative of digital watermarking is digital fingerprinting (Li, Swarup, & Jajodia, 2005; Petitcolas, Anderson, & Kuhn, 1998). Many domain experts claim that these two share many features, while others claim that they have complementary roles. To be more specific, digital fingerprinting could be conceived as the insertion of a unique watermark into each copy of the same purchasable digital object. Each legitimate customer receives a copy of the object that carries a different watermark. These watermarks serve a dual purpose: they pinpoint both the digital object’s owner and the customer who bought it. With this architecture in operation, when a copy of the protected object is found in the hands of an unauthorized user, both the owner and the user who illegally distributed the object can be identified. Nonetheless, a fingerprint, apart from the last-mentioned feature, can in all aspects be considered a watermark.
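As a rough illustration of the fingerprinting idea, the following minimal sketch derives a different mark for each customer from the owner’s secret key and then traces a leaked mark back to a buyer. The function names, the customer identifiers, and the use of HMAC-SHA256 to derive the bits are assumptions made for the example and are not taken from the cited works; how the bits are actually embedded in the object is left out.

```python
import hashlib
import hmac

def fingerprint_bits(owner_key, customer_id, n_bits=32):
    """Derive a per-customer mark from the owner's secret key (hypothetical scheme)."""
    digest = hmac.new(owner_key, customer_id.encode(), hashlib.sha256).digest()
    value = int.from_bytes(digest, "big")
    return [(value >> i) & 1 for i in range(n_bits)]

def trace(leaked_bits, owner_key, customers):
    """Return the customer whose mark matches the bits recovered from a leaked copy."""
    for customer_id in customers:
        if fingerprint_bits(owner_key, customer_id, len(leaked_bits)) == leaked_bits:
            return customer_id
    return None

customers = ["alice", "bob", "carol"]           # illustrative customer identifiers
leaked = fingerprint_bits(b"owner-key", "bob")  # bits recovered from a leaked copy
culprit = trace(leaked, b"owner-key", customers)
```

Because the bits depend on the owner’s key, a successful trace points to the owner as well as to the customer, which mirrors the dual purpose described above.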

Figure 1. (a) An original image, (b) an image consisting of the copyright information, that is, the watermark, and (c) a watermarked version of image (a)

Due to the fact that a most essential factor that needs to be considered prior to watermark insertion is the data properties of the object to be protected, different techniques are applied to objects of different type.

Text and Documents

It is required that text structured information that is hosted within a Web portal must be protected from illegal copying and distribution. Watermarking in this specific field takes advantage of the fundamental attributes of the text-based information such as words, sentences and paragraphs. It is understood that no alteration can be performed on the character level because any such operation would produce unpredictable results most likely decreasing dramatically the original value of the document. As such, the most common watermarking methods for document copyright protection utilize the above three document components and are summarized below (Atallah, Raskin, Hempelmann, Karahan, Sion, Triezenberg, & Topkara, 2002; Low, Maxemchuk, & Lapone, 1998).

A group of techniques performs the watermark insertion by executing modifications of the document’s components that are imperceptible to the human sensory system. A plain example of this method is the vertical shift of a specific word by one pixel along the y axis, or an increase in the distance between two consecutive words, that is, a shift along the x axis.
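The word-shifting idea can be sketched as follows. This is only a toy model under stated assumptions: a document is reduced to a list of horizontal word positions in pixels, the secret key picks which words carry a bit, and detection compares the marked layout with the original one. The helper names and the key-to-position derivation are hypothetical.

```python
import hashlib
import random

def select_words(n_words, n_bits, key):
    """Pick the word indices that carry a bit, deterministically from the secret key."""
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return sorted(random.Random(seed).sample(range(n_words), n_bits))

def embed(x_positions, bits, key):
    marked = list(x_positions)
    for idx, bit in zip(select_words(len(marked), len(bits), key), bits):
        marked[idx] += bit  # shift one pixel to the right for a 1, no shift for a 0
    return marked

def extract(original, marked, n_bits, key):
    return [marked[i] - original[i] for i in select_words(len(original), n_bits, key)]

original = [10 + 42 * i for i in range(12)]  # x coordinate of each word, in pixels
bits = [1, 0, 1, 1]
marked = embed(original, bits, b"secret-key")
recovered = extract(original, marked, len(bits), b"secret-key")
```

No word moves by more than one pixel, which is the kind of change a reader is not expected to notice.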

A different approach inserts new sentences into arbitrarily chosen sections of the document. The sentences inserted in this manner are valid but, as expected, totally irrelevant to their surroundings, that is, the document’s context. A human being who can read and comprehend the document can easily detect this sort of alteration, but it is not traceable by computer software or another automated procedure.

Images

In the field of digital pictures, a broad array of watermarking methods is offered to solve the problem of visually undetectable copyright protection (Chandramouli, Kharrazi, & Memon, 2003; Langelaar, Setyawan, & Lagendijk, 2000). This abundance is explained, first, by the fact that images were the primary field of watermarking applications and, as such, a lot of knowledge and experience surrounds the sector. A special feature of this watermarking field is the high degree of redundant information. For instance, there are many pixels identical in all their properties which can be altered appropriately without any noticeable effects. The image environment, as opposed to text and documents, lacks semantics and other high-level structures between adjacent components, that is, the image pixels. This lack enables the creation of a large number of watermarking methods.

A group of methods embeds the watermark in the original domain of the picture, the so-called spatial domain. These methods modify to a slight degree some of the core property values like the luminance or the color of the pixels that constitute the image. To do so, every pixel has a value in that core property that can be represented by a binary sequence and one or a few of the sequence’s least significant bits are changed leading to a watermark insertion.
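A minimal sketch of least-significant-bit embedding in the spatial domain is given below. It assumes a grayscale image flattened into a list of 8-bit luminance values, and it assumes that the secret key seeds the choice of pixel positions; both the helper names and that key derivation are illustrative rather than taken from a specific published scheme.

```python
import hashlib
import random

def keyed_positions(n_pixels, n_bits, key):
    """Choose which pixels carry the watermark, deterministically from the key."""
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed).sample(range(n_pixels), n_bits)

def embed_lsb(pixels, bits, key):
    marked = list(pixels)
    for pos, bit in zip(keyed_positions(len(pixels), len(bits), key), bits):
        marked[pos] = (marked[pos] & ~1) | bit  # overwrite the least significant bit
    return marked

def extract_lsb(pixels, n_bits, key):
    return [pixels[pos] & 1 for pos in keyed_positions(len(pixels), n_bits, key)]

luminance = [(37 * i + 11) % 256 for i in range(64)]  # stand-in for an 8x8 image
bits = [1, 0, 0, 1, 1, 0, 1, 0]
marked = embed_lsb(luminance, bits, b"owner-key")
recovered = extract_lsb(marked, len(bits), b"owner-key")
```

Each marked pixel differs from the original by at most one luminance level, which is why the change is not expected to be visible.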

A different group of methods, regarded by many as more sophisticated, operates in the so-called frequency domain. To be able to work in that domain, the image is perceived as a two-dimensional signal which is converted to the frequency domain through a transformation such as the Discrete Cosine Transform or the Fast Fourier Transform. Once this occurs, the watermarking algorithm and the key select some coefficients and slightly modify their values. When all that is done, the image is converted back to the original domain, that is, back to its original form, using the appropriate inverse transformation.
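The transform, modify, inverse-transform cycle can be sketched on a single row of pixels. The example below is a simplification under stated assumptions: it uses a one-dimensional DCT instead of the two-dimensional transform applied to real images, it keeps pixel values as floating-point numbers, and it encodes one bit in the parity of a quantized coefficient. The step size and the coefficient index (which the key would select in practice) are hypothetical choices.

```python
import math

def _scale(k, n):
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(x):
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)) for k in range(n)]

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n)) for i in range(n)]

STEP = 4.0  # hypothetical quantization step; larger is more robust but more visible

def embed_bit(row, bit, coeff_index):
    coeffs = dct(row)
    k = round(coeffs[coeff_index] / STEP)
    if k % 2 != bit:           # force the parity of the quantized coefficient
        k += 1
    coeffs[coeff_index] = k * STEP
    return idct(coeffs)        # back to the spatial domain

def extract_bit(row, coeff_index):
    return round(dct(row)[coeff_index] / STEP) % 2

row = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
marked = embed_bit(row, 1, coeff_index=3)
```

A real scheme would also have to survive rounding the pixels back to integers and lossy recompression, which this sketch does not address.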

Digital Sound

The digitization of sound goods has become particularly widespread in recent years. Beyond the fact that the overwhelming majority of new musical work appears in digital form, older musical work is constantly being converted to digital form as well. Drastic changes have been observed with respect to digital music distribution, and more than a few portals offer their subscribers the ability to legitimately download their favorite music tracks. However, once downloaded, a music track can easily be distributed and shared among other users. This is why effective watermarking techniques must be utilized to protect the copyright owners. Among these, Kirovski and Malvar (2001) and Cretu and Fouad (2003) are considered the most distinguished.

The baseline group of methods performs modifications on the least significant bits of the binary sequences that correspond to samples taken from the original sound track.

Taking a step further into the sound domain, a set of techniques relies on the inability of the human sensory system to detect small alterations. One of them inserts additional sounds of the same frequency but lower volume; even though they are played back concurrently, only the higher volume sound is perceived. Another subcategory inserts sounds of the same volume but different frequency, with the same effect. With this method, the watermark is a signal that is transmitted at the same time as the signal of the sound but, for the reasons explained above, escapes perception.

A set of refined techniques under the echo-hiding label exploits, as above, the inability of the human sensory system to perceive low-amplitude echoes. Using these methods, the watermarking algorithm inserts two different types of echoes into the audio track in order to encode the ones and zeros, both of which are of minimal duration, such as a few milliseconds. As with the other methods, the embedding algorithm defines the portions of the audio track where the echoes are inserted.
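Echo hiding can be sketched as follows, under several simplifying assumptions: the audio is a synthetic noise signal, each bit occupies a fixed-length segment, a 0 and a 1 are distinguished by two different echo delays, and detection compares the autocorrelation of a segment at the two delays. The delays, amplitude, and segment length are hypothetical values, and practical detectors are usually described as relying on cepstral analysis rather than the plain autocorrelation used here.

```python
import random

D0, D1 = 50, 80   # hypothetical echo delays (in samples) encoding 0 and 1
ALPHA = 0.4       # hypothetical echo amplitude relative to the signal
SEGMENT = 2000    # samples carrying one bit

def embed_echo(signal, bits):
    out = list(signal)
    for b, bit in enumerate(bits):
        delay = D1 if bit else D0
        start = b * SEGMENT
        for n in range(start + delay, start + SEGMENT):
            out[n] += ALPHA * signal[n - delay]  # add a delayed, attenuated copy
    return out

def autocorr(seg, lag):
    return sum(seg[n] * seg[n - lag] for n in range(lag, len(seg)))

def extract_echo(signal, n_bits):
    bits = []
    for b in range(n_bits):
        seg = signal[b * SEGMENT:(b + 1) * SEGMENT]
        # The embedded delay shows up as the stronger autocorrelation peak.
        bits.append(1 if autocorr(seg, D1) > autocorr(seg, D0) else 0)
    return bits

rng = random.Random(7)
audio = [rng.uniform(-1.0, 1.0) for _ in range(3 * SEGMENT)]  # stand-in for real audio
marked = embed_echo(audio, [1, 0, 1])
recovered = extract_echo(marked, 3)
```

The echo amplitude is deliberately large so that the toy detector works on three short segments; a perceptually transparent echo would be weaker.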

Last but not least, the so-called spread spectrum techniques must be mentioned, which are the most promising and are thought by many to be the most robust set of watermarking techniques in the sound domain (Chandramouli et al., 2003). These rely on hiding a low-amplitude spread spectrum sequence throughout the sound track. Usually the watermark is inserted in the high-amplitude portions of the sound, and detection can be achieved using appropriate correlation techniques.
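The following minimal sketch shows the spread spectrum principle with correlation-based detection. It assumes a synthetic noise signal as the host audio, a pseudo-random sequence of +1 and -1 values seeded by an integer key, and a fixed detection threshold; all names and parameter values are illustrative. Unlike the techniques described above, it spreads the mark uniformly instead of favouring the high-amplitude portions.

```python
import random

def pn_sequence(key, length):
    """Pseudo-random +1/-1 sequence that only the key holder can regenerate."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(signal, key, strength=0.05):
    pn = pn_sequence(key, len(signal))
    return [s + strength * p for s, p in zip(signal, pn)]  # low-amplitude addition

def detect(signal, key, threshold=0.025):
    pn = pn_sequence(key, len(signal))
    score = sum(s * p for s, p in zip(signal, pn)) / len(signal)
    return score > threshold  # correlation is near `strength` only for the right key

host_rng = random.Random(2024)
audio = [host_rng.uniform(-1.0, 1.0) for _ in range(20000)]  # stand-in for real audio
marked = embed(audio, key=1234)
```

With the correct key the correlation score concentrates around the embedding strength, while an unmarked track or a wrong key yields a score close to zero.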

Video

The multimedia explosion we witness nowadays has also reached the field of digital video. A fair number of portals have begun to sell music videos, TV series episodes and other digital video content through the Internet. Inevitably this sector requires copyright protection attention, and a few methods have responded to the challenge. One could claim that each video is comprised of a sequence of consecutive images and that, with no further modifications, a generic image watermarking algorithm could be applied to insert the watermark into every one of them. This is theoretically correct but fails to apply in practice. The reason lies in the total overhead that would be added to the video file, considering that each second of video is usually comprised of 20 or 25 pictures and the watermark information could reach even a kilobyte. As a result, even for a five-minute music video, an overhead of five to six megabytes would be added to the existing video file, which is a total increment of approximately 10-15% of the original size. This is surely a non-negligible amount. On the other hand, another view may well put forward the claim that the watermark could be inserted in the initial picture or the first few pictures of the video, using the image techniques discussed above. This approach also fails, because an adversary could cut off the first seconds of the video, removing the copyright protection while reducing the video’s value only marginally. As a result, the watermark should be spread appropriately, and this is a task carried out by field-specific methods (Doerr & Dugelay, 2003).
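The overhead estimate can be checked with a short calculation. The sketch below takes the upper bound of one kilobyte of watermark data per frame, counts a kilobyte as 1024 bytes and a megabyte as 1024 kilobytes, and uses the two frame rates mentioned in the text; the variable and function names are illustrative.

```python
FRAME_RATES = (20, 25)   # frames per second, as in the text
DURATION_S = 5 * 60      # a five-minute music video
WATERMARK_BYTES = 1024   # upper bound: one kilobyte of watermark data per frame

def overhead_megabytes(fps, duration_s=DURATION_S, per_frame=WATERMARK_BYTES):
    """Total watermark overhead when every frame carries its own copy of the mark."""
    return fps * duration_s * per_frame / (1024 * 1024)

overheads = {fps: round(overhead_megabytes(fps), 2) for fps in FRAME_RATES}
# overheads == {20: 5.86, 25: 7.32}
```

At the full one-kilobyte bound the overhead is therefore roughly 5.9 megabytes at 20 frames per second and 7.3 megabytes at 25, so the figure of five to six megabytes quoted above corresponds to the lower frame rate or to a watermark somewhat smaller than a kilobyte.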

One group of methods operates exclusively on uncompressed videos and it considers the digital video a sequence of consecutive images, that is, the video’s frames. These images are grouped and, at constant intervals, such as intervals of 20, the watermark insertion takes place spreading among them.

Another set of techniques focuses on digital video that has been compressed using the MPEG-2 format and its variations. MPEG-2 is one of the most widely recognized video formats and is also the format used on digital versatile discs. It is the format of choice for the watermarking methods in this field because the compression format significantly eases the embedding process. The reason is that the I-frames of the MPEG video frame sequence are in effect still images compressed with a DCT-based scheme similar to the JPEG compression standard. Thus, embedding is done in the frequency domain of the I-frames, modifying their DCT coefficient values as instructed by the insertion algorithm and the secret key.

Relational Data

While watermarking relational data is considered interesting and challenging, it has received relatively less attention than other types of data. Because they are less popular than multimedia files among common users, relational data are not encountered in peer-to-peer or FTP communications, even though they form an exclusive feature of the Web portal environment. However, a distinctive feature of this domain is the inability of current watermarking schemes for multimedia to provide copyright protection for all types of structured data, so that only data of a numeric nature can be protected. All available watermarking schemes perform minor alterations to the numeric values of the data that require copyright protection. For instance, in a commercial Web portal that provides car specifications, the watermarking algorithm could modify the top speed of some of the vehicles by a minor percentage, for example, of the order of 1%, in order to embed the watermark.

Applying such modifications is a procedure of critical importance, since a compromise must be reached between the robustness of the copyright scheme, which calls for altering many values to a significant degree, and data usability. In this field, moreover, the notion of an acceptable change is not as clear-cut as in other domains. For instance, a minor change in the value of a person’s age, from 17 to 18, can lead to a classification change with undesirable results when an application generates reports on people who have the right to vote. This is the type of application which brings into play the compromise between robustness of the copyright scheme, alteration of values and data usability.

A group of methods (Agrawal & Kiernan, 2002; Li et al., 2005) converts the values of some of the data to their binary representation and modifies the value of one of their least significant bits. The algorithm, guided by the secret key, decides for each database row whether it should be watermarked. It then selects a data field and the bit position at which the mark will be inserted. These marks are spread throughout the database and together represent the watermark. The algorithm then converts the modified binary values back to their decimal representation and updates the database.
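The row, field, and bit selection described above can be sketched as follows. This is a simplified illustration in the spirit of the cited methods, not a faithful reproduction of them: the table is a dictionary keyed by primary key, a keyed hash (HMAC-SHA256, an assumption of this example) drives every decision, and the parameters GAMMA and NUM_LSB are hypothetical names and values.

```python
import hashlib
import hmac

GAMMA = 2     # hypothetical: roughly one row in GAMMA is marked
NUM_LSB = 2   # hypothetical: number of low-order bits eligible for marking

def keyed_hash(key, primary_key):
    digest = hmac.new(key, str(primary_key).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def mark_plan(key, primary_key, n_attrs):
    """Return (attribute index, bit index, bit value) for a selected row, else None."""
    h = keyed_hash(key, primary_key)
    if h % GAMMA != 0:
        return None
    return (h // GAMMA) % n_attrs, (h // (GAMMA * n_attrs)) % NUM_LSB, (h >> 64) & 1

def embed(rows, key):
    marked = {}
    for pk, values in rows.items():
        values = list(values)
        plan = mark_plan(key, pk, len(values))
        if plan is not None:
            attr, bit, value = plan
            values[attr] = (values[attr] & ~(1 << bit)) | (value << bit)
        marked[pk] = values
    return marked

def match_ratio(rows, key):
    """Fraction of selected rows whose marked bit has the value the key predicts."""
    hits = total = 0
    for pk, values in rows.items():
        plan = mark_plan(key, pk, len(values))
        if plan is not None:
            attr, bit, value = plan
            total += 1
            hits += ((values[attr] >> bit) & 1) == value
    return hits / total if total else 0.0

# Illustrative car table: primary key -> [top speed, weight]
cars = {pk: [180 + pk % 60, 1200 + 7 * pk] for pk in range(1, 201)}
marked = embed(cars, b"owner-secret")
```

With the owner’s key every selected bit matches, whereas with any other key only about half of them are expected to, which is what allows ownership to be argued statistically.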

Another method (Sion, Atallah, & Prabhakar, 2003) relies on the statistical properties of the data, rather than on the data itself, to embed the watermark. This method is usually applied to data sets that follow the normal distribution. First, the method creates subsets of the original data and examines the distribution of data in each subset. Second, it examines some statistical properties of the data and, in accordance with a set of parameters including a secret key, performs specific modifications to the data values in order to shift the data distribution appropriately, as required by the watermarking algorithm.

Software: Code Obfuscation

In order to secure the copyright of any software product that is offered from within a Web portal, specific copyright protection mechanisms must be utilized. In contrast to traditional watermarking approaches, the field of computer software utilizes the so-called code obfuscation philosophy. According to it, modifications that are totally transparent in terms of the program’s outcome are added to the source code, rendering it more difficult to read and, as a result, to understand clearly its full functionality (Atallah et al., 2004). This is the last bastion of the software’s copyright integrity, since an attacker using a decompiler or a specialized software tool can otherwise produce a crystal clear version of the source code of the program. The attacker would consequently be able to read and understand it, and therefore alter parts of it to produce a version of the code different enough from the original to allow the attacker afterwards to claim ownership over the original software.

Code obfuscation encompasses a set of techniques that aim to accomplish this sort of modification to the source code (Atallah et al., 2004). One method reorganizes the data structures used in the program and utilizes them discreetly to hide their role in the program (Collberg & Thomborson, 2005). An example of a technique that falls into this category consists in merging unrelated data into a single data structure and/or combining multiple data structures that are used to store similar data. Another technique in this area aims at shuffling the control flow of the program, forcing it to perform routines that make no real contribution to the outcome of the final program, for instance, by using a method that creates a file in the local file system that is not required by the program. A third technique, which targets the features that facilitate the reading of the source code, includes rearranging the code, removing comments and altering the formatting of the statements used in the source code. A combination of the second and the third techniques is depicted in the example of Figure 3, which is an obfuscated version of the simple Hello World program of Figure 2.

The source code depicted in Figure 3 utilizes some code obfuscation techniques, such as generic variable names, useless loops and conditionals, that offer a snapshot of this approach.

Figure 2. A simple Hello World Java program

Figure 3. An obfuscated version of the Hello World program
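As a rough analogue of the two figures, sketched in Python rather than Java, the pair of functions below contrasts a plain Hello World routine with a version that uses generic names, a loop that runs longer than needed, and a conditional that is always true. It is an illustrative example and not a reproduction of the figures; the comments are kept only to explain the sketch, whereas an actual obfuscator would strip them.

```python
def hello():
    """Plain version: the intent is obvious at a glance."""
    return "Hello World"

def h():
    """Obfuscated version: same result, harder to read."""
    a = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]  # character codes
    b, c = "", 0
    for d in range(len(a) * 2):   # loop runs twice as long as needed
        if d % 2 == 1:            # odd iterations do no useful work
            c += d
            continue
        if c >= 0:                # condition that is always true
            b += chr(a[d // 2])
    return b

same_output = hello() == h()
```

Both functions return the same string, which reflects the requirement that obfuscation must leave the functionality of the program unchanged.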

These sorts of alterations after compilation become innately merged with the program. The intention of these additions/alterations is to confuse potential adversaries who may attempt to reverse-engineer the executable program and produce an accurate version of the initial source code.

On the other hand, it must be noted that before any obfuscation technique is adopted, it must meet certain criteria. First and foremost, the modifications applied by it must not alter the initial functionality of the program. Additionally, these modifications should not be discoverable by the specialized tools available or by code optimizers during compilation. Finally, since additional tasks require additional time and effort to compute, degradation in the program’s performance is unavoidable. However, this degradation must be kept within acceptable levels. In addition, it must be mentioned that code obfuscation requires programmers to modify source code by hand.

Finally, keep in mind that code obfuscation cannot by any means guarantee that an adversary will not be able to successfully attack and somehow retrieve the original source code. However, this protective barrier will transform an otherwise trivial task into a challenging and laborious procedure, which will require significant knowledge, expertise and time from the attacker to bring it to a successful end.

FUTURE TRENDS

Over the past years, numerous systems, standards and technologies have been proposed to prevent the illegal copying of digital material. Even though some of them proved to be highly effective, content providers have not taken serious account of the side effects of these copyright protection schemes, which cause considerable problems in terms of product usability for lawful customers. On the other hand, successfully applying effective DRM techniques to the digital content offered by Web portals is a bet everyone expects to win. However, DRM success will depend mostly on establishing a balance between protecting the interests of rights holders and those of users and consumers who wish to use and access the digital materials (Drmwatch.com, 2005).

CONCLUSION

Web portals are out there and their popularity is increasing by the day as more and more people visit and benefit from their wide variety of products and services. Among them are online product acquisition services that grant access to a set of purchasable digital products. This feature has redefined the term trade, unfolding simultaneously new financial and social horizons with a potential that was undreamed of a few years before. However the properties of these digital objects that constitute the cornerstone for this new commercial era can easily be exploited to serve malicious and felonious ends. Digital rights protection management mechanisms must be employed to enable content owners to prove ownership of their digital objects and to restrain their illegal distribution and abuse. Currently the DRM arsenal offers a set of methods every one of which is specialized in a specific type of content to respond efficiently to this challenging task. From those methods only the ones that keep the end user’s rights intact and do not reduce the objects’ usage potential should be considered and utilized.

The final word on this survey of DRM methods for various types of valuable digital objects is that, even though a few worthy and effective methods exist in some domains, a number of methods encounter substantial obstacles, and the attention of researchers and scientists around the world is required in order to find an integral solution to the copyright protection problem. In all circumstances, and taking into consideration the fact that adversaries always enjoy the privilege of taking the first step, the DRM community must stay alert to fulfill its mission.

KEY TERMS

Code Obfuscation: A set of transformations on a program that preserve the same black box specification while making the internals difficult to reverse-engineer.

Database: An organized collection of data (records) that are stored in a computer in a systematic way, so that a computer program can consult it to answer questions.

Digital Rights Management (DRM): A systematic approach to copyright protection for digital media. DRM’s purpose is to prevent illegal distribution of paid content over the Internet.

FairPlay: FairPlay is Apple Computer’s trademark for its digital rights management technology, which is built into the QuickTime multimedia technology. FairPlay-protected files are regular MP4 container files with an encrypted AAC audio stream. The audio stream is encrypted using the Rijndael algorithm in combination with MD5 hashes.

Moving Picture Experts Group (MPEG): A working group of ISO. The term also refers to the family of digital video compression standards and file formats developed by the group.

Relational Database: The database model most commonly in use today is the relational model, which represents all information hosted in the database in the form of multiple related tables, each one of them consisting of rows and columns.

Watermark: A pattern of bits inserted into a digital object that identifies the file’s copyright information (author, rights, etc.).

Watermarking: The procedure of embedding a watermark into a digital object for copyright protection purposes.