The Pen Is Mighty: Wield It Wisely (Digital Library)

Collecting information and making it widely available to others has far-ranging social implications, and those who build digital libraries must act responsibly by making themselves aware of the legal and ethical issues that surround their particular application. Copyright is the place to begin.

Digital libraries are far more accessible than physical ones. And this brings its own problems: access to the information in digital libraries is generally less controlled than it is in physical collections. Putting information into a digital library has the potential to make it immediately available to a nearly unlimited audience.

This is great news. For users, information becomes available wherever they are, worldwide. For authors, the potential audience is greater than ever before. On the other hand, authors worry that fewer copies of a work will be sold if networked digital libraries enable worldwide access to an electronic copy of it. Their nightmare is that the number of copies sold could be as few as one. How many books will be published online if the entire market can be extinguished by the sale of one electronic copy to a public library?

Copyright law

In copyright law, possession of a copy of a document certainly does not constitute ownership. Although there may be many copies, each document has only one copyright owner. This applies not just to physical copies of books, but to computer files, too, whether they have been digitized from a physical work or created electronically in the first place—"born digital." When you buy a copy of a document, you can resell it, but you certainly do not buy the right to make further copies and redistribute them. That right rests with the copyright owner.


Copyright subsists in the work, rather than in any particular embodiment of it. A work is an intangible intellectual object, of which a document is a physical manifestation. Lawyers use the word subsists, which in English means to remain or to continue in existence, because copyright has no separate existence without the work. Copyright protects the way ideas are expressed, not the ideas themselves. Two works that express the same idea in different ways are independent in copyright law.

Who owns a particular work? The creator is the initial copyright owner, unless the work is made for hire. If the work is created by an employee within the scope of her employment, or under a specific contract that explicitly designates it as being made for hire, the employer or contracting organization owns the copyright. Any copyright owner can transfer, or "assign," copyright to another party through a written contract.

The copyright owner has the exclusive right to do certain things with the work: thus copyright is sometimes referred to as a "bundle" of rights. In general, there are four rights, although details vary from country to country. The reproduction right allows the owner to reproduce the work freely. The distribution right allows the owner to distribute it, but this is a one-time right: once a copy has been distributed, the copyright owner has no right to control its subsequent distribution. For example, if you buy a book, you can do whatever you want with your copy, including reselling it. The public lending right compensates authors for public lending of their work—although an exception is granted for not-for-profit and educational use, which do not require the copyright holder’s consent. The remaining rights, called other rights, include permitting or refusing public performance of the work, and making derivative works like plays or movies.

Copyright law is complex, arcane, and varies from one country to another. The British Parliament adopted the first copyright act in 1710; the U.S. Congress followed suit in 1790. Although copyright is national law, most countries today have signed the Berne Convention of 1886, which lays down a basic international framework. According to the Convention, copyright subsists without formality, which means that (unlike patents) it’s not dependent on registering a work with the authorities or depositing a copy in a national library. It applies regardless of whether the document bears the international copyright symbol ©. You automatically have copyright over works you create (unless they are made for hire). Some countries—including the United States—maintain a copyright registry even though they have signed the Berne Convention, which makes it easier for a copyright holder to take legal action against infringers. Nevertheless, copyright still subsists, even if the work has not been registered.

The Berne Convention decrees that it is always acceptable to make limited quotations from protected works, with acknowledgment and provided it is done fairly. The United States has a copyright principle called "fair use" that allows material to be copied by individuals for research purposes. The U.K. equivalent, which has been adopted by many countries whose laws were inherited from Britain in colonial times, is called "fair dealing" and is slightly more restrictive than fair use.

Making copies of copyrighted works for distribution or resale is prohibited. That is the main economic point of the copyright system. The Berne Convention also recognizes certain moral rights. Unlike economic rights, these cannot be assigned or transferred to others; they remain with the author forever. They give authors the right to the acknowledgment of their authorship, and to the integrity of their work—which means that they can object to a derogatory treatment of it.

The public domain

Works not subject to copyright are said to be in the "public domain," which comprises the cultural and intellectual heritage of humanity that anyone may use or exploit. Often, works produced by the government are automatically placed in the public domain, or else the government sets out generous rules for their use by not-for-profit organizations. This applies only in the country of origin: works produced by the U.S. government are in the public domain in the United States and its territories, but are subject to U.S. copyright in other countries.

Copyright does not last forever; when it expires, the work passes into the public domain, free of all copyright restrictions. No permission is needed to use it in any way, incorporate any of its material into other works, or make any derivative works. You can copy it, sell it, excerpt it—or digitize it and put it on the Web. The author’s moral rights still hold, however, so you must provide due attribution.

As internationally agreed in the Berne Convention, the minimum copyright duration is life plus 50 years, that is, until 50 years after the author dies. Of course, it is often difficult to find out when the author died. One way is to consult the authors’ association in the appropriate country, which maintains links to databases maintained by authors’ associations around the world.

The duration of copyright has an interesting and controversial history. Many countries specify a longer term than the minimum, and this changes over the years. The original British 1710 act provided a term of 14 years, renewable once if the author was alive; it also decreed that all works already published by 1710 would get a single term of 21 further years. The 1790 U.S. law followed suit, with a 14-year once-renewable term. Again, if an author did not renew copyright, the work automatically passed into the public domain. In 1831, Congress extended the initial period of copyright from 14 to 28 years, and in 1909 it extended the renewal term from 14 to 28 years, giving copyright a maximum duration of 56 years.

From 1962 onward, Congress enacted a series of copyright extensions, some of one or two years, others of 19 or 20 years. In 1998, it passed the Sonny Bono Copyright Term Extension Act, which extended the term of existing and future copyrights by 20 years. (The act is named in memory of former musician Sonny Bono, who, according to his widow, believed that "copyrights should be forever.") The impetus behind the changes was the desire of large, powerful corporations to protect a minuscule number of cultural icons; opponents call them the "Mickey Mouse" copyright extensions. Many parts of the world (notably the United Kingdom) have followed suit by extending their copyright term to life plus 70 years—and in some countries (e.g., Italy) the extension was retroactive, so that works already in the public domain were suddenly removed from it.

The upshot is that copyright protection ends at different times depending on when the work was created. It also begins at different times. In the United States, older works are protected for 95 years from the date of first publication. Through the 1998 Copyright Extension Act, newer ones are protected from the "moment of their fixation in a tangible medium of expression" until 70 years after the author’s death. Works made for hire—often ones belonging to corporations—are protected for 95 years after publication or 120 years after creation, whichever comes first.
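The term rules in the paragraph above reduce to simple date arithmetic. The following is a minimal sketch in Python that assumes only the rules as stated there; the function names are hypothetical, and the sketch ignores everything the text does not mention (for example, how to decide whether a work counts as "older" or "newer").

```python
from typing import Optional


def us_older_work_term_end(publication_year: int) -> int:
    """Older work: protected for 95 years from first publication."""
    return publication_year + 95


def us_author_term_end(death_year: int) -> int:
    """Newer work: protected until 70 years after the author's death."""
    return death_year + 70


def us_work_for_hire_term_end(creation_year: int,
                              publication_year: Optional[int] = None) -> int:
    """Work made for hire: 95 years after publication or 120 years
    after creation, whichever comes first."""
    by_creation = creation_year + 120
    if publication_year is None:
        # Never published: only the creation-based limit applies.
        return by_creation
    return min(publication_year + 95, by_creation)
```

Note that the work-for-hire rule takes the earlier of the two dates, so a work created in 1950 but first published in 2000 is limited by its creation date rather than its publication date.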

The original copyright term was one-time renewable, but few copyright holders renewed. Focusing again on the United States, in 1973 more than 85 percent of copyright holders failed to renew their copyright, which meant that, at the time, the average term of copyright was just 32 years. Today there is no renewal requirement for works created before 1978: copyright is automatically given for a period of 95 years—tripling the average duration of protection.

No copyrights will expire in the 20-year period from 1998 to 2018. To put this into perspective, 1 million patents will pass into the public domain during the same period. The effect of this extension is dramatic. Of the 10,000 books published in 1930, only a handful (less than 2 percent) are still in print. If the recent copyright extensions had not occurred, all 10,000 of these books would by now be in the public domain, their copyright having expired in 1958 (after 28 years) if it was not renewed, or 1986 (after a further 28 years) if it was renewed. Unfortunately, that is not the case.
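The 1958 and 1986 dates above follow directly from the 28-year initial term and the 28-year renewal term. As a small worked sketch (the function name is hypothetical, and it models only the pre-extension renewal scheme described in the text):

```python
def expiry_before_extensions(publication_year: int, renewed: bool) -> int:
    """Year copyright would have expired under a 28-year initial term
    with one optional 28-year renewal."""
    initial_term = 28
    renewal_term = 28
    expiry = publication_year + initial_term
    if renewed:
        expiry += renewal_term
    return expiry


print(expiry_before_extensions(1930, renewed=False))  # 1958
print(expiry_before_extensions(1930, renewed=True))   # 1986
```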

If you want to digitize one of these 1930 books and make it available on the Internet, you must first determine its copyright status. If copyright expired in 1958 without renewal, it is already in the public domain. The official registry of copyright renewal is not available online, but new initiatives like the Copyright Renewal Database at Stanford University and the Registry of Copyright Evidence by the Online Computer Library Center (OCLC) are trying to help streamline the task. With or without the help of these newer initiatives, for works still under copyright, you will have to contact the copyright holders. If the book is out of print (98 percent of cases), they would likely be perfectly happy to give you permission. But you’d have to track these people down.

As Lawrence Lessig wrote in his book Free Culture (from which much of the above information was taken): "Now that technology enables us to rebuild the library of Alexandria, the law gets in the way. And it doesn’t get in the way for any useful copyright purpose, for the purpose of copyright is to enable the commercial market that spreads culture. No, we are talking about culture after it has lived its commercial life. In this context, copyright is serving no purpose at all related to the spread of knowledge. In this context, copyright is not an engine of free expression. Copyright is a brake."

Relinquishing copyright

As we have explained, the Berne Convention grants copyright without formality, without registration. Anything you write, every creative act that is "fixed in a tangible medium of expression" (be it a book, an e-mail, or a grocery list), is automatically protected by copyright until 50 years after you die (according to the Berne Convention’s minimum term) or, today in the United States, 70 years after you die (assuming you did not write your grocery list as a work made for hire). People can quote from it under the principle of fair use, but they cannot otherwise use your work until copyright expires, unless you assign the copyright to them.

Authors who wish to relinquish copyright must take active steps to do so. In fact, it’s quite difficult. To facilitate it, a nonprofit organization called the Creative Commons has developed licenses that people can attach to the content they create. Each license is expressed in three ways: a legal version, a human-readable everyday-language description, and a machine-readable tag. Content is marked with the CC mark, which does not mean that copyright is waived but that freedoms are given to others to use the material in ways that would not normally be permissible by default under copyright.

The freedoms allowed by licensing all go beyond traditional fair use, but their precise nature depends on the choice of license. One license permits any use so long as attribution is given. Another permits only noncommercial use. A third permits any use within developing nations. Or any educational use. Or any use except for the creation of derivative works. Or any use so long as the same freedom is given to other users. Most important, according to the Creative Commons, is that these licenses express what people can do with your work in a way they can understand and rely upon without having to hire a lawyer. The idea is to help reconstruct a viable public domain.
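To make the idea of a machine-readable tag concrete, here is a minimal sketch in Python that turns a few of the choices above into a license code and an HTML link marked with rel="license". It covers only the attribution, noncommercial, no-derivatives, and share-alike conditions; the function names are hypothetical, and the version number 4.0 is an assumption that does not come from the text.

```python
from html import escape


def cc_license_code(noncommercial: bool = False,
                    no_derivatives: bool = False,
                    share_alike: bool = False) -> str:
    """Build a license code such as 'by-nc-sa'. Attribution is always included."""
    if no_derivatives and share_alike:
        # Share-alike governs derivative works, so it cannot be
        # combined with a ban on derivatives.
        raise ValueError("no_derivatives and share_alike are mutually exclusive")
    parts = ["by"]
    if noncommercial:
        parts.append("nc")
    if no_derivatives:
        parts.append("nd")
    if share_alike:
        parts.append("sa")
    return "-".join(parts)


def cc_license_tag(code: str, version: str = "4.0") -> str:
    """Return an HTML link that software can recognize as a license statement."""
    url = f"https://creativecommons.org/licenses/{code}/{version}/"
    label = f"CC {code.upper()} {version}"
    return f'<a rel="license" href="{escape(url)}">{escape(label)}</a>'


print(cc_license_tag(cc_license_code(noncommercial=True, share_alike=True)))
```

A harvester building a digital library collection could look for such links to decide automatically which documents it is permitted to include.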

The term copyleft is sometimes used to describe a license that imposes the same terms on any derivative work: typically the rights to use, modify, and redistribute. The intention of copyleft is to use the facilities of copyright legislation to preserve these freedoms. So, if you take a copyleft work, make changes and distribute a modified version, then other people automatically acquire the same rights (to change and distribute) that you had received.

A prominent figure who helped establish this form of licensing is Richard Stallman. In 1984 he left MIT and founded the GNU Project, and set about developing (and, over time, persuading others to help develop) a comprehensive suite of general-purpose software. (The acronym, GNU, is itself a self-referential joke, standing for GNU’s Not Unix.) The resulting software was made available under the terms of the GNU General Public License (GPL), which encapsulates the notion of copyleft and was at the time a radical departure from how mainstream software was distributed. Subsequently the project extended the idea to other forms of work. For instance, the GNU Free Documentation License is the counterpart to the GPL, originally designed for the manuals and documentation that accompany GNU software. Others are free to use the GNU licenses for their own work, and they are a popular choice for many open source projects.

Digital rights management

A treaty adopted by the World Intellectual Property Organization (WIPO) in 1996 addresses some of the copyright issues raised by digital technology and networks in the modern information era. It decrees that computer programs should be protected as literary works and that the arrangement and selection of material in databases is protected. It provides authors of works with more control over their rental and distribution rights than the Berne Convention does.

In order to understand the motivation behind this treaty, and its implications, reflect on the enormous repercussions of current innovations in electronic publishing. Publishers continue to rely on conventional retailers for the sale of paper copies, but they are experimenting with combining e-Books with pre- and post-release paper copies. E-Book technology is being forced to standardize.

New sales models are beginning to emerge. Highly effective search engine advertising is leveling the market, providing more opportunities for small publishers and self-publishing. Physical bookstores are finding themselves bypassed. E-Books also provide preview options (flip through the pages at no cost), and can be rented on a time-metered or absolute duration basis (like video rentals), sometimes with an option to purchase.

Content owners are adopting technical means to implement policies governing access to the information they sell. The term "digital rights management" (DRM) refers to the control and protection of digital content, including text documents, images, video, and audio. DRM technology limits what users can do with content—even when they possess it. Access can be restricted to a particular computer: this means no lending to friends, no sharing between your home computer and your office, and no backing up on another machine. It also precludes resale, eliminating the secondhand market. Furthermore, expiry dates can be imposed, precluding permanent collections and archives. These measures go far beyond the traditional legal bounds of copyright.
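The two restrictions mentioned above, binding to a particular computer and an expiry date, can be pictured as a check that runs every time content is opened. The sketch below is purely illustrative: the class, field, and function names are hypothetical, and real DRM systems rely on encryption and tamper-resistant players rather than a simple comparison.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ContentLicense:
    device_id: str                     # the one computer the content is tied to
    expires_on: Optional[date] = None  # None means no expiry date


def may_open(lic: ContentLicense, device_id: str, today: date) -> bool:
    """Return True only if the content may be opened here and now."""
    if device_id != lic.device_id:
        # A different machine: no lending, no office copy, no backup.
        return False
    if lic.expires_on is not None and today > lic.expires_on:
        # Past the expiry date: no permanent collection or archive.
        return False
    return True
```

Nothing in such a check asks whether the intended use would be permitted under copyright law, which is why these measures can reach further than copyright itself.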

In the entertainment industry, DRM schemes are used to counter perceived threats of piracy. Some complain that these schemes are concerned solely with content owners’ rights, not with users’ rights. DRM does not grant "permissions," which is what copyright authorizes, but instead enforces absolute, mechanical "controls." For example, the motion picture industry establishes control by compelling manufacturers to incorporate encryption into their products because it holds key patents on DVD players.

The WIPO Treaty introduces an important but controversial requirement that countries must provide effective legal measures against the circumvention of DRM schemes. One of the first national laws implementing the WIPO Treaty was the U.S. Digital Millennium Copyright Act (DMCA), which, among many other things, makes it unlawful to publish information that exposes the weaknesses of technical protection measures. In Europe, the European Council has approved the treaty on behalf of the European Community and has made directives that largely cover its subject matter.

Such legislation jeopardizes basic rights legally enshrined in the concept of copyright. DRM allows reading rights to be controlled, monitored, and withdrawn, and DMCA legislation makes it illegal for users to seek redress by taking matters into their own hands. In scholarly publishing, DRM is already well advanced. Academic libraries license access to content in electronic form, often in tandem with purchase of print versions. Because they form the entire market for this content, academic libraries retain some bargaining power over the terms. However, libraries have far less power in the consumer book market.

Copyright and digitization

Many digital library projects involve digitizing documents. If the work to be digitized is in the public domain, or it attempts to faithfully reproduce a work in the public domain, you may digitize the work without securing anyone’s permission. Of course, the result of your digitizing efforts is not protected by copyright either, unless you produce something more than a faithful reproduction of the original.

If material has been donated to your institution for digitizing, and the donor is the copyright owner, you can certainly go ahead, provided the donor gave your institution the right to digitize—perhaps in a written form, such as "the right to use the work for any institutional purpose, in any medium." Even without a written agreement, it may be reasonably assumed that the donor implicitly granted the right to take advantage of new media, provided the work continues to be used for the purpose for which it was donated. You do need to ensure, of course, that the donor is the original copyright owner and has not transferred copyright. For example, you cannot assume permission to digitize letters sent to the donor but written by others.

If you want to digitize documents and the above considerations do not apply, you should consider whether you can go ahead under the concept of fair use. This is a difficult judgment to make. You need to reflect on the copyright owner’s concerns and address them. Institutional policies about who can access the material, backed up by practices that restrict access appropriately, can help. Finally, if you conclude that fair use does not apply, then you must obtain permission to digitize the work or acquire access to it by licensing it.
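The considerations in the last three paragraphs form a sequence of checks, which can be summarized as a small decision procedure. This is a sketch under the assumptions stated in the text, with hypothetical names throughout; in particular, whether fair use applies is a judgment the code cannot make, so it is passed in as an input that may still be undecided.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROCEED = "proceed with digitization"
    ASSESS_FAIR_USE = "assess whether fair use applies"
    SEEK_PERMISSION = "obtain permission or license the work"


def digitization_decision(public_domain: bool,
                          donor_owns_copyright: bool = False,
                          donor_granted_right: bool = False,
                          fair_use_applies: Optional[bool] = None) -> Decision:
    """Walk through the checks in the order the text presents them."""
    if public_domain:
        return Decision.PROCEED
    if donor_owns_copyright and donor_granted_right:
        return Decision.PROCEED
    if fair_use_applies is None:
        # No conclusion reached yet: this needs human judgment.
        return Decision.ASSESS_FAIR_USE
    if fair_use_applies:
        return Decision.PROCEED
    return Decision.SEEK_PERMISSION
```

For example, letters written by others and sent to a donor fail the ownership check, so they fall through to the fair use assessment and, if that fails, to seeking permission.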

Thus, building a digital library requires serious attention to copyright. Digital library projects must be undertaken with a full understanding of ownership rights and with full recognition that permission to convert materials that are not in the public domain is essential. Because of the potential for legal liability, any prudent library builder should consider seeking professional advice. A full account of the legal situation is far beyond the scope of this topic, but the "Notes and sources" section at the end of the topic (Section 1.8) contains sources for practical information about copyright. The sources include information on how fair use can be interpreted and on the issues involved in negotiating copyright permission or licensing.

Looking at the situation from an ethical, rather than legal, perspective sheds light on the fundamental issues. It is unethical to steal: deriving profit by distributing a book for which someone else has rightful claim to copyright is wrong. It is unethical to deprive someone of the fruit of their labor: giving away electronic copies of a book for which someone else has rightful claim to copyright is wrong. It is unethical to pass someone else’s work off as your own: making a digital library collection without proper acknowledgment is wrong. It is unethical to willfully misrepresent someone else’s point of view: modifying documents before including them in the collection is wrong, even if the original authorship is acknowledged.

Collecting from the Web

The legal status of documents published on the World Wide Web is murky. Because activities like Web searching and archiving are in a state of rapid transition, it is impractical, and also inappropriate, for legal regulation to try to keep up with the change. If any legislation is needed, it should be designed to minimize harm to interests affected by technological change while at the same time enabling and encouraging effective lines of development. Legislators are adopting a "wait and see" policy, while leading innovators strive to ensure that what they do is reasonable and accords with the spirit—if not necessarily the letter—of copyright law.

Issues abound. Some lawyers have questioned whether it is legal even to view a document on the Web, since one’s browser inevitably makes a local copy without explicit authorization. Of course, it is widely accepted that you can view Web documents—after all, that’s what they’re there for. Next comes the question whether you can save Web documents for personal use. Or link to them. Or distribute them to others. Note that, behind the scenes, Web documents are routinely copied and saved. For example, to economize on network traffic and to accelerate delivery, Web cache mechanisms save copies of documents locally and deliver them to other users.

The way that computers in general, and the Web in particular, operate raises the question whether the notion of a "copy" is perhaps no longer the appropriate foundation for copyright law in the digital age. Legitimate copies of digital information are made so routinely that restrictions on the act of copying no longer serve to regulate and control use on behalf of copyright owners. Because computers make many internal copies when they are used to access information, the fact that a copy has been made says little about the legitimacy of the behavior. In the digital world, copying is so bound up with the way computers work that controlling it provides unexpectedly broad powers, far beyond those intended by copyright law.

All these points have an immediate and practical impact on digital libraries. Digital libraries are organized collections of information. The Web is full of unorganized information. Downloading parts of Web content in order to organize information into focused collections and to make the material more useful to others is a prime application area for digital libraries.

Search engines, one of the most widely used services on the Internet, are a good example. They use software "robots" to continually download huge portions of the Web and create indexes to the content. Although a service provider may retain documents on their own computers, searchers are presented with a summary and are directed to the original source documents rather than to local copies. Search engines are commercial operations, but their services are not sold directly to users; instead, their revenue is derived from advertising—in effect, a tax on the user’s attention. Although search engines are widely accepted and used, their legal status is unclear.

Web sites can safeguard against indiscriminate downloading. A generally accepted robot exclusion protocol allows individual Web sites to prevent their content from being downloaded and indexed. Although this protocol is entirely voluntary, widely used search engines certainly comply with it. But the onus of responsibility has been shifted. Previously, to use someone else’s information legitimately, one had to request explicit permission from the information provider. Now, search engines automatically assume permission unless the provider has set up an exclusion mechanism. This is a key development with wide ramifications. And some Web sites threaten dire consequences for computers that violate the robot exclusion protocol—for example, denial-of-service attacks on the violating computer. This is law enforcement on the wild Web frontier.
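A collection builder that downloads Web pages can honor the exclusion protocol with very little code. The following sketch uses Python's standard urllib.robotparser module; the robots.txt content, the user agent name ExampleLibraryBot, and the example.org URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical exclusion file, as a site might publish at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_download(url: str, user_agent: str = "ExampleLibraryBot") -> bool:
    """Return True if the site's exclusion rules allow this robot to fetch url."""
    return parser.can_fetch(user_agent, url)


print(may_download("https://example.org/public/index.html"))
print(may_download("https://example.org/private/report.html"))
```

In a real crawler the rules would be fetched from each site (RobotFileParser also offers set_url and read for that) and checked before every download. Because compliance is voluntary, the check has to be built into the robot itself.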

Different copyright issues are raised by projects that are archiving the entire World Wide Web. The rationale for creating such an archive is to offer services like supplying documents that are no longer available or providing a "copy of record" for publicly available documents—in effect, supplying the raw material for historical studies. However, creating an archive raises many interesting issues involving privacy and copyright, issues that are not easily resolved.

What if a college student created a Web page that had pictures of her then-current boyfriend? What if she later wanted to "tear them up," so to speak, yet they lived on in the archive? Should she have the right to remove them? In contrast, should a public figure—a U.S. senator, for instance—be able to erase data posted from his or her college years? Does collecting information made available to the public violate the "fair use" provisions of copyright law?

Most digital libraries aim to provide more comprehensive searching and browsing services than search engines do. Like archives, they probably want to store documents locally, to ensure their continued availability. Documents are more likely to be seen as part of the library, rather than as products of their original Web site. Digital libraries are more likely to modify documents as an aid to the user, least invasively by highlighting search terms or adding metadata, more invasively by re-presenting them in a standard format, most invasively by producing computer-generated summaries or extracting keywords and key phrases automatically.

Those responsible for such libraries need to consider carefully the ethical and legal issues involved. It is important to respect robot exclusion protocols. It is important to provide mechanisms whereby authors can withdraw their works from the library. It is helpful if explicit permission can be sought to include material. If information is automatically derived from, or added to, the source documents, it is necessary to be sensitive to possible misrepresentation.

The world is changing. Digital libraries are pushing the boundaries of what is possible by organizing anthologies of material. And they are pushing the boundaries of society’s norms for distribution of intellectual property. Those who run large-scale Internet information services tell interesting "war stories" of people’s differing expectations of what it is reasonable for their services to do. For example, search engine operators frequently receive calls from computer users who have noticed that some of their documents are indexed when they think they shouldn’t be. Sometimes users feel their documents couldn’t possibly have been captured legitimately because there are no links to them. Most search engines have a facility for locating any documents that link to a specified one and can easily locate the omitted link. On other occasions, people have put confidential documents into a directory that is open to the Web, perhaps just momentarily while they change the directory permissions, only to have them grabbed by a search engine and made available for all the world to see.
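The facility for finding documents that link to a given one amounts to inverting the link graph collected during crawling. A minimal sketch in Python, with hypothetical page names and a hypothetical function name:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_backlinks(outlinks: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert a page -> linked pages map into a page -> linking pages map."""
    backlinks: Dict[str, Set[str]] = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(source)
    return dict(backlinks)


# Links discovered by a crawl (made-up data).
crawl = {
    "index.html": ["about.html", "draft-notes.html"],
    "about.html": ["index.html"],
}

# Which pages link to the document its owner believed was unlinked?
print(build_backlinks(crawl).get("draft-notes.html", set()))
```

A lookup of this kind is how an operator can show a surprised user exactly which page exposed a supposedly hidden document.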

Search technology makes information readily available that may previously have been public in principle, but impossible to find in practice. When a major search engine took over the archives of a corpus of Internet discussion groups on a wide range of topics, it received many pleas from contributors to retract indiscreet postings because, now that postings were easily available for anyone to find, they were causing their authors embarrassment.

Illegal and harmful material

Some material is illegal and harmful and clearly inappropriate for public presentation (because such material is distasteful, we will not give examples). For example, a 1999 UNESCO Expert Meeting on Paedophilia on the Internet noted: "Violence and pornography have invaded the Internet. Photos and videos of children and young teenagers engaged in sexual acts and various forms of paedophilia are readily available. Reports of children being kidnapped, beaten, raped and murdered abound…. The Internet has in many cases replaced the media of paedophiliac magazines, films and videos. It is a practical, cheap, convenient and untraceable means for conducting business as well as for trafficking in paedophilia and child pornography. The Internet has also become the principal medium for dialogue about paedophilia and its perpetuation."

UNESCO has taken the lead on breaking the silence on this topic and is engaged in a number of initiatives to provide safety nets for children online.

Whether information is considered harmful often depends on the cultural, religious, and social context in which it is circulated. Standards vary both within and among nations. However, the international nature of the Internet means that it is no longer possible to police the transfer of information, and sustaining local legal and cultural standards is a huge challenge facing society today. The challenge includes the dilemma of balancing freedom of expression against citizens’ rights to be protected from illegal or harmful material.

In 2000, a well-publicized example of different views on Internet access concerned online sales of Nazi memorabilia on U.S. Web sites accessed using the Yahoo Internet portal. A judge in Paris ruled that the sites are barred under French law and ordered them to be blocked. However, U.S. Web sites are governed by U.S. laws, and an American judge ruled that the First Amendment protects content generated in the United States by American companies from being regulated by authorities in countries that have laws more restrictive of freedom of expression. Suit and countersuit followed, and the matter was not settled for six years, when a U.S. court decided that Yahoo was liable for a fine levied in France.

Another challenging example is online gambling. The relevant laws are restrictive (or at best muddy) in countries like the United States, China, and Italy. Some international gambling sites claim to comply with local laws by checking the geographical origin of the user (a difficult and unreliable procedure that is easily circumvented) and by refusing to offer their services in countries where gambling is illegal.

Cultural sensitivity

Most digital libraries are produced by people from Western backgrounds, yet the majority of the world’s population live in countries that have very different cultures. Some digital libraries are specifically aimed at people in different parts of the world: collections for developing countries, for example, or collections aimed at preserving and promoting indigenous cultures. It is clearly necessary for digital library developers to consider how their creations will affect other people.

First, as we already mentioned, language is the vehicle of thought, communication, and cultural identity, and so digital library users should be able to work in whatever language suits them. But the need for cultural sensitivity goes even deeper. Particular labels can have strong, unexpected connotations: certain models of cars have failed to sell in countries where the model’s name had a serious negative association. Icons also have cultural implications: dogs, for example, are offensive in Arabic cultures and will have negative effects if they are offered as user interface icons. Furthermore, different cultures have different color preferences, and particular colors have different associations.

In Polynesian cultures the concept of tapu, usually translated as "sacred," has rich and complex connotations that are difficult for Westerners to appreciate. Many objects have different degrees of tapu, and Polynesians find it rude and offensive if these objects are used inappropriately, in the same way that many Westerners find blasphemy rude and offensive. An example that can affect digital library design is that representations of people—including pictures—are tapu, and in Polynesia it is generally inappropriate for them to be on public display.
