About the Internet Archive
The Internet Archive is a 501(c)(3) public nonprofit that was founded to build an ‘Internet library,’ with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format. Founded in 1996 and located in the Presidio of San Francisco, the Archive has been receiving data donations from Alexa Internet and others. In late 1999, the organization began to broaden its collections. Our collections now include texts, audio, moving images, and software, as well as archived Web pages.
Why the Archive Is Building an 'Internet Library'
Libraries exist to preserve society’s cultural artifacts and to provide access to them. If libraries are to continue to foster education and scholarship in this era of digital technology, it’s essential for them to extend those functions into the digital world.
Many early movies were recycled to recover the silver in the film. The Library of Alexandria, an ancient center of learning said to hold a copy of every book in the world, was eventually burned to the ground. Even now, at the turn of the 21st century, no comprehensive archives of television or radio programs exist.
But without cultural artifacts, civilization has no memory and no mechanism to learn from its successes and failures. And paradoxically, with the explosion of the Internet, we live in what Danny Hillis has referred to as our "digital dark age."
The Internet Archive is working to prevent the Internet (a new medium with major historical significance) and other "born-digital" materials from disappearing into the past. Collaborating with institutions including the Library of Congress and the Smithsonian, we are working to preserve a record for generations to come.
Open and free access to literature and other writings has long been considered essential to education and to the maintenance of an open society. Public and philanthropic enterprises have supported it through the ages.
The Internet Archive is opening its collections to researchers, historians, and scholars. The Archive has no vested interest in the discoveries of the users of its collections, nor is it a grant-making organization.
At present, the size of our Web collection is such that using it requires programming skills. However, we are hopeful about the development of tools and methods that will give the general public easy and meaningful access to our collective history.
In addition to developing our own collections, we are working to promote the formation of other Internet libraries in the United States and elsewhere.
Find out:
How to help fund the Archive
Make a monetary donation to the Archive
How to donate a digital collection to the Internet Archive
About our announcement and discussion lists on Internet libraries and movie archives, as well as our user forums
Future Libraries: How People Envision Using Internet Libraries
From ephemera to artifact: Internet libraries can turn the content of the Internet from ephemera into enduring artifacts of our political and cultural lives.
"I believe historians need every possible piece of paper and archived byte of digital data they can muster. The Smithsonian Institution sees the value, and has affiliated with the Archive to preserve the 1996 campaign Web sites, official and unofficial."
(Dan Gillmor, computing editor, San Jose Mercury News, 1 September 1996)
Protecting our right to know: Most states have pre-Internet sunshine laws that require public access to government documents. Yet while the Internet has generally increased public access to information, states have only just begun to amend those laws to reflect today’s Internet environment. According to Bill Chamberlin, director of the Marion Brechner Citizen Access Project at the University of Florida's College of Journalism and Communications, such laws are being enacted "piecemeal, one state at a time," and the information they cover varies widely, from "all public records" to specialized information such as education reports and the licensing status of medical practitioners. In the meantime, although public officials are posting more information on the Internet than their state legislatures require, there is little regulatory control over exactly what is posted, when it is taken down, or how often it is updated. This leaves a gap that online libraries can help to fill.
Exercising our "right to remember": Without paper libraries, it would be hard to exercise our "right to remember" our political history or hold government accountable. With much of the public’s business now moving from paper to digital media, Internet libraries are certain to become essential in maintaining that right. Imagine, for instance, how news coverage of an election campaign might suffer if journalists had only limited access to previous statements that candidates had made in the media.
"The Internet Archive is a service so essential that its founding is bound to be looked back on with the fondness and respect that people now have for the public libraries seeded by Andrew Carnegie a century ago.... Digitized information, especially on the Internet, has such rapid turnover these days that total loss is the norm. Civilization is developing severe amnesia as a result; indeed it may have become too amnesiac already to notice the problem properly. The Internet Archive is the beginning of a cure: the beginning of complete, detailed, accessible, searchable memory for society, and not just scholars this time, but everyone."
Establishing Internet centers internationally: What is a country without a memory of its cultural heritage? Internet libraries are the place to preserve the part of a country’s heritage that exists on the Internet.
Tracing the way our language changes: During the late 19th century, James Murray, chief editor of the Oxford English Dictionary, led the compilation of its first edition by sending copies of selected books to "men of letters" who volunteered to search them for the first occurrences of words and to trace the migration of their various meanings. Internet libraries could allow linguists to automate much of this extremely labor-intensive process.
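To make the idea concrete, here is a minimal Python sketch of the first-occurrence half of that task. It assumes a hypothetical corpus of dated page snapshots held in memory as (capture date, text) pairs; the name `first_occurrences` and the simple word pattern are illustrative choices, not an Archive tool, and tracing how meanings shift would take far more than this.

```python
import re
from datetime import date

# A hypothetical snapshot: (capture date, extracted page text).
Snapshot = tuple[date, str]

# Lowercase words, optionally with one internal apostrophe ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def first_occurrences(snapshots: list[Snapshot]) -> dict[str, date]:
    """Map each word to the earliest capture date on which it appears."""
    first_seen: dict[str, date] = {}
    # Visit snapshots in chronological order, so the first date recorded
    # for a word is the earliest one in the corpus.
    for captured, text in sorted(snapshots, key=lambda s: s[0]):
        for word in WORD_RE.findall(text.lower()):
            # setdefault leaves an existing (earlier) date unchanged.
            first_seen.setdefault(word, captured)
    return first_seen


snapshots = [
    (date(1998, 3, 1), "Visit our home page"),
    (date(1996, 10, 5), "Welcome to our home page"),
    (date(1999, 6, 2), "Read my weblog"),
]
dates = first_occurrences(snapshots)
print(dates["home"], dates["weblog"])  # 1996-10-05 1999-06-02
```

Sorting first keeps the logic simple: once the snapshots are in date order, the first time a word is seen is by definition its earliest recorded use in the corpus.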
Tracking the Web’s evolution: Historians, sociologists, and journalists could use Internet libraries to hold up a mirror to society. For example, they might ask when different ethnic groups, special interests, or particular businesses established a presence on the Internet.
"We don't know where this Internet is going, and once we get there it will be very instructive to look back."
Reviving dead links: A few services, such as UC Berkeley’s Digital Library Project, the Online Computer Library Center, and Alexa Internet, are starting to offer access to archived versions of Web pages that have been removed from the Web. This means that if you get a "404 (Page Not Found)" error, you may still be able to find an archived version of the page.
Understanding the economy: Economists could use Archive data such as link structures (what links a site contains, and how many) to investigate how the Web affects commerce.
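As a rough illustration of the kind of link-structure data meant here, the following Python sketch counts the links in one archived HTML page, grouped by the host each link points to. It uses only the standard library's `html.parser` and `urllib.parse`; the names `LinkCollector` and `link_profile` are hypothetical, and a real study would aggregate such counts across many sites and capture dates.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute targets of <a href="..."> tags in one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def link_profile(base_url: str, html: str) -> Counter:
    """Count a page's links, grouped by the host each one points to."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return Counter(urlparse(link).netloc for link in parser.links)


page = (
    '<a href="/about.html">About</a>'
    '<a href="http://www.loc.gov/">LoC</a>'
    '<a href="http://www.loc.gov/film/">Film</a>'
)
profile = link_profile("http://www.example.com/index.html", page)
print(sum(profile.values()), dict(profile))
# 3 {'www.example.com': 1, 'www.loc.gov': 2}
```

Grouping by host separates a site's internal links from its links to other sites, which is one simple way to start asking how sites connect to one another.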
Finding out what the Web tells us about ourselves: Researchers could use data on links and traffic to better understand human behavior and communication.
"Researchers could use the Archive’s Web snapshots in combination with usage statistics to compare how people in different countries use the Web over long periods of time.... Political scientists and sociologists could use the data to study how public opinion gets formed. For example, suppose a device for increasing privacy became available: Would it change usage patterns?"
"The Internet Archive has created a kind of test tube that allows a broad range of researchers to analyze the Web in ways that have never been possible before. What makes this type of research unique is that it often requires the fusion of traditional tools and techniques with new methods, and it results in the development of new theories, techniques, and metrics."
Looking back: With a "way-back machine," a device that would display the Web as it looked on a given date, historians and others would literally have a window on the past.
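The core lookup behind such a device can be sketched in a few lines of Python. This sketch assumes, hypothetically, that each page's capture times are available as a list of `datetime` values, and it interprets "the Web as it looked on a given date" as the most recent capture taken on or before that date; the function name `snapshot_as_of` is illustrative.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def snapshot_as_of(captures: list[datetime], when: datetime) -> Optional[datetime]:
    """Return the latest capture taken on or before `when`, or None if the
    page had not yet been captured by then."""
    ordered = sorted(captures)
    # bisect_right returns how many captures are <= when.
    i = bisect_right(ordered, when)
    return ordered[i - 1] if i else None


captures = [datetime(1997, 1, 15), datetime(1996, 11, 2), datetime(1998, 5, 30)]
print(snapshot_as_of(captures, datetime(1997, 6, 1)))  # 1997-01-15 00:00:00
print(snapshot_as_of(captures, datetime(1996, 1, 1)))  # None
```

Choosing the latest earlier capture, rather than the nearest capture in either direction, avoids showing a page as it looked after the requested date; a service could reasonably adopt either policy.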
How would you use an Internet library?
Related Projects and Research
Internet libraries raise many issues in a range of areas, including archiving technology, copyright, privacy and free speech, trademark, trade secrets, import/export issues, stolen property, pornography, the question of who will have access to the libraries, and more. Below are links to projects, resources, and institutions related to Internet libraries.
Internet Libraries and Librarianship
Archiving Technology
Internet Mapping
Internet Statistics
Copyright
Privacy and Free Speech
Internet Libraries and Librarianship
Alexa Internet has catalogued Web sites and provides this information in a free service.
www.alexa.com
The American Library Association is a major trade association of American libraries.
www.ala.org
The National Library of Australia collects material including organizational Web sites.
pandora.nla.gov.au/documents.html
The Council on Library and Information Resources works to ensure the well-being of the scholarly communication system.
www.clir.org
See its publication Why Digitize? at
www.clir.org/pubs/reports/pub80-smith/pub80.html
The Digital Library Forum (D-Lib) publishes an online magazine and other resources for building digital libraries.
www.dlib.org
Attorney I. Trotter Hardy explains copyright law and examines its implications for digital materials in his paper Internet Archives and Copyright.
copyright_TH.php
The Internet Public Library site has many links to online resources for the general public.
www.ipl.org
Brewster Kahle is a founder of WAIS Inc. and Alexa Internet and chairman of the board of the Internet Archive. See his paper The Ethics of Digital Librarianship at
ethics_BK.php
Michael Lesk of the National Science Foundation has written extensively on digital archiving and digital libraries.
www.purl.net/NET/lesk
The Library of Congress is the national library of the United States.
www.loc.gov
The Museum Digital Library plans to help digitize collections and provide access to them.
www.digitalmuseums.org
The National Archives and Records Administration oversees the management of all US federal records. It also archives federal Web sites, including the Clinton White House site.
www.nara.gov
The National Science Foundation Digital Library Program has funded academic research on digital libraries.
www.nsf.gov/home/crssprgm/dli/start.htm
The National Technical Information Service (NTIS), part of the U.S. Department of Commerce's Technology Administration, is an archive and distributor of scientific, technical, engineering, and business-related information developed by and for the federal government.
www.ntis.gov
Network Wizards has been tracking Internet growth for many years.
www.nw.com
Project Gutenberg is making ASCII versions of classic literature openly available.
www.gutenberg.org
The Radio and Television Archive has many links to related resources.
www.rtvf.unt.edu/links/histsites.htm
Revival of the Library of Alexandria is a project to revive the ancient library in Egypt.
www.bibalex.org
The Society of American Archivists is a professional association focused on ensuring the identification, preservation, and use of records of historical value.
www.archivists.org
The Royal Institute of Technology Library in Sweden is creating a system of quality-assessed information resources on the Internet for academic use.
www.lib.kth.se/kthbeng/kthb.html
The United States Government Printing Office produces and distributes information published by the US government.
www.access.gpo.gov
The University of Virginia is building a catalog of digital library activities.
http://www.lib.virginia.edu/digital/
Archiving Technology
The Association for Computing Machinery (ACM) computing and public policy page includes papers and news on pending legislation on issues including universal access, copyright and intellectual property, free speech and the Internet, and privacy.
www.acm.org/serving
The Carnegie Mellon University Informedia Digital Video Library Project is studying how multimedia digital libraries can be established and used.
www.informedia.cs.cmu.edu
The Intermemory Project aims to develop highly survivable and available storage systems.
www.intermemory.org
The National Film Preservation Board, established by the National Film Preservation Act of 1988, works with the Library of Congress to study and implement plans for film and television preservation. The site's research page includes links to the board's 1993 film preservation study, a 1994 film preservation plan, and a 1997 television and video study. All the documents warn of the dire state of film and television preservation in the United States.
lcweb.loc.gov/film/filmpres.html
The National Institute of Standards and Technology (NIST) posts IEC International Standard names and symbols for prefixes for binary multiples for use in data processing and data transmission.
www.physics.nist.gov/cuu/Units/binary.html
The Text Retrieval Conference (TREC) encourages research in information retrieval from large text collections.
trec.nist.gov
Internet Mapping
An Atlas of Cyberspaces has maps and dynamic tools for visualizing Web browsing.
www.cybergeography.com/atlas/surf.html
The Internet Mapping Project is a long-term project by a scientist at Bell Labs to collect routing data on the Internet.
www.cs.bell-labs.com/who/ches/map
The Matrix Information Directory Service has good maps and visualizations of the networked world.
www.mids.org
Peacock Maps has maps of Internet connectivity.
www.peacockmaps.com
Internet Statistics
WebReference has an Internet statistics page (publisher: Internet.com).
webreference.com/internet/statistics.html
Copyright
The Association for Computing Machinery (ACM) copyright information page includes the text of pertinent laws and pending legislation.
www.acm.org/usacm/copyright
Tom W. Bell teaches intellectual property and Internet law at Chapman University School of Law.
www.tomwbell.com
His site includes a graph showing the trend of the maximum US copyright term at www.tomwbell.com/writings/(C)_Term.html
Cornell University posts the text of US copyright law, including sections 107 (fair use) and 108 (reproduction by libraries and archives), at
www4.law.cornell.edu/uscode/unframed/17/107.html
www4.law.cornell.edu/uscode/unframed/17/108.html
The Digital Future Coalition is a nonprofit working on the issues of copyright in the digital age.
www.dfc.org
The National Academy Press is the publishing arm of the National Academies.
"The Digital Dilemma: Intellectual Property in the Information Age"
http://www.nap.edu/html/digital_dilemma/
"LC21: A Digital Strategy for the Library of Congress"
www.nap.edu/books/0309071445/html
Pamela Samuelson is a professor in the School of Information Management and Systems at UC Berkeley.
info.berkeley.edu/~pam
Title 17 of the US Code (copyright law)
www.loc.gov/copyright/title17/
US Copyright Office
www.loc.gov/copyright
Privacy and Free Speech
The Association for Computing Machinery (ACM) free-speech information page includes the text of pertinent laws and pending legislation.
www.acm.org/usacm/speech
The Association for Computing Machinery (ACM) privacy information page includes the text of congressional testimony and links to other resources.
www.acm.org/usacm/privacy
The Benton Foundation Communications Policy and Practice Program has the goal of infusing the emerging communications environment with public-interest values.
www.benton.org/cpphome.html
The Center for Democracy and Technology works to promote democratic values and constitutional liberties in the digital age.
www.cdt.org
The Computers, Freedom and Privacy Conference has a site containing information on each annual conference held since 1991.
www.cfp.org
The Electronic Frontier Foundation works to protect fundamental civil liberties, including privacy and freedom of expression in the arena of computers and the Internet.
www.eff.org
The Electronic Privacy Information Center, a project of the Fund for Constitutional Government, is a public-interest research center whose goal is to focus public attention on emerging civil liberties issues and to protect privacy, the First Amendment, and constitutional values.
www.epic.org
The Free Expression Policy Project is a think tank on artistic and intellectual freedom at NYU's Brennan Center for Justice. Through policy research and advocacy, it explores freedom of expression issues including censorship, copyright law, media localism, and corporate media reform.
www.fepproject.org
The Internet Free Expression Alliance is an information and advocacy organization focused on free speech as it relates to the Internet.
www.ifea.net
The Internet Privacy Coalition aims to protect privacy on the Internet by promoting the widespread availability of strong encryption and the relaxation of export controls on cryptography.
www.privacy.org/ipc
The Privacy Page includes news, alerts, and links to privacy-related resources. Related organizations include the Electronic Privacy Information Center, the Internet Privacy Coalition, and Privacy International.
www.privacy.org
Privacy International is a London-based human rights group formed as a watchdog on surveillance by governments and corporations.
www.privacy.org/pi
Please suggest other pages that may be appropriate here.
The Archive has two practical considerations in dealing with digital collections:
How to store massive amounts of data
How to preserve the data for posterity
Storage
Storing the Archive’s collections involves parsing, indexing, and physically encoding the data. With the Internet collections growing at exponential rates, this task poses an ongoing challenge.
Our hardware consists of PCs with clusters of IDE hard drives. Data is stored on DLT tape and hard drives in formats appropriate to each collection. Web data is received and stored in archive format: 100-megabyte ARC files, each made up of many individual files. Alexa Internet (currently the source of all crawls in our collections) is proposing ARC as a standard for archiving Internet objects. See Alexa for the format specification.
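The following Python sketch shows the general idea of packing many individual files into archive files of about 100 megabytes. It is only loosely modeled on ARC: each record gets a one-line header giving the URL, IP address, 14-digit capture timestamp, content type, and body length, which is our understanding of the ARC version 1 record header. It is a simplified illustration rather than the specification (it omits the file's leading description record, for example), and the names `Record`, `record_bytes`, and `pack` are hypothetical; Alexa's specification is the authority on the real format.

```python
from dataclasses import dataclass
from datetime import datetime

# Roll over to a new archive file at about 100 megabytes (decimal MB assumed).
MAX_ARCHIVE_BYTES = 100_000_000


@dataclass
class Record:
    """One captured object plus the metadata written into its header."""
    url: str
    ip: str
    fetched: datetime
    content_type: str
    body: bytes


def record_bytes(rec: Record) -> bytes:
    """Serialize a record: a one-line header, then the raw body."""
    header = (
        f"{rec.url} {rec.ip} {rec.fetched:%Y%m%d%H%M%S} "
        f"{rec.content_type} {len(rec.body)}\n"
    )
    # A trailing newline separates this record from the next one.
    return header.encode("utf-8") + rec.body + b"\n"


def pack(records: list[Record], max_bytes: int = MAX_ARCHIVE_BYTES) -> list[bytes]:
    """Concatenate records into archive files, starting a new file whenever
    the next record would push the current one past max_bytes. A single
    record larger than max_bytes still gets a file of its own."""
    archives: list[bytes] = []
    current = bytearray()
    for rec in records:
        data = record_bytes(rec)
        if current and len(current) + len(data) > max_bytes:
            archives.append(bytes(current))
            current = bytearray()
        current += data
    if current:
        archives.append(bytes(current))
    return archives


rec = Record("http://www.example.com/", "192.0.2.1",
             datetime(1996, 10, 5, 12, 0, 0), "text/html", b"<html></html>")
print(record_bytes(rec).split(b"\n")[0])
# b'http://www.example.com/ 192.0.2.1 19961005120000 text/html 13'
```

Because each header records the body's length, a reader can skip from record to record without parsing the content. A production system would write each file to disk as it fills rather than holding it in memory as this sketch does.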
Preservation
Preservation is the ongoing task of permanently protecting stored resources from damage or destruction. The main issues are guarding against accidents and data degradation, and keeping data accessible as formats become obsolete.
Accidents: Any medium or site used to store data is potentially vulnerable to accidents and natural disasters. Maintaining copies of the Archive’s collections at multiple sites can help alleviate this risk. Part of the collection is already handled this way, and we are proceeding as quickly as possible to do the same with the rest.
Migration: Over time, storage media can degrade to a point where the data becomes permanently irretrievable. Although DLT tape is rated to last 30 years, the industry rule of thumb is to migrate data every 10 years. Given developments in computer hardware, we will likely migrate more often than that.
Data formats: As software applications advance, many data formats become obsolete. We will be collecting software and emulators that will aid future researchers, historians, and scholars in their research.
Find out:
How to get free access to the Archive’s Internet collections
About our announcement and discussion lists on Internet libraries and movie archives