Lost in Cyberspace | Media | Chicago Reader

Lost in Cyberspace

The peculiar challenges of archiving newspapers in the Information Age

When the Rocky Mountain News was declared dead on Friday, February 27, after 150 years of publication, the remains included a Web site. Abandoned by the parent company, E.W. Scripps, rockymountainnews.com sits there just as the paper left it four months ago, a death mask of the Rocky.

Newspaper people have always been of two minds about the permanence of what they do. On the one hand, today's news lines tomorrow's bird cage; on the other, when newspapers rolled out Web sites a decade or so ago and they looked for a time like a source of serious revenue, writers bristled at the idea their stories would live forever online and they wouldn't see an extra penny. In 2001 the Supreme Court gave them a nod, ruling in New York Times Co. v. Tasini that freelancers who sold print rights to articles that the periodicals then stashed in digital databases retained copyright privileges and deserved compensation.

Today the defunct Rocky Mountain News is both a memory and a portent, and the talk among parrots is over what else they can find to poop on. The idea of a digital afterlife feels less like exploitation than consolation, and the effect of Tasini is simply to add copyright issues to the reasons why librarians and archivists are reluctant to take custody of digital files and preserve them.

Scripps is putting rockymountainnews.com up for sale, and when someone buys the URL, if not before, it will go blank. The digital archives of old print stories will disappear, and so will stories, blogs, and public comment that never existed in any form but digital. "Shit, do you think I should go through there and print out all my blog posts?" wondered John Temple, the Rocky's last editor, last week. Until we talked, he'd supposed his blog, and all the other blogs housed on the Rocky's Web site, would live forever.

That's because on June 8 Scripps made the jubilant announcement that it was finalizing an agreement with the Denver Public Library "to ensure responsible stewardship of the storied newspaper's archives and artifacts." The library "would assume ownership of the Rocky's voluminous archives, including all digital and paper newspaper clipping files," while the Colorado Historical Society would receive "such other artifacts as signs, photographs, special editions, artwork and other information that documents the history of the Rocky."

Temple assumed that "archives" and "digital files" meant that the entire contents of the Rocky's site would be preserved by the library. But they won't. Jim Kroll, who as head of the library's department of western history and genealogy is receiving the Scripps bequest, tells me the library's going to get "photos that appeared in the paper, photos that are outtakes, PDFs of the newspaper for the past four years, streaming video, some other things I'm not quite sure of yet."

But what Kroll calls the "Web page," well, "that's not there." Even without the Web site, said Kroll, "it's a massive amount of material. It's not gigabytes. It's terabytes." A terabyte is a thousand times the size of a gigabyte—a thousand billion bytes.

Apart from the quotidian news it contains, the Rocky's doomed site tells an important story about how the paper tried to save itself. In the history of journalism this is a poignant point in time: it finds newspapers desperately responding to the Internet peril. The blogs they add and ballyhoo are one response. (The supreme example is the Tribune's new Chicago Now smorgasbord of independent bloggers.) Another is the volley of public comment the papers solicit on nearly every story.

"The nature of reader discourse would be an interesting thing to look back at, as we look at the first few years of the Internet and the idea of seeking these discussions with the community," reflects Tina Griego, a former Rocky political columnist (and briefly a blogger) who's now at the Denver Post. "It's pretty raw right now. What we allow online we would never allow in print. It's a very skewed picture of humanity."

Griego expects this picture to change, and so do I. Newspapers that don't police their sites will begin to—out of embarrassment, or because laws will be written to make them, or because some new form of newspaper will hold its readers to old-fashioned standards. And as for the bloggers—they may be a tide that's already crested. In a June 7 story, "Blogs Falling in an Empty Forest," the New York Times reported that about 95 percent of blogs are "essentially abandoned, left to lie fallow on the Web" just like the Rocky's Web site. These are private blogs, of course, but one reason they're being forsaken must concern newspapers—they make hardly anybody any money. (The Tribune offers Chicago Now bloggers $5 for every 10,000 local hits—the cyberspace equivalent of carfare.)

In extremis, American newspapers are practicing—or should I say making up on the fly?—a journalism that will probably turn out to be as different from tomorrow's as it is from yesterday's. Transitional periods are fascinating as they happen and damned hard later to reconstruct. How complete will the record be of this one?

The Society of American Archivists and the Council of State Archivists are meeting together in Austin, Texas, in mid-August, and they intend to pose that question. One of the panels is titled "'All the News That's Fit to Keep': The Challenges of News Preservation in the Digital Age," and the program's description of the session wonders: "Can born-digital news be saved? What is the scope of the preservation challenge?"

It's enormous, says the archivist chairing the panel.

"This has been an obsession of mine for about ten years," says Victoria McCargar of Mount Saint Mary's College in Los Angeles. I tell her about the responsibility the Denver public library isn't shouldering, and she applauds. "In my humble opinion, they'd be crazy to do that," says McCargar. "No library smaller than the Library of Congress can take something like that on." And, she adds, although the Library of Congress has a program to digitize old newspapers—which for copyright reasons stops at 1920—the preservation of modern digital journalism is a problem it has just begun to consider.

McCargar has a rule of thumb that Groucho Marx would admire. A paper like the Rocky would be nuts to turn its data over to any library that offers to take it because the offer proves the library doesn't know what it's doing.

"There's a long history of public libraries and state and local historical societies taking newspaper files," she says. Microfilm, microfiche, even envelopes stuffed with yellow, crumbling clippings—they're all duck soup for a competent archivist. However, "since the middle of the 90s any medium-to-large-size newspaper like the Rocky has a database in which they keep articles and photos and page PDFs"—a database that probably makes its IT people tear their hair out. "These are one-off systems that kind of grow up locally within the paper, and at a certain point not even the vendors recognize it," says McCargar.

McCargar gives workshops in which she warns libraries tempted to acquire these archives that they'd be accepting a "hugely expensive and untenable burden." Digitized data is high maintenance—left alone it quickly becomes unusable. "The analogy frequently used is to a patient on life support," says McCargar. Unless the data is constantly being reformatted to accommodate the latest hardware and software—tried to see what's on your old floppy disks lately?—it'll be lost. "It takes active intervention forever," says McCargar. "It's very expensive and it's not particularly green. It's a big experiment. These are sobering issues for much bigger libraries than the Denver Public."

What about a library that's 4.5 petabytes big? (A petabyte is a thousand terabytes.) That's the size of the cluster of computers that the Internet Archive in San Francisco fancifully calls its Wayback Machine (after Mr. Peabody's time-travel contraption on Rocky and Bullwinkle). Funded by donors and foundations, the Internet Archive says it's "working to prevent the Internet—a new medium with major historical significance—and other 'born digital' materials from disappearing into the past."

Every six to eight weeks, says Robert Miller, the archive's director of books, the Wayback Machine "takes a snapshot of the Web." In theory, that's every single accessible page of every single URL, all of it set before the public at archive.org.

The snapshots are a little hit-and-miss, with plenty of broken links and material that doesn't show up because it was hidden behind firewalls. Even so, the Wayback Machine, "such as it is, is the only effective archive of the Internet," says McCargar.

In some respects LexisNexis is a "de facto newspaper archives," she says, but "they don't see themselves as being in the preservation business." Besides, "they could go out of business tomorrow, and then what happens?"

She goes on, "I think Google likes to say they're in the business of preserving, but there's an upper limit to that—it depends on how much money they're going to be making online."

The point is that real archiving's not a business—it's a public service. The digital newspapers of the early 21st century will be unknown in the 22nd unless they're aggressively safeguarded. They won't sit around in boxes until they're shredded or burned. Simple neglect will destroy them.

Care to comment? Find this column at chicagoreader.com. And for more on the media, see Michael Miner's blog, News Bites.
