Size Matters: Knowledge, Storage, and the History of Compression

Hansun Hsiung recently finished his PhD at Harvard on textbooks and the globalization of scientific knowledge in the nineteenth century. He is currently working on a project on compression at the Max-Planck-Institut für Wissenschaftsgeschichte in Berlin.

Interview by Dennis Schep

What is the history of compression?

In the past decade or so, several historians, most notably Ann Blair, have attempted to historicize our contemporary “information overload” by excavating precursors from the early modern era. These studies have shown that since at least the rise of commercial print, we’ve been creatively struggling with the problem of “too much to know.” My project takes up this line of inquiry and refocuses it on problems of material infrastructure. Knowledge is embedded in objects, and we build infrastructures such as libraries and archives to house these objects. But objects take up space. Thus, if information overload does have a long history, then this history is not only a problem of “too much to know.” It must also be a problem of “too much to store.”

So, how do we start examining this “too much to store”? On the one hand, epistemology and space have long been related. Consider the memory palace, a technique that’s been around since antiquity: you localize and store the things you want to remember by building some kind of sequenced architectural plan in your mind. But here lies an essential difference. The memory palace is not real – it’s a fictional space of the imagination, a mental representation. And because of this, it’s potentially infinite. You could store ever more knowledge in your mind for later retrieval simply by adding more and more fictional spaces.

The prehistory of “too much to store” is really a change from this potential infinity into a necessary finitude of storage space. You see this by the 17th century at the latest. The mind had always been supplemented by physical memory aids. But it’s around this time that the physical aids become a primary concern, rather than merely supplementary, because the mind starts to be seen as inadequate. In The Advancement of Learning, for instance, Francis Bacon prefers the commonplace book to the memory arts; compared to well-organized inscriptions on paper, memory can’t be trusted. The commonplace book is an obvious example of such an aid. But people also start constructing devices like knowledge chests (arca studiorum), basically cabinets with hooks for organizing little cards on which excerpts have been copied – a kind of predecessor of the filing cabinet. So the human mind gets externalized into space, with the result that spatial finitude emerges as an important epistemological dilemma. This is really the starting point for a history of compression.

Where to next? You mentioned an interest in material infrastructures.

Yes. The more original part of my project concerns the changing relations of storage space to the ideal of the universal library. The term “universal library” was often used for encyclopedic projects – especially encyclopedic catalogs – but the construction of physical spaces that could serve as a new Library of Alexandria is really an Enlightenment project tied to national libraries. Already in his 1627 manual on how to create a library, Gabriel Naudé criticized private libraries that looked like Wunderkammern, too intent on expressing personal taste and showing off an idiosyncratic curation of knowledge. Instead of this private selectivity, Naudé stressed goals of comprehensiveness and inclusivity, of covering all forms of knowledge. By the 18th century, this comprehensiveness and inclusivity gets tied to a notion of public knowledge, public reason. This is the essence of the Enlightenment universal library.

As a result, libraries collect more and more, leading to a problem of storage. This gets exacerbated in the 19th century, when the ideal of universality gets mixed up with questions of centralization and national prestige. The Bibliothèque nationale is a good example of a library that explodes through strong centralization. Meanwhile, the stabilization of copyright law means that legal deposit libraries like the British Museum library – the forerunner of the British Library – were receiving more and more books. My preliminary sense, though, is that one of the biggest problems of the 19th century was actually the rise of periodical and serial print. Newspapers and magazines accelerate the rate at which collections expand. Space is now a problem for the very possibility of universality, and it’s in response to this that compression technologies really start to develop in their modern form.

How does this 19th century disruption of the universal library fantasy manifest itself?

Here is where I think it becomes increasingly important to pay attention to specific technologies of compression. Consider the case of early forms of high-density shelving around the 1890s. Previously, shelves were part of the library’s essential architecture, typically lining the interior walls; readers browsed and walked by them, sat beside them. In contrast, the development of adjustable shelving for cramming more materials into storage tends to create a separation between the space of “reading” and the space of “storage.” It’s the invention of the “stacks,” as we now call them. And pretty soon, by the 1900s, it’s the invention of “off-site storage.”

What this means, I think, is that libraries start thinking of themselves less as universal storehouses of knowledge and more as sites customized to user needs and preferences. Because users are separated from the stacks and must request materials through the intermediary of a librarian and a catalog, and because many books are stored off-site, calibrating collections and accessibility to expected patron requests becomes essential. Universality has to compromise with democracy, in a way; libraries must increasingly accumulate and analyze data on what users want, because they can’t provide everything.

This doesn’t mean that the universal library dies – especially not at the level of national libraries, which still strive for comprehensiveness. But it does get refigured. There’s a really interesting case in early 20th-century Germany that shows how patron-generated data can be used to reconstruct a kind of virtual universal library under conditions of decentralization. In 1898, the Prussian state library spearheaded an initiative to create a union catalog (Gesamtkatalog) of all book holdings across all libraries in unified Germany. The notion was that no single library could be universal, but that many libraries together could be. The project, however, encountered a lot of resistance from local libraries, which resented the centralization measure as a potential drain on their already limited resources and feared that larger state or university libraries might snatch up their most precious possessions. In response, the Prussian state library began to create its catalog indirectly, through what looked like an unofficial search service: the Auskunftsbüro, launched in 1905. It was an information center to which you could send a form requesting a book or journal article you were looking for; they would have people search across all of Germany for it and tell you where it was if they found it. Instead of a top-down method of centralization, they used the results of these user requests to catalog library holdings across the country, piecemeal.

In a way, this resembles the way the universal library fantasy functions today. I visited the Internet Archive headquarters in San Francisco and had the chance to talk to its founder, Brewster Kahle. He kept saying they were recreating the Library of Alexandria. Their digitization policy and digital architecture, however, are really more like the German case above. There’s no guiding understanding of the universal; they just digitize whatever people give them, much as the Auskunftsbüro catalogued on the basis of whatever requests came its way. Similarly, the digital architecture of the Internet Archive is distributed among multiple server sites with one common access point, just as the universal library was dispersed but accessed virtually through the common Gesamtkatalog.

But doesn’t everything change with digitization?

Sure, a lot of things change once we’re dealing with digital objects, but maybe less at the level of infrastructure and more at the level of ontology. As a result, the last part of my project engages more closely with the ontology of the “document.” Basically, the practices and technologies employed by libraries to minimize the space occupied by “documents” also fundamentally modify those documents. It’s a question of “lossy” versus “lossless” compression, and particularly of the ways these terms presume boundaries of what is sensible or insensible in a document – what, really, constitutes the important trace in the document that must remain intact.
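The lossy/lossless distinction can be sketched concretely in a few lines of Python. This is only a toy illustration, not anything discussed in the interview: the quantization scheme below is an invented example, not a real codec.

```python
import zlib

# Lossless: compression is exactly invertible; every trace survives.
original = b"The quick brown fox jumps over the lazy dog. " * 100
compressed = zlib.compress(original)
assert len(compressed) < len(original)          # smaller on disk
assert zlib.decompress(compressed) == original  # and fully recoverable

# Lossy (toy example): quantize 8-bit "pixel" values to 16 levels.
# Storage shrinks, but fine gradations -- deemed insensible -- are
# discarded for good and can only be approximated on decoding.
pixels = [0, 3, 17, 130, 200, 255]
quantized = [p // 16 for p in pixels]            # 16 levels, not 256
reconstructed = [q * 16 + 8 for q in quantized]  # best-guess decode
assert reconstructed != pixels                   # information is gone
```

What counts as “insensible” here – the gradations a codec may discard – is exactly the kind of boundary-drawing the interview goes on to describe for microfilm and digitization.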

Microfilm, which emerges as early as the 1870s, seems to me the first truly distinct rupture in this history. Yes, you can excerpt with index cards and commonplaces. But microfilm is intermedial compression: you’re transforming inscription on paper into the inscription of light onto celluloid. This ontological difference affects how we encounter, manipulate, and interpret documents. Early microfilm was projected on a wall or screen. As dedicated viewing machines developed, you learned to crank a wheel – closer to how one reads a scroll than a codex. You also gained the ability to change light settings, adjust contrast, zoom in and out. Photography claimed a direct indexical relationship to reality at the same time that it altered our sense of how reality was to be perceived. Microfilm did the same thing to sources: it claimed to reproduce them, but altered the way we read and the things we sensed.

One of these alterations was the dematerialization of the book. Microfilm turns the physical object of the book into a projection; we no longer feel the actual page. To many of us this may seem not to matter much, but there’s a good deal of research showing that people well into the mid-Victorian era still cared a lot about the physicality of the book, and that this influenced the ways they read. They sensed the quality of the paper, the smell of the binding. Microfilm divorced “text” from all these other material features of the book.

Is this potential separation not part of the very notion of text itself? The point of having 26 characters in an alphabet is that you can reproduce them in different forms and layouts and on a different piece of paper, but they will still be the same character.

One could say that this potential is already in text itself. But historically, people did care a great deal about the physical look of words on the page, and seemed to care much less in the twentieth century. Recent scholarship in book history has taken renewed interest in these elements, which may in part be a revenge against the dematerialization of text we’ve experienced for so long.

But now that you’ve pressed me on the issue, I’d actually concede that “dematerialization” is a misleading way to phrase things. Microfilm didn’t really dematerialize text. What it did was introduce new materialities that shifted our attention onto text at the expense of other things. We perceive different details depending on the way the microfilm is shot, the type of film used, the type of viewing machine. Because of the way microfilm is most commonly shot, and the typical light settings on a viewing machine, text stands out as the most legible element. The more we grow accustomed to reading microfilm, the more we learn to ignore the blotches, smudges, shadowy photographs, and blurry details at the edges of pages. And we transfer these habits over to physical books and documents. So it’s not really dematerialization in the end, but a refocusing of material attention or sensibility.

In the same way, digitization isn’t really dematerializing so much as allowing us to perceive different material features of the object, often at levels of detail and resolution previously impossible. But here I’m probably stepping onto thin ice. I was trained to do 18th- and 19th-century history, not contemporary media studies, so I’m still developing my understanding of digitization. On this count, I expect to benefit a great deal from a modest conference on compression that I’ve organized for this May back at Harvard. The group is quite diverse – classicists, sinologists, historians, ethnomusicologists, and media studies scholars – and I expect I have much to learn from them.