“Despite the many and varied developments of the 20th century, the methods of the ‘Analogue Humanities’ still rest on practices that are more than 200 years old. In the digital age they are no longer up to date.”

Lev Manovich:

Digital Humanities scholars use computers to analyze mostly historical artifacts created by professionals. Typical examples are novels written by professional writers in the 19th century. Timewise, they stop at the historical boundaries defined by the copyright laws of their countries. For example, under U.S. copyright law, works published in the last 95 years are automatically copyrighted. (So, for example, as of 2015, everything created after 1920 is copyrighted, unless it is recent digital content that uses Creative Commons licenses.) I understand the respect for copyright laws, but it also means that digital humanists shut themselves out from studying the present.

We believe that content and user activity on the web and social networks give us an unprecedented opportunity to describe, model, and simulate the global cultural universe while questioning and rethinking the basic concepts and tools of the humanities, which were developed to analyze “small cultural data” (i.e., highly selective and non-representative cultural samples). In the very influential definition by British cultural critic Matthew Arnold (1869), culture is “the best that has been thought and said in the world.” The academic humanities have largely followed this definition. And when they started to revolt against their canons and to include the works of previously excluded people (women, non-whites, non-Western authors, queer people, etc.), they often included only “the best” created by those who were previously excluded.

Cultural Analytics is interested in everything created by everybody. In this, we approach culture the way linguists study languages or biologists study life on earth. Ideally, we want to look at every cultural manifestation, rather than selective samples. (This more systematic perspective is not dissimilar to that of cultural anthropology.) This larger, inclusive scope, which combines the professional and the vernacular, the historical and the contemporary, is exemplified by the range of projects we have worked on in our lab since 2008. We have analyzed historical, professionally created cultural content: all Time magazine covers (1923-2009); paintings by Vincent van Gogh, Piet Mondrian, and Mark Rothko; 20,000 photographs from the collection of the Museum of Modern Art in New York (MoMA); and one million manga pages from 883 manga series published in the last 30 years. Our analysis of contemporary vernacular content includes Phototrails (a comparison of the visual signatures of 13 global cities using 2.3 million Instagram photos), The Exceptional and the Everyday: 144 Hours in Kyiv (an analysis of Instagram images shared in Kyiv during the 2014 Ukrainian Revolution), and On Broadway (an interactive installation exploring Broadway in NYC using 40 million user-generated images and data points). We have also looked at contemporary amateur or semi-professional content (one million artworks shared by 30,000 semi-professional artists on www.deviantart.com). Currently we are exploring a dataset of 265 million images tweeted worldwide during 2011-2014. In summary, in our work we don’t draw a boundary between (smaller) historical professional artifacts and (bigger) online digital content created by non-professionals. Instead, we freely take from both.

When the humanities were concerned with “small data” (content created by single authors or small groups), the sociological perspective was only one of many options for interpretation, unless you were a Marxist. But once we start studying the online content and activities of millions of people, this perspective becomes almost inevitable. In the case of “big cultural data,” the cultural and the social closely overlap. Large groups of people from different countries and socio-economic backgrounds (sociological perspective) share images, video, and texts, and make particular aesthetic choices in doing so (humanities perspective). Because of this overlap, the kinds of questions investigated in the sociology of culture of the 20th century (exemplified by its most influential researcher, Pierre Bourdieu) are directly relevant for Cultural Analytics.

Christine Ivanovic:

What does work in the humanities mean in the digital age?

The possibility of digitally processing information into data changes both the perception and the production of content that is represented by semiotically charged signs and communicated via media carriers.

Information can now be retrieved in digital form over the Internet from any location and (provided the technology works) processed further digitally at any time. The Internet forms an information sphere comparable to B. Anderson’s “imagined communities,” one that allows seemingly unlimited possibilities of linkage to be perceived and generated. The availability and reproducibility in principle of the retrieved data change the subject’s attitude toward the ‘things’ represented in this sphere, as well as toward their media representation itself.

The significance of the individual material data carrier has thereby been minimized; the computer serves as a universal machine for reception and transformation, one that renders the size, quality, content, and status of the processed information barely perceptible.

The contextualization of information tends to be destabilized: pieces of information are detached from the contexts to which they belong (e.g., a quotation from the context of the sentence, the paragraph, the text, the book, and the historical and cultural setting in which it originated), and they appear in contextual linkages that depend on secondary parameters and can change with every retrieval (combinations of search-engine hits; user-oriented supplementary information generated by corresponding programs, etc.).

The independent, mechanical process of appropriation, for example through transcription (reproduction on another data carrier), has likewise been minimized; as a result, the duration of the individual’s engagement with the information also changes. Instead, actions such as searching and finding, extracting and combining or reconfiguring, and collecting and storing dominate, while the manifold dimensions of the searched space or of the stored data can barely be perceived by the individual.

All these changes affect the perceptions of the individual and the individual’s ability to produce new data. They do not affect the basic principle of work in the humanities.

The primary practice of the humanities, insofar as they define themselves as sciences, is the recording and classifying, describing and analyzing of cultural artifacts or cultural processes.

Digital technologies provide the means to take in information on a far larger scale than before, to process it by classification, and to make it accessible to analysis. Their parameters are set by the scholars; they require a reflection and critique that cannot be accomplished by digital methods alone. Within the Digital Humanities, existing approaches are transposed into a different format; this does not automatically mean a fundamentally different scholarly procedure. Relevant databases and analytical instruments in archaeology or the study of art are already being built and used for analysis. The digitization of entire libraries is being carried out on a large scale worldwide, and methods of digitally supported text analysis are under development. Nevertheless, the Digital Humanities will in principle work no differently than the humanities did 200 years ago. But it will be possible to record and examine more data more systematically and more verifiably. Unscientific (speculative) work is met with criticism in the analogue humanities as well.

I see it as our task to train humanities scholars who are skilled in information technology, so that we can use the technologies that already exist and develop them further in a targeted way for our own research questions. The use of information technologies merely establishes the Digital Humanities; it changes neither the subject matter nor the remit of the humanities.

Martín Azar:

Humanities with Goggles

The field of Digital Humanities, as its name indicates, is formed by the conjunction of two realms: digital tools and humanities research. The relationship between the two can be accounted for by two key concepts: data and information. Data refers to unorganized collections of elements, raw (but measurable) material. To analyze this data, computational tools have been developed that can parse it with a thoroughness impossible to achieve by unaided human observation. We’re talking about things like identifying the frequency of certain words or structures in large corpora of thousands or millions of texts. This capacity is more relevant today than ever: in the last 5 years we’ve stored more data than in the previous 5,000, 80% of which consists of natural language. This is the basis of the digital level. However, as we all know, digital tools function exclusively by computing algorithms, and computation is ontologically observer-relative. That is, it has no inherent semantics; it means nothing in itself. So, to make it meaningful and valuable, we have to develop an interpretation of it, and that’s how we transform it into information. Hence the humanities level.

Let us consider a simple example. A study published this year by Archer & Jockers analyzed 5000 American novels and showed, among many other things, that the best-selling ones register on average a notably higher rate of contractions (“’ve,” “’d,” “n’t,” etc.) than the non-best-selling ones. Properly understood, this rate of contractions is in itself merely meaningless data; but we can easily integrate it into a theoretical framework so as to make it informative, that is, meaningful. We can postulate, for instance, that contractions indicate an informal literary style and that the number of sales indicates popular taste. The correlation between contraction rates and sales could then be interpreted as a piece of evidence in favor of a hypothesis such as “This society prefers an informal literary style,” a statement which can, in turn, be further interpreted within the frame of other hypotheses, such as “This society prefers a realist literature, close to its everyday concerns and discursive habits,” and so on. In short, the new patterns revealed by digital tools can be productively leveraged to inform our interpretations, by allowing us to test our hypotheses against actually representative data, a stage often skipped in literary studies.
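To make the “data” half of this example concrete, here is a minimal sketch in Python of how a contraction rate might be counted and compared between two groups of texts. It is not the method used by Archer & Jockers: the list of contracted endings, the tokenization rule, the function names, and the four toy sentences are all assumptions made up for illustration.

```python
import re
from statistics import mean

# Assumption: a word is a run of letters, optionally followed by an
# apostrophe and more letters (e.g. "didn't", "she'd").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")

# Assumption: an illustrative list of contracted endings. "'s" is left out
# on purpose, because it cannot be told apart from the possessive.
SUFFIXES = ("n't", "'ve", "'d", "'ll", "'re", "'m")


def contraction_rate(text: str) -> float:
    """Return contractions per 1,000 word tokens (0.0 for empty text)."""
    # Normalize typographic apostrophes so both spellings are counted.
    normalized = text.lower().replace("\u2019", "'")
    tokens = TOKEN.findall(normalized)
    if not tokens:
        return 0.0
    hits = sum(token.endswith(SUFFIXES) for token in tokens)
    return 1000 * hits / len(tokens)


def mean_rate(texts: list[str]) -> float:
    """Average contraction rate over a group of texts."""
    return mean(contraction_rate(t) for t in texts)


# Invented toy snippets standing in for two groups of novels.
bestsellers = [
    "I didn't know what she'd say, but I couldn't wait.",
    "We've got time, don't we?",
]
others = [
    "I did not know what she would say, but I could not wait.",
    "We have time, do we not?",
]

print("bestsellers:", mean_rate(bestsellers))
print("others:     ", mean_rate(others))
```

The two numbers printed at the end are exactly the kind of raw, measurable material described above. Whether a gap between them says anything about literary style or popular taste is not something the code can decide; that remains the interpretive step.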

One of the most common critiques of the use of these kinds of techniques argues that digital tools reduce human interpretation to computation and thus pose a threat to the humanities. This argument involves a misunderstanding of both hermeneutics and computation. As should already be clear from the example, digital tools don’t pose a threat to the humanities any more than calculators posed a threat to mathematicians. This point can be further illustrated by referring to the abovementioned concepts. The difference between data and information, between the identification of raw elements and the attribution of meaning, is actually inherent to every human hermeneutic act. Each time we make a literary analysis (or any interpretation whatsoever), we perform a cognitive categorization: firstly, we identify certain elements through perception, and secondly, we place these elements into mental categories so that they acquire particular meanings. That is, as part of our normal behavior, we are constantly converting data into information, transforming the perception of certain lights and shadows into the notions of certain objects, familiar faces, or meaningful words. But to make interpretations, we need something to interpret: we need data to create information. Digital tools only facilitate our access to data. The interpretive stage (the proper realm of the humanities) is thereby not reduced but, on the contrary, enormously enriched: thanks to digital tools, there’s now much more data to interpret, data that was impossible to retrieve until recent years. In this sense, by widening and magnifying our sight, digital tools have the potential to produce in the humanities a revolution analogous to the one produced in the natural sciences by technical inventions such as the telescope and the microscope.

Lev Manovich is a leading media theorist, Professor of Computer Science at CUNY, and founder and director of the Cultural Analytics Lab, which specializes in the use of data science to analyze patterns in big cultural datasets. The above statement is based on his essay ‘The Science of Culture? Social Computing, Digital Humanities and Cultural Analytics.’

Christine Ivanovic is a Professor at the Institute of German Studies of the University of Vienna.

Martín Azar is a doctoral student at the Freie Universität Berlin.