[This is a lightly edited version of a keynote lecture I presented at the DHd 2020 conference in Paderborn, Germany, in March 2020. The conference theme was “Digital Humanities between Modeling and Interpretation.” A video of the presentation and the accompanying discussion is available here.]
Thank you so much for this kind welcome, and thank you to the conference organizers for inviting me to speak with you this evening. It is a rare honor and a privilege to be among so many people whose work I have admired and benefited from.
I want to start with a couple of acknowledgements: first, this work draws very heavily on my collaboration with Fotis Jannidis on our workshop and recent volume on data modeling in digital humanities (and indeed on the work of all of the contributors to that volume). I also want to express my gratitude to my student and colleague William Reed Quinn for making me aware of the importance of weak theory—these days, I feel as if everything I learn comes from my students. And finally I want to acknowledge a strong debt to my colleague Amanda Rust, who has been leading the Digital Scholarship Group’s efforts in thinking about how modeling and interpretation can be redirected towards goals of social justice.
Introduction
The American poet Robert Frost is widely (and variously) quoted as having likened free verse to playing tennis with the net down. As an undergraduate student writing a senior thesis on Robert Frost, I thought this was a wonderful metaphor. Looking it up, I found a rebuke from Frost’s contemporary, Carl Sandburg:
The poet without imagination or folly enough to play tennis by serving and returning the ball over an invisible net may see himself as highly disciplined. There have been poets who could and did play more than one game of tennis with unseen rackets, volleying airy and fantastic balls over an insubstantial net, on a frail moonlit fabric of a court.
For me, these days, everything is a metaphor for data modeling, and this one seems particularly apt given the theme for this conference. I am sure over the next few days we will hear much subtler explorations but let me start with an obvious one. Data modeling is partly about creating a space of action: perhaps a spielraum, a space of play; perhaps an arbeitsraum: a workshop or studio? In other words, a space of constraint in which the constraint (real or imagined) frames the meaningful challenges and outcomes.
Frank Fischer asked me this morning to say what I thought the position of the TEI was, within the landscape of data modeling. I found myself giving an impassioned spiel about the history of the TEI: its initial vision of an interchange format, its internal debates about strictness and conformity, and what I take to be its greatest achievement, namely the creation of a customization and localization mechanism that affords its users the ability to create strong or weak local constraints—their own spielraum, if you like—while remaining in meaningful conversation and collaboration with the rest of the TEI world.
What I would like to talk about this evening is the ways in which this kind of spielraum is not just about us and our own game of tennis: to pursue the metaphor past endurance, how do we avoid building a country club?
The foundational work of information modeling is curatorial. That is, it is substantially directed towards goals of creating information resources that can be used as reference or foundation for research and interpretive work.
An example: critical editing intensifies the formal conventions of typography to model a multidimensional universe of texts, including their differences and the commentary upon them: from the variorum editions of the 17th century to the digital editions and archives of our own time.
A second example: metadata and information organization about collections of documents, including systems like the random-access card catalogue (attributed to Baron Gottfried van Swieten at the Austrian National Library), with its categorization of books through author, title, date, subject, and then its further formalization of additional information, leading ultimately to the MARC standard and to the metadata information models prevalent in digital humanities.
Modeling in this curatorial sense is strongly future-directed and also strongly rule-oriented: it envisions and anticipates forms of order and action towards which it strives. And it can thus be understood as a tool of power, a world-making tool: it establishes a formal framework within which each individual semantic unit has its meaning, and it typically results not only in a schema for that framework, but also in prescriptive documentation. If we recall Christof Schöch’s characterization of two “core types” of humanities data—big data and smart data—then modeling is the hallmark of “smart data”, data that is explicitly structured (shaped by a formal model) and that is also “clean”: not only in the sense that noise and erratic information that is at odds with that model has been eliminated, but also in that we possess a consistently executed theory of the data: the data and its constraint systems form an internally consistent statement about the world.
Smart data is data that is structured or semi-structured; it is explicit and enriched, because in addition to the raw data, it contains markup, annotations and metadata. And smart data is “clean”, in the sense that imperfections of the process of capture or creation have been reduced as much as possible, within the limits of the specific aspect of the original object being represented. …The process of modeling the data is essential to small/smart data; its abstract structure can be defined with elaborate schemas or as predefined database structures. (Schöch 2013, emphasis mine)
Christof’s analysis counters “smart data” with “big data”, which lacks such modeling (although it does express a model in another way—a topic for an entirely different paper). But both big data and smart data seek to offer insight in the service of what Wai Chee Dimock calls “strong” theoretical approaches, high-level ways of accounting for literary phenomena—periodization, genre, geography, influence, historical contextualization—that exercise a totalizing explanatory force. Dimock’s article explores the ways in which “weak theory” might bring new insight into a field dominated by “strong” theoretical approaches, and situates this dynamic within the history of science—in particular, the debate between Robert Boyle and Thomas Hobbes over the status of experimental science, described in fascinating detail in Shapin and Schaffer’s 1985 book “Leviathan and the Air-Pump”. In strong theory, as Dimock puts it, quoting Pierre Bourdieu,
Every individual instance is “a manifestation of the field as a whole, in which all the powers of the field, and all the determinisms inherent in its structure and functioning, are concentrated” [Bourdieu, The Field of Cultural Production, ed. Randal Johnson, 1993, p. 37]…Large-scale arguments about political and cultural institutions can be invoked as homologies for the small-scale structures of words. (Dimock, 734)
This is a good description of the relationship between the modeled instance (the encoded document) and the schema that regulates it. The application of models of this sort requires a clear boundary between what is valid and what is not: the data schema defines a clear interior and exterior and exercises a kind of definitional infallibility in assigning instances to one or the other space. Models of this kind struggle to account for or accommodate exceptions when used in their most characteristic role as arbiters of consistency in support of large-scale data regulation.
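To make that interior/exterior concrete, here is a minimal sketch of my own (not drawn from any project discussed here), using Python’s lxml library and a toy RELAX NG schema with hypothetical element names; the point is simply that validation renders a binary verdict on each instance.

```python
from lxml import etree

# A toy RELAX NG schema (hypothetical element names): a poem is one or more lines.
RNG = b"""
<element name="poem" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="line"><text/></element>
  </oneOrMore>
</element>
"""

schema = etree.RelaxNG(etree.fromstring(RNG))

inside = etree.fromstring(
    b"<poem><line>Whose woods these are I think I know.</line></poem>")
outside = etree.fromstring(
    b"<poem><line>His house is in</line><gloss>the village</gloss></poem>")

# The schema renders a binary verdict: each instance is either inside or outside the model.
print(schema.validate(inside))   # True  -> valid, inside the model
print(schema.validate(outside))  # False -> invalid; the exception is simply rejected
```

The point of the sketch is the shape of the decision, not the particular schema: everything the model does not anticipate falls on the outside.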
Probing the entailments of such approaches, and invoking Bruno Latour for his attention instead to a more centrifugal, networked geometry of agency and theory, Dimock proposes weak theory as a way of dispersing and localizing our explanatory work, getting away from “sovereign axioms” and focusing instead on associative connections, local mediation, and site-specificity. As she puts it,
These dispersed, episodic webs of association, not supervised and not formalizable, make it an open question what is primary, what is determinative, what counts as the center and what counts as the margins. A paradigm such as this is fundamentally different from the causally singular, closed-circuit totality of Hobbes. (Dimock, Weak Theory, 737)
Interpretation
Let us now turn for a moment to interpretation. I am fascinated by an early symptomatic moment, from some of the earliest discussions that accompanied the development of the TEI Guidelines, in which the relation between modeling and interpretation emerges as a high-stakes design issue. The first version of the Guidelines was released in November 1990 and a few months later (in February 1991) the Literature Working Group circulated a report titled “Critique of the TEI Guidelines.” I want to draw attention to two key points made by the Critique’s authors. First, they assert that the shareable information the TEI seeks to model (for the purposes of exchanging digital data sources) does not and should not involve interpretive information:
Literature scholars are not interested in, in fact many object vehemently to, the perspective of obtaining texts which already contain – explicitly or implicitly – literary interpretations. (“Critique”)
And second, they argue that there is an important separation of concerns based on professional identity, in which interpretation is identified with the work of the “scholar” rather than the “coder”:
It is recognized that all coding can be seen as a kind of interpretation but a fundamental distinction must be made here. A certain character is or is not in italic; once the way of representing italic has been decided, a simple either-or decision carrying very little intellectual content will resolve the matter. Why a word is italicised is open to a number of interpretations; scholars legitimately may not agree on which one or ones are valid. This is interpretation in the usual sense, and is the domain of the scholar working on the completed text, not that of the coder inputting or converting the text. Recommendations overlooking this distinction will alienate the vast majority of literature people working with computer. (“Critique”)
This critique clearly positions interpretation as following from—an intellectual sequel to—the foundational work of modeling, but it also offers a fairly weak view of that foundation. Modeling establishes the text, serves as a form of “lower criticism”, and is fundamentally convergent on a kind of consensus (achievable because it is interpretively insignificant). Interpretation “in the usual sense”, by contrast, is naturally divergent: scholars don’t want other people’s interpretations; they want their own, and in particular they want the freedom to establish them for themselves.
The Critique was an early sign of what proved a long-standing irritant, an issue that demanded explicit mention in discussions of what encoding and transcription really meant for scholars. Repeatedly, early markup commentators argue passionately that even the smallest and apparently most trivial acts of transcription are deeply and consequentially interpretive (contra the Critique’s assertion that some modeling decisions are cut and dried, suitable for the “coder”):
… ‘literature scholars’, whatever their interests, have never had any text to work with which has not already undergone a process of transmission and which does not therefore already contain, implicitly or explicitly, interpretations of some kind. (Walsh, 1993)
Interpretation is fundamental to transcription. It cannot be eliminated, and must be accommodated. (Robinson, 1994)
We must recognize that transcription work involves a range of different interpretational activities…Transcription is not copying, but…rather selecting and interpreting. …Any edition of Wittgenstein is in a strong sense a result of interpretation. Our only option is to formalize interpretation, and make it explicit. (Pichler, 1995)
And in these discussions we can see a shifting geometry of the relation between modeling and interpretation. First, as we see in the Critique, there is a geometry of subordination, in which the domains of modeling and interpretation are carefully bounded and aligned with professional hierarchies. Modeling here is a diminished and subservient activity that does not foreclose any interpretive options and does not constitute a competition with scholarly interpretive power. But it nonetheless has the potential to erupt from its proper positioning if it is accorded the power to take interpretive license. Closely neighboring this relation—perhaps isomorphic with it—is a relation of power, in which modeling exercises a coercive hegemony, a consensus that is not fully inclusive and by its nature never can be. Interpretation establishes and operates within a space outside of this constraint. This relation is also acknowledged in the Critique: the authors point out the future likelihood that the TEI Guidelines will become a standard, not simply a pathway for exchange, and that as such they will accrue the power to compel conformity. Indeed, they note a natural tendency for standards to emerge in this way:
The Work Group understands that the TEI is proposing a coding system for interchange, not for entry of texts. We realize also that many things are suggested as options, not as requirements. It must however also be recognized that simple considerations of efficiency — it is practical to have a locally standard code as close as possible to the interchange code — will tend to foster the use of TEI codes at the local level; ASCII was originally proposed as an interchange code; it is now a standard for alphanumeric representation.
The very polished and comprehensive nature of the present Guidelines, also, means that there will be a tendency for them to become standards, both for interchange and local processing, and even data entry; this possibility must be faced and taken into account as they are drafted. By a similar process optional codes, in the absence of clear distinction between the optional and the required, will tend to be considered as recommended or required, in spite of occasional or implicit indications to the contrary. (“Critique”)
Because of the way it relocates power into the modeling activity, this relation also brings into visibility the ways in which modeling influences interpretation by constituting the informational field it has to work with. But its coerciveness operates at a practical rather than theoretical level: the model here is not yet being accorded the status of ideology. More recent debates over the role of theory and cultural criticism in digital humanities have elicited statements ranging from Susan Smulyan’s blunt “The database is the theory!” to much longer explorations of the political inflections that may inhere in deep technical systems, from scholars like Tara McPherson and Fiona Barnett. On this view, modeling is not only accorded the status of strong theory but is also aligned with external, ideological forms of hegemony that go beyond disciplinary consensus. Interpretation, on this account, struggles with the model not only in specific research contexts but almost universally: our interpretive options are taken to be severely circumscribed by discourse and subject positioning as well as by the modeling of our information resources.

A less stark vision, informed by current digital humanities working practice, would construe the relation between modeling and interpretation as one of reciprocity and co-construction, in which modeling and interpretation are engaged in a fundamentally dialogic and iterative process of knowledge creation. This vision also aligns with the ways in which digital humanities has adopted increasingly hybrid and collaborative professional roles, which tend to blur any clear mapping of responsibility for modeling and interpretive work.
The historical tendency, then, in digital humanities, has been to accumulate power in our models, to gain the situational advantages of strong modeling: the value of consistency and conformance to shared standards in supporting data interchange, and the value of a model as an expression of theoretical positioning—important in establishing this branch of digital humanities as genuine research. As Michael Sperberg-McQueen has argued in his contribution to The Shape of Data, a model of this kind provides a basis for formal debate and analysis; the stronger the model and the better it regulates its field of information, the more substantively that debate can take place:
For digital humanities, formally expressed models offer an unparalleled opportunity to make our understandings explicit, to observe and test the consequences of our ideas, and to clarify the nature of our agreements and disagreements with each other. (Sperberg-McQueen, 2018)
But with those advantages fully acknowledged, researchers in digital humanities have also sought ways to mitigate the brittleness and coerciveness of strong models. David Birnbaum, quite a long time ago, gave a talk at the 1997 ACH/ALLC conference, “In Defense of Invalid SGML”, in which he noted the tension between the original goals of SGML (to regulate the creation of new data) and its use in a humanities context (to structure the representation of pre-existing documents). In this latter case, he notes, the source document may contain authentic violations of the logical structure the schema seeks to enforce, violations that constitute important archaeographic information in their own right. The central drama of his paper revolves around the conflict between the schema as a legitimate model of the document’s true and intended structure, and the document’s actual violations of that structure. These violations cannot simply be corrected, lest we lose valuable information, but neither can they be allowed to loosen the overall schema constraints, since that would constitute a loss of information about the document’s true structural intentions. Birnbaum’s solution, as suggested in his title, is that we permit invalid SGML markup to circulate as meaningful data, with the resulting error messages serving as a formal acknowledgement of the friction between model and instance.
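One way to picture Birnbaum’s proposal with today’s tools (this is my sketch, not his code, and the function name is hypothetical) is a pipeline that validates each document but retains invalid ones alongside their error log, rather than rejecting them:

```python
from lxml import etree

def circulate_with_errors(doc_bytes: bytes, schema: etree.RelaxNG) -> dict:
    """Validate a document but keep it in circulation even when invalid,
    carrying the validator's error messages along as data. (A sketch of
    the idea using modern tools, not Birnbaum's implementation.)"""
    doc = etree.fromstring(doc_bytes)
    is_valid = schema.validate(doc)
    return {
        "document": doc,
        "valid": is_valid,
        # Each message records a point of friction between model and instance.
        "friction": [str(err) for err in schema.error_log],
    }
```

The design choice here is that the error log becomes part of the record: a formal trace of where the document resists the structure its schema intends for it.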
Birnbaum’s paper highlights the difference between descriptive and normative approaches to modeling humanities data, and this distinction has also played an important role in defining the contexts where the strongest forms of modeling are appropriate. First, there are cases where the goals of constraint are so paramount that we are content to sacrifice variation, for instance where data must operate within a functional framework that cannot be modified, such as a library catalogue or a large public information framework like Europeana or the DPLA. And conversely, there are also contexts where a “weaker” model better suits our purposes—as Dimock puts it, where our knowledge-making is “not supervised and not formalizable.” This might be in cases where the material being modeled resists alignment across instances (for instance, genre disobedience as Birnbaum describes it), but it might also be in cases where the work of “supervising and formalizing” in itself is socially damaging, through the imposition of knowledge paradigms on participants who (unlike the authors of the Critique) are not in a position to resist, or whose trust and collaboration is more important to us than perfectly formalized data.
Community archiving, interpretation, reparative reading
I want to pause here and point out that interpretation in these discussions—dispersive, localized, exercised by individuals rather than by organizations—could well play the role of the “weak” theory in relation to “strong” modeling. As is becoming clear, one of the problems with modeling as strong theory is that it constitutes an imposition of will on populations and potential partners with whom we ought instead to have a bilateral relationship—rather than insisting that they play the excellent game of tennis at our club. If so, then we could imagine the work of interpretation and its improvisational logic as restoring the balance. But I would instead like to argue that the entire signifying circuit here, as currently constituted in digital humanities, is “strong.” Interpretation represents the value, the payout, the return on investment that authorizes modeling in the first place. And furthermore, interpretation as a funded professional activity (the “research” portion of our annual merit reporting) is structurally cohesive and institutionally foundational, even though individual interpretive work may go in widely divergent critical directions; in an important sense, the medium is the message here. Insofar as academic data modeling seeks to anticipate and underpin the interpretive operations of academic scholarship, it ensures that those operations are working from inputs that already carry a strongly authorized theoretical stamp. And insofar as scholarly interpretive operations—however their individual narratives may vary or offer diegetic critiques—seek a framework of intelligibility and value within the academy, they likewise ensure that the academy will continue to invest in modeling frameworks that reproduce and underpin that interpretive work.
It’s not, in other words, that modeling is strong and interpretation is weak, but rather that as practiced in the digital academy they together constitute a form of “strong” theory about the academy itself and the forms of knowledge it can produce. The interpretive elements of transcription and encoding (e.g. in Pichler’s and Robinson’s accounts) are precisely those that contribute to the establishment of a text that can be read and interpreted through established academic protocols. The forms of agency that were opened up early on (for instance, in the Canterbury Tales Project) for readers to choose their own editorial options and draw their own interpretive conclusions are still strongly bounded by the terms and premises of the model: the reader’s free will is directed towards comparing witnesses and readings to establish a textual history of a major cultural work. Crowd-based approaches (which became prevalent in the first decade of the 21st century, with projects like Galaxy Zoo [2007] and Transcribe Bentham [2010]) invite volunteers to identify with the research-oriented (albeit public-spirited) goals of these projects: to create accurate data within the constraints provided by the contributory platform, so as to support further research.
As I observed earlier, strong modeling and interpretation have an important role to play in relation to the interests of the digital humanities. However, the academy—and by extension our field—are now being asked, more forcefully than ever before in our lifetimes, whether and how they can really serve a purpose outside of their own bounds. Moreover, the recuperative motivations that early on prompted the creation of archives of women’s writing and African-American dramatists, and the reassemblage of dispersed medieval manuscripts, are now being turned to archives of even more strongly marginalized communities, and increased research attention is being paid to the histories of those communities. At the same time, the power dynamics of that research attention are rightly coming under scrutiny. So as we turn our attention to public audiences, and to partnerships with communities outside the academy, we need to shift tactics and reimagine what both modeling and interpretation are for, to disrupt their closed circuit of meaning.
In particular, to the extent that interpretation can be a space of dispersive, open-ended agency, we need to be prepared to open that space up to interpretive work that may be deeply critical, even iconoclastic: we must genuinely allow it its own autonomous spielraum. And to the extent that modeling approaches thus far have operated through standards that enforce consensus, we need to be prepared to open up the process by which that consensus is achieved.
Eve Sedgwick, in a 1997 essay titled “Paranoid Reading and Reparative Reading,” identifies weak theory with “reparative” modes of reading: reading that is aimed at constructing provisional, contingent narratives that seek wholeness rather than analytical accounts of power. And if, prompted by this idea, we look to domains—like community-based archiving—that are practicing and theorizing a reparative agenda, we can find a set of principles that can help guide our own work in that direction:
…independent grassroots efforts emerging from within communities to collect, preserve, and make accessible records documenting their own histories outside of mainstream archival institutions. These community-based archives serve as an alternative venue for communities to make collective decisions about what is of enduring value to them, to shape collective memory of their own pasts, and to control the means through which stories about their past are constructed. (Caswell 2012; emphasis mine)
There is a tremendous amount we can learn from community-based archiving that applies well beyond that domain—not only about modeling and interpretive practices, but also about workflows, effective design processes, and sustainability—but I want to reference a handful of key points that seem to me particularly relevant here. First, as this quotation from Michelle Caswell makes clear, agency is a crucial issue. In my discussion earlier of the contested agencies of modeler and interpreter in text encoding, the range of eligible agents is comparatively narrow: the modeler, coder, and interpreter all come from academic or para-academic spaces (scholars, students, librarians, information technologists). Scholars of community archiving emphasize the importance of not simply extending agency to community participants or sharing agency with them, but ceding certain forms of agency to them altogether: the right to control access to culturally sensitive materials, the right to control the ways those materials are described, modeled, and interpreted. Going beyond this, scholars such as Michelle Caswell and Jessica Tai propose that we consider archival records themselves as agents, and also as conferring agency upon their users and readers (Caswell 2012 & 2014, Tai 2019). Other scholars, including Stacy Wood, Marika Cifor, and Ricardo Punzalan (2014), have shown the importance of also considering those represented in the records as agents. This has the effect of establishing a right to be considered a stakeholder, a right that can be understood as inhering in the very fact of being represented in cultural materials. Thus in the context of modeling and interpretation, in seeking a term to put into dialogue with “strong,” we might consider not “weak” but “empowering” or “decentering” or “multilateralizing.”
Second, and closely related to the issue of agency, is a shift in the role and nature of expertise, as Ricardo Punzalan and Michelle Caswell argue:
Community archival discourses have expanded the notion of who has the power to process and control archival records. To encourage larger societal participation in archival endeavors, archivists are called to relinquish their role as authoritative professionals in order to assume a more facilitative role. (Punzalan and Caswell 2016, 30)
Stacy Wood and her coauthors describe how the Plateau Peoples’ Web Portal places institutional catalog records and descriptions next to tribal records and descriptions and community-contributed tags and comments (Wood et al. 411), thereby establishing a parallel space of authority. Similarly, tools like the Mukurtu content management system, developed by Kim Christen, provide explicit systems for this kind of community-driven modeling, description, and interpretation. This diversification of authority may be harder to imagine, and it also needs to be understood within multiple vectors of power: archivists and digital humanities practitioners are in some contexts encouraged to insist on their distinctive expertise, as a way of strengthening their position vis-à-vis faculty colleagues, while at the same time they are being asked to diminish that authority vis-à-vis “citizen archivists.”
Third, specific techniques have been identified that can inflect our modeling practices to balance “conceptual economy and elegance” (as Sedgwick characterizes strong theory) with goals of empowerment and diversification of authority. As much as we may feel strong affinity for the principles of “clean smart data” articulated by Christof Schöch, we may also take insight from Trevor Muñoz and Katie Rawson, in “Against Cleaning,” where they recommend ways to preserve the original modeling that may come to us as “noise” but may also carry important traces of the values and descriptive agendas of the original data creators. Rather than focusing our modeling efforts on establishing normative information structures and standards, projects like Documenting the Now and scholars like Lauren Klein suggest that we might focus on representing entities: in particular, on representing missing, invisible, and silenced voices. And to the extent that we still do need such large-scale structures and standards, Stacy Wood and coauthors emphasize the importance of contingency and uncertainty in descriptive standards (Wood et al. 2014). They note that standards like CIDOC support the pluralization of entities (e.g. in the context of provenance and other forms of agency) but not yet “contingency and uncertainty in description” (414); “Descriptive standards must be better able to express relationships where contingency and uncertainty are defining factors” (415). Finally, numerous scholars (for instance, in the discussions at Northeastern University’s Design for Diversity forums) have observed the importance of adapting our modeling processes so that they are more porous and welcoming to participants who have not traditionally been included in our working groups, task forces, and design meetings. Such adaptations might include the same shifts in tone and pace that are now identified as good practice for accommodating colleagues in workplaces that are more diverse in gender, race, and neurodiversity.
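To make that call for contingency concrete, here is a purely hypothetical sketch (mine, not part of CIDOC or any existing standard, with illustrative field names) of what a descriptive record might look like if the kinds of contingency and uncertainty Wood and her coauthors describe were treated as first-class data:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AgentLink:
    """A relationship between a record and an agent, with contingency and
    uncertainty recorded as data rather than cleaned away. All field names
    are illustrative, not drawn from an existing standard."""
    agent: str                         # person or community named in or by the record
    role: str                          # e.g. "creator", "documented", "steward"
    certainty: str = "unknown"         # e.g. "asserted", "inferred", "contested"
    asserted_by: Optional[str] = None  # whose claim this is: institution, community, individual

@dataclass
class Record:
    identifier: str
    title: str
    # Parallel descriptions can coexist rather than competing for one authoritative slot.
    descriptions: List[dict] = field(default_factory=list)
    agents: List[AgentLink] = field(default_factory=list)
```

The point is not these particular fields but the geometry: multiple descriptions and multiple, qualified claims of agency sitting side by side within the same record.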
There is a lot to think about here! I come to this conference as something of an outsider who nonetheless has observed and admired—and to a limited extent even participated in—the work being done by this community, and I look forward very much to thinking with you further about how we might adapt, broaden, and reshape our modeling and interpretive work so that it can respond to the changing demands on the digital humanities and indeed the academy as a whole. I welcome your questions and ideas. Thank you so much.
Works Cited
Caswell, Michelle. “SAADA and the Community-Based Archives Model: What Is a Community-Based Archives Anyway?” 2012. https://www.saada.org/tides/article/20120418-704.
Punzalan, Ricardo, and Michelle Caswell. “Critical Directions for Archival Approaches to Social Justice.” The Library Quarterly: Information, Community, Policy 86.1 (2016): 25-42.
Dimock, Wai Chee. “Weak Theory: Henry James, Colm Tóibín, and W. B. Yeats.” Critical Inquiry 39.4 (Summer 2013): 732-753.
Lavagnino, John. “Comments on Critique by Literature Working Group.” TEI-L listserv, 4 March 1991. https://listserv.brown.edu/cgi-bin/wa?A2=ind9103&L=TEI-L&P=60.
Literature Working Group. “Final Critique.” TEI-L listserv, 11 February 1991. https://listserv.brown.edu/cgi-bin/wa?A2=ind9102&L=TEI-L&P=11624.
Pichler, Alois. “Transcriptions, Texts and Interpretation.” 1995, Wittgenstein Society. http://bora.uib.no/bitstream/handle/1956/1874/apichler-kirchb95a.pdf.
Robinson, Peter. The Transcription of Primary Textual Sources Using SGML. Office for Humanities Communication Publications 6, Oxford University Computing Services, 1994.
Sedgwick, Eve Kosofsky. “Paranoid Reading and Reparative Reading” in Novel Gazing….
Saint-Amour, Paul K. “Weak Theory, Weak Modernism,” Modernism/Modernity 25.3 (September 2018), 437-459. DOI: 10.1353/mod.2018.0035.
Tai, Jessica, et al. [Zavala, Gabiola, Brilmyer, Caswell]. “Summoning the Ghosts: Records as Agents in Community Archives,” Journal of Contemporary Archival Studies Vol. 6, Article 18 (2019), https://elischolar.library.yale.edu/jcas/vol6/iss1/18.
Walsh, Marcus. “The Fluid Text and the Orientations of Editing.” In The Politics of the Electronic Text, ed. Warren Chernaik, Caroline Davis, and Marilyn Deegan. Office for Humanities Communication Publications 3, Oxford University Computing Services, 1993.
Wood, Stacy, et al. [Carbone, Cifor, Gilliland, Punzalan]. “Mobilizing records: re-framing archival description to support human rights.” Archival Science 14 (2014): 397–419. DOI: 10.1007/s10502-014-9233-1.