On the Method of the “STS of Documents” in Digital Humanities
Documents have been increasingly recognised as important objects of investigation in science and technology studies (STS). As Kalpana Shankar et al. state: ‘Documents are often, maybe always, at the center of efforts to achieve coordination and control, so greater attention to the complexity of their role should result in better accounts of how this is accomplished’ (2017: 70). So far, much less attention has been given to the study of documents in Digital Humanities (DH); however, given the large and diverse range of materials produced in DH work (e.g., technical documentation, kits, protocols, white papers), it is timely to examine them in order to investigate the socio-technical practices of digital research production.
Over the last nine months, I have been studying documents, particularly ‘Feasibility documents’, produced by King’s Digital Lab (KDL) as part of my laboratory ethnographic study. Based on the analysis of 40 Feasibility documents, I first sought to understand how they inform the lab management work and how they contribute to structuring the digital research process. To this end, I apply the method of the ‘STS of documents’ (Shankar et al., 2017) and analyse Feasibility documents in a manner similar to STS-based studies of scientific labs’ protocols (Crabu, 2014) and kits (Neresini and Viteritti, 2014). The outcome of my analysis will be published in the Convergence special issue ‘Critical Technical Practice(s) in Digital Research’.
In the forthcoming article, I intend to show that documents can be studied as ethnographic objects that help to reveal critical and socio-technical practices entangled with operational methods and local requirements. Drawing on Agre’s critical technical practice and Digital Humanities’ theories of critical production (Berry and Fagerjord, 2017; Smithies, 2017), I seek to shift attention from end-product digital artefacts towards the complex process of their creation, which can unpack a range of social, technological, and management issues. In doing so, I also aim to provide a methodological framework for the analysis of documents in Digital Humanities, one that has the potential to unearth new questions about the socio-technical nature of digital production.
In my analysis I argue that Feasibility documents (1) inform ethnographic work about lab workflow and management and, in doing so, are able to capture the interconnectedness of work layers and practices; (2) enable an empirical analysis of digital research projects and the process of translation from research questions, to methods, to technical solutions; (3) are critical structuring objects: they shape the research process and the relationships between the actors involved, and are themselves shaped by local institutional strategies and decisions. In this blog post, I want to briefly discuss the first point, focused on lab management and workflow, by referring to a research project conducted in collaboration with KDL.
Humanists are not accustomed to seeing scholarly research projects in terms of multi-layered project management and risk assessment. However, digital projects involving many partners and components require a new approach to ensure their successful delivery and sustainability. The sustainability issue is often associated with maintaining infrastructure and systems and is rarely connected with research management and workflow. An explicitly articulated workflow, however, helps to assure the practical implementation of digital research projects. As Bergel et al. (2020: 17) argue, ‘The sustainability of DH is directly linked to the impact of the activities within which digital resources and methods are involved. Within those activities, DH is variably recognised.’ Therefore, there is a need for more awareness of the importance of project management practices that, along with feasibility and risk assessment, can ensure the delivery of critically robust and sustainable artefacts.
Feasibility documents
Feasibility documents are produced by KDL as part of the Agile-based Dynamic Systems Development Method (DSDM) framework and Software Development Lifecycle (SDLC) model, and they aim to inform the lab’s decision to progress with a project proposal (KDL’s Feasibility guidance and project templates are openly available on GitHub [KDL, 2020]). After the initial contact with partners and several meetings, the lab team prepares the Feasibility document to assess the project for the lab and plan high-level requirements, whose prioritisation is agreed between KDL and partners rather than solely dictated by KDL. This process involves activities such as eliciting requirements, reviewing partners’ data samples and data sets, sketching technical solutions, and checking whether the budget is sufficient for such work. The team aims to assess technical and design requirements based on research questions and to organise a project workflow. From the ethnographic perspective, the documents provide an interesting insight into the components, layers, and management of digital products. Feasibility documents represent the accumulation of expectations that are structured into a logical process of translating inquiries into material, feasible items. They are important artefacts that aim to be both fixed and flexible at the same time. I argue that they constitute a rich resource for the ethnographer, who can follow which parts are under control and which are open to negotiation.
The Feasibility document consists of the following sections:
1) Research case (providing an outline of the research goal and objectives, the contribution of KDL as well as any information about data sets, including their formats, quality, the degree of digitisation required, copyright, and licensing);
2) Requirements (listing the methodological and technical project-specific requirements in the form of big chunks of work clustered together based on the MoSCoW technique and high-level functionalities; for example, user testing or data storage solution; the division of responsibility between the lab and partners, and assessing the risks);
3) Solution architecture (technical design framework for the solution, including the description and justification of the selected software, tools, and standards, as well as outlining alternative solutions);
4) Development approach (providing an outline of the SDLC framework underpinning the project development, the strategy for testing parts of the solution and making the source code available in an open-source repository on GitHub);
5) Delivery plan (giving brief information about an incremental delivery approach);
6) Management approach (outlining the Agile-based project management framework);
7) Forward planning (explaining options for the long-term sustainability of the project);
8) Costings;
9) Feasibility assessment (the final justification for go or no-go of the project by summarising the possible risks and impact of the project in relation to the lab’s experience and strategy as well as the priorities of the university).
The goal of using a Feasibility document is to plan the work process by breaking down large segments of activities into items of high-level requirements following the MoSCoW prioritisation technique, which specifies ‘must’, ‘should’, ‘could’, and ‘won’t have this time’ requirements and is used in the software industry’s DSDM approach. The document is also intended to assess risk and impacts and, by doing so, it seeks to provide justification to the university for taking on the project. The prioritisation approach is significant for delivering high-quality artefacts within budget and time constraints. It aims to ensure the implementation of the main philosophy of the Agile approach, namely ‘never compromising on quality’ (Agile Business Consortium, 2014).
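The prioritisation logic can be illustrated with a minimal Python sketch. The requirement codes, titles, effort figures, and budget below are hypothetical and are not taken from any KDL Feasibility document; the greedy selection rule is likewise an assumption made for illustration, not a description of how the lab actually plans its work.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """MoSCoW categories, ordered from most to least critical."""
    MUST = 1
    SHOULD = 2
    COULD = 3
    WONT = 4  # "won't have this time"


@dataclass(frozen=True)
class Requirement:
    code: str
    title: str
    priority: Priority
    effort_days: int  # hypothetical estimate


def plan(requirements, budget_days):
    """Select requirements in MoSCoW order until the budget is used up.

    'Must' items are always included; if they alone exceed the budget,
    the plan is flagged as not feasible.
    """
    musts = [r for r in requirements if r.priority is Priority.MUST]
    must_effort = sum(r.effort_days for r in musts)
    if must_effort > budget_days:
        return {"feasible": False, "selected": [], "deferred": list(requirements)}

    selected = list(musts)
    remaining = budget_days - must_effort
    deferred = []
    optional = sorted(
        (r for r in requirements if r.priority is not Priority.MUST),
        key=lambda r: r.priority,
    )
    for req in optional:
        # "Won't have this time" items are never scheduled in this sketch.
        if req.priority is not Priority.WONT and req.effort_days <= remaining:
            selected.append(req)
            remaining -= req.effort_days
        else:
            deferred.append(req)
    return {"feasible": True, "selected": selected, "deferred": deferred}


# Hypothetical requirements, for illustration only.
EXAMPLE = [
    Requirement("M1", "Encode texts", Priority.MUST, 30),
    Requirement("M2", "Public reading site", Priority.MUST, 20),
    Requirement("S1", "Link places to images", Priority.SHOULD, 10),
    Requirement("C1", "Plot places on a map", Priority.COULD, 15),
    Requirement("W1", "Mobile app", Priority.WONT, 40),
]

result = plan(EXAMPLE, budget_days=65)
selected_codes = [r.code for r in result["selected"]]
deferred_codes = [r.code for r in result["deferred"]]
```

With these invented figures, the two ‘must’ items and the ‘should’ item fit within the budget, while the ‘could’ and ‘won’t’ items are deferred. The point of the sketch is only that ‘must’ requirements define feasibility, whereas the remaining categories are negotiable within time and cost.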
The SDLC can help to keep a project on track when unpredictable human aspects disturb the work process. This business model is just a framework within which the lab seeks to tame the complexity of digital production. The flexibility of the Agile method is, therefore, useful to find the balance between expectations and requirements, innovation and sustainability, and experimentation and functionality.
The interconnectedness of work layers and practices
To show the multi-layered structure of digital research production where all elements are interconnected, I produced diagrams visualising the feasibility model and the lab workflow. Figure 1 can be read from top to bottom to see the following processes and layers.
First is the ‘Analytical process’, which comprises the intellectual inquiry, meaning the research ideas submitted to KDL by partners (e.g., researchers in the humanities, social scientists, archivists), and a methodological layer that aims to assess which digital methods and software would be suitable for addressing the research questions. This step is led by a research software analyst in close contact with partners and the rest of the lab team (developers and designers), who can help to ensure that the technical and methodological solutions are feasible and functional.
Next is the ‘Production process’, which entails the technical development and the delivery of increments to partners. This layer corresponds to the so-called ‘black box’ of production, meaning that partners receive the finished pieces of work without observing how the production was carried out. Having said that, the Feasibility documents provide transparent information about the development process; therefore, partners can gain an understanding of how the work will be conducted. The ‘degree’ of black box access depends, however, on partners’ technical knowledge and engagement. For instance, the requirement ‘Minimal specification VM for Django set-up for data entry’ might be understood by some partners and unclear to others.
The ‘Maintaining process’ takes us to an infrastructure layer coordinated by the software system managers. This substrate of digital production reminds us how digital work is very much material (storage, backup solutions, and random-access memory) and how humanists put less emphasis on the underlying processes of their work, which are crucial for its accessibility, safety, and sustainability.
The last, the ‘Monitoring process’, constitutes a backbone management layer of the lab run by the project and lab managers. This is a cornerstone of the lab work responsible for planning, executing, and controlling projects. The monitoring practices are fixed to ensure solid management and successful delivery of the products at the right time and within budget. They are performed using various digital platforms and techniques, e.g., Microsoft SharePoint for knowledge base and documentation, the Slack platform for communication, ActiveCollab software for collaborative management, and the Timeboxing method for planning and reviewing project progress. All these practices ensure continuous and transparent communication between the lab, partners, and Faculty.
The diagram of the feasibility model consists of the main part, called the ‘Research case’, which varies from project to project. Therefore, while the projects are treated individually, the lab organisation and Agile-based workflow around the ‘Research case’ stay relatively fixed and stable. This confirms how the DSDM-based SDLC approach aims to provide a strong foundation for the entire process.
The research cases, with their methodological, technical, and infrastructural requirements, constitute solid empirical materials that enable ethnographers to follow the process of translation, from research objectives to methods, to technical solutions.
In Figure 1 I present an example of a planned KDL digital project, ‘Alice Thornton Digital Edition’, to be conducted with a team based in the Department of History at the University of Edinburgh. The project will produce a digital edition of four autobiographical manuscripts by a seventeenth-century woman writer, Alice Thornton, in semi-diplomatic as well as modernised versions. The project requirements were broken down into 23 high-level requirements, using the MoSCoW prioritisation technique, devoted to separate but interconnected objectives, methods, and technical solutions. Figure 1 can be read from top to bottom to see how the methods and technical development are tailored to each objective.
Based on the Feasibility document of this project, I identified three main research questions constituting the Intellectual inquiry layer. The first major objective is to build a digital edition of Alice Thornton’s autobiographical manuscripts. To do so, the team has selected the methodology and technical solutions that involve data annotation and encoding machine-readable texts. This task is specified as a second ‘must’ requirement (called M2), which means that this concrete task guarantees the delivery of the project. Another task (called M3: a ‘must’ requirement) related to this objective is to set up the Content Management System for the research team to enable them to control website copy and documentation materials and edit project contextual information and metadata. Technical development for this objective includes tools and solutions widely adopted by the KDL team, such as Django (an open-source web publishing framework), PostgreSQL (open-source database), and Wagtail (Django-based open-source content management system).
The second research objective posed by the Principal Investigator (PI) of the project is to analyse in what ways the manuscripts are revised over time. In other words, the design and functionality of the digital edition aim to facilitate addressing the scholarly question. The goal of the lab team is therefore to conceptualise methodological and technical solutions that would enable historians to compare manuscripts in a critical and comprehensive way. To this end, the team has defined a ‘must’ requirement (called M7) that involves building a browsing functionality for the digital edition with indexes by people, places, events, references to scriptures, and dates. This way, as one comment in the Feasibility document reads, ‘Search results should show text snippet side by side across manuscripts when the same entity occurs in multiple manuscripts’. The search functionality has therefore been proposed as a viable technical solution that can ensure the exploration of research inquiries.
Other tasks related to this objective are ‘should’ and ‘could’ requirements, which means that they are important and desirable respectively but not vital for the project. The ‘should’ requirement includes the integration of places mentioned in the digital edition with Wikimedia Commons, when relevant, to display an image, and the ‘could’ task proposes to plot places mentioned in the digital edition on an Ordnance Survey map. These are certainly interesting tasks desired by the PI, but from the team’s perspective, they are not critical for the successful delivery of the project. For building the search functionality, the team proposes the use of the open-source search engines Apache SOLR or ElasticSearch.
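The side-by-side comparison described for this objective can be illustrated with a small in-memory sketch in Python. It is a stand-in for the idea only: it does not use Apache SOLR or ElasticSearch, and the manuscript identifiers, entity names, and snippets are invented placeholders rather than content from the edition.

```python
from collections import defaultdict

# Hypothetical annotated passages: (manuscript id, entity, text snippet).
PASSAGES = [
    ("MS1", "Place A", "snippet from MS1 mentioning Place A"),
    ("MS2", "Place A", "snippet from MS2 mentioning Place A"),
    ("MS2", "Place B", "snippet from MS2 mentioning Place B"),
    ("MS3", "Place A", "snippet from MS3 mentioning Place A"),
]


def build_entity_index(passages):
    """Map each entity to {manuscript id: [snippets]}."""
    index = defaultdict(lambda: defaultdict(list))
    for manuscript, entity, snippet in passages:
        index[entity][manuscript].append(snippet)
    return index


def side_by_side(index, entity):
    """Return one (manuscript, snippets) row per manuscript when the entity
    occurs in more than one manuscript; otherwise return an empty list."""
    by_manuscript = index.get(entity, {})
    if len(by_manuscript) < 2:
        return []
    return [(ms, by_manuscript[ms]) for ms in sorted(by_manuscript)]


index = build_entity_index(PASSAGES)
place_a_rows = side_by_side(index, "Place A")
place_b_rows = side_by_side(index, "Place B")
```

Here the entity that occurs in three manuscripts yields three rows that could be displayed next to each other, while the entity found in a single manuscript yields none. A production search engine would add ranking, tokenisation, and highlighting on top of this basic grouping.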
The third objective of the ‘Alice Thornton Digital Edition’ project is to examine how Thornton conceived of the relationship of the manuscripts to each other. The digital edition must therefore help scholars to answer research questions concerning when and why Thornton wrote each of these texts and how they relate to one another, considering that different manuscripts refer to the same event in very distinct ways. The design of the digital edition becomes vital for its critical reading and analysis. The lab team has specified a ‘must’ requirement that includes building a ‘responsive public site frontend to the digital reading edition (semi-diplomatic and modernised) of Alice Thornton 4 manuscripts in which each text should be read continuously’. The team has also suggested a ‘could’ requirement that involves a special design of the digital reading edition where each text from the four manuscripts could be read side-by-side. This solution can contribute to facilitating critical analysis, but it can be implemented only if there is enough time and budget.
For this part of the project, which requires the conversion of TEI-XML files into the publishing format of the edition and associated browsing functionalities, the lab team has proposed to use the Kiln solution, an open-source multi-platform framework developed and maintained by KDL for building and deploying complex websites whose source content is primarily in XML.
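To give a sense of what converting TEI-XML into a publishing format involves, the following Python sketch turns TEI paragraphs into HTML using only the standard library. It is not Kiln’s implementation and makes no claim about how Kiln works; the TEI fragment, the placeholder names in it, and the mapping from TEI elements to HTML class names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI fragment, not taken from the edition.
SAMPLE = (
    f'<TEI xmlns="{TEI_NS}"><text><body>'
    "<p>A visit to <placeName>Place A</placeName> with "
    "<persName>Person B</persName>.</p>"
    "</body></text></TEI>"
)

# Assumed mapping from TEI element names to HTML class names.
CLASS_FOR = {"persName": "person", "placeName": "place"}


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return element.tag.rsplit("}", 1)[-1]


def render_inline(element):
    """Render an element's mixed content (text, children, tails) as HTML."""
    parts = [escape(element.text or "")]
    for child in element:
        inner = render_inline(child)
        css_class = CLASS_FOR.get(local_name(child))
        if css_class:
            parts.append(f'<span class="{css_class}">{inner}</span>')
        else:
            parts.append(inner)
        # Text that follows the child element belongs to the parent's flow.
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def tei_to_html(tei_source):
    """Convert each TEI <p> element into an HTML <p> element."""
    root = ET.fromstring(tei_source)
    paragraphs = root.iter(f"{{{TEI_NS}}}p")
    return "\n".join(f"<p>{render_inline(p)}</p>" for p in paragraphs)


html_output = tei_to_html(SAMPLE)
```

The tagged person and place names survive the conversion as classed spans, which is the kind of markup that browsing indexes by people and places could later rely on.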
The infrastructure layer of the digital edition project includes the following requirements: 10 x GB server space, 1 x Domain name, 4 x GB RAM, and 10 x GB backup. These components must be taken into consideration while planning and budgeting the project, as they provide a solid foundation for its safe and sustainable delivery and preservation.
Conclusion
Going through the Feasibility documents, I have gained a deep insight into the lab organisation, workflow processes, and technical solutions. Taking into consideration time, budget, risk, and responsibility, the lab has standardised a core technology stack, and most projects rely on open-source frameworks and tools such as Python, Django, Apache SOLR, ElasticSearch, and LeafletJS, several of which are mentioned above. Tested software and tools are more reliable than experimental, unexplored platforms, which need to be carefully evaluated in terms of technological and security issues. As the team argues, ‘We are open to use other technologies as needed to fulfil project requirements; however, if we need to deviate from the core technical stack …, implications for sustainability should be assessed accordingly’ (KDL, 2021). The key issue is therefore to keep a balance between innovation and sustainability.
What is interesting in the analysis of the documents is to understand not only what they record but also why the processes were planned in that way and how the lab’s decisions were made, while noting that practice may have diverged from the plans since then. These questions constitute the main part of my ‘feasibility analysis’ presented in the forthcoming article.
I believe any document is an artefact that ‘includes substantial references to the social processes through which it was produced and reproduced’ (Shankar et al., 2017: 59); therefore, it can enable a critical empirical analysis of the production of DH outputs, including the investigation of social relationships, task divisions, labour issues, the workplace culture, technical practices, and infrastructural values. DH can therefore benefit greatly from the use of the STS-based method of reflective discourse on documents.
References:
Agile Business Consortium (2014) The DSDM Agile Project Framework Handbook. Available at: https://www.agilebusiness.org/page/TheDSDMAgileProjectFramework
Agre PE (1997) Toward a critical technical practice: Lessons learned in trying to reform AI. In: Bowker GC, Star SL, Turner W, and Gasser L (eds) Social Science, Technical Systems and Cooperative Work: Beyond the Great Divide. Hillsdale, NJ: Erlbaum, pp. 131-157.
Bergel G, Willcox P and Armstrong G et al. (2020) Sustaining Digital Humanities: Important developments in the UK landscape. Report, Software Sustainability Institute, UK.
Berry DM and Fagerjord A (2017) Digital Humanities: Knowledge and Critique in a Digital Age. Cambridge and Malden: Polity Press.
Crabu S (2014) Give Us a Protocol and We Will Rise a Lab: The Shaping of Infra-Structuring Objects. In: Mongili A and Pellegrino G (eds) Information Infrastructure(s): Boundaries, Ecologies, Multiplicity. Newcastle upon Tyne: Cambridge Scholars Publishing, pp. 121-143.
KDL (2020) F1: Feasibility. In: King’s Digital Lab GitHub repository. Available at: https://github.com/kingsdigitallab/sdlc-for-rse/wiki/F1:-Feasibility
KDL (2021) Frequently Asked Questions. What project partners might want to know about KDL. King’s Digital Lab. Available at: https://kdl.kcl.ac.uk/how-we-work/faq-partners/
Neresini F and Viteritti A (2014) The Laboratory Kit between Infrastructure and Boundary Object. In: Mongili A and Pellegrino G (eds) Information Infrastructure(s): Boundaries, Ecologies, Multiplicity. Newcastle upon Tyne: Cambridge Scholars Publishing, pp. 99-120.
Shankar K, Hakken D and Østerlund C (2017) Rethinking Documents. In: Felt U, Fouché R, Miller CA et al. (eds) The Handbook of Science and Technology Studies. Fourth Edition. Cambridge and London: The MIT Press, pp. 59-85.
Smithies J (2017) The Digital Humanities and the Digital Modern. Basingstoke: Palgrave Macmillan.