TEI Conference Summary

Master of Rare Books and Digital Humanities

Université Bourgogne-Franche-Comté

M2 (2022–2024)

18 October 2023


What is the Text Encoding Initiative? With Lou Burnard

Amanda Hemmons

In the 1980s, Professor Burnard, among others, noticed that computers and the electronic texts they stored all used different, incompatible formats. This led to a desire for some kind of standardized language and, ultimately, to the beginning of the TEI.

Now, what is the TEI? It’s a combination of the following things: the specific institution of the Text Encoding Initiative, a shorthand to reference a social construct, a technical framework, a way of identifying ways of thinking about text, as well as a group of people with a shared understanding.

TEI began in 1987. At this time, serious computing was done on large machines called “mainframes.” Text processing was already a recognized field. There was no World Wide Web, but academic computers were linked on a limited internet. Formats quickly became outdated, and deteriorating files made it difficult to share the results of academic work. Two technical challenges were already evident: data preservation and data compatibility. Burnard notes that this remains a challenge today; for example, ebook files are not compatible across devices.

In the spring of 1987, European workshops were held on the standardization of historical data, featuring major figures in the field such as J.-P. Genet and M. Thaller. This led later that year to an exploratory international workshop, funded by the NEH, on the feasibility of defining “text encoding guidelines.” The workshop brought together many people who had nothing in common except that they all wanted to use computing resources. The participants were generally academic outliers, as computer usage was not yet mainstream. They also produced the Poughkeepsie Principles, which are essentially guidelines for the guidelines and remain relevant today.

The TEI mission, at its inception, was to facilitate the creation, exchange, and integration of textual data in digital form. The scope was broad, and it remains so today. Its recommendations are intended both for beginners seeking well-established solutions to well-understood problems and for experts seeking to create new solutions. Its original design goal was to provide recommendations derived from the existing consensus, where this could be determined. It prefers general solutions to discipline-specific ones but supports both specialization and extension. It was definitely not designed to provide a complete answer out of the box.

After the conferences began the real development of TEI. 1988 to 1992 was a period of transition during which dialogue between computer scientists and humanities scholars was just beginning. TEI P1 was developed during this time, and work on a second iteration, P2, was in progress. From 1993 to 1994 came the integration and completion of P2 as TEI P3. After 1995, TEI continued to spread and be promoted by word of mouth; there was no funding, yet usership grew.

It became clear that some kind of governing body was needed to take responsibility for TEI, leading to the establishment of the TEI Consortium, incorporated on 30 December 2000. From 2001 to 2003, TEI P3 was converted to XML and became P4; the Council of the TEI Consortium then oversaw production of a complete revision as TEI P5. To this day, the version of TEI in use is still P5, though it receives regular corrections and modifications. Anyone can be a member of the Consortium; members vote for the Council and executive board, and the Council provides twice-yearly updates (based on proposals from special interest groups).

Burnard went on to discuss the major committees involved in creating the components of TEI, mentioning Antonio Zampolli and Don Walker, now deceased, as important figures. The first committee, dedicated to the TEI header, determined the primary source of information for an entry. The second, on metadata, originally worked in SGML because that was the only syntax available at the time. Even then, the designers knew they would want to change syntax if something better came along, which is why TEI’s formal structure is independent of its syntax. Another committee was dedicated to text representation; it ultimately realized that no single, simple answer was possible. The analysis and interpretation committee hoped to identify and represent every kind of linguistic and literary analysis, but this was an impossibly broad scope.
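To make the header committee’s work concrete, here is a minimal sketch of a TEI P5 header. The element names (teiHeader, fileDesc, titleStmt, publicationStmt, sourceDesc) are real TEI P5 elements; the textual content is invented for illustration:

```xml
<teiHeader>
  <fileDesc>
    <!-- titleStmt, publicationStmt, and sourceDesc are the required
         children of fileDesc in TEI P5 -->
    <titleStmt>
      <title>An example encoded text</title>
    </titleStmt>
    <publicationStmt>
      <p>Unpublished sample, for illustration only.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Born-digital; no pre-existing source.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```

The header plays the role the committee envisaged: it records, in a machine-readable way, what the encoded text is and where it came from.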

Where did the money for all these committees come from? At the time, there was a lot of money in Europe dedicated to the concept of “language engineering”: treating language as an engineering problem, i.e., defining dictionaries, grammars, and language-understanding systems so that they can be used in a computer environment.

So why is TEI still here? Burnard reports that his answer is borrowed from researcher Henry Thompson. There are two major reasons why text standards fail: either they are based on an immature theory, or their user community is fragmented or diverse. TEI avoids both. It has many features, and, being expressed as an XML schema, it is hospitable to other namespaces, can interoperate with other standards (SVG, MathML, MEI, etc.), and lets you define an element with any kind of specific equivalency. In short, TEI has the flexibility needed to serve all sorts of goals. That is why it still exists after 36 years, when other languages died out after a handful.
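The namespace hospitality mentioned above can be sketched in a small example. This fragment embeds MathML inside a TEI document; the TEI and MathML namespace URIs are the real ones, and the formula element with its notation attribute is genuine TEI P5, while the surrounding content is invented for illustration:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>The area of a circle is
        <!-- TEI's formula element can host content from another
             namespace, here MathML -->
        <formula notation="mathml">
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <mi>&#x3C0;</mi>
            <msup><mi>r</mi><mn>2</mn></msup>
          </math>
        </formula>.
      </p>
    </body>
  </text>
</TEI>
```

Because each vocabulary keeps its own namespace, a TEI processor and a MathML renderer can each handle their own part of the document without conflict.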


Source

L. Burnard. [DELab UW]. (2015, May 21). HC: What is the Text Encoding Initiative? [Video]. YouTube. https://youtu.be/Xu6Z1SoEZcc