New finds in old letters

They were discovered by accident in London archives: thousands of letters written in the 17th and 18th centuries between sailors and those who stayed at home. After linguistic pre-processing, they can teach us a great deal about the history of our language.

Ijck wyns my dear man so bright good night as his stars are in the sky (I wish my dear husband as many good nights as there are stars in the sky)

This is a line from a letter to a sailor, sent long ago by his beloved at home. The letter is part of a collection of 38,000 personal and business letters and documents sent by Dutch people at sea or in faraway places to their home front, and vice versa. The letters were captured by privateers in the 17th and 18th centuries and have been stored in English archives ever since. They were rediscovered in 1980, and some of them have since become available in the Netherlands for research by historians and linguists.

Language of ordinary people

Most of the historical texts we know come from literate people who deliberately wrote according to the written standard. That is not the case with these letters. The language of these letter writers shows that many of them were not very used to writing. There are often conventional formulas at the beginning and end, such as: "I hope you are doing well too. If not, I would be deeply saddened." But there are also countless spellings, words and sentence constructions that are closely related to the spoken language of the time. And herein lies precisely the value of the letters: what we find here is the unpolished language of ordinary people. Until now, this was not available in such large quantities.

Language variation research

In Leiden, in the Letters as Loot project, a subset (a corpus) of about 1000 letters is being examined. The Leiden linguists mainly focus on language differences related to social class. They also look at language change between the 17th-century and the 18th-century parts of the corpus. The researchers also want to make the corpus available for further research; after all, there are still all kinds of open questions about spelling, dialect forms and sentence structure, for example.

Computers play an important role in this. How do we make such diverse texts suitable for linguistic research? The first step is to type out (transcribe) the handwritten text accurately. But that alone is not enough. Additional processing is required before all kinds of words and sentence patterns can be retrieved quickly and completely. The corpus is linguistically enriched at the Institute for Dutch Lexicology (INL) in Leiden. This means that all kinds of linguistic information are added to the transcribed words, such as information about spelling and part of speech.

Over 100 spellings

Because there were no official spelling rules at the time of the letters (as there are now in the Green Booklet), the same word could be written in all kinds of ways. The word kapitein (captain), for example, occurs in more than a hundred spellings, including captijen, caps and katyn. Another example is bootsman (boatswain), which was also spelled bosman and boosman and is therefore difficult for us to recognize. During the linguistic processing, the modern spellings kapitein and bootsman are linked to these old spellings. That modern standard form is called the lemma, and the process is called lemmatizing. This way we can collect all old spelling variants in one go.
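To make the idea of lemmatizing concrete, here is a minimal sketch in Python. It assumes a small hand-made lookup table from old spellings to a modern lemma; the table, the function names and the use of the modern Dutch lemma kapitein (captain) are illustrative assumptions, not the actual INL tooling or data.

```python
from collections import defaultdict

# Hypothetical variant-to-lemma table (assumption for illustration only).
# A real lexicon would hold far more variants per lemma.
VARIANT_TO_LEMMA = {
    "captijen": "kapitein",
    "katyn": "kapitein",
    "bosman": "bootsman",
}


def lemmatize(token):
    """Return the modern lemma for a token, or the lowercased token if unknown."""
    key = token.lower()
    return VARIANT_TO_LEMMA.get(key, key)


def group_by_lemma(tokens):
    """Collect every spelling found in the tokens under its modern lemma."""
    groups = defaultdict(set)
    for token in tokens:
        groups[lemmatize(token)].add(token.lower())
    return dict(groups)


tokens = ["Captijen", "katyn", "bosman", "bootsman"]
variants = group_by_lemma(tokens)
```

Looking up `variants["kapitein"]` then returns every old spelling of that word found in the text, which is what "collecting all variants in one go" amounts to.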

In the letters we also find word parts written separately, op schrift for example, instead of opschrift (inscription, caption). During lemmatization, these two separate words together receive the single lemma opschrift. This distinguishes them from the same words in op schrift stellen (to put in writing), where each word has its own lemma. Adding such a modern lemma also makes the text easier to understand.
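The difference between a shared lemma and separate lemmas can be sketched as a small data structure. The example below uses the Dutch forms op schrift ("in writing") and opschrift ("caption"); the `Token` class, its `group` field and the `occurrences` function are hypothetical names chosen for illustration and do not describe the actual corpus format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    form: str                    # spelling as transcribed from the letter
    lemma: str                   # modern lemma assigned during enrichment
    group: Optional[int] = None  # tokens with the same id jointly form one lemma


def occurrences(tokens, lemma):
    """Return the surface text of every occurrence of a lemma.

    Adjacent tokens that share a group id are joined into one hit.
    """
    hits = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lemma == lemma:
            j = i + 1
            if tok.group is not None:
                while j < len(tokens) and tokens[j].group == tok.group:
                    j += 1
            hits.append(" ".join(t.form for t in tokens[i:j]))
            i = j
        else:
            i += 1
    return hits


tokens = [
    # "het op schrift": the caption, written as two words, one shared lemma
    Token("het", "het"),
    Token("op", "opschrift", group=1),
    Token("schrift", "opschrift", group=1),
    # "op schrift stellen": to put in writing, each word keeps its own lemma
    Token("op", "op"),
    Token("schrift", "schrift"),
    Token("stellen", "stellen"),
]
```

A search for the lemma opschrift finds only the first pair, while a search for schrift finds only the second occurrence.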

Piet uncle or uncle Piet

In addition to a modern lemma, a part-of-speech tag is also added to the old words. It records whether the word is a noun, adverb, verb, et cetera. Names are given a special code. This allows us to investigate developments in the function of words and in word order. An example: many family relationships are mentioned in the letters. Many letter writers send greetings to (or from) their siblings, cousins, aunts and uncles. The first name often comes before the kinship term: Piet uncle, John cousin or Maaike moei (moei = aunt). But the reverse also occurs: nicht Geesje (cousin Geesje) or uncle Dirk. By searching for a kinship term (uncle, cousin, aunt) preceded or followed by a proper name, we can quickly see whether one order occurs more often than the other. We can then also determine whether this is tied to time, place or social class.
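A search of this kind can be sketched as follows, assuming each token is stored as a (form, lemma, part-of-speech) triple. The tag names `NAME`, `NOUN` and `CONJ`, the Dutch kinship lemmas and the function name are assumptions made for this illustration; the actual tag set of the corpus is not specified here.

```python
from collections import Counter

# Hypothetical set of kinship lemmas to search for (assumption).
KINSHIP_LEMMAS = {"oom", "neef", "nicht", "moei"}


def count_orders(tokens):
    """Count 'name + kinship term' versus 'kinship term + name' pairs.

    tokens is a list of (form, lemma, pos) triples; proper names carry
    the hypothetical tag "NAME".
    """
    counts = Counter()
    for (_, lemma_a, pos_a), (_, lemma_b, pos_b) in zip(tokens, tokens[1:]):
        if pos_a == "NAME" and lemma_b in KINSHIP_LEMMAS:
            counts["name_first"] += 1
        elif lemma_a in KINSHIP_LEMMAS and pos_b == "NAME":
            counts["kin_first"] += 1
    return counts


tokens = [
    ("Piet", "Piet", "NAME"), ("oom", "oom", "NOUN"),
    ("en", "en", "CONJ"),
    ("nicht", "nicht", "NOUN"), ("Geesje", "Geesje", "NAME"),
    ("en", "en", "CONJ"),
    ("Maaike", "Maaike", "NAME"), ("moei", "moei", "NOUN"),
]
orders = count_orders(tokens)
```

Because the search runs on lemmas and tags rather than on spellings, it finds the pattern regardless of how a writer spelled the kinship term. Splitting the counts by year, place or social class would be a straightforward extension if that metadata is attached to each letter.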

Extension of vocabulary

The letters also contain words that are missing from existing dictionaries. Strictly speaking, paper dictionaries such as the WNT (Woordenboek der Nederlandsche Taal) should be supplemented with these missing words, but that is far too expensive. Adding new words to a digital database is a much better (and cheaper) solution. Missing words like bolkop and grumpy are therefore included in the central lexicon of the INL. Every Dutch word, from very old (6th century) to recent, is stored in that database. Each word is labelled with its modern form, spelled according to the official rules. This gives everyone access to this historical vocabulary, even without prior knowledge of historical word forms. In the long run it will also be possible to link each word to its other forms (e.g. moeder, "mother", and its older form moer), and to information in dictionaries and encyclopedias. In this way the letter writers of that time contribute, centuries later, to the description of the Dutch vocabulary.