History, a documentation case study

This case study presents a best practices example for organising and describing data in the discipline of Finnish history. The example is published with the kind permission of the creator of the material and documentation, PhD Miia Kuha (ORCID 0000-0002-1838-8272). The example texts are marked in the form “<–>”.

The documentation consists of three main elements: a directory tree of the file directory; a content description and explanation of the data units (here incorporated in the 'Documentation' file); and a description, called 'Documentation', of the context, availability, and structure of the data and the data directory. The 'Documentation' file is updated regularly during data analysis to ensure its integrity and comprehensiveness. The directory tree shows where the 'Documentation' file is placed in the directory. In addition, separate concise README files in Finnish and in English are placed in the directory root; they contain a short table of contents of the overall data directory.

1. Directory tree (in Finnish; an English version is being updated as of 9/2024):

[Figure: Diagram of how the research data files are organised into a folder structure]
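A plain-text directory tree can also be generated directly from the file system, which makes it easier to keep the tree in step with the actual folders. The following is a minimal sketch, not the method used in the case study; the function name `render_tree` and the choice to list folders before files are assumptions made for illustration.

```python
from pathlib import Path


def render_tree(root: Path, prefix: str = "") -> list[str]:
    """Return the lines of a plain-text tree for everything below root."""
    lines = []
    # Folders first, then files; each group sorted by name for a stable listing.
    entries = sorted(root.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "└── " if is_last else "├── "
        lines.append(prefix + connector + entry.name)
        if entry.is_dir():
            # Continue the vertical guide line unless this was the last entry.
            extension = "    " if is_last else "│   "
            lines.extend(render_tree(entry, prefix + extension))
    return lines
```

Calling `print("\n".join(render_tree(Path("Data"))))` on a hypothetical `Data` folder would print its tree, which could then be pasted into a README file.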

2. Content description of data units

See the descriptions and explanations included in the 'Documentation' file below:
 

3. Documentation

THIS DOCUMENTATION.docx FILE WAS CREATED ON XXXX-XX-XX, CREATED BY: [Name]
<Example texts in brackets>

AVAILABILITY INFORMATION

1. Usage license and terms, possible usage restrictions:
2. Links to publications referring to the data:
3. If part of the data has been published open access online, links to the data:
4. How to cite the data:

DATA DESCRIPTION

5. GENERAL DESCRIPTION

<EXAMPLE: This dataset contains information about clergymen’s wives and widows in the diocese of Vyborg between 1650 and 1710. The dataset consists of a prosopographical database, “Clergymen’s Wives 1650–1710”. The database consists of .xlsx spreadsheets containing information about the wives of all pastors and chaplains within the Vyborg diocese during the period. Biographical information about each studied person, i.e., each clergyman’s wife, has been collected in a separate folder if source material about the person exists. These folders further contain:
- a text file containing biographical information about the person (in .docx format) and
- transcriptions of original sources, mainly court records and funeral sermons (in .docx format).>

6. DATA DIRECTORY DESCRIPTION

<EXAMPLE: Data is located in a folder titled XX in the personal Nextcloud directory of [data author]. Address: xxxx.
In the main directory, there are individual master folders for the data, administrative documents, and publications issued in the project. This DOCUMENTATION file is stored in the root of the main directory. The structure is as follows:
• Data
• Administration
• Publications
• DOCUMENTATION.

The Data folder contains the following subfolders:
• Tables_ClergymensWives
• Database_ClergymensWives

7. DESCRIPTION OF SUBFOLDERS AND FILES

SUBFOLDERS AND FILES: <A list of all the folders OR, if the directory is small, of files + a short description of each, EXAMPLE:

Tables_ClergymensWives folder:

7.1 Base data:

ClergymensWives_all.xlsx

CONTENTS: The table contains the following information about 363 individuals: name, name and status of husband, information about becoming a widow, places of living, lifespan or estimate, “Other information”.

METHOD DESCRIPTION: The criteria for inclusion in the table are 1) certain knowledge of at least the first name and patronymic or surname, based on previous research, and 2) a husband who served as a pastor or chaplain in the Vyborg diocese between 1650 and 1710 and to whom she was married for at least part of this period. In my earlier project I compiled a database:

[title of the earlier database; unreadable in the source] 1650–1710.

which serves as a basis for creating this database.

Pastors were picked from the database in alphabetical order, and their spouses were added to this table in alphabetical order. If no information about the spouse was available, I excluded the person from the table. I also excluded spouses who married the pastors only after 1710.>

7.2 Filtered data tables:

ClergymensWives_concise.xlsx

CONTENTS: The file contains information about 171 individuals.
METHOD DESCRIPTION: The filtering criterion was whether original sources about the individual were available.

ClergymensWives_concise_PastorsWives.xlsx

CONTENTS: This table contains information about 128 individuals who have been married to a pastor at least once. Wives of chaplains have been excluded. >
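The two filtering steps described above can be illustrated with a short sketch. It is a hedged illustration rather than the procedure actually used: the field names `husband_statuses` and `has_original_sources` are hypothetical, the rows are placeholders and not taken from the dataset, and it is assumed (from the file name) that the pastors' wives table is derived from the concise table.

```python
def filter_concise(rows):
    """Keep individuals about whom original sources are available."""
    return [row for row in rows if row["has_original_sources"]]


def filter_pastors_wives(rows):
    """Keep individuals married to a pastor at least once.

    Individuals whose husbands were only chaplains are excluded.
    """
    return [row for row in rows if "pastor" in row["husband_statuses"]]


# Placeholder rows for illustration only; field names are hypothetical.
all_rows = [
    {"name": "Person A", "husband_statuses": ["pastor"], "has_original_sources": True},
    {"name": "Person B", "husband_statuses": ["chaplain"], "has_original_sources": True},
    {"name": "Person C", "husband_statuses": ["chaplain", "pastor"], "has_original_sources": False},
]

concise = filter_concise(all_rows)
pastors_wives = filter_pastors_wives(concise)

print([row["name"] for row in concise])        # ['Person A', 'Person B']
print([row["name"] for row in pastors_wives])  # ['Person A']
```

Writing each criterion as its own small function keeps the method description and the filtering logic easy to compare when the documentation is updated.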

SOURCES: <Add source listing>

8. Database_ClergymensWives folder

CONTENTS: In this folder, a separate subfolder has been created for each clergyman’s wife about whom biographical information is available. These individuals are listed in the ClergymensWives_concise.xlsx file.

NAMING CONVENTION: The folders are named according to the person whose information they contain. The folder of each clergyman’s wife contains a file called
Name_biography.docx.
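A naming convention like this can be checked automatically. The sketch below assumes that "Name" in the file name is identical to the folder name, which the description does not state explicitly; the function name `folders_missing_biography` is likewise hypothetical.

```python
from pathlib import Path


def folders_missing_biography(database_dir: Path) -> list[str]:
    """List person folders that lack a '<folder name>_biography.docx' file."""
    missing = []
    person_folders = sorted(p for p in database_dir.iterdir() if p.is_dir())
    for folder in person_folders:
        # Assumption: the biography file repeats the folder name exactly.
        expected = folder / f"{folder.name}_biography.docx"
        if not expected.is_file():
            missing.append(folder.name)
    return missing
```

An empty result from `folders_missing_biography(Path("Database_ClergymensWives"))` would indicate that every person folder follows the convention.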

VARIABLES: Variables are organized in the following matrix:
name,
spouse,
career of spouse,
places of living,
parents,
children,
other relatives,
phase of investigation in this research,
sources.

The file contains all surviving biographical information about the individual clergyman’s wife. In addition, the folder contains a transcription of each surviving source, with associated notes. Each source document thus has its own separate file.

METHOD DESCRIPTION: Handwritten documents have been transcribed verbatim, preserving the original spelling and form. Printed funeral sermons have been transcribed where necessary; the main section selected for transcription is the biography of the deceased person at the end of the sermon. The Svenska Akademiens Ordbok (SAOB) database was used to interpret the text. Transcription files contain notes about the content and the transcription wherever these are needed to make the data understandable.

SOURCES: <See the Sources section of each subfolder.>