The Grammar of the Archaeological Record
2. The Manual

2. Data Input Overview

Giorgio Buccellati – June 2010

2.1: The Mozan LAN system

LAN stands for Local Area Network. It is the system that links all computers. Two central servers, called Hubble and Socrates, are accessible to the staff.

Hubble is open to all staff members, and serves as a “dirty” repository for all data being produced.

Socrates is a “clean” repository for data in their final form (though uploading should happen regularly during the season and not only at the end). Access to it is restricted, and is typically reserved for stratigraphic or typological unit directors.

Each unit (stratigraphic or typological) has its own computer station. Backups must be made regularly to Hubble and Socrates. The directory structure is identical on unit stations and on the two central servers.

It is imperative that all relevant material be stored on Socrates in its clean form; this is a prime responsibility of the unit directors. The unit computers themselves will not be backed up, so the only data that will serve as a key reference are those on Socrates.

Copies of both Socrates and Hubble will be made at the end of the season and will be brought back to the US and Europe, but only Socrates will be deemed to be the proper and final version of all the data processed during the season.
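
Because the directory structure is meant to be identical on the unit stations and on the two servers, a mismatch can be detected mechanically before a backup is made. The following is only a minimal sketch in Python, not part of the Mozan programs: the function names are hypothetical, and the paths passed in are whatever the local setup uses.

```python
from __future__ import annotations

from pathlib import Path


def relative_dirs(root: Path) -> set[str]:
    """Return every subdirectory below root as a relative, slash-separated path."""
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_dir()
    }


def structure_differences(station: Path, server: Path) -> tuple[set[str], set[str]]:
    """Compare two folder trees (hypothetical helper).

    Returns a pair: folders found only on the station, folders found only
    on the server. Two empty sets mean the directory structures match.
    Only folder names are compared, not the files inside them.
    """
    station_dirs = relative_dirs(station)
    server_dirs = relative_dirs(server)
    return station_dirs - server_dirs, server_dirs - station_dirs
```

A unit director might run such a check against the unit's folder on Hubble or Socrates; any folder reported on one side only would then be reviewed by hand.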

2.2: Folders and Files

The organization and the format of folders and, within them, of files is strictly regulated, and the rules are spelled out in chapter 3 (“Folders and Files”).

Some files are required as input upstream of the programs; they are called “definition files.” Some must be entered manually, while others are program-generated. They are also treated in chapter 3 (“Folders and Files”).

2.3: The three types of input

Input into the digital archive occurs in one of three ways:

  1. Data may be entered manually (primary input), in pure ASCII format.
  2. Data may be generated by programs (secondary input). This results in ASCII files that are identical in format to those entered manually.
  3. Graphic files are produced manually, or else by cameras and scanners, and are stored in one of the following formats: JPEG, TIFF or WMF. Graphic files may also be generated by AutoCAD; in that case the output (in DWG format) must additionally be converted to JPEG, TIFF or WMF format.
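
The format rules in the list above lend themselves to a simple automated check. The sketch below, in Python, is offered only as an illustration: the function names are hypothetical, and the file extensions are an assumption based on common usage, since the manual names formats rather than extensions.

```python
from pathlib import Path

# Assumed extensions for the formats named in the manual (JPEG, TIFF, WMF).
ACCEPTED_GRAPHIC_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".wmf"}
# Assumed extension for AutoCAD output, which must also be converted.
CONVERT_EXTENSIONS = {".dwg"}


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the ASCII range, as required of
    primary and secondary input files."""
    return data.isascii()


def classify_graphic(filename: str) -> str:
    """Classify a graphic file by its extension (hypothetical helper).

    Returns "accepted" for JPEG, TIFF or WMF, "convert" for DWG files
    that still need a JPEG, TIFF or WMF version, and "unknown" otherwise.
    """
    ext = Path(filename).suffix.lower()
    if ext in ACCEPTED_GRAPHIC_EXTENSIONS:
        return "accepted"
    if ext in CONVERT_EXTENSIONS:
        return "convert"
    return "unknown"
```

For instance, `is_pure_ascii(Path(name).read_bytes())` could be run on a manually entered file before it is uploaded. The extension test says nothing about the actual contents of a graphic file; it only catches obvious mismatches.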

2.4: Protocols and procedures

A number of protocols and procedures specify the details of the operations to be followed outside of the data entry routines (see e.g. chapters 7 and 14).

2.5: “Dirty tree” (Hubble) and “clean tree” (Socrates)

It is essential to develop a clear sense of the difference between the local drives, Hubble as the dirty tree, and Socrates as the clean tree where everything final must go. By the end of the season, Socrates must also include temporary files that have not been processed but are indispensable.

To this end, we need to maintain a clear organization of the folders in each book, so as to have a clearer idea of what the various files are. This is the primary responsibility of the “Number Ones.”

  1. First, there are the canonical folders which are, in alphabetical order, D, G, I, IMAGES, O, SUMMARY, TEXTS, UGR. As you know, D, O and SUMMARY are generated automatically by the programs (as well as E within I).
  2. Second, a single folder, labeled non-canonical, includes folders and files that are important but do not belong with the canonical subfolders (e.g., sketches or scans of manual input sheets that have not been entered).
  3. Third, a single folder, labeled extra, includes folders and files that are kept for possible reference. Possible subfolders include:
    1. TEMP – temporary canonical files and folders on which one is currently working; these must be deleted by the end of the season.
    2. BALLAST – files on which one is not working, but that might be useful and one prefers not to delete.
    3. INACTIVE – files on which one is not working, and that one assumes have lost their current utility; in effect, this is like a RECYCLE BIN, and should be deleted by the end of the season.
    4. ACCRETIONS – for older books, this contains material stored in different venues, which must be sifted through before inclusion or discard.

Hubble will contain everything. Socrates will only contain the canonical and the non-canonical folders.
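
The division between the two trees can be summarized as a small rule set. The following Python sketch is only an illustration under stated assumptions: the folder labels are taken from the list above, the exact spelling and case of the names on disk are assumed to match them, and the function names are hypothetical.

```python
# Canonical folders, in the alphabetical order given in the manual.
CANONICAL = ("D", "G", "I", "IMAGES", "O", "SUMMARY", "TEXTS", "UGR")
NON_CANONICAL = "non-canonical"  # assumed on-disk label
EXTRA = "extra"                  # assumed on-disk label

# Subfolders of "extra" that the manual says must be deleted by season's end.
DELETE_BY_SEASON_END = {"TEMP", "INACTIVE"}


def goes_to_socrates(top_level: str) -> bool:
    """True for the canonical folders and the non-canonical folder."""
    return top_level in CANONICAL or top_level == NON_CANONICAL


def plan_upload(top_level_folders: list) -> dict:
    """Split a book's top-level folders between the two servers:
    Hubble receives everything, Socrates only the clean subset."""
    return {
        "hubble": list(top_level_folders),
        "socrates": [f for f in top_level_folders if goes_to_socrates(f)],
    }


def must_delete_by_season_end(extra_subfolder: str) -> bool:
    """True for the subfolders of "extra" that are not kept past the season."""
    return extra_subfolder in DELETE_BY_SEASON_END
```

With a book containing D, TEXTS, non-canonical and extra, `plan_upload` would send all four to Hubble and only the first three to Socrates. The comparison is an exact match on the folder name, so a differently capitalized folder would be treated as not belonging on Socrates.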
