Mozan Sitewide (Version 2, Beta release)

THE SYSTEM “CYBERNETICA MESOPOTAMICA”

Giorgio Buccellati – January 2024

1. Presuppositions
2. Background
3. Beginnings
4. Cybernetica Mesopotamica: the first phase
5. The archaeological component
6. The Ebla corpus
7. Cybernetica Mesopotamica: the current phase
8. Scope
9. Integration of textual and artifactual data
10. Categorization as primary structural analysis of the data
11. Distributional analysis
12. The concept of electronic publishing
13. System programs and commercial programs
14. Extensive and intensive dissemination of data and programs
15. Portability of computer systems
16. Current objectives – data
17. Graphemic categorization
18. Morphemic categorization

NOTE: cf. also the dedicated website Cybernetica Mesopotamica.

1. Presuppositions

Ancient Mesopotamia is a “dead” civilization in the sense that there are no living carriers of its traditions. It is, however, alive to the extent that we can gain an insight into those inner interrelationships which held it together in the past and (conceptually) still hold it so in the present, as a systemic whole. The major difficulties we have in trying to apprehend such interrelationships are two. On the one hand, we do lack the total universe of manifestations to which the civilization gave rise. On the other hand, we tend, methodologically, to focus more on specific phenomena (however numerous) than on the structure that held, and holds, them together. To meet the latter difficulty, we need a type of distributional analysis that brings out patterns of regularities to which one may attribute meaning. To meet the former, we need diversified access to as comprehensive a body of data as possible.

In both cases, electronic data processing has begun to provide radically new possibilities. The body of data can grow almost indefinitely, without the concomitant problems which would result when using traditional means of study: for instance, the growth in size remains under control no matter how great the quantity of the data, because the concept of data structure is an indispensable dimension of any electronic data base; utilization of the data by other scholars is incomparably more dynamic because analytical programs can be brought to bear directly on the data in electronic form; or again, dissemination of the data (“publication” in the sense of making public) is no longer affected by economic considerations, because the cost of distributing electronic media is only nominal.

It is for reasons such as these that electronic data processing is to be considered, rather than as a simple technique, as a real conceptual revolution which affects, deeply, the very basis of our methods. In other words, the computer is not just a tool which makes traditional scholarship easier, though it certainly is that, particularly in its word processing applications. The computer is also and especially a powerful inducement to change the very categorization with which we conceptualize our perception of the data. In this respect, it may be compared to the introduction of writing or to the introduction of the printing press. With all three phenomena (writing, printing, computers) we witness the development of a dual trend. One, which we may call centrifugal, expands in a capillary fashion our intellectual control over the outer limits of specialized knowledge. The other trend, which we may call centripetal, seeks the common nodes for the different branches of knowledge and underscores their fundamental unity. (Along these lines one may note how one of the significant conceptual transformations made possible, and induced, by the introduction of the printing press was precisely dual in nature in much the same way as was just mentioned. For we notice at that point in time the introduction of both the scholarly journals on the one hand, as vehicles for the dissemination of specialized knowledge, and on the other hand of the Encyclopaedia, as a vehicle for the summation of the same knowledge viewed in its unity.) Electronic data processing meets these two recurrent concerns in ways which vastly overtake earlier conceptualizations (such as those made possible by writing or printing), at the very moment that it builds on them. The retrieval of the most analytical particle of information is possible along search paths which are defined by the most synthetic of structures.
But what is most remarkable about this process of retrieval are two interrelated aspects of data processing. First, such retrieval is practically instantaneous; second, correlations among elements are built into the very process of retrieval. Thus the posing of certain hypotheses as pre-set correlations among categories, and their testing against actual data (almost regardless of the size of the corpus), can now be assumed as a precondition of scholarly discourse, rather than being its initial goal. Wholly new research strategies must emerge.

Such profound innovation does not entail, however, a total break in continuity with the past. What I have just mentioned about the introduction of writing and of printing as intellectual antecedents of the introduction of electronic data processing seems significant in this respect. It is not only that we share with remote intellectual ancestors the basic interests in certain cultural dimensions of the past; we also share with them some fundamental scholarly presuppositions which condition the very nature of scholarly method. Categorization of the data, their retrieval along defined search paths, their correlation in view of discovering recurrence and uniqueness: these are the common concerns scholars have shared across millennia in their effort to attribute meaning to data. Different information techniques (from writing on clay to computers) have been eagerly pressed into service to this end; and, to the extent that they were revolutionary as techniques, they altered the implementation of the research process itself. The specific types of alteration vary with each technique; but the order of magnitude of the alteration as such remains fundamentally the same, whether we compare literate to pre-literate, or electronic to pre-electronic, society. To explain this in simple terms one may use an analogy, and think of the difference between using a microscope to look closely at a specimen and using instead dozens of human viewers. By simply cumulating the visual powers of different individuals one cannot obtain the kind of resolution that a microscope can provide. In other words, the microscope does not perform better or more easily tasks that could otherwise be done just as well by human agents; rather, it does something altogether different.
Thus it is with the computer, through which we are able, to pursue our metaphor, to put “time under a microscope.” The resolution with which we can categorize, retrieve and correlate data is of an altogether different order of magnitude, even though the tasks themselves (of categorization, retrieval and correlation) are the same as those performed by scholars before the advent of computers.

These initial and general remarks have as a goal to indicate that with the introduction of the system Cybernetica Mesopotamica I wish not only to provide useful tools of research along well established lines, but also to address broader issues of method. To that extent the system as a whole is to be considered an experiment in method: not an experiment in data processing, but an experiment in Assyriology. While the data and programs as individual components follow common standards of electronic data processing, some aspects of the conceptualization, the details of the categorization, and especially the overall design of the system conceived as a structural whole are experimental in nature, in line with the presuppositions I have just briefly stated. Since this is the first volume to appear in the new publication phase of the system (see further below under 1.2-7), it seems like a good opportunity to present here a description of the system as a whole. The specific experimental aspects of the system will be explained below in sections 3 to 5, which deal with the scope and the goals of the system Cybernetica Mesopotamica. Before that, it may be useful to review some of the stages through which the project has progressed over the years.

2. Background

The system Cybernetica Mesopotamica is the result of a long-standing research project which has gone through many diverse transformations over the years. Even though the growth of this project may well be of limited interest except for the parties which came to be involved with it over the years, I would like to chronicle here its development for two reasons. (I presented various overviews of the project at different scholarly meetings, beginning with the Rencontre in Liege of… . For a brief recent summary see also Buccellati forthc.87a.) The first is to remember the individuals who have contributed in so many different ways to the development of the research; rather than a mere alphabetical list, a reasoned chronological description of the project will more properly highlight their contributions as they affected the nature of the research and made it possible in the first place. A second reason is to provide a chronological framework for the project which might help explain some particular dimensions of its current scope and goals; fortunately, in fact, the long developmental period was not, for all its frustrations, a sterile exercise, but rather a long gestation whose fruits are beginning to appear now. I will divide the history of the project into seven phases.

3. Beginnings

I started applying data processing techniques to Mesopotamian materials back in 1968, at a time when electronic applications in the humanities were still extremely few, the availability of machines quite limited, and actual access to both hardware and software very cumbersome. I began to work on the treatment of selected Old Babylonian texts, especially from a graphemic and a morphological point of view. I was assisted for the philological part by John L. Hayes (then a student working under my supervision) and for the electronic part by Arthur Sorkin (then a student in Computer Sciences at UCLA). These early efforts were entirely supported by the Research Committee of the Academic Senate at UCLA.

1.2. Linguistic analysis of Old Babylonian

By 1971 we had progressed to a point where we felt ready for a major grant application to the National Endowment for the Humanities, a grant which we were in fact awarded. Under the terms of the grant, which lasted from 1972 to 1977, we were to produce a computerized data base of Old Babylonian letters and a grammar of Old Babylonian built on those data. It was named Old Babylonian Linguistic Analysis Project, and was known under the acronym OBLAP. (See the early description of this project given in Buccellati 1977.) I. J. Gelb served as a very active consultant on this project, and helped immeasurably with matters of philological and linguistic interpretation. Hayes remained my major assistant for the philological side, together with a number of other students, of whom I will remember especially Douglas Cargille, Michael Desrochers, Stanley Edwards, Paul Gaebelein, Matthew L. Jaffe, Yoshitaka Kobayashi, Richard D. Patterson, William R. Shelby. The following UCLA dissertations (For full bibliographical references to these and other UCLA dissertations cited in this section see Chapter 5 below.) emerged eventually from work on the project: Desrochers on the Old Babylonian texts from Dilbat (19…), Gaebelein on graphemic analysis of Mari letters (1977), Jaffe on the Old Babylonian letter form (19…), Kobayashi on graphemic analysis of Old Babylonian letters from the South (1975), Patterson on the syntax of the Old Babylonian letters (19…). The electronic component of the project was under the extremely competent and devoted supervision of John L. Settles, who was working then as a professional programmer and began at the same time graduate work in Akkadian at UCLA. Our data base was completed as planned during the five years of tenure of the grant. It consisted of a full graphemic rendering, and an extensive morphological coding, for all the Old Babylonian letters published until that time. 
The graphemic encoding manual discussed below, as well as the morphological code currently used by our project for onomastic analysis, derive directly from the coding system established during that phase of our work.

A problem which emerged at the end of the project was that of the distribution of the materials elaborated with regard to the first goal of the grant, i.e. the data base itself. We had made substantial efforts in preparing for typeset quality printing, including a complete set of cuneiform signs which was produced by Yoshitaka Kobayashi with a technology made available by Sal Fallone. (A description of the system will be found in Buccellati 1977, p. …) But the sheer size of the results was forbidding. Printouts of the basic sign concordance ran upwards of 10,000 pages; morphological concordances which we had completed for sections of the data base were even larger: the physical volume was such that any type of standard publication was out of the question. Thus I kept making available tapes and paper printouts to interested colleagues, and portions of the printouts upon request, and gave much thought to ways of compressing the data through editorial compacting of one type or another. This resulted eventually in the first version of Cybernetica Mesopotamica, which began in 198… (about which more will be found below).

The second goal of the grant, i.e. the publication of a grammar of Old Babylonian, was delayed for a number of personal reasons. While a complete early version of the grammar, which resulted from work on the grant, was distributed freely to several colleagues, and was used regularly over the years in my classes at UCLA, the final version has been completed only recently and will go to press in the immediate future.

1.2. Expansion to other philological areas

While considering alternative ways of distributing our materials, we were expanding on both the buildup of the textual data base and the types of application. Major additional bodies of texts came to be added, both by our Los Angeles based staff and by colleagues elsewhere. The texts elaborated in Los Angeles include the El-Amarna corpus (by Hayes) and the Akkadian corpus of Ugarit (by Thomas Finley); these were supplemented later by the texts from Alalakh, edited by Dr. Guy Bunnens of Bruxelles, who came as a Visiting Scholar to Los Angeles. Both Hayes and Finley eventually completed their dissertations on linguistic aspects of their respective corpora (in 19… and 19… respectively). At the same time, we began a fruitful collaboration with two colleagues, Claudio Saporetti in Rome and Olivier Rouault in Paris. Saporetti began to work (as far back as 1975) on the Assyrian corpus (of which four volumes and one disk came to be published eventually within Cybernetica Mesopotamica: GC 1; DSC 1; DSC 2; DSC 3). Rouault began to work on a group of texts from Mari, in addition to the letters which had already been elaborated by Gaebelein at UCLA. Since the scope of the project had expanded beyond the limits of the Old Babylonian period, its name was changed to Computer Aided Analysis of Cuneiform Texts, known under the acronym CACTUS.
While adding new corpora to our data base, we were also branching out into other directions within the philological domain. As an aid to the study of the texts, we prepared an electronic version of portions of the Akkadian dictionary; primarily under the supervision of Robert Keller, with the assistance of Michael Roquemore, we entered the full lemmata and the text references (i.e. the bibliographical references only, without the actual text passages) from the first seven volumes of the CAD.
Under the supervision of Patricia Oliansky, we began to work on a historical analysis of the texts. Conceived as a sort of computerized topical index, the Historical Categorization of Cuneiform Texts (HC) was meant to supplement the (Akkadian) word indices by referring to concepts and facts which are given in the texts not so much lexically as contextually. Whether present as explicit categories or subsumed implicitly by the situation described in the texts, the concepts included in HC are understood as operative categories of the social milieu from which the texts stem, and are articulated accordingly in hierarchical fashion, following our structural understanding of the society which gave rise to the texts in the first place. The first corpus to be analyzed was that (relatively short in size) of the letters emanating from the royal chancery of Babylon during the first dynasty; having gone through a long gestation period, this project will come to fruition shortly with the publication of the first disk and its accompanying User’s Manual.

4. Cybernetica Mesopotamica: the first phase

At the same time that we were proceeding with our philological work on the texts, from the different angles just described, I was still trying to cope with the problem of the distribution of the data (other, that is, than through tapes and printouts made available on an ad hoc basis to interested colleagues). Two external factors contributed to shape this phase of the project. The first was the introduction of minicomputers, which we began to use next to the mainframes. In 1979 (??) we began a close collaboration with David W. Packard, who was involved in designing a highly innovative combination of hard- and software products specifically geared to the humanities (this was the forerunner of what eventually became the Ibycus microcomputer system). My involvement was partly through Undena Publications, a scholarly publishing house which I had started in 1976, and which had as one of its goals the specific issue of the distribution of the large data bases resulting from our computer work. For Undena, Ibycus began to provide very sophisticated (if, at the time, still partly experimental) typesetting services. Typesetting functions were an ideal component of any attempt to deal with the distribution problems we were facing, particularly with a computer environment such as Ibycus was offering, whereby we could transfer data from the mainframe, use optical scanners (in their early versions), and produce highly professional typeset results.

A second, concomitant factor was the need to achieve better technical coordination for the wide variety of projects which had come to be developed on the main-frame computers (at UCLA and Pisa), both among themselves and with the minicomputer at Ibycus. This task was very ably discharged by David A. Holzgang, who joined the project at that time in place of Settles. On the one hand he provided the necessary coordination among the various hardware systems being used, and on the other he streamlined our internal operations, particularly with regard to file handling and maintenance. He was assisted by Jaffe and Judith R. Paul.

It was under such premises that I began at that time to plan within Undena what I called a “Collection” of different book series, to which I gave the name Cybernetica Mesopotamica. The prospectus of this “Collection” was given with the first volume published (which appeared in 19..), and is reproduced here as Fig. 1. While it may have seemed over-ambitious, especially since only four volumes have in fact been published, it was by no means an exercise in wishful thinking. For each of the series listed there, we had already at the time substantial working copies of preliminary manuscripts which served well our in-house purposes. Also important, in my view, was the attempt to make explicit the overall scope of the project, in its conceptual ramifications. There was still no notion at the time that we might be able to use magnetic media for anything more than making available specific tapes on an ad hoc basis.

5. The archaeological component

While in the beginning our efforts in terms of electronic data processing were limited to philological data, I began to plan for an archaeological component from the very start, in 1976, of the Joint Expedition to Terqa, of which I served as Director with Marilyn Kelly-Buccellati as Co-Director. From the beginning, two very distinct avenues of research were pursued, one dealing with stratigraphy, the other with typology.

For stratigraphic analysis, it was felt indispensable to be able to implement electronic data processing directly in the field. Accordingly, in the early years of excavation, when there was no thought yet of microcomputers at the excavation site in Syria, we prepared conceptually for a computer oriented coding system of the stratigraphic data, though we explicitly indicated that this was as yet a “non-digital” implementation. (Buccellati and Kelly-Buccellati 1978.) In 1981 we were the first to bring a microcomputer for archaeological field work in Syria: even though it was a very bulky CP/M type computer, with a minimum of memory and disk storage, it demonstrated the feasibility of our approach and it provided a strong incentive for continuing in spite of the immense logistic difficulties. The conceptual concerns for an integrated categorization of the entire stratigraphic process were translated into an all-comprehensive “grammar of space,” and into a “global record” as the correlative data base. A preliminary article on a graphic component of the system was published in 1983 (Buccellati and Rouault 1983), and in the same year I gave a first public demonstration of the system in Terqa on the occasion of the International Conference on Der ez-Zor, held in that city by the Directorate General of Antiquities and Museums of Syria. We continued with the application of our methods both in Terqa and at the other sites (Mozan, Qraya, Ziyada) being excavated under the sponsorship of IIMAS (The International Institute for Mesopotamian Area Studies). The first disk publication of the global record from Mozan is currently in preparation, as well as a User’s Manual which will describe in detail this component of the system.
As for the typological record, M. Kelly-Buccellati started working at the same time on a categorization of the cylinder seals of the Old Babylonian period, for which she has prepared a comprehensive formal and iconographic grammar. The conceptual organization of the material goes back to the years when the Terqa excavations were getting under way, and was described in a number of publications (Kelly-Buccellati 1977; 1979. The substance of the latter article was first presented in a public lecture given on the occasion of the International Symposium on Ugarit, held by the Directorate General of Antiquities and Museums of Syria in Lattakia in …) and various public lectures. This project will come to fruition shortly with the publication of a disk edition of the data, accompanied by a correlative User’s Manual.

Several UCLA students have been working on various aspects of archaeological data processing for their graduate research. Loyola Seymour completed her Master’s thesis on a typological categorization of figurines (19…); Daniela Buia is completing her dissertation on a categorization of Terqa pottery, which builds on the earlier system established for the same corpus by M. Kelly-Buccellati; Stephen Reimer is working on the global record of the excavations at Qraya; and Stephen Hughey is working on practical and theoretical aspects of spatial controls in stratigraphic analysis. In addition, Mark Chavalas, who completed his dissertation on domestic architecture at Terqa, is working on the publication of the global record of the house of Puzurum at Terqa.

6. The Ebla corpus

Following the establishment (in 1977) of the International Committee for the Publication of the Texts of Ebla, to which I was appointed a member, I assumed the task of preparing an electronic edition of the texts of the royal archives, and of working on an onomastic repertory of the texts. The graphemic complexities of the Ebla writing system required special adaptations. Also, the rapid pace at which the texts were being published and the growing understanding of the graphemic system necessitated that special attention be paid to harmonizations among different readings as published in the various volumes. This occupied us from the beginning, and it is discussed at length in the chapters below. Hayes continued to work closely on this aspect of the project, even after he moved to a teaching post at the University of California, Berkeley. James H. Platt took on a major role in coordinating the activities and individuals working on the project, with the close collaboration of Joseph M. Pagan and the assistance of Mark A. Arrington. They have also begun to work on their dissertations, which concentrate on various aspects of the project: Platt dealing with Ebla graphemics, Pagan with linguistic aspects of Ebla onomastics, and Arrington with semiotic aspects of onomastics (including name-giving) in the third millennium.

The work on the Ebla corpus provided special impetus for the development of the system as a whole, because of the special substantive interest of the corpus and the unique set of graphemic problems which it presented. Also, the range of outside collaborators broadened considerably, and this contributed not only to widening the substantive coverage of the data, but also to fine-tuning our conceptual approach to categorization. Alfonso Archi, as the chief epigraphist of the Ebla Expedition, strengthened immeasurably the philological merit of our electronic edition by providing a number of new readings derived both from a fresh understanding of the published texts and from systematic collations on the original tablets. Lucio Milano came to Los Angeles for an extended research period and then to teach regular classes at UCLA on the texts of Ebla: he participated actively in the formalization of both our graphemic conventions and our overall understanding of Ebla philology; he also ensured a smooth coordination of our electronic hardware systems between Los Angeles and Rome. Finally, we were fortunate to enlist the assistance of Pelio Fronzaroli for the analysis of the onomastic data.

7. Cybernetica Mesopotamica: the current phase

With the advent and the widespread use of microcomputers, the overall scope of the system changed drastically. We could now plan on a systematic distribution of the data on magnetic media, which would not only be more cost-effective, but would also allow for a true interactive utilization of the data. Accordingly, I revised the very concept of the system, while retaining the name Cybernetica Mesopotamica as originally proposed in 19…

The main difference between the original and the revised conceptions may best be gauged by comparing the 19.. prospectus (reproduced here as Fig. 1), and the current prospectus, which is found below in Section 1.6. The 19.. prospectus envisaged various series of paper editions which offered primarily the printed outputs of given computer operations; such outputs were, by necessity, both limited in scope and frozen in their paper embodiment, since the data from a given corpus could not be interfiled or cross-indexed with those of other corpora without a wholly new paper edition. The current prospectus, on the other hand, envisages primarily a collection of disks which offer full data bases in their categorized version, and attendant programs which allow fresh, interactive utilization of any combination of corpora as desired; it also provides for various other types of supporting material.
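
The contrast between a frozen paper index and the interfiling of corpora on demand can be illustrated with a minimal sketch. It is not drawn from the system's own programs: the function names, the corpus labels and the two sample texts are hypothetical, and a word is simplified to a whitespace-separated string of signs.

```python
from collections import defaultdict

def build_index(corpus_name, texts):
    """Map each word to the (corpus, text id, line number) places where it occurs."""
    index = defaultdict(list)
    for text_id, lines in texts.items():
        for line_no, line in enumerate(lines, start=1):
            for word in line.split():
                index[word].append((corpus_name, text_id, line_no))
    return index

def interfile(*indices):
    """Merge any number of per-corpus indices into one cross-corpus index."""
    merged = defaultdict(list)
    for index in indices:
        for word, refs in index.items():
            merged[word].extend(refs)
    # Sort words and references so the merged index has a stable order.
    return {word: sorted(refs) for word, refs in sorted(merged.items())}

# Hypothetical sample data, for illustration only.
corpus_a = {"A1": ["a-na be-li-ia", "um-ma"]}
corpus_b = {"B1": ["um-ma", "a-na a-hi-ia"]}

combined = interfile(
    build_index("corpus_a", corpus_a),
    build_index("corpus_b", corpus_b),
)
print(combined["a-na"])
```

On paper, adding a third corpus would call for a new edition of the combined index; here it amounts to passing one more argument to `interfile`.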

The 19.. “collection” of book series was subdivided into three subheadings, which are carried over conceptually, though not mechanically, in the current “collection” of disk series. (1) The 19.. subheading Mechanisms referred to the description of codes and systems, method and theory, which are now given in the series of Manuals (printed volumes, accompanied by a disk version in the Electronic series). (2) The 19.. subheading Data is carried over in the series of Texts and Artifacts, but with one major difference: whereas the 19.. prospectus envisaged publication of the data as such in traditional format, and only in cases where the texts were not otherwise available in a modern edition, the current prospectus foresees a categorized disk edition of every body of data included in the system, under the major headings of Texts and Artifacts; printed Manuals will accompany only selected disks, generally those which inaugurate a series (as is the case with the current Manual for the graphemic version); traditional text editions will still be published in book format (under the subheadings DSC and DSA) for data which have not otherwise been published in that format. (3) The outputs which in the 19.. prospectus were envisaged under the heading Results (The further distinction between “Categorization” and “Analysis” was the same which is explained below under 1.3.2 and 1.3.3: categorization was to provide a primary structural interpretation of the data, and analysis instead a highly differentiated breakdown according to multiple sets of variables.) are now available in a format which is incomparably more extensive and more flexible, namely as outputs obtained interactively through the use of Programs.
(4) Additional series which will be available on disk, and were not envisaged in the earlier prospectus, deal with subsidiary materials which are not central to the system as such, but can be useful either for the study of Mesopotamian civilization (Bibliography; Reference) or for the implementation of data processing (Utilities). The comparison between the two prospectuses is tabulated in Fig. 3.

Of the titles originally envisaged within the 19.. prospectus, three have been published within the series DSC, and one volume within the series GC. The volume GC 1, which had been reserved for the publication of the Old Babylonian texts from Terqa, appeared instead in disk form in 1987 under the label CMT1a as the first of the disks to be published within the system (Buccellati et al. 1987), together with the electronic version of the Middle Assyrian Laws, which appeared as CMT2a. (Saporetti 1987.)

The marked discrepancy between the number of publications anticipated in the prospectus and those which actually appeared was primarily the result, as should be apparent by now, of the shift in orientation caused by the introduction of microcomputers and the consequent possibility of distributing data and programs on disks. Such a shift has caused some frustrations and disappointments, because data bases which were essentially ready for publication had to be held back. This affected especially several of the corpora which Claudio Saporetti had been preparing for publication, besides the Old Babylonian corpus on which we had been working in Los Angeles since the early stages of the project. It also affected the public perception of the project, as evidenced, for example, in the review which Edzard wrote of DSC 1 (Edzard 19..). The criticisms raised have merit when viewed within the narrow limits of the single volume to which they apply. It is indeed true that the volume in and of itself presents both too much and too little detail: too much in that the graphemic word index for the Middle Assyrian Laws represents a sort of overkill for such a limited body of data, and too little because a full assessment of the system could only be made if both more indices were available for this limited corpus and more corpora were available for the same type of indexing and correlative comparisons. Edzard’s criticisms would have been met if more volumes of the GC series had appeared as planned: given the necessity, under which we were operating at the time, to publish these indices in paper format, such individual volumes (published in an inexpensive format) would have provided sets of interrelated indices which were the best possible under the circumstances.
The radical alteration of the circumstances, which has made electronic distribution of data and programs a new reality of the scholarly endeavor, should show even more effectively how the original conceptual scheme of Cybernetica Mesopotamica, if not its early published embodiment, had merit and value.

8. Scope

The current scope of the system emerges already from the brief history of its development as I have just outlined it. Since it is clearly more than a simple electronic equivalent of paper editions, the system must be viewed and used specifically in an electronic fashion for its advantages to become apparent. Yet the system does not, by any means, aim to be at the cutting edge of programming or data processing as such; the emphasis is rather on providing the minimum common denominator that may apply to the widest number and categories of scholarly users. Thus the innovative aspects of the system may be found more properly in its overall scope and configuration than in any particular data processing technique. (A similarity may be noted with commercial products such as games, which typically run on the broadest possible range of hardware and provide graduated instructions catering to both the novice and the most advanced of players, or with telecommunication systems, such as electronic bank accounts or information retrieval services, which emphasize breadth and range of access rather than complexity of data manipulation.) In what follows I would like to identify what are, in my perception, the more salient aspects of the system Cybernetica Mesopotamica.

9. Integration of textual and artifactual data

In an effort to view Mesopotamian civilization as an organic whole, and to bridge the gap between philology and archaeology, the system stresses on an equal footing the analysis of both epigraphic and artifactual materials. The nature of the integration between the two types of analysis is not such that direct correlations are established between the two in terms of data processing; in other words, the data bases are not construed as some sort of all-inclusive hypercard system, whereby one may pass at will from linguistic expression to material culture. Rather, integration is to be understood simply in the sense that both types of data are subjected to the same kind of in-depth categorization, and thus are susceptible to the same kind of structured retrieval and correlation approaches. Conceptually similar to such systems as the [cultural management system, see note from Mike Fuller; also compare iconographic archives], Cybernetica Mesopotamica distinguishes itself in that the type of categorization embedded in the data is much more differentiated and specialized, and also in that many of the data included are previously unpublished. On the other hand, the total amount of data included is for now relatively limited (which is why the whole system, as currently implemented, remains rather experimental in nature), and there are certain important dimensions missing which should otherwise be considered for a comprehensive study of Mesopotamian civilization, such as ethnography or geography.

10. Categorization as primary structural analysis of the data

The fundamental hallmark of the data distributed as part of our system is that they are highly structured. For each type of data there is in effect an attendant “grammar” which attempts to define holistically and hierarchically the pertinent universe: the “grammatical” definition is holistic because it builds on a structural method, which presupposes, and seeks to identify, a structural whole from which the data emerge, rather than following on an ad hoc basis the individual manifestations in their disjointed appearance; and it is hierarchic because it is conceived in the form of a tree structure which descends through a variety of nodes in capillary fashion to account for every last detail of the universe of data. (Such statements may appear at first reading too abstract and inconsequential for practical purposes, but in fact the concrete and practical benefits of such structural coherence are borne out daily in the practice of either linguistic analysis or archaeological field work. While this is not the place for a detailed justification of this point of view, I hope that the graphemic categorization presented here, and its attendant use with the Ebla corpus, may begin to indicate the merits of such a “structural” approach. For a related example drawn from archaeology, about which I will write at greater length in a future volume of this series of Manuals, I may point out how the work on stratigraphic categorization has resulted in a very complex and comprehensive system of coding and analysis which changes in essential ways the mechanism for arriving at strategy decisions during the excavations, since it provides much greater capillary control on stratigraphic details; it also adds a whole new dimension of objectivity to the record, since it makes it possible to publish, if one so chooses, the full range of observations made in the field (the “global record”), without the selectivity that has otherwise been necessary in archaeological publications.)
These new “grammars” are applicable to the data in such a way that the massive data storage accumulated in the process does not become something like the inarticulate archaeological mounds we excavated in the first place. For it is ironically quite possible to re-bury the data inside the computer if the retrieval channels are superficial and few. Instead, our effort has been to give a highly structured configuration to the input, so that all the various components may hold together more effectively and the resulting yield may be all the more powerful.
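
By way of illustration only, the notion of a hierarchical “grammar” conceived as a tree may be sketched as follows in Python. The node names are invented for the example and do not reproduce the actual categories of the system; the two functions (also hypothetical) check whether a given categorization follows a path that exists in the tree, and list every terminal category that the tree accounts for.

```python
# Hypothetical category tree: every node name below is an assumption made
# for illustration, not the actual "grammar" of Cybernetica Mesopotamica.
GRAMMAR = {
    "sign": {
        "syllabic": {"CV": {}, "VC": {}, "CVC": {}},
        "logographic": {"word": {}, "determinative": {}},
    }
}


def is_valid_path(path, tree=GRAMMAR):
    """Return True if the sequence of labels descends through existing nodes."""
    node = tree
    for label in path:
        if label not in node:
            return False
        node = node[label]
    return True


def leaf_paths(tree=GRAMMAR, prefix=()):
    """List every root-to-leaf path, i.e. every terminal category of the tree."""
    paths = []
    for label, children in tree.items():
        here = prefix + (label,)
        if children:
            paths.extend(leaf_paths(children, here))
        else:
            paths.append(here)
    return paths


if __name__ == "__main__":
    print(is_valid_path(("sign", "syllabic", "CV")))
    for path in leaf_paths():
        print(" > ".join(path))
```

A datum coded with a path that the tree does not contain is rejected at input, which is one simple way in which a structured configuration of the input may keep the data from being re-buried.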

11. Distributional analysis

If categorization is the basis of the system, then distributional analysis is its ultimate goal. The categorization process produces, as we have just seen, a primary structural analysis of the data, in that the data are given, yes, as data, but filtered through a given “grammar” which represents a specific understanding of structural relationships. While the data retain their full documentary value, they are not presented in a raw state, and as a result their utilization is that much more powerful. To put it simply, we might say that categorization establishes only minimal and simple correlations among the data, whereas with distributional analysis correlations can grow to very high degrees of complexity. In other words, categorization builds on relatively simple distributional arrays (“paradigms,” as they are known in traditional grammatical terminology), while analysis proper establishes very complex arrays, derived from multivariate and highly differentiated correlations.

The system programs, which are made available together with the data, provide a first step in this direction. Other programs, to be made available in the future, and commercial programs which can be applied to the system data, will extend even further the possibilities which are intrinsic in the data so categorized. The distributional patterns which emerge from the application of programs will thus provide answers to specific conceptual questions posed by the scholar. And these patterns will be based, as with no other type of documentation, on two interrelated, fundamental conditions. On the one hand, the emerging distributional patterns will rest on a mass of documentary data so large as to make statistical inferences that much more meaningful. On the other hand, the absence of given phenomena will add considerable weight to the notion of “meaningful zero,” which is so critical for any kind of structural analysis. It is in this respect that the application of electronic data processing can come as close as possible to the ideal situation where living informants are available, thereby lending new life, as it were, to the “dead” civilization of ancient Mesopotamia.
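
The notion of a distributional array, and of the “meaningful zero” within it, may be made concrete with a minimal sketch in Python. The records below are invented for the example (they are not drawn from any corpus of the system), and the function names are hypothetical; the point is only that a cross-tabulation lists every possible combination of two categories, so that the combinations which never occur become visible as cells with a count of zero.

```python
from collections import Counter
from itertools import product

# Hypothetical categorized records (sign value, position in the word).
# These are illustrative only and not taken from any published corpus.
RECORDS = [
    ("ba", "initial"),
    ("ba", "medial"),
    ("um", "final"),
    ("um", "final"),
    ("ba", "initial"),
]


def cross_tabulate(records):
    """Count every (row, column) combination, including those never attested."""
    counts = Counter(records)
    rows = sorted({row for row, _ in records})
    cols = sorted({col for _, col in records})
    # Counter returns 0 for a missing key, so unattested cells appear as zeros.
    return {(row, col): counts[(row, col)] for row, col in product(rows, cols)}


def zero_cells(table):
    """Return the combinations that are possible in the array but unattested."""
    return sorted(cell for cell, n in table.items() if n == 0)


if __name__ == "__main__":
    table = cross_tabulate(RECORDS)
    for cell, n in sorted(table.items()):
        print(cell, n)
    print("unattested:", zero_cells(table))
```

Whether a zero of this kind is in fact meaningful depends, as stated above, on the size of the documentary base: with five records it proves nothing, while with a very large corpus it may carry considerable weight.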

12. The concept of electronic publishing

Data and programs are distributed on disks which are conceived as standard vehicles for the dissemination of scholarly information, much as any other conventional type of publication. This is already exemplified by the disks currently available, and it means the following with regard to both data and programs.

As for the data, they are not simply an electronic copy of materials otherwise already available in printed form; rather, the disk edition represents an independent elaboration according to specific criteria (e.g. graphemic or morphemic, stratigraphic or typological). These criteria are spelled out in detail in accompanying files which contain all the specifics of pertinent codes and data structure. A variety of additional introductory files (which are explained in more detail below) also contribute to make of the distribution disks an autonomous and self-contained channel of communication. Printed Manuals such as this one will be published as supporting tools only occasionally, in order to describe in this more traditional format the special characteristics of each different type of structural categorization; this will obtain especially whenever a given type appears for the first time, as is the case in this volume for graphemic analysis.

Similarly, distribution disks containing programs should be viewed as regular items of scholarly publication. Programming is a discipline all by itself, which obviously does not belong as such to the field of Mesopotamian studies; and in this respect, programs qua programs would be either too ordinary to be of any interest for a computer scientist, or too complex to be understandable by a normal Assyriologist or archaeologist. What there is of scholarly interest in a program, an interest which ought to be shared by Assyriologists and archaeologists, is not the programming component as such (e.g. language or algorithms used), but rather the system design which defines the problems to be asked and the ways in which the program is to deal with them. It is such a problem orientation of data processing which has its place as a scholarly publication on disk.

13. System programs and commercial programs

The clear definition of codes and data structure will allow scholars with some expertise in data processing to make use of the data by applying either commercial programs or programs of their own. But it is also an integral component of our approach to provide specific programming support, and this in two primary ways. On the one hand, there are system programs, i.e. programs written specifically for the system Cybernetica Mesopotamica, which operate exclusively on the data as structured and which provide answers to a previously defined set of questions. (Such are, for instance, the indexing programs for texts in graphemic format which are described below in this volume.) On the other hand, there are programs which prepare the data for “importation” into some of the major or more suitable commercial programs available (such as dBASE IV, Reflex, or WordCruncher); instructions will also be provided on how best to make use of these programs. Since the data distributed as part of our system are all in basic ASCII characters, importation into most word processors will be automatic.
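
What a program that prepares the data for “importation” might look like can be suggested by the following sketch in Python. It is not one of the system programs: the input layout (one record per line, fields separated by a vertical bar), the field names, and the sample records are all assumptions made for the example. The sketch rewrites such records as comma-separated values, a format which most data base and spreadsheet programs can read.

```python
import csv
import io

# Hypothetical input layout: text identifier | line number | word.
# Both the layout and the sample records are assumptions for illustration.
SAMPLE = "T1|1|a-na\nT1|1|s^ar-ri-im\nT1|2|be-li-ia\n"


def to_csv(raw, fieldnames=("text", "line", "word")):
    """Rewrite bar-delimited records as comma-separated values with a header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fieldnames)
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        writer.writerow(line.split("|"))
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(SAMPLE), end="")
```

Because the records contain only basic ASCII characters, the output needs no special handling of character sets, which is the practical advantage noted above.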

14. Extensive and intensive dissemination of data and programs

Dissemination of the system materials on disk will have an altogether different impact from the conventional types of publication on paper. In the first place, electronic dissemination will be much more extensive, in the sense that the range of access to the materials will be much wider. Since disks may be made available at practically no cost, there is no limit on how widely they may be distributed. It is for this reason, too, that private distribution is easy and common. An inherent difficulty, however, is that materials so distributed may often be inadequately documented or, in fact, not even sufficiently self-standing to be used without personal, ad hoc instructions. In addition, personally distributed materials are not “published” in the specific sense that they are not available for inspection at will. This situation, which is typical of the in-house breed of products, is not compatible with the requirements for normal scholarly publishing. The distribution approach which we propose within Cybernetica Mesopotamica, by contrast, addresses precisely these requirements: disks are available through commercial channels (though at a nominal cost), they are prepared in a final and self-standing form, they are carefully documented, and they are submitted to the same peer review that obtains for paper publications.

Electronic dissemination is also more intensive in that there is no real limit to the amount of detail which can be included in the medium; at least, there is no technical or economic limit. The only true limit to be observed is conceptual in nature, i.e. it derives from the inner logic of the data and from the degree to which proper categorization can be provided, with the attendant documentation. It is important, in other words, that data “published” on disk not be a mere dump of personal notes, but rather be structured in a clear and well documented manner. In some areas, in particular, one will feel very significantly the lifting of traditional restrictions on the size of the corpus to be published, as in the following two examples. In the field of philology, the publication of data in diverse formats (e.g. graphemic, morphemic, or topical) will not appear extravagant as it would should one produce such alternative formats on paper. In the field of archaeology, it will be possible for the first time ever to provide the entire documentation of the excavation (what I call the “global record”), thereby adding a measure of documentary integrity which the field as a whole has so far been unable to attain.

15. Portability of computer systems

As already mentioned, the system aims for a minimum common denominator through which it may reach effectively a wider range of scholars, including those who may not be very familiar with computer operations, much less with programming as such. At the same time, however, the nature of the data is such that very complex data processing techniques may be brought to bear on them and yield optimal results. This has colored some of the operational decisions which have moulded the system, among which are in particular the following.

The data are all in low range ASCII characters (i.e., roughly, the same characters which are found on the keyboard). This means, for example, that we have used combinations of such characters to express certain symbols (e.g. s^ for š): while such a solution is awkward in appearance and cumbersome for some operations (e.g., for sorts), it allows data to be imported by any type of computer and any type of operating system, without having to implement special files for the screen or the keyboard. We will, on the other hand, provide separate programs (like those described below in Chapter 3), which will rewrite the data in a variety of intermediate alternative formats, suitable for specialized screen display or printer output.
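
Both the awkwardness for sorts and the rewriting into a display format can be shown in a short sketch in Python. It is an illustration under stated assumptions and not a description of the programs of Chapter 3: only the pair s^ (for š) comes from the text above, the upper case pair and the sample words are added for the example, and the function names are hypothetical.

```python
# Only "s^" for "š" is attested in the text; the upper case pair is an
# assumption, and a real table would contain further combinations.
ASCII_TO_DISPLAY = {
    "s^": "š",
    "S^": "Š",
}


def to_display(text):
    """Rewrite ASCII combinations as single symbols for screen or printer."""
    for ascii_form, symbol in ASCII_TO_DISPLAY.items():
        text = text.replace(ascii_form, symbol)
    return text


def sort_key(word):
    """Order š after every plain s, which a raw ASCII comparison does not do."""
    key = []
    for ch in to_display(word.lower()):
        if ch == "š":
            key.append(("s", 1))  # same base letter, ranked after plain s
        else:
            key.append((ch, 0))
    return key


if __name__ == "__main__":
    words = ["s^arrum", "sukkal", "s^a", "sa"]  # illustrative words only
    print(sorted(words))                # raw ASCII order
    print(sorted(words, key=sort_key))  # order with š following s
    print([to_display(w) for w in words])
```

In the raw ASCII sort the caret (code 94) precedes every lower case letter, so the words written with s^ are placed before those written with plain s; the key function restores the expected sequence while leaving the stored data untouched.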

The programs provided with the system will follow rather simple routines, with universally acceptable display formats. The operating system presupposed is generally MS-DOS 3.00 or higher, which is currently the most widely used and which lends itself more easily than most to conversion programs if needed.

Any type of hardware configuration or operating system may use directly, or easily import, the data; depending on demand, we may also choose to provide the data in the future in alternative formats, such as Macintosh. It is already anticipated that the data will be made available on compact disk for use with the Ibycus system. On the other hand, only IBM or IBM-compatible computers will be suitable for running our system programs. However, no special cards will be required, so that even the simplest IBM-type configuration will be suitable.

16. Current objectives – data

In this and the following section I will outline the specific segments of the system on which work is currently being pursued. Many of these titles are in an advanced state of preparation, and should be published within the next one to two years: at that point, the precise nature of the system will hopefully become self-explanatory to anyone who will use it, as should already be the case, to some extent, with this Manual and the disks to which it refers. The following description of the projected titles will help for now to envisage the direction in which we are going. A comprehensive prospectus of titles, including both data and programs, is given as Fig. 3.

The prospectus of publications best reflects the scope of the project, in that it covers the broad spectrum of Mesopotamian civilization. To stress further a point that has already been made above, the project and the publications which emanate from it remain experimental in method and partial in coverage. The prospectus will give an indication of the scope intended. What is experimental is not so much the application of a given program or the utilization of a given corpus, but rather the concern for a coherent structural understanding of Mesopotamian civilization. Viewed as a systemic whole, this civilization is conceived as susceptible of an integrated type of categorization, and of the attendant analysis that can be performed on it. The substantive results to be expected will be the more valid the broader the coverage and the more far-reaching the programs. But it is important to understand my purpose in taking the initial steps and to appreciate the fundamental presuppositions which underlie it. As stated already, my assumptions are that a thorough and rigorous categorization is possible in the first place, that the canons for the categorization of different cultural embodiments are fundamentally similar, and that the distributional analysis which can be built on it is the most powerful and objective tool for our research. In other terms, the various grammars proposed should add up eventually to a unified grammar of Mesopotamian civilization.
In what follows, I will add a few brief details about the projects which have been outlined in the prospectus, where more specific references to authors and titles are also provided. These titles should provide within a span of a few years a somewhat more substantial basis for an assessment of the system as a whole.

17. Graphemic categorization

The first two disks published in the system belong to this sub-series: the Old Babylonian Texts from Terqa and the Middle Assyrian Laws.
At or about the same time as this Manual we will also publish the electronic edition of the first four volumes of ARET (Archivi Reali di Ebla – Testi), as well as the electronic edition of ARET 9. The latter will mark the first occasion that an electronic edition appears simultaneously with the corresponding paper edition. Work on the remaining volumes, through ARET 10, is already in an advanced state of preparation.

Work is actively underway for the completion of both the first and the second corpora with which the project began: the Old Babylonian Letters and the Amarna Texts.

For the future, we plan to continue publishing the electronic edition of the volumes of ARET as they appear. In collaboration with C. Saporetti, we also plan to make available in electronic format the graphemic version of the Assyrian texts which he is publishing in his series .... Also anticipated is an electronic edition of ritual texts which J. Paul is currently preparing.

18. Morphemic categorization

We expect that the first publication of material analyzed in morphemic format will be the onomastic corpus of Ebla. This is conceived as a major project for which we have under consideration a major research proposal. If implemented at its fullest, the project will involve the participation of several colleagues who will serve as consultants on a variety of different aspects of the research.