Data, Systems, Management, and Standardisation
By Chris Hurley

This article falls into two parts, the common thread of which justifies its presentation as a single piece. In the first part, I update my 1990 views on standardisation in Australia. The second part provides an account of work I have since undertaken on behalf of the Australian Society of Archivists (ASA) to respond to a draft international standard being developed by the International Council on Archives (ICA). It will be necessary, at the outset, to define two terms. By archival data I mean information about records and their context of which an archives is (in part) the source (traditionally in the form of archival guides and finding aids) and, in the environment created for us by electronic records, possibly the manager. By electronic records I mean so much of the data on a system (being a system satisfying functional requirements for record-keeping) as is used for record-keeping purposes. In the second part of this article, I will describe the very high level of support given by the ASA and a select number of archives institutions to -
There appears, nevertheless, to be a view that standardisation is somehow removed from the immediate, daily, "practical" concerns of archival life - a worthy, "theoretical" goal to be pursued when we get past the more urgent demands of getting on with the job - that it is something which involves "extra" (optional) work which can't always be justified in the press of daily chores. Such a view mistakes the urgent for the important. The heart and soul of any organisation (not just an archives, but any organisation) is its data system. Standardisation is not just (!) about improving our data systems. Still less is it about modelling some desirable, but impractical, unattainable, or optional enhancements. Electronic records and standardisation are widely perceived to be the outstanding theoretical problems of our profession. In the case of the former, the "practicality" of trying to find a solution is well understood. Perhaps the latter does not receive the same attention because it is felt that we have descriptive solutions already, that the only question is whether we need to make them a bit better, and that it can wait upon more "urgent" priorities. It is already clear that, whatever successful strategy is developed for dealing with electronic records, possession of an adequate documentation strategy is a key to that solution. The two great problems (electronic records and standardisation) are not, in fact, two problems at all - they are twin aspects of the same problem. Electronic records cannot be dealt with effectively apart from an adequate documentation strategy. The current pre-occupation of the profession here with electronic records needs to be extended to standards for archival data. If the pundits are correct, preservation of (and access to) electronic records will not (indeed, cannot) be achieved custodially. Access to records via networking makes it unlikely that an archivist will be needed to access or interpret electronic records. 
Networks provide their own navigation systems and the interpretation of data is a function of the end user's system rather than the formats and protocols devised by the generator of the record. On this view, the archivist of the future will be neither a custodian, a navigator, nor a gate-keeper. In the custodial model, if our archival data was poor, we could (however inefficiently) overcome this in the search room by providing expert knowledge of holdings or remedial reference services. The opportunity to make good inadequacies in documentation came with the necessity for users to present themselves if they wanted to use the records. Archives are segregated for preservation and, consequently, used in ways dictated by the need to preserve their qualities as records. Users must accommodate themselves to access protocols peculiar to archives because that is a function of the need to preserve. With electronic records, this safety net will be removed in two ways -
In neither case will there be any opportunity to make good deficiencies by using well-honed search room skills. What then, that archivists now do, will still need to be done? This question is at the heart of the electronic records issue. A part of the answer must surely lie, as Bearman and Lytle have argued, in the need to document the context of electronic records (as we do now for all records) as a path to their interpretation and use and as an important part of managing current information in large organisations -
Users accessing electronic records via the network will not need archivists to hold, locate, or interpret the data. We will be needed, if at all, to help construct systems in which archival data (knowledge of context and record-keeping) is available to users when needed. Archivists are not alone, of course, in undertaking organisational analysis and documentation. What we bring to the task is a unique experience in representing and preserving information ("archival data") about changes in systems and to the contextual framework. It follows that at least some part of the answer to the question, "What must archivists do with electronic records?", involves discovering how archival data can best be used in the generation, management, and use of electronic record-keeping systems. That, ultimately, is what standardisation is all about.

Standardisation 1987, Revisited

In the May 1990 number of this journal2, in an article outlining arguments in favour of standardisation which drew upon a Report prepared in 1987 for the Australian Council of Archives (ACA), I took no account whatsoever of such arguments. I gave several meanings (at p. 64) for standardisation. They were -
I also gave four reasons (at p. 65) why I thought standardisation was important. I shall try the patience of the Editor by repeating these also -
I did then make one suggestion that, with hindsight, seems more than ever relevant. As part of any national endeavour to standardise archival data, I proposed that attention should be given to establishing a common contextual framework for the documentation of the country's archives (both public and private) which was "above" the level of documentation normally undertaken by any one programme and which no one programme could provide for itself. The conception behind this proposal was that all archival data needed to be linked conceptually to a universal statement of context which would ultimately be needed to interpret and understand the archival data generated by each programme. Nothing which has occurred or been said since has changed my mind on this point and, if speculation about the way we need to go in dealing with electronic records is correct, it seems to me more relevant than ever. Standards of archival description by and large appear as rule-books or manuals itemising the kind of data used when describing records and their context and, by extension, specifying what is being described and how the data should be organised. David Bearman has identified four types of standard4 and warned against confusing them -
The assumption of some archivists has been that, if we can define data contents and adopt an agreed structure, improve control over vocabulary, and refine our ideas about access points, then standardisation of archival data will have been achieved because it will be formatted to look the same (at least when it appears as output) and be more easily retrievable. Archival data which conforms to the standard could then be effectively exchanged, merged, and used. It is assumed that any system would recognise and be able to handle archival data from another system because it conforms to standards for structure and contents (at least to the extent that output from the system conforms to those standards 5). The purpose of standardisation, based on these assumptions, would be the production of archival data from a variety of sources which could be accessed from a single point - in other words, a data exchange format.

The Impact of Networking

Any future consideration of standards for the express purpose of data exchange or merging (whether at the national or the international level) must, however, take account of current trends in the information environment. There are two critical issues -
One clear implication of the networking model is that archival data will most likely be made widely available from local systems which are connected on the network and not merged in a central data repository. Data will be distributed across the network and accessed using client-server protocols which allow data generated in one application to be interpreted on the local system. The logical consequence appears to be that, in a networking environment, the existence of a standard of the kind foreshadowed in my 1990 article (designed to ensure uniformity in the kind of descriptive data employed, in the way it is used, and in its formulation into a standardised representation) has a lower priority (at least, for data exchange purposes). The really important questions will be how distributed data is accessed through the network and how it must be formatted so that it can be "interpreted" by the user's system. This will involve conformance with protocols which have very little to do with what data is made available and how it is formulated into descriptive entities, but rather with the transmission and interpretation of data of any kind (i.e. metadata). The implications appear to be that we, as archivists, must -
The politics of standardisation (to which I alluded in my 1990 piece) are thus changed. It is no longer just a debate amongst ourselves. In the larger world of networking, archivists will be minor players. On this model, for example, the concept of a National Register of Archives (which in 1987 and 1990 I thought central to the debate) - being a Register which takes the form of a consolidation of data held elsewhere or made available through a dedicated network - becomes almost irrelevant. The only certain purpose of a National Register would be to capture data not available on the network from distributed sources. A secondary purpose might be to provide a focus (to act as a directory service) for networked data but even this need may disappear if (?when) adequate means are developed for "navigating" the network. It must not be supposed, of course, that all we have to do now is load up our data onto existing or soon-to-be available networks. Our data has some way to go before it is ready and the networking future to which I refer6 is not yet here. Already, however, users of the Internet and similar facilities are becoming accustomed to "tapping in" to data with which they are immediately unfamiliar. The networking solution is not to make data from different sources the same, but to ensure that it can be transmitted and interpreted by a user at the other end of the network in whatever format the end-user's system prefers. The second of the four reasons I gave in 1990 (user familiarity with the way that data is presented to them) is unlikely to be compelling. The benefits of networking - the amount of data which can be accessed and the capacity to cross boundaries between disciplines - will sweep aside any twinge of regret for familiarity with output formats. The likelihood is that user interface systems will make data accessed via the network more meaningful at the user's end (for the user) than anything we could have devised for general utility. 
Of the four reasons I gave in favour of standardisation in 1990, therefore, I think only the first (it is a platform for improved professional practice in arrangement and description) and the third (it assists in the transition to and continued development of automated systems) survive as being persuasive now. I will now expand upon the role I think standardisation might have in relation to system development in a networked environment. It is possible that this may prove more important - not because it is necessarily more significant in itself, but because it appeals to the most basic and insular view of the matter imaginable and may, in consequence, be more persuasive amongst those who have thus far taken little interest.

Standards and Archival Systems in a Networked Environment

Hitherto, I have said that standardisation is desirable because it gives us a better, more practical, more useful result. It has been implicit that a good, practical, useful result could nevertheless be achieved without it. I now think there is reason to doubt that. I would now add, therefore, another reason to my catalogue -
Every automated system lives or dies according to its ability to maintain -
A failure to maintain any one of the three compromises the integrity of the system in ways which can not be remedied by success in meeting the other two. The standardisation debate (at least as I developed it in my 1990 article) has been largely about the first and the third of these requirements. I now believe that the second is significant also. It was my explicit conclusion in the 1990 piece that standardisation of system applications would not (and need not) be tight and that
Implicit in this view was the assumption that no single software package would be used (or needed to be used) and that no commercial package would be developed and applied universally. I felt that the large and medium archives (the State archives and one or two others) would develop their own in-house systems as (at that time) the Australian Archives and the Public Record Office of Victoria had already done. This prediction has turned out to be partly true, though as I suspected (I would now say feared) there has been "greater diversity in computer applications than I anticipate[ed]" 7. While there is good evidence that computerisation has led, as predicted, to greater systemisation by archives and extensive borrowing of ideas and approaches from each other, most are developing in-house software (usually, a domestic application of a proprietary package). Each archives, therefore, is becoming wholly responsible for its own system design and development, system management, and for the quality and management of its own data. System management and data quality control are an unavoidable part of computerisation in archives. The extent to which an archives undertakes system design and development, however, depends upon the availability of and the extent to which it utilises software packages designed and developed by someone else. The point is simply made : designing and developing your own system is like writing your own word-processing package instead of buying a proprietary package. No-one does it with WP because it is much more cost-effective to buy one off the shelf and spend time on other things. Archives system needs are not catered for in this way - partly because, in the pre-computer era, we allowed our manual systems to develop in an unstandardised way and partly because the market we provide is much too small to attract serious commercial interest. 
Even now, one could not confidently articulate the design specifications for an archives system which would be likely to enjoy widespread support even within the tiny market we make up collectively. It may be questioned whether standardisation on common software is, in any case, desirable - especially in view of the implications of networking to which I have already alluded. I can attest to the exhilaration which comes from designing your own in-house system, implementing it, and then developing it further. In the early, heady days it goes hand in hand with the "systematisation" of old, manual procedures and is a useful platform for staff training and development. There are fresh insights into one's data, one sees ways of improving it and new ways of dealing with it and presenting it. Once the system comes on line, the priorities should move to data management and system management. It is a characteristic of the systems environment, however, that systems design does not stop (it cannot stop, technology sees to that). Post-implementation systems design (systems development) proceeds at a pace not much slower than before. Systems design and development requires technical, non-archival skills. They can be developed in-house or purchased by using a consultancy. The cliche that archivists are really systems people has a germ of truth in it, but any archivist who acts upon it will be taking a short-cut to disaster. Archives systems are sufficiently complex and different to require design skills of a high order which few archives (except possibly the very largest) will be able to sustain in-house. Whether systems skills are in-house or external, they represent an investment and resources are always in short supply. The temptation (for all practical purposes, the unavoidable necessity) for small organisations developing complex systems in-house is to trade off system documentation for development. 
Lack of system documentation is the most common fatal flaw of in-house systems (everywhere, not just in archives). Under great pressure first to design and then to continue to develop a system, it seems, at first, the lesser of two evils to "postpone" documentation until there is a breathing space in which to do it. There never is such a breathing space. The consequences are not immediately serious. The organisation is small, everyone involved knows the system, at this stage they do not need documentation to refer to. It may be possible even to pass on to the first or second upgrade in this state, but sooner or later the lack of full documentation creates enormous problems for any system which proceeds in this way. Consider the consequences -
One solution lies in finding common software applications for archives based on agreed Information System Standards and Data Structure Standards. In one sense, this is directly contrary to my 1990 conclusion when I was somewhat unkind about those "who may still foresee the eventual adoption of a common system as the vehicle for standardisation" 8; but, of course, my present argument is that standardisation could be the vehicle for a common system - not the other way round. Another possible solution is that generation of archival data will no longer occur on separate archives systems at all and will be integrated with data management procedures in a variety of system environments. Either way archives are relieved of the need to design and develop systems and freed to concentrate on system management and data quality. While networking lessens the need for standards in order to achieve data exchange amongst archives, it also means that (for reasons outlined above) their adoption no longer impedes users of the network from accessing archival data freely. The emphasis of the argument in favour of such standards has shifted then from their value to users of the system to the advantages for providers of archival data. As we have seen, the archivist's primary role in a networked environment is likely to be as a provider of archival data rather than a custodian of electronic records. In this role, our value to users (arguably, our survival as a profession) will depend upon satisfactorily determining what data we should offer to the network and its quality - hardly at all on how we deliver it. That is to say, the survival skills of the archivist will be entirely bound up with system management and data quality, not with system design and development. It follows that any move towards common systems (to relieve us of the non-essential part of the task) should be welcomed. 
This being so, the leisure which in 1990 I believed we had to gradually grow together no longer exists for us. Early agreement on standards is necessary as the basis for developing common systems and as the vehicle for re-defining and improving what kind of archival data we will offer. If, as I suppose, our most important data products will be based on contextual data rather than on the contents and whereabouts of "holdings", it is clear that there is a considerable divide which still has to be bridged between existing practice and the desired standards and that this task will have to be substantially completed before we can progress to common system design.

A network is like a pipe which carries information between two points. A variety of things (fresh water, sewage, industrial waste, storm water) can be put into one end of the pipe and can be used in a variety of ways at the other end (to drink, to sprinkle on the lawn, to pump into a sewage farm and produce fertiliser, to nourish the ocean off swimming beaches). The limitations of the pipe impose some restrictions on what is carried and how it is used, but by and large producers and end-users need not be concerned about its design and engineering - they can take the pipe more or less for granted. Their task (the archivist's task) is bound up with the design and use of appropriate product. It will be a grave mistake just to load up old product for delivery and use in new ways - to focus on ways and means and not upon product. The secret of the network for archivists lies in developing appropriate new product for documenting context and for documenting record-keeping as an integral part of processes for managing and preserving records in a networked environment. This is the link between the evolving roles of the archivist and records manager - both of whom are thinking their way (or ought to be) into the new environment. 
For both (if indeed they remain separate disciplines), the possession and use of archival data will be central to the management of electronic records (as distinct from other kinds of electronic data). Archival data generated to provide researchers with information about provenance and holdings (and for no other purpose) will fail to meet this need. The real question posed by these developments is whether or not archival work will in future be done by archivists. If not, some of our successors (whether or not they continue to call themselves archivists) will be mere custodians and purveyors of information while the others (whether or not they recognise the evolutionary link) will undertake truly archival work : viz. the generation, management, and use of archival data.

MAD, RAD, and Dangerous to Know

In November 1990, the ASA received for comment (along with professional associations around the world) a draft copy of a document embodying the work of an Ad Hoc Commission on Descriptive Standards set up by ICA - Statement of Principles Regarding Archival Description. In 1992, a revised Statement ... was distributed along with a second document - General International Standard Archival Description : ISAD(G). A revised ISAD(G) is being published, and a third document - International Standard Archival Description for Authority Records : ISAD(AR) is currently being drafted by the Commission. The ASA, and the Australian archival community at large, has provided such vigorous input into this process that we were invited to join the Commission for its 1993 meeting in Stockholm and I have been a member of it since then. At each stage, in responding to the ICA Commission, the ASA has sought comment from its own Branches, Special Interest Groups and from archives institutions. 
The flavour of the Australian response to ISAD can be gleaned from the following extracts from our comments on draft ISAD(G) - It is our view that the draft Principles confuse the theoretical basis for description with a statement of a particular application of those principles which results in a theoretical statement which is not flexible enough to admit alternative (equally legitimate) applications of those same principles in a variety of ways which :-
At its Stockholm meeting, the ICA Commission agreed to revisions of ISAD which go some way to accommodating these three points. The ASA has not sought to impose the Australian "series system" on the rest of the world; it has sought alterations to the proposed international standard to accommodate the series approach as a valid alternative within international precept and practice. It is intended by the ICA Commission that ISAD operate as an international standard for data exchange. The Commission envisaged that national documentation standards (not inconsistent with ISAD) should be developed. Other English-language "standards" - RAD, MAD, and APPM 10 - did not, it was felt by the ASA Council, adequately serve Australian needs. In 1992, a Questionnaire was circulated to archival institutions seeking to obtain a picture of data usage. The results were then circulated to respondent institutions in 1993 and they were asked to participate in a further project to gather and systematise information on the use of descriptive data. This Project (ACPM) was initiated by the ASA Council in March 1993 in the following terms -
Ten institutions agreed and their descriptive practice is currently being analysed and correlated in a work which we have titled - Australian Common Practice Manual : ACPM. ACPM identifies four kinds of descriptive entity and is divided into four corresponding parts -
Within each Part (each Part representing a different kind of descriptive entity), data is divided into three categories : Identity, Description, Relationships. Each category of data comprises a number of data types. The result is an analytical matrix within which all descriptive data is tabulated and correlated -
The type of data found in the Identity category is the same at every level : reference number or code, title or name, dates, and "control data" - because the task of identifying a descriptive entity is essentially the same at all levels. In the Description category, however, the types of data differ at every level : e.g. quantity, access, and location for records and history, function, address for provenance - because the description of different kinds of entities involves identifying attributes which are peculiar to each. Relationships data shows connections between -
An examination is being made of sample documentation submitted by each participating archives. This is being supplemented by at least one visit. Where they exist, in-house manuals and procedures are being summarised. What results is a statement of descriptive practice which is particular to each archives and which also (because it is given within the conceptual framework of ACPM) correlates the use of data by one archives with the practice of other participants. This points up similarities and differences. Each in-house rule is allocated to -
The common practice rule is thus derived from an examination of the descriptive practice of the participating archives, but it is fitted into a conceptual framework which is developed independently of them. Each ACPM "rule" which is attributed to an archives should make sense in terms of the in-house practice of that archives, but its meaning is expounded to others in terms of the common framework of understanding provided by the structure of ACPM itself. A code is assigned to each participating archives (AAA for Australian Archives, ANL for the National Library of Australia, and so on). Where necessary, any variations from the common practice rule are shown. The whole matrix is set out in Figure One. Examples are given of documentation from participating archives, indicating which ACPM rules apply - see Figure Two.
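The matrix described above can be pictured as a small data structure. What follows is a minimal sketch in Python, not the form ACPM itself takes. The names `EntityKind`, `Category`, `Rule`, `known_data_types` and `build_matrix` are invented for illustration. The four entity kinds are an assumption drawn from the terms this article uses elsewhere (ambience, provenance, records, contents), and the data types are limited to the examples given in the text.

```python
from dataclasses import dataclass
from enum import Enum


class EntityKind(Enum):
    # Assumed names for the four kinds of descriptive entity.
    AMBIENCE = "ambience"
    PROVENANCE = "provenance"
    RECORDS = "records"
    CONTENTS = "contents"


class Category(Enum):
    IDENTITY = "identity"
    DESCRIPTION = "description"
    RELATIONSHIPS = "relationships"


# Identity data types are the same for every kind of entity.
IDENTITY_TYPES = ("reference", "title", "dates", "control_data")

# Description data types differ by kind; only the two examples
# mentioned in the text are listed here.
DESCRIPTION_TYPES = {
    EntityKind.RECORDS: ("quantity", "access", "location"),
    EntityKind.PROVENANCE: ("history", "function", "address"),
}


@dataclass(frozen=True)
class Rule:
    """One in-house rule, attributed to an archives by its code."""
    archives_code: str          # e.g. "AAA" or "ANL"
    kind: EntityKind
    category: Category
    data_type: str
    variation: str = ""         # departure from the common practice rule


def known_data_types(kind, category):
    """Return the data types this sketch knows for a cell, if any."""
    if category is Category.IDENTITY:
        return IDENTITY_TYPES
    if category is Category.DESCRIPTION:
        return DESCRIPTION_TYPES.get(kind, ())
    return ()


def build_matrix(rules):
    """Group rules by (kind, category, data type) so practice can be compared."""
    matrix = {}
    for rule in rules:
        known = known_data_types(rule.kind, rule.category)
        if known and rule.data_type not in known:
            raise ValueError(f"unknown data type: {rule.data_type}")
        cell = (rule.kind, rule.category, rule.data_type)
        matrix.setdefault(cell, []).append(rule)
    return matrix


# Hypothetical sample rules; the variation text is invented.
rules = [
    Rule("AAA", EntityKind.RECORDS, Category.IDENTITY, "title"),
    Rule("ANL", EntityKind.RECORDS, Category.IDENTITY, "title",
         variation="hypothetical local variation"),
    Rule("AAA", EntityKind.PROVENANCE, Category.DESCRIPTION, "history"),
]
matrix = build_matrix(rules)
shared = matrix[(EntityKind.RECORDS, Category.IDENTITY, "title")]
print([rule.archives_code for rule in shared])
```

Grouping rules by cell is what allows the practice of one archives to be read against another's: every rule that falls in the same cell addresses the same kind of data, whatever each archives calls it in-house.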
Figure Two

ACPM is not itself a standard. In 1990, I urged that steps be taken -
The Manual is, therefore, a stepping stone towards standardisation. As it develops, it will provide information on data usage and (to a lesser extent) on structure and systems, knowledge of which will be essential when the real work of standardisation (developing rules and systems for common use) is undertaken. Moreover, the focus of the Manual is on the area appropriate to development of Data Contents Standards with some application to Data Structure which Bearman has argued is the wrong place to start -
Since ACPM is, in part, a response to ISAD (which is itself essentially a Data Contents Standard) this was unavoidable. Indeed, both ACPM and ISAD ostensibly adopt a neutral stance on Information System and Data Structure - at least to the extent that both seek to describe what elements of information will be used as part of archival description in all types (rather than a given type) of archival description. This stance allows both to masquerade as being neutral on questions about what system will be used. It is not possible, though, to be neutral. However much they may be disguised by such an approach, assumptions must be made about the underlying Information System and this has been at the core of our difficulties with ISAD. In the case of ACPM, this question arises most acutely when comparing data from archives using the "series system" with data from those which do not. Although ACPM is descriptive, not prescriptive, in its approach, it is by no means neutral in its conceptual framework which, by using separate descriptive entities for context and record-keeping data, is firmly based on the "series system" technique. It would, of course, be confusing to simply pretend that differences in information system do not make corresponding differences to the structure and content of data from incompatible systems. In ACPM, this problem exists primarily at the intersection of data about record-keeping (records and contents) and context (ambience and provenance) -
The problem then is how to represent the connection made in different systems between context and record-keeping data. The solution is to recognise that, while the data itself is similar, it must be treated differently so long as the System Standard is different. This is done in ACPM by differentiating between data which is connected using a cataloguing-based approach ("associated data" within a single descriptive entity) and similar data which is connected using a series-based approach ("related data" within two or more descriptive entities). The methodology can be illustrated quite simply by applying it to the chapter in Keeping Archives (2nd edition) on "Arrangement and Description" (Ch. 8) -
The hall-mark of the "series system" being the separation of data on context from data on record-keeping, it follows that unless "agency descriptions [are] also completed" it is a cataloguing approach which is being used. If agency descriptions are not also completed, data on provenance is associated as part and parcel of the description of records. If agency descriptions are completed, a relationship must be shown. An association is how ACPM represents data which would be a relationship if it were bound into a separate descriptive entity. As we express it in the latest edition of ACPM for Records, an "association would be a relationship if it could, but it can't, so it isn't" 15. It is on this analysis that we can describe the Fonds (even though it contains contextual data) as a records entity. It will be seen then that, although the distinctive differences of alternative approaches are respected, readers of the Manual are invited (indeed, compelled) to view the data from a "series system" point of view. This has proved to be much easier than might have been supposed because of the relative lack of sophistication in the way archivists use data once they get beyond the context/record-keeping intersection. Ideas about high level context (ambience) and contents (information handling within series) turn out on close examination to be fairly crude or, in many cases when dealing with ambience, non-existent. This is (temporarily) an advantage because it means we can develop ACPM, away from the context/record-keeping intersection, on what is practically virgin territory. The co-operative endeavour undertaken to develop a standard in the areas of ambience and (to a lesser extent) contents can lead and guide practice rather than merely describe it. That assumes, of course, that archivists will perceive the need to extend and improve their documentation activity in those areas - especially the former. 
A conviction that, to survive, they must and that, with encouragement, they will prompts me to write this.

Conclusions

Developments in networking suggest that standardisation of information exchange protocols common to many other areas will be of more significance for accessing archival data than standardisation of the way archivists arrange and present it. This makes our participation in the politics of emerging information networks (in which we are necessarily minor players) of paramount importance and requires that we move rapidly to a familiarity with the technology involved. This participation will take place in a post-custodial environment where archivists can no longer expect to operate primarily as custodians, navigators, or gate-keepers in relation to those who make, manage, seek, and access electronic records available on the networks. They may have a role in purveying and deploying their skills and knowledge in the management and use of archival data - viz. knowledge of record-keeping, context and changing relationships through time. Although the pressure seems to be "off" so far as standardising for purposes of exchanging or merging data about records "holdings", we need to make sure that archival data is of a high quality so that it has continuing value in this new environment. Systems must be designed and developed to deal with high quality archival data. Archival skills are needed particularly to maintain data quality and to manage these systems. In-house system design and development is not a necessary part of the process and involves unacceptable risks for small programmes which could compromise data quality. The archival community needs to support the development of software applications so that archivists can concentrate on the essential tasks of system management and quality control. No progress can be made until archivists articulate their system specifications. 
So long as each archives pursues its own path to system design and development, we risk consigning valuable data to unsustainable systems and distracting ourselves from the primary task. The Australian Common Practice Manual (ACPM) represents a stepping-stone towards agreement on system specification for common application as well as an opportunity to debate the kind of high quality archival data which we should be developing.

End-notes

1. David A. Bearman and Richard H. Lytle, "The power of the principle of provenance", Archivaria 21 (Winter 1985-86), p. 14.