Conferences have not been the same this year: I’ve particularly missed the opportunity to catch up with friends and colleagues, and the random conversations and encounters in queues that compensate for the quality of the coffee one is waiting for. We have been left with the formal proceedings, and I wanted to say something about the papers presented at the (comparatively) recent Collections Trust conference, held online over two half-days on 1 and 2 October 2020.
The theme – building upon the emphasis on ‘dynamic collections’ (particularly disposals) in the recent Mendoza Review of museums – was ‘Dynamic data for dynamic collections’. If collections are to be truly dynamic, then the people who manage them require comprehensive and reliable data to inform their decisions. Papers by Sarah Briggs and Chris Vastenhoud both dealt directly with this theme, whilst Errol Francis emphasised the importance of information for the ongoing project of decolonising museums – and, crucially, thinking very carefully about who had recorded that information, how they had recorded it, and what it describes. Equally current was Sophie Walker’s presentation of a set of procedures for managing ‘data as objects’, that is, born-digital collections.
But the conference also raised three other questions that, for me, are fundamental to the future of the work that we do as documentalists and managers of collection information:
- Fitness for purpose
In the rest of this post, I’ll look at each of these in turn.
Fitness for purpose
We all use digital collections management systems to help us manage our collections, and to store information about the objects and their contexts. These systems – certainly the longer-established ones – originated as catalogue databases, and were then gradually extended to store the information relating to how the individual objects are managed, as mandated by standards such as Spectrum. As it has become possible to do more and more digitally, so additional modules have been added to systems to manage, for example, images and other digital assets, conservation treatments, and so on.
But can the systems that we use today really manage all this? Adrian Hine, discussing the Science Museum’s massive project to decant its stores from Blythe House in west London to Wroughton in Wiltshire, suggested that they cannot, for three reasons:
First, workflow management. With the honourable exception of the workflow module in System Simulation Ltd’s Museum Index+,1 it is very difficult to configure collections management systems to actually enforce a workflow – that is, to present individual units of information to be filled in by predetermined people in a predetermined sequence, to make the options available at any particular point in the process depend upon what has already been entered, and to prevent further progress until all required work has been completed and signed off by authorised people.
Related to this is the ability to automate workflows – for example, to take a folder full of images and link them directly to the records for the objects they reproduce, based on data (e.g. a barcode or a written object number) within the image.
And third is bulk data entry. Most systems have mechanisms for uploading new records in bulk, but updating existing data usually means logging on to a system that requires at least a tablet-sized screen, and often a current connection to the database (whether via ODBC or HTTP); and then editing fields on many different database screens. They are not easily optimised so that all the necessary fields (and only those fields) are available in one place; they usually find offline data-entry and the subsequent reconciliation with the back-end database difficult; and they are prone to human error in data entry.
The Science Museum’s solution has been to bolt together a system which combines the collections management system’s database with commercial asset tracking software, barcode recognition tools, and so on. They have made extensive use of barcodes to encode all kinds of information – not just reference numbers – as a way to avoid errors in data-entry; and they have made sure that the systems can run on any device – not just the laptop that we usually use to access the collections database in the stores – so that they can take advantage of the integrated cameras and other tools that can be found in tablets and mobile phones.
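To make the image-linking idea concrete, here is a minimal sketch of how a folder of images might be attached to object records by the number encoded in each image. It is not the Science Museum’s actual tooling: the `decode` callable stands in for whatever barcode or text recognition tool is used, and the record structure, file names and object number are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def link_images(
    image_paths: Iterable[Path],
    decode: Callable[[Path], Optional[str]],
    records: dict[str, dict],
) -> list[Path]:
    """Attach each image to the record whose object number it encodes.

    Returns the images that could not be matched, for manual review.
    """
    unmatched = []
    for path in image_paths:
        object_number = decode(path)  # e.g. a barcode read from the image
        record = records.get(object_number) if object_number else None
        if record is None:
            unmatched.append(path)
        else:
            record.setdefault("images", []).append(path.name)
    return unmatched


# Hypothetical data: a stand-in decoder backed by a lookup table.
records = {"1984-123": {"title": "Example object"}}
fake_barcodes = {"IMG_0001.jpg": "1984-123", "IMG_0002.jpg": None}
unmatched = link_images(
    [Path("IMG_0001.jpg"), Path("IMG_0002.jpg")],
    lambda p: fake_barcodes.get(p.name),
    records,
)
```

Keeping the unmatched images in a separate list, rather than silently skipping them, reflects the point about human error: anything the automation cannot resolve is handed back to a person.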
I found much to agree with here: Hine’s paper chimed with my own scepticism about the current model adopted by collections management system vendors, where ‘one system can do it all’ – manage all the different procedures, provide rich contextual information in authority files, ideally catalogue (and why not manage?) a museum’s library and archive (with their own particular standards and conventions), manage its digital assets, etc., etc. I know why this model has been adopted – it’s easier to deal with a single supplier and to maintain a single system; the vast bulk of museums don’t have the resources to acquire multiple systems for particular purposes; and (as I mentioned above) these systems have mostly evolved gradually, as individual clients have asked for additional functionality. But the end result has been a series of large and expensive systems that do well only about 85% of what one wants, and crucial data held in different systems that are tenuously linked to the core database – if they are linked at all.
To my mind, we need to work towards a new paradigm: smaller systems to handle individual procedures or types of data, linked to each other using robust and well-defined APIs which share just enough core data for the individual modules to do their jobs. I’m not alone in this – Richard Light, who has been working in the field far longer, and is much more competent technically, than I, made very similar points in his paper at the CIDOC conference in December.2 But moving from monolithic systems to a set of modules provided by different suppliers for different functions is a big ask, of both suppliers and users; and when I asked him about it, Hine was quite certain that we’re not yet in a position to make that request.
At the National Gallery, we’re already part of the way along this route: we currently use middleware (Knowledge Integration’s CIIM) to aggregate the contents of multiple systems – our collections, library, archive and digital asset management systems; a wiki-based authoring system; a database of image metadata; image files in folders; and a room-booking system – into a single source of data for consuming systems.
But it’s because, at the moment, different collections management systems don’t play nicely with each other that I’ve been working with the CIDOC Documentation Standards Working Group on a project called EODEM (Exhibition Object Data Exchange Mechanism), which aims to make it easier for different systems to share data, in the absence of modular components and APIs. The project is establishing a framework for sharing the minimum data necessary to administer an object’s loan – that is, basic information to identify the object, and its requirements (environmental, security, insurance, etc.) whilst on loan. I presented the project’s aims and work to date at the conference; following further work at the CIDOC conference before Christmas, we’re now moving on to the development stage, in collaboration with several collections management system vendors.
As Kevin Gosling’s concluding presentation showed, for decades, interoperability – the ability to share data between systems, and therefore institutions, with minimum effort – has been the holy grail of museum information managers. And as Gosling said, we are now at the point where it is a practical possibility.
The last ten years or so have seen a series of standards developed, stabilised and – increasingly – being used, so that we are building up a critical mass of tools and expertise which will make their future exploitation much easier. These include the query protocols OAI-PMH and SPARQL (and other linked data tools); and the consolidation of multiple data standards into the CIDOC-CRM, and the more lightweight subset of CRM elements embodied in LIDO.
We are, for example, using a specific profile of LIDO as the format in which to encode data for EODEM – but in order to reduce technical problems as much as possible, we intend EODEM data to be shared simply as XML files emailed between lenders and borrowers, rather than ask vendors to enable direct system-to-system communication via APIs.
And interoperability enables aggregation. Aggregated museum datasets are nothing new – Europeana, for example, has been around for years, and many of us will remember the People’s Network Discovery Service and CultureGrid – but in many cases, aggregated datasets are static: they can’t easily be updated when catalogue information changes. And they often rely on a limited subset of metadata fields, which means they lack the granularity needed to drill down into large collections of objects to find particular items.
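OAI-PMH, one of the protocols mentioned above, is designed to avoid exactly this kind of staleness: a harvester can ask a repository only for records changed since a given date, and page through the results. The sketch below builds such a request and reads the paging token from a response. The endpoint URL and the sample response are hypothetical; `verb`, `metadataPrefix`, `from` and `resumptionToken` are part of the protocol itself.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(
    base_url: str, metadata_prefix: str, since: Optional[str] = None
) -> str:
    """Build a ListRecords request, optionally limited to records changed since a date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since:
        params["from"] = since  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def resumption_token(response_xml: str) -> Optional[str]:
    """Return the token for the next page of results, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return token.text if token is not None and token.text else None


# Hypothetical endpoint and a cut-down response, for illustration only.
url = list_records_url(
    "https://collections.example.org/oai", "oai_dc", since="2020-10-01"
)
sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><resumptionToken>page-2</resumptionToken></ListRecords>"
    "</OAI-PMH>"
)
next_page = resumption_token(sample)
```

Every OAI-PMH repository must support the `oai_dc` prefix; whether a richer format such as LIDO is offered depends on the individual repository.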
A key aggregated dataset is Art UK, which presents easel paintings and sculpture held in publicly-accessible collections in the UK in a single resource. Up until now, updates have had to be made by hand using either a form on Art UK’s website, or spreadsheets sent to Art UK for upload into their database. Last year, however, I was pleased to participate in a pilot project which investigated the practicality of harvesting data automatically from individual museums’ systems, aligning it with data already held by Art UK, and importing it into their systems. Andrew Ellis and Adrian Cooper presented the results of the Art UK data harvesting pilot project, showing that it was technically possible, and that its costs would be small enough to be fundable. And Kevin Gosling spoke about the practicality of establishing a national aggregator for museum data – not just paintings and sculpture – on similar lines to the Art UK pilot.
Gosling also made some very useful points about the barriers that such an aggregator would face. The current generation of museum leaders will also remember those previous projects I mentioned above, and not with any affection: the barriers to entry were high, they required a great deal of effort to prepare data for submission – and the finished resources proved not to be sustainable (or rather, there proved to be a lack of appetite in government to fund their continued development and provision).
But the Art UK pilot has been crucial as a proof of concept, and in establishing some core principles that would apply to any aggregator: notably, that it needs to be as easy as possible for museums to contribute their data. In other words, data needs to be tidied by the recipient, after submission, not by the museum before it is sent out. An aggregator should therefore be able to accept data that is arranged according to multiple metadata standards, in multiple file formats, via multiple protocols.
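As a rough illustration of what ‘tidying by the recipient’ could mean in practice, the sketch below accepts submissions in two file formats with differing field names and maps them onto one common record shape. The field aliases, formats and sample data are all hypothetical, and a real aggregator would need to handle far richer standards than this.

```python
import csv
import io
import json

# Hypothetical mapping from contributors' field names to a shared set.
FIELD_ALIASES = {
    "object_number": ("object_number", "accession_no", "id"),
    "title": ("title", "object_name", "name"),
}


def normalise(raw: dict) -> dict:
    """Map one contributed record onto the aggregator's common field names."""
    lowered = {key.strip().lower(): value for key, value in raw.items()}
    record = {}
    for target, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in lowered:
                record[target] = str(lowered[alias]).strip()
                break
    return record


def parse_submission(payload: str, fmt: str) -> list[dict]:
    """Parse a submission in one of the supported formats and normalise every row."""
    if fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(payload)))
    elif fmt == "json":
        rows = json.loads(payload)
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [normalise(row) for row in rows]


# Two museums, two formats, two naming conventions (invented data).
csv_rows = parse_submission("Accession_No,Name\nA.1, Teapot \n", "csv")
json_rows = parse_submission('[{"id": 42, "title": "Cup"}]', "json")
```

The museum sends whatever its system exports; the knowledge of how to interpret it lives in the aggregator’s alias table rather than in work done by the contributor.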
If the barriers to sharing data can be removed from museums and passed to the aggregator, then all those benefits of aggregated data – for example, making it much easier to manage disposals by finding out which collections specialise in similar material – can be exploited by museums with minimal effort on their part.
And so we’ve reached the point where we can ask what aggregators can do for contributing museums. Gosling’s answer, trailed by Ellis and Cooper, was that aggregators can add value by applying tools to data at scale, improving it and, crucially, joining it up by exploiting tools and techniques that would be beyond most individual museums’ resources – for example:
- Providing interoperable data using APIs and linked data techniques
- Aligning individual museums’ authority lists with established resources such as the Getty vocabularies or Wikidata, making it easier to identify similar terms and, therefore, related objects
- Automated tagging of image content
- Mining descriptive texts to improve keywording and connections to structured terminologies
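The vocabulary alignment in the second point can be sketched very simply with approximate string matching from Python’s standard library. This is only an illustration of the idea: the reference identifiers below are invented placeholders rather than real Getty or Wikidata IDs, and real alignment would use those services’ own data and, almost certainly, human review of the suggestions.

```python
import difflib
from typing import Optional


def align_terms(
    local_terms: list[str], reference: dict[str, str], cutoff: float = 0.8
) -> dict[str, Optional[str]]:
    """Suggest a reference-vocabulary identifier for each local term, or None."""
    by_label = {label.lower(): ident for ident, label in reference.items()}
    suggestions = {}
    for term in local_terms:
        matches = difflib.get_close_matches(
            term.strip().lower(), list(by_label), n=1, cutoff=cutoff
        )
        suggestions[term] = by_label[matches[0]] if matches else None
    return suggestions


# Placeholder identifiers and labels, for illustration only.
reference = {"ref:001": "Watercolour", "ref:002": "Engraving"}
suggestions = align_terms(["watercolor", "Engravings", "teapot"], reference)
```

Terms with no sufficiently close match are returned as `None` rather than forced onto the nearest label, so that they can be reviewed by a person.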
Several papers at the conference showed what might be possible in more detail. Diana Maynard and Jonathan Whitson Cloud spoke about the use of the GATE natural language processing tools to extract keywords and link together descriptive texts and terminologies – or even to develop new terminologies from community descriptions of objects; and Giles Bergel introduced the University of Oxford Visual Geometry Group’s tools for image recognition.
But this is where I have to ask: is our heart really in it?
Nearly fifteen years ago, I worked for a year for the Visual Arts section of the Arts and Humanities Data Service (AHDS). One of my tasks – something that was shared round the office – was marking the ‘technical appendix’ of applications to the Arts and Humanities Research Council (AHRC) for funding for research projects that had a digital component. One element that had to be assessed was the sustainability of any digital resources created by a project; and a promise to deposit with the AHDS was taken as pretty much bombproof. After all, it had been set up specifically as a long-term repository for digital research outputs in the arts and humanities.
A couple of months after I left the AHDS, the AHRC decided to stop funding it. Similarly, funding for the previous UK museum data aggregator, CultureGrid, was stopped in 2015.
We seem to be congenitally incapable, in the UK, of committing to long-term funding for digital resources, even though they need to be able to evolve as standards and technical paradigms evolve; software needs to be patched and updated; and the contributing community needs to be supported. Any UK museum aggregator will need some kind of guaranteed long-term funding if it is not to be another flash in the pan. You’ll forgive me if I suggest that we shouldn’t rely on any of the existing government or arm’s-length funding bodies to provide this. Any aggregator will have to have some kind of endowment if its persistence is to be guaranteed.
And then there’s the question of Towards a National Collection (TaNC), a £17 million, five-year funding stream provided by UK Research and Innovation (UKRI) via the AHRC to universities and those few large museums that are accredited as Independent Research Organisations (IROs), to focus on research questions related to an aggregated ‘national collection’. Again, it’s hard not to be sceptical here: the UK museum sector as a whole has been working on these kinds of question for decades, and has collaborated in the development of existing standards and systems that are already fit for purpose: we’ve done the hard work, and learnt the lessons. Suddenly, bodies that have shown no interest in sustaining previous initiatives in this area have realised the value of the data that we’ve been assembling – with limited help – for years. And given that the systems and standards for building an aggregator already exist, it’s hard to see precisely what research questions remain to be addressed. As described in the discussion following Gosling’s concluding paper, TaNC looks very like ‘the wrong kind of money’.
But others are less cynical than I. Gosling proposed that we should look instead at the money likely to follow UKRI’s recent review of research infrastructure as a potential source for funding the building – dare I hope, endowment? – of a national aggregator. If this can be done quickly enough then, by the time the major TaNC-funded projects are up and running, TaNC money can be used to develop the enhancement tools that would add value to the aggregated data – which could in turn be fed back to contributing museums, helping them to improve and enrich their data in a virtuous circle.
Which also brings me round in a circle to the beginning of the Collections Trust conference, when Jeni Tennison’s keynote lecture introduced the Open Data Institute, and its interest in data ‘at scale’ and how it can be used for common benefit. Gosling concluded by announcing that Collections Trust and the ODI will be collaborating over the next year to make the case for aggregated collections, in the form of a national museum data service which would collect, enhance, and share data from all UK museums. This won’t be the project that builds the aggregator – but it will develop the strategy and arguments to persuade the decision-makers and managers of the value of collection information.