Panlibus Blog

It’s all about links. The Future of Bibliographic Control

So, the Library of Congress Working Group on The Future of Bibliographic Control released their draft for public comment last week.

The draft contains five high-level recommendations broken down into many smaller, detailed recommendations. The amount of detail in the 41-page report is impressive, with some very focussed thoughts and very clear statements about what to do next, both for the LC and the wider community.

As well as reading the draft report you can watch the working group present their findings to LC, recorded a few weeks ago.

1. INCREASE THE EFFICIENCY OF BIBLIOGRAPHIC PRODUCTION

The first recommendation is about making the production, gathering, editing and flow of data as efficient as possible. The report makes recommendations ranging from mandating that publishers supply metadata as part of the CIP process (which LC already do) to ensuring that purchased datasets are not embargoed against sharing (hard to do given the business models of commercial data suppliers).

OCLC gets a special mention, with the report saying that “OCLC’s business model has a real impact on the distributed system of bibliographic data exchange.” In our opinion that’s somewhat understating the case, certainly for those who aren’t members of the OCLC club. I’ve met many of the OCLC folks and they’re doing some great things, but they have a business model established before the net, and having to protect that is damaging their members’ ability to participate in this networked world.

The thrust of the efficiency recommendations is about using data that’s already available. This comes in three flavours: contractual, ensuring that data is provided by suppliers and that data that has been purchased can be freely shared; technical, ensuring that crosswalks, converters etc are available to get this data into catalogues; and social, relaxing standards to accept existing data without effort over perfect data created from scratch.
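To make the "technical" flavour a little more concrete, here is a minimal sketch of what a crosswalk does. It assumes a record has already been parsed into simple (tag, subfield, value) triples; the `CROSSWALK` table and the `marc_to_dc` function are my own hypothetical names, and the mapping covers only a handful of common MARC fields, so treat it as an illustration rather than a real converter.

```python
# Hypothetical, heavily simplified crosswalk: MARC (tag, subfield) -> Dublin Core element.
# A real crosswalk covers far more fields, indicators and repeatable subfields.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
    ("020", "a"): "identifier",
}


def marc_to_dc(fields):
    """Convert a list of (tag, subfield, value) triples to a dict of DC elements.

    Fields with no mapping are skipped rather than rejected, in the spirit of
    accepting existing data over perfect data.
    """
    dc = {}
    for tag, code, value in fields:
        element = CROSSWALK.get((tag, code))
        if element is None:
            continue  # no mapping for this field; keep whatever else we can use
        # Trim the trailing ISBD punctuation that MARC values often carry.
        dc.setdefault(element, []).append(value.strip(" /:,;"))
    return dc


# Made-up sample record for illustration only.
record = [
    ("245", "a", "Moby Dick /"),
    ("100", "a", "Melville, Herman,"),
    ("260", "c", "1851."),
    ("999", "x", "local note"),
]
dc_record = marc_to_dc(record)
print(dc_record)
```

The design choice worth noticing is the `continue`: an unmapped field costs you that field, not the whole record.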

There’s also an interesting piece on LC’s costs:

According to current congressional regulations, LC is permitted to recover only direct costs for services provided to others. As a result, the fees that the Library charges do not cover the most expensive aspect of cataloging: namely, the cost of the intellectual work. . The economics of creating LC’s products have changed dramatically since the time when the Library was producing cards for library catalogs. It is now time to reevaluate the pricing of LC’s product line in order to develop a business model that allows LC to more substantially recoup its actual costs.

Finding a business model that both allows distributed responsibility for the bibliographic data and allows LC to bring in more money is a big ask. Commercial organisations are struggling with exactly that right now; OCLC being the biggest amongst them. Given that LC data is not covered by any intellectual property rights (the work of federal employees does not qualify for Copyright protection and the US has no Database Right) I don’t see any practical way for LC to achieve both objectives.

If the management of the data can be successfully distributed then much of the cost would also be distributed. In this case, with the community as a whole producing much of the data, it becomes even more important to clarify the rights people have over the data. We’ve been working on this problem for a little while, releasing an initial draft license for this purpose (licensed community generated data) more than a year ago and having recently released further drafts for comment. We’ve funded legal work and expect to have some more news on progress with our data license here very soon.

Also included in this first recommendation is some discussion of internationalising and expanding the Authorities data. I’m not sure I see this as an efficiency gain, but bringing together authorities from national libraries in many languages and reconciling them has the potential to revolutionise global bibliographic search. This work is already underway and, as long as the data is free for all to use, will be a great endeavour. This, in my mind, is the most compelling reason for LC to be asking for more funding and a clearer national mandate.

2. ENHANCE ACCESS TO RARE AND UNIQUE MATERIALS

This second section focusses on one area where libraries have a competitive advantage over anything else – obscure stuff. The obvious example here in the UK is the John Rylands Library with its manuscripts, but every library has its own unique and interesting pieces. The same arguments around data as in section 1 come up again – with a recommendation to focus on at least some access to all resources rather than having some perfectly catalogued and others not at all.

There are also calls to digitise these assets where possible and make them available online, partnering to do this where necessary.

I’ll jump on now, as sections 3 and 4 float my boat a bit more.

3. POSITION OUR TECHNOLOGY FOR THE FUTURE

We have become slaves to MARC, so too have our systems vendors

That has to be the headline of this section; it’s a quote from Brian E. C. Schottlaender from the working group presentation of the draft to LC.

In my mind, MARC stands in that sentence as a placeholder for all of the library-centric, complex and web-hostile standards that we currently rely on. You could easily add Z39.50, NCIP and a host of others to the list.

The point of all of the recommendations in this section is to be able to play nicely with the web, but even in the draft report there are still signs of the record-centric thinking that MARC forces us into.

Library bibliographic data will move from the closed database model to the open Web-based model wherein records are addressable by programs and are in formats that can be easily integrated into Web services and computer applications. This will enable libraries to make better use of networked data resources and to take advantage of the relationships that exist (or could be made to exist) among various data sources on the Web.

Overall, though, the recommendations are a very good start. The data needs to be made web-accessible, the vocabularies need to be published, freely, in machine-readable forms, and software everywhere must be allowed to link to elements of the data. The description is very much in line with the work that the W3C and others are doing around publishing data on the Semantic Web – it would be nice to have the report come down explicitly in support of this to save everyone two years arguing about how to publish data on the web. Currently the Semantic Web is only mentioned in passing in 3.2.1.2.

There are developers at the LC who are interested and active in the Semantic Web space already. LC should invite them to show the rest of LC what they’ve been playing with and what the impact of it could be. The recommendations mention SKOS, a way of representing subject headings. I’ve seen work to represent LCSH in SKOS and it looks great – LC should be opening up and promoting this kind of work. This is specifically covered in section 4.
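For anyone who hasn’t met SKOS, a rough sketch of the idea: each heading becomes a concept with its own URI, a preferred label, and links to broader concepts. The snippet below writes that out as N-Triples using nothing but string formatting. The `http://example.org/subjects/` namespace, the identifiers and the `concept_triples` function are all made up for illustration; they are not real LC URIs and this is not how LC publishes anything. It also assumes labels contain no quotes or backslashes, which a real serialiser would have to escape.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/subjects/"  # hypothetical namespace, not a real LC URI


def concept_triples(concept_id, label, broader_ids=()):
    """Return N-Triples lines describing one subject heading as a skos:Concept.

    Assumes `label` needs no escaping (no quotes, backslashes or newlines).
    """
    subject = f"<{BASE}{concept_id}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f'{subject} <{SKOS}prefLabel> "{label}"@en .',
    ]
    # Each broader term becomes a link to another addressable concept.
    for broader_id in broader_ids:
        lines.append(f"{subject} <{SKOS}broader> <{BASE}{broader_id}> .")
    return lines


triples = concept_triples("whales", "Whales", broader_ids=["cetacea"])
print("\n".join(triples))
```

The point is less the syntax than the links: once every heading has an address, anyone’s software can point at it.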

Section 3 also contains probably the biggest shocker: “Suspend further new work on RDA”. The reason is to spend time getting FRBR tested and straightened out. I hope it’s just a question of naming, as many of the people involved in the RDA stuff are now pushing actively on getting FRBR sorted. Judging by the traffic on mailing lists like NGC4LIB, the recommendation at least got everyone to sit up and listen.

The working group see the results of work done on FRBR as a tantalising sign of what could be done to really change the search experience for library users. Having done work on record clustering myself I have to agree. FRBR is only one step along that road, though. With the right model many relationships can be mined in the data, making it explorable in a way that catalogues just aren’t today.
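As a flavour of what record clustering involves, here is a deliberately naive sketch that groups records into work-like sets by a normalised author and title key. It is not the method I or anyone else actually used; the record layout and the `work_key` and `cluster` names are assumptions for the example, and real clustering has to cope with translations, uniform titles, variant names and a great deal more.

```python
import re
import unicodedata
from collections import defaultdict


def work_key(author, title):
    """Build a crude matching key: lower-case, no accents, no punctuation."""
    def norm(text):
        # Decompose accented characters, then drop the combining marks.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        # Replace anything that is not a letter, digit or space with a space.
        text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
        # Collapse runs of whitespace.
        return " ".join(text.split())
    return (norm(author), norm(title))


def cluster(records):
    """Group record ids that share the same normalised author/title key."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(clusters)


# Made-up sample records for illustration only.
records = [
    {"id": 1, "author": "Melville, Herman", "title": "Moby Dick"},
    {"id": 2, "author": "MELVILLE, HERMAN.", "title": "Moby-Dick"},
    {"id": 3, "author": "Austen, Jane", "title": "Emma"},
]
clusters = cluster(records)
print(clusters)
```

Here records 1 and 2 share a key despite differences in case and punctuation, while record 3 stands alone. Grouping like this is the easy part; the relationships between the groups are where it gets interesting.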

4. POSITION OUR COMMUNITY FOR THE FUTURE

The changing demands of library users, the increasing diversity of uses both of library resources and the metadata about them, and the recurring references to making the data machine-usable will all be very familiar to those following the biblioblogosphere. That’s a very good thing; there are a lot of smart people on the working group and many more smart people not on the working group.

I was expecting more in this section around people; they make up the community after all. Instead the report focusses on some more technical efforts to link library data with other resources, integrate user-contributed data, make FRBR happen and open LCSH up for re-use wherever desired. It seems to me that these recommendations could easily have been viewed as technology recommendations. It seems almost as though they sit as surrogates or euphemisms for what is said more openly on the mailing lists – that people will have to change. Of course many people are changing already and starting to do much of what the report discusses; others are still reluctant, unsure of the need for change and fearful of losing technologies they believe have served libraries well for forty years or more.

The recommendations in section 4, if viewed as surrogates for the real point, are in effect saying “Change is coming, these are the first changes and you need to get on board with them”. If that’s what the working group intended to say, I’d like to see it said explicitly.

If we really want an active, world-wide bibliographic community – which would be necessary for many of the recommendations in previous sections – it would be great to see some discussion of how that might coalesce. Distributed management of bibliographic data is a fine start, but Talis, OCLC and others already do that. What factors are necessary to really form a community, as Flickr, Facebook, MySpace or Second Life have?

5. STRENGTHEN THE LIBRARY AND INFORMATION SCIENCE PROFESSION

Step one on getting better at this is apparently to build an evidence base of the costs and benefits of various initiatives – that is, get professional about knowing what is worth the time, effort and money. This seems like a sensible thing to do, and it seems slightly odd that it’s not already being done. As a majority employee-owned software company we all keep an eye on what we’re doing to make sure we’re doing the most important and valuable things.

The next step is to support ongoing research – that is, if you have people in your teams interested in where this should go, let them play! Let them try stuff out, give them some time to try mining the MARC data or designing an ontology, like Martha Yee did.

Then, to ensure the future of these efforts, change LIS education, convince educators to do more on bibliographic control fundamentals and convince them to share that material widely. A nice set of efforts, but judging by the way computer science education affects the wider industry, a ten-year strategy at a minimum.

6. JFDI

During the presentation of this report to LC, the working group were challenged that LC are already doing much of this, and others are doing much of it too. That’s a good thing; it means the recommendations and our actions are in agreement.

What’s notable by its absence in the report is input and commitment from ILS vendors. As ILSs are so intimately tied with MARC and the language of MARC, very little can change without vendors’ involvement, even with the profusion of free and open-source tools we hope to see appear.

But doing much of it and having done much of it are a world apart. I think there’s still time to change and to be a part of the web. The Semantic Web offers real opportunity for libraries to have the best of both worlds, and then some. To close the stable door in time, however, will require faster change than we’ve ever seen before.

Let me know what you think of the report by commenting below.

One Response

  1. Robert Engels Says:

    Hei Rob,

    Thanks for your great summary of the report, finally acknowledgement I would say.
    Here in Norway there is much work done on these issues. For example the Archive Department of the Norwegian Broadcasting Corporation (NRK), who actually realised the problems of recommendation 1, sees the business model in recommendation nr 2 and works actively with opening up for Semantic Web scenarios as in recommendation nr 3 (see also W3C Use Case on NRK). Currently we try to position for recommendation nr 4.

    With current progress, I am looking forward to see where we are in Q2-2008!