Panlibus Blog

When is Data not Data?

Father to his son – “When is a door not a door?” Son – “I don’t know, when is a door not a door?” Father [with triumphant tone in his voice] – “When it’s ajar!” Son – “How can a door be a jar?” Father – “No, not ‘a’ jar, ‘ajar’ – it means slightly open – err, it’s a joke.” Son wanders off muttering something about parents being weird.  –  Isn’t communication wonderful – when it works!

So what has that scenario got to do with data then? –  Well, for some inexplicable reason the following popped into my head yesterday, whilst discussing why libraries are so protective about their MARC records.

When is data not data? – When it is metadata!

It came to mind again today when I read the post on Open Libraries – Open Data: What Would Kilgour Think?

The New York Public Library has reached a settlement with iBiblio, the public’s library and digital archive at the University of Chapel Hill, North Carolina, for harvesting records from its Research Libraries catalog, which it claims is copyrighted.

Heike Kordish, director of the NYPL Humanities Library, said a cease and desist letter was sent because a 1980s incident by an Australian harvesting effort which turned around and resold the NYPL records.

Simon Spero, iBiblio employee and technical assistant to the assistant vice chancellor at UNC-Chapel Hill, said NYPL requested that its library records be destroyed, and the claim was settled with no admission of wrongdoing. “I would characterize the New York Public Library as being neither public nor a library,” Spero said.

It is a curious development that while the NYPL is making arrangements under private agreements to allow Google to scan its book collection into full-text that it feels free to threaten other research libraries over MARC records.

It is interesting that Jay Datema chose to highlight the apparent contradiction of NYPL allowing Google to scan its books whilst going all legal to prevent distribution of its catalog records.

Well, they are different things – the books are the data [albeit stored in paper form before being transformed into a digital form by Google] that they are custodians of, whereas the catalog records are the metadata about what they hold.

If anything this makes the contrast even more perverse. NYPL are passing on copyrightable information [the text of the books] whilst aggressively protecting the facts about those books.

I’m going to dip my toe in legal waters here and I’m no expert on intellectual property laws, so correct me if I’m wrong.  It is my understanding that you cannot copyright a fact; a fact such as J. K. Rowling wrote Harry Potter and the Philosopher’s Stone; or the fact that it was published in 1997; or the fact that it has an ISBN of 0747532745; or the fact that NYPL has a copy in its Reference only section.   In fact [excuse pun] I fail to see anything in a library catalog record that is not a fact, and therefore come to the conclusion that catalog records are not copyrightable.

So why the fuss? What is so special about the NYPL records that caused them to ask iBiblio to cease and desist in harvesting those facts? How do they differ from the catalog records from all the other libraries?  As far as I can tell the only difference might be the fact that NYPL holds one or more copies – and why would a ‘public’ library [or any library for that matter] want that kept secret?

As I said in my post yesterday:

It is about time that the Libraries of the world moved on from jealously guarding the metadata about the knowledge that they hold, and let their librarians get back to guiding people towards and helping them interpret and interrelate the knowledge itself.

Libraries are custodians of the wealth of human knowledge, a wealth that almost certainly can be multiplied if it is known where parts of it are located and how they relate with each other.

Jay closes with:

While the purpose of releasing library data has not yet reached consensus about what will be built as a result, it can be compared to Netscape open-sourcing the Mozilla code in 2000, which eventually brought Firefox and other open source projects to light. It also shows that the financial motivations of library organization by necessity dictate the legal mechanisms of protection.

It might not be clear at the outset what good would flow from opening up the world’s library metadata, but it is a fairly safe bet that good would come out of such a move.

To paraphrase my colleague Rob Styles from his lightning talk at the code4lib Conference, we need to stop behaving like four year olds and realize the benefits of openly sharing. 

NYPL cite a previous experience when an organization harvested and then sold their records.  The fear that it might happen again is obviously driving their recent actions.  Whether reselling harvested records is right, wrong, or illegal is a question in itself, but if those records were openly and freely available for all to use, who would be able to build a business on selling what you can get for free?

Openly sharing catalog records is good for libraries, good for researchers, good for library users, good for us all.  If you have trouble sleeping at night because you are worrying about people making money out of catalog records, be assured that openly and widely sharing metadata is not good for them.

If in the end you still have residual concerns about what people might do with metadata that they source from you, there are always open licenses, such as those provided under Creative Commons or our own contribution to the licensing debate, the Talis Community License, which can protect ownership and/or the ability to charge without restricting open sharing.

A win, win, win, win situation then – so why don’t we do it?


(Photo taken by Daquella manera displayed in Flickr)



4 Responses

  1. Sarah Shreeves Says:

    I think that the one piece of a catalog record that is not a fact is the subject headings and analysis. This represents intellectual effort on the part of the creator of the catalog record. Of course, if the subject headings were originally assigned by the Library of Congress that should be open. In my experience working as an aggregator of metadata via the OAI Protocol for Metadata Harvesting, I’ve found that some institutions (usually NOT libraries actually) are reluctant to expose the subject analysis or interpretative data about objects in their collections. Also, this is a problem with A&I services who typically do some analysis of articles, and libraries can’t typically re-use that even though we could do some really cool things like in the Univ. of Wisconsin’s BibApp work.

  2. Kevin S. Clarke Says:

    I’m also not a lawyer but I think that it is more than just the subject headings. There is intellectual work in pulling together all the information in the same way that a bibliography which just consists of facts is something on its own and can be copyrighted. Now, maybe there is some difference since catalogs are databases rather than printed works but like I said… not a lawyer.

  3. Chris Rusbridge Says:

    Of course IPR rules differ from country to country, mostly now within the dead confines of WIPO treaties. Here in the UK we have a database right (a sui generis right, I believe!) as well as copyright. The GRADE project has recently done an interesting report (by Charlotte Waelde of the Edinburgh University Law School) on IPR in relation to geospatial databases, in which she comes to some interesting conclusions. See

    There have also been famous cases in the US relating to WestLaw, I think, which has rights to republish court case details (facts), but has claimed a copyright due to the intellectual effort of their arrangement. So the bar may not be very high!

    In passing, I’ll note that there used to be some difficulties in the UK about proprietary claims to MARC records “owned” by a company which I believe was a predecessor to Talis!

    I’ve just had a thought. It might be quite hard to prove which bits of a catalogue record are your intellectual effort, and which parts are simply “facts” that have been copied in. Does that mean that catalogue records become orphan works (;-)?

  4. Rob Styles Says:

    Indeed, BLCMP (Birmingham Libraries Co-operative Mechanisation Project, the organisation from which Talis evolved) used to assert ownership of records; I even have some reciprocal agreements with some of the other union catalogues, if I can find them 😉

    This has, in the past, also been confused with the fact that we provide access to many commercial datasets through Talis Base, such as NBD, BDS, BL and so on. We have a contractual and moral obligation to make users aware of the restrictions on these commercial supplies.

    In the EU we do have Database Right, legal protection for databases based purely on the effort they took to create rather than any test of originality. It is this that forms the basis for the Talis Community License; the license we use for contributions to The Talis Platform.

    The case you refer to with WestLaw in the US is interesting, as the case against LexisNexis was based on an infringement of the copyright intrinsic in WestLaw’s representation of those facts. This can be understood by analogy to musical copyright – “Old Macdonald” is not under copyright, but if I record myself singing it my recording is under copyright.

    On the subject of Geo-spatial data, The Guardian’s Free Our Data Campaign is well worth following.

    This is such an interesting area of discussion (I’m not a lawyer, but I am a pedant) and is going to have such a huge impact on libraries’ ability to really stay a part of the information community.
