Father to his son – “When is a door not a door?” Son – “I don’t know, when is a door not a door?” Father [with a triumphant tone in his voice] – “When it’s ajar!” Son – “How can a door be a jar?” Father – “No, not ‘a jar’, ‘ajar’ – it means slightly open – err, it’s a joke.” Son wanders off muttering something about parents being weird. Isn’t communication wonderful – when it works!
So what has that scenario got to do with data, then? Well, for some inexplicable reason the following popped into my head yesterday, whilst discussing why libraries are so protective of their MARC records.
“When is data not data? When it is metadata!”
It came to mind again today when I read Jay Datema’s post Open Libraries - Open Data: What Would Kilgour Think?
The New York Public Library has reached a settlement with iBiblio, the public’s library and digital archive at the University of Chapel Hill, North Carolina, for harvesting records from its Research Libraries catalog, which it claims is copyrighted.
Heike Kordish, director of the NYPL Humanities Library, said a cease and desist letter was sent because [of] a 1980s incident by an Australian harvesting effort which turned around and resold the NYPL records.
Simon Spero, iBiblio employee and technical assistant to the assistant vice chancellor at UNC-Chapel Hill, said NYPL requested that its library records be destroyed, and the claim was settled with no admission of wrongdoing. “I would characterize the New York Public Library as being neither public nor a library,” Spero said.
It is a curious development that while the NYPL is making arrangements under private agreements to allow Google to scan its book collection into full-text that it feels free to threaten other research libraries over MARC records.
It is interesting that Jay Datema chose to highlight the apparent contradiction of NYPL allowing Google to scan its books whilst going all legal to prevent the distribution of its catalog records.
Well, they are different things – the books are the data [albeit stored in paper form before being transformed into digital form by Google] of which they are custodians, whereas the catalog records are the metadata about what they hold.
If anything, this makes the contrast even more perverse. NYPL are passing on copyrightable information [the text of the books] whilst aggressively protecting the facts about those books.
I’m going to dip my toe into legal waters here, and I’m no expert on intellectual property law, so correct me if I’m wrong. It is my understanding that you cannot copyright a fact: the fact that J. K. Rowling wrote Harry Potter and the Philosopher’s Stone; or the fact that it was published in 1997; or the fact that it has an ISBN of 0747532745; or the fact that NYPL has a copy in its reference-only section. In fact [excuse the pun] I fail to see anything in a library catalog record that is not a fact, and I therefore conclude that catalog records are not copyrightable.
So why the fuss? What is so special about the NYPL records that caused them to ask iBiblio to cease and desist from harvesting those facts? How do they differ from the catalog records of all the other libraries? As far as I can tell, the only difference might be the fact that NYPL holds one or more copies – and why would a ‘public’ library [or any library, for that matter] want that kept secret?
As I said in my post yesterday:
It is about time that the Libraries of the world moved on from jealously guarding the metadata about the knowledge that they hold, and let their librarians get back to guiding people towards and helping them interpret and interrelate the knowledge itself.
Libraries are custodians of the wealth of human knowledge, a wealth that almost certainly can be multiplied if it is known where parts of it are located and how they relate with each other.
Jay closes with:
While the purpose of releasing library data has not yet reached consensus about what will be built as a result, it can be compared to Netscape open-sourcing the Mozilla code in 2000, which eventually brought Firefox and other open source projects to light. It also shows that the financial motivations of library organization by necessity dictate the legal mechanisms of protection.
It might not be clear at the outset what good would flow from opening up the world’s library metadata, but it is a fairly safe bet that good would come of such a move.
To paraphrase my colleague Rob Styles from his lightning talk at the code4lib conference, we need to stop behaving like four-year-olds and realize the benefits of sharing openly.
NYPL cite a previous experience in which an organization harvested and then sold their records. The fear that it might happen again is obviously driving their recent actions. Whether reselling harvested records is right, wrong, or illegal is a question in itself, but if those records were openly and freely available for anyone to use, who would be able to build a business selling what you can get for free?
Openly sharing catalog records is good for libraries, good for researchers, good for library users, good for us all. If you have trouble sleeping at night because you are worrying about people making money out of catalog records, be assured that openly and widely sharing metadata is not good news for them.
If in the end you still have residual concerns about what people might do with metadata that they source from you, there are always open licenses, such as those provided by Creative Commons, or our own contribution to the licensing debate, the Talis Community License, which can protect ownership and/or the ability to charge without restricting open sharing.
A win, win, win, win situation, then – so why don’t we do it?
(Photo by Daquella manera, on Flickr)