Panlibus Blog

Archive for the 'RDF' Category

What interests 250+ librarians at 8:30 on a Sunday morning

Linked Data, that’s what!

I must admit I was a little skeptical of the timing when I accepted the invitation to provide the keynote for a Linked Data session – on the last day of IFLA 2010 – at 8:30 in the morning – in August – on a Sunday.  Who was going to want to get up at that time, on the day they were probably going to leave beautiful Gothenburg, to hear me witter on about the Semantic Web and the obvious benefits of Linked Data for libraries? A few minutes before the start, I was beginning to think my skepticism was well founded, viewing the acres of empty seats laid out in their menacing ranks in front of me. But then almost as if from nowhere, the room rapidly filled and by the time I took the stage we had something approaching a full house.  As you can see from my iPhone snap below, we ended up with a significant group (I lost count at about 250) of interested librarians.

250+ Librarians in Gothenburg

So was it worth them turning up at such an unsociable time? I obviously can’t speak for my session, but I believe it was well worth turning up. We had a series of talks ranging from the in-depth technical and ontological end of the spectrum to a rousing plea to open up your data now, and not to hamper it with too much licensing.

First on after my session was Gordon Dunsire from the University of Strathclyde, who gave us some in-depth reasoning as to why we need complex, detailed ontologies based upon standards like RDA, FRBR, and FRAD to describe library resources in RDF for the Semantic Web. To represent the full detail that catalogers have provided, and want to provide, in resource description, I agree with him. I also believe that we need to temper that detailed view by including more generic ontologies as well. People from outside the library world, dipping into library data [with more ways to describe a title than there are flavors of ice cream], will back off and not link to it unless they can find a nice friendly dc:title or foaf:name that they understand.
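To make the point concrete, here is a minimal Python sketch of a record that carries both a detailed, library-specific property and the generic dc:title and foaf:name properties mentioned above, serialised as N-Triples. The `example.org` URIs and the `titleProper` property are hypothetical stand-ins, not real RDA or FRBR terms; only the Dublin Core and FOAF property URIs are real.

```python
# Minimal sketch: one work described with a detailed (hypothetical) property
# plus generic, widely understood Dublin Core and FOAF properties.

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
# Hypothetical stand-in for a detailed RDA/FRBR-style property (not a real term).
DETAILED_TITLE = "http://example.org/detailed-ontology/titleProper"


def literal(text: str) -> str:
    """Serialise a string as an N-Triples plain literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement with a URI subject and predicate."""
    return f"<{subject}> <{predicate}> {obj} ."


def describe_work(work_uri: str, title: str, creator_uri: str, creator_name: str) -> list[str]:
    """Return triples giving the title twice: once in detail, once generically."""
    return [
        triple(work_uri, DETAILED_TITLE, literal(title)),  # for specialist consumers
        triple(work_uri, DC_TITLE, literal(title)),         # for everyone else
        triple(creator_uri, FOAF_NAME, literal(creator_name)),
    ]


if __name__ == "__main__":
    for line in describe_work("http://example.org/work/1", "Moby Dick",
                              "http://example.org/person/1", "Herman Melville"):
        print(line)
```

An outside developer who knows nothing of the detailed ontology can still pick up the dc:title statement and link to the work.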

Other talks I caught included Patrick Danowski’s entertaining presentation entitled “Step 1: Blow up the silo!”. He took us through the possible licenses for sharing data, only to conclude that the best approach was to make it totally open, in the public domain. He then went on to recommend CC0 and/or PDDL as the best way to indicate that your data is open for anyone to do anything with.

Jan Hanneman from the German National Library delivered an interesting description [pdf] of the way they have been publishing their authority data as Linked Data, and the challenges they met on the way. These included legal and licensing issues around what they could publish, and under what terms. Scalability of their service will be another key issue once they move beyond authority data.

All in all it was an excellent Sunday morning in Gothenburg.  I presume the organizers of IFLA 2011 will take note of the interest and build a larger, more convenient, slot in the programme for Linked Data.

Note: My presentation slides can be viewed on Slideshare and downloaded in PDF form.

Linked Data and Libraries – videos published

The Linked Data and Libraries event held at the British Library last month was very successful, attended by many people interested in the impact and possibilities of these new techniques and technologies for libraries.

Many travelled from the far corners of the UK and Europe, but from the several emails I received it was clear that many others could not make it.  To that end we took along the technology to capture as much of the event as possible.

The videos have now been edited and published on our sister blog, Nodalities, where you will also find links to the associated presentation slides. I can highly recommend these as an introduction to the topic and an overview of the thinking and activities in this area from organisations such as the British Library and Europeana.

Linking the data dots …

I joined the world of libraries some twenty years ago and was immediately impressed with its implicit mission to share the knowledge and information that libraries look after with everyone, for the benefit of mankind. Being part of an organisation that was built out of a desire for libraries to share the data that backs up that knowledge, I was surprised how insular and protectionist about that data this world can be. Some of the reasons behind this were technological. Anyone who played with Z39.50 and other similar, so-called, standards before the turn of the century will tell you war stories about the joys of sharing data. Then along came the web [I know it started in the mid 1990s, but it really came along after the dot-com boom] and things started to get much easier, from a technology point of view anyway.

So why do libraries make it so difficult to share the data that they hold? I don’t mean the stuff that they hold; that is a whole other can of worms. I mean the data about the stuff they hold: the data that helps mankind find what mankind can benefit from. I know we have a lot of history to get over, but the commercial world recognised ages ago that if you share information about what you have and what you do, people find what they are looking for and you get more business.

However, ranting negatively about the insular outlook of some in library land was not my purpose in writing this post. You may know that I, and Talis, are involved in the emerging world of Linked Data. Over recent months I have found myself immersed in other parallel universes such as national and local government, newspapers, broadcast media, and finance systems. It was therefore a great pleasure to find myself organising a Linked Data and Libraries event at the British Library last week. Sir Tim Berners-Lee’s vision of a Web of Data, complementing the current web of documents and using a collection of standards and techniques known as Linked Data, is all about sharing and linking data across organisations and sectors for the benefit of mankind. Sound familiar?

It was very refreshing to see the amount of interest this day attracted. As you will see from the presentations from the day, made available via our sister Nodalities blog, many libraries and library organisations are actively engaged with this. Several, such as the German National Library, have released traditional bibliographic data (sourced from MARC records) in a Linked Data form using RDF. Others, such as VIAF hosted by OCLC and the Library of Congress Authorities, are providing RDF as one of the formats openly available from their services. The Bibliothèque nationale de France is in the process of inviting tenders for an entirely new system to open up its data and holdings as Linked Data.

It is fair to say that most of these initiatives are coming from national, international, large, and cooperative libraries, but the interest is already trickling down to smaller libraries, especially in the academic sector. It is also fair to say that most of those engaged in thinking about Linked Data and libraries take on board Sir Tim’s point that many of the benefits will come from linking data between sectors: libraries and science and government and the media and commerce and education and leisure and…

So despite my frustrations with the library world, still very evident in some circles, I am becoming more positive about libraries being able to fulfil their mankind-benefiting mission as the web of data emerges. Changing influential attitudes, and the move to different underlying data formats, may help us leave behind some emotional and legal baggage.

Anyone who has followed us for a while will know that we have been banging on about, and implementing, Semantic Web and Linked Data techniques and technologies for many years. It is great when others start to ‘get it’ and you stop having to be one of a few voices in the wilderness. These changes will not happen to all libraries overnight, but it is nice to swap frustration at the lack of vision and ambition for frustration at a lack of progress, something I think I was born with.

Library of Congress launch Linked Data Subject Headings

Back in December I was very critical of the Library of Congress for forcing the takedown of the Linked Data service at lcsh.info. LoC employee, and Talking with Talis interviewee, Ed Summers had created a powerful and useful demonstration of how applying Linked Data principles to a LoC dataset such as the Library of Congress Subject Headings could deliver an open asset to add value to other systems. Almost immediately after its initial release, another Talking with Talis interviewee, Martin Malmsten from the Royal Library of Sweden, made use of the links to the LCSH data. Ed was asked to take the service down, ahead of the LoC releasing its own equivalent in the future.

I still wonder at the LoC approach to this, but that is all water under the bridge now, as they have now launched their service, under the snappy title of “Authorities & Vocabularies” at http://id.loc.gov/authorities/.

The Library of Congress Authorities and Vocabularies service enables both humans and machines to programmatically access authority data at the Library of Congress via URIs.

The first release under this banner is the aforementioned Library of Congress Subject Headings.

As well as delivering access to the information via a Linked Data service, they also provide a search interface, and a ‘visualization’ via which you can see the relationship between terms, both broader and narrower, that are held in the data.
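The broader/narrower navigation can be sketched in a few lines. This minimal Python example assumes the relationships are expressed with the SKOS `skos:broader` property, the vocabulary commonly used for publishing subject headings as Linked Data; the concept URIs are hypothetical and not taken from LCSH. It inverts `broader` links to find narrower terms and walks upwards to collect every broader ancestor.

```python
from collections import defaultdict

SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def narrower_index(triples):
    """Invert skos:broader statements into a map: concept -> narrower concepts."""
    narrower = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SKOS_BROADER:
            narrower[obj].add(subject)
    return narrower


def ancestors(triples, concept):
    """Collect every concept reachable by following skos:broader upwards."""
    broader = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SKOS_BROADER:
            broader[subject].add(obj)
    seen, stack = set(), [concept]
    while stack:
        for parent in broader[stack.pop()]:
            if parent not in seen:  # guard against cycles in messy data
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Hypothetical concepts: a is narrower than b, which is narrower than c.
    A, B, C = (f"http://example.org/concept/{c}" for c in "abc")
    data = [(A, SKOS_BROADER, B), (B, SKOS_BROADER, C)]
    print(sorted(narrower_index(data)[C]))  # concepts directly narrower than C
    print(sorted(ancestors(data, A)))       # B and C
```

Storing only one direction of the relationship and deriving the other keeps the data consistent; a visualisation like the one described can be built on top of exactly these two lookups.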

To quote Jonathan Rochkind “id.loc.gov is AWESOME”:

Not only is it the first (so far as I know) online free search and browse of LCSH (with in fact a BETTER interface than the proprietary for-pay online alternative I’m aware of).

But it also gives you access to the data itself via BOTH a bulk download AND some limited machine-readable APIs. (RSS feeds for a simple keyword query; easy lookup of metadata about a known-item LCSH term, when you know the authority number; I don’t think there’s a SPARQL endpoint? Yet?).

On the surface, to those not yet bought into the potential of Linked Data, and especially Linked Open Data, this may seem like an interesting but not necessarily massive leap forward. I believe, however, that what underpins the fairly simple functional user interface they provide will gradually become core to bibliographic data becoming a first-class citizen in the web of data.

Overnight, the URI ‘http://id.loc.gov/authorities/sh85042531’ has become the globally available, machine- and human-readable, reliable source for the description of the subject heading ‘Elephants’, containing links to its related terms (in a way that both machines and humans can navigate). This means that system developers and integrators can rely upon that URI to identify the concept, regardless of the way they want to describe it locally. This should make it simple for disparate systems and services to share concepts, and therefore understanding, which is one of the basic principles behind the Semantic Web.
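A minimal Python sketch of what “machine-readable via a URI” means in practice: the same URI a person opens in a browser is requested with an HTTP `Accept` header asking for RDF/XML instead of HTML. This assumes the service honours content negotiation for that media type; URI paths on id.loc.gov may have been reorganised since this post, in which case the request may be redirected or fail.

```python
import urllib.request

# URI quoted in the post; the service may since have moved or redirected it.
ELEPHANTS = "http://id.loc.gov/authorities/sh85042531"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF/XML via HTTP content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri: str, timeout: float = 10.0) -> bytes:
    """Dereference the URI; urlopen follows any HTTP redirects automatically."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()
```

Calling `fetch_rdf(ELEPHANTS)` needs network access. The design point is that there is no separate API endpoint to learn: the identifier itself is the access point, and only the requested representation changes.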

This move by the LoC has two aspects that should make it a success. The first is technical. Adopting the approach, standards, and conventions promoted by the Linked Data community ensures a ready-made developer community to use it and spread the word about it. The second is openness. No one will have to ask “is it OK to use this stuff?” before taking advantage of this valuable asset. Many in the bibliographic community, who seem to spend far too much time on licensing and logins, should watch and learn from this.

A bit of a bumpy ride to get here, but nevertheless a great initiative from the LoC that should be welcomed. I hope they, and many others, will build upon it in many ways. Bring on the innovation that this will encourage.

Image from the Library of Congress Flickr photostream.

UKSG09 Uncertain vision in sunny Torquay

Glorious sunshine greeted the opening of the first day of UKSG 2009 in Torquay yesterday. The stroll along the seafront from the conference hotel (Grand in name and in all facilities except Internet access: £1/minute for dialup indeed!) was in delightfully sharp contrast to the often depressing plane and taxi rides to downtown conference centres.

The seaside theme was continued with the bright conference bags; someone had obviously got hold of a job lot of old deckchair canvas. More than 700 academic librarians, publishers, and supplier representatives settled down in the auditorium of the Riviera Centre to hear about the future of their world.

The first keynote speakers were very different in topic and delivery, but all three left you with the impression of change coming over the next few years, the shape of which they were not totally sure.

First up was Knewco Inc’s Jan Velterop, whose pitch was a somewhat meandering treatise on the wonders and benefits of storing metadata in triples, something he kept saying he would explain later. The Twitter #uksg09 channel was screaming “when is he going to tell us about triples” and “what’s a triple” whilst he was talking. He eventually got there, but I’m not sure how many of the audience understood the massive benefits of storing and linking data in triples that we at Talis are fully aware of. Coincidentally, for those who did get his message, I was posting about the launch of the Talis Connected Commons for open, free storage of data (in triples) in the Talis Platform.

Next up was Sir Timothy O’Shea from the University of Edinburgh, who talked about the many virtual things they are doing up in Scotland. You can take your virtual sheep from your virtual farm to the virtual vet, and even on to a virtual post mortem. His picture of the way information technology is playing its part in changing life at the university, apart from being a great sales pitch for it, led him to predict that these were only the early stages of a massive revolution. As to where it was going to lead us in a few years, he was less clear.

Joseph Janes, of the University of Washington Information School, was one of those great speakers who dispense with any visual aids or prompts, and he delivered a very entertaining 30 minutes comparing our entry into this new world of technology-enhanced information access with his experience as an American wandering around a British seaside town. His message was that we should expect the next few years to feel very similar on the surface, as we will recognise most of the components, but to be actually very different when you analyse them. As an American he recognises cars, buses, adverts, and food, but in Britain the cars travel on the wrong side of the road, the buses are different shapes, and the adverts and food feature products he doesn’t recognise. As we travel into an uncertain but exciting future, don’t be fooled by recognising a technology; watch how it is being used.

A great start to the day, which included a good break-out session from Huddersfield’s Dave Pattern. He ended his review of OPACs, and predictions about the development of OPAC 2.0 and beyond, with a heads-up about my session today, which caused me to spend a couple of hours in the hotel bar, the only place with Wi-Fi, tweaking my slides. It would be much easier to follow Mr Janes’ example and deliver my message off the cuff without slides; not this time, perhaps 😉

Looking forward to another good day – even if the sun seems to have deserted us.

Sharing Usage Data – Dave Pattern & Patrick Murray-John Talk with Talis

My guests for this Talking with Talis podcast demonstrate a great example of how openly sharing data will stimulate innovation.

Last month, Huddersfield University’s Dave Pattern announced that he was sharing usage data derived from circulation transactions held in their Library Management System:

I’m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we’ve just released a major portion of our book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13 year period.

Within a matter of days, Patrick Murray-John from Mary Washington University had taken a copy of that data, transformed it to RDF, and published it in a Semantic Web form.

In this conversation we explore the motivations behind Dave’s work and the benefits to the sharing process of the Open Data Commons license he chose to release the data under. Patrick then takes us through how he worked with the data and demonstrates how simple it was to produce an RDF version.
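Patrick’s actual conversion is not described here, so the following is only a hypothetical Python sketch of the general idea: turning rows of tabular usage data into RDF statements in N-Triples form. The CSV column names (`isbn`, `loans`), the `example.org` URI pattern, and the `totalLoans` property are all invented for illustration.

```python
import csv
import io

# Hypothetical URI pattern and vocabulary, invented for illustration.
BOOK_BASE = "http://example.org/book/isbn/"
TOTAL_LOANS = "http://example.org/usage#totalLoans"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def usage_csv_to_ntriples(csv_text: str) -> list[str]:
    """Convert rows with hypothetical 'isbn' and 'loans' columns into N-Triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BOOK_BASE}{row['isbn'].strip()}>"
        loans = int(row["loans"])  # fails loudly on malformed counts
        triples.append(f'{subject} <{TOTAL_LOANS}> "{loans}"^^<{XSD_INTEGER}> .')
    return triples


if __name__ == "__main__":
    sample = "isbn,loans\n9780000000001,12\n9780000000002,3\n"
    print("\n".join(usage_csv_to_ntriples(sample)))
```

Minting one URI per book from a stable identifier such as the ISBN is what lets other people’s data link to the same resource.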

We then explore how the principles demonstrated by their work could be expanded upon to add wider value to the library scene, from recommender systems to a sales aid for universities trying to attract students.

Dave Pattern challenges libraries to open their goldmine of data

The simple title of Dave’s recent blog post ‘Free book usage data from the University of Huddersfield’ hides the significance of what he is announcing.

I’m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we’ve just released a major portion of our book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13 year period.

13 years’ worth of library circulation data opened up for anyone to use: he is right about it being unthinkable a few years ago. I suspect that for many it is still unthinkable now, and I would ask them: why not?

In isolation the University of Huddersfield’s data may only be of limited use, but if others did the same, the potential for trend analysis, and the ability to offer recommendations and who-borrowed-this-borrowed-that services, could be significant.
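The who-borrowed-this-borrowed-that idea boils down to counting co-occurrences. This minimal Python sketch assumes anonymised loan histories grouped per borrower, a hypothetical input shape (Huddersfield’s published data may be structured differently), and ranks the titles most often borrowed alongside a given title.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_borrow_counts(loans_by_borrower: dict[str, list[str]]) -> dict[str, Counter]:
    """Count how often each pair of titles was borrowed by the same (anonymised) borrower."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for titles in loans_by_borrower.values():
        # Deduplicate and sort so each unordered pair is counted once per borrower.
        for a, b in combinations(sorted(set(titles)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts: dict[str, Counter], title: str, n: int = 3) -> list[str]:
    """Return up to n titles most often borrowed alongside `title`."""
    return [other for other, _ in counts.get(title, Counter()).most_common(n)]


if __name__ == "__main__":
    loans = {"u1": ["A", "B", "C"], "u2": ["A", "B"], "u3": ["A", "D"]}
    counts = co_borrow_counts(loans)
    print(recommend(counts, "A", 1))  # ['B']: borrowed with A by two borrowers
```

Pooling counts from several libraries is just a matter of merging these Counters, which is why shared data makes the recommendations stronger than any one library’s data alone.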

If you have 14 minutes to spend, I would recommend viewing Dave’s slidecast from the recent TILE project meeting, where he announced this, so you can see how he uses this data to add value to the Huddersfield University search experience.

Patrick Murray-John picked up on Dave’s announcement and within a couple of days had produced an RDF-based view of this data. I recommend you download the Tabulator Firefox plug-in to help you navigate his data.

Patrick was alerted to Dave’s announcement by Tony Hirst who amplified Dave’s challenge “DON’T YOU DARE NOT DO THIS…”

As Dave puts it, your library is sitting on a goldmine of useful data that should be mined (and refined by sharing with that of other libraries).  A hat tip to Dave for doing this, and another one for using a sensible open licence to do it with.

Picture published by ToOliver2 on Flickr

Semantic Future for Libraries – Martin Malmsten Talks with Talis

Martin Malmsten is from the LIBRIS department of the Royal Library of Sweden, LIBRIS being the discovery interface for the library.

Since joining as a software developer, he has been absorbed into the world of library search and discovery. He played a major part in the build and launch of the latest LIBRIS search interface, which has introduced some Semantic Web and Linked Data features under the surface.

We discuss his career, the use of User Centered Design & Iterative Development methodologies, the Semantic Web techniques and technologies he used, and their future applicability to the library domain.

Items discussed in our conversation:


Ed Summers Talks with Talis

Ed Summers has recently been active in exposing Library of Congress Subject Heading data as Linked Data, using Semantic Web technologies and RDF, through his experimental service at lcsh.info.

In this conversation we find out how Ed’s career, not always on a traditional library path, has led him to his work in the Library of Congress, his pragmatic interest in things Semantic Web, and why he has needed to experiment outside of the LoC.

In this conversation we reference:

This conversation was conducted as a Skype call on Thursday 26th June 2008, recorded with Ecamm Network’s Call Recorder for Skype, and edited on a Mac with GarageBand.


The Bibliographic Ontology 1.0 – published.

The Bibliographic Ontology 1.0 at Frédérick Giasson’s Weblog

Frédérick Giasson announces the release of BIBO:

After months of development and nearly 1000 messages on the mailing list exchanged between 83 participants, the first version of The Bibliographic Ontology has just been published.

All the background documentation and the specification can be found at bibliontology.com

From the abstract:

The Bibliographic Ontology Specification provides main concepts and properties for describing citations and bibliographic references (i.e. quotes, books, articles, etc) on the Semantic Web.

As Frédérick says in his post, this is a milestone, not just for the project but for those wishing to describe bibliographic things in the Semantic Web. As he also says, this is a beginning, not an end: a really good beginning, and a hat tip to all those who have put the hours in.
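For a feel of what describing a bibliographic thing with BIBO looks like, here is a minimal Python sketch that emits N-Triples for a book using the `bibo:Book` class and the `bibo:isbn13` property, alongside Dublin Core terms for title and creator. The book and person URIs, the title, and the ISBN are hypothetical placeholders; check the published specification for the exact terms you need.

```python
# Namespace URIs for RDF, the Bibliographic Ontology (BIBO), and DCMI Terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"


def literal(text: str) -> str:
    """Serialise a string as an N-Triples plain literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def book_triples(book_uri: str, title: str, isbn13: str, creator_uri: str) -> list[str]:
    """Describe one book with bibo:Book, bibo:isbn13, dcterms:title and dcterms:creator."""
    s = f"<{book_uri}>"
    return [
        f"{s} <{RDF_TYPE}> <{BIBO}Book> .",
        f"{s} <{DCTERMS}title> {literal(title)} .",
        f"{s} <{BIBO}isbn13> {literal(isbn13)} .",
        f"{s} <{DCTERMS}creator> <{creator_uri}> .",
    ]


if __name__ == "__main__":
    # Placeholder URIs and ISBN, for illustration only.
    for line in book_triples("http://example.org/book/1", "An Example Title",
                             "9780000000001", "http://example.org/person/1"):
        print(line)
```

BIBO deliberately reuses existing vocabularies such as Dublin Core where they fit, so generic consumers can still read the title and creator.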
