Panlibus Blog

Archive for the 'Linked Data' Category

What interests 250+ librarians at 8:30 on a Sunday morning

Linked Data, that’s what! 

I must admit I was a little skeptical of the timing when I accepted the invitation to provide the keynote for a Linked Data session – on the last day of IFLA 2010 – at 8:30 in the morning – in August – on a Sunday.  Who was going to want to get up at that time, on the day they were probably going to leave beautiful Gothenburg, to hear me witter on about the Semantic Web and the obvious benefits of Linked Data for libraries? A few minutes before the start, I was beginning to think my skepticism was well founded, viewing the acres of empty seats laid out in their menacing ranks in front of me. But then almost as if from nowhere, the room rapidly filled and by the time I took the stage we had something approaching a full house.  As you can see from my iPhone snap below, we ended up with a significant group (I lost count at about 250) of interested librarians.

250+ Librarians in Gothenburg

So was it worth them turning up at such an unsociable time?  I obviously can’t speak for my session, but I believe it was well worth turning up.  We had a series of talks which ranged from the in-depth technical/ontological end of the spectrum to a rousing plea to open up your data now, and not to hamper it with too much licensing.

First on after my session was Gordon Dunsire from the University of Strathclyde, who gave us some in-depth reasoning as to why we needed complex, detailed ontologies, based upon standards like RDA, FRBR and FRAD, to describe library resources in RDF for the Semantic Web.  If we are to represent the full detail that catalogers have provided, and want to provide, in resource description, I agree with him.  I also believe that we need to temper that detailed view by including more generic ontologies as well. People from outside the library world, dipping into library data [with more ways to describe a title than there are flavors of ice cream], will back off and not link to it unless they can find a nice friendly dc:title or foaf:name that they understand.
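To make the "detailed plus friendly" idea concrete, here is a minimal sketch that describes a work with a fine-grained property and then repeats the same value with the generic dc:title, linking to the author who carries a foaf:name. It emits N-Triples using only the standard library. The Dublin Core and FOAF namespace URIs are the real published ones; the "detailed" namespace, the titleProper property and all the example.org URIs are hypothetical placeholders, not anything from RDA or from Gordon's talk.

```python
# Vocabulary namespaces. Dublin Core elements and FOAF are real, published
# vocabularies; DETAILED is a hypothetical stand-in for a richer,
# RDA-style element set.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
DETAILED = "http://example.org/detailed-elements/"  # hypothetical


def literal(text: str) -> str:
    """Serialise a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement; subject and predicate are bare URIs."""
    return f"<{subject}> <{predicate}> {obj} ."


def describe(work_uri: str, title: str, author_uri: str, author_name: str) -> list[str]:
    """Describe a work with a detailed property plus generic, widely understood ones."""
    return [
        # The fine-grained (hypothetical) property a cataloguer might insist on.
        triple(work_uri, DETAILED + "titleProper", literal(title)),
        # The same value again, using a property any outsider will recognise.
        triple(work_uri, DC + "title", literal(title)),
        triple(work_uri, DC + "creator", f"<{author_uri}>"),
        triple(author_uri, FOAF + "name", literal(author_name)),
    ]


# Hypothetical example URIs, for illustration only.
for line in describe("http://example.org/work/huck-finn",
                     "Adventures of Huckleberry Finn",
                     "http://example.org/person/twain",
                     "Mark Twain"):
    print(line)
```

The cost is a little redundancy in the data; the benefit is that someone who only knows Dublin Core and FOAF can still find a title and a name to link to.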

Other speakers I caught included Patrick Danowski, whose entertaining presentation was entitled “Step 1: Blow up the silo!”.  He took us through the possible licenses to use for sharing data, only to conclude that the best approach was totally open public domain.  He then went on to recommend CC0 and/or PDDL as the best way to indicate that your data is open for anyone to do anything with.

Jan Hanneman from the German National Library delivered an interesting description [pdf] of the way they have been publishing their authority data as Linked Data, and the challenges they met on the way.  These included legal and licensing issues around what they could publish, and under what terms.  Scalability of their service is another key issue as they move beyond authority data.

All in all it was an excellent Sunday morning in Gothenburg.  I presume the organizers of IFLA 2011 will take note of the interest and build a larger, more convenient, slot in the programme for Linked Data.

Note: My presentation slides can be viewed on Slideshare and downloaded in pdf form

Linked Data and Libraries – videos published

The Linked Data and Libraries event, held at the British Library last month, was very successful, attended by many people interested in the impact and possibilities of these new techniques and technologies for libraries.

Many travelled from the far corners of the UK and Europe, but from the several emails I received it was clear that many others could not make it.  To that end we took along the technology to capture as much of the event as possible.

The videos have now been edited and published on our sister blog, Nodalities, where you will also find links to the associated presentation slides.  I can highly recommend these as an introduction to the topic and an overview of the thinking and activities in this area from organisations such as the British Library and Europeana.

Linking the data dots …

I joined the world of libraries some twenty years ago and was immediately impressed with its implicit mission to share the knowledge and information that libraries look after with all, for the benefit of mankind.  Being part of an organisation that was built out of a desire for libraries to share the data that backs up that knowledge, I was surprised how insular and protectionist with that data this world can be.  Some of the reasons behind this were technological.  Anyone who played with Z39.50 and other similar so-called standards before the turn of the century will tell you war stories about the joys of sharing data.  Then along came the web [I know it started in the mid 1990s, but it really came along post the .com boom] and things started to get much easier, from a technology point of view anyway.

So why do libraries make it so difficult to share the data that they hold?  This isn’t the stuff that they hold; that is a whole other can of worms.  This is the data about the stuff they hold, the data that helps mankind find what mankind can benefit from.  I know we have a lot of history to get over, but the commercial world recognised ages ago that if you share information about what you have and what you do, people find what they are looking for and you get more business.

However, ranting negatively about the insular outlook of some in library land was not my purpose in writing this post.  You may know that I, and Talis, are involved in the emerging world of Linked Data.  Over recent months I have found myself immersed in other parallel universes such as National and Local Government, newspapers, broadcast media, and finance systems.  It was therefore a great pleasure to find myself organising a Linked Data and Libraries event at the British Library last week.  Sir Tim Berners-Lee’s vision of a Web of Data, complementing the current web of documents and utilising a collection of standards and techniques known as Linked Data, is all about sharing and linking data across organisations and sectors for the benefit of mankind.  Sound familiar?

It was very refreshing to see the amount of interest this day attracted.  As you will see from the presentations from the day, made available via our sister Nodalities blog, many libraries and library organisations are actively engaged with this.  Several, such as the German National Library, have released traditional bibliographic data (sourced from Marc records) in a Linked Data form using RDF.  Others, such as VIAF (hosted by OCLC) and the Library of Congress Authorities, are providing RDF as one of the formats openly available from their services.  The Bibliothèque nationale de France is in the process of inviting tenders for an entire new system to open up their data and holdings as Linked Data.

It is fair to say that most of these initiatives are coming from national, international, large and cooperative libraries, but the interest is already trickling down to smaller libraries, especially in the academic sector.  It is also fair to say that most who are engaged in thinking about Linked Data and libraries take on board Sir Tim’s point about many of the benefits coming from the linking of data between sectors – libraries and science and government and the media and commerce and education and leisure and…

So despite my frustrations about the library world, still very evident in some circles, I am becoming more positive about libraries being able to fulfil their mankind benefiting mission as the web of data emerges.  The changing of influential attitudes, and the move to different underlying data formats, may help us leave behind some emotional and legal baggage.

Anyone who has followed us for a while will know that we have been banging on about, and implementing, semantic web and linked data techniques and technologies for many years.  It is great when others start to ‘get it’ and you stop having to be one of a few voices in the wilderness.  These changes will not happen to all libraries overnight, but it is nice to swap frustration at the lack of vision and ambition for frustration at a lack of progress – something I think I was born with.

Understanding the Semantic Web – 4

In this final post exploring questions and themes around Karen Coyle’s Understanding the Semantic Web report, I want to look at the coexistence of the universal and the particular in the Semantic Web. This is something that Karen touches on in her report:

Of course, it is not reasonable to assume a single system of identifiers for everything on the Web. Undoubtedly, different communities will assign identifiers of their own, some overlapping with those of another community.

This may seem reasonable in the year 2010. However, a glance back at the history of ideas might make us a little more circumspect. Certainly, for the last three centuries, the dominant paradigm among scientists and intellectuals has been the universal over the particular. The universal is what Karen is referring to with “a single system of identifiers for everything on the Web.”

But in this postmodern moment, there is a strong suspicion of universalism, as articulated by Karen, and a favouring of the local and particular. We have seen this in the flourishing of folksonomies alongside more authoritative taxonomies.

Making sense of the world

Why belief in universal values broke down is an intellectual question far too big for this post. The answer would involve the horrors of two world wars, the end of the Cold War signalling the demise of big ideas, and other more local crises such as May 1968 in France and the US’s Vietnam War, all of which contributed to a sense of disillusion with the certainties of the old order and a distrust of grand narratives.

We only have to look back at the archetypal Victorian amateur botanist to see how much things have changed. He would have been concerned with the coining of universal terms to describe objective features, with orders and classes depending on the number and position of male and female organs of the flower, for example – in other words universally recognised objective features.

In fact, the relationship between the universal and the particular has always been in flux. The universal is not dead; it is alive in the technology domain at least, where a machine will either work or not work and is not subject to personal interpretation. Even in the heyday of the Victorian botanist, there were subtle processes undermining a universalist paradigm, although these would not reach their full expression until the 20th century. There were also vestiges of a particularist past and present, in more folkloric approaches to botany which occasionally contradicted universalist findings – where superficial impressions might lead to a categorisation that would later be disproven by a DNA examination revealing radical differences due to divergent evolutionary lineages.

It is the ability to manage this constantly shifting emphasis between the universal and the particular which, for me, marks the Semantic Web as an emerging technology with serious potential for longevity. It will enable the coexistence of local and particular frameworks with “switching stations” as Karen calls them, or linking hubs “that gather identifiers that are equivalent or nearly so” to enable interoperability.
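For the more technically minded, one way to picture a "switching station" is as a structure that gathers identifiers asserted to be equivalent (in the spirit of owl:sameAs links) into groups. The sketch below is a minimal, assumption-laden illustration using a union-find (disjoint-set) structure; it is not how Karen's report or any real hub works, and the identifiers are hypothetical. It only models exact equivalence; the "nearly so" case would need separate, weaker links (something like skos:closeMatch) that are deliberately not merged.

```python
class LinkingHub:
    """A toy 'switching station': groups identifiers asserted to be equivalent.

    Uses a union-find (disjoint-set) structure, so equivalence is transitive:
    if A = B and B = C are asserted, A, B and C end up in one group.
    """

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, ident: str) -> str:
        """Return the representative of ident's group, registering it if new."""
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk straight at the root.
        while self._parent[ident] != root:
            next_ident = self._parent[ident]
            self._parent[ident] = root
            ident = next_ident
        return root

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b refer to the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident: str) -> set[str]:
        """All identifiers known to be equivalent to ident (including itself)."""
        root = self._find(ident)
        return {i for i in list(self._parent) if self._find(i) == root}


# Hypothetical identifiers from three different communities.
hub = LinkingHub()
hub.link("lcsh:example-subject", "viaf:example-123")
hub.link("viaf:example-123", "dnb:example-456")
print(sorted(hub.equivalents("dnb:example-456")))
# → ['dnb:example-456', 'lcsh:example-subject', 'viaf:example-123']
```

Each community keeps its own local identifiers; the hub simply records which ones line up, which is the coexistence of the particular and the universal in miniature.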

Crucially, linked data has the flexibility to manage these historical fluxes, just in case the postmodern moment passes and humanity recovers its confidence in grander truths. The Semantic Web can be agnostic in that sense, and that is a good thing, as the determinants arguably lie beyond the technological realm for the most part.

Stuck in the middle with you

In the meantime, the library finds itself uncomfortably positioned right on the interface between the universal and the particular. It has historically served as intermediary between the two – with the library standing as a repository of an authoritative and universal view of the world and both the library and librarians helping individuals move from their particular viewpoint into the wider world of knowledge. This is what the traditional public library gave to the likes of Michael Caine, widening their world and enabling them to transcend the circumstances of their particular upbringing. It is incorrect, in my view, to believe that technology alone has been responsible for this disintermediation – there are broader cultural and intellectual forces at work.

Local perspectives have gradually assumed a greater importance over time, and the apotheosis of this – the library user as Google searcher – has greatly undermined the confidence of the library, wrenching it from its comfortable intermediary position. Whether the library can redefine itself in this context is something many of us have been asking for a long time now. But the good news, for me, is that the Semantic Web has the potential to safeguard the universal whilst giving expression to the local and particular, the latter being important in any intellectual climate.

The library world has to date been only subconsciously aware of the size and shape of these forces, and has intermittently adopted defensive ideas in response. To understand is the first step, and maybe the second step is to see the Semantic Web as a valuable tool in the repositioning of the library in a global setting of multiple particulars in which universal values are still out there somewhere.

Will Linked Data mean an early end for Marc & RDA?

For the uninitiated, NGC4LIB is a library-focused mailing list which has a reputation for often engaging in massive discussions and disagreements around the minutiae of future cataloguing and library-focused metadata practices.  They have recently been involved in one of these great debates, stimulated by the comments of Sir Tim Berners-Lee in a recent interview.  As is often the case on this list, the debate wandered well off topic into the realms of FRBR and its alternatives before being brought back on topic by Jim Weinheimer, who started the conversation in the first place.

A statement in Jim’s contribution caught my eye:

Implementing linked data, although it would be great, is years and years away from any kind of practical implementation

Implementing linked data is already well underway with many groups across the globe.  For instance, there are a couple that we at Talis are closely involved with.  Following on from Sir Tim’s interview comments, the British Government are currently running a closed beta of data.gov.uk, soon to be opened to the public.  Through this site they are not only opening up data in many forms such as CSV, like their American cousins at data.gov, but are also starting to encode it in RDF and publish it via the Talis Platform, which provides a SPARQL (the query language of the Linked Data web) endpoint.  This approach not only lets anyone download the raw data, but also enables them to query it for whatever they have in mind. If you want a sneak preview of how such data is queried, take a look at some of these examples.  In a similar vein, metadata from BBC programmes and music is being harvested into Talis Platform stores.  Again these are open to anyone to innovate with; check out these screencasts to see some of the early possibilities.
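For anyone wondering what "querying it for whatever they have in mind" looks like from code, here is a minimal sketch of the standard SPARQL Protocol pattern: an HTTP GET to an endpoint with the query in a `query` parameter, asking for the W3C SPARQL JSON results format. The endpoint URL is a hypothetical placeholder (not a real data.gov.uk or Talis Platform address), and the query simply lists things that have a dc:title. To stay runnable offline, the request is built but not sent, and the result-flattening helper is exercised on a hand-written sample response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL: substitute the SPARQL endpoint you actually want to query.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?thing ?title
WHERE { ?thing dc:title ?title }
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten the W3C SPARQL JSON results format into {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


# Offline demonstration with a hand-written response in the SPARQL JSON shape.
sample = json.loads("""{
  "head": {"vars": ["thing", "title"]},
  "results": {"bindings": [
    {"thing": {"type": "uri", "value": "http://example.org/item/1"},
     "title": {"type": "literal", "value": "An example title"}}
  ]}
}""")
print(rows(sample))
```

Against a live endpoint you would pass the request to `urllib.request.urlopen` and feed the response to `json.load` before calling `rows`.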

Ah, but that is not bibliographic data, I hear someone cry – it’ll never catch on in libraries.  I get the impression from some comments on the NGC4LIB list that it will not be possible for ‘our’ data to participate in this Linked Data web until ‘we’ have predicted all possible uses for it, analysed them, and developed a metadata standard to cope with every eventuality.  There are already a few examples of the library world engaging with RDF and Linked Data, one obvious one being the Library of Congress with LCSH, another the National Library of Sweden.  Neither of these examples is encoding the kind of detail you would expect in a Marc record; they are using ontologies to describe associated concepts such as subjects.

There has been some ontology development towards this larger goal with Bibo (the Bibliographic Ontology Specification).  Although not there yet, Bibo is good enough to be used in live applications wishing to encode bibliographic data.  One such example is Talis Aspire.  Underpinned by the same Platform as the UK Government and BBC Linked Data services, it uses the Bibo ontology to describe resources in an academic context.
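As a rough idea of what "good enough to encode bibliographic data" can mean in practice, the sketch below describes a single book as a bibo:Book in Turtle, with a dcterms:title, a dcterms:creator link and a bibo:isbn13. The Bibo and DCMI Terms namespaces and those terms are real; the choice of properties, the example.org URIs and the placeholder ISBN are assumptions for illustration, and this is not a description of how Talis Aspire actually models its resources.

```python
# Real vocabularies: DCMI Terms and the Bibliographic Ontology (Bibo).
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
}


def turtle_string(text: str) -> str:
    """Quote a plain string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def book_to_turtle(uri: str, title: str, isbn13: str, creator_uri: str) -> str:
    """Describe one book as a bibo:Book in Turtle ('a' is shorthand for rdf:type)."""
    header = "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())
    body = (
        f"<{uri}> a bibo:Book ;\n"
        f"    dcterms:title {turtle_string(title)} ;\n"
        f"    dcterms:creator <{creator_uri}> ;\n"
        f"    bibo:isbn13 {turtle_string(isbn13)} ."
    )
    return header + "\n\n" + body + "\n"


print(book_to_turtle(
    "http://example.org/book/1",      # hypothetical URI
    "An Example Reading-List Title",
    "9780000000000",                  # placeholder ISBN, not a real book
    "http://example.org/person/1",    # hypothetical URI
))
```

Even this small description mixes a specialist vocabulary (Bibo) with a generic one (DCMI Terms), which is exactly what makes it approachable from outside the library world.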

Alongside data.gov.uk there is a Google Group conversation taking place. The refreshing part of this conversation is that it is between the producers of the data sets, those developing the way it should be encoded into RDF, and those who want to consume it.  Several times you will see a difference of opinion between those that want to describe the data to its fullest, and those that wish to extract the most value from it. “I agree that is a cleaner way of encoding, but can you imagine how complex the query will be to extract what I want!”  This approach is not unusual in the Linked Data world, where producers and consumers get together, pragmatically evolving a way forward.  Dataincubator.org is an open place where such pragmatic development and evolution is taking place.  Check out examples of a subset of Open Library data (note this is an example of data, not a user interface).

Another, bibliographically focused, experiment can be found at semanticlibrary.org. From some of the example links on the home page, you can see that building in this way enables very different ways of exploring metadata: people, subjects, publishers, works, editions and series are all equally valid starting points to explore from.

Doth the bell toll for Marc and RDA?
Not for a long old time. Ontologies like Bibo, and the results of work at Dataincubator.org and semanticlibrary.org, may well lead to more open, useful, and most importantly linked, access to data previously limited to library search interfaces.  That data has to come from somewhere though, and the massive global network of libraries encoding their data using Marc, and maybe soon RDA, is ideally placed to continue producing rich bibliographic metadata, to be fed into the Linked Data web in the most appropriate form for that purpose.  There will continue to be a place for current cataloguing practices and processes for a significant period, supporting and enabling the bibliographic part of the Linked Data web rather than being replaced by it.

No doubt the NGC4LIB conversation on this topic will continue. Regardless of how it progresses, there is a current need and desire for bibliographic data in the linked data web.  The people behind that desire, and the innovation to satisfy it, may well have come up with a satisfactory solution, for them, whilst we are still talking.