Panlibus Blog

Archive for the 'Semantic Web' Category

What interests 250+ librarians at 8:30 on a Sunday morning

Linked Data, that’s what!

I must admit I was a little skeptical of the timing when I accepted the invitation to provide the keynote for a Linked Data session – on the last day of IFLA 2010 – at 8:30 in the morning – in August – on a Sunday.  Who was going to want to get up at that time, on the day they were probably going to leave beautiful Gothenburg, to hear me witter on about the Semantic Web and the obvious benefits of Linked Data for libraries? A few minutes before the start, I was beginning to think my skepticism was well founded, viewing the acres of empty seats laid out in their menacing ranks in front of me. But then almost as if from nowhere, the room rapidly filled and by the time I took the stage we had something approaching a full house.  As you can see from my iPhone snap below, we ended up with a significant group (I lost count at about 250) of interested librarians.

250+ Librarians in Gothenburg

So was it worth them turning up at such an unsociable time?  I obviously can’t speak for my own session, but I believe it was well worth it.  We had a series of talks which varied from the in-depth technical/ontological end of the spectrum to the rousing plea to open up your data now – and don’t hamper it with too much licensing.

First on after my session was Gordon Dunsire from the University of Strathclyde, who gave us some in-depth reasoning as to why we need complex, detailed ontologies based upon standards like RDA, FRBR, and FRAD to describe library resources in RDF for the Semantic Web.   To represent the full detail that catalogers do, and want to, provide for resource description, I agree with him.  I also believe that we need to temper that detailed view by including more generic ontologies in addition. People from outside of the library world, dipping into library data [with more ways to describe a title than there are flavors of ice cream], will back off and not link to it unless they can find a nice friendly dc:title or foaf:name that they understand.
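To make that “both/and” approach concrete, here is a minimal sketch using Python’s rdflib: a detailed, library-specific property sits alongside the generic dc:title and foaf:name that an outsider will recognise. The RDA-style namespace and the example URIs are my own illustrative assumptions, not anything shown in the talk.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, FOAF

# The RDA-style element namespace and example URIs are illustrative
# assumptions, not taken from the talk.
RDA = Namespace("http://rdvocab.info/Elements/")
EX = Namespace("http://example.org/books/")

g = Graph()
book = EX["moby-dick"]

# The detailed, cataloger-level description...
g.add((book, RDA.titleProper, Literal("Moby Dick, or, The whale")))

# ...alongside the friendly generic properties an outsider will recognise
g.add((book, DC.title, Literal("Moby Dick")))
g.add((EX["melville"], FOAF.name, Literal("Herman Melville")))

print(g.serialize(format="turtle"))
```

The point of keeping both is cheap insurance: the detailed properties lose nothing, and the generic ones give outsiders an obvious place to link.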

Among the other speakers I caught was Patrick Danowski, with an entertaining presentation entitled “Step 1: Blow up the silo!”. He took us through the possible licenses to use for sharing data, only to conclude that the best approach was totally open public domain.  He then went on to recommend CC0 and/or PDDL as the best way to indicate that your data is open for anyone to do anything with.

Jan Hanneman from the German National Library delivered an interesting description [pdf] of the way they have been publishing their authority data as Linked Data, and the challenges they met on the way.  These included legal and licensing issues around what they could publish and under what terms, with the scalability of their service being another key issue once they move beyond authority data.

All in all it was an excellent Sunday morning in Gothenburg.  I presume the organizers of IFLA 2011 will take note of the interest and build a larger, more convenient, slot in the programme for Linked Data.

Note: My presentation slides can be viewed on Slideshare and downloaded in PDF form.

Linked Data and Libraries – videos published

The Linked Data and Libraries event held at the British Library last month was very successful, attended by many interested in the impact and possibilities of these new techniques and technologies for libraries.

Many travelled from the far corners of the UK and Europe, but from the several emails I received it was clear that many others could not make it.  With that in mind, we took along the technology to capture as much of the event as possible.

The videos have now been edited and published on our sister blog, Nodalities, where you will also find links to the associated presentation slides.  I can highly recommend these as an introduction to the topic and an overview of the thinking and activities in this area from the likes of the British Library and Europeana.

Linking the data dots …

I joined the world of libraries some twenty years ago and was immediately impressed with the implicit mission to share the knowledge and information that libraries look after with all, for the benefit of mankind.  Being part of an organisation that was built out of a desire for libraries to share the data that backs up that knowledge, I was surprised how insular and protectionist with that data this world can be.  Some of the reasons behind this were technological.  Anyone who played with Z39.50 and other similar, so called, standards before the century turned will tell you war stories about the joys of sharing data.  Then along came the web [I know it started in the mid 1990s, but it really came along post the .com boom] and things started to get much easier, from a technology point of view anyway.

So why do libraries make it so difficult to share the data that they hold?  This isn’t the stuff that they hold – that is a whole other can of worms.  This is the data about the stuff they hold: the data that helps mankind find what mankind can benefit from. I know we have a lot of history to get over, but the commercial world recognised ages ago that if you share information about what you have and what you do, people find what they are looking for and you get more business.

However, negatively ranting on about the insular outlook of some in library land was not my purpose in writing this post.  You may know that I, and Talis, are involved in the emerging world of Linked Data.  Over recent months I have found myself immersed in other parallel universes such as national and local government, newspapers, broadcast media, and finance systems.  It was therefore a great pleasure to find myself organising a Linked Data and Libraries event at the British Library last week.  Sir Tim Berners-Lee’s vision of a Web of Data, complementing the current web of documents and utilising a collection of standards and techniques known as Linked Data, is all about sharing and linking data across organisations and sectors for the benefit of mankind – sound familiar?

It was very refreshing to see the amount of interest this day attracted.  As you will see from the presentations from the day, made available via our sister Nodalities blog, there are many libraries and library organisations actively engaged with this.   Several, such as the German National Library, have released traditional bibliographic data (sourced from MARC records) in a Linked Data form using RDF.  Others, such as VIAF, hosted by OCLC, and the Library of Congress Authorities, are providing RDF as one of the formats openly available from their services.  The Bibliothèque nationale de France is in the process of inviting tenders for an entirely new system to open up their data and holdings as Linked Data.

It is fair to say that most of these initiatives are coming from national, international, large, and cooperative libraries, but the interest is already trickling down to smaller libraries, especially in the academic sector.  It is also fair to say that most who are engaged in thinking about Linked Data and libraries take on board Sir Tim’s point about many of the benefits coming from the linking of data between sectors – libraries and science and government and the media and commerce and education and leisure and…

So despite my frustrations with the library world, still very evident in some circles, I am becoming more positive about libraries being able to fulfil their mankind-benefiting mission as the web of data emerges.  The changing of influential attitudes, and the move to different underlying data formats, may help us leave behind some emotional and legal baggage.

Anyone who has followed us for a while will know that we have been banging on about, and implementing, Semantic Web and Linked Data techniques and technologies for many years.  It is great when others start to ‘get it’ and you stop having to be one of a few voices in the wilderness.  These changes will not happen to all libraries overnight, but it is nice to swap frustration at the lack of vision and ambition for frustration at the lack of progress – something I think I was born with.

Understanding the Semantic Web – 4

In this final post exploring questions and themes around Karen Coyle’s Understanding the Semantic Web report, I want to look at the coexistence of the universal and the particular in the Semantic Web. This is something that Karen touches on in her report:

Of course, it is not reasonable to assume a single system of identifiers for everything on the Web. Undoubtedly, different communities will assign identifiers of their own, some overlapping with those of another community.

This may seem reasonable in the year 2010. However, a glance back at the history of ideas might make us a little more circumspect. Certainly, for the last three centuries, the dominant paradigm among scientists and intellectuals has been the universal over the particular. The universal is what Karen is referring to with “a single system of identifiers for everything on the Web.”

But in this postmodern moment, there is a strong suspicion of universalism, as articulated by Karen, and a favouring of the local and particular. We have seen this in the flourishing of folksonomies alongside more authoritative taxonomies.

Making sense of the world

The reasons behind the breakdown in belief in universal values are an intellectual question far too big for this post, involving the horrors of two world wars, the end of the Cold War signalling the demise of big ideas, and other more local crises such as May 1968 in France and the US’s Vietnam War, all of which contributed to a sense of disillusion with the certainties of the old order and a distrust of grand narratives.

We only have to look back at the archetypal Victorian amateur botanist to see how much things have changed. He would have been concerned with the coining of universal terms to describe objective features, with orders and classes depending on the number and position of male and female organs of the flower, for example – in other words universally recognised objective features.

In fact, the relationship between the universal and the particular has always been in flux. The universal is not dead; it is alive in the technology domain at least, where a machine will either work or not work and is not subject to personal interpretation. Even in the heyday of the Victorian botanist, there were subtle processes undermining the universalist paradigm, although these would not reach their full expression until the 20th century. There were also vestiges of a particularist past and present in more folkloric approaches to botany, which occasionally contradicted universalist findings – where superficial impressions might lead to a categorisation that would later be disproven by a DNA examination revealing radical differences due to divergent evolutionary lineages.

It is the ability to manage this constantly shifting emphasis between the universal and the particular which, for me, marks the Semantic Web as an emerging technology with serious potential for longevity. It will enable the coexistence of local and particular frameworks with “switching stations” as Karen calls them, or linking hubs “that gather identifiers that are equivalent or nearly so” to enable interoperability.

Crucially, linked data has the flexibility to manage these historical fluxes, just in case the postmodern moment passes and humanity recovers its confidence in grander truths. The Semantic Web can be agnostic in that sense, and that is a good thing, as the determinants arguably lie beyond the technological realm for the main part.

Stuck in the middle with you

In the meantime, the library finds itself uncomfortably positioned right on the interface between the universal and the particular. It has historically served as intermediary between the two – with the library standing as a repository of an authoritative and universal view of the world, and both the library and librarians helping individuals move from their particular viewpoint into the wider world of knowledge. This is what the traditional public library gave to the likes of Michael Caine, widening their world and enabling them to transcend the circumstances of their particular upbringing. It is incorrect, in my view, to believe that technology alone has been responsible for this disintermediation – there are broader cultural and intellectual forces at work.

Local perspectives have gradually assumed a greater importance over time, and the apotheosis of this – the library user as Google searcher – has greatly undermined the confidence of the library, wrenching it from its comfortable intermediary position. Whether the library can redefine itself in this context is something many of us have been asking for a long time now. But the good news, for me, is that the Semantic Web has the potential to safeguard the universal whilst giving expression to the local and particular, the latter being important in any intellectual climate.

The library world has to date been only subconsciously aware of the size and shape of these forces, and has intermittently adopted defensive ideas in response. To understand is the first step, and maybe the second step is to see the Semantic Web as a valuable tool in the repositioning of the library in a global setting of multiple particulars in which universal values are still out there somewhere.

Understanding the Semantic Web – 3

Is the Semantic Web so immense and infinite in its possibilities that drawing out a vision for a particular domain (libraries in this instance) becomes difficult or impossible? That’s what Karen Coyle seems to be saying in her Understanding the Semantic Web: Bibliographic data and metadata report here:

It is somewhat difficult to explain what you can do with linked open data because the answer is just about anything.

And here:

If all this sounds otherworldly and vague, it is because there is no specific vision of where these changes will lead us.

Or is it the case that the Semantic Web crystallises the changes around librarianship that have proven to be both problematic and exciting in recent times?

Living on borrowed time?

Karen’s call for action – for librarians to embrace the Semantic Web – emanates from her uncontentious point that today’s library users are more likely to be found on the Internet than in the library building itself. However, it is precisely the library’s traditional role as a repository of authoritative textual artefacts that has borne the fruit (metadata) that Karen now proposes we offer to the emergent Semantic Web. The question here is: are we living off our former glories as we move into the future? And does this raise issues of sustainability?

Karen says this about the library’s “unique selling point” (to borrow an outdated marketing term):

What can libraries offer that no other community can? First, libraries have holdings of published and unpublished materials that are not currently represented on the Web. Next, they have metadata for most of those materials. The metadata includes controlled forms of personal and corporate names, physical description, topical headings, and classification assignments.

I am left with an uncomfortable sense that our position in the future depends on our printed heritage, which, of course, would make us very vulnerable. Is Karen consigning libraries to the role of book museum in the new semantic world order? In an e-only world, with the physical copy no longer of primary importance, I struggle to see the library having the central role in metadata creation that it had in the past. Libraries have been centralising the cataloguing process for many decades, of course. But the next phase may concentrate such activities in national libraries, as works may only have to be catalogued once.

Goodbye FRBR, hello semantics

FRBR is apposite here, as it remains relevant in our postmodern world of intertextuality and adaptations into new media. Describing the relationships between e-books and movies, for example, will have longevity in our more web-centric culture.

What the Semantic Web brings, though, is an almost infinitely richer picture, built on broader relationships between cultural entities. Karen herself talks about the extensibility of the Web, and it is the capability of linked data to define and discover new relationships that lies at the heart of its transformative potential.

One of my favourite films is a 1970s French movie called Celine and Julie Go Boating. It was influenced in a loose but culturally important way by the novelist Henry James, specifically his novella The Other House and his short story The Romance of Certain Old Clothes. This was the sort of knowledge that used to be passed around by word of mouth in the pre-Internet era, making it quite elitist. But FRBR provides no mechanism for describing such a loose, though culturally significant, relationship. It was also very difficult to get hold of those texts, whereas with the web it’s a whole lot easier, especially if you know what you’re doing. With a combination of an RDF property to describe the relationship and the location dimension of the URI in linked data, it’s set to become even easier.
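For what it’s worth, expressing that relationship in RDF is a one-liner. A hedged sketch in Python’s rdflib – the influencedBy property and all the URIs are invented for illustration; in practice you would mint or reuse a suitable vocabulary:

```python
from rdflib import Graph, Namespace, URIRef

# Both the vocabulary and the resource URIs are invented for illustration
REL = Namespace("http://example.org/relations/")

film = URIRef("http://example.org/films/celine-and-julie-go-boating")
novella = URIRef("http://example.org/works/the-other-house")

g = Graph()
# One triple captures the loose relationship FRBR has no slot for
g.add((film, REL.influencedBy, novella))
```

Because the object of the triple is itself a URI, anyone following the link can dereference it and keep exploring – exactly the word-of-mouth path, made machine-traversable.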

Step away from Google Maps

I started to think about refocusing on the work and away from the physical item after attending the Middlemash library mashup event in December. There’s so much attention paid to the location of physical items in library technology, with seemingly endless Google Map mashups, that it sometimes feels that library technology is disappearing up its own Yahoo Pipes. Maybe that’s symptomatic of the dominance of Google Maps as an island of openness in a sea of closed data. But is the nearest location of a book what today’s library users either want or need? The answer to that depends to an extent on which library sector you’re working in, but the Semantic Web would enable us to offer timelines and other interesting representations of relationships between one work and another and also between an artistic work and the real world, the latter being explored effectively in the report.

But the question is, are we librarians willing or able to rise to that challenge? We witnessed the demise of the scholarly librarian over the course of the 20th century. I distinctly remember being told at library school in the 1990s that librarians should be less concerned about knowing stuff, and more about where to go to find stuff out. The Semantic Web offers a depth of engagement with texts and other cultural artefacts.

Where do we want to be positioned in this new semantic world order?

In the final blog posting in this series, I will discuss the co-existence of the universal and the particular on the Semantic Web and how I see libraries fitting into that relationship.

Understanding the Semantic Web – 2

This blog covers chapter 2 of Understanding the Semantic Web: Bibliographic data and metadata, Karen Coyle’s report on the potential of the Semantic Web for libraries.

In the first chapter, Karen took us through a detailed analysis of the development of library metadata, culminating in an argument in favour of Semantic Web principles.

The shortcomings of the MARC record

In this second chapter, Karen focuses on the era of machine-readable data, kicking off with our old and trusted friend, the MARC record. She delivers a devastating and detailed critique of the MARC format, highlighting the problem of many data elements being duplicated within the same record in slightly different formats – exemplified by the disconnect between bibliographic and name authority data (with no automatic update when the authority record changes), and by the differing formats within the bibliographic record itself between indexed and descriptive fields.

The relational database era

Coyle then turns her attention to the relational database, where the problem for library data is of a different order, and I found Coyle’s analysis and interpretation here to be particularly impressive:

Database technology was designed for a different kind of data, less textual and more compact, with fewer data elements, and less of a range of content in those elements. Database technology is designed to retrieve. For example, it can retrieve all of the invoices that contain a particular product code. Database management systems work best in environments with a lot of repetition of data values.

In contrast, as Coyle goes on to point out, most bibliographic titles are unique; there is relatively little in the way of repeated data values. Databases are also relatively poor at alphabetical sorting of long strings of text. Another interesting point is the specificity of the MARC format to the library domain, which eliminates the possibility of using standard business software.

These format and technology shortcomings are exacerbated by problems intrinsic to words themselves:

They can be ambiguous (e.g. Pluto the Disney character versus the orbiting body). They can be incomplete informationally, since many concepts require more than one word (e.g. solar energy, ancient Rome). They are language-based, so a search on computer does not bring up documents with the term ordinateur. And of course keyword searching falls prey to differences in spelling (fiber versus fibre) and errors in spelling or typography (history or histroy).

Taking advantage of new technical possibilities

Having laid out the problems, Karen presents a call to action, to bring library data “into the twenty-first century for machine processing and to improve service to our human end users by being able to offer more functionality in our systems.” She picks up the argument that she introduced in chapter 1, namely that we need to join our bibliographic data to the web, where a near-universal set of information resides, with many semantic relationships to the bibliographic sphere, and of course, where our users are located, for that very reason.

Semantic Web – flavour of the month

Coyle describes the Semantic Web as “the flavor of the month” in technology terms, differentiating between the web of documents, i.e. what we already have, and the web of data, which is fundamentally what the Semantic Web will be.

The Semantic Web as introduced by Tim Berners-Lee is a linked web of information encoded in documents throughout the web. Achievement of this vision is still over the visible horizon. In practice, however, there is a growing community of people and organizations who have metadata available to them that they have structured using Semantic Web rules. These disparate sets of data can be combined into a base of actionable data. These sets of data are being referred to as “linked data”, and the Linked Data Cloud is an open and informal representation of compatible data available over the Internet.

She goes on to point out that at this juncture, many of the early participants are institutions with already existing scientific data sets. Linked data, as my colleague Richard Wallis frequently points out, is a pragmatic implementation of the semantic web, the latter being the vision that we are moving towards.

In this way, library data will link into broader sources of information, and Coyle returns to the example of Moby Dick by Herman Melville to illustrate that a bibliographic record has powerful relationships with the external world – Herman Melville the author, New England, whaling et al, are potentially invoked from Moby Dick the bibliographic work.

The mechanics of the Semantic Web

Karen proceeds to guide the reader through the rudiments of the Semantic Web, beginning with RDF, or Resource Description Framework, the data model for the Semantic Web.

It defines a set of rules for the formal semantics of metadata that is meant for the elements and structure of metadata that will be able to operate on the Semantic Web. Very simply put, in the Semantic Web all data consists of things and relationships between them, with the smallest unit being a statement of the form a thing → with relationship to → another thing

The combination of the near-universal applicability of this model and its readability by machines opens up the potential to create and follow previously unimagined paths in the pursuit of new knowledge at web scale.
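As a sketch of just how small that unit is, here is the statement “Moby Dick → has author → Herman Melville” in Python’s rdflib (the URIs are purely illustrative):

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")  # purely illustrative URIs

g = Graph()
# a thing -> with relationship to -> another thing
g.add((EX["moby-dick"], EX.hasAuthor, EX["herman-melville"]))

for s, p, o in g:
    print(s, p, o)
```

Everything else in the Semantic Web is built by accumulating and linking statements of exactly this shape.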

The model is underpinned by identifiers, and this is what gives the Semantic Web its precision, going back to the textual ambiguity problem explored earlier.

The primary rule for the Semantic Web is that identifiers need to be in the form of a Uniform Resource Identifier, which is a particular form of identifier. We don’t need to go into the structure of URIs because it turns out that the common Uniform Resource Locator, URL, is in URI format, and is the preferred identifier to use on the Semantic Web.

We see, then, that the URI provides not only precision but also continuity with the web as we know it today. She does explore the problematic ambiguity of the URI / URL: a URI can point to a definitive description of something, and hence play the part of an identifier, or, as a URL, it can simply denote a location. The two are, of course, identical in format. However, she is clear on the advantage of the location dimension, namely that it enables information to be returned based on the identifier.
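That advantage is easy to demonstrate: hand a Linked Data URI to an RDF library and it can fetch a description of the thing from the identifier itself. A minimal sketch with rdflib, using DBpedia only as a well-known example and assuming the usual content negotiation:

```python
from rdflib import Graph

g = Graph()
# The identifier is also a location: asking the web for it returns RDF
# describing the thing. DBpedia is used purely as a well-known example,
# and this assumes the server performs Linked Data content negotiation.
g.parse("http://dbpedia.org/resource/Moby-Dick")
print(len(g), "triples returned for the identifier")
```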

Universality

Coyle backs away from a utopian vision of a single system of identifiers for everything on the Web. Inevitably, she says, different communities will assign identifiers of their own, some overlapping with those of another community. And that is indeed what is coming to pass. It is ironic that just as humanity arrives at a point where a single, absolutist, universal set of definitions becomes technically possible, we retreat back to subjectivity and relativism. On the other hand, it is precisely this pragmatic approach which might be the killer asset of the Semantic Web in terms of adoption.

In a similar vein, Coyle addresses the area of controlled vocabulary, explaining that the Semantic Web facilitates both controlled and uncontrolled data.

Karen Coyle uses colour as an example here, and to great effect. She compares a controlled set of values, in which each colour has its own identifier, with a set of plain hard-coded values:

  • red
  • yellow
  • blue

Librarians will readily perceive the first set as being a controlled list of values, and anyone who has worked with software will be aware of the limitations of hard-coded values, as per the second. Karen also illustrates how the first set offers multi-language flexibility, whilst maintaining semantic coherence.
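A short rdflib sketch of that multi-language flexibility, with a hypothetical colour vocabulary standing in for Karen’s example: one controlled identifier can carry labels in any number of languages while remaining a single, unambiguous value.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

COLORS = Namespace("http://example.org/colors/")  # hypothetical vocabulary

g = Graph()
# One controlled identifier carrying labels in several languages,
# while remaining a single unambiguous value
g.add((COLORS.red, RDFS.label, Literal("red", lang="en")))
g.add((COLORS.red, RDFS.label, Literal("rouge", lang="fr")))
g.add((COLORS.red, RDFS.label, Literal("rot", lang="de")))
```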

The benefits of the Semantic Web

Coyle itemises a number of general benefits of the Semantic Web. For me the key benefits are its global and extensible qualities, the latter meaning that new data and data types can be added at any time, and data can be endlessly recombined to create new information. But comparing the original web to its semantic younger sibling, the ability to ascribe meaningful relationships between entities is crucial, with its potential to “transform the Web from what it is today to a richer, more meaningful information environment.”

Karen illustrates this with the Library of Congress’s work in creating an online version of the Library of Congress Subject Headings in linked data format.

There is a separate identifier for each entry in the subject authority file, about 350,000 total. Because the identifier is also a URL, the Library has placed information about the subject heading at that location and can display it in formats for human readers or for programs.
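That dual behaviour is easy to see from code. A hedged sketch using Python’s requests – the heading identifier below is a placeholder rather than a real LCSH number, and it assumes the service honours the usual Accept-header content negotiation:

```python
import requests

# Placeholder heading identifier, not a real LCSH number
uri = "http://id.loc.gov/authorities/subjects/sh00000000"

# Same URI, two audiences: a browser gets a human-readable page,
# while a program can ask for RDF (assumes the service honours
# Accept-header content negotiation)
page = requests.get(uri)
data = requests.get(uri, headers={"Accept": "application/rdf+xml"})
print(data.headers.get("Content-Type"))
```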

In the final posting on this report, I will talk about what the ramifications of the Semantic Web, as explored in this report, might be for libraries.

Understanding the Semantic Web – 1

The American library technology commentator Karen Coyle has produced an ambitious report Understanding the Semantic Web: Bibliographic Data and Metadata under the auspices of the American Library Association. In our increasingly open world, it was galling to have to pay $43 for the privilege of reading it, and especially ironic given that the Semantic Web delivers its value exponentially according to the amount of linked data made available to it, a point that is not lost on Karen Coyle in relation to library data.

In this first chapter, Karen sets the scene, providing rich historical context in terms of a history of cataloguing, and thus builds up an irresistible argument as to why libraries need to embrace the semantic web.

What is metadata?

It does no harm to define metadata, and these three points provide a useful starting point:

  1. It’s constructed – it is fundamentally artificial.
  2. It’s constructive – it is purposeful.
  3. It’s actionable – it should be possible to act on the metadata in some way.

Even more useful, though, is the example Coyle uses to show good metadata in action, i.e. the subway map, saying that “If you were to superimpose this map over the city it represents, you’d find that the subway isn’t “true”, in the sense that it is neither to scale nor are the stations located where they would be on a map based on longitude and latitude.”

And she continues…

And yet they perform their job incredibly well, to the point that one can arrive in a city for the first time, perhaps even with only a limited understanding of the local language, and find one’s way. These maps are a good example of functionality in metadata.

Karen then develops the idea by comparing the old-style inert paper map with one that has “machine-actionable metadata” behind it, which has the effect of enabling users to reuse it in unforeseeable ways.

Historical evolutions

Karen explains that the basic functions of bibliographic metadata have extended over time, in response to related changes in the catalogue’s context.

The sharing of cataloguing between libraries has a surprisingly long lineage. In the nineteenth century, libraries apparently used to exchange their printed book catalogues, sometimes for a charge. The industrial revolution was accompanied by a dramatic increase in printed publications, and the card catalogue came about at this time. This proved to be a mixed blessing, because although it was easier to update, it lost the ability to be accessed remotely, until the dawning of the database era a century later.

Over the course of the twentieth century, the library underwent transformative growth, new technologies were introduced to meet the needs that arose from that growth, and library management became more complex, until we get to today’s situation where, as Karen illustrates, we have a “need to filter one’s retrieved set by language in order to reduce the number of items retrieved from thousands to ‘only’ three or four hundred.” Functional augmentations such as faceting and the ranking of results have, as Karen puts it, “put pressure on the catalog record, pushing it to perform functions it was not consciously designed to do”. This strain has been compounded by the merging of diverse back-office catalogues, such as the serials check-in records, into what we know today as the library management system.

And Karen is perfectly correct to remind us that information overload predates the Internet by almost half a century. The post-war boom led to an explosion of research activity, and new retrieval mechanisms, such as the citation database, were invented to help people navigate through the morass of papers written.

Whose metadata is it anyway?

Despite all these innovations though, one incontrovertible truth remained in place – the separation of library data from data in other domains. It now needs to be an integral “part of the dominant information environment that is the web.” As Coyle emphasises, that is where library users are, so it’s where the library needs to be.

The important question now is: how can the library catalog move from being ‘on the Web’ to being ‘of the Web’? The linked data technology that has developed out of the Semantic Web provides an interesting path to follow. It is specifically designed to facilitate the sharing of information on the Web, much in the same way that the Web itself was developed to allow the sharing of documents. The library must become intertwined with that rich, shared, linked information space that is the Web. Rather than creating data that can be entered only into the library catalog, we need to develop a way to create data that can also be shared on the Web. This requires that we expand the context for the metadata that we create.

Coyle notes the overlap in content between the library and the Web, which, as yet, is extremely under-exploited, citing the simple fact that the name “Herman Melville” and the fact that he wrote Moby Dick “are facts that are not limited to the data in library catalogs…”

She has set up a context that is both broad and deep for chapter 2 in which she will consider the Semantic Web in much greater detail.

Interesting developments at the Bibliothèque nationale de France

Having recently read some documentation around the plans of the Bibliothèque nationale de France (BNF) for what they call a “pivot” – a mechanism based on semantic technologies for optimising the value of the BNF’s entire web presence, including Gallica, its digital library – it was great to have the opportunity to hear Dominique Stutzmann from the BNF speak at the recent Eurolis Seminar in London.

The future of the library (Doom or Bloom?) was what the day was all about, and according to Stutzmann, we’ve already invented it. We’ve got the nice buildings, and so ostensibly the library of the future will be the same as that of today. If the library space vanishes, he argued, it will only be the result of a self-fulfilling prophecy because librarians aren’t confident about what they’re doing. I think he’s really onto something – there is indeed an element of subjective crisis in the problem of the future of libraries. He admitted, though, that Web 2.0 re-presents the user-librarian relationship in quite a fundamental way: the user becomes both publisher and librarian. But users don’t want librarians to disappear. He seems to be saying that our library spaces continue to be successful, so leave them alone but engage with some interesting technological stuff as well, because libraries are well-positioned to do so. He added that users trust libraries with everything, including long-term preservation of data, and the BNF is clearly poised to exploit that trust – not for its own ends, but for everyone, in the great universal tradition of libraries.

Stutzmann perceives the potential of semantic technologies very clearly in terms of the user experience – giving everyone improved and accurate access to the information available – and had an impressive array of exemplars to reel off, citing Google Book Search’s use of data-mining tools to take city names from search results and pinpoint them on a map, and Bibliosurf’s map of novels. Along similar lines, he demonstrated an interactive map of composers built from mashed-up Last.fm data, where proximity indicates artistic commonality rather than geographical closeness – for example, Beethoven is situated alongside Vaughan Williams.

As a Modern Languages graduate, I loved hearing about semantic search developments at the European Library and specifically in their TELplus project, where multilingual search (i.e. a search query with terms from more than one language) has been achieved. Stutzmann was clear that authority data is indivisible from semantic web developments, and that is where the librarian tradition really comes into its own; he demonstrated search results with LCSH headings as a facet on the side-panel. He pleaded with librarians to use metadata to give more accurate access to data.

The only downbeat element of his presentation was a survey carried out at the BNF in 2008 to get a clearer picture of their users. A key finding was that the average user of the digital library is 48, although there is an overall age range of 14-94. Europeana suffers from the same problem. Funnily enough, when I was out on Saturday night, a friend was saying how almost all the people who queued up in Birmingham recently to see the newly discovered Anglo-Saxon treasures from the West Midlands were white people aged 50+. Stutzmann pondered whether there was anything that could be done about it – does it come down to lifestyle fundamentals?

In the same survey, there was a fascinating finding about Library 2.0. Many of the users questioned felt that library sites should not be spoilt by the comments of other users; they are happier to share their information and collaborate with the librarian than with other users. Obviously this goes against received Library 2.0 thinking, and it left me wondering: is that a specifically “French thing”, or do UK users have more in common with their European counterparts than we think?

Will Linked Data mean an early end for Marc & RDA

For the uninitiated, NGC4LIB is a library-focused mailing list which has a reputation for engaging in massive discussions and disagreements around the minutiae of future cataloguing and library-focused metadata practices.  It has recently hosted one of these great debates, stimulated by the comments of Sir Tim Berners-Lee in a recent interview.    As is often the case on this list, the debate wandered well off topic into the realms of FRBR and its alternatives before being brought back on topic by Jim Weinheimer, who started the conversation in the first place.

A statement in Jim’s contribution caught my eye:

Implementing linked data, although it would be great, is years and years away from any kind of practical implementation

Implementing linked data is already well underway with many groups across the globe.  For instance, there are a couple that we at Talis are closely involved with.  Following on from Sir Tim’s interview comments, the British Government is currently running a (soon to be opened) closed beta of data.gov.uk.  Through this site they are not only opening up data in many forms such as CSV, like their American cousins at data.gov, but also starting to encode it in RDF and publish it via the Talis Platform, which provides a SPARQL end point (SPARQL being the query language of the Linked Data web).  This approach not only lets anyone download the raw data, but also enables them to query it for whatever they have in mind. If you want a sneak preview of how such data is queried, take a look at some of these examples.   In a similar vein, metadata from BBC programmes and music is being harvested into Talis Platform stores.  Again, these are open to anyone to innovate with – check out these screencasts to see some of the early possibilities.
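For a flavour of what querying such an endpoint looks like, here is a minimal sketch in Python using the SPARQLWrapper library; the store URL is a placeholder, not the actual data.gov.uk endpoint:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# The store name here is a placeholder, not the actual data.gov.uk endpoint
sparql = SPARQLWrapper("http://api.talis.com/stores/example/services/sparql")
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```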

Ah, but that is not bibliographic data, I hear someone cry – it’ll never catch on in libraries.  I get the impression from some comments on the NGC4LIB list that it will not be possible for ‘our’ data to participate in this Linked Data web until ‘we’ have predicted all possible uses for it, analysed them, and developed a metadata standard to cope with every eventuality.   There are already a few examples of the library world engaging with RDF and Linked Data: one obvious one being the Library of Congress with LCSH, another the National Library of Sweden.  Neither of these examples is encoding the kind of detail you would expect in a Marc record; they are using ontologies to describe associated concepts such as subjects.

There has been some ontology development towards this larger goal with Bibo (the Bibliographic Ontology Specification).  Although not there yet, Bibo is good enough to be used in live applications wishing to encode bibliographic data.  One such example is Talis Aspire.  Underpinned by the same Platform as the UK Government and BBC Linked Data services, it uses the Bibo ontology to describe resources in an academic context.
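As an illustration of the kind of description Bibo makes possible – not Aspire’s actual data model – here is a hedged rdflib sketch of a book on a hypothetical academic resource list (the resource URIs and ISBN are placeholders):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, RDF

BIBO = Namespace("http://purl.org/ontology/bibo/")  # the Bibliographic Ontology
EX = Namespace("http://example.org/resources/")     # hypothetical resource URIs

g = Graph()
item = EX["reading-list-item-1"]
g.add((item, RDF.type, BIBO.Book))
g.add((item, DC.title, Literal("Understanding the Semantic Web")))
g.add((item, BIBO.isbn13, Literal("0000000000000")))  # placeholder ISBN
```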

Alongside data.gov.uk there is a Google Group conversation taking place. The refreshing part of this conversation is that it is between the producers of the data sets, those developing the way it should be encoded into RDF, and those who want to consume it.  Several times you will see a difference of opinion between those who want to describe the data to its fullest and those who wish to extract the most value from it: “I agree that is a cleaner way of encoding, but can you imagine how complex the query will be to extract what I want!”  This approach is not unusual in the Linked Data world, where producers and consumers get together, pragmatically evolving a way forward.  Dataincubator.org is an open place where such pragmatic development and evolution is taking place.  Check out examples of a subset of Open Library data (note this is an example of data, not a user interface).

Another bibliographically focused experiment can be found at semanticlibrary.org. From some of the example links on the home page, you can see that building in this way enables very different ways of exploring metadata: people, subjects, publishers, works, editions, and series are all equally valid starting points to explore from.

Doth the bell toll for Marc and RDA?
Not for a long old time.  Ontologies like Bibo, and the results of work at Dataincubator.org and semanticlibrary.org, may well lead to more open, useful, and most importantly linked access to data previously limited to library search interfaces.  That data has to come from somewhere though, and the massive global network of libraries encoding their data using Marc, and maybe soon RDA, is ideally placed to continue producing rich bibliographic metadata – metadata to be fed into the Linked Data web in the most appropriate form for that purpose.  There will continue to be a place for current cataloguing practices and processes for a significant period – supporting and enabling the bibliographic part of the Linked Data web, not being replaced by it.

No doubt the NGC4LIB conversation on this topic will continue. Regardless of how it progresses, there is a current need and desire for bibliographic data in the Linked Data web.  The people behind that desire, and the innovation to satisfy it, may well have come up with a solution satisfactory to them whilst we are still talking.

Will the eBook make it across the chasm

I’m currently hurtling through the English countryside on a WiFi-enabled train, having spent the day at E-books and E-content 2009, held at University College London.  It was an interesting and stimulating day with a well-matched but varied set of speakers, including yours truly (presentation on SlideShare).  The eighty-strong audience were also a varied selection from academic libraries, academia in general, publishers, and the information media.

The move towards a web of data, enabled by the emergence of semantic web technologies and practices, was one of my themes. Another was a plea for content publishers and providers to deliver their content to users where they are, rather than expecting users to be driven to the provider’s site with a totally different interface.  This is a difficult one for the eContent industry at a time when publishers are in the middle of a “my platform is better than yours” battle.  Nevertheless, a student wants the content their course has recommended, not caring who published it or which aggregator their library licensed it from.

In laying the ground, I initially discussed the technology adoption curve and how technologies don’t become mainstream overnight.  Any new technology, or new way of doing things, follows a standard pattern, with a small number of innovators taking the initial, often enthusiastic, risk.  The early adopters then build on the innovators’ success and join in, still very early and with some risk. When the new way has been proven, adoption has increased, and both costs and risk have fallen, the early and late majorities take it to mass acceptance and adoption.  This only leaves the laggards, who will come on board only if forced by circumstance.

As an adjunct to the adoption curve, I spoke about a chasm which technologies have to cross, between the early adopters and the early majority, before they take off.  There are many promising technologies that failed to cross that chasm.  For example, technology watchers at the time predicted that the MiniDisc would replace the cassette tape, but as we know the CD took that prize.

Today’s conference was mostly focused on the eBook and its impact on libraries and publishers.  This is on the assumption that it will be the way of delivering book-sized pieces of content in the approaching digital world.  In answer to a challenging question to the end-of-day panel, I concluded that this is by no means certain.  I believe direct access to articles will eventually see the end of the traditional journal issue format. In a similar way, I believe there is a good chance that chunks of content that are today book-sized may well be assembled and delivered in a digital object as yet to be identified.

So will the eBook jump the adoption chasm?  If I were a betting man, I would only back it each way.  I believe that anyone betting their whole business model on it being a certain winner may just be taking too much of a risk.

Photo from mstorz published on Flickr