Panlibus Blog

Archive for the 'Metadata' Category

What interests 250+ librarians at 8:30 on a Sunday morning

Linked Data, that’s what!

I must admit I was a little skeptical of the timing when I accepted the invitation to provide the keynote for a Linked Data session – on the last day of IFLA 2010 – at 8:30 in the morning – in August – on a Sunday.  Who was going to want to get up at that time, on the day they were probably going to leave beautiful Gothenburg, to hear me witter on about the Semantic Web and the obvious benefits of Linked Data for libraries? A few minutes before the start, I was beginning to think my skepticism was well founded, viewing the acres of empty seats laid out in their menacing ranks in front of me. But then almost as if from nowhere, the room rapidly filled and by the time I took the stage we had something approaching a full house.  As you can see from my iPhone snap below, we ended up with a significant group (I lost count at about 250) of interested librarians.

250+ Librarians in Gothenburg

So was it worth them turning up at such an unsociable time?  I obviously can’t speak for my session, but I believe it was well worth turning up.  We had a series of talks which ranged from the in-depth technical and ontological to a rousing plea to open up your data now – and don’t hamper it with too much licensing.

First on after my session was Gordon Dunsire from the University of Strathclyde, who gave us some in-depth reasoning as to why we need complex, detailed ontologies based upon standards like RDA, FRBR, and FRAD to describe library resources in RDF for the Semantic Web.   To represent the full detail that catalogers can, and want to, provide for resource description, I agree with him.  I also believe that we need to temper that detailed view by including more generic ontologies in addition. People from outside of the library world, dipping into library data [with more ways to describe a title than there are flavors of ice cream], will back off and not link to it unless they can find a nice friendly dc:title or foaf:name that they understand.
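The point about pairing detailed and generic ontologies can be sketched with plain triples. Here the same resource carries a detailed, catalogue-grade property alongside the widely understood dc:title and foaf:maker, so an outside consumer still finds a familiar hook; the RDA-style property URI is purely illustrative, not a real RDA element:

```python
# A minimal sketch: the same resource described twice over, once with a
# detailed (hypothetical) RDA-style property and once with generic Dublin
# Core / FOAF terms that non-library consumers recognise immediately.
book = "http://example.org/resource/moby-dick"

triples = [
    # Detailed, catalogue-grade description (property URI is illustrative)
    (book, "http://example.org/rda/titleProperOfResource", "Moby Dick; or, The Whale"),
    # Generic equivalents for everyone else
    (book, "http://purl.org/dc/terms/title", "Moby Dick"),
    (book, "http://xmlns.com/foaf/0.1/maker", "Herman Melville"),
]

def titles_for(subject, triples):
    """Return any value an outsider would recognise as a title."""
    generic_title_properties = {"http://purl.org/dc/terms/title"}
    return [o for s, p, o in triples
            if s == subject and p in generic_title_properties]
```

A consumer who knows nothing of RDA can still call `titles_for` and get a usable label, while the detailed statement remains available for those who want it.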

Some of the other speakers that I caught included Patrick Danowski, with an entertaining presentation entitled “Step 1: Blow up the silo!”. He took us through the possible licenses to use for sharing data, only to conclude that the best approach was a totally open public domain.  He then went on to recommend CC0 and/or PDDL as the best way to indicate that your data is open for anyone to do anything with.

Jan Hanneman from the German National Library delivered an interesting description [pdf] of the way they have been publishing their authority data as Linked Data, and the challenges they met on the way.  These included legal and licensing issues around what they could publish and under what terms, with the scalability of their service being another key issue once they move beyond authority data.

All in all it was an excellent Sunday morning in Gothenburg.  I presume the organizers of IFLA 2011 will take note of the interest and build a larger, more convenient, slot in the programme for Linked Data.

Note: My presentation slides can be viewed on Slideshare and downloaded in PDF form.

Linking the data dots …

I joined the world of libraries some twenty years ago and was immediately impressed with the implicit mission to share the knowledge and information that libraries look after with all, for the benefit of mankind.  Being part of an organisation that was built out of a desire for libraries to share the data that backs up that knowledge, I was surprised how insular and protectionist with that data this world can be.  Some of the reasons behind this were technological.  Anyone who played with Z39.50 and other similar, so-called, standards before the century turned will relate war stories about the joys of sharing data.  Then along came the web [I know it started in the mid 1990s, but it really came along post the .com boom] and things started to get much easier, from a technology point of view anyway.

So why do libraries make it so difficult to share the data that they hold?  This isn’t the stuff that they hold – that is a whole other can of worms.  This is the data about the stuff they hold – the data that helps mankind find what mankind can benefit from. I know we have a lot of history to get over, but the commercial world recognised ages ago that if you share information about what you have and what you do, people find what they are looking for and you get more business.

However, negatively ranting on about the insular outlook of some in library land was not my purpose in writing this post.  You may know that Talis and I are involved in the emerging world of Linked Data.  Over recent months I have found myself immersed in other parallel universes such as national and local government, newspapers, broadcast media, and finance systems.  It was therefore a great pleasure to find myself organising a Linked Data and Libraries event at the British Library last week.  Sir Tim Berners-Lee’s vision of a Web of Data, complementing the current web of documents and utilising a collection of standards and techniques known as Linked Data, is all about sharing and linking data across organisations and sectors for the benefit of mankind – sound familiar?

It was very refreshing to see the amount of interest this day attracted.  As you will see from the presentations from the day, made available via our sister Nodalities blog, there are many libraries and library organisations actively engaged with this.   Several, such as the German National Library, have released traditional (sourced from MARC records) bibliographic data in Linked Data form using RDF.  Others, such as VIAF, hosted by OCLC, and the Library of Congress Authorities, are providing RDF as one of the formats openly available from their services.  The Bibliothèque nationale de France is in the process of inviting tenders for an entire new system to open up their data and holdings as Linked Data.

It is fair to say that most of these initiatives are coming from national, international, large and cooperative libraries, but the interest is already trickling down to smaller libraries, especially in the academic sector.  It is also fair to say that most who are engaged in thinking about Linked Data and libraries take on board Sir Tim’s point about many of the benefits coming from the linking of data between sectors – libraries and science and government and the media and commerce and education and leisure and…

So despite my frustrations about the library world, still very evident in some circles, I am becoming more positive about libraries being able to fulfil their mankind benefiting mission as the web of data emerges.  The changing of influential attitudes, and the move to different underlying data formats, may help us leave behind some emotional and legal baggage.

Anyone who has followed us for a while will know that we have been banging on about, and implementing, semantic web and linked data techniques and technologies for many years.  It is great when others start to ‘get it’ and you stop having to be one of a few voices in the wilderness.  These changes will not happen to all libraries overnight, but it is nice to swap frustration at the lack of vision and ambition for frustration at a lack of progress – something I think I was born with.

OCLC Talk with Talis about Draft WorldCat Rights & Responsibilities

In late 2008 OCLC proposed a new bibliographic record reuse policy to its membership, attracting a large amount of criticism from many quarters.  At the time we covered it in a Talking with Talis podcast, and Panlibus and other blogs covered it heavily.

Some eighteen months later, the Record Use Policy Council set up to review and report on the issue has just published “WorldCat Rights and Responsibilities for the OCLC Cooperative”, a document for review before being recommended for adoption by OCLC.

Record Use Policy Council Co-Chairs, Barbara Gubbin & Jennifer Younger and OCLC’s Karen Calhoun joined me in this conversation, on the day the document was released to the OCLC membership, to fill in the background and thought behind the document as well as answering a few of my questions.

It is clear from the conversation that the dozen members of the Council have spent a considerable amount of time reviewing the issue and purpose behind the original reuse policy, and the many submissions and comments they received.  It remains to be seen how the community reacts to this document.

Understanding the Semantic Web – 3

Is the Semantic Web so immense and infinite in its possibilities that drawing out a vision for a particular domain (libraries in this instance) becomes difficult or impossible? That’s what Karen Coyle seems to be saying in her Understanding the Semantic Web: Bibliographic data and metadata report here:

It is somewhat difficult to explain what you can do with linked open data because the answer is just about anything.

And here:

If all this sounds otherworldly and vague, it is because there is no specific vision of where these changes will lead us.

Or is it the case that the Semantic Web crystallises the changes around librarianship that have proven to be both problematic and exciting in recent times?

Living on borrowed time?

Karen’s call for action – for librarians to embrace the Semantic Web – emanates from her uncontentious point that today’s library users are more likely to be found on the Internet than in the library building itself. However it is precisely the library’s traditional role as a repository of authoritative textual artefacts that has borne the fruit (metadata) that Karen now proposes we offer to the emergent Semantic Web. The question here is – are we living off our former glories as we move into the future? And does this raise issues of sustainability?

Karen says this about the library’s “unique selling point” (to borrow an outdated marketing term):

What can libraries offer that no other community can? First, libraries have holdings of published and unpublished materials that are not currently represented on the Web. Next, they have metadata for most of those materials. The metadata includes controlled forms of personal and corporate names, physical description, topical headings, and classification assignments.

I am left with an uncomfortable sense that our position in the future depends on our printed heritage which, of course, would make us very vulnerable. Is Karen consigning libraries to a role of book museum in the new semantic world order? In an e-only world, I struggle to see the library as having a central role in metadata creation in the way it had in the past, with the physical copy no longer of primordial importance. Libraries have been centralising the cataloguing process for many decades, of course. But the next phase may concentrate such activities in national libraries, as works may only have to be catalogued once.

Goodbye FRBR, hello semantics

FRBR is apposite here, as it remains relevant in our postmodern world of intertextuality and adaptations into new media. Describing the relationships between e-books and movies, for example, will have longevity in our more web-centric culture.

What the Semantic Web brings, though, is an almost infinitely richer picture, built on broader relationships between cultural entities. Karen herself talks about the extensibility of the Web, and it is the capability of linked data to define and discover new relationships that lies at the heart of its transformative potential.

One of my favourite films is a 1970s French movie called Celine and Julie Go Boating. It was influenced in a loose but culturally important way by the novelist Henry James, specifically his novella The Other House and his short story The Romance of Certain Old Clothes. This was the sort of knowledge that used to be passed around by word of mouth in the pre-Internet era, making it quite elitist. But FRBR provides no mechanism for describing such a loose, though culturally significant, relationship. It was also very difficult to get hold of those texts, whereas with the web it’s a whole lot easier, especially if you know what you’re doing. With a combination of an RDF property to describe the relationship, and the location dimension of the URI in linked data, it’s set to become even easier.
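Such a loose influence relationship is easy to state as an RDF-style triple. A minimal sketch, where the predicate and all resource URIs are invented for illustration (in practice one would reuse or mint a published vocabulary term):

```python
# Two "influenced by" statements that FRBR has no slot for, expressed as
# simple (subject, predicate, object) triples. All URIs here are hypothetical.
graph = [
    ("http://example.org/film/celine-and-julie-go-boating",
     "http://example.org/terms/influencedBy",
     "http://example.org/work/the-other-house"),
    ("http://example.org/film/celine-and-julie-go-boating",
     "http://example.org/terms/influencedBy",
     "http://example.org/work/the-romance-of-certain-old-clothes"),
]

def influences_of(resource, graph):
    """Collect everything a resource is recorded as being influenced by."""
    return [o for s, p, o in graph
            if s == resource and p.endswith("influencedBy")]
```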

Step away from Google Maps

I started to think about refocusing on the work and away from the physical item after attending the Middlemash library mashup event in December. There’s so much attention paid to the location of physical items in library technology, with seemingly endless Google Map mashups, that it sometimes feels that library technology is disappearing up its own Yahoo Pipes. Maybe that’s symptomatic of the dominance of Google Maps as an island of openness in a sea of closed data. But is the nearest location of a book what today’s library users either want or need? The answer to that depends to an extent on which library sector you’re working in, but the Semantic Web would enable us to offer timelines and other interesting representations of relationships between one work and another and also between an artistic work and the real world, the latter being explored effectively in the report.

But the question is, are we librarians willing or able to rise to that challenge? We witnessed the demise of the scholarly librarian over the course of the 20th century. I distinctly remember being told at library school in the 1990s that librarians should be less concerned about knowing stuff, and more about where to go to find stuff out. The Semantic Web offers a depth of engagement in texts and other cultural artefacts.

Where do we want to be positioned in this new semantic world order?

In the final blog posting in this series, I will discuss the co-existence of the universal and the particular on the Semantic Web and how I see libraries fitting into that relationship.

Understanding the Semantic Web – 2

This blog covers chapter 2 of Understanding the Semantic Web: Bibliographic data and metadata, Karen Coyle’s report on the potential of the Semantic Web for libraries.

In the first chapter, Karen took us through a detailed analysis of the development of library metadata, culminating in an argument in favour of Semantic Web principles.

The shortcomings of the MARC record

In this second chapter, Karen focuses on the era of machine-readable data, kicking off with our old and trusted friend, the MARC record. She delivers a devastating and detailed critique of the MARC format, highlighting the problem of many data elements being duplicated in the same record but in slightly different formats – exemplified by the disconnect between bibliographic and name authority data (with no automatic update when the authority record changes), and the differing formats within the bibliographic record itself between indexed and description fields.

The relational database era

Coyle then turns her attention to the relational database, where the problem for library data is of a different order, and I found Coyle’s analysis and interpretation here to be particularly impressive:

Database technology was designed for a different kind of data, less textual and more compact, with fewer data elements, and less of a range of content in those elements. Database technology is designed to retrieve. For example, it can retrieve all of the invoices that contain a particular product code. Database management systems work best in environments with a lot of repetition of data values.

In contrast, as Coyle goes on to point out, most bibliographic titles are unique; there is relatively little in the way of repeated data values. Databases are also relatively poor at alphabetical sorting of long strings of text. Another interesting point is the specificity of the MARC format to the library domain, eliminating the possibility of using standard business software.

These format and technology shortcomings are exacerbated by problems intrinsic to words themselves:

They can be ambiguous (e.g. Pluto the Disney character versus the orbiting body). They can be incomplete informationally, since many concepts require more than one word (e.g. solar energy, ancient Rome). They are language-based, so a search on computer does not bring up documents with the term ordinateur. And of course keyword searching falls prey to differences in spelling (fiber versus fibre) and errors in spelling or typography (history or histroy).

Taking advantage of new technical possibilities

Having laid out the problems, Karen presents a call to action, to bring library data “into the twenty-first century for machine processing and to improve service to our human end users by being able to offer more functionality in our systems.” She picks up the argument that she introduced in chapter 1, namely that we need to join our bibliographic data to the web, where a near-universal set of information resides, with many semantic relationships to the bibliographic sphere, and of course, where our users are located, for that very reason.

Semantic Web – flavour of the month

Coyle describes the Semantic Web as “the flavor of the month” in technology terms, differentiating between the web of documents, i.e. what we already have, and the web of data, which is fundamentally what the Semantic Web will be.

“The Semantic Web as introduced by Tim Berners Lee is a linked web of information encoded in documents throughout the web. Achievement of this vision is still over the visible horizon. In practice, however, there is a growing community of people and organizations who have metadata available to them that they have structured using Semantic Web rules. These disparate sets of data can be combined into a base of actionable data. These sets of data are being referred to as “linked data”, and the Linked Data Cloud is an open and informal representation of compatible data available over the Internet.”

She goes on to point out that at this juncture, many of the early participants are institutions with already existing scientific data sets. Linked data, as my colleague Richard Wallis frequently points out, is a pragmatic implementation of the semantic web, the latter being the vision that we are moving towards.

In this way, library data will link into broader sources of information, and Coyle returns to the example of Moby Dick by Herman Melville to illustrate that a bibliographic record has powerful relationships with the external world – Herman Melville the author, New England, whaling et al, are potentially invoked from Moby Dick the bibliographic work.

The mechanics of the Semantic Web

Karen proceeds to guide the reader through the rudiments of the Semantic Web, beginning with RDF, or Resource Description Framework, the data model for the Semantic Web.

It defines a set of rules for the formal semantics of metadata that is meant for the elements and structure of metadata that will be able to operate on the Semantic Web. Very simply put, in the Semantic Web all data consists of things and relationships between them, with the smallest unit being a statement of the form a thing → with relationship to → another thing

The combination of the virtually unlimited applicability of this model, and its readability by machines, opens up the potential to create and follow previously unimagined paths in the pursuit of new knowledge at webscale.
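That “thing → relationship → thing” shape can be sketched with ordinary tuples, together with a tiny traversal that follows whatever chains of statements happen to exist. The names below are illustrative, not a real dataset:

```python
# Each statement is (thing, relationship, thing). Because objects can in turn
# be subjects of other statements, a program can follow chains of relationships
# it was never explicitly told about.
statements = [
    ("moby-dick", "createdBy", "herman-melville"),
    ("herman-melville", "bornIn", "new-york-city"),
    ("moby-dick", "about", "whaling"),
    ("whaling", "practisedIn", "new-england"),
]

def reachable_from(start, statements):
    """Follow relationships outward from one thing, collecting everything linked."""
    found, frontier = set(), {start}
    while frontier:
        thing = frontier.pop()
        found.add(thing)
        for s, _, o in statements:
            if s == thing and o not in found:
                frontier.add(o)
    return found
```

Starting from the bibliographic work, the traversal reaches New England via whaling – a path no single record ever stated.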

The model is underpinned by identifiers, and this is what gives the Semantic Web its precision, going back to the textual ambiguity problem explored earlier.

The primary rule for the Semantic Web is that identifiers need to be in the form of a Uniform Resource Identifier, which is a particular form of identifier. We don’t need to go into the structure of URIs because it turns out that the common Uniform Resource Locator, URL, is in URI format, and is the preferred identifier to use on the Semantic Web.

We see then, that the URI provides not only precision, but also continuity with the web as we know it today. She does explore the problematic ambiguity of the URI / URL. A URI can point to a definitive description of something, hence playing the part of an identifier; or, as a URL, it can simply denote a location. They are, of course, identical in format. However she is clear on the advantage of the location dimension, namely that it enables information to be returned based on the identifier.

Universality

Coyle backs away from a utopian vision of a single system of identifiers for everything on the Web. Inevitably, she says, different communities will assign identifiers of their own, some overlapping with those of another community. And that is indeed what is coming to pass. It is ironic that just as humanity arrives at a point where a single, absolutist, universal set of definitions becomes technically possible, we retreat back to subjectivity and relativism. On the other hand, it is precisely this pragmatic approach which might be the killer asset of the Semantic Web in terms of adoption.

In a similar vein, Coyle addresses the area of controlled vocabulary, explaining that the Semantic Web facilitates both controlled and uncontrolled data.

Karen Coyle uses colour as an example here, and to great effect, comparing a set of identifier-based entries for colours with a plain list of words:

  • red
  • yellow
  • blue

Librarians will readily perceive the first set as being a controlled list of values, and anyone who has worked with software will be aware of the limitations of hard-coded values, as per the second set. Karen also illustrates how the first set offers multi-language flexibility, whilst maintaining semantic coherence.
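The contrast can be sketched in a few lines: a hard-coded word list versus identifier-based concepts carrying language-tagged labels. The URIs and labels below are invented for illustration:

```python
# A hard-coded word list ties the data to one language and one spelling.
hard_coded = ["red", "yellow", "blue"]

# An identifier-based list attaches language-tagged labels to a stable
# concept, so the data stays coherent while its display adapts to the reader.
controlled = {
    "http://example.org/colours/red":    {"en": "red", "fr": "rouge", "de": "rot"},
    "http://example.org/colours/yellow": {"en": "yellow", "fr": "jaune", "de": "gelb"},
    "http://example.org/colours/blue":   {"en": "blue", "fr": "bleu", "de": "blau"},
}

def label(concept_uri, language):
    """Same concept, rendered for whichever reader is asking."""
    return controlled[concept_uri][language]
```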

The benefits of the Semantic Web

Coyle itemises a number of general benefits of the Semantic Web. For me the key benefits are its global and extensible qualities, the latter meaning that new data and data types can be added at any time, and data can be endlessly recombined to create new information. But comparing the original web to its semantic younger sibling, the ability to ascribe meaningful relationships between entities is crucial, with its potential to “transform the Web from what it is today to a richer, more meaningful information environment.”

Karen illustrates this with the work of Library of Congress creating an online version of the Library of Congress Subject Headings in linked data format.

There is a separate identifier for each entry in the subject authority file, about 350,000 total. Because the identifier is also a URL, the Library has placed information about the subject heading at that location and can display it in formats for human readers or for programs.
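The “formats for human readers or for programs” point rests on HTTP content negotiation: the same URI, asked via the Accept header, can return machine-readable RDF instead of an HTML page. A minimal sketch that only builds the request rather than sending it; the heading identifier is a placeholder, not a real LCSH record number:

```python
from urllib.request import Request

# One URI, two audiences: a browser gets an HTML page, while a program can
# ask the same address for machine-readable data via the Accept header.
# The subject heading identifier below is a placeholder.
heading_uri = "http://id.loc.gov/authorities/subjects/shXXXXXXX"

def request_for(uri, want_machine_readable=False):
    """Build a request for a heading, optionally negotiating for RDF."""
    headers = {}
    if want_machine_readable:
        # Content negotiation: ask the server for RDF rather than HTML
        headers["Accept"] = "application/rdf+xml"
    return Request(uri, headers=headers)
```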

In the final posting on this report, I will talk about what might be the ramifications for libraries of the Semantic Web as explored in this report.

Understanding the Semantic Web – 1

The American library technology commentator Karen Coyle has produced an ambitious report Understanding the Semantic Web: Bibliographic Data and Metadata under the auspices of the American Library Association. In our increasingly open world, it was galling to have to pay $43 for the privilege of reading it, and especially ironic given that the Semantic Web delivers its value exponentially according to the amount of linked data made available to it, a point that is not lost on Karen Coyle in relation to library data.

In this first chapter, Karen sets the scene, providing rich historical context in terms of a history of cataloguing, and thus builds up an irresistible argument as to why libraries need to embrace the semantic web.

What is metadata?

It does no harm to define metadata, and these three points provide a useful starting point:

  1. It’s constructed – it is fundamentally artificial.
  2. It’s constructive – it is purposeful.
  3. It’s actionable – it should be possible to act on the metadata in some way.

Even more useful, though, is the example Coyle uses to show good metadata in action, i.e. the subway map, saying that “If you were to superimpose this map over the city it represents, you’d find that the subway isn’t “true”, in the sense that it is neither to scale nor are the stations located where they would be on a map based on longitude and latitude.”

And she continues…

And yet they perform their job incredibly well, to the point that one can arrive in a city for the first time, perhaps even with only a limited understanding of the local language, and find one’s way. These maps are a good example of functionality in metadata.

Karen then develops the idea by comparing the old-style inert paper map with one that has “machine-actionable metadata” behind it, which has the effect of enabling users to reuse it in unforeseeable ways.

Historical evolutions

Karen explains that the basic functions of bibliographic metadata have extended over time, in response to related changes in the catalogue’s context.

The sharing of cataloguing between libraries has a surprisingly long lineage. In the nineteenth century, libraries apparently used to exchange their printed book catalogues, sometimes for a charge. The industrial revolution was accompanied by a dramatic increase in printed publications, and the card catalogue came about at this time. This proved to be a mixed blessing, because although it was easier to update, it lost the ability to be accessed remotely, until the dawning of the database era a century later.

Over the course of the twentieth century, the library underwent transformative growth, new technologies were introduced to meet the needs that arose from that growth, and library management became more complex, until we get to today’s situation where, as Karen illustrates, we have a “need to filter one’s retrieved set by language in order to reduce the number of items retrieved from thousands to ‘only’ three or four hundred.” Functional augmentation such as faceting and ranking results have, as Karen puts it “put pressure on the catalog record, pushing it to perform functions it was not consciously designed to do”. This strain has been compounded by the merging of diverse back-office catalogues such as the serials check-in records into what we know today as the library management system.

And Karen is perfectly correct to remind us that information overload predates the Internet by almost half a century. The post-war boom led to an explosion of research activity, and new retrieval mechanisms, such as the citation database, were invented to help people navigate through the morass of papers written.

Whose metadata is it anyway?

Despite all these innovations though, one incontrovertible truth remained in place – the separation of library data from data in other domains. It now needs to be an integral “part of the dominant information environment that is the web.” As Coyle emphasises, that is where library users are, so it’s where the library needs to be.

The important question now is: how can the library catalog move from being ‘on the Web’ to being ‘of the Web’? The linked data technology that has developed out of the Semantic Web provides an interesting path to follow. It is specifically designed to facilitate the sharing of information on the Web, much in the same way that the Web itself was developed to allow the sharing of documents. The library must become intertwined with that rich, shared, linked information space that is the Web. Rather than creating data that can be entered only into the library catalog, we need to develop a way to create data that can also be shared on the Web. This requires that we expand the context for the metadata that we create.

Coyle notes the overlap in content between the library and the Web, which as yet, is extremely under-exploited, citing the simple fact that the name “’Herman Melville’ and the fact that he wrote Moby Dick are facts that are not limited to the data in library catalogs…”

She has set up a context that is both broad and deep for chapter 2 in which she will consider the Semantic Web in much greater detail.

Middlemash

I was a newbie to the library mashup scene, and took in a lot of information yesterday at Middlemash, hosted by Damyanti Patel and her colleagues at Birmingham City University. It was every bit the friendly and stimulating event that I’d expected it to be, but by the time I, along with an impressive number of co-malingerers, got to the Barton Arms at the end of the day, I was able to pinpoint what had made me mildly uncomfortable at intermittent points of the day.

The discomfort had nothing to do with either the organisers or the participants, or indeed with the concept of mashing itself. The problem is that the forward-thinking librarians who celebrate the advent of electronic resources and innovative technologies for discovering them are the same people who, in a mashing context, are forced back into the world of print. And this has to be about ownership of data. Bibliographic data is much more “ours” than electronic resource metadata, which has traditionally been proprietary, locked away in abstract and index databases, available only in academic institutions and certainly not mashable by a bunch of librarians with a strange predilection for creating more exciting experiences of scholarly information.

Mashing the reading list

Like many people at the event, Edith Speller from Trinity College of Music was concerned about her institution’s reading lists. She felt that they were getting too static, and out of date, and, like many Talis Aspire customers, wanted to raise awareness of all those expensive subscriptions to e-resources among academics who would then be more likely to include them on resource lists. However, the solutions arrived at seem to be very book-specific, involving the following:

• Using the ISBN of a book on a resource list to look up recommendations (along the lines of “people who bought that also bought this”) using Amazon Web Services.
• Using the Mosaic API to:
  • Perform an ISBN look-up to find the courses associated with the people who have borrowed that book.
  • Use course codes to look up what other books were borrowed by people on those courses.
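The Mosaic look-up chain above can be sketched with the web calls stubbed out as local tables. All ISBNs, course codes, and table shapes below are invented for illustration; the real Mosaic and Amazon APIs have their own interfaces:

```python
# Stand-in data for the two stubbed API calls (entirely made up).
COURSES_BY_ISBN = {"9780140449136": ["MUS101", "MUS205"]}
BOOKS_BY_COURSE = {
    "MUS101": ["9780193852747"],
    "MUS205": ["9780193852747", "9780521629737"],
}

def related_books(isbn):
    """ISBN -> courses whose students borrowed it -> other books they borrowed."""
    suggestions = set()
    for course in COURSES_BY_ISBN.get(isbn, []):
        suggestions.update(BOOKS_BY_COURSE.get(course, []))
    suggestions.discard(isbn)          # don't recommend the book itself
    return sorted(suggestions)
```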

Paul Stainthorp at the University of Lincoln is using RefWorks to create embeddable lists of new titles and communicate them to users, by sharing folders within RefWorks publicly and creating RSS feeds on those folders. He’s also used Yahoo! Pipes (the mashup panacea du jour) to pull in the book cover image and description from Amazon. Because their academics prefer notifications by email, as opposed to running their own RSS feed, an email now comes in when a new book arrives in their subject area.

No doubt academics are availing themselves of current awareness services provided by publishers to find out about new e-journal articles, but it comes back to the disintermediation of the library from e-resource metadata. Owen Stephens from the Open University reflected in the pub afterwards on the decisive break that occurred with the electronic journal, when the library no longer owned the item, but merely licensed it. Tony Hirst concurred that the library world had never challenged the proprietary nature of abstracts and indexes.

Mashing the library floor plan

Owen ran a workshop in the afternoon to develop his idea for mashing library floor plans with Google Maps. We used the University of Sheffield library floorplan as a working example, and it was fascinating to hear about how OpenLayers (an open source mapping tool) works. Maps are divided into tiles of 256 by 256 pixels, and some JavaScript asks for each tile as needed as the user navigates around the map. As the user zooms in, the map simply moves to a more detailed set of tiles. The exercise of converting a floorplan into a zoomable map forces the library to consider how granular and practicable their floorplans are – is there enough detail to establish on which shelf a book is located? Maintenance is also an issue, and Owen suggested augmenting the shelving workflow so that, at the end of shelving, the librarian records the start and end classmark of the shelf. We also considered separate scenarios where the user wants a particular book, on the one hand, or books on a subject area on the other.
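The tile arithmetic is simple enough to sketch: with 256-by-256-pixel tiles, finding which tile covers a given pixel coordinate is just integer division, and zooming in swaps in a more detailed tile set.

```python
# With 256x256-pixel tiles, the tile covering a point on the floor plan
# is found by integer division of the pixel coordinates.
TILE_SIZE = 256

def tile_for(pixel_x, pixel_y):
    """Return the (column, row) of the tile containing a pixel coordinate."""
    return (pixel_x // TILE_SIZE, pixel_y // TILE_SIZE)
```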

University of Sheffield plans to use heat maps to analyse how users are navigating the library. With the Ranganathan maxim in mind (positioning the stock to minimise the need for users to move around the library) they would then be able to optimise the library layout.

Sure it’s funky, but I just want to renew my books

Earlier in the day, Mark Van Harmelen from Hedtek Ltd. based at the University of Manchester, urged us all to listen more to the student voice, through focus groups and other mechanisms. I know that Owen Stephens and many other Middlemash attendees are making every effort to engage with students in the idea and design stage right now. It will be interesting to see whether we’re expending too much energy on over-sophisticated solutions for the dying format of print. As Chris Keene from University of Sussex stated, the response of students to tag clouds and other features at the discovery layer is, “Sure it’s funky, but I just want to renew my books.”

Personally, I’d love to see more focus on work-level data. The published works of an author or indeed a subject area plotted against an appropriate timeline could be tremendously useful – the works of Dickens plotted against key social legislation of the 19th century springs to mind. But the approach would come into its own with non-fiction, where there is a more direct relationship between published literature and real world events. That would really add scholarly value to bibliographic data, and would enable us to break out of transactions such as reservations that are rooted in the past not the future of scholarly life.

RIN’s Michael Jubb Talks with Talis about bibliographic records in a networked world

Dr Michael Jubb, Director of the Research Information Network, is my guest for this podcast.

The RIN was established by the higher education funding councils, the research councils, and the national libraries in the UK to investigate how efficient and effective the information services provided for the UK research community are.

As part of their role, they publish many reports to inform and create debate to lead to real change.  Our conversation focuses on the recently published “Creating catalogues: bibliographic records in a networked world”, which explores the production and supply chain of bibliographic metadata for physical and electronic books, journals, and articles.  We discuss the need for the report, and therefore change in this area, its recommendations and possible ways forward.

Free hosting for Open Data

Over on our sister blog Nodalities, my colleague Leigh Dodds has announced the launch of  the Talis Connected Commons.

True to our desire to see a truly open web of data, under the terms of the Connected Commons scheme Talis is offering free access to the [Talis] Platform for the purposes of hosting public domain data. And the offer isn’t just limited to free hosting: the data access services, including access to a public SPARQL endpoint, are also freely available.

The terms of the offer are as follows: if you own, or are creating, a public domain dataset then you can store that data in the Platform as RDF, for free. We’re setting an initial cap of 50 million triples on each dataset, but that should be plenty of space in which to collect some really interesting data.
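To get a feel for what access to a public SPARQL endpoint means in practice, here’s a hedged Python sketch that builds a SPARQL query URL. The store name and endpoint path are invented for illustration – check the Connected Commons documentation for the real ones.

```python
from urllib.parse import urlencode

def sparql_request_url(endpoint, query):
    """Build a GET URL for a SPARQL endpoint, asking for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "output": "json"})

# Hypothetical query: list up to ten titles held as dc:title triples.
QUERY = """
SELECT ?title WHERE {
  ?book <http://purl.org/dc/terms/title> ?title .
}
LIMIT 10
"""
url = sparql_request_url(
    "http://api.talis.com/stores/example/services/sparql", QUERY)
```

Issuing a GET on that URL would return matching bindings as JSON – and note that the friendly generic dc:title predicate is exactly the sort of thing outsiders will look for in library data.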

So have you got, or do you want to create, up to 50 million triples you would like to put in the public domain, along with up to 10Gb of content? Yes? Then get yourself over to the Connected Commons page and check whether you qualify.  There is also a FAQ to give you more detail.

The Connected Commons is for all sorts of data, but I’m positive that the library world provides a rich source of such open data sets – get in there guys and get your data openly linked and out there.


Code4lib final day in Providence – looking forward to Asheville

As always, a slightly shorter day for the last day of the conference but no less stimulating.  Talis CTO Ian Davis provided the keynote for the day, entitled “If you love something…    …set it free”.

He provided a broad view of how the linking capability of the web has changed the way things are connected, and how, with participation, network effects have resulted.  But that is still at the level of linking documents together.  The Semantic Web fundamentally changes how information, machines, and people are connected.  Information semantics have been around for a while; it is the coupling with the web that is the difference.  He conjectured that data outlasts code, meaning that Open Data is more important than Open Source; that there is more structured data than unstructured, so people who understand structure are important; and that most of the value in data is unexpected or unintended, so we should engineer for serendipity.

He gave a couple of warnings: be very clear about how you licence your data so that people know what they can and can’t do with it, and be careful about how you control the use of the personal parts of that data.  He made it clear that we have barely begun on the road, but the goal is not just to build a web of data – it is to enrich lives through access to information.  Making the world a better place.

Edward M. Corrado of Binghamton University gave us an overview of the Ex Libris Open Platform strategy.  This was the topic of a previous Talking with Talis podcast with Ex Libris CSO Oren Beit-Arie.  Edward set the scene as to why APIs are important for getting data out of a library system.  He then explained the internal (formalised design, documentation, implementation, and publishing of APIs) and external (published documentation, hosted community code, tools, and opportunities for face-to-face meetings with customers) initiatives from Ex Libris.  The fact that you need to log in to an open area raised, as it has before, some comments on the background IRC channel.

The final two full presentations of the day demonstrated two very different results of applying linked data to services. Adam Soroka, of the University of Virginia, showed how geospatial data could be linked to bibliographic data with fascinating results, while Chris Beer and Courtney Michael, from WGBH Media Library and Archives, showed some innovative, simple techniques for representing relationships between people and data.

The day was drawn to a close with a set of 5-minute lightning talks, a feature of all three days.  These lightning talks are one of the gems of the Code4lib conference – a rapid dip into what people are doing or thinking about.  They are unstructured, and folks put their name on a list to talk about whatever they want.  The vast majority are fascinating to watch.

During the conference the voting for Code4lib 2010 was completed so we now know that it will all take place again next year in Asheville, NC.  From the above picture, I can’t wait.
