Panlibus Blog

Archive for the 'Networked linked environment' Category

Linking the data dots …

I joined the world of libraries some twenty years ago and was immediately impressed with the implicit mission to share the knowledge and information that libraries look after with all, for the benefit of mankind.  Being part of an organisation that was built out of a desire for libraries to share the data that backs up that knowledge, I was surprised how insular and protectionist with that data this world can be.  Some of the reasons behind this were technological.  Anyone who played with Z39.50 and other similar, so-called standards before the century turned will tell you war stories about the joys of sharing data.  Then along came the web [I know it started in the mid 1990s, but it really came along after the .com boom] and things started to get much easier, from a technology point of view anyway.

So why do libraries make it so difficult to share the data that they hold?  This isn’t the stuff that they hold – that is a whole other can of worms.  This is the data about the stuff they hold: the data that helps mankind find what mankind can benefit from.  I know we have a lot of history to get over, but the commercial world recognised ages ago that if you share information about what you have and what you do, people find what they are looking for and you get more business.

However, negatively ranting on about the insular outlook of some in library land was not my purpose in writing this post.  You may know that I, and Talis, are involved in the emerging world of Linked Data.  Over recent months I have found myself immersed in other parallel universes such as national and local government, newspapers, broadcast media, and finance systems.  It was therefore a great pleasure to find myself organising a Linked Data and Libraries event at the British Library last week.  Sir Tim Berners-Lee’s vision of a Web of Data, complementing the current web of documents and utilising a collection of standards and techniques known as Linked Data, is all about sharing and linking data across organisations and sectors for the benefit of mankind – sound familiar?

It was very refreshing to see the amount of interest this day attracted.  As you will see from the presentations from the day, made available via our sister Nodalities blog, there are many libraries and library organisations actively engaged with this.  Several, such as the German National Library, have released traditional (sourced from MARC records) bibliographic data in a Linked Data form using RDF.  Others, such as VIAF, hosted by OCLC, and the Library of Congress Authorities, are providing RDF as one of the formats openly available from their service.  The Bibliothèque nationale de France is in the process of inviting tenders for an entirely new system to open up their data and holdings as Linked Data.

It is fair to say that most of these initiatives are coming from national, international, large, and cooperative libraries, but the interest is already trickling down to smaller libraries, especially in the academic sector.  It is also fair to say that most who are engaged in thinking about Linked Data and libraries take on board Sir Tim’s point about many of the benefits coming from the linking of data between sectors – libraries and science and government and the media and commerce and education and leisure and…

So despite my frustrations about the library world, still very evident in some circles, I am becoming more positive about libraries being able to fulfil their mankind-benefiting mission as the web of data emerges.  The changing of influential attitudes, and the move to different underlying data formats, may help us leave behind some emotional and legal baggage.

Anyone who has followed us for a while will know that we have been banging on about, and implementing, semantic web and linked data techniques and technologies for many years.  It is great when others start to ‘get it’ and you stop having to be one of a few voices in the wilderness.  These changes will not happen to all libraries overnight, but it is nice to swap frustration at a lack of vision and ambition for frustration at a lack of progress – something I think I was born with.

Google Book Settlement will help stimulate eBook availability in libraries

So says former Google Book Search product manager Frances Haugen in her contribution to the debate on the September Library 2.0 Gang.

This month’s Gang was kicked off by Orion Pozo from NCSU, where they have rolled out dozens of Kindles and a couple of Sony Readers.  The comparative success of their Kindles over the Sony Readers appears to be down to the simpler process of distributing purchased books across sets of readers and a broader selection of titles at a lower cost.  Currently users request books for the Kindle via an online selection form; the books are then purchased and downloaded on to the devices, which are then loaned out.  There are no restrictions on the titles purchased, and they have an approximately 50/50 split between fiction and non-fiction.

The Gang discussed the drivers that will eventually lead to the wide adoption of eBooks.  These included things like the emergence of open eBook standards, and the evolution of devices, other than dedicated readers, that can provide an acceptable reading experience.  Carl Grant shared his experience of starting a book on his Kindle and then picking it up from where he left off on his iPhone (as he joined his wife whilst shopping).

An obvious issue influencing the availability of eBooks is licensing and author and publisher rights.  This is where the Google Book Settlement comes into play.  If it works out as she hopes, Frances predicts that over time this will facilitate broader availability of currently unavailable titles.  I paraphrase:

[From approx 26:50] Institutional subscriptions will become available on the 10M books that Google has scanned so far.  Imagine in the future a user with a reader that accepts open formats being able to get access to the books this institutional license would provide.  Imagine school children having access to 10M books that their library subscribes to, instead of having to formally request one-off books to be added to their device.

[From approx 44:50] There are a huge number of books that are no longer commercially available in the US, for several reasons.  If the rights holders of those books do not opt out, they will become available for people to purchase access to.  One of the interesting things about the way the settlement is set up is that you will be able to purchase access either directly or through an institutional subscription.  What is neat is that this cycle will put a check on prices, as prices for individual books are based upon the demand for them.  So less popular books will cost less…  So if the price of the institutional subscription ever gets too high, libraries can decide to buy one-offs of these books.  I think that whole economic mechanism will substantially increase access to books.

The Gang were in agreement that eBooks will soon overtake paper ones as the de facto delivery format; it is just a question of how soon.  Some believe that this will be much more rapid than many librarians expect.  It is a challenge for librarians to take their services into this eReading world.

RIN’s Michael Jubb Talks with Talis about bibliographic records in a networked world

Dr Michael Jubb, Director of the Research Information Network, is my guest for this podcast.

The RIN was established by the higher education funding councils, the research councils, and the national libraries in the UK to investigate how efficient and effective the information services provided for the UK research community are.

As part of their role, they publish many reports to inform and create debate that leads to real change.  Our conversation focuses on the recently published “Creating catalogues: bibliographic records in a networked world”, which explores the production and supply chain of bibliographic metadata for physical and electronic books, journals, and articles.  We discuss the need for the report, and therefore for change in this area, its recommendations, and possible ways forward.

Will the eBook make it across the chasm?

I’m currently hurtling through the English countryside on a Wi-Fi enabled train, having spent the day at E-books and E-content 2009, held at University College London.  An interesting and stimulating day with a well-matched but varied set of speakers, including yours truly (presentation on SlideShare).  The eighty-strong audience was also a varied selection from academic libraries, academia in general, publishers, and the information media.

The move towards a web of data, enabled by the emergence of semantic web technologies and practices, was one of my themes.  Another was a plea for content publishers and providers to deliver their content to the user where he or she is, rather than expecting users to be driven to their site with a totally different interface.  This is a difficult one for the eContent industry, at a time when the publishers are in the middle of a “my platform is better than yours” battle.  Nevertheless, a student wants the content their course has recommended, not caring who published it or which aggregator their library licensed it from.

In laying the ground, I initially discussed the technology adoption curve and how technologies don’t become mainstream overnight.  Any new technology, or new way of doing things, follows a standard pattern, with a small number of innovators taking the initial, often enthusiastic, risk.  The early adopters then build on the innovators’ success and join in, still very early and with some risk.  When the new way has been proven, adoption has increased, and both costs and risk have fallen, the early and late majorities take it to mass acceptance and adoption.  This only leaves the laggards, who will only come on board if forced by circumstance.

As an adjunct to the adoption curve, I spoke about a chasm which technologies have to cross, between the early adopters and the early majority, before they take off.  There are many promising technologies that failed to cross that chasm.  For example, technology watchers at the time predicted that the mini-disc would replace the cassette tape, but as we know the CD took that prize.

Today’s conference was mostly focussed on the eBook and its impact on libraries and publishers, on the assumption that it will be the way of delivering book-sized pieces of content in the approaching digital world.  In answer to a challenging question to the end-of-day panel, I concluded that this is by no means certain.  I believe direct access to articles will eventually see the end of the traditional journal issue format.  In a similar way, I believe there is a good chance that chunks of content that are today of book size may well be assembled and delivered in a digital object yet to be identified.

So will the eBook jump the adoption chasm?  If I were a betting man I would only back it on an each-way basis.  I believe that anyone betting their whole business model on it being a certain winner may just be taking too much of a risk.

Photo from mstorz published on Flickr

Library of Congress launch Linked Data Subject Headings

Back in December I was very critical of the Library of Congress for forcing the takedown of the Linked Data service at lcsh.info.  LoC employee, and Talking with Talis interviewee, Ed Summers had created a powerful and useful demonstration of how applying Linked Data principles to an LoC dataset such as the Library of Congress Subject Headings could deliver an open asset that adds value to other systems.  Very soon after its initial release, another Talking with Talis interviewee, Martin Malmsten of the Royal Library of Sweden, made use of the links to the LCSH data.  Ed was asked to take the service down, ahead of the LoC releasing their own equivalent in the future.

I still wonder at the LoC approach to this, but that is all water under the bridge, as they have now launched their service, under the snappy title of “Authorities & Vocabularies”, at http://id.loc.gov/authorities/.

The Library of Congress Authorities and Vocabularies service enables both humans and machines to programmatically access authority data at the Library of Congress via URIs.

The first release under this banner is the aforementioned Library of Congress Subject Headings.

As well as delivering access to the information via a Linked Data service, they also provide a search interface, and a ‘visualization’ via which you can see the relationships between terms, both broader and narrower, that are held in the data.

To quote Jonathan Rochkind “id.loc.gov is AWESOME”:

Not only is it the first (so far as I know) online free search and browse of LCSH (with in fact a BETTER interface than the proprietary for-pay online alternative I’m aware of).

But it also gives you access to the data itself via BOTH a bulk download AND some limited machine-readable APIs. (RSS feeds for a simple keyword query; easy lookup of metadata about a known-item LCSH term, when you know the authority number; I don’t think there’s a SPARQL endpoint? Yet?).

On the surface, to those not yet bought into the potential of Linked Data, and especially Linked Open Data, this may seem like an interesting but not necessarily massive leap forward.  I believe that what underpins the fairly simple, functional user interface they provide will gradually become core to bibliographic data becoming a first-class citizen in the web of data.

Overnight the URI ‘http://id.loc.gov/authorities/sh85042531’ has become the globally available, machine- and human-readable, reliable source for the description of the subject heading ‘Elephants’, containing links to its related terms (in a way that both machines and humans can navigate).  This means that system developers and integrators can rely upon that link to represent a concept, not necessarily the way they want to [locally] describe it.  This should facilitate the ability for disparate systems and services to simply share concepts, and therefore understanding – one of the basic principles behind the Semantic Web.
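
To make that concrete, here is a minimal sketch of how a developer might dereference that ‘Elephants’ URI and walk its related terms from Python, using the requests and rdflib libraries.  It assumes the service honours the usual Linked Data convention of content negotiation for RDF/XML, and that the headings are modelled with SKOS broader/narrower properties; neither detail is spelled out above, so treat this as illustrative rather than definitive.

    # Hedged sketch: fetch the RDF description of an LCSH heading and list
    # its broader and narrower terms. Assumes content negotiation and SKOS
    # modelling; adjust if the service behaves differently.
    import requests
    import rdflib
    from rdflib.namespace import SKOS

    uri = "http://id.loc.gov/authorities/sh85042531"  # 'Elephants'

    # Ask for an RDF serialisation rather than the human-readable HTML page.
    response = requests.get(uri, headers={"Accept": "application/rdf+xml"})
    response.raise_for_status()

    graph = rdflib.Graph()
    graph.parse(data=response.text, format="xml")

    subject = rdflib.URIRef(uri)
    print("Preferred label:", graph.value(subject, SKOS.prefLabel))

    # Walk the relationships both ways; labels for related terms may or may
    # not be included in the returned graph, so graph.value() can return None.
    for predicate, name in ((SKOS.broader, "broader"), (SKOS.narrower, "narrower")):
        for term in graph.objects(subject, predicate):
            print(name, term, graph.value(term, SKOS.prefLabel))

The point is not the particular library calls, but that the URI itself is the integration point: any system that stores it can fetch the same description and navigate the same relationships.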

This move by the LoC has two aspects to it that should make it a success.  The first is technical: adopting the approach, standards, and conventions promoted by the Linked Data community ensures a ready-made developer community to use it and spread the word about it.  The second is openness: anyone and everyone can take advantage of this valuable asset without having to stop and think “is it OK to use this stuff?”.  Many in the bibliographic community, who seem to spend far too much time on licensing and logins, should watch and learn from this.

A bit of a bumpy ride to get here, but nevertheless a great initiative from the LoC that should be welcomed, and one that I hope they and many others will build upon in many ways.  Bring on the innovation that this will encourage.

Image from the Library of Congress Flickr photostream.

OCLC takes aim at the library automation market from the Cloud

Over the last few years OCLC, the US-based not-for-profit cataloguing cooperative, has been acquiring many for-profit organisations from the world of library automation, such as PICA, Fretwell-Downing Informatics, and Sisis Information Systems.

About fifteen months ago, Andrew Pace joined OCLC from North Carolina State University Libraries and was given the title of Executive Director, Networked Library Services.  After joining OCLC, Andrew, who had a reputation for promoting change in the library technology sphere, almost disappeared from the radar.

Putting these two things together, it was clear that the folks from Dublin were up to something beyond just owning a few non-US ILS vendors.

From a recent post on Andrew’s Hectic Pace blog, and press releases from OCLC themselves, we now know what that something was.  It is actually a few separate things, but the overall approach is to deliver the functionality traditionally provided by the ILS vendors (Innovative, SirsiDynix, Polaris, Ex Libris, etc.) as services from OCLC’s data centres.  This moves the OCLC reach beyond cataloguing into the realms of acquisitions, license management, and even circulation.

The idea of breaking up the monolithic ILS (or LMS, as UK libraries refer to it) is not a new one – as followers of Panlibus will know.  Equally, delivering functionality as Software-as-a-Service (SaaS) has been native to the Talis Platform since its inception.  It is this that underpins the already established SaaS applications Talis Prism, Talis Aspire, and Talis Engage.

Both OCLC, with WorldCat Local, and Talis, with Prism, have been delivering public discovery interfaces (OPACs) as SaaS applications for a while now, and ‡biblios.net have recently launched their social cataloguing as a service [check out the podcast with Josh Ferraro], but I think this is the first significant announcement of circulation as a service that I am aware of.

The move to Cloud Computing, with its obvious benefits of economies of scale and the removal of the need for libraries to be machine minders and data centre operators, is a reflection of a much wider computing industry trend.  The increasing customer base of Salesforce.com, and the number of organisations letting Google take care of their email, and even their whole office operation (such as the Guardian), are testament to this trend.  So the sales pitch from OCLC, and others including ourselves here at Talis, about the total cost of ownership benefits of a Cloud Computing approach is supported and validated industry-wide.

So as a long-time predictor of computing transforming from a set of locally managed and hosted applications to services delivered as utilities from the cloud, mirroring the same transformation for electricity generation and supply a century ago, I welcome this initiative by OCLC.  That’s not to say that I don’t have reservations.  I do.

The rhetoric emanating from OCLC in these announcements is reminiscent of the language of the traditional ILS vendors, who are probably very concerned by this new and different encroachment onto their marketplace.  There is an assumption that if you get your OPAC from WorldCat (and as a FirstSearch subscriber, with this on-the-surface ‘free’ offer, you are probably thinking that way), you will get circulation and cataloguing and all the rest from a single supplier – OCLC.

The question that comes to mind, as with all ILS systems, is whether you will be able to mix and match different modules (or in this case services) from different suppliers, so that libraries can choose what is best for them.  Will OCLC open up the protocols (or, to be technical for a moment, the hopefully RESTful APIs) used to access these application/service modules, so that they can be used not only with other OCLC services but also with services and applications from open source and other commercial vendors?  Will they take note of, or even adopt, the recommendations that will come from the OLE group [discussed in last month’s Library 2.0 Gang], which should lead towards such choice?

Some have also expressed concern that a library going down the OCLC cloud services route will be exposing itself to the risk of ceding to OCLC control of how all its data is used and shared, not just the bibliographic data that has been at the centre of the recent storm about record reuse policies.  Against that background, one can but wonder what OCLC’s reaction would be to a library’s request to openly share circulation statistics from the use of its OCLC-hosted circulation service.

This announcement brings to the surface many thoughts, issues, concerns, technological benefits, and questions that will no doubt rattle around the library podcasting and blogosphere for many months to come.  I also expect that in the boardrooms of the well-known commercial [buy our ILS and a machine to run it on] providers, there will be many searching questions asked about how they deal with the 500lb [not-for-profit] gorilla that has just moved from the corner of the room to start dining from their [for-profit] table.

This will be really interesting to watch…..

The composite image was created using pictures published on Flickr by webhamser and Crystl.

Peter Brantley Talks with Talis as he moves to the Internet Archive

I first interviewed Peter Brantley, in the Talking with Talis series, in July 2007 about his role in the Digital Library Federation and its place in the world of digital libraries.

In this conversation we look back over the last couple of years at the DLF and then forward into his new challenge and opportunity at the Internet Archive.

We go on to discuss his thoughts and plans to make it easy to identify books and information, and their locations, in a way that is not possible with the processes and protocols we use today.

Code4lib final day in Providence – looking forward to Asheville

As always, a slightly shorter day for the last day of the conference, but no less stimulating.  Talis CTO Ian Davis provided the keynote for the day, entitled “if you love something…  …set it free”.

He provided a broad view of how the linking capability of the web has changed the way things are connected and, with participation, has caused network effects to result.  But that is still at the level of linking documents together.  The Semantic Web fundamentally changes how information, machines, and people are connected.  Information semantics have been around for a while, but it is this coupling with the web that makes the difference.  He conjectured that data outlasts code, meaning that Open Data is more important than Open Source; that there is more structured data than unstructured, therefore people who understand structure are important; and that most of the value in data is unexpected or unintended, so we should engineer for serendipity.

He gave a couple of warnings: be very clear about how you licence your data, so that people know what they can and can’t do with it, and take care over how you control the use of the personal parts of the data.  He made it clear that we have barely begun on this road, but that the goal is not to build a web of data for its own sake; it is to enrich lives through access to information.  Making the world a better place.

Edward M. Corrado of Binghamton University gave us an overview of the Ex Libris Open Platform strategy.  This was the topic of a previous Talking with Talis podcast with Ex Libris CSO Oren Beit-Arie.  Edward set the scene as to why APIs are important for getting data out of a library system.  He then explained the internal (formalised design, documentation, implementation, and publishing of APIs) and external (published documentation, hosted community code, tools, and opportunities for face-to-face meetings with customers) initiatives from Ex Libris.  The fact that you needed to log in to an open area raised, as it has before, some comments on the background IRC channel.

The final two full presentations of the day demonstrated two very different results of applying linked data to services.  Adam Soroka, of the University of Virginia, showed how geospatial data could be linked to bibliographic data with fascinating results, while Chris Beer and Courtney Michael, from WGBH Media Library and Archives, showed some innovative, simple techniques for representing relationships between people and data.

The day was drawn to a close with a set of five-minute lightning talks, a feature of all three days.  These lightning talks are one of the gems of the Code4lib conference: a rapid dip into what people are doing or thinking about.  They are unstructured, and folks put their name on a list to talk about whatever they want.  The vast majority of them are fascinating to watch.

During the conference the voting for Code4lib 2010 was completed, so we now know that it will all take place again next year in Asheville, NC.  From the above picture, I can’t wait.


Google puts 1.5M books in US pockets

Yesterday Google announced the launch of the mobile version of Book Search via their blog.

They are targeting it at both the iPhone and Android phones, but I presume it will work on others – unfortunately I cannot tell because, due to “public domain differences” outside the US, they are limiting the launch to the US only.  At this stage they already have over 1.5 million books for readers to choose from.

From the screenshot opposite, you can see that the iPhone style of interface has heavily influenced the design.  From outside the US you can still access the mobile page http://books.google.com/m, which has the unnerving effect of making your browser impersonate an iPhone, as you can see below.

Those who are used to the web version of Book Search will know that it has so far used scanned images of the pages to display the text – a technique that would obviously be less than ideal for the small screen of a mobile device.

Google have invested much effort in OCR-scanning these books so that the text, rather than a scanned image, is displayed on the device.  This then relies on the quality of the device’s text rendering to display the contents to the reader in the best way.  They have used OCR before to identify the text within a book so that it can be searched, but getting that process good enough to produce accurate text for direct reading threw up many challenges, which the Book Search team explore in their blog post.

Library 2.0 Gang heads-up
Look out for the next Library 2.0 Gang show, which is being recorded very soon.  Our guest will hopefully be a member of the Google Book Search team who will no doubt fill us in on the details behind this launch and many other aspects of Google, Books, and the world of libraries.

 

Google Analytics to analyse student course activity – Tony Hirst Talks with Talis

Tony Hirst, of the Open University Department of Communications and Systems, was recognised at the Online Information Conference 2008 for his work promoting new technologies in education by being presented with a commendation in the IWR Information Professional of the Year Award.

The award was presented at the end of the first day of the Online Information Conference 2008.  Earlier in the day Tony delivered a presentation entitled “Course Analytics – using Google Analytics to understand student behaviour in an online Open University course”.

I caught up with Tony just after his award, and we retired to a side room to discuss what he had learnt from his work with Google Analytics.

 

Picture of Tony published on Flickr by MrGluSniffer