Panlibus Blog

Archive for the 'Talis Platform' Category

Linked Data and Libraries – videos published

The Linked Data and Libraries event held at the British Library last month was very successful, attended by many interested in the impact and possibilities of these new techniques and technologies for libraries.

Many travelled from the far corners of the UK and Europe, but from the several emails I received it was clear that many others could not make it. With that in mind, we took along the technology to capture as much of the event as possible.

The videos have now been edited and published on our sister blog, Nodalities, where you will also find links to the associated presentation slides. I can highly recommend these as an introduction to the topic and an overview of the thinking and activities in this area from organisations such as the British Library and Europeana.

Will Linked Data mean an early end for MARC & RDA?

For the uninitiated, NGC4LIB is a library-focused mailing list which has a reputation for often engaging in massive discussions and disagreements around the minutiae of future cataloguing and library-focused metadata practices.  It has recently been involved in one of these great debates, stimulated by the comments of Sir Tim Berners-Lee in a recent interview.  As is often the case on this list, the debate wandered well off topic into the realms of FRBR and its alternatives before being brought back on topic by Jim Weinheimer, who started the conversation in the first place.

A statement in Jim’s contribution caught my eye:

Implementing linked data, although it would be great, is years and years away from any kind of practical implementation

Implementing linked data is already well underway by many groups across the globe.  For instance, there are a couple that we at Talis are closely involved with.  Following on from Sir Tim’s interview comments, the British Government is currently running a closed beta (soon to be opened) of data.gov.uk.  Through this site they are not only opening up data in many forms such as CSV, like their American cousins at data.gov, but also starting to encode it in RDF and publish it via the Talis Platform, which provides a SPARQL (the query language of the Linked Data web) endpoint.  This approach not only lets anyone download the raw data, but also enables them to query it for whatever they have in mind. If you want a sneak preview of how such data is queried, take a look at some of these examples.

In a similar vein, metadata from BBC programmes and music is being harvested into Talis Platform stores.  Again these are open to anyone to innovate with – check out these screencasts to see some of the early possibilities.
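If you are curious what such a query looks like in code, here is a minimal sketch in Python of calling a SPARQL endpoint over plain HTTP. The endpoint URL, the School class, and the `output` parameter are my own illustrative assumptions, not the actual data.gov.uk vocabulary:

```python
# A minimal sketch of querying a SPARQL endpoint over plain HTTP.
# The endpoint URL, the School class, and the 'output' parameter are
# illustrative assumptions, not the actual data.gov.uk vocabulary.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://api.talis.com/stores/example-gov/services/sparql"  # hypothetical

QUERY = """
SELECT ?school ?name WHERE {
  ?school a <http://example.org/education/School> ;
          <http://www.w3.org/2000/01/rdf-schema#label> ?name .
}
LIMIT 10
"""

params = urllib.parse.urlencode({"query": QUERY, "output": "json"})
with urllib.request.urlopen(f"{ENDPOINT}?{params}") as resp:
    results = json.load(resp)

# Standard SPARQL JSON results: one binding per row
for row in results["results"]["bindings"]:
    print(row["school"]["value"], "-", row["name"]["value"])
```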

Ah, but that is not bibliographic data, I hear someone cry – it’ll never catch on in libraries.  I get the impression from some comments on the NGC4LIB list that it will not be possible for ‘our’ data to participate in this Linked Data web until ‘we’ have predicted all possible uses for it, analysed them, and developed a metadata standard to cope with every eventuality.  There are already a few examples of the library world engaging with RDF and Linked Data, one obvious one being the Library of Congress with LCSH, another the National Library of Sweden.  Neither of these examples is encoding the kind of detail you would expect in a MARC record; they are using ontologies to describe associated concepts such as subjects.

There has been some ontology development towards this larger goal with Bibo (the Bibliographic Ontology Specification).  Although not there yet, Bibo is good enough to be used in live applications wishing to encode bibliographic data.  One such example is Talis Aspire.  Underpinned by the same Platform as the UK Government and BBC Linked Data services, it uses the Bibo ontology to describe resources in an academic context.
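To make that a little more concrete, here is a minimal sketch using the Python rdflib library of what a Bibo description of a single book might look like. The bibo and dc namespaces are real; the resource URI and the particular choice of properties are mine, for illustration only:

```python
# A minimal sketch of describing a book with the Bibo ontology using rdflib.
# The namespaces are real; the resource URI and property choices are made up.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

BIBO = Namespace("http://purl.org/ontology/bibo/")

g = Graph()
g.bind("bibo", BIBO)
g.bind("dc", DC)

book = URIRef("http://example.org/resources/war-and-peace")  # hypothetical URI
g.add((book, RDF.type, BIBO.Book))
g.add((book, DC.title, Literal("War and Peace")))
g.add((book, DC.creator, Literal("Leo Tolstoy")))
g.add((book, BIBO.isbn13, Literal("9780140447934")))

# Emit the description as Turtle, ready for loading into a Platform store
print(g.serialize(format="turtle"))
```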

Alongside data.gov.uk there is a Google Group conversation taking place. The refreshing part of this conversation is that it is between the producers of the data sets, those developing the way it should be encoded into RDF, and those who want to consume it.  Several times you will see a difference of opinion between those who want to describe the data to its fullest and those who wish to extract the most value from it: “I agree that is a cleaner way of encoding, but can you imagine how complex the query will be to extract what I want!”.  This approach is not unusual in the Linked Data world, where producers and consumers get together, pragmatically evolving a way forward.  Dataincubator.org is an open place where such pragmatic development and evolution is taking place.  Check out examples of a subset of Open Library data (note: this is an example of data, not a user interface).

Another bibliographically focused experiment can be found at semanticlibrary.org. From some of the example links on the home page, you can see that building in this way enables very different ways of exploring metadata: people, subjects, publishers, works, editions, series, all being equally valid starting points to explore from.

Doth the bell toll for Marc and RDA?
Not for a long old time. Ontologies like Bibo, and the results of work at Dataincubator.org and semanticlibrary.org, may well lead to more open, useful, and most importantly linked, access to data previously limited to library search interfaces.  That data has to come from somewhere though, and the massive global network of libraries encoding their data using MARC, and maybe soon RDA, is ideally placed to continue producing rich bibliographic metadata – metadata to be fed into the Linked Data web in the most appropriate form for that purpose.  There will continue to be a place for current cataloguing practices and processes for a significant period – supporting and enabling the bibliographic part of the Linked Data web, not being replaced by it.

No doubt the NGC4LIB conversation on this topic will continue. Regardless of how it progresses, there is a current need and desire for bibliographic data in the linked data web.  The people behind that desire, and the innovation to satisfy it, may well have come up with a solution satisfactory for them whilst we are still talking.

LibLime Cause Upset in the Open Source Community

Roy Tennant, in a blog post with a title you have to read twice, draws our attention to moves from Open Source Library Systems company LibLime which are causing much angst among supporters of Open Source.

He reproduces comments from Joan Ransom on Library Matters:

Horowhenua Library Trust developed Koha, the world’s first open source library management system back in 2000. We gave it to the world in the spirit of community. We are very happy, delighted in fact, for any organisation or individual to take it, improve it and then give their improvements back.

Reciprocity is the keystone which gives strength to the Koha Community.

We do not begrudge vendors taking our gift and building a commercial enterprise out of it, as Liblime, Biblibre and any number of others have done, but the deal is that you give back. This has worked well for a decade and Liblime has been a strong, valued and much appreciated member of the Koha international community over that time.

So it is incredibly sad and disappointing that Liblime has decided to breach the spirit of the Koha project and offer a ‘Liblime clients only’ version of Koha. Let’s call it what it is: vendor lockin and a fork.

Others including Marshall Breeding have also commented.

From the trails of comments around these posts, I get the impression that most of the upset folks are taking offence at the perceived intentions of a previously lauded open source champion who is now grappling with the commercial and operational realities of running a business that provides key services to key customers.

Even if LibLime were to turn their back on the community aspect of Koha today [their press release indicates that they are not doing that], they should still be praised for moving forward that community far further than it would ever have reached without the involvement of such a commercial organisation. 

I would suggest though that, having been immersed in the Open Source world for so long, they should have expected such a backlash of an almost religious nature and handled this much better. 

The world [not just in libraries] is rapidly moving towards Cloud Computing, Software-as-a-Service, and hosted solutions.  There is bound to be a tension between a community mostly made up of people who develop, and often look after their own local copy of, a software instance, and an organisation that aspires to run a service of the same/similar functionality for many customers on a hosted commercial basis.

Local experience here at Talis tells me that the velocity and pattern of development is very different for SaaS applications and services – one that does not fit very well with the traditional process of delivering software, both open and closed source.

Open Source is a valuable contribution that must be fostered, encouraged and promoted, because the innovation that it generates is a valuable asset for all of us.  Experience with projects such as Juice and Jangle reinforces this. Nevertheless there are commercial and contractual realities that companies such as LibLime have to take into account, which may lead to others questioning their motives, as we have seen over the last few days.


Mashed Libraries

I’m sat at the moment amongst a collection of UK library geekdom the like of which I’ve not experienced before.  I’m in the basement of Birkbeck College in London for the Mashed Libraries UK 2008 event, sponsored by UKOLN and organised by Owen Stephens.

Apart from the acrobatics of trying to get on to the wifi, which I’m sure could be made more than a little simpler and less frustrating, the day has got off to a great start.  Rob Styles did a great, mostly command-line-driven, introduction to using Talis Platform stores.  He was followed by Tony Hirst sharing his experiences, tips, and tricks for using online tools such as Yahoo Pipes and the spreadsheet elements of Google Docs. This was an excellent session – each time I return to Yahoo Pipes I am amazed anew and wonder why I don’t use it more.

Next we had Timm-Martin Siewert from Ex Libris, who gave an overview of their Open Platform Strategy and a peek into EL Commons.  This was the subject of a recent Talking with Talis podcast with Oren Beit-Arie, Ex Libris Chief Strategy Officer.  As I did in the podcast, others today questioned why EL Commons, being a commons, is not open to all.

A previous colleague of mine from way back, Mark Allcock, now with OCLC, then gave us a brief overview of readily available APIs from them.  Finally, Ashley Sanders talked about some API work at COPAC.

After an excellent lunch, small groups formed resulting in much chatting and coding.

The afternoon was punctuated by a presentation from Paul Bevan of the National Library of Wales.  Paul took us through the issues involved in taking their resources to where the majority of their visitors are – online.

That brought us to the end of the afternoon and some short reports on what people had been working on.  Unsurprisingly, given the presentations that started the day, several groups had made great progress using Yahoo Pipes and the Talis Platform – in several cases both. For example, via Pipes one group were pulling book records from Amazon, adding jacket images, then augmenting them with holdings data from the Platform. Another plotted library locations for records from the Platform on a Google Map, again using holdings data along with location data from the Silkworm Directory.

All in all an excellent day enjoyed by thirty plus people interested in using technology to improve libraries.  There is already talk of the next one.  Well done Owen for organising this one.

Update: Dave Pattern has uploaded several photos of the day to Flickr.

Take your data with you….

As InfoWorld reported earlier this week, Google CEO Eric Schmidt, at the Web 2.0 Summit in San Francisco, said Google wants to make the information it stores for its users easily portable so they can export it to a competing service if they are dissatisfied. He went on:

Making it simple for users to walk away from a Google service with which they are unhappy keeps the company honest and on its toes, and Google competitors should embrace this data portability principle

If you look at the historical large company behavior, they ultimately do things to protect their business practices or monopoly or what have you, against the choice of the users.

The more we can, for example, let users move their data around, never trap the data of an end-user, let them move it if they don’t like us, the better.

I wonder what Google’s opinions are on sharing data. It’s one thing to be happy for your departing customers to take their data with them [in a usable format]; it is another to be happy to share the data of your current customers [with their permission] with your competitors to add value to your customers’ lives.

Obviously Schmidt’s comments are aimed at individual users of Google’s hosted software-as-a-service applications. Will this attitude cover aggregations of broader data – digitized book contents, for instance? One key to the open movement of data between those organizations that hold and allow access to it is licensing. Discussion around licensing in the Open Data world is a topic increasing in volume.

In presentations I attended at the recent Stellenbosch Symposium it was made clear that the research community should be discouraged from signing over all rights to their publications to publishers – some right to hold Open access copies in their institution’s repository should be retained. Then there is the constant justification by Google around what they are doing with the digitization of books. There is the Free Our Data campaign in the UK, and many other examples.

There are also discussions around not only how things are licensed, but what can be the subject of a license.

The Talis Community License (TCL), which has received some Web 2.0 Summit coverage of its own on TechWeb, addresses a hole in the current spectrum of open data licensing – one not covered by Open Source licenses such as the GPL at one end, or by the Creative Commons movement at the other. Both of these cover creative output – source code and creative works respectively.

The problem comes when you try to protect [or enable] the use of ‘an aggregation’ of either facts [which in themselves cannot be copyrighted] or individually protected/licensed elements in a data set. In Europe this access to an aggregation is covered by something called Database Right, but this is not a global phenomenon.

To many, this hole in the spectrum of Open Data Licensing is not obvious, and only becomes apparent after working through some examples. As the realisation of the need for something like the TCL spreads across the community, we hope that it represents a useful contribution to the evolution of Open Licensing.

In a separate but associated discussion that is emerging around the Open availability of bibliographic records, LibraryThing‘s Tim Spalding made the following comment on the Code4Lib listserv:

As I’ve been saying at conferences, anyone who wants to build an open-source repository of MARC records, with or without wiki-like access, will get my (and LibraryThing’s) direct support. I think it’s going to happen. If only we had the time to do it directly. Maybe we’ll get to it if no one else does….

…An open-source alternative to the current system is going to happen. The only question is when. The project is doable, and would be of enormous importance.

So where do the non-libraries and small libraries who do not want, or more likely cannot afford, to pay expensive fees to get at bibliographic records go at the moment? This has to change.

One of Tim O’Reilly‘s original key aspects of Web 2.0 is ‘Data as the driving force’ – it’s been a slow burner, but that is starting to become more obvious by the day.


Why Nodalities?

“I read the Panlibus blog – I note Talis has another house blog called Nodalities – why is this, and why/who should be reading it?”

One of the major recurring themes from myself and others in Panlibus postings is Library 2.0 and its more general cousin Web 2.0. If you followed the links I provided to their descriptions in Wikipedia, you will have discovered that they are both labels for a collection of attributes, as opposed to specifications.

I have yet to read a complete, concise definition of what Web 2.0 or Library 2.0 ‘is’ [and probably never will]; nevertheless, it is far simpler to look at an application or service and pronounce to the world that it is very Web 2.0, and be fairly confident that people will understand what you mean.

Web 2.0 is virtually all about technology – Web Services, Service Oriented Architecture, Social Networking tools, etc. – whereas its Library relative mixes all of that with a heavy dose of using those Web 2.0 tools, and the customer handling & social skills of the library community, to provide a better service to library users. Debates about the use of mobile phones, and the provision of coffee, in a Library environment are often found in the Library 2.0 world.

We at Talis are the ‘Technology Guys’ in the Library equation, and although interested in all that is debated, our motivations are all about how new and emerging technologies [currently labelled Web 2.0] can be beneficially applied in the Library world. To this end you will find me and my colleagues evangelising on the subject both here and at conferences around the world such as these: Access2006, Internet Librarian International, Stellenbosch Symposium, Internet Librarian 2006, and the Charleston Conference.

The Talis Platform is an excellent example of applying Web 2.0, Semantic Web [to mention another ‘label’], SOA, and other technologies to provide innovative solutions for liberating library data, functionality, and services for the benefit of all.

In the process of proposing and delivering those [currently library specific] solutions, we are pushing both the theoretical and practical boundaries of web technologies and the theories and standards that are behind them – especially in the World Wide Web Consortium, where you will find Talis involved with several committees. In doing this we are very active members, with much to contribute and say, of the world community driving these technologies forward.

This is where Nodalities comes in. You will note [today] that there is a posting from me picking up points from the blogs of Ian Davis and Sam Tunnicliffe, from our Platform Team, who are currently at the Web 2.0 Summit in San Francisco. If you are interested, like I am, in the way that all things Web are [and are being predicted to be] moving, you will find what they are reporting most engrossing.

Reading between the lines of what is being presented, it is clear that the advances already being demonstrated by the Talis Platform are only the first step in a massive change in the way large sets of data and metadata (often only linked by semantics) can be marshalled, related together, and combined to change the way information is used in the future.

Depending on the context, you will find Talis people attending and/or speaking at both Library and more general conferences across the world. Our knowledge and understanding of the issues surrounding the library and information industries is very valuable input into the wider technology world. As we have demonstrated, this is a two-way street. It is absolutely certain that our knowledge and understanding of the Web 2.0 world is already adding unique value to the world of libraries.

So to answer the question at the start of this posting…..

If you are in the library community and want to keep abreast of technology advancements – read Panlibus. If you are in the wider web community and are interested in what we are doing, and have to say about, applying these technologies as a Platform in real world situations – read Nodalities. I suspect most people, although with concentration on one, will find postings of interest in both Panlibus and Nodalities.


Get somebody else to do it!

37,000 feet above what I think should be the Sahara desert (not that I can tell as it is pitch black outside the window of this South African Airways 747) in a mini power cut.

How smug did I feel, after listening to Paul Miller’s complaints in his Access 2006 presentation (podcast here) that he had no seat-back screen on his flight to Canada, to find just the thing in my seat on this flight to Johannesburg heading towards the upcoming Stellenbosch Symposium. My smugness bubble was soon burst upon discovering that I was in the middle of a block of twelve seats with power failure – no reading light, no music, no personal entertainment system! ;-{ So me, and the group of ten Belgian tourists I seem to have ended up in the middle of, have had to resort to that traditional participative pastime of conversation – there are some traditions that are worth maintaining.

There are some things, though, that benefit from technological advances. From my earlier postings you would quite rightly get the impression that I think some of the things Amazon are doing with their utility web services (S3, SQS, EC2, MT) are pretty damn cool. I already personally use a nifty tool called JungleDisk to back up the 4GB of data on my home PC (when do they get the time to listen to all that music, and will they ever stop storing their MP3s in with their documents and spreadsheets?) in the Amazon Simple Storage Service (S3) for less than $2 per month.

S3 came to the rescue on another front. Because I like using images to liven up my presentations, the PowerPoint file for my keynote in Stellenbosch runs to a whole 22MB. Getting something that size to a couple of people in advance is not the easiest of tasks, as it would give many of the most accommodating email systems indigestion. Whilst scratching my head about this problem, I suddenly had one of those well durrr moments that we all get from time to time. Upload the file to S3, make it publicly visible, and let Amazon and the recipients’ web browsers do the work for me – simple. So, with the aid of another bit of nifty software I can recommend – John Spurlock’s NS3 – that’s exactly what I did. Another knock-on benefit that didn’t initially occur to me is the peace of mind that if I lose the memory stick in my pocket, and the backup CD goes missing with my luggage, at the same time as my laptop has a nervous breakdown, all I need is access to a browser and I can get my presentation online in a few minutes.
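If you would rather script the upload than use NS3, a sketch with Amazon’s Python SDK, boto3, looks something like this. The bucket and file names are hypothetical, and AWS credentials are assumed to be configured in your environment:

```python
# A minimal sketch of the same trick with Amazon's boto3 SDK:
# upload a large file to S3 and make it publicly readable.
# Bucket and key names are hypothetical; AWS credentials are
# assumed to be configured in your environment.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "stellenbosch-keynote.ppt",        # local file (the 22MB deck)
    "my-presentations-bucket",         # hypothetical bucket
    "stellenbosch-keynote.ppt",
    ExtraArgs={"ACL": "public-read"},  # anyone with the URL can fetch it
)

url = "https://my-presentations-bucket.s3.amazonaws.com/stellenbosch-keynote.ppt"
print("Share this link:", url)
```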

I don’t think I’m alone in having a recent well durrr moment. I think the technical team behind Second Life had one too:

The client you download may just seem like a 5-minute nuisance to you. Magnified ten thousand times, it becomes a severe issue for our webservers on days when we release a new version – tens of thousands of people all rushing to download them at the same time. An average of 30 MB per download, multiplied by however many folks who want to login to this Second Life thing, comes out to a lot of bits

Rather than continue to pile on webservers just for this purpose, which has somewhat diminishing returns, we have elected to move the client download over to Amazon’s S3 service, which is basically a big file server.

How many teams behind academic/library projects, startups, etc., must there be out there worrying about sizing their servers, backing their data up, and guesstimating the bandwidth required if they become popular? If I was in their position I would be seriously considering offloading the job to someone else for a few dollars a month on my credit card. Ah, no credit card – that is a massive obstacle for many an institution!

This is starting to sound like a sales pitch for Amazon; it’s not intended to be (but if Jeff is listening – remember your friends at Christmas time). If you want raw compute power, or storage and distribution of files – heavy lifting done for you – you could do yourself a favour and take a look at what Amazon are doing.

But what if you need more specialized heavy lifting? What about the storage, indexing, and searching of bibliographic data? What about augmenting such data with book-jacket images; links to disparate but related information such as articles in Wikipedia, reviews, etc.; library holdings records; links into those libraries’ OPACs? All doable individually by many a project team, but can you do all of it without compromising your promise to deliver it yesterday with a cool new user interface? And without having to create yet another updated version of the last application you built, from scratch?

The Talis Platform – or more specifically its component services Silkworm (an open directory for Collections, Locations, OPAC deep-link definitions, Collection Groupings, and potentially much, much more), Bigfoot (highly scalable large data stores, designed to hold, index, search, and augment generic data), and Symphony (possibly a new one for you Talis project name spotters out there – it orchestrates the interaction between other platform services) – is getting ready to saddle up and deliver a few well durrr moments in our world.

I say getting ready, as we are still putting a few things in place, like expanding the API documentation in TDN to cover the Bigfoot APIs (mind you, based on the play-with-it-and-discover-how-to-use-it-yourself approach that I blogged about recently, it’s questionable how much documentation you need), but as demonstrated by Project Cenote there is plenty there already.

Like it or hate it, the Cenote interface is very different in its look. It is also very different in its construction – it’s all UI and no application. By that I mean, all the Cenote team had to worry about was capturing user input and displaying bibliographic results in a stunning interface. How the data behind it was collected, stored, indexed, and searched was never a concern for them – they got somebody else to do that. The Platform is doing all the heavy lifting for them. It can, and will, do it for others too – well durrr.

Want to know a bit more? – Just ask, either here or in the TDN


A cloud of clouds

Let me start with a question – what is the collective noun for clouds? In trying to dream up a catchy title for this post, which you will discover once I’ve stopped waffling is about Word Clouds, I tried to discover from colleagues and places like answers.com what you call a collection of clouds. Answers received so far: a host, a storm, a front, and the one I chose – a cloud. I’m sure someone out there will be able to put me right on this, I’ll be monitoring the comments with interest.

Anyway, why am I so interested in [word] clouds all of a sudden? Well, it is not all of a sudden – I’ve been interested in word/tag clouds for a while, as a device for serendipitous browsing through a set of metadata based upon the popularity of words within, or tags associated with, information.

Flickr, Technorati, and LibraryThing are all well-known examples of the use of these clouds in a user interface. More examples are appearing almost daily.

The thing that triggered me to write this post was the appearance of a word cloud on the site for the BBC’s radio station Radio 1. Scroll down to the bottom of the page and you should see a display of the most popular words contained in SMS text messages sent to the station. This is refreshed every couple of minutes or so, giving an insight into what the station’s audience is thinking about. With the station often receiving in excess of 1,000 messages per hour, the theme behind the words displayed is an aggregate of a fair amount of input. The tool that displays this also checks for well-known words, like the name of a group or DJ, and makes them a clickable link to more information.
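The mechanics behind such a cloud are pleasingly simple: count word frequencies, then scale each count into a font size. Here is a minimal sketch in Python, with a few made-up messages standing in for the real SMS feed:

```python
# A minimal sketch of the mechanics behind a word cloud: count word
# frequencies, then scale each count into a font size. The messages
# below are made up, standing in for the real SMS feed.
import re
from collections import Counter

messages = [
    "loving this track", "play that track again please",
    "track of the week", "morning show is great",
]

words = re.findall(r"[a-z']+", " ".join(messages).lower())
counts = Counter(w for w in words if len(w) > 3)  # crude short-word filter

min_pt, max_pt = 10, 32                 # font-size range for the cloud
lo, hi = min(counts.values()), max(counts.values())

for word, n in counts.most_common():
    # linear scale from occurrence count to point size
    size = min_pt if hi == lo else min_pt + (n - lo) * (max_pt - min_pt) // (hi - lo)
    print(f"{word}: {n} -> {size}pt")
```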

The thing that struck me about this implementation is that the BBC just put it there with no explanation or hints, expecting that their online audience will understand that words in larger fonts are more popular than others in smaller fonts and the ones in blue are clickable. Not that many months ago I remember having to explain those concepts to those seeing Flickr and del.icio.us tag clouds for the first time.

The Web 2.0/Library 2.0 world is one where new user interface metaphors appear and become accepted very rapidly. I am still aware, though, of some libraries who shy away from making changes to their OPACs until ‘there has been training‘. All I can say to such organizations is that I think you will find your online audience is more astute and open to change than you think. By all means offer some ‘How to get the most from the new features’ sessions, but if you have to train in the basics you have probably got your interface wrong.

Another thing that made me think about word clouds today was a comment somebody made in a telephone conversation about the Aquabrowser OnLine trials by libraries, such as Islington Libraries, who have contributed to the Talis Platform, which I posted about the other day. The comment, passed on from a further education college, was that the word cloud in the Aquabrowser OnLine interface could be of great help to those with dyslexia in identifying different spellings, etc. Another good example of how offering access to data using new and innovative user interface metaphors, in addition to the traditional ones, can have unexpected beneficial consequences.


Is there a place for P2P?

David Bigwood was thinking out loud the other day in his Catalogablog posting P2P OPACs:

Here’s an idea, not even half-baked, how about peer-to-peer (P2P) networks of OPACs? Only available items would display. I’d get to pick the institutions I’d have display and whether to display non-circulating items. Something like Limewire.

Having struggled with the effects of teenage family members installing Limewire and its predecessors on the home PC, and with how we scale the traditional search of a single library’s collection up to a reliable performant query of information within overlapping ad hoc groups of library collections, I have also wondered if the P2P (peer-to-peer) technologies underpinning the former could be helpful with the latter.

David’s thought – using P2P and the music-sharing application Limewire as an example – when you deconstruct it, is attempting to address a few well-known problems in the library domain.

  • Identifying and locating library collections – how the collection is described, physically located, and accessed electronically are all concerns in this area, which resource directories, many of which have come and gone, have attempted to address. In the music-sharing P2P world, the major concern is getting a copy of the file, with little concern as to where it comes from.

    There are several current examples of these library directories around, often limited by project, type/size of library, geographic location, commercial constraints, etc. Then there is the Silkworm Directory in the Talis Platform – an open directory, wiki-like in philosophy, in which anyone can enter any library collection and then use an open API to query that information.

  • The grouping together of an ad hoc set of library collections to search within. These could be as organized as all the academic libraries within 50 miles of a city, or as random as a student’s university library, the local library near her dorm, and the library in her home town – totally logical to the student, random to everyone else.

    A little-known aspect of the Silkworm Directory – Paul Miller only mentioned it in his Access 2006 presentation (pdf) last week – is its ability to create ad hoc groups and then query by the members of those groups.

  • The consistent searching across many dissimilar collections. Anyone who has used or tried to pull together a federated search across many library catalogs, traditionally using Z39.50, will have horror tales of the way locally implemented indexing rules can make a mockery of search and results ranking.

    Now, if we could consistently index, search, and rank in a single store all the holdings of the collections we are interested in, as defined in a directory – providing it was scalable and performant – this problem would disappear. This is the approach successfully taken by the Googles of the world. It is also how the Bigfoot element of the Talis Platform operates (see my recent posting for a description of how Bigfoot APIs are driving the recently announced Project Cenote interface).

  • Filter the results of a search by the libraries in a group that have holdings. P2P, in the same way that Z39.50 federated search does, could help in this area by directly querying individual library collections. But I suspect it would suffer the same problem as current federated search: the overall response is only as fast as the slowest resource (see the sketch after this list). P2P addresses this with caching and by downloading from several places simultaneously, which are not really applicable where you are trying to get information from a specific collection.

    The Talis Platform’s holdings stores address these issues by storing holdings statements, aggregated across many collections and freely contributed by libraries, alongside bibliographic stores. This is done in such a way as to enable bibliographic results to be augmented with holdings information on the fly as results are returned from an API call.

  • Filter the results of a search by libraries that have items in stock. This final step is probably the most difficult to solve in a live situation, as any store can become out of date the moment a book is borrowed from a particular collection. P2P may well have valuable application in this area, be it filtering a results set of known holdings, or keeping stores up to date on a minute-by-minute basis.
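To make the ‘slowest resource’ point concrete, here is a small Python sketch simulating a live federated search across three peers, one of them slow, with a timeout to return partial results rather than wait. The peers and their delays are made up:

```python
# A sketch of why live federated search is only as fast as its slowest
# peer, and how a timeout trades completeness for responsiveness.
# The peers and delays are simulated, not real OPAC endpoints.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed

PEERS = {"uni-library": 0.2, "city-library": 0.5, "slow-branch": 5.0}

def query_peer(name: str, delay: float) -> str:
    """Stand-in for a Z39.50/HTTP round trip to one library's catalog."""
    time.sleep(delay)
    return f"results from {name}"

with ThreadPoolExecutor() as pool:
    futures = {pool.submit(query_peer, n, d): n for n, d in PEERS.items()}
    try:
        # Wait at most 1 second: the two fast peers answer, the slow
        # branch does not, and we show partial results instead of being
        # held hostage by the slowest resource.
        for done in as_completed(futures, timeout=1.0):
            print(futures[done], "->", done.result())
    except TimeoutError:
        pending = [n for f, n in futures.items() if not f.done()]
        print("gave up waiting for:", ", ".join(pending))
# Note: the executor still waits for the straggler thread on exit;
# a real implementation would need cancellable I/O as well.
```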

It remains to be seen how P2P could be used, but it should not be dismissed as only a technique used for [often illegal] music downloading.

David says his thought might be ‘half-baked’, but there are some useful ingredients in his recipe. How well some of them would scale in the wider library environment I’m not so sure, but a hybrid of P2P with some of the high-volume, scalable, performant, open data, open API aspects of the Talis Platform – now that may well have legs.


The beauty is in the API of the beholder

I published my posting about the announcement of Project Cenote while Paul Miller was up talking about it at Access 2006 in Ottawa. Whilst I was doing so I was monitoring the #code4lib IRC chat channel, which seemed to be totally populated by people in his audience. I could tell when Paul mentioned Cenote for the first time – the following comment appeared in the channel: “wow, a project named after a deep water-filled hole where humans were tossed as sacrifices…”

I could do a whole posting about the idiosyncrasies of Talis project naming, but you are safe – I’ll refrain for the moment. Still, there is a tenuous connection between a deep water-filled hole and the distinctive application that is Project Cenote – ‘hidden depths’. (I said it was tenuous!)

Underpinning the sleek black Cenote UI is a set of new, powerful Talis Platform APIs, joining those already driving things such as Talis Whisper, LibraryThingThing, and Herefordshire’s LibMap. These APIs are so new that the documentation for them is not yet published in TDN.

So pin your eyelids back here comes a pre-documentation sneak preview.

Anyone who has played with APIs before is probably sceptically wondering how I can sensibly talk about an API without the documentation. Well, these APIs were designed and written with ease of discovery in mind. Like all APIs, you need a base URL to start from. The URL for the API to search UK bibliographic items is http://api.talis.com/bf/stores/ukbib/items. Also like most APIs, you need to add some parameters to get the call to work for you, but where these Platform APIs differ is in what they do when you don’t supply such parameters – no ‘page not found‘, 404, or other unhelpful html error. What you get is a helpful html page giving you direct access to the API – go on, click the link and see. Once there, type in a query and click search.

You should have ended up with a page that looks like this – yes, I know it looks like XML gobbledygook, but if you scroll down a bit you will see the bibliographic results nicely wrapped, waiting for an application to pick them out.

The default page you are presented with has a single query prompt; type in a search, click search, and you will be presented with two things: firstly, the XML/RDF-formatted results, and secondly, in your browser address bar, the API call that returned them. For the ukbib store you can enter keywords, or terms prefixed by a search type (e.g. ‘title:war and peace’, ‘author:rowling’, ‘subject:history’, etc.). There are other stores: wikipedia, containing Wikipedia article abstracts; holdings, containing holdings details for libraries which have contributed to the Platform (currently ISBN is the only search query for holdings); and cnimages, for book jacket images (again, ISBN is the currently supported search).
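If you would rather drive the API from code than from a browser, a minimal sketch in Python looks like this. The base URL and the query parameter are the ones described above; nothing else is assumed:

```python
# A minimal sketch of calling the ukbib item search API described above.
# The base URL and the 'query' parameter come straight from this post.
import urllib.parse
import urllib.request

BASE = "http://api.talis.com/bf/stores/ukbib/items"

def search(query: str) -> bytes:
    """Return the raw XML/RDF result set for a keyword or prefixed query."""
    params = urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(f"{BASE}?{params}") as resp:
        return resp.read()

print(search("pirates")[:300])              # plain keywords
print(search("title:war and peace")[:300])  # prefixed search type
```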

Pretty cool, but that’s only the half of it.

With applications like Cenote you want to add value to the bib results with information such as book jackets, holdings information, etc. Yes, you could call the Wikipedia abstract store API with the id for each item, but that would be a bit long-winded. Click on this link. You should be looking at the default page for the augmentation service for the Wikipedia abstracts store. Copy this URL http://api.talis.com/bf/stores/ukbib/items?query=pirates into the prompt, click ‘Augment’, and see what you get. A squint at the returned XML should reveal that the bib results now have Wikipedia abstract data included with them. The same effect can be obtained from the augment service of the book jacket images and holdings stores – now that is impressive.
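And the augmentation step, scripted. A word of caution: the items URL below is the one from this post, but the augment service’s path and its parameter name are my assumptions here, so check the TDN documentation when it lands:

```python
# A sketch of the augmentation call described above: hand the augment
# service the URL of a result set, and get the same results back with
# extra data merged in. The augment endpoint path and parameter name
# are assumptions; only the items URL comes from the post itself.
import urllib.parse
import urllib.request

ITEMS_URL = "http://api.talis.com/bf/stores/ukbib/items?query=pirates"
AUGMENT = "http://api.talis.com/bf/stores/wikipedia/services/augment"  # assumed path

params = urllib.parse.urlencode({"data-uri": ITEMS_URL})  # assumed parameter name
with urllib.request.urlopen(f"{AUGMENT}?{params}") as resp:
    augmented_xml = resp.read()

print(augmented_xml[:500])  # bib results with Wikipedia abstracts merged in
```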

Here are the results from augmenting bib results with library holdings information – very cool!

I know I work with the guys who are producing this stuff, but I can’t hold back from a hat tip in their direction. This is how APIs should be built – designed to be easily understood, and with the consumer in mind. You should be able to test out and see the results of what you want to do without having to write a single line of code.

I’m sure someone out there is thinking: how do you augment a set of results with data from more than one store? Well, that has been thought of, and the orchestration of such things is part of another Platform API set which is well on its way to being released. You’ll just have to be a little patient.

For the XML-averse among you this posting might have been a bit technical [sorry], but hopefully you will see that the people who produced Cenote only had to worry about how it looked and felt, leaving the heavy lifting of searching the data and augmenting it from other sources to the Talis Platform. And I think you will agree, only having to concentrate on the UI shows in the resultant application.

For the Talis Project name spotters reading this, you have probably identified that these APIs come from a Platform component called Bigfoot. Suffice to say the vision behind Bigfoot is:

“Bigfoot is a zero-setup, multi-tenant content and metadata storage facility capable of storing and querying across very large datasets.”

Anyway, I’m all API’d out now. I’m hoping to expand this into a TDN API user guide, so watch out for that. If in the meantime you want to know more, post a message in the TDN Talis Platform Forum (http://www.talis.com/tdn/forum/75) or drop me a line.