Panlibus Blog

Archive for February, 2008

Code4Lib 2008, looking back over 3.5 days


In the air once again, not quite on the way home, but on the way to the Open Library developers meet in San Francisco. A first chance to see the bridge for me.

But I’ll talk more about Open Library after hearing what folks there have to say. For now I need to tell you all about Code4Lib 2008.

Where to start? With three and a half packed days of workshops, breakouts, prepared talks and lightning talks as well as the evening socialising there’s just too much to cover. The conference was so well organised in fact that longer standing members of the Code4Lib community wondered if they were at the right conference. Roy Tennant’s absence this year also left some folks confused and disoriented.

My time began with some folks from LibLime who ran a great pre-conference session, as did Equinox (who do a lot of the work on Evergreen); no I wasn’t in two places at once. I spent the morning listening to Josh, Galen and Henri talking about Koha and Richard spent the morning with the Equinox crew talking Evergreen. They’re both great systems and I have to say getting Koha up and running to do some dev work on it was oh so simple. Dan Chudnov also got a pre-un-conference together, reports from that were good, but I didn’t quite catch what was discussed there.

The conference proper started on Tuesday, the organisation mostly done by Jeremy Frumkin of Oregon State. His organisation was so good that when he had to leave unexpectedly on the morning of day 2 it was easy for Ed Corrado, Dan Scott, Dan Chudnov and Ross Singer to step in and hold the reins. Ross made a fantastic MC, introducing each speaker in true Oscars style.

But enough name-checking, what about the meat of the sessions?

I spoke this year, and I knew I had a tough act to follow when I saw that I was straight up after Brewster Kahle’s keynote. Brewster is disarmingly witty and unassuming. Known best for his role at Internet Archive, he talked about that a bit and about the Open Library project. Brewster used the occasion to announce that Talis are supporting the effort by donating the millions of records that form our 40-year-old union catalogue to Open Library. This gives Open Library a great boost of UK material, including records about older, rare and out-of-print stuff.

Brewster’s perspective on life seems to be that things “can’t be that hard” and that if you just make a start then you’ll see how far you can get. This approach has worked well for Internet Archive and appears to be working well for Open Library too. Perhaps we could all do with a bit more of that give-it-a-go approach in library land.

I followed Brewster with a piece about mining MARC data for relationships, a piece of R&D I’ve been working on for some time now. The Code4Lib channel occupants asked me to assume no prior knowledge of RDF, which we’ve been using to describe the relationships, and a load of people told me it was the first time they understood what RDF was about, so I’m chuffed with that. I’ll be presenting a very different side of that work at WWW2008 in April.

In the lightning talks, Andrew Bullen stunned us all with a beautiful piece, set to music, about the robber baron George Pullman, his carriages and the most wonderful music. He’s been scanning the sheet music of Pullman car classics, converting them to MIDI and releasing them in an archive of MIDI and MP3 files. Beautiful work that got him huge applause. There is a suggestion that his work should appear in a Code4Lib Journal article soon; that would make a nice sequel to his article on Historical Demographic Data in the town of Pullman.

Jodi Schneider, Ed Corrado and Jonathan Brinley talked us through the Code4Lib Journal, a fully-fledged journal (with its own ISSN, even!). The journal works through a small committee of editors and technical staff to produce a web-based journal once each quarter; they’re looking for new members for the team right now.

David Walker then covered the WorldCat API nicely in his talk about working with it to create a prototype local catalogue, very similar to what we’ve seen of WorldCat Local. It’s great to see OCLC letting their customers get at their data through this kind of API. The Grid services look like a great step forward for those who are members of the OCLC club.

Karen Coyle keynoted day two in her inimitable style, getting cheers, laughs and the occasional grumble of dissatisfied agreement from her audience. If there were ever an audience in almost total appreciation of Karen’s perspective, efforts and humour it would be Code4Lib. Taking us on a whistle-stop tour of RDA, RDF, the efforts of the RDA/DCMI folks and a sideways dig at Michael Gorman (best known in Code4Lib for Revenge of the Blog People!) she had us all hanging on her every word.

Karen’s talk included the introduction of FUQ lists (frequently unanswered questions), pronounced somewhat dangerously (see photo, right). You really have to watch the video (when it’s published) to do Karen’s keynote justice!

Corey Harper also did a piece on RDF/RDA toward the end of day 3; he did a great job at not repeating Karen (or me) and really got a lot of people further interested in why the DCAM work is really important.

Biblios, presented by Chris Catalfo, followed Karen and showed us all how good a web-based cataloguing interface really could be. I saw Biblios in its early stages, when it showed great promise. It now provides a slick, standalone cataloguing UI with plugins at the back to allow records to be stored and retrieved in different stores; that means it can be bolted onto any ILS. Very nice work Chris.

Emily Lynema and Terry Reese presented on the DLF ILS work to establish a common API spec across all ILSs, work we’ve been starting on here with Jangle. Ross did a great job of talking to various folks about how Jangle could fit in with the DLF efforts and Terry and Emily ran a fantastic breakout session for interested people to see how we can all drive forward with the DLF effort. This work follows on nicely from Emily’s Free The Data presentation from last year. Emily and Tito Sierra also did a slot on Talking with Talis last year.

Aaron Swartz did a great slot on the building of Open Library, including an explanation of ThingDB. This had a lot in common with Dan Scott’s talk on CouchDB. Both talks explained a lot about the benefits of extensible schema and both have a lot in common with RDF stores.

The chatter and banter in the Code4Lib IRC channel was, as always, insightful, witty and good-humoured. I’m always pleased to see how well the smaller Code4Lib channel community opens up to the larger group of conference attendees – while there were several in-jokes floating around for the three days, everyone was welcomed and everyone supported in presenting.

That’s what really makes Code4Lib a very different style of conference. Around two-thirds of attendees also stood up to present, whether a session, a breakout or a lightning talk, and that’s truly extraordinary.

To get a more emotional feel for the conference, you should check out the photos of code4lib 2008 on Flickr.

Thanks to the efforts of Noel Pedens, videos will appear here sometime soon.

Next year is in Providence, RI, hosted by Brown University – thank goodness it’s on the east coast.


Talis Shares Bibliographic Records With The OpenLibrary – will others follow?

Announced by Brewster Kahle in his opening keynote presentation at the Code4lib Conference in Portland, OR.

Talis is freely contributing its several-million-record bibliographic Union Catalogue to the Internet Archive’s Open Library Project.

The Talis Union has been built over many years by professional cataloguers in libraries all over the UK. This data set is a treasure trove of rare, old and out-of-print records as well as quality catalogue records for mainstream items.

The addition of this significant set of bibliographic data will add great value to this open initiative to provide one web page for every book ever published.

Following on from this announcement there was much discussion as to whether others, specifically OCLC, would follow this lead.

Interestingly, the word in the IRC channel and bar, from OCLC folks, is that there may well be an announcement about sharing something, from Dublin, Ohio, in the not too distant future…

Card catalogue photo from Queen’s University Library shared on Flickr.

The Semantic Web – Sir Tim Berners-Lee

Paul Miller over on our sister blog Nodalities has published a great podcast conversation with the inventor of the World Wide Web and now Director of the World Wide Web Consortium, Sir Tim Berners-Lee – Sir Tim Berners-Lee Talks about the Semantic Web.

This is a really good listen which I can highly recommend. It ranges from the Semantic Web’s readiness for mainstream adoption to Linked Data. If you prefer a read instead of a listen, you will find a link to Read a Transcript next to the player on the post.

For a fuller discussion of the references Sir Tim makes, I recommend Paul’s more expansive post over on ZDNet’s latest blog, The Semantic Web.

We all recognise that libraries and their evolution depend on Tim’s invention, the Web. What is not as well recognised is that their evolution is going to depend on the Semantic Web, as the data we create, curate, and share starts to take its rightful place in a global Web of Data.

It is no accident that initiatives like RDA, and their work with Dublin Core, use the Semantic Web data language RDF at the heart.

So as well as being a good listen, the contents of this podcast are very relevant to the future of library technology.

Photograph of Sir Tim Berners-Lee taken by my colleague Rob Styles, during Tim’s keynote presentation at the WWW2007 Conference in Banff, Canada. Used with permission.

Code4lib 2008 – A Wedding of Ideas

I knew we were in for a great conference when I saw that we were sharing the excellent conference hotel with the Association of Bridal Consultants.

Writing at the end of the first day, that first impression definitely seems to be on the money. The day opened with a keynote from Brewster Kahle of the Internet Archive, talking about the Archive in general and the Open Library in particular.

My notes from Brewster’s presentation included the following:

  • Texts – 26m books in LC – 26 TB – for the cost of a house $60,000 to host.
    Need a UI – “one Web Page for Every Book Ever Published”
    Started with the Million Book project – getting costs down to 10c per page
    Scan 1M pages a month out of one of their 9 centres
    15,000 books/month
  • Libraries should have a scan on demand button on their catalogues $30/book – same price as an ILL
  • Scan all Microfilm – Internet Archive loan a microfilm scanner for free if you can keep it running full time.
  • Selection – Need to build critical mass for books in each area – need help from libraries.
  • Build a Catalogue
  • Talis contributing 10 million records – Hooray for Talis!

Brewster appealed for help from the library community with time/code/digital materials/catalog records/selection help/labour to digitise microfilm/links on sites to

Working together we can build a great library!

Talis’ own Rob Styles followed Brewster with an excellent presentation on Finding Relationships in MARC. His really cool slides took us through extracting data from the attributes of a MARC record to enable things like authors and subjects to become ‘first class citizens’ in the data. This is the basis of the work Rob and his colleagues Nadeem Shabir & Danny Ayers have used to compose the paper Semantic Marc, MARC21 and The Semantic Web [pdf], which will be presented at the Linked Data on the Web workshop at WWW2008 in Beijing in April.
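As a rough illustration of what ‘first class citizens’ can mean here, the sketch below takes a toy MARC-like record and gives each author and subject its own identifier, rather than leaving them as strings buried in the book record. It is not Rob’s method or code: the record layout, the `slug` and `record_to_triples` helpers, the identifier scheme and the predicate names are all assumptions made up for this example. The only things borrowed from MARC are the usual tags 100 (personal name main entry), 245 (title) and 650 (topical subject).

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy record: MARC tag -> list of subfield dicts.
# Real MARC records carry indicators and many more subfields.
RECORD: Dict[str, object] = {
    "001": "rec1",
    "100": [{"a": "Austen, Jane"}],
    "245": [{"a": "Pride and prejudice"}],
    "650": [{"a": "Courtship"}, {"a": "Sisters"}],
}


def slug(text: str) -> str:
    """Turn a heading into a simple identifier fragment (assumed scheme)."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return "-".join(cleaned.split())


def record_to_triples(record: Dict[str, object]) -> List[Triple]:
    """Emit (subject, predicate, object) triples for one record.

    Authors and subjects get identifiers of their own, so two books by
    the same author end up pointing at the same thing.
    """
    book = f"book/{record['001']}"
    triples: List[Triple] = []
    for subfields in record.get("245", []):
        triples.append((book, "title", subfields["a"]))
    for subfields in record.get("100", []):
        author = f"author/{slug(subfields['a'])}"
        triples.append((book, "creator", author))
        triples.append((author, "name", subfields["a"]))
    for subfields in record.get("650", []):
        subject = f"subject/{slug(subfields['a'])}"
        triples.append((book, "subject", subject))
        triples.append((subject, "label", subfields["a"]))
    return triples


if __name__ == "__main__":
    for triple in record_to_triples(RECORD):
        print(triple)
```

A real conversion would use proper URIs and an RDF vocabulary, and would have to cope with variant forms of the same name; the point of the sketch is only the shift from strings inside a record to things that records can link to.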

Later in the morning we had Working with the WorldCat API from David Walker. Dave had been given early access to this SRU-based API and had produced a nice mashup between WorldCat searches and holdings information for his library. This talk spawned much traffic on the code4lib IRC channel about what this API didn’t give you that you would get from WorldCat Local. Roy Tennant, on IRC but not at the conference, responded with this list: article citations, faceted browsing, citation formatting, cover art, etc., plus other things you would need to build yourself (e.g., interoperation w/local systems, etc.)
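For anyone who hasn’t met SRU, a search is just an HTTP GET with a handful of standard parameters and a CQL query. The sketch below builds such a URL. It is not the WorldCat API itself: the base URL is a placeholder, the `build_sru_url` helper is made up for this example, and a real service may need an access key or extra parameters. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU specification.

```python
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",  # the standard SRU search operation
        "version": "1.1",               # SRU protocol version being requested
        "query": cql_query,             # the search, expressed in CQL
        "maximumRecords": str(max_records),
    }
    # urlencode takes care of escaping spaces, quotes and equals signs
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # https://example.org/sru is a placeholder, not a real endpoint
    print(build_sru_url("https://example.org/sru", 'dc.title = "semantic web"', 5))
```

The response comes back as XML, so a mashup like Dave’s would parse the returned records and then look up local holdings for each one.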

As the day has rolled on, it is clear that the world of library techno-geekery has moved on since code4lib 2007. Gone is the endless array of individually impressive, but collectively repetitive, ‘what we have done by putting our library catalogue into Solr’ presentations. In its place is a series of subjects that, on the surface, are unrelated except by libraries.

Sitting back at the end of the day I feel a theme coming on – by working with the many cool things that folks are doing it can be better for all – be it: libraries contributing records to OpenLibrary; or mining the MARC we already have to build Semantic; or sharing what we do in Code4lib Journal; or using WorldCat APIs to deliver data in your interface; or using Zotero to harvest your research materials and in the future share them with others; or even volunteering to help train libraries in developing countries in Open Source Library systems.

There may be individual differences in emphasis and approach between the conference attendees, but we are all wedded to the same idea of working together for the benefit of all.

I can’t wait to see what tomorrow brings….

Flickr photo of Rob Styles presenting at code4lib 2008 by Nicole Engard.

Library Platform News Issue 7 Published

February’s issue of the Talis Library Platform News newsletter is now online and ready to read.

The monthly touch point for libraries, software providers, suppliers and developers working together to solve the challenges faced by libraries in delivering the next generation of services for their customers.

This month’s online newsletter issue includes:

Open Source Library Systems
A review of the scene, Evergreen, Koha and NewGenLib, plus moves from Duke University, where will it lead?

Harvard University – Open Access Journals
The vote by the Faculty of Arts and Sciences to make scholarly articles publicly available – a portent of a trend?

Plus the regular articles – Meet the Team, and Meet the API.

To read this month’s issue click here.


On the way to code4lib 2008

Well, actually sat in Terminal C at Newark airport, whiling away several hours until the connecting flight to Portland, Oregon is ready to hurl itself into the New Jersey skies – Sunday afternoon in Newark airport, a great life this!

Anyway, it will all be worth it once the conference starts. I’m looking forward to the keynotes from Brewster Kahle, Karen Coyle and Talking with Talis interviewee Jon Udell. Brewster will be updating us on progress with the Open Library, which seems to be gaining wide interest lately. Karen will be giving an insight into the developments around RDA, and I’m not sure what Jon has on his agenda, but this adopted code4liber is always full of interesting points of view.

Two Talisians Rob Styles and Ross Singer are also on the program.  Having sat next to Rob on a plane from the UK whilst he was tweaking his presentation on Finding Relationships in Marc Data, I can testify to the fact that it is going to be well worth watching.

Ross is presenting ÖpënÜRL with Jonathan Rochkind about Ümlaut, the open source OpenURL middleware. There are too many other things on the program to mention here, but this looks like being a great conference again this year.

Photo of Newark Airport from Elmer Fishpaw displayed on Flickr.

Brad Lajeunesse Talks with Talis about Evergreen

This Talking with Talis podcast is with Brad Lajeunesse, President of Equinox Software. Equinox was founded by the software team that developed Evergreen, the open source integrated library system (ILS).

We talk about the origins and development of Evergreen, and the setting up of Equinox. We then go on to discuss some of the issues associated with Open Source Library Systems in general.


This conversation was conducted as a SkypeOut call on Wednesday 13th February 2008, recorded with Ecamm Network’s Call Recorder for Skype, and edited on a Mac with Garageband.


Candy Zemon Talks with Talis About NCIP

Joining me for this Talking with Talis podcast show is Candy Zemon. Candy, from Polaris Library Systems, is Chair of the NCIP Implementers Group.

In the show we discuss NCIP (NISO Circulation Interchange Protocol) which was first published in 2002. We discuss the standard, its evolution, why it has not been as broadly adopted as some have hoped, and its future in the Library System Environment.

We then move on to talk about the wider library standards process, how it is evolving and changing, and new initiatives such as the DLF ILS and Discovery Systems group.



Andy Powell is spot on

Former colleague Andy Powell is always good value, and this recent blog post about his trip to Melbourne is one small demonstration of why I will always listen to him.

It’s hard to nod vehemently in a blog post, and as it’s the school holidays in this part of the UK, screaming ‘Yes’ at the computer just results in children banging down the door to see if I’m ok…

So let me draw out three short snippets…

“…our current preoccupation with the building and filling of ‘repositories’ (particularly ‘institutional repositories’) rather than the act of surfacing scholarly material on the Web means that we are focusing on the means rather than the end”

“…our focus on the ‘institution’ as the home of repository services is not aligned with the social networks used by scholars, meaning that we will find it very difficult to build tools that are compelling to those people we want to use them”

“…that the ‘service oriented’ approaches that we have tended to adopt in standards like the OAI-PMH, SRW/SRU and OpenURL sit uncomfortably with the ‘resource oriented’ approach of the Web architecture and the Semantic Web”

…comment briefly…

Our current approach, fundamentally, is totally, completely, utterly wrong, isn’t it?

…and then send you off to read the whole thing. Off you go


Harvard Vote to Open Access Publish

As reported on, Harvard faculty members are scheduled to vote on Tuesday on the proposal to publish their scholarly articles free on the Web.

Although the outcome of Tuesday’s vote would apply only to Harvard’s arts and sciences faculty, the impact, given the university’s prestige, could be significant for the open-access movement, which seeks to make scientific and scholarly research available to as many people as possible at no cost.

“In place of a closed, privileged and costly system, it will help open up the world of learning to everyone who wants to learn,” said Robert Darnton, director of the university library. “It will be a first step toward freeing scholarship from the stranglehold of commercial publishers by making it freely available on our own university repository.”

Under the proposal Harvard would deposit finished papers in an open-access repository run by the library that would instantly make them available on the Internet. Authors would still retain their copyright and could publish anywhere they pleased — including at a high-priced journal, if the journal would have them.

What distinguishes this plan from current practice, said Stuart Shieber, a professor of computer science who is sponsoring the faculty motion, is that it would create an “opt-out” system: an article would be included unless the author specifically requested it not be.

If the vote is carried, and the opt-out system of Open Access publishing by default spreads across the rest of the faculties and to the wider world, this could be the beginning of a seismic shift in scholarly publishing.

Of course there are many steps between a single vote at one of the world’s most prestigious universities and the collapse of the scholarly publishing industry as we know it – but stranger things have happened in other industries.

Thinking down the same path brings one to wonder about the journal aggregators, knowledgebase providers, scholarly networks, and the whole patchwork of interdependence that has grown up around the way we publish, review, and provide content back to the institutions from whence it came for the benefit of those that follow.  Currently a heck of a lot of money changes hands to facilitate that loop. 
