Behind The Lines
Posted on 25th May 2011
Back last year I got a curious email from a fellow London.pm'er asking why I was releasing so many WWW-Scraper-ISBN distributions. The reason was quite simple, to make my life easier! Well okay, that's why I wrote the distributions, but I figured others might find them useful too.
In the UK the book trade is a bit odd, and I dare say the rest of the world suffers from this too. The publishers don't like to give too much information away about their books, and the central body for allocating ISBNs, Nielsen, don't always have all the necessary metadata available. The book trade uses MARC Records to transfer this metadata around, and unfortunately, while there is provision to include much of the metadata, it often isn't included. The obvious things such as the Author, Title and the ISBN itself are usually there, but some of the data relating to the physical attributes (pages, height, width and weight) rarely are.
Originally I wrote the Amazon, Pearson Education, O'Reilly Media and Yahoo! Books distributions to use within Labyrinth, particularly for the Birmingham Perl Mongers website, and our Book Reviews. The plugin mechanism allowed me, when I received a review, to enter the ISBN and prepopulate the metadata fields and links before adding the review itself. The four distributions saved a lot of time, but the initial releases were quite basic.
Jumping forward several years, now needing this extra metadata, I first expanded the original four distributions. However, not all of these online bookstores provided this extra metadata. Picking a variety of books I searched to see what metadata I could retrieve, and came across several sites around the world that included this information to varying degrees. Much of the basic information regarding an ISBN shouldn't change from country to country, so metadata retrieved from Australia or New Zealand is as valid as that from America or the UK. There are aspects that can differ, such as the cover illustration, but the majority of metadata returned should be applicable regardless of location.
There was some interesting discrepancies with the different units of weights and measures used across the sites too. While some stuck to a set of fixed units, others changed depending how big the values were, particularly for grammes and kilogrammes. I settled on grammes for weight and millimetres for height and width, seeing as metric was the most commonly used on the various sites.
It did cross my mind whether to include the prices in the metadata returned, but as prices often fluctuate frequently and are very location dependent, you are probably better to write this side of things yourself for your specific purpose, such as a comparision website. I also left out depth, as only a few sites regularly provided a value for it. I can always save it for a future release anyway.
Hopefully those that work in the book trade, who have been wishing that MARC Records were populated a little more fully than they are currently, can make use of these distributions to help fill in the gaps.
Comments
No Comments