Publish your first dataset
With this session I wanted to see whether, in the course of 40 minutes, we could decide on a first dataset that we could all publish when we got back to our respective libraries. In the near future the Libraries Taskforce and the LGA will be looking at proper schemas for public library data, which will be more complex than what we could tackle in this session. Nevertheless, I wanted to discuss with the other participants the kind of things one has to think about when starting to release data, and for us to create something together that we could all take back to our own libraries and simply start publishing as open datasets.
First we had to agree on what kind of data to publish. I thought visitor figures would be a straightforward one, but it turned out that one of the library services represented in the room does not collect those figures at the moment! So we agreed on issue figures, as all participants, regardless of which library management system they use, do collate numbers of loans from the service.
What do we mean by issue figures? We need to be clear on the definition to make sure we are all publishing the same thing. We decided that in this case issues are loans or renewals of physical items (therefore excluding e-books) – any items that appear in the library catalogue and can be borrowed directly via the library management system, be they books or bikes. (One of the library services in the room did lend out bikes via the LMS!)
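To make the definition concrete, here is a minimal Python sketch of how it might be applied to an export from a library management system. The field names (`transaction`, `item_format`) and their values are hypothetical; a real LMS export will use its own names, so treat this as an illustration of the rule rather than a recipe.

```python
# Hypothetical values: adjust these to match what your own LMS export uses.
COUNTED_TRANSACTIONS = {"loan", "renewal"}
EXCLUDED_FORMATS = {"ebook"}


def is_issue(record):
    """Return True if the record is a loan or renewal of a physical item."""
    return (
        record["transaction"] in COUNTED_TRANSACTIONS
        and record["item_format"] not in EXCLUDED_FORMATS
    )


# Made-up sample records, for illustration only.
sample = [
    {"transaction": "loan", "item_format": "book"},
    {"transaction": "renewal", "item_format": "bike"},
    {"transaction": "loan", "item_format": "ebook"},
    {"transaction": "return", "item_format": "book"},
]

# Count the records that meet the agreed definition.
issue_count = sum(is_issue(r) for r in sample)
print(issue_count)
```

In this sample the book loan and the bike renewal count as issues, while the e-book loan and the return do not.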
How are we going to present those issue figures? Should they be shown by library by month? Or organised by item type by borrower type? Or a mix of those? We thought they would be most helpful if presented by library (each library a row) and by month (each month a column). We can start the dataset at whatever date we want: either the previous month or further back in time if the data is easily available.
What type of document will our dataset be in? We agreed on a spreadsheet, saved as a CSV file.
Since we are going to use dates for the month, we also need to decide on a date format to use. We chose YYYYMM.
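The layout we agreed on (one row per library, one column per month, months written as YYYYMM, saved as CSV) could be produced along these lines. This is only a sketch in Python: the library names and dates are made up, and the "Library" column heading is my assumption, as the group did not settle on one.

```python
import csv
import io
from collections import Counter
from datetime import date


def month_key(d):
    """Format a date as YYYYMM, e.g. 201701."""
    return d.strftime("%Y%m")


def issues_table(issues):
    """Turn (library, date) pairs into one row per library, one column per month."""
    counts = Counter((library, month_key(d)) for library, d in issues)
    libraries = sorted({library for library, _ in counts})
    months = sorted({month for _, month in counts})
    header = ["Library"] + months  # "Library" heading is an assumption
    rows = [[lib] + [counts[(lib, m)] for m in months] for lib in libraries]
    return [header] + rows


def to_csv(table):
    """Serialise the table as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


# Made-up example data: one entry per issue.
issues = [
    ("Central Library", date(2017, 1, 5)),
    ("Central Library", date(2017, 1, 20)),
    ("Central Library", date(2017, 2, 3)),
    ("Branch Library", date(2017, 2, 14)),
]

csv_text = to_csv(issues_table(issues))
print(csv_text)
```

Note that a library with no recorded issues in a month gets a 0 here. If a gap really means the data is missing (for example, the branch was not yet on the system), it may be better to leave the cell empty and say so in the description you publish alongside the file.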
Next is the copyright licence. For it to be open data we need to place our dataset under an open licence, which allows anyone to re-use and build on the data for free, even for commercial purposes. Most local authorities already publish some data under an open licence using the Open Government Licence (version 3) so the group decided this would be the preferred licence.
And finally: where will we publish our issue figures open dataset? The obvious place is the library's web pages. Some local authorities may have signed up to open data repositories like the Data Mill North or The Data Place, which make it easier to upload and manage files. Another option is to create a (free) GitHub account for the library and upload it there (while linking to it from the library's web pages). When you publish your dataset, do explain what it is (e.g. what your definition of “issues” encompasses) and what can be done with it (the licence it's under).
And that's it! You could publish your very own first library dataset today. If you do get datasets published I'd love to hear about it; if you have any questions do contact me as well at aude[dot]charillon[at]newcastle[dot]gov[dot]uk
Photo of flipchart notes page1
Photo of flipchart notes page2