Why am I learning to code? I get asked this a fair amount these days. Why would a librarian want to learn to code? What good is it for my job?
I alluded to it in my first post on the catcode phenomenon: “Catalogers work with massive amounts of curated bibliographic data, and being able to manipulate it in new and different ways and in ever increasing amounts is key as we move forward into the bibliographic future and the world of linked data and the semantic web.”
But that’s a pretty generic statement, and it doesn’t explain much. I usually end up giving some examples of how coding would be (or already is) used in my current day-to-day work:
1. Editing of record sets prior to loading – beyond MarcEdit
All of our batch loads of MARC records for various collections go through a basic editing process. We have a standardized set of edits that all of our records go through to normalize certain elements and insert local-specific notes about access restrictions and our licenses. These edits are all combined in a script to facilitate the process. All sets are quickly run through the script as a first step of the loading process. Currently, when something changes (like a change to MARC coding, or the need to add something, or a conditional for another format such as images), I have to ask our systems folks to edit the script; then I review it, we test it, and it’s put into regular use. If I could make the edits and test them myself, the process would be much more efficient.
Often I have very large sets and potentially complicated edits that need to be made. MarcEdit has a script editor that allows me to make a series of basic conditional edits, but for more complicated things I’m still limited to asking our systems staff to write a script.
For example, our records from ProQuest for the online version of our dissertations come in with subject codes that are proprietary to ProQuest and their indexes. For these to be useful for our users for searching, they need to be mapped to valid Library of Congress Subject Headings. I can build a cross-walk in a spreadsheet for the mapping, but I need a script to run the records through a process to actually match on the codes in the records in the file and insert the LCSH heading.
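The matching-and-inserting step can be sketched in a few lines of Python. This is a minimal illustration of the crosswalk logic only: the column names, codes, and headings below are made-up examples, and the records are simplified dicts rather than real MARC. Actual records would be read and written with a MARC library such as pymarc.

```python
import csv
import io

# Hypothetical crosswalk, as it might be exported from the spreadsheet:
# each row maps a proprietary ProQuest subject code to an LCSH heading.
# These codes and headings are illustrative, not ProQuest's real values.
CROSSWALK_CSV = """code,lcsh
0779,Library science
0723,Information science
"""

def load_crosswalk(fh):
    """Build a dict mapping ProQuest subject codes to LCSH headings."""
    return {row["code"]: row["lcsh"] for row in csv.DictReader(fh)}

def map_subjects(record, crosswalk):
    """Add an LCSH heading for every ProQuest code we can match.

    Codes with no entry in the crosswalk are simply skipped, so they
    can be reported and mapped later.
    """
    for code in record.get("proquest_codes", []):
        heading = crosswalk.get(code)
        if heading and heading not in record.setdefault("lcsh", []):
            record["lcsh"].append(heading)
    return record

crosswalk = load_crosswalk(io.StringIO(CROSSWALK_CSV))
record = {"title": "Example dissertation", "proquest_codes": ["0779", "9999"]}
map_subjects(record, crosswalk)
print(record["lcsh"])  # code 9999 has no mapping, so only one heading is added
```

The point is that once the crosswalk lives in a plain file, the script never needs to change when the mapping grows; only the spreadsheet does.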
2. Reports from our system (or from any large file of records)
The only mechanism for getting a report from our current ILS (Voyager) is via script. There’s no reporting interface. As with script editing, I have to pester systems folks to help me. And it can be a tedious process. I tell them what I need (being as clear and specific as possible within the bounds of how confusing MARC is and the myriad exceptions and variations that exist). They then take my request, turn the query into a script (interpreting things in the process), run it, and send it to me for review. If we’re lucky, we get it right the first time. More often than not, there are errors and exceptions, and quite a bit of back and forth before we get it right.
If I could gain some experience in coding, I could hopefully reduce the back and forth and write the exceptions and variations into my query request, essentially cutting out the middle man interpretation. Having a better understanding of what is possible to write into a query based on the data and how “clean” it is (or isn’t) is invaluable.
3. Batch editing of records already in the system
Just because something is already cataloged and loaded doesn’t mean you can ignore it. Records require maintenance. Rules change, headings change, tags change, new fields and subfields are added, and so on. MARC as a standard changes regularly. Names are updated in authority files; subjects are created, collapsed, divided. And we have to keep our data up to date so it’s actually useful to our users and so relationships are maintained. All of this updating requires editing very large batches of data. It starts with getting an accurate report of things that need updating, and then telling the system what needs to change. A big thing that would be useful to articulate more clearly is conditional edits: add this field, but only if these things are true. If your instructions to the system aren’t clear, you end up with a big mess that you have to undo and try again, which is time consuming and most likely also a problem for user access until you clean things up (or undo). Once you have a better understanding of how to write the instructions, the number of messes drops dramatically. And less mess is always a good thing when dealing with data that potentially impacts user access.
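A conditional edit of this kind can be sketched as a predicate plus a dry-run switch, so you can see what would change before touching anything. The records, field names, and note text below are all illustrative assumptions, not our actual data.

```python
def should_add_note(rec):
    """Add a local access note only to online resources that lack one.

    This is the 'only if these things are true' part, stated in one place.
    """
    return rec.get("online") and "local_note" not in rec

def apply_edit(records, dry_run=True):
    """Return the bib IDs that would be (or were) edited.

    With dry_run=True, nothing changes: you get a preview to check
    against expectations before committing to the batch.
    """
    touched = []
    for rec in records:
        if should_add_note(rec):
            touched.append(rec["bib_id"])
            if not dry_run:
                rec["local_note"] = "Access restricted to campus users."
    return touched

records = [
    {"bib_id": 10, "online": True},
    {"bib_id": 11, "online": False},
    {"bib_id": 12, "online": True, "local_note": "Already noted."},
]

preview = apply_edit(records, dry_run=True)
print(preview)  # → [10]  (only the online record missing the note)
apply_edit(records, dry_run=False)  # now make the edit for real
```

The dry run is the part that prevents the big mess: a wrong preview costs a minute, while a wrong batch edit costs an afternoon of undoing.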
4. Loading of records in bulk
There is a set of profiles used for loading records into our ILS. Distilled down, they are a series of scripts dictating the modification and creation of bibliographic, holdings, and item records. The specifics of each load determine which profile is most appropriate. I don’t have a good understanding of these profiles because my understanding of coding is still limited. I’m hoping learning to code will give me a better understanding, so I can better advise which profile is most appropriate.
A good example of why this would be useful: a few months ago we tried to load a set of records into our system. Neither my colleague in systems nor I was familiar with the profiles, so we picked the one we thought was correct. Well, given her unfamiliarity with MARC and library records, and mine with coding and with reading the profiles, we blundered and used the wrong one. We made a bit of a mess of things in our production catalog (hello, error message displayed to the public) and had to fix things after the load, once we figured out what we did wrong, of course. Our blunder created quite a bit of extra work for both of us.
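Distilled down, a load profile is essentially a declarative set of answers to a few questions: what do we match on, what happens on a match, and what records get created? The toy sketch below invents two profiles to show the shape of those decisions. The keys, profile names, and values are illustrative assumptions, not Voyager’s actual profile syntax.

```python
# Two invented profiles, reduced to the decisions that matter.
PROFILES = {
    "ebooks_add": {
        "match_on": "035a",       # match incoming records on this field
        "if_match": "replace",    # replace the existing bib on a match
        "if_no_match": "add",     # otherwise create a new bib
        "create_holdings": True,
        "create_items": False,    # online resources get no item records
    },
    "print_approval": {
        "match_on": "020a",
        "if_match": "skip",       # never overwrite existing print bibs
        "if_no_match": "add",
        "create_holdings": True,
        "create_items": True,
    },
}

def describe(profile_name):
    """Spell out a profile in plain language so both the cataloger and
    the systems person can sanity-check it before the load runs."""
    p = PROFILES[profile_name]
    return (f"{profile_name}: match on {p['match_on']}, "
            f"{p['if_match']} on match, {p['if_no_match']} otherwise; "
            f"holdings={p['create_holdings']}, items={p['create_items']}")

print(describe("ebooks_add"))
```

Had we been able to read our real profiles at even this level, the difference between “replace on match” and “skip on match” would have jumped out before the load, not after.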
5. System design / user interface design / etc.
We’re currently in the process of redesigning the public interface to our catalog (our OPAC). This means dealing with indexes and the underlying data in different ways. Having a good, solid understanding of the data means I can explain what data is useful and what our data can and cannot do, and write functional specifications for the use and manipulation of the data so the coders can go to work. Having a solid understanding of how systems talk to each other and where data lives helps immensely. Learning to code helps me do all of that.
I think an overarching theme of all the above examples is communication. To be able to explain things to coders, to understand their questions, to explain things to catalogers and non-catalogers, and to effectively communicate my needs to the systems themselves, I need to have a better understanding of how it all works. Learning to code helps with that. It’s like learning another language. If I can be even semi-fluent in code, things will be much clearer for everyone involved in the conversation. Even if I don’t become a full-fledged coder, the exposure I’m gaining from participating in Code Year, workshops on Python, project nights, and talks about various coding languages and systems is already proving invaluable.