USMAI Library Consortium

This page is to coordinate testing of MARCIVE Authorities Processing.  Estimated time frame: July 28-August 7

For the test run, MARCIVE will process and return every 100th record in the full extract file that we send them. CLAS will reload the records in the Aleph Test database. Please use the reports linked below to identify processed records for review. The Aleph reports will have a "Reviewer" column where you can enter your name to "claim" the records you intend to review, to minimize duplication of effort. If you have a particular area of expertise, there may be a report limited to that type of record. 

See also: Metadata Subgroup's LTI Replacement page

Roster of Testing Volunteers

Audrey Chen (CP), Megan Del Baglivo (HS), Neil Frau-Cortes (CP), Kathy Glennan (CP), Beth Guay (CP), Sarah Hovde (CP), Deborah Li (UB), Maria Pinkas (HS), Aimee Plaisance (BC), Audrey Schadt (SU), Vicki Sipe (BC), Sam Taavoni (CP)

Testing Instructions (Email Copy)

Testing Scenarios

Items to review on each record

  1. Check to see that the corrected record has overlaid the older version in Aleph Test.
    1. Matching - was the appropriate record overlaid? Compare titles for the same system number in Live and Test.
    2. Are 950 AUTH EXTRACT/RELOAD fields with appropriate dates present?
    3. Are protected fields (CAT, XPT, XPX, FMT, OWN, 035, 950, 956) from the original record still present (and not duplicated)?
  2. Examine records to see if abbreviations have been correctly expanded in field 300 (“p.” to “pages”, “v.” to “volume”, “ill.” to “illustrations”, “facsims.” to “facsimiles”, “col.” to “color”, “ports.” to “portraits”, “b&w” to “black and white”, “sd.” to “sound”, “[i.e. ...]” to “[that is ...]”, “ca.” to “approximately”, etc.). Note that “[9] p. of plates” should change to “9 unnumbered pages of plates”.
  3. Examine records to see that the correct substitutions have been made, such as “33 p. of music” changing to “1 score (33 pages)”, and “2 sound discs” changing to “2 audio discs”. “1 close score” should change to “1 condensed score”, and “miniature score” should change to “study score”.
  4. Examine records to make sure the correct 33X fields have been added. MARCIVE should add both $a and $b, along with $2 for the source of the term. (A batch spot-check sketch covering this and item 2 follows this list.)
  5. Examine records to make sure the correct 34X fields have been added (340, 344, 345, 346, 347, 348).
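
A minimal batch spot-check sketch for items 2 and 4 above, assuming pymarc is installed and the returned sample has been saved locally as a MARC file (the file name below is a placeholder) with the Aleph system number in the 001. It is only meant to flag candidates for human review, not to replace it.

    # Spot-check sketch: leftover 300 abbreviations and incomplete 33X fields.
    # Assumes a local file "marcive_sample.mrc" (placeholder name) and pymarc.
    import re
    from pymarc import MARCReader

    # Abbreviations that should no longer appear in field 300 after conversion.
    LEFTOVER_300 = re.compile(r"\b(p\.|v\.|ill\.|facsims\.|col\.|ports\.|sd\.|ca\.)")

    with open("marcive_sample.mrc", "rb") as fh:
        for record in MARCReader(fh):
            if record is None:
                continue  # skip records pymarc could not parse
            f001 = record.get_fields("001")
            sysno = f001[0].data if f001 else "(no 001)"

            # Item 2: flag 300 fields that still contain common abbreviations.
            for f300 in record.get_fields("300"):
                text = " ".join(f300.get_subfields("a", "b", "c", "e"))
                if LEFTOVER_300.search(text):
                    print(f"{sysno}: possible unexpanded abbreviation in 300: {text}")

            # Item 4: every 336/337/338 should carry $a, $b, and $2.
            for f in record.get_fields("336", "337", "338"):
                if not (f.get_subfields("a") and f.get_subfields("b") and f.get_subfields("2")):
                    print(f"{sysno}: incomplete {f.tag}: {f}")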

Special categories to check

  1. Examine records with diacritics or special characters.
    • See Aleph BOOK-Non-English report
  2. Examine bibliographic records where cataloging practices have changed over time e.g.,
    1. obsolete fixed field codes, such as “N/A” - See Aleph obsolete-lang report
    2. new country codes for Guernsey, Isle of Man, Jersey, South Sudan (in fixed fields & 043) - Search in MARCIVE Preprocessing Changes Report
    3. separate subfielding of languages in 041 - See Aleph 041 report
  3. Examine records to see what opting for adding “reader notes” did: Accelerated Reader (added in 526), Lexile Framework for Reading (added in 521).
    • See Aleph juvenile report
  4. Examine records to see if the global delete of 6XX fields with 2nd indicator 4, 5, or 6 worked. (A batch check sketch follows this list.)
    • See Aleph 6xx_delete report
  5. Confirm that no unwarranted changes happened with the request to replace LCSH terms containing “Negro” with “African American” when inconclusive (e.g., no change to “Negro leagues”).
    • See Aleph negro report
  6. Confirm that, when a name with $c was matched to one without the same $c, the match was justified by a 400 (See From Reference) in the authority record.
    • See MARCIVE Authorities Processing Report
  7. Confirm that, when an unqualified name was matched with one containing $d, the match was justified by a 670 (Source Data Found) or a 672 (Title Related to the Entity) in the authority record.
    • See MARCIVE Authorities Processing Report
  8. See if any 490 0_ have been converted to 490 1_/8XX.
    • See MARCIVE Authorities Processing Report
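
A minimal sketch for check 4 in this list (the 6XX global delete), under the same assumptions as the sketch above (pymarc, a locally saved copy of the sample under a placeholder file name):

    # Confirm no 6XX fields with second indicator 4, 5, or 6 survived the delete.
    from pymarc import MARCReader

    SUBJECT_TAGS = ("600", "610", "611", "630", "648", "650", "651", "655")  # extend as needed

    with open("marcive_sample.mrc", "rb") as fh:
        for record in MARCReader(fh):
            if record is None:
                continue
            f001 = record.get_fields("001")
            sysno = f001[0].data if f001 else "(no 001)"
            for field in record.get_fields(*SUBJECT_TAGS):
                if field.indicators[1] in ("4", "5", "6"):
                    print(f"{sysno}: surviving {field.tag} with 2nd indicator {field.indicators[1]}: {field}")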

Issue Reporting

Please use the MARCIVE Reviewer Report Form to report any potential problems that you find while reviewing records.

MARCIVE Reports

These are some of the reports that MARCIVE provides along with their processing.

  • Preprocessing Changes Report 
    • Shows data modifications that occur prior to authorities processing.
    • Does not include changes to the leader, 040, or 9XX fields
    • Is in order by Aleph Bib System Number 
    • The << symbol marks fields belonging to the file of incoming records. The >> symbol marks fields belonging to the file of processed records.
  • Authorities Processing Report 
    • Changed authorized fields. This report shows fields that have changed during authorities processing.
    • Each section of changes, based on tag or type of change, starts with a label containing the phrase "begin here", which can be searched for to locate the next section. (A small sketch for splitting the report on these labels follows this list.)
    • Within sections, changes are in order by Aleph Bib System Number
    • Each group of text first shows the control number and record number associated with the field.
    • The + denotes the incoming bibliographic field. The - denotes the resulting bibliographic field.
  • Statistical Report
    • Not needed for testing, but shows the number of processed records and updated fields in the sample
  • All available MARCIVE reports (including Unrecognized or Invalid terms for each type of heading) are available at https://alephtest.umd.edu/reports/generic/?report=auth_reload&lib=mai01
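
For dividing the Authorities Processing Report among reviewers, here is a minimal sketch that splits a locally saved plain-text copy of the report on its "begin here" labels and shows how large each section is; the file name is a placeholder:

    # Split a saved Authorities Processing Report into its "begin here" sections.
    sections = {"(preamble)": []}   # label -> lines; holds anything before the first label
    current = "(preamble)"

    with open("authorities_processing_report.txt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if "begin here" in line.lower():
                current = line.strip()
                sections[current] = []
            sections[current].append(line)

    for label, body in sections.items():
        print(f"{label}: {len(body)} lines")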

Aleph Reports

These reports are subsets of the processed records, reflecting particular characteristics of the pre-processed Aleph record (formats, character sets, audience, etc.). 

  • They contain hyperlinks to the MARC tags view of the record in the Live and Test OPACs
  • Please "claim" the records you intend to review by entering your name in the "Reviewer Name" column.
  • Please indicate that you have completed review of the record by checking the box in the "Reviewed?" column.

Reports are linked below

60 Comments

  1. MARCIVE Authorities Processing Report, 650 fields, began alephsys002697522. Will work thru first 50 or so (alephsys003153346) in great detail esp looking at order of subdivision changes. Willing to do scanning thereafter. 

  2. MARCIVE Preprocessing Report: Taking a chunk of 25 in the middle (Sys #'s 004093640 thru 004096201). 

  3. MARCIVE Authorities Processing Report, 651 fields, alephsys002871136 to alephsys002021518 (should be 50 changes)

  4. MARCIVE Authorities Processing Report 1XX/6XX/7XX fields, alephsys003279292 to alephsys000908395

  5. MARCIVE Authorities Processing Report, x10 fields, alephsys001740698 to alephsys003488990 (50 changes)

  6. #3 Juvenile report  Had a few minutes, so looked at a random 10 records. Out of 10, only 2 had any notes. They looked as I might expect.

  7. #5 Negro leagues  Went in and found that Kathy had already been there.

  8. MARCIVE Preprocessing Report: Another 25 (#'s 3501991 thru 1444299)

  9. MARCIVE Authorities Processing Report, x11 fields, alephsys006042714 through alephsys000749959 (13 changes/all of them)

  10. MARCIVE Authorities Processing Report, 650 fields, 50 from the end, alephsys000416154 to alephsys003914106.

  11. Sigh; I reported an "error" about the absence of ending punctuation in field 300 and got it exactly backwards yesterday - so you can ignore that one in the error report list!

  12. In these last 2 days of testing, is there anything in particular that folks feel hasn't yet gotten enuf attention?


    1. Has anyone looked at the "Split Terms" and "Generated/Modified Genre/Form" in the Authorities Processing Report? I noticed a lot of MeSH in the split terms and wondered if Megan Del Baglivo and Maria Pinkas had looked at those.

      1. Come to think of it, I'm not sure why we even have "Generated/Modified Genre/Form" changes. We did not opt for Genre Term Generation in our profile (section 5.7).

        1. Oops, meant to click on Reply first!! I haven't, but can take a look now. 

  13. Kathy Glennan I was looking at the heading that you reported that was changed incorrectly, in spite of a matching authority record (n 88613908). I noticed that we did not receive that specific authority record; instead we got the more general one, n 81011106.  On the profile, we did not select 6.4 "Send authority records for any level in the bibliographic record for which LC has created an authority record." Should we have opted for that?

    1. Linda- Great question. I perhaps misinterpreted this profile question. I rather expected MARCIVE to deliver authority records that exactly match our access point, regardless of granularity. This was the case here; both the original work and the arrangement have NARs.

      Now, there are plenty of times that the arrangement isn't established but the original work is. Another situation is when a numbered part isn't established (see my report on ".$n" conversion problems), but the main work is. This affects music in particular, but I suspect there are similar cases for other types of resources.

      I guess we'd like the authority records in both of these cases, even if the more granular access point doesn't have its own NAR. If we do this, I assume we'd get both n 88613908 & n 81011106 because of the record in question. Would that present any problems? (For example, would that create a blank node?)

      1. Here's how the Guide explains it:

        With standard processing, you will receive one authority record for the fullest form. (Note that this is not actually what happened in the case I cited - Linda)
        <snip>

        Selecting the hierarchical authority records option increases the number of authority records you will receive. Here are some additional examples:
        Incoming bib record: 630 00 $a Bible. $l English. $s Authorized. $f 1849
        Matched to authority: 130 _0 $a Bible. $l English. $s Authorized. $f 1849
        Outputs authority records:
        130 _0 $a Bible. $l English. $s Authorized. $f 1849
        130 _0 $a Bible. $l English. $s Authorized
        130 _0 $a Bible. $l English
        130 _0 $a Bible

        Since we (sadly) don't have authorities in the OPAC, I don't think it would hurt anything to have all forms in the hierarchy.

        1. Agreed.

          I still wonder if they would have made this incorrect $n punctuation change anyway; they should have the NAR in their internal database, whether or not they send the record to us.

  14. Maria Pinkas and Meg del Baglivo: Authority Processing Report - Split Terms.

    1. We covered Split Terms: 001664895 to 001959539

  15. Beth Guay You will be happy to know that our 655 7 $aElectronic books.$2lcgft headings appear on the "Unrecognized or Invalid terms" report for genre terms.

    See 20200728.genr at https://alephtest.umd.edu/reports/generic/?report=auth_reload&lib=mai01 

  16. How common are MeSH coded 650s without a first indicator? Is this to be considered an error?

    1. Hi Vicki! No, it's not an error - and it's very common. In MeSH the first indicator code is for primary (1) or secondary (2) subject. I don't know about Maria, but I leave the 1st indicator blank (easy for me since I'm primarily a serials cataloger and our subject headings are necessarily broad!). 

      1. Good, thanks. Yes, understand about first indicator 1 or 2, but did wonder about no indicator. 


        1. Hi Vicki,

          I assume you mean no first indicator, since it is the second indicator 2 that identifies it as a MeSH heading. The first indicator blank, as Meg said, is common. It indicates that no statement is being made about whether the subject is primary or secondary. NLM tends to use both indicators, but many other institutions would rather not make that statement, including ours.

          Thanks,


          María

          1. In summary, it should not be considered an error.

  17. For cleanup purposes, I like the dupn report. This is much more informative than what we got from LTI.

    By the way, did we expect to be able to identify which USMAI institutions have holdings on the sys # that are in any of these reports?

    1. That's something I would like to offer; I'm still thinking about ways to do it. It would be a fairly simple matter to generate a separate report of who has holdings on system numbers mentioned in the report. It would be more complicated to incorporate that info into the MARCIVE report.

      1. If you're going to have to do something extra to provide this information, we'll have to figure out which reports would benefit from this extra analysis, and which aren't worth the effort.

  18. MARCIVE Preprocessing Changes Report, alephsys006158500 to alephsys006168370 (50 changes)

  19. I am so sorry, everyone. I thought I would have time to test this week but I just don't. It's all-reopening all the time now, and even my non-reopening-related meetings have increased ten-fold since I became interim. My last hope was for tomorrow afternoon, and now that time has been co-opted. My deepest apologies to my metadata peeps!

    1. Hey, Kat, hang in there!

  20. Hi Kat, 

    It is more than understandable that your priorities should change significantly since you became interim. Good luck with all of your planning.

  21. Kathy Glennan Maria Pinkas Neil Frau-Cortes Aimee Plaisance Audrey Schadt

    I wanted to see the extent of the problem with incorrect changes to the Cataloging Source and Place of Publication codes in the 008, so I made some reports:

    Cataloging Source Change (646 records)

    Publication Place Change (1134 records)

    I hope I've included enough info to judge whether the change was warranted and/or where the source of the problem is.

    1. On the cataloging source change, I finally found some information about the construction that uses a slash between two organizations in the 040 $a. From MARC 21 for Bibliographic Data (via Cataloger's Desktop):

      008/39 #  [national bibliographic agency]
      040 ## $a DLC/ICU $c ICU
      [LC non-MARC record upgraded and input via online input/update to LC by the University of Chicago.]


      So, it looks like these changes are correct. I reported the coding "error" based on what I was seeing with the master record in OCLC, which I usually find to be reliable.


      Sorry for the false alarm.

      1. That's OK. What about the ones where DLC comes after the slash?

        1. It's so hard to find any documentation about this. (I think it took about 1/2 for me to get this far.) From what I can tell from OCLC's BFAS (https://www.oclc.org/bibformats/en/0xx/040.html), the slash means the creation of the record is shared between two institutions. But, it's applied unevenly:

          If LC copy has Shared cataloging for DNAL (National Agricultural Library) in the lower left corner on LC copy, enter blank in Srce and AGL in field 040 subfield ǂa.

          For LC copy that has Shared cataloging with DNLM (National Library of Medicine) in the lower left corner on LC copy, enter blank in Srce and DNLM/DLC in field 040 subfield ǂa.

          If you transcribed the cataloging exactly, enter blank in MRec.

          Exact transcription, shared cataloging for DNAL:

          Desc: a    Srce: blank    MRec: blank
          040    AGL ǂb eng ǂc LDL
          [Original cataloging by AGL and entered by LDL as an English language record using AACR2]

          Exact transcription, shared cataloging for DNLM:

          Desc: a    Srce: blank    MRec: blank
          040    DNLM/DLC ǂb eng ǂc WAU
          [Original cataloging by DNLM and DLC and entered by WAU as an English language record using AACR2]



    2. Sources of trouble with the place of publication code seem to be when there are multiple places in one $a, and when there is more than one 260 field.

      1. The "wb^" change to "gw^" is fine throughout; that's an obsolete code change ("Berlin" to the preserved code for "Germany").

        The change from "rur" to "ru^" is fine throughout; that's an obsolete code change (USSR to Russia). There are similar appropriate changes for other countries formerly in the USSR (including "bwr" to "br").

        The change from "uk" to "xxk" is fine throughout; that's an obsolete code change (UK to UK!)

        The change from "ge^" to "gw^" is fine throughout; that's an obsolete code change ("East Germany" to the code that was redefined from "West Germany" to simply "Germany")

        The change from "at^" to "xna" is more specific (Australia vs. New South Wales); the one example I checked appears to be a correct change.

        Changes of "^^^", "xx^", and "|||" to places that match the first 260/264 $a are also fine.


        But...

        The change from "par" to "pa^" is incomprehensible – neither of these codes exist or have existed (https://www.loc.gov/marc/countries/countries_code.html)

        The change of "^ir" to "^i^" is strange. It's appropriate to get rid of the concept of the place of publication as Iran (although why does this start with a blank?). However the correct code for Israel is "is^"; there is no "i" - with or without spaces.

        The biggest category of inappropriate changes is not having the 008/15-17 not match the 260/264 $a (first occurrence), such as in sys #784507
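
        For what it's worth, here is a minimal sketch (assuming pymarc and a locally saved copy of the returned sample under a placeholder file name) that prints the 008/15-17 code next to the first 260/264 $a so these mismatches are easier to eyeball; whether the code actually fits the place is still a human call.

          # Print 008/15-17 alongside the first 260/264 $a for manual review.
          from pymarc import MARCReader

          with open("marcive_sample.mrc", "rb") as fh:
              for record in MARCReader(fh):
                  if record is None:
                      continue
                  f008 = record.get_fields("008")
                  if not f008:
                      continue
                  ctry = f008[0].data[15:18]

                  # First 260 or 264 that has a $a, in field order.
                  place = None
                  for field in record.get_fields("260", "264"):
                      subs = field.get_subfields("a")
                      if subs:
                          place = subs[0]
                          break

                  f001 = record.get_fields("001")
                  sysno = f001[0].data if f001 else "(no 001)"
                  print(f"{sysno}\t008/15-17={ctry!r}\tfirst 26X $a={place!r}")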


        1. I've filtered the list down to the bad changes (as far as I can see).

          • They definitely change it to "nyu" if "New York" is found anywhere within the first subfield $a. (There are a few with the DC street name!)
          • I'm not sure if field 257 takes precedence over 26X. There are a couple where they've changed it in favor of the place in 257.
          • There are a bunch where they've updated an obsolete non-specific code to a valid non-specific code, when it shouldn't have been too hard to choose the state or province.
          • And then there are the baffling invalid code to invalid code changes.
          1. I wouldn't consider the 257 at all when selecting what to encode in the 008/15-17.

            257 = Country of Producing Entity. (http://www.loc.gov/marc/bibliographic/bd257.html)

            The field definition & scope says (in part):

            Name or abbreviation of the name of the country(s), area(s), etc. where the principal offices of the producing entity(s) of a resource are located.

            Entity(s) in this instance is the production company(s) or individual that is named in the statement of responsibility (subfield $c) of field 245 (Title Statement).

            1. BF&S says only to code Ctry for 257 for archival moving images.

  22. Are these concerns based on issues reported using the Reviewer Report Form? Can we see those?

    Thanks, Vicki

    1. Here they are:

      2754262, 1654902, 1752565, 6021095, 6028548, 6028960, etc.: Change of 008/39 from "c" to blank, done inappropriately. None of these are strictly LC or national library sourced records.
      2860054, 2034786: 008/39 changed to blank but the records have 042 "lccopycat". This is totally inappropriate.
      1692114: 008 location changed to nyu, but this is the 2nd place of publication; the original coding was correct.
      1791404: 008 place of publication changed from obsolete "ur" (for USSR) to "xx", but the place of publication is Novosibirsk, which is now in Russia, so "ru" would be the correct change.
      6029169: 008 place of publication changed from "enk" to "nyu", but the primary place of publication is Oxford, England. NY is a secondary place of publication; the original record was correct.
      3556791: 008 place of publication changed from pau (current place) to nyu (former place); I consider this to be an incorrect change.
      1. OK, I take some of my "no problem" comment back. I also found the following in Cataloger's Desktop by doing a search on "lccopycat"

        Cat source (008/39)

        usually "c" when 042 = pcc

        usually "d" when 042 = lccopycat


        Thus, the 008/39 should not be changed from the existing code without consulting the 042; if one of those values is present there, the existing 008/39 should be left alone.
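
        A minimal sketch of that consultation, assuming pymarc and a locally saved copy of the returned sample under a placeholder file name; it just surfaces records where Srce (008/39) is blank but the 042 contains lccopycat:

          # Flag records with blank 008/39 (Srce) but 042 containing "lccopycat".
          from pymarc import MARCReader

          with open("marcive_sample.mrc", "rb") as fh:
              for record in MARCReader(fh):
                  if record is None:
                      continue
                  f042 = record.get_fields("042")
                  if not f042:
                      continue  # only records with an 042 matter for this check
                  codes = f042[0].get_subfields("a")
                  f008 = record.get_fields("008")
                  srce = f008[0].data[39] if f008 and len(f008[0].data) > 39 else "?"
                  if srce == " " and "lccopycat" in codes:
                      f001 = record.get_fields("001")
                      sysno = f001[0].data if f001 else "(no 001)"
                      print(f"{sysno}: Srce blank but 042 has lccopycat ({codes})")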

        1. I think some, if not all, of the changes of Srce to blank are legitimate. Excerpts from Bib Formats & Standards below (italics mine).

          https://www.oclc.org/bibformats/en/fixedfield/srce.html 

          National bibliographic agency. Use to indicate that the creator of the original cataloging data is a national bibliographic agency (e.g., U.S. Library of Congress or Library and Archives Canada). Formerly used only for Library of Congress cataloging records. Now used for records from other national bibliographic agencies as well.

          Cooperative cataloging program. Use to indicate that the creator of the cataloging data is a participant (other than a national bibliographic agency) in a cooperative cataloging program. See field 040 for guidelines on transcribing cataloging copy for 'old' LC cooperative cataloging projects.

          According to the 040 documentation, two codes with a slash between them in subfield $a indicates "shared cataloging," whatever that means.

          Going by the parenthetical "other than a national bibliographic agency" in the description of code "c", all of the cases where DLC or DNLM/DLC (or some other national agency) are in 040 $a and field 042 contains pcc, should be Srce blank. (I'm making the assumption that DNLM, DNAL, and DGPO are considered national agencies.)

          The less clear case is when a non-national agency shared cataloging with DLC (e.g. TXA/DLC) and the 042 contains pcc. There are no such examples in BF&S, but I think an argument could still be made for blank because of the presence of DLC in 040 $a.

          Likewise, I cannot find a stated relationship between 042 $alccopycat and Srce code in BF&S. If a national agency is in the 040 $a and there's an 042 lccopycat, I still think Srce would be blank based on the 040 $a.

          1. That's exactly right on the lccopycat; this is why the documentation I found yesterday said "normally". If the 042 contains this value, the record certainly didn't originate at LC, but it still could be from another national agency.

            I honestly don't know how important all of this is for records that already exist in our system. I don't think the incorrect coding in the previous version of these records caused any trouble. On the other hand, it's good to have correct data.

      2. Thanks for the detail.

        Vc


  23. What are our next steps? Do we need to come up with recommendations to Metadata Subgroup? Will subgroup generate recommendations? 

    Appears as tho we do have some things to take back to MARCIVE ...

    Thanks,

    Vicki

    1. I am working on a report. Any suggestions you have are welcome. 

      1. You probably have most of what I would suggest, but just in case.

        In 6XX fields (and I looked only at 650s), the change of $x to $v is in error most of the time I saw it. Submitted several reports. Looked thru our spec and thru the Guide and could not find where this change was explained. Can we opt out of this?  In one case a $v was changed inappropriately to a $x. 

        In subject strings where MARCIVE found a verification record in LCSH, it appears that they took that to mean that this was the ONLY valid string with those elements, and changed an otherwise valid string to a string that left out elements of the original. Or, in changing the string, the new string became a heading that does not reflect the subject content. Found some info in the Guide about how verification records would play out with authority records we get sent, but nothing about changes to be made in bib records. Can we opt out of this? 

        1. You're right. The validation records are playing a role in the $x to $v flips too. On one that Aimee Plaisance pointed out:

          000384098: changed "|x Periodicals" to "|v Periodicals" for 650 |a African Americans and 651 |a Africa, but not for 650 |a Black race

          We received validation authority records for |a African Americans |v Periodicals and |a Africa |v Periodicals, but none for |a Black race |v Periodicals

          1. Is it safe to assume that if we get an authority record, the bib records with the appropriate heading are changed accordingly? I could easily have missed this in the MARCIVE documentation.

            Vc

            1. According to section 6.4 of the Guide, "The default is for us to send you the fullest authority records that match terms in your database." They didn't mention that they were going to force the headings in our database to match any available authority record. (wink)

              1. Sigh. How do we make it stop! Hopefully this is not a "feature."

  24. I thought we discussed whether or not to keep the parentheses in 020 $q, but I do not see it. This is what comes up today when you log into OCLC (bold mine):


    Hello CATALOG!


    Welcome to the OCLC Connexion™ service.  You will be using the service in Full mode.


    Multiple ISBNs are acceptable on a single bibliographic record under both RDA and AACR2. For instance, hardcopy items may have additional ISBNs for the paperback, online, and CD-ROM versions printed on the item. RDA Library of Congress-Program for Cooperative Cataloging Policy Statements for 2.15 and Library of Congress Rule Interpretations for 1.8 allow the transcription of multiple ISBNs from the resource, with the ISBN for the manifestation being described as the first 020, if that applies. Any parenthetical qualifiers should be included in subfield $q. LC-PCC PS 2.15.1.7 further stipulates that ISBNs that "clearly represent a different manifestation from the resource being cataloged and would require a separate record" should be recorded in 020 subfield $z. See OCLC #921166149 for an example of a record with multiple ISBNs.


    Example viewed:

    020  ǂz 9780316352673 ǂq (ebook)
    020  ǂz 0316352675 ǂq (ebook)
    020  9780316352680 ǂq (paperback)
    020  0316352683 ǂq (paperback)


    I think we had said that with subfield $q the parentheses did not seem necessary, but according to this, it may be more appropriate to keep them.