Working with EAFs programmatically

git

To avoid noisy diffs caused by formatting-only changes, .eaf files are "normalized" before they are added to the index: they are rewritten in a canonical XML form.

This is implemented in three parts:

  • A Python script, “eaf-normalize.py”, that normalizes EAF files. It reads the XML as text from stdin, canonicalizes it, and writes the result to stdout. In the VIHI_LENA repo, I saved it in the repo root so that it is available to every clone.

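    #!/usr/bin/env python3
    # Git clean filter: read an EAF (XML) file from stdin and write its
    # canonical (C14N 2.0) form to stdout. Requires Python 3.8+.
    import sys
    import xml.etree.ElementTree as element_tree
    
    xml_data = sys.stdin.read()
    # canonicalize() returns the result as a string when no output file is given.
    xml_data_canonicalized = element_tree.canonicalize(xml_data)
    sys.stdout.write(xml_data_canonicalized)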
  • A clean filter, “eaf-normalize”, defined in the repo-local config. The command below works only if the Python script is saved in the repository root. If you put it somewhere else, change the path inside the double quotes so that it resolves to wherever you put the script.

    git config --local filter.eaf-normalize.clean "$(git rev-parse --git-dir)/../eaf-normalize.py"
  • An entry in “.gitattributes” that sets “eaf-normalize” as the filter for EAF files.

    *.eaf filter=eaf-normalize
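
To see what the normalization in the script buys you, here is a minimal sketch that canonicalizes two differently formatted versions of the same element and checks that they come out identical. The TIER element and its attribute values are made up for illustration and are not taken from a real EAF file.

```python
import xml.etree.ElementTree as element_tree

# Two hypothetical serializations of the same element: different attribute
# order, different quote style, and self-closing vs. explicit end tag.
variant_a = '<TIER TIER_ID="CHI" LINGUISTIC_TYPE_REF="default"/>'
variant_b = "<TIER  LINGUISTIC_TYPE_REF='default' TIER_ID='CHI'></TIER>"

# Same call the filter script makes on the whole file.
canonical_a = element_tree.canonicalize(variant_a)
canonical_b = element_tree.canonicalize(variant_b)

print(canonical_a)
# <TIER LINGUISTIC_TYPE_REF="default" TIER_ID="CHI"></TIER>
print(canonical_a == canonical_b)
# True
```

Note that, with the default options used in the script, canonicalize() keeps whitespace inside text content as it is, so differences in indentation between elements would still show up in a diff.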
