Working with EAFs programmatically
git
To keep formatting-only changes out of the history, .eaf files are "normalized" before they are added to the index: they are re-formatted to a canonical XML form.
This is implemented in three steps:
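A Python script, “eaf-normalize.py”, normalizes EAF files. It reads XML as text from stdin, canonicalizes it, and writes the result to stdout. In the VIHI_LENA repo, I saved it into the repo root so that it is available to any clone. On Unix-like systems the script has to be executable (chmod +x eaf-normalize.py), because git invokes it directly.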
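#!/usr/bin/env python3
# Read XML from stdin and write its canonical (C14N 2.0) form to stdout.
# xml.etree.ElementTree.canonicalize requires Python 3.8 or newer.
import sys
import xml.etree.ElementTree as element_tree

xml_data = sys.stdin.read()
xml_data_canonicalized = element_tree.canonicalize(xml_data)
sys.stdout.write(xml_data_canonicalized)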
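To illustrate what the canonicalization step buys, here is a minimal sketch that feeds two differently formatted serializations of the same element through the same `canonicalize` call the script uses. The element and attribute values are made up for illustration and are not taken from a real EAF file.

```python
import xml.etree.ElementTree as element_tree

# Two hypothetical serializations of the same element. They differ only in
# the XML declaration, attribute order, quote style, spacing between
# attributes, and empty-element syntax.
variant_a = '<?xml version="1.0" encoding="UTF-8"?><TIER TIER_ID="x" LINGUISTIC_TYPE_REF="y"/>'
variant_b = "<TIER LINGUISTIC_TYPE_REF='y'   TIER_ID='x'></TIER>"

# Same call as in eaf-normalize.py, applied to each variant.
canonical_a = element_tree.canonicalize(variant_a)
canonical_b = element_tree.canonicalize(variant_b)

print(canonical_a)
print(canonical_a == canonical_b)
```

Both variants should come out as the same string, so git would see no difference between them. One caveat: by default `canonicalize` keeps whitespace in text content, including indentation between elements, so a file that was only re-indented can still differ after normalization. The `strip_text=True` option removes that whitespace; the script above does not use it.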
Define a clean filter named “eaf-normalize” in the repo-local config. Repo-local config is not copied by git clone, so this has to be run once in every clone. The command below will work only if the Python script is saved in the repository root. If you put it somewhere else, modify what is inside the double quotes so that it resolves to wherever you put the script.
git config --local filter.eaf-normalize.clean "$(git rev-parse --git-dir)/../eaf-normalize.py"
Add “eaf-normalize” as the filter for EAF files in “.gitattributes”.
*.eaf filter=eaf-normalize