Collection Automation
KIParla-collection is automatically rebuilt whenever any module publishes a new GitHub Release.
This page describes how the automation works and how to maintain it.
Overview
The pipeline has two sides:
- Module repos (KIP, KIPasti, ParlaBO, ParlaTO): each has a workflow that fires when a release is published and notifies the collection.
- KIParla-collection: has a workflow that receives the notification, pulls fresh data from all modules, rebuilds the collection, and commits the result.
What the build does
When triggered, the KIParla-collection build workflow:
- Clones the latest main branch of each module repo.
- Copies all .tsv files into tsv/ and all .eaf files into eaf/.
- Regenerates linear-jefferson/ and linear-orthographic/ by running tsv2formats.py on every TSV file.
- Merges participants.tsv and conversations.tsv from all modules using tools/merge_metadata.py.
- Commits and pushes any changes back to KIParla-collection.
The build is idempotent: if no files changed, no commit is made.
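The copy step can be sketched in Python as follows. This is only an illustration, not the workflow's actual code: the function name collect_files is hypothetical, and the sketch assumes each module checkout keeps its transcripts in tsv/ and eaf/ folders, which the page does not state. Copying identical content twice leaves the working tree unchanged, which is what lets the build skip the commit when nothing changed.

```python
import shutil
from pathlib import Path


def collect_files(module_dirs, collection_dir):
    """Copy .tsv and .eaf files from module checkouts into tsv/ and eaf/.

    Assumption: each module stores its transcripts under <module>/tsv and
    <module>/eaf. Adjust the glob patterns if the real layout differs.
    """
    collection_dir = Path(collection_dir)
    copied = []
    for folder, pattern in (("tsv", "*.tsv"), ("eaf", "*.eaf")):
        target = collection_dir / folder
        target.mkdir(parents=True, exist_ok=True)
        for module_dir in module_dirs:
            # sorted() keeps the copy order deterministic across runs
            for source in sorted((Path(module_dir) / folder).glob(pattern)):
                destination = target / source.name
                shutil.copy2(source, destination)
                copied.append(destination)
    return copied
```

Calling collect_files(["KIP", "KIPasti", "ParlaBO", "ParlaTO"], ".") from the collection checkout would refresh both folders, assuming the modules were cloned into directories with those names.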
Metadata merging
Metadata is merged by tools/merge_metadata.py.
The script normalises column differences between modules before merging:
| Module | Participants columns | Note |
|---|---|---|
| KIP | | |
| KIPasti | | |
| ParlaBO | | |
| ParlaTO | | - |
The merged participants.tsv keeps these columns: code, occupation, gender, conversations, birth-region, age-range, study-level.
The merged conversations.tsv keeps these columns: code, type, duration, participants-number, participants, participants-relationship, moderator, topic, year, collection-point.
Rows with duplicate code values are deduplicated, keeping the first occurrence (module order: KIP → KIPasti → ParlaBO → ParlaTO).
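The merge rules above can be sketched as follows. This is a minimal illustration and not the contents of tools/merge_metadata.py: the helper names read_tsv and merge_rows are hypothetical, and the single entry in COLUMN_RENAMES (education to study-level) is an invented example of a rename, not a mapping taken from any module.

```python
import csv
import io

# Hypothetical rename map; the real COLUMN_RENAMES dict may look different.
COLUMN_RENAMES = {"education": "study-level"}

# Target columns of the merged participants.tsv, as listed above.
PARTICIPANT_COLUMNS = [
    "code", "occupation", "gender", "conversations",
    "birth-region", "age-range", "study-level",
]


def read_tsv(text):
    """Parse tab-separated text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge_rows(tables, columns, renames=COLUMN_RENAMES):
    """Merge tables in the order given, keeping the first row for each code."""
    merged = {}
    for rows in tables:
        for row in rows:
            # Normalise module-specific column names before selecting columns
            normalised = {renames.get(key, key): value for key, value in row.items()}
            code = normalised.get("code")
            if code and code not in merged:
                # Keep only the target columns; missing ones become empty
                merged[code] = {col: normalised.get(col) or "" for col in columns}
    return list(merged.values())
```

Passing the tables in module order (KIP, KIPasti, ParlaBO, ParlaTO) reproduces the "first occurrence wins" rule, because a code that has already been stored is never overwritten.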
Trigger mechanism
The module-to-collection notification uses a GitHub repository_dispatch event.
Each module repo holds a workflow at .github/workflows/notify-collection.yml that sends a module-released event to KIParla-collection when a release is published.
This requires a GitHub Personal Access Token (PAT) with Contents: Read and Write permission on KIParla-collection, stored as a repository secret named COLLECTION_TOKEN in each module repo.
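For reference, the request that such a notification amounts to can be sketched in Python. The module workflows themselves are YAML and may use a different client; this sketch only shows the shape of a call to GitHub's "create a repository dispatch event" endpoint. The function name build_dispatch_request, the owner value, and the client_payload contents are assumptions made for illustration.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, module):
    """Build (without sending) a repository_dispatch request for GitHub's REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = {
        "event_type": "module-released",
        # Hypothetical payload: the real workflow may send other fields or none
        "client_payload": {"module": module},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending the request with urllib.request.urlopen(request) should return HTTP 204 with an empty body when the token is accepted. An expired or under-privileged token makes the call fail with an HTTP error instead.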
To trigger a rebuild manually (without publishing a release), go to KIParla-collection → Actions → Build Collection → Run workflow.
Adding a new module
To include a new module in the collection build:
- Add a git clone line for the new repo in .github/workflows/build.yml inside KIParla-collection.
- Add the cloned path to the --modules argument of the merge_metadata.py call.
- If the new module has non-standard column names in its metadata, add a rename mapping to the COLUMN_RENAMES dict in tools/merge_metadata.py.
- Add the notify-collection.yml workflow to the new module repo and set its COLLECTION_TOKEN secret.
Troubleshooting
- Build failed during metadata merge: check that all module repos have a metadata/participants.tsv and metadata/conversations.tsv with a code column. If a module added or renamed a column, update COLUMN_RENAMES or the target column lists in merge_metadata.py.
- Build triggered but no changes committed: all files were already up to date. This is expected behaviour.
- Module workflow cannot dispatch to collection: the COLLECTION_TOKEN secret in the module repo may be expired or have insufficient permissions. Regenerate the PAT and update the secret.
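When a metadata merge fails, a quick local check of a module checkout can narrow down the cause. The following is a minimal sketch and not part of the repository's tooling: the function name check_module_metadata is hypothetical, and the sketch only checks that the two metadata files exist and that their header row contains a code column.

```python
import csv
from pathlib import Path

REQUIRED_FILES = ("participants.tsv", "conversations.tsv")


def check_module_metadata(module_dir):
    """Return a list of problems found in <module_dir>/metadata (empty if none)."""
    problems = []
    for name in REQUIRED_FILES:
        path = Path(module_dir) / "metadata" / name
        if not path.is_file():
            problems.append(f"{path}: file is missing")
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            # Only the header row is needed; an empty file yields an empty header
            header = next(csv.reader(handle, delimiter="\t"), [])
        if "code" not in header:
            problems.append(f"{path}: no 'code' column")
    return problems
```

Running it on each cloned module and printing the returned list shows which file needs attention before the build is re-run.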