DiplomaBot
229 graduation diplomas, two languages, one run.
// the problem
Every semester, A-CAT graduates around 230 kids across three program tracks and four cohort cycles, split between Jewish and Arab schools. Each kid gets a printed diploma: name in Hebrew or Arabic depending on their school, program text matching their track, the right principal's signature, the right date. Producing these by hand means someone typing 229 names into Word templates without typos, in two scripts, against a ceremony deadline. Commercial diploma software doesn't survive contact with right-to-left Arabic.
// what I built
A two-phase pipeline. Phase one, once a semester: convert the legacy .doc templates to modern .docx, preserving the borders, logos, and signatures the schools expect. Phase two, every run: Python reads the roster exported from Salesforce, fills every name and date into the documents in about thirty seconds, then drives Microsoft Word itself through JXA — JavaScript for Automation — to batch-export all 229 documents to PDF in one twelve-minute session, organized on the NAS by cycle, school, and edit-versus-print folders. Word does the rendering because Word is the only thing that renders Word documents correctly.
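The folder layout can be sketched as a small path builder. The source only says the output is organized by cycle, school, and edit-versus-print; the nesting order, the file extensions per folder, and every name below are assumptions for illustration, not the repo's actual layout.

```python
from pathlib import PurePosixPath

# Assumption: "edit" holds the filled .docx, "print" holds the exported PDF.
EXTENSIONS = {"edit": ".docx", "print": ".pdf"}


def diploma_path(root: str, cycle: str, school: str, kind: str, stem: str) -> PurePosixPath:
    """Build <root>/<cycle>/<school>/<edit|print>/<stem><ext> (hypothetical layout)."""
    if kind not in EXTENSIONS:
        raise ValueError(f"kind must be one of {sorted(EXTENSIONS)}, got {kind!r}")
    return PurePosixPath(root) / cycle / school / kind / (stem + EXTENSIONS[kind])
```

Keeping the layout in one function means phase two and any cleanup script agree on where a given student's files live.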
// how the RTL name handling works
Names are where this gets unforgiving: they're the one thing on the page a family will actually inspect. Each student's sector field decides the script; the run refuses to start if any student is missing one. Hebrew names containing a geresh (the apostrophe-like mark that turns ג into a "j" sound) triggered a Word PDF-export bug that silently dropped the surrounding characters, so the script normalizes them to apostrophes before any text is filled. Long names get measured: when a name passes 32 characters at 28pt it no longer fits the placeholder, so the font size scales down to max(14, 28 × 32 / length) points. That applied to 31 of the 229 students, and the size is written directly into the run properties in the document XML. Dates are replaced by matching the full date pattern atomically, because replacing substrings leaves orphaned fragments of the old date behind.
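A minimal sketch of the rules above, not the production code. Assumptions: the geresh in the roster is U+05F3, name length is counted in code points, roster rows are dicts with hypothetical "name" and "sector" keys, and dates look like DD.MM.YYYY or DD/MM/YYYY. The one outside fact used is that WordprocessingML stores font size (w:sz) in half-points.

```python
import re

GERESH = "\u05F3"  # HEBREW PUNCTUATION GERESH (assumed to be what the roster contains)
BASE_PT, MIN_PT, MAX_CHARS = 28, 14, 32
# Assumed date shape; the real templates may use a different pattern.
DATE_RE = re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{4}\b")


def check_sectors(roster):
    """Refuse to start if any student has no sector (hypothetical row shape)."""
    missing = [row.get("name", "?") for row in roster if not row.get("sector")]
    if missing:
        raise ValueError(f"{len(missing)} student(s) missing a sector: {missing}")


def normalize_geresh(name):
    """Swap every geresh for a plain apostrophe before the name is filled in."""
    return name.replace(GERESH, "'")


def font_size_pt(name):
    """28pt up to 32 characters, then max(14, 28 * 32 / length)."""
    length = len(name)
    if length <= MAX_CHARS:
        return BASE_PT
    return max(MIN_PT, BASE_PT * MAX_CHARS / length)


def half_points(size_pt):
    """Convert points to the half-point integer that w:sz expects."""
    return round(size_pt * 2)


def replace_date(text, new_date):
    """Replace each whole date match at once, never a piece of one."""
    return DATE_RE.sub(lambda _match: new_date, text)
```

For a 40-character name this gives 28 × 32 / 40 = 22.4pt, which half_points turns into 45. Matching the whole date matters because a substring swap such as replacing only the year would leave the old day and month in place.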
// what broke
About 30% of PDFs came out with stale content — the previous student's diploma under the current student's filename. Word's activeDocument lags when files live on an SMB share; it will happily export whatever document it thinks is active. The fix closes every open document on each iteration and verifies the active document's name against the expected filename before exporting. This was the most elusive bug in the project.
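The real fix lives in the JXA script; the control flow can be sketched in Python with the Word calls passed in as plain functions. All five callables are hypothetical stand-ins for whatever talks to Word, and the retry-with-delay loop is an assumption: the source only says every document is closed on each iteration and the active document's name is checked before exporting.

```python
import time


def export_verified(expected_name, close_all, open_doc, active_name, export_pdf,
                    retries=5, delay=0.5, sleep=time.sleep):
    """Export only when the active document really is the one we just opened."""
    for _attempt in range(retries):
        close_all()                 # start from a clean slate so nothing stale can be active
        open_doc(expected_name)
        if active_name() == expected_name:
            export_pdf(expected_name)
            return
        sleep(delay)                # give the SMB-backed document time to become active
    raise RuntimeError(f"active document never became {expected_name!r}")
```

Failing loudly after the last retry is the point of the design: a missing PDF gets noticed, while a PDF with the previous student's name on it may not.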
The NAS had opinions too: after heavy write runs, folders refused deletion with "Resource busy" — a macOS-SMB timing issue solved by renaming the folder and retrying in a loop. And automated PDF validation turned out to be impossible for Arabic: pypdf extracts Arabic as presentation-form characters that never match the source letters, producing 100% false positives. Final checks are visual — render to PNG, look with your eyes.
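One way to read "rename and retry" is sketched below: try to delete, and on "Resource busy" move the folder aside once, then keep retrying the delete. The ".deleting" suffix, the attempt count, the delay, and the function name are all assumptions; the source does not give the exact loop.

```python
import errno
import shutil
import time
from pathlib import Path

ASIDE_SUFFIX = ".deleting"  # hypothetical marker for folders awaiting removal


def remove_busy_folder(folder, attempts=10, delay=1.0,
                       rmtree=shutil.rmtree, sleep=time.sleep):
    """Delete a folder, renaming it aside and retrying while the share reports EBUSY."""
    target = Path(folder)
    for _attempt in range(attempts):
        try:
            rmtree(target)
            return
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                raise               # anything other than "Resource busy" is a real error
        if not target.name.endswith(ASIDE_SUFFIX):
            aside = target.with_name(target.name + ASIDE_SUFFIX)
            try:
                target.rename(aside)  # frees the original name for the next run
                target = aside
            except OSError:
                pass                # rename can be busy too; try again next round
        sleep(delay)
    raise OSError(errno.EBUSY, "still busy after retries", str(target))
```

Renaming first means the next run can recreate the original folder name even if the old contents take a few more seconds to disappear.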
// where it is now
In production. All 229 diplomas for the spring 2026 semester were printed by May 20 — 114 Hebrew, 115 Arabic, across all four cycles — and handed out at the ceremonies. The repo carries a next-semester checklist: refresh the roster from Salesforce, update the dates, run both phases. The production scars are documented next to the code that earned them.
// © eden_hadad · edenhadad.com