THIVLVC: Elevating Latin Parsing with a Modern Twist

Parsing ancient languages like Latin poses unique challenges: the corpora are small, and verse in particular departs from the patterns that parsers learn from prose. THIVLVC, a two-stage system that combines example retrieval with language model prompting, shows that classical texts can nonetheless be parsed with impressive precision.

The THIVLVC Approach

THIVLVC, a participant in the EvaLatin 2026 Dependency Parsing task, combines retrieval and language models to tackle Latin parsing. In the first stage, it retrieves structurally similar entries from the CIRCSE treebank, selected by sentence length and POS n-gram similarity. In the second stage, these examples guide a language model as it refines the baseline parses generated by UDPipe.
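
The retrieval step can be sketched in a few lines of Python. This is a minimal illustration, not THIVLVC's actual code: the article does not say how the two signals are computed or combined, so the bigram size, the multiset Jaccard overlap, the `length_weight` parameter, and every function name here are assumptions. The toy treebank is invented and is not CIRCSE data.

```python
from collections import Counter


def pos_ngrams(tags, n=2):
    """Count the POS n-grams in a sequence of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def ngram_similarity(a, b, n=2):
    """Multiset Jaccard overlap of POS n-grams (an assumed measure)."""
    ca, cb = pos_ngrams(a, n), pos_ngrams(b, n)
    union = sum((ca | cb).values())
    if union == 0:
        return 0.0
    return sum((ca & cb).values()) / union


def length_similarity(a, b):
    """Ratio of the shorter sentence length to the longer one."""
    longest = max(len(a), len(b))
    return min(len(a), len(b)) / longest if longest else 0.0


def retrieve_examples(query_tags, treebank, k=2, n=2, length_weight=0.3):
    """Return the k treebank entries most similar to the query.

    `treebank` is a list of (sentence_id, pos_tags) pairs. The weighted
    sum of the two signals is hypothetical, chosen only for illustration.
    """
    scored = []
    for sent_id, tags in treebank:
        score = ((1 - length_weight) * ngram_similarity(query_tags, tags, n)
                 + length_weight * length_similarity(query_tags, tags))
        scored.append((score, sent_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(sent_id, round(score, 3)) for score, sent_id in scored[:k]]


# Toy data: three invented "treebank" sentences given as POS tag sequences.
toy_treebank = [
    ("a", ["NOUN", "VERB", "NOUN"]),
    ("b", ["ADJ", "NOUN", "VERB", "NOUN", "ADV", "PUNCT"]),
    ("c", ["ADP", "DET"]),
]
query = ["NOUN", "VERB", "NOUN"]
print(retrieve_examples(query, toy_treebank))
```

In a full pipeline, the retrieved sentences and their gold trees would be placed in the prompt alongside the UDPipe parse to be refined. How THIVLVC formats that prompt is not described here.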

This isn't just another parsing system. It's a convergence of AI and classical studies that challenges what we thought possible in the automated analysis of ancient languages. By putting both retrieved examples and the treebank's annotation guidelines in front of the model, THIVLVC produces more nuanced and accurate parses, and the effect is strongest on Latin poetry.

Performance Gains

The results are promising. On Latin poetry, such as the works of Seneca, THIVLVC improves the CLAS score by an impressive 17 points over the UDPipe baseline. This isn't merely a technical achievement; it signals a shift in how we approach language parsing. For prose, like the writings of Thomas Aquinas, the improvement is more modest at 1.5 points, but still noteworthy.
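
For readers unfamiliar with the metric, CLAS (content-word labeled attachment score) measures labeled attachment accuracy over content words only, ignoring function words and punctuation. The Python sketch below shows the idea under simplifying assumptions: gold and predicted parses share the same tokenization, relation subtypes are ignored, and the set of excluded relations follows the common CoNLL 2018 convention as best understood here. The exact EvaLatin scorer may differ, and the example sentence is invented.

```python
# Relations treated as non-content (function words and punctuation).
# This set is an assumption based on the CoNLL 2018 convention.
NON_CONTENT = {"aux", "case", "cc", "clf", "cop", "det", "mark", "punct"}


def base_label(deprel):
    """Strip a relation subtype, e.g. 'obl:arg' -> 'obl'."""
    return deprel.split(":")[0]


def is_content(deprel):
    return base_label(deprel) not in NON_CONTENT


def clas(gold, pred):
    """F1 over content words; each parse is a list of (head, deprel)."""
    if len(gold) != len(pred):
        raise ValueError("sketch assumes identical tokenization")
    gold_content = sum(is_content(rel) for _, rel in gold)
    pred_content = sum(is_content(rel) for _, rel in pred)
    correct = sum(
        1
        for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred)
        if is_content(g_rel)
        and g_head == p_head
        and base_label(g_rel) == base_label(p_rel)
    )
    if gold_content == 0 or pred_content == 0 or correct == 0:
        return 0.0
    precision = correct / pred_content
    recall = correct / gold_content
    return 2 * precision * recall / (precision + recall)


# Invented example: "puella rosam in horto videt ."
gold = [(5, "nsubj"), (5, "obj"), (4, "case"), (5, "obl"), (0, "root"), (5, "punct")]
# The prediction attaches "horto" to "rosam" as nmod instead of obl on the verb.
pred = [(5, "nsubj"), (5, "obj"), (4, "case"), (2, "nmod"), (0, "root"), (5, "punct")]
print(f"CLAS = {100 * clas(gold, pred):.1f}")
```

Here three of the four content words are attached and labeled correctly, so the score is 75. A 17-point gain on a scale like this means that far more content words receive the right head and relation.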

The system's retrieval-augmented generation (RAG) approach is particularly effective, which raises the question: how far can this be pushed in other ancient languages?

Beyond the Numbers

An intriguing aspect of THIVLVC is its impact on annotation consistency. In a double-blind error analysis of 300 discrepancies between THIVLVC and the gold standard, 53.3% of unanimous annotator decisions favor THIVLVC's output. This suggests not only the potential for improved parsing but also inconsistencies in the traditional treebank annotations themselves. In other words, some of what the score counts as THIVLVC's errors may in fact be errors in the gold standard.

In an era where machine learning is reshaping industries, THIVLVC's advancements in parsing bring a fresh perspective to classical studies. It's a reminder that the convergence of AI and traditional fields can lead to breakthroughs that were previously thought unattainable.
