Megrelian Language Corpus

ABOUT

Aims

The Megrelian Language Corpus (MLC) is a digital collection of texts, audio and video materials. It supports linguistic and anthropological research on Megrelian, one of the Kartvelian languages, by systematizing language data and enabling in-depth analysis of the cultural and social contexts reflected in the texts.

Main Objectives

Preservation of the Megrelian language through text documentation;
Creation of a morphologically annotated corpus based on linguistic materials collected during fieldwork in the Samegrelo region between 2022 and 2024;
Description of the contemporary linguistic and sociocultural situation of the Megrelian language;
Compilation of a formal grammar and lexicon(s) of the Megrelian language using FieldWorks Language Explorer (FLEx).

Methodology

The digital documentation and archiving process was carried out in two main stages: fieldwork conceptualization and data collection, followed by laboratory analysis and data processing.

Stage I: Fieldwork Conceptualization and Data Collection

The project began with the planning and implementation of field expeditions, which involved selecting routes and assessing risks; identifying target consultants from different age and gender groups to create a balanced corpus; recording materials of diverse themes and genres; and defining effective methodologies for working with consultants and collecting recordings.

Stage II: Laboratory Analysis and Data Processing

Following data collection, a structured process was implemented for analysis and storage. Linguistic transcriptions followed the International Phonetic Alphabet (IPA), while glossing adhered to the Leipzig Glossing Rules and the EUROTYP Guidelines. FieldWorks Language Explorer (FLEx), a software tool for linguistic fieldwork, was used for data annotation.
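As an illustration of the glossing conventions mentioned above, the following is a minimal sketch of an alignment check based on two requirements of the Leipzig Glossing Rules: the example line and the gloss line correspond word by word, and each word carries the same number of morpheme boundaries (hyphens) and clitic boundaries (equals signs) in both lines. The function name check_alignment and the English sample are hypothetical and are not taken from the corpus or from the project's FLEx workflow.

```python
def check_alignment(example: str, gloss: str) -> list[str]:
    """Return a list of alignment problems; an empty list means the lines align.

    Hypothetical helper: checks word-by-word correspondence and matching
    counts of morpheme ("-") and clitic ("=") boundaries in each word pair.
    """
    example_words = example.split()
    gloss_words = gloss.split()

    # The two lines must contain the same number of words.
    if len(example_words) != len(gloss_words):
        return [
            f"word count differs: {len(example_words)} vs {len(gloss_words)}"
        ]

    problems = []
    for index, (word, word_gloss) in enumerate(
        zip(example_words, gloss_words), start=1
    ):
        # Each boundary symbol must occur equally often in word and gloss.
        for boundary in ("-", "="):
            if word.count(boundary) != word_gloss.count(boundary):
                problems.append(
                    f"word {index}: count of {boundary!r} differs "
                    f"in {word!r} vs {word_gloss!r}"
                )
    return problems


if __name__ == "__main__":
    # Illustrative English sample, not corpus data.
    print(check_alignment("dog-s bark-ed", "dog-PL bark-PST"))
    print(check_alignment("dog-s bark-ed", "dog-PL bark"))
```

A check of this kind only verifies the formal alignment of the two lines; it says nothing about whether the gloss labels themselves are correct.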

The finalized data were made available online through The Megrelian Language Corpus (MLC).

Text & Word Statistics (based on selected texts)

Total number of unique words (types):	60881
Megrelian:	30526
Megrelian (International Phonetic Alphabet):	30350
Total word count (tokens):	97702
Megrelian:	97702
Megrelian (International Phonetic Alphabet):	97581
Total number of sentences (segments):	9255
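The figures above can be combined into two common summary measures. The sketch below computes a type-token ratio and a mean segment length from the reported counts for the Megrelian orthographic tier. It assumes that the segment count refers to the same set of texts as the token count; the function names are hypothetical and the derived values are not official project statistics.

```python
def type_token_ratio(types: int, tokens: int) -> float:
    """Ratio of unique word forms (types) to running words (tokens)."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return types / tokens


def mean_segment_length(tokens: int, segments: int) -> float:
    """Average number of tokens per sentence (segment)."""
    if segments <= 0:
        raise ValueError("segments must be positive")
    return tokens / segments


# Counts copied from the statistics listed above (Megrelian tier).
MEGRELIAN_TYPES = 30526
MEGRELIAN_TOKENS = 97702
SEGMENTS = 9255  # assumption: covers the same texts as the token count

if __name__ == "__main__":
    ttr = type_token_ratio(MEGRELIAN_TYPES, MEGRELIAN_TOKENS)
    mean_len = mean_segment_length(MEGRELIAN_TOKENS, SEGMENTS)
    print(f"type-token ratio: {ttr:.3f}")
    print(f"mean tokens per segment: {mean_len:.2f}")
```

With the counts as listed, this gives a type-token ratio of roughly 0.31 and about 10.6 tokens per segment.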

Research Project & Financial Support

This work was supported by the Shota Rustaveli National Science Foundation of Georgia (SRNSFG) [FR-21-993-3, Annotated Corpus of the Megrelian Language with Sketch Grammar and Online Dictionary]. Additionally, co-funding from Ilia State University enabled us to host the website and provide its IT support. All ideas expressed herein are those of the authors and do not represent the opinions of the Foundation or the University.

Selected Publications

Lobzhanidze, I., Gersamia, R., & Gogia, T. (2026). The Megrelian Language Corpus (MLC): Creation, Annotation, and Initial Steps toward a UD Treebank. In Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026) (pp. 3250–3256). European Language Resources Association (ELRA). https://doi.org/10.63317/3tctzeeznuxb
Lobzhanidze, I., & Gersamia, R. (2025). A corpus-based dictionary for the endangered Megrelian language. In I. Kosem, M. Jakubíček, M. Medveď, K. Zgaga, Š. Arhar Holdt, T. Munda, & A. Salgado (Eds.), Electronic lexicography in the 21st century (eLex 2025): Intelligent lexicography. Proceedings of the eLex 2025 conference (pp. 561–583). Lexical Computing CZ.
Gersamia, R., & Lobzhanidze, I. (2025). Encoding the progressive aspect in Megrelian verbs. Bulletin of the Georgian National Academy of Sciences (Moambe), 19(1), 113–118.
Gersamia, R., & Lobzhanidze, I. (2025). Code-mixing patterns in a corpus-based Megrelian–English dictionary. In T. Margalitadze (Ed.), Lexicography in the XXI century: Proceedings of the II International Conference (pp. 60–64). Centre for Lexicography and Language Technologies, Ilia State University.
Lobzhanidze, I., Gersamia, R., & Tsulaia, N. (2024). Compiling a bilingual Megrelian–English online dictionary: Preserving endangered Kartvelian languages. In K. Š. Despot, A. Ostroški Anić, & I. Brač (Eds.), Lexicography and semantics: Proceedings of the XXI EURALEX International Congress (pp. 647–661). EURALEX.
Gersamia, R., & Lobzhanidze, I. (2023). Documentation and annotation of Megrelian texts in FLEx. Language and Culture, 29, 46–50.
Gersamia, R., & Lobzhanidze, I. (2023). Preserving endangered Kartvelian languages: Lexicographic insights for a Megrelian dictionary. In Proceedings of the Lexicography in the XXI Century Conference (pp. 104–113).
Gersamia, R., Lobzhanidze, I., Skhulukhia, T., & Tsulaia, N. (2023). Documentation of the Megrelian language: Report of a linguistic expedition to Samegrelo (2022–2023). Kadmos, (15), 123–135.

Copyright information

The Megrelian Language Corpus (MLC) by Rusudan Gersamia and Irina Lobzhanidze is licensed under CC BY-NC-SA 4.0. This license allows users to copy and redistribute the material in any medium or format, and to remix, transform, and build upon it, provided that they give appropriate credit and distribute their contributions under the same license. The material must not be used for commercial purposes.

In the future, all contributions to the MLC should be submitted to the editors for review at rgersamia[@]iliauni.edu.ge or irina_lobzhanidze[@]iliauni.edu.ge and will be distributed under the same license.

If you want to cite material from the MLC, please copy and paste the following information:

გერსამია, რ., ლობჟანიძე, ი. (რედ.). (2022 წლის 4 აპრილი). მეგრული ენის კორპუსი (ვერსია 1) [მონაცემთა ბაზა]. https://xmf.iliauni.edu.ge/

Gersamia, R., & Lobzhanidze, I. (Eds.). (2022, April 4). The Megrelian Language Corpus (Version 1) [Data set]. https://xmf.iliauni.edu.ge/

Acknowledgments

This project, aimed at preserving the endangered Megrelian language, was led by principal investigators Assoc. Professor Dr. Rusudan Gersamia and Professor Dr. Irina Lobzhanidze.

We also acknowledge the invaluable contributions of project team members Nino Tsulaia, MA, and Tamuna Skhulukhia, MA, whose hard work was crucial to the project's progress.

Throughout various phases of the project, several individuals provided their expertise and support: Professor Dr. Zaal Kikvidze, Tamar Gogia, PhD, and Mariam Nadaraia, MA. Their involvement significantly enriched the project's outcomes.

We are grateful to our volunteers, Khatia Danelia, BA, Khatia Kobalia, BA, Lika Jalaghonia, BA, Sopiko Tsertsvadze, BA, and Lasha Kvlividze, BA, whose enthusiasm and commitment considerably advanced our efforts.

User Guide & Grammatical Features

To view the usage instructions, please click the User Guide link.

To view grammatical features, please click the Grammatical Features link.