
Re-imaging Versiones Slavicae
The paper discusses a strategy for transforming the Versiones Slavicae database into an XML format, which would improve opportunities for application-independent preservation and maintenance.
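The abstract does not describe the database schema or the target XML vocabulary, so what follows is only a minimal sketch of the general idea, under stated assumptions. The field names (`greek`, `slavonic`, `locus`), the element names and the sample rows are all hypothetical. The sketch uses Python's standard `xml.etree.ElementTree`. Each flat row becomes an `<entry>` element whose columns become named child elements, so the data can be read without the original application.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for a database export; the real schema of
# Versiones Slavicae is not described in the abstract.
RECORDS = [
    {"id": "vs001", "greek": "λόγος", "slavonic": "слово", "locus": "Jn 1:1"},
    {"id": "vs002", "greek": "φῶς", "slavonic": "свѣтъ", "locus": "Jn 1:5"},
]


def records_to_xml(records: list[dict[str, str]]) -> ET.Element:
    """Convert flat rows into a self-describing XML tree."""
    root = ET.Element("entries")
    for rec in records:
        # One <entry> per row, keyed by the row identifier.
        entry = ET.SubElement(root, "entry", id=rec["id"])
        # Every other column becomes a child element named after the column.
        for field, value in rec.items():
            if field != "id":
                ET.SubElement(entry, field).text = value
    return root


if __name__ == "__main__":
    root = records_to_xml(RECORDS)
    ET.indent(root)  # pretty-printing; available since Python 3.9
    print(ET.tostring(root, encoding="unicode"))
```

A real conversion would probably target an established vocabulary such as TEI rather than the ad hoc element names used here.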
This paper is a write-up of a keynote from El’Manuscript 2021, reflecting on the ways in which the field of computationally supported medieval Slavic studies has and has not changed since the mid-2000s. Looking towards developments in the broader fields of digital humanities and natural language processing, it explores how recent improvements in tools for the mass digitization of manuscripts and for large-scale text analysis open up possibilities for working with manuscripts that have so far received very little attention. For these advances to be feasible, however, scholars will need to prepare and share their digitized texts and annotations in ways that are not yet the norm, although a number of projects already provide exemplary models of how these new conventions could be put into practice.
The article presents some results of analyses of the Vienna part of the Codex Marianus (ÖNB, Vind. slav. 146), undertaken by an interdisciplinary group of scholars and scientists from the Centre of Image and Material Analysis in Cultural Heritage (CIMA, www.cima.or.at) within two Austrian Science Fund projects devoted to the ancient Glagolitic heritage. The investigation consisted of four parts: codicological, multispectral, chemical, and philological. The codicological survey aimed to gather as much information as possible about the writing material (source of the parchment, methods of preparation, writing process, deletions, condition). Color and multispectral images were recorded to preserve the manuscript digitally in the best possible quality and to provide a sound basis for further investigation. The chemical analysis was carried out with two portable spectrometers (XRF and rFTIR). It aimed to obtain precise information on the parchment, inks, paints, and binders, and to collect data for a comparative study of parchment degradation. The philologists compared the fragment with all other surviving Old Church Slavonic Glagolitic manuscripts to learn as much as possible about their scribes.
The paper defines the basic principles for creating an electronic corpus of Serbian medieval charters and letters. The corpus aims at maximum representativeness, determined entirely by the preserved written legacy (manuscripts, microfilms, or photographs). This makes it unnecessary to apply the principle of balance. It also satisfies the principle of reliability, since charters and letters known only from editions are not included. Texts are selected according to a diplomatic criterion. Transcripts and copies of documents whose originals are available are excluded, as are later transcripts that are chronologically and linguistically distant from the assumed original. This approach to text selection is justified by the size of the corpus and by the exceptional cultural and historical significance of medieval charters and letters. The metadata for corpus texts are defined by their general diplomatic properties and by the needs of corpus searches for diatopic, diachronic, and genre variation. Conversion of the texts into electronic form strives for fidelity to the original. Abbreviations, superscript letters, and original punctuation are preserved, and neither accent marks nor modern capitalization rules are introduced.
The article proposes a minimal set of criteria for the sentence segmentation of medieval texts, an obligatory stage in corpus processing and annotation, especially for syntactic annotation. It reviews different definitions of the sentence (unit) and different approaches to sentence segmentation. It then discusses structural, thematic, and graphic criteria, using sample sentences from two texts of the 14th and 17th centuries, with the aim of defining the minimal criteria. The proposed criteria focus mainly on structural characteristics and try to avoid textual and semantic interpretation. Structural criteria nevertheless present challenges of their own, because interpreting the (syntactic) structure is inevitably bound up with interpreting the (semantic) content.
The St Petersburg Corpus of Hagiographic Texts (SCAT) has launched two new mark-up formats. The first is a comprehensive format for dividing hagiographic texts into parts. It covers both parts explicitly marked by section headings and parts inferred through comparison with texts of the same genre. The second is an elaborate format representing the full range of biblical, patristic, and liturgical quotations occurring in the lives of saints. So far, three morphologically annotated manuscript texts have been marked up according to these guidelines, and we plan to add two more texts in the near future. Close cooperation with the IHRIM research laboratory (Lyon) and extensive use of its techniques and technology make it possible to obtain illuminating cross-format statistical data. These data offer new insights into the canons and rules of Old Russian hagiography.
The neural network tagger CLStM has been applied to the Old Russian Žitie Evfimija Velikogo (GIM, Chud. 20), a copy from the second half of the 14th century. The tagger's main strength is speed: it can automatically annotate dozens of pages of orthographically non-normalized text within a few minutes. It does so with high accuracy for part of speech and morphological features. Moreover, the tagger can largely disambiguate case syncretism, even in split constructions. Manually correcting the automatic output produces a correctly tagged text considerably faster than using a rule-based tagger or tagging entirely by hand. The weaknesses of the CLStM tagger include some incorrect POS tags and occasionally incomplete or incorrect assignment of morphological categories to certain parts of speech. Superscript letters and punctuation can pose special problems; normalizing punctuation improves tagging results. The proportion of correct tags is higher for tokens seen during training, while unknown (out-of-vocabulary, OOV) words show a higher error rate. In the paper, we analyze the strengths and weaknesses of the tagger with specific examples. Furthermore, we demonstrate how automatically tagged, uncorrected data can be used for quantitative analysis.
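The abstract claims that out-of-vocabulary tokens are tagged less accurately. One way to check this is to report accuracy separately for tokens seen and unseen in training. The sketch below assumes that three inputs are available as plain Python sequences: the gold tags (from the manually corrected text), the tagger's predictions, and the training vocabulary. The function name, tag labels and sample tokens are hypothetical and do not come from the paper.

```python
def accuracy_by_vocabulary(tokens, gold, predicted, train_vocab):
    """Return (in-vocabulary accuracy, OOV accuracy) for a tagged text.

    A token counts as in-vocabulary if its exact form occurred in training.
    A group with no tokens gets float("nan").
    """
    counts = {"iv": [0, 0], "oov": [0, 0]}  # group -> [correct, total]
    for token, g, p in zip(tokens, gold, predicted):
        group = "iv" if token in train_vocab else "oov"
        counts[group][1] += 1
        counts[group][0] += int(g == p)

    def ratio(group):
        correct, total = counts[group]
        return correct / total if total else float("nan")

    return ratio("iv"), ratio("oov")


# Hypothetical sample: three forms were seen in training, two were not.
TRAIN_VOCAB = {"и", "бѣ", "мужь"}
TOKENS = ["и", "бѣ", "мужь", "благочьстивъ", "зѣло"]
GOLD = ["CCONJ", "VERB", "NOUN", "ADJ", "ADV"]
PRED = ["CCONJ", "VERB", "NOUN", "NOUN", "ADV"]

if __name__ == "__main__":
    iv, oov = accuracy_by_vocabulary(TOKENS, GOLD, PRED, TRAIN_VOCAB)
    print(f"in-vocabulary: {iv:.2f}, OOV: {oov:.2f}")  # in-vocabulary: 1.00, OOV: 0.50
```

Splitting the evaluation this way keeps a good overall score from hiding weak performance on unseen word forms, which are frequent in orthographically non-normalized texts.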
The paper presents results, including work in progress, related to two databases of “non-bookish” (vernacular) Old East Slavic writing: the databases of birchbark letters and of epigraphy. The project aims to interlink visual, archeological/historical, and linguistic information. The epigraphical database can represent different interpretations of a single inscription, outlining the readings proposed in the existing literature. These resources, an archeographical database and a linguistic corpus forming part of the larger Russian National Corpus, are designed to be easily synchronized, expanded, and updated. The project includes an online workstation for the morphological annotation of texts. One important function of this platform is to create an index to the corpus for use in the linguistic description of the dialect. Another is to verify the index and the data of Andrei Zaliznjak’s book “Old Novgorod Dialect. Addenda”, which is being prepared for posthumous publication. New linguistic discoveries have been made in the course of the project.
The paper demonstrates methods and techniques for eliminating the variation of linguistic units in the transcriptions of medieval Slavonic manuscripts in the historical corpus “Manuscript” (manuscripts.ru). The corpus consists of machine-readable copies that resemble the originals as closely as possible. It provides users with tools for transforming (modifying) linguistic units so that they can build queries and obtain results suited to the task at hand. For inexact search, the user can:
- remove titlos and diacritics;
- reduce letter variants to their basic forms;
- specify a search mask for the linguistic units as a regular expression;
- use the letters of the modern Cyrillic alphabet.

For the statistical modules of the corpus to operate on lemmas, each textual form must be automatically assigned to exactly one lemma. Because of grammatical homonymy, incorrect lemmatization would lead to a mismatch between quantitative data based on word forms and data based on lemmas. To assign word forms to the correct lemma, we apply a rule-based approach that takes into account the formal and quantitative characteristics of the linguistic units. These include their morphological variability or invariability, their frequency in the sub-corpus, whether or not they match the lemma form, the frequency of relationships between textual forms and the dictionary paradigms of inflected words, and the results of manual homonymy resolution. Reducing textual forms to unified, normalized, transliterated, or initial forms is a necessary step in extracting data from the historical corpus for the distributive-statistical analysis of the semantics of linguistic units.
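The inexact-search options described for the “Manuscript” corpus can be sketched as a normalization step applied before regular-expression matching. Those options are removing titlos and diacritics, reducing letter variants to basic forms, and searching with a mask. This is a minimal illustration, not the corpus’s implementation. The letter-variant table is a hypothetical, deliberately partial example. Superscript (combining) Cyrillic letters are simply left in place rather than mapped.

```python
import re
import unicodedata

# Hypothetical, deliberately partial table: letter variants -> basic or
# modern Cyrillic letters. Escapes avoid confusing look-alike glyphs.
BASIC_FORMS = str.maketrans({
    "\u0461": "\u043e",  # ѡ omega        -> о
    "\u047b": "\u043e",  # ѻ round omega  -> о
    "\u0454": "\u0435",  # є wide e       -> е
    "\ua64b": "\u0443",  # ꙋ monograph uk -> у
    "\u0456": "\u0438",  # і decimal i    -> и
    "\ua657": "\u044f",  # ꙗ iotified a   -> я
    "\u0463": "\u0435",  # ѣ yat          -> е (modern spelling)
})


def is_removable_mark(ch: str) -> bool:
    """True for combining marks such as the titlo (U+0483) or accents.

    Combining Cyrillic letters (U+2DE0..U+2DFF, i.e. superscript letters)
    are kept, since deleting them would delete text.
    """
    return unicodedata.category(ch) == "Mn" and not ("\u2de0" <= ch <= "\u2dff")


def normalize(form: str) -> str:
    """Lower-case, strip titlos and diacritics, reduce letter variants.

    NFD splits precomposed letters (e.g. ї -> і + diaeresis) so their marks
    can be dropped; note that this also turns й into и.
    """
    decomposed = unicodedata.normalize("NFD", form.lower())
    kept = "".join(ch for ch in decomposed if not is_removable_mark(ch))
    return unicodedata.normalize("NFC", kept).translate(BASIC_FORMS)


def search(mask: str, forms: list[str]) -> list[str]:
    """Return the original forms whose normalized spelling fully matches mask."""
    pattern = re.compile(mask)
    return [form for form in forms if pattern.fullmatch(normalize(form))]


FORMS = ["сло\u0301во", "сл\u0461во", "свѣтъ", "бг\u0483ъ"]

if __name__ == "__main__":
    print(normalize("свѣтъ"))     # светъ
    print(search("сл.во", FORMS))  # the accented and the omega spellings
```

Stripping a titlo does not expand the abbreviation it marks: бг҃ъ normalizes to бгъ, so a mask such as бог.* would not find it. That gap is one reason a separate lemmatization layer is still needed.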
We report on applying Handwritten Text Recognition (HTR) to manuscripts from the archive of Konstantin Rychkov preserved at IOM RAS, St. Petersburg, within the INEL project. The manuscripts contain folklore texts in Evenki (Tungusic) collected in Western Siberia in the 1910s. We used services provided by the Transkribus platform. The necessary step of Layout Analysis proved time-consuming because the parallel Evenki and Russian texts are arranged on the page without a strict dividing line. HTR models were trained successively on different amounts of data, up to 521 pages. The best Character Error Rate (CER) attained on validation data for the largest dataset is 4.50% for models trained on all characters. The errors are unevenly distributed: most are due to just a few problematic issues, especially diacritics. The worst is the accent mark indicating stress, which is written high above the line and is frequently cut off from the line images at the preprocessing stage. After the stress mark was excluded from the training data and from recognition, the lowest CER dropped to 2.90%. We compared two recognition engines, HTR+ and PyLaia. The HTR+ model trained without stress marks made fewer errors on letters, while PyLaia performed better on diacritics.
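Character Error Rate is the edit (Levenshtein) distance between the recognized text and the ground truth, divided by the length of the ground truth. The sketch below implements that definition and shows why excluding the stress mark lowers the figure. It assumes that stress is encoded as a separate combining acute accent (U+0301). The sample strings are hypothetical, not taken from the Rychkov manuscripts. Transkribus's own evaluation may differ in details such as whitespace handling, so the numbers are illustrative only.

```python
STRESS_MARK = "\u0301"  # assumption: stress written as a combining acute accent


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions (a -> b)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str, drop_stress: bool = False) -> float:
    """Character Error Rate: edit distance / number of reference characters."""
    if drop_stress:
        reference = reference.replace(STRESS_MARK, "")
        hypothesis = hypothesis.replace(STRESS_MARK, "")
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical line: both stress marks were lost and one letter misread.
    ref = "оро\u0301н бира\u0301"
    hyp = "орон биро"
    print(f"{cer(ref, hyp):.3f}")                    # 0.273 (3 edits / 11 chars)
    print(f"{cer(ref, hyp, drop_stress=True):.3f}")  # 0.111 (1 edit / 9 chars)
```

Because each lost stress mark counts as a full character error, a mark that is systematically cropped from the line images inflates the CER far more than its visual weight suggests.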
The author compares the marginal glosses in Epifanij Slavinetskij’s Sbornik perevodov (1665) with the text of Athanasius’ Third Oration against the Arians in Gavrilo Venclović’s Razglagolnik (1734). The marginal glosses in Epifanij’s Russian version are taken from a South Slavonic manuscript that shares a common origin with Venclović’s protograph. The Orationes contra Arianos in the Razglagolnik are written in a South Slavonic koine. Their source shows the features of an Athonite translation connected with the Council of Ferrara-Florence and the disputes over the filioque.
The textual transmission of the Slavonic translation of Hippolytus’ De Christo et Antichristo represents a stable and well-attested tradition. This provides a basis for a possible reconstruction of the Greek original from which the translation was made. The article points out some omissions, additions, and reconstructions in the Greek text compared with the Slavonic one. The paper also addresses significant problems that scholars face when working on bilingual dictionaries, and discusses possible approaches and solutions. Some questions nevertheless remain, and it is not easy to give definite answers to them. The author underlines the importance of the fragmentary copy of the Greek text preserved in the manuscript Meteora 573, given its significant correspondence to the Slavonic tradition. Unfortunately, this manuscript preserves only small fragments of the whole work by Hippolytus of Rome.
The article examines Old Slavonic versions of the Euthalian chapter-lists to Acts and the Epistles, focusing on meta-communicative terms such as παραίνεσις or προοίμιον. The author aims to evaluate the accuracy of the Slavonic translations and their exegetical potential, that is, their capacity to clarify the content of the main text of Acts and the Epistles. The analysis reveals two prevailing tendencies in Slavonic sources of the 12th–16th centuries. First, there is lexical variability resulting from various translation strategies (calques, periphrastic constructions, and text expansion), which are more or less successful in terms of the accuracy and clarity of the resulting text. Second, there is a tendency towards unification, with a single Slavonic term used for several Greek correlates. Some of these lexemes are not recorded in authoritative dictionaries, including academic ones. The lexicon of the chapter-lists does not depend on the vocabulary of the main text.
This report focuses on the still unexplored Interpretation of Orthodox liturgy, which is attested in two copies:
- manuscript No. 88 from the Obolensky collection (201), State Archive of the Russian Federation (Moscow);
- manuscript No. 52 of 1567 from the Archive of Baltazar Bogišić in Cavtat.

Both manuscripts contain proven original works of Constantine of Kostenets (1380–1431). The author analyzes the structure and content of the Interpretation. The author also assesses it as a source for the history of the Liturgy, on the basis of the liturgical features it describes. The author concludes that the texts in MS No. 88 and MS Bogišić 52 are based on a late composition of Byzantine mystagogy. It follows that the South Slavic translation should be dated no earlier than the end of the 12th century. The text is one of the many short epitomes created during the Second Bulgarian Kingdom by abridging the original, more extensive commentary. A detailed investigation and a text-critical edition are forthcoming.
Review of: T. Avgustinova. Word Order and Clitics in Bulgarian [Saarbrücken Dissertations in Computational Linguistics and Language Technology, Volume 5]. Saarbrücken, 1998. 184 p.
This paper studies statistical implicational universals in the 30-language sample from Joseph Greenberg’s classic paper (1966). It points out some problems with the universals proposed by Greenberg and presents 43 previously undiscovered universals of this type. The entire text of the article was generated by the computer program UNIVAUTO (UNIVersals Authoring TOol); only the formatting required by the journal’s style sheet was added manually. A brief description of this program, together with another article generated by it, was previously published in this journal (Contrastive Linguistics 1999, issue 4).
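One simple way to search a binary feature table for implications of the form “if a language has P, it also has Q” is to count two things for every ordered pair of features. The first is the languages with both P and Q (support); the second is the languages with P but not Q (exceptions). The sketch below does this on a hypothetical seven-language toy table. It does not use Greenberg’s data or the UNIVAUTO algorithm. Its exception threshold is only a crude stand-in for the proper statistical test that a statistical universal would require.

```python
from itertools import permutations

# Hypothetical toy table (True = feature present). NOT Greenberg's sample.
SAMPLE = {
    "lang_A": {"VSO": True,  "SOV": False, "Prep": True},
    "lang_B": {"VSO": True,  "SOV": False, "Prep": True},
    "lang_C": {"VSO": False, "SOV": True,  "Prep": False},
    "lang_D": {"VSO": False, "SOV": True,  "Prep": False},
    "lang_E": {"VSO": False, "SOV": False, "Prep": True},
    "lang_F": {"VSO": False, "SOV": True,  "Prep": True},
    "lang_G": {"VSO": True,  "SOV": False, "Prep": False},
}


def implications(sample, max_exceptions=0, min_support=2):
    """List (P, Q, support, exceptions) for candidate universals 'P -> Q'.

    support    = languages having both P and Q
    exceptions = languages having P but lacking Q
    Pairs whose consequent Q is present in every language are skipped,
    since 'P -> Q' would then hold trivially.
    """
    languages = list(sample.values())
    features = sorted(languages[0])
    found = []
    for p, q in permutations(features, 2):
        if all(lang[q] for lang in languages):
            continue
        with_p = [lang for lang in languages if lang[p]]
        support = sum(1 for lang in with_p if lang[q])
        exceptions = len(with_p) - support
        if support >= min_support and exceptions <= max_exceptions:
            found.append((p, q, support, exceptions))
    return found


if __name__ == "__main__":
    print(implications(SAMPLE))                    # [] (lang_G is an exception)
    print(implications(SAMPLE, max_exceptions=1))  # [('VSO', 'Prep', 2, 1)]
```

With no exceptions allowed, the toy data yield no universal. Tolerating one exception recovers “VSO implies prepositions”, which illustrates the difference between absolute and statistical implications.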
The article examines the linguistic means of depreciating Ukraine as an independent state. The analyzed material shows that, at the linguistic level, the discourse studied reflects several ways of depreciating and delegitimizing Ukraine as an independent state. The most basic is the almost mechanical replacement of the name Украина ‘Ukraine’ with Малороссия ‘Little Russia’, which stems from a simple denial of Ukraine’s right to independent existence. Beyond this, there are units that express specific arguments characteristic of imperial discourse:
- that the state lacks real independence (e.g. филиал ‘branch office’, укрпроект ‘Ukraine project’);
- that the procedures for electing its authorities were illegal (государственный переворот ‘coup d’état’);
- that these authorities are unlawful and violent in character (e.g. хунта ‘junta’, диктатура ‘dictatorship’);
- that chaos prevails in Ukraine (мазепинская самостийность ‘Mazepist independence’, and самостихийность, a blend of самостийность ‘independence’ and стихийность ‘spontaneity, chaos’).

The Orthodox variety of imperial discourse, on the other hand, is characterized by a reference, even if only indirect, to essentially medieval religious argumentation. It points to the supposedly non-Christian character of the Ukrainian authorities (безбожная власть ‘godless authorities’).
This paper analyzes internet memes pertaining to Covid-19. We analyzed more than 200 memes spanning nine months. Drawing on Blending Theory and Discourse Viewpoint, we attempt to explain the creative inner workings of memes as well as how meaning is negotiated on the internet. We clearly detected memes that followed the actual development of Covid-19 in step with events. We show that meme makers use visuals metonymically to address the current state of Covid-19, while the overall message of memes is driven by simile. Just as memes draw on the concept of Covid-19, they also feed back into it in a loop of self-reference. Along with their underlying metaphoric nature, memes convey a feels-like attitude. Their usage reveals two main phases: the Observer Phase and the Experiencer Phase. The former covers memes from a stage when Covid-19 was not yet a pandemic but was perceived through media coverage from elsewhere. The latter clearly shows that meme creators had experienced the virus themselves. For the timeframe covered, however, we conclude that memes do not show full conceptual integration, as Covid-19 was not yet fully entrenched.
This article explores the connection between artificial intelligence (AI) and language learning in the context of Education 4.0. It shows how AI, together with other emerging technologies and educational innovations, is transforming language learning. The article discusses how AI improves language learning processes through personalized learning experiences, interactive practice, and automated assessment. AI can be used to create diverse learning materials and immersive experiences that align with the principles of Education 4.0. Used appropriately, AI can bring numerous benefits to language learning. These include increased efficiency, greater student engagement in the teaching-learning process, and access to content from anywhere and on any device. The article further emphasizes the need to adopt Education 4.0 together with content that equips students with the skills they need in the digital age. Finally, it argues for integrating AI and Education 4.0 in language learning to promote critical thinking, problem-solving skills, and digital literacy.
This article presents the project Assessing the Reading Literacy and Comprehension of Early Graders in Bulgaria and Italy, carried out as an international collaboration between two partner organisations. The first is the Institute for Bulgarian Language Prof. Lyubomir Andreychin (BAS), with participants from its Department of Computational Linguistics. The second is the Institute for Computational Linguistics A. Zampolli in Pisa, Italy. The main goal of the project is to study and assess the reading skills of primary school students using modern language technologies.