
Noun phrases in Croatian can differ in the degree of cohesion between their constituents. Some constituents form descriptive free word combinations (velik stol ʽlarge table’, sunčan dan ʽsunny day’, slatka kava ʽsweet coffee’, hladne ruke ʽcold hands’), while others form multiword units which concretize extra-linguistic content that cannot be expressed in one word (crna kava ʽblack coffee’, krevet na kat ‘bunk bed’, kreditna kartica ‘credit card’, radno mjesto ‘workplace’). Dependent constituents can be adjectives, which agree with the noun (velika soba ‘big room’, radno mjesto ‘working place’), or adverb phrases or prepositional phrases (korak naprijed ‘step ahead’, mnogo ljudi ‘many people’, malo prijatelja ‘a few friends’, četkica za zube ‘toothbrush’, roba s greškom ‘faulty good’). This paper analyzes the noun mreža ‘net, network’, which has rich syntagmatic and semantic potential, and its co-occurrences, which can form either a collocation or a free combination of words. The lexicographic description will be compared with the corpus data. The analysis will take into consideration a list of computationally obtained collocates (collocation candidates) of the node noun. The frequency and the association strength of words occurring within a particular span can differ. The list of collocates obtained from the corpus will be examined to see how it coincides with the existing lexicographic description and with theoretical principles of word-combination interpretation in Croatian. The aim of the study is to determine how corpus analysis can improve the treatment of word-combination entries in lexicographic work.
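The kind of collocate list the abstract mentions can be sketched with a simple window-based extractor. The toy token list, the node word kava, the span of 3, and the PMI scoring below are illustrative assumptions, not the paper's actual corpus or association measure (corpus tools commonly also offer logDice or t-score).

```python
import math
from collections import Counter

def collocates(tokens, node, span=3):
    """Rank collocation candidates of `node` by pointwise mutual
    information within a symmetric window of `span` tokens."""
    freq = Counter(tokens)
    total = len(tokens)
    pair = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        pair.update(w for w in window if w != node)
    scores = {}
    for w, cooc in pair.items():
        # PMI = log2( P(node, w) / (P(node) * P(w)) )
        scores[w] = math.log2((cooc * total) / (freq[node] * freq[w]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented toy "corpus" reusing examples from the abstract.
toy = ("crna kava i slatka kava , kava za stol , "
       "velik stol i hladna kava").split()
print(collocates(toy, "kava")[:3])
```

On real corpus data the node's collocates would of course be ranked over millions of tokens, and low-frequency pairs would be filtered before scoring, since raw PMI overrates rare co-occurrences.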
This research aims at the analysis and description of the structure of two-component terms in English for Audit and Accounting as a language for specific purposes. Such two-component terms are referred to as bi-term monomials, with English for Audit and Accounting serving as the selected professional language domain. The theoretical part of the work provides a brief overview of languages for specific purposes and substantiates the need to introduce the term ‘monomial’. The research is based on the finding that a great number of terms and term clusters are strictly fixed and/or irreversible in their structure, which is why they are referred to as monomials and polynomials rather than terms. The topicality of the research lies in bridging the gap between academic findings and their future applicability in digital solutions for any area of science, technology and business. Bi-term monomials are classified and analysed according to their structure and the categories to which the terms in the monomials belong. The material of the study comprises 115 monomials (terms) selected from the Handbook of International Quality Control, Auditing, Review, Other Assurance, and Related Services Pronouncements validated by the IAASB in 2018.
Lemmatization, morphological (or morphosyntactic) annotation (MSD) and disambiguation are basic and indispensable steps in the Natural Language Processing of languages with a moderate level of inflection. We present a web interface demonstrating the de facto default lemmatization and MSD for Slovak, as used in the major Slovak corpora (with several enhancements yet to be applied in the corpora). The interface is intended chiefly for presentation or pedagogical purposes, with the morphological tags expanded and explained in plain language in several languages, including two different terminological registers of Slovak (a professional linguistic one and a "common" one).
The contribution focuses on the use and collocation analysis of basic notions linked to the field of onomastics, including specialized ones (e.g., "proprium", "anthroponym", "toponym", "chrematonym", etc.) and commonly used ones (e.g., "proper name/naming"). In the latter case, attention is paid to whether these terms appear in onomastic contexts. In general, the research analyses the sources in which the lemmata are used, including their typical surroundings. The goal of the paper is to show how the public perceives onomastics and whether it is familiar with its key terms. The analysis is based on the opinion-journalism texts of the Czech National Corpus, version 8.
Digital annotation of verbal aspect in Old Russian and Church Slavonic texts is a challenging task that requires a complex approach. When studying Slavic aspect systems synchronically, we always know whether a verb is perfective, imperfective or biaspectual; however, this is often not the case when researching aspect from a diachronic perspective. For earlier stages, the aspectual status of a particular verb can be determined only by considering various parameters together, such as actionality, lexical semantics, morphology, functional distribution, syntactic restrictions, collocations, statistics, etc. All essential parameters should be annotated sufficiently for effective use of a corpus, enabling a researcher to quickly collect the information necessary to build the aspectual profile of a verb. It is also important to understand the hierarchy of the parameters, as they may have different degrees of importance, and for this purpose a special algorithm should be developed. The paper discusses preliminary results concerning the parameters of annotation and the algorithm for aspect determination (using ‘Morphy’, the system for digital morphological annotation of Old Russian and Church Slavonic manuscripts developed at the Vinogradov Russian Language Institute of the RAS).
The article presents the basic principles of designing a diachronic linguistic corpus of documents of the Don Cossack Host offices from the State Archive of the Volgograd Region, Russia, including collecting documents for the text corpus, arranging the technical base for automatic processing and text editing, scheduling automated tagging, morphological annotation, and corpus software tools. The authors explain some technical aspects of corpus processing and text corpus constituency. It is considered reasonable to add every document to the corpus, including draft texts with crossed-out fragments, as this ensures accurate registration of the grammar and vocabulary of the language at a certain historical period. A set of language marker types has been developed for automated meta-tagging. The corpus software tools are chosen to enable accurate annotation of obsolete characters so that they can be processed alongside regular language units and expressions in morphological and genre meta-tagging; in cases of partial text adaptation, the authentic old graphic symbols may have to be preserved.
In cases where there is a larger collection of manuscripts whose scribe or author is unknown or in doubt, analyzing such manuscripts can take a lot of time and effort. The more pages and potential writers are involved, the harder it is to obtain tangible results. LiViTo is a free tool that requires a minimum of command-line experience and allows a simplified search for keywords, revisions, and clustering of historical manuscripts. We present the application of LiViTo to the “lab case” of the biographies of Czech Protestant refugees from the 18th–19th centuries. Most of these manuscripts contain the stories of farmers’ and craftsmen’s families who fled to Berlin because of their religious beliefs. An examination of this type of biography and manuscript using the methods of the Digital Humanities is carried out for Czech for the first time. Using extracts from the research project in which LiViTo was developed, the individual functions of the tool are explained. In addition, individual findings relating to the manuscripts and the potential further development of the tool are presented.
The article deals with the various efforts of the Staatsbibliothek zu Berlin (SBB) to make its collection of about 250 Church Slavonic prints from the 17th to the 19th century accessible in terms of content using modern information technology from the Digital Humanities sector. The focus is on full-text indexing of the heterogeneous Church Slavonic prints using HTR+ language models from the Transkribus programme. Depending on whether they are Moscow, Kiev or Old Believer prints, these models require different approaches and corresponding adaptations that take into account the printing area and printing period. Prints such as Kirillova kniga (1644) or Gistorija Ioanna Damaskina (1637) and many others are processed on a large scale, whereby the character recognition models are constantly refined through training on newly verified data. The full texts generated in this way are permanently stored in various XML formats (ALTO, PAGE), on the one hand in a central repository for subsequent use, and on the other hand merged with the original digital copies in the IIIF-compatible Digital Library of the SBB. As a further element, the Church Slavonic full texts will be indexed using special SOLR analyzers for efficient searches (tokenising, transliteration, n-grams) and made searchable in subject portals (including the Slavistik-Portal) using modern text-image web design.
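The n-gram analysis mentioned for the SOLR index can be illustrated with a minimal character n-gram generator of the kind such analyzers emit to enable partial matching across variant historical orthography; the function name and the 2–3-gram range are assumptions for illustration, not the SBB's actual analyzer configuration.

```python
def char_ngrams(text, n_min=2, n_max=3):
    """Emit all character n-grams of length n_min..n_max, in the
    order an index-time n-gram tokenizer would produce them."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams

# A query for a spelling variant can then match on shared n-grams.
print(char_ngrams("книга"))
```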
The paper discusses some results obtained as part of an ongoing project at the Slavic Institute of Heidelberg University to produce automatic transcriptions of an early 18th century trilingual printed dictionary (Fedor Polikarpov’s Leksikon trejazyčnyj) and, on a preliminary basis, of a 17th century trilingual manuscript (Epifanij Slavineckii’s working copy of his Greek–Slavic–Latin dictionary) using the handwritten text recognition (HTR) platforms Transkribus and eScriptorium. It is argued that there are considerable advantages to employing such tools in terms of the simplification and acceleration of work on multilingual edition projects. Moreover, a comparison of our experience working with Transkribus and eScriptorium is given, along with an overview of the practical benefits and challenges of working with each of these platforms.
The concept of stop words, introduced by H. P. Luhn in the mid-20th century, plays a major role in today’s NLP practice. Stop words are used to reduce noise in text data, remove uninformative words, speed up text processing, and minimize the amount of memory required to store data. Kyrgyz is an agglutinative Turkic language for which no scientific study of stop words had previously been published in English. In our study, we combined frequency analysis with rule-based linguistic analysis. First, we identified the most frequently used words by setting a frequency threshold and discarding the words below it. We then reduced this list by excluding all words that do not belong to the category of function words in Kyrgyz. Finally, we obtained a list of 50 words that can be considered stop words in the Kyrgyz language. In our analysis, we used a single corpus of sentences collected and published as an open-source project by one of the local broadcasters.
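The two-stage procedure described above (frequency threshold, then a function-word filter) can be sketched as follows; the toy token list and the miniature Kyrgyz function-word set are placeholders, not the study's data or its actual threshold.

```python
from collections import Counter

def stopword_candidates(corpus_tokens, function_words, threshold=2):
    """Two-stage stop-word selection:
    (1) keep tokens whose corpus frequency meets `threshold`,
    (2) keep only those that appear in the function-word list."""
    freq = Counter(t.lower() for t in corpus_tokens)
    frequent = {w for w, c in freq.items() if c >= threshold}
    return sorted(frequent & set(function_words))

# Toy corpus; the function-word list here is a tiny placeholder.
tokens = "жана мен жана бул китеп жана бул жакшы мен".split()
funct = {"жана", "бул", "мен"}
print(stopword_candidates(tokens, funct, threshold=2))
```

Note that the content word китеп ('book') is excluded twice over in a larger corpus setting: it is both below any realistic threshold and absent from the function-word list.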
The study introduces OnomOs, a new corpus of Czech texts with annotation of proper names. The corpus was compiled by onomasticians from the Department of Czech Language, Faculty of Arts, University of Ostrava, and made available by the Institute of the Czech National Corpus, Faculty of Arts, Charles University in Prague. The paper briefly discusses the content and structure of the corpus, the selection of texts for inclusion, and the onomastic-geographical classification of the identified names. The core of the text consists of three preparatory analyses, which focus on the most frequent surnames, collocations found in Western and Eastern countries in the pre-1989 period, and the declension patterns of three types of onyms. In the summary, further possibilities for onomastic corpus research are presented.
Cybersecurity is a rapidly developing domain, in which emerging concepts are usually first designated in English and then find their way into the usage of other languages. As the Lithuanian terminology in this domain develops, different types of synonymous terms appear in usage and are treated differently by speakers. The article presents a terminology survey involving 593 respondents of various age groups, regions and expertise levels. In the survey, the respondents had to name the most suitable terms for 10 cybersecurity concepts: they could choose the terms proposed in the questionnaire or propose their own, and give the reasons for their choices. The concepts and their terminological designations were selected from the Lithuanian-English Cybersecurity Termbase, whose dataset is based on bilingual parallel and comparable cybersecurity corpora. The quantitative and qualitative analysis of the survey results reveals preferences for different types of terms, such as borrowings, metaphorical calques, and descriptive terms, and how these preferences differ across two segmentations of the respondents: students vs. graduates, and cybersecurity experts vs. the general public. The results show that some terminological designations have already become established in the Lithuanian language, while most of them are still competing for their positions. The analysis of the reasons reveals that accuracy and clarity are the main factors for selecting a term. The research contributes to the standardisation of cybersecurity terms in Lithuania and provides insights into user preferences and the reasons behind them.
Drawing on the foundational concepts of Construction Grammar and the analytical approach of Distinctive-Collexeme Analysis, this linguistic inquiry systematically investigates the affinities of adverbial complements (ACs) for the ‘AC-(rzecz) ujmując-construction’ (e.g., ogólnie rzecz ujmując ‘generally speaking’; verbatim: generally thing/matter expressing/phrasing) in comparison to the ‘AC-(rzecz) biorąc-construction’ (e.g., ogólnie rzecz biorąc ‘all things considered’; verbatim: generally thing/matter taking). The analysis of the dataset from the National Corpus of Polish (NKJP) shows that these linguistic constructions extend beyond mere syntactic variance, revealing distinct semantic nuances, assuming varying pragmatic roles within communicative contexts, and exhibiting marked collocational patterns with a diverse array of adverbs and adverbial phrases.
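Distinctive-collexeme analysis conventionally scores each adverbial complement with a Fisher exact test on a 2x2 table of construction-by-word frequencies. A minimal sketch follows; the counts plugged in are invented for illustration and are not actual NKJP figures.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """Right-tailed Fisher exact p-value for the 2x2 table
    [[a, b], [c, d]]: the probability of a count of at least `a`
    under the hypergeometric null of no association."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p

# Hypothetical table for one AC: its frequency with the ujmując-
# construction vs. with biorąc, against all other ACs in each.
p = fisher_right_tail(30, 70, 10, 90)
print(p)
```

A small p-value indicates that the AC is distinctively attracted to the first construction; ranking all ACs by this value yields the distinctive-collexeme lists the analysis compares.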
The article considers the possibility of translating literary texts from a natural language into a target language using an automatic machine translation system. The author compares the capabilities of biological intelligence with machine "intelligence", which is a prototype of artificial intelligence. The paper gives an example (based on the game of Go) of the emergence of neural networks as a result of the self-learning of automated systems. New types of neural networks create an algorithm that always wins at Go against a human player, even one holding the title of world champion. Studies show that automated translation systems must be taught to recognize lexico-semantic groups based on the linguistic categories of natural language. Another task is to teach the machine to “think” and operate with images, since natural language is closely related to images and, at present, cannot be fully algorithmized. The practical application of automatic translation to poetic works shows that automatic machine translation systems, not possessing a psyche, are not able to translate texts of this type.
Correction of grammatical errors is today integrated into the most widely used text-processing tools and is accessible online. However, these tools are primarily semi-automatic, merely suggesting possible corrections and variants, and require interaction with the user, which can be tedious for lengthy texts. Recent advancements in artificial intelligence and natural language processing offer a more efficient strategy. This paper analyzes the possibility of using ChatGPT for correcting grammar in Portuguese texts written by native speakers of Croatian. The texts were corrected by a native speaker of European Portuguese and by ChatGPT. The authors analyzed error detection and correction at various linguistic levels, accompanied by examples. Due to class imbalance, the system’s performance was evaluated using the F-measure. The calculation of false positives and true negatives was adjusted to account for special cases of improper correction. Taking that into consideration, the F0.5 score was 0.805. Nevertheless, it should be noted that the results would likely vary if the input corpus had a different structure and proficiency level.
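The F0.5 measure used above weights precision more heavily than recall, which suits grammatical-error correction, where a spurious "correction" is costlier than a missed error. A minimal sketch with hypothetical counts (not the paper's actual confusion matrix):

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F_beta combines precision and recall; beta=0.5 weights
    precision twice as heavily as recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative counts only: 80 correct edits, 15 spurious, 25 missed.
print(round(f_beta(tp=80, fp=15, fn=25), 3))
```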
This article presents a literature review of the blending and integration of Virtual Reality (VR) and literary texts in the teaching of FL/SFL. The use of these two educational tools has sparked increased interest in the academic community, but their application and effectiveness vary widely, especially when they are combined. The present review, using the PRISMA method, surveys the existing academic literature across multiple databases to identify trends, best practices, and challenges in the convergence of VR and literature in the language-teaching context. It examines how these technologies can enhance the understanding and appreciation of literature in FL/SFL learning. The article not only provides an overview of the current situation in this field but also highlights the educational possibilities that arise when combining virtual environments with the study of literary texts. Additionally, it addresses the limitations and challenges involved in implementing this integration, laying a solid foundation for future research and curriculum development. It is part of a broader doctoral research project in the field, encompassing exploration and educational intervention.
Computational thinking, defined as a way of thinking that can be applied to various fields requiring problem-solving skills, has become prevalent in education. Students, i.e., future specialists, must develop the complex thinking competence necessary for solving business and societal problems, for which a combination of mathematical thinking and computational thinking is essential. The preliminary premise is that there is a correlation between abilities in specific mathematical and computational fields. Therefore, this paper aims to highlight the significance of investigating the relations between those fields from a linguistic point of view. In order to better understand this relationship, the paper presents a new approach, namely, developing hypotheses for exploring the relationship between the metalanguages of different fields of Mathematics and Computer Science. Additionally, the paper describes the first stage of a doctoral-level study, suggesting possible statistical analyses suitable for testing hypotheses based on a meta-analysis of the current literature.
Generative linguistics is widely claimed to produce theories at the computational level in the sense outlined by David Marr; Marr even used generative grammar as an example of a computational-level theory. At this level, a theory specifies a function for mapping one kind of information into another. How this function is computed is then specified at the algorithmic level, before an account of how the algorithm is realised by some physical system is given at the implementation level. This paper argues that generative linguistics does not fit anywhere within this framework. We then look at several ways researchers have attempted to modify either Marr's framework or generative theory to reconcile the two approaches. Finally, we present and discuss an alternative position, anti-realism about generative grammar. While this position has attracted some recent support, it also runs into some of the problems that the earlier modifications faced.
While e-books and audiobooks have gained popularity, traditional print books continue to hold significance in both physical and online bookstores, contributing to a rich reading landscape. The globalization of cultural industries, including the book sector, is a dynamic process shaped by contemporary economic mechanisms and global exchange. In this context, the evolution of book cover design is closely tied to advancements in digital technology and shifts in consumer behaviour. Digital tools such as Adobe Photoshop and Illustrator have revolutionized the aesthetics of book covers, leading to the proliferation of certain design styles. This study explores how global graphic design trends influence the aesthetics of book covers in Romania. The research corpus comprises fiction book covers published in January 2025, selected from four prominent online bookstores in Romania. The analysis seeks to identify visual trends using a defined set of categories, with an emphasis on how international design trends have been integrated into the Romanian publishing industry. Selected covers were systematically coded and evaluated by two independent coders to ensure consistency. Trends that exhibited the highest frequency scores were retained for further analysis and discussion.
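Agreement between two independent coders, as used for the cover coding above, is commonly quantified with Cohen's kappa; the abstract does not state which consistency measure was applied, and the labels and category names below are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two coders,
    corrected for the agreement expected by chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical design-style codes assigned by two coders.
a = ["minimal", "minimal", "photo", "typo", "photo"]
b = ["minimal", "photo", "photo", "typo", "photo"]
print(round(cohens_kappa(a, b), 3))
```

Values near 1 indicate that the coding scheme's categories are applied consistently; disagreements would typically be reconciled before frequency scores are computed.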