MINOR SYNTAX AND CORPUS DATABASE IN ENGLISH AND UZBEK LANGUAGES: A COMPARATIVE AND COMPUTATIONAL ANALYSIS
DOI: https://doi.org/10.26662/vmv7j339
Abstract
This article presents a detailed comparative analysis of minor syntax in English and Uzbek, covering elliptical constructions, fragments, word-sentences (so'z-gap), parentheticals (kiritmalar), and formulaic expressions. Although linguistics has historically been dominated by "major syntax," the study of complete sentences governed by the subject-predicate dichotomy, the article argues that minor syntactic units are a foundational element of human communication, particularly in spoken and digital discourse. Integrating theoretical frameworks from generative grammar, functional linguistics, and the Uzbek theoretical school of Kichik sintaksis (Small Syntax), the research examines the structural and semantic divergences between the analytic nature of English and the agglutinative morphology of Uzbek. The study also provides a technical evaluation of corpus database development for both languages. It examines the architecture of the British National Corpus (BNC) as a benchmark and analyzes the ongoing construction of the Uzbek National Corpus (UNC), addressing challenges in metadata standardization (TEI), syntactic annotation (CoNLL-U/Universal Dependencies), and the development of specialized tagsets for non-sentential units. The findings indicate that while English minor syntax is primarily driven by syntactic deletion and pragmatic economy, Uzbek minor syntax is rooted in morphological derivation, where the internal structure of the word functions as a syntactic engine. The article concludes by outlining the implications of these findings for computational linguistics, specifically the need for adaptive algorithms in Natural Language Processing (NLP) that can accurately parse the "periphery" of language.
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.