While there hasn't been much additional development on the project since its presentation at Balisage last summer, this will be the first presentation for a Slavist audience.
Below is the abstract we submitted for the talk:
An XML-Based Approach to Dialectological Data: The Development of Syllabic Liquids in Bulgarian
The reflexes of syllabic liquids (hereafter CrC) in East South Slavic are strikingly diverse and therefore of interest to linguists working on a wide range of topics. In particular, the distribution of CrC reflexes in standard Bulgarian has been a recurring topic in the phonological literature, owing to the empirical observation that the position of the vowel (/ăr/ versus /ră/ or /ăl/ versus /lă/) is conditioned by syllable structure (Scatton 1974, Scatton 1976, Petrova 1993, Barnes 1997). In this paper, we present a tool to facilitate the examination and analysis of CrC reflexes across the dialects of Bulgarian.
This tool builds upon the word lists in the Bulgarian Dialect Atlas (BDA) by providing more accessible interfaces to the data. The words have been transcribed and marked up using XML to indicate lexeme, reflex, and place of stress (where applicable). Each site is listed with its associated words and geographic coordinates. This metadata is leveraged using XSLT stylesheets to generate views onto the data that would not previously have been possible. Each site has its own profile that shows what percentage of the tokens have which reflex, lists all tokens, and notes tokens of the same lexeme that have different reflexes. The profile for each reflex shows what percentage of sites have that reflex, which reflexes co-occur with it, and which lexemes have the given reflex and a different reflex within a single site. One of the views onto the lexemes is a sort based on how many reflexes are attested for a given lexeme, which provides insight into the lexical diffusion of reflexes. The token view identifies where a token is the unique carrier of its reflex. Dynamically generated maps are provided for most views, using color-coded location markers that better capture the nuances of the data than those found in the printed atlas.
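To make the site profile and the lexeme sort more concrete, here is a minimal sketch in Python rather than the XSLT the project uses. The element and attribute names (`sites`, `site`, `token`, `lexeme`, `reflex`, `stress`, `lat`, `lon`), the function names, and the sample entries are all assumptions made up for illustration; they are not the project's schema or data from the BDA.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical markup: names and values below are illustrative placeholders,
# not the project's actual schema and not real atlas data.
SAMPLE = """<sites>
  <site name="Site1" lat="0.0" lon="0.0">
    <token lexeme="lex1" reflex="ăr" stress="1">t1</token>
    <token lexeme="lex1" reflex="ră" stress="2">t2</token>
    <token lexeme="lex2" reflex="ăr" stress="1">t3</token>
    <token lexeme="lex3" reflex="ăl" stress="1">t4</token>
  </site>
  <site name="Site2" lat="0.0" lon="0.0">
    <token lexeme="lex1" reflex="ră" stress="2">t5</token>
    <token lexeme="lex2" reflex="ăr" stress="1">t6</token>
  </site>
</sites>"""


def site_profile(site):
    """Return (percentage of tokens per reflex, lexemes with mixed reflexes)."""
    tokens = site.findall("token")
    if not tokens:
        return {}, []
    counts = Counter(t.get("reflex") for t in tokens)
    percentages = {r: 100 * n / len(tokens) for r, n in counts.items()}

    # Group reflexes by lexeme to find lexemes that vary within this site.
    reflexes_by_lexeme = defaultdict(set)
    for t in tokens:
        reflexes_by_lexeme[t.get("lexeme")].add(t.get("reflex"))
    mixed = sorted(lex for lex, rs in reflexes_by_lexeme.items() if len(rs) > 1)
    return percentages, mixed


def lexemes_by_reflex_count(root):
    """Sort lexemes by how many distinct reflexes they show across all sites."""
    reflexes_by_lexeme = defaultdict(set)
    for t in root.iter("token"):
        reflexes_by_lexeme[t.get("lexeme")].add(t.get("reflex"))
    counts = [(lex, len(rs)) for lex, rs in reflexes_by_lexeme.items()]
    # Most variable lexemes first; ties broken alphabetically.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for site in root.findall("site"):
        percentages, mixed = site_profile(site)
        print(site.get("name"), percentages, "mixed lexemes:", mixed)
    print(lexemes_by_reflex_count(root))
```

Under these assumptions, a lexeme that appears with more than one reflex inside a single site is flagged in that site's profile, and lexemes attested with the most distinct reflexes sort to the top of the lexeme view.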
The tool allows for a highly detailed micro-analysis of the dynamics of lexical diffusion involved in the development of Bulgarian CrC reflexes, while also providing macro-analytic tools that facilitate the identification of larger-scale trends in the data. By making locally divergent geographical points easier to identify, it also helps pinpoint areas that may merit more in-depth research. The ability to compare CrC reflexes in different environments makes it more feasible not just to track regional variation in the specific tokens attested in the BDA, but also, when multiple reflexes are found, to characterize the functioning of each reflex within the overall grammatical structure of any given dialect. These features will be of use in future research on this topic by enabling the inclusion of Bulgarian dialect data to an extent that was previously not feasible. We will also discuss the applicability of similar markup schemes to other types of data sets.
- Barnes, Jonathan. 1997. “Bulgarian Liquid Metathesis and Syllabification in OT.” in Bošković, Željko, Steven Franks, and William Snyder, eds. Annual Workshop on Formal Approaches to Slavic Linguistics: the Connecticut Meeting: 38-53.
- Petrova, Rossina. 1993. “Prosodic Theory and Schwa Metathesis in Bulgarian.” in Avrutin, Sergey, Steven Franks, and Ljiljana Progovac, eds. Annual Workshop on Formal Approaches to Slavic Linguistics: the MIT Meeting: 319-340.
- Scatton, Ernest. 1974. “Metathesis of Liquids and [Ъ] and the Bulgarian Verb.” in V Pamet na Prof. Dr. St. Stojkov – Ezikovedski Izsledvanija: 87-90.
- Scatton, Ernest. 1976. “Liquids, schwa, and vowel-zero alternations in modern Bg.” in Butler, ed. Bulgaria Past and Present. Columbus: 323-327.
- Stojkov et al., eds. 1964–1975. Bălgarski dialekten atlas. BAN: Sofia.