How linguists select phonemes to construct an alphabet for a language

For languages without written alphabets, I'm wondering how a linguist goes out into the field and determines, "hey, these are the core sounds of the language," and defines an alphabet in terms of those phonemes.



For example, I have seen languages with phonemes such as "dsh" or "tshʰ", things which could be broken down into smaller components, such as "d + sh". But instead they end up with an "alphabet" (not sure if that's the correct terminology) of, say, around 50 phonemes, many of which could be constructed from more primitive phonemes.



I guess this is related to the general question of how a phonology is constructed -- a complicated one such as the Ubykh phonology, with ~84 consonants. I'm not sure if an (unwritten) "alphabet" is the same thing as a phonology, but basically I'm just wondering how a linguist determines that "these are the base sounds" of a language. They then use that system to write down the words of the language for their book or other work. So they may determine that "mb" is the transcription for the prenasalized [b] sound, "tshʰ" for the aspirated [ch] sound, etc., and then spell words out like /mbatshʰe/.



Basically, I would like to know how they determine what should be at the lowest level, and what goes into their thinking.










phonology theoretical-linguistics transcription

asked 2 hours ago by Lance Pollard
  • departments.bucknell.edu/linguistics/lectures/05lect06.html
    – Alex B.
    2 hours ago














1 Answer
You should not be surprised if I tell you that the process is highly variable. Very roughly speaking, you start by eliciting a bunch of words and writing them down. Linguists have varying degrees of experience with phonetic symbols and what they stand for, and this introduces the first layer of variability. The ideal would be a set of standard reference sounds (expert IPA renditions) plus actual instantiations across languages, so that you could decide whether the vowel in question is [a] or [æ]. No such reference source exists, so we make do idiosyncratically.



In principle, you would write down what was actually said, not concerning yourself with whether [a] and [æ] are contrastive – making that decision comes after you have the data. In fact, people approach languages (after the first couple of hours) with at least a weak ideological bias: if you have encountered 100 tokens of [a] and 4 tokens of [æ], you will be biased in favor of rejecting [æ] as an error or a conditioned variant.



It is common for linguists (outside of students in a field methods class, who may be deprived of the opportunity to "cheat") to seek guidance from prior knowledge: if you start working on a new language, e.g. Guerze, you might find out what has already been written on the language, or what languages it is related to and therefore what phonemes it is likely to have. This can backfire, of course, but it can still give you hypotheses that can be tested.



Returning to the mythical pure methodology, you assemble a large enough corpus of (narrow) phonetic transcriptions and then do a distributional analysis of the individual sounds. The analysis is almost (or actually) always done heuristically rather than computationally: you look at similar segments and phonetic properties attested in the corpus and determine, for a given pair ([a] vs. [æ], [t] vs. [tʰ]…), whether the contexts where the two sounds appear overlap or are disjoint. This bit of analysis is the most common form of phonological analysis that the public encounters when they hear about "phonology", and innumerable textbooks teach about allophones and complementary distribution, trying to impart the analytic skill required to discover that in this language [a] and [æ] are in complementary distribution while [t] and [tʰ] are not. I will simply say that the method is both simple and difficult to understand, and the reason is that too much emphasis is put on looking for minimal pairs. A complete dissection of the method of analyzing for complementary distribution is beyond the scope of this answer.
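To make the distributional check concrete, here is a minimal Python sketch of the comparison described above: collect the environments (preceding and following segment) of two candidate phones across a corpus and test whether those environment sets overlap. The toy corpus, its pre-segmented form, and the restriction to immediate neighbors are simplifying assumptions for illustration; real analyses consider richer environments (stress, syllable position, longer spans).

```python
# Hypothetical illustration: test two phones for complementary distribution
# by comparing the sets of (left, right) environments they occur in.

def environments(corpus, target):
    """Return the set of (preceding, following) contexts of `target`.
    '#' marks a word boundary."""
    envs = set()
    for word in corpus:
        for i, seg in enumerate(word):
            if seg == target:
                left = word[i - 1] if i > 0 else "#"
                right = word[i + 1] if i < len(word) - 1 else "#"
                envs.add((left, right))
    return envs

# Invented corpus: each word is a list of already-segmented phones.
# (A real workflow must first tokenize IPA, handling digraphs/diacritics.)
corpus = [
    ["tʲ", "æ", "m"],
    ["tʲ", "æ", "k", "a"],
    ["p", "a", "m"],
    ["k", "a", "t"],
]

envs_a, envs_ae = environments(corpus, "a"), environments(corpus, "æ")
shared = envs_a & envs_ae
if shared:
    print("overlapping contexts -> likely contrastive:", shared)
else:
    print("disjoint contexts -> candidate allophones of one phoneme")
```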



Once you know that the sounds [a] and [æ] are in complementary distribution, you have license (or an obligation, depending on your ideology) to say that they are allophones of one phoneme: now you have to decide what that phoneme is. This too is a complex and ideology-riddled process. The basic question is: do you say that /a/ becomes [æ] somewhere, or that /æ/ becomes [a] somewhere? The typical reasoning is loosely based on the rules required to express the respective changes: are the rules turning /a/ into [æ] more complex than those turning /æ/ into [a]? To answer that question, you have to have a theory of what a "rule" is or can be. Often people appeal to "environment counting", i.e. whether one sound appears in more environments than the other (the idea being that the most general environment is the context where there is no rule, and the most restricted environment is where there is a specific rule). However, "an environment" is not an obviously quantifiable fact. People may also appeal to a priori notions of naturalness: [a] might be seen as a more "natural" vowel than [æ], so you might decide on that basis. At any rate, if [æ] only appears after palatalized /tʲ/, you would say that the phoneme /a/ becomes [æ] after /tʲ/, and since no other rule changes /a/, that explains why the vowel is usually pronounced [a]. One way around this is to use "elsewhere" in the rule (effectively, every rule of allophonic distribution becomes an n-tuple of rules with "elsewhere" always in the mix). You can say "/æ/ becomes [æ] after /tʲ/; /æ/ becomes [a] elsewhere", and now you can claim that the phoneme is /æ/, even though [æ] is token-infrequent.
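The "elsewhere" formulation can be rendered as an ordered rule list whose final member has no condition. The following sketch (rule format and data invented for illustration) realizes /æ/ as [æ] after /tʲ/ and as [a] elsewhere, which is what lets one posit /æ/ as the phoneme despite its low token frequency.

```python
# Hypothetical illustration of an "elsewhere" rule pair: the first matching
# rule wins, and the last rule is the unconditioned default.

RULES = [
    # (underlying phoneme, condition on preceding segment, surface phone)
    ("æ", lambda prev: prev == "tʲ", "æ"),  # /æ/ -> [æ] after /tʲ/
    ("æ", lambda prev: True,         "a"),  # /æ/ -> [a] elsewhere
]

def realize(phonemes):
    """Map a phonemic form (list of segments) to its surface form."""
    surface = []
    for i, ph in enumerate(phonemes):
        prev = phonemes[i - 1] if i > 0 else "#"  # '#' = word boundary
        for target, condition, output in RULES:
            if ph == target and condition(prev):
                surface.append(output)
                break
        else:
            surface.append(ph)  # no rule applies: segment surfaces as-is
    return surface

print(realize(["tʲ", "æ", "m", "æ"]))  # ['tʲ', 'æ', 'm', 'a']
```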



The question of cluster vs. unit phoneme is a pretty advanced one that is not easily resolved by the field linguist. One person may decide that [mbwa] "dog" has three segments in the syllable onset, and another may decide that it has one pre-nasalized rounded segment. 95% of the time there isn't compelling phonological evidence to choose between these accounts (and some phonological theories don't even allow the possibility of making such a distinction).



The question of practical orthography is even more vexing. There are actually two kinds of orthographies: the orthography used by the linguist for talking to other linguists, and something devised for speakers of the language. The linguist's orthography may perform simple mappings from IPA to more convenient letters, where ng may represent [ŋ] and ngg represents [ŋg] (and so on). The popular orthography is (these days) based on community ideas, where a council of elders may like the idea of writing [í, ì] – or they may dislike it. Often the decision is based on sentiments like "we won't write [ĩ] like our enemies do" vs. "we will write [ĩ] like our brothers do". The general rule is: avoid strange symbols. Hence the Taa and Shona orthographies avoid the obscure phonetic letters that would be necessary in an IPA transcription; also, not everything that is phonemic gets included in practical spelling.
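A linguist's working orthography of the kind described (ng for [ŋ], ngg for [ŋg]) amounts to a segment-to-graph mapping. The sketch below uses an invented mapping table; it is not the orthography of any particular language.

```python
# Hypothetical linguist-orthography: map phonemic segments to
# ASCII-friendly (multi)graphs; unlisted segments pass through unchanged.

ORTHOGRAPHY = {
    "ŋg": "ngg",  # prenasalized stop written with a trigraph
    "ŋ": "ng",
    "tʲ": "ty",
    "æ": "ae",
}

def spell(segments):
    """Render a list of phonemic segments as a practical spelling."""
    return "".join(ORTHOGRAPHY.get(seg, seg) for seg in segments)

print(spell(["m", "ŋg", "a"]))  # "mngga"
print(spell(["tʲ", "æ", "m"]))  # "tyaem"
```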



I should also point out that some linguists think of phoneme letters like /a, ø/ as “the lowest level”, but others further analyze sounds into defining features such as [+low,–rd] or [–back,–hi,–lo,+round].
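As a rough illustration of that lower level, a feature-based analysis treats each phoneme as a bundle of valued features, and rules then target classes of segments sharing values. The feature values below are a simplified assumption in the SPE spirit, not a citation of any particular textbook's system.

```python
# Hypothetical feature bundles: phonemes decomposed into valued features.
FEATURES = {
    "a": {"low": "+", "back": "+", "round": "-"},
    "æ": {"low": "+", "back": "-", "round": "-"},
    "ø": {"low": "-", "high": "-", "back": "-", "round": "+"},
}

def natural_class(inventory, **specs):
    """Return phonemes matching every given feature specification."""
    return [p for p, feats in inventory.items()
            if all(feats.get(f) == v for f, v in specs.items())]

print(natural_class(FEATURES, low="+"))              # ['a', 'æ']
print(natural_class(FEATURES, back="-", round="-"))  # ['æ']
```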






answered 1 hour ago (edited 1 hour ago) – user6726
  • I would like to know where I can find more info about "others further analyze sounds into defining features such as [+low,–rd] or [–back,–hi,–lo,+round]."
    – Lance Pollard
    1 hour ago










  • @LancePollard Look into "distinctive feature theory".
    – Draconis
    1 hour ago










  • Distinctive feature theory is usually covered in introductory textbooks in phonology. Unfortunately, the SPE standard is not used in most current textbooks, so it depends on which theory you want to learn. The idea is the same across books; the details differ. Introducing Phonology is built around the SPE standard, and Introductory Phonology presents a different set of features.
    – user6726
    1 hour ago









