How linguists select phonemes to construct an alphabet for a language

For languages without written alphabets, I'm wondering how a linguist goes out into the field and determines, "hey, these are the core sounds of the language," and defines an alphabet in terms of those phonemes.



For example, I have seen languages with phonemes such as "dsh" or "tshʰ", things which could be broken down into smaller components, such as "d + sh". But instead they end up with an "alphabet" (not sure if that's the correct terminology) of, say, around 50 phonemes, many of which could be constructed from more primitive phonemes.



I guess this is related to the general question of how a phonology is constructed -- a complicated one such as the Ubykh phonology, with ~84 consonants. I'm not sure if an (unwritten) "alphabet" is the same thing as a phonology, but basically I'm just wondering how a linguist determines that "these are the base sounds" of a language. They then use that system to write down the words of the language for their book or other work. So they may determine that "mb" is the transcription for the prenasalized [b] sound, "tshʰ" for the aspirated [ch] sound, etc., and then spell words out like /mbatshʰe/.



Basically, I would like to know how they determine what should be at the lowest level, and what goes into their thinking.










phonology theoretical-linguistics transcription

asked 2 hours ago by Lance Pollard
  • departments.bucknell.edu/linguistics/lectures/05lect06.html
    – Alex B.
    2 hours ago














1 Answer
You should not be surprised if I tell you that the process is highly variable. Very roughly speaking, you start by eliciting a bunch of words and writing them down. Linguists have varying degrees of experience with phonetic symbols and what they stand for, and this introduces the first layer of variability. The ideal would be a set of standard reference sounds (expert IPA renditions) plus actual instantiations across languages, so that you could decide whether the vowel in question is [a] or [æ]. No such reference source exists, so we make do idiosyncratically.



In principle, you would write down what was actually said, not concerning yourself with whether [a] and [æ] are contrastive – making that decision comes after you have the data. In fact, people approach languages (after the first couple of hours) with at least a weak ideological bias: if you have encountered 100 tokens of [a] and 4 tokens of [æ], you will be biased in favor of rejecting [æ] as an error or a conditioned variant.



It is common for linguists (outside of students in a field methods class, who may be deprived of the opportunity to "cheat") to seek guidance from prior knowledge: if you start working on a new language, e.g. Guerze, you might find out what has already been written on the language, or what languages it is related to and therefore what phonemes it is likely to have. This can backfire, of course, but it can still give you hypotheses that can be tested.



Returning to the mythical pure methodology, you assemble a large enough corpus of (narrow) phonetic transcriptions and then do a distributional analysis of the individual sounds. The analysis is almost (or actually) always done heuristically rather than computationally: you look at similar segments and phonetic properties attested in the corpus and determine, for a given pair ([a] vs. [æ], [t] vs. [tʰ]…), whether the contexts where the two sounds appear overlap or are disjoint. This bit of analysis is the most common form of phonological analysis that the public encounters when they hear about "phonology", and innumerable textbooks teach about allophones and complementary distribution, trying to impart the analytic skill required to discover that in this language [a] and [æ] are in complementary distribution while [t] and [tʰ] are not. I will simply say that the method is both simple and difficult to understand, and the reason is that too much emphasis is put on looking for minimal pairs. A complete dissection of the method of analyzing for complementary distribution is beyond the scope of this answer.
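To make the distributional check concrete, here is a minimal Python sketch of the comparison described above: collect the environments (preceding and following segment) of two candidate phones across a corpus and test whether those environment sets overlap. The toy corpus, its pre-segmented form, and the restriction to immediate neighbors are simplifying assumptions for illustration; real analyses consider richer environments (stress, syllable position, longer spans).

```python
# Hypothetical illustration: test two phones for complementary distribution
# by comparing the sets of (left, right) environments they occur in.

def environments(corpus, target):
    """Return the set of (preceding, following) contexts of `target`.
    '#' marks a word boundary."""
    envs = set()
    for word in corpus:
        for i, seg in enumerate(word):
            if seg == target:
                left = word[i - 1] if i > 0 else "#"
                right = word[i + 1] if i < len(word) - 1 else "#"
                envs.add((left, right))
    return envs

# Invented corpus: each word is a list of already-segmented phones.
# (A real workflow must first tokenize IPA, handling digraphs/diacritics.)
corpus = [
    ["tʲ", "æ", "m"],
    ["tʲ", "æ", "k", "a"],
    ["p", "a", "m"],
    ["k", "a", "t"],
]

envs_a, envs_ae = environments(corpus, "a"), environments(corpus, "æ")
shared = envs_a & envs_ae
if shared:
    print("overlapping contexts -> likely contrastive:", shared)
else:
    print("disjoint contexts -> candidate allophones of one phoneme")
```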



Once you know that the sounds [a] and [æ] are in complementary distribution, you have license (or an obligation, depending on your ideology) to say that they are allophones of one phoneme: now you have to decide what that phoneme is. This too is a complex and ideology-riddled process. The basic question is: do you say that /a/ becomes [æ] somewhere, or that /æ/ becomes [a] somewhere? The typical reasoning is loosely based on the rules required to express the respective changes: are the rules turning /a/ into [æ] more complex than those turning /æ/ into [a]? To answer that question, you have to have a theory of what a "rule" is or can be. Often people appeal to "environment counting", i.e. whether one sound appears in more environments than the other (the idea being that the most general environment is the context where there is no rule, and the most restricted environment is where there is a specific rule). However, "an environment" is not an obviously quantifiable fact. People may also appeal to a priori notions of naturalness: [a] might be seen as a more "natural" vowel than [æ], so you might decide on that basis. At any rate, if [æ] only appears after palatalized /tʲ/, you would say that the phoneme /a/ becomes [æ] after /tʲ/, and since no other rule changes /a/, that explains why the vowel is usually pronounced [a]. One way around this is to use "elsewhere" in the rule (effectively, every rule of allophonic distribution becomes an n-tuple of rules with "elsewhere" always in the mix). You can say "/æ/ becomes [æ] after /tʲ/; /æ/ becomes [a] elsewhere", and now you can claim that the phoneme is /æ/, even though [æ] is token-infrequent.
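The "elsewhere" formulation can be rendered as an ordered rule list whose final member has no condition. The following sketch (rule format and data invented for illustration) realizes /æ/ as [æ] after /tʲ/ and as [a] elsewhere, which is what lets one posit /æ/ as the phoneme despite its low token frequency.

```python
# Hypothetical illustration of an "elsewhere" rule pair: the first matching
# rule wins, and the last rule is the unconditioned default.

RULES = [
    # (underlying phoneme, condition on preceding segment, surface phone)
    ("æ", lambda prev: prev == "tʲ", "æ"),  # /æ/ -> [æ] after /tʲ/
    ("æ", lambda prev: True,         "a"),  # /æ/ -> [a] elsewhere
]

def realize(phonemes):
    """Map a phonemic form (list of segments) to its surface form."""
    surface = []
    for i, ph in enumerate(phonemes):
        prev = phonemes[i - 1] if i > 0 else "#"  # '#' = word boundary
        for target, condition, output in RULES:
            if ph == target and condition(prev):
                surface.append(output)
                break
        else:
            surface.append(ph)  # no rule applies: segment surfaces as-is
    return surface

print(realize(["tʲ", "æ", "m", "æ"]))  # ['tʲ', 'æ', 'm', 'a']
```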



The question of cluster vs. unit phoneme is a pretty advanced one that is not easily resolved by the field linguist. One person may decide that [mbwa] "dog" has three segments in the syllable onset, and another may decide that it has one pre-nasalized rounded segment. 95% of the time there isn't compelling phonological evidence to choose between these accounts (and some phonological theories don't even allow the possibility of making such a distinction).



The question of practical orthography is even more vexing. There are actually two kinds of orthographies: the orthography used by the linguist for talking to other linguists, and something devised for speakers of the language. The linguist's orthography may perform simple mappings from IPA to more convenient letters, where ng may represent [ŋ] and ngg represents [ŋg] (and so on). The popular orthography is (these days) based on community ideas, where a council of elders may like the idea of writing [í, ì] – or they may dislike it. Often the decision is based on sentiments like "we won't write [ĩ] like our enemies do" vs. "we will write [ĩ] like our brothers do". The general rule is: avoid strange symbols. Hence the Taa and Shona orthographies avoid the obscure phonetic letters that would be necessary in an IPA transcription; also, not everything that is phonemic gets included in practical spelling.
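A linguist's working orthography of the kind described (ng for [ŋ], ngg for [ŋg]) amounts to a segment-to-graph mapping. The sketch below uses an invented mapping table; it is not the orthography of any particular language.

```python
# Hypothetical linguist-orthography: map phonemic segments to
# ASCII-friendly (multi)graphs; unlisted segments pass through unchanged.

ORTHOGRAPHY = {
    "ŋg": "ngg",  # prenasalized stop written with a trigraph
    "ŋ": "ng",
    "tʲ": "ty",
    "æ": "ae",
}

def spell(segments):
    """Render a list of phonemic segments as a practical spelling."""
    return "".join(ORTHOGRAPHY.get(seg, seg) for seg in segments)

print(spell(["m", "ŋg", "a"]))  # "mngga"
print(spell(["tʲ", "æ", "m"]))  # "tyaem"
```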



I should also point out that some linguists think of phoneme letters like /a, ø/ as “the lowest level”, but others further analyze sounds into defining features such as [+low,–rd] or [–back,–hi,–lo,+round].
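As a rough illustration of that lower level, a feature-based analysis treats each phoneme as a bundle of valued features, and rules then target classes of segments sharing values. The feature values below are a simplified assumption in the SPE spirit, not a citation of any particular textbook's system.

```python
# Hypothetical feature bundles: phonemes decomposed into valued features.
FEATURES = {
    "a": {"low": "+", "back": "+", "round": "-"},
    "æ": {"low": "+", "back": "-", "round": "-"},
    "ø": {"low": "-", "high": "-", "back": "-", "round": "+"},
}

def natural_class(inventory, **specs):
    """Return phonemes matching every given feature specification."""
    return [p for p, feats in inventory.items()
            if all(feats.get(f) == v for f, v in specs.items())]

print(natural_class(FEATURES, low="+"))              # ['a', 'æ']
print(natural_class(FEATURES, back="-", round="-"))  # ['æ']
```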






answered 1 hour ago (edited 1 hour ago) – user6726
  • I would like to know where I can find more info about "others further analyze sounds into defining features such as [+low,–rd] or [–back,–hi,–lo,+round]."
    – Lance Pollard
    1 hour ago










  • @LancePollard Look into "distinctive feature theory".
    – Draconis
    1 hour ago










  • Distinctive feature theory is usually covered in introductory textbooks in phonology. Unfortunately, the SPE standard is not used in most current textbooks, so it depends on which theory you want to learn. The idea is the same across books; the details differ. Introducing Phonology is built around the SPE standard, and Introductory Phonology presents a different set of features.
    – user6726
    1 hour ago









