Categorical and gradient ungrammaticality in optional processes

Current theories of optionality often take a gradient view of grammaticality: unattested variants are not categorically excluded but rather highly improbable. Vowel harmony in Eastern Andalusian challenges this view. Unstressed vowels optionally harmonize in a coordinated fashion. For example, if one posttonic vowel harmonizes, they all must. Different implementations of noisy harmonic grammar are tested for their ability to account for this pattern. Only the implementation that categorically excludes forms with uncoordinated harmony succeeds; other implementations, which can only make such forms unlikely outputs, provide inferior models. This contrast indicates that there remains a need for a categorical approach to (un)grammaticality alongside a gradient approach.

Keywords

optionality, noisy harmonic grammar, Eastern Andalusian, vowel harmony, positional licensing, harmonic bounding, gradience

1. Introduction

Since at least Labov 1969, optionality—what is sometimes called 'free variation'—has been recognized as an important component of linguistic competence. Optionality has been reported in a wide variety of languages and in many (perhaps all) spheres of linguistic competence (phonology, syntax, etc.). It poses a number of challenges for linguistic theory, and any comprehensive approach to optionality must answer questions such as the following: What device permits the coexistence of the variants in question? To what extent is the choice among variants free, and what conditions that choice? Are some variants more common than others? What is the source of such (a)symmetries? If the attested variants are a proper subset of the logically possible variants, why is that? This article addresses this last question, although it will inevitably touch on the others to varying degrees. In Eastern Andalusian's ATR harmony (Jiménez & Lloret 2007, Lloret 2018, Lloret & Jiménez 2009; henceforth collectively J&L), a final lax vowel triggers obligatory harmony on the stressed syllable. Unstressed vowels optionally harmonize but must obey a handful of restrictions. For example, nonfinal posttonic vowels optionally harmonize as a group. Either they all harmonize or none does: [kɔ́metelɔ] ∼ [kɔ́mɛtɛlɔ] 'eat them (for you)!'; *[kɔ́mɛtelɔ]. What is responsible for the restricted range of variation?

I consider two broad approaches to this issue. The first rules out *[kɔ́mɛtelɔ] categorically at the architectural level: *[kɔ́mɛtelɔ] is not a possible variant because the grammar is incapable of accessing it. This approach relies on the notion of harmonic bounding, a fundamental principle in constraint-based phonology (Prince & Smolensky 1993 [2004]) that describes a situation in which a candidate cannot become the optimal candidate under any circumstances. Because *[kɔ́mɛtelɔ] is harmonically bounded, it is ineligible to be a surface variant of the lexical item from which [kɔ́metelɔ] and [kɔ́mɛtɛlɔ] are derived.

Harmonic bounding is often characterized as an impediment to analyses of optional phenomena because it limits the range of possible surface forms (e.g. Kaplan 2011, Kimper 2011b, Riggle & Wilson 2005, Vaux 2008). For this reason, theories of optionality that allow harmonically bounded candidates to become surface forms have gained currency in phonological research. In these theories, *[kɔ́mɛtelɔ] is in principle a possible surface form, but the grammar is sufficiently unlikely to generate it that it can be considered illicit. I argue that this approach to the matter is untenable, at least within a conceptually and typologically sound analysis.

These two approaches make different claims about the nature of the ungrammaticality of forms like *[kɔ́mɛtelɔ]. The first takes a discrete view of the distinction between grammatical and ungrammatical: *[kɔ́mɛtelɔ] is categorically impossible. The second takes a gradient and probabilistic view: the difference between *[kɔ́mɛtelɔ] and the attested forms is a matter of degree, not kind. Because harmonic bounding is not an impediment in these frameworks, they can only make *[kɔ́mɛtelɔ] improbable, not exclude it entirely. They are unable to accomplish this because the grammar reveals a relationship between the grammatical and ungrammatical forms that prevents disentangling them.

Research has uncovered the continuous nature of much of linguistic competence, from gradient acceptability judgments (see Schütze 2016 and Schütze & Sprouse 2013 for surveys) to exemplar-theoretic (e.g. Pierrehumbert 2001) and maximum entropy-based (MaxEnt; Goldwater & Johnson 2003) views of production. This line of inquiry has led to valuable insights and a greater appreciation for the nuanced nature of linguistic systems, but Eastern Andalusian's harmony shows that there is still room for discrete and categorical contrasts as captured by mechanisms such as harmonic bounding.

This argument is grounded in an examination of noisy harmonic grammar (NHG), a family of stochastic implementations of harmonic grammar (HG; e.g. Legendre et al. 1990) that has recently received attention (e.g. Boersma & Pater 2016, Hayes 2017, Jesney 2007). The various implementations of NHG differ in the way noise is introduced into the evaluation of candidates. Only one of these implementations successfully models Eastern Andalusian's harmony, and it is also the only one that respects harmonic bounding in the sense that harmonically bounded candidates cannot win. All other versions open the floodgates: harmonically bounded candidates are possible surface forms, and these implementations of NHG cannot separate the attested forms from the unattested forms. They inevitably produce outputs such as *[kɔ́mɛtelɔ] at least as frequently as one of the attested variants. The success of one version of NHG and failure of the others can be traced directly to their treatment of harmonically bounded candidates. Only if harmonically bounded candidates are inaccessible is an NHG analysis of Eastern Andalusian possible. NHG is not unique in this respect: rule-based theories lack a notion comparable to harmonic bounding and similarly struggle with Eastern Andalusian's harmony.

The article is structured as follows. Section 2 presents the facts of Eastern Andalusian's harmony. Constraint-based analyses of that harmony are discussed in §3, and how harmonic bounding emerges from those analyses is examined in §4. The different varieties of NHG and the results of simulations that test their ability to model Eastern Andalusian's harmony are then presented in §5 and §6, respectively. Finally, §7 and §8 discuss the implications of those results and conclude the article.

2. ATR harmony

This section summarizes J&L's description of ATR harmony in Eastern Andalusian, a Romance language spoken in southern Spain. The particular pattern at issue comes from the Granada dialect. Harmony is triggered by the well-known process of /s/-aspiration, whereby a word-final /s/ deletes. As a consequence, the now-word-final vowel becomes lax, which, J&L argue, reflects the preservation of /s/'s [spread glottis] feature on the vowel. For simplicity, I assume that the relevant feature is [−ATR]. Nothing crucial hinges on this choice, and the optimality-theoretic (OT) analyses discussed in §3 adopt comparable positions. /s/-aspiration is, to my knowledge, the only source of [−ATR] in the language.

Final lax vowels trigger ATR harmony on nonhigh vowels in the stressed syllable (1). Where available, similar words lacking /s/-aspiration are provided for comparison. (/a/ fronts in the /s/-aspiration context but not as a target of harmony (1g–h); I do not analyze fronting here.)

(1)

inline graphic
 

High vowels do not harmonize (2), though they do undergo word-final laxing, as 1a and 2a show. They are transparent to harmony, as we see below in 4d–f.

(2)

inline graphic
 

Unstressed vowels optionally harmonize. Nonfinal posttonic vowels are shown in 3. As we have already seen, if there is more than one such vowel, they harmonize in lockstep, to use Hayes's (2017) term: either they all harmonize, or none does (3b).

(3)

inline graphic
 

Similarly, pretonic vowels optionally harmonize in lockstep (4). Furthermore, as 4g demonstrates, posttonic harmony is a prerequisite for pretonic harmony: in a word with both a pretonic vowel and a nonfinal posttonic vowel, the former cannot harmonize without the latter. We can also see here high vowels' transparency. In 4d–f, the nonfinal high vowels cannot harmonize, but the other vowels harmonize as normal.

(4)

inline graphic

To summarize, final lax vowels trigger harmony obligatorily on the stressed syllable and optionally on unstressed syllables. This harmony obeys the following restrictions: (i) posttonic vowels harmonize in lockstep, as do pretonic vowels, (ii) pretonic harmony requires posttonic harmony, and (iii) high vowels may not harmonize.

The fact that /s/-aspiration and harmony affect /a/ suggests either that /a/ is not [−ATR] (contrary to standard assumptions) or that [ATR] is not the active feature in harmony. As the questions raised by the latter conclusion are tangential to present concerns, I provisionally adopt the former position: /a/ is a [+ATR] vowel in this language.

We now turn to formal accounts of this harmony and the harmonic bounding that emerges from them.

3. OT and HG analyses

Eastern Andalusian's harmony bears hallmarks of what is often called positional licensing (e.g. Walker 2011), a collection of phenomena in which an element's distribution is subject to positional restrictions. Well-known positional-licensing phenomena include the exclusion of most non-schwa vowels from nonfinal unstressed syllables in English and coda place restrictions in Japanese, whereby codas must share place features with a following onset (Itô 1988). In general, positional licensing requires an element to appear in a prominent licensing position, such as an onset, a stressed syllable, a root/stem, or an initial syllable. (See Beckman 1999, Walker 2011, and Kaplan 2015 for relevant discussion of positional prominence.) In Eastern Andalusian, the restricted element is [−ATR], and obligatory harmony on the stressed syllable indicates that this position is the licensor. In this respect, Eastern Andalusian's harmony resembles the metaphony systems of other Romance languages, wherein [+high] spreads to the stressed syllable. In fact, the patterns found in Eastern Andalusian represent three of Walker's four invariant licensing-based patterns in which the harmonizing feature emanates from a weak position. Examples from Romance languages (drawn from Walker 2011) include the following: harmony in only the licensor resembles Lena's metaphony; harmony also in intervening positions resembles Central Veneto's metaphony; and harmony in all positions resembles Servigliano's harmony. (In Walker's fourth pattern, a feature vacates the weak position and relocates to the strong position.)

Walker develops a positional licensing formalism for OT, and she and J&L apply that formalism to Eastern Andalusian's harmony. License([−ATR], σ́) assigns one violation for any [−ATR] that does not coincide with the stressed syllable. Spreading to the stressed syllable is one way to avoid this violation, as illustrated in 5. The details of this and subsequent OT tableaux follow Walker; J&L use slightly different constraints that are functionally equivalent to Walker's constraints, at least for present purposes. When License outranks Ident(ATR), the stressed vowel harmonizes. *Duplicate penalizes discontiguous harmony and thereby encourages vowels between the trigger and target to harmonize, a process that is optional in Eastern Andalusian but obligatory in Central Veneto, for example. The analysis employs Anttila's (1997) approach to optionality: the ranking between *Duplicate and Ident is not fixed, and nonfinal posttonic vowels therefore may or may not harmonize, depending on the resolution of this indeterminate ranking.

(5)

inline graphic

Pretonic harmony stems from a maximal licensing (MaxLic) constraint. Rather than mandating harmony in one particular position, it requires harmony everywhere by penalizing each disharmonic vowel. As with *Duplicate, MaxLic's ranking with respect to Ident is not fixed, allowing for optional pretonic harmony (6).

(6)

inline graphic
 

To complete the picture, *[+hi, −ATR] outranks all of the constraints mentioned so far and prevents high vowels from harmonizing. In turn, it is outranked by Max(−ATR), which ensures that /s/-aspiration always leaves behind a [−ATR] feature, even if that means creating a lax high vowel. That is, /s/-aspiration without word-final laxing violates Max(−ATR).

My understanding is that lax vowels appear in Eastern Andalusian only as a consequence of /s/-aspiration and the accompanying harmony. This means *[−ATR], not Ident(ATR), is the active constraint militating against harmony. The two constraints are distinguished by rich-base inputs that contain lax vowels outside of the /s/-aspiration/harmony context: *[−ATR] properly eliminates them, while Ident(ATR) incorrectly predicts a surface ATR contrast. We do not deal with rich-base inputs here, so the choice between *[−ATR] and Ident(ATR) has little consequence: these constraints assign identical violations for the inputs and candidates we consider, including the ones in 5 and 6. Nonetheless, in the interest of analytical rigor, I use *[−ATR] henceforth. To be explicit, this constraint assigns −1 for each vowel bearing [−ATR] (as opposed to −1 for each [−ATR] feature itself, which may be associated with more than one vowel).

HG replaces OT's constraint ranking with a system of constraint weighting. Candidates are evaluated according to the weighted sum of their violation marks. A candidate's harmony score is derived by multiplying its violations of each constraint by that constraint's weight and summing those products across all constraints. The candidate with the greatest harmony score is the winner. Typically, violations are represented with negative numbers and constraint weights are positive, so violating a higher-weighted constraint drags harmony scores down more precipitously than violating a lower-weighted constraint. To anticipate a constraint introduced later in this section, constraints may assign rewards for compliance (i.e. a positive 'penalty') instead of penalties for noncompliance, in which case they push harmony scores in the positive direction. In principle, constraint weights can be negative, but there are good reasons to avoid that (Boersma & Pater 2016). For example, with a negative weight, Dep encourages rather than blocks epenthesis, and Onset bans onsets.
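The evaluation procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions: the two-candidate tableau (a faithful onsetless form versus a form with an epenthetic onset) and all of the weights are hypothetical, and the function names are mine. The second call shows why negative weights are undesirable: with a negative weight, Dep rewards epenthesis.

```python
def harmony(violations, weights):
    """Weighted sum of a candidate's violation marks (violations are <= 0)."""
    return sum(weights[c] * v for c, v in violations.items())


def winner(candidates, weights):
    """Return the name of the candidate with the greatest harmony score."""
    return max(candidates, key=lambda name: harmony(candidates[name], weights))


# Hypothetical toy tableau (not from the article): an onsetless input with a
# faithful candidate and a candidate that epenthesizes an onset.
candidates = {
    "faithful": {"Dep": 0, "Onset": -1},
    "epenthetic": {"Dep": -1, "Onset": 0},
}

positive = {"Dep": 3.0, "Onset": 2.0}       # ordinary positive weights
negative_dep = {"Dep": -3.0, "Onset": 2.0}  # Dep with a negative weight

print(winner(candidates, positive))      # faithful: -2.0 beats -3.0
print(winner(candidates, negative_dep))  # epenthetic: Dep now rewards epenthesis
```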

Many pieces of the OT analysis summarized above survive the translation to HG, but in Kaplan 2018b I argue that the core interaction between License and *[−ATR] is pathological in HG. If [−ATR] originates at some distance from the stressed syllable, satisfying License may require many violations of *[−ATR], and the accumulated *[−ATR] violations create a gang effect (Pater 2009): the many violations of *[−ATR] become more costly than a lone violation of License, allowing satisfaction of License only at arbitrarily short distances. The problem is illustrated in 7: License can overcome *[−ATR] to trigger harmony in 7a, but the additional intervening vowel in 7b adds another *[−ATR] violation, blocking harmony. Increasing the weight of License solves the immediate problem, but the larger issue remains: we predict languages that have harmony across n vowels but not n + 1, for any value of n. This is inconsistent with crosslinguistic patterns.

(7)

inline graphic
 

The problem is that as the distance between the trigger and the target increases, there is no limit to the number of *[−ATR] violations that may be necessary to escape one violation of License. Such trade-offs behave very differently in HG than they do in OT (Pater 2009). OT's strict domination ensures that satisfying License is always better than avoiding *[−ATR] violations when the former outranks the latter. But in HG, all constraints contribute to a candidate's harmony, and lower-weighted constraints can gang up on a higher-weighted constraint, either when there are sufficiently many violations of multiple lower-weighted constraints or (as in 7b) when one lower-weighted constraint is violated sufficiently many times.
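The distance-sensitive cutoff can be made concrete with a small sketch. It makes several simplifying assumptions that are mine rather than the article's: only two candidates are compared (no spreading versus contiguous spreading from the final vowel to the stressed vowel), License is the standard negative constraint, and the weights 5 and 2 are arbitrary.

```python
def spreads_to_stress(n_intervening, w_license, w_atr):
    """True if contiguous spreading to the stressed vowel beats no spreading.

    Assumes the standard (negative) License and that every vowel between the
    final trigger and the stressed target must also become lax.
    """
    # No spreading: one License violation plus one *[-ATR] mark (final vowel).
    h_no_spread = -w_license - w_atr
    # Spreading: final, stressed, and all intervening vowels incur *[-ATR].
    h_spread = -w_atr * (n_intervening + 2)
    return h_spread > h_no_spread


# Hypothetical weights: License = 5, *[-ATR] = 2.
pattern = [spreads_to_stress(n, w_license=5.0, w_atr=2.0) for n in range(4)]
print(pattern)  # harmony succeeds across 0 or 1 intervening vowels, then fails
```

Under these assumptions spreading wins only while w(License) > (n + 1) × w(*[−ATR]), so any choice of weights fixes some distance beyond which harmony fails.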

Resolving the pathology requires a formulation of positional licensing whose incentive for harmony escalates with distance to establish a counterweight to *[−ATR]'s increasing violations. The positional licensing formalism proposed in Kaplan 2018b is given (in somewhat simplified form) in 8.

(8) License([−ATR], σ́): assign +1 for each [−ATR] that coincides with σ́ and +1 for each additional syllable that this [−ATR] appears in.

This constraint differs from standard positional licensing in two ways. First, it is positive: it rewards licensed features instead of penalizing unlicensed ones. We return to this immediately below. Second, it is gradient: because it assigns a reward for each position that hosts [−ATR], spreading across a greater distance earns a greater reward.

Positivity addresses some long-standing issues that arise from unexpected interactions involving harmony-driving constraints. This comes at the cost of 'infinite goodness' (Kimper 2011a). Because 8's reward increases as more vowels harmonize, it is advantageous to epenthesize infinitely many vowels that can serve as harmony targets. Kimper's (2011a) solution is serialism: if epenthesis and harmony must be executed on separate derivational steps, epenthesis becomes impossible because it has no motivation absent harmony. In principle, then, 8 demands a serial implementation of HG, but I am unaware of any attempt to unify NHG and serial HG. This task requires answering nontrivial questions that cannot be adequately addressed here (e.g. are the random noise values chosen separately at each step, or just once at the outset?), so I adopt an alternative solution to the infinite-goodness problem. I assume that Dep universally outweighs License, making epenthesis always more costly than the benefit gained by harmonizing an epenthetic vowel. There is ample precedent in the literature for fixed rankings in OT, and fixed weightings are equally viable in HG; see McCarthy & Prince 1995 for an early OT-based example and Pater 1999 for an example from OT in which the possibility of Dep universally outranking another constraint is raised.

Furthermore, 8 assumes an autosegmental (or similar) view (Goldsmith 1976) such that a [−ATR] feature can be shared among many vowels; that is the only way [−ATR] can be rewarded for appearing both in the stressed syllable and in other syllables. For example, [kɔ́metelɔ] earns +2 because the stressed vowel shares its [−ATR] with another vowel. Of course, there is another phonetically identical candidate in which the two lax vowels host distinct [−ATR] features. That candidate earns just +1 for the [−ATR] feature in the stressed syllable; the other [−ATR] feature does not coincide with the stressed syllable and earns no reward. (This kind of candidate is obviously inferior to its multiply associated counterpart and is not considered henceforth.) As this example illustrates, coincidence with the stressed syllable is a prerequisite for earning any rewards at all.

The pathology in 7 now disappears. As harmony spans greater and greater distances, each new violation of *[−ATR] is countered by a new reward from License. As long as License outweighs *[−ATR], [−ATR] will spread through intervening positions to reach the stressed syllable (9).

(9)

inline graphic
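A brief sketch shows why the positive, gradient License removes the distance cutoff. As before, the comparison is restricted to two candidates (no spreading versus contiguous spreading to the stressed vowel), and the weights are hypothetical; this restriction is my simplification, not a reproduction of the tableaux in 9.

```python
def spreads_with_positive_license(n_intervening, w_license, w_atr):
    """True if contiguous spreading beats no spreading under the constraint in 8.

    License rewards +1 for [-ATR] on the stressed syllable and +1 for each
    additional syllable hosting that same [-ATR].
    """
    lax_vowels = n_intervening + 2  # final + intervening + stressed
    # No spreading: [-ATR] misses the stressed syllable, so no reward at all.
    h_no_spread = -w_atr
    # Spreading: every lax vowel earns one reward and one *[-ATR] mark.
    h_spread = w_license * lax_vowels - w_atr * lax_vowels
    return h_spread > h_no_spread


# Hypothetical weights with License outweighing *[-ATR].
print(all(spreads_with_positive_license(n, 3.0, 2.0) for n in range(50)))
```

Because each added *[−ATR] mark is matched by an added reward, the margin in favor of spreading grows with distance whenever License outweighs *[−ATR].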
 

This version of positional licensing rewards any unstressed syllable that hosts a licensed [−ATR] (i.e. [−ATR] that coincides with the stressed syllable), so it motivates harmony that extends beyond the licensor. See Kaplan 2019 for an argument that this arrangement has benefits; one obvious advantage in the current context is that it obviates MaxLic and consolidates the motivation for all harmony in one constraint. Consequently, we now need a constraint that reins in this power so that the forms without pretonic harmony can be produced. For this purpose I use CrispEdge([−ATR], σ́, L), defined in 10. For detailed discussion of CrispEdge see Itô & Mester 1999, Kaplan 2018a, Kawahara 2008, Walker 2001; for our purposes it suffices to say that 10 penalizes each pretonic vowel that harmonizes.

(10) CrispEdge([−ATR], σ́, L): The stressed syllable's [−ATR] cannot extend beyond the left edge of that syllable.

To summarize, our HG analysis is grounded in the interaction between the harmony-rewarding License and a handful of other constraints. *[−ATR] and CrispEdge disfavor harmony (CrispEdge is concerned only with pretonic vowels), and *[+hi, −ATR] blocks harmony on high vowels. Finally, Max(−ATR) enforces final-vowel laxing.

4. Harmonic bounding

4.1. Collective harmonic bounding and local optionality

In constraint-based phonology, harmonic bounding describes a situation in which an input cannot be mapped onto a particular surface form (i.e. an output candidate) under any arrangement of the constraints. An output candidate is harmonically bounded when there is always a better candidate for the input in question, no matter how constraints are ranked or weighted. In the simplest case, candidate A harmonically bounds candidate B if candidate B has a proper superset of candidate A's violations. In this circumstance, there is no constraint that favors candidate B over candidate A, so candidate B cannot win under any ranking or weighting. At issue in Eastern Andalusian is a somewhat more complex situation, collective harmonic bounding (Samek-Lodovici & Prince 1999). Here, candidate A does not harmonically bound candidate B on its own, but instead teams up with a third candidate, candidate C, so that under any ranking/weighting one of them outperforms candidate B. This arrangement arises in situations like the one illustrated in 11, taken from Hayes 2017. In this hypothetical example, the input contains four /p/s, and the first candidate retains them all. Each successive candidate voices an additional /p/ at the behest of *VpV (which penalizes intervocalic [p]) and in violation of Ident(voice). The lockstep candidates, with either no voicing or exhaustive voicing, collectively harmonically bound the intermediate candidates. No candidate has a proper superset of another's violations, but no ranking/weighting of the two constraints yields the intermediate candidates. When Ident(voice) dominates *VpV, the all-[p] candidate wins. The opposite relationship favors the all-[b] candidate. Harmonic bounding, then, makes some outputs impossible and thereby imposes a fundamental limit on the range of input/output mappings OT and HG can produce.

(11)

inline graphic
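The collective harmonic bounding in this example can be checked mechanically. The sketch below rebuilds the violation profiles from the prose description, assuming that all four /p/s are intervocalic so that a candidate with k voiced stops has 4 − k violations of *VpV and k violations of Ident(voice). The integer weight grid is an arbitrary choice of mine.

```python
N_STOPS = 4

# Candidate k voices k of the four stops (k = 0 is all-[p], k = 4 is all-[b]).
CANDIDATES = {
    k: {"*VpV": -(N_STOPS - k), "Ident(voice)": -k} for k in range(N_STOPS + 1)
}


def harmony(violations, weights):
    """Weighted sum of violation marks."""
    return sum(weights[c] * v for c, v in violations.items())


def winners(weights):
    """Set of candidates tied for the greatest harmony score."""
    scores = {k: harmony(v, weights) for k, v in CANDIDATES.items()}
    best = max(scores.values())
    return {k for k, s in scores.items() if s == best}


# Collect every candidate that is ever the unique winner on the weight grid.
sole_winners = set()
for w_vpv in range(1, 11):
    for w_ident in range(1, 11):
        best = winners({"*VpV": w_vpv, "Ident(voice)": w_ident})
        if len(best) == 1:
            sole_winners |= best

print(sorted(sole_winners))  # only the two lockstep candidates, 0 and 4
```

Harmony is linear in k here, so its maximum always falls at an endpoint; the intermediate candidates surface only in the degenerate case of equal weights, where all five candidates tie.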
 

Suppose, though, that these harmonically bounded candidates are attested. This situation arises in cases of local optionality (Kaplan 2011, 2016, Kimper 2011b, Riggle & Wilson 2005), wherein different loci for an optional process need not behave identically: one /p/ might voice while another does not. Real examples of local optionality include English flapping and French schwa deletion; see the works just cited for various approaches to these phenomena. Harmonic bounding is at odds with local optionality because surface forms can show something between all-or-nothing application of the optional process.

The works cited in the previous paragraph explore a handful of proposals that allow selection of the intermediate candidates that are otherwise out of reach. The enterprise seems straightforward: certain attested forms are harmonically bounded, so we need a formalism that is not constrained by harmonic bounding. Most versions of NHG fall into this category. A consequence of this line of reasoning is that when optionality is more constrained, as in Eastern Andalusian's harmony, we cannot rely on harmonic bounding to prevent any candidate from being selected as an output. Rather, unattested candidates must be ruled out by ranking/weighting the constraints in a fashion that precludes (or renders sufficiently improbable) selection of those candidates. For example, candidate (a) in 5 and 6 is excluded by making the ranking between License and Ident(ATR) invariant. We will see that this strategy is not viable for Eastern Andalusian under NHG. An adequate analysis is possible only when NHG preserves harmonic bounding's consequences; otherwise, local optionality unavoidably results.

Hayes (2017) identifies a hallmark of (at least the relevant kind of) collective harmonic bounding (and therefore local optionality): the 'double pyramids' of violations visible in 11. As the candidates shed violations of one constraint, they gain violations of the other. This configuration appears in both tableaux from 9; 9b is repeated as 12 below. Because License is positive, the pyramids are not visual inversions of each other, but the formal properties of the double pyramids persist: each successive candidate performs better on one constraint and worse on the other, creating collective harmonic bounding. As we will see in §4.2, things are a bit more complex here: most obviously, because candidate (a) has no reward from License, License's pyramid is incomplete, and therefore candidate (b) is not harmonically bounded, though (c) and (d) are.

(12)

inline graphic
 

However, Eastern Andalusian does not show local optionality. The harmonically bounded forms with partial application of the optional process—in 12, harmony on nonfinal unstressed vowels—are unattested. We return to the contrast between Eastern Andalusian and local optionality in §8. The following section fleshes out the interaction between harmonic bounding and (un)attested candidates in Eastern Andalusian more fully.

4.2. Harmonic bounding in Eastern Andalusian

Table 1 summarizes the harmonic bounding that arises from our HG analysis of Eastern Andalusian. The third column marks the attested surface forms, and the final column indicates which candidates are harmonically bounded. These claims of harmonic bounding are justified in this section. The five words in Table 1 together instantiate the full range of harmonic possibilities, with one exception to be discussed below. For each, the first candidate contains no lax vowels, the second has only a lax final vowel, and the third shows harmony on just the stressed syllable. The remaining candidates show other harmonic possibilities depending on the input configuration. For /monedéros/ and /kómetelos/, these involve the logically possible combinations of harmony on pretonic and posttonic syllables, respectively, and for /rekóhelos/, the candidates show the interaction of pretonic and posttonic harmony. The behavior of /i/ is captured by /kɾísis/ and /kotiʒónes/, showing, respectively, the inability of stressed /i/ to harmonize and unstressed /i/'s transparency. Not shown explicitly in Table 1 is the comparable transparency of stressed /i/ (4d). This configuration is derivationally opaque: typically in licensing-driven harmony, failure of the licensor to harmonize means nonlicensing positions also do not harmonize (Walker 2011). Eastern Andalusian obviously disobeys this generalization, with harmony overapplying in 4d. To avoid opacity's inevitable distractions, examples like these are excluded. It is worth acknowledging that the OT analyses cited above cope with this opacity via MaxLic, which triggers harmony when the standard positional licensing constraint cannot. This is ostensibly a mark in those analyses' favor, but they succeed at the cost of redundantly calling on two licensing constraints instead of addressing the opacity directly.
Furthermore, the OT and HG analyses make different predictions about opaque forms with both a pretonic and a posttonic vowel, as in consíguelos (4f). In OT, because MaxLic is solely responsible for harmony here, the pretonic and posttonic vowels are incorrectly predicted to harmonize (or not) in lockstep. But with CrispEdge, the HG analysis correctly allows the posttonic vowel to harmonize without the pretonic vowel, just as in /rekóhelos/, assuming that opacity itself can be overcome (which is not a trivial task, of course).

Table 1. Harmonically bounded candidates.

The following tableaux support the harmonic bounding relationships indicated in Table 1. Harmonically bounded candidates are signaled by ×, and inline graphic marks attested forms. For reasons of space, *[+hi, −ATR] is omitted from the first three tableaux, where it is inactive because there are no high vowels.

(13)

inline graphic

Various double pyramids are evident in 13, but the resulting collective harmonic bounding is more nuanced than it was in the simple case of 11. The tableau in 13a presents a more complete picture than we saw in 12, with an additional candidate and a fuller constraint set. With no lax vowels, *[kómetelo] is the no-harmony lockstep form. As described above, lockstep candidates are typically harmonic bounders in double-pyramid configurations, but not so here. Candidates (c) and (f) (only the latter of which is a lockstep candidate, with full harmony) harmonically bound candidates (d) and (e). Candidates (a) and (b) are neither harmonically bounded nor harmonic bounders. As the only candidate violating Max(−ATR), candidate (a) cannot harmonically bound anything. And because candidate (b) is not rewarded by License, it cannot harmonically bound candidates with greater harmony. Consequently, candidate (c) is the harmonic bounder at the no-harmony end of the double-pyramid configuration. By way of illustration, 14 shows that this candidate can win under the right weights and is therefore not harmonically bounded.

(14)

inline graphic
 

Compared to candidate (b), candidate (c) earns a new +2 from License at the expense of a new −1 from *[−ATR]. This two-for-one trade-off means that when *[−ATR] has less than twice the weight of License, candidate (c) is superior to candidate (b). Candidates (d)–(f) add additional rewards from License at the cost of an equal number of violations of *[−ATR], so when *[−ATR] outweighs License, those candidates are inferior to candidate (c). Consequently, candidate (c) wins when 15 holds.

(15) 2w(License) > w(*[−ATR]) > w(License)
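The condition in 15 can be verified numerically. Because the tableau itself is not reproduced here, the violation profiles below are my reconstruction from the prose: candidate (a) violates only Max(−ATR), candidate (b) has one lax vowel and no License reward, candidate (c) earns +2 at the cost of two *[−ATR] marks, and candidates (d)–(f) add rewards and *[−ATR] marks in equal numbers. The specific weights are hypothetical.

```python
# Reconstructed profiles for the /kómetelos/ candidates (an assumption based on
# the prose description; rewards are positive, violations negative).
TABLEAU = {
    "a": {"Max": -1},
    "b": {"*ATR": -1},
    "c": {"License": 2, "*ATR": -2},
    "d": {"License": 3, "*ATR": -3},
    "e": {"License": 3, "*ATR": -3},
    "f": {"License": 4, "*ATR": -4},
}


def winners(weights):
    """Set of candidates tied for the greatest harmony score."""
    scores = {
        name: sum(weights[c] * v for c, v in marks.items())
        for name, marks in TABLEAU.items()
    }
    best = max(scores.values())
    return {name for name, s in scores.items() if s == best}


# 2w(License) > w(*ATR) > w(License): stressed-syllable harmony only.
print(winners({"License": 2, "*ATR": 3, "Max": 10}))
# w(License) > w(*ATR): exhaustive harmony.
print(winners({"License": 3, "*ATR": 2, "Max": 10}))
# 2w(License) < w(*ATR): only the final vowel is lax.
print(winners({"License": 1, "*ATR": 3, "Max": 10}))
```

On this reconstruction, candidates (d) and (e) reach the top score only when License and *[−ATR] have exactly equal weights, in which case they tie with (c) and (f).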

Returning to 13a, no weights favor candidates (d) and (e). When License outweighs *[−ATR], exhaustive harmony is favored. When License has less than half of *[−ATR]'s weight, candidate (a) or (b) is favored, depending on the weights of *[−ATR] and Max(−ATR). Collective harmonic bounding, then, emerges from the final four candidates in this tableau.

The situation in 13b is nearly identical. With the presence of pretonic vowels, CrispEdge adds its own pyramid of violations, so the double pyramid now pits License against the combination of CrispEdge and *[−ATR]. The resulting harmonic bounding is essentially the same, though, with candidates (c) and (f) harmonically bounding candidates (d) and (e).

In 13c, there are three licit variants and one harmonically bounded candidate. This time we have simple harmonic bounding: candidate (e) has a superset of candidate (d)'s violations. Nonetheless, the familiar double pyramids involving License and *[−ATR] are evident. CrispEdge penalizes the final two candidates, and for this reason candidate (d) is not harmonically bounded: despite the double pyramids, the lockstep candidate (f) violates a constraint that candidate (d) does not.

Finally, 13d and 13e exhibit no harmonic bounding and are included to test NHG's treatment of high vowels.

The conditions under which each licit variant wins are summarized in 16. The logic behind them is the following: License triggers harmony while *[−ATR] discourages it; CrispEdge disfavors pretonic harmony, and *[+hi, −ATR] blocks harmony on high vowels. To prevent harmony on any vowel, the sum of the weights of the relevant antiharmony constraints must exceed the weight of License, except when the vowel in question is stressed, in which case the antiharmony constraints must have double License's weight. Finally, to ensure that even high vowels undergo word-final laxing, Max(−ATR) must outweigh the combination of *[+hi, −ATR] and *[−ATR].

(16) Core weighting requirements

a. Harmony on the stressed syllable only: 2w(License) > w(*[−ATR]) > w(License)

b. Posttonic harmony without pretonic harmony: w(*[−ATR]) + w(CrispEdge) > w(License) > w(*[−ATR])

c. Full harmony: w(License) > w(*[−ATR]) + w(CrispEdge)

d. High vowels: w(Max(−ATR)) > w(*[+hi, −ATR]) + w(*[−ATR]) > 2w(License)

To conclude this section, we have seen which candidates can and cannot be most harmonic in a nonnoisy HG tableau given the constraints used here. NHG adds a stochastic element to this system, creating variation in a tableau's outcome. The next section explains how it accomplishes this.

5. Varieties of noisy harmonic grammar

Constraint-based phonology has fostered many formalisms designed for optionality. In OT optionality is most commonly attributed to the availability of multiple constraint rankings, each of which selects a different output candidate (Anttila 1997, 2006, 2007, Boersma 1998, Boersma & Hayes 2001, Kaplan 2016, Kimper 2011b, Nagy & Reynolds 1997, Reynolds 1994, Riggle & Wilson 2005). (See Coetzee 2006 and Kaplan 2011 for very different OT-based frameworks.) Research has identified a rich set of formal constructs that generate these multiple rankings. For example, stochastic OT (Boersma & Hayes 2001) ranks constraints on a continuous numerical scale, and when noise is added to each constraint's ranking, domination relationships may change.

Stochastic OT's arrangement finds a natural home in HG. Perturbing weights with noise provides an elegant, effective account of phonological variation, as first demonstrated by Jesney (2007). Hayes (2017) observes that NHG can be implemented in several ways, only some of which manipulate weights directly. NHG is therefore not a single formalism, but a family of formalisms with potentially distinct empirical properties.

The remainder of this section is largely a summary of Hayes's survey of NHG implementations. These possibilities are summarized in 17.

(17)

a. Noise at the constraint level

(i) Noise added before multiplication of penalties by weights: penalty * (weight + noise)

(ii) Noise added after multiplication of penalties by weights, no noise allowed if penalty = 0: (penalty * weight) + noise

(iii) Noise added after multiplication of penalties by weights, noise allowed if penalty = 0: (penalty * weight) + noise

b. Noise at the cell level

(i) Noise added before multiplication of penalties by weights: penalty * (weight + noise)

(ii) Noise added after multiplication of penalties by weights, no noise allowed if penalty = 0: (penalty * weight) + noise

(iii) Noise added after multiplication of penalties by weights, noise allowed if penalty = 0: (penalty * weight) + noise

c. Noise at the candidate level

d. MaxEnt

Hayes identifies three levels at which noise can be introduced: the constraint level (17a), the cell level (17b), and the candidate level (17c). Constraint-level noise assigns each constraint a particular noise value that remains constant for all candidates. Visualized in a tableau, this means that every cell in a constraint's column is affected by the same noise. In contrast, cell-level noise adds noise to each cell of a tableau independently of every other cell. Finally, when noise is added at the candidate level, each candidate is assigned noise independently of every other candidate, and that noise perturbs the candidates' harmony scores directly. In procedural terms, we compute harmony scores as in nonnoisy HG and then perturb these scores.

Constraint-level and cell-level noise submit to further refinements. In these systems, noise affects the multiplication of penalties/rewards by weights directly. We can add this noise before multiplication, as in penalty * (weight + noise), or after, as in (penalty * weight) + noise. In the former case, if no penalty/reward is assigned, the product of the multiplication is zero and it does not contribute to the candidate's overall score, just as in nonnoisy HG. But in the latter case, because noise is added after multiplication, we end up with a nonzero contribution to the overall score even if the constraint assigns no penalties or rewards (unless the noise itself is zero). For this reason, Hayes considers two versions of postmultiplicative noise, one in which noise is added in all cases, and one in which empty cells are exempt from noise (so that zeros remain zeros).
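The options in 17a–c can be made concrete with a short Python sketch. It is a schematic rendering of the prose above, not OTSoft's implementation: the function name `noisy_scores`, its parameters, and the convention of storing rewards as positive and penalties as negative cell values are all assumptions made for illustration, and the avoidance of negative weights is not modeled.

```python
import random

def noisy_scores(tableau, weights, level, timing="pre", noise_on_zero=True,
                 sd=1.0, rng=random):
    """Return one noisy harmony score per candidate.

    tableau: {candidate: [signed cell values]} (reward > 0, penalty < 0)
    level:   "constraint" (17a), "cell" (17b), or "candidate" (17c)
    timing:  "pre" = value * (weight + noise); "post" = value * weight + noise
    """
    # Constraint-level noise: one draw per constraint, shared by all candidates.
    column_noise = [rng.gauss(0, sd) for _ in weights]
    scores = {}
    for cand, cells in tableau.items():
        if level == "candidate":
            # 17c: compute the plain HG score, then perturb that score once.
            plain = sum(v * w for v, w in zip(cells, weights))
            scores[cand] = plain + rng.gauss(0, sd)
            continue
        total = 0.0
        for j, (v, w) in enumerate(zip(cells, weights)):
            # Cell-level noise: a fresh, independent draw for every cell.
            noise = column_noise[j] if level == "constraint" else rng.gauss(0, sd)
            if timing == "pre":
                total += v * (w + noise)   # an empty cell contributes zero
            elif v != 0 or noise_on_zero:
                total += v * w + noise     # noise survives even if v == 0
            # otherwise: an empty cell is exempt from noise and stays zero
        scores[cand] = total
    return scores
```

With `sd=0.0` every variety reduces to nonnoisy HG, which is a convenient sanity check on the sketch.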

A digression on postmultiplicative constraint-level noise is warranted because this turns out to be an ill-formed version of NHG that produces variation only under special circumstances. When noise is added even to empty cells, the effect is the same as computing harmony scores as in nonnoisy HG and then generating a random number that is added to all scores. That is, scores are all adjusted up or down in unison with no effect on the outcome; optionality does not result. When noise is excluded for zero violations/rewards, variation is possible only when one candidate violates a constraint and another does not. In the simulations reported in §6, noise was prevented from creating negative weights, and that mechanism can provide postmultiplicative constraint-level noise with another source of variation. Exploratory work with these versions of NHG showed that they are incapable of producing the full set of variants for Eastern Andalusian, generating a sufficiently impoverished output set that it is not clear to me that they qualify as theories of optionality at all; certainly negative-weight avoidance is a conceptually obtuse place for a theory of optionality to hang its hat. I do not consider them further.
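The first point of this digression, that constraint-level noise added after multiplication to every cell cannot change the outcome, can be illustrated with a small Python sketch. The tableau, the weights, and the function name `winner_with_shared_post_noise` are hypothetical; the sketch only assumes Gaussian noise and signed cell values (rewards positive, penalties negative).

```python
import random

def winner_with_shared_post_noise(tableau, weights, rng):
    # One noise value per constraint, added to every cell in that column,
    # including empty cells. Summed over constraints, this is a single offset
    # that is identical for every candidate.
    noise = [rng.gauss(0, 1) for _ in weights]
    offset = sum(noise)
    scores = {cand: sum(v * w for v, w in zip(cells, weights)) + offset
              for cand, cells in tableau.items()}
    return max(scores, key=scores.get)

# Hypothetical three-candidate tableau: [License reward, *[-ATR] penalty].
tableau = {"a": [0, 0], "b": [1, -1], "d": [2, -2]}
rng = random.Random(0)
outcomes = {winner_with_shared_post_noise(tableau, [2.0, 1.5], rng)
            for _ in range(1000)}
print(outcomes)  # a single candidate: the noise never changes the winner
```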

To these formalisms we can add MaxEnt (17d), which is grounded in HG and derives variation by assigning each candidate an output probability proportional to the exponential of its harmony score. See Hayes 2017, for example, for the formal details.
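As a rough illustration of how MaxEnt converts harmony scores into probabilities, the following Python sketch exponentiates and normalizes a set of scores. The scores are hypothetical, the name `maxent_probs` is an assumption, and the sketch adopts the convention that a higher score is better (rewards positive, penalties negative).

```python
import math

def maxent_probs(scores):
    """Map harmony scores to probabilities proportional to exp(score)."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

# Hypothetical double-pyramid scores where License outweighs *[-ATR] by 0.5.
probs = maxent_probs({"a": 0.0, "b": 0.5, "c": 0.5, "d": 1.0})
for cand, p in probs.items():
    print(cand, round(p, 3))
```

Every candidate receives a nonzero probability, and candidates with intermediate scores receive intermediate probabilities.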

We therefore have six implementations of NHG, some of which are known to be closely related. Hayes (2017:9, n. 10) argues that variety 17c (candidate-level noise) is largely equivalent to variety 17b(iii) (postmultiplicative cell-level noise where all cells receive noise). Likewise, Flemming (2017) shows that MaxEnt is virtually identical to versions of NHG in which the noise that affects one candidate is independent of the noise that affects other candidates, as is the case in cell- and candidate-level noise. The results in §6 are largely consistent with Hayes's and Flemming's arguments; where these similar frameworks diverge from each other, it is unclear whether that reflects a substantive difference or merely an artefact of stochastic systems.

The first implementation, 17a(i), is unique among the frameworks in 17. Hayes calls this classical NHG, and we will see that it provides the best model of Eastern Andalusian's harmony. This version of NHG is most faithful to the OT-based theories of optionality described above. Constraint weights are manipulated at the outset of the evaluation, just as stochastic OT, for example, manipulates the ranking at the outset. Because these OT-based theories merely manipulate the constraint ranking, harmonically bounded candidates remain off the table (because by definition, no ranking favors those candidates). Likewise, classical NHG merely manipulates weights and consequently also cannot produce harmonically bounded outputs (Hayes 2017).

To be more precise, with only positive weights, a harmonically bounded candidate's best hope is to tie with its bounder(s). For 11, repeated in 18, this occurs when the two constraints have equal weights, in which case all five candidates have identical scores. One way to resolve this tie is to randomly select a winner from the tied candidates; this is the method used in simulations presented below. So the only chance the nonlockstep candidates have under classical NHG is that the constraint weights are equal after the addition of noise, a vanishingly improbable outcome. Indeed, in none of my simulations with classical NHG did a harmonically bounded candidate win.

(18) [tableau not reproduced]

The exclusion of harmonically bounded outputs is the key to classical NHG's success regarding Eastern Andalusian, and this property is shared by no other framework in 17. All versions of NHG formalize a continuous and gradient view of optionality. Constraint weights and the perturbations introduced by NHG use the continuous number line, and NHG supplies gradient output probabilities for candidates. Harmonic bounding under classical NHG aside, all candidates are possible outputs, the distinction between licit and illicit ones being that the latter ideally have probabilities indistinguishable from zero. Classical NHG combines this system with the discrete distinctions that harmonic bounding provides. Thus classical NHG has two methods for excluding unattested outputs, both of which are illustrated by 13a: candidates (d) and (e) are harmonically bounded, but candidates (a) and (b) are not, so the grammar must minimize their probability of winning. Only the second strategy is available to nonclassical NHG, and as the next section demonstrates, that is insufficient.

6. Simulations

This section reports the results of simulations using the HG analysis and the varieties of NHG discussed above. These simulations were executed with OTSoft version 2.6 (Hayes et al. 2013) using the settings in 19.⁶ Because no actual frequency data for the possible harmonic variants is available, all licit outputs were assigned equal target frequencies. This turns out to influence the specific frequencies returned by the simulations but not their ability to sort the attested outputs from the unattested ones; see §7 for discussion.

(19)

a. Number of times to go through forms:⁷ 200,000

b. Initial plasticity: 1

c. Final plasticity: 0.001

d. Number of times to test the grammar: 100,000

OTSoft draws noise from a Gaussian distribution with a standard deviation of 1. For simplicity, negative constraint weights were disallowed in the simulations reported here.⁸ The input file consisted of the underlying forms, candidates, and constraints from Table 1 and the tableaux in 13.

The graphs below show the output frequency of each candidate in a particular simulation. Attested outputs (always at the top of each graph) are indicated with black bars and unattested ones with gray bars. The simulations were run multiple times to ensure that the results reported here are not anomalous. Each graph shows a representative iteration of each simulation, and where warranted, properties of other iterations are discussed. See the appendix for the full list of output frequencies for each simulation and for the constraint weights returned by the simulations.

A successful simulation is one that produces attested forms with greater frequency than unattested forms, ideally with zero or near-zero frequencies for the latter. Without actual frequency data, more precise assessment of frequencies is impossible. Fortunately, evaluation of the simulations is straightforward: only classical NHG provides remotely satisfactory results. All other simulations overgenerate, producing certain unattested forms at least as frequently as some attested forms.

6.1. Classical NHG

In the classical NHG (constraint-level premultiplicative noise; 17a(i)) simulation shown in Figure 1, all of the attested candidates have nonzero frequencies, and the unattested ones have frequencies of exactly zero. This version of NHG often errs by producing *[kɾísi], but no more than a handful of times per iteration: often fewer than ten and at most a few dozen (out of 100,000 trials). However, iterations like the one shown here, with no errors, are also common. No other simulation matches this. Increasing the number of learning trials seems to reduce the error rate, but even at one million cycles *[kɾísi] occasionally emerges. Classical NHG effectively eliminates all unattested candidates whether or not they are harmonically bounded.

Figure 1. Results of simulations under variety 17a(i).

Two results in Fig. 1 recur in all simulations presented here. First, the lone attested output for /kɾísis/ is produced (nearly) exclusively. Second, for the other inputs, the candidates with no lax vowels are never produced. Because the outcome for /kɾísis/ does not change substantially across simulations, it is not discussed further in this section, and it is omitted from subsequent graphs.

6.2. Other simulations

The results of the remaining simulations are shown in Figures 2–6. In each one, the five unattested harmonically bounded forms identified in Table 1 are produced at least as frequently as some of the attested forms; these simulations fail to adequately distinguish attested and unattested forms, and §7 explores why this is. MaxEnt produced additional incorrect forms.

Figure 2. Results of simulations under variety 17b(i).

Figure 3. Results of simulations under variety 17b(ii).

Figure 4. Results of simulations under variety 17b(iii).

Figure 5. Results of simulations under variety 17c.

Figure 6. Results of simulations under MaxEnt (17d).

7. Discussion

This section examines the six implementations of NHG in more detail in order to understand why the reported output patterns obtain. The collective harmonic bounding attributable to the double-pyramid configuration is directly responsible for those patterns: it renders classical NHG incapable of generating the unattested candidates, while nonclassical NHG must produce them if it produces the attested ones. These candidates are inextricably linked, and efforts to manipulate the attested forms' output probabilities independently of those of the unattested candidates are futile under nonclassical NHG.

7.1. Bounded outputs

The distinction between classical and nonclassical NHG regarding Eastern Andalusian rests on the double-pyramid configurations we saw in 13. Tableau 20 repeats one of those tableaux, showing only the candidates and constraints from which the double pyramids emerge.

(20) [tableau not reproduced]

The three possible outcomes of this tableau are summarized in 21. When one constraint outweighs the other, the harmonic bounder that best satisfies the dominating constraint wins, and the other harmonic bounder has the worst harmony score because it performs most poorly on that constraint. The harmonically bounded candidates have harmony scores between the harmonic bounders. If the two constraints have identical weights, all four candidates have scores of zero, and a winner is randomly selected.

(21) Outcomes of tableau 20

a. wL > wA → (d) wins; (b) and (c) have better scores than (a)

b. wL < wA → (a) wins; (b) and (c) have better scores than (d)

c. wL = wA → four-way tie

If the unperturbed weights of License and *[−ATR] are sufficiently similar, classical NHG's perturbations will generate both 21a and 21b. The weights that the classical NHG simulation yielded are given in 22; License and *[−ATR] are indeed almost identically weighted. Only 21a and 21b are available because tied weights are effectively impossible in classical NHG, so this theory necessarily produces just the attested forms.

(22) 46.000 Max(−ATR)
27.000 *[+hi, −ATR]
11.655 License
11.345 *[−ATR]
 0.251 CrispEdge
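To see how weights as close as those in 22 yield both 21a and 21b but never a harmonically bounded winner, classical NHG can be simulated on tableau 20 in isolation. The Python sketch below is a simplification under stated assumptions: the profiles are inferred from 21 (each candidate earns as many License rewards as it incurs *[−ATR] violations), noise is Gaussian with a standard deviation of 1, negative weights are avoided by clamping at zero (one possible mechanism, not necessarily OTSoft's), and the names `TABLEAU_20` and `classical_trial` are hypothetical.

```python
import random

# (License rewards, *[-ATR] violations); inferred from 21, an assumption.
TABLEAU_20 = {"a": (0, 0), "b": (1, 1), "c": (1, 1), "d": (2, 2)}
W_LICENSE, W_ATR = 11.655, 11.345  # weights reported in 22

def classical_trial(rng, sd=1.0):
    # Classical NHG: perturb each constraint's weight once per evaluation.
    wl = max(0.0, W_LICENSE + rng.gauss(0, sd))
    wa = max(0.0, W_ATR + rng.gauss(0, sd))
    scores = {c: r * wl - v * wa for c, (r, v) in TABLEAU_20.items()}
    best = max(scores.values())
    tied = [c for c, s in scores.items() if s == best]
    return rng.choice(tied)  # ties are resolved by random selection

rng = random.Random(1)
counts = {c: 0 for c in TABLEAU_20}
for _ in range(100_000):
    counts[classical_trial(rng)] += 1
print(counts)
```

In this sketch candidates (b) and (c) can win only if the two perturbed weights are exactly equal, so their counts stay at zero while (a) and (d) share all of the trials.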

Things are very different for nonclassical NHG. In all nonclassical frameworks, output probabilities are correlated with (unperturbed) harmony scores. This is obviously true for MaxEnt, and under candidate-level noise and postmultiplicative cell-level noise, noise is added after computation of harmony scores, so candidates with better unperturbed scores are more likely to come out ahead after noise's addition. (The choice to add noise or not for empty cells is irrelevant to 20, though it is a meaningful choice in the full tableaux, where there are empty cells.) In each of these frameworks, because candidates (b) and (c) have scores between (a) and (d) in 21a and 21b, they must be more probable than either (a) or (d).
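The correlation between unperturbed scores and output probabilities can be illustrated for candidate-level noise. The Python sketch below is schematic: the weights 12.0 and 11.5 are hypothetical values chosen so that License outweighs *[−ATR], the profiles for tableau 20 are inferred from 21, noise is Gaussian with a standard deviation of 1, and the names `TABLEAU_20` and `candidate_level_trial` are assumptions.

```python
import random

# (License rewards, *[-ATR] violations); inferred from 21, an assumption.
TABLEAU_20 = {"a": (0, 0), "b": (1, 1), "c": (1, 1), "d": (2, 2)}

def candidate_level_trial(w_license, w_atr, rng, sd=1.0):
    # Compute each plain HG score, then add independent noise per candidate.
    scores = {c: r * w_license - v * w_atr + rng.gauss(0, sd)
              for c, (r, v) in TABLEAU_20.items()}
    return max(scores, key=scores.get)

rng = random.Random(2)
counts = {c: 0 for c in TABLEAU_20}
for _ in range(50_000):
    counts[candidate_level_trial(12.0, 11.5, rng)] += 1
print(counts)
```

With these hypothetical weights, the bounded candidates (b) and (c) win more often than the attested candidate (a), because their unperturbed scores lie between those of (a) and (d).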

The same goes for premultiplicative cell-level noise, though it may be less obvious. A revised version of 20 showing premultiplicative cell-level noise is given in 23; alphabetical indices represent cells' assigned noise values, and harmony scores are shown in a format that highlights the contribution of noise. With unequal weights, either (a) or (d) is favored as usual, and those weights must be similar enough to allow the other of those candidates to win under the right collection of noise values. But that also opens the door for candidates (b) and (c).

(23) [tableau not reproduced]

The nonclassical simulations presented above bear out the situations just described. License outweighs *[−ATR] in each case (see appendix Table A2), and the four candidates in 20 have output frequencies correlated with the extent of their posttonic harmony.

Equal weights lead to a four-way tie in nonclassical NHG except under cell-level premultiplicative noise, which is addressed in the next paragraph.⁹ The four candidates consequently have identical output probabilities, an improvement over unequal weights because neither attested form is less probable than the unattested ones. But this situation cannot be maintained across all of the relevant words. The four-way tie for 20 requires identical weights for License and *[−ATR], but /monedéros/, where the action is pretonic, requires License's weight to equal the sum of *[−ATR]'s and CrispEdge's weights. It is impossible to satisfy both criteria simultaneously unless CrispEdge's weight is zero, in which case pretonic and posttonic harmony cannot be distinguished, impairing the treatment of /rekóhelos/. With these inputs pulling in incompatible directions, the four-way-tie scenario is unavailable.

Tied weights under cell-level premultiplicative noise are a bit different. A U-shaped distribution emerges here (upsilonism), with harmonically bounded candidates being less probable than the harmonic bounders, because each harmonic bounder benefits more when unusually small noise values are assigned to the constraint that it violates most severely (Hayes 2017). Upsilonism would render the attested forms more probable than the unattested ones, but as usual, competing pressures across different inputs undermine this possibility, and indeed upsilonism is not evident in Fig. 2. (To probe upsilonism further, I submitted the tableau in 20 to OTSoft. Upsilonism emerged occasionally, but not dramatically: the attested candidates had frequencies just over 0.25 each, and the unattested ones had frequencies just under 0.25 each. So even in the best-case scenario, upsilonism's advantage is minuscule.)

Because nonclassical NHG cannot categorically rule out illicit candidates, constraints must be weighted in a way that makes those candidates sufficiently improbable. As we have just seen, this can be impossible. However, these simulations successfully exclude unattested candidates not involved in the double pyramids, showing that this gradient approach to ungrammaticality sometimes suffices. Nonclassical NHG simply does not provide a comprehensive account of ungrammaticality. The categorical exclusion of candidates provided by harmonic bounding is essential, and therefore only classical NHG can separate the wheat from the chaff in situations like 20.

Importantly, the foregoing results are independent of target output frequencies. Because frequency information for Eastern Andalusian's surface variants is unavailable, the OTSoft input files assigned every licit candidate a frequency of 1 (which is interpreted by OTSoft as equal frequencies for all attested forms for a particular input), and OTSoft naturally tried to match those arbitrary frequencies. But as we have just seen, nonclassical NHG cannot assign greater frequencies to all of the attested candidates than to the unattested candidates, surely a necessary criterion whatever the actual output frequencies are. In contrast, in classical NHG, because the choice between the two attested forms depends only on the relative weights of License and *[−ATR], output frequencies can be manipulated by assigning a greater starting weight to the constraint favoring the more frequent form. To verify this, I ran the classical NHG simulation again using only /kómetelos/.¹⁰ Arbitrary target frequencies of 0.25 and 0.75 were assigned to [kɔ́metelɔ] and [kɔ́mɛtɛlɔ], respectively. OTSoft returned weights of 13.472 and 12.528 for License and *[−ATR], respectively, and the target output frequencies were generated nearly exactly: 0.253 for [kɔ́metelɔ] and 0.747 for [kɔ́mɛtɛlɔ].
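The reported frequencies can be cross-checked analytically under simplifying assumptions. If the choice between the two attested forms depends only on whether License's perturbed weight exceeds that of *[−ATR], and if each weight receives independent Gaussian noise with a standard deviation of 1, then the difference between the two perturbed weights is Gaussian with standard deviation √2, and the probability of full harmony follows from the normal cumulative distribution function. The Python sketch below computes that value; the function name `prob_first_exceeds` is hypothetical, and the other candidates for /kómetelos/ are assumed to have negligible probability.

```python
import math

def prob_first_exceeds(w1, w2, sd=1.0):
    """P(w1 + n1 > w2 + n2) for independent Gaussian noise n1, n2 with the given sd."""
    z = (w1 - w2) / (sd * math.sqrt(2))  # standardized weight difference
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z

# Weights reported for License and *[-ATR] in the /kómetelos/-only simulation.
p_full = prob_first_exceeds(13.472, 12.528)
print(p_full, 1 - p_full)
```

Under these assumptions the computed probabilities fall within a few thousandths of the simulated frequencies of 0.747 and 0.253.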

To summarize, we can strengthen Hayes's (2017) demonstration that nonclassical NHG can produce harmonically bounded outputs in a double-pyramid configuration: it must do so if it also produces the harmonic bounders. Because the harmonic bounders' violation profiles are mirror images of each other, weights that favor one of these candidates disadvantage the other to a greater extent than they disadvantage the intermediate forms.

What about the remaining candidates for /kómetelos/, *[kómetelo] and *[kómetelɔ]? These candidates are not harmonically bounded and cannot be categorically excluded under any version of NHG. A high weight for Max(−ATR) gives *[kómetelo] an output probability near zero in every simulation. This strategy is unavailable for *[kómetelɔ], which is penalized/rewarded only by constraints that also affect the licit outputs. Consequently, MaxEnt inevitably assigns it a nonnegligible probability. In other versions of NHG, it wins when *[−ATR]'s penalty is more than twice License's reward; those constraints' unperturbed weights never yield that outcome on their own and are sufficiently large that noise, however it is added, is unlikely to manufacture it.

The other words used in the simulations behave similarly in core respects, but each deviates from /kómetelos/ in one way or another. CrispEdge adds more antiharmony pressure for /monedéros/, so harmony on unstressed syllables generally occurs less often here, but otherwise this word presents NHG with the same harmonic-bounding configuration as /kómetelos/. Likewise, /rekóhelos/ has the familiar double pyramids, but this time CrispEdge penalizes a subset of the pyramid candidates and therefore alters the harmonic bounding relationships. Of those candidates only *[rɛkɔ́helɔ] is unattested; it is harmonically bounded and therefore inaccessible to classical NHG but unavoidable for nonclassical NHG.

Because /kɾísis/ has just one licit output, the other two candidates can be largely avoided if (i) Max(−ATR) dominates everything (triggering final laxing) and (ii) *[+hi, −ATR] dominates everything except Max(−ATR) (blocking harmony). This is possible in all versions of NHG, and all modeled this word well. Crucially, nothing in the other words conflicts with this arrangement, and a high weight for Max(−ATR) is supported by all five inputs. Furthermore, /kotiʒónes/ reinforces the need for a high weight for *[+hi, −ATR].

Classical NHG, then, is ideally suited for Eastern Andalusian. Many illicit candidates (but no licit ones) are harmonically bounded, rendering them inaccessible. The remaining illicit candidates are straightforwardly avoided with proper weights. Other versions of NHG can do some of this work, but because of the way these frameworks interact with collective harmonic bounding, they cannot do all of it. An anonymous referee suggests that the poor performance of nonclassical NHG may be attributable to the learning algorithms used by OTSoft, and with improved algorithms may come improved performance. This is of course a fair point, but the reasoning articulated here suggests that the learning algorithms are not to blame—these frameworks simply do not have access to the necessary means of excluding candidates categorically.

7.2. Alternative accounts

This section considers alternatives to the analysis presented above, first in constraint-based theories, then in rule-based phonology. An anonymous referee asks if better results under nonclassical NHG could be achieved with different constraints. This possibility cannot be ruled out, of course, but there are reasons to be skeptical. This section considers and rejects some options.

A straightforward way to rescue nonclassical NHG involves a new constraint that penalizes just the harmonically bounded candidates. Tableau 24 exemplifies the strategy with a constraint called *Bounded.

(24) [tableau not reproduced]

A sufficiently high weight for *Bounded can effectively eliminate candidates (b) and (c) in the same way Max(−ATR) deals with *[kómetelo]. But *Bounded is obviously an unprincipled constraint, and satisfactory substitutes are not readily available. A constraint requiring ATR agreement between nonfinal posttonic vowels would suffice, but that merely stipulates the desired outcome and does not belong to any existing family of harmony-driving constraints. (Harmony that is confined to a smaller domain than a whole word exists, of course, but Eastern Andalusian's nonfinal posttonic vowels do not constitute an independently identifiable domain such as a foot or a stem.) Some other possibilities clearly fail: *e and *ɛ would also penalize one of the attested candidates (and to a greater degree than they penalize the unattested ones); a constraint against discontiguous harmony domains (e.g. Walker's (2011) *Duplicate or Kimper's (2012) *Gap) would also penalize [kɔ́metelɔ]; a *[−ATR] constraint for nonfinal posttonic vowels would also penalize [kɔ́mɛtɛlɔ].

The problem is compounded when the exercise broadens to include the other configurations that give nonclassical NHG trouble. No principled constraint or set of constraints seems to be available to penalize the five pernicious harmonically bounded candidates. To test *Bounded's efficacy, though, I ran each nonclassical simulation again with *Bounded penalizing just those five candidates. Those outputs were successfully eliminated (though MaxEnt's other errors remained). Under versions of *Bounded that penalize a larger set of candidates which an actual constraint could plausibly target (say, all those in which nonfinal unstressed vowels do not behave uniformly), the five problematic candidates reemerged as outputs because *Bounded cannot be weighted high enough to eliminate those forms without also eliminating attested candidates. Whether or not an appropriate constraint is eventually identified, the larger point remains: nonclassical NHG struggles to replicate what comes for free under classical NHG.

What about a more drastic reassessment? As discussed in §3, Eastern Andalusian shows variation between three harmony systems that surface as invariant patterns in other languages. An analysis of Eastern Andalusian should formalize this connection; that Eastern Andalusian, Central Veneto, Lena, and Servigliano are related (Romance) languages only adds urgency to that objective. Positional licensing accomplishes this by offering a uniform explanation for the three patterns: they all enhance the prominence of the harmonizing feature, and the choice among them depends on which configuration in 16a–c is adopted. In fact, the HG-based positional licensing in 8 improves upon its OT-based counterparts by producing maximal harmony without a separate MaxLic constraint. From this perspective, the question is not what constraint set best formalizes the harmony patterns at issue (positional licensing does that very well) but which means of fostering optionality allows positional licensing to produce all and only the three harmony patterns at once. Of the frameworks considered here, only classical NHG does this.¹¹

Because nonclassical NHG necessarily overgenerates when confronted with the double pyramids, it requires a constraint set that does not foster that configuration. That is, nonclassical NHG can account for Eastern Andalusian only by eradicating the structured nature of the candidate set that is revealed by the double pyramids. These candidates cannot surface as the result of a single constraint formalism, and nonclassical NHG therefore demands that we sacrifice the formal connections between different licensing patterns.

Even traditional categorical positional licensing, which does not itself create a pyramid of violations (see 5–7), does not escape this conundrum. Recall that this formalism requires MaxLic, which penalizes each disharmonic vowel, to produce pretonic harmony. MaxLic and *[−ATR] create the double pyramids, and nonclassical NHG's problem reemerges.

Rule-based phonology lacks the concept of harmonic bounding but nonetheless encounters a familiar hurdle regarding Eastern Andalusian. It is common to invoke optional rules—rules that do not always apply when their structural descriptions are met—to account for optionality. For example, Dell (1973) employs rules of this sort in an analysis of schwa deletion and various other phenomena in French. Schwa deletion is locally optional, so (to a first approximation) a schwa may delete or not independently of the behavior of other schwas in the form. Consequently, the choice to apply the rule is made separately for each schwa, not once and for all per word, derivation, and so forth. Similarly, Cedergren and Sankoff (1974) (following Labov 1969) observe that the likelihood that an optional rule will apply is influenced by phonological elements in the environment: some items facilitate a rule's application, and others discourage it. Building on Labov's optional-rule formalism, they attach probabilities to these influential items in a rule's structural description. When an opportunity for the rule to apply arises, its likelihood of application is influenced by the probabilities associated with the particular elements (not) found in the environment around the locus of application. Though Cedergren and Sankoff do not address this point explicitly, their formalism seems to support a theory of optional rules resembling Dell's: because two different loci of application in a form may be accompanied by different contextual elements, the probability of a rule's application is independent for each locus. That is, their rules invite local optionality.

But in Eastern Andalusian, not all loci behave independently. An optional rule that triggers harmony on one nonfinal posttonic vowel must apply to all such vowels. Two ways of ensuring this are apparent. First, alongside Dell/Cedergren and Sankoff-style rules, we might adopt a second formalism in which the choice to apply the rule is made globally: once a rule is turned on, it cannot be turned off. Alternatively, the harmony rule can be constructed to apply to multiple loci simultaneously, as in the formalism of Chomsky & Halle 1968. Rather than a simple vowel-harmony rule like 25a, this would require an elaboration like 25b plus a stipulation that the rule target all possible loci when activated. Either way, another rule would be necessary for pretonic vowels, and this pretonic rule must be more complex to ensure that pretonic harmony does not occur without posttonic harmony, perhaps by requiring the absence of nonhigh [+ATR] posttonic vowels or by duplicating the effect of the posttonic rule and targeting those vowels again. Linear rules are shown in 25, but the same issues arise in other formalisms, such as autosegmentalism.

(25) [inline graphic: linear formulations of the harmony rule, 25a and 25b]
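The difference between locally and globally optional rule application can be made concrete with a small sketch. It is an illustration only: the function names, the single application probability `p`, and the encoding of each locus as a boolean (True = the vowel harmonizes) are assumptions introduced here, not part of any of the formalisms cited above.

```python
from itertools import product

def local_outputs(n_loci, p):
    """Local optionality: each locus applies the rule independently with probability p."""
    dist = {}
    for pattern in product([True, False], repeat=n_loci):
        prob = 1.0
        for applied in pattern:
            prob *= p if applied else 1 - p
        dist[pattern] = prob
    return dist

def global_outputs(n_loci, p):
    """Global optionality: one decision per word, so the rule applies everywhere or nowhere."""
    return {(True,) * n_loci: p, (False,) * n_loci: 1 - p}
```

With two nonfinal posttonic loci and p = 0.5, `local_outputs` assigns 0.25 to each of four patterns, including the mixed patterns that correspond to unattested forms like *[kɔ́mɛtelɔ]. `global_outputs` returns only the two all-or-nothing patterns, which is the distribution Eastern Andalusian requires.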
 

The challenge posed by Eastern Andalusian's harmony appears to be universal. Like nonclassical NHG, rule-based phonology lacks the built-in prohibition on partial posttonic (or pretonic) harmony that harmonic bounding provides; these theories require unwieldy or unsound elaborations at best and simply cannot capture the facts at worst.

7.3. Local optionality and classical NHG

In essence, the defect of nonclassical NHG and rule-based formalisms is that they produce local optionality and are therefore incompatible with Eastern Andalusian's harmony. What, then, does classical NHG have to say about local optionality? Working in OT, Kaplan 2016 demonstrates that position-specific constraints can relieve harmonic bounding (much like CrispEdge does in 13c), and these constraints allow different loci to be manipulated independently, thus producing local optionality even in frameworks that respect harmonic bounding. One case study in that work comes from Pima, where reduplication marks plurals (Munro & Riggle 2004, Riggle 2006). In compounds, any combination of stems can be reduplicated as long as at least one is. The most elaborate example given by Munro and Riggle is 26. With five stems, thirty-one plural variants are possible. The variant in which every stem reduplicates is given in 26b; the remainder can be constructed by removing the underlined reduplicant from one or more (but not all) stems.

(26) [inline graphic: Pima compound with five stems; 26b shows the variant in which every stem reduplicates]
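The count of thirty-one follows from simple enumeration: each of the five stems either reduplicates or does not, and the one pattern with no reduplication is excluded. A minimal sketch, in which the function name and the boolean encoding of stems (True = reduplicated) are illustrative assumptions:

```python
from itertools import product

def plural_variants(n_stems):
    """All reduplication patterns over n_stems in which at least one stem reduplicates."""
    return [p for p in product([True, False], repeat=n_stems) if any(p)]

variants = plural_variants(5)
print(len(variants))  # 31, i.e. 2**5 - 1
```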
 

Max-BR penalizes each unreduplicated stem, and Contiguity penalizes each reduplicated stem; these constraints generate the double pyramids and accompanying harmonic bounding. But this harmonic bounding is alleviated when these constraints are decomposed into position-specific families. The analysis in Kaplan 2016 rests on Max-BR and Contiguity constraints that target stems in specific prosodic positions: the stem bearing primary stress, prosodic word-initial stems, and so forth. Any stem can be singled out for reduplication if the Max-BR constraint for its position outranks the corresponding Contiguity, and ranking permutation makes available at least one ranking for each combination of reduplicated stems—no combination is harmonically bounded after all.

Because any OT ranking has an HG equivalent (Legendre et al. 2006, Prince & Smolensky 1993 [2004], Prince 2003), this system also permits a classical NHG analysis. I submitted the Kaplan 2016 analysis of 26 to OTSoft. Classical NHG produces all thirty-one variants while excluding the illicit no-reduplication candidate. Just like OT, classical NHG is compatible with local optionality as long as the constraint set leaves no attested variant harmonically bounded. Unlike the efforts to reconcile nonclassical NHG and rule-based formalisms with Eastern Andalusian in §7.2, the OT/classical NHG analysis of Pima uses only constraints referring to prominent positions that are known to be active in phonological systems (Barnes 2006, Beckman 1999, Kaplan 2015, Smith 2005, Walker 2011).

8. Conclusion

Under all versions of NHG, the grammar can be arranged in a way that makes certain candidates extremely unlikely outputs. This is a continuous view of (un)grammaticality and variants' probabilities: an ungrammatical output is one that is sufficiently improbable. Eastern Andalusian's harmony indicates that this is inadequate. Constraints create structure within the candidate set via the violations they assign, and it may be impossible to privilege attested outputs while suppressing related unattested ones in the way just described. A violation of *[−ATR] cannot be fatal to one candidate and harmless to another, for example. Fortunately, the structure created by these constraints makes available a second way to avoid generating unwanted surface forms—harmonic bounding. Sometimes the boundary between the grammatical and the ungrammatical is discrete.
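The contrast between gradient and categorical exclusion can be sketched in a toy simulation. Everything in it is an assumption made for illustration: the two unnamed constraints, the violation profiles, the weights, and the choice of candidate-level noise as a stand-in for a nonclassical implementation. It is not a reconstruction of the OTSoft simulations reported in this article. The partial-harmony candidate's violation profile lies exactly between those of the other two candidates, so its harmony is always the average of theirs under any single set of weights.

```python
import random

# Hypothetical violation profiles on two constraints (C1, C2).
# partial_harmony is listed last so that max(), which returns the first
# maximal item, never hands it a win on an exact tie.
CANDIDATES = {
    "no_harmony": (2, 0),
    "full_harmony": (0, 2),
    "partial_harmony": (1, 1),
}
WEIGHTS = (5.0, 4.0)  # illustrative values only

def harmony(violations, weights):
    """Weighted sum of violations, negated so that higher is better."""
    return -sum(w * v for w, v in zip(weights, violations))

def classical_winner(rng, sd=1.0):
    """Noise perturbs the weights once; all candidates share the noisy weights."""
    noisy = tuple(w + rng.gauss(0, sd) for w in WEIGHTS)
    return max(CANDIDATES, key=lambda c: harmony(CANDIDATES[c], noisy))

def candidate_noise_winner(rng, sd=1.0):
    """Noise perturbs each candidate's harmony score independently."""
    scores = {c: harmony(v, WEIGHTS) + rng.gauss(0, sd)
              for c, v in CANDIDATES.items()}
    return max(scores, key=scores.get)

def tally(pick, trials=10_000, seed=0):
    """Count how often each candidate wins over a number of evaluations."""
    rng = random.Random(seed)
    counts = dict.fromkeys(CANDIDATES, 0)
    for _ in range(trials):
        counts[pick(rng)] += 1
    return counts
```

Calling `tally(classical_winner)` never selects `partial_harmony`, because a candidate whose score is the average of two others cannot strictly beat both. Calling `tally(candidate_noise_winner)` selects it some of the time, and adjusting the weights can make that outcome rarer but cannot rule it out.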

Classical NHG succeeds because it incorporates both gradient and categorical formalizations of (un)grammaticality. This is not to say that nonclassical versions of NHG should be discarded—other phenomena may support an alternative or be compatible with multiple NHG frameworks. In this vein, Zuraw and Hayes (2017) compare MaxEnt and classical NHG analyses of three phenomena and conclude that while they outperform non-NHG analyses, they are too similar to each other to decide which is superior. Each of these frameworks makes certain kinds of output patterns (im)possible, and Eastern Andalusian's harmony helps reveal these patterns.

A handful of issues remain unresolved. The opaque examples of harmony are unaccounted for, and they must be addressed if the analysis of Eastern Andalusian is to be comprehensive. In addition, because of the infinite-goodness problem, positive constraints must be introduced with care. The analysis developed here assumes that Dep blocks infinite epenthesis, but it is also possible to turn to serial HG to resolve the issue (Kimper 2011a). Questions regarding how serial NHG operates and whether it changes the outcomes reported above remain entirely unexplored.

Optionality may represent a relaxation of the one-to-one mapping from inputs to outputs that linguistic theory often assumes, but it is still a structured system. That structure can reflect language-particular facts (for example, Eastern Andalusian's ban on high lax vowels), but it can also reveal more basic architectural properties. Harmonic bounding is one such property. It imposes a fundamental limit on the range of input/output mappings OT and related formalisms can generate. When those limitations collide with an optional process, they erect guardrails that circumscribe the range of possible variation. In other words, sometimes unattested forms in optional processes reveal fundamental boundaries of the system on which the grammar rests.

Aaron Kaplan
The University of Utah
Department of Linguistics, Languages & Communication Bldg, 255 S. Central Campus Dr., Rm 2300, Salt Lake City, UT 84112 [a.kaplan@utah.edu]
[Received 3 December 2019;
revision invited 30 March 2020;
revision received 7 August 2020;
revision invited 17 January 2021;
revision received 23 May 2021;
accepted pending revisions 27 May 2021;
revision received 8 June 2021;
accepted 8 June 2021]

Appendix. NHG simulations: candidate frequencies and constraint weights

Table A1. Output frequencies from simulations. [Table image not reproduced.]

Table A2. Constraint weights from simulations. [Table image not reproduced.]

REFERENCES

Anttila, Arto. 1997. Deriving variation from grammar. Variation, change, and phonological theory (Current issues in linguistic theory 146), ed. by Frans L. Hinskens, Roeland van Hout, and W. Leo Wetzels, 35–68. Philadelphia: John Benjamins.
Anttila, Arto. 2006. Variation and opacity. Natural Language & Linguistic Theory 24.893–944. doi:10.1007/s11049-006-0002-6.
Anttila, Arto. 2007. Variation and optionality. The Cambridge handbook of phonology, ed. by Paul de Lacy, 519–36. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511486371.023.
Barnes, Jonathan. 2006. Strength and weakness at the interface: Positional neutralization in phonetics and phonology. Berlin: Mouton de Gruyter.
Beckman, Jill N. 1999. Positional faithfulness. New York: Garland.
Boersma, Paul. 1998. Functional phonology: Formalizing the interactions between articulatory and perceptual drives. The Hague: Holland Academic Graphics.
Boersma, Paul, and Bruce Hayes. 2001. Empirical tests of the gradual learning algorithm. Linguistic Inquiry 32.45–86. doi:10.1162/002438901554586.
Boersma, Paul, and Joe Pater. 2016. Convergence properties of a gradual learning algorithm for harmonic grammar. Harmonic grammar and harmonic serialism, ed. by John J. McCarthy and Joe Pater, 389–434. Bristol, CT: Equinox.
Cedergren, Henrietta J., and David Sankoff. 1974. Variable rules: Performance as a statistical reflection of competence. Language 50.333–55. doi:10.2307/412441.
Chomsky, Noam, and Morris Halle. 1968. The sound pattern of English. New York: Harper and Row.
Coetzee, Andries W. 2006. Variation as accessing 'non-optimal' candidates. Phonology 23.337–85. doi:10.1017/S0952675706000984.
Dell, François. 1973. Les règles et les sons: Introduction à la phonologie générative. Paris: Hermann. [English translation by Catherine Cullen: Generative phonology and French phonology, New York: Cambridge University Press, 1980.]
Flemming, Edward. 2017. Stochastic harmonic grammars as random utility models. Poster presented at the Annual Meeting on Phonology (AMP) 2017. Online: http://web.mit.edu/flemming/www/paper/SHG.pdf.
Goldsmith, John. 1976. Autosegmental phonology. Bloomington: Indiana University Linguistics Club.
Goldwater, Sharon, and Mark Johnson. 2003. Learning OT constraint rankings using a maximum entropy model. Proceedings of the Workshop on Variation within Optimality Theory, ed. by Jennifer Spenader, Anders Eriksson, and Östen Dahl, 111–20. Stockholm: Stockholm University.
Hayes, Bruce. 2017. Varieties of noisy harmonic grammar. Proceedings of the 2016 Annual Meeting on Phonology. doi:10.3765/amp.v4i0.3997.
Hayes, Bruce; Bruce Tesar; and Kie Zuraw. 2013. OTSoft 2.6 [software package]. Los Angeles: University of California, Los Angeles. Online: http://www.linguistics.ucla.edu/people/hayes/otsoft/.
Herrero de Haro, Alfredo. 2019. Consonant deletion and Eastern Andalusian Spanish vowels: The effect of word-final /s/, /r/ and /θ/ deletion on /i/. Australian Journal of Linguistics 39.107–31. doi:10.1080/07268602.2019.1542935.
Itô, Junko. 1988. Syllable theory in prosodic phonology. New York: Garland.
Itô, Junko, and Armin Mester. 1999. Realignment. The prosody-morphology interface, ed. by René Kager, Harry van der Hulst, and Wim Zonneveld, 188–217. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511627729.007.
Jesney, Karen. 2007. The locus of variation in weighted constraint grammars. Poster presented at the Workshop on Variation, Gradience and Frequency in Phonology, Stanford University, July 2007. Online: https://web.stanford.edu/dept/linguistics/linginst/nsf-workshop/Jesney_Poster.pdf.
Jiménez, Jesús, and Maria-Rosa Lloret. 2007. Andalusian vowel harmony: Weak triggers and perceptibility. Paper presented at the 4th Old World Conference in Phonology, Workshop on Harmony in the Languages of the Mediterranean, Rhodes, January 18–21, 2007.
Kaplan, Aaron. 2011. Variation through markedness suppression. Phonology 28.331–70. doi:10.1017/S0952675711000200.
Kaplan, Aaron. 2015. Maximal prominence and a theory of possible licensors. Natural Language & Linguistic Theory 33.1235–70. doi:10.1007/s11049-014-9273-5.
Kaplan, Aaron. 2016. Local optionality with partial orders. Phonology 33.285–324. doi:10.1017/S0952675716000130.
Kaplan, Aaron. 2018a. Asymmetric Crisp Edge. Hana-bana: A festschrift for Junko Ito and Armin Mester, ed. by Ryan Bennett, Adrian Brasoveanu, Dhyana Buckley, Nick Kalivoda, Shigeto Kawahara, Grant McGuire, and Jaye Padgett. Santa Cruz: Department of Linguistics, University of California, Santa Cruz. Online: https://itomestercelebration.sites.ucsc.edu/.
Kaplan, Aaron. 2018b. Positional licensing, asymmetric trade-offs, and gradient constraints in harmonic grammar. Phonology 35.247–86. doi:10.1017/S0952675718000040.
Kaplan, Aaron. 2019. Overshoot in licensing-driven harmony. Phonology 36.605–26. doi:10.1017/S0952675719000319.
Kawahara, Shigeto. 2008. On the proper treatment of non-crisp edges. Japanese/Korean linguistics 13, ed. by Mutsuko Endo Hudson, Peter Sells, and Sun-Ah Jun, 55–67. Stanford, CA: CSLI Publications.
Kimper, Wendell A. 2011a. Competing triggers: Transparency and opacity in vowel harmony. Amherst: University of Massachusetts, Amherst dissertation.
Kimper, Wendell A. 2011b. Locality and globality in phonological variation. Natural Language & Linguistic Theory 29.423–65. doi:10.1007/s11049-011-9129-1.
Kimper, Wendell A. 2012. Harmony is myopic: Reply to Walker 2010. Linguistic Inquiry 43.301–9. doi:10.1162/LING_a_00087.
Labov, William. 1969. Contraction, deletion, and inherent variability of the English copula. Language 45.715–62. doi:10.2307/412333.
Legendre, Géraldine; Yoshiro Miyata; and Paul Smolensky. 1990. Can connectionism contribute to syntax? Harmonic grammar, with an application. Chicago Linguistic Society 26.237–52.
Legendre, Géraldine; Antonella Sorace; and Paul Smolensky. 2006. The optimality theory-harmonic grammar connection. The harmonic mind: From neural computation to optimality-theoretic grammar. Vol. 2: Linguistic and philosophical implications, ed. by Paul Smolensky and Géraldine Legendre, 339–402. Cambridge, MA: MIT Press.
Lloret, Maria-Rosa. 2018. Andalusian vowel harmony at the phonology-morphology interface. Paper presented at the 15th Old World Conference on Phonology, London, January 12–14.
Lloret, Maria-Rosa, and Jesús Jiménez. 2009. Un análisis óptimo de la armonía vocálica del andaluz. Verba 36.293–325. Online: http://hdl.handle.net/10347/3518.
McCarthy, John J., and Alan Prince. 1995. Faithfulness and reduplicative identity. Papers in optimality theory (University of Massachusetts occasional papers in linguistics 18), ed. by Jill N. Beckman, Laura Walsh Dickey, and Suzanne Urbanczyk, 249–384. Amherst, MA: GLSA Publications.
Munro, Pamela, and Jason Riggle. 2004. Productivity and lexicalization in Pima compounds. Berkeley Linguistics Society (Special session on the morphology of Native American languages) 30.114–26. doi:10.3765/bls.v30i2.912.
Nagy, Naomi, and Bill Reynolds. 1997. Optimality theory and variable word-final deletion in Faetar. Language Variation and Change 9.37–55. doi:10.1017/S0954394500001782.
Pater, Joe. 1999. Austronesian nasal substitution and other NC effects. The prosody-morphology interface, ed. by René Kager, Harry van der Hulst, and Wim Zonneveld, 310–43. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511627729.009.
Pater, Joe. 2009. Weighted constraints in generative linguistics. Cognitive Science 33.999–1035. doi:10.1111/j.1551-6709.2009.01047.x.
Pierrehumbert, Janet B. 2001. Exemplar dynamics: Word frequency, lenition and contrast. Frequency and the emergence of linguistic structure, ed. by Joan L. Bybee and Paul J. Hopper, 137–57. Amsterdam: John Benjamins.
Prince, Alan. 2003. Anything goes. A new century of phonology and phonological theory: A festschrift for Professor Shosuke Haraguchi on the occasion of his sixtieth birthday, ed. by Takeru Honma, Masao Okazaki, Toshiyuki Tabata, and Shin-ichi Tanaka, 66–90. Tokyo: Kaitakusha.
Prince, Alan, and Paul Smolensky. 1993 [2004]. Optimality theory: Constraint interaction in generative grammar. New Brunswick, NJ: Rutgers University, and Boulder: University of Colorado, Boulder, ms. [Published, Malden, MA: Blackwell, 2004.]
Reynolds, William Thomas. 1994. Variation and phonological theory. Philadelphia: University of Pennsylvania dissertation.
Riggle, Jason. 2006. Infixing reduplication in Pima and its theoretical consequences. Natural Language & Linguistic Theory 24.857–91. doi:10.1007/s11049-006-9003-8.
Riggle, Jason, and Colin Wilson. 2005. Local optionality. North East Linguistic Society (NELS) 35.539–50.
Samek-Lodovici, Vieri, and Alan Prince. 1999. Optima. London: University College London, and New Brunswick, NJ: Rutgers University, ms. Online: http://roa.rutgers.edu/article/view/373.
Schütze, Carson T. 2016. The empirical base of linguistics: Grammaticality judgments and linguistic methodology. Berlin: Language Science. doi:10.17169/langsci.b89.100.
Schütze, Carson T., and Jon Sprouse. 2013. Judgment data. Research methods in linguistics, ed. by Robert J. Podesva and Devyani Sharma, 27–50. New York: Cambridge University Press.
Smith, Jennifer L. 2005. Phonological augmentation in prominent positions. New York: Routledge.
Vaux, Bert. 2008. Why the phonological component must be serial and rule-based. Rules, constraints, and phonological phenomena, ed. by Bert Vaux and Andrew Nevins, 20–60. Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199226511.003.0002.
Walker, Rachel. 2001. Round licensing, harmony, and bisyllabic triggers in Altaic. Natural Language & Linguistic Theory 19.827–78. doi:10.1023/A:1013349100242.
Walker, Rachel. 2011. Vowel patterns in language. New York: Cambridge University Press. doi:10.1017/CBO9780511973710.
Zuraw, Kie, and Bruce Hayes. 2017. Intersecting constraint families: An argument for harmonic grammar. Language 93.497–548. doi:10.1353/lan.2017.0035.

Footnotes

* For feedback on this work, I wish to thank Rachel Walker, participants in the Analyzing Typological Structure workshop at Stanford University, and audiences at Linguistics at Santa Cruz 2020 and the 16th Old World Conference in Phonology. I am especially grateful to Bruce Hayes for his generous efforts to ensure that my simulations functioned properly in OTSoft.

1. This is a bit of a simplification, but a justifiable one for present purposes. See §5 for discussion.

2. Depending on the variety of Eastern Andalusian, other consonants may also delete word-finally, and these consonants may instead weaken, not fully delete. The effect that this deletion/weakening has on the adjacent vowel also varies by dialect (and not all deleted/weakened consonants have the same effect within a dialect), as does the nature of the resulting harmony. See especially Lloret 2018 for details and also Herrero de Haro 2019. The Granada dialect allows for final-consonant weakening instead of deletion, but for simplicity I show only deletion.

3. This language has a variety of harmony patterns. The one most relevant here is the regressive harmony triggered by both stressed and unstressed [i, u], causing preceding /e, o/ in the stem to raise to [i, u].

4. Only a cursory description of how this formalism applies to Eastern Andalusian is given in Kaplan 2018b. The remainder of this section develops that account more fully.

5. This autosegmental view is shared by OT-based positional licensing. Neither possible winner in 5 violates License because the stressed vowel's [−ATR] is assumed to be shared by the other lax vowel(s) in the word.

6. OTSoft version 2.6 improves the software's interaction with constraints that assign rewards. Due to License's rewards, the simulations with postmultiplicative noise do not function properly in version 2.5. I am deeply indebted to Bruce Hayes for patiently and generously working out the relevant issues with me and for revising his software to facilitate these simulations. The OTSoft files used in the simulations reported in §6 and §7 can be found in the online supplementary materials, available at http://muse.jhu.g.sjuku.top/resolve/136.

7. The choice on this matter was informed by classical NHG's performance. At 200,000 learning cycles, it is common for an iteration of the classical NHG simulation to produce no errors. The performance of other versions of NHG was not substantially affected by adjusting this number.

8. As Boersma and Pater (2016) show, negative weights can undermine harmonic bounding relationships, which might make them an intriguing option regarding Eastern Andalusian. I ran all of the simulations reported in this section with negative weights allowed, but OTSoft never returned a negative weight.

9. Tied weights are more feasible in nonclassical NHG than in classical NHG. In principle it is possible to establish an initial tie between License and *[−ATR], generate the identical harmony scores that are effectively impossible in classical NHG, and allow noise to break the four-way tie in some nonclassical NHG fashion.

10. Only one word was used because the frequencies of the outputs for one word interact with those of another word, exerting either antagonistic or cooperative pressure on constraint weights. Absent actual frequency data, sorting through these interactions would serve little purpose.

11. An anonymous referee asks whether the different varieties of NHG can produce each of the three invariant patterns independently of the other patterns. To find out, I modified the Eastern Andalusian OTSoft file to represent these invariant systems. All versions of NHG produced the correct outputs with minimal errors (< 1%) except MaxEnt, which generated substantial errors for every input except /kɾísis/, with correct forms having frequencies between 0.29 and 0.88.
