
The typology of external splits
The lexicon divides into parts of speech (or lexical categories), and there are cross-cutting regularities (features). These two dimensions of analysis take us a long way, but several phenomena elude us. For these the term 'split' is used extensively ('case split', 'split agreement', and more), but in confusingly different ways. Yet there is a unifying notion here. I show that a split is an additional partition, whether in the part-of-speech inventory or in the feature system. On this base an elegant typology can be constructed, using minimal machinery. The typology starts from four external relations (government, agreement, selection, and anti-government), and it specifies four types of split within each (sixteen possibilities in all). This typology (i) highlights less familiar splits, from diverse languages, and fits them into the larger picture; (ii) introduces a new relation, anti-government, and documents it; (iii) elucidates the complexities of multiple splits; and (iv) clarifies what exactly is split, which leads to a sharpening of our analyses and applies across different traditions.*
Keywords: split, government, agreement, selection, anti-government, canonical typology, syntax, inflectional morphology
Supplementary material: http://muse.jhu.g.sjuku.top/article/884975
1. Introduction
The lexicon divides into parts of speech (or lexical categories), and there are cross-cutting regularities (features). We can make good progress with these two dimensions of analysis, but a set of remarkably varied and interesting phenomena elude us. These are often termed splits: 'plurality split' (Smith-Stark 1974), 'split ergativity' (Silverstein 1976, Foley 2018, Goedegebuure 2018, and others), 'person split' (Coon & Preminger 2012, 2017), 'split agreement' (Sauerland 2004), and more. These apparently disparate phenomena can be brought together into an illuminating typology. I show that a split is an additional partition, either in the part-of-speech inventory or in the feature system. Starting from additional partitions in these two dimensions, the typology incorporates the wide range of data, and with minimal machinery. This is valuable, since different branches of the field share the basic terminology here. We should capitalize on this compatibility in usage, while addressing analytical challenges according to varying traditions. Therefore, besides offering a novel typology, unifying the various types of split, and establishing anti-government as a new phenomenon, I include a measure of intellectual housekeeping, aiming to prevent our talking past each other.
As a first illustration of a split, consider government by the postposition gibi 'like' in Turkish (tur). In 1 we find benim, the genitive of the personal pronoun ben (first singular).
(1) Turkish (Kornfilt 1997:25–26, 423–24)1
Since gibi 'like' governs the genitive in 1, we expect the noun Rubinstein to appear in the genitive in 2. Yet it stands in the nominative.
(2)
Gibi 'like' does not govern consistently, but governs ben 'I' and Rubinstein differently. This unexpected behavior is induced just by gibi 'like' and three other postpositions (için 'for', -(y)lA/ile 'with, by', and kadar 'as … as'). We partition lexemes into parts of speech in order to capture shared behavior. And indeed, the postpositions of Turkish have shared properties, which is the basis for treating them together. But examples 1 and 2 show that we need an additional partition, a split, separating these four particularly interesting postpositions from the rest. Their special behavior is that they govern different case values, not as simple alternatives, but according to the item governed; there is an additional partition, a split, here too. At first sight the distinction is pronoun vs. noun, but the situation is more complex: (i) not all pronouns behave alike, (ii) the number value of the pronoun has an effect on the case governed, and (iii) the situation is fluid. We return to the challenging details in §5.6, including the partition into items that pattern with ben 'I' and those like Rubinstein. Already we begin to see why splits are fascinating; examples 1 and 2 show that there is much of interest to be found behind labels such as 'case split'.
I present here a complete typology of such splits. It is built on a simple framework rooted in mostly uncontroversial assumptions about the structure of the lexicon and the ways in which lexemes can be related to one another. I start with two dimensions for analyzing lexemes, found in many theories of syntax and morphology, namely (i) syntactically motivated parts of speech (such as noun and verb) and (ii) morphologically motivated features (such as number or case) that cross-cut parts of speech. Then I propose a four-way classification of how two lexemes in a single utterance can be related to one another. The first three are the classic relations of government, agreement, and selection; I show how all three can be reduced to the different lexemic and featural requirements imposed by one lexeme on another. This analysis in turn derives a fourth logically possible relation I call anti-government; this remarkable relation is one we would not expect to find, yet I show that it is attested in a variety of languages. Going further, I take as a baseline the notion that all instances of those four relations 'should' be consistent for any two items (I call this the lexeme consistency principle). We can then investigate a range of interesting examples that go against this principle; each of these is a split. Since a split can involve either the primary or secondary lexeme, and furthermore a split can be either lexemic or featural in nature, it follows that there are four theoretically possible types of split for the four relations above, resulting in a sixteen-way typology. I lay out the full typology, showing that it is surprisingly close to complete.
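The arithmetic behind the sixteen-way typology can be made concrete with a short enumeration. The following is only an illustrative sketch in Python: the relation names and the two binary choices come from the paragraph above, while the variable names and the tuple layout are assumptions introduced here.

```python
from itertools import product

# Labels come from the text; the data layout is an illustrative assumption.
RELATIONS = ("government", "agreement", "selection", "anti-government")
SPLIT_LOCUS = ("primary lexeme", "secondary lexeme")
SPLIT_NATURE = ("lexemic", "featural")


def split_typology():
    """Return every (relation, locus, nature) combination."""
    return list(product(RELATIONS, SPLIT_LOCUS, SPLIT_NATURE))


cells = split_typology()
for relation, locus, nature in cells:
    print(f"{relation}: {nature} split in the {locus}")
print(len(cells))  # 16
```

Each relation contributes four cells (two loci times two natures), which is where the figure of sixteen possibilities comes from.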
To achieve this, I first lay out the conceptual foundations, leading to the definition of external split (§2). On this base, I establish the typology of the four relations that can host splits (§3), namely government, agreement, selection, and the novel notion of anti-government. That leads in §4 to the typology of external splits, constructed using precisely the same machinery as that used for defining the host relations. The four types of split in the four host relations are analyzed in turn: those in government (§5), agreement (§6), selection (§7), and anti-government (§8), unifying these related types of split for the first time. Key examples are given at each point to clarify and underpin the typology. There can be combinations of splits, as we saw in Turkish, and these are further illustrated in the case studies. I also explore examples needing careful analysis, either to recognize a split or to accept an alternative using further distinctions in parts of speech or features. These are given in §9, together with possible challenges to the typology and related solutions. I conclude in §10, returning to the theme of sharing clear terminology while tackling data using the different approaches. (An alternative route through the article is to jump ahead to the data in §§5–8, and then return to §§2–4 for the foundations of the typology.)
2. Analytical essentials
To ground the typology securely I begin with lexemes, and the simple relations between them (§2.1), before moving on in §2.2 to the classification of lexemes by part of speech and features, as mentioned in §1. I can then define 'split' in §2.3. I investigate relations between lexemes, where 'lexeme' generalizes over a set of inflectional forms (thus the inflected forms go, goes, went, gone, and going constitute a single lexeme). A more formal definition from Stump 2016 is given in 3.
(3) A lexeme is (i) a lexical abstraction that (ii) has either a meaning (ordinarily) or a grammatical function, (iii) belongs to a syntactic category (most often a lexical category), and (iv) is realized by one or more phonological forms (canonically, by morphosyntactically contrasting word forms). (Stump 2016:58)
Insightful discussion of the issues lurking in this definition can be found in Stump 2016:58–66 and Spencer 2018. For Stump's 'syntactic category' I use 'part of speech' (justified in §2.2).
2.1. Single specification: the 'lexeme consistency principle'
For the lexemes that are our focus, I add to Stump's definition the characteristic that they specify an 'outgoing' external requirement (for example, an adposition may require a particular case value of its governee).2 Our starting assumption is that if a lexeme has an external requirement it will be a single external requirement (irrespective of how we model it). This simple assumption is shared, I suggest, across the field, from syntacticians to lexicographers. It can be stated as in 4.
(4) Lexeme consistency principle: A lexeme's internal structuring and its external requirement are both consistent.
I focus on the external requirement:3 if a dictionary or a more formalized lexical entry states that a given verb takes its object in the instrumental case, I start from the assumption that this is equally true for the first-person singular present and the third plural pluperfect. Similarly, our initial expectation is that the case requirement of an adposition will be a single value, rather than two values as for Turkish gibi 'like' in 1 and 2 and the three postpositions like it.
The lexeme consistency principle provides the baseline from which we can calibrate the interesting diversity we find in the world's languages. It is conceptually cleaner to take the simplest baseline (lexemes are externally consistent) and to calibrate from there.4 This approach, setting a clear baseline and measuring from it, is a hallmark of canonical typology. It will prove its worth, since 'the canonical approach breaks down complex concepts in a way that clarifies where disagreements may lie between different linguists and theoretical frameworks' (Nikolaeva 2013:100).5 Having a simple baseline will be valuable both in leading us to a range of remarkable data and in prompting us to evaluate any additional machinery we may think necessary to account for it. Justification for this canonical approach is given by Round and Corbett (2020). The main point to retain is the strategy of establishing a logical baseline (here 'single requirement') and measuring from there.
Our baseline, the lexeme consistency principle, has simple and clear requirements. Naturally our interest is in instances where the expectation of external consistency is not met, since then we are dealing with a split, and we need to weigh the options. 'External consistency' has two parts:
(i). a baseline (canonical) lexeme is consistent in being like other lexemes (it is the same as others); a canonical postposition behaves like other postpositions in a language;
(ii). a baseline lexeme always requires the same specification of other lexemes (it is the same for others); if a postposition requires the genitive, it does so for all items it governs.
These two parts of external consistency are the base for a typology with sixteen possibilities (§4), a typology that provides a clear framework for analyzing the rich variety of splits.
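Part (ii) of external consistency lends itself to a simple mechanical check. The sketch below is a hedged illustration in Python, not part of the analysis itself: the two data rows restate Turkish examples 1 and 2 as described in §1, and the function names and the triple format are assumptions made here. Part (i), which compares a lexeme with others of its part of speech, is not modeled.

```python
from collections import defaultdict

# Each observation is (governor, governee, case value required).
# The two rows restate examples 1 and 2; the layout is an assumption.
OBSERVATIONS = [
    ("gibi", "ben", "genitive"),
    ("gibi", "Rubinstein", "nominative"),
]


def requirements_by_governor(observations):
    """Collect the set of case values each governor is observed to require."""
    values = defaultdict(set)
    for governor, _governee, case in observations:
        values[governor].add(case)
    return dict(values)


def violates_consistency(observations, governor):
    """True if the governor imposes more than one case value (part ii)."""
    required = requirements_by_governor(observations).get(governor, set())
    return len(required) > 1


print(violates_consistency(OBSERVATIONS, "gibi"))  # True
```

A result of True flags only that the single-requirement baseline is not met; deciding what kind of split is involved still requires the analysis developed in the following sections.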
2.2. Two dimensions of analysis
Our two dimensions follow from §2.1.
(i). We partition the lexeme inventory to capture syntactic generalizations. Here the term 'part of speech' is widely used, especially in head-driven phrase structure grammar (HPSG; Müller et al. 2021). Lexical-functional grammar (LFG; Dalrymple et al. 2019) favors 'lexical category'; I avoid 'lexical category' and 'lexical class' because 'lexical' brings unhelpful ambiguity. Furthermore, while some treat 'lexical category' as a category of lexemes, and therefore equivalent to 'part of speech', for others it is reserved for part of speech with lexical meaning, for example, noun, verb, and adjective; this latter use is prevalent in minimalism (Adger 2003).
(ii). We need to capture consistencies across lexemes, including those consistencies that cross-cut different parts of speech. To model these, we partition by features in the other dimension. For example, there are generalizations involving items with, say, the case value 'genitive', which apply to various lexemes, including those of different parts of speech. (Case, number, gender, and so on are features; their values include genitive, plural, feminine.)6
It is useful to keep in mind these two perspectives, part of speech and morphosyntactic features. We can become habituated to assuming one or the other solution, when we should evaluate the options carefully. Continuing discussion of such trade-offs is often fruitful, while in some cases we find a clear outcome (this will be a recurring theme; see §5.6, §5.7, §7.4; and see §1 and particularly §2 of the supplementary materials).7
2.3. Split: an additional partition
Granted the two dimensions of analysis, and the baseline requirement that specifications are simple and consistent, we can define a split.8 The unstated terminological convention is that the partitions by part of speech and by features (§2.2) are taken as given (not that they are straightforward). 'Split' indicates an additional partition.
(5) Given
(i) a well-grounded part-of-speech inventory, and
(ii) a well-grounded feature system,
an additional partition (categorical or gradient), in either dimension, is an external split.
The Turkish examples in §1 demonstrate this definition. Accounts of Turkish include (i) a part-of-speech inventory, which includes postposition, justified by shared syntactic behavior, and (ii) a set of case values, justified by syntactic and morphological analysis. But we need more than that in order to account for our data. The part-of-speech inventory is insufficient: it requires an additional partition, since gibi 'like' and three further postpositions behave differently from the other postpositions in their case government. Having said that these four are different, the set of case values is not sufficient for us to specify their behavior straightforwardly. To account for the pattern of government we find, we need to specify additional factors (the lexemes governed and their number values; see §5.6). Furthermore, the split is not categorical (§5.6); we shall see splits that are categorical and others that are gradient: both are equally splits (§3.6).9
While the Turkish examples show that there is a split, according to the definition, they also show why we need to go further. There is a split in the postpositions and another split in the pronouns (§5.6), and intuitively these two splits are different in nature. We need to recognize this difference. Looking for generalizations over these splits will point us to a satisfying typology, with sixteen possibilities. Toward this goal, we have the two dimensions of our analysis from §2.2 (part-of-speech inventory and feature system) and the definition of external split in 5 above.
3. The typology of external requirements
Given the two dimensions of analysis (part of speech and morphosyntactic feature), we turn to the four types of external requirement that can be split. Here we need some primitives for analyzing the requirements lexemes make of others. Take the deliberately simple example of German die Frau 'the woman'. The definite article die must be in this form, the feminine, rather than the masculine der or the neuter das. This is a binary relation, holding between these two items irrespective of further structure (including headedness; §3.5). The relation is asymmetric in that the primary lexeme, the noun Frau, corresponds to one of the forms available to the article. There are also other forms of Frau, and for the morphosyntactic (featural) specification of each there is an appropriate form of the article. But the opposite does not hold: some forms of the article (those of the other gender values) have no corresponding featural specification of the noun (for discussion see Corbett 2006:7–8). In the direction noun→article (controller→target) there is always a corresponding form, but not in the direction article→noun (target→controller). We operate with this primitive notion of an asymmetric relation. The logical asymmetries discussed below need to be incorporated, in some guise, into any model of syntax.
Like agreement, government also shows an asymmetry, no matter how we model it. Thus, when we state that a particular adposition governs a given case value, the requirement is that the primary lexeme, the adposition, singles out one possibility (the realization of one case value) from those available to the secondary lexeme, the nominal; the nominal has additional values, not available for the given adposition. And selection involves a similar logical asymmetry: one lexeme, the primary one, forces a choice from the possible syntactically appropriate secondary lexemes.
Given this notion of asymmetry, we have the following primary and secondary lexemes, with relevant terms, as set out in Table 1.10
Table 1. Asymmetric relations between primary and secondary lexemes.
The three well-established asymmetric relations, government, agreement, and selection, implicate a new fourth relation, namely anti-government. In anti-government, the different featural specifications of the primary lexeme determine the choice between possible secondary lexemes, as we see shortly. Having established the terms used, we now ask how these four relations form the basis for the sixteen-way typology of splits.
The asymmetric relations are based on two binary distinctions. We already have the distinction primary vs. secondary lexemes (Table 1). The second distinction, 'presence' vs. 'featural specification', is essentially lexeme vs. features (§2.2):
(i). 'presence': the lexeme is involved irrespective of its morphosyntactic specification (indicated by the circle in Figure 1, implying that there is no need for access to featural specification);
(ii). 'featural specification': the requirement refers to a particular morphosyntactic specification (indicated by the grid, evoking the cells of the paradigm).
These two binary distinctions give four logical possibilities; the four relations (without splits) cover the logical space, as in Fig. 1. These provide the underpinning for our typology of external splits (see Round & Corbett 2020 for substruction in typology).
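The way two binary distinctions yield exactly four relations can be pictured as a lookup table. The Python sketch below is offered only as an illustration: the assignment of relation names to cells follows the description of Fig. 1 given in the text, while the enum, the dictionary, and the function name are assumptions introduced for this sketch.

```python
from enum import Enum


class Access(Enum):
    """What a requirement refers to on a given lexeme."""
    PRESENCE = "presence"
    FEATURAL = "featural specification"


# Keys are (primary lexeme, secondary lexeme).
RELATION = {
    (Access.PRESENCE, Access.FEATURAL): "government",
    (Access.FEATURAL, Access.FEATURAL): "agreement",
    (Access.PRESENCE, Access.PRESENCE): "selection",
    (Access.FEATURAL, Access.PRESENCE): "anti-government",
}


def classify(primary, secondary):
    """Name the relation for a given pairing of primary and secondary access."""
    return RELATION[(primary, secondary)]


print(classify(Access.PRESENCE, Access.FEATURAL))  # government
print(classify(Access.FEATURAL, Access.PRESENCE))  # anti-government
```

Because the table has one entry per combination, the four relations exhaust the logical space, and anti-government is simply the cell that the three familiar relations leave open.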
Starting with the primary lexeme, consider the opposition 'presence' vs. 'featural specification'. A lexeme, purely by its presence, may require a secondary lexeme to take a particular featural specification. This is government, as in from him. Or a verb may govern the dative case value of its subject or the instrumental of its object; it does so simply by its presence (its own featural specification is irrelevant: it governs that case value regardless of its tense or person features). And a governor governs one specification, irrespective of other values of the governee (for example, the latter's value for number or person). Contrast that with the situation where the featural specification of the primary lexeme (controller) determines the featural specification of the secondary lexeme (target). This is agreement, as in Mary runs.
Figure 1. The four asymmetric relations that host external splits.
The third possibility is the situation where the presence of the primary lexeme determines the presence of a secondary lexeme; these are the characteristics of selection. Thus, the English verb depend requires the preposition on or upon.
The fourth logical possibility is that depicted in the last cell of Fig. 1. This is anti-government. In abstract terms, anti-government is the relation between a primary lexeme that requires, according to its featural specification, the presence of different lexemes. This is the inverse of government, hence the term. To grasp its nature, consider a constructed example.
(6) Constructed example (English-prime)
When (in English-prime) rely is in the present tense, it requires at (as in 6a); other choices are ungrammatical. But when rely is in the past, it takes in (as in 6b). Unlike in real English, it is not the case that rely simply requires on. Rather, according to the featural specification (tense value) of the primary lexeme rely, different secondary lexemes, here prepositions, are required. This is anti-government, which logically completes the scheme in Fig. 1.
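The contrast between selection in real English and anti-government in English-prime can be restated as two toy lexical entries. This is a minimal Python sketch under stated assumptions: the prepositions and tense values are those given in the text, but the dictionary format and the function name are hypothetical and serve only to make the contrast explicit.

```python
# Real English: "rely" requires "on" by its mere presence (selection).
ENGLISH = {"rely": "on"}

# Constructed English-prime: the tense of "rely" decides the preposition
# (anti-government). The entry format is an illustrative assumption.
ENGLISH_PRIME = {"rely": {"present": "at", "past": "in"}}


def required_preposition(lexicon, verb, tense):
    """Return the preposition the verb requires in the given tense."""
    entry = lexicon[verb]
    if isinstance(entry, str):
        # Selection: the verb's own featural specification is irrelevant.
        return entry
    # Anti-government: the requirement is keyed to the verb's tense value.
    return entry[tense]


print(required_preposition(ENGLISH, "rely", "present"))        # on
print(required_preposition(ENGLISH, "rely", "past"))           # on
print(required_preposition(ENGLISH_PRIME, "rely", "present"))  # at
print(required_preposition(ENGLISH_PRIME, "rely", "past"))     # in
```

In the first lexicon the tense argument is never consulted, whereas in the second it is the only thing that determines the outcome.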
(7) Anti-government:
(i) resembles agreement, since the stipulation depends on the featural specification of the primary lexeme (the tense of the verb in 6);
(ii) resembles selection, since the stipulated secondary item is a lexeme (the prepositions in 6);
(iii) is the inverse of government, since different featural specifications of the primary lexeme (the verb in 6) require the presence of different secondary lexemes (the prepositions in 6).
The constructed example in 6 appears implausible. While anti-government fits perfectly into the scheme of asymmetric relations (Fig. 1), it has not been discussed previously, which is reasonable, given how unlikely it seems. Can anti-government exist? The answer, surprisingly, is that it can (§3.4). Moreover, just as with the other three relations, we shall see instances of splits in anti-government.
I review briefly the four relations (without splits) in §§3.1–3.4, with special attention to anti-government in §3.4. I then consider headedness, to show why it is orthogonal to our main concern (§3.5). Finally in this section, I examine basic assumptions that underlie the use of the typology in elucidating external splits (§3.6).
3.1. Government
In this relation, one lexeme (the governor) stipulates the morphosyntactic specification of another (the governee), as in from him. The governor, the primary lexeme, stipulates one of the possibilities available to the governee, the secondary lexeme. It does so simply by its presence, irrespective of its own featural specification. Thus a verb may require that its governee stand in the instrumental case, and the requirement is the same whether the verb is in the first, second, or third person, or in the future or pluperfect tense. (In brief, government is 'do as I say'.) The feature involved in government is typically case: see Zaliznjak 1973, Blake 1994, Sigurðsson 2003, and Malchukov & Spencer 2009; within that volume Spencer 2009 is particularly relevant for discussion of case as a morphological (internal) and a syntactic (external) phenomenon; see also Corbett 2012:200–222.11 Governors may be verbs, adpositions, or adjectives, and less often nouns. The baseline we calibrate from is a single case frame.
3.2. Agreement
Agreement differs from government in that the controller (the primary lexeme) requires the target (the secondary lexeme) to match its morphosyntactic specification. A feminine singular noun may require that a determiner it controls also stand in the feminine singular (in brief, 'be like me'). Again the baseline requirement is simple: a lexeme requires agreement according to its values, one for one. The features most often involved are person, number (as in our example Mary runs), and gender (die Frau 'the woman'); see Corbett 2006:125–41 for numerous examples and other possible agreement features. Government and agreement typically divide up the features, with government specifying a single case value, and agreement specifying person, number, and gender values. But the distribution can be less straightforward, as with numerals (Corbett 2000:211–13, 2009:157–63, Stolz 2002). Insightful discussion of the government-agreement connection, starting from the dominant participles of Latin, can be found in Haug & Nikitina 2016.
3.3. Selection
The selector, the primary lexeme, stipulates the presence of the selectee, the secondary lexeme (in brief, 'be there when I'm there'). For example, the English verb depend selects the lexeme on (or upon). As with government, it is the presence of depend that matters, not its morphosyntactic specification; equally, it is the presence of the selected lexeme on that is stipulated, and this lexeme remains free, for instance, to govern a case value. Note that the requirement can change between derivationally related items; series such as prides herself on, pride in, proud of show that it is not roots that select (Merchant 2019), which is problematic for the notion of category-free roots. Alternative terms for selection are 'combinatorics' and 'valence', the latter particularly within HPSG (Sag et al. 2003:50). There are various subtypes of selection. Bruening et al. (2018:4) distinguish: semantic selection (s-selection), categorial selection (c-selection), selection for features (their example is [finite]), and lexical selection (l-selection). In our typology, lexical selection is the baseline: selection is the requirement of the presence of the secondary lexeme, and this is a single requirement.
3.4. Anti-government
I first proposed anti-government as a logical consequence of the way the typology is built. Anti-government arises from the canonical approach, since the simplest means for defining the other three relations imply this one. To discover no examples of anti-government would have been a good result: after all, it seems unlikely to occur, and we could propose reasons why it should not exist. And yet, evidence has been found.
We find anti-government in various Dagestanian languages; a good source is Maisak 2017. In Andi, the choice of which polar question word appears depends on the featural specification of the verb. In 8 the verb is in the aorist, and the question word =de is used.
(8) Andi (ani), Rikvani dialect, elicited (Maisak 2017)
When the verb has a different featural specification (other than the aorist), then =le is found.
(9)
Andi also has pairs of reportative clitics and of wh-question clitics; both pairs follow the same distribution as the polar question words, being anti-governed according to tense. Maisak shows that this distribution is not truly semantic nor fully morphological: he establishes that there is no syntactic conditioning, no phonological conditioning, and no semantic distinction between the markers.
Maisak (2017) provides an interesting comparison in Godoberi (gdo), where question markers (for polar and wh-questions) are sensitive to tense, but this time it is present and future vs. past. A further example of anti-government is found in Greek of different periods, where there are two negators whose distribution is complex, depending in part on the features of the verb (Willmott 2013). As with government, agreement, and selection, we expect to find examples that are less than fully canonical.
Thus anti-government is not just a theoretical construct; we find instances in different language families. Better still, there are examples of split anti-government. These are presented, parallel to the splits in the other relations, in §§8.1–8.3. I first need to demonstrate that we are indeed dealing with anti-government, which will further strengthen the evidence for its existence.
The four host relations are summed up in 10 (each presented as: primary – requires – secondary).
(10)
3.5. Headedness
We make a small diversion here, since head is a central concept in almost all current approaches to syntax, yet does not have a major role in our typology. This is because the primitive binary relations laid out above are still more basic, so I do not additionally invoke headedness. This subsection gives three reasons why.
First, the primitive binary relations retain their value irrespective of headedness. There is ongoing debate about whether the nominal phrase is better modeled as NP or DP. The early assumption was that the N headed the NP. The counter-suggestion of DP was put forward by Abney (1987), among others. An informative debate between Zwicky (1985) and Hudson (1987) followed, continued in Corbett et al. 1993. More recent articles include Bruening et al. 2018 in favor of NP and Larson 2020 in response arguing for DP. Yet whichever analysis is favored, the more primitive asymmetry needs to be accounted for. Whichever is treated as head, we still need to get the German article die to be feminine to agree with Frau 'woman'. Moreover, we need to do so in a way that accounts for the various splits we find.
Second, where we find generally accepted assumptions about headedness, the asymmetry we are interested in operates in both directions, downward from the head and upward to it. Specifically for agreement, there is acceptance of both head-dependent and dependent-head determination (Larson 2020:537–39); see also Lehmann 1982:228–33, Haug & Nikitina 2016:884, Smith 2017:841–42, and references there. We need to treat both types (upward and downward) as agreement, since the gradient use of feature values runs seamlessly across them, as shown by the effects of the agreement hierarchy discussed in supplementary materials §1. Furthermore, the specific noun involved is key in determining these gradient effects (Corbett 2015b:195–96). Just as for agreement, so also for selection: we need to address both the requirements that heads place on dependents and those that dependents place on heads. Bonami (2015:86–90) provides insightful discussion, and terms selection by dependents 'reverse selection'. As instances of the latter, verbs may select an auxiliary, and nouns may select an adposition (see §7.1 for examples).
Third, linguists' use of the term 'split' is consistent with the primitives above. 'Split' is used of verbs selecting adpositions downward, and auxiliaries upward. Indeed, a highly productive area of research is splits in auxiliary selection in various Romance languages; for many syntacticians, this would be a dependent-on-head relation. I accept and develop this underlying logic to current usage in the discipline.
An interesting twist in the headedness tail is that the evidence from splits in agreement has been brought to bear on the headedness debate in a special issue of Glossa, with Bruening (2020) and Van Eynde (2020) arguing for NP and Salzmann (2020) for DP. Perhaps there is not one answer to the debate: some argue that headedness is better regarded as a gradient notion (Nikolaeva et al. 2019:31, and compare Lichte 2021). Even so, the primitive binary relations we address here retain their validity. Our typology, then, highlights splits in the primitive asymmetries I have laid out, all of which have a place in the development and evaluation of different models of syntax.
3.6. Assumptions
I now turn briefly to some basic assumptions; the data are complex, so it will be helpful to keep these in mind. (References back to them are given when needed, so readers may choose to fast-track to §4.)
Defaults and overrides
Given a default (for instance, transitive verbs take the accusative by default), overrides to it can be characterized along two dimensions: they can range from principled to arbitrary, and their application can go from general (applying to all items that meet the description) to lexically specified. These two criteria intersect: if the principle picks out few items, but applies to all of them, this override can be principled but of limited applicability (Bye 2015; compare Biberauer & Roberts 2016:260). The opposition 'general vs. lexically specified' is particularly relevant; we shall see that overrides can be very general and very specific: that is, they can relate to individual lexemes.
'To the extent possible.'
Many generalizations have this understood caveat. For example, when we state that an agreement controller requires its target to match feature values, the absence of an agreement target does not lead to ungrammaticality. More interestingly, the agreement target may be present and yet be unable to realize the appropriate values. We see this in Archi (aqc): some verbs show agreement (11), but many do not (12).
(11) Archi (Chumakina & Bond 2016:112)
(12)
In 11 the verb is in gender ii, singular; it agrees with the absolutive argument Ajša (a woman's name), as indeed the personal pronoun does here. In 12, however, though the syntactic structure is the same, there is no overt agreement. The distinction is morphology-internal; no syntactic distinction results from it. In a careful study of the distantly related Ingush (inh), Nichols (2018) shows that there is indeed no syntactic effect from targets that do or do not show agreement, and Fedden (2022) found a similar result in a corpus of Mian (mpt).
More than one relation involved at once
We see this in 11 and 12 above. The Archi pronoun is governed by the verb, taking the dative; since it is in the dative, the pronoun then hosts agreement, controlled ultimately by the absolutive argument. Another type is 'collaborative agreement', where a governing element agrees with its governee (Corbett 2006:85). Or one lexeme may select a second, which governs the case value of the first (see §7.1 for examples); and pluralia tantum nouns in Russian select a collective numeral that in turn governs their case value (see 35 below).
One requirement, not two (whether categorical or gradient)
The baseline requirement is for a single outcome, which means exactly one (for example, one case frame). Anything beyond this implies a split, whether the addition is categorical or gradient. This issue has proved difficult, specifically concerning splits in the case marking of the single argument of intransitive verbs.12 As is true more widely (going beyond the single argument of intransitives), the split may be categorical (for example, feature value X in context A, and feature value Y in context B; see §5.1) or gradient. And it can be gradient in various ways (for example, some items must take X, and others take X or Y; alternatively, some take X, some Y, and some X or Y). Furthermore, the split may be subject to conditions, including but not restricted to semantic conditions. To avoid confusion, then, it makes sense: (i) to talk of splits, for all of these instances, (ii) to specify whether the split is categorical or gradient, and (iii) to give the conditioning factor(s).
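The distinction just drawn (a single outcome, a categorical split, a gradient split) can be sketched as a small classifier. This is an illustration only, under simplifying assumptions: the function name `classify_split` and the toy data are hypothetical, each item or context is represented simply by the set of feature values attested for it, and conditioning factors are not modeled.

```python
def classify_split(attested):
    """Classify a mapping from items (or contexts) to sets of attested values.

    Returns 'no split' if every item takes the same single value,
    'categorical' if each item takes exactly one value but the values differ,
    and 'gradient' if at least one item allows more than one value.
    """
    value_sets = [frozenset(values) for values in attested.values()]
    if not value_sets or any(len(vs) == 0 for vs in value_sets):
        raise ValueError("every item needs at least one attested value")
    if any(len(vs) > 1 for vs in value_sets):
        return "gradient"
    if len(set(value_sets)) > 1:
        return "categorical"
    return "no split"


# Toy data (hypothetical labels, not drawn from any corpus).
baseline = {"A": {"X"}, "B": {"X"}}
categorical = {"A": {"X"}, "B": {"Y"}}
gradient = {"A": {"X"}, "B": {"X", "Y"}}

for name, data in (("baseline", baseline),
                   ("categorical", categorical),
                   ("gradient", gradient)):
    print(name, "->", classify_split(data))
```

The sketch treats any optionality (X or Y for some item) as sufficient for a gradient split, which covers both gradient patterns mentioned above; a fuller model would also record the conditioning factor(s).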
Today's or yesterday's syntax
Typology should lay out the range of phenomena, including their extremes, so that new examples are examined on their merits. Specifically, given a complex set of forms, some linguists will assume they must be accounted for with more articulated syntactic structure, while others will first look to a morphological analysis. We ask whether there are grounds for a synchronic syntactic account (an external split), or whether earlier syntactic processes have left traces in synchronic morphology (an internal split; see n. 3). The alternatives are argued out carefully in Deal's (2016) discussion of the ergativity split based on person in Nez Perce (nez); see also Forbes 2021 on Gitksan (git). In order to concentrate on external splits, I focus on examples with clear indicators favoring a syntactic account. (Thus, split intransitives involving only verbal inflection will not figure.) I take up this issue in case studies, especially in supplementary materials §2.2.
Now that we have in place the four relations that can be split, as well as necessary assumptions, we turn to the typology of such splits.
4. The logic of the typology of external splits
Given the four host relations, we can now ask what split means for each of them. Our typology uses minimal resources, in that the types of split are induced by exactly the distinctions that underpin the typology of relations in Fig. 1 above. For each pairing of primary lexeme (governor, controller, selector, anti-governor) and secondary lexeme (governee, target, selectee, anti-governee), we ask—relating to each member of the pair—whether we are dealing with:
(i). an additional partitioning of the lexemes: a split between lexemes, henceforth lexemic split (we need to specify that certain lexemes behave differently from other lexemes, beyond what would be predicted), or
(ii). an additional partitioning by morphosyntactic specification: a split within lexemes, henceforth featural split (we need to specify that items with certain featural specifications induce different behaviors, beyond what would be predicted).
I demonstrate that these two, lexemic and featural splits, together form a novel and comprehensive typology (they are our two dimensions of analysis; §2.2). Each type of split can apply to the primary lexeme and to the secondary lexeme. Thus for each host relation there are four possible types of split. Splits in the primary lexemes have typically been major areas of research. I show their place in the general typology, and then highlight the remaining, less well-researched types. Distinguishing carefully the issues that are due to the primary lexeme and those that arise from the secondary lexeme will allow us to clarify earlier confusion.
4.1. The four types of split
Figure 2 starts from government and shows how the four types of partition apply there. For the primary lexeme (the governor) there are two types of split, lexemic and featural, and similarly for the secondary lexeme (the governee).
The expanded box spells out the four types of split specifically for government (the relation in which the presence of the primary lexeme requires a particular featural specification of the secondary lexeme):
Figure 2. The four types of split, illustrated with government.
• Type (i), as indicated in the expansion of Fig. 2, is a lexemic split of the primary lexemes. There is an additional partition of the primary lexemes: some governors require one feature value and some another; for instance, some transitive verbs take the accusative and some the instrumental (§5.1).
• Type (ii), a featural split of the primary lexemes, is found where governors have different requirements according to their own featural specification: thus a verb may have different case requirements according to its own tense, aspect, and mood (§5.2).
• Type (iii), a lexemic split of the secondary lexemes, occurs when different governees are governed differently. We saw this with our original Turkish examples 1 and 2, and a further example is given in §5.3.
• Type (iv), a featural split of the secondary lexemes, is the situation where the governees take a different specification according to their own feature values; for example, the governee of an adposition stands in a different case according to the governee's number value (§5.4).
After analyzing the four types of split involving government in §5, I apply the same logic to the other host relations: agreement (§6), selection (§7), and anti-government (§8), which gives sixteen types of split in all. Typically, I present one set of data in sufficient detail to make the case clearly, and may then refer briefly to other comparable ones. Figure 2 and the initial examples in §§5–8 are intended to identify the possible types of split. These splits can occur in combination, giving rise to complex examples. To demonstrate the usefulness of the typology for analyzing this complexity, case studies are included at key points.13
4.2. The four host relations: less clear instances caused by splits
Figure 2 also allows us to address some issues with the four relations that may have arisen (otherwise, the reader may wish to go straight to §5). For example: How do splits affect our classification of examples? How do the features link to the relations? For clarity we have started from the baselines. Real life is more complex, and Fig. 2 helps us to understand how: the expanded box clarifies how the types of split affect the host relation differently. Consider again type (i), a lexemic split of the primary lexemes, where some governors require one feature value and others another. For example, if the lexemes in question are verbs, some verbs may govern their object in the accusative and others in the instrumental (as in Russian; see §5.1 below). This is no less a situation of government; different lexemes simply have different government requirements. Contrast this with type (ii), where the primary lexemes are split according to their featural specification. For example, one and the same verb could govern different cases according to its own featural specification for tense, aspect, and mood (as in Georgian; see §5.2). As the graphical representation suggests, this gives us a less clear case of government (since a given lexeme does not have a unique government requirement), and it moves us somewhat toward agreement.
This pattern of contrast generalizes through the sixteen possibilities in an elegant way. We look first at the situation where the type of split matches the type (presence vs. featural specification) of the lexemes that are split (whether primary or secondary). In the Turkish examples in 1–2 above, government is a relation between a primary lexeme that, purely by its presence, requires a secondary lexeme to take a particular featural specification. If we have a lexemic split, such that not all primary lexemes (governors) behave alike, this is a 'congruent' split: the government relation involves lexemes (their presence) in the primary role, so a split lexeme-from-lexeme leaves the relation unaffected. Thus in our Russian example below (§5.1), each verb shows straightforward government, but not all verbs govern the same case value. This specific instance is given in the top row in Table 2, followed by other congruent splits. In Table 2, the brief descriptions of the relations are 'lexeme' for all specifications of the lexeme (its presence) and 'feature' for featural specification. Thus the relation government (column 1) has the description that the presence of the primary lexeme ('lexeme') requires (→) a featural specification ('feature') of the secondary lexeme. Bold and italic indicate the links from the description to site and type of split.
Table 2. Congruent splits.
Just as in the first congruent split in government, in each of the other seven congruent splits the relation involved remains straightforwardly unchanged. Table 2 shows why this is so: the split produces 'more of the same'. These congruent splits are the ones where the data are easier to classify. And indeed, no one is likely to treat examples like the lexemic split of the Russian verbs (§5.1) as anything other than government.
We move on to those splits where the split type is not congruent with the type of the primary or secondary lexeme that is split. Recall our type (ii) in Fig. 2, where government by the primary lexemes is split according to their featural specification (Georgian verbs govern different case values according to their own featural specification for tense, aspect, and mood; §5.2). In the canonical situation, government involves simply the presence of the primary lexeme; a featural split, within the primary lexeme, means that we have a less canonical instance of government, as indicated in the top row of Table 3, which gives the eight incongruent splits.
Table 3. Incongruent splits.
The key difference in Table 3, as compared with Table 2, is seen in the 'type of split' column; the type of split is not congruent with the site of the split, so the split leads to a less good example of the relation. In our example from government, all specifications of the primary lexeme 'should' require the same featural specification of the secondary lexeme. A featural split in the primary lexeme means that we have a less good instance of government; it shows a slight shift toward agreement. This is not to claim that it is agreement, simply that the incongruence points in that direction.
Table 3 shows a clear pattern: incongruent splits involving the primary lexeme link those phenomena that differ in the role of primary lexeme:
• government is shifted toward agreement, and agreement toward government;
• selection is shifted toward anti-government, and anti-government toward selection.
Incongruent splits involving the secondary lexeme similarly link those phenomena that differ in the role of secondary lexeme:
• government is shifted toward selection, and selection toward government;
• agreement is shifted toward anti-government, and anti-government toward agreement.
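The logic of congruent and incongruent splits, and of the shifts just listed, can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the names `RELATIONS` and `classify` are hypothetical, each relation is encoded as a pair (type of primary lexeme, type of secondary lexeme), and the encoding of anti-government as 'feature' requiring 'lexeme' is an inference from the shift pattern described above.

```python
from itertools import product

# Each relation as (type of primary lexeme, type of secondary lexeme).
# "lexeme" = presence of the lexeme; "feature" = featural specification.
# The entry for anti-government is inferred from the shift pattern in the text.
RELATIONS = {
    "government": ("lexeme", "feature"),
    "agreement": ("feature", "feature"),
    "selection": ("lexeme", "lexeme"),
    "anti-government": ("feature", "lexeme"),
}
SITES = ("primary", "secondary")
SPLIT_TYPES = {"lexemic": "lexeme", "featural": "feature"}


def classify(relation, site, split):
    """Return (is_congruent, relation_shifted_toward_or_None)."""
    primary, secondary = RELATIONS[relation]
    site_type = primary if site == "primary" else secondary
    if SPLIT_TYPES[split] == site_type:
        return True, None
    # Incongruent: flip the type at the split site and find the neighbour.
    flipped = "feature" if site_type == "lexeme" else "lexeme"
    neighbour = (flipped, secondary) if site == "primary" else (primary, flipped)
    target = next(name for name, spec in RELATIONS.items() if spec == neighbour)
    return False, target


all_splits = list(product(RELATIONS, SITES, SPLIT_TYPES))
for relation, site, split in all_splits:
    congruent, toward = classify(relation, site, split)
    label = "congruent" if congruent else f"incongruent, shifted toward {toward}"
    print(f"{relation:16} {site:9} {split:8} {label}")
```

Enumerating the combinations in this way gives the sixteen types of split, eight congruent and eight incongruent, with each incongruent split pointing toward the relation that differs only at the site of the split.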
I stress that incongruent shifts give less clear examples ('shifted toward'), not that the relation changes. The relation is preserved for two reasons, quantitative and qualitative. The quantitative argument relates to primary lexemes, where a split needs to be seen against the background of the full paradigm. Thus, the Georgian verb (§5.2) splits three ways in terms of its case government; but this is within an extremely large paradigm, so overall we are dealing with government, albeit not quite as clearly as in government with no such split. The qualitative argument concerns the features involved and the secondary lexeme. Starting with the features (whose relevant properties are analyzed in Corbett 2013), these may be divided according to whether they are carried by the controller/governor and the target/governee. This distinguishes gender, number, and person, where both controller and target carry the feature (hence these are found in agreement), from case, carried only by the governee (hence it occurs in government). Similarly, as we shall see, the secondary lexemes required in selection and anti-government fall into types: tense, aspect, and mood for selection, and negators and question words for anti-government. This linkage between the four host relations and their effect on the secondary lexeme (feature realized or semantic type) remains even in the less straightforward instances. Thus, if the effect of splits makes the relation less clear, quantitative and qualitative pressures maintain the relation. In §§5–8 we concentrate on clear examples of the relevant relations; in other words, the split does not bring them close to the borderline with neighboring relations.
5. Splits in government
Recall that in this relation, one lexeme (the governor) stipulates the morphosyntactic specification of another (the governee). The governor, the primary lexeme, stipulates one of the possibilities, typically the case value, available to the governee, the secondary lexeme. For possible splits, we apply the opposition lexemic vs. featural to the primary lexeme, and equally to the secondary lexeme. This gives us four types of split within government (as in Fig. 2); I tackle each in turn.
5.1. Government: primary lexeme: lexemic split
This type of split partitions the primary lexemes; some have one requirement, others a different one. A familiar basis for this is quirky case, where there is an additional partition within the verbs: some verbs have a different requirement from most verbs. Thus in Russian, the majority of transitive verbs require the accusative for their direct object (13), but certain verbs require instead the instrumental (14).
(13) Russian
(14)
Verbs like that in 14 have their lexical semantics in the domain of directing and managing (Janda 1993:160–61 provides a list); hence the split is partially motivated. We see an additional partition within the part of speech 'verb', clearly meeting our definition of a split (§2.3). Similarly in Archi, transitives typically take their subject in the ergative, while verbs of emotion and perception take their subject in the dative (as in 11 and 12). Such partitions may be well or less well motivated in semantic terms. Whether the partition is motivated or not, something is needed beyond the simple statement that transitive verbs have a particular case frame.14
Indeed, each part of the definition of split prompts questions of motivation. If there is a lexemic split between verbs according to the case values they govern (as in 13 and 14), we may reasonably ask whether the partitioning involves all and only the verbs of, say, a particular semantic class. We may also ask whether the case values involved fit into a more general scheme of case use or whether they appear to be synchronically unmotivated. The answers, for each type of split, may be straightforward or complex, ranging from 'indubitably fully motivated' through 'partially motivated' to 'no proven synchronic motivation'. A value of the typology is that it raises these questions, and it raises them for each type of split: a partition is no less a split if we believe we can offer a convincing motivation for it than if we find it surprising and obscure.
5.2. Government: primary lexeme: featural split
A featural split of the primary lexeme involves an additional partition in the other dimension: some featural specifications of the primary lexeme govern one case value, some another. A rightly famous example is Georgian (kat), where government of the appropriate case value changes as the tense, aspect, and mood (TAM) combination of the verb changes (indicated by the prefixes and the 'series' marker, here i/iii).
(15) Georgian (Alice Harris 1981:1, and p.c.)
This complex system is laid out in Harris 1981 and Anderson 1992:141–58; a brief account is in Corbett 2015a:167–70. Further discussion is found in Van Valin 1990:240–48 and Wier 2011, while the wider Kartvelian perspective is provided by Tuite (2017). Such systems can be modeled in various ways; for example, we might introduce more syntactic structure for certain TAM combinations (Nash 2017). But everyone needs to do something extra, as compared with languages where TAM changes have no effect on case frames. A helpful survey of TAM splits is given in Malchukov 2014. The Georgian data presented here illustrate beautifully a split based on the featural specification of the primary lexeme, but there is also a lexemic split, since there are other classes of verb, not illustrated, that have different case requirements. The distribution of the verbs over these classes is partially motivated (Anderson 1992:147).
This type typically involves core case arguments of the verb; in Tuvan (tyv), however, locations are affected. When the verb is in the present, the locative is used; when the verb is in the past or future, location is almost always expressed by the dative (Isxakov & Pal′mbax 1961:128, Nevskaya 2014:263–64).
5.3. Government: secondary lexeme: lexemic split
We now turn to lexemic splits involving the secondary lexemes. We see this in Mari (chm), formerly Cheremis, a Uralic language spoken mainly in the Mari Republic (Russian Federation). Postpositions induce a split within the nominals they govern. Pronouns stand in the genitive, as in 16.
(16) Mari (Kangasmaa-Minn 1998:227, 237)
Nouns normally appear in the nominative.
(17)
Kangasmaa-Minn 1966 gives more data on this split; see also Alhoniemi 1993:50.
We can now take further the idea that the division by part of speech and by features is taken as given, and split is typically used when there is some additional partitioning (§2.3). That holds for primary lexemes, but more needs to be said about secondary lexemes. Thus for government, we do not expect governees to behave differently according to their part of speech; rather, the nominal parts of speech are taken together, and a governor requiring a given value takes this value of any governee. There are splits within the nominals, as we have just seen, and then we sometimes too readily assume a split along the noun-pronoun divide (perhaps overinfluenced by morphological differences; Garrett 1990:286). In Mari the split is gradient (see §3.6): pronouns appear in the genitive, while nouns stand normally in the nominative (but sometimes in the genitive). Ossetic (oss) too has a complex case split within the pronouns (Belyaev 2021:260–61). And in the telling Russian case split (§5.7), we shall see clearly a split moving through the pronoun in stages.
5.4. Government: secondary lexeme: featural split
Here the featural specification of the secondary lexeme has a role in the case value governed. Belarusian (bel), an East Slavonic language, shows a nice contrast; the case governed by the preposition pa 'for' varies according to the number of its governee.
(18) Belarusian (Atraxovič 2003, Uwe Junghanns, p.c.)
The important factor is the featural specification of the secondary lexeme, singular vs. plural. In 18a the secondary lexeme (the noun) is singular, and hence dative, while in 18b it is plural, and hence locative. There is an additional partition of the governees, according to their specification for number.
This split is a gradient one. The locative is fully established in the plural, but both dative and locative are found in the singular. This is a split of the type discussed in §3.6, where the opposition is X vs. X or Y, locative vs. locative or dative. (In Belarusian, as more generally in Slavonic, the direction of change here is to the locative; in Russian, as we shall see in §5.7, the trend is to the dative.) We should consider the inflectional system within which the change is occurring. In Belarusian, some classes of noun have dative-locative syncretism in the singular; none do in the plural. The fact that the choice of case value is now fixed in the plural, and is moving in favor of the locative in the singular, shows that the analysis must be based on features, and that an analysis appealing to specific morphemes could not work.
There are further splits here: first, only the preposition pa 'for' is involved (an additional partition of the primary lexemes, separating this preposition from all others); and second, in its spatial use, pa 'for' is found with the accusative with a small list of nouns (similarly in Russian).
Extensive evidence on the main split in government (dative vs. locative as in 18) is provided by Mayo (1988), who conducted a hand-coded corpus study. His sources (approximately 360 pages for each) were as follows:
• Michaś Zarecki, Ściězki-darožki, published 1927 (abbreviated as 'MZ');
• Kuźma Čorny, Treciaje pakaleńnie, 1935, and Luba Łuk'janskaja, 1936 (KC);
• Ivan Mielež, Ludzi na balocie, 1960 (IM);
• Viačasłaŭ Adamčyk, Čužaja baćkaŭščyna, 1978 (VA).
In Table 4, I present one particularly indicative set of the data, which Mayo characterizes as the use of the preposition 'in constructions expressing objective relations'. Table 4 includes all examples where there is a potential choice of case value. This is achieved by omitting (i) sixty-four instances of the plural; as expected, the locative was used in all of them, and (ii) 178 instances of dative-locative syncretism in the singular.
Table 4. Belarusian: government of pa 'for': clearly differentiated forms in singular only. Note: authors are abbreviated by initials (given in text above); dat = percentage items in the dative.
In the singular, in the instances with differentiated case forms, the dative was found in 33% of the examples (Table 4). We see considerable variation: first between the senses in which the preposition is used, and second between authors. However, much of the apparent variance between authors results from the proportions in which they use pa in its different senses. Thus Ivan Mielež, who would appear to be the most conservative, retaining the dative in almost half of the examples, has this distribution because he uses pa frequently in the sense 'according to'.
There is no sensible way to avoid positing a split. Enhancing the part-of-speech inventory to include a special part of speech with variable government requirements would be simply to relabel the lexical entry for pa 'for'. Equally, enhancing the values of case, with a new value dative-locative, would still involve complex conditions specifying when dative-locative is realized as dative and when as locative. What we should retain is the complex interaction of factors involved as the balance of case values changes over time, moving, it appears, toward a situation in which pa 'for' will take only the locative. Thus, there is a split in the part of speech 'preposition', with pa 'for' having unique government properties—but the main point is that pa 'for' has a gradient split in the case values it governs, determined by the number value of the secondary lexeme.
5.5. Government: summary of the evidence for splits
We find all four types of split in government that are provided for in our typology (Fig. 2), both the more familiar examples involving the primary lexeme and the less-discussed items where the split concerns the secondary lexeme. The evidence is summarized in Table 5.
Table 5. Government: summary of evidence for splits.
We have seen a clear example for each of the splits. Congruency is as defined in §4.2; since government is the relation in which the presence of the primary lexeme requires a particular featural specification of the secondary lexeme, both a lexemic split of the primary lexemes and a featural split of the secondary lexemes are congruent. Incongruent splits are, arguably, the more striking and challenging. We have also noted that splits can be complex, involving more than just one type. This was true of Georgian (15), and Belarusian illustrated it again (in addition to the featural split in the secondary lexemes, there is a lexemic split in the primary lexemes, since only the preposition pa 'for' induces this effect). Our case studies for government (Turkish in §5.6 and Russian in §5.7) provide further illustrations of complex splits.
5.6. Case study Turkish: a complex split
The Turkish split with which we began (§1) proved invaluable for bringing out key issues; we now consider fuller data. Our original illustration shows the basis of the split, the postpositions governing different case values.
(19) Turkish (Kornfilt 1997:423–24)
(20)
Going further, the four postpositions, gibi 'like', için 'for', -(y)lA/ile 'with, by', and kadar 'as … as' (Göksel & Kerslake 2005:242), have the interesting government requirements summarized in Table 6 (Matthew Baerman, Steven Kaye, and Jaklin Kornfilt, p.c.; Lewis 1967:85–86, Kornfilt 1997:423–24, Libert 2008a, 2008b:240–42).
Table 6. Turkish forms governed by gibi 'like' and similar postpositions (shaded).
These postpositions govern nominative and genitive. With nouns the picture is clear: the nominative is used, as in 20. Examples are: ev gibi/evler gibi 'like a house/houses'. It may seem surprising to have postpositions governing the nominative case; it is also the form used of indefinite or nonspecific objects, so is sometimes labeled 'absolute' (Libert 2008b). With the basic forms of the first- and second-person pronouns (also with singular demonstratives and kim 'who') the genitive is used: benim gibi 'like me', bizim gibi 'like us'. But with the third plural pronoun (which is also a demonstrative), we find the nominative: onlar gibi 'like them'. However, the first- and second-person pronouns also have 'multi-plural' forms; the choice between these multi-plural forms and the normal ones is complex (Nevskaya 2005:348–50 and references there). Pronouns in the multi-plural form appear in the nominative with our four postpositions. The suffix -lar/-ler of the multi-plural pronouns, like that of the third plural pronoun, suggests they are more noun-like, and they take the nominative as nouns do. Thus, we need to distinguish nouns (which take the nominative whether singular or plural) from pronouns, and to make further distinctions within the pronouns, concerning both the lexemes involved and the featural specification. Furthermore, the nominative can encroach beyond what is specified above, particularly with kim 'who' (Lewis 1967:85–86); see Libert 2007 for more evidence of the considerable variability in the current language, and for comparison with other Turkic languages. This variability shows that the split is gradient rather than categorical (§3.6).
We should again ask whether there is an analysis that would avoid postulating a split. Creating a new part of speech to accommodate the four special postpositions would fail to account for the fact that, apart from their special government requirement, they behave as postpositions. Adding a new feature value, nominative-genitive, is equally unattractive, since an account of its use would be simply a listing of the conditions under which it matches the nominative and those where it matches the genitive. Thus, there is a split in the inventory of postpositions, with just four having unique government properties. As primary lexemes these four postpositions govern both genitive and nominative: the case value of the secondary lexeme is determined both by the part of speech (noun vs. pronoun) and, within the pronouns, by the specific lexeme and by the featural specification (singular, plural, multi-plural).
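The baseline pattern described for gibi 'like' and the three similar postpositions can be summarized as a simple lookup. This is a rough sketch only: the class `Governee`, its field names, and the function `governed_case` are hypothetical, the sketch covers only the nominal types mentioned above, and it deliberately ignores the gradient variability (for instance, the nominative encroaching with kim 'who').

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Governee:
    form: str                     # citation form, used for display only
    part_of_speech: str           # "noun" or "pronoun"
    has_lar_suffix: bool = False  # third plural and multi-plural pronouns


def governed_case(governee):
    """Baseline case value required by gibi, için, -(y)lA/ile, and kadar."""
    if governee.part_of_speech == "noun":
        return "nominative"  # singular and plural nouns alike
    if governee.part_of_speech == "pronoun":
        # Pronouns bearing -lar/-ler pattern with nouns; the rest take genitive.
        return "nominative" if governee.has_lar_suffix else "genitive"
    raise ValueError(f"unexpected part of speech: {governee.part_of_speech}")


examples = (
    Governee("ev", "noun"),
    Governee("evler", "noun"),
    Governee("ben", "pronoun"),
    Governee("biz", "pronoun"),
    Governee("onlar", "pronoun", has_lar_suffix=True),
)
for g in examples:
    print(g.form, "->", governed_case(g))
```

Even this toy version needs both a part-of-speech test and a test internal to the pronouns, which is the point made above: neither a new part of speech nor a new feature value would remove the need to list these conditions.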
5.7. Case study Russian: a lesson from diachrony
We conclude our discussion of government with a remarkable split, documented over two and a half centuries. It involves the case value governed by the Russian preposition po 'for' (also 'on, about, by, according to'). After an interesting development, the split has been largely resolved in contemporary Russian. We therefore examine an earlier period to see the split in operation, while reducing the degree of variability. A convenience sample was taken, consisting of the works of Andrej Platonov (1899–1951) that were readily available online (I am grateful to Aleksandr Krasovitsky for these examples); to restrict the range of meanings a single phrase was taken: skučat′ po 'to long for, miss'. In this tightly specified search, we find both dative and locative, and the distribution is as in these examples.
(21) Russian (from the writings of Andrej Platonov 1899–1951)
(22)
In this sample from the first half of the twentieth century, we find dative for nouns and locative for pronouns. There is clearly a split here, but it is more complex and interesting than 21 and 22 reveal. It appears to be a split determined by part of speech; it must be said, however, that the sample did not include any third plural pronouns. This interesting split has attracted researchers including Grigor′eva (1951), Bondarenko (1961: 28–30), Filippova (1964), Ždanova (1965), Graudina, Ickovič, and Katlinskaja (1976: 48–49), Ickovič (1977), and Iomdin (1991).15 We concentrate on Muravenko (2014), who traces the split from the mid-eighteenth century to the end of the twentieth century. She bases her conclusion on a wealth of examples, but gives summary data rather than detailed statistics.
Muravenko (2014) examines the locative vs. dative split (not the smaller issue of the accusative). The main development is clear: over the last two and a half centuries the dative has replaced the locative here. Muravenko examines three senses of the preposition: object (as in 'fire at'), spatial (as in 'go along'), and cause of sorrow (as in 'long for', 'miss'). In that order, they represent more innovative to more conservative: thus, our examples 21 and 22 show the last part of the overall change, in a period with abundant data. The end point of the change, from governing the locative to governing the dative, was reached at different times for the three senses of po just given. However, they went through a similar progression, as in 23.
(23)
The apparent gaps in this list are explained by syncretisms. For the first- and second-person singular pronouns, the dative and locative are syncretic (hence are omitted). For nouns in some inflection classes, dative and locative are syncretic in the singular, while none are in the plural. This makes clear that the change concerns feature values (as with Belarusian; §5.4), since it begins precisely where there is no syncretism, that is, in the plural of nouns. While discussing the forms, it is worth noting that Russian had a mature case system throughout the change (there was no 'addition' of markers, as seen in some changes affecting differential argument marking; compare Garrett 1990:286).
If we concentrate on the sense 'cause (of sorrow)' (21 and 22), Muravenko's description (2014:671) allows us to give the overview in Figure 3.
Figure 3. The case split with Russian po 'cause (of sorrow)'. Note (the legend distinguishes three markings): use of locative (older usage); use of dative (innovative usage); rarer continuing use of locative.
Figure 3 abstracts away from the detail and shows the change running through time (left to right) and through the morphosyntactic environments (top to bottom). The split, at different periods, is determined both by the lexemes involved (primary and secondary) and by the featural specification of the secondary lexeme. And, as noted in §5.4, the change is running in the opposite direction from that elsewhere in Slavonic, notably in Belarusian.
We must accept a complex split here. If we enhance the part-of-speech inventory by partitioning the prepositions just to accommodate po 'for', we are no further along; we would still need to list the requirements of this unique preposition. Conversely, if we enhance the feature values to include a value locative-dative, we have again made no progress, since we would still need to list the conditions under which this value is realized as locative vs. dative. The split involves the preposition po 'for' and the case values it governs, the choice being determined both by the part of speech and by the featural specification of the governee. This case study demonstrates that a split can have multiple determining factors and can target parts of lexemes, seen as it works its way through the pronouns. It reminds us to be cautious of analyses of apparently neat splits, since they may be hiding greater complexity.
6. Splits in agreement
The agreement relation is found when the controller (the primary lexeme) requires the target (the secondary lexeme) to match its morphosyntactic specification. We again consider the four logical possibilities: lexemic and featural splits of the primary lexeme, and lexemic and featural splits of the secondary lexeme.
6.1. Agreement: primary lexeme: lexemic split
Showing that lexemes are split according to their agreement requirements seems easy: items higher on the animacy hierarchy distinguish number, while those lower do not (this is what Smith-Stark 1974 calls the 'plurality split'; Corbett 2000:54–132). But in many cases, it could be argued, the difference in agreement is a natural consequence of the noun's number behavior (shown, for instance, in inflection): those higher on the hierarchy distinguish number inflectionally, and so control agreement in number. It is possible, if only rarely, to pull the two issues apart: in the Chadic language Miya (mkf), inflection for number and agreement do not run entirely in parallel (data from Schuh 1989, 1998, discussed in Corbett 2006:177–79). Miya has two number values and two gender values; however, agreement targets have just three forms: masculine singular, feminine singular, and plural. The noun inventory is partitioned by animacy; hence this split is motivated. Animate nouns are those that denote 'all humans, most, if not all, domestic animals and fowl, and some large wild animals'. Nouns denoting large wild animals form an intermediate zone, and the remaining nouns are inanimate (Schuh 1989:175). When animate nouns are used of a plurality of entities, it is obligatory to mark plurality inflectionally. Furthermore, such plural-inflected animate nouns take plural agreement obligatorily.
It is the inanimates that are significant. With inanimates, inflectional marking of plural is optional, which is seen clearly in numeral phrases (Schuh 1989:175, 1998:198, 258). The surprising fact is that even when they are inflected as plural, inanimate nouns do not take plural agreement. The resulting agreement form is not a default form; rather, the target tracks the noun's gender as though the noun were singular.
(24) Miya (Schuh 1998:193, n. 6, 197)
The noun inventory is split: animates must take agreement in number, while inanimates cannot. The latter situation is not a failure to agree, since there is still agreement in gender. Agreement in number is determined by an animacy split, which follows the lexical semantics of the noun: it does not match inflectional marking, since inanimates can inflect for number but still cannot control plural agreement.
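The Miya pattern just described can be restated as a small decision rule. The following is a minimal sketch only, not an analysis from the sources cited: the function name `miya_agreement` and its parameters are hypothetical, and the intermediate zone of large wild animals is deliberately left out.

```python
def miya_agreement(animate: bool, gender: str, inflected_plural: bool) -> str:
    """Return the agreement form controlled by a Miya noun (simplified sketch).

    Assumptions (illustrative, not from Schuh's description):
    - nouns are either animate or inanimate; the intermediate zone is ignored;
    - gender is passed as "masculine" or "feminine".
    """
    # Animates: plural inflection is obligatory for plural reference,
    # and plural-inflected animates take plural agreement obligatorily.
    if animate and inflected_plural:
        return "plural"
    # Inanimates: plural inflection is optional, but even when present
    # the target tracks gender as though the noun were singular.
    return f"{gender} singular"
```

The point of the sketch is that `inflected_plural` has no effect on the outcome for inanimates: the split follows animacy, not inflectional marking.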
6.2. Agreement: primary lexeme: featural split
This type of split characterizes instances where some featural values of the primary lexeme require one value, and others another. Now we expect that the featural values of the primary lexeme will determine agreement; for example, we expect plural forms to control plural agreement (a point we return to in the discussion of Bayso below). But beyond this normal requirement there can be splits. Consider the Serbo-Croat16 noun oko 'eye' (sg.nom), which has the plural oči 'eyes' (pl.nom). This is irregular, since the consonant alternation k ∼ č [] before -i is not a synchronic alternation in the inflectional morphology. With oko 'eye', this old alternation follows the singular-plural divide; based on it the noun follows two different inflection classes. The relevance of this split is that it affects agreement, not in number but in gender.
(25) Serbo-Croat
This noun is split in the gender agreement it requires, and the split is a featural one, being determined by number: the noun is neuter in the singular (25a), and feminine in the plural (25b) (the only similar noun is uho 'ear', plural uši). This type of situation is called 'split agreement' by Dolberg (2019:52), who makes a natural extension from the internal split in the lexeme to the split in the agreement values that result.
A less usual split of this type involves a split in gender determined by case, in various Scottish Gaelic dialects. Thus in Leurbost, Isle of Lewis, according to Oftedal (1956:180), muð 'sea' and taLu 'earth, land, soil' are masculine in the nominative singular and feminine in the genitive singular; some dialects have the reverse situation. See Corbett 2015a:170–71 for details and sources, and for the interesting agreement hierarchy effects see Corbett 2022a:60–61.
A third case deserves mention here. In the Cushitic language Bayso (bsw), regular nouns when paucal require plural agreement, and when plural require masculine singular agreement. There are additional splits based on small numbers of lexemes (see Corbett 2019:88–92 for examples and analysis). This split is significant, since it undermines the assumption we tend to make that the 'internal', morphosemantic description of a cell matches its 'external' agreement requirement. We assume, for instance, that plural cells require plural agreement, and in the canonical situation this is true. But this does not hold in Bayso, nor indeed in Miya (24), where plural inflection does not necessarily imply plural agreement. Miya has a lexemic split based on the primary lexeme. But the split may also be a featural one and so belong here, as is the case in Bayso.
6.3. Agreement: secondary lexeme: lexemic split
This type of split can be seen particularly clearly in agreement with honorific pronouns. Consider Bulgarian (bul) polite address to one person.
(26) Bulgarian (Katina Bontcheva, p.c.)
As 26 shows, the verb is plural (both the auxiliary and past participle). In contrast, an adjective is singular.
(27)
Thus, the part of speech of the secondary lexeme is a key determinant (more detail in Corbett 2006:230–33). These data fall under a broader generalization, the predicate hierarchy, established by Comrie (1975) and discussed further in Corbett 1983:42–59, Wechsler 2011, Despić 2017, and Puškar-Gallien 2019.
6.4. Agreement: secondary lexeme: featural split
A split in agreement involving the featural specification of the secondary lexeme (the agreement target) seems impossible. After all, agreement implies that the featural specification of the target will match that of the controller. Our approach makes us search for the possibility of a split here, and we find one: gender and number agreement can be split, according to the case value of the target. We see this with the remarkable item deca 'children' in Serbo-Croat (its irregularities are detailed in Corbett 2011:120–21, 2022a:61–62). In brief, deca 'children' is the semi-suppletive plural of dete 'child'; it inflects like a feminine singular noun, and takes feminine singular and neuter plural agreement (if we include the personal pronouns, also masculine and feminine plural agreement). The choice is subject to a range of constraints (Corbett 1983:76–88, Wechsler & Zlatić 2012, Hristov 2013:336–41, 2021:62–102, and references there). The agreement target relevant for us is the relative pronoun. Consider first 28, where the relative pronoun stands in the nominative (determined by its role in the relative clause).
(28) Serbo-Croat
Koja 'who' is neuter plural here; this is the only specification that is consistent both with the plural auxiliary and the form of the participle. Contrast 29, with the relative pronoun in the accusative (functioning as object in its clause).
(29) Serbo-Croat (Corbett 1983:79)
In 29 the form is unambiguously feminine singular.17 Thus the hybrid noun controller deca 'children' is split in the agreement it controls on the relative pronoun, the split being determined by the relative pronoun's own case specification. I return to these data in §1 of the supplementary materials.
6.5. Agreement: summary of the evidence for splits
The data on agreement are summarized in Table 7.
Table 7. Agreement: summary of evidence for splits.
With agreement, as with government, we find each of the four possible types of split. The incongruent ones are striking, in different ways. The last type has the odds stacked against it and is rare, but even it is attested. Agreement provides additional evidence supporting our typology, given in supplementary materials §1. There I demonstrate that each element of our typology is essential, and hence the typology is minimal.
7. Splits in selection
In this relation, the selector, the primary lexeme, stipulates the presence of the selectee, the secondary lexeme (for example, 'depend' requires 'on'). Just as with government and agreement, there are four logical possibilities to consider.
7.1. Selection: primary lexeme: lexemic split
Here some lexemes have one requirement, others another. For example, French verbs are split according to the auxiliary they take:
• most take avoir 'have', including: écrire 'write', chanter 'sing', applaudir 'applaud', jouer 'play' …
• others take être 'be', including: arriver 'arrive', venir 'come', naître 'be born' …
Splits in the selection of auxiliaries are a major topic of research: in general, a split in this context is a split between verbs taking be and those taking have, as in French. McFadden 2007 remains a good general survey; see also Ledgeway 2014, 2019, Kailuweit & Rosemeyer 2015, Ackema & Sorace 2017, and Gregersen et al. 2017.
While research on auxiliaries dominates, there are further situations where primary lexemes are split according to their selectional requirements. Thus in Russian, most nouns select basic locational prepositions, as in the first row of Table 8. A smaller number take the second set of options, while nouns denoting humans take the third possibility.
Table 8. Russian prepositions: split in selection by the primary lexeme.
Note the interaction (§3.6): the noun selects the preposition, which governs its case value; for further data from Slavonic see Browne 2015. Another interesting example is the choice between en and à selected by geographical names in French: en France 'to France', but au Canada 'to Canada' and aux États-Unis 'to the United States' (see Miller et al. 1997:80–86). The related issue of expressions of time is discussed relating to 32 below.
7.2. Selection: primary lexeme: featural split
Here some featural specifications of the primary lexeme require one lexeme, some another. This too occurs in auxiliary selection. There can be a split according to tense, as shown by Ledgeway (2000:186; further examples pp. 201–2). The data come from Procidano, a peripheral variety of Neapolitan (nap). For the perfect, the have auxiliary is selected, while for the pluperfect the be auxiliary is required.
(30) Procidano (Ledgeway 2000:186)
(31)
Note that in Procidano the participle is the same across the tenses. Elsewhere we often find different forms of the primary verb.
As a second example, consider selection in Russian time expressions (Skoblikova 1971:120–34, Timberlake 2004:429–41, and Nesset & Makarova 2018). The relevant example involves the days of the week. The time expression splits its selection of preposition according to its own number (singular in 32a and plural in 32b).
(32) Russian
Again there is an interaction (§3.6): the noun selects the preposition, which in turn governs the case of the noun.
7.3. Selection: secondary lexeme: lexemic split
Here we consider Lithuanian (lit) numerals, concentrating on the lower numerals, the most frequent ones (Cerri 2019); these match the gender and case of the noun. In general, the numerals have recognizably plural forms (du 'two' and trys 'three' are somewhat exceptional) and can be said to agree in number. Pluralia tantum nouns, on the right in 33 and 34, select a different set of numerals (the 'cardinal plural numerals', cpn).18 These also match the gender, number, and case of the noun.
(33) Lithuanian (Ambrazas et al. 1997:95, 168, normal orthography)
(34)
The cardinal plural numerals are indeed a distinct set, selected by the pluralia tantum nouns: we cannot argue that they are just the plural of the ordinary cardinal numerals, because those are already plural.
The Russian situation seems initially similar, but it involves an additional split. As in Lithuanian, pluralia tantum nouns require a special set of numerals (traditionally the 'collective' numerals) for 'two', 'three', and 'four'. The motivation is that the normal cardinal numerals take the genitive singular, something a plurale tantum noun naturally does not have. In the oblique cases, however, numeral and noun stand in the same case, with the noun in the plural. Therefore, a special numeral is not needed, and the ordinary cardinal is normally used in modern Russian, as in 35 and 36.
(35) Russian
(36)
We find a split, in that pluralia tantum nouns select a special numeral for 'two', 'three', and 'four', but they do so only when the numeral stands in the nominative (or accusative = nominative). We thus have a split of the secondary lexeme, which is both a lexemic split and a featural split.
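The combined lexemic and featural condition can be made explicit in a short sketch. It is illustrative only: the function name `russian_numeral_set` and the string labels are hypothetical, and the sketch models just the pluralia tantum requirement described here, not other uses of the collective numerals.

```python
def russian_numeral_set(plurale_tantum: bool, numeral: int, case: str) -> str:
    """Pick the numeral set selected by a Russian noun (simplified sketch).

    `case` is the case of the numeral; "accusative=nominative" stands for
    an accusative that is syncretic with the nominative (hypothetical label).
    """
    direct_case = case in {"nominative", "accusative=nominative"}
    # Lexemic condition: only 'two', 'three', and 'four' are affected.
    # Featural condition: only when the numeral stands in a direct case.
    if plurale_tantum and numeral in {2, 3, 4} and direct_case:
        return "collective"
    # Elsewhere the ordinary cardinal is normally used.
    return "ordinary cardinal"
```

Both conditions must hold at once, which is why neither a purely lexemic nor a purely featural statement covers the data.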
7.4. Selection: secondary lexeme: featural split
Some (secondary) cells require one lexeme, some another. This is found frequently in Italo-Romance dialects, as described in detail by Loporcaro (2007, 2016), Ledgeway (2019), and Pescarini and Loporcaro (2022). The key point here is that the selection of the auxiliary (be or have) depends on its person and number specification.19 Our example is from San Benedetto del Tronto (Central Italy); data are from Štichauer 2018:7, following Manzini & Savoia 2005(II):682–83.
(37) San Benedetto del Tronto
In such instances we should ask whether we might be dealing with one suppletive lexeme, namely an auxiliary that has combined forms from two auxiliaries, previously distinct but now merged into a single lexeme. Štichauer (2018:6–8) responds to this concern directly. One cogent argument is that in this variety both verbs have full paradigms, with be functioning as a copular verb and have as a verb of possession (among other uses). Moreover, there are varieties with two possibilities in individual cells (as in Table 9, Vasto), and even three can be found, documented in Loporcaro 2007.
The pattern in 37, opposing third person to first and second, is common (see the survey in Pescarini & Loporcaro 2022); it is like that described for Eastern Abruzzese by D'Alessandro and Roberts (2010:43), for example. We might be tempted to look for a syntactic solution here, but this is just one pattern of many. The various patterns, in Abruzzo alone, are given in Table 9.
Table 9 demonstrates that related varieties can have a range of patterns of splits in their selectional requirements, based on the featural specification of the secondary lexeme; see also Hončová 2012 and Andriani 2016:146–49. A remarkably complex case of this general type, but involving reflexivity, is the selection of the auxiliary in French surcomposé forms, for which see Abeillé & Godard 2002:447–48; and for further data on the dramatic variety of Italo-Romance auxiliary selection see Loporcaro 2014.
Table 9. Auxiliary choice in Italo-Romance varieties spoken in Abruzzo (adapted from Loporcaro 2007:184, following Giammarco 1973).
7.5. Selection: summary of the evidence for splits
Table 10 summarizes the rich data on splits in selection.
Table 10. Selection: summary of the evidence for splits.
As with government and agreement, we see that each of the four possible types of split is attested. Here again the incongruent splits are remarkable; Italo-Romance illustrates phenomena that prima facie seemed unlikely.
8. Anti-government
Recall that anti-government is a relation implied by the interaction of the primitives underlying our typology of splits. Different featural specifications of the primary lexeme require the presence of different secondary lexemes. In the constructed examples from §3.4, it is the tense of the verb (present vs. past) that determines the requirement for different prepositions (at vs. in).
(38) Constructed examples
It seemed likely that there would be no actual occurrences. It is exciting that examples have been found, establishing anti-government as a phenomenon, including evidence of split anti-government. For each language I first demonstrate the existence of anti-government, and then present the split.
8.1. Splits in anti-government: primary lexeme: lexemic split
Here we examine Middle Mongol (thirteenth to fourteenth centuries); the data come from Benjamin Brosig's (2015) survey of the Mongolic family, and personal communications with him (May 2021). The basic pattern of anti-government involves negation. The negator ese is anti-governed by the perfective/past, ülü by the imperfective/nonpast, and büü by moods including the imperative. Examples are from The secret history of the Mongols (SH; thirteenth century); see Brosig 2015:71, n. 2, n. 3 for details of sources. In 39 the verb is in the factual past, and so the negator is ese, while in 40 there is a future participle, and this anti-governs ülü.
(39) Middle Mongol, SH §243 (Brosig 2015:71)
(40) Middle Mongol, SH §82 (Brosig 2015:71)
The negators appear before the verb; they can be separated from it by focus markers and can host the clitic question marker (as in 44 below). Example 41 shows a negated imperative.
(41) Middle Mongol, SH §170 (de Rachewiltz 2004:91, Brosig 2015:112)
Such negated imperatives (prohibitives) are structurally symmetric, in that dropping the negator would leave an acceptable positive command (Brosig 2015:112). We thus have a good representative of anti-government; the verb anti-governs different negators according to its own featural specification: perfective/past vs. imperfective/nonpast vs. moods including the imperative. (Further languages with different negators for the imperative, and where positive and negative commands are otherwise the same, can be found in van der Auwera et al. 2013.)
The split within this pattern of anti-government is between main and auxiliary verbs, and involves their respective ability to anti-govern. Suppose we have a main and auxiliary verb, whose tense-aspect values mean that they have different anti-government requirements. If the negator appears in front of the main verb, then that will determine the choice of negator.
(42) Middle Mongol, SH §255 (Brosig 2015:83)
In 42, it is the requirement of the main verb that prevails. However, if the negator appears between main verb and auxiliary, two outcomes are possible. In 43 the auxiliary wins out, while in 44 it is the main verb.
(43) Middle Mongol, SH §209 (de Rachewiltz 2004:142, Brosig 2015:84)
(44) Middle Mongol, SH §254 (de Rachewiltz 2004:186, Brosig 2015:84)
We see a lexemic split involving the primary lexeme: the main verb anti-governs a preceding negator, while if the negator precedes the auxiliary, the auxiliary may anti-govern the negator or its position as anti-governor may be usurped by the main verb. Thus Middle Mongol provides not only a fine instance of anti-government, but also an interesting split within it.
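The Middle Mongol pattern, both the basic anti-government and the split within it, can be summarized in a minimal sketch. The names `NEGATOR` and `possible_negators` and the three specification labels are hypothetical shorthand for the distinctions described above; the sketch says nothing about what decides between the two outcomes when the negator stands between main verb and auxiliary.

```python
# Basic anti-government: the verb's featural specification fixes the negator.
NEGATOR = {
    "perfective/past": "ese",
    "imperfective/nonpast": "ülü",
    "imperative-type mood": "büü",
}

def possible_negators(main_spec: str, aux_spec: str, negator_before_main: bool) -> set:
    """Return the negators compatible with the described pattern (sketch)."""
    if negator_before_main:
        # Before the main verb, the main verb's requirement prevails.
        return {NEGATOR[main_spec]}
    # Between main verb and auxiliary, either verb may anti-govern.
    return {NEGATOR[aux_spec], NEGATOR[main_spec]}
```

For instance, with a nonpast main verb and a past auxiliary, `possible_negators("imperfective/nonpast", "perfective/past", True)` yields only ülü, while the same call with `False` allows both ese and ülü.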
8.2. Anti-government: primary lexeme: featural split
Here the split is in the other dimension; some featural specifications of the primary lexeme require a further split (in addition to the anti-government requirement). Surprisingly, this is found in Andi. We saw anti-government in Andi in §3.4, but in the Rikvani dialect; the Zilo dialect provides even better evidence. The examples were generously provided by Steven Kaye, based on fieldwork with Umargadži Magomedov (August 2019). Here we see that the featural specification of the verb, namely aorist, habitual, or other (illustrated by the progressive), determines the item to be used for a wh-question.
(45) Andi (ani), Zilo dialect
This is remarkable: we have a three-way distinction (as opposed to two-way for the Rikvani dialect). Again these are not the only lexemes involved, since polar question words behave similarly, as Table 11 shows.
Table 11. Anti-government patterns in the Zilo dialect of Andi (Steven Kaye, p.c.).
Thus we have solid evidence for anti-government. Let us move on to the split. The past-tense versions of the habitual and of the progressive are periphrastic, formed with the aorist auxiliary -iʁi 'was, stayed'. Example 46 shows the auxiliary forming the past of the habitual; the main verb is marked as habitual, and the auxiliary shifts this to the past. Both agree in gender with the absolutive argument riʟ'i 'meat' (ʟ' is the lateral ejective; riʟ'i 'meat' belongs to the fourth of Andi's five gender values, and prefixal b- here does not distinguish number; Moroz & Verhees 2019:273–74).
(46) Andi (ani), Zilo dialect, elicited
If we question example 46, there is a surprise, as in 47.
(47)
In 47 there is no auxiliary. Similarly, the possible polar question words (Table 11 above) are missing. What we find instead is =ʁiro, which combines question with past.
Let us now consider wh-questions, to match our examples above. Take first the past habitual (we saw the present habitual in 45b). Here again there is no separate auxiliary, just the same question word =ʁiro, marking the past as well as the question.
(48)
The forms we would have expected, *Men-ni ib=k'o / ib=di ǯid-e b-iʁi?, which have the auxiliary b-iʁi (agreeing prefixally with the absolutive argument) and the two potentially acceptable wh-question words, are both ungrammatical. Hence we indeed have split anti-government. The split occurs when the verb stands in the past habitual: the expected aorist auxiliary and the question word give rise to a different form, conveying both past and question.
We see the same effect in the past progressive (see 45c above for the present progressive).
(49)
Again, the forms with the polar question words we might have expected, namely *Menni ib=ʁi / ib=di ǯidi-r b-iʁi?, are unacceptable. Thus the split in anti-government involves the past versions of habitual and progressive, for both polar and wh-questions. We find =ʁiro, marking the past as well as the question. (There is no past version of the aorist, so there are just the two possibilities that we have seen.) It is remarkable that, in addition to a clear example of anti-government, the Zilo dialect of Andi also provides evidence to demonstrate a split, which involves the featural specification of the primary lexeme.
8.3. Anti-government: secondary lexeme: lexemic split
The Uralic languages exhibit great variety in their negation strategies, in both the markers and their distributions (Miestamo et al. 2015). Erzya (myv) is particularly rich, and again illustrates both anti-government and an interesting split within it. Erzya is spoken in the Middle-Volga region of Russia; the data are from Hamari & Aasmäe 2015, and see also Turunen 2011:154–62. Erzya has no fewer than six negators, determined in part by the tense and mood of the verb. Some of these are illustrated in the following affirmative-negative contrasts.
(50) Erzya (Hamari & Aasmäe 2015:297–99)
The present tense has a paradigm of three persons and two numbers, indefinite as here and with further forms for the definite. These are negated with the negator a. The same is true of the past progressive/habitual (traditionally the 'second past' tense).
(51)
Now compare the normal past tense, where there is a different negator; this negator inflects fully, while the main verb stands in the 'connegative' form.
(52)
Finally we consider the imperative, which shows yet another negator.
(53)
The full set of feature values determining the choice of negators provides another good instance of anti-government. Our main point, however, is that in addition there is a split of the secondary lexeme, the negator. The negators show marked differences: for example, they may leave the main verb unaffected (as in 51), or they may inflect and take the main verb in its connegative form (52 and 53). For more on the complexities of negation within periphrasis, see Spencer 2013 and Bonami 2015, and for its interesting intricacies in Georgian Sign Language, see Pfau et al. 2022.
Another, rather different, example of this type of split is found in Livonian (liv), also from the Uralic family. Livonian has three inflecting negators, and these are split by word-order requirements as follows: äb and iz must appear immediately before the main verb, while the prohibitive alā/al- can be separated from it (Metslang et al. 2015:438–40, 443).
8.4. Anti-government: secondary lexeme: featural split
This (as yet unattested) configuration could occur in a language like Andi if, say, the question words for wh-questions were sensitive to the grammatical number of the constituent questioned. It is understandable why this split should be the hardest to find; a combination of factors is stacked against it. Anti-government is rarer than the other three host relations, so there are fewer places where the split could arise. Then splits of the primary lexeme exceed those of the secondary lexeme, with featural splits of the secondary lexeme being particularly hard to find. The corresponding type involving agreement (§6.4) is rare, yet agreement generally is well researched compared with anti-government. Hence at this point there seems no reason to attribute the lack of this type of split anti-government to anything more significant than a relative shortage of opportunities and researchers.
8.5. Anti-government: summary of the evidence for splits
Remarkably, we have found clear cases of anti-government, a relation implied by the underpinnings of our typology (§3.4). The data from Middle Mongol, the Zilo dialect of Andi, and Erzya are particularly telling. This progression from specifying the typology exactly, to subsequently identifying and highlighting significant data, shows the value of the exploratory nature of canonical typology (Bond 2013:24).20 Given the few instances of anti-government established to date, it is surprising that sufficient of them yield splits to cover three of the four possibilities. With this in mind, let us consider the summary of the evidence for splits involving the other three relations as well as anti-government, in Table 12.
Table 12 shows that the correspondence of attested splits to possible splits is strikingly close. Even the highly unlikely agreement split is attested (§6.4), as are the surprising incongruent splits in selection. The one gap involves anti-government, a relation that itself has just been established as a contribution of this article. In classical typology, we might start looking for explanations of why this type of split has not been found. Since we have, as yet, few examples overall for anti-government, we already have more split examples than might have been expected. Hence we should continue to look for the missing split; it is too early to say whether the distribution is significant. The key points are: (i) within the three well-known relations, all of the splits allowed for by the typology are attested; and (ii) the typology has established a new relation, anti-government, and within that three of the four theoretically possible splits have been found.
Table 12. Summary of main evidence for the typology of splits.
9. Related solutions
In laying out the issues clearly, showing the full range of splits, I aim to ensure that we can talk about the data constructively and consistently across frameworks. A danger is that a given type of split is tackled in a particular way, and by inertia other related (but subtly different) splits attract the same analysis, even when less appropriate. Conversely, some splits provoke lively debate (as with the great Latvian case controversy, and the Romanian gender polemic, both of which have gone on for decades).
The two types of split (lexemic and featural) on which the typology is built prefigure the main analytical choice. Given prima facie evidence for a split, there are two main strategies to attempt to accommodate the data without resorting to a split. First, we may suggest that the partitioning of lexemes can be made regular (for example, by extending the part-of-speech inventory, which for many linguists involves enriching the syntactic structure). Or second, we may extend the feature system (typically by adding feature values).
As an example, consider a split in the case frames required by different verbs. In the canonical world, the syntactic split (the case requirements) matches a semantic split (for instance, active vs. nonactive; Mithun 1991). Then it would make good sense to partition the category 'verb'. But often we face Saussurean arbitrariness; then some instinctively add more structure, suggesting that the anomalous examples form a distinct set in terms of structural properties, while others enhance the features in the lexical entries. A good illustration of this debate can be found in Loporcaro 2015 (on auxiliary choice).
Or consider the nominal lexicon. Many items can be readily assigned the part of speech 'noun', but some are noun-like to a greater or lesser extent. These were analyzed by Ross (1973), with batteries of tests. Numerals offer a striking picture, and the Slavonic languages are especially rich (Corbett 1983:215–40). One way forward is to establish subclasses of increasing delicacy. We can then capture the typological generalization that as the simple cardinal numerals get numerically larger, they become more noun-like. This is basically a part-of-speech solution. An alternative strategy enriches the feature system with various types of case values (lexical, quantitative, and structural), as for example in Babby 1987:114–17. There is an extensive literature, including Mel′čuk 1985, Franks 1995:93–219, 2023, Pesetsky 2013, and Ionin & Matushansky 2018:161–221.
With these different views in mind, namely the inclination to seek solutions preferably within the part-of-speech inventory or alternatively within the feature structures, I present three illuminating case studies in §2 of the supplementary materials. In Latvian there is an apparent split in the government requirement of prepositions; here strong arguments point to a featural solution (and no split). Split ergativity deserves discussion; I analyze a significant example, Guugu Yimidhirr, and show that here too the issues are in the feature structure and the inflectional morphology. The third case study, Aguaruna (Overall 2017), is important since it might appear to lie outside of the typology: it seems to need reference to the interaction of two arguments (Witzlack-Makarevich et al. 2016). Yet even Aguaruna can be accommodated, and this provides validation for the typology. All three can be found in supplementary materials §2.
10. Conclusion
There is a unifying notion to the various uses of 'split'. The lexicon divides into parts of speech, and there are cross-cutting regularities (features); a split is an additional partition, in either dimension. We defined the host relations involved, namely government, agreement, and selection, and the primitives we needed there implied a fourth relation, anti-government, where different featural specifications of the primary lexeme require the presence of different secondary lexemes. This new host relation has been documented in languages of various families. This result, brought about by the exploratory approach of canonical typology, echoes earlier work on internal splits. For these, a full typology was proposed (discussed briefly in n. 3), which included some apparently implausible types; but surprising instances were found, suggesting that the full range of possibilities was attested. Moreover, an additional phenomenon, repartitioning, has since been discovered in Soq (mdc), defined by contrast with the typology of internal splits (Daniels & Corbett 2019). While it is sobering to find that the real possibilities are more extensive than we imagine, this is preferable to artificially constraining our search in advance.
To define the four host relations, we started from the opposition lexeme vs. featural specification in asymmetric relations; hence we distinguish a primary and a secondary lexeme. Our typology of splits reuses the same minimal machinery. Each relation has a primary and a secondary lexeme, and for each a partitioning of the lexemes or a partitioning by featural specification is possible. There are therefore four types of split, in each of the four host relations, which gives sixteen theoretical possibilities. Thus we were able to construct an elegant typology, using minimal machinery and applying the same measures within and across languages (Round & Corbett 2020, Himmelmann 2022).
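The combinatorics just described can be made concrete with a short sketch. It is offered only as an illustration: the class and function names (`HostRelation`, `Element`, `Dimension`, `split_types`, `typology`) are hypothetical labels introduced here, not notation from the typology itself, and the code simply crosses the four host relations with the four split types.

```python
from enum import Enum
from itertools import product


class HostRelation(Enum):
    """The four host relations (labels are illustrative)."""
    GOVERNMENT = "government"
    AGREEMENT = "agreement"
    SELECTION = "selection"
    ANTI_GOVERNMENT = "anti-government"


class Element(Enum):
    """Which lexeme in the asymmetric relation is affected."""
    PRIMARY = "primary lexeme"
    SECONDARY = "secondary lexeme"


class Dimension(Enum):
    """The dimension in which the additional partition is made."""
    LEXEME = "partition of the lexemes"
    FEATURAL = "partition by featural specification"


def split_types():
    """Four split types: (primary | secondary) x (lexeme | featural specification)."""
    return list(product(Element, Dimension))


def typology():
    """Every cell of the typology: each host relation crossed with each split type."""
    return [(rel, elem, dim) for rel in HostRelation for elem, dim in split_types()]


# List the theoretical possibilities, one per line.
for rel, elem, dim in typology():
    print(f"{rel.value}: {dim.value} ({elem.value})")
```

The sketch lists theoretical possibilities only; it makes no claim about which cells are attested.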
This typology brings clarity to the discussions of splits. It also helps to balance the research agenda: previous work tended to concentrate on splits that involve the primary lexeme; here I have documented also the splits involving the secondary lexeme. And the notion of congruency (§4.2) makes clear why some splits are more easily recognized, while others are analytically more difficult. We have seen how some splits create major clefts in a language's grammar (as with splits in the verb lexicon according to the auxiliary selected; §7.1), while some involve individual lexemes (such as the Russian preposition po 'for'; §5.7). Whatever model of syntax we favor, it needs to be able to accommodate these data. Many splits are complex, in that they involve partitions of more than one type, and the interacting factors need to be carefully unpacked. The typology scales up readily to cover these. Instances were documented in the case studies, in languages that are genetically and geographically varied, from major world languages and from small and threatened languages.
A particular value of the typology is that it encourages us to be clear about what exactly is split, and what determines it. Expressions of the type 'split-X' are multiply ambiguous. We need to specify:
(i). what is partitioned,
(ii). the basis/evidence for the partition, and
(iii). the condition(s).
For instance, in the 'plurality split' (§6.1) there is: (i) a partitioning of the nominal lexicon, involving the primary lexeme; (ii) the basis is agreement in number (for the external split, the key evidence is that some items induce a singular-plural distinction for agreement and some do not); and (iii) it is determined by animacy, which is motivated. Hence we might call it a nominal split for number agreement, conditioned by animacy. 'Plurality split' is not fully adequate as a term, while other multiply ambiguous terms, like 'split ergativity', are confusing. Besides various external splits, this latter term is sometimes used to refer to an internal split, according to which different lexemes have different patterns of syncretism, but where there is no split in external requirements (see supplementary materials §2.2).
The four possible types of partition, essential for (i) above, are laid out clearly in our typology (§3); we distinguish lexeme vs. featural specification, and primary and secondary lexemes, giving four possibilities. The bases (ii) are the four host relations (government, agreement, selection, and anti-government) within which are numerous possibilities. The conditions (iii) may be categorical or gradient (§3.6); moreover they may be unclear and disputed in given instances, particularly in the extent to which they are motivated.
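One way to see how the three-part specification disambiguates a 'split-X' label is to treat it as a record with required fields. The following is a minimal sketch under that assumption; `SplitDescription` and its field names are hypothetical, chosen here for illustration, and the example instance restates the 'plurality split' of §6.1 as characterized above.

```python
from dataclasses import dataclass
from typing import Tuple

# Permitted values, following the typology's own terms.
HOST_RELATIONS = ("government", "agreement", "selection", "anti-government")
ELEMENTS = ("primary", "secondary")
DIMENSIONS = ("lexeme", "featural specification")


@dataclass(frozen=True)
class SplitDescription:
    partitioned: str             # (i) what is partitioned, e.g. the "nominal" lexicon
    element: str                 # (i) primary or secondary lexeme
    dimension: str               # (i) lexeme or featural specification
    basis: str                   # (ii) host relation supplying the evidence
    evidence: str                # (ii) e.g. the feature involved, such as "number"
    conditions: Tuple[str, ...]  # (iii) conditioning factor(s)

    def __post_init__(self):
        # Reject descriptions that fall outside the stated possibilities.
        if self.element not in ELEMENTS:
            raise ValueError(f"unknown element: {self.element}")
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")
        if self.basis not in HOST_RELATIONS:
            raise ValueError(f"unknown host relation: {self.basis}")

    def label(self) -> str:
        """Build an explicit label in place of an ambiguous 'split-X' term."""
        conds = " and ".join(self.conditions)
        return f"{self.partitioned} split for {self.evidence} {self.basis}, conditioned by {conds}"


plurality_split = SplitDescription(
    partitioned="nominal",
    element="primary",
    dimension="lexeme",
    basis="agreement",
    evidence="number",
    conditions=("animacy",),
)
print(plurality_split.label())
# nominal split for number agreement, conditioned by animacy
```

The record does not encode whether a condition is categorical or gradient, or whether it is motivated; those would need further fields.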
Elucidating what exactly is split leads to a sharpening of our analyses and applies across different traditions. And that is the hope and the prospect: that we can continue to share the terminology, widen our horizons to take in the full range of splits, as laid out here, and tackle them from our various perspectives. The splits in auxiliary selection in Romance engage those who are passionate about formal syntax and those who are passionate about the detail of Italo-Romance dialects. Other types of split, documented here, deserve equally committed investigation, from different wings of the discipline.
Literature and Languages
Faculty of Arts and Social Sciences
University of Surrey
Guildford, Surrey, GU2 7XH, UK
[g.corbett@surrey.ac.uk]
Footnotes
* The support of the AHRC (grant: AH/N006887/1: Lexical splits: a novel perspective on the structure of words) and of the ESRC (grant: ES/R00837X/1: Optimal categorization) is gratefully acknowledged. Versions of this paper were read at SLE 2017, Zurich, September 2017; 'New Fields for Morphology', Surrey, November 2018; 'Nouvelles approches de la typologie des systèmes flexionnels', Paris, November 2018; joint session of the American International Morphology Meeting (AIMM4) and Formal Approaches to Slavic Linguistics (FASL 28), Stony Brook, May 2019; 'Non-canonicity in Inflection', Surrey, June 2019; and the Department of Theoretical and Applied Linguistics, Moscow (virtually), April 2021. I thank members of those audiences for their comments. I am especially grateful to Peter Arkadiev, Laura Arnold, Jenny Audring, Matthew Baerman, Balthasar Bickel, Olivier Bonami, Patricia Cabredo Hofherr, Mae Carroll, Bernard Comrie, Hans-Olav Enger, Tim Feist, Mike Franjieh, Axel Holvoet, Steven Kaye, Adam Ledgeway, Alan Libert, Michele Loporcaro, Tore Nesset, Petr Rossyaykin, Erich Round, Pavel Štichauer, Anna Thornton, and Martin van Tol for their helpful suggestions at various stages, and to Penny Everson and Lisa Mack for their help in preparing the manuscript. Finally I thank John Beavers, Marlyse Baptista, and three Language referees for their substantial constructive input.
1. I follow the Leipzig glossing rules (https://www.eva.mpg.de/lingua/resources/glossing-rules.php); [] is for nonovert elements (as when information is inferred from the use of the bare stem), () marks inherent, nonovert feature values such as gender, > indicates agent acting on patient. Bold is a flag to draw attention to relevant characteristics of examples. Abbreviations follow Leipzig with some additions: 1, 2, 3: first, second, third person, i, ii, iii, iv: genders i, ii, iii, iv, abs: absolutive, acc: accusative, aff: affirmative, aor: aorist, art: article, aux: auxiliary, caus: causative, cng: connegative, colln: collective numeral, cop: copula, cpn: cardinal plural numeral, cvb: converb, dat: dative, decl: declarative, dem: demonstrative, dna: denominal adjective, erg: ergative, f: feminine, fact: factual (evidentiality), fut: future, gen: genitive, hab: habitual, imp: imperative, indir: indirect (evidentiality), inf: infinitive, ins: instrumental, ipfv: imperfective, loc: locative, m: masculine, multi_pl: multi-plural, n: neuter, neg: negator, nmlz: nominalizer, nom: nominative, npst: nonpast, obj: object, pfv: perfective, pl: plural, poss: possessive, prep: preposition, prf: perfect, prog: progressive, prs: present, pst: past, ptcp: participle, q: question word, rec_pst: recent past, rel: relative, sap: speech act participant, sg: singular, tam: tense aspect mood, vol: voluntative, wh: wh-question word.
2. In the 'stipulated lexicon' (Stump 2016:64–66), this specification may be redundantly available. Thus in a defaults-based approach, such as network morphology (Corbett & Fraser 1993, Brown & Hippisley 2012), the default specification might be that in a given language adpositions take the genitive. Then either this is true of all adpositions, or there is an override for those that take some other case value. We may wish to limit what is listed lexically (Pesetsky 1995:2–4), but see Jackendoff & Audring 2020:59–74 for an alternative view. Irrespective of what can be deduced from other properties and how much must be listed lexically, we need featural specifications, deduced or listed, of the type 'requires instrumental'. Such specifications are the baseline condition; the issue for splits is any specification of the type 'requires instrumental under condition X, and dative under condition Y'. Similarly, I take a realistic view of lexemes in assuming that part-of-speech information is specified, since even in the worst case—a language like English—this is straightforwardly available for most items (Potts 2008:358–60). The general approach to morphology is that justified in Anderson 2017, discussed in Spencer 2016 and Bonami et al. 2018:v–xiv.
3. Internal consistency deserves a brief mention, since research on that topic provides the model for our investigation of external consistency. Internal consistency concerns the structure of a lexeme's inflectional paradigm, the way in which a lexeme's featural specification is realized morphologically. Internally consistent lexemes have a simple mapping from the grid of feature descriptions (such as [past]) to the set of actual forms (like computed). We see this in two ways: (i) they have the same stem across their different forms (English compute has just one stem), and (ii) the inflectional material is shared across lexemes (the past-tense marker is shared with many verbs). But we also find inconsistent lexemes, showing phenomena such as suppletion (go ~ went), deponency, and defectiveness. Each of these phenomena has been investigated, leading to a specific typology for each. An original step was to take a more abstract view, analyzing together all of the phenomena that induce internal splits. The resulting typology proved surprisingly complete (Corbett 2015a). That typology started from the straightforward (consistent) lexemes, to establish a point in the theoretical space from which to calibrate the rich variety of actual examples. The Surrey Lexical Splits Database, which is based on that typology, is freely available at https://lexicalsplitsdb.surrey.ac.uk/. For dramatic examples of internal splits see Feist & Palancar 2021. I adopt the same techniques here: (i) taking a more abstract view of external splits than previously, in order to produce an overall typology, and (ii) using consistent lexemes as the baseline from which we calibrate. Internal splits can also be relevant, since they may induce external splits, as in 25 in §6.2 below (fuller analysis in Corbett 2022a). 
And keeping internal splits in mind will lead us to reassess examples treated previously as external splits (as in the analysis of Guugu Yimidhirr in supplementary materials §2.2, available at http://muse.jhu.g.sjuku.top/resolve/176).
4. Adopting this starting point is like agreeing to measure temperature from zero kelvins, rather than picking a mid-point and measuring both up and down (as with the Fahrenheit scale).
5. There is now a considerable literature, including a helpful survey (Bond 2019), a collection of papers (Brown et al. 2013), and an online bibliography at https://www.smg.surrey.ac.uk/approaches/canonical-typology/bibliography, which shows that the range is expanding steadily; as examples, Cormier et al. (2013) and Costello (2016) discuss sign languages from this perspective, and Kwon and Round (2015) analyze phonaesthemes.
6. While it is feasible and logical to represent part-of-speech information through features, those who do so often restrict the term 'feature' to morphosyntactic features (see Adger & Svenonius 2011 for helpful discussion); I follow that usage. Note that if a feature is purely inherent (Corbett 2012:66–68, 2013:55–56), handling it by further partitioning the parts of speech is an option. For instance, we could treat verbal aspect as a partition of the part of speech 'verb', or as a feature cross-cutting the verb lexemes. Here there can be useful discussion about the trade-off between the two. But where the feature has contextual uses, as gender and number do (shown by agreement), the duplication involved in partitioning the parts of speech gets unwieldy and misses the point, as argued by Sag et al. (2003:38–40) and Corbett (2012:1–4), among others. Indeed, the more fully orthogonal a feature is to the lexeme inventory, the stronger the case to recognize it as a feature. To see this, consider the extremes. If a proposed feature were needed for one lexeme only, most linguists would simply treat this as two lexemes, rather than expanding the feature inventory. At the other extreme, if a proposed feature applies to all or almost all lexemes (as number does in some languages), then there is no sensible alternative to accepting it as a feature (Corbett 2013:54).
7. The supplementary materials referenced throughout are available at http://muse.jhu.g.sjuku.top/resolve/176.
8. Our focus is external splits; for internal splits, see n. 3.
9. Our assumptions of consistency often hold good, so naturally we ask whether there is some analysis of the data that would avoid the need for a split, that is, whether one or another dimension of the analysis is not well grounded. For Turkish, we might postulate an additional part of speech, with four members. Or, in the other dimension, we might suggest an additional feature value, identical to the nominative under some circumstances and to the genitive in others. Such analytical trade-offs are a possibility to bear in mind: for some examples, they make sense; in others they would be a pointless complication (further discussion in §9). More generally, our reasonable assumption of consistency will lead us to find genuine and interesting splits. Indeed, the Turkish data point us in this direction.
10. We discuss the four relations as binary asymmetric relations determined by lexemes; the analysis generalizes to the constituents that are headed by these lexemes (see §3.5 on headedness). We build up from this base, since lexemes provide the most clear-cut types of split. And as we shall see, splits may be induced by single lexemes. Our original Turkish examples 1 and 2 involve just four specific lexemes with special government properties, and, as the fuller data in §5.6 below show, we need to also specify the pronouns affected. More generally, the data in §5.7 demonstrate clearly that splits can target (parts of) individual lexemes, hence the need to work at this concrete level.
11. I take the minimal baseline here to be syntactic. The alternative would be to attempt to give case values sufficiently rich meanings to determine the case value to be selected. The first option is the more practical one, since those case values that are more clearly semantic appear as adjuncts not constrained by a governor (see also the discussion of differential argument marking in the supplementary materials §3). If configurations of arguments determine a case frame, as in dependent case theory (Marantz 1991, M. Baker 2015, M. Baker & Bobaljik 2017), this is evidently less canonical government. For early discussion of issues with asymmetry in government see Mugdan 1991; and for more recent discussion of government see Pereltsvaig et al. 2018, Berrendonner & Deulofeu 2020, and Rudnev & Volkova 2020.
12. Dixon (1994:71–83) distinguished two idealized types: split-S systems, where each verb has a fixed syntactic frame determining the case value it governs, and fluid-S systems, where the single argument takes different case values 'depending on the semantics of a particular instance of use' (he cites Batsbi, bbl). This usage has caused confusion, since fluid-S is a type of split, yet it is used in opposition to split-S. The usage was modified by de Hoop and Malchukov (2007:1638–40), who extend the use of 'fluid' to object marking (as in Hindi, hin). For them, 'in fluid differential case marking, two forms exist for the same noun phrase in the same linguistic context' (2007:1639); the difference is in the semantic properties of the element that is case-marked. There are two problems. First, 'fluid' tends to imply variable or gradient, and gradience is not limited to semantic conditions. Second (though this is not an issue for de Hoop and Malchukov), researchers often use fluid-S to characterize whole language systems, which is problematic for languages like Batsbi (Holisky 1987), where there is a split in the marking of intransitive first- and second-person subjects, but some verbs typically take the ergative, some the absolutive, and some allow both (with semantic preferences). This is a split system, but 'fluid-S' is not an adequate label for it. Thus 'fluid-S' mixes notions that are best kept separate. The basic distinction is that we have situations where all items of a defined type are treated alike ('all intransitive subjects take the absolutive'). Against this, there are situations where the items are treated differently; this is a split. See Creissels 2008, Woolford 2017, and Witzlack-Makarevich & Seržant 2018 for further discussion of the relevant terms. For discussion of the difficulties raised by split-S systems for dependent case theory see J. Baker 2018:134–37. 
The substantial topic of differential argument marking is kept for the supplementary materials §3, which also includes an account of some of the substantial literature.
13. In this way we cover all of the 'core' uses of split. Some extend it further to include competition between constructions (as in 'split possession' (Stolz et al. 2008) or 'alienability split' (Ortmann 2018)). I stay with the core of splits, as defined in §2.3; for competition more generally see Corbett 2021 and references there.
14. Whatever the approach, an additional specification is required here; for instance, in dependent case theory, examples like 14 need to be specified as having inherent case. It is not only verbs in question; similar issues arise with adpositions. Thus, in Lithuanian (Ambrazas et al. 1997:406–7, 420–21) most prepositions take one case value (the most popular is genitive, then accusative, then instrumental). However, už 'behind, over, outside; later; by, for, etc.' takes genitive or accusative, as do a handful of others. Just one, po 'about, around, after', takes genitive, accusative, or instrumental.
15. Po is challenging in its various senses, and in different Slavonic languages as well—see for example Franks 1995, particularly on Russian, Serbo-Croat, and Polish. According to Mayo (1988), the main interest in the Slavonic family is with Belarusian (§5.4) and Russian, as discussed here. However, Polish is challenging when we consider the distributive use of po; see Łojasiewicz 1979, Przepiórkowski 1999, 2006, 2008, and Przepiórkowski & Patejuk 2013. For distributive po in Russian see Pesetsky 2013:79–80 and references there, and for temporal see Severskaia 2022. For general discussion of case alternations in diachrony see Kulikov 2013 and the sources he cites.
16. In accord with the 2017 'Deklaracija o zajedničkom jeziku' (http://jezicinacionalizmi.com/deklaracija/), I treat Serbo-Croat (hbs) as a pluricentric language, comparable to German or English, with four standards: Bosnian (bos), Croatian (hrv), Montenegrin (cnr), and Serbian (srp). See Corbett & Browne 2018 for a linguistic outline, and Bugarski 2012, 2019 for the literature on language status and sociolinguistic background.
17. Arsenijević and Gračanin-Yuksek (2016:5) discuss the partly similar item braća 'brothers', plural of brat 'brother', and claim that with this noun, in nonrestrictive relative clauses, the semantically agreeing form koje (accusative masculine plural) is acceptable, in addition to koju (accusative feminine singular); they do point out that some speakers find such sentences ill-formed (which is a problem for their analysis). In any case, since koju (accusative feminine singular) is possible, whether there is an alternative or not, this means that there is a split in agreement, determined by the case of the target, the relative pronoun. For further data on braća 'brothers' see Despić 2017:262–64.
18. According to Mathiassen (1996:86), however, the ordinary cardinal numerals 'seem to be preferred' in colloquial speech.
19. And not only on person and number. Ledgeway (2019:356) shows that person and number are typically relevant only when the auxiliary is in the present, and are overridden when TAM specifications are changed. For example, staying with the dialect of San Benedetto del Tronto, we find be generalized in the counterfactual perfect.
20. I have not found an instance requiring a different lexeme for absolutely every cell of the primary lexeme's paradigm, but this does not challenge the method, any more than the rarity of items at zero kelvins is a problem for physicists (compare also Piantadosi & Gibson 2014). Indeed, many of the examples of agreement we regularly discuss are not fully canonical—they do not show a unique form of the target for every specification of the controller. The value of the baseline is that it allows us to calibrate these examples accurately.