Unification Categorial Grammar:

or hesitations on the edge of NLP

Dominik Lukeš
http://www.dominiklukes.net

Praha/Edinburgh 1992

Contents
1. Introduction

2. A few paragraphs about unification basics

3. Categorial Grammars

4. Primitives of Unification Categorial Grammar

5. Semantic Representation

6. Some notes on phonology

7. Conclusions and prospects

8. Using UCG Outside NLP


In this paper, I would like to offer a description of a grammar formalism called Unification Categorial Grammar (UCG). It is a sign-based approach to NLP whose relevance extends virtually beyond the framework of computational linguistics. I will try to place the formalism in its historical context and to show how it draws on different sources from recent as well as not so recent history of linguistics.

In his book, Shieber (1986, p. 38) distinguishes two classes of formalisms: those that were designed primarily as tools and those devised with the intention of being linguistic theories. As its inventors declared, UCG was developed as ”a linguistic theory which could be implemented as a parser in a reasonably efficient manner” (Zeevat, Klein and Calder 1987). The present paper is devoted mainly to the theoretical aspects, in the sense that I will be concerned with the way the formal description of natural language is approached; the implementational problems involved will be touched upon only briefly when necessary. The linguistic viewpoint will not only make the description more intelligible at times but, I hope, will also allow me further comparisons and generalizations which might not be automatically associated with the formalism in more specialized contexts.

Bearing in mind that some of the elementary principles which UCG makes use of are comparatively little known, I will devote two introductory sections to a short description of them. In the first section I will be concerned with a few primitives of the method of unification as it is used in many contemporary linguistic theories (cf. Shieber 1986). In the second section I will briefly trace the history of the approach to natural language description known as Categorial Grammar (CG), postulate the basic rules of its ”pure form”, demonstrate in short how these can be used in practice and what obstacles they can meet, and summarize certain extensions (UCG being one of them) adopted towards their solution. The following section will show how the method of unification and the CG formalism are employed in the entire UCG formalism, through the notion of sign from HPSG. The fourth section will present UCG as a whole: the basic concepts introduced before will be enhanced by employing the semantic representation language InL, and some comparisons will be drawn with the theory of mental spaces and the generative lexicon. A few general phonological notes will be made, very briefly, in the fifth section. In the last section, then, some further practical and theoretical questions will be raised. I will also briefly mention an attempt made by Dunbar (1988) to use the UCG formalism for the implementation of a space grammar fragment.

Note on the scope of current analyses One may be surprised by the comparatively small range of examples used for illustration, seemingly incompatible with my claim about the linguistic nature of the present paper. But, as I stated above, it is the formalism that is the focus of the paper, only I will be concerned mainly with its linguistically theoretical advantages and flaws. Thus no attempt will be made to resolve any particular problem troubling contemporary linguistics. To offer even a short fragment systematically described would not be possible in a space as constrained as that of this article, and it is not my intention to do so. UCG is not a definitive theory of language and is certainly not able to account for all the phenomena by which language is constituted. The present form of the formalism will rather serve as a formalist stimulus triggering further questions concerning formal as well as theoretical linguistics. It should therefore be borne in mind that all the analyses in the present paper are concerned only with highly unmarked cases. By highly unmarked I mean that even in uses where the patterns might be used markedly they are artificially unmarked. One will have to differentiate between examples that are meant to illustrate the actual UCG formalism and those concerned with ”real” natural language phenomena.

So there will be only a few sentences that we will be able to analyze satisfactorily. First, primitive sentences as in (1)

(1) Kris jumps.
Kross found a mole.
and more intricate ones such as
(2) She loves him.
Kross found himself.

to illustrate, first, how the formalism can cope with intersentential anaphora and, in the second case, with reflexivity. For more elaborate examples one should refer to the works cited in section 4.

A few paragraphs about unification basics

Unification became quite popular over the past decade as a method which can provide linguists with powerful means for natural language description and processing while allowing them to maintain a lexicalist standpoint in the theoretical part of their efforts. The canonical work on unification is Shieber (1986), who summarizes the basics of the method and gives an essential comparison of many different theories which make use of it. In fact, all that can be found in this section is, in some form, present in his book as well. Recently, however, a vast number of investigations have been carried out which could be classified as unification-based; for an approach that parallels Shieber's see Johnson (1988), and a more up-to-date survey of the diversity of work can be found in Dorre et al. (1990). Here, I will try to sketch as much of the formal approach as I think is useful for introducing the notions, for further comparison with 'pure' CG, and for showing its use and precise definition in UCG.

Minimalist syntax

Syntax is, in the context of unification-based formalisms, hardly more than the concatenation of units, and since in the lexicalist view most syntactic information, i. e. the information on combinatorial constraints, is carried by the units themselves, the concatenation rules can be sufficiently expressed by context-free (phrase structure) rules, with the ”lexical” information added by annotating their nodes. The annotation takes the form of equations, as in this PATR-II rule

(3) S → NP VP

<S head> = <VP head>

<S head subject> = <NP head>

This rule generates what is called an annotated tree. However, UCG uses Categorial Grammar as its syntax device, so we can omit the combinatorial domain of Shieber's approach and investigate in somewhat more detail the domain of information, that is, its structuring and combination within and across lexical entries.

Structures of features

So-called feature structures are what provides us with information about the distinctive units of grammar, from whatever level we may choose, in a structured manner that is not only quite convenient but also theoretically and computationally desirable. A feature structure consists of features (or the names of the features, to be precise) and values assigned to them, so in

(4) (DNomPl) [case: nominative, number: plural]

the features are case and number with the respective values nominative and plural. The (DNomPl) is just a name for the given feature structure.

What is important here is that features can be assigned other feature structures as their values, as in

(5) [agreement: (DNomPl)] = [agreement: [case: nominative, number: plural]]

where (DNomPl) functions as the value of the feature agreement. Here we assign the value plural to the ordered pair of attributes <agreement, number>. Such an ordered pair is called a path, and the path in this case is assigned the value plural. It is obvious that in this way feature structures may become very complex, sharing many common features with the same values, and so, at this point, templates are introduced. They may be thought of as abbreviations of, or names for, regularly repeating feature structures. For example, we may call the feature structure [cat: V] simply V(erb). These templates are ordered in a hierarchy according to the principle of subsumption (see below); this is called an inheritance hierarchy. There are other ways to express more complex regularities, like default inheritance, used for inheriting only a constrained portion of information, and lexical rules (used also in UCG), serving the purpose of expressing morphological regularities like the passive. The underlying principles of all lexical organisation rules are, however, the same. One ought to refer to Shieber (1986) for more details.
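
Since feature structures will be referred to throughout, it may help to make them concrete. The following is a minimal sketch in Python, assuming feature structures are modeled as nested dictionaries; the structure and attribute names follow the examples above, but the encoding itself is my own illustration, not part of any of the cited formalisms.

    # A feature structure as a nested dictionary; DNomPl is the structure in (4).
    DNomPl = {"case": "nominative", "number": "plural"}

    # A feature may take another feature structure as its value, as in (5):
    D = {"agreement": DNomPl}

    def get_path(fs, path):
        """Follow a path (an ordered tuple of attributes) through a feature structure."""
        for attr in path:
            fs = fs[attr]
        return fs

    # The path <agreement, number> carries the value plural:
    print(get_path(D, ("agreement", "number")))   # -> plural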

Historical interlude: The unification method has its beginnings in the late seventies, in the theoretical work of Bresnan, which influenced Kay in designing the first 'true' unification-based grammar formalism (FUG). The concept of feature, however, is much older: it originated in phonology in the 1950s and earlier, and was later used for syntactic analysis by Chomsky in the 1960s (see Gazdar et al. 1985, chapters 2 and 5, for detailed discussion).

Unification itself

Unification is defined basically as the combination of information from two feature structures, carried out in order to obtain a new (target) feature structure including the information of both source feature structures. So, for instance,

(6) [agreement: [case: nominative]]
can unify with
(7) [agreement: [number: plural]]
to create a feature structure identical with that in (5).

In order to further specify the basic rules for unification we need to say a bit more about feature structures. The first important thing is that they can be reentrant, that is, two paths within one structure can actually share one and the same value. The second is the ordering of feature structures into a lattice where one structure subsumes another according to the specificity of the information it carries. Generally, the feature structure carrying more information is more specific, i. e. is subsumed by that with less information. For instance, DNom ⊑ DNomPl (where DNom specifies only the case); that is, the former subsumes the latter. A simple definition is given in (8).

(8) D ⊑ D′ iff D contains a subset of the information in D′

From this it follows that variables subsume all other feature structures and atoms subsume none. Subsumption is a very important relation, because we may use it to define unification as follows.

(9) D is the unification of D′ and D′′ (D = D′ ⊔ D′′) iff D′ ⊑ D and D′′ ⊑ D and D is the least such structure, providing that the information they contain is compatible.
In the opposite case unification is said to have failed.

Unification is also monotonic, that is, all the information contained in the source feature structures is also preserved in the target one.
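
The definition in (9) translates almost directly into code. Here is a minimal sketch, again under the nested-dictionary assumption from above; it treats atoms as strings and ignores reentrancy, so it is an illustration rather than a full implementation.

    # Unification of two feature structures; returns None on failure.
    def unify(d1, d2):
        if d1 == d2:
            return d1
        if isinstance(d1, dict) and isinstance(d2, dict):
            result = dict(d1)
            for attr, val in d2.items():
                if attr in result:
                    sub = unify(result[attr], val)
                    if sub is None:
                        return None       # incompatible information: failure
                    result[attr] = sub
                else:
                    result[attr] = val    # monotonic: information only grows
            return result
        return None                       # two distinct atoms never unify

    # (6) unified with (7) yields the structure in (5):
    fs6 = {"agreement": {"case": "nominative"}}
    fs7 = {"agreement": {"number": "plural"}}
    print(unify(fs6, fs7))
    # {'agreement': {'case': 'nominative', 'number': 'plural'}}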

We can now leave unification to its doom and turn our attention to the other component of UCG.

Categorial Grammars

A brief history of CG

Let us first briefly examine the development of thought within Categorial Grammar (CG); the reader should refer to Wood (1991) for a more detailed description. CG made its first appearance in Ajdukiewicz's (1935) article Die syntaktische Konnexität. His system was, however, primarily devised for the formal languages of mathematical logic, and linguistics first profited from it when the principles of his analysis were used by Bar-Hillel (1953) in developing an ”arithmetic-like” description for natural language. Bar-Hillel took Ajdukiewicz's notion of syntactic categories and combined it with the method of immediate constituent analysis. The principle of this approach was to assign each expression a category such that a single combinatory rule would suffice to check the syntactic connexity, or well-formedness, of a given string. Usually a small set of basic categories is taken; for the present purposes we accept those of N(oun), NP (noun phrase) and S(entence), which can combine with derived categories like N\S, assigned to intransitive verbs and carrying the information that, in order to obtain a string with the category of sentence, this word must be preceded by another one with the category N assigned to it. For instance, in

(10)  Pooh    thinks.
      NP      NP\S
      ---------------- BFA
            S
Pooh thinks is a syntactically connex string because the syntactic categories assigned to its constituents can combine to produce the category of sentence. The rule which applies here may be expressed generally as X X\Y → Y; the operation carried out is known as backward functional application (BFA), alternatively called cancellation because of its practical resemblance to the arithmetic notation for operations with fractions.

Theoretical background It is important that one be aware of the fact that categorial grammar, or its application to linguistics, was not originally developed as a theory of language but as a grammar formalism. Bar-Hillel (1953) introduced his presentation of CG not as a new theory but as a promising formal system in which linguists might find it comfortable to express their findings. There are, however, many theories of categorial grammar describing its mathematical properties, formal power or applicability; and it is the distinction between a theory and a theory of a formalism that I would like to draw here. The actual theory which is reflected by CG can be traced back to Frege's distinction between saturated and unsaturated expressions: arguments are the formal embodiment of the former, functors of the latter. This analysis is made in much more detail in Wood (1991).

Pure CG
Syntax

Let us now turn to a more precise definition of Categorial Grammar; for a more detailed mathematical definition and a discussion of the mathematical properties of CG, as well as further references to the vast literature currently being written on this topic, see Wood (1991). We first put constraints on the set of categories.

(11)a. If X and Y are categories then X/Y and Y\X are categories.

b. Nothing else is a category.

Here we need to differentiate between primitive, or basic, categories and derived ones. Derived categories work as functors and basic categories as arguments. Usually the arguments are N, NP, and S, and all other strings are ascribed functor categories and therefore play an active part in concatenation. Let us now define the operations which can be applied to concatenations of categories.

(12) a. X/Y Y → X (Forward Functional Application - FFA)

b. Y Y\X → X (Backward Functional Application - BFA)

Figure 1 Immediate constituent tree with CG categories as its nodes.

This is the definition of the ”purest” form of CG; let us now have a look at one more sample derivation. Consider a sentence with a transitive verb of category NP\S/NP.

(13)  The     hedgehog    carries    an      apple.
      NP/N    N           NP\S/NP    NP/N    N
      The hedgehog → NP                        (FFA)
      an apple → NP                            (FFA)
      carries an apple → NP\S                  (FFA)
      The hedgehog carries an apple → S        (BFA)

We can imagine this also as an immediate constituent tree, enriched with primitive dependency, as in Fig. 1. There may be different orders of carrying out the operations that still lead to assigning the category S to the above string, just as in immediate constituent analysis, but certain others are excluded on formal grounds. It should be noted that the active (functor) part of a derived category may be treated as a way of encoding valency slots, whereas the argument part represents the information about what kind of expression the given string is central to; in other words, it is the representation of the information about subcategorization and the head. (This resemblance to the traditional concept of valency was pointed out by Klein, e. g. Klein 1989.)
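
The whole of pure CG fits into a few lines of code. In the following minimal Python sketch an atomic category is a string and a derived category a (result, slash, argument) tuple, so that a category A\B (argument A, result B) becomes (B, "\\", A); the lexicon follows example (13), while the encoding itself is my own assumption.

    # FFA: X/Y Y -> X, and BFA: Y Y\X -> X, as in (12).
    def apply_cats(left, right):
        if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
            return left[0]                      # forward functional application
        if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
            return right[0]                     # backward functional application
        return None                             # not syntactically connex

    DET = ("NP", "/", "N")                      # the, an: NP/N
    TV = (("S", "\\", "NP"), "/", "NP")         # carries: NP\S/NP

    # Derivation of (13): The hedgehog carries an apple.
    subj = apply_cats(DET, "N")                 # The hedgehog -> NP
    obj = apply_cats(DET, "N")                  # an apple -> NP
    vp = apply_cats(TV, obj)                    # carries an apple -> NP\S
    print(apply_cats(subj, vp))                 # -> S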

Semantics One of the features regarded as very attractive in CG is its direct way of attaching semantics to syntactic operations. Functor words are regarded as functions applied to the semantics of the argument expressions; see e. g. Steedman (1985) or Wood (1991) for more detailed discussion. The rule for the semantic operation may then be

(14) A/B:ℱ B:β → A:ℱ(β)

This is achieved by β-reduction on λx[ℱ(x)]. The semantics of (13) would then be

(15) S: λx λy [Def(x), Indef(y), carries(x, y)] (hedgehog′, apple′)

This seems to be well in accord with the principle of compositionality in its ”strong form”, which holds that the meaning of a complex linguistic expression is a direct function of the meanings of its constituents.
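
In code, rule (14) is nothing more than function application standing in for β-reduction. A minimal sketch, representing meanings as Python lambdas and tuples; the term shapes are my own illustration:

    # The transitive verb's semantics takes the object's meaning first,
    # then the subject's, mirroring the order of syntactic application.
    carries = lambda y: lambda x: ("carries", x, y)

    # beta-reduction is ordinary function application, as in rule (14):
    vp_sem = carries("apple'")       # semantics of "carries an apple"
    s_sem = vp_sem("hedgehog'")      # semantics of the whole sentence
    print(s_sem)                     # ('carries', "hedgehog'", "apple'")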

Unification and Categorial Grammar

Although some of the implementations of CG within the framework of unification-based formalisms are mentioned below, I would like to point out here the general similarity of the two approaches. There should be no harm in regarding functional application as the unification of the information on syntactic valency in the concatenation of two constituents. So, for example, the concatenation of a string with category NP/N with one of category N, as in the example below,

(16)  the     solution
      NP/N    N
      --------------- FFA
           NP

gives a unified string with all the relevant information gained by the application of the two categories. The difference here is that this ”unification” is not monotonic; that is, not all the information is preserved as it is in the formalism presented in section 2.

Some Obstacles

It was noted already by Bar-Hillel (1964) that CG in its pure form, sometimes called the AB (Ajdukiewicz-Bar-Hillel) calculus, is not sufficient for a complete description of natural language. The most serious problems are caused, for instance, by long distance dependencies, parentheticals, etc. Some of the possible solutions to these problems are briefly discussed in the next subsection. Another set of problems presented by Bar-Hillel is that of assigning categories to lexical entries. He argues that it would be necessary to increase the number of basic categories in order to distinguish various kinds of nominals, like singular and plural or animate and inanimate, and that it would still be difficult to ”force natural language into the straitjacket of the immediate constituent model.” He offers a solution based on basic insights from transformational grammar. A similar way of solving these problems is to add a feature-structure-like encoding of the necessary information.

Some of the possible extensions to the AB calculus
Lambek calculus

Another way of overcoming the above mentioned problems is outlined in the framework of the so-called Lambek calculus. This approach achieves the desired result from within the framework of CG by extending the number of operations allowed on categories. The method was envisaged by Lambek (1958), and what is called the Lambek calculus is just the set of operations of the general form given below, together with those of AB as in (12).

(17) (X\Y)/Z → X\(Y/Z) (Associativity - A)

X/Y Y/Z → X/Z (Functional Composition - FC)

X → Y/(X\Y) (Type Raising - TR)

X/Y → (X/Z)/(Y/Z) (Division - D)

All these combinatorial rules may have backward versions as well; parentheses are inserted to mark precedence. These rules were first presented in Lambek's article, motivated by difficulties such as handling sentences with personal pronouns, like he likes him, with S/(N\S) assigned to subject pronouns, (S/N)\S to object pronouns, and N\S/N to transitive verbs.

(18)  he          likes     him
      S/(N\S)     N\S/N     (S/N)\S
      he likes → S/N                     (FFC)
      he likes him → S                   (BFA)

The rule of type raising was then adopted in order to capture the fact that pronouns and noun phrases may behave categorially in the same way. In other words, what was needed was the general acceptance of the combination of functors with functors and the possibility of arguments becoming functors. These operations were further used, e. g. by Steedman (1985), for the analysis of discontinuous sentences like

(19) These apples the hedgehog must have been carrying.

using functional composition and the rule of subject type raising of the form NP → S/(NP\S). Other ”long distance” phenomena are allowed for as well. For a more detailed discussion of the Lambek calculus and extensions of its semantic representation, based on a system reminiscent of modal logic, see Morrill et al. (1989); other properties are used as an active part of Steedman's CCG (Combinatory Categorial Grammar, e. g. Steedman 1985). Division also allows one to relate the applicability of sentence modifiers to predicates.
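
Two of the operations in (17) can be added to the earlier category sketch in a few lines. This is a minimal illustration under the same tuple encoding; it reproduces the first step of derivation (18):

    # Type raising, X -> Y/(X\Y): an argument becomes a functor
    # over its former functor.
    def type_raise(x, y):
        return (y, "/", (y, "\\", x))

    # Forward functional composition, X/Y Y/Z -> X/Z.
    def compose(left, right):
        if (isinstance(left, tuple) and left[1] == "/" and
                isinstance(right, tuple) and right[1] == "/" and
                left[2] == right[0]):
            return (left[0], "/", right[2])
        return None

    he = type_raise("N", "S")                  # N -> S/(N\S)
    likes = (("S", "\\", "N"), "/", "N")       # N\S/N
    he_likes = compose(he, likes)              # -> S/N, as in (18)
    print(he_likes)                            # ('S', '/', 'N')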

Dependency in CG

This is a brief note on Barry and Pickering's (1989) work on constituency and dependency in CG. Their main concern was the implication of the Lambek calculus that every string can be assigned a category: the type-constituent correspondence hypothesis. This is what causes big processing problems and in many ways makes the formalism unsuitable for linguistic concerns. Their alternative is to abandon the above hypothesis and impose further constraints on the constituent structure by what they call dependency constituency. In their framework only dependency preserving (DP) derivations are allowed, and the resulting constituents are called dependency constituents. A derivation is DP if it conforms to the head (functor) - dependent (argument) relations. So, for instance, in

(20) Pooh thinks that Heffalumps like honey.

the substrings Pooh thinks, Pooh thinks that, that Heffalumps, etc. are dependency constituents, while thinks that Heffalumps and Pooh thinks that Heffalumps are not. We cannot go any further into the implications of the proposals here, but it should be clear that various unclear issues must be settled, as, for instance, the very notion of the head - constituent relation. Note that similar ideas appeared already in Bar-Hillel (1953): he would probably call something similar thoroughly connex categories, although the motivation is slightly different, Barry and Pickering being interested in the direction of the process of derivation; and names could sometimes be regarded as dependency constituents, albeit it is not as easy as that and more investigation is needed. Barry and Pickering's remarks have to be taken into account in UCG as well, the more so as they are inspired by certain insights of HPSG's notion of head and dependent.

Outside solutions

All the approaches described above to solving the problems of CG as an immediate constituent grammar aimed to preserve the relative independence of the CG formalism and improve its plausibility for NL description somewhat from within the formalism itself, for instance by devising new rules for combining categories. However, by such extensions CG not only gained expressiveness but also lost some of its elegance and became rather opaque. The path followed by others was simply to take CG as such, employ the best of it, and supply its deficits by other means. This was done (apart from UCG), for instance, in HPSG, see Klein (1989), or in other unification-based formalisms, like those used by Uszkoreit (1986) or Karttunen (1986). It has further been argued by Bouma (1987) that Steedman's theory is not able to account for certain phenomena like extraposition. He, following Uszkoreit (1986), proposes to use a unification extension of CG, representing categories as feature structures with respective values for the features argument, functor and directionality. The problem of gaps may then be solved, following GPSG and HPSG, for instance by adding a new feature-value pair gap. In UCG, however, the use of CG is more literal; its unification-friendliness, as pointed out above, is fully exploited, and the enrichment consists in extending the structure of atomic categories. Moreover, the semantic representation is completely reworked.

Primitives of Unification Categorial Grammar

The Sign in HPSG

The reason why HPSG (Head-Driven Phrase Structure Grammar) is particularly interesting is its claimed turn back to the threshold of European linguistics: Pollard (1985) says that he attempted to translate Saussure into the jargon of contemporary computational linguistics. I do not intend to investigate the adequacy of this claim here but rather to show how the Saussurian notion of sign is practically employed in the framework of a unification-based formalism. The position held by Pollard and Sag (1987) is that we can express the dichotomy signifiant vs. signifié as a feature structure like

(21) [phonology: tri:, semantics: TREE′]

with attributes phonology and semantics, here with the values [tri:] and TREE′ respectively. Pollard and Sag, however, do not merely take the Saussurian theory of sign for granted: they incorporated syntactic information into it as a third attribute, so that they arrived at the general appearance of the sign as in (22) (Pollard and Sag 1987, pp. 51 ff.).

(22) [phonology: ..., syntax: ..., semantics: ...]

HPSG is a prototypical example of a unification-based formalism and, as in PATR-II, we can see already here the possibility of imposing constraints on syntax from the level of semantic representation (see section 5).

The Core Notions of Unification Categorial Grammar

In the description of UCG I depended, for the theoretical part, mostly on the several papers that defined the formalism: Klein (1988), Zeevat, Klein and Calder (1987), and some others concerned with extensions of the UCG formalism, like Klein (1987), Bird (1991), Moens et al. (1989) or Zeevat (1991). I will refer to particular works only where they differ among themselves or offer some interesting ideas not explicated in the others; the formalism in which the presentation is given is unified according to that developed in the previous sections.

Categories and Signs

Now we will finally turn to the entire UCG formalism. First I will slightly change and extend the definition of the set of categories given in section 3, according to the CG definition provided by Zeevat, Klein and Calder (1987), where HPSG's sign comes into use. The sign here represents a complex linguistic expression with three levels of representation. In this section, the levels of phonology and semantics will remain at an intuitive level. The somewhat enlarged set of categories in UCG will now be as follows.

(23) a. Primitive (basic) categories are N, NP, S.
b. Any primitive category is a category.
c. If X is a category and E is a sign, then X/E is a category.

The active part of a complex category may contain variables, and variables may also occur at the levels of phonology and semantics. This makes the CG way of handling information yet closer to the unification-based approaches. An actual sign may then look like this

(24) w jumps
S\(w:NP:x)
jump’(x)

Every basic category may be further specified by a value attached to it in the attribute-value manner. The attributes regarded as sufficient for English by Klein (1988) are as in (25).

Zeevat, Klein and Calder (1987) allow prepositional features for NPs, like TO, BY, OF, FOR, and a more extensive range of V forms including Passive etc. The feature specifications are inspired by those used in GPSG (Gazdar et al. 1985). It is obvious that for most languages other than English this selection of categorial features would not be sufficient, and it is a sad truth that linguistics still does not have an adequate theory of grammatical relations, of the way units enter into them and of how information is passed over; moreover, we are not quite sure what a linguistic unit is. However, as already pointed out, the ability of UCG to use semantic information in the domain of syntax may be, and actually is, employed here, so that the manifold features playing a role in agreement, like gender, animacy etc., are expressed by semantic sorts (see section 5).

(25)

Zeevat, Klein and Calder (1987) also posit a fourth ”level” of representation for order, with the possible values PRE and POST, which we can for present purposes handle by a bidirectional version of CG, so that ”/” will have the role of placing the functor's phonology before its argument and ”\” will then stand for the value PRE. From now on the active part of the category will always be to the right of the argument part. More should be said about variables. They can be used at each level of representation as well as for the whole sign; they cannot, however, range across levels. Some kind of ”intuitive” sorting may be, e. g., that given in (26).

(26) E = (W:C:S)

(26) can be thought of as the most abstract sign, i. e. the variable E, with the part enclosed in parentheses as its ”implicit” structure that can be ”type raised” to maintain monotonicity in unification. Variables can also stand for features: in C[F], F is a variable standing for an arbitrary feature, while the underscore character ”_” will stand for particular features left unspecified. We will have more to say about sorted variables in the section concerned with semantics.
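
To make the three-level sign concrete, here is a minimal Python sketch; the Sign class, the convention of uppercase strings for variables, and the tuple encoding of complex categories are my own assumptions, not part of the official formalism:

    from dataclasses import dataclass

    @dataclass
    class Sign:
        phon: str      # phonological level (here: orthography, possibly with variables)
        cat: object    # a basic category, or a (result, slash, argument-Sign) tuple
        sem: str       # semantic level (InL comes later; a plain term for now)

    # The sign for "jumps" in (24): S\(w:NP:x)
    jumps = Sign("W jumps",
                 ("S", "\\", Sign("W", "NP", "X")),
                 "jump'(X)")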

Functional application revisited

As the units postulated in UCG have more complex structures than those of pure CG, the basic (and only) combinatorial operation, functional application, requires a more precise definition, in which unification plays a crucial role, unlike in CG where, as I have tried to show, it is contained only implicitly. Unification as defined for the purposes of UCG consists of two successive operations, instantiation and stripping. If we suppose that signs with complex categories work as functors and signs whose categories are basic play the role of arguments, then we may define instantiation as follows.

(27) The functor sign is instantiated with respect to its argument if it is the result of the unification of the functor's active sign with the argument sign.

Instantiation of the sign fails if the unification with its argument fails.

(28) Stripping the sign E with category X/Ea consists of the concatenation of the phonological representation of E with that of Ea and stripping off the active part of E's category.

The reformulated rule for functional application is then given in (29):

(29) Ef Ea → E iff E is the result of stripping Ef instantiated with respect to Ea (i. e., Ef Ea → E iff E = Ef ⊔ Ea)

A sign conforming to the properties given above is called a wellformed sign; in the words of Zeevat, Klein and Calder (1987), p. 197:

The set of wellformed expressions can be defined as the phonologies of the set of wellformed signs. These in turn can be defined as the closure of the lexicon under functional application.

Now we have roughly explicated all the rules for combining the information contained in linguistic units and are in a position to approach a few examples of lexical entries and derivations over them.

Sample derivation Let us show in some detail how the process of functional application is carried out, following step by step the definitions given above. Consider a simple sentence as in (30).

(30) Kris jumps.
The subject sign will be as in (31)
(31) Kris
NP[3,Sg,Subj]
Kris’

and the sign of the intransitive verb is as in (32), which is (24) with syntactic feature specifications added to the categories as below.

(32) w jumps
S[Fin]\w:NP[3,Sg,Subj]:x
jump’(x)

For the sake of illustration we will go through the whole process of the unification of (31) and (32). So (33) is the instantiation of (32) with respect to (31):

(33) Kris jumps
S[Fin]\Kris:NP[3,Sg,Subj]:Kris’
jump’(Kris’)

And (34) is the result of stripping (33); at the same time it is the sign of the sentence Kris jumps, the result of the functional application of (31) and (32).

(34) Kris jumps
S[Fin]
jump’(Kris’)
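
The instantiate-and-strip cycle can be mocked up on top of the Sign sketch given earlier. This toy version binds variables by simple string substitution instead of full unification and omits the feature specifications, but it reproduces the derivation (30)-(34):

    def apply_sign(functor, argument):
        result_cat, _slash, active = functor.cat
        if active.cat != argument.cat:          # instantiation: categories must unify
            return None
        # bind the active sign's variables to the argument's phonology and semantics
        phon = functor.phon.replace(active.phon, argument.phon)
        sem = functor.sem.replace(active.sem, argument.sem)
        return Sign(phon, result_cat, sem)      # stripping: active part dropped

    kris = Sign("Kris", "NP", "Kris'")
    print(apply_sign(jumps, kris))
    # Sign(phon='Kris jumps', cat='S', sem="jump'(Kris')")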

Semantic Representation

Now we come to the pride of UCG, its semantic representation. The formalism is named InL, standing for Indexed Language, and was developed primarily for UCG. The main influence on this formalism is to be sought in Kamp's DRT (Kamp 1981); that is why I will first briefly investigate some of its basic principles.

Discourse Representation Theory (DRT)

Kamp's primary intention was to bring together truth-conditional theories of meaning and what we might call the representational approach. This leads him to the following characterization of truth.

A sentence S, or discourse D, with representation m is true in a model M iff M is compatible with m. The compatibility of m with M can be defined as the existence of proper embedding of m into M. The proper embedding is a map from the universe of m into that of M which, roughly speaking, preserves all the properties and relations which m specifies of the elements of its domain.

Kamp (1981), p. 278

That is, in order to determine the truth conditions of any sentence we must take into account its structure, which is a reflection of the model of the universe it represents. The theory seems to be particularly successful in analyzing phenomena like pronominal anaphora and conditionals. It is also apparent that such a formulation of semantic theory is ideal for unification-based formalisms like UCG.

Anaphora The strategy for pronoun resolution in DRT is that every noun phrase is matched to one reference marker, and the possible antecedent of an anaphoric pronoun is then selected from the set of available reference markers. Let us show how this is done in the DRT formalism, considering a simple two-sentence discourse as in (35).

(35) a. The Hedgehog has an apple.
b. He eats it.

The discourse representation (DR) of (35 a) is shown in (36) and that of (35 b) in (37).

(36) [x,y].[The Hedgehog(x), an apple(y), x has y]

(37) [x,y].[x eats y] << [].[]

What we have here is the DR formalised as a list of reference markers and a list of conditions, in the form [R].[C]; if an anaphoric pronoun occurs, the ”<<” designates that some preceding discourse is needed in order to properly instantiate its reference marker. The [] sign stands for an unspecified variable. The comma is used for conjunction, and the only other connective of the formalism is ”→” for implication. The representation of the whole discourse is then given in (38).

(38) [x,y].[x eats y] << [x,y].[The Hedgehog(x), an apple(y), x has y]

This is called a discourse representation structure (DRS). The most primitive DRS consists of just one single DR. It is important to note that reference markers range across the boundaries of particular clauses or sentences; their boundaries are those of the discourse.
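
To see the mechanics, here is a minimal Python sketch of DRSs as (referents, conditions) pairs, with ”<<” modelled as merging the anaphoric DRS into the preceding discourse. The resolution strategy, simply pairing unresolved markers with the earlier ones in order, is a naive placeholder of my own; real DRT constrains antecedents by accessibility:

    def resolve(antecedent, anaphoric):
        """Merge an anaphoric DRS into the preceding one, binding its markers."""
        ante_refs, ante_conds = antecedent
        ana_refs, ana_conds = anaphoric
        mapping = dict(zip(ana_refs, ante_refs))      # naive antecedent choice
        conds = [c.format(**mapping) for c in ana_conds]
        return (ante_refs, ante_conds + conds)

    drs_36 = (["x", "y"], ["Hedgehog(x)", "apple(y)", "has(x, y)"])
    drs_37 = (["u", "v"], ["eats({u}, {v})"])          # unresolved markers u, v
    print(resolve(drs_36, drs_37))
    # (['x', 'y'], ['Hedgehog(x)', 'apple(y)', 'has(x, y)', 'eats(x, y)'])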

Conditionals, quantification, negation The treatment of conditionals is based on the same principles as that of anaphoric information, only in order for a sentence of the form A → B to be true, every proper embedding of A's DR m must be extendable to an m′ which is the DR of A and B. So in the case of the sentence

(39) If The Hedgehog has an apple, he eats it.

the DR is the same as in (38). For counterfactuals, however, it would be necessary to postulate some additional connectives to express the presupposition of nonexistence.

Similarly, we might express quantificational phenomena and negation, as shown in (40) and (41).

(40) a. ∀ X
b. [x].[X(x)] → [].[]

Thus we may see how reference markers also represent the quantificational scope of an NP. The description of indefinite NPs is the same, as their universality is presupposed (see Johnson and Klein 1986).

(41) a. ¬ X

b. [x].[X(x) → ⊥]

Here ⊥ means a necessarily false formula (see Klein 1987). I think I have now roughly presented the most important theoretical and formal features of DRT that are relevant for the description of InL. It is important to note that this formalism is monotonic, that is, no information is lost during discourse processing.

InL

Indexed Language is based roughly on the kind of formalism outlined above. The basic difference is that each expression introduces its own index. Expressions are sorted according to ontological types, which is formally expressed by dividing variables into sorts. Sorts are ordered into a subsumption lattice of the kind used in feature structure formalisms. An example of such a lattice is given in Fig. 2. This sample should be viewed as a grossly oversimplified fragment, having in fact little in common with the real world or language, but in this context quite sufficient for illustration, as an example of a simple ad hoc world. This one is borrowed from Klein (1987); a much more detailed description is offered by Zeevat, Klein and Calder (1989). Examples illustrating the selection of sorts are given in (42).

(42) a. mole, in the hole: object (x, y)
b. water: mass (m)
c. x walked: event (e)
d. dropping: process (p)
e. is nice: state (s)
f. tomorrow: unrestricted (a)
g. computers: Pl object (X, Y)

There are a few other things that should be made clear. First, in spite of its appearance, the sort lattice is not a tree: it is possible to reach particular sorts by more than one path. For instance, mass objects are at the same time neuters in English. Second, particular sorts should be regarded as templates standing for the whole path leading to them; thus e. g. the sort female ought to be read as <UNRESTRICTED:COUNT:SINGULAR:FEM>. At this point I would like to note that Pustejovsky's (1991) notion of qualia structures could be used to elaborate the description of semantic structures further; moreover, it could contribute to a linguistically more sound encoding of combinatorial restrictions, for instance using methods outlined in Copestake and Briscoe (1991), which are based on typed feature structures and use partial sorts and deterministic inheritance. The notation differs further from DRT in that the indices are not stacked in a list at the beginning but are prefixed in square brackets to the expression by which they are introduced.

Figure 2 Subsumption lattice for sorts in InL
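
A sort lattice of this kind is easy to model and query. The following minimal Python sketch encodes each sort by its parent links and checks subsumption by walking upward; the lattice fragment is an ad hoc illustration loosely based on the sorts mentioned above, not the actual lattice of Klein (1987):

    # Each sort points to its parents; MASS having two parents is what
    # makes this a lattice rather than a tree.
    PARENTS = {
        "objectual": ["unrestricted"], "eventual": ["unrestricted"],
        "count": ["objectual"], "mass": ["objectual", "neuter"],
        "singular": ["count"], "plural": ["count"],
        "fem": ["singular"], "masc": ["singular"], "neuter": ["singular"],
    }

    def subsumes(general, specific):
        """True if `general` lies on some upward path from `specific`."""
        if general == specific:
            return True
        return any(subsumes(general, p) for p in PARENTS.get(specific, []))

    print(subsumes("count", "fem"))      # True: FEM abbreviates <...:COUNT:SINGULAR:FEM>
    print(subsumes("singular", "mass"))  # True: mass objects are also neuter/singular
    print(subsumes("eventual", "fem"))   # False: incompatible sorts will not unify
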
UCG lexical entries

We are now able to show the final appearance of UCG lexical entries and to illustrate the advantages of this enriched semantic representation language. (32) with its full semantic representation is then as in (43).

(43) w jumps
S[Fin]\w:NP[3, Sg, Subj]:x
[p][PRES(p), jump(p, x)]
This can unify with anything of the sort SINGULAR (i. e. <OBJECTUAL:COUNT:SINGULAR>).
Let us now proceed to more complex examples.

Transitive verbs and modifiers This example will serve as another illustration of how lexical entries are constructed for some other syntactic categories. Let us consider the sentence (44).

(44) Kross found a mole.

The signs of the transitive verb, the indefinite article and the noun are given in (45a) through (45c), respectively.

(45) a. w1 found w2
S[Fin]\w1:NP[_, _, Subj]:COUNT / w2:NP[_, _, Obj]:OBJECTUAL
[e][PAST(e), find(e, COUNT, OBJECTUAL)]

b. a w
(C|(C|NP[3, Sg, _]:[a]S):[a]S) / w:N[F]:x
[x][S(x)]

c. mole
N[Sg]
mole’(x)

The type-raised category with variables is assigned to NPs as in (46) and (47) below.

(46) Kross
C|(C|NP[3, Sg, _]:[a]S):[a]S
[x][Kross(x)]
(44) is obtained in three successive steps, outlined in (47), (48) and (49).
(47) a mole
C|(C|NP[3, Sg, _]:[a]S):[a]S
[x][INDEF(x), mole(x)]

(48) found a mole
S\w:NP[_, _, Subj]:COUNT
[e][PAST(e), find(e, x, y), [y][INDEF(y), mole(y)]]

(49) Kross found a mole.
S[Fin]
[e][[x][Kross(x)], PAST(e), find(e, x, y), [y][INDEF(y), mole(y)]]
Anaphora and reflexivity
(50) She loves him.
(51) Kross found himself.

These sentences are good examples of anaphoric relations, inter- and intra-sentential respectively. We can choose between treating pronouns as plain NPs sorted according to their gender and number, or as active on the level of semantics. One example of the latter is given in (52).

(52) she
C|(C|NP[3, Sg, Subj]:[a]S):[a]S
[FEM][x << [FEM][x’]]
The first sentence then comes out as,
(53) She loves him.
S[Fin]
[s][[FEM][x << [x’][FEM]], [s][love’(s, x, y)], [y][y << [y][MASC]]]

Reflexivity can be dealt with either by introducing an identity relation, as in Klein (1987), or by allowing for an occurrence of an ”active” sign in the semantic description; see (54) and (55) for an example.

(54) Kross found himself.
S[Fin]
[e][[MASC][Kross(MASC)], PAST(e), find(e, MASC, y), [MASC2][Kross:NP[3, Sg, Subj]:MASC(y)]]

(55) himself
C\NP[3, Sg, Obj]:MASC
[MASC][MASC] << [NP[3, Sg, Subj]:[MASC][]]

This might be useful, for instance, for the description of collocations or even more rigid freezes.

We could go on with analyses of other phenomena forever, improving our formalism in many respects, but I think that what I have presented here is sufficient to illustrate the governing principles.

Some notes on phonology

So far, I have paid little attention to the topmost level of representation in the UCG sign. The example sentences used throughout the previous sections were those of written English; hence the so-called phonological representation served as the level where the actual concatenation of words was performed. This was quite sufficient for demonstrating the way UCG works, but it is by no means everything that must be dealt with if we want to put forth a valid analysis of language phenomena.

One of the achievements of the present formalism is considered to be its ability to express syntactic constraints by means of semantic representation. More generally, the formalism allows constraints to be stated across levels. Nothing inhibits us, then, from stating concatenation restrictions at the level of phonetics and/or phonology. It is well known that prosodic requirements must often be met by the syntactic structure of the sentence, and there is no better example than the ”non-configurational” languages, to whose syntax we have paid no attention at all throughout this paper. For instance, the phrasal verb rule that pronouns must always be inserted immediately after the verb may well be of a prosodic rather than purely syntactic character, see (56).

(56) a. I handed my paper in.
b. I handed it in.
c. I handed in my paper.
d. *I handed in it.

It is not only the combinatorial domain, however, that is closely related to the representation of form; semantics is also twinned with phonology. One very good example of this is topic-focus articulation where, especially in English, stress is frequently the only means of disambiguation. The UCG formalism was applied to this fact by Bird (1991), to whose starting claim that ”... the utterance is simultaneously a well defined phonological unit and a well-defined semantic unit” the UCG formalism simply suggested itself.

The problem of interpretation nevertheless remains: we still have to decide what data we shall treat as relevant, that is, what is to be encoded in the formalism. This is not to be settled by myself, but it is a question that must be asked, and it will be addressed again in the next section.

Conclusions and prospects

In this paper I attempted to present Unification Categorial Grammar and to put it in a somewhat broader, or different, context than is usual. UCG is a unification-based formalism which employs the principles of Categorial Grammar in its syntactic and combinatorial part. It uses a version of DRT with sorted variables (indices) for its semantic representation. Its two recognized main advantages are its ability to express combinatorial restrictions on the semantic level and its relative extendability. Throughout the description I also made comparisons with other theories and formalisms, most importantly the theory of mental spaces (Fauconnier 1985) and Pustejovsky's theory of the generative lexicon. In this final section I would like to make some further general remarks on the psychological and linguistic validity of the method of unification, to mention briefly an attempt made by Dunbar (1988) to use the UCG formalism for space (cognitive) grammar, and to discuss in short what may prove to be obstacles to such efforts.

Psychology of unification

Psychological validity is often taken as a gain for any theory about language processing, which is quite natural bearing in mind that humans are, after all, the best natural language processors so far. Hence many NLP theorists shield their statements about language by reporting the findings of psychologists. A psychological parallel also accompanies the dispute between the generativist and non-generativist camps in linguistics.

In his book on pattern recognition, D. W. J. Corcoran (1971) distinguishes between two kinds of perception/recognition theories, the active and the passive ones. The former define perception as ”some function of a match conducted between a stored or generated pattern and the input” (Corcoran 1971, p. 201), while the latter equate it with the analysis-resynthesis process. Both kinds of theories meet a number of serious problems which arise if we try to apply them to real perception/recognition events. All of these are too complex to be fully investigated here, and they stimulate other questions connected with the very foundations of cognitive science.

Unification falls into the first class. The consequence is that it has proper means of analysis only with predefined lexical entries, a fault also mentioned below, and thus it is not able to change the setting of the lexicon dynamically. However, the unification method seems to have been preferred lately.

Although passive theories have the advantage of dynamicity, the generative approaches are usually not concerned with its various aspects. I believe that the two streams should merge somewhere. My favourite example is the well known fact, from information theory and elsewhere, that the processing of language by humans is not homogeneous. For instance, when we read, we perceive and comprehend whole chunks of text, but we stop upon encountering an expression for the first time (ever); then we must mount the analysis process, following the rules for generating sounds from letters. Similarly, no one can be considered a speaker of a language (natural or other) on the basis of a passive knowledge of rules alone. This reason seems strong enough to make one start contemplating a combined approach. See below.

Cognitive Grammar and UCG

In this section I will briefly mention Dunbar's (1988) attempt to capture certain concepts of Langacker's cognitive (or space) grammar in the UCG formalism. It will be a rather general prospect of the limitations of the symbolic approach to NL analysis, rather than a detailed report on Dunbar's work, which would have to be torn out of context.

Langacker summarized the theoretical background of Cognitive (or Space) Grammar (SG) in his paper from 1982 and developed it in his book of 1987. Of his manifold insights, most seem directly relevant here in the light of what has been said above. Most important for elucidating the previous section is the concept of unit. Unit status can be acquired by any linguistic structure, depending on psychological factors of language acquisition, and a grammar is then defined as a ”structured inventory of conventional linguistic units” (Langacker 1982, p. 25). Structured, according to him, means that ”some units function as parts of other units” (ibid.). This, of course, places the possibilities of UCG close at hand.

Another distinction to be made is that between compositionality and analyzability. Zeevat, Klein and Calder (1987) take as one of the merits of UCG the fact that it is in accord with the principle of compositionality. This principle states that the meaning of the whole is composed of the meanings of its parts. Unquestionable as it may seem, this does not hold in all cases: the principle is subjected to severe limitations whenever it encounters metaphorical and/or idiomatic expressions (cf. Lakoff 1987). It is doubtful whether the meaning of a sentence like (57) is really a mere composition of the meanings of its parts, namely the, papers, covered, the case.

(57) The papers covered the case.

The question is what we are to consider a proper part of a whole. The answer is that, apart from pragmatic considerations, it must be the whole itself. It is indeed quite often the very fact that some units form a whole that constitutes its meaning. This view, moreover, clashes with the processual characteristics of human NLP.

At this point the notion of analyzability comes to mind as it was introduced by Langacker (1982).

Dunbar finds UCG suitable for the formalisation of SG not for any of its particular analyses of real NL phenomena; he rather takes it as a useful formalism. All the above mentioned properties of the marriage of CG with unification, and the complex representation of the interplay of form and meaning, take place in his study of determiners. We might say that UCG served as a metaphor for thinking about the processual characteristics of language.

What made UCG so attractive? If we read the proposals of both parties we can hardly help noticing the vast differences between their aims and methods: the former is on a quest for a parser while the latter seeks psychological adequacy. Nevertheless, UCG provides possibilities for the formal encoding of combinatorial restrictions without any necessary commitment to what Langacker called the linguistic archetype; that is, syntax is not treated independently as a self-contained domain. UCG is also non-generativist in nature. The investigation of the correspondences between the two theories will undoubtedly deserve further attention. However, it must be borne in mind that UCG, despite all its conceivable enlargements, will remain a symbolic approach and thus will have to face the problems attached to it, as sketched in the previous sections. Any symbolic model will sooner or later stand before the unsolvable problem of interpretation, and investigators will multiply categories beyond any measure with few results. For instance, the subsumption lattice in section 5 is no more than primitive; it is extended in Moens et al. (1989), but the extensions will never end. The problem, as I see it, lies in a wrong positing of the relationship between system and process, competence and performance. For if we keep inventing categories, the only outcome will be that the unification-based approach will end up having a different lexical entry for every word to be found in texts, and the generativists will find themselves with regularity achieved by positing a new rule for generating every single utterance. In other words, most symbolic approaches lack dynamicity, and it is possible that stochastic modelling will take their place, because it does not use rigid categories imposed on text by scientists; that is, it gathers knowledge of the system through the direct analysis of the process and not by analyzing it drained through a hypothetical system. The bootstrapping problem may indeed prove crucial to modern linguistic (or generally semiotic) thought.

There is, moreover, the problem of categorization, which is indeed crucial to modern linguistics and is newly approached by some cognitive grammarians. The UCG formalism does not represent an obstacle in this respect because it is open to any kind of categorization approach. It is not very dynamic in its current layout, but it can be made so, or at least the principles governing its functioning will help us reach the edge.

Using UCG Outside of NLP

This paper was originally conceived as a purely theoretical enterprise with no particular concern for any particular language. However, having been offered the chance to publish these meditations in as narrowly specialized a volume as the present issue, I felt that I should relate it more closely to the efforts of linguists working with dead languages, and I came up with the following suggestions.

First, the formalism outlined above is particularly good for storing complex grammatical information from manifold levels; moreover, it has already been implemented as a parser. The number three is by no means magical in connection with the sign: depending on each linguist's goals, the number of levels can be increased, as it has been with the level carrying information about order. The formalism could easily be related to some of the recent efforts towards computerized corpora of ancient texts and towards composing lexicons suitable for complex grammatical analysis. If we so desire, we can statistically analyze not only the graphical properties of the available texts but also syntactic information, which is usually beyond the reach of mechanical examination. Egyptian, for instance, faces a problem of the factual interpretation of hieroglyphic, let alone hieratic, texts, where such matters as the reduplication of radicals, which was not always expressed in the original script, are at stake. With the set of categories available it would be quite easy to verify the syntactic connexity of certain constructions and to validate the consistency of syntactic predictions on quantitative grounds. The computerization of Egyptian texts also imposes on us the problem of transliteration. Here the need for computer-readable corpora clashes with the demands of human investigators. The transliteration used so far has been hopelessly incomplete, always reflecting the intentions of its author, and it can be sensibly used only along with the original text, which is, when working with printed materials, a rather tiresome activity. The use of computers, supported moreover by the implementation of a formalism akin to that outlined above, will doubtless simplify matters to a considerable extent.

Acknowledgments

I worked on this paper mostly during my stay at the Centre for Cognitive Science in Edinburgh, which was made possible for me by a Tempus grant in 1992.

Ajdukiewicz, K. (1935) Die syntaktische Konnexität. Studia Philosophica 1, 1-27. English translation in: Polish Logic: 1920-1939, ed. by Storrs McCall, pp. 207-231, Oxford University Press, Oxford.

Bar-Hillel, Y. (1953) A quasi-arithmetical notation for syntactic description. Language 29, 47-58.

Bar-Hillel, Y. (1964) Some Linguistic Obstacles to Machine Translation. Chap. 6 in Language and Information. Reading, Mass.: Addison Wesley.

Barry, G. and M. Pickering (1989) Dependency and Constituency in Categorial Grammar. In G. Barry and G. Morrill, eds., Studies in Categorial Grammar, Vol. 5, pp. 23-46. Edinburgh: Centre for Cognitive Science.

Bird, S. (1991) Focus and Phrasing in Unification Categorial Grammar. In S. Bird, ed., Integrating Phonology, pp. 3-34. Edinburgh: Center for Cognitive Science. DYANA Deliverable.

Bouma, G. (1987) A Unification-Based Analysis of Unbounded Dependencies in Categorial Grammar. In J. Groenendijk, M. Stokhof and F. Veltman, eds., Sixth Amsterdam Colloquium, pp. 1-19, Institute for Language, Logic and Information, University of Amsterdam, Amsterdam.

Briscoe, T. and A. Copestake (1991) Lexical Operations in a Unification-Based Framework. In Lexical Semantics and Knowledge Representation, SIGLEX-ACL Workshop.

Corcoran, D. W. J. (1971) Pattern Recognition. Harmondsworth: Penguin Books.

Dorre, J., A. Eisele, J. Wedekind, J. Calder and M. Reape (1990) A Survey of Linguistically Motivated Extensions to Unification-Based Formalisms. In J. Wedekind, ed., A Survey of Linguistically Motivated Extensions to Unification-Based Formalisms, pp. 1-43. Edinburgh: Centre for Cognitive Science. DYANA Deliverable.

Dunbar, G. L. (1988) The Cognitive Lexicon. PhD thesis, Centre for Cognitive Science. University of Edinburgh.

Fauconnier, G. (1985) Mental Spaces: Aspects of Meaning Construction in Natural Language. Cambridge, Mass.: MIT Press.

Gazdar, G., E. Klein, G. Pullum and I. Sag (1985) Generalized Phrase Structure Grammar. London: Basil Blackwell.

Johnson, M. (1988) Attribute - Value Logic and the Theory of Grammar. Stanford, Ca.: Centre for the Study of Language and Information.

Kamp, H. (1981) A Theory of Truth and Semantic Representation. In J. A. G. Groenendijk, T. M. V. Jansen and M. B. J. Stokhof, eds., Formal Methods in the Study of Language, Vol. 136, pp. 277-322. Amsterdam: Mathematical Centre Tracts.

Karttunen, L. (1986) Radical Lexicalism. Report CSLI-86-68, Centre for the Study of Language and Information, Stanford, Ca. Paper presented at the Conference on Alternative Conceptions of Phrase Structure, July 1986, New York.

Klein, E. (1988) Topics in Unification Categorial Grammar. Edinburgh: Center for Cognitive Science.

Klein, E. (1987) DRT in Unification Categorial Grammar. In B. G. T. Lowden, ed., Proceedings of the Alvey Workshop on Formal Semantics in Natural Language Processing, March 6 1987, University of Essex.

Klein, E. (1989) Notes on CG and HPSG. ms.

Lakoff, G. (1987) Women, Fire and Dangerous Things, or What Categories Reveal about the Mind. Chicago: University of Chicago Press.

Lambek, J. (1958) The Mathematics of Sentence Structure. American Mathematical Monthly 65, 154-170.

Langacker, R. (1982) Space Grammar, Analysability, and the English Passive. Language 58, 1.

Langacker, R. (1987) Foundations of Cognitive Grammar, Vol. 1, Theoretical Prerequisites. Stanford: Stanford University Press.

Moens, M., J. Calder, E. Klein, M. Reape and H. Zeevat (1989) Expressing Generalizations in Unification-Based Grammar Formalisms. In Proceedings of the 4th Conference of the European Chapter of the Association for Computational Linguistics, pp. 174-181, University of Manchester Institute of Science and Technology.

Morrill, G., N. Leslie, M. Hepple and G. Barry (1989) Categorial Deductions and Structural Operations. In G. Barry and G. Morrill, eds., Studies in Categorial Grammar, Vol. 5, pp. 23-46. Edinburgh: Centre for Cognitive Science.

Partee, B. H., A. ter Meulen and R. E. Wall (1991) Mathematical Methods in Linguistics. Dordrecht: Kluwer Academic Publishers.

Pollard, C. J. and I. Sag (1987) Information-Based Syntax and Semantics, Vol. 1: Fundamentals. Stanford, Ca.: Centre for the Study of Language and Information.

Pollard, C. J. (1985) Lectures on HPSG. Unpublished lecture notes CSLI.
Pustejovsky, J. (1991) The Generative Lexicon. Computational Linguistics 17(4), 409-441.


Shieber, S. M. (1986) An Introduction to Unification-Based Approaches to Grammar. Stanford, Ca.: Centre for the Study of Language and Information.

Steedman, M. (1988) Combinators and Grammars. In R. Oehrle, E. Bach and D. Wheeler, eds., Categorial Grammars and Natural Language Structures. Dordrecht: Reidel. Paper presented at the Conference on Categorial Grammar, Tucson, June 1985.

Uszkoreit, H. (1986) Categorial Unification Grammars. In Proceedings of the 11th International Conference on Computational Linguistics, pp. 187-194, Bonn University, Bonn. Also CSLI Report CSLI-86-66.

Wood, M. M. (1991) An Introduction to Categorial Grammars. Technical Report Series UMCS-AI-91-12-5, Department of Computer Science, University of Manchester, Manchester.

Zeevat, H., E. Klein and J. Calder (1987) An Introduction to Unification Categorial Grammar. In N. J. Haddock, E. Klein and G. Morrill, eds., Edinburgh Working Papers in Cognitive Science, Vol. 1: Categorial Grammar, Unification Grammar, and Parsing.

Zeevat, H. (1991) Aspects of Discourse Semantics and Unification Grammar. PhD thesis, University of Amsterdam.