Seminar on Computational Semantics with Haskell

Parsing

July 6, 2011

Recursive Descent

The various parsers in this chapter are all deterministic simulations of a particular kind of non-deterministic pushdown automaton associated with context-free grammars. There are three well-known types of non-deterministic pushdown automata for CFGs that are used as bases of parsing: top-down, bottom-up, and left-corner (see here for more information). The one that's used here is the top-down variety, and the corresponding parsing method is also called "recursive descent". It is particularly natural in the context of functional programming. (Note that the term "stack-based" is often used to describe parsing based on NPDA.)

Well-known disadvantages of recursive descent parsing are its exponential time complexity and non-termination in the face of left recursion. Johnson (1995) addresses both issues in relation to implementations using functional programming.

Extraction

The implementation of the recursive-descent parser for sequentially indexed grammars given in the text is incorrect. The list of expected empty categories gets passed (i) top-down from a node to its first child; (ii) left-to-right from a node to its immediate right sibling; and (iii) bottom-up from a node with no right sibling to its parent. This allows binding of an empty category (trace) by a relative pronoun anywhere to its left, whether or not the relative pronoun "c-commands" the empty category.

*P> parses "the boy that Alice smiled helped"
[[.S[Masc,Sg,Thrd] [[.NP[Masc,Sg,Thrd,Nom] [the DET[],[.CN[Sg,Masc,Thrd] [boy CN[Sg,Masc,Thrd],[.COMP[] [that REL[],[.S[Thrd,Fem,Sg] [alice NP[Fem,Sg,Thrd,Nom],[.VP[Tense] [smiled VP[Tense]]]]]]]]]]],[.VP[Tense] [helped VP[Tense],# NP[]]]]]]

This implementation would actually be adequate for an unrestricted version of global index grammar (Castaño 2004).

Exercise 3

Fix the bug about extraction. (Hint: There is an obvious solution that passes the list of expected empty categories top-down using splitN, but there is a dual solution that passes the list of "discovered" empty categories bottom-up. Discuss how one method may be more efficient than the other. Note that in both approaches, you have to change the type of the stack parser.)
Take the corrected implementation of the parser for extraction, and add the rule \[ \mathbf{CN}\ \rightarrow \ \mathbf{CN}\ \mathbf{PP} \] in a way left-recursion does not arise, as with the \(\mathbf{CN}\ \rightarrow \ \mathbf{CN}\ \mathit{that}\ \mathbf{S}_{\mathbf{NP}}\) rule.

Modify the parser so that cases of extraction like the following are ruled out:

*P> parses "the boy that if smiled, then Alice smiled smiled"
[[.S[Masc,Sg,Thrd] [[.NP[Masc,Sg,Thrd,Nom] [the DET[],[.CN[Sg,Masc,Thrd] [boy CN[Sg,Masc,Thrd],[.COMP[] [that REL[],[.S[] [if COND[],[.S[] [# NP[Nom],[.VP[Tense] [smiled VP[Tense]]]]],[.S[Thrd,Fem,Sg] [alice NP[Fem,Sg,Thrd,Nom],[.VP[Tense] [smiled VP[Tense]]]]]]]]]]]]],[.VP[Tense] [smiled VP[Tense]]]]]]

Modify the parser so that sentences like the following are correctly parsed.

*P> parses "if the boy helped Alice and Alice smiled, then the boy admired Alice"
[]

Now your parser may parse sentences like "the boy that helped Alice and Alice smiled admired Alice". Implement the Coordinate Structure Island Constraint, by which such sentences are ruled out.
Now modify your parser so that sentences like "the boy that helped Alice and admired Dorothy smiled" get parsed. This is known as Across-the-Board extraction.

Links

Back to the seminar main page

Makoto Kanazawa

Last modified: 2017-04-06 19:08:12 JST