Chart Parser
by Jim Chapman
Home| Where we live | Jim| Isabel| James| Edward]
//
// Chart.h
//
// Who When What
// --- ---- ----
// Jim Chapman 15 May 97 Released.
//
// This file holds the declarations for the Chart Parser application's core classes,
// together with comments giving a general description of the program.
//
// How to Use the Program
// ======================
//
// User-Interface The application displays a dialog box containing a text control
// -------------- labelled 'Sentence', a 'Parse' button, and a 'Quit' button. Hit
// the Quit button to terminate the application, otherwise enter a
// sentence in the text control and then hit the Parse button. The sentence text must
// must be in lower case, the words must be separated by space characters and the text
// should contain no punctuation characters. The program will parse the sentence and
// will put up a dialog box displaying the results. If it could not parse the sentence,
// the dialog box will contain a message to that effect. Otherwise, it will contain a
// picture of the parse tree of the sentence. If the sentence was ambiguous (i.e. it
// could be parsed in several different ways), then there will be a spinner control
// allowing you to switch between the various parse trees. Dismiss the results dialog
// box by using its 'OK' button.
// The grammar and lexicon supplied will allow the system to parse sentences like
// "i saw the man in the park with an umbrella through my telescope" or
// "the cat sat on the green mat and ate the mouse".
//
// The Grammar File The program loads its grammar from a file called 'grammar.txt' in
// ---------------- its local directory. This file contains grammar rules, one per
// line. Blank lines and lines beginning with a ";" character are
// ignored. Each rule begins with a head category, which is followed by 0, 1 or many
// sub-categories. The grammar-loader does not understand meta-symbols like [, ], {, },
// (, ), +, *, etc. The program assumes that the head category of the first rule is the
// 'sentence category' - i.e. it will look for parse trees for that category spanning
// the entire input sentence. An example grammar file follows:
//
// <Cut here>
// |
// |
// V
// ; A trivial grammar for a subset of English:
//
// S NP VP
// S S CONJ S
// NP DETopt ADJstar N PPstar
// NP NP CONJ NP
// VP V NPopt PPstar
// VP VP CONJ VP
// PP P NP
// NPopt
// NPopt NP
// DETopt
// DETopt DET
// PPstar PP PPstar
// PPstar
// ADJstar ADJ ADJstar
// ADJstar
//
// If the program cannot find this file or cannot read it, it will display a message
// box complaining about this fact, and will then terminate.
//
// The Lexicon File The program loads its lexicon from a file called 'lexicon.txt' in
// ---------------- its local directory. This file contains lexical entries for all
// the words in the parser's vocabulary. Blank lines and lines
// beginning with a ";" character are ignored. Other lines are considered to be
// lexical entries. A lexical entry consists of a word (which should be in lower-case),
// followed by the names of one or more lexical categories to which that word belongs.
// The items in a lexical entry should be separated by spaces. An example lexicon file
// follows:
//
// <Cut here>
// |
// |
// V
// ; A lexicon for a trivial subset of English:
//
// the DET
// a DET
// an DET
// and CONJ
// or CONJ
// bob N V
// man N
// park N
// umbrella N
// telescope N
// mat N
// cat N
// mouse N
// i N
// car N
// saw V N
// sat V
// ate V
// park V
// went V
// gave V
// green ADJ N
// brown ADJ
// happy ADJ
// sad ADJ
// fat ADJ
// my ADJ
// your ADJ
// his ADJ
// through P
// on P
// in P
// with P
// of P
// to P
//
// If the program cannot find the lexicon file, or it cannot read it, or if the file
// is badly formatted, it will display a message box complaining about this fact, and
// will then terminate.
//
// Overview of Program
// ===================
//
// The Algorithm The program implements a standard bottom-up chart parsing
// ------------- algorithm. The detailed pseudo-code for this algorithm is presented
// in the comment text on the method Sentence::m_parse - the present
// section offers a conceptual overview that deliberately glosses over various points.
// The basis of the chart parsing algorithm is to hypothesise that certain
// sections of the input sentence are generated by particular rules from its grammar.
// These sections are identified by 'vertices' and the hypotheses are represented by
// 'edges'. The following diagram indicates the position of an edge that suggests the
// presence of an NP (noun phrase) for the words "the mat" in the sentence below
//
// edges: <----NP----->
// vertex #: 0 1 2 3 4 5 6
// words: the cat sat on the mat
//
// This edge has leftmost vertex 4, and rightmost vertex 6.
// When the algorithm creates a new edge, it firstly places it in a data structure called
// the 'agenda'. At a later time, it will get round to 'processing' the edge. At this
// time, it will:
// 1) remove the edge from the agenda
// 2) see what further hypotheses it can deduce from the edge (see below). For all such
// hypotheses, it will create new edges and put them on the agenda.
// 3) put this edge on to the 'chart'.
// Once an edge has been put on the chart, it is visible to stage (2) of the process
// above, in the course of of processing later edges.
// There are two kinds of edge - 'complete' edges, and 'incomplete' edges. A complete
// edge is a hypothesis that says "from vertex X to vertex Y there exists a string that
// definitely could be generated by rule X". An incomplete edge is a hypothesis that
// says "from vertex X to vertex Y there exists a string that looks like it might be
// generated by rule Z, but I'm going to need to find some more of rule Z's symbols
// after vertex Y in order to confirm this hypothesis".
// Stage (2) above therefore consists of the following, for any edge e:
// if e is complete
// find any incomplete edges immediately to the left of e that are looking
// for e's category as the next symbol in their rules. Create new edges by
// extending those incomplete edges over e.
// else
// find any complete edges immediately to the right of e that have the category
// that e is looking for, and create new edges by extending e over them.
//
// The chart parsing algorithm consists, in essence, of repeatedly taking an edge from
// the agenda and processing it. When there are no further edges on the agenda, the
// algorithm halts, and searches the chart to see if it has any complete edges that span
// all the way from the start vertex to the end vertex, and represent the
// category called "Sentence".
//
// The Data Structures The program uses the MFC collection classes extensively
// ------------------- in the implementation of the classes needed for the chart
// parser algorithm. Refer to Microsoft's documentation for the
// details of these collection classes. They key points about the collection classes
// are (1) they are mostly defined using templates (2) they define non-intrusive
// collections (3) you iterate through the MFC collection classes by using a variable
// of type POSITION.
//
// CATEGORY: In the grammar and lexicon files, and in the program's displayed output,
// categories are represenred by CString objects. Internally, they are represented as
// integers. I do this because integers can be more easily copied and compared than
// CStrings, and because they can be used as array indices. An array of CStrings is used
// to map between category names and category numbers; this array is kept inside the
// Grammar class (q.v.).
//
// RULE: A rule consists of information like "'S' generates 'NP VP'": it has a
// left-hand side (the category 'S'), and a right-hand side (the sequence of categories
// {NP, VP}). Internally, this information is stored as a category number and a list
// of category numbers.
//
// GRAMMAR: The grammar is represented by an object of class Grammar. It contains all
// the grammar rules used by the chart parser. The algorithm needs to access the
// grammar in two ways: (1) by asking for a list of all the grammar rules that have
// some particular category as the first item on the left hand side; (2) by asking for
// a list of all the grammar rules that have empty left hand sides. The Grammar object
// supports this type of access, implementing (1) by storing an array of lists of rules
// - where the array-index is a category number, and implementing (2) by storing
// a list of rules.
// The Grammar object also stores an array containing the names of all the categories
// that appear in the grammar and lexicon.
//
// LEXICON: The lexicon maps words to lists of category numbers. It is implemented,
// therefore, as an MFC map structure, mapping CStrings to lists of integers.
//
// VERTEX: A vertex indentifies a position within the sentence. The start of the first
// word is vertex 0, the end of the Nth word is vertex N. Within the program, vertices
// are represented as integers.
//
// EDGE: There are two kinds of edge - lexical edges and non-terminal edges. The
// two kinds of edge share the same access semantics, so they inherit from a virtual base
// class, Edge. An edge has a start vertex, an end vertex and a category number. It
// also has a boolean value to say whether it is 'complete' (i.e. whether it is trying
// to find further subedges in order to substantiate the hypothesis that it represents).
// Lexical edges represent words. Internally, they contain the index of the word in
// the sentence and the category number represented by the edge. Lexical edges are
// always complete.
// Non-terminal edges represent the attempt to apply some grammar rule to some subsection
// of the input sentence. Internally, they contain start and end vertex numbers, the
// category number of the edge, the rule represented by the edge, and a POSITION that
// indicates how much of the edge's rule has not yet been substantiated. If that
// POSITION is NULL, the edge is complete. Otherwise, the edge is incomplete. and
// is seeking the category whose number is at the point indicated by the POSITION
// iterator in the rule.
//
// CHART: This data structure stores all the edges that have been processed so far. The
// chart parser algorithm searches the chart in two ways: (1) asking for a list of all
// complete edges that represent some category and start at some vertex; (2) asking for
// all the incomplete edges that are seeking some category and that end at some vertex.
// Internally, the chart stores the edges in two matrices - one for complete edges, the
// other for incomplete edges. Each matrix is indexed by vertex number and category
// number; their elements are lists of edges.
//
// AGENDA: All edges that have not yet been processed are kept on the agenda. The
// core of the chart parsing algorithm consists of repeatedly removing the 'next' edge
// from the agenda, and processing it. The algorithm itself is agnostic about what
// 'next' means in this context - we are free to experiment with implementing the agenda
// as a stack, as a queue, or as some clever data structure to implement intelligent
// scheduling that always processes the most promising-looking edge first.
// In this program, we always run the parse to completion before reporting what sentence
// parses have been found, so there is no benefit to be had from such intelligent
// scheduling. Accordingly, the agenda is implemented as a queue.
//
#include "stdafx.h"
#include <afxtempl.h>
#include <assert.h>
class Grammar;
class Sentence;
//
// CatDescription
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// This class stores a <number,CString> pair, which is used for mapping category
// numbers to category names. A list of pointers to these objects is stored in
// a Grammar object.
//
// Do not copy or assign CatDescription objects (there's no need to do so anyway).
//
// Methods:
// -------
// Contructor: trivial.
//
// Data Members:
// ------------
// m_number: the category number.
// m_name: the category name.
//
class CatDescription {
public:
CatDescription(int number, const CString &name)
: m_number(number), m_name(name) {};
CatDescription(const CatDescription &c) {assert(FALSE);};
CatDescription &operator =(const CatDescription&that) {assert(FALSE);return *this;};
int m_number;
CString m_name;
};
//
// Rule
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// This class stores a grammar rule. A rule object states some category can generate a
// (possibly empty) sequence of other categories. For example the rule "S -> NP VP"
// states that the category S can generate "NP VP". In this example, we would say that
// the rule's category was S, and its sub-categories were the list "NP VP".
//
// Although I have included "->" in the text of the rule in the above example, note that
// in actual fact, when the program decodes a rule from a string, the string must not
// contain "->" - the rule's category is distinguished from its subcategories only by
// appearing first in the line.
//
// Do not copy or assign Rule objects (there's no need to do so anyway).
// Do not create rules on the stack, and do not delete them. This is because when you
// construct a rule, you always do so by giving the rule a grammar to add itself to - and
// the rule then becomes the property of that grammar.
//
// Methods:
// -------
// Contructor: builds a rule from a line like "S NP VP". The ctor adds the
// constructed rule to the Grammar that is passed as a parameter to the
// ctor.
// m_getCat : gets the rule's category number.
// m_getSubcats: gets the rule's sub-categories (as a list of category numbers).
// m_traceOut: prints the rule to the debug window.
//
// Data Members:
// ------------
// m_cat: the category number.
// m_subcats: the list of subcategory numbers.
//
class Rule {
public:
Rule(CString ruleText, Grammar &);
Rule(const Rule &r) {assert(FALSE);};
Rule &operator =(const Rule&that){assert(FALSE);return *this;};
const CList<int, int> & m_getSubcats() {return m_subcats;};
int m_getCat() {return m_cat;}
#ifdef _DEBUG
void m_traceOut(Grammar &g);
#endif
private:
int m_cat;
CList<int,int> m_subcats;
};
//
// A RuleList is a list of pointers to Rule objects.
//
typedef CTypedPtrList<CPtrList, Rule *> RuleList;
//
// Grammar
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// A Grammar object stores:
// 1) a list of category descriptions (to map between category names and numbers).
// 2) a load of grammar rules, in a data-structure that is specialised for the kind of
// lookup that chart parsers need to do.
//
// You construct a grammar by loading it from a file; the Grammar ctor does this (by
// default, from a file called "grammar.txt").
//
// Do not copy or assign Grammar objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor: reads the grammar in from the file whose name is given as the
// ctor's parameter. If a problem occurs while reading the file
// the program displays a message box and then exits.
// Destructor: deletes all the grammar's constituent rules.
// m_lookUpCatNumber: gets a category number for a given category name. If the
// Grammar hasn't seen the category before, it generates a new
// category number for it.
// m_lookUpCatName: given a category number, look up the name of the category.
// m_getEmptyRules: gets a list of all the rules that have no sub-categories
// (i,e. the empty sequence appears as the rule's left hand
// side). The list is not valid after the Grammar is destroyed.
// m_getRulesStartingWith: gets a list of all the rules with a particular category
// appearing as their first sub-category (i.e. as the first
// category on the rule's right hand side).
// The list is not valid after the Grammar is destroyed.
// m_addRule: adds a rule to the grammar. Do nothing with the rule
// afterwards; it is now the property of the Grammar object.
// m_getNumCats: tells you how many categories there are in the grammar.
// m_traceOut: prints the grammar to the debug window.
//
// Data Members:
// ------------
// m_catNames: a list of pointers to CatDescription objects, used for
// looking up category names from category numbers, and vice
// versa.
// m_emptyRules: a list of all the rules with null right hand sides.
// m_rulesByFirstCat: an array (indexed by category number) each element of which
// is a pointer to a list of rules. The list contains all the
// rules that have a particular category first on their left
// hand sides.
class Grammar {
public:
Grammar(const CString & fileName="grammar.txt");
~Grammar();
Grammar(const Grammar &) {assert(FALSE);};
Grammar & operator = (const Grammar &) {assert(FALSE); return *this;};
int m_lookUpCatNumber(const CString &);
CString m_lookUpCatName(int) const;
const RuleList & m_getEmptyRules() const;
const RuleList & m_getRulesStartingWith(int cat) const;
void m_addRule(Rule *rule);
int m_getNumCats() const {return m_catNames.GetCount();};
#ifdef _DEBUG
void m_traceOut();
#endif
private:
CTypedPtrList<CPtrList,CatDescription *> m_catNames;
RuleList m_emptyRules;
CTypedPtrArray<CPtrArray, RuleList*> m_rulesByFirstCat;
};
//
// Lexicon
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// A Lexicon object stores all the words in the lexicon, in a map that allows us to
// find, for any particular word, a list of all the categories for that word,
// described by their category numbers.
//
// You construct a lexicon by loading it from a file; the Lexicon ctor does this
// (by default, from a file called "grammar.txt").
//
// Do not copy or assign Lexicon objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor: reads the lexicon in from the file whose name is given as the
// parameter to the ctor. If anything goes wrong while reading the
// file, the program displays a message box and exits.
// Destructor: deletes all the entries in the map.
// m_getCatsForWord: given a word, this method returns a list containing all the lexical
// categories for that word. The list is not valid after the Lexicon
// is destroyed.
// m_traceOut: prints the lexicon to the debug window.
//
// Data Members:
// ------------
// m_entries: a map object that maps CString to pointer to list of integers. The
// CString is a word, and the integers are that word's lexical
// categories.
//
class Lexicon {
public:
Lexicon(Grammar &, const CString & fileName="lexicon.txt");
~Lexicon();
const CList<int,int> * m_getCatsForWord(const CString &word) const;
#ifdef _DEBUG
void m_traceOut(Grammar &);
#endif
private:
CMapStringToOb m_entries; // maps strings to CList<int,int>*
};
//
// Edge
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// This is a virtual base class for the two kinds of edge that are used by the chart
// parsing algorithm.
// Edges must implement the following methods:
//
// Virtual Methods:
// ---------------
// m_getLeftVertex: the leftmost vertex of the edge
// m_getRightVertex: the rightmost vertex of the edge
// m_isComplete: does the edge need any further sub-categories?
// m_getCat: what category does the edge represent?
// m_getDepth: how far above the bottom of the parse tree is this edge? (to be
// precise, how long is the longest path from this node to a leaf
// node?)
// m_display: draw this edge and its subedges on the device context, expanding
// the bounding rectangle to include the edge. Note that edges
// which have leftmost vertex = rightmost vertex will not be drawn.
// m_traceOut: print the edge to the debug window.
//
class Edge {
public:
virtual ~Edge() {};
// for parsing:
virtual int m_getLeftVertex() const =0;
virtual int m_getRightVertex() const =0;
virtual BOOL m_isComplete() const =0;
virtual int m_getCat() const =0;
// for displaying:
virtual int m_getDepth()=0;
virtual CPoint m_display(CDC *pDC, int sentenceDepth, int parentDepth, Sentence &s,
Grammar &g, CRect &boundingRect)=0;
#ifdef _DEBUG
virtual void m_traceOut(Grammar &)=0;
#endif
};
//
// EdgeList
//
// An EdgeList is a list of pointers to Edge objects.
typedef CTypedPtrList<CPtrList, Edge *> EdgeList;
//
// LexicalEdge
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// A LexicalEdge object stores lexical edges - i.e. edges that represent
// words - they are leaf nodes in the parse tree.
//
// Do not copy or assign LexicalEdge objects (there's no need to do so anyway).
//
// Methods:
// -------
// Comstructor: given a category and a word number (i.e. the index of a word in the
// sentence), constructs a corresponding lexical edge.
//
// Virtual Methods:
// ---------------
// Please refer to the documentation for the Edge class.
//
// Data Members:
// ------------
// m_wordNumber: the index of the word in the sentence (0=first word).
// m_cat: the category number, stating what category is represented by the edge.
//
class LexicalEdge : public Edge {
public:
LexicalEdge(int cat, int wordNum) : m_cat(cat), m_wordNumber(wordNum) {};
LexicalEdge(const LexicalEdge &) {assert(FALSE);};
LexicalEdge &operator=(const LexicalEdge &) {assert(FALSE); return *this;};
virtual m_getLeftVertex() const {return m_wordNumber;};
virtual m_getRightVertex() const {return m_wordNumber+1;};
virtual m_isComplete() const {return TRUE;};
virtual int m_getCat() const{return m_cat;};
// for displaying
virtual int m_getDepth() {return 0;};
virtual CPoint m_display(CDC *pDC, int sentenceDepth, int parentDepth, Sentence &s,
Grammar &g, CRect &boundingRect);
#ifdef _DEBUG
virtual void m_traceOut(Grammar &);
#endif
private:
int m_wordNumber;
int m_cat;
};
//
// NonTerminalEdge
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// A NonTerminalEdge object stores non-terminal edges - i.e. edges that represent
// non-terminal categories of the grammar.
//
// Do not copy or assign NonTerminalEdge objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor (1): constructs an edge starting and ending at a given vertex, which
// needs the entire string of categories on the right-hand side of the
// given rule.
// Constructor (2): constructs an edge by extending the given incomplete edge over the
// given complete edge. Something horrible will happen if you pass
// a combination of edges such that this extension is not valid.
//
// Virtual Methods:
// ---------------
// Please refer to the documentation for the Edge class.
//
// Data Members:
// ------------
// m_rule: a pointer to the grammar rule to which this edge relates.
// m_leftVertex: the leftmost vertex of this edge.
// m_rightVertex: the rightmost vertex of this edge.
// m_subedges: a (possibly empty) list of all the edges contained in this one -
// such edges correspond to categories from m_rule that have been
// found between m_leftVertex and m_rightVertex, and explained by
// this edge.
// m_positionInRule: an iterator that keeps track of where we have got to in this
// edge's rule. 'Extending' this edge equates to moving the
// iterator one place to the right. When the iterator is NULL, that
// means that the edge is complete.
// m_depth: used by NonTerminalEdge::m_getDepth to cache its result, and so
// to avoid recursing every time that method is called.
//
class NonTerminalEdge : public Edge {
public:
NonTerminalEdge(Rule *r, int vertex);
NonTerminalEdge(NonTerminalEdge *incompleteEdge, Edge *completeEdge);
NonTerminalEdge(const NonTerminalEdge &) {assert(FALSE);};
NonTerminalEdge &operator=(const NonTerminalEdge &) {assert(FALSE); return *this;};
virtual m_getLeftVertex() const {return m_leftVertex;};
virtual m_getRightVertex() const {return m_rightVertex;};
virtual m_isComplete() const;
virtual int m_getCat() const;
int m_getNeededCat() const;
// for displaying:
virtual int m_getDepth();
virtual CPoint m_display(CDC *pDC, int sentenceDepth, int parentDepth, Sentence &s,
Grammar &g, CRect &boundingRect);
#ifdef _DEBUG
virtual void m_traceOut(Grammar &);
#endif
private:
Rule *m_rule;
int m_leftVertex;
int m_rightVertex;
EdgeList m_subedges;
POSITION m_positionInRule;
// for displaying;
int m_depth;
};
//
// Chart
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// A Chart object stores the 'chart' of the chart parser, in a data-structure optimised
// for the kinds of access that the chart parsing algorithm needs. The algorithm
// accesses entries in the chart in two ways:
// 1) "Give me a list of all complete edges whose category is X and which start on
// vertex #Y"
// 2) "Give me a list of all incomplete edges that next need to find a subedge of
// category X, and which end on vertex #Y".
// The Chart object contains two arrays of arrays (indexed [vertex][category]), each
// element of which is a pointer to a list of edges. These data-structures give a
// quick and efficient way to perform the necessary types of access to the chart.
//
// Do not copy or assign Chart objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor: constructs a Chart object with data structures large
// enough for parsing a sentence having the given
// number of vertices, using a grammar with the given
// number of categories.
// m_completeEdgesForCatStartingAt: gets a list of all complete edges starting at some
// vertex, with some particular category.
// m_incompleteEdgesNeedingCatAt: gets a list of all incomplete edges ending at some
// vertex and needing some particular category.
//
// Data Members:
// ------------
// m_completeEdges: the array of arrays that stores all the complete
// edges.
// m_incompleteEdges: the array of arrays that stores all the incomplete
// edges.
//
class Chart {
public:
Chart(int maxVertex, int maxCat);
Chart(const Chart &) {assert(0);};
Chart &operator=(const Chart &) {assert(0); return *this;};
~Chart(); // delete every edge in both edge arrays
void m_addEdge(Edge *);
EdgeList * m_completeEdgesForCatStartingAt(int cat, int vertex);
EdgeList * m_incompleteEdgesNeedingCatAt(int cat, int vertext);
private:
// These members cause a warning C4786 in axftempl.h, because they result in
// class names that are over 255 characters long. This doesn't seem to have any
// pathological consequences.
CArray<CArray<EdgeList*,EdgeList*>,CArray<EdgeList*,EdgeList*>&> m_completeEdges;
CArray<CArray<EdgeList*,EdgeList*>,CArray<EdgeList*,EdgeList*>&> m_incompleteEdges;
};
//
// Agenda
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// The Agenda class stores the 'agenda' of the chart parser algorithm. It is a store
// for the set of edges that have not been processed yet by the main loop of
// Sentence::m_parse.
//
// In this implementation, the agenda is a queue; I achieve this by sub-classing it from
// type list-of-pointers-to-edges, and adding suitable definitions for the
// methods that the chart parsing algorithm needs.
//
// Note that you should not delete an Agenda object unless it is empty. There is
// no circumstance in which this should happen, because the chart parsing algorithm
// will only terminate when the agenda is empty, and you shouldn't be deleting the
// agenda unless the algorithm has terminated.
//
// Methods:
// -------
// Destructor: does nothing special - but it asserts(FALSE) if there are any edges
// left on the agenda.
// m_addEdge: adds an edge to the agenda. Pass a pointer in to this method, and do
// nothing to your pointer after that.
// m_nextEdge: gets the next edge from the agenda. The method returns a pointer to the
// edge, or 0 if the agenda is empty.
//
class Agenda : public CTypedPtrList<CPtrList, Edge*>
{
public:
~Agenda() {assert(IsEmpty());};
void m_addEdge(Edge* e) {AddTail(e);};
Edge* m_nextEdge() {return IsEmpty() ? NULL : RemoveHead();};
};
//
// Sentence
//
// Who When What
// --- ---- ----
// Jim Chapman 13 May 97 Released.
//
// In order to parse a string of text, construct a Sentence from it, then call the
// Sentence's m_parse method. This will return a list of (pointers to) edges. Each
// element of that list represents a possible parsing of the sentence.
// Examine, or display, or whatever, the contents of the list, and then destroy the
// Sentence object. Do not do anything with the edge pointers after you have destroyed
// the Sentence, because the Sentence dtor will have freed the memory they point at.
//
// Do not copy or assign Sentence objects (there's no need to do so anyway).
//
// Methods:
// -------
// Constructor: given a string (which must not be empty), this method builds a
// Sentence object for it. It fills in m_words with the individual
// words of the sentence, and m_verticesArray with the character
// positions of each of the vertices of the sentence. It initalises
// m_chart and m_agenda to 0.
// Destructor: deletes the chart (and therefore, every edge in it), and the
// agenda (which must be empty when we are deleting the sentence
// object).
// m_parse: parses the sentence, and returns a list of possible parsings
// of it.
// m_vertexToCharNum: given a vertex number, call this method to determine how many
// character positions appear to the left of that vertex. This
// method is used for positioning edges when displaying them on the
// screen.
// m_getWord: given a zero-based index, i, looks up the character string of the
// (i+1)th word of the sentence.
//
// Data members:
// ------------
// m_verticesArray: this array lets us look up a vertex's character position, given
// vertex number.
// m_words: the words of the sentence, stored in an array.
// m_chart: the chart used for parsing the sentence, or 0 if m_parse has
// not been called yet.
// m_agenda: the agenda used for parsing the sentence, or 0 if m_parse has
// not been called yet.
//
class Sentence {
public:
Sentence(CString);
Sentence(const Sentence &) {assert(FALSE);};
Sentence & operator = (const Sentence&) {assert(FALSE); return *this;};
~Sentence();
EdgeList * m_parse(Grammar &, Lexicon &);
int m_vertexToCharNum(int) const;
CString m_getWord(int) const;
private:
CArray<int,int> m_verticesArray; // maps vertex numbers to character offsets.
CStringArray m_words;
Chart * m_chart;
Agenda * m_agenda;
};
home@jim-chapman.net