A Parser object processes the specified SGML files, and calls methods on a user-defined Application object for each significant parsing event. The user-defined Application object overrides the methods to react appropriately to these events (e.g. print back a modified SGML file, construct an abstract syntax tree...).
INTERFACE SGML;
IMPORT Rd;
These options define the behavior of the parser.
TYPE
ParserOptions = RECORD
showOpenEntities, showOpenElements, outputCommentDecls,
outputMarkedSections, outputGeneralEntities,
mapCatalogDocument: BOOLEAN := FALSE;
defaultDoctype: TEXT;
addCatalog, includeParam, enableWarning, addSearchDir, activateLink,
architecture: REF ARRAY OF TEXT := NIL;
END;
ShowOpenEntities and showOpenElements produce information about
the corresponding entity and element when parsing error messages are
issued. OutputCommentDecls, outputMarkedSections and
outputGeneralEntities determine if the parser produces events
when these SGML constructs are encountered.
DefaultDoctype specifies the document type definition to use when
no DOCTYPE tag is found.
AddCatalog adds the specified file names as SGML DTD catalogs.
IncludeParam defines the specified names as parameter entities
set to INCLUDE (ENTITY % param INCLUDE); this way, sections
of SGML files which are IGNORE by default may be changed to
INCLUDE just by setting this option.
EnableWarning enables the named warnings:
mixed: mixed content model which does not allow #PCDATA,
sgmldecl: dubious constructs in SGML declarations,
should: ISO 8879 recommendations which are not followed,
default: defaulted references,
duplicate: duplicate entity declarations,
undefined: undefined elements used in the DTD,
unclosed: unclosed start and end tags,
empty: empty start and end tags,
net: net-enabling start and end tags,
min-tag: minimized start and end tags (equivalent to unclosed, empty
and net),
unused-map: defined but unused short reference maps,
unused-param: defined but unused parameter entities,
notation-sysid: notations for which no system identifier could be
generated,
all: equivalent to all of the above,
no-idref: do not warn about unresolved references,
no-significant: do not warn about non significant characters in literals.
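As an illustration of the options described above, the following minimal sketch fills in a ParserOptions record. The module name OptionsSketch, the doctype value "book" and the catalog file name "catalog" are placeholders chosen for the example, not values taken from this interface. A record constructor is used so that the fields which declare a default (FALSE or NIL) can be omitted.

```modula3
MODULE OptionsSketch EXPORTS Main;

IMPORT SGML;

VAR
  (* Fields left out of the constructor take the defaults declared in
     the record type: FALSE for the flags, NIL for the arrays.
     "book" is a placeholder value. *)
  options := SGML.ParserOptions{
               showOpenElements := TRUE,
               outputCommentDecls := TRUE,
               defaultDoctype := "book"};

BEGIN
  (* One catalog file; "catalog" is a placeholder file name. *)
  options.addCatalog := NEW(REF ARRAY OF TEXT, 1);
  options.addCatalog[0] := "catalog";

  (* Two of the warning names listed above. *)
  options.enableWarning := NEW(REF ARRAY OF TEXT, 2);
  options.enableWarning[0] := "duplicate";
  options.enableWarning[1] := "unused-param";
END OptionsSketch.
```

The record is only built here; it would then be passed to the init method of a Parser.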
Parser <: ParserPublic;
ParserPublic = OBJECT METHODS
init(options: ParserOptions; programName: TEXT;
files: REF ARRAY OF TEXT; rds: REF ARRAY OF Rd.T := NIL): Parser;
run(a: Application): CARDINAL;
halt();
inhibitMessages(inhibit: BOOLEAN);
subdocumentParser(systemId: TEXT): Parser;
newParser(files: REF ARRAY OF TEXT; rds: REF ARRAY OF Rd.T := NIL):
Parser;
END;
The call p.init(o,n,f,r) initializes a parser with options o,
program name n (used in error messages), and f the array of names
of files to be parsed. When r is not specified, files from f are
opened. Otherwise, f is used for file names but the actual input is
taken from the readers in r.
The call p.run(a) starts parsing the files and calls back the
Application object a for each parsing event. It returns the number
of errors encountered once the parsing is through.
The call p.halt() stops the parsing, causing the run method to return.
It is usually called from one of the Application object methods.
The call p.inhibitMessages(b) disables error and warning messages when
b is TRUE.
The call p.subdocumentParser(s) creates a new parser ready to process
s which identifies a subdocument in the context of the file currently
parsed by p.
The call p.newParser(f,r) returns a parser using the same options as
p but ready to process a new set of files defined by f and r.
Since the options are the same (catalog name, search paths...), caching
of parsed document type definitions may occur for a significant speedup.
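The calls described above might be combined as in the sketch below. It rests on several assumptions that this interface does not confirm: that the methods of a plain SGML.Application do nothing by default, that NIL is acceptable for defaultDoctype, and that the files "first.sgml" and "second.sgml" exist (both names are placeholders). IO and Fmt are the standard libm3 interfaces.

```modula3
MODULE ParseSketch EXPORTS Main;

IMPORT SGML, IO, Fmt;

VAR
  (* Assumption: NIL means "no fallback document type". *)
  options := SGML.ParserOptions{defaultDoctype := NIL};
  first   := NEW(REF ARRAY OF TEXT, 1);
  second  := NEW(REF ARRAY OF TEXT, 1);
  parser, again: SGML.Parser;
  app: SGML.Application;
  nbErrors: CARDINAL;

BEGIN
  first[0]  := "first.sgml";    (* placeholder file name *)
  second[0] := "second.sgml";   (* placeholder file name *)

  (* rds is left at its default, so the files named in first are opened. *)
  parser := NEW(SGML.Parser).init(options, "parse-sketch", first);
  app := NEW(SGML.Application).init();

  nbErrors := parser.run(app);
  IO.Put("first.sgml: " & Fmt.Int(nbErrors) & " error(s)\n");

  (* Same options, new set of files. *)
  again := parser.newParser(second);
  nbErrors := again.run(app);
  IO.Put("second.sgml: " & Fmt.Int(nbErrors) & " error(s)\n");
END ParseSketch.
```

Using newParser rather than a second call to init keeps the options shared between the two parsers, which is the situation in which the text says cached document type definitions may be reused.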
Application <: ApplicationPublic;
ApplicationPublic = OBJECT METHODS
init(): Application;
appInfo(READONLY e: AppinfoEvent);
startDtd(READONLY e: StartDtdEvent);
endDtd(READONLY e: EndDtdEvent);
endProlog(READONLY e: EndPrologEvent);
startElement(READONLY e: StartElementEvent);
endElement(READONLY e: EndElementEvent);
data(READONLY e: DataEvent);
sdata(READONLY e: SdataEvent);
pi(READONLY e: PiEvent);
externalDataEntityRef(READONLY e: ExternalDataEntityRefEvent);
subdocEntityRef(READONLY e: SubdocEntityRefEvent);
nonSgmlChar(READONLY e: NonSgmlCharEvent);
commentDecl(READONLY e: CommentDeclEvent);
markedSectionStart(READONLY e: MarkedSectionStartEvent);
markedSectionEnd(READONLY e: MarkedSectionEndEvent);
ignoredChars(READONLY e: IgnoredCharsEvent);
generalEntity(READONLY e: GeneralEntityEvent);
error(READONLY e: ErrorEvent);
openEntityChange();
getDetailedLocation(pos: Position): DetailedLocation;
END;
An instance of the Application type, or of one of its descendant types, is
passed to a Parser and receives the parsing information as methods
being called back. Each of these methods receives a corresponding
parsing event structure.
The call a.init() initializes a, before it is used for parsing.
The call a.getDetailedLocation(pos) returns detailed information about
the location of pos within the currently parsed entity. It may only be
called from within one of the other methods.
The other methods are called by the Parser and are overridden in Application
type descendants to perform the desired work. AppInfo is called when the
APPINFO section of the SGML declaration is encountered, startDtd upon
encountering the Document Type Definition (DTD), endDtd at the end of
the DTD, endProlog at the end of the prolog (local markup declarations),
startElement when a start element tag is found, endElement for a
real or implied end element tag, data for character data (CDATA)
within elements or marked sections, sdata for special character
data (SDATA like bitmap images), pi for a processing instruction,
externalDataEntityRef for a reference to an external data entity,
subdocEntityRef for a reference to a subdoc entity,
nonSgmlChar for non SGML conforming characters, commentDecl for
a sequence of comments, markedSectionStart at the beginning of a marked
section, markedSectionEnd at the end of a marked section, ignoredChars
for character data within an IGNORE marked section, generalEntity
for a general entity definition (this occurs within the prolog except
for undefined entities which when referenced are set to the default
entity content), error upon encountering a parsing error, and
openEntityChange each time the currently opened entity changes.
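A descendant of Application could look like the following sketch, which counts start tags and attributes and reports errors with their location. The names CountSketch, Counter, Safe and the input file "document.sgml" are invented for the example. It also assumes that NIL is acceptable for defaultDoctype and that text fields such as filename or message may be NIL, which is why they are guarded.

```modula3
MODULE CountSketch EXPORTS Main;

IMPORT SGML, IO, Fmt;

TYPE
  Counter = SGML.Application OBJECT
    elements, attributes: CARDINAL := 0;
  OVERRIDES
    startElement := StartElement;
    error := Error;
  END;

PROCEDURE Safe(t: TEXT): TEXT =
  (* Replace a missing (NIL) text by a visible placeholder. *)
  BEGIN
    IF t = NIL THEN RETURN "?" ELSE RETURN t END;
  END Safe;

PROCEDURE StartElement(self: Counter; READONLY e: SGML.StartElementEvent) =
  BEGIN
    INC(self.elements);
    IF e.attributes # NIL THEN
      INC(self.attributes, NUMBER(e.attributes^));
    END;
  END StartElement;

PROCEDURE Error(self: Counter; READONLY e: SGML.ErrorEvent) =
  (* getDetailedLocation is called from within a callback, as required. *)
  VAR loc := self.getDetailedLocation(e.pos);
  BEGIN
    IO.Put(Safe(loc.filename) & ":" & Fmt.Int(loc.lineNumber) & ": "
           & Safe(e.message) & "\n");
  END Error;

VAR
  files := NEW(REF ARRAY OF TEXT, 1);
  counter := NEW(Counter);
  parser: SGML.Parser;

BEGIN
  files[0] := "document.sgml";   (* placeholder file name *)
  EVAL counter.init();
  parser := NEW(SGML.Parser).init(
              SGML.ParserOptions{defaultDoctype := NIL},
              "count-sketch", files);
  EVAL parser.run(counter);
  IO.Put(Fmt.Int(counter.elements) & " start tag(s), "
         & Fmt.Int(counter.attributes) & " attribute(s)\n");
END CountSketch.
```

Only the two methods of interest are overridden; the remaining callbacks are inherited from Application.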
PROCEDURE CharRefToCode(t: TEXT; VAR c: CharCode): BOOLEAN;
While the input files only contain 8-bit ISO-8859 character codes,
larger 16-bit UNICODE codes may be specified by (decimal or hexadecimal)
character references. For this reason, all such 16-bit codes are kept
escaped as character references. Moreover, the special ampersand
character (&) is also kept as an entity reference throughout the
processing. This allows all the processing to use ordinary TEXT elements
which are limited to 8-bit characters. The call CharRefToCode(t,c)
returns TRUE when a valid character reference is received in t
and stores the corresponding code in c. A valid character reference is
either &amp;, or &#decimalNumber;, or &#xHexaNumber;,
with the number within the interval 0..65535. This procedure is
typically used by applications to process 16-bit characters escaped
as character references in Sdata events.
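A minimal sketch of how an application might call CharRefToCode follows. The module name CharRefSketch, the helper Show and the sample strings are invented for the example. The first two samples are the decimal and hexadecimal spellings of the same code (233 = 16_E9).

```modula3
MODULE CharRefSketch EXPORTS Main;

IMPORT SGML, IO, Fmt;

PROCEDURE Show(t: TEXT) =
  (* Print the code for t when it is a valid character reference. *)
  VAR code: SGML.CharCode;
  BEGIN
    IF SGML.CharRefToCode(t, code) THEN
      IO.Put(t & " has code " & Fmt.Int(code) & "\n");
    ELSE
      IO.Put(t & " is not a character reference\n");
    END;
  END Show;

BEGIN
  Show("&#233;");       (* decimal form *)
  Show("&#xE9;");       (* hexadecimal form of the same code *)
  Show("&amp;");        (* the ampersand entity reference *)
  Show("plain text");   (* expected to be rejected *)
END CharRefSketch.
```

In a real application the text passed to CharRefToCode would typically come from the text field of an SdataEvent rather than from literals.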
TYPE
CharCode = [0..65535];
Position = CARDINAL;
ExternalId = RECORD
systemId: TEXT;
publicId: TEXT;
generatedSystemId: TEXT;
END;
(* Depending on the type of external identifier, each of these
fields may or may not be available (non NIL). At least one should
be non NIL. *)
Notation = RECORD
name: TEXT;
externalId: ExternalId;
END;
(* A named notation with the corresponding external identifier. *)
EntityDataType = { Sgml, CData, SData, NData, Subdoc, Pi };
EntityDeclType = { General, Parameter, Doctype, Linktype };
Entity = RECORD
name: TEXT;
dataType: EntityDataType;
declType: EntityDeclType;
internalText: TEXT;
(* Following valid if internalText is NIL *)
externalId: ExternalId;
attributes: REF ARRAY OF Attribute;
notation: Notation;
END;
(* For an internal entity, the replacement text is found in "internalText".
For external entities, an external identifier, attributes and a notation
are provided. *)
AttributeType = { Invalid, Implied, CData, Tokenized };
AttributeDefaulted = { Specified, Definition, Current };
CdataChunk = RECORD
nonSgmlChar: CHAR;
data: TEXT;
entityName: TEXT;
END;
(* For an SDATA entity reference, entityName is the entity name and data
the replacement text. For normal data, entityName is NIL and data
contains the character data. For non SGML conforming characters,
data and entityName are NIL and nonSgmlChar contains the character. *)
Attribute = RECORD
name: TEXT;
type: AttributeType;
defaulted: AttributeDefaulted;
cdataChunks: REF ARRAY OF CdataChunk;
tokens: TEXT;
isId: BOOLEAN;
isGroup: BOOLEAN;
entities: REF ARRAY OF Entity;
notation: Notation;
END;
(* If the attribute type is CData, the value is found in "cdataChunks",
otherwise if the type is Tokenized, the value is found in "tokens".
For an attribute of type NOTATION, notation is defined; for an attribute
of type ENTITY or ENTITIES, entities is defined. The field isId is TRUE
for an attribute of type ID. *)
(* The event structures all contain a position which may be used to
obtain detailed position information. *)
PiEvent = RECORD
pos: Position;
data: TEXT;
entityName: TEXT;
END;
(* The content of the processing instruction is in data. If it was
an entity reference, the entityName is provided (non NIL). *)
ElementContentType = { Empty, CData, RCData, Mixed, Element };
StartElementEvent = RECORD
pos: Position;
gi: TEXT;
contentType: ElementContentType;
included: BOOLEAN;
attributes: REF ARRAY OF Attribute;
END;
(* The element type (tag name) is in gi. *)
EndElementEvent = RECORD
pos: Position;
gi: TEXT;
END;
(* The element type is in gi. *)
DataEvent = RECORD
pos: Position;
data: TEXT;
END;
SdataEvent = RECORD
pos: Position;
text: TEXT;
entityName: TEXT;
END;
(* Reference to an internal sdata entity. The replacement text is in text
and the referenced entity in entityName. *)
ExternalDataEntityRefEvent = RECORD
pos: Position;
entity: Entity;
END;
SubdocEntityRefEvent = RECORD
pos: Position;
entity: Entity;
END;
NonSgmlCharEvent = RECORD
pos: Position;
c: CHAR;
END;
ErrorType = { Info, Warning, Quantity, IDRef, Capacity, OtherError };
ErrorEvent = RECORD
pos: Position;
type: ErrorType;
message: TEXT;
END;
AppinfoEvent = RECORD
pos: Position;
string: TEXT;
END;
StartDtdEvent = RECORD
pos: Position;
name: TEXT;
(* If it does not have an external ID all names within will be NIL *)
externalId: ExternalId;
END;
EndDtdEvent = RECORD
pos: Position;
name: TEXT;
END;
EndPrologEvent = RECORD
pos: Position;
END;
GeneralEntityEvent = RECORD
entity: Entity;
END;
CommentDeclEvent = RECORD
pos: Position;
comments: REF ARRAY OF TEXT;
seps: REF ARRAY OF TEXT;
END;
MarkedSectionStatus = { Include, RCData, CData, Ignore };
MarkedSectionParamType = { Temp, Include, RCData, CData, Ignore, EntityRef };
MarkedSectionParam = RECORD
type: MarkedSectionParamType;
entityName: TEXT;
END;
MarkedSectionStartEvent = RECORD
pos: Position;
status: MarkedSectionStatus;
params: REF ARRAY OF MarkedSectionParam;
END;
MarkedSectionEndEvent = RECORD
pos: Position;
status: MarkedSectionStatus;
END;
IgnoredCharsEvent = RECORD
pos: Position;
data: TEXT;
END;
DetailedLocation = RECORD
lineNumber: CARDINAL;
columnNumber: CARDINAL;
byteOffset: CARDINAL;
entityOffset: CARDINAL;
entityName: TEXT;
filename: TEXT;
END;
Debugging
PROCEDURE DumpDefinitions(this: Parser);
END SGML.