The Web interface retrieves documents from the World Wide Web using an
http proxy server. Details about the HTTP protocol are in
http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html
INTERFACE Web;
IMPORT Date, IP, Rd, Thread;
EXCEPTION Error(TEXT);
TYPE T <: ROOT;
A Web.T identifies an http proxy server. The routines in this interface that take a Web.T as a parameter accept the value NIL, which represents the default proxy server obtained by calling Setup(NIL).
CONST
DefaultProxyHost = NIL;
(* At SRC, set it to "http://www-proxy.pa.dec.com:8080/" instead *)
DefaultNoProxyList = "";
(* At SRC, set it to "src-www,.dec.com" instead *)
PROCEDURE Setup (proxyURL, noProxyList: TEXT := NIL): T
RAISES {Error};
Return a Web.T representing an http proxy server.
proxyURL is the url for the proxy server; it should have the format:
http://hostname.blah.blah.blah:8080/
If proxyURL is NIL, it defaults to the environment variable
http_proxy. If http_proxy is empty or undefined, proxyURL
defaults to DefaultProxyHost. If DefaultProxyHost is NIL,
no proxy is used.
noProxyList specifies a set of domains for which the proxy should not
be consulted; the format is a comma-separated list of domain names, with
optional port. If noProxyList is NIL, it defaults to the environment
variable no_proxy. If no_proxy is empty or undefined, noProxyList
defaults to DefaultNoProxyList. If DefaultNoProxyList is the empty
string, the proxy will be consulted for every URL.
Details about proxies are at:
http://info.cern.ch/hypertext/WWW/Daemon/User/Proxies/ProxyClients.html
Setup raises Error if proxyURL is not in a valid format.
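The defaulting rules above can be exercised with a short client. This is a minimal sketch, not part of the interface: the module name SetupDemo is hypothetical, and the proxy URL and no-proxy list are simply the SRC values quoted in the comments on DefaultProxyHost and DefaultNoProxyList. It assumes the text carried by Error is non-NIL.

```modula3
MODULE SetupDemo EXPORTS Main;
(* Sketch only: obtain proxy handles with Web.Setup. *)
IMPORT Web, IO;

VAR
  fromEnv, explicit: Web.T;
BEGIN
  TRY
    (* Both arguments default to NIL, so Setup consults http_proxy and
       no_proxy, then DefaultProxyHost and DefaultNoProxyList. *)
    fromEnv := Web.Setup();

    (* Explicit proxy plus a list of domains that bypass it.
       These values are examples; substitute your own. *)
    explicit := Web.Setup("http://www-proxy.pa.dec.com:8080/",
                          "src-www,.dec.com");

    IO.Put("proxy handles created\n");
  EXCEPT
  | Web.Error (msg) =>
      (* Raised when proxyURL is not in a valid format. *)
      IO.Put("invalid proxy URL: " & msg & "\n");
  END;
END SetupDemo.
```

Either handle can then be passed as the server argument of Get, GetHead, or Post; passing NIL there selects the default proxy instead.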
CONST
DefaultRequestFields = ARRAY [0 .. 0] OF TEXT{"Accept: */*"};
TYPE
MIMEType = {Application, Audio, Image, Message, Multipart, Text, Video,
Xperimental};
HTMLDate = TEXT;
Header = RECORD
httpVersion : TEXT;
statusCode : INTEGER;
reason : TEXT;
contentType : MIMEType;
contentSubType: TEXT;
(* optional fields: *)
allowed : TEXT := NIL;
public : TEXT := NIL;
contentLength: INTEGER := 0;
encoding : TEXT := NIL;
date : HTMLDate := NIL;
expires : HTMLDate := NIL;
lastModified : HTMLDate := NIL;
server : TEXT := NIL;
MIMEVersion : TEXT := NIL;
title : TEXT := NIL;
location : TEXT := NIL;
END;
Page = OBJECT
header : Header;
contents: TEXT;
END;
PROCEDURE Get ( url : TEXT;
VAR header: Header;
READONLY requestFields: ARRAY OF TEXT := DefaultRequestFields;
forceCache: BOOLEAN := FALSE;
debug : BOOLEAN := FALSE;
server : T := NIL ): Rd.T
RAISES {Error, Thread.Alerted, IP.Error};
Do a GET request, passing in the requestFields. By default, the proxy server will grab pages from a local cache, if one is available and if the url is in the cache. When forceCache is TRUE, the proxy server will explicitly not use any cache. The Error exception is raised if the header returned by the request is invalid in any way.
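A call to Get might look like the following. This is a sketch under stated assumptions: the module name GetDemo is hypothetical, the URL is only a placeholder, and the code assumes that the returned reader delivers the document body and that the text carried by Error is non-NIL.

```modula3
MODULE GetDemo EXPORTS Main;
(* Sketch only: fetch one document and print its status and contents. *)
IMPORT Web, Rd, IO, Fmt, Thread, IP;

CONST
  URL = "http://info.cern.ch/";   (* placeholder; any http URL *)

VAR
  header: Web.Header;
  rd    : Rd.T;
BEGIN
  TRY
    (* server is left at NIL, which selects the default proxy.
       forceCache := TRUE asks the proxy not to use any cache. *)
    rd := Web.Get(URL, header, forceCache := TRUE);

    IO.Put("status: " & Fmt.Int(header.statusCode) & "\n");

    (* Assumption: the reader yields the body of the document. *)
    IO.Put(Rd.GetText(rd, LAST(CARDINAL)));
    Rd.Close(rd);
  EXCEPT
  | Web.Error (msg) => IO.Put("invalid response: " & msg & "\n");
  | IP.Error        => IO.Put("network error\n");
  | Rd.Failure      => IO.Put("error while reading the document\n");
  | Thread.Alerted  => IO.Put("request interrupted\n");
  END;
END GetDemo.
```

The requestFields argument is omitted here, so DefaultRequestFields ("Accept: */*") is sent.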
PROCEDURE GetHead (url: TEXT;
READONLY requestFields: ARRAY OF TEXT := DefaultRequestFields;
forceCache: BOOLEAN := FALSE;
server : T := NIL ): Rd.T
RAISES {Error, Thread.Alerted, IP.Error};
Do a HEAD request, passing in the requestFields. By default, the proxy server will grab pages from a local cache, if one is available and if the url is in the cache. When forceCache is TRUE, the proxy server will explicitly not use any cache. Mostly for debugging use.
PROCEDURE Post ( url : TEXT;
argString : TEXT;
VAR header: Header;
READONLY requestFields: ARRAY OF TEXT := DefaultRequestFields;
server : T := NIL ): Rd.T
RAISES {Error, Thread.Alerted, IP.Error};
Do a POST request. Like Get, except that it takes the extra argument argString.
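The call shape for Post can be sketched as below. The interface does not say how argString is interpreted, so the form-style string used here is purely an assumption, as are the module name PostDemo and the example URL.

```modula3
MODULE PostDemo EXPORTS Main;
(* Sketch only: issue a POST request and report the status code. *)
IMPORT Web, Rd, IO, Fmt, Thread, IP;

CONST
  URL  = "http://www.example.com/cgi-bin/register";  (* hypothetical *)
  Args = "name=guest&lang=m3";   (* assumed format for argString *)

VAR
  header: Web.Header;
  rd    : Rd.T;
BEGIN
  TRY
    (* requestFields and server take their defaults. *)
    rd := Web.Post(URL, Args, header);
    IO.Put("status: " & Fmt.Int(header.statusCode) & "\n");
    Rd.Close(rd);
  EXCEPT
  | Web.Error (msg) => IO.Put("invalid response: " & msg & "\n");
  | IP.Error        => IO.Put("network error\n");
  | Rd.Failure      => IO.Put("error while closing the reader\n");
  | Thread.Alerted  => IO.Put("request interrupted\n");
  END;
END PostDemo.
```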
PROCEDURE ParseHead (rd: Rd.T): Header RAISES {Error, Thread.Alerted};
Parses the information returned by GetHead.
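GetHead and ParseHead are meant to be used together; one way to combine them is sketched here. The module name HeadDemo and the URL are hypothetical. The optional Header fields are tested against NIL because their declared default is NIL.

```modula3
MODULE HeadDemo EXPORTS Main;
(* Sketch only: inspect a document's header without using Get. *)
IMPORT Web, Rd, IO, Fmt, Thread, IP;

CONST
  URL = "http://info.cern.ch/";   (* placeholder; any http URL *)

VAR
  rd: Rd.T;
  h : Web.Header;
BEGIN
  TRY
    rd := Web.GetHead(URL);
    h  := Web.ParseHead(rd);
    Rd.Close(rd);

    IO.Put("status: " & Fmt.Int(h.statusCode) & "\n");
    IO.Put("length: " & Fmt.Int(h.contentLength) & "\n");

    IF h.contentType = Web.MIMEType.Text THEN
      IO.Put("the document is some kind of text\n");
    END;

    (* Optional fields stay NIL unless they were set. *)
    IF h.lastModified # NIL THEN
      IO.Put("last modified: " & h.lastModified & "\n");
    END;
  EXCEPT
  | Web.Error (msg) => IO.Put("invalid header: " & msg & "\n");
  | IP.Error        => IO.Put("network error\n");
  | Rd.Failure      => IO.Put("error while closing the reader\n");
  | Thread.Alerted  => IO.Put("request interrupted\n");
  END;
END HeadDemo.
```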
PROCEDURE ToDate (t: HTMLDate): Date.T RAISES {Error, Thread.Alerted};
Parses an HTML format date, such as that in the Date or Last-modified field, into a Date.T.
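A small sketch of ToDate follows. The sample string uses the date layout commonly seen in HTTP headers; whether ToDate accepts exactly this layout is an assumption, and the module name DateDemo is hypothetical. In practice the argument would usually be a Header field such as date or lastModified.

```modula3
MODULE DateDemo EXPORTS Main;
(* Sketch only: convert a header date string into a Date.T. *)
IMPORT Web, Date, IO, Fmt, Thread;

CONST
  Sample = "Tue, 15 Nov 1994 08:12:31 GMT";  (* assumed accepted layout *)

VAR
  d: Date.T;
BEGIN
  TRY
    d := Web.ToDate(Sample);
    (* Date.Month is an enumeration starting at Jan, so ORD + 1 gives
       the conventional month number. *)
    IO.Put("year "  & Fmt.Int(d.year)
         & " month " & Fmt.Int(ORD(d.month) + 1)
         & " day "   & Fmt.Int(d.day) & "\n");
  EXCEPT
  | Web.Error (msg) => IO.Put("unparsable date: " & msg & "\n");
  | Thread.Alerted  => IO.Put("interrupted\n");
  END;
END DateDemo.
```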
PROCEDURE AbsoluteURL (url, base: TEXT): TEXT;
Returns an absolute URL constructed from url and base, the URL of the document containing url.
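As an illustration, a relative link found in a fetched page could be resolved as follows. Both URLs and the module name AbsDemo are hypothetical, and the exact form of the result depends on the implementation.

```modula3
MODULE AbsDemo EXPORTS Main;
(* Sketch only: resolve a relative link against its containing page. *)
IMPORT Web, IO;

CONST
  Base = "http://www.example.com/docs/index.html";  (* hypothetical *)
  Link = "images/logo.gif";   (* relative URL found inside Base *)

BEGIN
  (* AbsoluteURL declares no exceptions, so no handler is needed. *)
  IO.Put(Web.AbsoluteURL(Link, Base) & "\n");
END AbsDemo.
```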
PROCEDURE EncodeURL (t: TEXT): TEXT RAISES {Thread.Alerted};
Encodes special characters in a text string, such as the argument of an ISINDEX query.
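A sketch of building a query URL with EncodeURL is given below. The search URL and the module name EncodeDemo are hypothetical, and appending the encoded text after a "?" is an assumption about how the caller forms an ISINDEX query; this interface only performs the encoding.

```modula3
MODULE EncodeDemo EXPORTS Main;
(* Sketch only: encode user-supplied text before placing it in a URL. *)
IMPORT Web, IO, Thread;

CONST
  Search = "http://www.example.com/cgi-bin/search";  (* hypothetical *)
  Query  = "proxy server & cache";   (* contains special characters *)

BEGIN
  TRY
    IO.Put(Search & "?" & Web.EncodeURL(Query) & "\n");
  EXCEPT
  | Thread.Alerted => IO.Put("interrupted\n");
  END;
END EncodeDemo.
```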
END Web.