Module sgml
:- use_module(library(sgml)).
Predicates for parsing HTML and XML documents.
Currently, two predicates are provided:
load_html(+Source, -Es, +Options)
load_xml(+Source, -Es, +Options)
These predicates parse HTML and XML documents, respectively.
Source must be one of:
a list of characters with the document contents
stream(S), specifying a stream S from which to read the content
file(Name), where Name is a list of characters specifying a file name
Es is unified with the abstract syntax tree of the parsed document, represented as a list of nodes, each of which has one of the following forms:
a list of characters, representing text
element(Name, Attrs, Children), where:
    Name, an atom, is the name of the tag
    Attrs is a list of Key=Value pairs: Key is an atom, and Value is a list of characters
    Children is a list of nodes as specified here
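Because the tree consists only of character lists and element/3 terms, it can be traversed with ordinary recursion. The following is a minimal sketch, not part of library(sgml): the predicate names es_text/2 and node_text/2 are hypothetical, chosen for illustration. It collects all text in a tree in document order.

```prolog
:- use_module(library(lists)).

% es_text(+Es, -Text): Text is the concatenation, in document order,
% of all text nodes found in the list of nodes Es.
% (Hypothetical helper, not provided by library(sgml).)
es_text([], []).
es_text([E|Es], Text) :-
    node_text(E, T0),
    es_text(Es, T1),
    append(T0, T1, Text).

% An element contributes the text of its children;
% a text node (a list of characters) contributes itself.
node_text(element(_, _, Children), Text) :-
    es_text(Children, Text).
node_text([], []).
node_text([C|Cs], [C|Cs]).
```

Assuming double-quoted strings denote lists of characters, the query

?- es_text([element(p,[],["Hello, ",element(b,[],["world"]),"!"])], T).

yields T = "Hello, world!".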
Currently, Options are ignored. In the future, more options may be provided to control parsing.
Example:
?- load_html("<html><head><title>Hello!</title></head></html>", Es, []).
Yielding:
   Es = [element(html,[],
          [element(head,[],
            [element(title,[],
              ["Hello!"])]),
           element(body,[],[])])].
library(xpath) provides convenient reasoning about parsed documents. For example, to fetch the title of the document above, we can use:
?- load_html("<html><head><title>Hello!</title></head></html>", Es, []),
   xpath(Es, //title(text), T).
Yielding T = "Hello!".
Use http_open/3 from library(http/http_open) to read answers from web servers via streams.
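As a sketch of how these pieces might fit together, the following combines http_open/3 with the stream(S) source form and the xpath/3 query shown earlier. The predicate name page_title/2 is hypothetical, and the argument order http_open(Address, Stream, Options) with an empty option list is an assumption that should be checked against the documentation of library(http/http_open).

```prolog
:- use_module(library(sgml)).
:- use_module(library(xpath)).
:- use_module(library(http/http_open)).

% page_title(+URL, -Title): fetch the document at URL and extract
% the text of its <title> element.
% (Hypothetical helper; URL is a list of characters.)
page_title(URL, Title) :-
    % Assumed signature: http_open(+Address, -Stream, +Options).
    http_open(URL, Stream, []),
    load_html(stream(Stream), Es, []),
    % The document is fully parsed at this point, so the stream can be closed.
    close(Stream),
    xpath(Es, //title(text), Title).
```

A query such as ?- page_title("https://www.example.com", T). would then fetch the page over the network; the result depends on the server's response.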