parseDOM

Represents an entity in an XML document as a DOM tree.

parseDOM takes either a range of characters or a dxml.parser.EntityRange and generates a DOMEntity from that XML.

When parseDOM processes the XML, it returns a DOMEntity representing the entire document. Even though the XML document itself isn't technically an entity in the XML document, it's simplest to treat it as if it were an EntityType.elementStart with an empty name. That DOMEntity then contains child entities that recursively define the DOM tree through their children.

For DOMEntities of type EntityType.elementStart, DOMEntity.children gives access to all of the child entities of that start tag. Other DOMEntities have no children.

Note that the type determines which properties of a DOMEntity can be used, and it can also determine which functions the DOMEntity may be passed to. Each function lists the EntityTypes that it allows, and it is an error to call it with any other EntityType.
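As a rough illustration of checking the type before using type-specific properties, the following sketch walks a tree and gathers element names. The helper collectElementNames is hypothetical (it is not part of dxml), and the sketch assumes the input is a string so that names convert cleanly with std.conv.to.

```d
import std.conv : to;
import std.range.primitives : empty;

import dxml.dom : parseDOM;
import dxml.parser : EntityType;

// Hypothetical helper, not part of dxml: gathers element names in document
// order, checking the type before using name or children.
void collectElementNames(E)(E entity, ref string[] names)
{
    switch (entity.type)
    {
        case EntityType.elementStart:
            // The DOMEntity for the whole document has an empty name.
            if (!entity.name.empty)
                names ~= entity.name.to!string;
            // Only elementStart entities have children.
            foreach (child; entity.children)
                collectElementNames(child, names);
            break;
        case EntityType.elementEmpty:
            names ~= entity.name.to!string;
            break;
        default:
            // Comments, text, CDATA sections, and processing instructions
            // are not elements, so they are skipped here.
            break;
    }
}

void main()
{
    auto dom = parseDOM("<root><foo/><bar>some text</bar></root>");
    string[] names;
    collectElementNames(dom, names);
    assert(names == ["root", "foo", "bar"]);
}
```

A plain switch with a default case is used here so that the sketch does not depend on the exact set of EntityType members.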

If parseDOM is given a range of characters, it in turn passes that to dxml.parser.parseXML to do the actual XML parsing. As such, that overload accepts an optional dxml.parser.Config as a template argument to configure the parser.
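A minimal sketch of passing a custom Config follows. It assumes that dxml.parser.makeConfig and the SkipComments flag are available for building a Config at compile time; the XML itself is made up for illustration.

```d
import dxml.dom : parseDOM;
import dxml.parser : EntityType, makeConfig, SkipComments;

void main()
{
    auto xml = "<root><!-- skipped --><item/></root>";

    // Default Config: the comment shows up as a child entity of <root>.
    auto withComments = parseDOM(xml);
    assert(withComments.children[0].children.length == 2);

    // Custom Config (assumed API: makeConfig with SkipComments.yes):
    // comments are dropped during parsing.
    auto noComments = parseDOM!(makeConfig(SkipComments.yes))(xml);
    auto root = noComments.children[0];
    assert(root.children.length == 1);
    assert(root.children[0].type == EntityType.elementEmpty);
    assert(root.children[0].name == "item");
}
```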

If parseDOM is given an EntityRange, the range does not have to be at the start of the document. It can be used to create a DOM for a portion of the document. When an EntityRange is passed to it, it will return a DOMEntity with the type EntityType.elementStart and an empty name. It will iterate the range until it either reaches the end of the range or reaches the end tag matching the start tag that is the parent of the entity that was the front of the range when it was passed to parseDOM. The EntityRange is passed by ref, so if it was not at the top level when it was passed to parseDOM (and thus still has elements in it when parseDOM returns), the range will then be at the entity after that matching end tag, and the application can continue to process the range after that if it so chooses.

  1. DOMEntity!R parseDOM(R range)
  2. DOMEntity!(ER.Input) parseDOM(ER range)
    DOMEntity!(ER.Input) parseDOM(ER)(ref ER range) if (isInstanceOf!(EntityRange, ER))
  3. struct DOMEntity(R)

Parameters

range ER

Either a range of characters representing an entire XML document or a dxml.parser.EntityRange which may refer to some or all of an XML document.

Return Value

Type: DOMEntity!(ER.Input)

A DOMEntity representing the DOM tree from the point in the document that was passed to parseDOM (the start of the document if a range of characters was passed, and wherever in the document the range was if an EntityRange was passed).

Throws

XMLParsingException if the parser encounters invalid XML.
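A small sketch of handling the exception is shown below. The helper parsesWithoutError is hypothetical (not part of dxml); it only reports whether parseDOM got through the input without throwing.

```d
import dxml.dom : parseDOM;
import dxml.parser : XMLParsingException;

// Hypothetical helper, not part of dxml: returns false if parseDOM throws
// an XMLParsingException for the given document.
bool parsesWithoutError(string xml)
{
    try
    {
        auto dom = parseDOM(xml);
        return true;
    }
    catch (XMLParsingException)
    {
        return false;
    }
}

void main()
{
    assert(parsesWithoutError("<root><child/></root>"));

    // The </root> end tag does not match the still-open <child> start tag.
    assert(!parsesWithoutError("<root><child></root>"));
}
```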

Examples

parseDOM with the default Config and a range of characters.

import std.range.primitives : empty;

auto xml = "<root>\n" ~
           "    <!-- no comment -->\n" ~
           "    <foo></foo>\n" ~
           "    <baz>\n" ~
           "        <xyzzy>It's an adventure!</xyzzy>\n" ~
           "    </baz>\n" ~
           "    <tag/>\n" ~
           "</root>";

auto dom = parseDOM(xml);
assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.children.length == 4);

assert(root.children[0].type == EntityType.comment);
assert(root.children[0].text == " no comment ");

assert(root.children[1].type == EntityType.elementStart);
assert(root.children[1].name == "foo");
assert(root.children[1].children.length == 0);

auto baz = root.children[2];
assert(baz.type == EntityType.elementStart);
assert(baz.name == "baz");
assert(baz.children.length == 1);

auto xyzzy = baz.children[0];
assert(xyzzy.type == EntityType.elementStart);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.children.length == 1);

assert(xyzzy.children[0].type == EntityType.text);
assert(xyzzy.children[0].text == "It's an adventure!");

assert(root.children[3].type == EntityType.elementEmpty);
assert(root.children[3].name == "tag");

parseDOM with simpleXML and a range of characters.

import std.range.primitives : empty;

auto xml = "<root>\n" ~
           "    <!-- no comment -->\n" ~
           "    <foo></foo>\n" ~
           "    <baz>\n" ~
           "        <xyzzy>It's an adventure!</xyzzy>\n" ~
           "    </baz>\n" ~
           "    <tag/>\n" ~
           "</root>";

auto dom = parseDOM!simpleXML(xml);
assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.children.length == 3);

assert(root.children[0].type == EntityType.elementStart);
assert(root.children[0].name == "foo");
assert(root.children[0].children.length == 0);

auto baz = root.children[1];
assert(baz.type == EntityType.elementStart);
assert(baz.name == "baz");
assert(baz.children.length == 1);

auto xyzzy = baz.children[0];
assert(xyzzy.type == EntityType.elementStart);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.children.length == 1);

assert(xyzzy.children[0].type == EntityType.text);
assert(xyzzy.children[0].text == "It's an adventure!");

assert(root.children[2].type == EntityType.elementStart);
assert(root.children[2].name == "tag");
assert(root.children[2].children.length == 0);

parseDOM with simpleXML and an EntityRange.

import std.range.primitives : empty;
import dxml.parser : parseXML;

auto xml = "<root>\n" ~
           "    <!-- no comment -->\n" ~
           "    <foo></foo>\n" ~
           "    <baz>\n" ~
           "        <xyzzy>It's an adventure!</xyzzy>\n" ~
           "    </baz>\n" ~
           "    <tag/>\n" ~
           "</root>";

auto range = parseXML!simpleXML(xml);
auto dom = parseDOM(range);
assert(range.empty);

assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto root = dom.children[0];
assert(root.type == EntityType.elementStart);
assert(root.name == "root");
assert(root.children.length == 3);

assert(root.children[0].type == EntityType.elementStart);
assert(root.children[0].name == "foo");
assert(root.children[0].children.length == 0);

auto baz = root.children[1];
assert(baz.type == EntityType.elementStart);
assert(baz.name == "baz");
assert(baz.children.length == 1);

auto xyzzy = baz.children[0];
assert(xyzzy.type == EntityType.elementStart);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.children.length == 1);

assert(xyzzy.children[0].type == EntityType.text);
assert(xyzzy.children[0].text == "It's an adventure!");

assert(root.children[2].type == EntityType.elementStart);
assert(root.children[2].name == "tag");
assert(root.children[2].children.length == 0);

parseDOM with an EntityRange which is not at the start of the document.

import std.range.primitives : empty;
import dxml.parser : parseXML, skipToPath;

auto xml = "<root>\n" ~
           "    <!-- no comment -->\n" ~
           "    <foo></foo>\n" ~
           "    <baz>\n" ~
           "        <xyzzy>It's an adventure!</xyzzy>\n" ~
           "    </baz>\n" ~
           "    <tag/>\n" ~
           "</root>";

auto range = parseXML!simpleXML(xml).skipToPath("baz/xyzzy");
assert(range.front.type == EntityType.elementStart);
assert(range.front.name == "xyzzy");

auto dom = parseDOM(range);
assert(range.front.type == EntityType.elementStart);
assert(range.front.name == "tag");

assert(dom.type == EntityType.elementStart);
assert(dom.name.empty);
assert(dom.children.length == 1);

auto xyzzy = dom.children[0];
assert(xyzzy.type == EntityType.elementStart);
assert(xyzzy.name == "xyzzy");
assert(xyzzy.children.length == 1);

assert(xyzzy.children[0].type == EntityType.text);
assert(xyzzy.children[0].text == "It's an adventure!");

parseDOM at compile time.

enum xml = "<!-- comment -->\n" ~
           "<root>\n" ~
           "    <foo>some text<whatever/></foo>\n" ~
           "    <bar/>\n" ~
           "    <baz></baz>\n" ~
           "</root>";

enum dom = parseDOM(xml);
static assert(dom.type == EntityType.elementStart);
static assert(dom.name.empty);
static assert(dom.children.length == 2);

static assert(dom.children[0].type == EntityType.comment);
static assert(dom.children[0].text == " comment ");
