EntityRange

Lazily parses the given range of characters as an XML document.

EntityRange is essentially a StAX parser, though it evolved into that rather than being based on what Java did, and it's range-based rather than iterator-based, so its API is likely to differ from other implementations. The basic concept should be the same though.

One of the core design goals of this parser is to slice the original input rather than having to allocate strings for the output or wrap it in a lazy range that produces a mutated version of the data. So, all of the text that the parser provides is either a slice of the input or the result of calling std.range.takeExactly on it. However, in some cases, for the parser to be fully compliant with the XML spec, dxml.util.decodeXML must be called on the text to mutate certain constructs (e.g. removing any '\r' in the text or converting "&lt;" to '<'). But that's left up to the application.
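As a minimal sketch of the slicing behavior described above, assuming a plain string input and the default Config: the text entity compares equal to the undecoded source text and points into the original string, and the application chooses whether to run it through dxml.util.decodeXML. The sample document is made up for illustration.

```d
import dxml.parser : parseXML, EntityType;
import dxml.util : decodeXML;

void main()
{
    // Hypothetical sample input, not taken from the dxml documentation.
    auto xml = "<root>1 &lt; 2</root>";
    auto range = parseXML(xml);

    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "root");
    range.popFront();

    // The text is handed back exactly as it appears in the input.
    assert(range.front.type == EntityType.text);
    auto raw = range.front.text;
    assert(raw == "1 &lt; 2");

    // For a string input, the slice lies inside the original buffer.
    assert(raw.ptr >= xml.ptr && raw.ptr + raw.length <= xml.ptr + xml.length);

    // Decoding is a separate step that the application opts into.
    assert(decodeXML(raw) == "1 < 2");
}
```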

The parser is not @nogc, but it allocates memory very minimally. It allocates some of its state on the heap so it can validate attributes and end tags. However, that state is shared among all the ranges that came from the same call to parseXML (only the range farthest along in parsing validates attributes or end tags), so save does not allocate memory unless save on the underlying range allocates memory. The shared state currently uses a couple of dynamic arrays to validate the tags and attributes, and if the document has a particularly deep tag depth or has a lot of attributes on a start tag, then some reallocations may occur until the maximum is reached, but enough is reserved that for most documents, no reallocations will occur. The only other times that the parser would allocate would be if an exception were thrown or if the range that was passed to parseXML allocates for any reason when calling any of the range primitives.
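The following sketch shows what working with a saved copy might look like, assuming a string input and the default Config. It only demonstrates that the two ranges advance independently. The shared validation state mentioned above is internal and is not observable from this code. The sample document is hypothetical.

```d
import dxml.parser : parseXML, EntityType;

void main()
{
    auto range = parseXML("<root><a/><b/></root>");

    // save yields an independent position into the same document.
    auto saved = range.save;

    range.popFront(); // <a/>
    range.popFront(); // <b/>
    assert(range.front.type == EntityType.elementEmpty);
    assert(range.front.name == "b");

    // The saved copy is still at the first entity.
    assert(saved.front.type == EntityType.elementStart);
    assert(saved.front.name == "root");

    saved.popFront();
    assert(saved.front.name == "a");
}
```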

If invalid XML is encountered at any point during the parsing process, an XMLParsingException will be thrown. If an exception has been thrown, then the parser is in an invalid state, and it is an error to call any functions on it.
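One way an application might handle this, sketched under the assumption that a mismatched end tag is the error in question: wrap the whole traversal in a try block and stop using the range once the exception has been caught. The malformed document is made up for illustration.

```d
import dxml.parser : parseXML, XMLParsingException;

void main()
{
    bool threw = false;

    try
    {
        // </root> does not match the open <child> tag.
        auto range = parseXML("<root><child></root>");

        // Parsing is lazy, so the error surfaces during iteration.
        while (!range.empty)
            range.popFront();
    }
    catch (XMLParsingException e)
    {
        // The range must not be touched again after this point.
        threw = true;
    }

    assert(threw);
}
```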

However, note that XML validation is reduced for any entities that are skipped (e.g. for anything in the DTD, validation is reduced to what is required to correctly parse past it, and when Config.skipPI == SkipPI.yes, processing instructions are only validated enough to correctly skip past them).
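A small sketch of the Config.skipPI case mentioned above, assuming the makeConfig helper and the SkipPI flag from dxml.parser. With the default Config the processing instruction is the first entity. With SkipPI.yes the range starts at the root element instead. The sample document is hypothetical.

```d
import dxml.parser : parseXML, makeConfig, SkipPI, EntityType;

void main()
{
    auto xml = "<?instruction start?><root/>";

    // Default Config: the processing instruction is reported.
    auto full = parseXML(xml);
    assert(full.front.type == EntityType.pi);
    assert(full.front.name == "instruction");
    assert(full.front.text == "start");

    // SkipPI.yes: the processing instruction is parsed past, not reported.
    enum config = makeConfig(SkipPI.yes);
    auto skipping = parseXML!config(xml);
    assert(skipping.front.type == EntityType.elementEmpty);
    assert(skipping.front.name == "root");
}
```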

As the module documentation says, this parser does not provide any DTD support. It is not possible to properly support the DTD while returning slices of the original input, and the DTD portion of the spec makes parsing XML far, far more complicated.

A quick note about carriage returns: per the XML spec, they are all supposed to either be stripped out or replaced with newlines or spaces before the XML parser even processes the text. That doesn't work when the parser is slicing the original text and not mutating it at all. So, for the purposes of parsing, this parser treats all carriage returns as if they were newlines or spaces (though they won't count as newlines when counting the lines for TextPos). However, they will appear in any text fields or attribute values if they are in the document (since the text fields and attribute values are slices of the original text). dxml.util.decodeXML can be used to strip them along with converting any character references in the text. Alternatively, the application can remove them all before calling parseXML, but it's not necessary.
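To illustrate the carriage return handling described above, here is a short sketch assuming a string input with Windows-style line endings and the default Config. The carriage return survives in the text slice and is removed only when the application calls dxml.util.decodeXML. The sample document is made up.

```d
import dxml.parser : parseXML, EntityType;
import dxml.util : decodeXML;

void main()
{
    auto range = parseXML("<root>line one\r\nline two</root>");
    range.popFront(); // move from <root> to its text

    assert(range.front.type == EntityType.text);

    // The slice still contains the '\r' from the input.
    assert(range.front.text == "line one\r\nline two");

    // decodeXML strips carriage returns.
    assert(decodeXML(range.front.text) == "line one\nline two");
}
```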

struct EntityRange(Config cfg, R)
    if (isForwardRange!R && isSomeChar!(ElementType!R))
{
    enum compileInTests;
}

Postblit

A postblit is present on this object, but not explicitly documented in the source.

Members

Aliases

Input
alias Input = R

The type of the range that EntityRange is parsing.

SliceOfR
alias SliceOfR = R

The type used when any slice of the original input is used. If R is a string or supports slicing, then SliceOfR is the same as R; otherwise, it's the result of calling std.range.takeExactly on the input.
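The two cases can be checked at compile time. This sketch assumes a filtered string as an example of a forward range of characters that does not support slicing. The count passed to takeExactly is arbitrary and is only there so that typeof has an expression to inspect.

```d
import dxml.parser : Config, EntityRange;
import std.algorithm : filter;
import std.range : takeExactly;

void main()
{
    // A string supports slicing, so Input and SliceOfR are both string.
    static assert(is(EntityRange!(Config.init, string).Input == string));
    static assert(is(EntityRange!(Config.init, string).SliceOfR == string));

    // A filtered string is a forward range of characters without slicing.
    auto range = filter!(a => true)("some xml");

    // SliceOfR is then whatever takeExactly returns for that range type.
    static assert(is(EntityRange!(Config.init, typeof(range)).SliceOfR ==
                     typeof(takeExactly(range, 42))));
}
```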

config
alias config = cfg

The Config used when parsing the XML.
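As a rough sketch of how the alias might be inspected, assuming the simpleXML preset from dxml.parser (which is understood to enable SkipComments, SkipPI and SplitEmpty): the Config that was passed to parseXML can be read back from the range type at compile time.

```d
import dxml.parser : parseXML, simpleXML, Config, SkipComments;

void main()
{
    // No explicit Config: the range type carries Config.init.
    auto plain = parseXML("<root/>");
    alias Plain = typeof(plain);
    static assert(Plain.config == Config.init);

    // An explicit Config is reflected in the config alias.
    auto simple = parseXML!simpleXML("<root/>");
    alias Simple = typeof(simple);
    static assert(Simple.config == simpleXML);
    static assert(Simple.config.skipComments == SkipComments.yes);
}
```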

Properties

text
SliceOfR text [@property getter]

Returns the textual value of this Entity.

Structs

Entity
struct Entity

Represents an entity in the XML document.
