The exception type thrown when the XML parser encounters invalid XML.
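Because parsing is lazy, invalid XML is only reported once iteration reaches it. A minimal sketch of catching the exception follows; isWellFormed is a hypothetical helper name used for illustration and is not part of the library.

```d
import dxml.parser : parseXML, XMLParsingException;

/// Hypothetical helper: walks the whole document and reports whether
/// the parser accepted all of it.
bool isWellFormed(string xml)
{
    try
    {
        // Iterating forces the lazy parser through every entity.
        foreach(entity; parseXML(xml)) {}
        return true;
    }
    catch(XMLParsingException e)
    {
        return false;
    }
}

unittest
{
    assert(isWellFormed("<root><a/></root>"));
    assert(!isWellFormed("<root><a></root>")); // <a> is never closed
}
```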
Represents the type of an XML entity. Used by EntityRange.Entity.
A helper function for processing start tag attributes.
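As a rough sketch, and assuming that getAttrs takes the attribute range followed by alternating attribute names and pointers to the variables to fill, reading typed attributes might look like this. Point and parsePoint are hypothetical names used only for the example.

```d
import dxml.parser : getAttrs, parseXML;

/// Hypothetical type used only for this example.
struct Point
{
    int x;
    int y;
    string label;
}

/// Hypothetical helper: reads the attributes of the first tag into a Point.
Point parsePoint(string xml)
{
    auto range = parseXML(xml);
    Point p;
    // Assumed call shape: name, pointer, name, pointer, ...
    getAttrs(range.front.attributes, "x", &p.x, "y", &p.y, "label", &p.label);
    return p;
}

unittest
{
    assert(parsePoint(`<point x="3" y="4" label="home"/>`) == Point(3, 4, "home"));
}
```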
Helper function for creating a custom config. It makes it easy to set one or more of the member variables to something other than the default without having to worry about explicitly setting them individually or setting them all at once via a constructor.
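A small sketch of how this might be used, assuming the SkipComments flag from dxml.parser; countEntities is a hypothetical helper that only exists to make the difference between two configurations visible.

```d
import dxml.parser : Config, makeConfig, parseXML, SkipComments;
import std.range : walkLength;

// Only the comment-skipping setting differs from the default Config.
enum noComments = makeConfig(SkipComments.yes);

/// Hypothetical helper: counts the entities that the parser reports
/// for the given configuration.
size_t countEntities(Config config)(string xml)
{
    return parseXML!config(xml).walkLength;
}

unittest
{
    enum xml = "<!-- note --><root><a/></root>";
    assert(countEntities!(Config.init)(xml) == 4); // comment, <root>, <a/>, </root>
    assert(countEntities!noComments(xml) == 3);    // the comment is skipped
}
```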
Lazily parses the given range of characters as an XML document.
Takes an EntityRange which is at a start tag and iterates it until it is at its corresponding end tag. It is an error to call skipContents when the current entity is not EntityType.elementStart.
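The following sketch uses skipContents to list the direct children of the root element without descending into them. childNames is a hypothetical helper, and it assumes that the first entity in the document is the root start tag (no leading comments or processing instructions).

```d
import dxml.parser : EntityType, parseXML, skipContents;

/// Hypothetical helper: names of the elements directly below the root.
string[] childNames(string xml)
{
    auto range = parseXML(xml);
    assert(range.front.type == EntityType.elementStart);
    range.popFront(); // step inside the root element

    string[] names;
    while(range.front.type != EntityType.elementEnd)
    {
        if(range.front.type == EntityType.elementStart)
        {
            names ~= range.front.name;
            range = range.skipContents(); // now at the matching end tag
        }
        else if(range.front.type == EntityType.elementEmpty)
            names ~= range.front.name;
        range.popFront();
    }
    return names;
}

unittest
{
    enum xml = "<root><foo><a/><b>text</b></foo><bar/>text<baz></baz></root>";
    assert(childNames(xml) == ["foo", "bar", "baz"]);
}
```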
Skips entities until the given EntityType is reached.
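A sketch of collecting every text entity in a document. It assumes that skipToEntityType always moves past the current entity before it starts searching and that it returns an empty range when nothing matches; allText is a hypothetical helper name.

```d
import dxml.parser : EntityType, parseXML, skipToEntityType;

/// Hypothetical helper: the text of every text entity, in document order.
string[] allText(string xml)
{
    string[] texts;
    // The first entity of a document is never text, so skipping it is safe.
    auto range = parseXML(xml).skipToEntityType(EntityType.text);
    while(!range.empty)
    {
        texts ~= range.front.text;
        // Assumption: this advances past the current text entity first.
        range = range.skipToEntityType(EntityType.text);
    }
    return texts;
}

unittest
{
    assert(allText("<root><a>one</a><!-- c --><b>two</b></root>") == ["one", "two"]);
}
```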
Skips entities until the end tag is reached that corresponds to the start tag that is the parent of the current entity.
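For illustration, the sketch below moves to the first text entity and then jumps to the end tag of the element that contains it. parentOfFirstText is a hypothetical helper, and returning null when there is no text is a choice made for this example only.

```d
import dxml.parser : EntityType, parseXML, skipToParentEndTag;

/// Hypothetical helper: name of the element that encloses the first text.
string parentOfFirstText(string xml)
{
    auto range = parseXML(xml);
    while(!range.empty && range.front.type != EntityType.text)
        range.popFront();
    if(range.empty)
        return null; // no text in the document

    // Skips any remaining siblings (and their contents) of the text.
    range = range.skipToParentEndTag();
    return range.front.name;
}

unittest
{
    enum xml = "<root><foo>intro<a><b/></a></foo><bar/></root>";
    assert(parentOfFirstText(xml) == "foo");
}
```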
Treats the given string like a file path except that each directory corresponds to the name of a start tag. Note that this does not try to implement XPath as that would be quite complicated, and it really doesn't fit with a StAX parser.
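A sketch of looking up a nested value by path. It assumes that the path is resolved relative to the entity the range is currently on (here, the root start tag) and that an empty range is returned when the path is not found; textAt is a hypothetical helper name.

```d
import dxml.parser : EntityType, parseXML, skipToPath;

/// Hypothetical helper: text directly inside the start tag found at path,
/// or null if the path or the text is missing.
string textAt(string xml, string path)
{
    auto range = parseXML(xml).skipToPath(path);
    if(range.empty)
        return null;
    range.popFront(); // from the start tag to its first child entity
    return range.front.type == EntityType.text ? range.front.text : null;
}

unittest
{
    enum xml = "<config><server><host>example.org</host>" ~
               "<port>8080</port></server></config>";
    assert(textAt(xml, "server/host") == "example.org");
    assert(textAt(xml, "server/port") == "8080");
}
```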
This Config is intended for making it easy to parse XML by skipping everything that isn't the actual data as well as making it simpler to deal with empty element tags by treating them the same as a start tag and end tag with nothing but whitespace between them.
Used to configure how the parser works.
Lazily parses the given range of characters as an XML document.
Where in the XML document an entity is.
Whether the given type is a forward range of attributes.
```d
auto xml = "<!-- comment -->\n" ~
           "<root>\n" ~
           " <foo>some text<whatever/></foo>\n" ~
           " <bar/>\n" ~
           " <baz></baz>\n" ~
           "</root>";
{
    auto range = parseXML(xml);
    assert(range.front.type == EntityType.comment);
    assert(range.front.text == " comment ");
    range.popFront();

    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "root");
    range.popFront();

    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "foo");
    range.popFront();

    assert(range.front.type == EntityType.text);
    assert(range.front.text == "some text");
    range.popFront();

    assert(range.front.type == EntityType.elementEmpty);
    assert(range.front.name == "whatever");
    range.popFront();

    assert(range.front.type == EntityType.elementEnd);
    assert(range.front.name == "foo");
    range.popFront();

    assert(range.front.type == EntityType.elementEmpty);
    assert(range.front.name == "bar");
    range.popFront();

    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "baz");
    range.popFront();

    assert(range.front.type == EntityType.elementEnd);
    assert(range.front.name == "baz");
    range.popFront();

    assert(range.front.type == EntityType.elementEnd);
    assert(range.front.name == "root");
    range.popFront();

    assert(range.empty);
}
{
    auto range = parseXML!simpleXML(xml);

    // simpleXML skips comments

    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "root");
    range.popFront();

    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "foo");
    range.popFront();

    assert(range.front.type == EntityType.text);
    assert(range.front.text == "some text");
    range.popFront();

    // simpleXML splits empty element tags into a start tag and end tag
    // so that the code doesn't have to care whether a start tag with no
    // content is an empty tag or a start tag and end tag with nothing but
    // whitespace in between.
    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "whatever");
    range.popFront();

    assert(range.front.type == EntityType.elementEnd);
    assert(range.front.name == "whatever");
    range.popFront();

    assert(range.front.type == EntityType.elementEnd);
    assert(range.front.name == "foo");
    range.popFront();

    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "bar");
    range.popFront();

    assert(range.front.type == EntityType.elementEnd);
    assert(range.front.name == "bar");
    range.popFront();

    assert(range.front.type == EntityType.elementStart);
    assert(range.front.name == "baz");
    range.popFront();

    assert(range.front.type == EntityType.elementEnd);
    assert(range.front.name == "baz");
    range.popFront();

    assert(range.front.type == EntityType.elementEnd);
    assert(range.front.name == "root");
    range.popFront();

    assert(range.empty);
}
```
Source: dxml/parser.d
Copyright 2017 - 2023
This implements a range-based StAX parser for XML 1.0 (which will work with XML 1.1 documents assuming that they don't use any 1.1-specific features). For the sake of simplicity, sanity, and efficiency, the DTD section is not supported beyond what is required to parse past it.
Start tags, end tags, comments, cdata sections, and processing instructions are all supported and reported to the application. Anything in the DTD is skipped (though it's parsed enough to parse past it correctly, and that can result in an XMLParsingException if that XML isn't valid enough to be correctly skipped), and the XML declaration at the top is skipped if present (XML 1.1 requires that it be there, but XML 1.0 does not).
Regardless of what the XML declaration says (if present), any range of char will be treated as being encoded in UTF-8, any range of wchar will be treated as being encoded in UTF-16, and any range of dchar will be treated as having been encoded in UTF-32. Strings will be treated as ranges of their code units, not code points. Note that, as Phobos typically does when processing strings, the code assumes that BOMs have already been removed, so if the range of characters comes from a file that uses a BOM, the calling code needs to strip it out before calling parseXML, or parsing will fail due to invalid characters.
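A minimal sketch of stripping a UTF-8 BOM before parsing. stripUTF8BOM and parseFileText are hypothetical helper names; only parseXML comes from the library, and ranges of wchar or dchar would need an analogous check for their own BOM.

```d
import dxml.parser : parseXML;

/// Hypothetical helper: drops a leading UTF-8 byte order mark, if any.
string stripUTF8BOM(string text)
{
    enum bom = "\xEF\xBB\xBF"; // U+FEFF encoded as UTF-8
    if(text.length >= bom.length && text[0 .. bom.length] == bom)
        return text[bom.length .. $];
    return text;
}

/// Hypothetical helper: parses text that was read from a file as-is.
auto parseFileText(string text)
{
    return parseXML(stripUTF8BOM(text));
}

unittest
{
    assert(stripUTF8BOM("\xEF\xBB\xBF<root/>") == "<root/>");
    assert(stripUTF8BOM("<root/>") == "<root/>");
    assert(parseFileText("\xEF\xBB\xBF<root/>").front.name == "root");
}
```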
Since the DTD is skipped, entity references other than the five which are predefined by the XML spec cannot be fully processed (since wherever they were used in the document would be replaced by what they referred to, which could be arbitrarily complex XML). As such, by default, if any entity references which are not predefined are encountered outside of the DTD, an XMLParsingException will be thrown (see Config.throwOnEntityRef for how that can be configured). The predefined entity references and any character references encountered will be checked to verify that they're valid, but they will not be replaced (since that does not work with returning slices of the original input).
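The sketch below contrasts the default behavior with a configuration that tolerates non-predefined entity references. It assumes that the flag type behind Config.throwOnEntityRef is named ThrowOnEntityRef and that, when throwing is disabled, the reference is left untouched in the returned text; firstTextLenient is a hypothetical helper name.

```d
import dxml.parser : makeConfig, parseXML, ThrowOnEntityRef, XMLParsingException;
import std.exception : assertThrown;
import std.range : walkLength;

// Assumption: ThrowOnEntityRef.no disables the exception described above.
enum lenient = makeConfig(ThrowOnEntityRef.no);

/// Hypothetical helper: the text directly after the root start tag.
string firstTextLenient(string xml)
{
    auto range = parseXML!lenient(xml);
    range.popFront(); // move from the root start tag to its text
    return range.front.text;
}

unittest
{
    enum xml = "<root>&custom; value</root>";
    // Default configuration: the unknown reference is an error.
    assertThrown!XMLParsingException(parseXML(xml).walkLength);
    // Lenient configuration: the reference is passed through unchanged.
    assert(firstTextLenient(xml) == "&custom; value");
}
```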
However, decodeXML or parseStdEntityRef from dxml.util can be used to convert the predefined entity references to what they refer to, and decodeXML or parseCharRef from dxml.util can be used to convert character references to what they refer to.
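As a short sketch, the text returned by the parser still contains the references, and passing it through decodeXML yields the decoded string. decodedFirstText is a hypothetical helper name.

```d
import dxml.parser : parseXML;
import dxml.util : decodeXML;

/// Hypothetical helper: decoded text directly after the root start tag.
string decodedFirstText(string xml)
{
    auto range = parseXML(xml);
    range.popFront(); // move from the root start tag to its text
    return range.front.text.decodeXML();
}

unittest
{
    enum xml = "<msg>Tom &amp; Jerry &#65;</msg>";

    // The parser itself returns a slice of the input, references included.
    auto range = parseXML(xml);
    range.popFront();
    assert(range.front.text == "Tom &amp; Jerry &#65;");

    assert(decodedFirstText(xml) == "Tom & Jerry A");
}
```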
Primary Symbols
Parser Configuration Helpers
Helper Types Used When Parsing
Helper Functions Used When Parsing
Helper Traits