Package org.jsoup.parser
Class Parser
java.lang.Object
org.jsoup.parser.Parser
public class Parser extends Object
-
Constructor Summary
Constructors
Constructor / Description
Parser(org.jsoup.parser.TreeBuilder treeBuilder)
    Create a new Parser, using the specified TreeBuilder.
-
Method Summary
Modifier and Type, Method / Description
ParseErrorList getErrors()
    Retrieve the parse errors, if any, from the last parse.
org.jsoup.parser.TreeBuilder getTreeBuilder()
    Get the TreeBuilder currently in use.
static Parser htmlParser()
    Create a new HTML parser.
boolean isContentForTagData(String normalName)
    An internal method, visible for Element.
boolean isTrackErrors()
    Check if parse error tracking is enabled.
Parser newInstance()
    Creates a new Parser as a deep copy of this; including initializing a new TreeBuilder.
static Document parse(String html, String baseUri)
    Parse HTML into a Document.
static Document parseBodyFragment(String bodyHtml, String baseUri)
    Parse a fragment of HTML into the body of a Document.
static List<Node> parseFragment(String fragmentHtml, Element context, String baseUri)
    Parse a fragment of HTML into a list of nodes.
static List<Node> parseFragment(String fragmentHtml, Element context, String baseUri, ParseErrorList errorList)
    Parse a fragment of HTML into a list of nodes.
List<Node> parseFragmentInput(String fragment, Element context, String baseUri)
Document parseInput(Reader inputHtml, String baseUri)
Document parseInput(String html, String baseUri)
static List<Node> parseXmlFragment(String fragmentXml, String baseUri)
    Parse a fragment of XML into a list of nodes.
ParseSettings settings()
Parser settings(ParseSettings settings)
Parser setTrackErrors(int maxErrors)
    Enable or disable parse error tracking for the next parse.
Parser setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder)
    Update the TreeBuilder used when parsing content.
static String unescapeEntities(String string, boolean inAttribute)
    Utility method to unescape HTML entities from a string.
static Parser xmlParser()
    Create a new XML parser.
-
Constructor Details
-
Parser
public Parser(org.jsoup.parser.TreeBuilder treeBuilder)
Create a new Parser, using the specified TreeBuilder.
Parameters:
treeBuilder - TreeBuilder to use to parse input into Documents.
-
-
Method Details
-
newInstance
public Parser newInstance()
Creates a new Parser as a deep copy of this one, including a newly initialized TreeBuilder. The copy can be used independently of the original, for example on another thread.
Returns:
a copied parser
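As a rough illustration of the independent-use case, the sketch below keeps one configured prototype and gives each task its own copy via newInstance(). The class name ParserPerThread and the helper titleOf are hypothetical names chosen for this example, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ParserPerThread {
    // Shared prototype: it is only copied, never used to parse directly.
    private static final Parser PROTOTYPE = Parser.htmlParser();

    // Hypothetical helper: parses with a private copy of the prototype.
    static String titleOf(String html) {
        Parser parser = PROTOTYPE.newInstance();
        Document doc = parser.parseInput(html, "");
        return doc.title();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable job = () -> System.out.println(titleOf("<title>Report</title><p>Body"));
        Thread first = new Thread(job);
        Thread second = new Thread(job);
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

Each call works on its own Parser and TreeBuilder, so no parse state is shared between the two threads.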
-
parseInput
-
parseInput
-
parseFragmentInput
-
getTreeBuilder
public org.jsoup.parser.TreeBuilder getTreeBuilder()
Get the TreeBuilder currently in use.
Returns:
current TreeBuilder.
-
setTreeBuilder
public Parser setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder)
Update the TreeBuilder used when parsing content.
Parameters:
treeBuilder - the TreeBuilder to use from now on
Returns:
this, for chaining
-
isTrackErrors
public boolean isTrackErrors()
Check if parse error tracking is enabled.
Returns:
current track error state.
-
setTrackErrors
public Parser setTrackErrors(int maxErrors)
Enable or disable parse error tracking for the next parse.
Parameters:
maxErrors - the maximum number of errors to track. Set to 0 to disable.
Returns:
this, for chaining
-
getErrors
public ParseErrorList getErrors()
Retrieve the parse errors, if any, from the last parse.
Returns:
list of parse errors, up to the size of the maximum errors tracked.
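The tracking methods are typically used together: enable tracking with a cap, parse, then read the list. The following is a minimal sketch of that flow, assuming jsoup is on the classpath; the class name TrackErrorsExample and the helper errorsFor are hypothetical, and the sample input is deliberately cut off inside a tag so that at least one error is reported.

```java
import org.jsoup.parser.ParseError;
import org.jsoup.parser.ParseErrorList;
import org.jsoup.parser.Parser;

public class TrackErrorsExample {
    // Hypothetical helper: parses html and returns the errors recorded for that parse.
    static ParseErrorList errorsFor(String html, int maxErrors) {
        Parser parser = Parser.htmlParser().setTrackErrors(maxErrors);
        parser.parseInput(html, "");
        return parser.getErrors();
    }

    public static void main(String[] args) {
        // The input ends in the middle of a tag name, which the tokeniser reports as an error.
        for (ParseError error : errorsFor("<p>Unclosed <div", 10)) {
            System.out.println(error.getPosition() + ": " + error.getErrorMessage());
        }
    }
}
```

Passing 0 as maxErrors leaves tracking disabled, in which case the returned list stays empty.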
-
settings
-
settings
-
isContentForTagData
public boolean isContentForTagData(String normalName)
An internal method, visible for Element. For an HTML parse, signals that script and style text should be treated as Data Nodes.
-
parse
public static Document parse(String html, String baseUri)
Parse HTML into a Document.
Parameters:
html - HTML to parse
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
parsed Document
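To make the role of baseUri concrete, here is a small sketch that parses a snippet and resolves a relative link against the base. The class name ParseWithBaseUri, the helper firstLinkTarget, and the example.com URLs are assumptions made for illustration; jsoup is assumed to be on the classpath.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class ParseWithBaseUri {
    // Hypothetical helper: returns the absolute URL of the first link, or "" if there is none.
    static String firstLinkTarget(String html, String baseUri) {
        Document doc = Parser.parse(html, baseUri);
        Element link = doc.selectFirst("a[href]");
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<p>See <a href='/docs'>docs</a>";
        // prints "https://example.com/docs"
        System.out.println(firstLinkTarget(html, "https://example.com/guide/"));
    }
}
```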
-
parseFragment
public static List<Node> parseFragment(String fragmentHtml, Element context, String baseUri)
Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
Parameters:
fragmentHtml - the fragment of HTML to parse
context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
-
parseFragment
public static List<Node> parseFragment(String fragmentHtml, Element context, String baseUri, ParseErrorList errorList)
Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
Parameters:
fragmentHtml - the fragment of HTML to parse
context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
errorList - list to add errors to
Returns:
list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
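The effect of the context element is easiest to see with table content, which is only meaningful inside a table. The sketch below parses a row with a tbody as context, using the three-argument overload; the class name FragmentContextExample and the helpers emptyTbody and rowsFor are hypothetical, and jsoup is assumed to be on the classpath.

```java
import java.util.List;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

public class FragmentContextExample {
    // Hypothetical helper: builds a document holding an empty table body to act as context.
    static Element emptyTbody() {
        Document doc = Parser.parse("<table><tbody></tbody></table>", "");
        return doc.selectFirst("tbody");
    }

    // Hypothetical helper: parses row markup as if it were the inner HTML of tbody.
    static List<Node> rowsFor(Element tbody, String rowsHtml) {
        return Parser.parseFragment(rowsHtml, tbody, "");
    }

    public static void main(String[] args) {
        Element tbody = emptyTbody();
        List<Node> nodes = rowsFor(tbody, "<tr><td>1</td></tr>");
        for (Node node : nodes) {
            System.out.println(node.nodeName());
        }
        // The context only guides parsing; the parsed nodes are not attached to it.
        System.out.println("children of context: " + tbody.childNodeSize());
    }
}
```

The caller decides what to do with the returned nodes, for example appending them to the context element afterwards.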
-
parseXmlFragment
public static List<Node> parseXmlFragment(String fragmentXml, String baseUri)
Parse a fragment of XML into a list of nodes.
Parameters:
fragmentXml - the fragment of XML to parse
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
list of nodes parsed from the input XML.
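A short sketch of XML fragment parsing follows, assuming jsoup is on the classpath. The class name XmlFragmentExample, the helper parseRows, and the Row and Cell element names are made up for this example. Mixed-case names are used because the XML parser keeps tag names as written.

```java
import java.util.List;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

public class XmlFragmentExample {
    // Hypothetical helper: parses an XML snippet into its top-level nodes.
    static List<Node> parseRows(String xml) {
        return Parser.parseXmlFragment(xml, "");
    }

    public static void main(String[] args) {
        List<Node> nodes = parseRows("<Row id=\"1\"><Cell>a</Cell></Row>");
        for (Node node : nodes) {
            System.out.println(node.nodeName() + " id=" + node.attr("id"));
        }
    }
}
```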
-
parseBodyFragment
public static Document parseBodyFragment(String bodyHtml, String baseUri)
Parse a fragment of HTML into the body of a Document.
Parameters:
bodyHtml - fragment of HTML
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
Document, with empty head, and HTML parsed into body
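As an illustration of the returned shape, this sketch parses a snippet and inspects the head and body of the resulting Document. The class name BodyFragmentExample and the helper wrap are hypothetical; jsoup is assumed to be on the classpath.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class BodyFragmentExample {
    // Hypothetical helper: wraps a snippet of markup in a full document shell.
    static Document wrap(String bodyHtml) {
        return Parser.parseBodyFragment(bodyHtml, "");
    }

    public static void main(String[] args) {
        Document doc = wrap("<p>Hello <b>there</b>");
        System.out.println("head children: " + doc.head().childNodeSize());
        System.out.println("body text: " + doc.body().text());
    }
}
```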
-
unescapeEntities
public static String unescapeEntities(String string, boolean inAttribute)
Utility method to unescape HTML entities from a string.
Parameters:
string - HTML escaped string
inAttribute - if the string is to be unescaped in strict mode (as attribute values are)
Returns:
an unescaped string
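A minimal usage sketch, assuming jsoup is on the classpath; the class name UnescapeExample and the helper plain are hypothetical. The second call in main passes inAttribute as true on a query-string-like value, where an entity name without a trailing semicolon may be handled differently than in body text; its output is left unstated here because it depends on that strict-mode handling.

```java
import org.jsoup.parser.Parser;

public class UnescapeExample {
    // Hypothetical helper: unescapes text as it would appear in element content.
    static String plain(String escaped) {
        return Parser.unescapeEntities(escaped, false);
    }

    public static void main(String[] args) {
        // prints "Hello & <Friend>"
        System.out.println(plain("Hello &amp; &lt;Friend&gt;"));
        // Attribute-style unescaping of a value containing "&amp" with no semicolon.
        System.out.println(Parser.unescapeEntities("?a=1&amp=2", true));
    }
}
```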
-
htmlParser
public static Parser htmlParser()
Create a new HTML parser. This parser treats input as HTML5, and enforces the creation of a normalised document, based on a knowledge of the semantics of the incoming tags.
Returns:
a new HTML parser.
-
xmlParser
public static Parser xmlParser()
Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat the input as HTML; instead it creates a simple tree directly from the input.
Returns:
a new simple XML parser.
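The difference between the two factory methods shows up when the same input is run through both. In the sketch below, the class name HtmlVsXmlExample and the helpers asHtml and asXml are hypothetical, and jsoup is assumed to be on the classpath. The input has two unclosed p tags: the HTML parser applies HTML rules and produces sibling paragraphs inside a body, while the XML parser simply nests the second tag in the first.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class HtmlVsXmlExample {
    // Hypothetical helper: parses with HTML rules (normalised html/head/body structure).
    static Document asHtml(String input) {
        return Parser.htmlParser().parseInput(input, "");
    }

    // Hypothetical helper: parses with the XML parser (tree built directly from the tags).
    static Document asXml(String input) {
        return Parser.xmlParser().parseInput(input, "");
    }

    public static void main(String[] args) {
        String input = "<p>One<p>Two";
        System.out.println("HTML sibling paragraphs: " + asHtml(input).select("body > p").size());
        System.out.println("XML nested paragraphs: " + asXml(input).select("p > p").size());
        System.out.println("XML has body: " + !asXml(input).select("body").isEmpty());
    }
}
```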
-