This tutorial walks through the main ways to select elements and data within XML or HTML documents. For each standard XPath method, I’ve included a short explanation and, where available, an example.
XPath Node Selection
Node selection in XPath refers to choosing specific elements, attributes, or nodes within an XML or HTML document based on their type or location in the document’s hierarchy.
//img
Select all image elements on a page
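To try node selection outside the browser, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. That module implements only a subset of XPath and expects paths relative to an element, so //img is written as .//img. The page fragment and file names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div><img src="cover-1.png"/></div>
    <p>No image here</p>
    <div><span><img src="cover-2.png"/></span></div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# ".//img" is the ElementTree spelling of "//img":
# every img element at any depth below the root.
images = root.findall(".//img")
image_sources = [img.get("src") for img in images]
print(image_sources)
```

The nesting depth of each img does not matter here, because the double slash matches descendants at any level.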
XPath Attribute Selection
Attribute selection in XPath involves choosing elements within an XML or HTML document based on their attributes’ values.
//*[@id = 'gridItemRoot']
Select all Bestseller elements in a grid with the attribute id set to “gridItemRoot.”
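The same attribute test can be sketched in Python with xml.etree.ElementTree, which supports the [@attr='value'] predicate. The grid markup below is a simplified, hypothetical stand-in for the real page (which, like this sample, reuses one id on several elements).

```python
import xml.etree.ElementTree as ET

# Hypothetical grid fragment; only the id values mirror the expression above.
GRID = """
<div id="grid">
  <div id="gridItemRoot"><span>Book A</span></div>
  <div id="gridItemRoot"><span>Book B</span></div>
  <div id="footer"><span>About</span></div>
</div>
"""

root = ET.fromstring(GRID)

# Any element, at any depth, whose id attribute equals "gridItemRoot".
items = root.findall(".//*[@id='gridItemRoot']")
titles = [item.find("span").text for item in items]
print(titles)
```

The wildcard keeps the query independent of the tag name, so it would still match if the grid items were li or article elements instead of div.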
XPath Predicate Filtering
Predicate filtering in XPath applies conditions or filters to select specific elements or nodes based on certain criteria. Use conditions inside square brackets to filter elements.
//span[contains(@class, 'sc-price') and number(translate(., '$', '')) < 10.00]
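Select all price elements whose value is less than $10. The translate() call strips the “$” sign so that number() can convert the remaining text to a numeric value.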
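ElementTree does not implement contains(), translate(), or number(), so the sketch below does not run the XPath itself. Instead it mirrors the predicate's three steps in plain Python, which may help when reading the expression. The class names and prices are hypothetical, and Python's float() is only an approximation of XPath's number().

```python
import xml.etree.ElementTree as ET

# Hypothetical price markup for illustration.
DOC = """
<div>
  <span class="p13n-sc-price">$9.99</span>
  <span class="p13n-sc-price">$14.50</span>
  <span class="a-size-small">$3.00</span>
  <span class="p13n-sc-price">Unavailable</span>
</div>
"""


def price_below(span, limit):
    """Rough equivalent of number(translate(., '$', '')) < limit."""
    text = "".join(span.itertext()).replace("$", "")
    try:
        return float(text) < limit
    except ValueError:
        # XPath's number() yields NaN here, and NaN < limit is false.
        return False


root = ET.fromstring(DOC)
cheap = [
    span.text
    for span in root.findall(".//span")
    # Step 1: contains(@class, 'sc-price'); step 2: the numeric comparison.
    if "sc-price" in span.get("class", "") and price_below(span, 10.00)
]
print(cheap)
```

The $3.00 span is skipped because its class does not contain “sc-price”, and the “Unavailable” span is skipped because its text is not a number.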
XPath Positional Selection
Positional selection in XPath involves choosing elements within an XML or HTML document based on their position or index in its structure.
(//*[@id = 'gridItemRoot'])[4]
Select the fourth gridItemRoot element in the Bestseller grid
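The parentheses matter: (//x)[n] indexes the whole result set, while //x[n] picks every x that is the n-th x child of its own parent. The Python sketch below shows the difference on a made-up document. ElementTree has no parenthesised form, so indexing the result list (0-based) stands in for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two separate lists.
DOC = """
<root>
  <ul><li>a1</li><li>a2</li></ul>
  <ul><li>b1</li><li>b2</li></ul>
</root>
"""

root = ET.fromstring(DOC)

# "//li[2]": every li that is the second li child of its own parent.
second_in_each_list = [li.text for li in root.findall(".//li[2]")]

# "(//li)[2]": the second li in the whole document. Emulated here by
# indexing the full result list, since XPath counts from 1 and Python from 0.
all_items = root.findall(".//li")
second_overall = all_items[1].text

print(second_in_each_list, second_overall)
```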
XPath Text Content Selection
Text content selection in XPath refers to choosing elements within an XML or HTML document based on the textual content contained within those elements.
//*[text()='The 48 Laws of Power']
Select the title element with “The 48 Laws of Power” text value
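A close equivalent in Python's xml.etree.ElementTree (3.7 or newer) is the [.='text'] predicate. Note the assumption: it compares an element's complete text content, including descendants, which is slightly broader than text()='…'. The markup is a hypothetical grid item.

```python
import xml.etree.ElementTree as ET

# Hypothetical grid item; class names are illustrative only.
DOC = (
    '<div id="gridItemRoot">'
    '<span class="title">The 48 Laws of Power</span>'
    '<span class="sc-price">$9.99</span>'
    '</div>'
)

root = ET.fromstring(DOC)

# Elements whose full text content equals the title exactly.
matches = root.findall(".//*[.='The 48 Laws of Power']")
matched_classes = [element.get("class") for element in matches]
print(matched_classes)
```

The match is exact and case-sensitive, so extra whitespace or a trailing subtitle in the element would prevent it from being selected.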
XPath Logical Operators
Logical operators in XPath are used to combine or modify conditions within an XPath expression to make more complex selections.
//div[@id='gridItemRoot' and .//*[contains(@class, 'a-icon-star-small')] and .//span[contains(@class, 'sc-price') and number(translate(., '$', '')) < 10.00]]
Select all gridItemRoot elements that have both an “a-icon-star-small” element inside and a “sc-price” less than $10
XPath Axis Selection
Axis selection in XPath involves navigating the document’s hierarchy based on the relationships between elements and nodes, allowing you to select elements related to a specific context node.
XPath Parent Selection
The “parent” axis selects the parent of a given node, that is, the element that immediately encloses it. Use it to step one level up the document’s hierarchy from a node you have already located.
Select the entire Best sellers grid — a parent element of the “Comics & Graphic Novels” element.
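In XPath 1.0, one way to write this is //*[text()='Comics & Graphic Novels']/.. (the two dots are shorthand for parent::node()). The Python sketch below runs the ElementTree equivalent against a hypothetical department list. The real page structure is an assumption here.

```python
import xml.etree.ElementTree as ET

# Hypothetical department list; "&" must be escaped as "&amp;" in XML.
DOC = """
<div id="page">
  <ul id="departments">
    <li>Arts &amp; Photography</li>
    <li>Biographies &amp; Memoirs</li>
    <li>Comics &amp; Graphic Novels</li>
    <li>Computers &amp; Technology</li>
    <li>Cookbooks, Food &amp; Wine</li>
  </ul>
</div>
"""

root = ET.fromstring(DOC)

# Locate the element by its text, then step up one level with "..".
parents = root.findall(".//li[.='Comics & Graphic Novels']/..")
parent_ids = [parent.get("id") for parent in parents]
print(parent_ids)
```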
XPath Preceding Sibling Selection
Preceding sibling selection in XPath allows you to select elements that are siblings of a given context node and appear before it in the document’s hierarchy.
Select all preceding sibling elements for the “Comics & Graphic Novels” element.
XPath Following Sibling Selection
Following sibling selection in XPath allows you to select elements that are siblings of a given context node and appear after it in the document’s hierarchy.
Select all following sibling elements for the “Comics & Graphic Novels” element.
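In XPath 1.0 the two sibling axes can be written as //*[text()='Comics & Graphic Novels']/preceding-sibling::* and //*[text()='Comics & Graphic Novels']/following-sibling::*. Python's ElementTree does not implement these axes, so the sketch below emulates them by slicing the parent's child list. It illustrates what each axis returns rather than running the XPath itself, and the markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical department list; "&" must be escaped as "&amp;" in XML.
DOC = """
<ul id="departments">
  <li>Arts &amp; Photography</li>
  <li>Biographies &amp; Memoirs</li>
  <li>Comics &amp; Graphic Novels</li>
  <li>Computers &amp; Technology</li>
  <li>Cookbooks, Food &amp; Wine</li>
</ul>
"""

root = ET.fromstring(DOC)

target = root.find(".//li[.='Comics & Graphic Novels']")
parent = root.find(".//li[.='Comics & Graphic Novels']/..")

# Children of the shared parent, in document order.
siblings = list(parent)
position = siblings.index(target)

# Emulates preceding-sibling::* (everything before the target) ...
preceding = [li.text for li in siblings[:position]]
# ... and following-sibling::* (everything after it).
following = [li.text for li in siblings[position + 1:]]

print(preceding)
print(following)
```

Neither list contains the target itself, which matches how the sibling axes exclude the context node.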
XPath Child Selection
Child selection in XPath involves selecting elements that are direct children of a given parent element or context node within the XML or HTML document.
Find the parent of the “Comics & Graphic Novels” element and get all child elements.
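One way to express this in XPath is //*[text()='Comics & Graphic Novels']/../*, where the trailing /* is shorthand for child::*. Here is the ElementTree equivalent in Python, again on a hypothetical department list.

```python
import xml.etree.ElementTree as ET

# Hypothetical department list; "&" must be escaped as "&amp;" in XML.
DOC = """
<div id="page">
  <ul id="departments">
    <li>Arts &amp; Photography</li>
    <li>Comics &amp; Graphic Novels</li>
    <li>Computers &amp; Technology</li>
  </ul>
</div>
"""

root = ET.fromstring(DOC)

# Step up to the parent with "..", then down to all of its direct children.
children = root.findall(".//li[.='Comics & Graphic Novels']/../*")
child_texts = [child.text for child in children]
print(child_texts)
```

Unlike the sibling axes, this result includes the “Comics & Graphic Novels” element itself, since it is also a child of its parent.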
XPath Wildcards
Wildcard selection in XPath involves using wildcard symbols to match elements or attributes regardless of their specific names or values.
//*
Select all elements in the document.
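A small Python sketch of the wildcard, using xml.etree.ElementTree on a made-up document. One difference to keep in mind: ElementTree paths are relative to the element you call them on, so .//* returns every descendant but not the root itself, whereas //* in full XPath also includes the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for illustration.
root = ET.fromstring("<library><book><title>T</title></book><dvd/></library>")

# ".//*": all descendants of the root, in document order.
descendants = [element.tag for element in root.findall(".//*")]

# "*": direct children only, whatever their tag names.
children = [element.tag for element in root.findall("*")]

# root.iter() also yields the root, matching what "//*" returns in full XPath.
everything = [element.tag for element in root.iter()]

print(descendants, children, everything)
```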
XPath Functions
Functions in XPath are predefined operations that you can call within an expression to manipulate or evaluate nodes, attributes, or values in XML or HTML documents. The examples above already use three of them: contains(), translate(), and number().
These methods offer versatile ways to locate specific elements, attributes, or data within XML and HTML documents, making XPath a powerful tool for tasks such as web scraping, data extraction, and test automation.
For more, check the related article on how to build reliable locators using XPath.