Title: | Tools for Working with XML Files as R Dataframes |
---|---|
Description: | On import, the XML information is converted to a dataframe that reflects the hierarchical XML structure. Intuitive functions let you navigate within this transparent XML data structure (without any knowledge of 'XPath'). 'flatXML' also provides tools to extract data from the XML into a flat dataframe that can be used to perform statistical operations. It also supports converting dataframes to XML. |
Authors: | Joachim Zuckarelli [aut, cre] |
Maintainer: | Joachim Zuckarelli <[email protected]> |
License: | GPL-3 |
Version: | 0.1.1 |
Built: | 2025-02-12 05:26:04 UTC |
Source: | https://github.com/jsugarelli/flatxml |
flatxml provides functions to easily deal with XML files. When parsing an XML document with fxml_importXMLFlat, flatxml produces a special dataframe that is 'flat' by its very nature but contains all necessary information about the hierarchical structure of the underlying XML document (for details on the dataframe see the reference for the fxml_importXMLFlat function). flatxml offers a set of functions to work with this dataframe.
Apart from representing the XML document in a dataframe structure, there is yet another way in which flatxml relates to dataframes: the fxml_toDataFrame and fxml_toXML functions can be used to convert XML data to dataframes and vice versa.
Each XML element, for example <tag attribute="some value">Here is some text</tag>, has certain characteristics that can be accessed via the flatxml interface functions after an XML document has been imported with fxml_importXMLFlat. These characteristics are:
value: The (text) value of the element, "Here is some text" in the example above
attributes: The XML attributes of the element, attribute with its value "some value" in the example above
children: The elements on the next lower hierarchical level
parent: The element on the next higher hierarchical level, i.e. the element of which the current element is a child
siblings: The elements on the same hierarchical level as the current element
The flatxml interface to access these characteristics follows a simple logic: for each of the characteristics there are typically three functions available:
fxml_has...(): Determines if the current XML element has (at least one instance of) the characteristic
fxml_num...(): Returns the number of instances of the characteristic for the current XML element (e.g. the number of children)
fxml_get...(): Returns (the IDs of) the respective characteristics of the current XML element (e.g. the children of the current element)
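The has/num/get naming logic can be illustrated with a minimal base R sketch. It does not use flatxml itself: the toy table, its id and parent columns, and the toy_* helpers are hypothetical stand-ins, chosen only to show how the three kinds of function relate to one another for the children characteristic.

```r
# Toy hierarchy (hypothetical; NOT the layout of a real flatxml dataframe):
# element 1 is the root, elements 2 and 3 are its children, 4 is a child of 2.
toy <- data.frame(
  id     = c(1, 2, 3, 4),
  parent = c(NA, 1, 1, 2)
)

# "get": IDs of the children, or NULL if there are none
toy_getChildren <- function(df, id) {
  kids <- df$id[!is.na(df$parent) & df$parent == id]
  if (length(kids) == 0) NULL else kids
}

# "num": how many children the element has
toy_numChildren <- function(df, id) length(toy_getChildren(df, id))

# "has": whether there is at least one child
toy_hasChildren <- function(df, id) toy_numChildren(df, id) > 0

toy_hasChildren(toy, 1)   # TRUE
toy_numChildren(toy, 1)   # 2
toy_getChildren(toy, 1)   # 2 3
toy_getChildren(toy, 3)   # NULL
```

In this sketch the "has" and "num" helpers are derived from the "get" helper, which mirrors the idea that all three answer the same question at different levels of detail.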
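For values: fxml_hasValue, fxml_getValue
For attributes: fxml_hasAttributes, fxml_numAttributes, fxml_getAttribute (note: no plural 's'!), fxml_getAttributesAll (get all attributes instead of a specific one)
For children: fxml_hasChildren, fxml_numChildren, fxml_getChildren
For parents: fxml_hasParent, fxml_getParent
For siblings: fxml_hasSiblings, fxml_numSiblings, fxml_getSiblings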
fxml_findPath (search anywhere in the path to an XML element)
fxml_findPathFull (find an element based on its complete path)
fxml_findPathRoot (search in the path to an XML element starting at the top element [root node])
fxml_findPathBottom (search in the path to an XML element starting at the lowest hierarchical level)
fxml_toDataFrame (converts a (flattened) XML document to a dataframe)
fxml_toXML (converts a dataframe to an XML document)
fxml_getElement (name of an XML element, i.e. the tag in <tag>…</tag>)
fxml_getUniqueElements (unique XML elements in the document)
fxml_getElementInfo (all relevant information on an XML element: children, siblings, etc.)
fxml_getDepthLevel (level of an element in the hierarchy of the XML document)
Finds all XML elements in an XML document that lie on a certain path, regardless of where exactly the path is found in the XML document. Sub-elements (children) of the elements on the search path are returned, too.
fxml_findPath(xmlflat.df, path, attr.only = NULL, attr.not = NULL)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
path | A character vector representing the path to be searched. Each element of the vector is a hierarchy level in the XML document. Example: path = c("tag1", "tag2"). |
attr.only | A list of named vectors representing attribute/value combinations the XML elements on the search path must match. The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector. The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements). Example: attr.only = list(field = c(name = "Sex")). |
attr.not | A list of vectors representing attribute/value combinations the XML elements on the search path must not match to be included in the results. See argument attr.only. |
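The nested shape of the attr.only and attr.not arguments can be easier to see in code. The following base R sketch only builds and inspects such a list. The record element with its id attribute and the lang attribute on field are hypothetical, used purely for illustration.

```r
# A filter list as described above (the "record"/"id" and "lang" parts are
# hypothetical): the outer names are XML element names, the inner names are
# attribute names, and the inner values are the attribute values to match.
attr.only <- list(
  field  = c(name = "Sex", lang = "en"),
  record = c(id = "1")
)

names(attr.only)          # XML element names the filter applies to
names(attr.only$field)    # attribute names looked at on <field>
unname(attr.only$field)   # the values those attributes are compared with
```

A list built this way could be passed as attr.only (elements must match) or as attr.not (elements must not match).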
With fxml_findPath() it does not matter where exactly in the hierarchy of the XML document the path is found. If, for example, path = c("tag1", "tag2"), then the element with full XML path <xml><testdoc><tag1><tag2> would be found, too.
Other fxml_findPath...() functions allow for different search modes:
fxml_findPathRoot: Search for path from the root node of the XML document downwards. Sub-elements are returned, too.
fxml_findPathFull: Search for exact path (always starting from the root node). No sub-elements returned, as they have a different path than the search path.
fxml_findPathBottom: Search for path from the bottom of the element hierarchy in the XML document.
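The difference between the four search modes can be sketched in base R by comparing a search path with the full path of a single element. This is an illustration of the matching rules as described here, not flatxml's implementation, and the helper names starts_with, ends_with and contains_run are hypothetical.

```r
# TRUE if the full path begins with the search path ("from root" mode)
starts_with <- function(full, path) {
  length(full) >= length(path) &&
    identical(full[seq_along(path)], path)
}

# TRUE if the full path ends with the search path ("from bottom" mode)
ends_with <- function(full, path) {
  starts_with(rev(full), rev(path))
}

# TRUE if the search path occurs as a contiguous run anywhere in the full path
contains_run <- function(full, path) {
  n <- length(path)
  if (length(full) < n) return(FALSE)
  offsets <- 0:(length(full) - n)
  any(vapply(offsets,
             function(o) identical(full[o + seq_len(n)], path),
             logical(1)))
}

full <- c("xml", "testdoc", "tag1", "tag2")
path <- c("tag1", "tag2")

contains_run(full, path)   # anywhere, like fxml_findPath:        TRUE
starts_with(full, path)    # from root, like fxml_findPathRoot:   FALSE
identical(full, path)      # exact, like fxml_findPathFull:       FALSE
ends_with(full, path)      # from bottom, like fxml_findPathBottom: TRUE
```

Under these rules, a child element's longer path still contains or starts with the search path, which is consistent with sub-elements being returned in the "anywhere" and "from root" modes but not in the "exact" and "from bottom" modes.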
The IDs (xmlflat.df$elemid.) of the XML elements that are located on the provided path. Sub-elements of the elements on the search path are returned, too. NULL if no elements were found.
Joachim Zuckarelli [email protected]
fxml_findPathRoot, fxml_findPathFull, fxml_findPathBottom
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Find all XML elements with <data><record><field> in their XML path
path <- c("data", "record", "field")
fxml_findPath(xml.dataframe, path)

# Find only those XML elements with <data><record><field> in their XML path that have the
# "name" attribute of the <field> element set to "Sex"
path <- c("data", "record", "field")
fxml_findPath(xml.dataframe, path, attr.only = list(field = c(name = "Sex")))
Finds all XML elements in an XML document that lie on a certain path. The path of the found elements must end with the provided search path.
fxml_findPathBottom(xmlflat.df, path, attr.only = NULL, attr.not = NULL)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
path | A character vector representing the path to be searched. Each element of the vector is a hierarchy level in the XML document. Example: path = c("tag1", "tag2"). |
attr.only | A list of named vectors representing attribute/value combinations the XML elements on the search path must match. The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector. The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements). Example: attr.only = list(field = c(name = "Sex")). |
attr.not | A list of vectors representing attribute/value combinations the XML elements on the search path must not match to be included in the results. See argument attr.only. |
With fxml_findPathBottom(), the search always starts at the bottom of the element hierarchy of the XML document. An element is returned as a result only if its path ends with the provided search path. If, for example, path = c("tag1", "tag2"), then the element with full XML path <tag1><tag2><tag3> would not be found; it would be found only if the search path were c("tag2", "tag3").
Other fxml_findPath...() functions allow for different search modes:
fxml_findPath: Search for path anywhere in the XML document (not necessarily starting at the root node). Sub-elements are returned, too.
fxml_findPathRoot: Search for path from the root node of the XML document downwards. Sub-elements are returned, too.
fxml_findPathFull: Search for exact path (always starting from the root node). No sub-elements returned, as they have a different path than the search path.
The IDs (xmlflat.df$elemid.) of the XML elements that are located on the provided path. NULL if no elements were found.
Joachim Zuckarelli [email protected]
fxml_findPath, fxml_findPathRoot, fxml_findPathFull
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Find all XML elements that have a path ending with <record><field>
path <- c("record", "field")
fxml_findPathBottom(xml.dataframe, path)

# Find all XML elements that have a path ending with <record><field>, but only
# those which have the "name" attribute of the <field> element set to "Sex"
path <- c("record", "field")
fxml_findPathBottom(xml.dataframe, path, attr.only = list(field = c(name = "Sex")))
Finds all XML elements in an XML document that lie on a certain path. The path of the found elements must match exactly the search path.
fxml_findPathFull(xmlflat.df, path, attr.only = NULL, attr.not = NULL)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
path | A character vector representing the path to be searched. Each element of the vector is a hierarchy level in the XML document. Example: path = c("tag1", "tag2"). |
attr.only | A list of named vectors representing attribute/value combinations the XML elements on the search path must match. The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector. The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements). Example: attr.only = list(field = c(name = "Sex")). |
attr.not | A list of vectors representing attribute/value combinations the XML elements on the search path must not match to be included in the results. See argument attr.only. |
With fxml_findPathFull(), the search always starts at the root node of the XML document. An element is returned as a result only if it has exactly the same path as the search path. If, for example, path = c("tag1", "tag2"), then the element with full XML path <tag1><tag2><tag3> would not be found; it would be found only if the search path were c("tag1", "tag2", "tag3").
Other fxml_findPath...() functions allow for different search modes:
fxml_findPath: Search for path anywhere in the XML document (not necessarily starting at the root node). Sub-elements are returned, too.
fxml_findPathRoot: Search for path from the root node of the XML document downwards. Sub-elements are returned, too.
fxml_findPathBottom: Search for path from the bottom of the element hierarchy in the XML document.
The IDs (xmlflat.df$elemid.) of the XML elements that are located on the provided path. Sub-elements of the elements on the search path are not returned, as they have a different path than the search path. NULL if no elements were found.
Joachim Zuckarelli [email protected]
fxml_findPath, fxml_findPathRoot, fxml_findPathBottom
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Find all XML elements that have the exact path <root><data><record>
path <- c("root", "data", "record")
fxml_findPathFull(xml.dataframe, path)
Finds all XML elements in an XML document that lie on a certain path. Search starts from the root node of the XML document. Sub-elements (children) of the elements on the search path are returned, too.
fxml_findPathRoot(xmlflat.df, path, attr.only = NULL, attr.not = NULL)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
path | A character vector representing the path to be searched. Each element of the vector is a hierarchy level in the XML document. Example: path = c("tag1", "tag2"). |
attr.only | A list of named vectors representing attribute/value combinations the XML elements on the search path must match. The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector. The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements). Example: attr.only = list(field = c(name = "Sex")). |
attr.not | A list of vectors representing attribute/value combinations the XML elements on the search path must not match to be included in the results. See argument attr.only. |
With fxml_findPathRoot(), the search always starts at the root node of the XML document. If, for example, path = c("tag1", "tag2"), then the element with full XML path <xml><testdoc><tag1><tag2> would not be found; it would be found only if the search path were c("xml", "testdoc", "tag1", "tag2").
Other fxml_findPath...() functions allow for different search modes:
fxml_findPath: Search for path anywhere in the XML document (not necessarily starting at the root node). Sub-elements are returned, too.
fxml_findPathFull: Search for exact path (always starting from the root node). No sub-elements returned, as they have a different path than the search path.
fxml_findPathBottom: Search for path from the bottom of the element hierarchy in the XML document.
The IDs (xmlflat.df$elemid.) of the XML elements that are located on the provided path. Sub-elements of the elements on the search path are returned, too. NULL if no elements were found.
Joachim Zuckarelli [email protected]
fxml_findPath, fxml_findPathFull, fxml_findPathBottom
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Find all XML elements that have a path starting with <root><data><record><field>
path <- c("root", "data", "record", "field")
fxml_findPathRoot(xml.dataframe, path)

# Find all XML elements that have a path starting with <root><data><record><field>, but only
# those which have the "name" attribute of the <field> element set to "Sex"
path <- c("root", "data", "record", "field")
fxml_findPathRoot(xml.dataframe, path, attr.only = list(field = c(name = "Sex")))
Returns the value of a specific attribute of an XML element.
fxml_getAttribute(xmlflat.df, elemid, attrib.name)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
attrib.name | Name of the attribute. |
The value of attribute attrib.name of the XML element with ID elemid. If the attribute does not exist, an error message is shown.
Joachim Zuckarelli [email protected]
fxml_hasAttributes, fxml_numAttributes, fxml_getAttributesAll
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Read the value of attribute "name" from the XML element with ID 4 (xml.dataframe$elemid. == 4)
fxml_getAttribute(xml.dataframe, 4, "name")
Returns all attributes of an XML element and their respective values.
fxml_getAttributesAll(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
A named vector containing the attribute values of all attributes of the XML element with ID elemid. The names of the vector are the names of the attributes. Returns NULL if the element has no attributes at all.
Joachim Zuckarelli [email protected]
fxml_hasAttributes, fxml_numAttributes, fxml_getAttribute
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Get all attributes of the XML element with ID 4 (xml.dataframe$elemid. == 4)
fxml_getAttributesAll(xml.dataframe, 4)
Returns the children of an XML element.
fxml_getChildren(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
The IDs (xmlflat.df$elemid.) of the children of the XML element with ID elemid. If no children exist, NULL is returned.
Joachim Zuckarelli [email protected]
fxml_hasChildren, fxml_numChildren
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Get all the children (sub-elements) of the XML element with ID 4 (xml.dataframe$elemid. == 4)
fxml_getChildren(xml.dataframe, 4)
Hierarchical position of an XML element
fxml_getDepthLevel(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
The number of the hierarchy level of the XML element with ID elemid. The root node of the XML data has hierarchy level 1.
Joachim Zuckarelli [email protected]
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Determine hierarchy level of XML element with ID 3 (xml.dataframe$elemid. == 3)
fxml_getDepthLevel(xml.dataframe, 3)
Returns the element name of an XML element.
fxml_getElement(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
Name of the element identified by the ID (xmlflat.df$elemid.) elemid.
Joachim Zuckarelli [email protected]
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Get the XML element with ID 3 (xml.dataframe$elemid. == 3)
fxml_getElement(xml.dataframe, 3)
Returns summary information on an XML element.
fxml_getElementInfo(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
A list with the following elements:
value: The value of the XML element; return value of the fxml_getValue function.
path: A vector representing the path from the root element of the XML document to the current element. Each XML element on the path is represented by an element of the vector. The vector elements are the names of the XML elements on the path.
depth.level: The depth level (hierarchy level) of the XML element; return value of the fxml_getDepthLevel function.
attributes: A named vector with the attributes of the XML element (vector elements are the attributes' values, names of the vector elements are the attributes' names); return value of the fxml_getAttributesAll function.
parent: The parent of the XML element; return value of the fxml_getParent function.
children: The children of the XML element; return value of the fxml_getChildren function.
siblings: The siblings of the XML element; return value of the fxml_getSiblings function.
Joachim Zuckarelli [email protected]
fxml_getElement, fxml_getValue, fxml_getDepthLevel, fxml_getAttribute, fxml_getChildren, fxml_getParent, fxml_getSiblings
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Get all relevant information on the XML element with ID 4 (xml.dataframe$elemid. == 4)
fxml_getElementInfo(xml.dataframe, 4)
Returns the parent of an XML element.
fxml_getParent(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
The ID (xmlflat.df$elemid.) of the parent node of the XML element with ID elemid. If no parent exists (because XML node elemid is the root node of the XML document), then NULL is returned.
Joachim Zuckarelli [email protected]
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Get the ID of the parent element of the XML element with ID 4 (xml.dataframe$elemid. == 4)
fxml_getParent(xml.dataframe, 4)
Returns the siblings of an XML element, i.e. the elements on the same hierarchical level.
fxml_getSiblings(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
The IDs (xmlflat.df$elemid.) of the siblings of the XML element with ID elemid. If no siblings exist, NULL is returned.
Joachim Zuckarelli [email protected]
fxml_hasSiblings, fxml_numSiblings
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Get all the siblings (elements on the same hierarchy level) of the XML element with ID 4
# (xml.dataframe$elemid. == 4)
fxml_getSiblings(xml.dataframe, 4)
Returns the unique XML elements included in an XML document.
fxml_getUniqueElements(xmlflat.df)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
A vector with all the names of the elements included in the XML document xmlflat.df. Every tag is only returned once, even if it occurs multiple times in the document. The return vector is empty (NULL) if no elements exist.
Joachim Zuckarelli [email protected]
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Identify the unique XML elements
fxml_getUniqueElements(xml.dataframe)
Returns the value of an XML element.
fxml_getValue(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
The value of the XML element with ID elemid. NA is returned if the element has no value.
Joachim Zuckarelli [email protected]
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Get the value of the XML element with ID 4 (xml.dataframe$elemid. == 4)
fxml_getValue(xml.dataframe, 4)
Determines if an XML element has any attributes.
fxml_hasAttributes(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
TRUE if the XML element with ID elemid has at least one attribute, FALSE otherwise.
Joachim Zuckarelli [email protected]
fxml_getAttribute, fxml_numAttributes, fxml_getAttributesAll
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Check if the XML element with ID 4 (xml.dataframe$elemid. == 4) has any attributes
fxml_hasAttributes(xml.dataframe, 4)
Determines if an XML element has any children.
fxml_hasChildren(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
TRUE if the XML element with ID elemid has at least one child, FALSE otherwise.
Joachim Zuckarelli [email protected]
fxml_numChildren, fxml_getChildren
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Check if the XML element with ID 4 (xml.dataframe$elemid. == 4) has any
# children (sub-elements)
fxml_hasChildren(xml.dataframe, 4)
Determines if an XML element has a parent element.
fxml_hasParent(xmlflat.df, elemid)
xmlflat.df | A flat XML dataframe created with fxml_importXMLFlat. |
elemid | The ID of the XML element. The ID is the value of the elemid. field in the flat XML dataframe. |
TRUE if a parent element for the XML element with ID elemid exists, FALSE otherwise (which would mean that the XML element is the root node of the XML document).
Joachim Zuckarelli [email protected]
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Check if the XML element with ID 4 (xml.dataframe$elemid. == 4) has a parent element
fxml_hasParent(xml.dataframe, 4)
Determines if an XML element has any siblings, i.e. elements on the same hierarchical level.
fxml_hasSiblings(xmlflat.df, elemid)
xmlflat.df |
A flat XML dataframe created with |
elemid |
The ID of the XML element. The ID is the value of the |
TRUE
, if the XML element with ID elemid
has at least one sibling, FALSE
otherwise.
Joachim Zuckarelli [email protected]
fxml_numSiblings
, fxml_getSiblings
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Check if XML element with ID 4 (xml.dataframe$elemid. == 4) has any siblings
# (elements on the same hierarchy level)
fxml_hasSiblings(xml.dataframe, 4)
Determines if an XML element carries a value.
fxml_hasValue(xmlflat.df, elemid)
xmlflat.df |
A flat XML dataframe created with |
elemid |
The ID of the XML element. The ID is the value of the |
TRUE
if the XML element has a value (not being equal to NA
), FALSE
otherwise.
Joachim Zuckarelli [email protected]
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Check if element with ID 4 (xml.dataframe$elemid. == 4) carries a value
fxml_hasValue(xml.dataframe, 4)
Reads an XML document into a flat dataframe structure.
fxml_importXMLFlat(path)
path |
Path to the XML document. Can be either a local path or a URL. |
The XML document is parsed and stored in a dataframe structure (flat XML). The first four columns of a flat XML dataframe are standard columns. Their names all end with a dot. These columns are:
elem.
: The element identifier of the current XML element (without the tag delimiters <
and >
).
elemid.
: A unique, ascending numerical ID for each XML element. The first XML element is assigned 1 as its ID. This ID is used by many of the flatxml
functions.
attr.
: Name of an attribute. For each attribute of an XML element the dataframe will have an additional row.
value.
: The value of either the attribute (if attr.
is not NA
) or the element itself (if attr.
is NA
). value.
is NA
, if the element has no value.
The columns after these four standard columns represent the 'path' to the current element, starting from the root element of the XML document in column 5 all
the way down to the current element. The number of columns of the dataframe is therefore determined by the depth of the hierarchical structure of the XML document.
In this dataframe representation, the hierarchical structure of the XML document becomes very easy to understand. All flatxml
functions work with this flat XML dataframe.
If an XML element has N attributes it is represented by (N+1) rows in the flat XML dataframe: one row for the value (with dataframe$value.
being NA
if the element has no value)
and one for each attribute. In the attribute rows, the names of the attributes are stored in the attr.
field, their respective values in the value.
field. Even if there are multiple rows
for one XML element, the elem.
and elemid.
fields still have the same value in all rows (because the rows belong to the same XML element).
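To make the row layout described above concrete, here is a hand-built sketch in base R. It is not output produced by the package; it only mimics the four standard columns for the element `<tag attribute="some value">Here is some text</tag>`. The name `level1` for the first path column is an assumption made for illustration.

```r
# Hand-built illustration (NOT package output) of the flat layout for
#   <tag attribute="some value">Here is some text</tag>
# The element has N = 1 attribute, so it occupies N + 1 = 2 rows.
flat.sketch <- data.frame(
  elem.   = c("tag", "tag"),                       # same element name in both rows
  elemid. = c(1, 1),                               # same ID in both rows
  attr.   = c(NA, "attribute"),                    # NA marks the value row
  value.  = c("Here is some text", "some value"),
  level1  = c("tag", "tag"),                       # assumed name of the first path column
  stringsAsFactors = FALSE
)

# The element's own value sits in the row where attr. is NA
element.value <- flat.sketch$value.[flat.sketch$elemid. == 1 & is.na(flat.sketch$attr.)]

# The element's attributes sit in the rows where attr. is not NA
attribute.rows <- flat.sketch[flat.sketch$elemid. == 1 & !is.na(flat.sketch$attr.), ]

print(element.value)
print(nrow(attribute.rows))
```

Filtering on `elemid.` together with `is.na(attr.)` is simply one way to read this layout by hand; the `flatxml` accessor functions are the intended interface.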
A dataframe containing the XML document in a flat structure. See the Details section for more information on its structure.
Joachim Zuckarelli [email protected]
# Load example file with population data from United Nations Statistics Division
example <- system.file("worldpopulation.xml", package="flatxml")

# Create flat dataframe from XML
xml.dataframe <- fxml_importXMLFlat(example)
Determines the number of attributes of an XML element.
fxml_numAttributes(xmlflat.df, elemid)
xmlflat.df |
A flat XML dataframe created with |
elemid |
The ID of the XML element. The ID is the value of the |
The number of attributes of the XML element with ID elemid
.
Joachim Zuckarelli [email protected]
fxml_hasAttributes
, fxml_getAttribute
, fxml_getAttributesAll
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Determine the number of attributes of the XML element with ID 4 (xml.dataframe$elemid. == 4)
fxml_numAttributes(xml.dataframe, 4)
Determines the number of children of an XML element.
fxml_numChildren(xmlflat.df, elemid)
xmlflat.df |
A flat XML dataframe created with |
elemid |
The ID of the XML element. The ID is the value of the |
The number of children of the XML element with ID elemid
.
Joachim Zuckarelli [email protected]
fxml_hasChildren
, fxml_getChildren
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Determine the number of children (sub-elements) of the XML element with ID 4
# (xml.dataframe$elemid. == 4)
fxml_numChildren(xml.dataframe, 4)
Determines the number of siblings of an XML element, i.e. elements on the same hierarchical level.
fxml_numSiblings(xmlflat.df, elemid)
xmlflat.df |
A flat XML dataframe created with |
elemid |
The ID of the XML element. The ID is the value of the |
The number of siblings of the XML element with ID elemid
.
Joachim Zuckarelli [email protected]
fxml_hasSiblings
, fxml_getSiblings
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Determine the number of siblings (elements on the same hierarchy level) of the XML element
# with ID 4 (xml.dataframe$elemid. == 4)
fxml_numSiblings(xml.dataframe, 4)
Converts an XML document to a dataframe.
fxml_toDataFrame(
  xmlflat.df,
  siblings.of,
  same.tag = TRUE,
  attr.only = NULL,
  attr.not = NULL,
  elem.or.attr = "elem",
  col.attr = "",
  include.fields = NULL,
  exclude.fields = NULL
)
xmlflat.df |
A flat XML dataframe created with |
siblings.of |
ID of one of the XML elements that contain the data records. All data records need to be on the same hierarchical level as the XML element with this ID. |
same.tag |
If |
attr.only |
A list of named vectors representing attribute/value combinations the data records must match.
The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector.
The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements).
Example: |
attr.not |
A list of vectors representing attribute/value combinations the XML elements must not match to be considered as data records. See argument |
elem.or.attr |
Either |
col.attr |
If |
include.fields |
A character vector with the names of the fields that are to be included in the result dataframe. By default, all fields from the XML document are included. |
exclude.fields |
A character vector with the names of the fields that should be excluded in the result dataframe. By default, no fields from the XML document are excluded. |
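As a hedged illustration of the list structure described for the attr.only argument, the following base R sketch builds such a filter. The element name `record` and the attributes `type` and `year` are hypothetical and are not taken from the package's example files.

```r
# Hypothetical filter: consider only <record> elements whose attribute
# "type" equals "country" and whose attribute "year" equals "2020".
attr.filter <- list(
  record = c(type = "country", year = "2020")
)

# List element name   = XML element name the filter applies to
print(names(attr.filter))
# Vector element names = attribute names
print(names(attr.filter$record))
# Vector elements      = attribute values that must match
print(unname(attr.filter$record))
```

A list of this shape is what would be passed as `attr.only`; per the argument description, `attr.not` follows the same structure for combinations that must not match.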
The data to be read in can be represented either like this:
<record>
  <field1>Value of field1</field1>
  <field2>Value of field2</field2>
  <field3>Value of field3</field3>
</record>
...
In this case elem.or.attr
would need to be "elem"
because the field names of the data records (field1
, field2
, field3
) are the names of the elements.
Alternatively, the XML data could look like this:
<record>
<column name="field1">Value of field1</column>
<column name="field2">Value of field2</column>
<column name="field3">Value of field3</column>
</record>
...
Here, the names of the fields are attributes, so elem.or.attr
would need to be "attr"
and col.attr
would be set to
"name"
, so fxml_toDataFrame()
knows where to look for the field/column names.
In any case, siblings.of
would be the ID (xmlflat.df$elemid.
) of one of the <record>
elements.
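The first layout (field names as element names) can be tried on a small throwaway file. The sketch below is an illustration under assumptions: the XML content and the file are made up, the block is skipped if `flatxml` is not installed, and the ID of the first `<record>` element is looked up from the documented `elem.` and `elemid.` columns rather than hard-coded.

```r
# Sketch only: needs the flatxml package; skipped if it is not installed
if (requireNamespace("flatxml", quietly = TRUE)) {
  # Made-up XML document in the "elem" layout, written to a temporary file
  xml.file <- tempfile(fileext = ".xml")
  writeLines(c(
    "<records>",
    "  <record><field1>a</field1><field2>1</field2></record>",
    "  <record><field1>b</field1><field2>2</field2></record>",
    "</records>"
  ), xml.file)

  # Import into the flat XML dataframe
  xml.df <- flatxml::fxml_importXMLFlat(xml.file)

  # ID of the first <record> element, taken from the standard columns
  record.id <- xml.df$elemid.[xml.df$elem. == "record"][1]

  # Field names are element names, hence elem.or.attr = "elem"
  records.df <- flatxml::fxml_toDataFrame(xml.df, siblings.of = record.id,
                                          elem.or.attr = "elem")
  print(records.df)
}
```

Looking up the ID by element name avoids having to count elements by hand when the document structure changes.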
A dataframe with the data read in from the XML document.
Joachim Zuckarelli [email protected]
fxml_importXMLFlat
, fxml_toXML
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe$elemid. == 3).
# The field names are given in the "name" attribute of the children elements of element no. 3
# and its siblings
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
  col.attr="name")

# Exclude the "Value Footnote" field from the returned dataframe
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
  col.attr="name", exclude.fields=c("Value Footnote"))

# Load example file with soccer world cup data (data from
# https://www.fifa.com/fifa-tournaments/statistics-and-records/worldcup/index.html)
# and create flat dataframe
example2 <- system.file("soccer.xml", package="flatxml")
xml.dataframe2 <- fxml_importXMLFlat(example2)

# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe2$elemid. == 3).
# The field names are given as the names of the children elements of element no. 3
# and its siblings.
worldcups.df <- fxml_toDataFrame(xml.dataframe2, siblings.of=3, elem.or.attr="elem")
Converts a dataframe to XML.
fxml_toXML(
  df,
  filename = NULL,
  element.tag = "record",
  indent = "\t",
  line.break = "\n",
  return.xml = FALSE
)
df |
The dataframe to be converted (also works with tibbles and the like) |
filename |
Name of the file to which the XML will be saved; default is |
element.tag |
The tag name of the XML element that will carry the data (see example) |
indent |
Character(s) used for indentation to make the XML prettier; tabulator ( |
line.break |
Character(s) that is written at the end of each line of the XML (line break |
return.xml |
If |
If return.xml == TRUE
the XML code is returned. If filename
is not NULL
then the XML is (additionally) written to the specified file.
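The following sketch shows the file-writing path, using only the arguments listed in the usage above. The temporary file, the tag name `row` and the two-space indent are choices made for this illustration; the block is skipped if `flatxml` is not installed, and the exact XML text produced is not shown because it is not asserted here.

```r
# Sketch only: needs the flatxml package; skipped if it is not installed
if (requireNamespace("flatxml", quietly = TRUE)) {
  mydata <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3))

  # Write the XML to a temporary file, one <row> element per data record
  out.file <- tempfile(fileext = ".xml")
  flatxml::fxml_toXML(mydata, filename = out.file,
                      element.tag = "row", indent = "  ")

  # Read the file back to inspect what was written
  xml.lines <- readLines(out.file, warn = FALSE)
  print(xml.lines)
}
```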
Joachim Zuckarelli [email protected]
mydata <- data.frame(list(var1 = c("a", "b", "c"), var2 = c(1, 2, 3)))
fxml_toXML(mydata, return.xml = TRUE)