Title: | Converting XML Documents into Relational Data Models |
---|---|
Description: | Import an XML document with nested object structures and convert it into a relational data model. The result is a set of R dataframes with foreign key relationships. The data model and the data can be exported as SQL code of different SQL flavors. |
Authors: | Joachim Zuckarelli [aut, cre] |
Maintainer: | Joachim Zuckarelli <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-10-31 16:32:45 UTC |
Source: | https://github.com/jsugarelli/xml2relational |
Produces ready-to-run SQL INSERT
statements to import the
data transformed with toRelational()
into a SQL database.
getCreateSQL( ldf, sql.style = "MySQL", tables = NULL, prefix.primary = "ID_", prefix.foreign = "FKID_", line.break = "\n", datatype.func = NULL, one.statement = FALSE )
getCreateSQL( ldf, sql.style = "MySQL", tables = NULL, prefix.primary = "ID_", prefix.foreign = "FKID_", line.break = "\n", datatype.func = NULL, one.statement = FALSE )
ldf |
A list of dataframes created by
|
sql.style |
The SQL flavor that the produced |
tables |
A character vector with the names of the tables for whichs SQL
|
prefix.primary |
The prefix that is used in the relational data model of
|
prefix.foreign |
The prefix that is used in the relational data model of
|
line.break |
Line break character that is added to the end of each
|
datatype.func |
A function that is used to determine the data type of
the table fields. The function must take the field/column from the data
table (basically the result of |
one.statement |
Determines whether all |
If you want to produce SQL CREATE
statements that follow a
different SQL dialect than one of the built-in SQL flavors (i.e. MySQL,
TransactSQL and Oracle) you can provide the necessary information to
getCreateSQL()
via the sql.style
argument. In this case the
sql.style
argument needs to be a dataframe with the folling fields:
Column | Type | Description | Example |
Style |
character |
Name of the SQL flavor. |
"MySQL" |
NormalField |
character |
Template string for a normal, nullable field. | "%FIELDNAME% %DATATYPE%"
|
NormalFieldNotNull |
character |
Template string for non-nullable field. | "%FIELDNAME% %DATATYPE% NOT NULL" |
PrimaryKey |
character |
Template string for the definition of a primary key. | "PRIMARY KEY (%FIELDNAME%)" |
ForeignKey |
character |
Template string for the
definition of a foreign key. "FOREIGN KEY (%FIELDNAME%) REFERENCES
%REFTABLE%(%REFPRIMARYKEY%)" |
|
PrimaryKeyDefSeparate |
logical |
Indicates if primary key needs additional definition like a any other field. | TRUE |
ForeignKeyDefSeparate
|
logical |
Indicates if foreign key needs additional definition like a any other field. | TRUE |
Int |
character |
Name of integer data type. "INT" |
|
Int.MaxSize |
numeric |
Size limit of integer data type. | 4294967295 |
BigInt |
character |
Name of data type for integers larger than the size limit of the normal integer data type. | "BIGINT" |
Decimal |
character |
Name of data type for floating point numbers. |
"DECIMAL" |
VarChar |
character |
Name of data type for variable-size character fields. | "VARCHAR" |
VarChar.MaxSize |
numeric |
Size limit of variable-size character data type. | 65535 |
Text |
character
|
Name of data type for string data larger than the size limit of the variable-size character data type. | "TEXT" |
Date
|
character |
Name of data type date data. | "DATE"
|
Time |
character |
Name of data type time data |
"TIME" |
Date |
character |
Name of data type for combined date and time data. | "TIMESTAMP" |
In the template strings you can use the following placeholders, as you also see from the MySQL example in the table:
%FIELDNAME%
: Name of the field to be defined.
%DATATYPE%
: Datatype of the field to be defined.
%REFTABLE%
: Table referenced by a foreign key.
%REFPRIMARYKEY%
: Name of the primary key field of the table
referenced by a foreign key.
When you use your own defintion of an SQL
flavor, then sql.style
must be a one-row dataframe providing the
fields described in the table above.
You can use the datatype.func
argument to provide your own function
to determine how the data type of a field is derived from the values in
that field. In this case, the values of the columns Int
,
Int.MaxSize
, VarChar
, VarChar.MaxSize
, Decimal
and Text
in the sql.style
dataframe are ignored. They are
used by the built-in mechanism to determine data types. Providing your own
function allows you to determine data types in a more differentiated way,
if you like. The function that is provided needs to take a vectors of
values as its argument and needs to provide the SQL data type of these
values as a one-element character vector.
A character vector with exactly one element (if argument
one.statement = TRUE
) or with one element per CREATE
statement.
Other xml2relational:
getInsertSQL()
,
savetofiles()
,
toRelational()
# Find path to custmers.xml example file in package directory path <- system.file("", "customers.xml", package = "xml2relational") db <- toRelational(path) sql.code <- getCreateSQL(db, "TransactSQL", "address")
# Find path to custmers.xml example file in package directory path <- system.file("", "customers.xml", package = "xml2relational") db <- toRelational(path) sql.code <- getCreateSQL(db, "TransactSQL", "address")
Produces ready-to-run SQL INSERT
statements to import the
data transformed with toRelational()
into a SQL database.
getInsertSQL( ldf, table.name, line.break = "\n", one.statement = FALSE, tz = "UTC" )
getInsertSQL( ldf, table.name, line.break = "\n", one.statement = FALSE, tz = "UTC" )
ldf |
A list of dataframes created by
|
table.name |
Name of the table from the data table list |
line.break |
Line break character that is added to the end of each
|
one.statement |
Determines whether all |
tz |
The code of the timezone used for exporting timestamp data. Default it
|
A character vector with exactly one element (if argument
one.statement = TRUE
) or with one element per INSERT
statement.
Other xml2relational:
getCreateSQL()
,
savetofiles()
,
toRelational()
# Find path to custmers.xml example file in package directory path <- system.file("", "customers.xml", package = "xml2relational") db <- toRelational(path) sql.code <- getInsertSQL(db, "address")
# Find path to custmers.xml example file in package directory path <- system.file("", "customers.xml", package = "xml2relational") db <- toRelational(path) sql.code <- getInsertSQL(db, "address")
Saves a list of dataframes created from an XML source with
toRelational()
to CSV files, one file per dataframe (i.e.
table in the relational data model). File names are identical to the
dataframe/table names.
savetofiles(ldf, dir, sep = ",", dec = ".")
savetofiles(ldf, dir, sep = ",", dec = ".")
ldf |
A list of dataframes created by
|
dir |
The directory to save the CSV files in. Per default the working directory. |
sep |
Character symbol to separate fields in the CSV fil, comma by default. |
dec |
Decimal separator used for numeric fields in the CSV file, point by default. |
No return vaue.
Other xml2relational:
getCreateSQL()
,
getInsertSQL()
,
toRelational()
# Find path to custmers.xml example file in package directory path <- system.file("", "customers.xml", package = "xml2relational") db <- toRelational(path) savetofiles(db, dir = tempdir())
# Find path to custmers.xml example file in package directory path <- system.file("", "customers.xml", package = "xml2relational") db <- toRelational(path) savetofiles(db, dir = tempdir())
Imports an XML document and converts it into a set of dataframes each of which represents one table in the data model.
toRelational( file, prefix.primary = "ID_", prefix.foreign = "FKID_", keys.unique = TRUE, keys.dim = 6 )
toRelational( file, prefix.primary = "ID_", prefix.foreign = "FKID_", keys.unique = TRUE, keys.dim = 6 )
file |
The XML document to be processed. |
prefix.primary |
A prefix for the tables' primary keys (unique numeric
identifier for a data record/row in the table) . Default is |
prefix.foreign |
A prefix for the tables' foreign keys (). Default is
|
keys.unique |
Defines if the primary keys must be unique across all
tables of the data model or only within the table of which it is the
primary key. Default is |
keys.dim |
Size of the 'key space' reserved for primary keys. Argument
is a power of ten. Default is |
toRelational()
converts the hierarchical XML structure into a
flat tabular structure with one dataframe for each table in the data model.
toRelational()
determines automatically which XML elements need to
be stored in a separate table. The relationship between the nested objects
in the XML data is recreated in the dataframes with combinations of foreign
and primary keys. The foreign keys refer to the primary keys that
toRelational()
creates automatically when adding XML elements to a
table.
Column | Type | Description | Example |
Style |
character |
Name of the SQL flavor. |
"MySQL" |
NormalField |
character |
Template string for a normal, nullable field. | "%FIELDNAME% %DATATYPE%"
|
NormalFieldNotNull |
character |
Template string for non-nullable field. | "%FIELDNAME% %DATATYPE% NOT NULL" |
PrimaryKey |
character |
Template string for the definition of a primary key. | "PRIMARY KEY (%FIELDNAME%)" |
ForeignKey |
character |
Template string for the definition of a foreign key. | "FOREIGN KEY (%FIELDNAME%) REFERENCES
%REFTABLE%(%REFPRIMARYKEY%)" |
PrimaryKeyDefSeparate |
logical |
Indicates if primary key needs additional definition like a any other field. | TRUE |
ForeignKeyDefSeparate
|
logical |
Indicates if foreign key needs additional definition like a any other field. | TRUE |
Int |
character |
Name of integer data type. "INT" |
|
Int.MaxSize |
numeric |
Size limit of integer data type. | 4294967295 |
BigInt |
character |
Name of data type for integers larger than the size limit of the normal integer data type. | "BIGINT" |
Decimal |
character |
Name of data type for floating point numbers. |
"DECIMAL" |
VarChar |
character |
Name of data type for variable-size character fields. | "VARCHAR" |
VarChar.MaxSize |
numeric |
Size limit of variable-size character data type. | 65535 |
Text |
character
|
Name of data type for string data larger than the size limit of the variable-size character data type. | "TEXT" |
Date
|
character |
Name of data type date data. | "DATE"
|
Time |
character |
Name of data type time data |
"TIME" |
Date |
character |
Name of data type for combined date and time data. | "TIMESTAMP" |
In the template strings you can use the following placeholders, as you also see from the MySQL example in the table:
%FIELDNAME%
: Name of the field to be defined.
%DATATYPE%
: Datatype of the field to be defined.
%REFTABLE%
: Table referenced by a foreign key.
%REFPRIMARYKEY%
: Name of the primary key field of the table
referenced by a foreign key.
When you use your own defintion of an SQL
flavor, then sql.style
must be a one-row dataframe providing the
fields described in the table above.
You can use the datatype.func
argument to provide your own function
to determine how the data type of a field is derived from the values in
that field. In this case, the values of the columns Int
,
Int.MaxSize
, VarChar
, VarChar.MaxSize
, Decimal
and Text
in the sql.style
dataframe are ignored. They are
used by the built-in mechanism to determine data types. Providing your own
function allows you to determine data types in a more differentiated way,
if you like. The function that is provided needs to take a vectors of
values as its argument and needs to provide the SQL data type of these
values as a one-element character vector.
A list of standard R dataframes, one for each table of the data model. The tables are named for the elements in the XML document.
Other xml2relational:
getCreateSQL()
,
getInsertSQL()
,
savetofiles()
# Find path to custmers.xml example file in package directory path <- system.file("", "customers.xml", package = "xml2relational") db <- toRelational(path)
# Find path to custmers.xml example file in package directory path <- system.file("", "customers.xml", package = "xml2relational") db <- toRelational(path)
Transforming a hierarchical XML document into a relational data model.
xml2relational
The xml2relational
package is
designed to 'flatten' XML documents with nested objects into relational
dataframes. xml2relational
takes an XML file as input and converts
it into a set of dataframes (tables). The tables are linked among each
other with foreign keys and can be exported as CSV or ready-to-use SQL code
(CREATE TABLE
for the data model, INSERT INTO
for the data).
xml2relational
First, use
toRelational()
to read in an XML file and to convert into a
relational data model.
This will give you a list of dataframes, one
for each table in the relational data model. Tables are linked by foreign
keys. You can specify the naming convention for the tables' primary and
foreign keys as arguments in toRelational()
.
You can
now export the data structures of the tables (or a selection of tables)
using getCreateSQL()
. It support multiple SQL dialects, and
you also provide syntax and data type information for additional SQL
dialects.
You can also export the data as SQL INSERT
statements with the getInsertSQL()
. If you only want to
export the data as CSV use savetofiles()
to save the
dataframes produced by toRelational()
as comma-separated
files.