Title: | Work with Refreshable Datasets that Update their Data Automatically |
---|---|
Description: | Connects dataframes/tables with a remote data source. Raw data downloaded from the data source can be further processed and transformed using data preparation code that is also baked into the dataframe/table. Refreshable dataframes can be shared easily (e.g. as R data files). Their users do not need to care about the inner workings of the data update mechanisms. |
Authors: | Joachim Zuckarelli [aut, cre] |
Maintainer: | Joachim Zuckarelli <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-02-08 04:37:36 UTC |
Source: | https://github.com/jsugarelli/refreshr |
Checks if a dataframe/table is refreshable.
is.refreshr(df)
df | Dataframe/table to be checked. |
TRUE if the dataframe/table is refreshable (i.e. is of class "refreshr"), FALSE otherwise.
## Not run:
library(refreshr)
library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
    load_code = "data.table::fread(\"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\", sep=\"\t\")",
    prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)
## End(Not run)
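The class test described for is.refreshr() can be illustrated in base R. The following is a minimal sketch, assuming the check amounts to a plain class test on "refreshr"; the helper has_refreshr_class() is hypothetical and not part of the package, and the class is attached by hand only for demonstration (real refreshable objects come from make_refreshable()).

```r
# Hypothetical stand-in for is.refreshr(), assuming the check is a plain class test
has_refreshr_class <- function(df) inherits(df, "refreshr")

plain <- data.frame(x = 1:3)

# Prepend the class manually for illustration only
tagged <- plain
class(tagged) <- c("refreshr", class(tagged))

print(has_refreshr_class(plain))   # plain dataframe: no "refreshr" class
print(has_refreshr_class(tagged))  # manually tagged dataframe
print(is.data.frame(tagged))       # still behaves as a dataframe
```

Because the class is added in front of the existing classes, the object keeps working with code that expects an ordinary dataframe.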
Makes a dataframe/table refreshable, i.e. connects it with a data source and specifies code that is applied to the raw data after the data has been loaded (optional).
make_refreshable(df, load_code, prep_code = NULL)
df | The dataframe/table that is to be made refreshable. |
load_code | The code used to load the data from the data source. Please note that quotes need to be escaped (\"). |
prep_code | The code used to transform the raw data downloaded from the data source. The placeholder # stands for the downloaded raw data. |
A dataframe/table of class refreshr that can be refreshed by calling refresh().
## Not run:
library(refreshr)
library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
    load_code = "data.table::fread(\"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\", sep=\"\t\")",
    prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)
## End(Not run)
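To make the roles of load_code and prep_code more concrete, here is a minimal base-R sketch of how two code strings, one of them containing the # placeholder, could be evaluated in sequence. This is an illustration under stated assumptions, not refreshr's actual implementation: the helper run_pipeline() and the variable name raw_data are hypothetical, and a temporary CSV file stands in for the remote data source.

```r
# Hypothetical helper, NOT part of refreshr: evaluates a load-code string and
# then a prep-code string in which "#" stands for the loaded raw data.
run_pipeline <- function(load_code, prep_code = NULL) {
  env <- new.env()
  # Evaluate the load code and keep its result as the raw data
  env$raw_data <- eval(parse(text = load_code), envir = env)
  if (is.null(prep_code)) {
    return(env$raw_data)
  }
  # Replace the "#" placeholder with the name of the raw-data variable
  prep_expr <- gsub("#", "raw_data", prep_code, fixed = TRUE)
  eval(parse(text = prep_expr), envir = env)
}

# Write a small CSV to a temporary file that acts as the data source
src <- tempfile(fileext = ".csv")
write.csv(data.frame(series_id = c("A", "B", "A"), value = c(1, 2, 3)),
          src, row.names = FALSE)

# Quotes inside the code strings are escaped, as required for load_code
load_code <- sprintf("read.csv(\"%s\")", gsub("\\\\", "/", src))
prep_code <- "subset(#, series_id == \"A\")"

result <- run_pipeline(load_code, prep_code)
print(result)
```

The naive substitution replaces every # in the string, so this sketch would not cope with preparation code that contains a literal # inside a quoted value.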
Prints or returns the main properties of a refreshable dataframe/table.
properties(df, property = NULL, silent = FALSE)
df | Dataframe/table to be checked. |
property | One-element character vector describing the property that is to be queried. Either |
silent | If silent, the function will return (invisibly) the property defined by |
If property == NULL, i.e. all properties are queried, then NULL is returned. Otherwise properties() returns the value of the selected property.
## Not run:
library(refreshr)
library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
    load_code = "data.table::fread(\"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\", sep=\"\t\")",
    prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)
## End(Not run)
Refreshes a refreshable dataframe/table by downloading the data from the source and executing the data preparation code (if such code has been specified).
refresh(df, silent = FALSE)
df | The refreshable dataframe/table that is to be updated. |
silent | If |
The refreshed dataframe/table with up-to-date data.
## Not run:
library(refreshr)
library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
    load_code = "data.table::fread(\"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\", sep=\"\t\")",
    prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)
## End(Not run)
Create refreshable dataframes/tables that automatically pull in data from an (internet) data source and transform the data (if necessary), so that the users of your dataset do not have to worry about where to get the data from or how to update it.
Functions available:
make_refreshable(): Makes a dataframe/table refreshable.
refresh(): Refreshes a dataframe/table.
is.refreshr(): Checks if a dataframe/table is set up as refreshable.
uptodate(): Checks if a refreshable dataframe/table is up to date compared to the remote data source.
properties(): Prints or returns the main properties of a refreshable dataframe/table.
Checks if a refreshable dataframe/table is up-to-date with its data source.
uptodate(df)
df | Dataframe/table to be checked. |
Please note that uptodate() needs to download the data from the data source and process it according to the data preparation steps defined in the prep property of the refreshable dataframe/table in order to compare it to the current data of the refreshable dataframe/table. Depending on the amount of data and the complexity of the preparation steps, this may take some time.
TRUE if the dataframe/table properly reflects the state of its data source, FALSE otherwise.
## Not run:
library(refreshr)
library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
    load_code = "data.table::fread(\"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\", sep=\"\t\")",
    prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)
## End(Not run)
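The up-to-date check described for uptodate() reloads the source, applies the preparation steps, and compares the outcome with the data currently held. A minimal base-R sketch of that idea follows. It is an assumption about the general approach, not the package's code: the helper is_current() is hypothetical, functions take the place of the stored code strings, a temporary CSV file stands in for the remote source, and the comparison that refreshr actually performs may differ.

```r
# Hypothetical helper, NOT refreshr's implementation: compares the data
# currently held with a freshly loaded and prepared copy.
is_current <- function(current, load_fun, prep_fun = identity) {
  fresh <- prep_fun(load_fun())
  # all.equal() returns TRUE or a character vector describing the differences
  isTRUE(all.equal(current, fresh))
}

# A temporary CSV file acts as the data source
src <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), src, row.names = FALSE)

load_fun <- function() read.csv(src)
prep_fun <- function(d) d[d$value > 10, ]

# Snapshot of the prepared data, as a refreshable dataframe would hold it
snapshot <- prep_fun(load_fun())
before <- is_current(snapshot, load_fun, prep_fun)

# Simulate new data arriving at the source
write.csv(data.frame(id = 1:4, value = c(10, 20, 30, 40)), src, row.names = FALSE)
after <- is_current(snapshot, load_fun, prep_fun)

print(c(before = before, after = after))
```

Here before is TRUE and after is FALSE, because the second write adds a row that passes the filter. The sketch also shows why such a check can be slow: every call repeats the full download and preparation.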