Package 'refreshr'

Title: Work with Refreshable Datasets that Update their Data Automatically
Description: Connects dataframes/tables with a remote data source. Raw data downloaded from the data source can be further processed and transformed using data preparation code that is also baked into the dataframe/table. Refreshable dataframes can be shared easily (e.g. as R data files). Their users do not need to care about the inner workings of the data update mechanisms.
Authors: Joachim Zuckarelli [aut, cre]
Maintainer: Joachim Zuckarelli <[email protected]>
License: GPL-3
Version: 0.1.0
Built: 2025-02-08 04:37:36 UTC
Source: https://github.com/jsugarelli/refreshr



Analysing refreshr objects

Description

Checks if a dataframe/table is refreshable.

Usage

is.refreshr(df)

Arguments

df

Dataframe/table to be checked.

Value

TRUE if the dataframe/table is of class "refreshr", FALSE otherwise.
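Since the check is defined by class membership, a base-R equivalent is presumably a call to inherits(). The sketch below assumes that refreshr objects simply carry "refreshr" in their class vector; is_refreshr_sketch() is a hypothetical name and not part of the package.

```r
# Hypothetical stand-in for is.refreshr(), assuming the check is purely
# based on the class attribute.
is_refreshr_sketch <- function(df) {
  inherits(df, "refreshr")
}

plain  <- data.frame(x = 1:3)
tagged <- plain
# Prepend "refreshr" to the class vector, keeping the data.frame behaviour
class(tagged) <- c("refreshr", class(tagged))

is_refreshr_sketch(plain)   # FALSE
is_refreshr_sketch(tagged)  # TRUE
```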

Examples

## Not run: 

library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
                     load_code = "data.table::fread(
                        \"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\",
                        sep=\"\t\")",
                     prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or the public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)

## End(Not run)

Making dataframes/tables refreshable

Description

Makes a dataframe/table refreshable, i.e. connects it with a data source and specifies code that is applied to the raw data after the data has been loaded (optional).

Usage

make_refreshable(df, load_code, prep_code = NULL)

Arguments

df

The dataframe/table that is to be made refreshable.

load_code

The code used to load the data from the data source, given as a character string. Please note that double quotes inside this string need to be escaped (\").

prep_code

The code used to transform the raw data downloaded from the data source. The placeholder # can be used in this code to refer to the downloaded raw data. Optional; if NULL (the default), no data preparation is applied.
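Both code arguments are ordinary character strings, so the escaping requirement only concerns how the string literal is written in R: wrapping the whole string in single quotes, or using a raw string (available since R 4.0.0), produces the same string without backslashes. The second part sketches one way the # placeholder could be resolved, by substituting a variable name and evaluating the result. apply_prep_code() is a hypothetical helper, not refreshr's implementation; note that such a naive substitution would also replace a # appearing inside a string in the prep code.

```r
# 1) Three ways to write the same prep code string
prep_escaped <- "subset(#, series_id == \"LNS14000000\")"
prep_single  <- 'subset(#, series_id == "LNS14000000")'
prep_raw     <- r"[subset(#, series_id == "LNS14000000")]"   # R >= 4.0.0
identical(prep_escaped, prep_single)  # TRUE
identical(prep_escaped, prep_raw)     # TRUE

# 2) Hypothetical placeholder handling: replace every "#" with the name of a
#    local variable that holds the raw data, then parse and evaluate the code.
apply_prep_code <- function(raw_data, prep_code = NULL) {
  if (is.null(prep_code)) return(raw_data)   # no preparation step
  code <- gsub("#", "raw_data", prep_code, fixed = TRUE)
  eval(parse(text = code))   # evaluated in this function's environment
}

# Toy raw data with made-up values
toy_raw <- data.frame(series_id = c("LNS14000000", "LNS11000000"),
                      value     = c(1.5, 2.5))
apply_prep_code(toy_raw, prep_single)   # keeps only the LNS14000000 row
```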

Value

A dataframe/table of class refreshr that can be refreshed by calling refresh().

Examples

## Not run: 

library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
                     load_code = "data.table::fread(
                        \"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\",
                        sep=\"\t\")",
                     prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or the public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)

## End(Not run)

Analysing refreshr objects

Description

Prints or returns the main properties of a refreshable dataframe/table.

Usage

properties(df, property = NULL, silent = FALSE)

Arguments

df

Refreshable dataframe/table whose properties are to be queried.

property

One-element character vector naming the property to be queried: "load" for the load code (the code that refreshes the data from the data source), "prep" for the data preparation code (if any), "source" for the data source (which properties() tries to identify from the load code), or "lastrefresh" for the date/timestamp of the last refresh of the dataframe/table. If no property is selected (property == NULL, the default), all properties are printed to the screen.

silent

If TRUE, the function returns (invisibly) the property selected by property without printing anything to the screen. Default is FALSE.
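The "source" property is described as something properties() tries to identify from the load code. One plausible way to do that, sketched here purely as an assumption (guess_source() is hypothetical and not necessarily how the package works), is to extract the first http(s) URL from the load code string.

```r
# Hypothetical: return the first http(s) URL found in a load code string,
# or NA if there is none. The URL ends at a quote, whitespace or ")".
guess_source <- function(load_code) {
  m <- regmatches(load_code,
                  regexpr("https?://[^\"'\\s)]+", load_code, perl = TRUE))
  if (length(m) == 0) NA_character_ else m
}

load_code <- "data.table::fread(
  \"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\",
  sep = \"\\t\")"
guess_source(load_code)   # "https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData"
guess_source("read.csv('local.csv')")   # NA
```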

Value

If property == NULL, i.e. all properties are queried, NULL is returned. Otherwise properties() returns the value of the selected property.
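With silent = TRUE the value follows R's usual pattern for invisible return values: nothing appears at the console, so the result has to be assigned (or wrapped in print()) to be seen. show_property() below is a hypothetical stand-in used only to illustrate that behaviour; it is not part of refreshr.

```r
# Hypothetical stand-in mimicking the documented silent behaviour
show_property <- function(value, silent = FALSE) {
  if (!silent) cat("Property:", value, "\n")   # screen output only when not silent
  invisible(value)                             # value is returned either way
}

show_property("toy value", silent = TRUE)        # prints nothing
x <- show_property("toy value", silent = TRUE)   # but the value can be captured
x
```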

Examples

## Not run: 

library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
                     load_code = "data.table::fread(
                        \"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\",
                        sep=\"\t\")",
                     prep_code = "filter(#, series_id==\"LNS14000000\")")
# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or the public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)

## End(Not run)

Working with refreshable dataframes/tables

Description

Refreshes a refreshable dataframe/table by downloading the data from the data source and executing the data preparation code (if such code has been specified).
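The refresh cycle described above can be sketched in base R, assuming (hypothetically) that the load code, the prep code and the last refresh timestamp are kept as attributes of the object. make_refreshable_sketch() and refresh_sketch() are illustrative names only; refreshr's own storage may differ. A local load code stands in for a real download.

```r
# Hypothetical: store the code and a timestamp as attributes and tag the class
make_refreshable_sketch <- function(df, load_code, prep_code = NULL) {
  attr(df, "load") <- load_code
  attr(df, "prep") <- prep_code          # assigning NULL leaves no attribute
  attr(df, "lastrefresh") <- Sys.time()
  class(df) <- unique(c("refreshr", class(df)))
  df
}

# Hypothetical: re-run the load code, then the prep code with "#" -> raw_data
refresh_sketch <- function(df) {
  load_code <- attr(df, "load")
  prep_code <- attr(df, "prep")
  raw_data  <- eval(parse(text = load_code))
  new_df <- if (is.null(prep_code)) raw_data else
    eval(parse(text = gsub("#", "raw_data", prep_code, fixed = TRUE)))
  # Re-attach the code and a new timestamp so the result stays refreshable
  make_refreshable_sketch(new_df, load_code, prep_code)
}

obj <- make_refreshable_sketch(data.frame(x = integer(0)),
                               load_code = "data.frame(x = 1:5)",
                               prep_code = "subset(#, x > 3)")
obj <- refresh_sketch(obj)
obj$x   # 4 5
```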

Usage

refresh(df, silent = FALSE)

Arguments

df

The refreshable dataframe/table that is to be updated.

silent

If TRUE, refresh() does not print any output to the screen. Default is FALSE.

Value

The refreshed dataframe/table with up-to-date data.
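As the Value section and the examples (data_refresh <- refresh(data_refresh)) indicate, refresh() returns the refreshed object, so the result needs to be assigned; calling refresh(data_refresh) on its own presumably leaves data_refresh unchanged. The base-R snippet below, with a hypothetical double_values() function unrelated to refreshr, shows the copy-on-modify behaviour behind this.

```r
# A function receives a copy of its argument; changes stay local
double_values <- function(df) {
  df$value <- df$value * 2   # modifies the function's local copy only
  df
}

d <- data.frame(value = c(1, 2, 3))
double_values(d)        # result is printed, then discarded
d$value                 # still 1 2 3
d <- double_values(d)   # assign the returned object
d$value                 # 2 4 6
```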

Examples

## Not run: 

library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
                     load_code = "data.table::fread(
                        \"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\",
                        sep=\"\t\")",
                     prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or the public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)

## End(Not run)

Package 'refreshr'

Description

Create refreshable dataframes/tables that automatically pull in data from an (internet) data source and transform the data (if necessary), so that the users of your dataset do not have to worry about where to get the data from and how to update it.

Functions available:

  • make_refreshable(): Makes a dataframe/table refreshable.

  • refresh(): Refreshes a dataframe/table.

  • is.refreshr(): Checks if a dataframe/table is set up as refreshable.

  • uptodate(): Checks if a refreshable dataframe/table is up to date compared to the remote data source.

  • properties(): Prints or returns the main properties of a refreshable dataframe/table.


Updating dataframes/tables

Description

Checks if a refreshable dataframe/table is up-to-date with its data source.

Usage

uptodate(df)

Arguments

df

Dataframe/table to be checked.

Details

Please note that uptodate() needs to download the data from the data source and process it according to the data preparation steps defined in the prep property of the refreshable dataframe/table in order to compare it to the current data of the refreshable dataframe/table. Depending on the amount of data and the complexity of the preparation steps, this may take some time.
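In other words, checking whether the data is current costs about as much as a full refresh. A minimal sketch of that logic, assuming a strict identical() comparison (the package may compare differently) and using a hypothetical rebuild() function in place of the stored load and prep code:

```r
# Hypothetical: rebuild the data the same way a refresh would, then compare
uptodate_sketch <- function(current, rebuild) {
  fresh <- rebuild()          # the expensive part: download + preparation
  identical(current, fresh)   # strict comparison of values and attributes
}

# Local stand-in for "download and filter"
rebuild <- function() subset(data.frame(x = 1:5), x > 3)

current <- rebuild()
stale   <- subset(data.frame(x = 1:4), x > 3)   # older snapshot, one row short

uptodate_sketch(current, rebuild)   # TRUE
uptodate_sketch(stale, rebuild)     # FALSE
```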

Value

TRUE if the dataframe/table properly reflects the current state of its data source, FALSE otherwise.

Examples

## Not run: 

library(data.table)
library(dplyr)

# Load US unemployment rate from Bureau of Labor Statistics
data <- fread("https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData", sep="\t")

# Make refreshable and specify code for data preparation (filter raw data for
# the overall US unemployment rate) with # being a placeholder for the downloaded
# raw data
data_refresh <- make_refreshable(data,
                     load_code = "data.table::fread(
                        \"https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData\",
                        sep=\"\t\")",
                     prep_code = "filter(#, series_id==\"LNS14000000\")")

# Save refreshable dataframe as RData file (e.g. to share dataset with coworkers or the public)
save(data_refresh, file = "refresh.RData")

# Remove dataframe and reload it from file
rm(data_refresh)
load(file = "refresh.RData")

# Refresh the dataframe
data_refresh <- refresh(data_refresh)

# Show properties of refreshable dataframe
properties(data_refresh)

# Check if refreshable dataframe is up-to-date with the remote data source
uptodate(data_refresh)

## End(Not run)