| Title: | 'Apache' 'Arrow' Dataset-Backed Simulation Outputs for 'mrgsolve' |
|---|---|
| Description: | Provides tools for creating and managing file streams in support of large simulation or other outputs. |
| Authors: | Kyle T Baron [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-7252-5656>) |
| Maintainer: | Kyle T Baron <[email protected]> |
| License: | GPL (>=2) |
| Version: | 0.0.1.9000 |
| Built: | 2026-05-08 20:58:55 UTC |
| Source: | https://github.com/p-emex/mrgsim.ds |
Extracts the underlying arrow::Dataset from an mrgsimsds object, allowing
you to work directly with the Arrow API or pass the dataset to other
Arrow-aware tools. For a list, only mrgsimsds elements are retained and
a single dataset spanning all their files is returned.
    as_arrow_ds(x, ...)
    ## S3 method for class 'mrgsimsds'
    as_arrow_ds(x, ...)
| x | an mrgsimsds object or a list of mrgsimsds objects. |
| ... | not used. |
An 'Apache' 'Arrow' arrow::Dataset object.
    mod <- house_ds(end = 5)
    out <- mrgsim_ds(mod, events = ev(amt = 100))
    as_arrow_ds(out)
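To illustrate what "a single dataset spanning all their files" means at the Arrow level, here is a minimal sketch that uses only the arrow package, not mrgsim.ds. The data frames, directory, and file names are made up for illustration, and the block is skipped when arrow is not installed.

```r
# Sketch only: plain arrow calls on made-up data, not the mrgsim.ds internals.
n_rows <- NA_integer_

if (requireNamespace("arrow", quietly = TRUE)) {
  dir <- file.path(tempdir(), "arrow-ds-sketch")
  dir.create(dir, showWarnings = FALSE)

  # Two hypothetical parquet files standing in for two simulation outputs.
  f1 <- file.path(dir, "sim-1.parquet")
  f2 <- file.path(dir, "sim-2.parquet")
  arrow::write_parquet(data.frame(ID = 1L, TIME = 0:2, CP = c(0, 5, 3)), f1)
  arrow::write_parquet(data.frame(ID = 2L, TIME = 0:2, CP = c(0, 9, 6)), f2)

  # One Dataset object backed by both files; rows are not read into R here.
  ds <- arrow::open_dataset(c(f1, f2))
  n_rows <- ds$num_rows
}

n_rows
```

The object returned by as_arrow_ds() can presumably be handled the same way as `ds` in this sketch, for example in a dplyr pipeline ending in collect().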
Coerce an mrgsimsds object to an arrow table
    ## S3 method for class 'mrgsimsds'
    as_arrow_table(x, ..., schema = NULL)
| x | an mrgsimsds object. |
| ... | passed to |
| schema | passed to |
An 'Apache' 'Arrow' arrow::Table of simulated data.
    mod <- house_ds(end = 5)
    out <- mrgsim_ds(mod, events = ev(amt = 100))
    arrow::as_arrow_table(out)
The conversion is handled by as_arrow_ds().
    as_duckdb_ds(x, ...)
| x | an mrgsimsds object or a list of mrgsimsds objects. |
| ... | passed to |
A tbl of the simulated data in DuckDB; see arrow::to_duckdb().
    mod <- house_ds(end = 5)
    out <- mrgsim_ds(mod, events = ev(amt = 100))
    if(requireNamespace("duckdb")) {
      as_duckdb_ds(out)
    }
Converts the output of mrgsolve::mrgsim() to an mrgsimsds object by
writing the simulation data to a parquet file in tempdir(). Files in
tempdir() are auto-deleted on garbage collection by default. Use
move_ds() or save_ds() to relocate files outside tempdir(), which
automatically disables gc, or call gc_ds() to control gc directly.
    as_mrgsim_ds(x, verbose = FALSE, gc = TRUE)
| x | an mrgsims object. |
| verbose | if |
| gc | initial gc setting; if |
An object with class mrgsimsds.
    mod <- house_ds()
    data <- ev_expand(amt = 100, ID = 1:10)
    out <- mrgsolve::mrgsim(mod, data)
    obj <- as_mrgsim_ds(out)
    obj
Coerce an mrgsimsds object to a tbl
    ## S3 method for class 'mrgsimsds'
    as_tibble(x, ...)
    ## S3 method for class 'mrgsimsds'
    collect(x, ...)
    ## S3 method for class 'mrgsimsds'
    as.data.frame(x, row.names = NULL, optional = FALSE, ...)
| x | an mrgsimsds object. |
| ... | passed to |
| row.names | passed to |
| optional | passed to |
A tbl containing simulated data.
    mod <- house_ds(end = 5)
    out <- mrgsim_ds(mod, events = ev(amt = 100))
    as.data.frame(out)
    tibble::as_tibble(out)
    dplyr::collect(out)
Creates a new mrgsimsds object pointing to the same parquet files as x.
By default the new object takes ownership of those files, which means the
original object loses ownership and its files will not be deleted when it
is garbage collected.
    copy_ds(x, own = TRUE)
| x | an mrgsimsds object to copy. |
| own | logical; if |
A new mrgsimsds object with the same files and fields as x, a fresh
memory address, and pid set to the current process.
    mod <- house_ds()
    out <- mrgsim_ds(mod)
    out2 <- copy_ds(out)
    check_ownership(out)
    check_ownership(out2)
Get the current location of mrgsimsds object files
    current_location(x)
| x | an mrgsimsds object. |
Get names of backing files
    files_ds(x)
| x | an mrgsimsds object. |
A character vector of absolute paths to the parquet files backing x.
Controls whether the underlying parquet files are automatically deleted
when the object is garbage collected (value) and whether a message is
issued when that deletion occurs (notify). Set value = FALSE to protect
files from cleanup; set back to TRUE to re-enable automatic deletion.
The notify flag is intended for debugging only; the mrgsim.ds.show.gc
option provides the same behavior package-wide.
Calling gc_ds() with a non-NULL value locks the gc setting: once a value is
explicitly set, the package will never automatically change it when files are
moved or written. A warning is issued if gc is locked to TRUE but files
are moved outside of tempdir(), since those files would then be
auto-deleted on garbage collection.
    gc_ds(x, value = NULL, notify = NULL, ...)
    ## S3 method for class 'mrgsimsds'
    gc_ds(x, value = NULL, notify = NULL, ...)
    ## S3 method for class 'list'
    gc_ds(x, value = NULL, notify = NULL, ...)
| x | an mrgsimsds object or a list of objects. |
| value | logical; if |
| notify | logical; if |
| ... | not used. |
When x is an mrgsimsds object, it is returned invisibly with gc and/or
gc_notify updated.
When x is a list, it is returned invisibly with gc_ds() applied to
every mrgsimsds element; non-mrgsimsds elements are left unchanged.
    mod <- modlib_ds("popex", outvars = "IPRED")
    data <- ev_expand(amt = 100, ID = 1:5)
    out <- mrgsim_ds(mod, data)
    out <- gc_ds(out, value = FALSE)
    out <- gc_ds(out, value = TRUE)
    out <- lapply(1:3, function(rep) {
      out <- mrgsim_ds(mod, data)
      out
    })
    out <- gc_ds(out, value = FALSE)
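The idea of deleting files when an object is garbage collected can be sketched in base R with reg.finalizer(). This is only an illustration of the general mechanism under simple assumptions; it is not the mrgsim.ds implementation, and the helper names (`cleanup`, `new_owned_file`) and the placeholder file contents are hypothetical.

```r
# Sketch only: a base R model of "delete the file when the owner is collected".
# The finalizer removes the file only when the object's gc flag is TRUE.
cleanup <- function(e) {
  if (isTRUE(e$gc) && file.exists(e$file)) unlink(e$file)
}

# Hypothetical constructor: an environment that owns one file on disk.
new_owned_file <- function(gc = TRUE) {
  obj <- new.env()
  obj$file <- tempfile(fileext = ".parquet")
  obj$gc <- gc
  writeLines("placeholder", obj$file)  # not real parquet data
  reg.finalizer(obj, cleanup)
  obj
}

a <- new_owned_file(gc = TRUE)   # file is removed at collection
b <- new_owned_file(gc = FALSE)  # file is protected from cleanup
path_a <- a$file
path_b <- b$file

# Drop the only references, then force a collection so the finalizers run.
rm(a, b)
invisible(gc())

file.exists(path_a)
file.exists(path_b)
```

In this model, setting the flag to FALSE plays the role that gc_ds(x, value = FALSE) is described as playing above: the owner can still be collected, but its file stays on disk.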
Check if object inherits mrgsimsds
    is_mrgsimsds(x)
| x | object to check. |
TRUE if x inherits from mrgsimsds; FALSE otherwise.
    mod <- house_ds()
    out <- mrgsim_ds(mod, events = ev(amt = 100))
    is_mrgsimsds(out)
    is_mrgsimsds(list())
Functions for inspecting and cleaning up package-managed parquet files in
tempdir(). list_temp() shows what is present; purge_temp()
resets the simulation file system.
purge_temp() deletes all package-managed files unconditionally and clears
the ownership maps, resetting the system to a clean state. It is intended
for use in testing teardown or session cleanup, not routine usage.
    list_temp(quietly = FALSE)
    purge_temp(quietly = FALSE)
| quietly | if |
list_temp() returns a character vector of file paths invisibly, and prints
a summary to the console unless quietly = TRUE.
purge_temp() returns NULL invisibly.
    mod <- house_ds()
    out <- lapply(1:10, \(x) mrgsim_ds(mod))
    list_temp()
    purge_temp()
    list_temp()
Use move_ds() to change the enclosing directory. rename_ds()
keeps the files in place, but changes the file names. combine_ds()
brings simulated data from multiple backing files into a single file.
Only move_ds() automatically updates the gc flag based on where the files
end up: files that remain under tempdir() keep gc = TRUE; files moved
outside tempdir() get gc = FALSE, protecting them from automatic
deletion. Neither rename_ds() nor combine_ds() changes the gc flag
because neither changes the file location.
This automatic adjustment is skipped if the gc setting has been locked by a
prior call to gc_ds(). A warning is issued if gc is locked to TRUE but
files land outside tempdir().
The object (x) is required to own the underlying files in order to move,
rename, or combine them.
All three functions modify x in place and file ownership stays with x.
    move_ds(x, path, quietly = FALSE)
    rename_ds(x, id)
    combine_ds(x)
| x | an mrgsimsds object. |
| path | the new directory location for backing files. |
| quietly | if |
| id | a short name used to create data set files for the simulated output. |
All three functions return x invisibly. The updated file list is
accessible via x$files.
save_ds(), files_ds(), gc_ds()
    mod <- house_ds()
    out <- lapply(1:3, \(x) {
      mrgsim_ds(mod, events = ev(amt = 100))
    })
    out <- reduce_ds(out)
    out <- rename_ds(out, "new-name")
    out$files
    out <- combine_ds(out)
    out$files
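The rule for the gc flag after a move (TRUE under tempdir(), FALSE elsewhere, unchanged when locked) can be restated as a small base R function. This is a sketch of the documented behavior only; `under_tempdir()` and `resolve_gc()` are hypothetical names, and the way the package actually compares paths is an assumption here.

```r
# Sketch only: a hypothetical restatement of the documented rule.
under_tempdir <- function(path) {
  root <- normalizePath(tempdir(), winslash = "/", mustWork = FALSE)
  path <- normalizePath(path, winslash = "/", mustWork = FALSE)
  path == root || startsWith(path, paste0(root, "/"))
}

resolve_gc <- function(new_dir, gc, locked) {
  inside <- under_tempdir(new_dir)
  if (locked) {
    # A locked setting is never changed; warn when it looks dangerous.
    if (isTRUE(gc) && !inside) {
      warning("gc is locked to TRUE but files are outside tempdir()")
    }
    return(gc)
  }
  # Unlocked: gc follows the file location.
  inside
}

inside_dir <- file.path(tempdir(), "sims")
dir.create(inside_dir, showWarnings = FALSE)
outside_dir <- dirname(tempdir())  # parent directory, outside tempdir()

gc_inside  <- resolve_gc(inside_dir,  gc = TRUE, locked = FALSE)
gc_outside <- resolve_gc(outside_dir, gc = TRUE, locked = FALSE)
gc_locked  <- suppressWarnings(resolve_gc(outside_dir, gc = TRUE, locked = TRUE))

c(inside = gc_inside, outside = gc_outside, locked = gc_locked)
```

The locked case keeps gc = TRUE even though the files are outside tempdir(), which is the situation the warning is meant to flag.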
Thin wrappers around mrgsolve model-loading functions (mread(),
mcode(), modlib(), house(), mread_cache()) that additionally call
save_process_info() to stamp the model with the current process ID. This
stamp is required by mrgsim_ds() to correctly associate simulation outputs
with the process that created them.
    mread_ds(...)
    mcode_ds(...)
    modlib_ds(...)
    house_ds(...)
    mread_cache_ds(...)
| ... | passed to the corresponding mrgsolve function. |
A model object with process information saved, suitable for use with
mrgsim_ds().
    mod <- house_ds()
    mod
Runs mrgsolve::mrgsim() and writes simulation output to a parquet file in
tempdir(), returning an mrgsimsds object. Files in tempdir() are
auto-deleted on garbage collection by default. Use move_ds() or
save_ds() to relocate files outside tempdir(), which automatically
disables gc, or call gc_ds() to control gc directly. Note that full
argument names must be used for all arguments.
    mrgsim_ds(x, ..., tags = list(), verbose = FALSE, gc = TRUE)
| x | a model object loaded through |
| ... | passed to |
| tags | a named list of atomic data to tag (or mutate) the simulated output. |
| verbose | if |
| gc | initial gc setting; if |
An object with class mrgsimsds.
as_mrgsim_ds(), mrgsimsds-methods.
    mod <- house_ds()
    data <- ev_expand(amt = 100, ID = 1:10)
    out <- mrgsim_ds(mod, data, end = 72, delta = 0.1)
    out <- mrgsim_ds(mod, data, tags = list(rep = 1))
    head(out)
mrgsim.ds provides an Apache Arrow-backed
simulation output object for mrgsolve, greatly reducing
the memory footprint of large simulations and providing a high-performance
pipeline for summarizing huge simulation outputs. The arrow-based simulation
output objects in R claim ownership of their files on disk.
Those files are automatically removed when the owning object goes out of scope
and becomes subject to the R garbage collector. While you are working in R,
"anonymous" parquet-formatted files hold the data in tempdir(); functions are
provided to move this data to more permanent locations for later use.
mrgsim.ds.show.gc: print messages to the console when object files are
removed prior to object cleanup.
Load models
Generate Apache Arrow dataset-backed outputs
S3 Methods
Move, rename, or combine files
Save and restore
Ownership
Work with lists of outputs
Manage tempdir
Enter dplyr / arrow pipelines with
Coerce to R objects
Maintainer: Kyle T Baron [email protected] (ORCID) [copyright holder]
Useful links:
Report bugs at https://github.com/p-emex/mrgsim.ds/issues
    mod <- house_ds(end = 32)
    data <- evd_expand(amt = seq(100, 300, 10))
    out <- mrgsim_ds(mod, data)
    out
    head(out)
    tail(out)
    plot(out, nid = 10)
    list_temp()
    ownership()
    ## Not run:
    rename_ds(out, "reg-100-300")
    list_temp()
    move_ds(out, "data/sim/regimens")
    ## End(Not run)
Basic S3 methods for inspecting and plotting mrgsimsds objects: dim(),
head(), tail(), names(), and plot().
    ## S3 method for class 'mrgsimsds'
    dim(x)
    ## S3 method for class 'mrgsimsds'
    head(x, n = 6L, ...)
    ## S3 method for class 'mrgsimsds'
    tail(x, n = 6L, ...)
    ## S3 method for class 'mrgsimsds'
    names(x)
    ## S3 method for class 'mrgsimsds'
    plot(
      x,
      y = NULL,
      ...,
      nid = 16,
      batch_size = 20000,
      logy = FALSE,
      .dots = list()
    )
| x | an mrgsimsds object, output from |
| n | number of rows to return. |
| ... | arguments to be passed to or from other methods. |
| y | a formula for plotting simulated data; if not provided, all columns will be plotted. |
| nid | number of subjects to plot. |
| batch_size | size of batch when reading data for plot method. |
| logy | if |
| .dots | a list of items to pass to |
head() and tail() only look at the first and last file in the data
set, respectively, when simulations are stored across multiple files. It is
possible this won't correspond to the first and last rows of data
you will see when collecting the data via dplyr::collect().
dim(): integer vector of length 2 (rows, cols).
head(), tail(): a tibble of the first or last n rows.
names(): character vector of column names.
plot(): a plot object, returned invisibly.
    mod <- house_ds(end = 24)
    mod <- omat(mod, diag(0.04, 4))
    data <- ev_expand(amt = c(100, 300), ID = 1:20)
    set.seed(10203)
    out <- mrgsim_ds(mod, data = data)
    dim(out)
    head(out)
    tail(out)
    nrow(out)
    ncol(out)
    plot(out, ~ CP + RESP, nid = 10)
Standard dplyr verbs dispatched on an mrgsimsds object. Each verb extracts
the underlying Arrow Dataset and forwards all arguments to the corresponding
dplyr generic, returning a lazy Arrow query that can be materialized with
dplyr::collect().
    ## S3 method for class 'mrgsimsds'
    group_by(.data, ..., .add = FALSE, .drop = TRUE)
    ## S3 method for class 'mrgsimsds'
    select(.data, ...)
    ## S3 method for class 'mrgsimsds'
    mutate(.data, ...)
    ## S3 method for class 'mrgsimsds'
    filter(.data, ..., .preserve = FALSE)
    ## S3 method for class 'mrgsimsds'
    arrange(.data, ..., .by_group = FALSE)
    ## S3 method for class 'mrgsimsds'
    rename(.data, ...)
    ## S3 method for class 'mrgsimsds'
    summarise(.data, ..., .groups = NULL)
    ## S3 method for class 'mrgsimsds'
    distinct(.data, ..., .keep_all = FALSE)
    ## S3 method for class 'mrgsimsds'
    relocate(.data, ..., .before = NULL, .after = NULL)
    ## S3 method for class 'mrgsimsds'
    count(x, ..., wt = NULL, sort = FALSE, name = NULL)
    ## S3 method for class 'mrgsimsds'
    pull(.data, var = -1, name = NULL, as_vector = TRUE, ...)
| .data, x | an mrgsimsds object. |
| ... | passed to the corresponding dplyr generic. |
| .add, .drop | passed to |
| .preserve | passed to |
| .by_group | passed to |
| .groups | passed to |
| .keep_all | passed to |
| .before, .after | passed to |
| wt, sort | passed to |
| name | passed to |
| var | passed to |
| as_vector | passed to |
A lazy Arrow query object. Use dplyr::collect() to materialize the result
into a tibble. pull() is an exception — it collects immediately and returns
a vector.
    library(dplyr)
    mod <- house_ds(end = 24)
    data <- evd_expand(amt = c(100, 300), ID = 1:10)
    out <- mrgsim_ds(mod, data)
    out |> filter(TIME > 0) |> select(ID, TIME, CP) |> collect()
    out |> group_by(ID) |> summarise(auc = sum(CP)) |> collect()
    out |> mutate(WEEK = TIME / 168) |> collect()
Functions to check ownership or disown simulation output files on disk.
One situation where you need to take over ownership is when you are
simulating in parallel, and the simulation happens in another R process.
mrgsim.ds ownership is established when the simulation returns and the
mrgsimsds object is created. When this happens in another R process (e.g.,
on a worker node), there is no way to transfer that information back to the
parent process. In that case, a call to take_ownership() once the results
are returned to the parent process would be appropriate. Typically, these
results are returned as a list and a call to reduce_ds() will create a
single object pointing to and owning multiple files. Therefore, it should be
rare to call take_ownership() directly; if doing so, please make sure you
understand what is going on.
    ownership()
    list_ownership(full.names = FALSE)
    check_ownership(x)
    disown(x)
    take_ownership(x)
| full.names | if |
| x | an mrgsimsds object. |
check_ownership: TRUE if x owns the underlying files; FALSE
otherwise.
list_ownership: a data.frame of ownership information.
ownership: nothing; used for side effects.
disown: x is returned invisibly; it is not modified.
take_ownership: x is returned invisibly after its hash and the
package-level ownership maps are updated in place.
    mod <- house_ds()
    out <- mrgsim_ds(mod, id = 1)
    check_ownership(out)
    ownership()
    list_ownership()
    e1 <- ev(amt = 100)
    e2 <- ev(amt = 200)
    out <- list(mrgsim_ds(mod, e1), mrgsim_ds(mod, e2))
    sims <- reduce_ds(out)
    ownership()
    check_ownership(sims)
    check_ownership(out[[1]])
    check_ownership(out[[2]])
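The ownership idea, where each file has exactly one owner at a time and taking ownership implicitly disowns the previous owner, can be modeled with a small registry in base R. This is a conceptual sketch only; the registry, the `claim()` and `owns()` helpers, and the owner labels are hypothetical and say nothing about how the package stores its ownership maps.

```r
# Sketch only: a toy ownership map from file name to owner label.
registry <- new.env()

# Record `owner` as the single owner of every file in `files`.
claim <- function(owner, files) {
  for (f in files) assign(f, owner, envir = registry)
  invisible(owner)
}

# TRUE only if `owner` is the registered owner of all `files`.
owns <- function(owner, files) {
  all(vapply(
    files,
    function(f) identical(get0(f, envir = registry), owner),
    logical(1)
  ))
}

files <- c("sim-1.parquet", "sim-2.parquet")

# Ownership is first established where the simulation ran.
claim("worker-object", files)
worker_before <- owns("worker-object", files)

# Analogous to take_ownership() or reduce_ds() in the parent process:
# the new claim replaces the old one, so the worker object is disowned.
claim("parent-object", files)
worker_after <- owns("worker-object", files)
parent_after <- owns("parent-object", files)

c(worker_before, worker_after, parent_after)
```

Under this model only the current owner would be allowed to delete, move, or rename the files, which matches the requirement stated for move_ds(), rename_ds(), and combine_ds().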
Print an mrgsimsds object
    ## S3 method for class 'mrgsimsds'
    print(x, n = 8, ...)
| x | an mrgsimsds object. |
| n | number of rows to show from the cached head data. |
| ... | not used. |
x invisibly.
    mod <- house_ds(end = 24)
    out <- mrgsim_ds(mod, events = ev(amt = 100))
    print(out)
Filters a mixed list down to only the elements that are mrgsimsds objects,
dropping anything else (e.g. NULL, data frames, character vectors). When
passed a single mrgsimsds object it is returned invisibly unchanged.
    prune_ds(x, ..., inform = TRUE)
    ## S3 method for class 'mrgsimsds'
    prune_ds(x, ...)
    ## S3 method for class 'list'
    prune_ds(x, ..., inform = TRUE)
| x | a list of R objects or a single mrgsimsds object. |
| ... | not used. |
| inform | (list method only) issue a message when objects in some list slots are dropped. |
When x is a list, it will be returned with only the mrgsimsds objects
retained. If no mrgsimsds objects are found, an empty list is returned with
a warning.
When x is an mrgsimsds object, it will be invisibly returned.
    mod <- house_ds(end = 24)
    out <- mrgsim_ds(mod, events = ev(amt = 100))
    sims <- list(out, letters)
    prune_ds(sims)
Combines a list of mrgsimsds objects — typically the replicate outputs from a parallel simulation — into one object backed by all of their parquet files. Ownership of every file is transferred to the new object.
    reduce_ds(x, ...)
    ## S3 method for class 'mrgsimsds'
    reduce_ds(x, ...)
    ## S3 method for class 'list'
    reduce_ds(x, ...)
| x | a list of mrgsimsds objects or a single mrgsimsds object. |
| ... | not used. |
The returned object always gets a fresh, unlocked gc state: gc_locked is
set to FALSE and gc is determined by file location via the same rule used
at creation time — TRUE if files are under tempdir(), FALSE otherwise.
Any gc lock set on the input objects is not carried over. To lock the gc
setting on the result, call gc_ds() after reducing.
When x is a list, a new mrgsimsds object is returned that owns all
underlying parquet files; the input objects are disowned.
When x is an mrgsimsds object, it is validated, refreshed, and returned
invisibly with its pid updated to the current process.
    mod <- modlib_ds("popex", outvars = "IPRED")
    data <- ev_expand(amt = 100, ID = 1:10)
    out <- lapply(1:3, function(rep) {
      out <- mrgsim_ds(mod, data)
      out
    })
    length(out)
    sims <- reduce_ds(out)
    sims
    check_ownership(sims)
    lapply(out, check_ownership)
Arrow dataset pointers become invalid when an object is created in a worker
process and returned to the head node (e.g. after a parallel simulation).
refresh_ds() rebuilds the pointer by re-opening the parquet files via
arrow::open_dataset() and updates pid and dim in place. Because
refreshing is itself the fix for an invalid pointer, it checks that files
exist but does not call safe_ds() first.
    refresh_ds(x, ...)
    ## S3 method for class 'mrgsimsds'
    refresh_ds(x, ...)
    ## S3 method for class 'list'
    refresh_ds(x, ...)
| x | an mrgsimsds object or a list of objects. |
| ... | for future use. |
When x is an mrgsimsds object, it is returned invisibly with its Arrow
pointer, pid, and dim refreshed in place.
When x is a list, it is returned invisibly with refresh_ds() applied to
every mrgsimsds element; non-mrgsimsds elements are left unchanged.
    mod <- house_ds()
    data <- ev_expand(amt = 100, ID = 1:100)
    out <- lapply(1:3, function(rep) {
      mrgsim_ds(mod, data)
    })
    refresh_ds(out)
save_ds() serializes an mrgsimsds object to an .rds file, moving the
backing parquet files to the same directory as file. Parquet filenames
are stored as bare basenames inside the .rds, so the .rds file and its
parquet files must stay in the same directory to be portable.
Do not restore the file with readRDS(); use read_ds() instead.
read_ds() deserializes a file written by save_ds(), rebuilds the Arrow
Dataset pointer, and transfers full ownership of the backing files to the
returned object.
    save_ds(x, file, quietly = FALSE)
    read_ds(file)
| x | an mrgsimsds object. |
| file | for |
| quietly | if |
save_ds() returns the path to the written .rds file, invisibly.
read_ds() returns the restored mrgsimsds object invisibly. gc is disabled
(gc = FALSE) on the returned object and the caller holds ownership of the
backing files.
    mod <- house_ds()
    out <- mrgsim_ds(mod, events = ev(amt = 100))
    file <- save_ds(out, file.path(tempdir(), "out.rds"))
    out2 <- read_ds(file)
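Why the .rds file and its parquet files must stay together can be shown with a base R sketch of the "store basenames, rebuild paths from the .rds location" pattern. This illustrates the portability constraint only; it is not the save_ds() or read_ds() implementation, and the directory names, the metadata list, and the placeholder file are hypothetical.

```r
# Sketch only: basenames in the .rds are resolved against the .rds directory.
src  <- file.path(tempdir(), "save-sketch-src")
dest <- file.path(tempdir(), "save-sketch-dest")
dir.create(src, showWarnings = FALSE)
dir.create(dest, showWarnings = FALSE)

# A placeholder standing in for a backing parquet file.
parquet <- file.path(src, "sim-1.parquet")
writeLines("placeholder", parquet)

# "Save": move the backing file next to the .rds and store its basename only.
rds <- file.path(dest, "out.rds")
moved <- file.rename(parquet, file.path(dest, basename(parquet)))
saveRDS(list(files = basename(parquet)), rds)

# "Read": rebuild absolute paths from wherever the .rds file lives now.
meta <- readRDS(rds)
restored <- file.path(dirname(rds), meta$files)
found <- file.exists(restored)

found
```

Because only `"sim-1.parquet"` is stored, copying `out.rds` somewhere else without the parquet file would leave `restored` pointing at a file that does not exist.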
Stamps the model object with the current process ID and tempdir() path so
that mrgsim_ds() knows where to write output files. This is called
automatically by mread_ds(), house_ds(), and the other model-loading
wrappers. Call it directly only when you load a model through the base
mrgsolve functions (e.g. mrgsolve::mread()) and still want to use
mrgsim_ds().
    save_process_info(x)
| x | a model object. |
An updated model object suitable for using with mrgsim_ds().
    mod <- mrgsolve::house()
    mod <- save_process_info(mod)
Use these functions to escape the mrgsim.ds universe. write_parquet_ds()
writes all simulated data to a single .parquet file. write_dataset_ds()
writes to a directory, optionally partitioned, via arrow::write_dataset();
the caller takes responsibility for the resulting files.
    write_parquet_ds(x, sink, ...)
    write_dataset_ds(x, path, ...)
| x | an mrgsimsds object. |
| sink | passed to |
| ... | passed to the underlying arrow function. |
| path | passed to |
write_parquet_ds() returns x invisibly.
write_dataset_ds() returns path invisibly.
save_ds() to persist an object while staying within the
mrgsim.ds universe.
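The difference between a single parquet file and a partitioned dataset directory can be sketched with plain arrow calls on a made-up data frame. The description above names arrow::write_dataset() as the function behind write_dataset_ds(); the use of arrow::write_parquet() for the single-file case, and the idea that `partitioning` could be supplied through `...`, are assumptions here. The block is skipped when arrow is not installed.

```r
# Sketch only: arrow functions called directly on illustrative data.
n_parts <- NA_integer_

if (requireNamespace("arrow", quietly = TRUE)) {
  sims <- data.frame(
    ID   = rep(1:2, each = 3),
    TIME = rep(0:2, times = 2),
    CP   = c(0, 5, 3, 0, 9, 6)
  )

  # Single-file output: everything lands in one .parquet file.
  single <- file.path(tempdir(), "sims-sketch.parquet")
  arrow::write_parquet(sims, single)

  # Directory output, partitioned by ID: one sub-directory per ID value.
  path <- file.path(tempdir(), "sims-sketch-dataset")
  arrow::write_dataset(sims, path, partitioning = "ID")

  n_parts <- length(list.files(path, pattern = "\\.parquet$", recursive = TRUE))
}

n_parts
```

Files written this way are ordinary parquet output with no ownership tracking, which is what "the caller takes responsibility for the resulting files" refers to.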