Title: | Empirical Bayes Matrix Factorization |
---|---|
Description: | Methods for matrix factorization based on Wang and Stephens (2021) <https://jmlr.org/papers/v22/20-589.html>. |
Authors: | Jason Willwerscheid [aut, cre], Peter Carbonetto [aut], Wei Wang [aut], Matthew Stephens [aut], Eric Weine [ctb], Annie Xie [ctb], Gao Wang [ctb] |
Maintainer: | Jason Willwerscheid <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 1.0.54 |
Built: | 2024-11-17 06:14:27 UTC |
Source: | https://github.com/willwerscheid/flashier |
Given a flash
object, returns the "fitted values"
.
## S3 method for class 'flash' fitted(object, ...)
## S3 method for class 'flash' fitted(object, ...)
object |
An object inheriting from class |
... |
Additional parameters are ignored. |
The matrix of "fitted values."
Given a flash_fit
object, returns the "fitted values"
.
## S3 method for class 'flash_fit' fitted(object, ...)
## S3 method for class 'flash_fit' fitted(object, ...)
object |
An object inheriting from class |
... |
Additional parameters are ignored. |
The matrix of "fitted values."
Fits an empirical Bayes matrix factorization (see Details for a
description of the model). The resulting fit is referred to as a "flash"
object (short for Factors and Loadings using Adaptive SHrinkage). Two
interfaces are provided. The flash
function provides a simple
interface that allows a flash object to be fit in a single pass, while
flash_xxx
functions are pipeable functions that allow for more
complex flash objects to be fit incrementally (available functions are
listed below under See Also). See the vignettes and
Examples for usage.
flash( data, S = NULL, ebnm_fn = ebnm_point_normal, var_type = 0L, greedy_Kmax = 50L, backfit = FALSE, nullcheck = TRUE, verbose = 1L )
flash( data, S = NULL, ebnm_fn = ebnm_point_normal, var_type = 0L, greedy_Kmax = 50L, backfit = FALSE, nullcheck = TRUE, verbose = 1L )
data |
The observations. Usually a matrix, but can also be a sparse
matrix of class |
S |
The standard errors. Can be |
ebnm_fn |
The function or functions used to solve the empirical Bayes
normal means (EBNM) subproblems. Most importantly, these functions specify
the families of distributions Any EBNM function provided by package |
var_type |
Describes the structure of the estimated residual variance.
Can be Note that if any portion of the residual variance is to be estimated, then
it is usually faster to set |
greedy_Kmax |
The maximum number of factors to be added. This will not
necessarily be the total number of factors added by |
backfit |
A "greedy" fit is performed by adding up to
|
nullcheck |
If |
verbose |
When and how to display progress updates. Set to
|
If is an
data matrix, then the rank-one
empirical Bayes matrix factorization model is:
where is an
-vector of loadings,
is a
-vector of factors, and
is an
matrix of residuals (or "errors").
Additionally:
The residual variance parameters are constrained to have
a simple structure and are fit via maximum likelihood. (For example, one
might assume that all standard errors are identical:
for some
and for all
,
).
The functions
and
are assumed to belong to
some families of priors
and
that are
specified in advance, and are estimated via variational approximation.
The general rank- empirical Bayes matrix factorization model is:
or
where is now a matrix of loadings and
is a matrix of
factors.
Separate priors and
are estimated via
empirical Bayes, and different prior families may be used for different
values of
. In general, then:
Typically, and
will be closed under
scaling, in which case
and
are only identifiable
up to a scaling factor
. In other words, we can write:
where is a diagonal matrix with diagonal entries
.
The model can then be made identifiable by constraining the scale of
and
for
.
A flash
object. Contains elements:
n_factors
The total number of factor/loadings pairs
in the fitted model.
pve
The proportion of variance explained by each factor/loadings pair. Since factors and loadings are not required to be orthogonal, this should be interpreted loosely: for example, the total proportion of variance explained could be larger than 1.
elbo
The variational lower bound achieved by the fitted model.
residuals_sd
Estimated residual standard deviations (these
include any variance component given as an argument to S
).
L_pm, L_psd, L_lfsr
Posterior means,
standard deviations, and local false sign rates for loadings .
F_pm, F_psd, F_lfsr
Posterior means,
standard deviations, and local false sign rates for factors .
L_ghat
The fitted priors on loadings
.
F_ghat
The fitted priors on factors
.
sampler
A function that takes a single argument
nsamp
and returns nsamp
samples from the posterior
distributions for factors and loadings
.
flash_fit
A flash_fit
object. Used by
flash
when fitting is not performed all at once, but
incrementally via calls to various flash_xxx
functions.
The following methods are available:
fitted.flash
Returns the "fitted values"
.
residuals.flash
Returns the expected residuals
.
ldf.flash
Returns an decomposition (see
Details above), with columns of
and
scaled
as specified by the user.
Wei Wang and Matthew Stephens (2021). "Empirical Bayes matrix factorization." Journal of Machine Learning Research 22, 1–40.
flash_init
, flash_greedy
,
flash_backfit
, and flash_nullcheck
. For more
advanced functionality, see flash_factors_init
,
flash_factors_fix
, flash_factors_set_to_zero
,
flash_factors_remove
, flash_set_verbose
, and
flash_set_conv_crit
.
For extracting useful data from flash
objects, see
fitted.flash
, residuals.flash
, and
ldf.flash
.
data(gtex) # Fit up to 3 factors and backfit. fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE) # This is equivalent to the series of calls: fl <- flash_init(gtex) |> flash_greedy(Kmax = 3L) |> flash_backfit() |> flash_nullcheck() # Fit a unimodal distribution with mean zero to each set of loadings # and a scale mixture of normals with mean zero to each factor. fl <- flash(gtex, ebnm_fn = c(ebnm_unimodal, ebnm_normal_scale_mixture), greedy_Kmax = 3) # Fit point-laplace priors using a non-default optimization method. fl <- flash(gtex, ebnm_fn = flash_ebnm(prior_family = "point_laplace", optmethod = "trust"), greedy_Kmax = 3) # Fit a "Kronecker" (rank-one) variance structure (this can be slow). fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)
data(gtex) # Fit up to 3 factors and backfit. fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE) # This is equivalent to the series of calls: fl <- flash_init(gtex) |> flash_greedy(Kmax = 3L) |> flash_backfit() |> flash_nullcheck() # Fit a unimodal distribution with mean zero to each set of loadings # and a scale mixture of normals with mean zero to each factor. fl <- flash(gtex, ebnm_fn = c(ebnm_unimodal, ebnm_normal_scale_mixture), greedy_Kmax = 3) # Fit point-laplace priors using a non-default optimization method. fl <- flash(gtex, ebnm_fn = flash_ebnm(prior_family = "point_laplace", optmethod = "trust"), greedy_Kmax = 3) # Fit a "Kronecker" (rank-one) variance structure (this can be slow). fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)
Adds an all-ones vector as a fixed set of loadings (if rowwise = TRUE
)
or fixed factor (if rowwise = FALSE
). Assuming (without loss of
generality) that the fixed factor/loadings is indexed as ,
a fixed set of loadings gives:
so that the (estimated) factor is shared
by all row-wise observations
.
A fixed factor gives:
so that the (estimated) set of loadings is
shared by all column-wise observations
.
flash_add_intercept(flash, rowwise = TRUE, ebnm_fn = ebnm_point_normal)
flash_add_intercept(flash, rowwise = TRUE, ebnm_fn = ebnm_point_normal)
flash |
A |
rowwise |
Should the all-ones vector be added as a fixed set of loadings ("row-wise") or a fixed factor ("column-wise")? See above for details. |
ebnm_fn |
As with other factor/loadings pairs, a prior is put on the
estimated factor (if |
The estimated factor (if rowwise = TRUE
) or set of loadings
(if rowwise = FALSE
) is initialized at the column-
or row-wise means of the data (or, if factor/loadings pairs have previously
been added, at the column- or row-wise means of the matrix of residuals)
and then backfit via function flash_backfit
.
The flash
object from argument flash
, with an
"intercept" added.
# The following are equivalent: init <- list(matrix(rowMeans(gtex), ncol = 1), matrix(1, nrow = ncol(gtex))) fl <- flash_init(gtex) |> flash_factors_init(init) |> flash_factors_fix(kset = 1, which_dim = "factors") |> flash_backfit(kset = 1) fl <- flash_init(gtex) |> flash_add_intercept(rowwise = FALSE)
# The following are equivalent: init <- list(matrix(rowMeans(gtex), ncol = 1), matrix(1, nrow = ncol(gtex))) fl <- flash_init(gtex) |> flash_factors_init(init) |> flash_factors_fix(kset = 1, which_dim = "factors") |> flash_backfit(kset = 1) fl <- flash_init(gtex) |> flash_add_intercept(rowwise = FALSE)
Backfits existing flash factor/loadings pairs. Whereas a "greedy" fit optimizes
each newly added factor/loadings pair in one go without returning to optimize
previously added pairs, a "backfit" updates all existing pairs in a cyclical
fashion. See flash
for examples of usage.
flash_backfit( flash, kset = NULL, extrapolate = TRUE, warmstart = TRUE, maxiter = 500, tol = NULL, verbose = NULL )
flash_backfit( flash, kset = NULL, extrapolate = TRUE, warmstart = TRUE, maxiter = 500, tol = NULL, verbose = NULL )
flash |
A |
kset |
A vector of integers specifying which factors to backfit.
If |
extrapolate |
Whether to use an extrapolation technique
inspired by Ang and Gillis (2019) to accelerate the fitting process.
Control parameters are handled via global options and can be set by
calling |
warmstart |
Whether to use "warmstarts" when solving the EBNM
subproblems by initializing solutions at the previous value of the fitted
prior |
maxiter |
The maximum number of backfitting iterations. An "iteration"
is defined such that all factors in |
tol |
The convergence tolerance parameter. After each update, the fit
is compared to the fit from before the update using a convergence
criterion function (by default, the difference in ELBO, but the criterion
can be changed via |
verbose |
When and how to display progress updates. Set to
|
The flash
object from argument flash
, backfitted
as specified.
Used in a flash
pipeline to clear timeout conditions set using
flash_set_timeout
.
flash_clear_timeout(flash)
flash_clear_timeout(flash)
flash |
A |
The flash
object from argument flash
, with
timeout settings cleared.
The default objective function used to determine convergence when fitting
a flash
object. Calculates the difference in the
variational lower bound ("ELBO") from one iteration to the next.
flash_conv_crit_elbo_diff(curr, prev, k)
flash_conv_crit_elbo_diff(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
This function is an example of a function that may be passed to
parameter fn
in function flash_set_conv_crit
to set
the convergence criterion for a flash pipeline. See
flash_set_conv_crit
for details and examples.
A scalar, which is compared against the tolerance parameter
tol
to determine whether a fit has converged.
flash_conv_crit_max_chg
flash_conv_crit_max_chg_L
,
flash_conv_crit_max_chg_F
An alternative objective function that can be used to determine
convergence when fitting a flash
object. Calculates the
maximum (absolute) change over all (posterior expected values for)
loadings and factors
. At each iteration, the
loadings vectors
and factors
are
-normalized.
flash_conv_crit_max_chg(curr, prev, k)
flash_conv_crit_max_chg(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
A scalar, which is compared against the tolerance parameter
tol
to determine whether a fit has converged.
flash_conv_crit_elbo_diff
,
flash_conv_crit_max_chg_L
flash_conv_crit_max_chg_F
An alternative objective function that can be used to determine
convergence when fitting a flash
object. Calculates the
maximum (absolute) change over all (posterior expected values for)
factors . At each iteration, the factors
are
-normalized.
flash_conv_crit_max_chg_F(curr, prev, k)
flash_conv_crit_max_chg_F(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
A scalar, which is compared against the tolerance parameter
tol
to determine whether a fit has converged.
flash_conv_crit_elbo_diff
,
flash_conv_crit_max_chg
flash_conv_crit_max_chg_L
An alternative objective function that can be used to determine
convergence when fitting a flash
object. Calculates the
maximum (absolute) change over all (posterior expected values for)
loadings . At each iteration, the loadings vectors
are
-normalized.
flash_conv_crit_max_chg_L(curr, prev, k)
flash_conv_crit_max_chg_L(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
A scalar, which is compared against the tolerance parameter
tol
to determine whether a fit has converged.
flash_conv_crit_elbo_diff
,
flash_conv_crit_max_chg
flash_conv_crit_max_chg_F
flash_ebnm
is a helper function that provides readable syntax for
constructing ebnm
functions that can serve as
arguments to parameter ebnm_fn
in functions flash
,
flash_greedy
, and flash_factors_init
(see
Examples below). It is also possible to write a custom function
from scratch: see Details below for a simple example. A more
involved example can be found in the "Extending ebnm with custom ebnm-style
functions" vignette in the ebnm
package.
flash_ebnm(...)
flash_ebnm(...)
... |
Parameters to be passed to function |
As input to parameter ebnm_fn
in functions flash
,
flash_greedy
, and flash_factors_init
,
it should suffice for many purposes to
provide functions from package ebnm
as is (for example, one might
set ebnm_fn = ebnm_point_laplace
). To use non-default
arguments, function flash_ebnm
may be used (see Examples).
Custom functions may also be written; for details, see the "Extending
ebnm with custom ebnm-style functions" vignette in the
ebnm
package.
A function that can be passed as argument to parameter
ebnm_fn
in functions flash
,
flash_greedy
, and flash_factors_init
.
# A custom EBNM function might be written as follows: my_ebnm_fn <- function(x, s, g_init, fix_g, output) { ebnm_res <- ebnm_point_laplace( x = x, s = s, g_init = g_init, fix_g = fix_g, output = output, control = list(iterlim = 10) ) return(ebnm_res) } # The following are equivalent: fl1 <- flash( gtex, ebnm_fn = my_ebnm_fn, greedy_Kmax = 2 ) fl2 <- flash( gtex, ebnm_fn = flash_ebnm( prior_family = "point_laplace", control = list(iterlim = 10) ), greedy_Kmax = 2 )
# A custom EBNM function might be written as follows: my_ebnm_fn <- function(x, s, g_init, fix_g, output) { ebnm_res <- ebnm_point_laplace( x = x, s = s, g_init = g_init, fix_g = fix_g, output = output, control = list(iterlim = 10) ) return(ebnm_res) } # The following are equivalent: fl1 <- flash( gtex, ebnm_fn = my_ebnm_fn, greedy_Kmax = 2 ) fl2 <- flash( gtex, ebnm_fn = flash_ebnm( prior_family = "point_laplace", control = list(iterlim = 10) ), greedy_Kmax = 2 )
Fixes loadings or factors
for one or more factor/loadings pairs, so that their values are not
updated during subsequent backfits. For a given pair, either the loadings
or factor can be fixed, but not both, and either all entries or a subset
can be fixed. To unfix, use function
flash_factors_unfix
. See
flash_factors_init
for an example of usage.
flash_factors_fix( flash, kset, which_dim = c("factors", "loadings"), fixed_idx = NULL, use_fixed_in_ebnm = NULL )
flash_factors_fix( flash, kset, which_dim = c("factors", "loadings"), fixed_idx = NULL, use_fixed_in_ebnm = NULL )
flash |
A |
kset |
A vector of integers indexing the factor/loadings pairs whose loadings or factors are to be fixed. |
which_dim |
Whether to fix factors or loadings. |
fixed_idx |
If |
use_fixed_in_ebnm |
By default, fixed elements are ignored when
solving the EBNM subproblem in order to estimate the prior |
The flash
object from argument flash
, with
factors or loadings fixed as specified.
Initializes factor/loadings pairs at values specified by init
. This
function has two primary uses: 1. One can initialize multiple
factor/loadings pairs at once using an SVD-like function and then optimize
them via function flash_backfit
. Sometimes this results in
a better fit than adding them one at a time via
flash_greedy
. 2. One can initialize factor/loadings pairs
and then fix the factor (or loadings) via function
flash_factors_fix
to incorporate "known" factors into a
flash
object. See below for examples of both use cases.
flash_factors_init(flash, init, ebnm_fn = ebnm_point_normal)
flash_factors_init(flash, init, ebnm_fn = ebnm_point_normal)
flash |
A |
init |
An SVD-like object (specifically, a list containing fields
|
ebnm_fn |
The function or functions used to solve the empirical Bayes
normal means (EBNM) subproblems. Most importantly, these functions specify
the families of distributions Any EBNM function provided by package |
The flash
object from argument flash
, with
factors and loadings initialized as specified.
# Initialize several factors at once and backfit. fl <- flash_init(gtex) |> flash_factors_init(init = svd(gtex, nu = 5, nv = 5)) |> flash_backfit() # Add fixed loadings with \ell_i identically equal to one. This can be # interpreted as giving a "mean" factor that accounts for different # row-wise means. ones <- matrix(1, nrow = nrow(gtex), ncol = 1) # Initialize the factor at the least squares solution. ls_soln <- t(solve(crossprod(ones), crossprod(ones, gtex))) fl <- flash_init(gtex) |> flash_factors_init(init = list(ones, ls_soln)) |> flash_factors_fix(kset = 1, which_dim = "loadings") |> flash_backfit() |> flash_greedy(Kmax = 5L)
# Initialize several factors at once and backfit. fl <- flash_init(gtex) |> flash_factors_init(init = svd(gtex, nu = 5, nv = 5)) |> flash_backfit() # Add fixed loadings with \ell_i identically equal to one. This can be # interpreted as giving a "mean" factor that accounts for different # row-wise means. ones <- matrix(1, nrow = nrow(gtex), ncol = 1) # Initialize the factor at the least squares solution. ls_soln <- t(solve(crossprod(ones), crossprod(ones, gtex))) fl <- flash_init(gtex) |> flash_factors_init(init = list(ones, ls_soln)) |> flash_factors_fix(kset = 1, which_dim = "loadings") |> flash_backfit() |> flash_greedy(Kmax = 5L)
Sets factor/loadings pairs to zero and then removes them from the
flash
object. Note that this will change the indices of
existing pairs.
flash_factors_remove(flash, kset)
flash_factors_remove(flash, kset)
flash |
A |
kset |
A vector of integers specifying which factor/loadings pairs to remove. |
The flash
object from argument flash
, with the
factors specified by kset
removed.
Reorders the factor/loadings pairs in a flash
object.
flash_factors_reorder(flash, kset)
flash_factors_reorder(flash, kset)
flash |
A |
kset |
A vector of integers specifying the new order of the
factor/loadings pairs. All existing factors must be included in
|
The flash
object from argument flash
, with the
factors reordered according to argument kset
.
Sets factor/loadings pairs to zero but does not remove them from the
flash
object (so as to keep the indices of existing pairs
the same).
flash_factors_set_to_zero(flash, kset)
flash_factors_set_to_zero(flash, kset)
flash |
A |
kset |
A vector of integers specifying which factor/loadings pairs to set to zero. |
The flash
object from argument flash
, with the
factors specified by kset
set to zero.
If loadings or factors
for one or
more factor/loadings pairs have been "fixed" using function
flash_factors_fix
, then they can be unfixed using
function flash_factors_unfix
.
flash_factors_unfix(flash, kset)
flash_factors_unfix(flash, kset)
flash |
A |
kset |
A vector of integers indexing the factor/loadings pairs whose values are to be unfixed. |
The flash
object from argument flash
, with
values for the factor/loadings pairs specified by kset
unfixed.
flash_fit
objects are the "internal" objects used by flash
functions to fit an EBMF model. Whereas flash
objects
(the end results of the fitting process) include user-friendly fields and
methods, flash_fit
objects were not designed for public
consumption and can be unwieldy. Nonetheless, some advanced
flash
functionality requires the wielding of
flash_fit
objects. In particular, initialization, convergence,
and "verbose" display functions all take one or more flash_fit
objects as input (see parameter init_fn
in function
flash_greedy
; parameter fn
in
flash_set_conv_crit
;
and parameter fns
in flash_set_verbose
).
For users who would like to write custom functions, the accessor functions
and methods enumerated below may prove useful. See
flash_set_verbose
for an example.
flash_fit(flash) flash_fit_get_pm(f, n) flash_fit_get_p2m(f, n) flash_fit_get_est_tau(f) flash_fit_get_fixed_tau(f) flash_fit_get_tau(f) flash_fit_get_elbo(f) flash_fit_get_KL(f, n) flash_fit_get_g(f, n)
flash_fit(flash) flash_fit_get_pm(f, n) flash_fit_get_p2m(f, n) flash_fit_get_est_tau(f) flash_fit_get_fixed_tau(f) flash_fit_get_tau(f) flash_fit_get_elbo(f) flash_fit_get_KL(f, n) flash_fit_get_g(f, n)
flash |
A |
f |
A |
n |
Set |
The following S3 methods are available for flash_fit
objects at all
times except while optimizing new factor/loadings pairs as part of a
"greedy" fit:
fitted.flash_fit
Returns the "fitted values"
.
residuals.flash_fit
Returns the expected residuals
.
ldf.flash_fit
Returns an decomposition,
with columns of
and
scaled as specified by the user.
See function descriptions below.
flash_fit_get_pm()
: The posterior means for the loadings matrix
(when parameter
n
is equal to 1
) or factor matrix
(when
n = 2
). While optimizing new factor/loadings pairs as part of
a "greedy" fit, only the posterior means for the new loadings
or factor
will be returned.
flash_fit_get_p2m()
: The posterior second moments for the loadings matrix
(when parameter
n
is equal to 1
) or factor matrix
(when
n = 2
). While optimizing new factor/loadings pairs,
only the posterior second moments for the new loadings
or factor
will be returned.
flash_fit_get_est_tau()
: Equal to , where
is the estimated portion of the residual variance (total, by row, or
by column, depending on the variance type).
flash_fit_get_fixed_tau()
: Equal to , where
is the
fixed portion of the residual variance (total, by row, or by column).
flash_fit_get_tau()
: The overall precision .
flash_fit_get_elbo()
: The variational lower bound (ELBO).
flash_fit_get_KL()
: A vector containing the KL-divergence portions of
the ELBO, with one value for each factor (when n = 2
) or set of
loadings (when n = 1
). While optimizing new factor/loadings pairs,
only the KL-divergence for the new factor or loadings will be returned.
flash_fit_get_g()
: A list containing estimated priors on loadings
(when
n = 1
) or factors (when
n = 2
). While optimizing new factor/loadings pairs, only the
estimated prior on the new factor or loadings will be returned.
Adds factor/loadings pairs to a flash object in a "greedy" manner. Up to
Kmax
pairs are added one at a time. At each step, flash_greedy
attempts to find an optimal additional (rank-one) factor given all
previously added factors. The additional factor is retained if it
increases the variational lower bound (ELBO); otherwise, fitting terminates.
See flash
for examples of usage.
flash_greedy( flash, Kmax = 1, ebnm_fn = ebnm_point_normal, init_fn = NULL, extrapolate = FALSE, warmstart = FALSE, maxiter = 500, tol = NULL, verbose = NULL )
flash_greedy( flash, Kmax = 1, ebnm_fn = ebnm_point_normal, init_fn = NULL, extrapolate = FALSE, warmstart = FALSE, maxiter = 500, tol = NULL, verbose = NULL )
flash |
A |
Kmax |
The maximum number of factors to be added. This will not
necessarily be the total number of factors added by
|
ebnm_fn |
The function or functions used to solve the empirical Bayes
normal means (EBNM) subproblems. Most importantly, these functions specify
the families of distributions Any EBNM function provided by package |
init_fn |
The function used to initialize factor/loadings pairs. Functions
|
extrapolate |
Whether to use an extrapolation technique
inspired by Ang and Gillis (2019) to accelerate the fitting process.
Control parameters are handled via global options and can be set by
calling |
warmstart |
Whether to use "warmstarts" when solving the EBNM
subproblems by initializing solutions at the previous value of the fitted
prior |
maxiter |
The maximum number of iterations when optimizing a greedily added factor/loadings pair. |
tol |
The convergence tolerance parameter. At each iteration, the fit
is compared to the fit from the previous iteration using a convergence
criterion function (by default, the difference in ELBO, but the criterion
can be changed via |
verbose |
When and how to display progress updates. Set to
|
The flash
object from argument flash
, with up
to Kmax
new factor/loadings pairs "greedily" added.
flash_greedy_init_default
,
flash_greedy_init_softImpute
,
flash_greedy_init_irlba
# The following are examples of advanced usage. See ?flash for basic usage. # Increase the maximum number of iterations in the default initialization # method. my_init_fn <- function(f) flash_greedy_init_default(f, maxiter = 500) fl <- flash_init(gtex) |> flash_greedy(init_fn = my_init_fn) # Use a custom initialization function that wraps function nmf from # package RcppML. nmf_init_fn <- function(f) { nmf_res <- RcppML::nmf(resid(f), k = 1, verbose = FALSE) return(list(as.vector(nmf_res$w), as.vector(nmf_res$h))) } fl.nmf <- flash_init(gtex) |> flash_greedy(ebnm_fn = ebnm_unimodal_nonnegative, init_fn = nmf_init_fn)
# The following are examples of advanced usage. See ?flash for basic usage. # Increase the maximum number of iterations in the default initialization # method. my_init_fn <- function(f) flash_greedy_init_default(f, maxiter = 500) fl <- flash_init(gtex) |> flash_greedy(init_fn = my_init_fn) # Use a custom initialization function that wraps function nmf from # package RcppML. nmf_init_fn <- function(f) { nmf_res <- RcppML::nmf(resid(f), k = 1, verbose = FALSE) return(list(as.vector(nmf_res$w), as.vector(nmf_res$h))) } fl.nmf <- flash_init(gtex) |> flash_greedy(ebnm_fn = ebnm_unimodal_nonnegative, init_fn = nmf_init_fn)
The default method for initializing the loadings and
factor values
of a new ("greedy") flash factor. It is
essentially an implementation of the power method, but unlike many existing
implementations, it can handle missing data and sign constraints. For details,
see Chapter 2.2.3 in the reference below.
flash_greedy_init_default( flash, sign_constraints = NULL, tol = NULL, maxiter = 100, seed = 666 )
flash_greedy_init_default( flash, sign_constraints = NULL, tol = NULL, maxiter = 100, seed = 666 )
flash |
A |
sign_constraints |
This parameter can be used to constrain the sign of
the initial factor and loadings. It should be a vector of length two with
entries equal to -1, 0, or 1. The first entry constrains the sign of the
loadings |
tol |
Convergence tolerance parameter. When the maximum (absolute)
change over all values |
maxiter |
Maximum number of power iterations. |
seed |
Since initialization is random, a default seed is set for reproducibility. |
A list of length two consisting of, respectively, the vector of
initial values for loadings and the vector of initial
factor values
.
Jason Willwerscheid (2021), Empirical Bayes Matrix Factorization: Methods and Applications. Ph.D. thesis, University of Chicago.
flash_greedy
,
flash_greedy_init_softImpute
,
flash_greedy_init_irlba
Initializes a new ("greedy") flash factor using irlba
. This
can be somewhat faster than flash_greedy_init_default
for large,
dense data matrices. For sparse matrices of class Matrix
, the
default initialization should generally be preferred.
flash_greedy_init_irlba(flash, seed = 666, ...)
flash_greedy_init_irlba(flash, seed = 666, ...)
flash |
A |
seed |
Since initialization is random, a default seed is set for reproducibility. |
... |
Additional parameters to be passed to |
A list of length two consisting of, respectively, the vector of
initial values for loadings and the vector of initial
factor values
.
flash_greedy
,
flash_greedy_init_default
,
flash_greedy_init_softImpute
Initializes a new ("greedy") flash factor using softImpute
.
When there is missing data, this can yield better results than
flash_greedy_init_default
without sacrificing much (if any) speed.
flash_greedy_init_softImpute(flash, seed = 666, ...)
flash_greedy_init_softImpute(flash, seed = 666, ...)
flash |
A |
seed |
Since initialization is random, a default seed is set for reproducibility. |
... |
Additional parameters to be passed to
|
A list of length two consisting of, respectively, the vector of
initial values for loadings and the vector of initial
factor values
.
flash_greedy
,
flash_greedy_init_default
,
flash_greedy_init_irlba
Sets up a flash
object with no factors. Since all other
flash_xxx
functions take a flash
or flash_fit
object
as their first argument, calling flash_init
should be the first step
in any flash
pipeline. See flash
for examples of usage.
flash_init(data, S = NULL, var_type = 0L, S_dim = NULL)
flash_init(data, S = NULL, var_type = 0L, S_dim = NULL)
data |
The observations. Usually a matrix, but can also be a sparse
matrix of class |
S |
The standard errors. Can be |
var_type |
Describes the structure of the estimated residual variance.
Can be Note that if any portion of the residual variance is to be estimated, then
it is usually faster to set |
S_dim |
If the argument to |
An initialized flash
object (with no factors).
Sets factor/loadings pairs to zero if doing so improves the variational
lower bound (ELBO). See flash
for examples of usage.
flash_nullcheck(flash, kset = NULL, remove = TRUE, tol = NULL, verbose = NULL)
flash_nullcheck(flash, kset = NULL, remove = TRUE, tol = NULL, verbose = NULL)
flash |
A |
kset |
A vector of integers specifying which factors to nullcheck.
If |
remove |
Whether to remove factors that have been set to zero from the
|
tol |
The "tolerance" parameter: if a factor does not improve the ELBO
by at least |
verbose |
When and how to display progress updates. For nullchecks,
updates are only displayed when |
The flash
object from argument flash
, with
factors that do not improve the ELBO by at least tol
either set
to zero or removed (depending on the argument to parameter remove
).
flash_factors_remove
,
flash_factors_set_to_zero
Creates a bar plot or sequence of bar plots, one for each value of in
kset
, with bars corresponding to individual posterior means for factors
or loadings
. Values are normalized so that the
maximum absolute value for each factor
or set of
loadings
is equal to 1 (see
ldf.flash
).
This type of plot is most useful when rows or columns
are small in number or ordered in a logical fashion
(e.g., spatially).
flash_plot_bar( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, labels = FALSE, ... )
flash_plot_bar( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, labels = FALSE, ... )
fl |
An object inheriting from class |
order_by_pve |
If |
kset |
A vector of integers specifying the factor/loadings pairs to be
plotted. If |
pm_which |
Whether to plot loadings |
pm_subset |
A vector of row indices |
pm_groups |
A vector specifying the group to which each row of the data
|
pm_colors |
A character vector specifying a color for each unique group
specified by |
labels |
Whether to label the bars along the |
... |
Additional arguments to be passed to
|
When there is more than one value of in
kset
, a sequence of
panels is created using the facet_wrap
function from
the ggplot2
package. In each panel, the order of bars is determined
by the order of the corresponding rows or columns in the data matrix;
they can be re-arranged using the pm_subset
argument.
A ggplot
object.
data(gtex) fl <- flash(gtex, greedy_Kmax = 4L, backfit = FALSE) flash_plot_bar(fl, pm_colors = gtex_colors) # Tweaks are often required to get x-axis labels to look good: library(ggplot2) flash_plot_bar(fl, pm_colors = gtex_colors, labels = TRUE, ncol = 1) + theme(axis.text.x = element_text(size = 8, angle = 60))
data(gtex) fl <- flash(gtex, greedy_Kmax = 4L, backfit = FALSE) flash_plot_bar(fl, pm_colors = gtex_colors) # Tweaks are often required to get x-axis labels to look good: library(ggplot2) flash_plot_bar(fl, pm_colors = gtex_colors, labels = TRUE, ncol = 1) + theme(axis.text.x = element_text(size = 8, angle = 60))
Creates a heatmap of posterior means for factors or loadings
. Values are normalized so that the maximum absolute value
for each factor
or set of loadings
is equal to 1 (see
ldf.flash
).
flash_plot_heatmap( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, gap = 1, ... )
flash_plot_heatmap( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, gap = 1, ... )
fl |
An object inheriting from class |
order_by_pve |
If |
kset |
A vector of integers specifying the factor/loadings pairs to be
plotted. If |
pm_which |
Whether to plot loadings |
pm_subset |
A vector of row indices |
pm_groups |
A vector specifying the group to which each row of the data
|
pm_colors |
A character vector of length 1, 2, or 3 defining the
diverging color gradient (low-mid-high) to be used by the heatmap. The
midpoint is set at zero. If one or two colors are supplied, then the
"mid" color will be set to white. If one color is supplied, then the "low"
and "high" colors (used for, respectively, negative and positive posterior
means) will be the same. If two are supplied, then the "low" color should
be provided first, followed by the "high" color. If all three are supplied,
then the "low" color should be provided first, followed by the "mid" color,
followed by the "high" color provided. The default color gradient is
|
gap |
The horizontal spacing between groups. Ignored if |
... |
Additional parameters to be passed to
|
By default, a 1-d embedding is used to arrange the rows or columns
in a "smart" manner. This behavior can be overridden via argument
loadings_order
, which is passed to function
structure_plot
.
A ggplot
object.
Creates a histogram or sequence of histograms of posterior means for factors
or loadings
. One plot is created for each
value of
in
kset
. Values are normalized so that the
maximum absolute value for each factor or set of
loadings
is equal to 1 (see
ldf.flash
).
If pm_groups
is specified, then overlapping semi-transparent
histograms are created, with one histogram per group specified by
pm_groups
. This option works best when the number of groups is small
or when groups are well separated across components.
flash_plot_histogram( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, binwidth = NULL, bins = NULL, alpha = 0.5, ... )
flash_plot_histogram( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, binwidth = NULL, bins = NULL, alpha = 0.5, ... )
fl |
An object inheriting from class |
order_by_pve |
If |
kset |
A vector of integers specifying the factor/loadings pairs to be
plotted. If |
pm_which |
Whether to plot loadings |
pm_subset |
A vector of row indices |
pm_groups |
A vector specifying the group to which each row of the data
|
pm_colors |
A character vector specifying a color for each unique group
specified by |
binwidth |
The width of the bins (a numeric value). The default is to
use the number of bins in |
bins |
Number of bins. Overriden by |
alpha |
A transparency value between 0 (transparent) and 1 (opaque). |
... |
Additional arguments to be passed to
|
A ggplot
object.
Creates a scatter plot or sequence of scatter plots, with position along the
-axis defined by posterior means for factors
or loadings
and position along the
-axis defined by a
user-supplied covariate. If a covariate is not supplied, then plots will
use data column or row means,
or
. One plot is created for
each value of
in
kset
. Values are normalized so that the
maximum absolute value for each factor or set of
loadings
is equal to 1 (see
ldf.flash
).
flash_plot_scatter( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, covariate = NULL, shape = 1, labels = FALSE, n_labels = 0, label_size = 3, max_overlaps = Inf, ... )
flash_plot_scatter( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, covariate = NULL, shape = 1, labels = FALSE, n_labels = 0, label_size = 3, max_overlaps = Inf, ... )
fl |
An object inheriting from class |
order_by_pve |
If |
kset |
A vector of integers specifying the factor/loadings pairs to be
plotted. If |
pm_which |
Whether to plot loadings |
pm_subset |
A vector of row indices |
pm_groups |
A vector specifying the group to which each row of the data
|
pm_colors |
A character vector specifying a color for each unique group
specified by |
covariate |
A numeric vector with one value for each plotted row |
shape |
The symbol used for the plots' points. See
|
labels |
Whether to label the points with the largest (absolute)
posterior means. If |
n_labels |
A (nonnegative) integer. The number of points to label. If
|
label_size |
The size of the label text (in millimeters). |
max_overlaps |
A (nonnegative) integer. For each text label, the number of overlaps with other text labels or other data points are counted, and the text label is excluded if it has too many overlaps. |
... |
Additional arguments to be passed to
|
A ggplot
object.
data(gtex) fl <- flash(gtex, greedy_Kmax = 4L, backfit = FALSE) flash_plot_scatter(fl) # Label axes and points: library(ggplot2) flash_plot_scatter(fl, labels = TRUE, n_labels = 3) + labs(y = "mean z-score across all SNPs") # For the full range of labelling options provided by the ggrepel package, set # labels = FALSE (the default setting) and add geom_text_repel() manually: library(ggrepel) flash_plot_scatter(fl, labels = FALSE, n_labels = 3) + geom_text_repel(size = 2.5, min.segment.length = 0)
data(gtex) fl <- flash(gtex, greedy_Kmax = 4L, backfit = FALSE) flash_plot_scatter(fl) # Label axes and points: library(ggplot2) flash_plot_scatter(fl, labels = TRUE, n_labels = 3) + labs(y = "mean z-score across all SNPs") # For the full range of labelling options provided by the ggrepel package, set # labels = FALSE (the default setting) and add geom_text_repel() manually: library(ggrepel) flash_plot_scatter(fl, labels = FALSE, n_labels = 3) + geom_text_repel(size = 2.5, min.segment.length = 0)
A scree plot is a line plot showing the proportion of variance explained by each factor/loadings pair in a flash fit. Note that since EBMF does not require factors and loadings to be orthogonal, "PVE" should be interpreted loosely: for example, the total proportion of variance explained could be larger than 1.
flash_plot_scree( fl, order_by_pve = FALSE, kset = NULL, labels = FALSE, label_size = 3, max_overlaps = Inf )
flash_plot_scree( fl, order_by_pve = FALSE, kset = NULL, labels = FALSE, label_size = 3, max_overlaps = Inf )
fl |
An object inheriting from class |
order_by_pve |
If |
kset |
A vector of integers specifying the factor/loadings pairs to be
plotted. If |
labels |
Whether to label the points in the scree plot with the indices of the factor/loading pairs they correspond to. Labels appear as "k1", "k2", "k3", etc. |
label_size |
The size of the label text (in millimeters). |
max_overlaps |
A (nonnegative) integer. For each text label, the number of overlaps with other text labels or other data points are counted, and the text label is excluded if it has too many overlaps. |
Unlike scree plots for PCA, a scree plot for a flash fit is not in general
monotonically decreasing. To ensure a monotonically decreasing scree
plot, set order_by_pve = TRUE
. Note, however, that if this is done
then the numbers on the -axis will no longer match the indices of
the components in the flash fit. This can also be true if argument
kset
has been specified. Thus one should consider setting
labels = TRUE
when order_by_pve = TRUE
or when kset
has been specified.
A ggplot
object.
data(gtex) fl <- flash(gtex, greedy_Kmax = 4L, backfit = FALSE) flash_plot_scree(fl) # For the full range of labelling options provided by the ggrepel package, set # labels = FALSE (the default setting) and add geom_text_repel() manually: library(ggrepel) flash_plot_scree(fl) + geom_text_repel(min.segment.length = 0)
data(gtex) fl <- flash(gtex, greedy_Kmax = 4L, backfit = FALSE) flash_plot_scree(fl) # For the full range of labelling options provided by the ggrepel package, set # labels = FALSE (the default setting) and add geom_text_repel() manually: library(ggrepel) flash_plot_scree(fl) + geom_text_repel(min.segment.length = 0)
Creates a “structure plot” (stacked bar plot) of posterior means for factors
or loadings
. Different “topics” or components
(that is, the different factor/loadings pairs, as specified by
kset
)
are represented by different colors. Values are normalized so that the
maximum absolute value for each factor or set of
loadings
is equal to 1 and then stacked (see
ldf.flash
). Note that structure plots were designed for
nonnegative loadings or “memberships”; if posterior means are not
nonnegative then a different type of plot should be used (e.g.,
flash_plot_heatmap
). By default, a 1-d embedding is used to
arrange the rows or columns
. This step is usually essential
to creating a readable structure plot; for details, see
structure_plot
.
flash_plot_structure( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, gap = 1, ... )
flash_plot_structure( fl, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, gap = 1, ... )
fl |
An object inheriting from class |
order_by_pve |
If |
kset |
A vector of integers specifying the factor/loadings pairs to be
plotted. If |
pm_which |
Whether to plot loadings |
pm_subset |
A vector of row indices |
pm_groups |
A vector specifying the group to which each row of the data
|
pm_colors |
The colors of the “topics” or components (factor/loadings pairs). |
gap |
The horizontal spacing between groups. Ignored if |
... |
Additional parameters to be passed to
|
A ggplot
object.
Used in a flash
pipeline to set the criterion for
determining whether a greedy fit or backfit has "converged."
flash_set_conv_crit(flash, fn = NULL, tol)
flash_set_conv_crit(flash, fn = NULL, tol)
flash |
A |
fn |
The convergence criterion function (see Details below). If
|
tol |
The tolerance parameter (see Details below). The default, which is
set when the |
Function flash_set_conv_crit
can be used to customize
the convergence criterion for a flash
object. This criterion
determines when to stop optimizing a newly added factor
(see flash_greedy
) and when to stop backfitting
(flash_backfit
). Note that, because most alternative
convergence criteria do not make sense in the context of a nullcheck, it
does not set the "convergence" criterion for flash_nullcheck
(for example, flash_conv_crit_max_chg_L
would simply return
the maximum -normalized loading for each set of loadings
).
The criterion is defined by the function supplied as argument to fn
,
which must accept exactly three parameters,
curr
, prev
, and k
. curr
refers to the
flash_fit
object from the current iteration; prev
,
to the flash_fit
object from the previous iteration;
and, if the iteration is a sequential backfitting iteration (that is, a
flash_backfit
iteration with argument
extrapolate = FALSE
), k
identifies the factor/loadings pair
that is currently being updated (in all other cases, k
is
NULL
). The function must output a numeric value; if the value is
less than or equal to tol
, then the fit is considered to have
"converged." The meaning of "convergence" here varies according to the
operation being performed.
In the greedy algorithm, fn
simply compares the fit from
one iteration to the next. During a backfit, it similarly compares fits from
one iteration to the next, but it only considers the fit to have
converged when the value of fn
over successive updates to
all factor/loadings pairs is less than or equal to tol
. If,
for example, factor/loadings pairs are being
sequentially backfitted, then fits are compared before and
after the update to factor/loadings 1, before and after the update to
factor/loadings 2, and so on through factor/loadings
,
and backfitting only terminates when
fn
returns a value less
than or equal to tol
for all updates.
Package flashier
provides a number of functions that may be supplied
as convergence criteria: see
flash_conv_crit_elbo_diff
(the default criterion),
flash_conv_crit_max_chg
,
flash_conv_crit_max_chg_L
, and
flash_conv_crit_max_chg_F
. Custom functions may also be
defined. Typically, they will compare the fit in curr
(the current
iteration) to the fit in prev
(the previous iteration).
To facilitate working with flash_fit
objects, package
flashier
provides a number of accessors, which are enumerated in
the documentation for object flash_fit
. Custom functions
should return a numeric value that can be compared against tol
; see
Examples below.
The flash
object from argument flash
, with the
new convergence criterion reflected in updates to the "internal"
flash_fit
object. These settings will persist across
all subsequent calls to flash_xxx
functions in the same
flash
pipeline (unless, of course, flash_set_conv_crit
is
again called within the same pipeline).
fl <- flash_init(gtex) |> flash_set_conv_crit(flash_conv_crit_max_chg, tol = 1e-3) |> flash_set_verbose( verbose = 3, fns = flash_verbose_max_chg, colnames = "Max Chg", colwidths = 20 ) |> flash_greedy(Kmax = 3)
fl <- flash_init(gtex) |> flash_set_conv_crit(flash_conv_crit_max_chg, tol = 1e-3) |> flash_set_verbose( verbose = 3, fns = flash_verbose_max_chg, colnames = "Max Chg", colwidths = 20 ) |> flash_greedy(Kmax = 3)
Used in a flash
pipeline to set a maximum amount of fitting
time. Note that timeout conditions are only checked during greedy fits
and backfits, so that the total amount of fitting time can exceed the time
set by flash_set_timeout
(especially if, for example, there is a
nullcheck involving many factor/loading pairs). Also note that timeout
conditions must be cleared using function flash_clear_timeout
before any re-fitting is attempted.
flash_set_timeout( flash, tim, units = c("hours", "secs", "mins", "days", "weeks") )
flash_set_timeout( flash, tim, units = c("hours", "secs", "mins", "days", "weeks") )
flash |
A |
tim |
A numeric value giving the maximum amount of fitting time, with
the units of time specified by parameter |
units |
The units of time according to which parameter |
The flash
object from argument flash
, with the
timeout settings reflected in updates to the "internal" flash_fit
object. These settings will persist across all subsequent calls to
flash_xxx
functions until they are modified either by
flash_clear_timeout
or by another call to
flash_set_timeout
.
fl <- flash_init(gtex) |> flash_set_timeout(1, "secs") |> flash_greedy(Kmax = 30) |> flash_backfit() |> flash_nullcheck() |> flash_clear_timeout() # Always clear timeout at the end of a pipeline.
fl <- flash_init(gtex) |> flash_set_timeout(1, "secs") |> flash_greedy(Kmax = 30) |> flash_backfit() |> flash_nullcheck() |> flash_clear_timeout() # Always clear timeout at the end of a pipeline.
Used in a flash
pipeline to set the output that will be printed
after each greedy or backfitting iteration.
flash_set_verbose( flash, verbose = 1L, fns = NULL, colnames = NULL, colwidths = NULL )
flash_set_verbose( flash, verbose = 1L, fns = NULL, colnames = NULL, colwidths = NULL )
flash |
A |
verbose |
When and how to display progress updates. Set to A single tab-delimited table of values may also be output using
option |
fns |
A vector of functions. Used to calculate values to display
after each greedy/backfit iteration when |
colnames |
A vector of column names, one for each function in
|
colwidths |
A vector of column widths, one for each function in
|
Function flash_set_verbose
can be used to customize
the output that is printed to console while fitting a flash
object.
After each greedy or backfitting iteration (see, respectively,
flash_greedy
and flash_backfit
), each
function in argument fns
is successively evaluated and the
result is printed to console in a table with column names defined by
argument colnames
and column widths defined by argument
colwidths
.
Each function in fns
must accept exactly three parameters,
curr
, prev
, and k
: curr
refers to the
flash_fit
object from the current iteration; prev
,
to the flash_fit
object from the previous iteration;
and, if the iteration is a sequential backfitting iteration (that is, a
flash_backfit
iteration with argument
extrapolate = FALSE
), k
identifies the factor/loadings pair
that is currently being updated (in all other cases, k
is
NULL
). Package flashier
provides a number of functions that
may be used to customize output: see
flash_verbose_elbo
,
flash_verbose_elbo_diff
,
flash_verbose_max_chg
,
flash_verbose_max_chg_L
, and
flash_verbose_max_chg_F
. Custom functions may also be
defined. They might inspect the current flash_fit
object
via argument curr
; compare the fit in curr
to the fit from the
previous iteration (provided by argument prev
); or
ignore both flash_fit
objects entirely (for example, to
track progress over time, one might simply call Sys.time
).
To facilitate working with flash_fit
objects, package
flashier
provides a number of accessors, which are enumerated in
the documentation for object flash_fit
. Custom functions
should return a character string that contains the output exactly as it is
to displayed; see Examples below.
The flash
object from argument flash
, with the
new verbose settings reflected in updates to the "internal"
flash_fit
object. These settings will persist across
all subsequent calls to flash_xxx
functions until they are modified
by another call to flash_set_verbose
.
# Suppress all verbose output. fl <- flash_init(gtex) |> flash_set_verbose(0) |> flash_greedy(Kmax = 5) # Set custom verbose output. sparsity_F <- function(curr, prev, k) { g_F <- flash_fit_get_g(curr, n = 2) g_F_pi0 <- g_F$pi[1] # Mixture weight of the "null" component. return(g_F_pi0) } verbose_fns <- c(flash_verbose_elbo, flash_verbose_max_chg_F, sparsity_F) colnames <- c("ELBO", "Max Chg (Tiss)", "Sparsity (Tiss)") colwidths <- c(12, 18, 18) fl <- flash_init(gtex) |> flash_set_verbose( verbose = 3, fns = verbose_fns, colnames = colnames, colwidths = colwidths ) |> flash_greedy(Kmax = 3) # Output can be changed as needed. fl <- flash_init(gtex) |> flash_set_verbose(verbose = 1) |> flash_greedy(Kmax = 5L) |> flash_backfit(verbose = 3) |> flash_greedy(Kmax = 1L)
# Suppress all verbose output. fl <- flash_init(gtex) |> flash_set_verbose(0) |> flash_greedy(Kmax = 5) # Set custom verbose output. sparsity_F <- function(curr, prev, k) { g_F <- flash_fit_get_g(curr, n = 2) g_F_pi0 <- g_F$pi[1] # Mixture weight of the "null" component. return(g_F_pi0) } verbose_fns <- c(flash_verbose_elbo, flash_verbose_max_chg_F, sparsity_F) colnames <- c("ELBO", "Max Chg (Tiss)", "Sparsity (Tiss)") colwidths <- c(12, 18, 18) fl <- flash_init(gtex) |> flash_set_verbose( verbose = 3, fns = verbose_fns, colnames = colnames, colwidths = colwidths ) |> flash_greedy(Kmax = 3) # Output can be changed as needed. fl <- flash_init(gtex) |> flash_set_verbose(verbose = 1) |> flash_greedy(Kmax = 5L) |> flash_backfit(verbose = 3) |> flash_greedy(Kmax = 1L)
Replaces the data in a flash object with a new set of observations. Estimates of residual variances and the ELBO are also updated.
flash_update_data(flash, newdata, Y2_diff = NULL)
flash_update_data(flash, newdata, Y2_diff = NULL)
flash |
A |
newdata |
The new observations. Can be a matrix, a sparse matrix of
class |
Y2_diff |
Optionally, users can supply the (summed) changes in the
squared values of the data |
The flash
object from argument flash
, with
the data modified as specified by newdata
. Residual variances and
ELBO are also updated.
Displays the value of the variational lower bound (ELBO) at the current iteration.
flash_verbose_elbo(curr, prev, k)
flash_verbose_elbo(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
This function is an example of a function that may be passed to
parameter fns
in function flash_set_verbose
to
customize the output that is printed after each greedy or backfitting
iteration. See flash_set_verbose
for details and examples.
A character string, suitable for printing progress updates.
flash_verbose_elbo_diff
,
flash_verbose_max_chg
,
flash_verbose_max_chg_L
,
flash_verbose_max_chg_F
Displays the difference in the variational lower bound (ELBO) from one iteration to the next.
flash_verbose_elbo_diff(curr, prev, k)
flash_verbose_elbo_diff(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
This function is an example of a function that may be passed to
parameter fns
in function flash_set_verbose
to
customize the output that is printed after each greedy or backfitting
iteration. See flash_set_verbose
for details and examples.
A character string, suitable for printing progress updates.
flash_verbose_elbo
, flash_verbose_max_chg
,
flash_verbose_max_chg_L
, flash_verbose_max_chg_F
Displays the maximum (absolute) change over all (posterior expected values for)
loadings and factors
. At each iteration, the
loadings vectors
and factors
are
-normalized.
flash_verbose_max_chg(curr, prev, k)
flash_verbose_max_chg(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
This function is an example of a function that may be passed to
parameter fns
in function flash_set_verbose
to
customize the output that is printed after each greedy or backfitting
iteration. See flash_set_verbose
for details and examples.
A character string, suitable for printing progress updates.
flash_verbose_elbo
, flash_verbose_elbo_diff
,
flash_verbose_max_chg_L
, flash_verbose_max_chg_F
Displays the maximum (absolute) change over all (posterior expected values for)
factors . At each iteration, the factors
are
-normalized.
flash_verbose_max_chg_F(curr, prev, k)
flash_verbose_max_chg_F(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
This function is an example of a function that may be passed to
parameter fns
in function flash_set_verbose
to
customize the output that is printed after each greedy or backfitting
iteration. See flash_set_verbose
for details and examples.
A character string, suitable for printing progress updates.
flash_verbose_elbo
, flash_verbose_elbo_diff
,
flash_verbose_max_chg
, flash_verbose_max_chg_L
Displays the maximum (absolute) change over all (posterior expected values for)
loadings . At each iteration, the loadings vectors
are
-normalized.
flash_verbose_max_chg_L(curr, prev, k)
flash_verbose_max_chg_L(curr, prev, k)
curr |
The |
prev |
The |
k |
Only used during sequential backfits (that is, calls to
|
This function is an example of a function that may be passed to
parameter fns
in function flash_set_verbose
to
customize the output that is printed after each greedy or backfitting
iteration. See flash_set_verbose
for details and examples.
A character string, suitable for printing progress updates.
flash_verbose_elbo
, flash_verbose_elbo_diff
,
flash_verbose_max_chg
, flash_verbose_max_chg_F
Derived from data made available by the Genotype Tissue
Expression (GTEx) project (Lonsdale et al. 2013),
which provides -scores for assessing the significance of effects of
genetic variants (single nucleotide polymorphisms, or SNPs) on gene
expression across 44 human tissues. To reduce the data to a more manageable
size, Urbut et al. (2019) chose the "top" SNP for
each gene — that is, the SNP associated with the largest (absolute)
-score over all 44 tissues. This yields a
matrix
of
-scores, with rows corresponding to SNP-gene pairs and columns
corresponding to tissues. The dataset included here
is further subsampled down to 1000 rows.
gtex
is a matrix with 1000 rows and 44 columns, with rows
corresponding to SNP-gene pairs and columns corresponding to tissues.
https://github.com/stephenslab/gtexresults/blob/master/data/MatrixEQTLSumStats.Portable.Z.rds
Lonsdale et al. (2013). "The Genotype-Tissue Expression (GTEx) project." Nature Genetics 45(6), 580–585.
Urbut, Wang, Carbonetto, and Stephens (2019). "Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions." Nature Genetics 51(1), 187–195.
data(gtex) summary(gtex)
data(gtex) summary(gtex)
A custom palette used by Wang and Stephens (2021) to plot an
empirical Bayes matrix factorization of data from the GTEx project
(of which the gtex
data in package flashier is a
subsample).
The palette is designed to link similar tissues together visually. For
example, brain tissues all have the same color (yellow); arterial tissues
are shades of pink or red; etc.
gtex_colors
is a named vector of length 44, with names corresponding
to tissues (columns) in the gtex
dataset and values
giving hexadecimal color codes.
https://github.com/stephenslab/gtexresults/blob/master/data/GTExColors.txt
Wei Wang and Matthew Stephens (2021). "Empirical Bayes matrix factorization." Journal of Machine Learning Research 22, 1–40.
fl <- flash(gtex, greedy_Kmax = 4) plot(fl, pm_colors = gtex_colors)
fl <- flash(gtex, greedy_Kmax = 4) plot(fl, pm_colors = gtex_colors)
Given a flash
or flash_fit
object, returns the LDF
decomposition .
ldf(object, type) ## S3 method for class 'flash' ldf(object, type = "f") ## S3 method for class 'flash_fit' ldf(object, type = "f")
ldf(object, type) ## S3 method for class 'flash' ldf(object, type = "f") ## S3 method for class 'flash_fit' ldf(object, type = "f")
object |
An object inheriting from class |
type |
Takes identical arguments to function |
When the prior families and
are closed
under scaling (as is typically the case), then the EBMF model (as
described in the documention to function
flash
) is only
identifiable up to scaling by a diagonal matrix :
Method ldf
scales columns and
so that, depending on the argument to parameter
type
, their
1-norms, 2-norms, or infinity norms are equal to 1.
A list with fields L
, D
, and F
, each of which
corresponds to one of the matrices in the decomposition
(with the columns of
and
scaled according to
argument
type
). Note that D
is returned as a vector rather
than a matrix (the vector of diagonal entries in ). Thus, "fitted
values"
can be recovered as
L %*% diag(D) %*% t(F)
.
ldf(flash)
: LDF decomposition for flash
objects
ldf(flash_fit)
: LDF decomposition for flash_fit
objects
Plots a flash
object. Several types of plot are possible:
see parameter plot_type
below as well as functions
flash_plot_scree
, flash_plot_bar
,
flash_plot_heatmap
, flash_plot_histogram
,
flash_plot_scatter
, and flash_plot_structure
.
## S3 method for class 'flash' plot( x, include_scree = TRUE, include_pm = TRUE, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, plot_type = c("scree", "bar", "heatmap", "histogram", "scatter", "structure"), ... )
## S3 method for class 'flash' plot( x, include_scree = TRUE, include_pm = TRUE, order_by_pve = FALSE, kset = NULL, pm_which = c("factors", "loadings"), pm_subset = NULL, pm_groups = NULL, pm_colors = NULL, plot_type = c("scree", "bar", "heatmap", "histogram", "scatter", "structure"), ... )
x |
An object inheriting from class |
include_scree |
This parameter has been deprecated; please use
|
include_pm |
This parameter has been deprecated; please use
|
order_by_pve |
If |
kset |
A vector of integers specifying the factor/loadings pairs to be
plotted. If |
pm_which |
Whether to plot loadings |
pm_subset |
A vector of row indices |
pm_groups |
A vector specifying the group to which each row of the data
|
pm_colors |
A character vector specifying a color for each unique group
specified by |
plot_type |
The type of plot to return. Options include:
|
... |
Additional parameters to be passed to respective
|
A ggplot
object.
Given a flash
object, returns the expected residuals
.
## S3 method for class 'flash' residuals(object, ...)
## S3 method for class 'flash' residuals(object, ...)
object |
An object inheriting from class |
... |
Additional parameters are ignored. |
The matrix of expected residuals.
Given a flash_fit
object, returns the expected residuals
.
## S3 method for class 'flash_fit' residuals(object, ...)
## S3 method for class 'flash_fit' residuals(object, ...)
object |
An object inheriting from class |
... |
Additional parameters are ignored. |
The matrix of expected residuals.