Title: | Vintage Sparse PCA for Semi-Parametric Factor Analysis |
---|---|
Description: | Provides fast spectral estimation of latent factors in random dot product graphs using the vsp estimator. Under mild assumptions, the vsp estimator is consistent for (degree-corrected) stochastic blockmodels, (degree-corrected) mixed-membership stochastic blockmodels, and degree-corrected overlapping stochastic blockmodels. |
Authors: | Karl Rohe [aut], Muzhe Zeng [aut], Alex Hayes [aut, cre, cph], Fan Chen [aut] |
Maintainer: | Alex Hayes <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2 |
Built: | 2025-01-04 05:57:36 UTC |
Source: | https://github.com/rohelab/vsp |
Find features most associated with cluster membership
bff(loadings, features, num_best)
loadings |
An |
features |
An |
num_best |
An integer indicating how many of the top features for differentiating between loadings you want. |
See vignette("bff").
An n by k matrix whose [i, j] entry is the ith "most important" feature for cluster j.
Add Z factor loadings to node table of tidygraph
bind_varimax_z(graph, fa, ...)
bind_varimax_y(graph, fa, ...)
bind_svd_u(graph, fa, ...)
bind_svd_v(graph, fa, ...)
graph |
A tidygraph::tbl_graph object. |
fa |
Optionally, a vsp object to extract varimax loadings from. If you do not pass a vsp object, one will be created. |
... |
Arguments passed on to
|
The same graph object with columns factor1, ..., factor{rank} in the table of node information.
bind_varimax_y(): Add Y factor loadings to node table of tidygraph
bind_svd_u(): Add left singular vectors to node table of tidygraph
bind_svd_v(): Add right singular vectors to node table of tidygraph
Get left singular vectors in a tibble
get_svd_u(fa, factors = 1:fa$rank)
get_svd_v(fa, factors = 1:fa$rank)
get_varimax_z(fa, factors = 1:fa$rank)
get_varimax_y(fa, factors = 1:fa$rank)
fa |
A |
factors |
The specific columns to index into. The most reliable option here is to index with an integer vector of column indices, but you could also use a character vector if columns have been named. By default returns all factors/singular vectors. |
A tibble::tibble() with one row for each node, one column for each of the requested factors or singular vectors, and an additional id column.
get_svd_v(): Get right singular vectors in a tibble
get_varimax_z(): Get varimax Z factors in a tibble
get_varimax_y(): Get varimax Y factors in a tibble
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 30)
fa
get_svd_u(fa)
get_svd_v(fa)
get_varimax_z(fa)
get_varimax_y(fa)
Get most important hubs for each Z factor
get_z_hubs(fa, hubs_per_factor = 10, factors = 1:fa$rank)
get_y_hubs(fa, hubs_per_factor = 10, factors = 1:fa$rank)
fa |
A |
hubs_per_factor |
The number of important nodes to get per
latent factor. Defaults to |
factors |
The specific columns to index into. The most reliable option here is to index with an integer vector of column indices, but you could also use a character vector if columns have been named. By default returns all factors/singular vectors. |
A tibble::tibble() where each row corresponds to a single hub, and three columns:
id: Node id of hub node
factor: Which factor that node is a hub for. Nodes can be hubs of multiple factors.
loading: The actual value of the hub's factor loading for that factor.
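As a rough illustration of the output shape described above, the following base R sketch picks the top rows per column of a small made-up loadings matrix. Ranking by absolute loading is an assumption made for this example, not a statement about how get_z_hubs() ranks nodes, and top_hubs() and the toy matrix are hypothetical.

```r
# Hypothetical helper: top `hubs_per_factor` rows for each column of a
# loadings matrix, returned as one row per (hub, factor) pair.
# Assumption: hubs are ranked by absolute loading; the package may differ.
top_hubs <- function(Z, hubs_per_factor = 2) {
  ids <- if (is.null(rownames(Z))) as.character(seq_len(nrow(Z))) else rownames(Z)
  per_factor <- lapply(seq_len(ncol(Z)), function(j) {
    idx <- order(abs(Z[, j]), decreasing = TRUE)[seq_len(hubs_per_factor)]
    data.frame(
      id = ids[idx],
      factor = j,
      loading = unname(Z[idx, j]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per_factor)
}

# Made-up loadings for four nodes and two factors
Z <- matrix(
  c(0.9, 0.1, 0.0, 0.5,
    0.0, 0.8, 0.7, 0.1),
  ncol = 2,
  dimnames = list(c("a", "b", "c", "d"), NULL)
)

hubs <- top_hubs(Z, hubs_per_factor = 2)
```

A node would appear in several rows of the result if it ranked highly for more than one factor.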
get_y_hubs(): Get most important hubs for each Y factor
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 30)
fa
get_z_hubs(fa)
get_y_hubs(fa)
When the IPR (inverse participation ratio) for a given singular vector is O(1) rather than O(1 / sqrt(n)), this can indicate that the singular vector is localizing on a small subset of nodes. Localization often indicates overfitting. If you see IPR values that are not close to zero (judging what counts as "close to zero" takes some experience), you need to do some further investigation to see whether you have localization and whether that localization corresponds to overfitting. Note, however, that not all localization is overfitting.
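To make the contrast between localized and delocalized vectors concrete, here is a minimal base R sketch. It assumes one common definition of the inverse participation ratio, the sum of fourth powers of a unit-norm vector; the formula used inside plot_ipr_pairs() is not shown in this manual and may differ, so the exact rate for a delocalized vector depends on the definition. The helper ipr() is hypothetical.

```r
# Assumed definition, for illustration only: sum of fourth powers of a
# unit-norm vector. The package's own formula may differ.
ipr <- function(x) {
  x <- x / sqrt(sum(x^2))  # rescale to unit Euclidean norm
  sum(x^4)
}

n <- 1000
delocalized <- rep(1, n)          # mass spread evenly over all nodes
localized <- c(1, rep(0, n - 1))  # all mass on a single node

ipr_delocalized <- ipr(delocalized)  # shrinks toward zero as n grows
ipr_localized <- ipr(localized)      # stays at order one
```

Under this assumed definition the evenly spread vector gives a value close to zero, while the vector concentrated on one node does not, which is the pattern the paragraph above asks you to look for.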
plot_ipr_pairs(fa)
fa |
A |
A tibble::tibble() with one row for each node, one column for each of the requested factors or singular vectors, and an additional id column.
Plot the mixing matrix B
plot_mixing_matrix(fa)
fa |
A |
A tibble::tibble() with one row for each node, one column for each of the requested factors or singular vectors, and an additional id column.
To avoid overplotting, plots data for a maximum of 1000 nodes. If there are more than 1000 nodes, samples 1000 nodes randomly with probability proportional to row norms (i.e. nodes with embeddings larger in magnitude are more likely to be sampled).
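The subsampling rule above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the embedding matrix is made up, the Euclidean norm is assumed, and sampling without replacement is assumed.

```r
set.seed(27)  # only to make the illustration reproducible

# Made-up embedding: 5000 nodes in 3 dimensions
Z <- matrix(rnorm(5000 * 3), ncol = 3)
max_nodes <- 1000

# Euclidean norm of each row (assumed choice of norm)
row_norms <- sqrt(rowSums(Z^2))

# Subsample only when there are too many nodes; rows with larger norms
# are more likely to be kept
if (nrow(Z) > max_nodes) {
  keep <- sample(nrow(Z), size = max_nodes, prob = row_norms)
} else {
  keep <- seq_len(nrow(Z))
}

Z_plot <- Z[keep, , drop = FALSE]
```

Because selection is weighted by norm, the plotted subset over-represents nodes far from the origin, which is worth keeping in mind when reading the pairs plots.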
plot_varimax_z_pairs(fa, factors = 1:min(5, fa$rank), ...)
plot_varimax_y_pairs(fa, factors = 1:min(5, fa$rank), ...)
plot_svd_u(fa, factors = 1:min(5, fa$rank))
plot_svd_v(fa, factors = 1:min(5, fa$rank))
fa |
A |
factors |
The specific columns to index into. The most reliable option here is to index with an integer vector of column indices, but you could also use a character vector if columns have been named. By default returns all factors/singular vectors. |
... |
Arguments passed on to
|
A ggplot2::ggplot() plot or GGally::ggpairs() plot.
plot_varimax_y_pairs(): Create a pairs plot of select Y factors
plot_svd_u(): Create a pairs plot of select left singular vectors
plot_svd_v(): Create a pairs plot of select right singular vectors
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 3)
plot_varimax_z_pairs(fa)
plot_varimax_y_pairs(fa)
plot_svd_u(fa)
plot_svd_v(fa)
screeplot(fa)
plot_mixing_matrix(fa)
plot_ipr_pairs(fa)
Create a screeplot from a factor analysis object
## S3 method for class 'vsp_fa'
screeplot(x, ...)
x |
A |
... |
Ignored, included only for consistency with S3 generic. |
A tibble::tibble() with one row for each node, one column for each of the requested factors or singular vectors, and an additional id column.
Give the dimensions of Z factors informative names
set_z_factor_names(fa, names)
set_y_factor_names(fa, names)
fa |
A |
names |
Describe new names for Z/Y factors. |
A new vsp_fa() object, but the column names of Z and the row names of B have been set to names (for set_z_factor_names), and the column names of B and the column names of Y have been set to names (for set_y_factor_names).
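The renaming described above can be pictured on plain matrices. The sketch below uses made-up stand-ins for Z, B and Y and sets dimnames directly; it only mirrors the documented effect and does not call the package. The names in z_names and y_names are hypothetical.

```r
# Made-up stand-ins for the Z, B and Y components of a factor analysis
Z <- matrix(0, nrow = 4, ncol = 2)
B <- diag(2)
Y <- matrix(0, nrow = 3, ncol = 2)

z_names <- c("senders", "lurkers")       # hypothetical Z factor names
y_names <- c("recipients", "broadcast")  # hypothetical Y factor names

# Documented effect of set_z_factor_names(): columns of Z, rows of B
colnames(Z) <- z_names
rownames(B) <- z_names

# Documented effect of set_y_factor_names(): columns of B, columns of Y
colnames(B) <- y_names
colnames(Y) <- y_names
```

After both steps, the row names of B line up with the columns of Z and the column names of B line up with the columns of Y.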
set_y_factor_names(): Give the dimensions of Y factors informative names
This code implements TODO.
vsp(x, rank, ...)

## Default S3 method:
vsp(x, rank, ...)

## S3 method for class 'matrix'
vsp(
  x,
  rank,
  ...,
  center = FALSE,
  recenter = FALSE,
  degree_normalize = TRUE,
  renormalize = FALSE,
  tau_row = NULL,
  tau_col = NULL,
  kaiser_normalize_u = FALSE,
  kaiser_normalize_v = FALSE,
  rownames = NULL,
  colnames = NULL,
  match_columns = TRUE
)

## S3 method for class 'Matrix'
vsp(
  x,
  rank,
  ...,
  center = FALSE,
  recenter = FALSE,
  degree_normalize = TRUE,
  renormalize = FALSE,
  tau_row = NULL,
  tau_col = NULL,
  kaiser_normalize_u = FALSE,
  kaiser_normalize_v = FALSE,
  rownames = NULL,
  colnames = NULL,
  match_columns = TRUE
)

## S3 method for class 'dgCMatrix'
vsp(
  x,
  rank,
  ...,
  center = FALSE,
  recenter = FALSE,
  degree_normalize = TRUE,
  renormalize = FALSE,
  tau_row = NULL,
  tau_col = NULL,
  kaiser_normalize_u = FALSE,
  kaiser_normalize_v = FALSE,
  rownames = NULL,
  colnames = NULL,
  match_columns = TRUE
)

## S3 method for class 'igraph'
vsp(x, rank, ..., edge_weights = NULL)
x |
Either a graph adjacency matrix, igraph::igraph or
tidygraph::tbl_graph. If |
rank |
The number of factors to calculate. |
... |
These dots are for future extensions and must be empty. |
center |
Should the adjacency matrix be row and column centered?
Defaults to |
recenter |
Should the varimax factors be re-centered around the
original factor means? Only used when |
degree_normalize |
Should the regularized graph laplacian be used instead of the
raw adjacency matrix? Defaults to |
renormalize |
Should the regularized graph laplacian be used instead of the
raw adjacency matrix? Defaults to |
tau_row |
Row regularization term. Default is |
tau_col |
Column regularization term. Default is |
kaiser_normalize_u |
Whether or not to use Kaiser normalization
when rotating the left singular vectors |
kaiser_normalize_v |
Whether or not to use Kaiser normalization
when rotating the right singular vectors |
rownames |
Character vector of row names of |
colnames |
Character vector of column names of |
match_columns |
Should the columns of |
edge_weights |
When |
Sparse SVDs use RSpectra for performance.
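For readers unfamiliar with the regularized graph Laplacian mentioned under degree_normalize, tau_row and tau_col, here is a minimal base R sketch on a made-up adjacency matrix. It assumes the usual form L = D_row^{-1/2} A D_col^{-1/2} with the regularizer added to the degrees, and it assumes the mean degree as the regularizer; neither the exact formula nor the defaults used by vsp() are stated in this manual.

```r
# Made-up 4 x 4 symmetric adjacency matrix
A <- matrix(
  c(0, 1, 1, 0,
    1, 0, 1, 0,
    1, 1, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE
)

row_deg <- rowSums(A)
col_deg <- colSums(A)

# Assumption: regularize with the mean degree. The package defaults for
# tau_row and tau_col are not stated here and may differ.
tau_row <- mean(row_deg)
tau_col <- mean(col_deg)

# L = D_row^{-1/2} A D_col^{-1/2}, with tau added to the degrees
L <- diag(1 / sqrt(row_deg + tau_row)) %*% A %*% diag(1 / sqrt(col_deg + tau_col))
```

Adding tau to the degrees keeps low-degree nodes from receiving very large weights, which is the usual motivation for regularizing before taking the SVD.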
An object of class vsp. TODO: Details
library(LRMF3)
vsp(ml100k, rank = 2)
vsp_fa objects are a subclass of LRMF3::fa_like(), with additional fields u, d, v, transformers, R_U, and R_V.
vsp_fa(u, d, v, Z, B, Y, transformers, R_U, R_V, rownames = NULL, colnames = NULL)
u |
A |
d |
A |
v |
A |
Z |
A matrix of embeddings for each observation. |
B |
A mixing matrix describing how observation embeddings and topics interact. Does not have to be diagonal! |
Y |
A matrix describing the compositions of various topics or factors. |
transformers |
A list of transformations from the |
R_U |
Varimax rotation matrix used to transform |
R_V |
Varimax rotation matrix used to transform |
rownames |
Identifying names for each row of the original
data. Defaults to |
colnames |
Identifying names for each column of the original
data. Defaults to |
A svd_fa object.
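One way to see how the fields u, d, v, R_U, R_V, Z, B and Y can fit together is the base R sketch below. It uses a dense svd() and stats::varimax() on made-up data, and it assumes the scaling Z = sqrt(n) U R_U, Y = sqrt(d) V R_V, B = R_U' D R_V / sqrt(n d). That scaling is an assumption for illustration; this manual does not state the package's exact construction.

```r
set.seed(1)

# Made-up 20 x 10 binary data matrix
n <- 20
d <- 10
k <- 2
A <- matrix(rbinom(n * d, 1, 0.3), nrow = n)

# Rank-k truncated SVD (dense here; the package uses RSpectra for sparse input)
s <- svd(A, nu = k, nv = k)
u <- s$u
v <- s$v
dk <- s$d[seq_len(k)]

# Varimax rotation matrices for the left and right singular vectors
R_U <- stats::varimax(u, normalize = FALSE)$rotmat
R_V <- stats::varimax(v, normalize = FALSE)$rotmat

# Assumed scaling of the rotated factors and the mixing matrix
Z <- sqrt(n) * u %*% R_U
Y <- sqrt(d) * v %*% R_V
B <- t(R_U) %*% diag(dk) %*% R_V / sqrt(n * d)

# Because R_U and R_V are orthogonal, Z B Y' equals the rank-k SVD approximation
low_rank <- u %*% diag(dk) %*% t(v)
max_err <- max(abs(Z %*% B %*% t(Y) - low_rank))
```

The rotation changes how the low-rank approximation is split into factors, not the approximation itself, which is why B generally stops being diagonal after rotation.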
Perform varimax rotation on a low rank matrix factorization
## S3 method for class 'svd_like'
vsp(
  x,
  rank,
  ...,
  centerer = NULL,
  scaler = NULL,
  recenter = FALSE,
  renormalize = FALSE,
  kaiser_normalize_u = FALSE,
  kaiser_normalize_v = FALSE,
  rownames = NULL,
  colnames = NULL,
  match_columns = TRUE
)
x |
Either a graph adjacency matrix, igraph::igraph or
tidygraph::tbl_graph. If |
rank |
The number of factors to calculate. |
... |
These dots are for future extensions and must be empty. |
centerer |
TODO |
scaler |
TODO |
recenter |
Should the varimax factors be re-centered around the
original factor means? Only used when |
renormalize |
Should the regularized graph laplacian be used instead of the
raw adjacency matrix? Defaults to |
kaiser_normalize_u |
Whether or not to use Kaiser normalization
when rotating the left singular vectors |
kaiser_normalize_v |
Whether or not to use Kaiser normalization
when rotating the right singular vectors |
rownames |
Character vector of row names of |
colnames |
Character vector of column names of |
match_columns |
Should the columns of |
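The Kaiser normalization options in the table above can be illustrated with stats::varimax(), whose normalize argument rescales each row to unit length before rotating and restores the original row lengths afterwards. The sketch below reproduces that by hand on made-up data. Whether vsp() passes kaiser_normalize_u and kaiser_normalize_v through to stats::varimax() in exactly this way is an assumption.

```r
set.seed(42)

# Made-up 50 x 3 matrix standing in for a block of singular vectors
x <- matrix(rnorm(50 * 3), ncol = 3)

# Kaiser normalization as done by stats::varimax(normalize = TRUE)
built_in <- unclass(stats::varimax(x, normalize = TRUE)$loadings)

# The same thing by hand: scale rows to unit length, rotate, scale back
row_len <- sqrt(rowSums(x^2))
x_unit <- x / row_len
rot <- stats::varimax(x_unit, normalize = FALSE)
by_hand <- (x_unit %*% rot$rotmat) * row_len

max_diff <- max(abs(built_in - by_hand))
```

Row normalization gives every row equal weight in the rotation criterion, so rows with small norms influence the rotation as much as rows with large norms.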
library(LRMF3)
library(RSpectra)
s <- svds(ml100k, k = 2)
mf <- as_svd_like(s)
fa <- vsp(mf, rank = 2)