Title: | Estimate Graph Dimension using Cross-Validated Eigenvalues |
---|---|
Description: | Cross-validated eigenvalues are estimated by splitting a graph into two parts, the training and the test graph. The training graph is used to estimate eigenvectors, and the test graph is used to evaluate the correlation between the training eigenvectors and the eigenvectors of the test graph. The correlations follow a simple central limit theorem that can be used to estimate graph dimension via hypothesis testing, see Chen et al. (2021) <arXiv:2108.03336> for details. |
Authors: | Fan Chen [aut] , Alex Hayes [cre, aut, cph] , Karl Rohe [aut] |
Maintainer: | Alex Hayes <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0.9000 |
Built: | 2024-11-05 06:14:20 UTC |
Source: | https://github.com/rohelab/gdim |
Estimate graph dimension via eigenvalue cross-validation (EigCV).
A graph has dimension k if the first k eigenvectors of its adjacency matrix are correlated with its population eigenspace, and the others are not.
Edge bootstrapping sub-samples the edges of the graph (without replacement).
Edge splitting separates the edges into a training part and a testing part.
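The splitting idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the internals of eigcv(): the two-block graph, the Poisson edge counts, and the names P, block, upper and test_eigs are all hypothetical, and the normalization that turns the test values into z-scores is omitted.

```r
set.seed(27)

# Hypothetical two-block graph with Poisson edge counts (illustration only).
n <- 200
block <- rep(1:2, each = n / 2)
B <- matrix(c(0.40, 0.05,
              0.05, 0.40), 2, 2)
P <- B[block, block]  # n x n matrix of expected edge counts

# Sample the upper triangle, then symmetrize; no self-loops.
upper <- matrix(rpois(n * n, lambda = as.vector(P)), n, n)
upper[lower.tri(upper, diag = TRUE)] <- 0
A <- upper + t(upper)

# Edge splitting: each edge lands in the test graph with
# probability test_portion, otherwise in the training graph.
test_portion <- 0.1
test_upper <- matrix(
  rbinom(n * n, size = as.vector(upper), prob = test_portion),
  n, n
)
A_test <- test_upper + t(test_upper)
A_train <- A - A_test

# Leading eigenvectors of the training graph.
k_max <- 5
X <- eigen(A_train, symmetric = TRUE)$vectors[, seq_len(k_max)]

# Evaluate each training eigenvector on the test graph
# (quadratic form x' A_test x), rescaled by test_portion.
test_eigs <- colSums(X * (A_test %*% X)) / test_portion
round(test_eigs, 1)
```

In this sketch the first two test values should be large, since the simulated graph has two blocks, while the remaining ones should hover near zero because the corresponding training eigenvectors mostly capture noise.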
eigcv(
  A,
  k_max,
  ...,
  num_bootstraps = 10,
  test_portion = 0.1,
  alpha = 0.05,
  method = c("none", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr"),
  laplacian = FALSE,
  regularize = TRUE
)
A |
The adjacency matrix of the graph. Must be non-negative and integer valued. |
k_max |
The maximum dimension of the graph to consider. This many
eigenvectors are computed. Should be a non-negative integer that is small
relative to the dimensions of A. |
... |
Ignored. |
num_bootstraps |
The number of times to bootstrap the graph. Because
cross-validated eigenvalues are based on a random graph split, they
are themselves random. Repeatedly computing cross-validated eigenvalues
for different sample splits smooths away some of the
randomness due to the graph splits. A small number of bootstraps
(3 to 10) usually suffices. Defaults to 10. |
test_portion |
The portion of the graph to put into the test graph,
as opposed to the training graph. Defaults to 0.1. |
alpha |
Significance level for hypothesis tests. Each dimension
|
method |
Method to adjust p-values for multiple testing. Must be
one of "none", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", or "fdr". |
laplacian |
Logical value indicating whether to compute cross-validated
eigenvalues for the degree-normalized graph Laplacian rather than the
graph adjacency matrix. Experimental and should be used with caution.
Defaults to FALSE. |
regularize |
Only applicable when |
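The alpha and method arguments describe one hypothesis test per dimension with an optional multiple-testing adjustment. The method names match those accepted by base R's p.adjust(), so the adjustment step can be sketched as below. The z-scores are made-up numbers, and the one-sided normal p-value is an assumption about the form of the test rather than a statement of what eigcv() does internally.

```r
# Hypothetical z-scores for dimensions 1 to 5 (made-up, illustration only).
z <- c(8.1, 5.3, 3.9, 0.4, -0.7)
alpha <- 0.05

# One-sided normal p-values: an assumption about the form of the test.
p <- pnorm(z, lower.tail = FALSE)

# Adjust for multiple testing with one of the listed methods.
p_adj <- p.adjust(p, method = "holm")

# Dimensions whose adjusted p-value falls below alpha.
significant <- which(p_adj < alpha)
significant
```

With method = "none" the adjusted p-values equal the raw ones, so each dimension is tested at level alpha on its own.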
An eigcv object, which is a list with the following named elements.
estimated_dimension: inferred graph dimension.
summary: summary table of the tests.
num_bootstraps: number of bootstraps performed.
test_portion: graph splitting probability used.
alpha: significance level of each test.
library(gdim)
library(fastRG)

set.seed(27)

# mixing matrix for a 5-block stochastic blockmodel
B <- matrix(0.1, 5, 5)
diag(B) <- 0.3

model <- sbm(
  n = 1000,
  k = 5,
  B = B,
  expected_degree = 40,
  poisson_edges = FALSE,
  allow_self_loops = FALSE
)

A <- sample_sparse(model)

eigs <- eigcv(A, k_max = 10)
eigs

plot(eigs, type = "z-score") # default
plot(eigs, type = "adjacency")
plot(eigs, type = "laplacian")
Plot cross-validated eigenvalues
## S3 method for class 'eigcv'
plot(x, type = c("z-score", "adjacency", "laplacian"), threshold = 2, ...)
x |
An eigcv object. |
type |
Specifies what to plot. Must be one of the following options:
"z-score", "adjacency", "laplacian". |
threshold |
Only used when |
... |
Ignored. |
A ggplot2 object.
library(gdim)
library(fastRG)

set.seed(27)

# mixing matrix for a 5-block stochastic blockmodel
B <- matrix(0.1, 5, 5)
diag(B) <- 0.3

model <- sbm(
  n = 1000,
  k = 5,
  B = B,
  expected_degree = 40,
  poisson_edges = FALSE,
  allow_self_loops = FALSE
)

A <- sample_sparse(model)

eigs <- eigcv(A, k_max = 10)
eigs

plot(eigs, type = "z-score") # default
plot(eigs, type = "adjacency")
plot(eigs, type = "laplacian")
Print cross-validated eigenvalues
## S3 method for class 'eigcv'
print(x, ...)
x |
An eigcv object. |
... |
Ignored. |
x, but invisibly.
library(gdim)
library(fastRG)

set.seed(27)

# mixing matrix for a 5-block stochastic blockmodel
B <- matrix(0.1, 5, 5)
diag(B) <- 0.3

model <- sbm(
  n = 1000,
  k = 5,
  B = B,
  expected_degree = 40,
  poisson_edges = FALSE,
  allow_self_loops = FALSE
)

A <- sample_sparse(model)

eigs <- eigcv(A, k_max = 10)
eigs

plot(eigs, type = "z-score") # default
plot(eigs, type = "adjacency")
plot(eigs, type = "laplacian")