Topological data analysis of human brain networks through order statistics

  • Soumya Das,

    Roles Formal analysis, Methodology, Validation, Writing – original draft

    Affiliation Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States of America

  • D. Vijay Anand,

    Roles Visualization, Writing – original draft

    Affiliation Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States of America

  • Moo K. Chung

    Roles Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    mkchung@wisc.edu

    Affiliation Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States of America

Abstract

Understanding the common topological characteristics of the human brain network across a population is central to understanding brain functions. The abstraction of the human connectome as a graph has been pivotal in gaining insights into the topological properties of the brain network. The development of group-level statistical inference procedures for brain graphs that account for heterogeneity and randomness remains a difficult task. In this study, we develop a robust statistical framework based on persistent homology using order statistics for analyzing brain networks. The use of order statistics greatly simplifies the computation of the persistent barcodes. We validate the proposed methods using comprehensive simulation studies and subsequently apply them to resting-state functional magnetic resonance images. We found a statistically significant topological difference between the male and female brain networks.

Introduction

Modeling the human brain connectome as graphs has become the cornerstone of neuroscience, enabling an efficient abstraction of the brain regions and their interactions [1, 2]. Graphs offer a simple construct with only a set of nodes and edges to describe the connectivity of the brain network [3]. The generalizability of graph representation allows one to obtain quantitative measures across multiple spatio-temporal scales ranging from the node level up to the whole network level [4, 5]. To build the graph representation of brain networks, the whole brain is usually parcellated into hundreds of disjoint regions, which serve as nodes, and the edges are associated with weights that indicate the strength of connection between the brain regions [6]. The graph theory based models provide reliable measures such as small-worldness, modularity, centrality and hubs [7–9]. However, these measures are often affected by the choice of arbitrary thresholds on the edge weights, which makes comparisons across networks difficult [10, 11]. To overcome this issue, topological data analysis (TDA) has emerged as a powerful method to systematically extract information from hierarchical layers of abstraction [12–15].

Persistent homology (PH), one of the TDA techniques, provides a coherent framework for obtaining topological invariants or features such as connected components and cycles at different spatial resolutions [16–19]. These topological invariants are often used to provide robust quantification measures to assess the topological similarity between networks [6]. The persistent barcodes are mostly represented as persistent landscapes or diagrams, and their distributions are used to compute a topological distance measure [20]. The PH based topological distances are found to consistently outperform traditional graph based metrics [21]. The main idea of applying PH to brain networks is to generate a sequence of nested networks over every possible threshold through a graph filtration, which builds the hierarchical structure of the brain networks at multiple scales [10, 22–24].

In the graph filtration, a series of nested graphs containing topological information at different scales is produced. During the graph filtration, some topological features may live longer, whereas others die quickly. The filtration process tracks the birth, death and persistence of the topological features. The lifespans or persistence of these features are directly related to the topological properties of networks. The collection of intervals from births to deaths that define the persistences is called the barcode, which characterizes the topology of an underlying dataset [14]. The persistent diagram displays the paired births and deaths as scatter points [16, 20, 25–29]. The Betti curves, which count the number of such features over filtrations, provide comprehensive visualizations of these intervals [6]. Thus, it is instructive to develop a statistical inference procedure using the persistent barcodes in order to compare across different groups and achieve meaningful inferences. This requires a statistical version of TDA [27, 30]. [30] worked on computing confidence bands through bootstraps. [27] introduced persistent landscapes, which lie in a vector space where the sample mean and variance can be computed, and thus enable a proper statistical inference. [31] worked on hypothesis testing based on Gaussian kernel smoothing of persistent diagrams. Analogous to the persistent barcodes, there are also stochastic versions referred to as the expected persistent barcodes. However, computing them requires complex theoretical constructs and they are generally approximated [32–34].

Since the real brain networks are often affected by heterogeneity and intrinsic randomness [35, 36], it is challenging to build a coherent statistical framework to transform these topological features as quantitative measures to compare across different brain networks by averaging or matching [37]. The brain networks are inherently noisy which makes it even harder to establish similarity across networks. Thus, there is a need to develop a statistical model that accounts for the randomness and provides consistent results across networks. The statistical models based on the distributions are expected to be more robust and less affected by the presence of outliers. To this end, we use the concept of random graph to analyze brain networks across a population.

Random graphs have been investigated by many authors [38–42]. A graph whose features related to nodes and edges are determined in a random fashion is called a random graph. The theory of random graphs lies at the intersection between graph theory and probability theory. Random graphs are usually described using a probability distribution or a stochastic process that generates them [43, 44]. The homology of random graphs has been studied by Kahle in particular. [45] investigated the connectivity of the neighborhood complex of a random graph. [46] studied the expected topological properties of Rips complexes built on randomly distributed points in $\mathbb{R}^d$. [47] worked on the central limit theorem for Betti numbers of random simplicial complexes. Random graphs are often encountered in graphical models, which build probabilistic models on the conditional dependency structures between nodes [48, 49]. However, topology is rarely investigated in graphical models.

In this paper, we propose a more adaptable random graph model for brain networks. We consider a random complete graph, where all the nodes are connected and the edge weights are randomly drawn from a continuous distribution. The complete graph model makes building the graph filtration straightforward [22, 37]. We then compute the expected 0D and 1D barcodes through the order statistics [50–55]. The use of order statistics in computing persistent homology features such as persistent barcodes and Betti numbers can drastically speed up the computation. Further, we propose the expected topological loss (ETL), which quantifies the difference between the 0D and 1D barcodes obtained through order statistics. We use the ETL as a test statistic to determine the topological similarity and dissimilarity between networks. The proposed random graph model and corresponding ETL methods are validated using extensive simulation studies with the ground truths. Subsequently, the method is applied to the resting-state functional magnetic resonance images (rs-fMRI) of the human brain.

Materials and methods

Data

We considered a resting-state fMRI dataset collected as part of the Human Connectome Project (HCP) [56, 57]. The dataset consisted of fMRI scans of 400 subjects (168 males and 232 females) over approximately 14.5 minutes using a gradient-echoplanar imaging sequence with 1200 time points [24, 37]. Informed consent was obtained from all participants by the Washington University in St. Louis institutional review board [58]. The ethics approval for using the HCP data was obtained from the local ethics committee of University of Wisconsin-Madison.

The human brain can be viewed as a weighted network with its neurons as nodes. However, considering the high number of neurons ($\sim 10^{12}$) in a human brain, the traditional brain imaging studies parcellate the brain into a manageable number of mutually exclusive regions [59–61]. These regions are then considered as nodes, while the strengths of connectivity between these regions are the edge weights. For the considered dataset, the Automated Anatomical Labeling (AAL) template was employed to parcellate the brain volume into 116 non-overlapping anatomical regions [62], and the fMRI signals across voxels within each brain parcellation were averaged. This resulted in 116 average fMRI time series with 1200 time points for each subject. Further, we removed fMRI volumes with significant head movements [63] because such movements are shown to produce spatial artifacts in functional connectivity [64–66].

Simplicial complex

A simplex is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions. A 0-simplex is a point, a 1-simplex is a line segment, and a 2-simplex is a triangle. In general, a k-simplex $S_k$ is the convex hull of k + 1 affinely independent points $v_0, \ldots, v_k$:

$$S_k = \left\{ \sum_{i=0}^{k} \lambda_i v_i \;:\; \lambda_i \geq 0, \ \sum_{i=0}^{k} \lambda_i = 1 \right\}.$$

A simplicial complex $\mathcal{K}$ is a set of simplices that satisfies the following two conditions. (1) Every face of a simplex from $\mathcal{K}$ is also in $\mathcal{K}$. (2) The non-empty intersection of any two simplices is a shared face [67]. We call a simplicial complex consisting of up to k-simplices a k-skeleton. Since graphs are a collection of nodes (0-simplices) and edges (1-simplices), they are 1-skeleton simplicial complexes. In a 1-skeleton, 0-dimensional (0D) holes are connected components while the 1-dimensional (1D) holes are cycles. A cycle or loop in a graph is a path that starts and ends at the same node with no other node repeated along the path. There is no higher dimensional homology beyond dimensions 0 and 1 in a 1-skeleton [37].

Graph filtration

The brain networks are traditionally represented and analyzed as a graph, a 1-skeleton consisting of only nodes and edges [59–62, 68]. The main focus of functional brain network analysis is quantifying and modeling the pairwise interaction between brain regions, which is usually called the effective connectivity [69–72]. Thus, we limited our algebraic representation of brain networks to graphs. Compared to the vast body of studies analyzing brain networks as graphs, studies modeling them as higher order simplicial complexes are few [17, 73–75]. We used the graph filtration, which iteratively builds nested subgraphs of the original graph in a hierarchical manner [22]. Currently, this is the most often used filtration in analyzing brain networks due to its simplicity.

Consider a weighted graph G(p, w), where p is the number of nodes and w = (w1, …, wq) is a q-dimensional vector of edge weights with $q = (p^2 - p)/2$. The binary graph Gϵ(p, wϵ) of G(p, w) has binary edge weights wϵ,i:

$$w_{\epsilon,i} = \begin{cases} 1 & \text{if } w_i > \epsilon, \\ 0 & \text{otherwise.} \end{cases}$$

The binary network Gϵ(p, wϵ) is a 1-skeleton. In a 1-skeleton, 0-dimensional (0D) holes are connected components while the 1-dimensional (1D) holes are cycles. There is no higher dimensional homology beyond dimensions 0 and 1. The number of connected components and the number of independent cycles in a graph are referred to as the 0th Betti number (β0) and 1st Betti number (β1), respectively. For 1-skeletons, there is an efficient 1D filtration method called the graph filtration, which filters at the edge weights from −∞ to ∞ in a sequentially increasing manner [6, 37]. The graph filtration of G is defined as a collection of nested binary networks

$$G_{\epsilon_0} \supset G_{\epsilon_1} \supset \cdots \supset G_{\epsilon_k}$$

over increasing filtration values ϵ0 < ϵ1 < ⋯ < ϵk. We used the edge weights as the filtration values to make the graph filtration unique [6].

During the graph filtration, edges are deleted one at a time from the lowest edge weight to the highest. The deletion of an edge splits a connected component into at most two. Thus, the number of connected components (β0) stays the same or increases at most by one. The Euler characteristic χ of a graph with p nodes and q edges is given by [76]

$$\chi = \beta_0 - \beta_1 = p - q.$$

Since each step of the filtration removes exactly one edge, the change of the Euler characteristic Δχ over the filtration is given by

$$\Delta\chi = \Delta\beta_0 - \Delta\beta_1 = 1,$$

where the change of β0 is Δβ0 = 0 or 1. Subsequently, the change of β1 is Δβ1 = −1 or 0. The number of cycles decreases at most by 1 [6, 77].
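As a minimal illustrative sketch (not the authors' MATLAB implementation), the Betti numbers of a binary graph at a given filtration value can be computed with a union-find structure, using the fact that β0 − β1 equals the number of nodes minus the number of retained edges. The function name `betti_numbers`, the edge-list format and the integer toy weights on 4 nodes are assumptions made for this example only.

```python
def betti_numbers(p, edges, eps):
    """Return (beta0, beta1) of the binary graph keeping edges with weight > eps.

    p     : number of nodes, labelled 0..p-1
    edges : list of (i, j, weight) tuples
    """
    parent = list(range(p))

    def find(x):
        # Find the component representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    beta0 = p   # every node starts as its own connected component
    kept = 0    # number of edges that survive the threshold
    for i, j, w in edges:
        if w > eps:
            kept += 1
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                beta0 -= 1
    # Euler characteristic of a graph: beta0 - beta1 = (#nodes) - (#edges).
    beta1 = kept - p + beta0
    return beta0, beta1


# Hypothetical complete graph on 4 nodes with edge weights 1..6.
toy_edges = [(0, 3, 1), (1, 3, 2), (2, 3, 3), (0, 1, 4), (0, 2, 5), (1, 2, 6)]
filtration = [float("-inf"), 1, 2, 3, 4, 5, 6]
betti_path = [betti_numbers(4, toy_edges, eps) for eps in filtration]
print(betti_path)
```

Along this toy filtration every edge deletion either raises β0 by one or lowers β1 by one, which is the behavior expected from Δβ0 − Δβ1 = 1.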

Birth-death decomposition

When we increase the filtration value ϵ, either one new connected component appears or one cycle disappears [6]. Once a connected component is born, it never dies, implying an infinite death value. On the other hand, all the cycles are considered to be born at −∞. Therefore, we simply ignore the infinite death values of the connected components and the negative infinite birth values of the cycles and build the computation framework based on only the birth (death) values of the connected components (cycles) [37]. Also, the number of connected components (or cycles) is non-decreasing (or non-increasing) as ϵ increases. Subsequently, the 0D barcode is given by a set of increasing birth values:

$$B(G): \ \epsilon_{b_1} < \epsilon_{b_2} < \cdots < \epsilon_{b_{m_0}},$$

and the 1D barcode is given by a set of increasing death values:

$$D(G): \ \epsilon_{d_1} < \epsilon_{d_2} < \cdots < \epsilon_{d_{m_1}}.$$

By tracing the birth values of connected components and the death values of cycles together, we can characterize the topology of the graph.

The above 0D and 1D barcodes summarize the persistences of connected components and cycles and are often visualized using persistent diagrams [16, 25–27] and Betti curves. The Betti curves plot the Betti numbers with respect to the filtration values. Since the Betti numbers are monotonic, the Betti curve is a step function with a one-unit jump (or drop) at every birth (or death) value. The total number of finite birth values of connected components and the total number of death values of cycles are respectively [37]

$$m_0 = p - 1 \quad \text{and} \quad m_1 = \frac{(p-1)(p-2)}{2}.$$

The numbers of connected components (β0) and cycles (β1) in the complete graph G−∞ are 1 and m1, respectively. We note that every edge weight must be in either the 0D barcode or the 1D barcode, as summarized in the following theorem [37].

Theorem 1 (Birth-death decomposition). The set of 0D birth values B(G) and 1D death values D(G) partition the edge weight vector w such that B(G) ∪ D(G) = w and B(G) ∩ D(G) = ϕ. The cardinalities of B(G) and D(G) are p − 1 and (p − 1)(p − 2)/2, respectively.
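The decomposition in Theorem 1 can be sketched as follows, assuming (as in [37]) that the birth values coincide with the edge weights of the maximum spanning tree, so that Kruskal's algorithm run on descending weights separates births from deaths. The function name `birth_death_decomposition` and the 4-node toy weights are hypothetical and serve only as an illustration.

```python
def birth_death_decomposition(p, edges):
    """Split edge weights into sorted 0D birth values and 1D death values.

    p     : number of nodes, labelled 0..p-1
    edges : list of (i, j, weight) tuples of a connected weighted graph
    """
    parent = list(range(p))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    births, deaths = [], []
    # Kruskal's algorithm on descending weights builds the maximum spanning tree.
    for i, j, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            births.append(w)   # tree edge: deleting it creates a new component
        else:
            deaths.append(w)   # non-tree edge: deleting it destroys a cycle
    return sorted(births), sorted(deaths)


# Hypothetical complete graph on p = 4 nodes with edge weights 1..6.
toy_edges = [(0, 3, 1), (1, 3, 2), (2, 3, 3), (0, 1, 4), (0, 2, 5), (1, 2, 6)]
births, deaths = birth_death_decomposition(4, toy_edges)
print(births, deaths)  # [3, 5, 6] [1, 2, 4]
```

For this toy graph the two sets are disjoint, together contain all six weights, and have cardinalities p − 1 = 3 and (p − 1)(p − 2)/2 = 3, as the theorem states.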

The schematic of graph filtration and birth-death decomposition for a random graph is presented in Fig 1. The cycles we identify using the birth-death decomposition are algebraically independent of each other and hence form a basis for cycles [24, 37]. In the binary graph in Fig 1, there is one cycle consisting of edge weights W(4), W(5) and W(6). The cycle can be algebraically represented as [W(4)] + [W(5)] + [W(6)] with the convention of putting clockwise orientation along the edges. In the complete graph G−∞, there are three independent cycles, denoted C1, C2 and C3.

Fig 1. Schematic of graph filtration and persistent barcodes computation.

We consider a random weighted graph with p = 4 nodes, where the number of edges is q = p(p − 1)/2 = 6. The random edge weights are {W1, W2, …, W6}. We order them using the order statistic as W(1) < W(2) < ⋯ < W(6). We remove each edge of the random graph one at a time in the graph filtration and construct the random birth and death sets of the connected components and cycles, respectively. The Betti-0 (lower right) and Betti-1 (lower left) curves are drawn using the birth and death sets. The blue and green shaded areas represent the areas under Betti-0 and Betti-1 curves. Later, we will consider the area under Betti-0 curve to quantify the curve and construct a test statistic to discriminate between two groups of networks.

https://doi.org/10.1371/journal.pone.0276419.g001

All other cycles can be represented as a linear combination of C1, C2 and C3.

The total number of algebraically independent cycles is the 1st Betti number β1, which is equivalent to the number of death values of cycles. For a complete graph with p nodes, the total number of edges is p(p − 1)/2. In a graph filtration, the total number of birth values of connected components equals the number of edges, p − 1, in the maximum spanning tree. From the birth-death decomposition, the remaining edges contribute to the death values of cycles. The number of remaining edges is [37]

$$\frac{p(p-1)}{2} - (p-1) = \frac{(p-1)(p-2)}{2}.$$

The connected components characterize the modular structure or shape of a network whereas the cycles are loops in a network [24, 78]. In [24], the authors focused specifically on cycles in a brain network as they embed higher order signal transmission paths that provide insights into the functioning of the brain. The presence of more cycles in a network indicates a dense connection with stronger connectivity. The cycles in the brain network not only determine the propagation of information but also control the feedback [79]. The connected components and cycles provide dependent but complementary information about the network.

Wasserstein distance on barcodes

Since there is no higher dimensional homology beyond dimensions 0 and 1 in a 1-skeleton, the 0D and 1D barcodes together can characterize the topology of a network [14]. Therefore, the topological similarity between two such networks can be quantified using a distance metric between the corresponding 0D or 1D barcodes [80]. An often used metric is the Wasserstein distance [23, 81–83]. Let G1(p, u) and G2(p, v) be two networks and the corresponding barcodes (or persistent diagrams) be P1 and P2. Then, the 2-Wasserstein distance on barcodes is given by

$$D_W(P_1, P_2) = \inf_{\tau: P_1 \to P_2} \left( \sum_{x \in P_1} \| x - \tau(x) \|^2 \right)^{1/2}$$

over every possible bijection τ between P1 and P2 [23, 84, 85]. For graph filtrations, barcodes are 1D scatter points. Therefore, the optimal bijection τ matches the points in sorted order, and the distance reduces to the norm of the difference between the sorted birth values of connected components or the sorted death values of cycles [23].

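Theorem 2 Let G1 and G2 be two networks with p nodes and let

$$B(G_k): \ \epsilon^k_{b_1} < \cdots < \epsilon^k_{b_{m_0}}, \qquad D(G_k): \ \epsilon^k_{d_1} < \cdots < \epsilon^k_{d_{m_1}}$$

be the birth and death sets of the network Gk, k = 1, 2. Then, the 2-Wasserstein distance between the 0D barcodes for graph filtration is given by

$$D_{W0}(P_1, P_2) = \left[ \sum_{i=1}^{m_0} \left( \epsilon^1_{b_i} - \epsilon^2_{b_i} \right)^2 \right]^{1/2}$$

and the 2-Wasserstein distance between the 1D barcodes is

$$D_{W1}(P_1, P_2) = \left[ \sum_{i=1}^{m_1} \left( \epsilon^1_{d_i} - \epsilon^2_{d_i} \right)^2 \right]^{1/2}.$$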
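To illustrate why sorting suffices, the following sketch compares the sorted-matching distance with a brute-force minimum over every bijection on a tiny example. The helper names and the birth values are hypothetical; the brute-force search is included only as a check and is not part of the method.

```python
import math
from itertools import permutations


def wasserstein2_sorted(x, y):
    """2-Wasserstein distance between two 1D barcodes via sorted matching."""
    if len(x) != len(y):
        raise ValueError("barcodes must contain the same number of points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(x), sorted(y))))


def wasserstein2_bruteforce(x, y):
    """Minimise the matching cost over every bijection (tiny inputs only)."""
    return min(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, perm)))
        for perm in permutations(y)
    )


# Hypothetical birth values of two networks with p = 4 nodes (m0 = 3).
births_1 = [0.3, 0.5, 0.6]
births_2 = [0.6, 0.2, 0.4]
d_sorted = wasserstein2_sorted(births_1, births_2)
d_brute = wasserstein2_bruteforce(births_1, births_2)
print(d_sorted, d_brute)
```

The same function applies unchanged to sorted death values, since both barcodes are sets of points on the real line.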

Expected persistent barcodes of random graph

Consider a random graph $\mathcal{G}(p, W)$, where the edge weights are drawn independently and identically from a distribution function FW. Here p is the number of nodes and W = (W1, …, Wq) is a q-dimensional vector of random weights with $q = (p^2 - p)/2$. The considered graph is complete and its edge weights are drawn randomly from a continuous distribution. To be mathematically precise, the considered random graph is almost surely complete: since the edge weights are drawn from a continuous distribution, the probability of having a zero edge weight is nil. Fig 2 displays weighted brain networks randomly drawn from Beta distributions.

Fig 2. Visualization of simulated brain networks with 116 nodes.

Left: The empirical density functions of simulated edge weights from Beta(2, 5) (top) and Beta(5, 2) (bottom) distributions. Middle: The 116 × 116 correlation matrices constructed using the simulated edge weights. Right: Human brain networks with the simulated edge weights. Since correlation networks are too dense for visualization, we only displayed edges with values below 0.1 and above 0.9.

https://doi.org/10.1371/journal.pone.0276419.g002

If we apply a graph filtration on the random weighted graph $\mathcal{G}(p, W)$, we have a set of random birth values of connected components (or random 0D barcode) and a set of random death values of cycles (or random 1D barcode). Since the notions of random birth and death values are abstract, it is important to turn them into deterministic topological descriptors. One of the simplest ways to turn a random object into a deterministic summary is to consider its average behavior. To that end, we study the expected birth and death values (or expected persistent barcodes) as follows.

Let $\mathcal{G}(p, W)$ be a random graph and its sorted random edge weights be

$$W_{(1)} < W_{(2)} < \cdots < W_{(q)},$$

where the subscript (i) indicates the ith smallest edge weight. For instance, $W_{(1)} = \min_{1 \leq i \leq q} W_i$ is the smallest edge weight while $W_{(q)} = \max_{1 \leq i \leq q} W_i$ is the largest edge weight. Order statistics can be formulated by modeling the indices (i) using random permutations while the actual edge weights are fixed nonrandom quantities. However, in order statistics, the indices (i) themselves are not considered as random but fixed [86, 87]. They simply indicate the order in which the random variables are indexed. Only the content of the ith variable is considered as random. In this study, we will follow the traditional convention in order statistics.

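Let the random birth and death values of the connected components and cycles be

$$B(\mathcal{G}): \ W_{(b_1)} < \cdots < W_{(b_{m_0})}, \qquad D(\mathcal{G}): \ W_{(d_1)} < \cdots < W_{(d_{m_1})},$$

where m0 = p − 1 and m1 = (p − 1)(p − 2)/2. Then, the expected birth and death values are given by

$$\mathbb{E}B(\mathcal{G}): \ \mathbb{E}W_{(b_1)} < \cdots < \mathbb{E}W_{(b_{m_0})}$$

and

$$\mathbb{E}D(\mathcal{G}): \ \mathbb{E}W_{(d_1)} < \cdots < \mathbb{E}W_{(d_{m_1})},$$

where $\mathbb{E}$ indicates the standard expectation operator on a random weight.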

In order to compute the expected birth and death values, we provide an explicit expression for $\mathbb{E}W_{(k)}$, for k = 1, …, q, through Theorem 3 below.

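Theorem 3 Let the edge weights W = {W1, W2, …, Wq} of a random graph $\mathcal{G}(p, W)$ be drawn from a distribution with cumulative distribution function (cdf) FW and probability density function (pdf) fW. Then, the expectation of the kth smallest edge weight W(k) can be approximated by

$$\mathbb{E}W_{(k)} \approx F_W^{-1}\left( \frac{k}{q+1} \right).$$

Proof Since the edge weights W1, W2, …, Wq are drawn from a distribution with a cdf FW and a pdf fW, the pdf of the kth order statistic W(k) is given by

$$f_{W_{(k)}}(w) = \frac{q!}{(k-1)!\,(q-k)!}\, [F_W(w)]^{k-1}\, [1 - F_W(w)]^{q-k}\, f_W(w). \qquad (1)$$

W(k) does not follow a well-known distribution and, therefore, the computation of its mean and variance is difficult. However, [88] showed that the rth sample quantile of {W1, W2, …, Wq} is asymptotically normally distributed:

$$W_{(\lceil qr \rceil)} \sim \mathcal{AN}\left( F_W^{-1}(r), \ \frac{r(1-r)}{q\, [f_W(F_W^{-1}(r))]^2} \right) \qquad (2)$$

for large q, where $\mathcal{AN}$ stands for the asymptotic normal distribution. Thus, the approximate mean and variance of W(k) can be found from (2) by letting r = k/(q + 1):

$$\mathbb{E}W_{(k)} \approx F_W^{-1}\left( \frac{k}{q+1} \right), \qquad \mathbb{V}W_{(k)} \approx \frac{k(q+1-k)}{q\,(q+1)^2\, \left[ f_W\left( F_W^{-1}\left( \frac{k}{q+1} \right) \right) \right]^2}.$$

The variance will be later used in computing confidence intervals.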

Now, we use Theorem 3 and provide expressions for the expected birth and death values in Theorem 4 below.

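Theorem 4 Let $\mathcal{G}(p, W)$ be a random graph, where its edge weights are drawn from a cdf FW, and let $W_{(b_i)}$ and $W_{(d_i)}$ denote its ith smallest birth and death values. Then, the expected birth values of the connected components of $\mathcal{G}(p, W)$ are given by

$$\mathbb{E}W_{(b_i)} \approx F_W^{-1}\left( \frac{b_i}{q+1} \right), \quad i = 1, \ldots, m_0,$$

and the expected death values of the cycles of $\mathcal{G}(p, W)$ are given by

$$\mathbb{E}W_{(d_i)} \approx F_W^{-1}\left( \frac{d_i}{q+1} \right), \quad i = 1, \ldots, m_1.$$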

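The proof follows from Theorem 3. As a special case of Theorem 4, we show that if the edge weights follow the uniform distribution on [0, 1], the expected birth and death values have a simpler and exact form. The expected birth values of the connected components are given by

$$\mathbb{E}W_{(b_i)} = \frac{b_i}{q+1}, \quad i = 1, \ldots, m_0,$$

and the expected death values of the cycles are given by

$$\mathbb{E}W_{(d_i)} = \frac{d_i}{q+1}, \quad i = 1, \ldots, m_1.$$

Indeed, for the uniform distribution we have FW(w) = w and fW(w) = 1 on [0, 1], so the pdf (1) of the kth order statistic simplifies to

$$f_{W_{(k)}}(w) = \frac{q!}{(k-1)!\,(q-k)!}\, w^{k-1} (1-w)^{q-k},$$

which is the pdf of the Beta distribution with parameters k and q + 1 − k. Since the mean of a Beta(k, q + 1 − k) distribution has the exact form k/(q + 1), we have

$$\mathbb{E}W_{(k)} = \frac{k}{q+1}.$$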
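The quantile approximation of the expected order statistics can be checked numerically. The sketch below evaluates it for the uniform case, where it is exact, and for the Exponential(1) distribution, which is not used in this study but is chosen here as an assumption because its inverse cdf and exact order-statistic means (the partial sums 1/q + 1/(q − 1) + ⋯ + 1/(q − k + 1)) are available in closed form. All function names are hypothetical.

```python
import math


def approx_expected_order_stat(k, q, inv_cdf):
    """Approximate E[W_(k)] by F^{-1}(k / (q + 1))."""
    return inv_cdf(k / (q + 1))


def exact_exponential_mean(k, q):
    """Exact E[W_(k)] for q i.i.d. Exponential(1) variables."""
    return sum(1.0 / (q - j) for j in range(k))


q = 45  # number of edges in a complete graph with p = 10 nodes

# Uniform(0, 1): the inverse cdf is the identity, so E[W_(k)] = k / (q + 1).
uniform_means = [approx_expected_order_stat(k, q, lambda r: r) for k in range(1, q + 1)]

# Exponential(1): the inverse cdf is -log(1 - r).
def exp_inv_cdf(r):
    return -math.log(1.0 - r)

errors = [
    abs(approx_expected_order_stat(k, q, exp_inv_cdf) - exact_exponential_mean(k, q))
    for k in range(1, q + 1)
]
print(uniform_means[0], uniform_means[-1])
print(errors[22], errors[-1])  # error at the median (k = 23) and at the maximum (k = q)
```

In this example the approximation is close for central order statistics and noticeably looser for the largest one, which is worth keeping in mind for heavy-tailed edge weight distributions.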

Once the expected birth and death values are computed, we can use them to plot Betti curves. In the example of Fig 3, we generated two random graphs with p = 150 nodes and their edge weights drawn from Beta(2, 2) and Beta(2, 3) distributions. We observe that even a slight change in the distribution visibly affects the topology of a network.

Fig 3. Plots of Betti-0 (left) and Betti-1 (right) curves of random networks with edges drawn from Beta(2, 2) (in dotted red line) and Beta(2, 3) (in solid black line) distributions.

The change in distribution affects the topology of a network.

https://doi.org/10.1371/journal.pone.0276419.g003

Confidence bands on birth and death values

Given a set of n samples from a random graph $\mathcal{G}(p, W)$, we show how to compute the confidence bands on the expected birth and death values. The method is later validated using a simulation study.

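Let the random weights of the n sampled graphs be w1, w2, …, wn, where wi = (wi1, wi2, …, wiq) and $w_{ij} \sim F_W$. From the previous section, we know that W(k) follows an asymptotic Gaussian distribution with its mean and variance being

$$\mu_k = F_W^{-1}\left( \frac{k}{q+1} \right) \quad \text{and} \quad \sigma_k^2 = \frac{k(q+1-k)}{q\,(q+1)^2\, [f_W(\mu_k)]^2}.$$

The density fW is estimated by averaging the n graphs with respect to their weights and using a Gaussian kernel density estimate (KDE). Let the average weight vector be $\bar{w} = (\bar{w}_1, \ldots, \bar{w}_q)$ with $\bar{w}_j = \frac{1}{n} \sum_{i=1}^{n} w_{ij}$. Then, the KDE of the pdf fW is given by [89]

$$\hat{f}_W(w) = \frac{1}{qh} \sum_{j=1}^{q} K\left( \frac{w - \bar{w}_j}{h} \right),$$

where K is the Gaussian kernel with bandwidth h. To estimate $F_W^{-1}$, we first find the empirical cdf of FW based on the averaged weight vector as

$$\hat{F}_W(w) = \frac{1}{q} \sum_{j=1}^{q} I_{\{\bar{w}_j \leq w\}}.$$

Here, IA is an indicator function taking value 1 if the event A is true and 0 otherwise. The inverse cumulative distribution of $\hat{F}_W$ is then given by

$$\hat{F}_W^{-1}(r) = \inf\{ w : \hat{F}_W(w) \geq r \}.$$

Once fW and $F_W^{-1}$ are estimated, we plug the corresponding estimates into μk and $\sigma_k^2$ and obtain the estimates $\hat{\mu}_k$ and $\hat{\sigma}_k^2$. Finally, we calculate the α% confidence intervals as

$$\left( \hat{\mu}_k - z_\alpha \hat{\sigma}_k, \ \hat{\mu}_k + z_\alpha \hat{\sigma}_k \right),$$

where zα is such that $P(-z_\alpha \leq Z \leq z_\alpha) = \alpha/100$ for a standard normal random variable $Z$. For α = 95, we have zα = 1.96.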
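The plug-in procedure can be sketched in a few lines, under several assumptions that are ours and not the authors': the graphs are averaged edge by edge, the bandwidth h = 0.1 is picked arbitrarily, the variance uses r(1 − r)/q with r = k/(q + 1) divided by the squared density estimate, and the sampled graphs are generated as a Uniform(0, 1) weight vector plus Gaussian noise. The function name `confidence_bands` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def confidence_bands(weight_vectors, bandwidth, level=0.95):
    """Return (lower, estimate, upper) for each expected order statistic."""
    n, q = len(weight_vectors), len(weight_vectors[0])
    # Average the n graphs edge by edge, then sort the averaged weights.
    w_bar = sorted(sum(w[j] for w in weight_vectors) / n for j in range(q))

    def kde(x):
        # Gaussian kernel density estimate of f_W evaluated at x.
        total = sum(math.exp(-0.5 * ((x - wj) / bandwidth) ** 2) for wj in w_bar)
        return total / (q * bandwidth * math.sqrt(2.0 * math.pi))

    def inv_ecdf(r):
        # Smallest averaged weight whose empirical cdf is at least r.
        return w_bar[max(0, math.ceil(r * q) - 1)]

    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    bands = []
    for k in range(1, q + 1):
        r = k / (q + 1)
        mu = inv_ecdf(r)
        sigma = math.sqrt(r * (1.0 - r) / q) / kde(mu)
        bands.append((mu - z * sigma, mu, mu + z * sigma))
    return bands


# Hypothetical data: one Uniform(0, 1) weight vector plus Gaussian noise.
rng = random.Random(0)
p, n, noise_sd = 10, 15, 0.02
q = p * (p - 1) // 2
w = [rng.random() for _ in range(q)]
samples = [[wj + rng.gauss(0.0, noise_sd) for wj in w] for _ in range(n)]
bands = confidence_bands(samples, bandwidth=0.1)
print(bands[0], bands[-1])
```

The bands for birth and death values are then read off at the order-statistic indices that correspond to births and deaths.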

Inference on expected birth and death values

Since a graph can be topologically characterized by 0D and 1D barcodes, the topological similarity and dissimilarity between two graphs can be measured using the differences of such barcodes. To quantify these differences, we propose the expected topological loss (ETL) as follows.

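Let $\mathcal{G}_1(p, U)$ and $\mathcal{G}_2(p, V)$ be two random graphs, where the random weights U = {U1, …, Uq} and V = {V1, …, Vq} are drawn from distribution functions FU and FV, respectively. Further, let the expected birth and death values of $\mathcal{G}_1$ be

$$\mathbb{E}U_{(b_1)} < \cdots < \mathbb{E}U_{(b_{m_0})} \quad \text{and} \quad \mathbb{E}U_{(d_1)} < \cdots < \mathbb{E}U_{(d_{m_1})}.$$

Similarly, let the expected birth and death values of $\mathcal{G}_2$ be

$$\mathbb{E}V_{(b_1)} < \cdots < \mathbb{E}V_{(b_{m_0})} \quad \text{and} \quad \mathbb{E}V_{(d_1)} < \cdots < \mathbb{E}V_{(d_{m_1})}.$$

Then, the ETL is given by

$$\mathcal{L}(F_U, F_V) = \sum_{i=1}^{m_0} \left( \mathbb{E}U_{(b_i)} - \mathbb{E}V_{(b_i)} \right)^2 + \sum_{i=1}^{m_1} \left( \mathbb{E}U_{(d_i)} - \mathbb{E}V_{(d_i)} \right)^2. \qquad (3)$$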

In most applications, the distribution functions FU and FV are unknown. In such scenarios, we plug in the corresponding empirical distribution function estimates $\hat{F}_U$ and $\hat{F}_V$ in (3). The ETL is a function of expected 0D and 1D barcodes. The expected 0D barcode can also be viewed as the expected heights of branching in a random merge tree [90–95].

Application of ETL in discriminating networks

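The ETL can be used to topologically discriminate between two groups of brain networks. Let Ω = {Ω1, …, Ωm} and Ψ = {Ψ1, …, Ψn} be two sets consisting of m and n complete networks, each comprising p nodes. Let the empirical distribution functions of the edge weights of the graphs in group Ω be $\hat{F}_{\Omega_1}, \ldots, \hat{F}_{\Omega_m}$ and the expected birth and death values for Ωi be computed from $\hat{F}_{\Omega_i}$, which will be simply denoted as

$$b^{\Omega_i}_1 < \cdots < b^{\Omega_i}_{m_0} \quad \text{and} \quad d^{\Omega_i}_1 < \cdots < d^{\Omega_i}_{m_1}.$$

The averages of the expected birth and death values within the group Ω are given by

$$\bar{b}^{\Omega}_j = \frac{1}{m} \sum_{i=1}^{m} b^{\Omega_i}_j \quad \text{and} \quad \bar{d}^{\Omega}_j = \frac{1}{m} \sum_{i=1}^{m} d^{\Omega_i}_j.$$

Similarly, for the second group Ψ, let the averages of the expected birth and death values be $\bar{b}^{\Psi}_j$ and $\bar{d}^{\Psi}_j$. The test statistic based on ETL to discriminate between groups is then given by

$$\mathcal{L}(\Omega, \Psi) = \sum_{j=1}^{m_0} \left( \bar{b}^{\Omega}_j - \bar{b}^{\Psi}_j \right)^2 + \sum_{j=1}^{m_1} \left( \bar{d}^{\Omega}_j - \bar{d}^{\Psi}_j \right)^2. \qquad (4)$$

A large value of $\mathcal{L}(\Omega, \Psi)$ indicates a significant topological difference between the two groups, whereas a small value suggests that there is no significant topological group difference. Since the probability distribution of the test statistic is unknown, we use the permutation test [96–99]. In this study, we use 100000 permutations for small-scale simulation studies and half a million permutations for large-scale real data.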
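A permutation test of this kind can be sketched as below. This is a simplified illustration and not the authors' implementation: each network's observed sorted birth and death values (from a maximum spanning tree decomposition) stand in for its expected values, the statistic is the sum of squared differences of group-averaged births and deaths, and only 2000 permutations are used. All function names and the simulated Beta-distributed networks are hypothetical.

```python
import random


def birth_death(p, weights):
    """Sorted birth and death values of a complete graph.

    weights : dict mapping node pairs (i, j) with i < j to edge weights
    """
    parent = list(range(p))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    births, deaths = [], []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            births.append(w)
        else:
            deaths.append(w)
    return sorted(births), sorted(deaths)


def random_network(p, a, b, rng):
    """Complete graph with i.i.d. Beta(a, b) edge weights."""
    return {(i, j): rng.betavariate(a, b) for i in range(p) for j in range(i + 1, p)}


def etl_statistic(group1, group2):
    """Squared differences of group-averaged births plus those of deaths."""
    def mean_vector(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    total = 0.0
    for part in (0, 1):  # 0 = births, 1 = deaths
        m1 = mean_vector([net[part] for net in group1])
        m2 = mean_vector([net[part] for net in group2])
        total += sum((x - y) ** 2 for x, y in zip(m1, m2))
    return total


def permutation_pvalue(group1, group2, n_perm, rng):
    """Fraction of random relabelings with a statistic at least as large."""
    observed = etl_statistic(group1, group2)
    pooled = group1 + group2
    m = len(group1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if etl_statistic(pooled[:m], pooled[m:]) >= observed:
            exceed += 1
    return observed, exceed / n_perm


rng = random.Random(0)
p = 10
omega = [birth_death(p, random_network(p, 1, 1, rng)) for _ in range(6)]
psi = [birth_death(p, random_network(p, 5, 2, rng)) for _ in range(6)]
stat, p_value = permutation_pvalue(omega, psi, n_perm=2000, rng=rng)
print(stat, p_value)
```

With only 6 networks per group there are 924 distinct relabelings, so the smallest attainable p-value is limited by the group sizes rather than by the number of permutations.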

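A similar, widely used statistic is the maximum gap statistic. Along the same lines as $\mathcal{L}(\Omega, \Psi)$, and with $\bar{b}_j$ and $\bar{d}_j$ denoting the group averages of the sorted expected birth and death values, the statistic is given by

$$\mathcal{M}(\Omega, \Psi) = \max_{1 \leq j \leq m_0} \left| \bar{b}^{\Omega}_j - \bar{b}^{\Psi}_j \right| + \max_{1 \leq j \leq m_1} \left| \bar{d}^{\Omega}_j - \bar{d}^{\Psi}_j \right|. \qquad (5)$$

We will use $\mathcal{M}(\Omega, \Psi)$ to compare with the ETL statistic in the simulation study section.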

Area under Betti curves in discriminating networks

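The difference of β0 curves can also be quantified using the area under the curve (AUC) [100, 101]. Writing $b_1 < \cdots < b_{m_0}$ for the sorted expected birth values of a network, the AUC for Ωi of the group Ω and for Ψj of the group Ψ are both given by

$$A = \sum_{l=1}^{m_0 - 1} (l + 1) \left( b_{l+1} - b_l \right),$$

evaluated with the birth values of Ωi and Ψj, respectively. We compute the AUC by summing up the areas of rectangular blocks formed by the expected persistent barcodes. For example, in Fig 1, the area under the Betti-0 curve is 2(W(5) − W(3)) + 3(W(6) − W(5)).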
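The rectangle sum can be written directly from the sorted birth values. The sketch below assumes that the area is taken between the first and the last birth value, as in the Fig 1 example; the function name `betti0_auc` and the numeric birth values are hypothetical.

```python
def betti0_auc(births):
    """Area under the Betti-0 curve between the first and last birth value.

    Between the l-th and (l+1)-th birth (1-indexed) the graph has l + 1
    connected components, so each rectangle has height l + 1.
    """
    b = sorted(births)
    return sum((l + 2) * (b[l + 1] - b[l]) for l in range(len(b) - 1))


# Births at the 3rd, 5th and 6th smallest weights, as in the Fig 1 schematic,
# with hypothetical values W(3) = 3, W(5) = 5, W(6) = 6.
area = betti0_auc([3, 5, 6])
print(area)  # 2 * (5 - 3) + 3 * (6 - 5) = 7
```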

To determine if AUC is significantly different between groups Ω and Ψ, we use the Wilcoxon rank sum test [102]. The Wilcoxon rank sum test is a nonparametric test of the following null hypothesis: for randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X. This is unlike the previous situation, where we considered an ETL or a max-gap statistic. In those cases, since we are considering the distance between increasing births or deaths of two networks, the consideration of the $L_2$ or $L_\infty$ norm in the statistic is meaningful. In contrast to distance-based methods, AUC offers an alternative area-based topological inference procedure. The method is equally applicable to the area under Betti-1 curves.
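For illustration, the rank sum test can be sketched with the large-sample normal approximation, assuming no ties among the AUC values and no continuity correction. This is a simplification: statistical packages typically switch to exact p-values for small samples, so the numbers below are only indicative. The function name and the AUC values are hypothetical.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns the rank sum of the first sample and the approximate p-value.
    Assumes there are no tied values.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, label) in enumerate(pooled, start=1) if label == 0)
    m, n = len(x), len(y)
    mean = m * (m + n + 1) / 2.0
    sd = math.sqrt(m * n * (m + n + 1) / 12.0)
    z = (w - mean) / sd
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return w, p_value


# Hypothetical AUC values for three networks in each group.
auc_omega = [7.0, 6.5, 7.4]
auc_psi = [8.1, 8.6, 9.0]
w_stat, p_val = rank_sum_test(auc_omega, auc_psi)
print(w_stat, p_val)
```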

Simulation study

Since there is no ground truth in real brain network data, we performed extensive simulation studies with known ground truth. The MATLAB code for the simulation study is provided at https://github.com/laplcebeltrami/orderstat.

Validation of birth and death value estimates

We validate the method to estimate expected birth and death values. We generate the ground truth graph G(p, w) with given edge weights and calculate its birth and death values using the birth-death decomposition on p = 10 number of nodes [37]. The weight vector w is of dimension q = p(p − 1)/2 = 45. We drew the q variate random weights w from the Uniform(0, 1) distribution.

We then simulate n = 15 vectors of q-variate Gaussian noise and add them to the edge weights w of G(p, w) to obtain a set of n graphs

$$w_i = w + e_i, \quad i = 1, \ldots, n, \qquad (33)$$

where $e_i \sim N(0, \sigma^2 I_q)$ with σ = 0.02. The produced set of graphs {G(p, w1), …, G(p, wn)} is considered as a set of realizations from a random graph, and we apply the proposed method to calculate the expected birth and death values along with their corresponding confidence bands. Then, we compare them with the initially calculated birth and death values of G(p, w). Fig 4 displays the schematic of the validation procedure. The original and expected birth (left panel) and death (right panel) values are plotted in Fig 5. The black line represents the original birth or death values, the dashed red line indicates the estimated birth or death values, and the dashed blue lines indicate the corresponding 95% upper and lower confidence bands. We observe that the dashed red lines almost overlap the black lines of the original birth and death values. In addition, the original birth or death values almost always lie within the confidence bands, supporting the reliability of the proposed methodology.

Fig 4. Schematic of the validation of the proposed method to compute expected birth and death values.

The ground truth graph G is used to calculate the Betti curves (solid black line) using the birth-death decomposition. We added Gaussian noise to edge weights to generate samples in each group. Then, we apply the proposed method on this set of sampled graphs and estimate the expected birth and death values and the Betti curves (dotted red line).

https://doi.org/10.1371/journal.pone.0276419.g004

Fig 5. Plots of the original and expected birth (left) and death (right) values.

The black line represents the original birth (or death) values, the dashed red line indicates the expected birth (or death) values, and the dashed blue lines indicate the corresponding 95% confidence intervals.

https://doi.org/10.1371/journal.pone.0276419.g005

Analyzing topological similarity between networks

We provide a toy example to illustrate whether the topological similarity or dissimilarity of two groups of networks, drawn from two different distributions or the same distribution, can be identified using the ETL statistic (4). We used Beta(a, b) distributions, which are all defined on the interval (0, 1). The shape parameters a and b allow for a variety of shapes, including that of the uniform distribution Uniform(0, 1) when a = b = 1. We considered three different distributions (Fig 6).

Fig 6.

Top panel: Probability density functions of Beta(1, 1) or Uniform(0, 1) (in solid line), Beta(5, 2) (in dash-dotted line), and Beta(1, 5) (in dotted line). We sample the edge weights of random graphs from these three different distributions for validation purposes. Middle and bottom panels: Histograms of the ETL (middle) and maximum gap (bottom) test statistics and the corresponding observed test statistics (in dotted red lines) for the scenarios Beta(1, 1) vs. Beta(5, 2) (left) and Beta(1, 1) vs. Beta(1, 1) (right) with 6 networks in each group.

https://doi.org/10.1371/journal.pone.0276419.g006

We generated two groups of networks. For both groups, we simulated n = 6, 8, 10, and 12 networks. We investigated the performance in both small (p = 10) and large (p = 100) network settings. Small networks may not yield the complex cyclic structures often present in large networks. However, the overall conclusions are the same regardless of the size of the networks. For the permutation test, we considered 100000 permutations and repeated the test 10 times to compute the average p-values. Table 1 tabulates the p-values for small and large network settings. In all the scenarios, networks drawn from the same distribution produced large p-values and networks drawn from different distributions had p-values smaller than 0.01. Therefore, we conclude that the proposed ETL statistic, based on expected birth and death values, can discriminate networks drawn from different distributions at a 99% confidence level. The middle panel of Fig 6 plots the histograms of the ETL test statistic and the corresponding observed test statistics (in dotted red) for two specific scenarios: (i) Beta(1, 1) vs. Beta(5, 2) (left) and (ii) Beta(1, 1) vs. Beta(1, 1) (right) with 12 networks in each group.
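A small-scale version of this two-group experiment can be sketched as follows. It is a hedged illustration only: the statistic `group_loss` is a stand-in that compares group means of sorted birth and death values by squared differences, not the ETL statistic (4) built from expected order statistics, and all function names are hypothetical. The number of permutations is kept far smaller than the 100000 used in the study so that the sketch runs quickly.

```python
import random
from itertools import combinations


def births_deaths(p, weights):
    """Sorted birth (maximum spanning tree) and death (remaining) edge weights."""
    parent = list(range(p))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    births, deaths = [], []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            births.append(w)
        else:
            deaths.append(w)
    return sorted(births), sorted(deaths)


def sample_group(n, p, a, b, rng):
    """Draw n complete graphs on p nodes with Beta(a, b) edge weights."""
    edges = list(combinations(range(p), 2))
    return [births_deaths(p, {e: rng.betavariate(a, b) for e in edges})
            for _ in range(n)]


def group_loss(group_u, group_v):
    """Stand-in statistic: squared gap between group-mean sorted values."""
    total = 0.0
    for k in (0, 1):  # 0: births, 1: deaths
        mean_u = [sum(c) / len(group_u) for c in zip(*(g[k] for g in group_u))]
        mean_v = [sum(c) / len(group_v) for c in zip(*(g[k] for g in group_v))]
        total += sum((x - y) ** 2 for x, y in zip(mean_u, mean_v))
    return total


def permutation_pvalue(group_u, group_v, n_perm=2000, seed=1):
    """Permute group labels and count statistics at least as large as observed."""
    rng = random.Random(seed)
    observed = group_loss(group_u, group_v)
    pooled = list(group_u) + list(group_v)
    n = len(group_u)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if group_loss(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # The +1 correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

For instance, `permutation_pvalue(sample_group(6, 10, 1, 1, rng), sample_group(6, 10, 5, 2, rng))` with `rng = random.Random(0)` mimics the Beta(1, 1) vs. Beta(5, 2) scenario with p = 10 and 6 networks per group.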

Table 1. The average p-values obtained using the ETL statistic for various pairs of distributions for small (p = 10) and large (p = 100) network settings.

Here, the columns 6 networks, 8 networks, 10 networks, and 12 networks indicate the number of networks that we considered for both the groups. The p-values smaller than 0.01 indicate that our method can identify network differences at a 99% confidence level.

https://doi.org/10.1371/journal.pone.0276419.t001

Comparison of ETL against baselines

We compared the proposed ETL with several other widely used baseline topological distances, namely the bottleneck, Gromov-Hausdorff (GH), and Kolmogorov-Smirnov (KS) distances [21, 103, 104]. We also compared the results with the maximum gap statistic defined earlier in (5). In all the scenarios, we considered two groups of networks, each of size n = 6. The remaining simulation settings are similar to those above. The corresponding p-values are presented in Table 2 for small (p = 10) and large (p = 100) network settings. From the table, we observe that the ETL performs well in most scenarios. In particular, we note that the KS based methods do not perform well, whereas the maximum gap based method is quite competitive. Further, when there are no network differences, all the distances perform well.

Table 2. The average p-values obtained using bottleneck, GH, KS, maximum gap, and ETL based statistics for small (p = 10) and large (p = 100) network settings.

There were 6 networks in each group. The p-values smaller than 0.01 indicate that the corresponding method can identify network differences at a 99% confidence level.

https://doi.org/10.1371/journal.pone.0276419.t002

Since the maximum gap based method is competitive with the ETL based method, we plot the histograms of the maximum gap statistics obtained over different permutations and the corresponding observed test statistics (in dotted red) for two specific scenarios: (i) Beta(1, 1) vs. Beta(5, 2) (left) and (ii) Beta(1, 1) vs. Beta(1, 1) (right) with 6 networks in each group; see the bottom panel of Fig 6. Although both methods (ETL and maximum gap) perform well, the ETL generally produces better results (i.e., its p-value is closer to 0 when there is a network difference and closer to 1 when there is no network difference).

Area under Betti curves

We also conducted a simulation study for the method based on the area under the β0 curve. The simulation layout was the same as before. The obtained p-values are tabulated in Table 3 for small networks (p = 10) and large networks (p = 100). As shown in Fig 3, a slight change in distribution significantly changes the topology of the network and, therefore, the area under the β0 curve varies significantly. This change is especially visible for large networks, which increase the AUC. However, the Wilcoxon rank sum test assigns ranks to the aggregated sample that combines the first and second samples, and then considers the sum of ranks for each sample. This makes the test statistic fairly robust even when the distributions vary, and it makes the p-value computation extremely stable for large networks. For example, Table 3 shows exactly the same p-value of 0.0022 for p = 100. Similar to the ETL, this approach can also discriminate networks drawn from different distributions at a 99% confidence level.
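The rank-based behaviour described above can be illustrated with a short sketch. It rests on two assumptions that are not taken from the paper: the β0 curve is treated as the step function 1 + #{birth values ≤ ε} integrated over [0, 1], and the rank sum p-value is computed exactly by enumerating all label assignments (feasible only for small groups, and assuming no ties). The function names are hypothetical. When two groups of six AUC values are completely separated, the exact two-sided p-value is 2/C(12, 6) = 2/924 ≈ 0.0022, which happens to coincide with the value quoted above; whether the study used exact or approximate p-values is not stated in this section.

```python
from itertools import combinations


def auc_beta0(births, lo=0.0, hi=1.0):
    """Area under the step function 1 + #{births <= eps} for eps in [lo, hi].

    Assumes every birth value lies in [lo, hi].
    """
    return (hi - lo) + sum(hi - b for b in births)


def exact_rank_sum_pvalue(x, y):
    """Exact two-sided Wilcoxon rank sum p-value by full enumeration.

    Assumes there are no ties among the pooled values.
    """
    pooled = sorted(x + y)
    rank = {v: r for r, v in enumerate(pooled, start=1)}
    n, total_n = len(x), len(x) + len(y)
    w_obs = sum(rank[v] for v in x)   # rank sum of the first sample
    w_mean = n * (total_n + 1) / 2    # its mean under the null hypothesis
    extreme = total = 0
    for combo in combinations(range(1, total_n + 1), n):
        total += 1
        if abs(sum(combo) - w_mean) >= abs(w_obs - w_mean):
            extreme += 1
    return w_obs, extreme / total
```

Because only the ranks of the AUC values enter the statistic, any two completely separated groups of the same sizes give the same p-value, no matter how far apart the AUC values are.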

Table 3. The average p-values obtained using Wilcoxon rank sum test on the areas under β0 curves for small (p = 10) and large (p = 100) network settings.

Here, the columns 6 networks, 8 networks, 10 networks, and 12 networks indicate the number of networks that we considered for both the groups. The p-values smaller than 0.01 indicate that our method can identify network differences at a 99% confidence level.

https://doi.org/10.1371/journal.pone.0276419.t003

Results

For each of the 400 subjects, we computed the whole-brain functional connectivity by calculating the Pearson correlation matrix over 1200 time points across 116 anatomical regions, resulting in 400 correlation matrices of dimension 116 × 116. Therefore, using our notation, we have p = 116 nodes, q = p(p − 1)/2 = 6670 edges, m0 = p − 1 = 115, and m1 = (p − 1)(p − 2)/2 = 6555. Fig 7 summarizes the average female (left) and male (middle) brain networks (top) and the corresponding correlation matrices (bottom).
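As a quick arithmetic check of these counts, the following minimal snippet recomputes them from p alone and confirms that the birth and death values together account for every edge. The helper name `edge_counts` is introduced here only for illustration.

```python
def edge_counts(p):
    """Return (q, m0, m1) for a complete graph on p nodes."""
    q = p * (p - 1) // 2          # total number of edges
    m0 = p - 1                    # number of birth values
    m1 = (p - 1) * (p - 2) // 2   # number of death values
    return q, m0, m1


q, m0, m1 = edge_counts(116)
print(q, m0, m1, m0 + m1 == q)
```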

Fig 7. Visualization of the fMRI brain data.

Left panel: The average female brain network (top) and the corresponding correlation matrix (bottom). Middle panel: The average male brain network (top) and the corresponding correlation matrix (bottom). Right panel: The β0 (top) and β1 (bottom) curves for female (in solid black) and male (in dashed red) brain networks. For better visualization, we apply a threshold value of 0.5 when plotting the brain networks so that they contain fewer edges.

https://doi.org/10.1371/journal.pone.0276419.g007

Two-sample test using ETL statistic

Given the 400 correlation matrices of 168 males and 232 females, we aim to check whether the proposed ETL statistic can detect the difference between the male and female groups. We assume that the male and female edge weights come from distributions with cdfs FU and FV, respectively. However, these distribution functions are unknown. Therefore, we need to estimate them because the ETL statistic is constructed using these cdfs. To estimate each cdf, we average the male (female) correlation matrices across the 168 (232) subjects and compute the empirical cdf of the 6670 averaged edge weights. The empirical cdfs of the average edge weights of females (in solid black line) and males (in dashed red line) are presented in the left panel of Fig 8. We observe that the empirical cdf for females is slightly higher than that for males. This suggests that females have relatively more edge weights with smaller values, while males have relatively more edge weights with larger values. In other words, the distribution of the female edge weights is slightly more positively skewed than that of the male edge weights. Fig 9 plots the β0 and β1 curves of the average female and male networks (calculated in the standard way) and their corresponding estimated counterparts (computed using the expected birth and death values). We observe that the estimated Betti curves approximate the original Betti curves well.
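The estimation step described above (average the matrices within a group, then take the empirical cdf of the off-diagonal entries) can be sketched as follows. This is an illustrative sketch with hypothetical function names; matrices are plain nested lists here, and the tiny 3 × 3 inputs in the usage note are made up purely to show the calling convention.

```python
from bisect import bisect_right


def average_edge_weights(matrices):
    """Average symmetric p x p matrices and return the upper-triangular entries."""
    n = len(matrices)
    p = len(matrices[0])
    return [sum(m[i][j] for m in matrices) / n
            for i in range(p) for j in range(i + 1, p)]


def empirical_cdf(values):
    """Return F with F(x) = fraction of values less than or equal to x."""
    ordered = sorted(values)

    def cdf(x):
        return bisect_right(ordered, x) / len(ordered)

    return cdf
```

For example, with two made-up 3 × 3 correlation matrices `a` and `b`, `empirical_cdf(average_edge_weights([a, b]))` returns a function that can be evaluated at any threshold; for the data in the text, each group would contribute 6670 averaged edge weights.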

Fig 8.

Left: Plot of the empirical cdfs of the average edge weights of females (in solid black line) and males (in dashed red line). Right: Histogram plot of the ETL statistic based on the resting-state fMRI dataset. The dotted red line represents the observed value of the ETL statistic.

https://doi.org/10.1371/journal.pone.0276419.g008

Fig 9. Plots of the original (solid black line) and estimated (dashed red line) β0 (top) and β1 (bottom) curves using expected birth and death values for the female (left) and male (right) brain networks.

https://doi.org/10.1371/journal.pone.0276419.g009

To conduct the test, we used the permutation test with 500000 random permutations. The observed test statistic is 4.9372 and the p-value is 0.0134. The histogram of the test statistic is plotted in the right panel of Fig 8. We conclude that, although the weight distributions of the males and females are very close, the proposed ETL statistic can still discriminate them at a 95% confidence level.

Two-sample test using AUC statistic

We conducted a two-sample test using the method based on the area under the β0 curve. The observed value of the Wilcoxon rank-sum statistic is 48374, which corresponds to a p-value of 0.1036. That is, the test fails to discriminate between male and female subjects at the traditional significance levels α = 0.05 or 0.1. However, if we relax this requirement slightly, the test can discriminate males and females at a confidence level of 89.5%.

Sex differences in resting-state functional networks have been investigated previously. There is a known sex difference in the parietal region involved in spatial ability [105]. [106] reported sex differences in the left parietal, precentral and postcentral regions. A sex difference is also reported in the left rolandic operculum [107]. Previous rs-fMRI studies mainly focused on brain-region-specific analyses rather than topological ones. Our topological methods are different. The use of the order statistic in quantifying the topological difference between males and females is novel. This method identifies the impact of distribution differences on topological features. To the best of our knowledge, these specific results have not been observed before.

Discussion

The concept of random graphs was first proposed in the mid-twentieth century [39] and has been of interest to many researchers ever since [108–112]. TDA tools such as persistent barcodes were extended to handle stochastic cases, which triggered the computation of expected persistent barcodes. However, such computation may require complex theoretical constructs. In this article, we considered a random graph model for which the computation of expected persistent barcodes is simplified by using the order statistic.

[37] formulated a topological loss based on the birth and death values of connected components and cycles of a network that provided an optimal matching and alignment at the edge level. In this article, we extended this formulation to a random graph scenario and proposed the expected topological loss (ETL) based on the expected birth and death values. We used the ETL as a test statistic to discriminate between two groups of networks. We validated this method using a simulation study. We showed that the ETL can identify group differences at a 99% confidence level, whereas it produces large p-values when there are no network differences. We compared the proposed approach with baseline approaches and established an overall superior performance of the proposed method. Further, we considered the area under the Betti curves [101]. This resulted in a scalar quantification of the curves, which was used to discriminate between the groups. The corresponding simulation study showed that it discriminates successfully whenever there are network differences. We also applied the developed tools to a resting-state brain fMRI dataset and showed that they can differentiate male and female brain networks.

To calculate the expected persistent barcodes, we estimated the unknown distribution using the nonparametric empirical distribution function. However, one may also consider hierarchical or Bayesian parametric models for the edge weights instead. For example, one may consider the edge weights to be drawn from a distribution, where the location parameter μ and the dispersion parameter σ² have a Gaussian and an inverse gamma conjugate prior, respectively. The parameters of the prior distributions allow flexibility while we can still enjoy the advantages of a parametric model. This direction can be pursued in the future.

We can also use different filtration schemes such as relative filtration [113] or normalized filtration that scales filtration values between 0 and 1 [114]. As long as the order of the sorted edge weights is not changed, they will not affect the statistical results. The Wasserstein distance we used is defined on the sorted edge weights. As long as we do not change the values of the edge weights, the statistical results will not change.

Our methodology is based on the graph filtration, which gives both 0D and 1D persistence as monotonic 1D functions of birth and death values only. On the other hand, the clique filtration [115] does not produce monotone persistence or barcodes, and the proposed method is not directly applicable [116–121]. Our method is applicable to any filtration that provides monotone persistence or Betti curves. The proposed graph filtration computes the barcodes in , which is significantly faster than Rips filtrations. In traditional Rips filtrations, the computational complexity grows rapidly with the number of simplices [122]. With p nodes, the size of the k-skeleton grows as p^(k+1) and the computational run time is [123, 124]. Compared to the graph filtration, the Rips filtration constructed using the Ripser package [125] is about 8 times slower on a computer. On top of that, we also need to compute the Wasserstein distance between persistence diagrams [126]. The Wasserstein distance computation requires an expensive optimization process involving run-time [127, 128].

Our algorithm exploits the geometric structure of the graph filtration, resulting in a persistence diagram representation in the form of 1-dimensional sorted scalar values. Thus, the proposed method computes the Wasserstein distance in bypassing a multitude of computational bottlenecks. For resampling based statistical inference such as the permutation test, the computational bottleneck is caused by repeatedly computing the test statistic over random permutations of group labels at least half a million times [129]. This is impractical if the Wasserstein distance for the Rips filtrations has to be used for 400 networks and the whole computation then has to be repeated half a million times. The development of scalable computation will have a significant impact on resampling based statistical inference.
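The reason sorted scalar values help can be checked numerically. For two equal-sized sets of real numbers and a squared-difference cost, the optimal one-to-one matching pairs the i-th smallest value of one set with the i-th smallest value of the other, so a sort (O(m log m) for m values) replaces a general assignment problem. The sketch below, with hypothetical function names, compares the sorted matching against a brute-force search over all permutations on a small example; it illustrates this well-known one-dimensional property and is not the authors' code.

```python
import random
from itertools import permutations


def sorted_matching_cost(a, b):
    """Squared matching cost after sorting both sets of values."""
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b)))


def brute_force_cost(a, b):
    """Minimum squared matching cost over all one-to-one assignments."""
    return min(sum((x - y) ** 2 for x, y in zip(a, perm))
               for perm in permutations(b))


rng = random.Random(0)
a = [rng.random() for _ in range(6)]
b = [rng.random() for _ in range(6)]
# The two costs agree up to floating-point rounding.
print(abs(sorted_matching_cost(a, b) - brute_force_cost(a, b)) < 1e-12)
```

The brute-force search already visits 720 permutations for six values, which is why it is only usable as a check on tiny inputs.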

Graph filtrations produce monotone birth and death values over filtrations. Since our birth and death values are exactly edge weights, slight changes in the edge weight distribution will correspondingly change the birth and death values slightly. Since the expected topological loss (ETL) is the sum of differences of birth and death values, it will also change correspondingly. This is why our method can successfully discriminate topological group differences whenever there is a difference in edge weight distributions between two groups.

The monotone Betti curves usually follow S-shaped curves similar to the pattern of cumulative distribution functions. The pattern of S-shaped Betti curves does not change drastically even if a different dataset is used, as long as the networks are weighted graphs. To demonstrate this, we plotted Betti curves for task-fMRI. Fig 10 shows the Betti curves for task-fMRI, where cognitive inhibition was measured using the go/no-go paradigm in 144 subjects on 100 brain regions [130, 131]. The correlation matrices were computed separately for the inhibition (go) and initiation (no-go) blocks of fMRI time series. These monotonic Betti curves show a pattern almost identical to that of the Betti curves of the rs-fMRI networks in our study.

Fig 10. The average Betti curves obtained from the graph filtration on correlation matrices computed separately for the inhibition (go) and initiation (no-go) blocks of fMRI time series in a cognitive aging study [130, 131].

https://doi.org/10.1371/journal.pone.0276419.g010

Acknowledgments

We would like to thank Shih-Gu Huang of the National University of Singapore for providing support for fMRI processing. We also thank Gary Shiu of the University of Wisconsin-Madison for discussion on the computational runtime of Rips filtrations.

References

  1. 1. Bassett DS, Sporns O. Network neuroscience. Nature neuroscience. 2017;20(3):353–364. pmid:28230844
  2. 2. Sporns O. Graph theory methods: applications in brain networks. Dialogues in Clinical Neuroscience. 2018;20(2):111. pmid:30250388
  3. 3. Bullmore E, Sporns O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature reviews neuroscience. 2009;10(3):186–198. pmid:19190637
  4. 4. Stam CJ, Reijneveld JC. Graph theoretical analysis of complex networks in the brain. Nonlinear biomedical physics. 2007;1(1):1–19.
  5. 5. Rubinov M, Sporns O. Complex network measures of brain connectivity: uses and interpretations. Neuroimage. 2010;52(3):1059–1069. pmid:19819337
  6. 6. Chung MK, Lee H, DiChristofano A, Ombao H, Solo V. Exact topological inference of the resting-state brain networks in twins. Network Neuroscience. 2019;3:674–694. pmid:31410373
  7. 7. Bassett DS. Small-World Brain Networks. The Neuroscientist. 2006;12:512–523. pmid:17079517
  8. 8. Bassett DS, Porter MA, Wymbs NF, Grafton ST, Carlson JM, Mucha PJ. Robust detection of dynamic community structure in networks. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2013;23(1):013142. pmid:23556979
  9. 9. Van Den Heuvel MP, Sporns O. Rich-club organization of the human connectome. Journal of Neuroscience. 2011;31(44):15775–15786. pmid:22049421
  10. 10. Chung MK, Hanson JL, Lee H, Adluru N, Alexander AL, Davidson RJ, et al. Persistent Homological Sparse Network Approach to Detecting White Matter Abnormality in Maltreated Children: MRI and DTI Multimodal Study. MICCAI, Lecture Notes in Computer Science (LNCS). 2013;8149:300–307. pmid:24505679
  11. 11. Lee H, Kang H, Chung MK, Kim BN, Lee DS. Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging. 2012;31:2267–2277. pmid:23008247
  12. 12. Zomorodian AJ, Carlsson G. Computing Persistent Homology. Discrete and Computational Geometry. 2005;33:249–274.
  13. 13. Singh G, Memoli F, Ishkhanov T, Sapiro G, Carlsson G, Ringach DL. Topological analysis of population activity in visual cortex. Journal of Vision. 2008;8:1–18. pmid:18831634
  14. 14. Ghrist R. Barcodes: The persistent topology of data. Bulletin of the American Mathematical Society. 2008;45:61–75.
  15. 15. Carlsson G, Memoli F. Persistent clustering and a theorem of J. Kleinberg. arXiv preprint arXiv:08082241. 2008.
  16. 16. Edelsbrunner H, Harer J. Persistent Homology—a Survey. Contemporary Mathematics. 2008;453:257–282.
  17. 17. Petri G, Expert P, Turkheimer F, Carhart-Harris R, Nutt D, Hellyer PJ, et al. Homological scaffolds of brain functional networks. Journal of The Royal Society Interface. 2014;11:20140873. pmid:25401177
  18. 18. Sizemore AE, Giusti C, Kahn A, Vettel JM, Betzel RF, Bassett DS. Cliques and cavities in the human connectome. Journal of computational neuroscience. 2018;44:115–145. pmid:29143250
  19. 19. Lee H, Chung MK, Kang H, Choi H, Kim YK, Lee DS. Abnormal hole detection in brain connectivity by kernel density of persistence diagram and Hodge Laplacian. In: IEEE International Symposium on Biomedical Imaging (ISBI); 2018. p. 20–23.
  20. 20. Bubenik P. The persistence landscape and some of its properties. In: Topological Data Analysis. Springer; 2020. p. 97–117.
  21. 21. Chung MK, Lee H, Solo V, Davidson RJ, Pollak SD. Topological distances between brain networks. International Workshop on Connectomics in Neuroimaging. 2017; p. 161–170.
  22. 22. Lee H, Chung MK, Kang H, Kim BN, Lee DS. Computing the shape of brain networks using graph filtration and Gromov-Hausdorff metric. MICCAI, Lecture Notes in Computer Science. 2011;6892:302–309. pmid:21995042
  23. 23. Songdechakraiwut T, Shen L, Chung MK. Topological learning and its application to multimodal brain network integration. Medical Image Computing and Computer Assisted Intervention (MICCAI). 2021; p. in press, http://pages.stat.wisc.edu/~mchung/papers/song.2021.MICCAI.pdf.
  24. 24. Anand DV, Chung MK. Hodge-Laplacian of Brain Networks and Its Application to Modeling Cycles. arXiv preprint arXiv:211014599. 2021.
  25. 25. Berry E, Chen YC, Cisewski-Kehe J, Fasy BT. Functional summaries of persistence diagrams. Journal of Applied and Computational Topology. 2020;4(2):211–262.
  26. 26. Chazal F, Fasy BT, Lecci F, Rinaldo A, Wasserman L. Stochastic convergence of persistence landscapes and silhouettes. In: Proceedings of the thirtieth annual symposium on Computational geometry; 2014. p. 474–483.
  27. 27. Bubenik P. Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research. 2015;16:77–102.
  28. 28. Biscio CA, Møller J. The accumulated persistence function, a new useful functional summary statistic for topological data analysis, with a view to brain artery trees and spatial point process applications. Journal of Computational and Graphical Statistics. 2019;28:671–681.
  29. 29. Chen YC, Wang D, Rinaldo A, Wasserman L. Statistical analysis of persistence intensity functions. arXiv preprint arXiv:151002502. 2015.
  30. 30. Fasy BT, Kim J, Lecci F, Maria C. Introduction to the R package TDA. arXiv preprint arXiv:14111830. 2014.
  31. 31. Kwitt R, Huber S, Niethammer M, Lin W, Bauer U. Statistical topological data analysis-a kernel perspective. Advances in neural information processing systems. 2015;28.
  32. 32. Gayet D, Welschinger JY. Lower estimates for the expected Betti numbers of random real hypersurfaces. Journal of the London Mathematical Society. 2014;90(1):105–120.
  33. 33. Salepci N, Welschinger JY. Tilings, packings and expected Betti numbers in simplicial complexes. arXiv preprint arXiv:180605084. 2018.
  34. 34. Wigman I. On the expected Betti numbers of the nodal set of random fields. Analysis & PDE. 2021;14(6):1797–1816.
  35. 35. King JB, Prigge MB, King CK, Morgan J, Weathersby F, Fox JC, et al. Generalizability and reproducibility of functional connectivity in autism. Molecular Autism. 2019;10(1):1–23. pmid:31285817
  36. 36. Blinowska KJ, Kaminski M. Functional brain networks: random,“small world” or deterministic? PloS one. 2013;8(10):e78763. pmid:24205313
  37. 37. Songdechakraiwut T, Chung MK. Topological learning for brain networks. 2020; p. arXiv:2012.00675.
  38. 38. Gilbert EN. Random graphs. The Annals of Mathematical Statistics. 1959;30(4):1141–1144.
  39. 39. Erdös P, Rényi A. On the evolution of random graphs. Bull Inst Internat Statist. 1961;38:343–347.
  40. 40. Newman MEJ, Strogatz SH, Watts DJ. Random graphs with arbitrary degree distributions and their applications. Physical Review E. 2001;64:26118. pmid:11497662
  41. 41. Janson S, Rucinski A, Luczak T. Random graphs. John Wiley & Sons; 2011.
  42. 42. Bobrowski O, Kahle M. Topology of random geometric complexes: a survey. Journal of applied and Computational Topology. 2018;1(3):331–364.
  43. 43. Bollobás B, Béla B. Random graphs. 73. Cambridge university press; 2001.
  44. 44. Frieze A, Karoński M. Introduction to random graphs. Cambridge University Press; 2016.
  45. 45. Kahle M. The neighborhood complex of a random graph. Journal of Combinatorial Theory, Series A. 2007;114:380–387.
  46. 46. Kahle M. Random geometric complexes. Discrete & Computational Geometry. 2011;45:553–573.
  47. 47. Kahle M, Meckes E. Limit theorems for Betti numbers of random simplicial complexes. Homology, Homotopy and Applications. 2013;15:343–374.
  48. 48. Jordan MI. Learning in graphical models. MIT press; 1999.
  49. 49. Bishop CM. Pattern recognition and machine learning. springer; 2006.
  50. 50. Wilks SS. Order statistics. Bulletin of the American Mathematical Society. 1948;54(1):6–50.
  51. 51. Rényi A. On the theory of order statistics. Acta Mathematica Academiae Scientiarum Hungarica. 1953;4(3-4):191–231.
  52. 52. David HA, Nagaraja HN. Order statistics. John Wiley & Sons; 2004.
  53. 53. Arnold BC, Balakrishnan N, Nagaraja HN. A first course in order statistics. SIAM; 2008.
  54. 54. Ahsanullah M, Nevzorov VB, Shakil M. An introduction to order statistics. vol. 8. Springer; 2013.
  55. 55. Balakrishnan N, Cohen AC. Order statistics & inference: estimation methods. Elsevier; 2014.
  56. 56. Van Essen DC, Ugurbil K, Auerbach E, Barch D, Behrens TEJ, Bucholz R, et al. The Human Connectome Project: a data acquisition perspective. NeuroImage. 2012;62:2222–2231. pmid:22366334
  57. 57. Glasser MF, Sotiropoulos SN, Wilson JA, Coalson TS, Fischl B, Andersson JL, et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage. 2013;80:105–124. pmid:23668970
  58. 58. Bzdok D, Varoquaux G, Grisel O, Eickenberg M, Poupon C, Thirion B. Formal models of the network co-occurrence underlying mental operations. PLoS computational biology. 2016;12:e1004994. pmid:27310288
  59. 59. Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage. 2006;31:968–980. pmid:16530430
  60. 60. Hagmann P, Kurant M, Gigandet X, Thiran P, Wedeen VJ, Meuli R, et al. Mapping human whole-brain structural networks with diffusion MRI. PLoS One. 2007;2(7):e597. pmid:17611629
  61. 61. Arslan S, Ktena SI, Makropoulos A, Robinson EC, Rueckert D, Parisot S. Human brain mapping: A systematic comparison of parcellation methods for the human cerebral cortex. NeuroImage. 2018;170:5–30. pmid:28412442
  62. 62. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage. 2002;15:273–289. pmid:11771995
  63. 63. Power JD, Barnes KA, Snyder AZ, Schlaggar BL, Petersen SE. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage. 2012;59:2142–2154. pmid:22019881
  64. 64. Van Dijk KRA, Sabuncu MR, Buckner RL. The influence of head motion on intrinsic functional connectivity MRI. NeuroImage. 2012;59:431–438. pmid:21810475
  65. 65. Satterthwaite TD, Wolf DH, Loughead J, Ruparel K, Elliott MA, Hakonarson H, et al. Impact of in-scanner head motion on multiple measures of functional connectivity: relevance for studies of neurodevelopment in youth. NeuroImage. 2012;60:623–632. pmid:22233733
  66. 66. Caballero-Gaudes C, Reynolds RC. Methods for cleaning the BOLD fMRI signal. NeuroImage. 2017;154:128–149. pmid:27956209
  67. 67. Edelsbrunner H, Harer J. Computational topology: An introduction. American Mathematical Society; 2010.
  68. 68. Fornito A, Zalesky A, Bullmore E. Fundamentals of Brain Network Analysis. New York: Academic Press; 2016.
  69. 69. Park HJ, Friston KJ, Pae C, Park B, Razi A. Dynamic effective connectivity in resting state fMRI. NeuroImage. 2018;180:594–608. pmid:29158202
  70. 70. Horwitz B, Warner B, Fitzer J, Tagamets MA, Husain FT, Long TW. Investigating the neural basis for functional and effective connectivity. Application to fMRI. Philosophical Transactions of the Royal Society B: Biological Sciences. 2005;360(1457):1093–1108. pmid:16087450
  71. 71. Schlösser R, Gesierich T, Kaufmann B, Vucurevic G, Hunsche S, Gawehn J, et al. Altered effective connectivity during working memory performance in schizophrenia: a study with fMRI and structural equation modeling. Neuroimage. 2003;19(3):751–763. pmid:12880804
  72. 72. Dirkx MF, den Ouden H, Aarts E, Timmer M, Bloem BR, Toni I, et al. The cerebral network of Parkinson’s tremor: an effective connectivity fMRI study. Journal of Neuroscience. 2016;36(19):5362–5372. pmid:27170132
  73. 73. Giusti C, Ghrist R, Bassett DS. Two’s company, three (or more) is a simplex. Journal of computational neuroscience. 2016;41(1):1–14.
  74. 74. Reimann MW, Nolte M, Scolamiero M, Turner K, Perin R, Chindemi G, et al. Cliques of neurons bound into cavities provide a missing link between structure and function. Frontiers in computational neuroscience. 2017; p. 48. pmid:28659782
  75. 75. Tadić B, Andjelković M, Melnik R. Functional geometry of human connectomes. Scientific reports. 2019;9(1):1–12. pmid:31427676
  76. 76. Adler RJ, Bobrowski O, Borman MS, Subag E, Weinberger S. Persistent homology for random fields and complexes. In: Borrowing strength: theory powering applications–a Festschrift for Lawrence D. Brown. Institute of Mathematical Statistics; 2010. p. 124–143.
  77. 77. Chung MK, Huang SG, Gritsenko A, Shen L, Lee H. Statistical inference on the number of cycles in brain networks. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE; 2019. p. 113–116.
  78. 78. Carlsson G. Topology and data. Bulletin of the American Mathematical Society. 2009;46:255–308.
  79. 79. Lind PG, Gonzalez MC, Herrmann HJ. Cycles and clustering in bipartite networks. Physical review E. 2005;72:056127. pmid:16383708
  80. 80. Mileyko Y, Mukherjee S, Harer J. Probability measures on the space of persistence diagrams. Inverse Problems. 2011;27(12):124007.
  81. Vallender S. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications. 1974;18(4):784–786.
  82. Mi L, Zhang W, Gu X, Wang Y. Variational Wasserstein clustering. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 322–337.
  83. Mi L, Zhang W, Wang Y. Regularized Wasserstein means for aligning distributional data. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34; 2020. p. 5166–5173.
  84. Panaretos VM, Zemel Y. Statistical aspects of Wasserstein distances. Annual review of statistics and its application. 2019;6:405–431.
  85. Kolouri S, Zou Y, Rohde GK. Sliced Wasserstein kernels for probability distributions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 5258–5267.
  86. Conover WJ. Practical Nonparametric Statistics. New York: Wiley; 1980.
  87. Gibbons JD, Chakraborti S. Nonparametric Statistical Inference. Chapman & Hall/CRC Press; 2011.
  88. Mosteller F. On some useful “inefficient” statistics. In: Selected Papers of Frederick Mosteller. Springer; 2006. p. 69–100.
  89. Duong T, Hazelton ML. Cross-validation bandwidth matrices for multivariate kernel density estimation. Scandinavian Journal of Statistics. 2005;32(3):485–506.
  90. O’Neil P, Cheng E, Gawlick D, O’Neil E. The log-structured merge-tree (LSM-tree). Acta Informatica. 1996;33(4):351–385.
  91. Sears R, Ramakrishnan R. bLSM: a general purpose log structured merge tree. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data; 2012. p. 217–228.
  92. Morozov D, Weber G. Distributed merge trees. In: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming; 2013. p. 93–102.
  93. Liu T, Seyedhosseini M, Tasdizen T. Image segmentation using hierarchical merge tree. IEEE transactions on image processing. 2016;25(10):4596–4607.
  94. Nigmetov A, Morozov D. Local-global merge tree computation with local exchanges. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis; 2019. p. 1–13.
  95. Samardzic N, Qiao W, Aggarwal V, Chang MCF, Cong J. Bonsai: High-performance adaptive merge tree sorting. In: 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE; 2020. p. 282–294.
  96. Thompson PM, Cannon TD, Narr KL, van Erp T, Poutanen VP, Huttunen M, et al. Genetic influences on brain structure. Nature Neuroscience. 2001;4:1253–1258. pmid:11694885
  97. Zalesky A, Fornito A, Harding IH, Cocchi L, Yücel M, Pantelis C, et al. Whole-brain anatomical networks: Does the choice of nodes matter? NeuroImage. 2010;50:970–983. pmid:20035887
  98. Nichols TE, Holmes AP. Nonparametric permutation tests for functional neuroimaging: A primer with examples. Human Brain Mapping. 2002;15:1–25. pmid:11747097
  99. Winkler AM, Ridgway GR, Douaud G, Nichols TE, Smith SM. Faster permutation inference in brain imaging. NeuroImage. 2016;141:502–516. pmid:27288322
  100. Chung MK, Hanson JL, Ye J, Davidson RJ, Pollak SD. Persistent Homology in Sparse Regression and Its Application to Brain Morphometry. IEEE Transactions on Medical Imaging. 2015;34:1928–1939. pmid:25823032
  101. Xu F, Garai S, Chung M, Caciagli L, Saykin AJ, Bassett DS, et al. Identifying topological changes of structural connectome in MCI and AD through persistent homology. In preparation. 2021.
  102. Haynes W. In: Dubitzky W, Wolkenhauer O, Cho KH, Yokota H, editors. Wilcoxon Rank Sum Test. New York, NY: Springer New York; 2013. p. 2354–2355. Available from: https://doi.org/10.1007/978-1-4419-9863-7_1185.
  103. Cohen-Steiner D, Edelsbrunner H, Harer J. Stability of Persistence Diagrams. Discrete and Computational Geometry. 2007;37:103–120.
  104. Chazal F, Cohen-Steiner D, Guibas LJ, Mémoli F, Oudot SY. Gromov-Hausdorff Stable Signatures for Shapes using Persistence. In: Computer Graphics Forum. vol. 28; 2009. p. 1393–1403.
  105. Koscik T, O’Leary D, Moser DJ, Andreasen NC, Nopoulos P. Sex differences in parietal lobe morphology: relationship to mental rotation performance. Brain and cognition. 2009;69:451–459. pmid:18980790
  106. Xu Y, Lindquist MA. Dynamic connectivity detection: an algorithm for determining functional connectivity change points in fMRI data. Frontiers in Neuroscience. 2015;9:285. pmid:26388711
  107. Rubin LH, Yao L, Keedy SK, Reilly JL, Bishop JR, Carter CS, et al. Sex differences in associations of arginine vasopressin and oxytocin with resting-state functional brain connectivity. Journal of neuroscience research. 2017;95:576–586. pmid:27870395
  108. Kovalenko I. Theory of random graphs. Cybernetics. 1971;7(4):575–579.
  109. Barabási AL, Albert R. Emergence of scaling in random networks. Science. 1999;286(5439):509–512. pmid:10521342
  110. Karoński M, Scheinerman ER, Singer-Cohen KB. On random intersection graphs: The subgraph problem. Combinatorics, Probability and Computing. 1999;8(1-2):131–159.
  111. Chung F, Lu L. Connected components in random graphs with given expected degree sequences. Annals of combinatorics. 2002;6(2):125–145.
  112. Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. In: European conference on principles of data mining and knowledge discovery. Springer; 2005. p. 133–145.
  113. Murai T, Nakata M, Sato Y. A note on filtration and granular reasoning. In: Annual Conference of the Japanese Society for Artificial Intelligence. Springer; 2001. p. 385–389.
  114. Kannan H, Saucan E, Roy I, Samal A. Persistent homology of unweighted complex networks via discrete Morse theory. Scientific reports. 2019;9:1–18. pmid:31554857
  115. Petri G, Scolamiero M, Donato I, Vaccarino F. Topological strata of weighted complex networks. PloS one. 2013;8(6):e66506. pmid:23805226
  116. Stolz BJ, Harrington HA, Porter MA. The topological “shape” of Brexit. arXiv preprint arXiv:1610.00752. 2016.
  117. Stolz BJ, Harrington HA, Porter MA. Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2017;27(4):047410. pmid:28456167
  118. Ignacio PSP, Darcy IK. Tracing patterns and shapes in remittance and migration networks via persistent homology. EPJ Data science. 2019;8(1):1.
  119. Nguyen M, Aktas M, Akbas E. Bot detection on social networks using persistent homology. Mathematical and Computational Applications. 2020;25(3):58.
  120. Piangerelli M, Maestri S, Merelli E. Visualising 2-simplex formation in metabolic reactions. Journal of Molecular Graphics and Modelling. 2020;97:107576. pmid:32179422
  121. Giunti B, Houry G, Kerber M. Average complexity of matrix reduction for clique filtrations. arXiv preprint arXiv:2111.02125. 2021.
  122. Topaz CM, Ziegelmeier L, Halverson T. Topological data analysis of biological aggregation models. PLoS One. 2015; p. e0126383. pmid:25970184
  123. Solo V, Poline JB, Lindquist M, Simpson SL, Bowman CMK D, Cassidy B. Connectivity in fMRI: a review and preview. IEEE Transactions on Medical Imaging. 2018; p. in press.
  124. Chung MK, Smith A, Shiu G. Reviews: Topological Distances and Losses for Brain Networks. arXiv e-prints. 2020; p. arXiv–2102.08623.
  125. Bauer U. Ripser: efficient computation of Vietoris–Rips persistence barcodes. Journal of Applied and Computational Topology. 2021;5(3):391–423.
  126. Sharathkumar R, Agarwal PK. Algorithms for the transportation problem in geometric settings. In: Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms. SIAM; 2012. p. 306–317.
  127. Edmonds J, Karp RM. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM (JACM). 1972;19:248–264.
  128. Kerber M, Morozov D, Nigmetov A. Geometry helps to compare persistence diagrams. Journal of Experimental Algorithmics. 2017;22.
  129. Chung MK, Xie L, Huang SG, Wang Y, Yan J, Shen L. Rapid Acceleration of the Permutation Test via Transpositions. In: International Workshop on Connectomics in Neuroimaging. vol. 11848. Springer; 2019. p. 42–53.
  130. Rieck JR, Baracchini G, Nichol D, Abdi H, Grady CL. Dataset of functional connectivity during cognitive control for an adult lifespan sample. Data in Brief. 2021;39:107573. pmid:34877370
  131. Rieck JR, Baracchini G, Nichol D, Abdi H, Grady CL. Reconfiguration and dedifferentiation of functional networks during cognitive control across the adult lifespan. Neurobiology of Aging. 2021;106:80–94. pmid:34256190