Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph

  • Yang Wang,

    Affiliation Department of Systems Science, School of Management and Center for Complexity Research, Beijing Normal University, Beijing, People's Republic of China

  • Zengru Di,

    Affiliation Department of Systems Science, School of Management and Center for Complexity Research, Beijing Normal University, Beijing, People's Republic of China

  • Ying Fan

    yfan@bnu.edu.cn

    Affiliation Department of Systems Science, School of Management and Center for Complexity Research, Beijing Normal University, Beijing, People's Republic of China

Correction

13 Dec 2011: Wang Y, Di Z, Fan Y (2011) Correction: Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph. PLOS ONE 6(12): 10.1371/annotation/1935b388-2831-4fb1-b8f2-914ab91c1ddc. https://doi.org/10.1371/annotation/1935b388-2831-4fb1-b8f2-914ab91c1ddc View correction

Abstract

Background

Many complex systems can be represented as networks, and how a network breaks up into subnetworks or communities is of wide interest. However, the development of a method to detect nodes important to communities that is both fast and accurate is a very challenging and open problem.

Methodology/Principal Findings

In this manuscript, we introduce a new approach to characterize the node importance to communities. First, a centrality metric is proposed to measure the importance of network nodes to community structure using the spectrum of the adjacency matrix. We define the node importance to communities as the relative change in the eigenvalues of the network adjacency matrix upon their removal. Second, we also propose an index to distinguish two kinds of important nodes in communities, i.e., “community core” and “bridge”.

Conclusions/Significance

Our indices are only relied on the spectrum of the graph matrix. They are applied in many artificial networks as well as many real-world networks. This new methodology gives us a basic approach to solve this challenging problem and provides a realistic result.

Introduction

Networks, despite their simplicity, represent the interaction structure among components in a wide range of real complex systems, from social relationships among individuals, to interactions of proteins in biological systems, to the interdependence of function calls in large software projects. The network concept has been developed as an important tool for analyzing the relationship of structure and function for many complex systems in the last decades[1][5]. Many real-world systems show the existence of structural modules that play significant and defined functional roles, such as friend groups in social networks, thematic clusters on the world wide web, functional groups in biochemical or neural networks [6]. Exploring network communities is important for the reasons listed below [7]: 1) communities reveal the network at a coarse level, 2) communities provide a new aspect for understanding dynamic processes occurring in the network and 3) communities uncover relationships among the nodes that, although they can typically be attributed to the function of the system, are not apparent when inspecting the graph as a whole. As a result, it is not surprising that recent years have witnessed an explosion of research on community structure in graphs, and a huge number of methods or techniques have been designed [6], [8][17](see [9] as a review).

It is believed that community structure is important to the function of a system [18][20]. In many situations, it might be desirable to control the function of modular networks by adjusting the structure of communities. For example, in biological systems, one might like to identify the nodes that are key to communities and protect them or disrupt them, such as in the case of lung cancer [19]. In epidemic spreading, one would like to find the important nodes to understand the dynamic processes, which could yield an efficient method to immunize modular networks [20]. Such strategies would greatly benefit from a quantitative characterization of the node importance to community structure. Some important work related to this topic has been proposed. In 2006, Newman proposed a community-based metric called “Community Centrality” to measure node importance to communities [8]. His basic idea relies on the modularity function . Those vertices that contribute more to are more important for the communities than those vertices that contribute less. Kovacs et al. also proposed an influence function to measure the node importance to communities [21].

In fact, the important nodes can have distinct functions with respect to community structure. Some previous studies have also revealed such classifications. Guimera et al. have proposed a classification of the nodes based on their roles within communities, using their within-module degree and their participation coefficient [22]. They divided the hubs into three categories: provincial hubs, connector hubs and kinless hubs. Other approaches have also been suggested to discuss the connection between nodes and modularity in biological networks, by dividing hub nodes into two categories called “party hubs” and “date hubs” [23][25]. When removed from the network, party and date hubs have strikingly distinct effects on the overall topology of the network. Recently, Kovacs et al. proposed an interesting approach. They introduced an integrative method family to detect the key nodes, overlapping communities and “date” and “party” hubs [21]. In a very recent work, the authors mentioned that modular networks naturally allow the formation of clusters, and hubs connecting the modules would enhance the integration of the whole network, such as in the case of neuron networks [26]. As a result, it is intuitive that nodes that are important to communities can be divided into “community cores” and “bridges”. However, using the previous methods such as participation coefficient and the influence function to distinguish these two kinds of vertices, the exact communities of the network must first be given [21], [22]. In contrast, it is interesting to characterize node importance to communities without knowing the exact partition of the network.

It is understood that the adjacency matrix contains all the information of the network. Developing methods based only on the adjacency matrix of the network to detect important nodes to communities and then distinguish them as either “community core” or “bridge” is an interesting and important problem in network research. In this manuscript, based only on the adjacency matrix of the network, we try to access the fundamental questions: how to evaluate the node importance to communities and how to distinguish different kinds of important nodes? It is implied that in many cases the spectrum of the adjacency matrix gives an indication of the community structure in the network [27]. If the network has strong communities, the largest eigenvalues of the adjacency matrix are significantly larger than the magnitudes of all the other eigenvalues. These large eigenvalues are key quantities to the community structure. For this reason, we suggest a basic approach to solve the above open problem using the spectrum of the graph. We define the importance of nodes to communities as the relative change in the largest eigenvalues of the network adjacency matrix upon their removal. Furthermore, using the eigenvectors of the graph Laplacian, we divide the important nodes into community cores and bridges. We apply our method to many networks, including artificial networks and real-world networks. This new methodology gives us a basic approach to solve this challenging problem and provides a realistic result.

Methods

Centrality Metric Based on the Spectrum of the Adjacency Matrix

We consider a binary network with nodes. The adjacency matrix is the matrix with elements if there is an edge joining vertices and , otherwise . We denote each eigenvalue of by and the corresponding eigenvector by , such that . The eigenvectors are orthogonal and normalized. The eigenvalues are ordered by decreasing magnitude: . It is easy to show that is symmetric and the eigenvalues of are real. Consider the case of networks that have communities. It is implied that when these communities are disconnected, each one has its own largest eigenvalue. With proper labeling of the nodes, the matrix will have a block matrix structure with blocks. Blocks on the diagonal correspond to the adjacency matrices of the individual communities, while the off-diagonal blocks correspond to the edges between communities; in other words, we can consider them as a perturbation. Therefore, can be written as(1)where is a matrix whose diagonal block elements are the diagonal block elements of and whose off-diagonal block elements are zeros, while is a matrix with zeros on its diagonal blocks and with the off-diagonal blocks of as its off-diagonal block elements. Chauhan et al. have proved that if the perturbation strength is small, the largest eigenvalues of disconnected communities are perturbed more weakly than the perturbation applied [27]. The spectrum of the adjacency matrix of a network gives a clear indication of the number of communities in the network. If the network has strong communities, the largest eigenvalues are well separated from others. These eigenvalues are key quantities to the community structure.

For this reason, we define the importance of node to communities as the relative change in the largest eigenvalues of the network adjacency matrix upon its removal:(2)where is the number of communities. To avoid the computational cost, we use perturbation theory to provide approximations of in terms of the corresponding eigenvector . Let us denote the matrix before the removal of the node by and the matrix after the removal by ; the eigenvalue of this matrix is , and the corresponding eigenvector is . For large matrices, it is reasonable to assume that the removal of a node has a small effect on the whole matrix and the spectral properties of the network, so that and are small. We obtain(3)

The effect on the adjacency matrix of removing node is given by . We cannot assume that the is small because , so we set where is small and is the unit vector for the component. Left multiplying (3) by and neglecting second order terms and , we obtain(4)

For a large network (), we know that ; therefore, we can write(5)Because , we obtain(6)Finally, the importance of node to the community structure is obtained by(7)where is the number of communities, is the th element of and lies in the interval . If is large, node is important to the community structure; otherwise, is on the periphery of the community.

If a network which has nodes and communities, it indicates that . In order to let the sum of the index scales to 1, we define the new index as that obeys . Then we consider an ER random network with nodes as a null model, the network is homogeneous and there expects no important nodes to communities. So the index of each node in the null model would be . Thus could be a criterion to evaluate the significance of the nodes. If index of a node is large than we consider it as important nodes.

Using this metric , we can quantify the node importance to the community structure. If the node is important to the community structure, when we remove it from the network, the relative changes of the largest eigenvalues are large; otherwise, the changes are small. Before applying , the value of needs to be determined. The determination of the number of communities is important in community analysis and still open for researchers. Generally speaking, every algorithm for detecting communities should have a method to give the best number of the partition. So there are already some suggestions to determine the number of communities [9]. Using the spectrum of the graph is also an easy way to detect the optimal number of the communities [27], [28]. If is given, our method can characterize the node importance to communities without knowing the exact partition of the network.

Distinguish Two Kinds of Important Nodes

As mentioned above, there are two kinds of nodes that are important to communities. One is the “community core”, and the other is the “bridge” between communities. Each will affect communities deeply upon its removal. When we remove the “community core”, the community structure in the network will become fuzzy, while the community structure will become clear when we remove the “bridge”. See Fig. 1 for an example. Vertices 1 and 8 are the “community cores”, and they organize their respective communities. Meanwhile, node 15 is the “bridge” between the two communities. The “community core” is the leader in the community, and it can organize the function of each community. In contrast, the “bridge” connects the modules and can enhance the integration of the whole network. It is believed that a combination of both segregation and integration, such as in neural systems, is crucial [26]. It is clear that effectively disconnected and fully non-synchronous regions cannot allow collective or integrative action of the elements. Similarly, a fully synchronized regime does not allow separated or segregated performance of the elements. Therefore, both situations are biologically unrealistic, as can be seen from the existence of related conditions, such as epileptic seizures (collective phenomena) and Parkinson's disease (segregated phenomena) [29]. For this reason, both the “community core” and the “bridge” are important to communities, but they play different roles. The metric we proposed before can determine the nodes that are important to communities, but now a method to distinguish these two kinds of important nodes is needed.

thumbnail
Figure 1. Sketch of a network composed of 15 nodes.

The diameter of one vertex is proportional to the centrality metric . Moreover, the color of one vertex is related to the index -score. Red vertices behave like “overlapping” nodes or “bridges” between communities, and yellow vertices often lie inside their own communities.

https://doi.org/10.1371/journal.pone.0027418.g001

In agreement with earlier findings [21], [23][25], we assumed that bridge nodes should have more inter-modular positions than community cores. The existence of bridge nodes often leads to some inter-modular edges. Given a graph, the simplest and most direct way to construct a partition of the graph is to solve the mincut problem (minimize the number of edges between communities ) [30]. In practice, however, this method often does not lead to satisfactory partitions. The problem is that, in many cases, the solution of mincut simply separates one individual vertex from the rest of the graph. Of course, this is not what we want to achieve in clustering, as clusters should be reasonably large groups of points. Due to this shortcoming in the mincut problem, one common objective function to encode the desired information is RatioCut [31]:(8)where is the size of community . If the sizes of the communities are almost the same, the RatioCut problem reduces to the mincut problem.

The Condition of . If the network is divided into only two communities (), we define an index vector with elements:(9)Then the RatioCut function is obtained as follows [28]:(10)where is the number of vertices in the network and is the graph Laplacian. is defined as for and , where is the degree of node . We also have two constraints on : and . Here the partition problem is equal to the problem(11)If the components of the vector are allowed to take arbitrary values, it can be seen immediately that the solution of this problem is given by the vector that is the eigenvector corresponding to the second-smallest eigenvalue of , denoted by . So we can approximate a minimizer of RatioCut by the second eigenvector of . Unfortunately, the components of are only allowed to take two particular values.

Thus, the simplest solution is achieved by assigning vertices to one of the groups according to the sign of the eigenvector . In other words, we assign vertices as follows: if , we assign vertex to community ; otherwise, we assign it to . Assignation priority begins with the most positive and the most negative; the node with the most positive magnitude is first to be assigned to , then the second and so on, while the node with the most negative magnitude is similarly the first to be assigned to . If a node's corresponding element is close to zero, it may have nearly equal membership in both communities, and we can assign it to both communities. In conclusion, if the network is divided into only two communities, we can use this method to characterize which are the “community cores” and which are the “bridge” between communities. If node is a “community core”, is relatively large; otherwise, is near zero.

The Condition of . Consider the division of a network into nonoverlapping communities, where is the number of communities. We define an -index matrix with one column for each community, , by(12)Following the previous section, we obtain(13)where is the trace of a matrix and is the transpose matrix of . is a semi-positive and symmetric matrix. We can write , where is the eigenvector of , and is the diagonal matrix of eigenvalues . We therefore obtain(14)It can also be written as(15)Now we define the vertex vector of as , and let(16)If the network has almost equal-sized communities, then equation (15) can be written as(17)where is the set of vertices belonging to community and is the community size.

Minimizing the RatioCut can be equated with the task of choosing the nonnegative quantities so as to place as much of the weight as possible in the terms corresponding to the low eigenvalues and as little as possible in the terms corresponding to the high eigenvalues. This equates to the following maximization problem:(18)where is a parameter. We could choose if the community structure was clear. To this end, we propose an easy way to distinguish two kinds of important nodes using the theory of the graph Laplacian. If the community structure is quite clear, we focus on the vertex vector magnitude in the first terms, denoted by the :(19)

If the index of a given vertex is nearly zero, it indicates that the presence of that node results in a large RatioCut. Thus it is considered as a “bridge” node. Moreover, it also need to state the criterion of the index . The same as in Eq. (7), for a network with nodes and communities, it indicates that . We can also define the new index as and then . Then we consider an ER random network with nodes as a null model, the network is homogeneous and there expects no “bridge” nodes to communities. So the index of each node in the null model would be . Thus could also be a criterion to evaluate the “bridgeness” of the nodes. If the -score of a given vertex is smaller than , we believe that this vertex has nearly equal membership in more than one community, and it is likely to be the “bridge” of these communities. This discrimination process equates to the “fuzzy” division of the network into communities. In many cases, this type of fuzzy division could result in a more accurate picture of real-world networks.

Our method requires less computational cost than other methods. Since most of the real-world network is sparse, combining the Lanczos and QL algorithms, we expect to be able to find all eigenvalues and eigenvectors of a sparse symmetric matrix in time , where and is the number of edges and nodes, respectively [32]. On the other hand, the method proposed in Ref. [8] is slower than ours since the modularity matrix is not sparse. So from this point of view, our method has the advantage compared with the method proposed in Ref. [8]. On the other hand, the method proposed by Ref. [21] has runtime complexity and .

Results

Now we test the validity of our indices and -score introduced before in various artificial networks and real-world networks.

Artificial Networks

First, we consider a sketch composed of 15 nodes (see Fig. 1) formed by two communities. It is intuitive that vertices 1, 8 and 15 are important to the community structure in this sketch. Vertices 1 and 8 are the so-called “community cores”, and they organize both the communities. Vertex 15 is the “bridge” between communities, and it connects these two communities. As we discussed before, removing vertex 1 or 8 will make the community structure fuzzy, and removing vertex 15 will make it clear.

Here we use the index proposed by Hu et al.[14] to measure the significance of communities:(20)where is the eigenvalue of the graph Laplacian, is the average value of through , is the average degree of the network and is the number of vertices in the network. In networks with strong communities (many links are within communities with very sparse connections outside), is always large. Here we focus on the change of due to the removal of vertices, denoted by . We also use the centrality metric proposed by Newman [8], which we denote here by . The results are shown in Tab. 1. Through , it is implied that vertices 1 and 8 are more important than other vertices because the magnitude of is relatively larger than others. Moreover, their removal makes the communities fuzzy, while vertex 15 acts like a “bridge” between the communities, and its removal makes the communities clear. We can see that our centrality metric performs quite well; it can identify not only the “community cores”, but also the “bridge” between communities. can also identify the “community cores”, but it has some problems. One issue is that its values tend to span a rather small dynamic range from largest to smallest. Moreover, in some cases (such as this sketch), cannot recognize important vertices among communities. In calculating the index , we need to go through every vertex in the network, incurring significant computational cost. In contrast, our method provides a more efficient way, requiring less computational cost, and yields the correct answer.

Here we use the classical GN benchmark presented by Girvens and Newman to test the measurements [12]. Each network has nodes that are divided into four communities (c = 4) with 32 nodes each. Edges between two nodes are introduced with different probabilities, which depend on whether the two nodes belong to the same community or not. Each node has links on average with its fellows in the same community and links with the other communities, and we impose . The communities become fuzzier and thus more difficult to identify as increases. Because the GN benchmark is a homogenous network, there should not be any nodes that are important to the community structure. To check whether our conjecture is correct or not, we let so that the community structure is quite clear and average the result for the GN benchmark over 100 configurations of networks. From the result, all the nodes' index lie in the interval . The mean value of is 0.0078, and the standard deviation is 0.0008. It can be concluded that, in the GN benchmark, there are no nodes that are important to the community structure.

We may also test the method on the more challenging LFR benchmark presented by Lancichinetti et al.[33]. In the LFR benchmark, the degree distribution obeys a power-law distribution , and the sizes of the communities are also taken from a power-law distribution with an exponent . Moreover, each node shares a fraction of its links with other nodes of its own community and a fraction with others in the rest of the network. The community structure can be adjusted by the mixing parameter . Without loss of generality, we let and the size of the network . Our numerical results in the LFR benchmark are shown in Fig. 2. In this case, there is no “bridge” between communities because . We may also calculate the -score, of which the mean value is 0.001 and the standard deviation is . which indicates that there is no obvious “bridge” nodes in LFR benchmark. Moreover, the centrality metric is positively correlated with node degree (), but some vertices have quite high centrality while having relatively low degree, and thus the correlation index is not very high. Moreover, we have varied the parameter in the LFR benchmark and given the changes of indices with the change of . In the related calculations, we used the predetermined number of communities as the in the metrics. Because if the whole network becomes fuzzy and how to determine the community number is a tough problem. We consider the largest degree nodes in both the biggest and the smallest communities and the results are obtained by averaging over 20 independent realizations. From the result in Fig. 3, it is implied that with the network become fuzzy, the index of the largest degree nodes in both the biggest and the smallest communities tend to become bigger while the index -score becomes smaller.

thumbnail
Figure 2. The distribution of index and the correlation between and node degree in LFR benchmark.

(a) The Zipf plot of the nodes' centrality to communities. The dash line indicates the threshold . (b) The centrality metric we propose is correlated with node degree. The parameters in the LFR benchmark are as follows: and the size of the network .

https://doi.org/10.1371/journal.pone.0027418.g002

thumbnail
Figure 3. The indices and -score as a function of the parameter in LFR benchmark.

The parameters in the LFR benchmark are as follows: and the size of the network . The results are obtained by averaging over 20 independent realizations.

https://doi.org/10.1371/journal.pone.0027418.g003

Real-world Networks

We apply our method to some real-world networks, such as the Zachary club network [34], the word association network [35], the scientific collaboration network [36], and the C. elegans neural network [37].

First, we consider a famous example of a social network, the Zachary's karate club network. This network represents the pattern of friendships among members of a karate club at a North American university. It contains 34 vertices, and the links between vertices are the friendships between people. The nodes labeled as 1 and 34 correspond to the club instructor and the administrator, respectively. They had a conflict which resulted in the breakup of the club. Most other nodes have a relationship with node 1, node 34, or both. In this network, . The numerical results are shown in Fig. 4 and Fig. 5. In Fig. 4(a), we can see that nodes 1 and 34 are the most important nodes in the communities. Our method to distinguish important nodes are shown in Fig. 4(b). Node 3 is considered as a “bridge” node between communities and displays a smaller value of -score. Moreover, we compared the “bridge” nodes with overlapping nodes found by the method suggested in Ref. [38]. We found that the two results are usually consistent with each other. That means the bridges are usually overlapping nodes, such as node 3. However, there are some differences. For instance, our method considers vertex 14 as a bridge node while in Ref. [38] the authors doesn't consider it as an overlapping node. However, vertex 14 has the degree 5 and it links both communities so considering it as a bridge node is also acceptable. From what we discussed before, bridge nodes are more likely to be overlapping nodes. Furthermore, we compare our method with Newman's. This result is also shown in Fig. 4(a), and the two metrics are normalized by(21)where is the average value of each index and is the standard deviation of each index. It is implied that these two methods have some differences. In our method, nodes 1 and 34 are absolutely more important than other nodes, while in Newman's method, nodes 2 and 33 are also quite important, even more than node 1. In this network, the modularity function reaches its maximum value when the network is divided into 4 communities; this fact may be the cause of the differences between the results of these two methods. The visualization of the karate network with our two measurements is sketched in Fig. 5. The diameter of each vertex is proportional to the centrality metric . A large diameter indicates an important vertex. Additionally, the color of each vertex is related to the index -score. Red vertices behave like “overlapping” nodes or “bridges” between communities, and yellow vertices often lie inside their own communities.

thumbnail
Figure 4. The usage of our method in Zachary's karate club network.

It is shown that our method works quite well. Nodes 1 and 34 are the instructor and the administrator, respectively. In Fig. 4(a), we can see that these two nodes are more important to the community structure than other nodes. We also compare our method with Newman's and find that the two methods exhibit some differences. In Fig. 4(b), it is implied that Node 3 is likely to be a “bridge” node since it displays a rather low -score.

https://doi.org/10.1371/journal.pone.0027418.g004

thumbnail
Figure 5. Sketch of the Zachary's karate club network, which is composed of 34 vertices.

Vertex diameters indicate the community centrality . The color of each vertex is proportional to the index -score.

https://doi.org/10.1371/journal.pone.0027418.g005

Second, we analyze the word association network starting from the word “Bright”. This network was built on the University of South Florida Free Association Norms [35]. An edge between words A and B indicates that some people associate the word B to the word A. The graph displays four communities, corresponding to the categories Intelligence, Astronomy, Light, Colors. The word Bright is related to all of them by construction. We applied our method to this network, and the results are shown in Fig. 6. From the results, we can observe that our method considers Bright, Sun, Smart, Moon as important nodes to the community structure. It may be inferred from the result that Moon and Smart are the “community cores”, while Bright and Sun are the “bridges” between communities. Indeed, our metric yields the correct answer. For example, Smart is the core of the community Intelligence, while Moon is the core of the community Astronomy. Meanwhile, the -score of node Bright is 0.006, which is close to zero. We would therefore conclude that it is a “bridge” between communities, and Bright is in fact the “bridge” among these four communities, as the network was originally derived from it. Moreover, we have investigated the effect of node removal on the indices and and the results show that the removal of “community core” makes the network fuzzy while the community structure becomes clear when the “bridge” is removed.

thumbnail
Figure 6. Index and -score for the nodes of the word association network.

The node importance versus vertex rank is shown in (a). In (b), we distinguish “community cores” and “bridges” using the index -score.

https://doi.org/10.1371/journal.pone.0027418.g006

We may also apply our method to social networks, such as the scientist collaboration network [36], and neural networks, such as the C. elegans neural network [37]. We analyzed the largest connected component of each network. The scientist collaboration network represents scientists whose research centers on the properties of networks of one kind or another. There are 379 vertices, representing scientists who are divided into 12 communities. Edges are placed between scientists who have published at least one paper together. The neural network of C. elegans contains 302 neurons and 2,359 links. This network is divided into 3 communities, with each node representing a neuron and each link representing a synaptic connection between neurons. Here we consider the C. elegans neural network to be undirected. The results are shown in Fig. 7.

thumbnail
Figure 7. The usage of our method in scientist collaboration network and C. elegans neural network.

The centrality metric and -score for the scientist collaboration network (a,b). The centrality metric and -score are also calculated in the C. elegans neural network (c,d).

https://doi.org/10.1371/journal.pone.0027418.g007

In the scientist collaboration network, our centrality metric identifies “group leaders”, such as M. Newman, S. Boccaletti, and A. Barabasi. Their -scores are not very large because they often have some collaboration between scientists outside their own communities. We can also find so-called “community cores” based on our method, such as R. Sole, and “bridge” vertices among some communities, such as B. Kahng. As we know, the C. elegans neural networks are composed of sensory neurons, interneurons and motor neurons. The neurons with high centrality metrics often have the most important functions, and all of them are interneurons, such as , , , and . These classes, which synapse onto motor neurons in the ventral cord, are among the most prominent neurons in the whole nervous system. They generally have larger-diameter processes than other neurons and have many synaptic connections [37], [39]. As a result, they have larger than other vertices, while the typical -score in these classes is quite small. In the C. elegans neural network, most of the important nodes are likely to be “bridge” nodes since the connection between communities is more necessary and frequent due to some special functions.

Applications in Weighted networks

Our method can be generalized to weighted networks because the adjacency matrix in an undirected weighted network is real and symmetric. Thus, in weighted networks, the importance of a node and its role in communities are also characterized by its and -score. Let us first consider an artificial weighted network. We use similarity weight in this weighted network. A higher weight means a closer relationship between vertices. At first, 10 nodes form a complete network and are divided into two communities with 5 nodes each. We assign vertices 4 and 9 as the core of each community, each of which has links with weight 2 connecting to vertices within its community and weight 0.2 connecting to outside vertices. All other intra-connections have weight 1, and all other inter-connections have weight 0.2. Then we introduce vertex 11 as the bridge between the two communities. It connects to all 10 nodes with weight 1. The index and -score for each node are given in Tab. 2. The results indicate that vertices 4, 9 and 11 are more important than the other vertices, while vertex 11 is a “bridge” between these two communities. Our method works quite well in this small artificial weighted network.

thumbnail
Table 2. Centrality metrics and -score in a complete weighted network.

https://doi.org/10.1371/journal.pone.0027418.t002

As an example of a real-world weighted network, we investigate the collaboration network among scientists working at the Santa Fe Institute (the SFI network). Here we consider it as a weighted, undirected network. Collaboration events between the scientists can be repeated again and again, and a higher frequency of collaboration usually indicates a closer relationship. Furthermore, weights can be assigned to the scientists' collaboration quite naturally: an article with authors corresponds to a collaboration act of weight between every pair of its authors [40]. The results for the SFI collaboration network are sketched in Fig. 8. Vertex diameters indicate the community centrality . The color of each vertex is proportional to the index -score. Red vertices behave like “overlapping” nodes or “bridges” between communities, and yellow vertices often lie inside their own communities. We do not know the specific names; however, we observe that the positions of the large vertices are just like the “group leaders”. Vertices 2, 12 and 24 are so-called “community cores” in communities because their -scores are quite large. In fact, they are the group leaders in the fields of Mathematical Ecology, Statistical Physics and Structure of RNA, respectively. However, vertices 1, 9 and 11 are the “bridges” between communities, and they have relative small -scores. Interestingly, the result in the weighted network is different from the one in the corresponding unweighted network. It can be concluded that the edge weight may affect the result. For example, vertex 9 and vertex 11 collaborate quite often; this makes both of them quite important in a weighted network, while in an unweighted network, neither of them is very important to the community structure.

thumbnail
Figure 8. Sketch of the SFI scientific collaboration network as a weighted, undirected network.

It has 118 scientists. Vertex diameters indicate the community centrality . The color of each vertex is proportional to the index -score.

https://doi.org/10.1371/journal.pone.0027418.g008

Discussion

In this paper, we characterize the node importance to community structure using the spectrum of the graph. The eigenspectrum of the adjacency matrix gives a clear indication of the number of “dominant” communities in a network [27]. We give a centrality metric based on the spectrum of the adjacency matrix of the graph, and it can identify the nodes important to the community structure in many cases. In addition, we propose an index to distinguish the two kinds of important nodes that we term “community cores” and “bridges” using the spectrum of the graph Laplacian. We demonstrate a variety of applications of our method to both artificial and real-world networks representing social and neural networks. Our method works well in many cases without knowing the exact community structure, although the number of communities should be known.

If the network have very heterogeneous cluster sizes the limitation is likely to occur. There are two results for the limitation that are both related with the properties of the adjacency matrix. One is that we cannot find the real community structure when communities are very different in size. In Ref. [27], the authors have proved that if where is the size of the communities, the method cannot detect the small communities. The other problem is that when communities are very different in size, even we know the real communities by other methods, the index may not show the real importance of the node in small communities because the index is also based on the spectrum of the adjacency matrix. Considering a network composed with two isolated communities. The size of the smaller one is always 10 and we define . Let each community be an ER random network with the probability of connecting . The numerical result in Fig. 9 shows the similar limitation of the index . It cannot identify the important nodes in the small communities when the communities are in very different size.

thumbnail
Figure 9. To test the limitation of our method.

Considering a network composed with two communities but these two communities are not connected with each other and the size of the smaller one is always . The figure shows the index as a function of where the probability of connecting an edge between two nodes in each community .

https://doi.org/10.1371/journal.pone.0027418.g009

Our method can also be used in weighted networks. From our result in the SFI network, it can be inferred that edge weight may affect the result. Furthermore, it may generalize to directed networks because the Perron-Frobenius eigenvalues are often real and positive [41]. We have yet to treat the case of directed networks. The identification of such key nodes is important and could potentially be used to identify the organizer of the community in social networks, to develop an immunization strategy in an epidemic process, to identify key nodes in biological networks and so on. We hope our results may be helpful to future research.

Acknowledgments

The authors thank Di Huan, An Zeng, and Hongzhi You for help in the analysis of C. elegans neural network.

Author Contributions

Conceived and designed the experiments: YF ZD. Performed the experiments: YW. Analyzed the data: YW. Wrote the paper: YW ZD.

References

  1. 1. Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74: 47–97.
  2. 2. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45: 467–256.
  3. 3. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286: 509–512.
  4. 4. Watts DJ, Strogatz S (1998) Collective dynamics of ‘small-world’ networks. Nature 393: 440–442.
  5. 5. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU (2006) Complex networks: Structure and dynamics. Physics Reports 424: 175–308.
  6. 6. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99: 7821–7826.
  7. 7. Lancichinetti A, Kivela M, Saramaki J, Fortunato S (2010) Characterizing the community structure of complex networks. PloS ONE 5: e11976.
  8. 8. Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74: 036104.
  9. 9. Fortunato S (2010) Community detection in graphs. Physics Reports 486: 75–174.
  10. 10. Wu F, Huberman BA (2004) Finding communities in linear time: A physics approach. Eur Phys J B 38: 331–338.
  11. 11. Gfeller D, Ghappelier JC, De Los Rios P (2005) Finding instabilities in the community structure of complex networks. Phys Rev E 72: 056135.
  12. 12. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103: 8577–8582.
  13. 13. Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72: 027104.
  14. 14. Hu Y, Ding Y, Fan Y, Di Z (2010) How to measure significance of community structure in complex networks. ArXiv:1002.2007v1.
  15. 15. Hu Y, Nie Y, Yang H, Cheng J, Fan Y, et al. (2010) Measuring the significance of community structure in complex networks. Phys Rev E 82: 066106.
  16. 16. Karrer B, Levina E, Newman MEJ (2008) Robustness of community structure in networks. Rhys Rev E 77: 046119.
  17. 17. Lancichinetti A, Radicchi F, Ramasco JJ (2010) Statistical significance of communities in networks. Phys Rev E 81: 046110.
  18. 18. Spirin V, Mirny LA (2003) Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci USA 100: 12123–12128.
  19. 19. Sun L, Li M, Jiang L, Tan L (2007) Comparative analysis of the gene co-regulatory network of normal and cancerous lung. Physica A 384: 739–746.
  20. 20. Liu Z, Hu B (2005) Epidemic spreading in community networks. Europhys Lett 72: 315.
  21. 21. Kovacs IA, Palotai R, Szalay MS, Csermely P (2010) Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics. PLoS ONE 5: e12528.
  22. 22. Guimera R, Amaral LAN (2005) Functional cartography of complex metabolic networks. Nature 433: 895–900.
  23. 23. Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, et al. (2004) Evidence for dynamically organized modularity in the yeast proteincprotein interaction network. Nature 430: 88–93.
  24. 24. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, et al. (2007) Still stratus not altocumulus: further evidence against the date/party hub distinction. PLoS Biol 5: e154.
  25. 25. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, et al. (2006) Stratus not altocumulus: A new view of the yeast protein interaction network. PLoS Biol 4: e317.
  26. 26. Zhao M, Zhou C, Chen Y, Hu B, Wang B (2010) Complexity versus modularity and heterogeneity in oscillatory networks: Combining segregation and integration in neural systems. Phys Rev E 82 82: 046225.
  27. 27. Chauhan S, Girvan M, Ott E (2009) Spectral properties of networks with community structure. Phys Rev E 80: 056114.
  28. 28. Luxburg UV (2007) A tutorial on spectral clustering. Statistics and Computing 17: 395–416.
  29. 29. Stam CJ (2005) Nonlinear dynamical analysis of eeg and meg: Review of an emerging field. Clin Neurophysiol 116: 2266–2301.
  30. 30. Fiedler M (1973) Algebraic connectivity of graphs. Czech Math J 23: 298–305.
  31. 31. Hagen L, Kahng A (1992) New spectral methods for ratio cut partitioning and clustering. IEEE Trans Computer-Aided Design 11: 1074–1085.
  32. 32. Newman MEJ (2010) Networks: An Introduction. Oxford UK: Oxford University Press.
  33. 33. Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78: 046110.
  34. 34. Zachary WW (1977) An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33: 452–473.
  35. 35. Nelson DL, McEvoy CL, Schreiber TA (1998) The university of south florida word association, rhyme, and word fragment norms.
  36. 36. URL http://www-personal.umich.edu/mejn/netdata/.
  37. 37. White JG, Southgate E, Thomson JN, Brenner S (1986) The structure of the nervous system of the nematode caenorhabditis elegans. Philos Trans R Soc London B 314: 1–340.
  38. 38. Li D, Leyva I, Almendral JA, Sendi∼na Nadal I, Buldú JM, et al. (2008) Synchronization interfaces and overlapping communities in complex networks. Phys Rev Lett 101: 168701.
  39. 39. Tsalik EL, Hobert OL (2003) Functional mapping of neurons that control locomotory behavior in caenorhabditis elegans. Neurobiol J 56: 178–197.
  40. 40. Ramasco JJ, Morris SA (2003) Social inertia in collaboration networks. Phys Rev E 73: 016122.
  41. 41. MacCluer CR (2000) The many proofs and applications of perron's theorem. SIAM Rev 42: 487–498.