From graph topology to ODE models for gene regulatory networks

Xiaohan Kang; Bruce Hajek; Yoshie Hanzawa

doi:10.1371/journal.pone.0235070

Abstract

A gene regulatory network can be described at a high level by a directed graph with signed edges, and at a more detailed level by a system of ordinary differential equations (ODEs). The former qualitatively models the causal regulatory interactions between ordered pairs of genes, while the latter quantitatively models the time-varying concentrations of mRNA and proteins. This paper clarifies the connection between the two types of models. We propose a property, called the constant sign property, for a general class of ODE models. The constant sign property characterizes the set of conditions (system parameters, external signals, or internal states) under which an ODE model is consistent with a signed, directed graph. If the constant sign property for an ODE model holds globally for all conditions, then the ODE model has a single signed, directed graph. If the constant sign property for an ODE model only holds locally, which may be more typical, then the ODE model corresponds to different graphs under different sets of conditions. In addition, two versions of constant sign property are given and a relationship between them is proved. As an example, the ODE models that capture the effect of cis-regulatory elements involving protein complex binding, based on the model in the GeneNetWeaver source code, are described in detail and shown to satisfy the global constant sign property with a unique consistent gene regulatory graph. Even a single gene regulatory graph is shown to have many ODE models of GeneNetWeaver type consistent with it due to combinatorial complexity and continuous parameters. Finally the question of how closely data generated by one ODE model can be fit by another ODE model is explored. It is observed that the fit is better if the two models come from the same graph.

Citation: Kang X, Hajek B, Hanzawa Y (2020) From graph topology to ODE models for gene regulatory networks. PLoS ONE 15(6): e0235070. https://doi.org/10.1371/journal.pone.0235070

Editor: Enrique Hernandez-Lemus, Instituto Nacional de Medicina Genomica, MEXICO

Received: January 24, 2020; Accepted: June 8, 2020; Published: June 30, 2020

Copyright: © 2020 Kang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The computer simulation source code is available at https://github.com/Veggente/graph-ode.

Funding: This work was supported by the Plant Genome Research Program from the National Science Foundation (NSF-IOS-PGRP-1823145) to B.H. and Y.H., and by the Communication and Information Foundations program from the National Science Foundation (NSF-CCF-CIF-1900636) to B.H.

Competing interests: The authors have declared that no competing interests exist.

Introduction

A gene regulatory network is a collection of molecular classes such that each molecular class interacts with a small number of other molecular classes, creating a sparse graph structure [1]. A goal of systems biology is to understand gene regulatory networks and infer them from data [2, 3]. A directed graph with vertices representing genes and signed edges representing gene-to-gene interactions, also known as a circuit model [4] or a logical model [5], is a model with a high level of abstraction (see S1 Appendix). The vertices of such graph models often consist only of the genes and not the properties of the derived proteins, because the latter information is usually not available. An ordinary differential equation (ODE) model is far more detailed than a graph model: it quantitatively describes the dynamics of the time-varying mRNA and protein concentrations of the genes, and can be used to capture complex effects, including protein–protein interaction, post-translational modification, environmental signals, diffusion of proteins in different parts of the cell, and various time constants. As a result, ascribing a directed graph to a biologically plausible gene regulatory network can miss important biological details and dynamics because of the abstraction. However, it is significantly more challenging to ascribe a particular ODE model to a gene regulatory network than to ascribe a directed graph, because an ODE model requires much finer classification and possibly orders of magnitude more data. As one example, the work [6] is notable for successful identification of an ODE model that captures the gene regulatory network underlying the dynamics of the circadian clock. The ODE model in [6] is based on a number of previous empirical and modeling studies, and it is shown that parameters for the model can be selected to give a good match to the data. In general, however, without such prior knowledge, the relation between the graph models and the ODE models is unclear. The purpose of this paper is to explore the connections between the two types of models.

We propose a property of ODE models, called the constant sign property (CSP), such that an ODE model corresponds to a single graph model under a set of conditions if and only if the ODE model satisfies CSP under that set of conditions. An ODE model is said to satisfy the global constant sign property (GCSP) if it satisfies CSP under all conditions, in which case the ODE model corresponds to a single graph model. Typically, an ODE model corresponds to different graph models under different conditions, reflecting the context-dependent and time-varying nature of biological systems [7, 8]. An ODE model that does not satisfy GCSP is illustrated in Fig 1.


Fig 1. Network reconstruction for an ODE model in the study [9] without global CSP.

The ODE model f governs the dynamics of all parts of the plant, and expression data collected from different parts of a plant (flower vs. leaf) can correspond to different graph models.

https://doi.org/10.1371/journal.pone.0235070.g001

One particularly rich class of ODE models that satisfy GCSP is based on GeneNetWeaver [10, 11], the software used to generate expression data in DREAM challenges 3–5 [11–13] and recently applied to single-cell analysis [14, 15]. In these ODE models a layer of intermediate elements called modules is constructed with transcription factors (TFs) as their input and target genes as their output. The activity level of a module depends on its input and its type, and determines the production rate of its output. The modules model the binding of protein complexes to DNA in transcriptional regulation. TFs can regulate the target gene through one or multiple modules. Assuming that for each TF and each target gene there is only one module that takes the TF as an input and the target gene as an output, we show that CSP is satisfied, so each GeneNetWeaver ODE model has a well-defined graph model associated with it. The combinatorial number of possible module configurations (i.e., the number of the modules and their input and output) and the continuous-valued parameters make the GeneNetWeaver ODE models extremely rich.

The organization of this paper is as follows. In the first subsection of the Materials and Methods section, we describe the ODE models and the graph models, and propose two notions of CSP. In the second subsection of the Materials and Methods section, we describe ODE models based on GeneNetWeaver. The Results section has three subsections. In the first, a relation of the two notions of CSP is provided. In the second, the GeneNetWeaver ODE models are shown to satisfy the constant sign property, and their complexity is investigated. In the third, a case study of a core soybean flowering network based on the literature is presented to demonstrate the use of the GeneNetWeaver ODE models. First it is illustrated that a single signed, directed graph model has a large space of consistent ODE models. Second, to study how different the GeneNetWeaver ODE models are, we explore the problem of numerically fitting parameters of one ODE model to synthetic expression data generated from another. The generalization, implication and limitation of CSP are discussed before the concluding remarks.

Materials and methods

ODE model and constant sign property

In this section we define the constant sign property, a property under which ODE models are consistent with signed directed graphs. Roughly speaking, CSP holds when unilaterally increasing the expression level of one gene causes the expression level of another gene to move in one direction. In other words, the effect of one regulator gene has a constant sign on a target gene. In rare cases, CSP may hold globally, regardless of the expression levels of all the genes and the concentrations of any other molecular classes. More generally, CSP may hold only for a set of expression levels and system parameters, leading to a local definition. We present the precise definition of CSP in this section.

Let x₁(t), x₂(t), …, x_n(t) be the mRNA abundances for the n genes (the observables) at time t. Let x_{n+1}(t), x_{n+2}(t), …, x_{n+m}(t) be the protein concentrations (the unobservables) at time t, which may include derived (protein complexes and modifications like protein phosphorylation) and localized (e.g., cytoplasmic and nuclear) proteins. Let x_{n+m+1}(t), x_{n+m+2}(t), …, x_{n+m+l}(t) be the strengths of the chemical and environmental signals (the controllables, e.g., temperature and photoperiod) at time t. Let x(t) = (x_i(t): i ∈ [n + m + l]) be the system state at time t, where [n] denotes the set of integers {1, 2, …, n}. Let λ be the parameters of the ODE model and let f_i(x, λ) be the time derivative of x_i as a function of the (n + m + l)-dimensional system state x and the parameters λ for i ∈ [n + m]. Note the domain of f_i is assumed to be the entire Euclidean space rather than a subset of it without loss of generality because one can always restrict f_i to a subset of states that x takes. Examples of f for the single-input case (n + m + l = 1) include the Michaelis–Menten kinetics and the more general Hill kinetics. Examples of f for the multi-input case (n + m + l ≥ 2) include the Shea–Ackers model [16, 17], which is the average production rate based on a Gibbs measure of the control states, and the GeneNetWeaver model to be discussed later in this paper, which models the additive effect of multiple intermediate Shea–Ackers type modules. Both the Shea–Ackers model and the GeneNetWeaver model generalize the Hill kinetics to multi-input scenarios in their own ways and are, among many other sophisticated ODE models, within the framework of ODE models in this paper.

Formally, given the numbers of molecular classes (i.e., n classes of mRNAs, m classes of proteins, and l classes of molecular signals), the dynamics of an ODE model are characterized by the collection of time derivatives for the uncontrollable variables f = (f_i: i ∈ [n + m]). In the rest of the paper an ODE model refers to the collection of the functions f. The trajectories of the mRNA and protein concentrations evolving with time depend on three ingredients: the initial conditions (x_i(0): i ∈ [n + m]) of the mRNAs and proteins at time 0, the predefined external signal strengths (x_i(t): n + m + 1 ≤ i ≤ n + m + l, t ≥ 0) for all time, and the parameters λ. The trajectories can then be obtained by solving the following initial value problem with the given initial values x_i(0), i ∈ [n + m]:

$$\frac{d x_i(t)}{dt} = f_i(x(t), \lambda), \qquad i \in [n + m],\ t \ge 0.$$

Note the signals (x_i:n + m + 1≤i≤n + m + l) are exogenously controlled and not solved via the equations. In this paper we assume existence and uniqueness of the solution on the entire positive time horizon for ease of exposition. The concept of CSP can be easily generalized to ODE models where only local solutions exist.
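As a minimal sketch of such an initial value problem, the following Python code integrates a toy model with one mRNA (n = 1), one protein (m = 1), and one exogenous signal (l = 1) by the classical fourth-order Runge–Kutta method. The rate functions, the parameter names, and the constant signal are illustrative assumptions and are not taken from the models studied in this paper; the only point is that the signal is supplied as a given function of time and is not integrated.

```python
def rk4_solve(f, x0, signal, lam, t_end, dt=0.01):
    """Integrate dx/dt = f(x, u, lam) for the internal state x (mRNAs and
    proteins). The exogenous signal u = signal(t) is prescribed, not solved."""
    def rhs(state, time):
        return f(state, signal(time), lam)

    x, t = list(x0), 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(x, t)
        k2 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], t + 0.5 * dt)
        k3 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], t + 0.5 * dt)
        k4 = rhs([xi + dt * ki for xi, ki in zip(x, k3)], t + dt)
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += dt
    return x


def toy_model(x, u, lam):
    """Hypothetical rates: the signal u drives mRNA production through a
    Michaelis-Menten term, and the protein is translated from the mRNA."""
    mrna, protein = x
    d_mrna = lam["v_max"] * u / (lam["k"] + u) - lam["delta_m"] * mrna
    d_protein = lam["r"] * mrna - lam["delta_p"] * protein
    return [d_mrna, d_protein]


# Hypothetical parameter values, chosen only for illustration.
lam = {"v_max": 2.0, "k": 1.0, "delta_m": 1.0, "delta_p": 0.5, "r": 0.5}


def constant_signal(t):
    return 1.0  # exogenous signal held fixed for all t


x_final = rk4_solve(toy_model, [0.0, 0.0], constant_signal, lam, t_end=40.0)
```

With these illustrative values the mRNA and the protein both settle near 1, the steady state obtained by setting the two rates to zero.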

Infinitesimal monotonicity.

We first define a version of monotonicity called infinitesimal monotonicity such that CSP using this definition of monotonicity can be applied to a broad class of ODE models.

Roughly speaking, infinitesimal monotonicity characterizes the monotone influence of one observed variable on another over a sufficiently short period of time. Such monotonicity depends on the current system state. For each regulator–target pair, to avoid external and indirect influence, we clamp the exogenous signals as well as the observed variables other than the target to their initial values, so only the unobserved variables and the target observed variable are allowed to change with time. The clamped value of the regulator can be perturbed. A change in the constant value of the regulator can cause a change in the target observed variable in continuous time, possibly through one or multiple unobserved variables. The system with the input at the regulator observable and output at the target observable is thus treated as a black box in the sense that one does not need to know its internal states (the unobservables) to determine the infinitesimal monotonicity of the system. This assumes that the initial internal states are fixed.

Given the ODE model f, and given a state x and parameters λ, let j be the target gene and let the dynamics of the clamped ODE model be driven by f^(j), defined by f^(j)_k = f_k if k ∈ {j} ∪ [n + 1: n + m] and f^(j)_k = 0 otherwise, for any k ∈ [n + m + l]. Here [a: b] denotes the set of integers {a, a + 1, …, b}. Then f^(j) determines the dynamics of a system where the mRNA abundances and exogenous signals remain constant across time except for the mRNA abundance of gene j. Fix a potential regulator gene i ≠ j and let η^(j)(t, h) be the solution at time t of the initial value problem with initial condition (x_i + h, x_−i), dynamics f^(j), parameters λ. Note here η^(j) also includes the clamped exogenous signals. Also note that for any t we have η^(j)_i(t, h) = x_i + h and η^(j)_k(t, h) = x_k for every clamped coordinate k ≠ i.

The following definition gives a precise characterization of the target gene expression to be strictly increasing or decreasing with respect to the regulator gene expression in a small future time period.

Definition 1 (Infinitesimal monotonicity). For an ODE model f at state x with parameters λ and (i, j)∈[n]² with i ≠ j, the infinitesimal monotonicity for i on j is given by

Equivalently, in less mathematical terms, B_inf(i, j, x, λ) = ∅ indicates gene i does not affect gene j at state x and parameters λ. The cases with B_inf(i, j, x, λ) = {1} and {−1} indicate gene i activates or represses gene j, respectively, at state x and parameters λ in a small time period with small perturbation. The case with B_inf(i, j, x, λ) = {1, −1} indicates gene i does not affect gene j in a monotone way.

Remark 1. Note the case B_inf(i, j, x, λ) = {1, −1} can happen when the expression level of the target gene j reaches the maximum with respect to x_i, so that a change of x_i in either direction will cause the solution to decrease for small t, in which case the monotonicity is indeterminate (neither increasing nor decreasing).
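The clamping construction can be probed numerically. The sketch below is an illustration under stated assumptions rather than the formal definition: it assumes that the sign set collects the signs of (change in the target) × h observed for small t > 0 and small h ≠ 0, and it only samples a finite grid of (t, h) values, so it can suggest but not prove a sign. The function names, the toy three-variable model, and the tolerance are hypothetical.

```python
def clamped_solution(f, x, lam, j, unobserved, t_end, dt=1e-3):
    """Integrate the clamped system with RK4: only the target mRNA j and the
    unobserved variables evolve; every other coordinate stays constant."""
    free = set(unobserved) | {j}

    def rhs(state):
        full = f(state, lam)
        return [full[k] if k in free else 0.0 for k in range(len(state))]

    state = list(x)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


def probe_b_inf(f, x, lam, i, j, unobserved,
                times=(0.02, 0.05), perturbations=(-0.01, 0.01), tol=1e-12):
    """Collect the observed signs of (target change) * h on a small grid."""
    signs = set()
    for t in times:
        base = clamped_solution(f, x, lam, j, unobserved, t)[j]
        for h in perturbations:
            shifted = list(x)
            shifted[i] += h  # perturb the clamped regulator
            change = clamped_solution(f, shifted, lam, j, unobserved, t)[j] - base
            if abs(change) > tol:
                signs.add(1 if change * h > 0 else -1)
    return signs


def toy_rates(state, lam):
    """Hypothetical model: state = [mRNA 1, mRNA 2, protein 1]; protein 1
    activates (sign > 0) or represses (sign < 0) the production of mRNA 2."""
    x1, x2, y1 = state
    occupancy = y1 / (1.0 + y1)
    production = occupancy if lam["sign"] > 0 else 1.0 - occupancy
    return [1.0 - x1, production - x2, x1 - y1]


x0 = [1.0, 0.5, 1.0]
activation = probe_b_inf(toy_rates, x0, {"sign": 1}, i=0, j=1, unobserved=[2])
repression = probe_b_inf(toy_rates, x0, {"sign": -1}, i=0, j=1, unobserved=[2])
no_effect = probe_b_inf(toy_rates, x0, {"sign": 1}, i=1, j=0, unobserved=[2])
```

In this toy model the probe reports a positive sign for the activating variant, a negative sign for the repressing variant, and an empty set for the reverse direction, in which gene 2 has no influence on gene 1.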

In practice the values of x and λ may be unknown, so we are interested in how B_inf varies with x and λ. Usually we expect some level of continuity of B_inf with respect to x and λ, so the infinitesimal monotonicity of the ODE model may be consistent in a small set of (x, λ) pairs, denoted by S. In the case when S equals the entire state–parameter space, the infinitesimal monotonicity is consistent globally. The following definition generalizes Definition 1 by checking the consistency of infinitesimal monotonicity over a set S, and defines an associated graph.

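Definition 2 (Infinitesimal gene regulatory graph). The infinitesimal gene regulatory graph of an ODE model f over a set S of state–parameter pairs (x, λ) is given by a signed directed graph with vertices [n], edges E_inf(S), and edge labels B_inf(S), where the set of edge labels B_inf(S) = (B_inf(i, j, S): (i, j) ∈ [n]², i ≠ j) is defined by B_inf(i, j, S) = ⋃_{(x, λ) ∈ S} B_inf(i, j, x, λ) and the set of edges is E_inf(S) = {(i, j) ∈ [n]²: i ≠ j, B_inf(i, j, S) ≠ ∅}.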

Equivalently, in less mathematical terms, B_inf(i, j, S) = ∅ indicates gene i does not affect gene j when (x, λ) is in S. The case with B_inf(i, j, S) = {1} indicates gene i can increase gene j for some (x, λ) in S, but cannot decrease gene j for any (x, λ) in S. The case with B_inf(i, j, S) = {−1} indicates gene i can decrease gene j for some (x, λ) in S, but cannot increase gene j for any (x, λ) in S. The case with B_inf(i, j, S) = {1, −1} indicates the monotonicity is indeterminate over S.

Definition 3 (Infinitesimal constant sign property). An ODE model f satisfies the infinitesimal constant sign property over a set S of state–parameter pairs if, for every (i, j) ∈ [n]² with i ≠ j, B_inf(i, j, S) = ∅, B_inf(i, j, S) = {1}, or B_inf(i, j, S) = {−1}. In other words, the ODE model satisfies the infinitesimal constant sign property on S if no pair (i, j) has indeterminate monotonicity on S.

Remark 2. The set S represents the set of states where the infinitesimal CSP holds. If S is the entire state space then we say the infinitesimal CSP holds globally. Complex biological systems usually do not satisfy CSP globally, but may satisfy CSP locally over the set S where the system states reside. For example, in Fig 1, the gene expressions in the flowers may be contained in set S₁ where the infinitesimal CSP is satisfied with a gene regulatory graph G₁, while the gene expressions in the leaves may be contained in set S₂ that does not intersect with S₁, and the infinitesimal CSP is satisfied with a different gene regulatory graph G₂.
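The local versus global distinction can be sketched in a few lines of Python. The sketch assumes that the pointwise sign sets have already been obtained for a finite sample of conditions, and that the label of a gene pair over a set of conditions is the union of its pointwise sign sets. The condition names and sign sets below are hypothetical, loosely mirroring the flower and leaf scenario of Fig 1.

```python
def regulatory_graph(pointwise_labels, conditions):
    """Union the pointwise sign sets of each gene pair over the conditions."""
    labels = {}
    for cond in conditions:
        for pair, signs in pointwise_labels[cond].items():
            labels.setdefault(pair, set()).update(signs)
    edges = {pair for pair, signs in labels.items() if signs}
    return edges, labels


def satisfies_csp(labels):
    """CSP holds when no gene pair carries both signs."""
    return all(signs != {1, -1} for signs in labels.values())


# Hypothetical pointwise sign sets for two sampled conditions.
pointwise = {
    "flower": {(1, 2): {1}, (2, 1): set()},
    "leaf": {(1, 2): {-1}, (2, 1): set()},
}

edges_flower, labels_flower = regulatory_graph(pointwise, ["flower"])
edges_both, labels_both = regulatory_graph(pointwise, ["flower", "leaf"])
```

Restricted to the flower condition, the pair (1, 2) has a single sign and CSP holds; over both conditions the same pair carries both signs, so CSP fails even though each condition on its own has a well-defined graph.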

Sum–product monotonicity.

Infinitesimal monotonicity gives a natural notion of monotonicity, but it is expressed in terms of the solutions of the differential equations, and solving the differential equations can be analytically challenging and numerically unstable. Hence, in this section we focus on ODE models with a smooth f and propose another notion of monotonicity that does not require solving the system of ODEs.

Definition 4 (Molecular graph). The molecular graph of an ODE model is a graph whose vertices are the internal molecular classes (i.e., the observables and the unobservables) and whose edges indicate non-constant effects among the internal molecular classes with signs indicating monotonicity of the effects. Formally, given an ODE model f, the molecular graph at state x with parameters λ is a directed graph with vertices [n + m] in which there is an edge from i to j if and only if ∂f_j/∂x_i(x, λ) ≠ 0, and the sign of that edge is the sign of ∂f_j/∂x_i(x, λ).

In other words, there is no edge from i to j if f_j does not actually depend on x_i. See Fig 2(A) for an example of a molecular graph. Note in general we could have edges from unobservables to unobservables (e.g., protein–protein interactions) and from observables to observables (modeling fast translation where mRNA abundances and protein concentrations are considered the same).


Fig 2. A molecular graph and its corresponding gene regulatory graph for the single-loop network in the study [18].

(A) The molecular graph for the ODE model of the single-loop network. Blue edges indicate positive first-order partial derivatives, and red edges indicate negative first-order partial derivatives. (B) The corresponding global gene regulatory graph for (A) with blue edges indicating activation and red edges indicating repression (the constant sign property is satisfied globally under both notions of CSP by Proposition 1).

https://doi.org/10.1371/journal.pone.0235070.g002

The molecular graph represents the interactions among all the molecular classes. However, usually only the mRNA abundances are measured; the proteins and their derived products are not measured, making the molecular graph only partially observed. As a result, one often seeks an induced graph on the mRNA classes, which leads to the following definitions analogous to the clamped systems for infinitesimal monotonicity.

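Definition 5 (Unobserved path of length q for q ≥ 1). Given a molecular graph, the set of unobserved paths from one mRNA to another is the set of paths that do not go through another mRNA. Formally, given n, m, l, the edges of the molecular graph, and i, j ∈ [n] with i ≠ j, the set of unobserved paths of length q connecting i and j is the set of vertex sequences (k_0, k_1, …, k_q) with k_0 = i, k_q = j, k_1, …, k_{q−1} ∈ [n + 1: n + m], and (k_{s−1}, k_s) an edge of the molecular graph for every s ∈ [q].

Definition 6 (Molecular distance). The molecular distance from i to j is the smallest q ≥ 1 for which an unobserved path of length q from i to j exists, and ∞ if there is no such path.

Definition 7 (Sum–product monotonicity). For genes i and j, state x and parameters λ, let q be the molecular distance from i to j. The sum–product monotonicity is defined by B_sum(i, j, x, λ) = ∅ if q = ∞, B_sum(i, j, x, λ) = {sgn Δ(i, j, x, λ)} if q < ∞ and Δ(i, j, x, λ) ≠ 0, and B_sum(i, j, x, λ) = {1, −1} if q < ∞ and Δ(i, j, x, λ) = 0, where

$$\Delta(i, j, x, \lambda) = \sum_{(k_0, \dots, k_q)} \prod_{s=1}^{q} \frac{\partial f_{k_s}}{\partial x_{k_{s-1}}}(x, \lambda),$$

with the sum taken over all unobserved paths of length q from i to j.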

Note B_sum is based only on derivatives of f and does not require solving the ODEs. It plays a role similar to that of B_inf. Thus we can define the sum–product gene regulatory graph and the sum–product constant sign property in the same way as Definitions 2 and 3. A relation between the infinitesimal monotonicity and the sum–product monotonicity is given in Proposition 1 in the Results section.
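Because the sum–product notion needs only first-order partial derivatives, it is straightforward to compute. The following sketch assumes that the relevant quantity is the sum, over the shortest unobserved paths from regulator to target, of the products of the partial derivatives along each path, with an exactly zero sum treated as indeterminate. The Jacobian entries are hypothetical numbers; the five-vertex example is only loosely patterned on a network with a 3-hop positive path and a 4-hop negative path.

```python
def sum_product_sign(jac, n, m, i, j):
    """jac maps (src, dst) to the partial derivative of f_dst with respect to
    x_src. Vertices 0..n-1 are mRNAs; vertices n..n+m-1 are unobserved."""
    unobserved = range(n, n + m)
    frontier = {i: 1.0}            # vertex -> sum of path products so far
    for _ in range(m + 1):         # a shortest path has at most m + 1 hops
        nxt = {}
        for (src, dst), d in jac.items():
            if src in frontier and d != 0.0:
                nxt[dst] = nxt.get(dst, 0.0) + frontier[src] * d
        if j in nxt:               # the molecular distance has been reached
            delta = nxt[j]
            if delta > 0:
                return {1}
            if delta < 0:
                return {-1}
            return {1, -1}         # shortest paths cancel exactly
        # only unobserved vertices may appear in the interior of a path
        frontier = {k: w for k, w in nxt.items() if k in unobserved}
    return set()                   # no unobserved path from i to j


# Hypothetical molecular graph: 0 and 1 are mRNAs, 2 to 4 are unobserved.
jac = {(0, 2): 1.0, (2, 4): 1.0, (2, 3): -1.0, (3, 4): 1.0, (4, 1): 1.0}
forward = sum_product_sign(jac, n=2, m=3, i=0, j=1)
backward = sum_product_sign(jac, n=2, m=3, i=1, j=0)

# Two equally short paths with opposite products cancel.
jac_cancel = {(0, 2): 1.0, (0, 3): 1.0, (2, 1): 1.0, (3, 1): -1.0}
cancelled = sum_product_sign(jac_cancel, n=2, m=2, i=0, j=1)
```

In the first example the positive 3-hop path is found before the negative 4-hop path is completed, so the shorter path alone decides the sign.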

GeneNetWeaver ODE model

We consider a differential equation model in which transcription factors participate in modules that bind to the promoter regions of a given target gene. This model is based on the GeneNetWeaver software version 3 [10]. Part of the model of this popular simulator is described in the studies [12] and [11], but there is no good reference that precisely describes the model. So in this section we describe the generative model in GeneNetWeaver based on a given directed graph, and show in the next section that the CSP is satisfied. Note GeneNetWeaver models are a special class of ODE models with the molecular graphs being bipartite, resulting in no unobserved paths of length greater than 2, unlike the general case as illustrated in Fig 2. GeneNetWeaver allows fast protein–protein interactions through the f function, but does not characterize slow protein–protein interactions or external signals.

The model in GeneNetWeaver is based on standard modeling assumptions (see [19]) including statistical thermodynamics, as described in the study [20]. The activity level of the promoter of a gene is controlled by one or more cis-regulatory modules, which for brevity we refer to as modules. A module can be either an enhancer or a silencer. Each module has one or more transcription factors as activators, and possibly one or more TFs as deactivators. For each target gene, a number of modules are associated with its TFs such that each TF is an input of one of the modules. For simplicity assume that each module regulates only a single target gene.

Consider a directed signed graph with vertices [n], a set of directed edges, and edge signs b. For target gene j, let N_j be the set of its TFs (the genes i with an edge from i to j), and let the TFs in N_j be partitioned according to the input of the modules, so that each block K of the partition is the input set of one module. Then the modules for target gene j can be indexed by the tuple (K, j) (denoted by K: j in the subscripts), where K ranges over the blocks of the partition of N_j. Note each TF regulates the target gene j only through one module. The random model for assignment of the TFs to modules and of the parameters in GeneNetWeaver is summarized in S2 Appendix. Let the sets of activators and deactivators for module K: j be A_{K: j} and D_{K: j} with A_{K: j} ∪ D_{K: j} = K and A_{K: j} ∩ D_{K: j} = ∅. For a module K: j, let c_{K: j} be the type (1 for enhancer and −1 for silencer) and r_{K: j} the mode (1 for synergistic binding and 0 for independent binding). Note r_{K: j} only matters for multi-input modules (i.e., those with |K| > 1). Let β_{K: j} ≥ 0 be the absolute effect of module K: j on gene j in mRNA production rate. Note that by the construction in S2 Appendix, it is guaranteed that .

Let x_i(t) and y_i(t) be the mRNA and protein concentrations for gene i at time t. We ignore t in the remainder of the paper for simplicity. The dynamics are given by and where f_i(y) is the relative activation rate for gene i (i.e. the mRNA production rate for gene i for the normalized variables) discussed in the next two subsections, is the translation rate of protein i, and δ_i and are the degradation rates of the mRNA and the protein. Because only x is observed in RNA-seq experiments, without loss of generality the unit of the unobserved protein concentrations can be chosen such that for all i (see nondimensionalization in the study [12]). Note the GeneNetWeaver model is a special ODE model with m = n and l = 0.

Activity level of a single module.

For edge (i, j), the normalized expression level of gene i, ν_ij, is defined by ν_ij = (y_i / k_ij)^{h_ij}, where k_ij is the Michaelis–Menten normalizing constant and h_ij is a small positive integer, the Hill constant, representing the number of copies of the TF i that need to bind to the promoter region of gene j to activate the gene. (If gene i is not bound to the promoter region of gene j, it is like taking the Hill constant equal to zero and thus normalized expression level equal to one.) The activity level of module K: j, denoted by M_{K: j}, which is the probability that module K: j is active, is given in the following three cases.

Type 1 modules: Input TFs bind to module independently

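In this case, r_{K: j} = 0, and we have

$$M_{K:j} = \prod_{i \in A_{K:j}} \frac{\nu_{ij}}{1 + \nu_{ij}} \prod_{i \in D_{K:j}} \frac{1}{1 + \nu_{ij}}.$$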

Interpreting each fraction as the probability that an activator is actively bound (or a deactivator is not bound), the activation M_{K: j} is the probability that all the inputs of module K: j are working together to activate the module, i.e., the probability that the module is active. It is assumed that for a module to be active, all the activators must be bound and all the deactivators must be unbound, and all the bindings happen independently.

One can think of module K: j as a system with 2^{|A_{K: j}|+ |D_{K: j}|} possible states of the inputs. Suppose each input i binds with rate ν_ij and unbinds with rate 1 independently. Then the stationary probability of the state that all the activators are bound and none of the deactivators is bound is M_{K: j}.

Alternatively, one can assign additive energy of to each bound input gene i and energy zero to each unbound gene. Then M_{K: j} is the probability that all activators are bound and none of the deactivators is bound in the Gibbs measure. In other words, the Type 1 modules are Shea–Ackers models with all binding states possible and only the one state with all the activators initiating transcription.

Type 2 modules: TFs are all activators and bind to module as a complex

In this case, D_{K: j} = ∅, r_{K: j} = 1, and we have

$$M_{K:j} = \frac{\prod_{i \in A_{K:j}} \nu_{ij}}{1 + \prod_{i \in A_{K:j}} \nu_{ij}}.$$

One can think of such a module as a system with only two states: bound by the activator complex, or unbound. The transition rate from unbound to bound is ∏_{i ∈ A_{K: j}} ν_ij, and that from bound to unbound is 1. Then the activation of the module is the probability of the bound state in the stationary distribution, given by M_{K: j}.

Alternatively, this corresponds to the Shea–Ackers model as in the previous case, except all the states other than fully unbound and fully bound are unstable (i.e. have infinite energy).

Type 3 modules: Some TFs are deactivators and bind to module as a complex

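In this case, D_{K: j} ≠ ∅ and r_{K: j} = 1, and we have

$$M_{K:j} = \frac{\prod_{i \in A_{K:j}} \nu_{ij}}{1 + \prod_{i \in A_{K:j}} \nu_{ij} + \prod_{i \in A_{K:j}} \nu_{ij} \prod_{i \in D_{K:j}} \nu_{ij}}. \tag{1}$$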

In this case the system can be in one of three states: unbound, bound by the activator complex, and bound by the deactivated (activator) complex. The Gibbs measure in the Shea–Ackers model for Type 3 modules with three stable states (i.e. have finite energy) assigns probability M_{K: j} to the activated state.

Note if ∏_{i ∈ ∅} ν_ij is understood to be 0 then Eq (1) reduces to Type 2 when D_{K: j} = ∅. However, historically ∏_{i ∈ ∅} ν_ij was understood as 1 in an early version of GeneNetWeaver, which caused a bug that produced wrong Type 2 modules.

Remark 3. Presumably it is possible for there to be more than three stable states for a module, so additional types of modules could arise, but for simplicity, following GeneNetWeaver, we assume at least one of the three cases above holds.

Remark 4. If a module K: j has only one input i (i.e. K = {i}) then the module is Type 1 and M_{K: j} = ν_ij / (1 + ν_ij) or M_{K: j} = 1 / (1 + ν_ij). We will see later in the random model of GeneNetWeaver that only the former (single activator) is allowed.

GeneNetWeaver software uses the 3 types of modules derived above. In all three cases the activation M_{K: j} is monotonically increasing in y_i for activators i ∈ A_{K: j}, and monotonically decreasing in y_i for deactivators i ∈ D_{K: j}.
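The three module types and their monotonicity can be illustrated with a short Python sketch. It is written from the stationary-probability descriptions above and assumes the Hill-type normalization ν = (y / k)^h; the function names and numeric values are hypothetical and are not taken from the GeneNetWeaver source code. A synergistic module without deactivators is handled as an explicit Type 2 branch so that no convention for an empty product is needed.

```python
from math import prod


def normalized_level(y, k, h):
    """Assumed Hill-type normalization of a protein concentration y."""
    return (y / k) ** h


def module_activity(nu_act, nu_deact, synergistic):
    """Probability that a module is active, given the normalized levels of
    its activators (nu_act) and deactivators (nu_deact)."""
    if not synergistic:
        # Type 1: every activator bound, every deactivator unbound,
        # with all bindings independent.
        bound = prod(v / (1.0 + v) for v in nu_act)
        unbound = prod(1.0 / (1.0 + v) for v in nu_deact)
        return bound * unbound
    a = prod(nu_act)
    if not nu_deact:
        # Type 2: two states, unbound or bound by the activator complex.
        return a / (1.0 + a)
    # Type 3: a third state, bound by the deactivated complex.
    return a / (1.0 + a + a * prod(nu_deact))


type1 = module_activity([1.0], [1.0], synergistic=False)
type2 = module_activity([1.0, 2.0], [], synergistic=True)
type3 = module_activity([2.0], [1.0], synergistic=True)

# Finite checks of monotonicity for the Type 3 example.
more_activator = module_activity([2.1], [1.0], synergistic=True)
more_deactivator = module_activity([2.0], [1.1], synergistic=True)
```

Raising the activator level increases the activity and raising the deactivator level decreases it, in line with the monotonicity stated above; the two comparisons are spot checks at single points, not a proof.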

Production rate as a function of multiple module activations.

The relative activation of gene j as a function of the protein concentrations y is (2) where α_j,s is the relative activation of the promoter under the module configuration s. Note that α in Eq (2) gives degrees of freedom, one for every possible subset of the modules being active. However, following the GeneNetWeaver computer code [10], we assume that the interaction among the modules is linear, meaning that for some choice of α_j,basal, , and , we have for any configuration , (3)

This reduces the number of degrees of freedom for α to . Then, combining Eqs (2) and (3) yields (4) where S is distributed by the product distribution of the Bernoulli distributions with means . So the relative activation, or the mRNA production rate, of a gene is given by the basal activation plus the inner product of the module effects and the module activation. We also note that the effect of the modules is not assumed to be statistically independent: all we need to know to compute the relative activation of a gene are the marginal probability of activation of the single modules.
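The reduction from a sum over all module configurations to an inner product can be checked numerically. The sketch below assumes that the linearity assumption means the relative activation of a configuration is the basal activation plus the sum of the signed effects of the active modules, and it uses independent Bernoulli module states for the enumeration. All names and numbers are hypothetical.

```python
from itertools import product


def activation_by_enumeration(alpha_basal, effects, activities):
    """Average the configuration-wise activation over all on/off module
    configurations, weighted by independent Bernoulli probabilities."""
    total = 0.0
    for config in product((0, 1), repeat=len(effects)):
        prob = 1.0
        for on, m in zip(config, activities):
            prob *= m if on else 1.0 - m
        alpha = alpha_basal + sum(e * on for e, on in zip(effects, config))
        total += alpha * prob
    return total


def activation_linear(alpha_basal, effects, activities):
    """Basal activation plus the inner product of signed module effects
    and module activities."""
    return alpha_basal + sum(e * m for e, m in zip(effects, activities))


# Hypothetical signed effects (type times absolute effect) and activities.
effects = [0.5, -0.1, 0.3]
activities = [0.25, 0.4, 2.0 / 3.0]
by_enumeration = activation_by_enumeration(0.2, effects, activities)
by_inner_product = activation_linear(0.2, effects, activities)
```

The two values agree up to rounding. Because the configuration-wise activation is linear in the module states, only the marginal activation probabilities enter the result, which is why independence of the modules is not actually needed.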

Taking into account the three different types of modules described in the previous section on activity level of a single module, Eq (4) yields the following expression for the relative activation of gene j: (5)

As we will see in the Results section, f satisfies the CSP. Note that in the actual GeneNetWeaver source code every α_j,s is truncated to the interval [0, 1]: where is the projection of x to the [0, 1] interval. Then the relative activation in each state may not be linear in the individual module effects. In that case one has to resort to Eq (2) instead of Eq (5) for computing the mRNA production rate. The resulting truncated model does not necessarily satisfy the CSP because f_j may not be monotone in M_{K: j} in Eq (2).

Results

A relation between infinitesimal monotonicity and sum–product monotonicity

The following result establishes the equivalence of the two notions of monotonicity for ODE models that satisfy the sum–product CSP. So if the sum–product CSP holds, we do not need to distinguish between the sum–product CSP and the infinitesimal CSP. Consequently, given an ODE model, one can easily find the corresponding graph models for different system parameters, external signals, and internal states by calculating the sum products of the first-order partial derivatives of the input function f.

Proposition 1. If f is smooth and satisfies the sum–product CSP over a set S, then it also satisfies the infinitesimal CSP over S, and the sum–product gene regulatory graph and the infinitesimal gene regulatory graph are the same.

Proof. It suffices to show B_sum(i, j, x, λ) = B_inf(i, j, x, λ) if B_sum(i, j, x, λ) ≠ {1, −1} for any (x, λ) ∈ S. For fixed i, j, x, λ, let η(t, h) be the solution of the clamped initial value problem at time t with initial condition η(0, h) = (x_i + h, x_−i). We are interested in the sign of g(t, h) = η_j(t, h) − η_j(t, 0).

If then we readily have B_sum(i, j, x, λ) = B_inf(i, j, x, λ) = ∅. Suppose . Then by Corollary 4.1 in Section 5 of [21] (page 101), f being smooth implies g is also smooth, and we can show that (see the proof in S3 Appendix) (6)

Hence by the multivariate Taylor’s theorem (see, e.g., [22]) as (t, h) → (0, 0). So g(t, h) has the same sign as Δ(i, j, x, λ)t^q h in a sufficiently small neighborhood of (0, 0). Hence B_sum(i, j, x, λ) = B_inf(i, j, x, λ).

Remark 5. If multiple ODE models satisfy CSP with the same gene regulatory graph, then they can be combined into a single ODE model with a different parameterization so that the combined ODE model still satisfies CSP with the same gene regulatory graph. For example, ODE models for different environmental temperatures can be considered either different models or a single unified model with temperature as a parameter. Then the temperature-specific models satisfy CSP with the same gene regulatory graph if and only if the unified model satisfies CSP for all temperatures.

Remark 6. The effect of a gene on itself can be either autoregulation or degradation. The two effects can be distinguished with the molecular graph: a self-loop with negative derivative indicates degradation, and a loop of multiple hops indicates autoregulation. The infinitesimal monotonicity does not distinguish the two effects.

The following is an example of an ODE model that does not satisfy CSP globally, based on the interactions among FT, TFL1, FD, and LFY genes in the study [9].

Example 1. Consider a four-gene ODE model with the following dynamics for gene 4. where we use x for both the mRNA and protein concentrations. The biological meaning could be genes 1 and 3 form a protein complex that activates gene 4, while genes 2 and 3 form a protein complex that represses gene 4. Then it can be checked that the effect of gene 3 on gene 4 does not satisfy the CSP globally. Indeed, one can check that

So gene 3 activates gene 4 if , and represses gene 4 if .
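The state-dependent sign in Example 1 can be illustrated numerically. The sketch below does not use the paper's equation for gene 4. It assumes a hypothetical production rate in which the complex of genes 1 and 3 enters as an activating Hill-type factor and the complex of genes 2 and 3 as a repressing factor. For this assumed form, the partial derivative with respect to x₃ is positive when x₁x₂x₃² < 1 and negative when x₁x₂x₃² > 1, so the effect of gene 3 changes sign with the state.

```python
def production(x1, x2, x3):
    """Hypothetical production rate for gene 4 (an assumption, not the paper's equation)."""
    activation = x1 * x3 / (1.0 + x1 * x3)  # complex of genes 1 and 3 activates gene 4
    repression = 1.0 / (1.0 + x2 * x3)      # complex of genes 2 and 3 represses gene 4
    return activation * repression


def sign_of_effect_of_gene3(x1, x2, x3, h=1e-6):
    """Sign of a central finite-difference estimate of d(production)/d(x3)."""
    slope = (production(x1, x2, x3 + h) - production(x1, x2, x3 - h)) / (2.0 * h)
    return (slope > 0) - (slope < 0)


# With x1 = x2 = 1 the threshold of the assumed model is x3 = 1.
low = sign_of_effect_of_gene3(1.0, 1.0, 0.5)   # below the threshold
high = sign_of_effect_of_gene3(1.0, 1.0, 2.0)  # above the threshold
print(low, high)
```

In this assumed model no single signed edge from gene 3 to gene 4 is valid for all states, which is the sense in which CSP fails globally but can still hold on each side of the threshold.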

Here is an example of a molecular graph having a shorter unobserved path dominating a longer unobserved path with the opposite sign, taken from part of the gene regulatory network in the study [23], achieving CSP with the sign of the shorter path (see Fig 3).

Fig 3. Molecular graph and gene regulatory graph of the ELF4–GI regulation in the study [23].

(A) The molecular graph with blue edges indicating positive partial derivatives and red edges indicating negative partial derivatives. (B) The gene regulatory graph.

https://doi.org/10.1371/journal.pone.0235070.g003

Example 2. The mRNA ELF4^m is transcribed into the protein ELF4^p, which then forms the complex EC^c with the protein LUX^p. The complex EC^c induces the transcription of the mRNA GI^m. Then there is a 3-hop path (ELF4^m–ELF4^p–EC^c–GI^m) and a 4-hop path (ELF4^m–ELF4^p–LUX^p–EC^c–GI^m) from ELF4^m to GI^m with opposite signs. The ODE model of the molecular graph satisfies CSP with ELF4 activating GI in the gene regulatory graph.

GeneNetWeaver: CSP and complexity

In this section GeneNetWeaver models (without the truncation of the α terms in the implementation) are shown to satisfy the CSP globally, regardless of the parameters and the system states, and thus correspond to the signed directed graphs that were used to generate the models. Moreover, when data is generated through multifactorial perturbation for the DREAM challenge (primarily for generation of stationary expression levels, rather than trajectories), each ensemble of networks produced is also associated with the same directed signed graph. This is in contrast to the Shea–Ackers model, which is shown to be able to generate non-monotone behavior [17]. Formally we have the following result.

Proposition 2. Given any directed signed graph, every model in the ensemble of GeneNetWeaver models satisfies CSP over (0, ∞)²ⁿ, and its gene regulatory graph coincides with the given graph.

Proof. Fix any model of the ensemble of GeneNetWeaver models for the given graph. For any target gene j and its regulator i ∈ N_j, there exists a unique module, indexed by K:j, whose input includes i. Then for any of the three module types,

Then by Eq (4), and

Because only c_{K: j} and can be negative in , the sum–product of the first-order partial derivatives of the path from x_i to x_j has the same sign as , which is consistent with the sign b_ij in the given graph by the construction in S2 Appendix. Hence by Proposition 1 the fixed ODE model satisfies CSP over all positive state vectors with gene regulatory graph equal to the given graph. Repeat this for all ODE models in the ensemble and the proposition is proved.

We now discuss the complexity of GeneNetWeaver ODE models for a given gene regulatory graph. The complexity comes from both the large number of parameters and the combinatorial nature of the module configurations. The complexity indicates that ODE models are both much more detailed and considerably harder to infer compared to the graphical models.

For each gene i there are 5 non-negative real parameters (α_i,basal, x_i(0), y_i(0), , ). For each edge (i, j) there is a non-negative real parameter (k_ij) and an integer parameter (h_ij). For each module K: i there is a positive real parameter (β_{K: i}) and two binary parameters (c_{K: i} and r_{K: i}).

The module configuration encodes great combinatorial complexity. Given a gene has K ≥ 1 input genes, the number of ways to partition the genes into modules is the Kth Bell number. The first ten Bell numbers are 1, 2, 5, 15, 52, 203, 877, 4140, 21147, and 115975. In addition, each input to a given module needs to be classified as an activator or deactivator.
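The Bell numbers quoted above can be reproduced with the Bell triangle. The following is a minimal sketch; the function name `bell_numbers` is chosen here for illustration and is not taken from the GeneNetWeaver or simulation source code.

```python
def bell_numbers(count):
    """Return the Bell numbers B_1, ..., B_count using the Bell triangle."""
    numbers = []
    row = [1]  # first row of the Bell triangle
    for _ in range(count):
        numbers.append(row[-1])   # the last entry of row K is B_K
        next_row = [row[-1]]      # the next row starts with that last entry
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return numbers


# Number of ways to partition K input genes into modules, for K = 1, ..., 10.
first_ten = bell_numbers(10)
print(first_ten)
```

The count grows faster than exponentially in K, and it does not yet include the classification of each input as an activator or deactivator.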

Case study: Soybean flowering networks

In this section the similarities of the ODE models corresponding to three different graph models are studied. First the classes of ODE models are listed for the three graph models. Then, to investigate their similarities, we generate expression data from one ODE model and fit another model to the data by optimizing the parameters. The goodness of fit of one class of ODE models to the data generated from another is used as a metric of similarity. As we will see, ODE models corresponding to the same graph model tend to have a higher similarity, while those from different graph models tend to have a lower similarity, as long as the least-squares problem is sufficiently overdetermined. The result implies that the graph model corresponding to the ODE model may be recovered with a moderate amount of data, while the amount of data required for ODE model recovery may be of a much higher order. The simulation code for the data fitting results is available at [24].

Five-gene graph and ODE models.

In this section we explicitly write out the classes of GeneNetWeaver ODE models of three graph models. The first two graph models are compiled from the literature and differ only in the sign of one edge (the difference was discovered in the study [25]). The third graph model is an arbitrary five-gene repressilator included for comparison.

Flowering network with COL1a activating E1.

A graph model of a five-gene soybean flowering network is shown in Fig 4. The network is based on the flowering network for Arabidopsis and homologs of Arabidopsis genes found in soybean (see Table 1). The corresponding gene IDs are shown in Table 2.

Fig 4. A graph model of the core flowering network for soybean.

https://doi.org/10.1371/journal.pone.0235070.g004

Table 1. Core flowering genes.

https://doi.org/10.1371/journal.pone.0235070.t001

Table 2. Core flowering genes.

https://doi.org/10.1371/journal.pone.0235070.t002

The mRNA and protein concentrations of the soybean genes E1, COL1a, FT4, FT2a, and AP1a are denoted by (x_i)_{1≤i≤5} and (y_i)_{1≤i≤5}. The differential equations based on the GeneNetWeaver model are Eqs (7)–(16).

Here (x)⁺ = max{x, 0}. We apply nondimensionalization by setting δ_i = α_i,basal + ∑_j β_{j: i}, so that the steady state expression levels are between 0 and 1. We can see that given the graph, there are 15 configurations of the ODEs (3 for x₃ times 5 for x₅). We use [i, j] with 1 ≤ i ≤ 3 and 1 ≤ j ≤ 5 to denote the configuration using the ith equation for x₃ and the jth equation for x₅, and use the symbol F_[i,j],+ to denote the class of flowering network ODE models with configurations [i, j] (the plus sign signifies the activation regulation of COL1a on E1). The initial conditions, namely the 5 mRNA abundances x(0)’s and the 5 protein concentrations y(0)’s, are 10-dimensional. In addition, there are 24–26 positive real parameters (depending on the configuration) and 7 discrete parameters (the Hill coefficients) for the dynamics. For example, for configuration [1, 1], the parameters for the dynamics consist of the basal activations α’s (5), the Michaelis–Menten constants k’s (7), the absolute effect of modules β’s (7), the translation rate ρ’s (5), summing up to 24 parameters.
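The configuration and parameter counts in the preceding paragraph can be checked with a few lines. This is only bookkeeping based on the numbers stated in the text; the dictionary keys are descriptive labels chosen here, not identifiers from the simulation code.

```python
from itertools import product

# Three alternative equations for x_3 and five for x_5, as stated in the text.
x3_options = range(1, 4)
x5_options = range(1, 6)
configurations = [(i, j) for i, j in product(x3_options, x5_options)]

# Parameter groups for the dynamics of configuration [1, 1], as listed in the text.
params_11 = {
    "basal activations (alpha)": 5,
    "Michaelis-Menten constants (k)": 7,
    "absolute effects of modules (beta)": 7,
    "translation rates (rho)": 5,
}
n_dynamic_11 = sum(params_11.values())

# Initial conditions: 5 mRNA abundances plus 5 protein concentrations.
n_initial = 5 + 5

print(len(configurations), n_dynamic_11, n_initial)
```

The 7 Hill coefficients are discrete and are not included in the count of positive real parameters.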

Flowering network with COL1a repressing E1.

A slight variant of the soybean flowering graph model in Fig 4 is shown in Fig 5. Note that the only difference is the sign of the edge from COL1a to E1. The symbol F_[i,j],− denotes the class of ODE models Eqs (7)–(16) with the ith and the jth configurations in Eqs (9) and (11), but with Eq (7) replaced by Eq (17). Here the negative sign in F_[i,j],− signifies the repression regulation of COL1a on E1. The number of parameters is the same as for the network in Fig 4.

Fig 5. A variant of the graph model of the core flowering network for soybean.

https://doi.org/10.1371/journal.pone.0235070.g005

Repressilator.

An arbitrary repressilator network is shown in Fig 6.

Fig 6. A five-gene repressilator graph model.

https://doi.org/10.1371/journal.pone.0235070.g006

The symbol R denotes the class of ODE models for the repressilator, given by Eqs (18)–(22).

There is only one possible configuration for each target gene. The dynamics involve 20 parameters.

Data generation.

The synthetic expression dataset is generated as follows. For the generated data, we use F_[1,1],+ (the flowering network with configuration [1, 1] and COL1a activating E1) with a fixed set of parameters for the dynamics. For a single set of trajectories (i.e., for a single plant), we use a set of initial values x(0)’s and y(0)’s generated uniformly at random between 0 and 1. The entire dataset may consist of only a single set of trajectories, corresponding to a single plant, or of multiple sets of trajectories, corresponding to multiple plants. If multiple sets of trajectories are used, the initial conditions for each set are generated independently, while the parameters for the dynamics are the same across all sets. In other words, we model distinct plants by assuming distinct initial conditions while using common parameters for the dynamics. To produce the data, the x variables are sampled at the 7 time points 0, 1, 2, 3, 4, 5, 6, so that each set of trajectories (i.e., each plant) produces 5 × 7 = 35 data points. Because each set of trajectories is sampled at several times from a system started at one initial condition, representing different stages of a single plant, the synthetic datasets correspond to multi-shot sampling, as opposed to the one-shot sampling used in practice, where each individual is sampled only once [29]. We also generate random expression datasets with reflected Brownian motions with covariance 0.05, and denote such a stochastic model by B.
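The shape of such a dataset, and one way to produce the random baseline B, can be sketched as follows. Several choices here are assumptions rather than details given in the text: the value 0.05 is read as the variance of the increment per unit time, the five coordinates are taken to be independent, the motion is reflected at both 0 and 1, and the function names are hypothetical. Folding a free Brownian motion into [0, 1] is a standard way to obtain a doubly reflected Brownian motion.

```python
import random


def fold_into_unit_interval(value):
    """Reflect a real number into [0, 1] by folding at both boundaries."""
    value = value % 2.0
    return 2.0 - value if value > 1.0 else value


def reflected_brownian_dataset(num_sets, num_times=7, num_genes=5,
                               variance=0.05, seed=0):
    """Return a num_sets x num_times x num_genes nested list of values in [0, 1]."""
    rng = random.Random(seed)
    step_std = variance ** 0.5  # assumed: variance of the increment per unit time
    dataset = []
    for _ in range(num_sets):
        # Independent uniform initial values for each set of trajectories.
        free = [rng.random() for _ in range(num_genes)]
        trajectory = []
        for t in range(num_times):
            if t > 0:
                free = [v + rng.gauss(0.0, step_std) for v in free]
            trajectory.append([fold_into_unit_interval(v) for v in free])
        dataset.append(trajectory)
    return dataset


data = reflected_brownian_dataset(num_sets=3)
points_per_set = len(data[0]) * len(data[0][0])
print(len(data), points_per_set)
```

Each set of trajectories contributes 7 × 5 = 35 values, matching the count of data points per plant used for the ODE-generated data.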

Fitting results.

The counts for data points and parameters are summarized in Table 3. Note that with a single set of trajectories, the number of parameters is close to the number of data points. As the number of sets of trajectories increases, the number of data points outgrows the number of parameters because each additional set provides 35 new data points while only allowing 10 more parameters from the initial conditions (because the dynamic parameters are shared across all sets of trajectories).
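The growth of the two counts can be tabulated directly. The sketch below assumes configuration [1, 1], whose dynamics have 24 real parameters according to the text, and counts only real-valued parameters (the discrete Hill coefficients are left out). The constant and function names are chosen here for illustration.

```python
GENES = 5
TIME_POINTS = 7
DYNAMIC_PARAMETERS = 24      # configuration [1, 1], from the text
INITIAL_VALUES_PER_SET = 10  # 5 mRNA and 5 protein initial values per set


def counts(num_sets):
    """Return (number of data points, number of real parameters) for num_sets sets."""
    data_points = GENES * TIME_POINTS * num_sets
    parameters = DYNAMIC_PARAMETERS + INITIAL_VALUES_PER_SET * num_sets
    return data_points, parameters


for num_sets in (1, 2, 5, 10):
    print(num_sets, counts(num_sets))
```

Under these assumptions a single set gives 35 data points against 34 parameters, while ten sets give 350 data points against 124 parameters.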

Table 3. Number of parameters in different ODE models.

https://doi.org/10.1371/journal.pone.0235070.t003

A basin-hopping algorithm in the Python package LMFIT [30] is used to perform the global optimization of the curve fitting (see details in the source code of the simulation [24]). The sample size varies between 35 and 350 depending on the number of sets of trajectories. The fit is evaluated by the fitting loss and the coefficients of determination (R²) shown in Tables 4 and 5. The fitting loss function for two S×T×n tensors x and is defined by where S is the number of sets of trajectories in the dataset, T the number of time points, and n the number of genes.
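The two evaluation metrics can be sketched for S×T×n data stored as nested lists. The coefficient of determination below is the standard pooled form 1 − SS_res/SS_tot, and whether the paper pools over all entries or computes it per gene is not stated here. The loss shown is a root-mean-square error, used only as an assumed stand-in for the paper's loss function, whose formula is not reproduced in this copy. All function names are hypothetical.

```python
def flatten(tensor):
    """Flatten an S x T x n nested list into a flat list of numbers."""
    return [v for trajectories in tensor for time_point in trajectories
            for v in time_point]


def rms_loss(observed, fitted):
    """Root-mean-square error over all S*T*n entries (assumed stand-in loss)."""
    obs, fit = flatten(observed), flatten(fitted)
    return (sum((o - f) ** 2 for o, f in zip(obs, fit)) / len(obs)) ** 0.5


def r_squared(observed, fitted):
    """Pooled coefficient of determination, 1 - SS_res / SS_tot."""
    obs, fit = flatten(observed), flatten(fitted)
    mean = sum(obs) / len(obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(obs, fit))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Toy data with S = 1, T = 2, n = 2.
observed = [[[0.0, 1.0], [1.0, 0.0]]]
constant_fit = [[[0.5, 0.5], [0.5, 0.5]]]
print(rms_loss(observed, observed), r_squared(observed, observed))
print(rms_loss(observed, constant_fit), r_squared(observed, constant_fit))
```

A perfect fit gives zero loss and R² equal to 1, while a fit equal to the mean of the data gives R² equal to 0.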

Table 4. Fitting losses using different classes of ODE models on different synthetic datasets.

https://doi.org/10.1371/journal.pone.0235070.t004

Table 5. Coefficients of determination using different classes of ODE models on different synthetic datasets.

https://doi.org/10.1371/journal.pone.0235070.t005

Note the time scale of the ODE is assumed to be known, which restricts how fast the expression levels can change. The time scale thus acts as a regularizer to prevent overfitting.

We make the following observations from Tables 4 and 5.

The implemented optimization algorithm failed to find the optimal parameters in row 1 (the best fit should be a perfect fit with zero loss), but the relative loss compared to the average nondimensionalized expression level 0.5 is very small (less than 0.5%), and the coefficients of determination are close to 1. Both indicate a near-optimal fit.
ODE models from all three graph models (rows 1, 2, 3, and 4) fit the synthetic flowering network data well when there are only one or two sets of trajectories (columns 1 and 2). The relative losses are less than 1% and R² is larger than 0.9997. We can see from Table 3 that the number of data points is close to the number of parameters in the S = 1 setting, and only moderately larger in the S = 2 setting. So when S ≤ 2 the three graph models in this case study are nearly indistinguishable. In other words, one may not be able to infer the graph structure with very limited data.
When fitting the models to 5 or 10 sets of trajectories simultaneously, i.e., when the system is sufficiently overdetermined, only the models from the correct graph (rows 1 and 2) fit well. The models from incorrect graphs (rows 3 and 4) suffer a roughly 4% relative loss after fitting for 10 sets of trajectories, and R² falls below 0.998. Note that F_[1,1],− differs from the ground truth of the data F_[1,1],+ only by the sign of one edge, while the model R shares no edges with the ground truth at all. Yet the fit of the slight variant of the ground-truth graph is as poor as that of the completely different repressilator graph.
Both F_[1,1],+ and F_[3,5],+ fit the F_[1,1],+ data very well for all numbers of sets of trajectories (rows 1 and 2). This indicates the classes of ODE models with different configurations of the same graph model are similar in terms of data fitting. Consequently, even with data sufficient to infer the correct graph model, it may be impossible to infer the specific ODE model.
The models from the flowering network cannot fit the random dataset (reflected Brownian motions with covariance 0.05) well. It turns out that the ODE models with 34 parameters have trouble following the highly variable 35 data points from the reflected Brownian motions. The poor fit to the random dataset shows that the parameters are highly redundant in terms of the data points they can generate. It also indicates that the good fits to the synthetic ODE data are significant, since no comparable fit is obtained for a random dataset.

Discussion

Generalization of CSP to related gene regulatory network models

The concept of CSP can be applied to many other models. We first explain this for continuous-state models, and then for discrete-state models.

Continuous-state models.

A network model somewhat similar to ODE models is a fixed-point model. The study by Van den Bulcke et al. [31] uses a fixed-point model for gene regulatory networks. ODE models based on Michaelis–Menten and Hill kinetics and linear degradation terms are used to determine the expression level of a given gene as a function of the expression levels of other genes. Then a fixed point is produced. This can model equilibrium points, also known as resting points, of ODE models. The concept of constant sign property can be applied to fixed-point models as well. Van den Bulcke et al. [31] focus on models for the network topology, which is not addressed in this paper.

Other continuous-state models have been used for gene regulatory networks. The study by Mendes et al. [32] simulates gene regulatory networks using a biochemical simulator called Gepasi [33], which models complex biochemical pathways using ODEs. For such biochemical systems, the constant sign property discussed in this paper can be used to find the causal dependency among observed variables (e.g., mRNA abundances in the special case of gene regulatory networks). In order to avoid the difficult calibration of the parameters in ODEs, Ocone et al. [34] model the promoter by a binary state process and approximate the transcription–translation network with stochastic differential equations. The constant sign property can be generalized to such hybrid models by introducing a notion of monotonicity for the stochastic systems. It is worth mentioning that the constant sign property is defined with directionality for causal relationships among the genes and is not suitable for models based on mere correlation (e.g., graphical Gaussian models [35]).

Discrete-state models.

One common type of discrete model used for gene regulatory networks is the Bayesian network (see, e.g., Friedman et al. [36]). Boolean networks, as a special case of Bayesian networks, are used to capture qualitative gene regulation (see, e.g., Liang et al. [37]), for which the constant sign property can be defined based on the monotonicity of the Boolean functions. The study by Husmeier [38] evaluates a dynamic Bayesian network inference algorithm using simulated data based on an ODE model whose genetic network model is taken from Zak et al. [39] and whose equations are taken from chemical kinetics (see Chapter 22 of Atkins and de Paula [40]). Similarly, the study by Smith et al. [41] also proposes a dynamic Bayesian network algorithm, and evaluates its performance on sampled and quantized data from a dynamic Bayesian network simulator that models different regions of the brain of songbirds regulated by their behaviors. The simulated data is generated with a small step size before being sampled, and thus the simulator resembles an ODE model simulator. For the dynamic Bayesian network, gene expressions are quantized to discrete values. The constant sign property can also be applied to dynamic Bayesian network models using a partial order of the conditional distributions (e.g., stochastic dominance) of target genes given the expressions of their regulators. Husmeier [38] gives an example of a graphical model that is more detailed than the gene regulatory graph in this paper. Although both the GeneNetWeaver model and the ODE models in Husmeier [38] are based on chemical kinetic equations, one difference is that the Michaelis–Menten and the Hill kinetics in GeneNetWeaver arise from considerations of a faster time scale of the binding of TF to the promoter regions (see Alon [19]). Nevertheless, both GeneNetWeaver and the ODE models for realistic simulation in Husmeier [38] fall into the general framework of ODE models in this paper, and hence the constant sign property we have proposed applies to both.

Implication of GCSP

GCSP of an ODE model generalizes the notion of a linear dynamical system by allowing the variation of the state vector (i.e., the concentrations of molecular classes) to be nonlinear in the state vector so long as the overall effect of the most influential pathways in the molecular graph keeps the same sign (i.e., activation stays activation and repression stays repression regardless of the expression of the regulator, the target gene, or any other molecular classes). Biologically, GCSP indicates homogeneity of the gene regulatory network in the sense that the qualitative properties of gene regulation are preserved after cellular differentiation and under different external conditions. Lack of GCSP indicates significant change in regulatory functions after cellular differentiation and under different external conditions. Note that GCSP is more likely to hold for the subnetwork of a small number of genes compared to a larger network.

Limitation of infinitesimal CSP

The definitions of CSP proposed in this paper focus on short time behavior. Over short time periods, the paths with the smallest number of hops dominate. Often the shortest paths have the strongest influence, as seen in Example 2. But in some cases the shortest paths could be weaker than some slightly longer paths, and if the longer paths have an opposite sign, then the focus on short time and shortest paths can be misleading, because the longer paths will take over quickly after the brief initial dominance by the shortest paths. In the extreme case of a complete molecular graph, where every molecular class has a (possibly tiny) regulatory effect on every other molecular class, the gene regulatory graph defined in this paper would be determined by only the direct edges in the molecular graph and all the actual biological pathways would be entirely ignored. This also shows the importance of network sparsity.

Conclusion

Gene regulatory networks are modeled at different abstraction levels with a tradeoff between accuracy and tractability. Graph models with signed directed edges provide a circuit-like characterization of gene regulation, while ODE models quantify detailed dynamics for various molecular classes. The constant sign property proposed in this paper connects the two types of models by identifying a set of conditions under which ODE models correspond to a single graph model, and provides a deeper understanding of the context-dependent and time-varying nature of gene regulatory networks. A class of ODE models for a given graph model, based on the source code of the popular software package GeneNetWeaver, is described in detail and shown to satisfy the global constant sign property. Exploration of data fitting of one ODE model to the data generated from another shows a better fit when the two models have the same graph model.

Supporting information

S1 Appendix. Basic model of gene interaction.

A brief review on both graph models and ODE models is given here.

https://doi.org/10.1371/journal.pone.0235070.s001

(PDF)

S2 Appendix. Random model for production functions used in GeneNetWeaver.

Specific module generation and parameter ranges in GeneNetWeaver are described here.

https://doi.org/10.1371/journal.pone.0235070.s002

(PDF)

S3 Appendix. Proof of Eq (6).

A proof of the equation involving the partial derivatives of the solution of dynamical systems.

https://doi.org/10.1371/journal.pone.0235070.s003

(PDF)

References

1. Davidson E, Levin M. Gene regulatory networks. Proc Natl Acad Sci USA. 2005;102(14):4935–4935.
2. Emmert-Streib F, Dehmer M, Haibe-Kains B. Gene regulatory networks and their applications: Understanding biological and medical problems in terms of networks. Front Cell Dev Biol. 2014;2.
3. Emmert-Streib F, Dehmer M, Haibe-Kains B. Untangling statistical and biological models to understand network inference: The need for a genomics network ontology. Front Genet. 2014;5.
4. Kim HD, Shay T, O’Shea EK, Regev A. Transcriptional regulatory circuits: Predicting numbers from alphabets. Science. 2009;325(5939):429–432.
5. Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol. 2008;9(10):770–780.
6. Seaton DD, Smith RW, Song YH, MacGregor DR, Stewart K, Steel G, et al. Linked circadian outputs control elongation growth and flowering in response to photoperiod and temperature. Mol Syst Biol. 2015;11(1). pmid:25600997
7. Kolar M, Song L, Ahmed A, Xing EP. Estimating time-varying networks. Ann Appl Stat. 2010;4(1):94–123.
8. Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004;431(7006):308–312.
9. Jaeger KE, Pullen N, Lamzin S, Morris RJ, Wigge PA. Interlocking feedback loops govern the dynamic behavior of the floral transition in Arabidopsis. Plant Cell. 2013;25(3):820–833.
10. Schaffter T, Marbach D. GeneNetWeaver; 2012. Available from: https://github.com/tschaffter/gnw.
11. Schaffter T, Marbach D, Floreano D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics. 2011;27(16):2263–2270.
12. Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci USA. 2010;107(14):6286–6291.
13. Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, et al. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804. pmid:22796662
14. Chan TE, Stumpf MPH, Babtie AC. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 2017;5(3):251–267.e3.
15. Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods. 2020;17(2):147–154.
16. Shea MA, Ackers GK. The OR control system of bacteriophage lambda: A physical-chemical model for gene regulation. J Mol Biol. 1985;181(2):211–230.
17. Gedeon T, Mischaikow K, Patterson K, Traldi E. When activators repress and repressors activate: A qualitative analysis of the Shea–Ackers model. Bull Math Biol. 2008;70(6):1660–1683.
18. Locke JCW, Southern MM, Kozma-Bognár L, Hibberd V, Brown PE, Turner MS, et al. Extension of a genetic network model by iterative experimentation and mathematical analysis. Mol Syst Biol. 2005;1(1). pmid:16729048
19. Alon U. An introduction to systems biology: Design principles of biological circuits. CRC Press; 2006.
20. Ackers GK, Johnson AD, Shea MA. Quantitative model for gene regulation by lambda phage repressor. Proc Natl Acad Sci USA. 1982;79(4):1129–1133.
21. Hartman P. Ordinary Differential Equations. 2nd ed. SIAM; 2002.
22. Zorich VA. Mathematical Analysis II. Springer Berlin Heidelberg; 2016. Available from: https://doi.org/10.1007/978-3-662-48993-2.
23. Pokhilko A, Fernández AP, Edwards KD, Southern MM, Halliday KJ, Millar AJ. The clock gene circuit in Arabidopsis includes a repressilator with additional feedback loops. Mol Syst Biol. 2012;8(1):574.
24. Kang X. Graph and ODE models simulations; 2020. Available from: https://github.com/Veggente/graph-ode.
25. Wu F, Kang X, Wang M, Haider W, Price WB, Hajek B, et al. Transcriptome-enabled network inference revealed the GmCOL1 feed-forward loop and its roles in photoperiodic flowering of soybean. Front Plant Sci. 2019;10.
26. Cao D, Li Y, Lu S, Wang J, Nan H, Li X, et al. GmCOL1a and GmCOL1b function as flowering repressors in soybean under long-day conditions. Plant Cell Physiol. 2015;56(12):2409–2422.
27. Zhai H, Lü S, Liang S, Wu H, Zhang X, Liu B, et al. GmFT4, a homolog of FLOWERING LOCUS T, is positively regulated by E1 and functions as a flowering repressor in soybean. PLOS ONE. 2014;9(2):e89030.
28. Nan H, Cao D, Zhang D, Li Y, Lu S, Tang L, et al. GmFT2a and GmFT5a redundantly and differentially regulate flowering through interaction with and upregulation of the bZIP transcription factor GmFDL19 in soybean. PLOS ONE. 2014;9(5):e97669.
29. Kang X, Hajek B, Wu F, Hanzawa Y. Time series experimental design under one-shot sampling: The importance of condition diversity. PLOS ONE. 2019;14(10):e0224577.
30. Newville M, Stensitzki T, Allen DB, Ingargiola A. LMFIT: Non-Linear Least-Square Minimization and Curve-Fitting for Python; 2014. Available from: https://zenodo.org/record/11813.
31. Van den Bulcke T, Van Leemput K, Naudts B, van Remortel P, Ma H, Verschoren A, et al. SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinform. 2006;7(1):43.
32. Mendes P, Sha W, Ye K. Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics. 2003;19(Suppl 2):ii122–ii129.
33. Mendes P. Biochemistry by numbers: simulation of biochemical pathways with Gepasi 3. Trends Biochem Sci. 1997;22(9):361–363.
34. Ocone A, Millar AJ, Sanguinetti G. Hybrid regulatory models: A statistically tractable approach to model regulatory network dynamics. Bioinformatics. 2013;29(7):910–916.
35. Wille A, Zimmermann P, Vranová E, Fürholz A, Laule O, Bleuler S, et al. Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biol. 2004;5(11):R92. pmid:15535868
36. Friedman N, Linial M, Nachman I, Pe’er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7(3-4):601–620.
37. Liang S, Fuhrman S, Somogyi R. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput. 1998;3:18–29.
38. Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics. 2003;19(17):2271–2282.
39. Zak DE, Doyle III FJ, Gonye GE, Schwaber JS. Simulation studies for the identification of genetic networks from cDNA array and regulatory activity data. In: Proc Int Conf Syst Biol; 2001. p. 231–238.
40. Atkins P, de Paula J. Physical Chemistry. 9th ed. Oxford University Press; 2010.
41. Smith VA, Jarvis ED, Hartemink AJ. Evaluating functional network inference using simulations of complex biological systems. Bioinformatics. 2002;18(Suppl 1):S216–S224.

[ref1] 1. Davidson E, Levin M. Gene regulatory networks. Proc Natl Acad Sci USA. 2005;102(14):4935–4935.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Emmert-Streib F, Dehmer M, Haibe-Kains B. Gene regulatory networks and their applications: Understanding biological and medical problems in terms of networks. Front Cell Dev Biol. 2014;2.
View Article
Google Scholar

[5] View Article

[6] Google Scholar

[ref3] 3. Emmert-Streib F, Dehmer M, Haibe-Kains B. Untangling statistical and biological models to understand network inference: The need for a genomics network ontology. Front Genet. 2014;5.
View Article
Google Scholar

[8] View Article

[9] Google Scholar

[ref4] 4. Kim HD, Shay T, O’Shea EK, Regev A. Transcriptional regulatory circuits: Predicting numbers from alphabets. Science. 2009;325(5939):429–432.
View Article
Google Scholar

[11] View Article

[12] Google Scholar

[ref5] 5. Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol. 2008;9(10):770–780.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref6] 6. Seaton DD, Smith RW, Song YH, MacGregor DR, Stewart K, Steel G, et al. Linked circadian outputs control elongation growth and flowering in response to photoperiod and temperature. Mol Syst Biol. 2015;11(1). pmid:25600997
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref7] 7. Kolar M, Song L, Ahmed A, Xing EP. Estimating time-varying networks. Ann Appl Stat. 2010;4(1):94–123.
View Article
Google Scholar

[21] View Article

[22] Google Scholar

[ref8] 8. Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004;431(7006):308–312.
View Article
Google Scholar

[24] View Article

[25] Google Scholar

[ref9] 9. Jaeger KE, Pullen N, Lamzin S, Morris RJ, Wigge PA. Interlocking feedback loops govern the dynamic behavior of the floral transition in Arabidopsis. Plant Cell. 2013;25(3):820–833.
View Article
Google Scholar

[27] View Article

[28] Google Scholar

[ref10] 10. Schaffter T, Marbach D. GeneNetWeaver; 2012. Available from: https://github.com/tschaffter/gnw.

[ref11] 11. Schaffter T, Marbach D, Floreano D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics. 2011;27(16):2263–2270.
View Article
Google Scholar

[31] View Article

[32] Google Scholar

[ref12] 12. Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci USA. 2010;107(14):6286–6291.

[ref13] 13. Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, et al. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804. pmid:22796662

[ref14] 14. Chan TE, Stumpf MPH, Babtie AC. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 2017;5(3):251–267.e3.

[ref15] 15. Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods. 2020;17(2):147–154.

[ref16] 16. Shea MA, Ackers GK. The OR control system of bacteriophage lambda: A physical-chemical model for gene regulation. J Mol Biol. 1985;181(2):211–230.

[ref17] 17. Gedeon T, Mischaikow K, Patterson K, Traldi E. When activators repress and repressors activate: A qualitative analysis of the Shea–Ackers model. Bull Math Biol. 2008;70(6):1660–1683.

[ref18] 18. Locke JCW, Southern MM, Kozma-Bognár L, Hibberd V, Brown PE, Turner MS, et al. Extension of a genetic network model by iterative experimentation and mathematical analysis. Mol Syst Biol. 2005;1(1). pmid:16729048

[ref19] 19. Alon U. An introduction to systems biology: Design principles of biological circuits. CRC Press; 2006.

[ref20] 20. Ackers GK, Johnson AD, Shea MA. Quantitative model for gene regulation by lambda phage repressor. Proc Natl Acad Sci USA. 1982;79(4):1129–1133.

[ref21] 21. Hartman P. Ordinary Differential Equations. 2nd ed. SIAM; 2002.

[ref22] 22. Zorich VA. Mathematical Analysis II. Springer Berlin Heidelberg; 2016. Available from: https://doi.org/10.1007/978-3-662-48993-2.

[ref23] 23. Pokhilko A, Fernández AP, Edwards KD, Southern MM, Halliday KJ, Millar AJ. The clock gene circuit in Arabidopsis includes a repressilator with additional feedback loops. Mol Syst Biol. 2012;8(1):574.

[ref24] 24. Kang X. Graph and ODE models simulations; 2020. Available from: https://github.com/Veggente/graph-ode.

[ref25] 25. Wu F, Kang X, Wang M, Haider W, Price WB, Hajek B, et al. Transcriptome-enabled network inference revealed the GmCOL1 feed-forward loop and its roles in photoperiodic flowering of soybean. Front Plant Sci. 2019;10.

[ref26] 26. Cao D, Li Y, Lu S, Wang J, Nan H, Li X, et al. GmCOL1a and GmCOL1b function as flowering repressors in soybean under long-day conditions. Plant Cell Physiol. 2015;56(12):2409–2422.

[ref27] 27. Zhai H, Lü S, Liang S, Wu H, Zhang X, Liu B, et al. GmFT4, a homolog of FLOWERING LOCUS T, is positively regulated by E1 and functions as a flowering repressor in soybean. PLOS ONE. 2014;9(2):e89030.

[ref28] 28. Nan H, Cao D, Zhang D, Li Y, Lu S, Tang L, et al. GmFT2a and GmFT5a redundantly and differentially regulate flowering through interaction with and upregulation of the bZIP transcription factor GmFDL19 in soybean. PLOS ONE. 2014;9(5):e97669.

[ref29] 29. Kang X, Hajek B, Wu F, Hanzawa Y. Time series experimental design under one-shot sampling: The importance of condition diversity. PLOS ONE. 2019;14(10):e0224577.

[ref30] 30. Newville M, Stensitzki T, Allen DB, Ingargiola A. LMFIT: Non-Linear Least-Square Minimization and Curve-Fitting for Python; 2014. Available from: https://zenodo.org/record/11813.

[ref31] 31. Van den Bulcke T, Van Leemput K, Naudts B, van Remortel P, Ma H, Verschoren A, et al. SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinform. 2006;7(1):43.

[ref32] 32. Mendes P, Sha W, Ye K. Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics. 2003;19(Suppl 2):ii122–ii129.

[ref33] 33. Mendes P. Biochemistry by numbers: simulation of biochemical pathways with Gepasi 3. Trends Biochem Sci. 1997;22(9):361–363.

[ref34] 34. Ocone A, Millar AJ, Sanguinetti G. Hybrid regulatory models: A statistically tractable approach to model regulatory network dynamics. Bioinformatics. 2013;29(7):910–916.

[ref35] 35. Wille A, Zimmermann P, Vranová E, Fürholz A, Laule O, Bleuler S, et al. Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biol. 2004;5(11):R92. pmid:15535868

[ref36] 36. Friedman N, Linial M, Nachman I, Pe’er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7(3-4):601–620.

[ref37] 37. Liang S, Fuhrman S, Somogyi R. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput. 1998;3:18–29.

[ref38] 38. Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics. 2003;19(17):2271–2282.

[ref39] 39. Zak DE, Doyle III FJ, Gonye GE, Schwaber JS. Simulation studies for the identification of genetic networks from cDNA array and regulatory activity data. In: Proc Int Conf Syst Biol; 2001. p. 231–238.

[ref40] 40. Atkins P, de Paula J. Physical Chemistry. 9th ed. Oxford University Press; 2010.

[ref41] 41. Smith VA, Jarvis ED, Hartemink AJ. Evaluating functional network inference using simulations of complex biological systems. Bioinformatics. 2002;18(Suppl 1):S216–S224.
