Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming

  • Stephen Gang Wu,

    Affiliation Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America

  • Yuxuan Wang,

    Affiliation Department of Computer Science and Engineering, Ohio State University, Columbus, Ohio, United States of America

  • Wu Jiang,

    Affiliation Boxed Wholesale, Edison, New Jersey, United States of America

  • Tolutola Oyetunde,

    Affiliation Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America

  • Ruilian Yao,

    Affiliation State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, People’s Republic of China

  • Xuehong Zhang,

    Affiliation State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, People’s Republic of China

  • Kazuyuki Shimizu,

    Affiliation Institute of Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan

  • Yinjie J. Tang,

    yinjie.tang@seas.wustl.edu (YJT); forrest.bao@gmail.com (FSB)

    Affiliation Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America

  • Forrest Sheng Bao

    Affiliation Department of Electrical and Computer Engineering, University of Akron, Akron, Ohio, United States of America

Abstract

13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform, MFlux (http://mflux.org), that predicts bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolism. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search for the best parameter set for each algorithm and verified their performance through 10-fold cross validation. SVM yields the highest accuracy among the three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Because most 13C-MFA studies focus on model organisms grown on particular carbon sources, the resulting bias in the fluxomic dataset may limit the applicability of the machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species.

Author Summary

Metabolic information is important for disease treatment, bioprocess optimization, environmental remediation, biogeochemical cycle regulation, and our understanding of life’s origin and evolution. 13C-MFA can quantify microbial physiology at the level of metabolic reaction rates. To speed up microbial characterizations and fluxomic studies, we hypothesize that genetic and environmental factors generate specific fluxome patterns that can be recognized by machine learning. Aided by constraint programming and quadratic optimization, our platform based on machine learning (ML) can predict meaningful metabolic information about bacterial species in their environments. Further, it can offer constraints to improve the accuracy of flux balance analysis. This study suggests that the bacterial metabolic network has a certain degree of rigidity in allocating carbon fluxes, and that different microbial species may share common regulatory strategies for balancing carbon and energy metabolism. As a proof of concept, we demonstrate that data-driven artificial intelligence (AI) approaches, e.g., ML, may assist mechanism-based models in elucidating the topology of microbial fluxomes.

Introduction

With the advent of systems biology tools, such as genomics, transcriptomics, proteomics, and metabolomics, during the last decade, the understanding of intracellular metabolism from genotype to phenotype has been dramatically boosted. Notably, 13C metabolic flux analysis (13C-MFA) enables the quantification of metabolic reaction rates in vivo [1]. It determines carbon metabolic fluxes using the mass isotopomer distribution (MID) of proteinogenic amino acids or free metabolites from 13C labeling experiments. 13C-MFA is considered a reliable measurement of central metabolic reaction rates [2], and it has demonstrated its power in discovering novel pathways [3, 4], validating gene functions [3], verifying engineered strains [5, 6], and revealing the energy metabolism of host strains [7]. In the past decade, advanced parallel bioreactor systems, mass spectrometry, and computational tools for resolving metabolic fluxes have been developed [8–11], which improved the precision of flux profiles [12] and extended 13C-MFA’s application to the non-stationary metabolic phase [13, 14]. On the other hand, broad applications of 13C-MFA are still hindered because 13C experiments, biomass analysis, and flux calculations are expensive and time-consuming [15]. Moreover, some microbial systems may not be amenable to 13C-MFA if they require complex nutrients or their genome annotation is incomplete [16]. Before performing 13C-MFA on non-model species, laborious work is needed to examine extracellular metabolites, to characterize unknown pathways, and to analyze biomass compositions.

This study aims to employ an artificial intelligence (AI) approach called machine learning (ML) to investigate bacterial fluxomics patterns. ML is a powerful tool in systems biology [17] and has demonstrated successes in omics studies [18, 19]. For example, the precision of genome annotation on the model species C. elegans has been significantly enhanced by employing a simplified Support Vector Machine (SVM) method. Researchers have reached an accuracy of 75% on controversial genes [20]. At the transcriptomics level, ML approaches have been frequently used for disease identification. For instance, SVM has successfully recognized the gene expression patterns of hepatocellular carcinoma (HCC) [21], diffuse large B-cell lymphoma (DLBCL) [22] and ovarian cancer [23]. At the proteomics level, Supek et al. have employed a combined approach by integrating the Principal Component Analysis (PCA) method with SVM, to enhance analytic power in identifying “fingerprint” proteins (i.e., unique proteins in each tissue) from different horseradish tissues (leaf, teratoma, and tumor) grown in vitro [24]. In metabolomics, an SVM method can resolve the NMR data of metabolites in urine samples from different groups of people (healthy vs. pneumonia) [25]. In metabolic modeling, Karp’s group have adopted ML algorithms to predict the existence of various pathways for metabolic network reconstruction in different organisms [19].

The general idea of ML is to statistically build a numerical predictive model, or an estimator, which is a function f : X → y that maps a vector of numbers called the feature vector to a vector of numbers called the target or the label. In many cases, the target is a 1-D vector, or a scalar. One may consider the feature vector as the input and the target as the output of the model. If the target takes discrete values, we call the ML model a classifier; otherwise, we call it a regressor. A commonly used classifier is the binary classifier, where the cardinality (size) of the target set y is 2, e.g., y = {+1,−1}. In this paper, we build a regressor for each flux, whose target is a real number, y ∈ ℝ, where ℝ stands for the set of all real numbers. In supervised ML, a pair of a feature vector and a target forms a training sample. Given a finite set of N samples {(X1, y1), …, (XN, yN)}, an ML algorithm will find such a function, usually by solving a numerical optimization problem that minimizes the predictive error. Samples used to train a model form the training set, while those used for testing its performance form the test set. Given a new piece of data, numerically represented as a vector Xnew, the model f will predict the target f(Xnew), e.g., a flux value given reaction parameters, which are represented by the vector Xnew in this paper. The models learned through ML are usually not analytical models that can be represented using equations. Rather, they are numerical operators. For example, an artificial neural network (ANN) model can be represented by a series of weight and bias matrices, one set per layer. A poor model can only predict well on the training set, as if it only “remembers” the training samples, while a good model can learn the patterns among data and still be accurate on samples it has never “seen”. Hence, researchers make the training and test sets mutually exclusive. A mechanism called cross validation is used to ensure the mutual exclusiveness of training and test sets while making full use of the dataset.

A distinct advantage of ML applications is that they can reduce the need for costly experimental supplies and time-consuming benchwork. Despite the progress in utilizing ML methods in systems biology, there is no similar application in the fluxomics field to predict the flux profile. Therefore, we conceived the idea of integrating ML strategies with fluxomics research. To efficiently employ ML methods, a dataset with a sufficient number of samples is a prerequisite. Recently, a 13C-MFA dataset named CeCaFDB has been constructed, which includes more than 100 papers mostly on prokaryotic species [26]. Based on this dataset, five categorical and sixteen continuous features were defined to describe the environmental and genetic factors involved in 13C-MFA of bacterial species. Unlike most omics projects employing ML approaches, this work built regressors rather than classifiers: 29 lumped central metabolic fluxes were adopted as the outputs to describe the central carbon metabolism of bacterial species. A 10-fold cross validation evaluated the performance of different algorithms. Furthermore, we included a knowledge-based system to check whether user inputs were biologically meaningful. Lastly, quadratic programming was employed to adjust the fluxes predicted by ML to satisfy stoichiometric constraints. Our web-based platform MFlux provides reasonable predictions for central metabolic flux profiles of 30 bacterial species, and it can be accessed online at http://mflux.org along with the training data. Although our platform is still in an early phase, our attempt to integrate AI approaches with mechanistic models will have broad impacts on both systems biology and metabolic engineering fields.

Methods

Data collection

The dataset used to build MFlux was constructed from the literature. The total uptake rate of carbon sources is normalized to 100; all other fluxes are normalized relative to the uptake rate of carbon sources. We obtained 13C-MFA information for bacterial species from the CeCaFDB dataset and added a few recent papers (approximately 120 papers in total, as of January 2015). 13C-MFA data related to photosynthetic bacteria were excluded from the ML study because of their diverse CO2 fixation pathways, light-sensitive fluxomes, and insufficient sample sizes for ML. For photosynthetic species, MFlux currently only reports a general description of their fluxomic features based on corresponding references.

In heterotrophic microorganisms, interconversions between glycolysis metabolites (phosphoenolpyruvate and pyruvate) and TCA cycle metabolites (oxaloacetate and malate) involve a set of anaplerotic reactions (e.g., phosphoenolpyruvate carboxylase, phosphoenolpyruvate carboxykinase, pyruvate carboxylase, and malic enzyme) serving as a key switch point for carbon flux distribution in bacteria [27]. Different microbial species may employ different subsets of these reactions, which balance both carbon and cofactors. For example, E. coli anaplerotic pathways involve phosphoenolpyruvate carboxylase and malic enzyme, while Bacillus species furnish pyruvate carboxylase (the pyruvate shunt). In the case of Corynebacterium, both phosphoenolpyruvate carboxylase and pyruvate carboxylase are functional [28, 29]. These anaplerotic pathways can re-route fluxes when a central pathway step such as pyruvate kinase is knocked out. To ease the ML efforts, the anaplerotic pathways were lumped into two routes that exchange fluxes between the TCA cycle and the glycolysis nodes: MAL ↔ PYR + CO2 and PEP + CO2 ↔ OAA. This simplification also considered the fact that 13C-MFA has poor resolution on anaplerotic fluxes because various combinations of these reactions could generate similar labeling patterns in amino acids [30].

Feature vector for ML

As mentioned earlier, supervised ML builds models based on the samples, each of which is a pair of a feature vector and a target. Based on published 13C-MFA methodologies and microbial physiologies, we proposed five categorical features: species, nutrient types, oxygen conditions, engineering method, genetic background, and cultivation methods. There were two considerations when choosing those features. First, genetic modifications can significantly re-organize fluxomes. To improve the predictability on mutant strains, our platform allows toggling on or off certain central pathways (by manually setting the flux boundaries) in engineered strains. Second, the factor of cultivation method aims to reveal fluxome differences between shake flask cultures (a pseudo-steady state approach) and bioreactor cultures (a well-controlled fermentation or chemostat cultivation). Meanwhile, we introduced sixteen continuous features: growth rate, substrate uptake rate, and the ratio of multiple substrate uptakes (glucose, fructose, galactose, gluconate, glutamate, citrate, xylose, succinate, malate, lactate, pyruvate, glycerol, acetate and NaHCO3, as shown in Fig 1). Since the features include both categorical and continuous ones, one-hot encoders were used to convert categorical feature values into real numbers. Each feature was then standardized into zero mean and unit variance as assumed by many ML approaches. For each predicted flux, or the target/label in ML terminology, we scaled it into the interval [0, 1] by the min-max method. In addition to the min-max method, we also tested unit-variance-zero-mean standardization for scaling flux values, and the result was quite similar.
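The preprocessing described above can be illustrated with a minimal sketch in plain Python. The category list, sample values, and function names below are hypothetical and are not taken from the MFlux code base; the sketch only shows one-hot encoding of a categorical feature, zero-mean/unit-variance standardization of each feature column, and min-max scaling of a flux target.

```python
from statistics import mean, pstdev

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def standardize(column):
    """Scale a feature column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:                      # constant column: nothing to scale
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

def min_max(column):
    """Scale a target column into the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Hypothetical toy data: one categorical and one continuous feature.
OXYGEN = ["aerobic", "anaerobic", "microaerobic"]
samples = [
    {"oxygen": "aerobic", "growth_rate": 0.2},
    {"oxygen": "anaerobic", "growth_rate": 0.4},
    {"oxygen": "aerobic", "growth_rate": 0.6},
]
raw = [one_hot(s["oxygen"], OXYGEN) + [s["growth_rate"]] for s in samples]
columns = [standardize(list(col)) for col in zip(*raw)]
features = [list(row) for row in zip(*columns)]   # one row per sample

fluxes = [10.0, 30.0, 50.0]      # hypothetical values of a single flux
targets = min_max(fluxes)
```

Each row of `features` has four entries here (three one-hot columns plus the growth rate), and `targets` holds the scaled flux that a regressor would be trained to predict.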

Fig 1. A universal central metabolic pathway for bacteria.

The central carbon metabolic pathway is simplified into 29 fluxes, used as the outputs of our model.

https://doi.org/10.1371/journal.pcbi.1004838.g001

Machine learning algorithms

The problem of predicting fluxes was modeled as a regression problem in ML, where a computer program learns from existing data to estimate continuous variables. Twenty-nine regressors were trained to predict the 29 fluxes. We tested three widely applied ML algorithms: k-nearest neighbors (k-NN), decision tree, and SVM. To ensure a fair comparison, we performed a grid search for the best parameter set of each algorithm. The detailed parameter sets for the 29 SVM-based regression models can be found on our web page. The programming language used for this project was Python 2.7, and the numpy and scikit-learn modules were utilized for machine learning [31]. Program files for training and testing the models are wrapped in S1 Program. The full version, including web-end code, is released under GNU GPL v3 at https://bitbucket.org/forrestbao/influx

Model evaluation and cross validation

Considering the limited number of samples in the current dataset, we adopted a 10-fold cross validation. An N-fold cross validation works as follows. All samples in our dataset are split into N equal parts. In each iteration, N − 1 parts are used as the training set, while the remaining part serves as the test set. In the next iteration, the test set is rotated to another part of the data, and the training set consists of all other samples. This procedure stops when every part of the data has been used as the test set exactly once and as part of the training set exactly N − 1 times. Finally, the accuracy of the model can be calculated by checking the prediction result for each sample. For each flux, the error in cross validation is computed using Mean Squared Error (MSE).
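The rotation of training and test sets can be sketched as follows. This is an illustrative plain-Python version, not the scikit-learn routine used in the project; the function names are hypothetical, folds are formed by striding through the sample indices, and a trivial mean predictor stands in for the SVM regressor.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, test

def fit_mean(train_x, train_y):
    """Stand-in regressor: always predicts the mean of the training targets."""
    avg = sum(train_y) / len(train_y)
    return lambda x: avg

def cross_val_mse(xs, ys, n_folds, fit):
    """Mean squared error over all held-out predictions."""
    squared_errors = []
    for train, test in k_fold_indices(len(xs), n_folds):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test)
    return sum(squared_errors) / len(squared_errors)

# Hypothetical toy data: ten samples with a constant target.
xs = list(range(10))
ys = [2.0] * 10
mse = cross_val_mse(xs, ys, 10, fit_mean)
```

Any function with the same `fit(train_x, train_y)` signature that returns a predictor can be passed in place of `fit_mean`, so the same loop can score one regressor per flux.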

Stoichiometric constraints and boundary

One unique feature of our method is incorporating the overall mass balance through central metabolic pathways. The stoichiometric equations in Fig 1 under steady state are given as Eqs 1–21.

Specifically, v1 represents the flux from carbon substrate (either glucose or galactose) to G6P since both glucose and galactose can be catabolized to G6P, vaa1 and vaa2 represent fluxes involved in biomass building block synthesis or extracellular products, while vbm represents carbon fluxes going to biomass from different precursors [32].

A series of linear constraints (Eqs 22–30) can be derived from the stoichiometric equations above and used to restrain fluxes predicted by the ML methods.

Among the equations listed above, Eq 22 indicates the case for co-metabolism of both C6 sugars. Meanwhile, a list of inequality constraints (Eqs 31–42) can be drawn, given that all biomass fluxes are non-negative.

Among all inequality constraints, Eq 39 works well except for the case of zwf knockout, where the direction of Eq 39 could be reversed [33].

Flux adjustment using stoichiometric constraints

We adopted a quadratic programming method, similar to minimization of metabolic adjustment (MOMA) [34], to tune fluxes to satisfy the stoichiometric constraints. The CVXOPT package for Python was employed for quadratic programming [35]. The optimization problem is modeled as Eq 43, in which one vector holds the flux values predicted by ML, the vector v = [v1, …, v29] holds the flux values to be solved for, the function Scaled(⋅) uses min-max scaling to bring all fluxes into the range [0, 1], the matrix S is obtained from the equality constraints (Eq 22 to Eq 30), and the matrix A is obtained from the inequality constraints (Eq 31 to Eq 42). Notably, the biomass composition of the same species varies significantly under different conditions. Therefore, the quadratic programming loosens the mass balance constraints toward biomass synthesis. The purpose of scaling fluxes into the same range is to avoid bias, because fluxes have different dynamic ranges. The objective function f(v) can be rewritten as a standard quadratic programming problem (Eq 44), where Mini and Maxi are the minimum and maximum of the range of the i-th flux. Since the last term and the coefficient 2 are constants, they can be omitted from the objective function. Hence, Eq 43 can be rewritten in standard quadratic programming form as Eq 45.
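The algebra behind this rewriting can be sketched as follows, using only the definitions given in the paragraph above. The notation is assumed for illustration: $\hat{v}$ denotes the ML-predicted fluxes, $D$ is a diagonal weight matrix, and the right-hand sides $b$ and $h$ stand for whatever constants appear in Eqs 22–42, which are not restated here.

```latex
\begin{aligned}
f(v) &= \sum_{i=1}^{29} \bigl(\mathrm{Scaled}(v_i) - \mathrm{Scaled}(\hat{v}_i)\bigr)^2
      = \sum_{i=1}^{29} \left(
          \frac{v_i - \mathrm{Min}_i}{\mathrm{Max}_i - \mathrm{Min}_i}
        - \frac{\hat{v}_i - \mathrm{Min}_i}{\mathrm{Max}_i - \mathrm{Min}_i}
        \right)^2 \\
     &= \sum_{i=1}^{29}
        \frac{v_i^2 - 2\,\hat{v}_i v_i + \hat{v}_i^2}{(\mathrm{Max}_i - \mathrm{Min}_i)^2}
      = 2\left(\tfrac{1}{2}\, v^{\top} D\, v - (D\hat{v})^{\top} v\right)
        + \hat{v}^{\top} D\, \hat{v},
\qquad
D = \mathrm{diag}\!\left(\frac{1}{(\mathrm{Max}_i - \mathrm{Min}_i)^2}\right) \\[4pt]
\min_{v}\; & \tfrac{1}{2}\, v^{\top} D\, v - (D\hat{v})^{\top} v
\quad \text{subject to} \quad S v = b, \quad A v \le h
\end{aligned}
```

The term $\hat{v}^{\top} D \hat{v}$ does not depend on $v$, and the factor 2 does not change the minimizer, which is why both can be dropped. In the calling convention of `cvxopt.solvers.qp(P, q, G, h, A, b)`, which minimizes $\tfrac{1}{2}x^{\top}Px + q^{\top}x$ subject to $Gx \le h$ and $Ax = b$, this would correspond to $P = D$ and $q = -D\hat{v}$, with the inequality matrix passed as `G` and the equality matrix $S$ passed as `A`.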

For the upper and lower boundaries of each flux, i.e., Maxi and Mini, we used the maximal and minimal values observed in multiple datasets as the default values. Users can manually set desired values for the upper/lower bound of any specific flux on the MFlux webpage, or they can opt not to use any boundaries. For instance, users can simply set both boundaries of a certain flux to zero if the corresponding gene is knocked out.
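The effect of flux boundaries can be illustrated with a simplified special case. The sketch below assumes that only the upper and lower bounds are enforced and that the stoichiometric equality and inequality constraints are ignored; under that assumption the weighted least-squares objective is separable, so the closest feasible flux vector is obtained by clipping each predicted flux to its bounds. The function name and numbers are hypothetical, and MFlux itself solves the full quadratic program.

```python
def adjust_to_bounds(predicted, lower, upper):
    """Closest point to `predicted` inside the box [lower, upper].

    Valid only when box bounds are the sole constraints: each flux can then
    be adjusted independently, so clipping minimizes the squared deviation.
    """
    return [min(max(v, lo), hi) for v, lo, hi in zip(predicted, lower, upper)]

# Hypothetical example with three fluxes.
predicted = [55.0, -1.2, 30.0]   # raw ML output; second value is negative
lower = [0.0, 0.0, 0.0]
upper = [100.0, 20.0, 0.0]       # third flux forced to zero (gene knockout)
adjusted = adjust_to_bounds(predicted, lower, upper)
```

Here the negative flux is raised to its lower bound and the knocked-out flux is set to zero, while the first flux is left unchanged.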

Constraint programming and input checking

To ensure that user inputs, e.g., growth rates, oxygen usage, and substrate uptake rates, are biologically meaningful, MFlux first checks the satisfiability of the input values (e.g., whether the cell growth rate is realistic) [36]. Biological meaningfulness is represented using constraint programming [37], where each input is treated as a variable over a given domain. A set of inputs lacking biological meaning will leave those constraints unsatisfied, and MFlux will report an error message to warn the user. The Python module python-constraint [38] is used as the constraint solver.
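The idea of input checking can be sketched with plain Python predicates. This sketch does not use python-constraint, and the rules, thresholds, and field names below are hypothetical placeholders chosen for illustration; they are not the constraints encoded in MFlux.

```python
# Each rule pairs a message with a predicate over the input dictionary.
# All thresholds here are illustrative assumptions.
RULES = [
    ("growth rate must be positive and at most 3.0 per hour",
     lambda x: 0.0 < x["growth_rate"] <= 3.0),
    ("substrate uptake rate must be positive",
     lambda x: x["uptake_rate"] > 0.0),
    ("substrate ratios must be non-negative and sum to 1",
     lambda x: all(r >= 0.0 for r in x["ratios"].values())
     and abs(sum(x["ratios"].values()) - 1.0) < 1e-6),
]

def check_inputs(inputs):
    """Return the messages of all violated rules (empty list if none)."""
    return [message for message, holds in RULES if not holds(inputs)]

good = {"growth_rate": 0.5, "uptake_rate": 8.0,
        "ratios": {"glucose": 0.7, "xylose": 0.3}}
bad = {"growth_rate": -0.1, "uptake_rate": 8.0,
       "ratios": {"glucose": 0.7, "xylose": 0.3}}

good_errors = check_inputs(good)
bad_errors = check_inputs(bad)
```

A constraint solver generalizes this pattern by letting rules relate several variables at once and by searching the variable domains for any satisfying assignment.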

Overall system design

Different parts of MFlux mentioned above are put together as illustrated in Fig 2. The prediction on 29 fluxes is done via an RBF-kernel SVM, whose outcome will be finalized by quadratic programming. Users can set boundary constraints to represent information about genes that are knocked out on the species, and such information will be used in quadratic programming. If parameters set by the user are not biologically meaningful, a warning message will be displayed. In the future, users will also have the option to enter flux constraints and settings of their own experiment to improve the prediction accuracy of MFlux.

Results

Pathway map and statistical analysis

The core metabolism of bacteria is summarized into a pathway map in Fig 1. Considering the availability of information, 29 major fluxes with 14 potential substrates were used to represent a universal heterotrophic carbon metabolism for non-photosynthetic bacterial species, which includes glycolysis, the tricarboxylic acid (TCA) cycle, the pentose phosphate (PP) pathway, the Entner–Doudoroff (ED) pathway, the glyoxylate shunt, and the anaplerotic pathway. It is difficult for 13C-MFA to precisely resolve the anaplerotic pathway fluxes [39]. Information on the anaplerotic pathway is either incomplete or imprecise in many publications in our dataset. Consequently, we simplified the anaplerotic pathway into two reversible fluxes. Similarly, we ignored several overflow fluxes that occasionally appear in 13C-MFA studies of anaerobic metabolism (e.g., the secretion of formate, butyrate, or pyruvate), because there were too few samples for machine learning. The omission of those fluxes can also partially explain the high prediction error in some fluxes (e.g., v8: Pyruvate → Acetyl-CoA).

By statistical analysis, we determined the variation between each flux profile and the average flux profile from our 13C-MFA dataset. The average value, the range, and the 95% confidence interval for each flux are shown in Fig 3. The most conserved fluxes in our dataset include the non-oxidative pentose phosphate pathway and the glyoxylate shunt. The former pathway supplies precursors for bio-synthesizing amino acids (i.e., histidine, phenylalanine, and tyrosine) and nucleotides. The latter acts as an alternative carbon-conserving path to the TCA cycle and is inhibited by the presence of glucose (most 13C-MFA is based on glucose metabolism). All 29 fluxes are found to have a relatively narrow confidence interval compared to the possible flux ranges, suggesting that fluxes of different bacterial species vary in a relatively small range. This is because most 13C-MFA studies focus on model species (e.g., E. coli and B. subtilis) and glucose-based metabolism, while far fewer MFA efforts study non-model species or the metabolism of carbon substrates other than sugars (i.e., a bias in fluxome research).

Fig 3. Overview of central metabolic fluxes collected in our dataset.

“Flux range” represents the variation of each flux in the 13C-MFA dataset. “95% confidence interval” indicates that 95% of flux data were within a small range. “Average flux value” is the average value in each flux based on all data in our 13C-MFA dataset.

https://doi.org/10.1371/journal.pcbi.1004838.g003

Optimization of algorithms and parameters

To decide on the most suitable ML algorithm, we first performed a grid search in the parameter space, using a dataset of wild type (WT) samples only. The best results of the three algorithms (for SVM, linear kernel only at this step) are presented in Fig 4. SVM makes better predictions than either the decision tree or k-NN on most fluxes. After this step, we carried out a second round of grid search to optimize parameters and improve the performance of SVM on the whole phenotype (WP) dataset (both WT and engineered). Both the linear kernel and the radial basis function (RBF) kernel were included in this round of grid search.

Fig 4. A comparison of three ML algorithms: SVM, k-NN, and decision tree.

The best cross-validation results on 29 fluxes are compared. All tests in this step were performed on the WT dataset only.

https://doi.org/10.1371/journal.pcbi.1004838.g004

We initially expected better cross-validation results from the SVM models trained on the WT dataset than from those trained on the WP dataset, because the WT dataset does not include sophisticated genetic variations. However, the cross-validation results refuted this expectation: the models trained on the WP dataset demonstrated significantly better performance than those trained on the WT dataset (data shown in Fig 5). This result suggests that the size of the training set is a major factor affecting model quality, especially when the training set is relatively small (the sizes of the WT and WP datasets are about 150 and 450 samples, respectively). We also compared the SVM results using the linear kernel with those using the RBF kernel, and the RBF kernel showed slightly better performance (Fig 6). The parameter set producing the most accurate cross-validation result was used to configure MFlux. Notably, predictions of v11 (the second step of the oxidative PP pathway) and v24 (the glyoxylate shunt) have relatively large variations. Two factors may contribute to this. First, both v11 and v24 have relatively narrow ranges (see Fig 1), and consequently even small numerical variations generate large relative errors for both fluxes. Second, genetic modifications may influence both v11 (e.g., zwf knockout [40]) and v24 (e.g., ppc knockout [41]) significantly. For instance, knocking out zwf in E. coli will cause a zero flux in v10 (the oxidative pentose phosphate pathway, OPP pathway) [42]. However, the lack of sufficient information on flux re-organization mechanisms in engineered microbes reduces ML predictability, because most fluxomics studies of engineered microbes focus on a few model species such as E. coli. To resolve this problem, the MFlux platform allows users to manually set the boundaries of central fluxes to improve prediction quality (e.g., setting a zero flux through the OPP pathway for an E. coli zwf mutant).

Fig 5. Best results by SVM on WT and WP datasets.

Grid searches are performed on both linear and RBF kernels. The results from WP dataset are much better than those from the WT dataset. The result indicated that the size of the dataset is an important factor affecting the predictive power of machine learning models.

https://doi.org/10.1371/journal.pcbi.1004838.g005

Fig 6. A comparison between linear-kernel SVM and RBF-kernel SVM.

The best cross-validation results of linear kernel and RBF kernel after grid searches on WP dataset are very similar. The RBF kernel is employed in the final model for flux prediction.

https://doi.org/10.1371/journal.pcbi.1004838.g006

Flux correction by quadratic programming

After parameter optimization, the SVM models with the best parameter sets can predict with relatively small error. However, the flux profile predicted by the ML method does not necessarily satisfy the inherent stoichiometric constraints of metabolic networks, because the ML methods do not have a big enough dataset at this stage to capture them. The situation can be even worse when specific fluxes predicted by the ML algorithm fall outside a biologically meaningful range (e.g., the predicted glyoxylate shunt flux v24 may have a negative value). To address those issues, we employed quadratic programming for flux correction as described in the Methods section. More rational results with improved accuracy are expected after flux correction. An essential assumption of this step is that ML predictions are relatively close to the real values reported in the literature. This assumption is backed by our cross-validation results and further validated in the following case studies.

Case studies

To demonstrate the functionality of MFlux, we carried out tests on 20 cases, and the results are illustrated in Fig 7. Brief information for each case is listed in Table 1, and comprehensive results are included in S1 and S2 Tables. In general, MFlux can achieve decent flux predictions. Here we demonstrate two cases which are Cases 8 and 16.

Fig 7. Summary of root mean squared error (RMSE) from 20 case studies: averaged flux from 13C-MFA dataset, ML-only, and MFlux (ML + quadratic programming).

The average RMSE is 7.7 from ML-only, and 5.6 from MFlux. Detailed information is in S1 and S2 Tables.

https://doi.org/10.1371/journal.pcbi.1004838.g007

Table 1. Summary of 20 cases of study.

Glc, glucose; Xyl, xylose; Lac, lactate; Ace, acetate; KO, knockout.

https://doi.org/10.1371/journal.pcbi.1004838.t001

In Case 8, a B. subtilis strain takes up the mixed substrates succinate and glutamate. To illustrate the co-metabolism of mixed substrates, we tested MFlux with 13C-MFA data of B. subtilis reported by Chubukov et al. [44]. Microbial fermentation fed with multiple low-priced substrates is promising for the biotechnology industry. However, there are very few quantitative analyses of this topic. In this test, we adopted the same set of parameters found in the literature (S1 Table, Case 8) as the inputs of MFlux. For flux correction, we directly took the default boundary settings for quadratic programming. A comparison of flux profiles reported by 13C-MFA, predicted by ML only, and predicted by MFlux (i.e., ML + quadratic programming) is illustrated in Fig 8. Both the ML-only approach and MFlux predict most fluxes accurately, closely matching the 13C-MFA flux profiles with a Root Mean Squared Error (RMSE) under 5. For ML only, the predictions have large variation on specific fluxes (e.g., v11, the oxidative PP pathway, and v19, the TCA cycle). Quadratic programming can further adjust flux profiles and reduce deviations of flux predictions. The corrected flux profiles also meet the basic stoichiometric relationships of the metabolic network. The final prediction from MFlux shows improvement, with the RMSE reduced to 3.2.
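For reference, the RMSE between a predicted and a measured flux profile can be computed as in the sketch below. It assumes the usual definition, i.e., the square root of the mean squared difference over all fluxes in the profile; the function name and the example numbers are hypothetical and are not values from the case studies.

```python
from math import sqrt

def rmse(predicted, measured):
    """Root mean squared error between two flux profiles of equal length."""
    if len(predicted) != len(measured):
        raise ValueError("flux profiles must have the same length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return sqrt(sum(squared) / len(squared))

# Hypothetical two-flux profiles, both offset by 3 units from the measurement.
example = rmse([0.0, 0.0], [3.0, 3.0])
```

Because fluxes are normalized to a total carbon uptake of 100, an RMSE computed this way is expressed in the same normalized units.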

Fig 8. A comparison of the 13C-MFA flux, the flux predicted by ML only, and the flux predicted by MFlux in Case 8.

B. subtilis was incubated in a shake flask (37 °C, 300 rpm, aerobic condition), and supplied with labeled succinate and glutamate as carbon sources in M9 minimal medium. Detailed information is in S1 Table.

https://doi.org/10.1371/journal.pcbi.1004838.g008

In Case 16, G. thermoglucosidasius strain M10EXG grows under microaerobic conditions. G. thermoglucosidasius is a thermophilic and ethanol tolerant bacterium which can convert both hexose and pentose into ethanol [28]. To predict its central fluxomes, the parameter set used is listed in S1 Table, along with the default boundary settings for flux correction. A heat map (Fig 9) visualizes 13C-MFA fluxes with ML-only fluxes and MFlux results. The results are encouraging: ML-only prediction gives an RMSE of 4.0, while MFlux uses both ML and quadratic programming to improve the prediction to an RMSE of only 3.0. Among the 20 case studies, the average flux set has very large variations (RMSE of 33.5) from actual 13C-MFA fluxes (S2 Table). In this case, MFlux reduces the deviations of predicted fluxes from 13C-MFA values.

Fig 9. A comparison of the 13C-MFA flux, the flux predicted by ML only, and the flux predicted by MFlux in Case 16.

G. thermoglucosidasius M10EXG was incubated in sealed bottles (micro-aerobic condition), supplied with glucose as a carbon source. Detailed information is in S2 Table.

https://doi.org/10.1371/journal.pcbi.1004838.g009

For species with genetic modifications in major pathways (Cases 2, 3, 4, 12, and 13, E. coli and C. glutamicum), MFlux predictions have an RMSE between 5 and 10, higher than the RMSE for predictions of wild-type strains. Since MFlux is currently unable to capture the complex regulatory mechanisms of flux reorganization, human-computer interaction can be employed: users can manually tune the boundary values of certain fluxes to improve the quality of the flux prediction. For example, knocking out ppc in E. coli may activate the glyoxylate shunt [41, 42]. Users can therefore assign a non-zero lower boundary to the glyoxylate shunt flux when running MFlux.
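The effect of raising a lower boundary can be sketched on a hypothetical isocitrate branch point. This is an illustration under stated assumptions, not the MFlux code: the flux values and function names are invented, and the constrained least-squares problem is solved with Dykstra's alternating projections (chosen here only to keep the example dependency-free) rather than with a quadratic programming solver.

```python
def project_balance(v, a):
    """Project v onto the hyperplane sum(a_i * v_i) = 0."""
    scale = sum(ai * vi for ai, vi in zip(a, v)) / sum(ai * ai for ai in a)
    return [vi - ai * scale for ai, vi in zip(a, v)]

def project_box(v, lower, upper):
    """Clip each flux into its [lower, upper] boundary."""
    return [min(max(vi, lo), hi) for vi, lo, hi in zip(v, lower, upper)]

def correct_flux(v_ml, a, lower, upper, iterations=200):
    """Closest point to v_ml that satisfies the balance and the boundaries.

    Dykstra's algorithm: alternate the two projections while carrying a
    correction term (p, q) for each constraint set.
    """
    x = list(v_ml)
    p = [0.0] * len(x)
    q = [0.0] * len(x)
    for _ in range(iterations):
        z = [xi + pi for xi, pi in zip(x, p)]
        y = project_balance(z, a)
        p = [zi - yi for zi, yi in zip(z, y)]
        w = [yi + qi for yi, qi in zip(y, q)]
        x = project_box(w, lower, upper)
        q = [wi - xi for wi, xi in zip(w, x)]
    return x

# Hypothetical node: isocitrate supply = isocitrate dehydrogenase + glyoxylate shunt
balance = [1.0, -1.0, -1.0]
v_ml = [80.0, 80.0, 0.0]        # invented ML prediction with an inactive shunt
upper = [100.0, 100.0, 100.0]

v_default = correct_flux(v_ml, balance, [0.0, 0.0, 0.0], upper)
v_tuned = correct_flux(v_ml, balance, [0.0, 0.0, 10.0], upper)  # user forces shunt >= 10

print("default boundaries:", [round(v, 3) for v in v_default])
print("shunt lower = 10  :", [round(v, 3) for v in v_tuned])
```

With the default boundaries the prediction is already feasible and is returned unchanged. With the tuned boundary the shunt is pushed to its lower limit and the two neighbouring fluxes shift to keep the node balanced.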

Improving flux balance analysis of microbial metabolism via MFlux

Stoichiometry-based flux balance analysis (FBA) is an important mechanistic tool to predict unknown cell metabolism [50]. Accurate FBA prediction relies heavily on setting the objective function and the flux constraints appropriately (based on thermodynamics or experimental analysis). Here, we compare FBA with MFlux for predicting E. coli metabolism. The latest version of the E. coli iJO1366 genome-scale model (2583 fluxes) was used [51]. Two comparative case studies were performed on E. coli fluxomes: one for glucose-based 13C-MFA via parallel labeling experiments [12] and the other for glucose and glycerol co-utilization (unpublished data from the Shimizu Group). Neither test case was included in the training set of MFlux. With the 13C-MFA results as the control, MFlux gives smaller RMSEs than FBA in both cases. In the first case, FBA has an RMSE of 11.3, while MFlux has an RMSE of 6.5 (Fig 10A). In the second case, FBA has an RMSE of 22.5, while MFlux has an RMSE of 5.1 (Fig 10B). To circumvent variations caused by alternative solutions in FBA, we also employed pFBA and geometricFBA for both cases [52, 53] (S2 Table). In general, pFBA does not give better results than FBA for either case, while geometricFBA did not converge in our calculation.

Fig 10. A comparison of the 13C-MFA flux, the flux predicted by MFlux, and the flux predicted by FBA.

FBA was simulated with the E. coli iJO1366 model (latest version) using the default boundary settings from the reference [54]. The default values of growth associated maintenance energy (GAM) and non-growth associated maintenance energy (NGAM) were adopted. A) E. coli fluxome of glucose metabolism, precisely measured via parallel labeling experiments (a recent paper not in our dataset) [12]. B) E. coli fluxome of glycerol and glucose co-metabolism as measured by Drs. Yao and Shimizu (unpublished data). The E. coli strain was cultured in a chemostat fermentor with a working volume of 1 L (37 °C). The dilution rate in the continuous culture was 0.35 h−1. [1-13C] glucose and [1,3-13C] glycerol were used for tracer experiments. The flux calculation is based on a previous method [42]. The RMSE from FBA is 22.5, while the RMSE from MFlux (this work) is 5.1. The COBRA toolbox running on MATLAB R2012b was employed for FBA/pFBA/geometricFBA simulation, and Gurobi 5.5 was used for linear programming. Detailed information is included in S2 Table.

https://doi.org/10.1371/journal.pcbi.1004838.g010

FBA alone has given good predictions of growth rate as well as input and output fluxes, but not of intracellular fluxes. It is difficult to obtain actual P/O ratios, the ATP maintenance cost, the oxygen flux, and the transhydrogenase activities [55]. These energy/cofactor variables strongly affect the fluxes in the oxidative PP pathway (NADPH generation) and the TCA cycle (NADH, NADPH, and FADH2 generation). Without proper flux constraints and objective functions, it is challenging for FBA to narrowly determine intracellular fluxomes in suboptimal metabolisms. This is especially true for the co-metabolism of dual substrates, because the solution space in which the cell can optimize biomass growth using two substrates is large. As a complementary tool, MFlux may offer a quick metabolic overview and provide biologically meaningful flux boundaries to reduce FBA solution spaces when proper constraints for FBA are unavailable.
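The problem of alternative optimal solutions, and how extra flux boundaries shrink the solution space, can be seen in a deliberately tiny example. The sketch below is an assumption-laden toy, not a genome-scale FBA: the network, the boundary values, and the function names are hypothetical, and the optima are found by brute-force enumeration instead of linear programming.

```python
def optimal_splits(uptake, v1_bounds=(0.0, None), step=1.0):
    """Enumerate flux splits (v1, v2) that all reach maximal biomass.

    Toy network (hypothetical): substrate -> A at rate `uptake`;
    A reaches the biomass precursor either directly (v1) or through a
    two-step bypass (each step carrying v2). Steady state at A forces
    v1 + v2 = uptake, so biomass = uptake for every feasible split.
    """
    lo, hi = v1_bounds
    hi = uptake if hi is None else min(hi, uptake)
    splits = []
    v1 = max(lo, 0.0)
    while v1 <= hi + 1e-9:
        splits.append((v1, uptake - v1))
        v1 += step
    return splits

def total_flux(split):
    """Sum of internal fluxes; the bypass counts twice (two reactions)."""
    v1, v2 = split
    return v1 + 2.0 * v2

all_optima = optimal_splits(10.0)                 # every split is equally optimal
parsimonious = min(all_optima, key=total_flux)    # pFBA-style tie-break
narrowed = optimal_splits(10.0, v1_bounds=(5.0, 9.0))  # invented boundaries on v1

print(len(all_optima), "equally optimal splits without extra constraints")
print("minimal total flux picks", parsimonious)
print(len(narrowed), "splits remain once v1 is bounded to [5, 9]")
```

The objective alone cannot distinguish between the splits; a parsimony criterion picks one of them, which need not be the one the cell uses. Boundaries derived from an external prediction, here the invented interval on v1, reduce the set of candidate solutions without changing the objective.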

Discussion

Metabolic robustness of fluxome patterns among microbial species

In the control field, “robustness” was originally defined as closed-loop process stability under perturbations. This definition is applicable to biochemical networks. To maintain the physiological output (i.e., the fluxome) within a desired range, microorganisms employ sophisticated control strategies at different architecture levels, from the genome to the phenotype. In contrast to chaotic transcriptional profiles, the microbial fluxome shows robustness so that cells can survive in constantly changing environments or in response to genetic mutations [56–58]. Metabolic rigidity at the flux level was first reported by Stephanopoulos in the early 1990s [59, 60]: NADPH is important for anabolism in the exponential growth phase, and the flux ratio around the glucose-6-P node is rigid to form NADPH [60]. Moreover, 12 precursors from the central metabolism are required for biomass formation, and the demands for them show relatively small variations that depend mainly on biomass composition. Due to both thermodynamic and mass balance constraints, cell metabolism tends to minimize variations in flux ratios under environmental perturbations. This rule also holds for engineered microbes with moderately overexpressed pathways and for strains with random mutations or deletions of non-essential genes. This metabolic robustness facilitates ML applications.

Flux pattern recognition enables MFlux to predict the metabolism of new species by learning from a small set of fluxome information from the same genus. For example, the metabolisms of P. aeruginosa, P. fluorescens, and P. putida have been studied by 13C-MFA in the past decade [61–65]. The results show that different Pseudomonas species employ remarkably similar fluxome types: they employ a highly active ED pathway for glycolytic metabolism and keep a low flux through the PP pathway for biomass synthesis, due to the lack of the pfk gene [66]. The ED pathway has a lower protein cost than the Embden–Meyerhof–Parnas (EMP) pathway, yet only one ATP is formed per glucose [67, 68]. Pseudomonas species have slow cell growth rates, and their aerobic metabolisms do not yield by-products. They also demonstrate a very active pyruvate shunt (MAL → PYR) and an NADPH overproduction flux (a benefit for counteracting oxidative stress). On the other hand, the TCA cycle in Pseudomonas species shows plasticity under genetic and environmental variations [69], and can respond to increased ATP and NADH demands under stress conditions [70].

For different bacterial species (e.g., E. coli and Bacillus), the fluxomes (e.g., for glucose metabolism) can be similar, because central fluxes in catabolism are regulated by energy and building block requirements that show much smaller variations than genomic or transcriptional differences. On the other hand, a change of carbon substrates may alter flux distributions. For example, co-utilization of glucose and glycerol in E. coli causes significant reorganization of the fluxome. In the same microbial strain, different fluxome patterns can be employed for metabolizing different substrates (e.g., a glucose-based fluxome vs an acetate-based fluxome). Recognizing these metabolic patterns allows a relatively small training set to support decent predictions across diverse metabolic types. Consequently, the common principles shared by certain classes of microorganisms can be captured by machine learning for fluxome predictions.

Limitations of machine learning

There are several major challenges regarding MFlux. First, the 13C-MFA fluxes in the literature may contain errors and biases, which are carried into the training process of MFlux and lead to further variations. For example, current 13C-MFA studies are not evenly distributed among a broad range of microbial genera. Most reported MFAs concentrate on a few model microbial species or on the metabolism of only a few substrates (mainly glucose), and thus our current ML cannot predict fluxomes well in certain cases. This problem (model bias) can be resolved once more 13C-MFA papers for non-model species are included in the database and more constraints are implemented in our platform.

Second, the predictive power of ML is limited to species and pathways that are already included in learning. More information and effort are required to deal with genetically modified strains whose engineered pathways hijack flux for the synthesis of diverse commodity chemicals [13]. Currently, 13C-MFA has not yet been widely used by the synthetic biology community. In future versions of MFlux, new metabolic knowledge and rules should be applied for flux corrections.

Third, it is still difficult to incorporate regulation mechanisms into the current model. For instance, various catabolite repression mechanisms regulate the cell fluxome in the presence of multiple substrates (e.g., glucose shows catabolite repression for fast growing E. coli when both glucose and glycerol are available, Fig 10) [71]. Such hierarchical regulation of substrate utilization can depend on growth rates or can differ among microbial species (E. coli, Bacillus and Corynebacterium).

Fourth, when oxygen is not available, fast bacterial sugar utilization will activate mixed acid fermentation (e.g., by utilizing lactate dehydrogenase and pyruvate formate lyase) to produce complicated overflow metabolites [13]. This mechanism is also present in yeast and mammalian cells. However, 13C-MFA studies on anaerobic metabolisms are much less frequent than those on aerobic metabolisms. MFlux cannot predict the complicated patterns of overflow fluxes at this stage.

Fifth, our current dataset is still unable to support ML studies on phototrophic bacterial fluxomes. In phototrophic metabolism, energy generation (ATP, NADH and NADPH) may not be controlled by substrate catabolism. Some phototrophic bacteria (e.g., cyanobacteria) have versatile autotrophic and photomixotrophic metabolisms that are highly sensitive to light and substrate availability. Other phototrophs may even have alternative CO2 fixation pathways (such as the reverse TCA cycle). Therefore, our MFlux platform cannot make ML predictions for these species and only reports a general description of their metabolic features.

Lastly, ML cannot directly estimate fluxes for carbon sources which are not part of the learning dataset. To predict fluxomes for new substrates, users need to assume that similar entry-points of carbon sources into the central metabolic network may cause similar flux distributions (e.g., sucrose has to be treated as a combination of glucose and fructose).
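The substitution described above can be written as a small preprocessing step. This is only a sketch of the idea: the substrate list, the proxy table, and the function name are hypothetical and do not reflect the actual MFlux input encoding, and the equal glucose/fructose split for sucrose is an assumption on a per-monomer basis.

```python
# Hypothetical set of carbon sources assumed to be present in the training data
KNOWN_SUBSTRATES = ["glucose", "fructose", "glycerol", "acetate"]

# Hypothetical proxy table: an unseen substrate is expressed through known
# substrates that enter central metabolism at similar points.
PROXIES = {
    "sucrose": {"glucose": 0.5, "fructose": 0.5},
}

def encode_substrates(uptake_fractions):
    """Map {substrate: fraction of carbon uptake} onto the known substrates."""
    encoded = {name: 0.0 for name in KNOWN_SUBSTRATES}
    for name, fraction in uptake_fractions.items():
        if name in encoded:
            encoded[name] += fraction
        elif name in PROXIES:
            for known, share in PROXIES[name].items():
                encoded[known] += fraction * share
        else:
            raise KeyError(f"no training data or proxy for substrate: {name}")
    return [encoded[name] for name in KNOWN_SUBSTRATES]

sucrose_only = encode_substrates({"sucrose": 1.0})
mixed = encode_substrates({"sucrose": 0.5, "glycerol": 0.5})

print(sucrose_only)
print(mixed)
```

Raising an error for substrates without a proxy keeps the user from silently extrapolating beyond the training data.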

Conclusion

This proof-of-concept study demonstrates that AI methods can facilitate fluxomics research with reasonable precision. 13C-MFA remains a small field, with only hundreds of MFA research papers on microbial species published in the past two decades, which limits the data currently available for learning. In the long term, ML methods may overcome this limitation: with a large and reliable fluxomics dataset and more input from 13C-MFA and AI scientists, future MFlux models can make broad-scope metabolism predictions. To sum up, MFlux is the first platform to introduce ML into the field of fluxomics, and it will be continuously updated and improved. It will inspire the development of similar computational tools to advance the omics and metabolic engineering fields [72].

Supporting Information

S1 Program. MFlux Computer Program (Source code).

Python scripts in a ZIP file.

https://doi.org/10.1371/journal.pcbi.1004838.s001

(ZIP)

S1 Table. Results of 20 case studies.

Detailed information for the 20 case studies using MFlux, including literature references, input conditions, 13C-MFA flux, the flux profiles predicted by ML, and the flux profiles predicted by MFlux with additional constraints.

https://doi.org/10.1371/journal.pcbi.1004838.s002

(XLSX)

S2 Table. Detailed information of the comparison with FBA/pFBA.

The information of constraints, objective function, and simulation results.

https://doi.org/10.1371/journal.pcbi.1004838.s003

(XLSX)

Acknowledgments

The authors appreciate the helpful suggestions of Dr. Eric You Xu (Fitbit Inc.) and Dr. Lian He (Washington University).

Author Contributions

Conceived and designed the experiments: SGW YJT. Performed the experiments: SGW YW WJ TO FSB. Analyzed the data: SGW RY XZ KS YJT FSB. Contributed reagents/materials/analysis tools: XZ YJT FSB. Wrote the paper: SGW KS YJT FSB.

References

  1. Winter G, Krömer JO. Fluxomics–connecting’omics analysis and phenotypes. Environmental Microbiology. 2013;15(7):1901–1916. pmid:23279205
  2. Chen X, Alonso AP, Allen DK, Reed JL, Shachar-Hill Y. Synergy between 13C-metabolic flux analysis and flux balance analysis for understanding metabolic adaption to anaerobiosis in E. coli. Metabolic Engineering. 2011;13(1):38–48. pmid:21129495
  3. Tang YJ, Chakraborty R, Martín HG, Chu J, Hazen TC, Keasling JD. Flux analysis of central metabolic pathways in Geobacter metallireducens during reduction of soluble Fe(III)-nitrilotriacetic acid. Applied and Environmental Microbiology. 2007;73(12):3859–3864. pmid:17468285
  4. Tang JKH, You L, Blankenship RE, Tang YJ. Recent advances in mapping environmental microbial metabolisms through 13C isotopic fingerprints. Journal of The Royal Society Interface. 2012;9(76):2767–2780.
  5. Yim H, Haselbeck R, Niu W, Pujol-Baxley C, Burgard A, Boldt J, et al. Metabolic engineering of Escherichia coli for direct production of 1, 4-butanediol. Nature Chemical Biology. 2011;7(7):445–452. pmid:21602812
  6. Becker J, Zelder O, Häfner S, Schröder H, Wittmann C. From zero to hero–Design-based systems metabolic engineering of Corynebacterium glutamicum for L-lysine production. Metabolic Engineering. 2011;13(2):159–168. pmid:21241816
  7. He L, Xiao Y, Gebreselassie N, Zhang F, Antoniewicz MR, Tang YJ, et al. Central metabolic responses to the overproduction of fatty acids in Escherichia coli based on 13C-metabolic flux analysis. Biotechnology and Bioengineering. 2014;111(3):575–585. pmid:24122357
  8. Antoniewicz MR, Kelleher JK, Stephanopoulos G. Elementary metabolite units (EMU): a novel framework for modeling isotopic distributions. Metabolic Engineering. 2007;9(1):68–86. pmid:17088092
  9. Weitzel M, Nöh K, Dalman T, Niedenführ S, Stute B, Wiechert W. 13CFLUX2–high-performance software suite for 13C-metabolic flux analysis. Bioinformatics. 2013;29(1):143–145. pmid:23110970
  10. Quek LE, Wittmann C, Nielsen LK, Krömer JO. OpenFLUX: efficient modelling software for 13C-based metabolic flux analysis. Microbial Cell Factories. 2009;8:25. pmid:19409084
  11. Zamboni N, Fischer E, Sauer U. FiatFlux–a software for metabolic flux analysis from 13C-glucose experiments. BMC Bioinformatics. 2005;6(1):209. pmid:16122385
  12. Crown SB, Long CP, Antoniewicz MR. Integrated 13C-metabolic flux analysis of 14 parallel labeling experiments in Escherichia coli. Metabolic Engineering. 2015;28:151–158. pmid:25596508
  13. Antoniewicz MR, Kraynie DF, Laffend LA, González-Lergier J, Kelleher JK, Stephanopoulos G. Metabolic flux analysis in a nonstationary system: fed-batch fermentation of a high yielding strain of E. coli producing 1, 3-propanediol. Metabolic Engineering. 2007;9(3):277–292. pmid:17400499
  14. Nöh K, Grönke K, Luo B, Takors R, Oldiges M, Wiechert W. Metabolic flux analysis at ultra short time scale: isotopically non-stationary 13C labeling experiments. Journal of Biotechnology. 2007;129(2):249–267. pmid:17207877
  15. Tang YJ, Martin HG, Myers S, Rodriguez S, Baidoo EE, Keasling JD. Advances in analysis of microbial metabolic fluxes via 13C isotopic labeling. Mass Spectrometry Reviews. 2009;28(2):362–375. pmid:19025966
  16. Zhuang WQ, Yi S, Bill M, Brisson VL, Feng X, Men Y, et al. Incomplete Wood–Ljungdahl pathway facilitates one-carbon metabolism in organohalide-respiring Dehalococcoides mccartyi. Proceedings of the National Academy of Sciences. 2015;111(17):6419–6424.
  17. Tarca AL, Carey VJ, Chen X, Romero R, Draghici S. Machine learning and its applications to biology. PLoS Computational Biology. 2007;3(6):e116. pmid:17604446
  18. Kell DB. Metabolomics, modelling and machine learning in systems biology–towards an understanding of the languages of cells. FEBS Journal. 2006;273(5):873–894.
  19. Dale JM, Popescu L, Karp PD. Machine learning methods for metabolic pathway prediction. BMC Bioinformatics. 2010;11(1):15. pmid:20064214
  20. Rätsch G, Sonnenburg S, Srinivasan J, Witte H, Müller KR, Sommer RJ, et al. Improving the Caenorhabditis elegans genome annotation using machine learning. PLoS Computational Biology. 2007;3(2):e20. pmid:17319737
  21. Ye QH, Qin LX, Forgues M, He P, Kim JW, Peng AC, et al. Predicting hepatitis B virus–positive metastatic hepatocellular carcinomas using gene expression profiling and supervised machine learning. Nature Medicine. 2003;9(4):416–423. pmid:12640447
  22. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine. 2002;8(1):68–74. pmid:11786909
  23. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics. 2000;16(10):906–914. pmid:11120680
  24. Supek F, Peharec P, Krsnik-Rasol M, Šmuc T. Enhanced analytical power of SDS-PAGE using machine learning algorithms. Proteomics. 2008;8(1):28–31. pmid:18046695
  25. Mahadevan S, Shah SL, Marrie TJ, Slupsky CM. Analysis of metabolomic data using support vector machines. Analytical Chemistry. 2008;80(19):7562–7570. pmid:18767870
  26. Zhang Z, Shen T, Rui B, Zhou W, Zhou X, Shang C, et al. CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics. Nucleic Acids Research. 2014;43:D549–D557.
  27. Sauer U, Eikmanns BJ. The PEP–pyruvate–oxaloacetate node as the switch point for carbon flux distribution in bacteria. FEMS Microbiology Reviews. 2005;29(4):765–794. pmid:16102602
  28. Tang YJ, Sapra R, Joyner D, Hazen TC, Myers S, Reichmuth D, et al. Analysis of metabolic pathways and fluxes in a newly discovered thermophilic and ethanol-tolerant Geobacillus strain. Biotechnology and Bioengineering. 2009;102(5):1377–1386. pmid:19016470
  29. Peters-Wendisch PG, Kreutzer C, Kalinowski J, Pátek M, Sahm H, Eikmanns BJ. Pyruvate carboxylase from Corynebacterium glutamicum: characterization, expression and inactivation of the pyc gene. Microbiology. 1998;144(4):915–927. pmid:9579065
  30. Toya Y, Ishii N, Nakahigashi K, Hirasawa T, Soga T, Tomita M, et al. 13C-metabolic flux analysis for batch culture of Escherichia coli and its pyk and pgi gene knockout mutants based on mass isotopomer distribution of intracellular metabolites. Biotechnology Progress. 2010;26(4):975–992. pmid:20730757
  31. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research. 2011;12:2825–2830.
  32. Leighty RW, Antoniewicz MR. COMPLETE-MFA: Complementary parallel labeling experiments technique for metabolic flux analysis. Metabolic Engineering. 2013;20:49–55. pmid:24021936
  33. Zhao J, Baba T, Mori H, Shimizu K. Effect of zwf gene knockout on the metabolism of Escherichia coli grown on glucose or acetate. Metabolic Engineering. 2004;6(2):164–174. pmid:15113569
  34. Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proceedings of the National Academy of Sciences. 2002;99(23):15112–15117.
  35. Andersen M, Dahl J, Liu Z, Vandenberghe L. Interior-point methods for large-scale cone programming. Optimization for Machine Learning. 2011;p. 55–83.
  36. Towell GG, Shavlik JW. Knowledge-based artificial neural networks. Artificial Intelligence. 1994;70(1):119–165.
  37. Marriott K, Stuckey P. Programming with Constraints: An Introduction. MIT Press; 1998.
  38. Niemeyer G. python-constraint: Constraint Solving Problem solver for Python. Available from https://labix.org/python-constraint.
  39. Fischer E, Zamboni N, Sauer U. High-throughput metabolic flux analysis based on gas chromatography–mass spectrometry derived 13C constraints. Analytical Biochemistry. 2004;325(2):308–316. pmid:14751266
  40. Zhao J, Baba T, Mori H, Shimizu K. Global metabolic response of Escherichia coli to gnd or zwf gene-knockout, based on 13C-labeling experiments and the measurement of enzyme activities. Applied Microbiology and Biotechnology. 2004;64(1):91–98. pmid:14661115
  41. Fong SS, Nanchen A, Palsson BO, Sauer U. Latent pathway activation and increased pathway capacity enable Escherichia coli adaptation to loss of key metabolic enzymes. Journal of Biological Chemistry. 2006;281(12):8024–8033. pmid:16319065
  42. Peng L, Arauzo-Bravo MJ, Shimizu K. Metabolic flux analysis for a ppc mutant Escherichia coli based on 13C-labelling experiments together with enzyme activity assays and intracellular metabolite measurements. FEMS Microbiology Letters. 2004;235(1):17–23. pmid:15158257
  43. Tännler S, Decasper S, Sauer U. Maintenance metabolism and carbon fluxes in Bacillus species. Microbial Cell Factories. 2008;7(1):19. pmid:18564406
  44. Chubukov V, Uhr M, Le Chat L, Kleijn RJ, Jules M, Link H, et al. Transcriptional regulation is insufficient to explain substrate-induced flux changes in Bacillus subtilis. Molecular Systems Biology. 2013;9(1):709. pmid:24281055
  45. van Ooyen J, Noack S, Bott M, Reth A, Eggeling L. Improved L-lysine production with Corynebacterium glutamicum and systemic insight into citrate synthase flux and activity. Biotechnology and Bioengineering. 2012;109(8):2070–2081. pmid:22392073
  46. Bommareddy RR, Chen Z, Rappert S, Zeng AP. A de novo NADPH generation pathway for improving lysine production of Corynebacterium glutamicum by rational design of the coenzyme specificity of glyceraldehyde 3-phosphate dehydrogenase. Metabolic Engineering. 2014;25:30–37. pmid:24953302
  47. Wang ZJ, Wang P, Liu YW, Zhang YM, Chu J, Huang Mz, et al. Metabolic flux analysis of the central carbon metabolism of the industrial vitamin B12 producing strain Pseudomonas denitrificans using 13C-labeled glucose. Journal of the Taiwan Institute of Chemical Engineers. 2012;43(2):181–187.
  48. Hemme CL, Fields MW, He Q, Deng Y, Lin L, Tu Q, et al. Correlation of genomic and physiological traits of Thermoanaerobacter species with biofuel yields. Applied and Environmental Microbiology. 2011;77(22):7998–8008. pmid:21948836
  49. Tang Y, Pingitore F, Mukhopadhyay A, Phan R, Hazen TC, Keasling JD. Pathway confirmation and flux analysis of central metabolic pathways in Desulfovibrio vulgaris Hildenborough using gas chromatography-mass spectrometry and Fourier transform-ion cyclotron resonance mass spectrometry. Journal of Bacteriology. 2007;189(3):940–949. pmid:17114264
  50. Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nature Biotechnology. 2010;28(3):245–248. pmid:20212490
  51. Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, et al. A comprehensive genome-scale reconstruction of Escherichia coli metabolism–2011. Molecular Systems Biology. 2011;7(1):535. pmid:21988831
  52. Lewis NE, Hixson KK, Conrad TM, Lerman JA, Charusanti P, Polpitiya AD, et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome-scale models. Molecular Systems Biology. 2010;6(1):390. pmid:20664636
  53. Smallbone K, Simeonidis E. Flux balance analysis: a geometric perspective. Journal of Theoretical Biology. 2009;258(2):311–315. pmid:19490860
  54. Orth JD, Palsson B. Gap-filling analysis of the iJO1366 Escherichia coli metabolic network reconstruction for discovery of metabolic functions. BMC Systems Biology. 2012;6(1):30. pmid:22548736
  55. Wu SG, He L, Wang Q, Tang YJ. An ancient Chinese wisdom for metabolic engineering: Yin-Yang. Microbial Cell Factories. 2015;14(1):39. pmid:25889067
  56. Fischer E, Sauer U. Large-scale in vivo flux analysis shows rigidity and suboptimal performance of Bacillus subtilis metabolism. Nature Genetics. 2005;37(6):636–640. pmid:15880104
  57. Schuetz R, Kuepfer L, Sauer U. Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Molecular Systems Biology. 2007;3(1):119. pmid:17625511
  58. Tang YJ, Martin HG, Deutschbauer A, Feng X, Huang R, Llora X, et al. Invariability of central metabolic flux distribution in Shewanella oneidensis MR-1 under environmental or genetic perturbations. Biotechnology Progress. 2009;25(5):1254–1259. pmid:19610125
  59. Stephanopoulos G. Metabolic fluxes and metabolic engineering. Metabolic Engineering. 1999;1(1):1–11. pmid:10935750
  60. Stephanopoulos G, Vallino JJ. Network rigidity and metabolic engineering in metabolite overproduction. Science. 1991;252(5013):1675–1681. pmid:1904627
  61. Lien SK, Niedenführ S, Sletta H, Nöh K, Bruheim P. Fluxome study of Pseudomonas fluorescens reveals major reorganisation of carbon flux through central metabolic pathways in response to inactivation of the anti-sigma factor MucA. BMC Systems Biology. 2015;9(1):6. pmid:25889900
  62. Fuhrer T, Fischer E, Sauer U. Experimental identification and quantification of glucose metabolism in seven bacterial species. Journal of Bacteriology. 2005;187(5):1581–1590. pmid:15716428
  63. Wierckx N, Ruijssenaars HJ, de Winde JH, Schmid A, Blank LM. Metabolic flux analysis of a phenol producing mutant of Pseudomonas putida S12: verification and complementation of hypotheses derived from transcriptomics. Journal of Biotechnology. 2009;143(2):124–129. pmid:19560494
  64. del Castillo T, Ramos JL, Rodríguez-Herva JJ, Fuhrer T, Sauer U, Duque E. Convergent peripheral pathways catalyze initial glucose catabolism in Pseudomonas putida: genomic and flux analysis. Journal of Bacteriology. 2007;189(14):5142–5152. pmid:17483213
  65. Blank LM, Ionidis G, Ebert BE, Bühler B, Schmid A. Metabolic response of Pseudomonas putida during redox biocatalysis in the presence of a second octanol phase. FEBS Journal. 2008;275(20):5173–5190. pmid:18803670
  66. Conway T. The Entner-Doudoroff pathway: history, physiology and molecular biology. FEMS Microbiology Reviews. 1992;103(1):1–28.
  67. Bar-Even A, Flamholz A, Noor E, Milo R. Rethinking glycolysis: on the biochemical logic of metabolic pathways. Nature Chemical Biology. 2012;8(6):509–517. pmid:22596202
  68. Flamholz A, Noor E, Bar-Even A, Liebermeister W, Milo R. Glycolytic strategy as a tradeoff between energy yield and protein cost. Proceedings of the National Academy of Sciences. 2013;110(24):10039–10044.
  69. Berger A, Dohnt K, Tielen P, Jahn D, Becker J, Wittmann C. Robustness and plasticity of metabolic pathway flux among uropathogenic isolates of Pseudomonas aeruginosa. PloS One. 2014;9(4).
  70. Ebert BE, Kurth F, Grund M, Blank LM, Schmid A. Response of Pseudomonas putida KT2440 to increased NADH and ATP demand. Applied and Environmental Microbiology. 2011;77(18):6597–6605. pmid:21803911
  71. Yao R, Hirose Y, Sarkar D, Nakahigashi K, Ye Q, Shimizu K. Catabolic regulation analysis of Escherichia coli and its crp, mlc, mgsA, pgi and ptsG mutants. Microbial Cell Factories. 2011;10:67. pmid:21831320
  72. Wu SG, Shimizu K, Tang JKH, Tang YJ. Facilitate Collaborations among Synthetic Biology, Metabolic Engineering and Machine Learning. ChemBioEng Reviews. 2016;3(2):1–11.