Proteome-scale understanding of relationship between homo-repeat enrichments and protein aggregation properties

  • Oxana V. Galzitskaya ,

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    ogalzit@vega.protres.ru

    Affiliation Group of Bioinformatics, Institute of Protein Research, Russian Academy of Science, Pushchino, Moscow Region, Russia

  • Michail Yu. Lobanov

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – review & editing

    Affiliation Group of Bioinformatics, Institute of Protein Research, Russian Academy of Science, Pushchino, Moscow Region, Russia

Abstract

Expansion of homo-repeats is a molecular basis for human neurological diseases. We are the first to study the influence of homo-repeats longer than four amino acid residues on the aggregation properties of 1 449 683 proteins across 122 eukaryotic and bacterial proteomes. Only 15% of these proteins (215 481) include homo-repeats of such length. We demonstrated that RNA-binding proteins with a prion-like domain are enriched in homo-repeats in comparison with other non-redundant protein sequences and those in the PDB. Our bioinformatics analysis showed that proteins with homo-repeats are on average two times longer than those in the whole database. Moreover, we are the first to discover that, as a rule, homo-repeats appear in proteins not alone but in pairs: hydrophobic and aromatic homo-repeats appear with similar ones, while homo-repeats of small, polar and charged amino acids appear together with different preferences. We elaborated a new complementary approach to demonstrate the influence of homo-repeats on the aggregation properties of their host proteins. We have shown that adding artificial homo-repeats to natural and random proteins intensifies the aggregation properties of the proteins. The maximal effect is observed for the insertion of artificial homo-repeats of 5–6 residues, which is consistent with the minimal length of an amyloidogenic region. We have also demonstrated that the ability of proteins with homo-repeats to aggregate cannot be explained only by the presence of long homo-repeats in them. Other characteristics of these proteins must intensify aggregation, such as the appearance of homo-repeats in pairs in the same protein. We are the first to elaborate an approach for studying the influence of homo-repeats on the aggregation properties of proteins and to apply it to a large number of proteomes and proteins.

Introduction

Eukaryotic and bacterial proteomes contain proteins bearing simple amino acid motifs, including homo-repeats, which consist of a single amino acid repeated multiple times. Understanding the function of amino acid tandem repeats in different proteomes is one of the important tasks of molecular biology. Some homo-repeats play more important roles in biological processes [1] and are more closely associated with human diseases than was previously recognized. Strong selection for homo-repeats during evolution has been demonstrated for all proteomes [2].

The influence of homo-repeats on the fraction of disordered residues in proteins was considered in several publications [3–7]. It was shown that the occurrence of homo-repeats of hydrophobic amino acids decreases the fraction of disordered residues, whereas homo-repeats of charged, polar and small amino acid residues increase it. The maximum fraction of disordered residues was obtained for proteins with lysine and arginine homo-repeats, and the minimum value corresponds to valine and leucine homo-repeats [7]. The recent review by Darling and Uversky concentrates on intrinsic disorder in proteins with pathogenic repeat expansions, considering only alanine and glutamine homo-repeats [8].

As we demonstrated earlier, the minimal size of homo-repeats varies with amino acid type and proteome. We found that homo-repeats containing the polar or small amino acids S, P, H, E, D, K, Q, and N are enriched in structural disorder as well as in protein and RNA interactions. We observed that E, S, Q, G, L, P, D, A, and H homo-repeats are strongly associated with human diseases. Moreover, S, E, P, A, Q, D, and T homo-repeats are significantly enriched in neuronal proteins associated with autism and other disorders [2].

It was shown that proteins containing alanine repeats of ten and more residues were able to aggregate [9]. It should be stressed that expansion of homo-repeats is a molecular basis for at least 18 human neurological diseases. Several proteins were found to be associated with poly-A (alanine) developmental diseases (9 inherited human diseases) [8,10]: cleidocranial dysplasia (CCD, gene RUNX2), congenital central hypo-ventilation syndrome (CCHS, gene PHOX2B), hand–foot–genital syndrome (HFGS, gene HOXA13), blepharophimosis (BPEIS, gene FOXL2), oculopharyngeal muscular dystrophy (OPMD, gene PABPN1), infantile spasm syndrome (XLMR, gene ARX), X-linked mental retardation and abnormal genitalia (XLAG, gene ARX), X-linked mental retardation and growth hormone deficit (XLMR + GHD, gene SOX3), and holoprosencephaly (HPE, gene ZIC2) [10]. Expansion of poly-Q is implicated in several neurodegenerative diseases, including Huntington’s disease and several spinocerebellar ataxias. It should be noted that the length of the poly-Q repeat is critical to pathogenesis. Although a repeat of 40 glutamine residues is present in the forkhead box P2 transcription factor normal allele, the protein has not been found to be associated with a poly-Q disease [11].

Recently it has been found that local compositional enrichment within protein sequences affects the translation efficiency, abundance, half-life, subcellular localization, and molecular functions of proteins [12]. Several papers have addressed the aggregation propensity of the human [13] and yeast [14] proteomes and of the cytosolic E. coli proteome [15], but without consideration of homo-repeats.

One can suggest that the occurrence of homo-repeats in a protein sequence increases the aggregation ability of the protein, making it more aggregation-prone. It is well known that an increase in the number of PrP repeats induces spontaneous prion disease [16], whereas repeat deletion retards the disease and diminishes PrPSc formation [17]. In vitro, two extra copies of the R2 repeat cause the N-terminal and Middle domains (NM) of SUP35 to aggregate with an abbreviated lag phase, whereas deletion of the R2–R5 repeats extends the lag phase [18,19]. Therefore, a large number of repeats will facilitate the correct alignment of intermolecular contacts between protein molecules that drive amyloid formation [20].

Rapidly formed fibrils stimulate aggregation by acting as seeds, but this effect can decrease greatly with increasing differences in the primary structure. A good example is immunoglobulin domains with different primary structures: co-aggregation between different types of domains is not observed when the sequence identity is below 30–40% [21]. The bioinformatics analysis of tandem homologous domains in large multi-domain proteins revealed homology of less than 40%, which probably indicates that the primary structure of proteins is arranged so as to avoid aggregation. One can conclude that modulation of the aggregation propensity is a driving force in protein evolution.

In this respect important questions arise: what lengths and types of homo-repeats can affect the aggregation properties of their host proteins? What differences exist between proteins with and without homo-repeats? We are the first to perform a bioinformatics analysis of the influence of homo-repeats of different lengths on the aggregation properties of their host proteins, covering all 20 amino acid residues and 122 proteomes.

Results and discussion

Systematic analysis of occurrence of homo-repeats in 1449683 proteins from 122 proteomes and in the different sets of proteins

To investigate the influence of homo-repeats on the aggregation properties of proteins, we should first define what homo-repeat length is not random. This length was determined in our previous analysis [2]: for each of the 20 amino acids, it is the length at which the observed occurrence of homo-repeats differs at least 10-fold from the expected occurrence in 122 proteomes. Therefore, we considered only the effect of homo-repeats longer than four amino acid residues (single-amino-acid tandem repeats) on the aggregation properties of host proteins from 122 eukaryotic and bacterial proteomes. It should be noted that five and six residues are the minimal lengths that are responsible for aggregation or can be considered amyloidogenic regions [22,23], although the dipeptide IlePhe can form amyloid fibrils [24].

In some proteomes there are too few proteins containing homo-repeats for reliable statistics (see Table 1, [25]); therefore we combined all proteins for the analysis, and the resulting database includes 1 449 683 (Np) proteins.
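The length criterion above can be illustrated with a minimal sketch that finds every run of a single residue longer than four residues in a sequence. The function name `find_homo_repeats` and the regular-expression approach are illustrative assumptions, not the implementation used for the HRaP database.

```python
import re
from typing import List, Tuple


def find_homo_repeats(seq: str, min_len: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, start, length) for each maximal run of one residue
    that is at least `min_len` residues long (hypothetical helper)."""
    # (.) captures one residue; \1{min_len-1,} requires it to repeat greedily,
    # so each match is a maximal run.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0))) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence: a 6-residue Q run and a 5-residue S run; the AA pair is ignored.
    print(find_homo_repeats("MKQQQQQQAASSSSSL"))
```

Start positions are 0-based here; shifting to 1-based residue numbering is a one-line change if needed.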

Table 1. Number of proteins having at least one pair of homo-repeats.

https://doi.org/10.1371/journal.pone.0206941.t001

In 215 481 proteins (15%) there are homo-repeats of 5 residues or more. Our database includes 380 853 (Nh) homo-repeats for all amino acids. The most frequent among these is serine: there are 41 253 serine homo-repeats and only 49 tryptophan ones. The remaining values are presented in Fig 1A. First, let us examine common features of proteins with homo-repeats.

Fig 1. Properties of proteins with homo-repeats.

A. Number of proteins with homo-repeats for 20 amino acids in 1 449 683 proteins from 122 proteomes. B. Averaged number of amino acid residues in proteins with homo-repeats for 20 amino acids.

https://doi.org/10.1371/journal.pone.0206941.g001

As seen, the number of proteins with homo-repeats is less than the number of homo-repeats, because some homo-repeats occur in pairs. Green color corresponds to hydrophobic amino acids, orange to hydrophilic and charged ones, and yellow to small amino acids and proline. Hydrophobic homo-repeats occur more rarely than the others, with the exception of leucine.

Proteins with homo-repeats are on average longer than those in the whole database. The average length of proteins in the database is 435 residues (shown by the bold line in Fig 1B), whereas the average length of a protein with homo-repeats ranges from 421 for cysteine homo-repeats to 847 for asparagine homo-repeats. The differences between the average length of proteins with homo-repeats and the average length of proteins in the whole database are significant for all amino acids with the exception of C, F, W, Y, and M. The statistical significance was estimated with the Z-score, whose distribution can be approximated by a normal distribution. For isoleucine homo-repeats this difference is 5 standard deviations (s.d.), which under the normal approximation corresponds to a probability below $10^{-6}$; for V it is 7 s.d., corresponding to a probability below $10^{-11}$. For all the rest the difference is more than 20 s.d. and the probability of an accidental match is negligibly small. It should be mentioned that the longer the protein, the longer its homo-repeat tends to be.
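As a rough illustration of how a difference expressed in standard deviations maps to a probability under the normal approximation, the sketch below computes the one-sided tail of a standard normal distribution. Using the one-sided tail is an assumption; the text does not state which convention was applied.

```python
import math


def normal_tail_probability(z: float) -> float:
    """One-sided probability P(Z >= z) for a standard normal variable."""
    # erfc gives the two-sided tail of |Z| >= z; half of it is the upper tail.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (5.0, 7.0, 20.0):
        print(f"{z:4.0f} s.d. -> p = {normal_tail_probability(z):.2e}")
```

Doubling the returned value gives the two-sided probability, which does not change the order of magnitude for the cases discussed.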

The percentage of single homo-repeats among all possible ones is presented in Fig 2. If homo-repeats occurred independently of each other in proteins, the proportion of single homo-repeats would be $(1 - 1/N_p)^{N_h - 1} \approx e^{-N_h/N_p} \approx 77\%$ for all amino acids. Meanwhile, even for leucine homo-repeats it is less (73%), although only slightly. But the value for asparagine homo-repeats, 15%, is clearly not random. The number of proteins that have at least a pair of homo-repeats for two amino acids is shown in Table 1.
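The independence baseline can be estimated with a short calculation. The sketch assumes a simple null model in which each of the Nh homo-repeats falls into one of the Np proteins uniformly at random, ignoring protein length; this model and the function name are assumptions made for illustration.

```python
def expected_single_fraction(n_repeats: int, n_proteins: int) -> float:
    """Probability that a given homo-repeat is alone in its protein when all
    repeats are placed into proteins independently and uniformly (assumed model)."""
    # Each of the other (n_repeats - 1) repeats must miss this protein.
    return (1.0 - 1.0 / n_proteins) ** (n_repeats - 1)


if __name__ == "__main__":
    # Nh = 380 853 homo-repeats and Np = 1 449 683 proteins, as given in the text.
    print(f"{expected_single_fraction(380_853, 1_449_683):.3f}")
```

With the database sizes quoted in the text this baseline comes out at roughly 0.77, which is the scale against which the observed fractions in Fig 2 can be read.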

Fig 2. Fraction of single homo-repeats for 20 amino acids occurring in the proteins from 122 proteomes.

https://doi.org/10.1371/journal.pone.0206941.g002

In Table 1, different font styles are used according to the Z-values:

$$Z_{ij} = \frac{N_{ij} - N_i N_j / N_p}{\sqrt{N_i N_j / N_p}} \qquad (1)$$

Here $N_{ij}$ is the number of proteins with homo-repeats for a pair of amino acids i and j, $N_i$ and $N_j$ are the numbers of homo-repeats for amino acids i and j, respectively, and $N_p$ is the number of proteins in the database. Bold and italic fonts mark two different ranges of Z-values (see Table 1). It is easy to note that the most striking result corresponds to the diagonal of the matrix, i.e., homo-repeats of the same amino acid are often found in pairs in the considered proteins. Moreover, the matrix is divided into two parts: the first is the cluster of hydrophobic amino acids (CMFILVWY) and the second includes small and hydrophilic amino acids (AGTSQN EDHRKP). The result that hydrophobic amino acids prefer to pair with hydrophobic ones, and polar, charged and small amino acids with similar amino acids, agrees with our previous result that the appearance of the former decreases the fraction of disordered residues, whereas the occurrence of the latter increases it [7].

The large cluster of small, polar and charged amino acids is in turn divided into 6 smaller clusters. A, G, T, S, Q, and N prefer to appear in the same proteins. E and D prefer to appear together, while H, R, and K each prefer to pair with themselves. P prefers to appear with A, G, Q and P.

It should be noted that homo-repeats of basic amino acids (R and K) are not very often combined with other homo-repeats, but such combinations are still more common than expected by chance. The general result is that homo-repeats occur in pairs in the protein chain.
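A Z-value for the co-occurrence of two homo-repeat types can be sketched as follows. The sketch assumes that the expected number of proteins carrying both types is N_i·N_j/N_p and that fluctuations are Poisson-like, so the standard deviation is the square root of the expectation; this observed-minus-expected form is an assumption, and the counts in the usage example are invented.

```python
import math


def pair_z_score(n_ij: int, n_i: int, n_j: int, n_p: int) -> float:
    """Z-value for observing n_ij proteins with both repeat types, given n_i and
    n_j homo-repeats of each type among n_p proteins (assumed Poisson null model)."""
    expected = n_i * n_j / n_p
    return (n_ij - expected) / math.sqrt(expected)


if __name__ == "__main__":
    # Invented toy counts: the expectation is 2 co-occurrences, 10 are observed.
    print(round(pair_z_score(10, 1_000, 2_000, 1_000_000), 2))
```

Positive values indicate that two homo-repeat types share proteins more often than the null model predicts, and negative values indicate avoidance.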

Homo-repeats are important for prion-like domains of RNA-binding proteins

The formation of stress granules and all membraneless compartments (P-bodies, etc.) is considered a composition-driven molecular process. Many of the RNA-binding proteins that make up stress granules have prion-like domains. To verify that homo-repeats are important for some proteins, we considered two databases. The first consists of 49 RNA-binding proteins containing predicted prion-like domains published in [26]. These proteins are enriched in some amino acids (see S1 Table). Prion-like domains are predominantly associated with enrichment in Q or N residues [27]. The second database is compiled from UniProt and contains the human proteins annotated there as components of stress granules; in total, 102 such proteins were found. To compare these sets, we analyzed the PDB (70 147 structures) and the non-redundant protein sequences (nr, 38 876 450 sequences). We estimated the fraction of amino acid residues included in homo-repeats, starting from length two, because it is the minimal length of any homo-repeat. It turned out that the fraction of amino acid residues in homo-repeats is larger for RNA-binding proteins with prion-like domains and for the 102 proteins from stress granules than for the 70 147 protein structures from the PDB and the 38 876 450 non-redundant protein sequences: up to a length of 6 residues for the 49 RNA-binding proteins with prion-like domains and up to a length of 3 for the 102 human proteins from stress granules (Fig 3). It is important to underline that RNA-binding proteins with a prion-like domain are involved in many cellular functions, and that diseases are connected with the misfolding of these proteins.
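The quantity compared between the protein sets can be sketched as the fraction of all residues that lie in runs of a given length, starting from length two. Binning by exact run length (rather than "at least this length") is an assumption, as is the helper name.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, Iterable


def residue_fraction_by_run_length(seqs: Iterable[str]) -> Dict[int, float]:
    """Map run length (>= 2) to the fraction of all residues in runs of exactly
    that length, pooled over the given sequences (hypothetical helper)."""
    residues_in_runs: Counter = Counter()
    total = 0
    for seq in seqs:
        total += len(seq)
        for _, run in groupby(seq):  # consecutive identical residues
            n = sum(1 for _ in run)
            if n >= 2:
                residues_in_runs[n] += n
    if total == 0:
        return {}
    return {n: count / total for n, count in sorted(residues_in_runs.items())}


if __name__ == "__main__":
    # Toy input: one run of length 2 and one of length 3 among 7 residues.
    print(residue_fraction_by_run_length(["AAB", "CCCD"]))
```

Running the same function on each protein set and plotting the values against run length would give curves of the kind compared in this section.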

Fig 3. Occurrence of homo-repeats in different sets of proteins.

Fraction of amino acid residues in homo-repeats versus the length of homo-repeats for 49 RNA-binding proteins with predicted prion-like domains (black circles), 102 proteins from stress granules (white circles), for 70 147 protein structures from the PDB (black triangles), and from the non-redundant 38 876 450 protein sequences (white triangles).

https://doi.org/10.1371/journal.pone.0206941.g003

Influence of homo-repeats on the aggregation properties of proteins

To examine whether homo-repeat enrichment can affect protein aggregation, we explored the relationship between enrichment in each amino acid homo-repeat and the aggregation properties of proteins. We describe the aggregation properties of proteins by the values Spos, Sneg and Sall (see Materials and methods), calculated from the profile for each amino acid residue along the protein sequence using the FoldAmyloid program [28,29]. A comparison of the results for 30 proteins [30] using eight different methods demonstrated that our method is among the best ones (see Table 2).

Table 2. Averaged results of amyloid predictions (amyloidogenic regions) for 30 proteins by various algorithms.

https://doi.org/10.1371/journal.pone.0206941.t002

The review by Chiti and co-authors should also be mentioned; it presents experimental data on the ability of different methods to predict amyloidogenic regions in vivo [38] and also shows that our method is among the best. Recently, 14 different methods for the prediction of protein aggregation propensity have been considered [39].

To observe the impact of a homo-repeat in its pure form, we performed an additional analysis to understand which properties of the protein chain change after adding homo-repeats to random sequences and to real proteins from 122 proteomes. To each protein in the two databases (the random proteome and the 122 real proteomes), 20×15 homo-repeats were added (20 amino acids, lengths from 1 to 15 residues). Homo-repeats were added in the middle of the chain. If the protein length was an odd number of residues, the homo-repeat was added between residues M and M+1 (where 2M+1 = N is the length of the given protein). The difference Spos(N) − Spos(N−1) is shown in Fig 4; Sneg and Sall were treated by the same procedure (see Fig 4). Spos is the sum of significant positive peaks normalized by the length of the protein. When we add a homo-repeat, the length of the protein increases. Therefore, Spos decreases when we add a homo-repeat of hydrophilic amino acids. Likewise, the absolute value of Sneg decreases when we add a homo-repeat of hydrophobic amino acids.
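The insertion step described above can be sketched as follows: the homo-repeat is placed after the first M residues, where M is the protein length divided by two and rounded down, which covers both even and odd lengths. The function names are hypothetical, and scoring the modified chains with FoldAmyloid is not shown.

```python
from typing import Iterator, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def insert_homo_repeat(seq: str, residue: str, length: int) -> str:
    """Insert `length` copies of `residue` in the middle of `seq`
    (between residues M and M+1, with M = len(seq) // 2)."""
    mid = len(seq) // 2
    return seq[:mid] + residue * length + seq[mid:]


def all_insertions(seq: str, max_len: int = 15) -> Iterator[Tuple[str, int, str]]:
    """Yield the 20 x max_len variants of `seq`, one insertion per variant."""
    for residue in AMINO_ACIDS:
        for length in range(1, max_len + 1):
            yield residue, length, insert_homo_repeat(seq, residue, length)


if __name__ == "__main__":
    print(insert_homo_repeat("MKTLV", "Q", 3))    # odd length: inserted after residue 2
    print(sum(1 for _ in all_insertions("MKTLV")))  # number of variants per protein
```

Each variant carries exactly one artificial homo-repeat, so differences in the scores between successive lengths can be attributed to that insertion alone.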

Fig 4. Effect of the single cysteine homo-repeat insertion of different length into the random proteome on Spos, Sneg, and Sall.

https://doi.org/10.1371/journal.pone.0206941.g004

To find the pure influence of a homo-repeat in a protein, we added artificial homo-repeats of lengths from 1 to 15 residues to all sequences, including 2 000 000 random sequences. The maximal effect observed for any homo-repeat corresponds to a homo-repeat 5–6 residues long. This result is consistent with the experimental observation that the minimal amyloidogenic fragment also has 5–6 residues. We present results only for cysteine because the results for other amino acids are similar (see S2 Table). For homo-repeats of hydrophilic amino acids the signs and the graphs of Sneg and Spos are reversed. Through this study, we can estimate the effect of a single homo-repeat on Spos, Sneg, and Sall. The dependences are the same for the random proteome and the 122 real proteomes (S2 and S3 Tables).

In order to estimate the effect of homo-repeats themselves, we cut the longest homo-repeat for the given amino acid, and then recalculated the Spos, Sneg, and Sall for the protein chain without it. Finally, to assess the impact of all homo-repeats in the considered protein, we also cut out all homo-repeats and recalculated Spos, Sneg, and Sall again.
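The two deletion steps can be sketched as below. The sketch assumes that "cutting" removes the whole run and joins the flanking residues, and that a homo-repeat is a run of at least five identical residues; both helper names are hypothetical.

```python
import re


def cut_longest_repeat(seq: str, residue: str, min_len: int = 5) -> str:
    """Remove the longest run of `residue` (at least `min_len` long) from `seq`.
    Returns `seq` unchanged if no such run exists."""
    runs = list(re.finditer(r"%s{%d,}" % (re.escape(residue), min_len), seq))
    if not runs:
        return seq
    longest = max(runs, key=lambda m: m.end() - m.start())
    return seq[: longest.start()] + seq[longest.end():]


def cut_all_repeats(seq: str, min_len: int = 5) -> str:
    """Remove every run of a single residue that is at least `min_len` long."""
    return re.sub(r"(.)\1{%d,}" % (min_len - 1), "", seq)


if __name__ == "__main__":
    toy = "MA" + "Q" * 5 + "KL" + "Q" * 7 + "S" * 6 + "G"
    print(cut_longest_repeat(toy, "Q"))  # only the 7-residue Q run is removed
    print(cut_all_repeats(toy))          # all three runs are removed
```

Note that joining the flanks can occasionally create a new run across the cut; this sketch does not re-scan for such cases.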

We can also observe the influence of homo-repeats on the aggregation properties from the other side: by deleting the main homo-repeat in the first case and then deleting all homo-repeats from the protein.

After characterizing proteins with homo-repeats, we analyzed the aggregation properties of such proteins. For all proteins, we calculated Spos, which reflects the aggregation properties of proteins. The trivial effect is connected with the occurrence of hydrophobic homo-repeats, which by themselves enhance the aggregation properties of a protein.

The difference in Spos, Sneg, and Sall between proteins with homo-repeats and the entire database cannot be explained only by the occurrence of homo-repeats (Fig 5; data for Sneg and Sall are presented in Figs 6 and 7). It is evident that for tryptophan and methionine the whole effect is accounted for by the longest homo-repeat (Fig 5): Spos decreases to zero after cutting off the main homo-repeat. But for all other amino acids, the difference between proteins with homo-repeats and the rest of the database is much larger than the impact of the homo-repeats themselves (Fig 5). In this way we have demonstrated that homo-repeat enrichment influences protein aggregation properties.

Fig 5. Comparison of normalized Spos scores for proteins with homo-repeats with the whole database.

Blue bars correspond to normalized Spos scores for a full chain, red bars correspond to Spos scores for a chain without the main homo-repeat, and green bars correspond to Spos scores for a chain without all homo-repeats.

https://doi.org/10.1371/journal.pone.0206941.g005

Fig 6. Comparison of normalized Sneg scores for proteins with homo-repeats and the whole database.

https://doi.org/10.1371/journal.pone.0206941.g006

Fig 7. Comparison of normalized Sall scores for proteins with homo-repeats and the whole database.

https://doi.org/10.1371/journal.pone.0206941.g007

In this paper, we have demonstrated the influence of homo-repeats longer than four amino acid residues on the aggregation properties of their host proteins, considering 122 eukaryotic and bacterial proteomes. It turned out that proteins with homo-repeats are on average twice as long as the average protein from the 122 proteomes. We have shown that the aggregation properties of proteins with homo-repeats cannot be explained only by the appearance of the main (the longest) homo-repeat in the sequence. We have discovered that, as a rule, homo-repeats occur in pairs in proteins: hydrophobic and aromatic homo-repeats most frequently occur in pairs with similar ones, and homo-repeats of polar, charged and small amino acids are likewise prone to pair with similar homo-repeats. Considering different sets of proteins, we have demonstrated that RNA-binding proteins with a prion-like domain have the maximal fraction of homo-repeats in comparison with proteins in the PDB and in the non-redundant dataset of sequences.

Materials and methods

FoldAmyloid program

The FoldAmyloid web server is available at http://bioinfo.protres.ru/fold-amyloid/. The program/server takes an amino acid sequence (in the FASTA format) as an input and calculates the profile of the requested type [in this case we used the scale of the expected number of contacts]. If five or more residues in the profile lie above the given cutoff (the default value is 21.4 for the packing density scale), we predict this region as amyloidogenic. Spos is the sum of the areas of aggregation peaks, i.e. the areas under the peaks that lie above the threshold of 21.4, normalized by the protein length (Fig 8). Sneg is the sum of the areas of the profile that lie below the threshold of 21.4. Sall is the sum of the aggregation values for each amino acid along the protein chain normalized by the protein length.
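The three scores can be sketched from an already computed per-residue profile. The sketch makes several assumptions that are not stated in the text: the area of a region is approximated by the sum of (value − cutoff) over its residues, the five-residue minimum is applied to regions below the cutoff as well as above it, and Sneg is kept negative and normalized by length like Spos. Computing the profile itself (the FoldAmyloid scale and window averaging) is not shown.

```python
from typing import Sequence, Tuple


def aggregation_scores(
    profile: Sequence[float], cutoff: float = 21.4, min_region: int = 5
) -> Tuple[float, float, float]:
    """Return (Spos, Sneg, Sall) for a per-residue profile (assumed definitions)."""
    n = len(profile)
    if n == 0:
        return 0.0, 0.0, 0.0
    s_pos = s_neg = 0.0
    i = 0
    while i < n:
        above = profile[i] > cutoff
        j = i
        # Extend the region while residues stay on the same side of the cutoff.
        while j < n and (profile[j] > cutoff) == above:
            j += 1
        if j - i >= min_region:
            area = sum(v - cutoff for v in profile[i:j])
            if above:
                s_pos += area
            else:
                s_neg += area
        i = j
    s_all = sum(profile) / n  # mean profile value along the chain
    return s_pos / n, s_neg / n, s_all


if __name__ == "__main__":
    toy_profile = [20.0] * 5 + [23.4] * 5  # invented values for illustration
    print(aggregation_scores(toy_profile))
```

With these conventions Spos is non-negative and Sneg is non-positive, so statements about the absolute value of Sneg refer to its magnitude.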

Fig 8. Schematic representation of amyloidogenic profile.

The area under the peak that lies above the threshold of 21.4 is colored red, and the area below the threshold is colored blue.

https://doi.org/10.1371/journal.pone.0206941.g008

Databases and programs

The HRaP database (http://bioinfo.protres.ru/hrap/) includes 1 449 683 proteins from 122 proteomes. For the 215 481 proteins having homo-repeats, the user can find the GO annotation. We have also considered the set of 49 RNA-binding proteins with prion-like domains predicted using the prion score [39], 102 proteins from stress granules, 38 876 450 non-redundant protein sequences and 70 147 protein structures from the PDB.

The random proteome includes 2 000 000 sequences. The lengths of the sequences vary from 50 to 550 amino acid residues. Each amino acid was chosen randomly according to the frequencies of amino acids obtained from the real 122 proteomes (see Fig 9).

We used a database of 30 proteins and peptides to test programs developed by other groups [31]: prolactin, calcitonin, apolipoprotein A-I, casein, serum amyloid A1 protein, transthyretin, lactoferrin, semenogelin-1, Aβ42, gelsolin, tau, amylin, lung surfactant, α-synuclein, lysozyme, β2-microglobulin, medin, brain natriuretic peptide, apolipoprotein C-II, odontogenic ameloblast-associated protein, cystatin C, insulin chain A, insulin chain B, β-lactoglobulin, acylphosphatase-2, high mobility group protein B1, cold shock protein, kerato-epithelin, myoglobin, replication protein.
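A random proteome of this kind can be generated with a few lines of code. In the sketch below, the uniform frequency table is only a placeholder for the real frequencies shown in Fig 9, and drawing sequence lengths uniformly between 50 and 550 is an assumption, since the length distribution is not specified.

```python
import random
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Placeholder: equal frequencies stand in for the measured values of Fig 9.
UNIFORM_FREQS: Dict[str, float] = {aa: 1.0 / 20 for aa in AMINO_ACIDS}


def random_sequence(
    freqs: Dict[str, float], rng: random.Random, min_len: int = 50, max_len: int = 550
) -> str:
    """Draw one sequence: length uniform in [min_len, max_len] (assumption),
    residues sampled independently according to `freqs`."""
    length = rng.randint(min_len, max_len)
    residues = list(freqs)
    weights = [freqs[r] for r in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def random_proteome(freqs: Dict[str, float], n_sequences: int, seed: int = 0) -> List[str]:
    """Generate a reproducible list of random sequences."""
    rng = random.Random(seed)
    return [random_sequence(freqs, rng) for _ in range(n_sequences)]


if __name__ == "__main__":
    proteome = random_proteome(UNIFORM_FREQS, 5, seed=1)
    print([len(s) for s in proteome])
```

Passing a fixed seed keeps the generated set reproducible, which is useful when the same random proteome must be rescored after each insertion.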

Fig 9. Frequencies of amino acids for 1449683 proteins from 122 proteomes.

https://doi.org/10.1371/journal.pone.0206941.g009

Supporting information

S1 Table. Amino acid composition values for 49 RNA-binding proteins with predicted prion-like domains.

https://doi.org/10.1371/journal.pone.0206941.s001

(XLSX)

S2 Table. Effect of the single homo-repeat insertion of different length into the random proteome on Spos, Sneg, and Sall for 20 amino acids.

https://doi.org/10.1371/journal.pone.0206941.s002

(XLSX)

S3 Table. Effect of the single homo-repeat insertion of different length into the proteins from 122 proteomes on Spos, Sneg, and Sall for 20 amino acids.

https://doi.org/10.1371/journal.pone.0206941.s003

(XLSX)

Acknowledgments

We are grateful to T.B. Kuvshinkina, N.V. Dovidchenko, and Saikat Dutta Chowdhury for assistance in preparation of the manuscript.

References

  1. Siwach P, Ganesh S. Tandem repeats in human disorders: mechanisms and evolution. Front Biosci J Virtual Libr. 2008;13: 4467–4484.
  2. Lobanov MY, Klus P, Sokolovsky IV, Tartaglia GG, Galzitskaya OV. Non-random distribution of homo-repeats: links with biological functions and human diseases. Sci Rep. 2016;6: 26941. pmid:27256590
  3. Jorda J, Xue B, Uversky VN, Kajava AV. Protein tandem repeats – the more perfect, the less structured. FEBS J. 2010;277: 2673–2682. pmid:20553501
  4. Lobanov MY, Furletova EI, Bogatyreva NS, Roytberg MA, Galzitskaya OV. Library of disordered patterns in 3D protein structures. PLoS Comput Biol. 2010;6: e1000958. pmid:20976197
  5. Lobanov MY, Galzitskaya OV. Occurrence of disordered patterns and homorepeats in eukaryotic and bacterial proteomes. Mol Biosyst. 2012;8: 327–337. pmid:22009164
  6. Lobanov MY, Galzitskaya OV. Disordered patterns in clustered Protein Data Bank and in eukaryotic and bacterial proteomes. PloS One. 2011;6: e27142. pmid:22073276
  7. Lobanov MY, Galzitskaya OV. How Common Is Disorder? Occurrence of Disordered Residues in Four Domains of Life. Int J Mol Sci. 2015;16: 19490–19507. pmid:26295225
  8. Darling A, Uversky V. Intrinsic Disorder in Proteins with Pathogenic Repeat Expansions. Molecules. 2017;22: 2027. pmid:29186753
  9. Fan X, Dion P, Laganiere J, Brais B, Rouleau GA. Oligomerization of polyalanine expanded PABPN1 facilitates nuclear protein aggregation that is associated with cell death. Hum Mol Genet. 2001;10: 2341–2351. pmid:11689481
  10. Mularoni L, Ledda A, Toll-Riera M, Albà MM. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Res. 2010;20: 745–754. pmid:20335526
  11. Robertson AL, Bate MA, Androulakis SG, Bottomley SP, Buckle AM. PolyQ: a database describing the sequence and domain context of polyglutamine repeats in proteins. Nucleic Acids Res. 2011;39: D272–276. pmid:21059684
  12. Cascarina SM, Ross ED. Proteome-scale relationships between local amino acid composition and protein fates and functions. PLoS Comput Biol. 2018;14: e1006256. pmid:30248088
  13. Monsellier E, Ramazzotti M, Taddei N, Chiti F. Aggregation propensity of the human proteome. PLoS Comput Biol. 2008;4: e1000199. pmid:18927604
  14. Tartaglia GG, Caflisch A. Computational analysis of the S. cerevisiae proteome reveals the function and cellular localization of the least and most amyloidogenic proteins. Proteins. 2007;68: 273–278. pmid:17407164
  15. de Groot NS, Ventura S. Protein aggregation profile of the bacterial cytosol. PloS One. 2010;5: e9383. pmid:20195530
  16. Prusiner SB, editor. Prion biology and diseases. 2nd ed. Cold Spring Harbor, N.Y: Cold Spring Harbor Laboratory Press; 2004.
  17. Flechsig E, Shmerling D, Hegyi I, Raeber AJ, Fischer M, Cozzio A, et al. Prion protein devoid of the octapeptide repeat region restores susceptibility to scrapie in PrP knockout mice. Neuron. 2000;27: 399–408. pmid:10985358
  18. Liu JJ, Lindquist S. Oligopeptide-repeat expansions modulate “protein-only” inheritance in yeast. Nature. 1999;400: 573–576. pmid:10448860
  19. Krishnan R, Lindquist SL. Structural insights into a yeast prion illuminate nucleation and strain diversity. Nature. 2005;435: 765–772. pmid:15944694
  20. Galzitskaya OV. Repeats are one of the main characteristics of RNA-binding proteins with prion-like domains. Mol Biosyst. 2015;11: 2210–2218. pmid:26022110
  21. Wright CF, Teichmann SA, Clarke J, Dobson CM. The importance of sequence diversity in the aggregation and evolution of proteins. Nature. 2005;438: 878–881. pmid:16341018
  22. López de la Paz M, Serrano L. Sequence determinants of amyloid fibril formation. Proc Natl Acad Sci U S A. 2004;101: 87–92. pmid:14691246
  23. Thompson MJ, Sievers SA, Karanicolas J, Ivanova MI, Baker D, Eisenberg D. The 3D profile method for identifying fibril-forming segments of proteins. Proc Natl Acad Sci U S A. 2006;103: 4074–4078. pmid:16537487
  24. de Groot NS, Parella T, Aviles FX, Vendrell J, Ventura S. Ile-phe dipeptide self-assembly: clues to amyloid formation. Biophys J. 2007;92: 1732–1741. pmid:17172307
  25. Lobanov MY, Sokolovskiy IV, Galzitskaya OV. HRaP: database of occurrence of HomoRepeats and patterns in proteomes. Nucleic Acids Res. 2014;42: D273–278. pmid:24150944
  26. Li YR, King OD, Shorter J, Gitler AD. Stress granules as crucibles of ALS pathogenesis. J Cell Biol. 2013;201: 361–372. pmid:23629963
  27. Alberti S, Halfmann R, King O, Kapila A, Lindquist S. A Systematic Survey Identifies Prions and Illuminates Sequence Features of Prionogenic Proteins. Cell. 2009;137: 146–158. pmid:19345193
  28. Galzitskaya OV, Garbuzynskiy SO, Lobanov MY. Prediction of amyloidogenic and disordered regions in protein chains. PLoS Comput Biol. 2006;2: e177. pmid:17196033
  29. Garbuzynskiy SO, Lobanov MY, Galzitskaya OV. FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence. Bioinforma Oxf Engl. 2010;26: 326–332. pmid:20019059
  30. Dovidchenko NV, Galzitskaya OV. Computational Approaches to Identification of Aggregation Sites and the Mechanism of Amyloid Growth. Adv Exp Med Biol. 2015;855: 213–239. pmid:26149932
  31. Walsh I, Seno F, Tosatto SCE, Trovato A. PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Res. 2014;42: W301–307. pmid:24848016
  32. Tsolis AC, Papandreou NC, Iconomidou VA, Hamodrakas SJ. A consensus method for the prediction of “aggregation-prone” peptides in globular proteins. PloS One. 2013;8: e54175. pmid:23326595
  33. Fernandez-Escamilla A-M, Rousseau F, Schymkowitz J, Serrano L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol. 2004;22: 1302–1306. pmid:15361882
  34. Emily M, Talvas A, Delamarche C. MetAmyl: a METa-predictor for AMYLoid proteins. PloS One. 2013;8: e79722. pmid:24260292
  35. Maurer-Stroh S, Debulpaep M, Kuemmerer N, Lopez de la Paz M, Martins IC, Reumers J, et al. Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat Methods. 2010;7: 237–242. pmid:20154676
  36. Ahmed AB, Znassi N, Château M-T, Kajava AV. A structure-based approach to predict predisposition to amyloidosis. Alzheimers Dement J Alzheimers Assoc. 2015;11: 681–690. pmid:25150734
  37. Gasior P, Kotulska M. FISH Amyloid – a new method for finding amyloidogenic segments in proteins based on site specific co-occurence of aminoacids. BMC Bioinformatics. 2014;15: 54. pmid:24564523
  38. Belli M, Ramazzotti M, Chiti F. Prediction of amyloid aggregation in vivo. EMBO Rep. 2011;12: 657–663. pmid:21681200
  39. Pallarès I, Ventura S. Advances in the prediction of protein aggregation propensity. Curr Med Chem. 2017; pmid:28685682