Amyotrophic Lateral Sclerosis Type 20 - In Silico Analysis and Molecular Dynamics Simulation of hnRNPA1

Bruna Baumgarten Krebs; Joelma Freire De Mesquita

doi:10.1371/journal.pone.0158939

Abstract

Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disease that affects the upper and lower motor neurons. Between 5% and 10% of cases are genetically inherited, including ALS type 20, which is caused by mutations in the hnRNPA1 gene. The goals of this work are to analyze the effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on hnRNPA1 protein function, to model the complete tridimensional structure of the protein using computational methods, and to assess structural and functional differences between the wild type and its variants through Molecular Dynamics simulations. nsSNP, PhD-SNP, Polyphen2, SIFT, SNAP, SNPs&GO, SNPeffect and PROVEAN were used to predict the functional effects of nsSNPs. Ab initio modeling of hnRNPA1 was performed using Rosetta, and the model was refined using KoBaMIN. The structure was validated by PROCHECK, Rampage, ERRAT, Verify3D, ProSA and Qmean. TM-align was used for the structural alignment. FoldIndex, DICHOT, ELM, D2P2, Disopred and DisEMBL were used to predict disordered regions within the protein. Amino acid conservation was assessed with ConSurf, and the molecular dynamics simulations were performed using GROMACS. Mutations D314V and D314N were predicted to increase amyloid propensity and were predicted as deleterious by at least three algorithms, while mutation N73S was predicted as neutral by all the algorithms. D314N and D314V occur at a highly conserved residue. The Molecular Dynamics results indicate that all mutations increase protein stability when compared to the wild type. Mutants D314N and N319S showed higher overall dimensions and accessible surface when compared to the wild type. The flexibility of the C-terminal residues of hnRNPA1 is affected by all mutations, which may affect protein function, especially the protein's ability to interact with other proteins.

Citation: Krebs BB, De Mesquita JF (2016) Amyotrophic Lateral Sclerosis Type 20 - In Silico Analysis and Molecular Dynamics Simulation of hnRNPA1. PLoS ONE 11(7): e0158939. https://doi.org/10.1371/journal.pone.0158939

Editor: Xu Gang Xia, Department of Pathology, Anatomy & Cell Biology, Thomas Jefferson University, UNITED STATES

Received: May 27, 2016; Accepted: June 24, 2016; Published: July 14, 2016

Copyright: © 2016 Krebs, De Mesquita. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This work was supported by FAPERJ and CNPq. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease that affects the upper and lower motor neurons, causing weakness, muscle atrophy, and eventually death [1]. ALS is one of the most frequent motor neuron diseases, with an incidence of 1–5 per 100,000, and is therefore extensively studied [2]. Onset usually occurs around age 40, and juvenile ALS is rare [1]. Because no effective treatment exists, ALS leads to death between 2 and 5 years after diagnosis, mostly due to respiratory failure [2]. Although most ALS cases are sporadic (sALS), 5–10% are familial (fALS) and associated with inherited genetic mutations. Among the previously identified ALS causative genes, the most frequently mutated ones are C9orf72, SOD1, TARDBP and FUS [3].

Recently, mutations in the hnRNPA1 gene were identified in one family with ALS and in one sporadic ALS case [4]. The hnRNPA1 gene codes for the ROA1 protein, which is usually also referred to as hnRNPA1. This heterogeneous nuclear ribonucleoprotein (hnRNP) plays a key role in mRNA metabolism, being involved in alternative splicing, nucleocytoplasmic shuttling and microRNA biogenesis [5–7]. Along with histones, hnRNPs are the most abundant proteins in the nucleus [8]. Two RNA recognition motifs, one RNA-binding box, one M9 nuclear localization signal, and a prion-like glycine-rich domain in the C-terminal part of the protein have been previously identified in hnRNPA1 [8] (Fig 1); however, its complete tridimensional structure has not yet been experimentally solved.

Fig 1. Schematic representation of the domains found on hnRNPA1.

The two RNA recognition motifs (RRM 1 and 2) are represented in blue, the glycine-rich domain is represented in purple, the RNA-binding box is represented in green, and the nuclear localization signal M9 is represented in pink. The red arrows indicate the location where the four known mutations occur: position 73 (mutation N73S), position 314 (mutations D314V and D314N) and position 319 (mutation N319S).

https://doi.org/10.1371/journal.pone.0158939.g001

Knowledge of tridimensional structures allows for a better understanding of the activity of a protein, its structure-function relationship and its interactions with other molecules, and contributes to a more detailed comprehension of biological processes. With the advances in sequencing technology, the number of protein sequences available in online databases has grown exponentially, producing an extensive amount of data. The conventional methods of protein structure determination, such as crystallography, electron microscopy or nuclear magnetic resonance (NMR), are time-consuming and expensive [9]. In this scenario, the computational approach of Bioinformatics is a valuable complement to experimental methodology. Computational (in silico) methods are based on algorithms that make predictions for a variety of purposes, such as predicting the effect of mutations on protein function from the amino acid sequence, and modeling tridimensional structures in a cheaper and faster, yet still efficient, way.

In this work, computational biology methods were applied, following the methodology previously described by our group [10,11], to an in silico analysis of the hnRNPA1 protein, which has been described as the cause of familial Amyotrophic Lateral Sclerosis type 20. The aim was a thorough analysis of the protein structure and its natural variants, as well as of the effects of structural changes on disease development.

Materials and Methods

Sequence Retrieval

The sequences of hnRNPA1 and its natural variants were retrieved from the UniProt database [UniProt ID: P09651] and OMIM [OMIM ID: 164017].

SNP Analysis

Eight algorithms were used to analyze the functional effects of non-synonymous single nucleotide polymorphisms: nsSNP [12], PhD-SNP [13], Polyphen2 [14], SIFT [15], SNAP [16], SNPs&GO [17], SNP Effect [18] and PROVEAN [19].

Structural Modeling

The tridimensional structures were created based on comparative and ab initio modeling. For the comparative modeling, the following algorithms were used: IntFOLD [20], Phyre2 [21], M4T [22], SwissModel [23], PS2 [24], RaptorX [25] and Modeller [26]. For the ab initio modeling, the algorithms Rosetta [27,28] and I-TASSER [29] were used. The generated structures were then structurally aligned to the crystallographic structure of hnRNPA1 (PDB ID: 1L3K), which comprises its first 196 amino acids, using the TM-Align server [30], and the best structures were chosen according to the RMSD and TM-score values.
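The TM-score used for model selection can be illustrated with a minimal Python sketch. It assumes the residue alignment and superposition are already fixed and that the per-residue distances are supplied by the caller (hypothetical input); TM-align itself additionally searches for the alignment and superposition that maximize the score, which is not reproduced here. The function names are illustrative only.

```python
def d0_scale(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 15:
        raise ValueError("d0 is only defined here for chains longer than 15 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for one fixed alignment and superposition.

    distances: distances in angstroms between aligned residue pairs
               (hypothetical input, produced by a prior superposition).
    target_length: number of residues in the reference structure,
                   e.g. 196 for the crystallographic fragment 1L3K.
    """
    d0 = d0_scale(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

A score of 1 corresponds to a perfect match over the whole reference, and residues that are unaligned or far apart contribute little, so the score rewards both coverage and local accuracy, whereas RMSD only measures deviation over the aligned residues.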

Structure Refinement

The selected structures were submitted to KoBaMIN, a structure refinement algorithm that performs stereochemistry correction and energy minimization using a knowledge-based potential of mean force [31].

Structure Validation

The selected structures had their quality analyzed through the following structure validation algorithms: PROCHECK [32], Rampage [33], Qmean server [34], ProSA web [35], ERRAT [36] and Verify3D [37]. To further validate the modeled structure, its secondary structure was predicted by PsiPred [38], JuFo9D [39] and Jpred [40], and six disorder prediction algorithms were also consulted: FoldIndex [41], Disopred [42], ELM [43], DisEMBL [44], DICHOT [45] and D2P2 [46].

Conservation Analysis

The phylogenetic analysis was performed using the ConSurf algorithm [47,48], which determined the degree of evolutionary conservation of each hnRNPA1 amino acid. The analysis was done using the UniProt database, with a maximum of 95% identity between sequences and a minimum of 35% identity for homologs.

Molecular Dynamics

The GROMACS package version 5.0.7 [49] was used for the molecular dynamics (MD) simulations of the wild type structure and the natural variants D314N, D314V, N73S and N319S. The tridimensional structures of the natural variants were generated using the Mutator plugin available in the VMD software (Version 1.9.2) [50]. The force field used was Amber99SB-ILDN [51]. The molecules were solvated in a dodecahedral box with TIP3P water molecules and neutralized by adding Na⁺ and Cl⁻ ions. Energy minimization was carried out using the steepest descent method for 5000 steps. After minimization, NVT (constant number, volume and temperature) equilibration was performed at a constant temperature of 300 K for 100 ps, followed by NPT (constant number, pressure and temperature) equilibration at a constant pressure of 1 atm and a constant temperature of 300 K for 100 ps. The production simulations were performed at 300 K for 40 ns. The LINCS (Linear Constraint Solver) algorithm was used to constrain the covalent bonds [52], and the electrostatic interactions were computed using the Particle Mesh Ewald (PME) method [53]. The MD trajectories were saved every 10 ps. The stability and conformational changes in the native protein and the mutants were assessed through the analysis of root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), radius of gyration (Rg), number of hydrogen bonds (Hb), solvent accessible surface area (SASA), and B-factor. All graphs were created using the XMGrace tool [54].
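To make two of the trajectory metrics above concrete, the following is a minimal Python sketch of the radius of gyration and RMSD for a single frame, together with the number of frames implied by the stated saving interval. It is an illustration and not the GROMACS implementation: the RMSD here is unweighted and assumes the two conformations are already superimposed, and all function names and inputs are hypothetical.

```python
import math


def n_saved_frames(total_ns, save_every_ps):
    """Frames written during production, not counting the frame at t = 0."""
    return round(total_ns * 1000 / save_every_ps)


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: list of (x, y, z) tuples; masses: list of atomic masses.
    """
    total_mass = sum(masses)
    center = [
        sum(m * c[k] for c, m in zip(coords, masses)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for c, m in zip(coords, masses)
    )
    return math.sqrt(weighted_sq / total_mass)


def rmsd(coords, reference):
    """Unweighted RMSD between two conformations that are already superimposed."""
    if len(coords) != len(reference):
        raise ValueError("both conformations must contain the same atoms")
    sq = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3))
        for a, b in zip(coords, reference)
    )
    return math.sqrt(sq / len(coords))
```

With the settings reported above, `n_saved_frames(40, 10)` gives 4000 saved frames per 40 ns production run, which is the number of points behind each RMSD or Rg time series.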

Results and Discussion

Sequence and natural variants retrieval

The hnRNPA1 protein (isoform A1-B) has 372 amino acids and is coded by the hnRNPA1 gene, which is located on chromosome 12q13.13. There are four natural variants currently known: N73S, D314V, D314N and N319S (Fig 1).

The D314N and N319S mutations were identified in patients with Amyotrophic Lateral Sclerosis type 20, the former in a family and the latter in a patient with sporadic ALS, while the D314V mutation was identified in a family with inclusion body myopathy and Paget’s disease of the bone [4]. The N73S mutation has not been correlated with any disease so far.

nsSNP Analysis

The natural variants were functionally analyzed by different algorithms that predict whether they have a deleterious or neutral effect on protein function. The N73S mutation was the only one predicted as neutral by all the algorithms, while the D314V, D314N and N319S mutations were predicted as deleterious by at least two algorithms (Fig 2). The D314V mutation was predicted as deleterious by PhD-SNP, Polyphen-2, SIFT, SNAP and PROVEAN. The algorithms SIFT, SNAP and PROVEAN predicted the D314N variant as deleterious, and the N319S variant was predicted as deleterious by PhD-SNP and SIFT (Table 1).
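The tally of predictions reported above can be reproduced with a short Python sketch. The deleterious calls are taken directly from the preceding paragraph; the list of seven functional predictors is an assumption (the eight algorithms listed in the Methods minus SNP Effect, which is reported separately), and the names `PREDICTORS`, `DELETERIOUS_CALLS` and `tally` are illustrative.

```python
# Assumed set of functional-effect predictors (SNP Effect excluded).
PREDICTORS = ["nsSNP", "PhD-SNP", "Polyphen-2", "SIFT", "SNAP", "SNPs&GO", "PROVEAN"]

# Predictors that called each variant deleterious, as stated in the text.
DELETERIOUS_CALLS = {
    "N73S": set(),
    "D314V": {"PhD-SNP", "Polyphen-2", "SIFT", "SNAP", "PROVEAN"},
    "D314N": {"SIFT", "SNAP", "PROVEAN"},
    "N319S": {"PhD-SNP", "SIFT"},
}


def tally(calls, predictors):
    """Return {variant: (n_deleterious, n_neutral)} over the given predictors."""
    known = set(predictors)
    counts = {}
    for variant, deleterious in calls.items():
        unknown = deleterious - known
        if unknown:
            raise ValueError(f"unknown predictors for {variant}: {sorted(unknown)}")
        counts[variant] = (len(deleterious), len(known) - len(deleterious))
    return counts


COUNTS = tally(DELETERIOUS_CALLS, PREDICTORS)
```

Under this assumption, the pairs are (deleterious, neutral) counts per variant, which is the information summarized as bars in Fig 2.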

Fig 2. Number of “deleterious” and “neutral” predictions of each hnRNPA1 mutation.

The four known mutations were analyzed by non-synonymous single nucleotide polymorphism (nsSNP) prediction algorithms. The graph indicates how many algorithms predicted each mutation as having a deleterious effect or a neutral effect on hnRNPA1. Blue bars indicate neutral predictions, and purple bars indicate deleterious predictions.

https://doi.org/10.1371/journal.pone.0158939.g002

Table 1. Functional effect prediction of hnRNPA1 natural variants by different SNP prediction algorithms.

https://doi.org/10.1371/journal.pone.0158939.t001

The inconsistency between results shows how important it is to use more than one prediction algorithm to determine the potential effects of mutations. While most algorithms successfully predicted the D314V mutation as deleterious, 4 out of 7 algorithms failed to suggest the deleterious potential of the D314N variant, and 5 out of 7 failed to predict the N319S variant as deleterious, suggesting that the results obtained with the nsSNP prediction algorithms are not conclusive. The variants were further analyzed using SNP Effect, which predicts the effect of a mutation on aggregation tendency (TANGO), amyloid propensity (WALTZ) and chaperone binding tendency (LIMBO) (Table 2). The SNP Effect results showed that aggregation tendency and chaperone binding tendency are not affected by any variant, but that the D314N mutation increases amyloid propensity, while N319S decreases it. Mutation D314V was also predicted to increase amyloid propensity, corroborating the experimental findings by Shorter and Taylor [55].

Table 2. SNP Effect predictions on hnRNPA1 natural variants.

https://doi.org/10.1371/journal.pone.0158939.t002