|\/| () |-| ! T

Friday, September 11, 2009

Links to notes on NMR spectroscopy and X-ray crystallography given by Harsha Dubey Madam

file:///E:/X-ray%20crystallography%20-%20Wikipedia,%20the%20free%20encyclopedia.htm

file:///E:/Protein%20nuclear%20magnetic%20resonance%20spectroscopy%20-%20Wikipedia,%20the%20free%20encyclopedia.htm

NMR spectroscopy

Protein nuclear magnetic resonance spectroscopy
From Wikipedia, the free encyclopedia

Pacific Northwest National Laboratory's high magnetic field (800 MHz) NMR spectrometer being loaded with a sample.
Protein nuclear magnetic resonance spectroscopy (usually abbreviated protein NMR) is a field of structural biology in which NMR spectroscopy is used to obtain information about the structure and dynamics of proteins. The field was pioneered by, among others, Kurt Wüthrich, who shared the Nobel Prize in Chemistry in 2002. Protein NMR techniques are continually being used and improved in both academia and the biotech industry. Structure determination by NMR spectroscopy usually consists of several phases, each using a separate set of highly specialized techniques: the sample is prepared, resonances are assigned, restraints are generated, and a structure is calculated and validated.
Contents
1 Sample preparation
2 Data collection
3 Resonance assignment
3.1 Homonuclear nuclear magnetic resonance
3.2 Nitrogen-15 nuclear magnetic resonance
3.3 Carbon-13 and nitrogen-15 nuclear magnetic resonance
4 Restraint generation
4.1 Distance restraints
4.2 Angle restraints
4.3 Orientation restraints
5 Hydrogen-Deuterium exchange
6 Structure calculation
7 Dynamics
8 NMR spectroscopy on large proteins
9 Automation of the process
10 See also
11 References
11.1 Citations
11.2 General references
11.3 Related links

Sample preparation

Protein nuclear magnetic resonance is performed on aqueous samples of highly purified protein. Usually the sample consists of between 300 and 600 microlitres with a protein concentration in the range 0.1–3 millimolar. The protein can either come from a natural source or be produced in an expression system using recombinant DNA techniques. Recombinantly expressed proteins are usually easier to produce in sufficient quantity, and recombinant expression makes isotopic labelling possible.
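The concentration and volume ranges above fix how much purified protein a sample needs. The Python sketch below does that unit arithmetic; the function name and the 10 kDa example protein are illustrative assumptions, not values from the article.

```python
def protein_mass_mg(concentration_mM, volume_uL, molecular_weight_Da):
    """Mass of protein (mg) needed for an NMR sample.

    1 mM in 1 uL is 1 nmol; nmol multiplied by g/mol gives ng.
    """
    nanomoles = concentration_mM * volume_uL      # mM * uL -> nmol
    nanograms = nanomoles * molecular_weight_Da   # Da taken as g/mol
    return nanograms / 1e6                        # ng -> mg

# Hypothetical 10 kDa protein at 1 mM in a 500 uL sample
print(protein_mass_mg(1.0, 500, 10_000))   # 5.0
```

At the low end of the range (0.1 mM in 300 uL) the same protein would need only about 0.3 mg, which is why concentration rather than volume usually drives the expression effort.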
The most abundant isotopes of carbon and oxygen, carbon-12 and oxygen-16, have no net nuclear spin, which is the physical property that nuclear magnetic resonance spectroscopy exploits. The most abundant isotope of nitrogen, nitrogen-14, has a net nuclear spin of 1; however, it also has a large quadrupolar moment, a property of the atomic nucleus that prevents high-resolution information from being obtained from this isotope. Thus nuclear magnetic resonance of proteins from natural sources is restricted to experiments based solely on protons. The less common isotopes carbon-13 and nitrogen-15, however, have a net nuclear spin of 1/2, a simpler case that makes them suitable for nuclear magnetic resonance, so labelling proteins with these isotopes opens up more advanced experiments that also detect or use these nuclei. Isotopic labelling is done by growing the expression host in a growth medium enriched with the desired isotopes. Since isotopically enriched compounds remain expensive, organisms are used that can grow on a defined minimal medium containing only one carbon-13 source, usually glucose but occasionally glycerol or methanol, and one nitrogen-15 source such as ammonium chloride or ammonium sulfate. These include Escherichia coli, the most frequently used bacterium, and Pichia pastoris, the most frequently used yeast.
The purified protein is usually dissolved in a buffer solution and adjusted to the desired solvent conditions. The NMR sample is prepared in a thin walled glass tube.

Data collection
Protein NMR uses multidimensional nuclear magnetic resonance experiments to obtain information about the protein. Ideally, each distinct nucleus in the molecule experiences a distinct chemical environment and thus has a distinct chemical shift by which it can be recognized. However, in large molecules such as proteins the number of resonances can typically be several thousand, and a one-dimensional spectrum inevitably has incidental overlaps. Therefore multidimensional experiments are performed that correlate the frequencies of distinct nuclei. The additional dimensions decrease the chance of overlap and have a larger information content, since they correlate signals from nuclei within a specific part of the molecule. Magnetization is transferred into the sample using pulses of electromagnetic (radiofrequency) energy and between nuclei using delays; the process is described with so-called pulse sequences. Pulse sequences allow the experimenter to investigate and select specific types of connections between nuclei. The array of nuclear magnetic resonance experiments used on proteins falls into two main categories: one where magnetization is transferred through the chemical bonds, and one where the transfer is through space, irrespective of the bonding structure. The first category is used to assign the different chemical shifts to specific nuclei, and the second is primarily used to generate the distance restraints used in the structure calculation, and for assignment when the protein is unlabelled.
Depending on the concentration of the sample, on the magnetic field of the spectrometer, and on the type of experiment, a single multidimensional nuclear magnetic resonance experiment on a protein sample may take hours or even several days to obtain a suitable signal-to-noise ratio through signal averaging, and to allow for sufficient evolution of magnetization transfer through the various dimensions of the experiment. Other things being equal, higher-dimensional experiments take longer than lower-dimensional experiments.
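The trade-offs in the paragraph above can be put into rough numbers. This Python sketch assumes a fixed time per scan (acquisition plus recycle delay) and one FID per indirect increment, ignoring the extra FIDs normally needed for quadrature detection; the increment counts are hypothetical. It also encodes the standard signal-averaging rule that signal-to-noise grows with the square root of the number of scans.

```python
import math

def experiment_hours(indirect_points, scans, seconds_per_scan):
    """Rough duration of a multidimensional NMR experiment.

    indirect_points: points sampled in each indirect dimension (one FID each,
    ignoring the extra FIDs usually needed for quadrature detection).
    seconds_per_scan: acquisition time plus recycle delay (assumed value).
    """
    fids = math.prod(indirect_points)
    return fids * scans * seconds_per_scan / 3600

def scans_for_snr_gain(scans, gain):
    """Signal averaging improves signal-to-noise as sqrt(scans),
    so a `gain`-fold improvement needs gain**2 times as many scans."""
    return scans * gain ** 2

print(experiment_hours([128], scans=8, seconds_per_scan=1.0))      # 2D, ~0.28 h
print(experiment_hours([64, 32], scans=8, seconds_per_scan=1.0))   # 3D, ~4.55 h
print(scans_for_snr_gain(8, 2))                                    # 32
```

Adding a dimension multiplies the number of FIDs, which is why the 3D estimate is much longer than the 2D one at the same number of scans.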
Typically the first experiment to be measured with an isotope-labelled protein is a 2D heteronuclear single quantum correlation (HSQC) spectrum, where "heteronuclear" refers to nuclei other than 1H. In theory the HSQC has one peak for each H bound to a heteronucleus. Thus in the 15N-HSQC one signal is expected for each amino acid residue, with the exception of proline, which has no amide hydrogen because its side chain is bonded back to the backbone nitrogen, forming a ring. Tryptophan and certain other residues with N-containing side chains also give rise to additional signals. The 15N-HSQC is often referred to as the fingerprint of a protein because each protein has a unique pattern of signal positions. Analysis of the 15N-HSQC allows researchers to evaluate whether the expected number of peaks is present and thus to identify possible problems due to multiple conformations or sample heterogeneity. The relatively quick HSQC experiment helps determine the feasibility of doing subsequent longer, more expensive, and more elaborate experiments. It is not possible to assign peaks to specific atoms from the HSQC alone.
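Counting expected 15N-HSQC peaks is a quick sanity check of the kind described above. This Python sketch uses the HIV-1 protease sequence retrieved in Practical 2 further down this page. It counts one backbone amide per non-proline residue plus one indole N-H per tryptophan; leaving out the N-terminal residue and the Asn/Gln side-chain NH2 signals are simplifying assumptions.

```python
def expected_15n_hsqc_peaks(sequence):
    """Estimate the number of 15N-HSQC signals from a one-letter sequence.

    Backbone: one amide N-H per residue except proline (no amide hydrogen).
    The N-terminal NH3+ usually exchanges too fast to be seen (assumption).
    Tryptophan: one extra indole N-H each.
    Asn/Gln side-chain NH2 pairs and other side-chain signals are ignored.
    """
    seq = sequence.upper()
    backbone = sum(1 for aa in seq if aa != "P")
    if seq and seq[0] != "P":
        backbone -= 1   # drop the N-terminal residue
    return {"backbone_amides": backbone, "trp_indoles": seq.count("W")}

hiv_protease = (
    "PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLV"
    "GPTPVNIIGRNLLTQIGCTLNF"
)
print(expected_15n_hsqc_peaks(hiv_protease))
# {'backbone_amides': 93, 'trp_indoles': 2}
```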

Resonance assignment
In order to analyze the nuclear magnetic resonance data, it is important to obtain a resonance assignment for the protein, that is, to find out which chemical shift in each dimension corresponds to which atom. Several different types of experiments have been devised to achieve this. The procedure depends on whether the protein is isotopically labelled or not, since many of the assignment experiments depend on carbon-13 and nitrogen-15.

Comparison of COSY and TOCSY 2D spectra for an amino acid like glutamate or methionine. The TOCSY shows off-diagonal crosspeaks between all protons in the spectrum, but the COSY only has crosspeaks between neighbours.

Homonuclear nuclear magnetic resonance
With unlabelled protein the usual procedure is to record a set of two-dimensional homonuclear nuclear magnetic resonance experiments: correlation spectroscopy (COSY), total correlation spectroscopy (TOCSY) and nuclear Overhauser effect spectroscopy (NOESY).[1] A two-dimensional nuclear magnetic resonance experiment produces a two-dimensional spectrum in which the units of both axes are chemical shifts. COSY and TOCSY transfer magnetization through the chemical bonds between adjacent protons. The COSY experiment can only transfer magnetization between protons on adjacent atoms, whereas in the TOCSY experiment the protons relay the magnetization, so it is transferred among all the protons that are connected by adjacent atoms. Thus in a COSY, an alpha proton transfers magnetization to the beta protons, the beta protons transfer to the alpha and gamma protons, if any are present, the gamma protons transfer to the beta and delta protons, and so on. In a TOCSY, the alpha and all the other protons are able to transfer magnetization to the beta, gamma, delta and epsilon protons if they are connected by a continuous chain of protons. The continuous chain of protons is the side chain of the individual amino acid. Thus these two experiments are used to build so-called spin systems, that is, lists of the chemical shifts of the amide proton, the alpha protons and all the side-chain protons of each residue. Which chemical shifts correspond to which nuclei in the spin system is determined from the COSY connectivities and the fact that different types of protons have characteristic chemical shifts.
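The difference between COSY and TOCSY can be modelled as a graph problem: COSY peaks correspond to direct couplings (edges), while TOCSY peaks connect every pair of protons in the same connected network, which is exactly a spin system. A minimal Python sketch with a hypothetical two-residue proton network (each CH2 group collapsed to one label for brevity):

```python
from collections import defaultdict
from itertools import combinations

def cosy_peaks(couplings):
    """COSY: cross peaks only between directly coupled protons."""
    return {frozenset(pair) for pair in couplings}

def tocsy_peaks(couplings):
    """TOCSY: magnetization is relayed, so every pair of protons in the same
    connected coupling network (spin system) gives a cross peak."""
    graph = defaultdict(set)
    for a, b in couplings:
        graph[a].add(b)
        graph[b].add(a)
    peaks, seen = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first search of one spin system
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        peaks |= {frozenset(p) for p in combinations(sorted(component), 2)}
    return peaks

# Hypothetical network for two neighbouring residues; there is no H-H coupling
# across the peptide bond, so they form two separate spin systems.
couplings = [("HN1", "HA1"), ("HA1", "HB1"), ("HB1", "HG1"),
             ("HN2", "HA2"), ("HA2", "HB2")]
print(len(cosy_peaks(couplings)), len(tocsy_peaks(couplings)))   # 5 9
```

The TOCSY never links HA1 to HA2, which is why a through-space experiment is needed to order the spin systems.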
To connect the different spin systems in sequential order, the NOESY experiment has to be used. Because this experiment transfers magnetization through space, it shows crosspeaks for all protons that are close in space, regardless of whether they are in the same spin system. Neighbouring residues are inherently close in space, so sequential assignments can be made from NOESY peaks that connect one spin system to another.
One important problem using homonuclear nuclear magnetic resonance is overlap between peaks. This occurs when different protons have the same or very similar chemical shifts. This problem becomes greater as the protein becomes larger, so homonuclear nuclear magnetic resonance is usually restricted to small proteins or peptides.

Nitrogen-15 nuclear magnetic resonance

Schematic of an HNCA and HNCOCA for four sequential residues. The nitrogen-15 dimension is perpendicular to the screen. Each window is focused on the nitrogen chemical shift of that amino acid. The sequential assignment is made by matching the alpha carbon chemical shifts. In the HNCA each residue sees the alpha carbon of itself and of the preceding residue. The HNCOCA only sees the alpha carbon of the preceding residue.
The process of resonance assignment for a nitrogen-15 labelled sample is similar to the homonuclear case; here too, no experiment can transfer magnetisation between two spin systems through bonds. The main difference is the ability to record nitrogen-15 edited three-dimensional experiments: TOCSY-N-HSQC and NOESY-N-HSQC. These experiments build on the HSQC experiment but add a proton dimension. They can be visualised as each peak in the HSQC having the TOCSY or NOESY peaks stacked onto it. Thus if the TOCSY peak from an amide proton, HN, has a cross peak to its alpha proton, Halpha, at the coordinates (HN, Halpha) in the TOCSY spectrum, the corresponding peak would be at (HN, Halpha, N) in the TOCSY-N-HSQC. This makes it possible to resolve overlaps in the proton dimension, provided the corresponding nitrogens have chemical shifts distinct from one another.
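The benefit of the extra 15N dimension can be shown with a simple overlap check. In this Python sketch the peak positions and the matching tolerances are hypothetical; two peaks that coincide in (HN, Halpha) separate once their nitrogen shifts are included.

```python
from itertools import combinations

def overlapping_pairs(peaks, tolerances):
    """Return index pairs of peaks that coincide within `tolerances` (ppm)
    in every dimension given."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(peaks), 2)
        if all(abs(a - b) <= t for a, b, t in zip(p, q, tolerances))
    ]

# Hypothetical TOCSY-N-HSQC peaks: (HN ppm, Halpha ppm, 15N ppm)
peaks_3d = [(8.21, 4.35, 118.2), (8.22, 4.36, 124.7), (7.95, 4.10, 121.0)]
peaks_2d = [(hn, ha) for hn, ha, _ in peaks_3d]   # same peaks, N dimension dropped

print(overlapping_pairs(peaks_2d, (0.02, 0.02)))        # [(0, 1)]
print(overlapping_pairs(peaks_3d, (0.02, 0.02, 0.3)))   # []
```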

Carbon-13 and nitrogen-15 nuclear magnetic resonance
When the protein is labelled with carbon-13 and nitrogen-15 it is possible to record experiments that transfer magnetisation over the peptide bond, and thus connect different spin systems through bonds. This is usually done using some of the following experiments: HNCO, HNCACO, HNCA,[2] HNCOCA, HNCACB and CBCACONH. All six experiments consist of an HSQC plane expanded with a carbon dimension. In the HNCACO the spectrum contains peaks at the chemical shifts of the carbonyl carbons in the residue of the HSQC peak and in the previous one in the sequence. The HNCO only contains the chemical shift from the previous residue, so it is possible to assign the carbonyl carbon shifts that correspond to each HSQC peak and to the one preceding it. Sequential assignment can then be undertaken by matching the shifts of each spin system's own and previous carbons. The HNCA and HNCOCA work similarly, just with the alpha carbons rather than the carbonyls, and the HNCACB and CBCACONH contain both the alpha carbon and the beta carbon. Usually several of these experiments are required to resolve overlap in the carbon dimension. This procedure is usually less ambiguous than the NOESY-based method, since it relies on through-bond transfer. In the NOESY-based methods, additional peaks from nuclei that are close in space but do not belong to sequential residues appear and confuse the assignment process. When the sequential assignment has been made, it is usually possible to assign the side chains using HCCH-TOCSY, which is essentially a TOCSY experiment resolved in an additional carbon dimension.
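The HNCA/HNCOCA matching logic lends itself to a short sketch. Here each spin system carries its own alpha-carbon shift and the shift of the preceding residue's alpha carbon; a link is made when one system's preceding shift matches another system's own shift. The shift values, labels and the 0.2 ppm tolerance are hypothetical, and real assignment must also handle degenerate matches, at which this walk simply stops.

```python
def sequential_links(spin_systems, tol=0.2):
    """Link spin system A -> B when B's preceding-residue CA shift (HNCOCA)
    matches A's own CA shift (the extra HNCA peak) within `tol` ppm."""
    links = {}
    for a, (ca_own, _) in spin_systems.items():
        links[a] = [
            b for b, (_, ca_prev) in spin_systems.items()
            if b != a and ca_prev is not None and abs(ca_prev - ca_own) <= tol
        ]
    return links

def walk(start, links):
    """Follow unambiguous links; stop at a gap or an ambiguity."""
    chain = [start]
    while len(links.get(chain[-1], [])) == 1 and links[chain[-1]][0] not in chain:
        chain.append(links[chain[-1]][0])
    return chain

# Hypothetical shifts in ppm: label -> (own CA, preceding residue's CA)
systems = {
    "s1": (55.3, None),
    "s4": (62.0, 55.3),
    "s2": (45.1, 62.05),
    "s3": (58.7, 45.2),
}
print(walk("s1", sequential_links(systems)))   # ['s1', 's4', 's2', 's3']
```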

Restraint generation
In order to perform structure calculations, a number of experimentally determined restraints have to be generated. These fall into different categories; the most widely used are distance restraints and angle restraints.

Distance restraints
A crosspeak in a NOESY experiment signifies spatial proximity between the two nuclei in question. Thus each peak can be converted into a maximum distance between the nuclei, usually between 1.8 and 6 angstroms. The intensity of a NOESY peak is proportional to the distance to the minus 6th power, so the distance is determined from the intensity of the peak. The intensity-distance relationship is not exact, so a distance range is usually used.
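The r^-6 relationship makes calibration a one-line calculation once a reference peak of known distance is chosen. This Python sketch assumes such a reference (intensity 1000 at 2.5 angstroms, a hypothetical value) and a 20% upper-bound slack chosen for illustration; the 1.8 and 6 angstrom limits come from the paragraph above.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Intensity ~ r**-6, so r = r_ref * (I_ref / I) ** (1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1 / 6)

def noe_bounds(intensity, ref_intensity, ref_distance, slack=0.2):
    """Convert a peak into a (lower, upper) distance restraint in angstroms.
    Lower bound fixed at 1.8; upper bound widened by `slack` (an assumed
    fraction) and capped at 6.0."""
    d = noe_distance(intensity, ref_intensity, ref_distance)
    return 1.8, min(6.0, d * (1 + slack))

# Hypothetical calibration: reference peak of intensity 1000 at 2.5 angstroms
print(noe_distance(1000 / 64, 1000, 2.5))   # ~5.0, since 64 ** (1/6) == 2
print(noe_bounds(1000 / 729, 1000, 2.5))    # (1.8, 6.0): upper bound capped
```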
It is of great importance to assign the NOESY peaks to the correct nuclei based on the chemical shifts. If this task is performed manually it is usually very labour-intensive, since proteins usually have thousands of NOESY peaks. Some computer programs, such as CYANA [3] and ARIA [4]/CNS, perform this task automatically, coupled to a structure calculation.
To obtain assignments that are as accurate as possible, it is a great advantage to have access to carbon-13 and nitrogen-15 edited NOESY experiments, since they help to resolve overlap in the proton dimension. This leads to faster and more reliable assignments, and in turn to better structures.

Angle restraints
In addition to distance restraints, restraints on the torsion angles of the chemical bonds, typically the backbone phi and psi angles, can be generated. One approach is to use the Karplus equation to generate angle restraints from coupling constants. Another approach uses the chemical shifts to generate angle restraints. Both methods use the fact that the geometry around the alpha carbon affects the coupling constants and chemical shifts, so given the coupling constants or the chemical shifts, an educated guess can be made about the torsion angles.
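The Karplus approach can be sketched for the backbone 3J(HN,Halpha) coupling, which depends on phi. The coefficients below are one commonly quoted parameterization (attributed to Vuister and Bax); other sets exist, so treat them as an assumption. Scanning phi shows why the result is only an estimate: a single coupling value is consistent with several angles.

```python
import math

# Karplus coefficients for 3J(HN,HA), as commonly quoted (Vuister & Bax);
# other parameterizations exist, so these values are an assumption.
A, B, C = 6.51, -1.76, 1.60

def j_hn_ha(phi_deg):
    """3J(HN,HA) in Hz for backbone angle phi, using theta = phi - 60 degrees."""
    theta = math.radians(phi_deg - 60)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C

def phi_candidates(j_obs, tol=0.3):
    """Scan phi in 1-degree steps and keep every angle whose predicted J
    lies within `tol` Hz of the observed value."""
    return [phi for phi in range(-180, 180) if abs(j_hn_ha(phi) - j_obs) <= tol]

print(round(j_hn_ha(-60), 2))    # 4.11 (helical phi)
print(round(j_hn_ha(-120), 2))   # 9.87 (extended, beta-strand phi)
```

In practice, small couplings are often turned into a helical phi restraint and large couplings into an extended one, rather than a single angle.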

Orientation restraints

The blue arrows represent the orientation of the N-H bond of selected peptide bonds. By determining the orientation of a sufficient number of bonds relative to the external magnetic field, the structure of the protein can be determined. From PDB record 1KBH.
The analyte molecules in a sample can be partially ordered with respect to the external magnetic field of the spectrometer by manipulating the sample conditions. Common techniques include adding bacteriophages or bicelles to the sample, or preparing the sample in a stretched polyacrylamide gel. This creates a local environment that favours certain orientations of nonspherical molecules. Normally in solution NMR the dipolar couplings between nuclei are averaged out because of the fast tumbling of the molecule. The slight overpopulation of one orientation means that a residual dipolar coupling remains to be observed. The dipolar coupling is commonly used in solid-state NMR and provides information about the orientation of the bond vectors relative to a single global reference frame. Typically the orientation of the N-H vector is probed in an HSQC-like experiment. Initially residual dipolar couplings were used for refinement of previously determined structures, but attempts at de novo structure determination have also been made.[5]
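For an axially symmetric alignment, the residual dipolar coupling of a bond vector depends on the angle between the bond and the alignment axis. This Python sketch uses the simplified form D = D_a(3cos²θ - 1), dropping the rhombic term; D_a = 10 Hz is a hypothetical value. At the magic angle (about 54.7 degrees) the coupling vanishes, so a single coupling does not identify a unique bond orientation.

```python
import math

def rdc_axial(theta_deg, d_a):
    """Residual dipolar coupling (Hz) for a bond vector at angle theta to the
    principal axis of an axially symmetric alignment tensor:
    D = D_a * (3 cos^2(theta) - 1). The rhombic term is omitted."""
    c = math.cos(math.radians(theta_deg))
    return d_a * (3 * c * c - 1)

magic = math.degrees(math.acos(1 / math.sqrt(3)))   # magic angle, ~54.7 degrees
for theta in (0, 30, magic, 90):
    print(round(theta, 1), round(rdc_axial(theta, d_a=10.0), 2))
```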

Hydrogen-Deuterium exchange
NMR spectroscopy is nucleus-specific, so it can distinguish between hydrogen and deuterium. The amide protons in the protein exchange readily with the solvent, and if the solvent contains a different isotope, typically deuterium, the reaction can be monitored by NMR spectroscopy. How rapidly a given amide exchanges reflects its solvent accessibility. Thus amide exchange rates can give information on which parts of the protein are buried, hydrogen-bonded, etc. A common application is to compare the exchange of a free form with that of a complex. The amides that become protected in the complex are assumed to be in the interaction interface.
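Exchange rates of the kind described above are usually obtained by following the decay of each amide peak after transfer into D2O. This Python sketch assumes simple single-exponential decay and fits the rate by least squares on the logarithm of the intensities. The data are synthetic, generated from assumed rates, to show how an amide protected in a complex appears as a slower rate.

```python
import math

def exchange_rate(times, intensities):
    """Fit I(t) = I0 * exp(-k t) by linear least squares on ln I.
    Assumes first-order, single-exponential exchange after transfer into
    D2O; back-exchange and noise weighting are ignored."""
    xs = list(times)
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope   # k, in inverse units of `times`

# Synthetic amide peak intensities (arbitrary units); times in minutes
times = [0, 10, 20, 40, 80]
free = [100 * math.exp(-0.05 * t) for t in times]     # assumed k = 0.05 /min
bound = [100 * math.exp(-0.005 * t) for t in times]   # assumed k = 0.005 /min
k_free, k_bound = exchange_rate(times, free), exchange_rate(times, bound)
print(k_free, k_bound, k_free / k_bound)   # ratio ~10: slower exchange in the complex
```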

Structure calculation

Nuclear magnetic resonance structure determination generates an ensemble of structures. The structures will only converge if the data are sufficient to dictate a specific fold. In the ensemble shown, this is the case for only part of the structure. From PDB 1SSU.
The experimentally determined restraints can be used as input for the structure calculation process. Researchers, using computer programs such as CYANA or XPLOR-NIH,[6] attempt to satisfy as many of the restraints as possible, in addition to general properties of proteins such as bond lengths and angles. The algorithms convert the restraints and the general protein properties into energy terms and then try to minimize the energy. The process results in an ensemble of structures that will converge if the data were sufficient to dictate a certain fold.
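The idea of turning restraints into energy terms and minimizing can be shown with a toy model. This Python sketch uses a flat-bottom quadratic penalty for distance restraints and plain gradient descent from several random starts, producing a small ensemble. It is not how CYANA or XPLOR-NIH work internally: there are no bond-length, angle or van der Waals terms, and the four-atom restraints are hypothetical.

```python
import math
import random

def restraint_energy(coords, restraints, k=1.0):
    """Flat-bottom penalty: zero inside [lower, upper], quadratic outside."""
    energy = 0.0
    for i, j, lower, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d < lower:
            energy += k * (lower - d) ** 2
        elif d > upper:
            energy += k * (d - upper) ** 2
    return energy

def minimize(coords, restraints, steps=3000, step_size=0.05, k=1.0):
    """Plain gradient descent on restraint_energy."""
    coords = [list(c) for c in coords]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, lower, upper in restraints:
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            d = math.sqrt(sum(x * x for x in diff)) or 1e-9
            if d < lower:
                dE_dd = -2 * k * (lower - d)
            elif d > upper:
                dE_dd = 2 * k * (d - upper)
            else:
                continue
            for axis in range(3):
                g = dE_dd * diff[axis] / d   # chain rule: dd/dx_i = diff / d
                grad[i][axis] += g
                grad[j][axis] -= g
        for c, g in zip(coords, grad):
            for axis in range(3):
                c[axis] -= step_size * g[axis]
    return coords

# Hypothetical restraints (atom i, atom j, lower, upper) in angstroms
restraints = [(0, 1, 1.0, 1.5), (1, 2, 1.0, 1.5), (2, 3, 1.0, 1.5), (0, 3, 2.0, 5.0)]

random.seed(0)
ensemble = []
for _ in range(5):   # different random starts give different solutions
    start = [[random.uniform(0, 10) for _ in range(3)] for _ in range(4)]
    ensemble.append(minimize(start, restraints))
print([round(restraint_energy(m, restraints), 6) for m in ensemble])
```

Because the restraints only bound distances, the members of the ensemble satisfy them in different ways; the spread between members plays the role of the convergence described above.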

Dynamics
In addition to structures, nuclear magnetic resonance can yield information on the dynamics of various parts of the protein. This usually involves measuring relaxation times such as T1 and T2 to determine order parameters, correlation times, and chemical exchange rates. NMR relaxation is a consequence of local fluctuating magnetic fields within a molecule, which are generated by molecular motions. In this way, measurements of relaxation times can provide information on motions within a molecule at the atomic level. In NMR studies of protein dynamics, nitrogen-15 is the preferred nucleus to study because its relaxation times are relatively simple to relate to molecular motions; this, however, requires isotope labelling of the protein. The T1 and T2 relaxation times can be measured using various types of HSQC-based experiments. The motions that can be detected occur on time-scales from about 10 picoseconds to about 10 nanoseconds. In addition, slower motions, on time-scales from about 10 microseconds to 100 milliseconds, can also be studied. However, since nitrogen atoms are mainly found in the backbone of a protein, the results mainly reflect the motions of the backbone, which is the most rigid part of a protein molecule. Thus the results obtained from nitrogen-15 relaxation measurements may not be representative of the whole protein. Therefore techniques using relaxation measurements of carbon-13 and deuterium have recently been developed, which enable systematic studies of the motions of amino acid side chains in proteins.
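One common use of 15N T1 and T2 is estimating the overall rotational correlation time. This Python sketch uses the widely quoted approximation tau_c ≈ sqrt(6·T1/T2 - 7)/(4π·ν_N), which assumes a rigid, isotropically tumbling protein without chemical exchange. The relaxation times and the 60.8 MHz 15N frequency (roughly that of a 600 MHz spectrometer) are illustrative assumptions.

```python
import math

def tau_c_ns(t1, t2, n15_larmor_mhz):
    """Estimate the overall rotational correlation time (ns) from 15N T1/T2:
        tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N)
    nu_N is the 15N Larmor frequency in Hz. Assumes rigid, isotropic
    tumbling and no chemical exchange contribution to T2."""
    x = 6 * t1 / t2 - 7
    if x <= 0:
        raise ValueError("T1/T2 too small for this approximation")
    nu_n_hz = n15_larmor_mhz * 1e6
    return math.sqrt(x) / (4 * math.pi * nu_n_hz) * 1e9

# Hypothetical relaxation times in seconds
print(round(tau_c_ns(0.8, 0.08, 60.8), 1))   # ~9.5
```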

NMR spectroscopy on large proteins
Traditionally nuclear magnetic resonance spectroscopy has been limited to relatively small proteins or protein domains. This is in part caused by problems resolving overlapping peaks in larger proteins, but this has been alleviated by the introduction of isotope labelling and multidimensional experiments. Another more serious problem is the fact that in large proteins the magnetization relaxes faster, which means there is less time to detect the signal. This in turn causes the peaks to become broader and weaker, and eventually disappear. Two techniques have been introduced to attenuate the relaxation: transverse relaxation optimized spectroscopy (TROSY)[7] and deuteration [8] of proteins. By using these techniques it has been possible to study proteins in complex with the 900 kDa chaperone GroES-GroEL.[9]

Automation of the process
Structure determination by NMR has traditionally been a time-consuming process, requiring interactive analysis of the data by a trained scientist. There has been considerable interest in automating the process to increase the throughput of structure determination (see structural genomics). The two most time-consuming processes are the resonance assignment and the NOE assignment. Several different computer programs have been published that perform these processes automatically.[3][1] Efforts have also been made to standardize the structure calculation protocol to make it quicker and more amenable to automation.[10]

Thursday, September 3, 2009

Bioinformatics Practicals

Practical 2
Date: 27-07-2009
Title
"Retrieval of the protein sequence of HIV-1 protease and prediction of its length, molecular weight
and amino acid composition"
Objective
To understand primary protein sequence databases and to retrieve and analyse a protein
sequence
Introduction
Swissprot
Swiss-Prot is a manually curated biological database of protein sequences. Swiss-Prot was
created in 1986 by Amos Bairoch and later developed by the Swiss Institute of Bioinformatics (SIB)
and the European Bioinformatics Institute (EBI). Swiss-Prot strives to provide reliable protein
sequences associated with a high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of
redundancy and a high level of integration with other databases.
Bioedit
BioEdit is a biological sequence alignment editor for Windows. Its intuitive multiple-document
interface and convenient features make alignment and manipulation of sequences relatively easy
on a desktop computer. Using BioEdit we can compute the molecular weight, length and amino
acid composition of a protein sequence.
SwissProt: P04585 (HIV-1 HXB2 POL) 99 amino acids (residues 57 to 155)
EMBL: K03455; AAB50259.1 [EMBL/GenBank/DDBJ]
Pfam: PF00558; UNKNOWN
SCOP: SSF50630 Acid protease
BLOCKS: P04585
Prosite: P04585
ProtoNet: P04585
ProtoMap: P04585
PDB: 1AAQ
Requirements for the analysis
OS: Windows/Linux/Mac OS
Internet connection
Software required: MS Office, BioEdit; database: Swiss-Prot (accessed online).
Methodology
I. Open the internet browser and go to www.expasy.org
II. Enter the Swiss-Prot accession "P04585" for HIV-1 protease
III. Retrieve the Protein sequence in FASTA format
IV. Open the sequence in BIOEDIT software and analyse the length, molecular weight and
amino acid composition
V. Report the result
RESULTS
>Protease (Retropepsin) p15 (EC 3.4.23.16) - Human immunodeficiency virus
type 1 (HXB2 isolate) (HIV-1).
PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLV
GPTPVNIIGRNLLTQIGCTLNF
Protein: Protease (Retropepsin) p15 (EC 3.4.23.16) - Human immunodeficiency virus type
1 (HXB2 isolate) (HIV-1).
Length = 99 amino acids
Molecular Weight = 10778.21 Daltons
Amino Acid Number Mol%
Ala A 3 3.03
Cys C 2 2.02
Asp D 4 4.04
Glu E 4 4.04
Phe F 2 2.02
Gly G 13 13.13
His H 1 1.01
Ile I 12 12.12
Lys K 6 6.06
Leu L 12 12.12
Met M 2 2.02
Asn N 3 3.03
Pro P 6 6.06
Gln Q 6 6.06
Arg R 4 4.04
Ser S 1 1.01
Thr T 8 8.08
Val V 7 7.07
Trp W 2 2.02
Tyr Y 1 1.01
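The composition table above can be reproduced from the FASTA sequence in a few lines. This Python sketch counts residues and rounds the mol% to two decimals, as in the BioEdit output; it is an illustration, not BioEdit's own code.

```python
from collections import Counter

def composition(sequence):
    """Return {residue: (count, mol%)} for a one-letter protein sequence,
    sorted by one-letter code."""
    seq = sequence.upper()
    counts = Counter(seq)
    return {aa: (n, round(100 * n / len(seq), 2)) for aa, n in sorted(counts.items())}

hiv_protease = (
    "PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLV"
    "GPTPVNIIGRNLLTQIGCTLNF"
)
comp = composition(hiv_protease)
print(len(hiv_protease), comp["G"], comp["I"])   # 99 (13, 13.13) (12, 12.12)
```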
Practical 3
Date: 29-07-2008
Title
"Calculation of physicochemical parameters of HIV-1 protease from its primary structure using ProtParam"
Objective
To compute physical and chemical parameters of the protein from the given sequence
Introduction
ProtParam
ProtParam is a tool which allows the computation of various physical and chemical parameters
for a given protein stored in Swiss-Prot or TrEMBL or for a user entered sequence. The
computed parameters include the molecular weight, theoretical pI, amino acid composition,
atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index
and grand average of hydropathicity (GRAVY).
Methodology
I. Open internet browser and use the url http://www.expasy.org/tools/protparam.html
II. Copy and paste the given sequence into the text box displayed in the ProtParam tool
III. Click on compute parameters
IV. Report the results
RESULTS
Number of amino acids: 99
Molecular weight: 10778.7
Theoretical pI: 8.83
Total number of negatively charged residues (Asp + Glu): 8
Total number of positively charged residues (Arg + Lys): 10
Atomic composition:
Carbon C 488
Hydrogen H 802
Nitrogen N 130
Oxygen O 135
Sulfur S 4
Formula: C488H802N130O135S4
Total number of atoms: 1559
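ProtParam's atomic composition follows from summing the elemental formulas of the residues and adding one water for the free termini. This Python sketch does that. The residue formulas are standard; the average atomic masses are typical tabulated values that may differ slightly from ProtParam's table, which probably explains a small difference in the computed molecular weight.

```python
from collections import Counter

# In-chain residue formulas (free amino acid minus H2O) as (C, H, N, O, S)
RESIDUE_ATOMS = {
    "A": (3, 5, 1, 1, 0), "R": (6, 12, 4, 1, 0), "N": (4, 6, 2, 2, 0),
    "D": (4, 5, 1, 3, 0), "C": (3, 5, 1, 1, 1), "E": (5, 7, 1, 3, 0),
    "Q": (5, 8, 2, 2, 0), "G": (2, 3, 1, 1, 0), "H": (6, 7, 3, 1, 0),
    "I": (6, 11, 1, 1, 0), "L": (6, 11, 1, 1, 0), "K": (6, 12, 2, 1, 0),
    "M": (5, 9, 1, 1, 1), "F": (9, 9, 1, 1, 0), "P": (5, 7, 1, 1, 0),
    "S": (3, 5, 1, 2, 0), "T": (4, 7, 1, 2, 0), "W": (11, 10, 2, 1, 0),
    "Y": (9, 9, 1, 2, 0), "V": (5, 9, 1, 1, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")
# Typical average atomic masses (assumed; ProtParam's own table may differ)
AVERAGE_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

def atomic_composition(sequence):
    """Sum residue formulas and add one H2O for the free N- and C-termini."""
    totals = Counter({e: 0 for e in ELEMENTS})
    for aa in sequence.upper():
        for element, n in zip(ELEMENTS, RESIDUE_ATOMS[aa]):
            totals[element] += n
    totals["H"] += 2
    totals["O"] += 1
    return totals

def average_mass(totals):
    return sum(AVERAGE_MASS[e] * n for e, n in totals.items())

hiv_protease = (
    "PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLV"
    "GPTPVNIIGRNLLTQIGCTLNF"
)
totals = atomic_composition(hiv_protease)
formula = "".join(f"{e}{totals[e]}" for e in ELEMENTS)
print(formula, sum(totals.values()), round(average_mass(totals), 1))
# C488H802N130O135S4 1559 10778.6
```

The charged-residue counts in the ProtParam output can be checked the same way, e.g. `hiv_protease.count("D") + hiv_protease.count("E")` gives 8.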
Practical 4
Date: 30-07-2009
Title
"Prediction of the secondary structure of HIV-1 protease by the Chou-Fasman method"
Objective
To study the alpha helices, beta sheets, coils and turns of HIV-1 protease.
Introduction
Secondary Structure Prediction
Secondary structure in proteins consists of local inter-residue interactions, usually mediated by
hydrogen bonds. The most common secondary structures are alpha helices and beta
sheets. Other helices, such as the 3₁₀ helix and π helix, are calculated to have energetically
favorable hydrogen-bonding patterns but are rarely if ever observed in natural proteins except
at the ends of α helices due to unfavorable backbone packing in the center of the helix.
Secondary structure prediction mainly concerns alpha-helical regions, beta sheets, coils and
turns. For predicting the secondary structure we used the Chou-Fasman online server.
Chou-Fasman method
The Chou-Fasman method is an empirical technique for the prediction of secondary structures
in proteins, originally developed in the 1970s. The method is based on analyses of the relative
frequencies of each amino acid in alpha helices, beta sheets, and turns based on known protein
structures solved with X-ray crystallography. From these frequencies a set of probability
parameters was derived for the appearance of each amino acid in each secondary structure
type, and these parameters are used to predict the probability that a given sequence of amino
acids would form a helix, a beta strand, or a turn in a protein. The method is at most about
50-60% accurate in identifying correct secondary structures, which is significantly less accurate
than the GOR method or modern machine learning-based techniques.
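The Chou-Fasman idea of scanning windows of residue propensities can be sketched briefly. This Python example implements only a simplified helix-nucleation step: a six-residue window is marked helical if at least four residues are helix formers (P(alpha) above 1.05, roughly the strong and ordinary former classes) and the mean propensity exceeds 1.03. The P(alpha) values are as widely tabulated for the method and should be checked against the original table; extension, breakers and the beta-sheet and turn rules are left out, so this is not the full algorithm used by the server.

```python
# Helix propensities P(alpha) as widely tabulated for Chou-Fasman;
# verify against the original table before relying on the output.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13, "Q": 1.11,
    "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83,
    "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}

def helix_nuclei(sequence, window=6, min_formers=4, former_cutoff=1.05, mean_cutoff=1.03):
    """Simplified helix nucleation: mark every residue of a window that holds
    at least `min_formers` helix formers and whose mean P(alpha) exceeds
    `mean_cutoff`. Returns a string of 'H' (helix) and '-' (not assigned)."""
    seq = sequence.upper()
    helix = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        values = [P_ALPHA[aa] for aa in seq[start:start + window]]
        formers = sum(v > former_cutoff for v in values)
        if formers >= min_formers and sum(values) / window > mean_cutoff:
            for i in range(start, start + window):
                helix[i] = True
    return "".join("H" if h else "-" for h in helix)

print(helix_nuclei("GGGAEELLKKGGG"))   # -HHHHHHHHHH--
```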
Methodology
I. Open internet browser and use the url of Chou-Fasman online server
http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=misc1
II. Copy and paste the given sequence in the text box displayed in Chou-Fasman online tool
III. Click on SUBMIT
IV. Report the results
RESULTS
Practical 5
Date: 04-08-2008
Title
"Prediction of the secondary structure of HIV-1 protease by SOPMA"
Objective
To study the alpha helices, beta sheets, coils and turns of HIV-1 protease.
INTRODUCTION
Secondary Structure Prediction
Secondary structure in proteins consists of local inter-residue interactions, usually mediated by
hydrogen bonds. The most common secondary structures are alpha helices and beta
sheets. Other helices, such as the 3₁₀ helix and π helix, are calculated to have energetically
favorable hydrogen-bonding patterns but are rarely if ever observed in natural proteins except
at the ends of α helices due to unfavorable backbone packing in the center of the helix.
Secondary structure prediction mainly concerns alpha-helical regions, beta sheets, coils and
turns. For predicting the secondary structure we used the online server SOPMA.
SOPMA
SOPMA (Self-Optimized Prediction Method with Alignment) is an improvement of the SOPM
method. These methods are based on the homologue method of Levin et al. The improvement
lies in the fact that SOPMA takes into account information from an alignment of
sequences belonging to the same family. The server can be reached from the ExPASy (Swiss-Prot)
tools page; its URL is:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
Methodology
I. Open an internet browser and go to
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
II. Copy and paste the given sequence into the text box displayed in the SOPMA tool
III. Click on SUBMIT
IV. Report the results
RESULTS

Monday, August 31, 2009

Bioinformatics notes

http://www.ccbb.pitt.edu/BBSI/2007/lab/structural%20%20modelling.pdf
http://depts.washington.edu/bakerpg/papers/Chivian-MBA-v44-p547.pdf

These two links lead to the notes for Bioinformatics Unit 4 given by Madam.

Notes on ab initio methods given by Madam:

27 AB INITIO METHODS
Dylan Chivian, Timothy Robertson, Richard Bonneau, and David Baker

Ab initio structure prediction seeks to predict the native conformation of a protein from the amino acid sequence alone. Such attempts are both a fundamental test of our understanding of protein folding, and an important practical challenge in this era of large scale genome sequencing projects, which are producing large numbers of protein sequences for which no three-dimensional structural information is available.

Anfinsen showed forty years ago that all of the information necessary for a protein to fold to the native state resides in the protein’s amino acid sequence (Anfinsen et al., 1961; Anfinsen, 1973). In the absence of large kinetic barriers in the free energy landscape, Anfinsen’s results and those of large numbers of researchers in the intervening years suggest that the native conformations of most proteins are the lowest free energy conformations for their sequences (for a description of some notable exceptions, see Baker and Agard, 1994).

Successful structure prediction requires a free energy function sufficiently close to the true potential for the native state to be at one of the lowest free energy minima, as well as a method for searching conformational space for low energy minima. Ab initio structure prediction is challenging because current potential functions have limited accuracy, and the conformational space to be searched is vast. Many methods use reduced representations, simplified potentials, and coarse search strategies in recognition of this resolution limit (Simons et al., 1997; Samudrala et al., 1999; Ortiz et al., 1999; Pillardy et al., 2001). Encouragingly, these simplified methods are starting to show some success in protein structure prediction (Murzin, 2001; Lesk, Lo Conte, and Hubbard, 2001) and have advanced to the point where genome scale modeling may become useful.

Structural Bioinformatics, edited by Philip E. Bourne and Helge Weissig. Copyright © 2003 by Wiley-Liss, Inc.

REPRESENTATIONS OF THE POLYPEPTIDE CHAIN

The most detailed representations include all atoms of the protein and the surrounding solvent molecules. However, representing this large number of atoms and the interactions between them is quite computationally expensive, and it is not clear that this level of detail is necessary during the phase of the search far from the native conformation. To streamline the calculations, representations can be simplified in a variety of ways. The use of explicit solvent molecules is usually replaced by employing implicit solvent models. United atom representations are frequently used in which hydrogens are drawn into their base carbon, oxygen, and nitrogen atoms. Side chains can be represented using a limited set of conformations (Dunbrack and Karplus, 1994) that are found to be prevalent in structures from the Protein Data Bank (PDB; see Chapter 9), without any great loss in predictive ability. Alternatively, side-chain atoms can be replaced entirely by locating the side-chain properties at either the centroid of the side chain or at the beta carbon (Simons et al., 1997), which amounts to averaging over the side-chain degrees of freedom and permits a significant performance enhancement at the loss of some degree of specificity.

The size of the conformational space to be searched can be further reduced by restricting the conformations available to the polypeptide backbone.
Certain torsion angle pairs are preferred by amino acids in particular local structures (Marqusee, Robbins, and Baldwin, 1989; Blanco, Rivas, and Serrano, 1994; Callihan and Logan, 1999). One may restrict the torsion angles to discrete values commonly seen in known structures, either by using a small set of phi–psi pairs (Park and Levitt, 1995), by selecting pairs from an ideal set based on predicted regular secondary structure, or by using fragments from known protein structures (Sippl, Hendlich, and Lackner, 1992; Bowie and Eisenberg, 1994; Jones, 1997; Simons et al., 1997).

A method developed by our group that builds structures from protein fragments, called Rosetta (examples of Rosetta predictions from the fourth Critical Assessment of Structure Prediction, CASP4, are shown in Figure 27.1), is based on a model of folding in which short segments of the protein chain flicker between different local structures consistent with their local sequence, and folding to the native state occurs when these local segments are oriented such that low free energy interactions are made throughout the protein (Simons et al., 1997). In simulating this process, it is assumed that the ensemble of local structures sampled by a given sequence segment during folding is roughly approximated by the distribution of local structures sampled by that sequence segment in native protein structures. A list of possible conformations is extracted from experimental structures for each nine-residue segment of the chain, and protein tertiary structures are assembled by searching through combinations of these short fragments for conformations that have buried hydrophobic residues, paired beta strands, and other low free energy features of native proteins.
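To make the fragment-insertion move concrete, here is a minimal Python sketch. It assumes the chain is represented only by backbone torsion angles (a list of (phi, psi) pairs) and a hypothetical fragment library that maps each window start to candidate nine-residue torsion lists. It is not the Rosetta implementation: scoring the new conformation and accepting or rejecting the move are left out.

```python
import random

FRAG_LEN = 9  # nine-residue fragments, as in the text


def insert_fragment(torsions, library, rng=random):
    """Return a copy of `torsions` with one window overwritten by a fragment.

    torsions: list of (phi, psi) tuples, one per residue.
    library:  hypothetical dict {window_start: [fragment, ...]}; each fragment
              is a list of FRAG_LEN (phi, psi) tuples taken from known
              structures whose local sequence resembles that window.
    rng:      anything with a .choice() method (the random module by default).
    """
    starts = [s for s, frags in library.items() if frags]
    if not starts:
        raise ValueError("fragment library is empty")
    start = rng.choice(starts)
    fragment = rng.choice(library[start])
    if len(fragment) != FRAG_LEN or start + FRAG_LEN > len(torsions):
        raise ValueError("fragment does not fit the chain at this window")
    new = list(torsions)  # keep the original so the move can still be rejected
    new[start:start + FRAG_LEN] = fragment
    return new


# Usage: one made-up, roughly helical fragment for the window starting at residue 5.
extended = [(-150.0, 150.0)] * 20
helix = [(-60.0, -45.0)] * FRAG_LEN
library = {5: [helix]}
moved = insert_fragment(extended, library)
print(moved[4], moved[5], moved[13], moved[14])
```

A single call changes nine consecutive residues at once, which is why this kind of move can switch between whole local structures in one Monte Carlo step.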
This fragment-assembly strategy resolves some of the typical problems with both the conformational search and the free energy function. The search is greatly accelerated, because switching between different possible local structures can occur in a single Monte Carlo step, and fewer demands are placed on the free energy function, because local interactions are already accounted for in the fragment libraries.

In the most simplified models, entire segments of contiguous regular secondary structure are represented as rigid bodies, allowing freedom only at the junctions (Eyrich, Standley, and Friesner, 1999). Such methods search over probable arrangements of the elements, thus significantly reducing the size of the conformational search. However, such representations lack the detail needed for more subtle features such as strand twist and do not handle packing well.

[Figure 27.1 panels, each a native/prediction pair: Secreted frizzled protein 3 (1IJX); PPase (1I74); MutS (1EWQ); Ribosome Binding Factor A (1KKG); Hypothetical Protein HI0442 (1J8B); ERp29 C-terminal domain (1G7D)]

Figure 27.1. Examples of ROSETTA structure predictions from CASP4 (see Chapter 24). Native/prediction pairs are shown left-to-right, except for 1J8B and 1IJX, which are displayed as a superposition of native and predicted structures. Values indicate Cα root-mean-square (rms) deviations between native and predicted structures, in angstroms. Colors represent position along the chain from blue (N terminus) to red (C terminus). Figure also appears in Color Figure section.

An alternative model with a long history is the lattice representation, in which residues are restricted to points on a regular three-dimensional lattice, with residues proximal in sequence occupying adjacent lattice points (Skolnick and Kolinski, 1991; Hinds and Levitt, 1994; Dill et al., 1995; Ishikawa, Yue, and Dill, 1999).
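A lattice representation is easy to sketch. The toy example below is not any of the cited methods. It places residues on a three-dimensional cubic lattice, checks that sequence neighbours occupy adjacent lattice points without two residues sharing a point, and scores the conformation with a hydrophobic–polar (HP) contact energy. The HP rule used here (−1 for each non-bonded pair of H residues on adjacent points) is an assumption borrowed from the simple exact models reviewed by Dill et al. (1995), not a potential described in this chapter.

```python
from itertools import combinations


def lattice_distance(a, b):
    """Number of unit steps between two integer lattice points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_valid_walk(coords):
    """True if consecutive residues are lattice neighbours and no site is reused."""
    if len(set(coords)) != len(coords):
        return False  # two residues occupy the same lattice point
    return all(lattice_distance(a, b) == 1 for a, b in zip(coords, coords[1:]))


def hp_energy(sequence, coords):
    """-1 for every pair of H residues that touch but are not chain neighbours."""
    if not is_valid_walk(coords):
        raise ValueError("not a self-avoiding lattice walk")
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # bonded neighbours are always adjacent; do not count them
        if sequence[i] == sequence[j] == "H" and lattice_distance(coords[i], coords[j]) == 1:
            energy -= 1
    return energy


# A four-residue U shape on the lattice: residues 0 and 3 end up in contact.
seq = "HPPH"
conf = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy(seq, conf))  # -1
```

Because every residue sits on an integer grid point, enumerating or perturbing conformations is cheap, which is the source of the speed; the price is that bond angles and torsions can only take the few values the grid allows.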
Lattice methods allow very fast sampling of conformational space, but are limited in their ability to represent some of the finer details of backbone conformations (Reva et al., 1996).

POTENTIAL FUNCTIONS

There are two categories of potentials that may be employed in evaluating the free energy of the peptide chain and the surrounding solvent. Molecular mechanics potentials seek to model the forces that determine protein conformation using physically based functional forms parameterized from small-molecule data or in vacuo quantum mechanical (QM) calculations. For example, van der Waals interactions are usually represented using a standard 6–12 (Lennard-Jones) potential with parameters derived from simple liquids, whereas electrostatic interactions are modeled using Coulomb's law with partial charges derived from QM calculations on peptide substructures or from chemical intuition. In contrast, protein structure-derived potentials or scoring functions are derived empirically from experimental structures in the PDB (Sippl, 1995; Koppensteiner and Sippl, 1998). Usually a functional form is not specified; instead, pseudoenergies are obtained by taking the logarithm of probability distribution functions. Such structure-derived potentials are particularly useful in conjunction with reduced-complexity models, where they may be viewed as representing the interactions between, for example, side-chain centroids after averaging over all plausible positions of the atoms not represented (Kocher, Rooman, and Wodak, 1994). Such potentials are also useful in treating aspects of protein thermodynamics that are not completely understood, particularly the hydrophobic effect.

Both classes of potentials must represent the forces that determine macromolecular conformation: solvation, electrostatic interactions including hydrogen bonds and ion pairs, van der Waals interactions, and, in certain cases, covalent bonds (Park, Huang, and Levitt, 1997).
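The two families of potentials can be contrasted in a few lines of Python. This sketch assumes a Lennard-Jones 6–12 form for van der Waals interactions and Coulomb's law for electrostatics (the molecular mechanics style), and an inverse-Boltzmann log ratio of observed to reference probabilities for a structure-derived pseudoenergy. The parameter values, the dielectric constant, the reference state, and the example probabilities are illustrative placeholders, not values from any published force field or statistical potential.

```python
import math

COULOMB_K = 332.0637  # kcal*Angstrom/(mol*e^2): converts q1*q2/r into kcal/mol


def lennard_jones(r, epsilon, sigma):
    """Standard 6-12 potential: repulsive r^-12 term, attractive r^-6 term."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb's law with partial charges q1, q2 (in units of e), r in Angstroms."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


def pseudoenergy(p_observed, p_reference, kT=0.592):
    """Structure-derived term E = -kT ln(P_obs / P_ref).

    kT defaults to about 0.592 kcal/mol (roughly 298 K). Features seen in
    known structures more often than the reference state predicts get E < 0.
    """
    return -kT * math.log(p_observed / p_reference)


# Illustrative numbers only (not taken from any real force field or database).
print(lennard_jones(4.0, epsilon=0.2, sigma=3.5))
print(coulomb(3.0, q1=0.4, q2=-0.4, dielectric=4.0))
print(pseudoenergy(p_observed=0.12, p_reference=0.05))
```

The first two functions have a fixed physical form and only their parameters are fitted; the third has no physical form at all and simply converts counts from known structures into an energy-like score, which is why it suits reduced representations such as side-chain centroids.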
Both classes of potentials must also be applicable at a granularity in keeping with that of the selected representation and the target resolution of the method.

SEARCH METHODS

In searching, as in selecting the appropriate level of detail in the representation and in the potential, one must choose the granularity of the search based on the resolution desired from the method. Molecular dynamics directly integrates Newton's equations of motion to derive the motion of a molecule in a given potential. However, the very small step size required for numerical stability makes molecular dynamics with a full-atom representation of protein and solvent impractical for de novo generation of low-resolution models.

To accelerate conformational searching, one must employ techniques that permit coarse sampling of the energy landscape. A variety of methods may be used in conjunction with reduced-complexity models and simplified potentials to perform broad searches through low-resolution structures, including Metropolis Monte Carlo simulated annealing (Simons et al., 1997), simulated tempering (Hansmann and Okamoto, 1997), evolutionary algorithms (Bowie and Eisenberg, 1994), and genetic algorithms (Pedersen and Moult, 1997). Individual moves in these procedures can involve quite large perturbations, allowing much more rapid (and coarser) sampling of conformational space in a relatively short time. For example, simple torsion-space Monte Carlo procedures change the backbone torsion angles of one or a small number of residues by several degrees; because the rest of the chain pivots around the changed bonds, this can produce quite large changes in the Cartesian coordinates of the protein.
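A Metropolis Monte Carlo simulated-annealing loop of the general kind cited above fits in a few lines. In this minimal sketch the energy function and the move generator are placeholders supplied by the caller (for example a torsion-angle perturbation or a fragment insertion), and the linear cooling schedule and temperature values are arbitrary assumptions rather than settings from any cited method.

```python
import math
import random


def simulated_annealing(state, energy, propose, t_start=2.0, t_end=0.05,
                        n_steps=5000, seed=0):
    """Metropolis Monte Carlo with a linearly decreasing temperature.

    energy(state) -> float and propose(state, rng) -> new_state are supplied
    by the caller. Returns the lowest-energy state seen and its energy.
    """
    rng = random.Random(seed)
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    for step in range(n_steps):
        t = t_start + (t_end - t_start) * step / max(n_steps - 1, 1)
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Metropolis criterion: always accept downhill moves; accept uphill
        # moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


# Toy example: a single "torsion" x (degrees) in a rugged one-dimensional landscape.
def toy_energy(x):
    return math.cos(math.radians(3 * x)) + 0.0005 * (x - 60.0) ** 2


def toy_move(x, rng):
    return x + rng.uniform(-30.0, 30.0)  # perturb by up to 30 degrees


best_x, best_e = simulated_annealing(0.0, toy_energy, toy_move)
print(best_x, best_e)
```

At high temperature the walker can climb out of local minima; as the temperature drops it settles into a low-energy basin. Running many such searches from different seeds gives the ensemble of candidate structures discussed next.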
Fragment insertion-based procedures (see above) can speed sampling by allowing jumps between different local structures in a single step.

A single search is unlikely to find the global minimum of the free energy landscape, and may instead yield a structure that has become trapped in a local minimum. To correct for this possibility, many current methods perform numerous conformational searches, generating an ensemble of candidate structures. Numerous techniques have been used to select from the ensemble those structures most likely to be close to the native (Park and Levitt, 1996; Huang et al., 1996; Samudrala and Moult, 1998), and future insights into features of native protein structures and properties of near-native ensembles will undoubtedly add to the arsenal of methods for selecting the most nativelike structures. Ultimately, improvements in potential functions may make identification of the most accurate models a straightforward procedure of selecting the conformations with the lowest free energy (Vorobjev, Almagro, and Hermans, 1998; Lazaridis and Karplus, 1999; Rapp and Friesner, 1999; Petrey and Honig, 2000; Lee et al., 2001). It is possible that improved energy functions for discrimination will ultimately involve a fusion of molecular mechanics-based and protein database-derived potentials.

APPLICATIONS

Genome functional annotation and structural genomics initiatives are two areas of research where ab initio protein structure prediction could make important contributions.

Genome Annotation

While genome annotation has traditionally been accomplished using sequence-similarity search tools, many factors reduce the ability of sequence homology to identify distant homologs (Russell and Ponting, 1998). Domain insertions, circular permutations, exchange of secondary structure elements, and genetic drift all contribute to the divergence of functionally related proteins over time. Thus, the annotation of open reading frames lacking detectable sequence homology to proteins of known function is a promising application for ab initio models. Low-resolution ab initio predicted structures may be able to reveal structural and functional relationships between proteins that are not apparent from sequence similarity alone. This concept is well illustrated by some examples of predictions from CASP4. In the first examples (Figs. 27.2a and 27.2b), the predicted structures were each found to be structurally related to a protein with a similar function but no significant sequence similarity. In the second example (Fig. 27.3), functionally important residues were found clustered in the predicted structures. In both cases, some of the most important insights into these proteins' function could have been obtained from the predicted structures alone.

Structural similarities like these may be detected using several different methods. First, predicted structures may be compared against the PDB using a general structure–structure comparison tool (Chapter 16). Recent experiments have found significant matches of ab initio predictions to structural homologs of the native structures for a variety of sequences, suggesting that current techniques may be sufficient to detect evolutionarily distant functional homologies in this manner (Simons, Strauss, and Baker, 2001; Bonneau et al., 2002; see also Chapter 20).

Second, ab initio structures could be probed for the presence of residues adopting conserved geometric motifs (e.g., serine protease catalytic triads). While this approach has been applied to ab initio models with some success (Fetrow and Skolnick, 1998a; Fetrow et al., 1998b), it remains unclear how best to apply the technique to low-resolution structures.
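Probing a model for a conserved geometric arrangement of residues can be sketched as a pairwise-distance comparison. The snippet below is a toy illustration, not the method of Fetrow and Skolnick. It assumes each residue is reduced to a single point (for example its Cα), describes a hypothetical three-residue template only by its three inter-residue distances, and reports ordered residue triples whose distances all match within a tolerance. The residue labels and template distances are made up for the example.

```python
import math
from itertools import permutations


def find_motif(residues, template, tolerance=1.0):
    """Return ordered residue triples whose pairwise distances match a template.

    residues:  dict {residue_id: (x, y, z)}, one point per residue.
    template:  hypothetical target distances (d_ab, d_bc, d_ac) in Angstroms.
    tolerance: allowed deviation per distance; looser values accommodate
               lower-resolution models but admit more false positives.
    """
    d_ab, d_bc, d_ac = template
    hits = []
    for a, b, c in permutations(residues, 3):
        pa, pb, pc = residues[a], residues[b], residues[c]
        if (abs(math.dist(pa, pb) - d_ab) <= tolerance
                and abs(math.dist(pb, pc) - d_bc) <= tolerance
                and abs(math.dist(pa, pc) - d_ac) <= tolerance):
            hits.append((a, b, c))
    return hits


# Made-up model coordinates; only S10, H40, D70 form a 6/8/10 Angstrom triangle.
model = {"S10": (0.0, 0.0, 0.0), "H40": (6.0, 0.0, 0.0),
         "D70": (6.0, 8.0, 0.0), "G90": (30.0, 0.0, 0.0)}
print(find_motif(model, template=(6.0, 8.0, 10.0), tolerance=0.5))
```

The tolerance is the crux: a model with several angstroms of error needs a loose tolerance to find the true site, and a loose tolerance also matches many unrelated residue triples.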
In particular, some question remains as to how ambiguous (that is, how loosely defined) structural motifs must be in order to detect homologies in low-resolution models.

Figure 27.2. Potential of ab initio predictions to detect distant protein homologies. (a) The native structure of bacterial-lysis protein Bacteriocin AS-48 (left, PDB id 1E68) is compared to the best ROSETTA prediction for the structure (center) and to the native structure of NK-Lysin (right, PDB id 1NKL), a functionally similar protein. (b) The native structure of domain 2 of the DNA mismatch repair protein MutS (left, PDB id 1EWQ) is compared to the best ROSETTA prediction for the domain (center) and to a domain from the native structure of the Tn5 transposase inhibitor (right, PDB id 1B7E). In both (a) and (b) the ab initio models were of sufficient quality to detect these functional homologs by the similarity of the folds, in the absence of significant sequence similarity. Figure also appears in Color Figure section.

Third, predicted structures could be used to improve the sensitivity and reliability of matches to sequence-based motif libraries, such as the PROSITE database (Bucher and Bairoch, 1994). Previous work has shown that weak matches to functional motif patterns may be filtered effectively by requiring similarity between the structures of pattern matches and the known structural environments of particular motifs (Jonassen et al., 2000). It therefore seems possible that ab initio models could provide this structural information when high-resolution structures are unavailable.

Structural Genomics Initiatives

Structural genomics initiatives present a second opportunity for the application of ab initio methods, in several ways. First, ab initio structure prediction can help guide target selection by focusing experimental structure determination on those proteins likely to adopt novel folds or to be of particular biological importance.

Figure 27.3. An example of active-site conservation in ab initio models. The ROSETTA predicted structure of domain 1 from an inorganic pyrophosphatase from Streptococcus mutans is compared to the corresponding domain in the native structure (PDB id 1I74). Strongly conserved active-site residues are rendered as spheres along the backbone. Note the similar relative orientation of these residues in the native and predicted structures, implying that ab initio models may be sufficient to detect functional homologies using methods that search for functionally significant residue arrangements. Figure also appears in Color Figure section.

Second, although homology modeling methods have been applied on a genomic scale (Sanchez and Sali, 1998, 1999), these approaches are inherently limited by their need for at least one homolog of known structure with good coverage and sufficient sequence similarity to be structurally equivalent (Marti-Renom et al., 2000; see also Chapter 25). Homologs of this quality are not always available, and therefore homology methods tend to leave significant fractions of both sequences and genomes improperly modeled. Ab initio techniques do not face this limitation, and thus may be a valuable adjunct to homology methods, filling in structural gaps and producing much more complete sets of models than could be obtained by either technique alone.

Third, even small amounts of experimental data can dramatically improve the quality and reliability of ab initio structure prediction through the application of spatial constraints. For example, the Rosetta method can produce moderate- to high-resolution structures when combined with limited NMR constraints (Standley et al., 1999; Bowers, Strauss, and Baker, 2000; Rohl and Baker, 2002).
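One common way to apply sparse experimental data as spatial constraints is to add a penalty to the energy that is zero while a restrained distance lies within its experimental bounds and grows harmonically outside them. The sketch below assumes that flat-bottom form, hypothetical atom labels and bounds, and an arbitrary force constant and weight; it is not the specific restraint function used by Rosetta or the other cited methods.

```python
import math


def restraint_penalty(coords, restraints, k=1.0):
    """Flat-bottom harmonic penalty for distance restraints.

    coords:     dict {atom_id: (x, y, z)}.
    restraints: list of (atom_i, atom_j, lower, upper) bounds in Angstroms,
                e.g. from NOEs or cross-links (hypothetical values here).
    k:          force constant (arbitrary units; an assumption).
    """
    total = 0.0
    for i, j, lower, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d < lower:
            total += k * (lower - d) ** 2
        elif d > upper:
            total += k * (d - upper) ** 2
    return total


def restrained_energy(base_energy, coords, restraints, weight=1.0):
    """Total score = model energy + weighted restraint violations."""
    return base_energy + weight * restraint_penalty(coords, restraints)


# One hypothetical restraint: atoms A and B should be 1.8-5.0 Angstroms apart.
coords = {"A": (0.0, 0.0, 0.0), "B": (7.0, 0.0, 0.0)}
restraints = [("A", "B", 1.8, 5.0)]
print(restraint_penalty(coords, restraints))                   # 4.0
print(restrained_energy(-10.0, coords, restraints, weight=0.5))  # -8.0
```

Because the penalty is zero inside the bounds, the restraints only steer the search away from conformations that contradict the data and leave the rest of the scoring to the energy function.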
In addition, other sources of experimental data, such as chemical cross-linking experiments, could be used, allowing rapid structure determination for proteins not readily amenable to X-ray or NMR analysis (e.g., membrane-bound proteins). Ab initio structure prediction may therefore be useful for increasing the speed of structure determination, which is particularly important for structural genomics.

FUTURE WORK

What are the prospects for improvement in ab initio protein structure prediction methods? Improvement in potential functions should permit the generation of more precise and accurate structures. All-atom potentials in particular seem promising for the refinement of low-resolution models. Additionally, more detailed structures may require better fine-grained search strategies. Even for coarse models, the sampling rate of protein conformational space has been a limitation, as demonstrated by the tendency of ab initio models to adopt low contact order conformations (Plaxco, Simons, and Baker, 1998). Correcting for this contact order bias through focused sampling of higher-order conformations will require significantly more computational resources, but is likely to improve the prediction of larger, more complicated proteins. Ideally, the development of search strategies that do not suffer from this local-contact bias would provide a boost to ab initio methods.

Ab initio protein structure prediction has traditionally been an area of primarily academic interest, with only slow progress. Recently, however, there have been significant advances in the field. There is hope that ab initio methods will continue to improve, and that this improvement will provide both fundamental insights into the physics underlying protein folding and a valuable, practical resource for genome analysis.

FURTHER READING

CASP3 (1999): Results from the Critical Assessment of Techniques for Protein Structure Prediction. Proteins 37(S3):149–208.
CASP4 (Forthcoming): Results from the Critical Assessment of Techniques for Protein Structure Prediction. Proteins 45(S5):98–162.
Chothia C (1984): Principles that determine the structure of proteins. Ann Rev Biochem 53:537–72.
Kabsch W, Sander C (1984): On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations. Proc Natl Acad Sci USA 81:1075–8.
Lazaridis T, Karplus M (2000): Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10:139–45.
Simons KT, Strauss C, Baker D (2001): Prospects for ab initio protein structural genomics. J Mol Biol 306:1191–9.
Sippl MJ (1995): Knowledge-based potentials for proteins. Curr Opin Struct Biol 5:229–35.
Wallace AC, Borkakoti N, Thornton JM (1997): TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci 6:2308–23.

REFERENCES

Anfinsen CB (1973): Principles that govern the folding of protein chains. Science 181:223–30.
Anfinsen CB, Haber E, Sela M, White FH Jr (1961): The kinetics of the formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc Natl Acad Sci USA 47:1309–14.
Baker D, Agard DA (1994): Kinetics versus thermodynamics in protein folding. Biochemistry 33:7505–9.
Blanco FJ, Rivas G, Serrano L (1994): A short linear peptide that folds into a native stable beta-hairpin in aqueous solution. Nat Struct Biol 1:584–90.
Bonneau R, Strauss CE, Rohl CA, Chivian D, Bradley P, Malmström L, Robertson T, Baker D (2002): De novo prediction of three-dimensional structures for major protein families. J Mol Biol 322:65–78.
Bowers PM, Strauss CE, Baker D (2000): De novo protein structure determination using sparse NMR data. J Biomol NMR 18:311–8.
Bowie JU, Eisenberg D (1994): An evolutionary approach to folding small alpha-helical proteins that uses sequence information and an empirical guiding fitness function. Proc Natl Acad Sci USA 91:4436–40.
Bucher P, Bairoch A (1994): A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. Proc Int Conf Intell Syst Mol Biol 2:53–61.
Callihan DE, Logan TM (1999): Conformations of peptide fragments from the FK506 binding protein: comparison with the native and urea-unfolded states. J Mol Biol 285:2161–75.
Dann CE, Hsieh JC, Rattner A, Sharma D, Nathans J, Leahy DJ (2001): Insights into Wnt binding and signaling from the structures of two frizzled cysteine-rich domains. Nature 12:86–90.
Davies DR, Braem LM, Reznikoff WS, Rayment I (1999): The three-dimensional structure of a Tn5 transposase-related protein determined to 2.9-Å resolution. J Biol Chem 274:11904–13.
Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS (1995): Principles of protein folding--a perspective from simple exact models. Protein Sci 4:561–602.
Dunbrack RL Jr, Karplus M (1994): Conformational analysis of the backbone-dependent rotamer preferences of protein sidechains. Nat Struct Biol 1:334–40.
Eyrich VA, Standley DM, Friesner RA (1999): Prediction of protein tertiary structure to low resolution: performance for a large and structurally diverse test set. J Mol Biol 288:725–42.
Fetrow JS, Skolnick J (1998a): Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. J Mol Biol 281:949–68.
Fetrow JS, Godzik A, Skolnick J (1998b): Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J Mol Biol 282:703–11.
Gonzalez C, Langdon G, Bruix M, Galvez A, Valdivia E, Maqueda M, Rico M (2000): Bacteriocin AS-48, a microbial cyclic polypeptide structurally and functionally related to mammalian NK-lysin. Proc Natl Acad Sci USA 97:11221–6.
Hansmann UH, Okamoto Y (1997): Numerical comparisons of three recently proposed algorithms in the protein folding problem. J Comput Chem 18:920–33.
Hinds DA, Levitt M (1994): Exploring conformational space with a simple lattice model for protein structure. J Mol Biol 243:668–82.
Huang ES, Subbiah S, Tsai J, Levitt M (1996): Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations. J Mol Biol 257:716–25.
Huang YJ, Swapna GV, Shukla K, Ke H, Xia B, Inouye M, Montelione GT (Forthcoming).
Ishikawa K, Yue K, Dill KA (1999): Predicting the structures of 18 peptides using Geocore. Protein Sci 8:716–21.
Jonassen I, Eidhammer I, Grindhaug SH, Taylor WR (2000): Searching the protein structure databank with weak sequence patterns and structural constraints. J Mol Biol 304:599–619.
Jones DT (1997): Successful ab initio prediction of the tertiary structure of NK-lysin using multiple sequences and recognized supersecondary structural motifs. Proteins 29(S1):185–91.
Kocher JP, Rooman MJ, Wodak SJ (1994): Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches. J Mol Biol 235:1598–613.
Koppensteiner WA, Sippl MJ (1998): Knowledge-based potentials--back to the roots. Biochemistry (Mosc) 63:247–52.
Lazaridis T, Karplus M (1999): Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J Mol Biol 288:477–87.
Lee MR, Tsai J, Baker D, Kollman PA (2001): Molecular dynamics in the endgame of protein structure prediction. J Mol Biol 313:417–30.
Lesk AM, Lo Conte L, Hubbard T (2001): Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins 45(S5):98–118.
Liepinsh E, Andersson M, Ruysschaert JM, Otting G (1997): Saposin fold revealed by the NMR structure of NK-lysin. Nat Struct Biol 4:793–5.
Liepinsh E, Barishev M, Shapiro A, Ingelman-Sundberg M, Otting G, Mkrtchian S (2001): Thioredoxin fold as a homodimerization module in the putative chaperone ERp29: NMR structures of the domains and experimental model of the 51 kDa dimer. Structure 9:457–71.
Lim K, Tempczyk A, Toedt J, Parsons J, Howard A, Eisenstein E, Herzberg O (Forthcoming).
Marqusee S, Robbins VH, Baldwin RL (1989): Unusually stable helix formation in short alanine-based peptides. Proc Natl Acad Sci USA 86:5286–90.
Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A (2000): Comparative protein structure modeling of genes and genomes. Ann Rev Biophys Biomol Struct 29:291–325.
Merckel MC, Fabrichniy IP, Salminen A, Kalkkinen N, Baykov AA, Lahti R, Goldman A (2001): Crystal structure of Streptococcus mutans pyrophosphatase: a new fold for an old mechanism. Structure 9:289–97.
Murzin AG (2001): Progress in protein structure prediction. Nat Struct Biol 8:110–2.
Obmolova G, Ban C, Hsieh P, Yang W (2000): Crystal structures of mismatch repair protein MutS and its complex with a substrate DNA. Nature 407:703–10.
Ortiz AR, Kolinski A, Rotkiewicz P, Ilkowski B, Skolnick J (1999): Ab initio folding of proteins using restraints derived from evolutionary information. Proteins 37(S3):177–85.
Park BH, Levitt M (1995): The complexity and accuracy of discrete state models of protein structure. J Mol Biol 249:493–507.
Park B, Levitt M (1996): Energy functions that discriminate X-ray and near native folds from well-constructed decoys. J Mol Biol 258:367–92.
Park BH, Huang ES, Levitt M (1997): Factors affecting the ability of energy functions to discriminate correct from incorrect folds. J Mol Biol 266:831–46.
Pedersen JT, Moult J (1997): Protein folding simulations with genetic algorithms and a detailed molecular description. J Mol Biol 269:240–59.
Petrey D, Honig B (2000): Free energy determinants of tertiary structure and the evaluation of protein models. Protein Sci 9:2181–91.
Pillardy J, Czaplewski C, Liwo A, Lee J, Ripoll DR, Kazmierkiewicz R, Oldziej S, Wedemeyer WJ, Gibson KD, Arnautova YA, Saunders J, Ye YJ, Scheraga HA (2001): Recent improvements in prediction of protein structure by global optimization of a potential energy function. Proc Natl Acad Sci USA 98:2329–33.
Plaxco KW, Simons KT, Baker D (1998): Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol 277:985–94.
Rapp CS, Friesner RA (1999): Prediction of loop geometries using a generalized born model of solvation effects. Proteins 35:173–83.
Reva BA, Finkelstein AV, Sanner MF, Olson AJ (1996): Adjusting potential energy functions for lattice models of chain molecules. Proteins 25:379–88.
Rohl CA, Baker D (2002): De novo determination of protein backbone structure from residual dipolar couplings using Rosetta. J Am Chem Soc 124:2723–9.
Russell RB, Ponting CP (1998): Protein fold irregularities that hinder sequence analysis. Curr Opin Struct Biol 8:364–71.
Samudrala R, Moult J (1998): An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J Mol Biol 275:895–916.
Samudrala R, Xia Y, Huang E, Levitt M (1999): Ab initio protein structure prediction using a combined hierarchical approach. Proteins 37(S3):194–8.
Sanchez R, Sali A (1998): Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc Natl Acad Sci USA 95:13597–602.
Sanchez R, Sali A (1999): Comparative protein structure modeling in genomics. J Comput Phys 151:388–401.
Simons KT, Kooperberg C, Huang E, Baker D (1997): Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:209–25.
Simons KT, Strauss C, Baker D (2001): Prospects for ab initio protein structural genomics. J Mol Biol 306:1191–9.
Sippl MJ (1995): Knowledge-based potentials for proteins. Curr Opin Struct Biol 5:229–35.
Sippl MJ, Hendlich M, Lackner P (1992): Assembly of polypeptide and protein backbone conformations from low energy ensembles of short fragments: development of strategies and construction of models for myoglobin, lysozyme, and thymosin beta 4. Protein Sci 1:625–40.
Skolnick J, Kolinski A (1991): Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J Mol Biol 221:499–531.
Standley DM, Eyrich VA, Felts AK, Friesner RA, McDermott AE (1999): A branch and bound algorithm for protein structure refinement from sparse NMR data sets. J Mol Biol 285:1691–710.
Vorobjev YN, Almagro JC, Hermans J (1998): Discrimination between native and intentionally misfolded conformations of proteins: ES/IS, a new method for calculating conformational free energy that uses both dynamics simulations with an explicit solvent and an implicit solvent continuum model. Proteins 32:399–413.

Wednesday, August 26, 2009

Basic Local Alignment Search Tool

INTRODUCTION

The BLAST algorithm was developed as a way to perform DNA and protein sequence similarity searches with an algorithm that is faster than FASTA but considered to be equally sensitive. Both methods are heuristic: they almost always find related sequences in a database search, but unlike the dynamic programming algorithm they do not guarantee an optimal solution. FASTA finds short common patterns in the query and database sequences and joins these into an alignment. BLAST is similar to FASTA, but gains a further increase in speed by searching only for rarer, more significant patterns in nucleic acid and protein sequences.

BLAST is very popular because it is available on the World Wide Web through a large server at the National Center for Biotechnology Information (NCBI) and at many other sites. The BLAST algorithm has evolved into a set of very powerful search tools for molecular biologists that are freely available to run on many computer platforms. This article is intended to be a "user's guide" to the principles underlying BLAST.
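The speed of BLAST-style searching comes from seeding: instead of aligning everything, the program first looks for short "words" shared by the query and a database sequence and only extends alignments around those hits. The Python sketch below shows the idea for DNA, using exact word matches of length w and an ungapped extension (to the right only, for brevity) that stops once the running score falls too far below its best value. The word length, match/mismatch scores, and drop-off value are illustrative assumptions; real BLAST also uses scored neighbourhood words for proteins, substitution matrices, gapped extension, and E-value statistics, none of which is shown here.

```python
def word_hits(query, subject, w=4):
    """Find (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_hit(query, subject, i, j, w, match=1, mismatch=-2, drop=3):
    """Ungapped extension to the right of a word hit, with an X-drop style stop.

    Returns (score, length) of the best-scoring extension found.
    """
    score = best = w * match  # the seed word itself
    length = best_len = w
    qi, sj = i + w, j + w
    while qi < len(query) and sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > drop:
            break  # score fell too far below the best seen: give up
        qi += 1
        sj += 1
    return best, best_len


query = "ACGTTGCA"
subject = "GGACGTTGCATT"
hits = word_hits(query, subject, w=4)
print(hits)                                   # [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
print(extend_hit(query, subject, 0, 2, w=4))  # (8, 8)
```

Only positions that share a word are ever examined in detail, so most of the database is skipped; choosing a longer word (or, for proteins, a higher neighbourhood threshold) trades sensitivity for more speed.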

Databases and tools for modeling

Table 1: Common uses of comparative protein structure models. A list of our papers using MODELLER to address practical problems in collaboration with experimentalists can be obtained at URL http://guitar.rockefeller.edu/publications/ref/ref.html.

Designing (site-directed) mutants to test hypotheses about function

Identifying active and binding sites

Searching for ligands of a given binding site

Designing and improving ligands of a given binding site

Modeling substrate specificity

Predicting antigenic epitopes

Protein-protein docking simulations

Inferring function from calculated electrostatic potential around the protein

Molecular replacement in X-ray structure refinement

Refining models against NMR dipolar coupling data

Testing a given sequence-structure alignment

Rationalizing known experimental observations

Planning new experiments

Table 2: Web sites useful for comparative modeling.

Databases
NCBI	www.ncbi.nlm.nih.gov/
PDB	www.rcsb.org/
MSD	www.rcsb.org/databases.html
CATH	www.biochem.ucl.ac.uk/bsm/cath/
TrEMBL	srs.ebi.ac.uk/
SCOP	scop.mrc-lmb.cam.ac.uk/scop/
PRESAGE	presage.stanford.edu
MODBASE	guitar.rockefeller.edu/modbase/
GeneCensus	bioinfo.mbb.yale.edu/genome
GenBank	www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html
PSI	www.structuralgenomics.org
Template search, fold assignment
PDB-Blast	bioinformatics.burnham-inst.org/pdb_blast
BLAST	www.ncbi.nlm.nih.gov/BLAST/
FastA	www.dna.affrc.go.jp/htdocs/Blast/fasta.html
DALI	www2.ebi.ac.uk/dali/
PhD, TOPITS	www.embl-heidelberg.de/predictprotein/predictprotein.html
THREADER	insulin.brunel.ac.uk/
123D	genomic.sanger.ac.uk/123D/run123D.html
UCLA-DOE	www.doe-mbi.ucla.edu/people/frsvr/frsvr.html
PROFIT	lore.came.sbg.ac.at/
MATCHMAKER	www.tripos.com/software/mm.html
3D-PSSM	www.bmm.icnet.uk/~3dpssm/html/ffrecog.html
BIOINBGU	www.cs.bgu.ac.il/~bioinbgu/
FUGUE	www-cryst.bioc.cam.ac.uk/~fugue
LOOPP	ser-loopp.tc.cornell.edu/loopp.html
FFAS	bioinformatics.burnham-inst.org/FFAS/index.html
SAM-T99/T98	www.cse.ucsc.edu/research/compbio/sam.html

Comparative modeling
3D-JIGSAW	www.bmm.icnet.uk/servers/3djigsaw/
CPH-Models	www.cbs.dtu.dk/services/CPHmodels/
COMPOSER	www-cryst.bioc.cam.ac.uk/
FAMS	physchem.pharm.kitasato-u.ac.jp/FAMS/fams.html
MODELLER	guitar.rockefeller.edu/modeller/modeller.html
PrISM	honiglab.cpmc.columbia.edu/
SWISS-MODEL	www.expasy.ch/swissmod/SWISS-MODEL.html
SDSC1	cl.sdsc.edu/hm.html
WHAT IF	www.cmbi.kun.nl/bioinf/predictprotein/
ICM	www.molsoft.com/
SCWRL	www.fccc.edu/research/labs/dunbrack/scwrl/
InsightII	www.accelrys.com
SYBYL	www.tripos.com
Model evaluation
PROCHECK	www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
WHATCHECK	www.cmbi.kun.nl/swift/whatcheck/
ProsaII	www.came.sbg.ac.at
BIOTECH	biotech.embl-ebi.ac.uk:8400/
VERIFY3D	www.doe-mbi.ucla.edu/Services/Verify_3D/
ERRAT	www.doe-mbi.ucla.edu/Services/Errat.html
ANOLEA	guitar.rockefeller.edu/~fmelo/anolea/anolea.html
AQUA	urchin.bmrb.wisc.edu/~jurgen/Aqua/server/
SQUID	www.yorvic.york.ac.uk/~oldfield/squid
PROVE	www.ucmb.ulb.ac.be/UCMB/PROVE/