BINC BioInformatics Syllabus – Basic
Major Bioinformatics Resources: NCBI, EBI, ExPASy, RCSB
The knowledge of various databases and bioinformatics tools available at these resources, organization of databases: data contents and formats, purpose and utility in Life Sciences
Open access bibliographic resources and literature databases:
Open access bibliographic resources related to Life Sciences viz., PubMed, BioMed Central, Public Library of Science (PLoS), CiteXplore
Sequence databases: Formats, querying and retrieval; Nucleic acid sequence databases: GenBank, EMBL, DDBJ; Protein sequence databases: UniProtKB: Swiss-Prot, TrEMBL, UniParc; Repositories for high throughput genomic sequences: EST, STS, GSS, etc.; Genome Databases at NCBI, EBI, TIGR, SANGER – Viral Genomes; Archaeal and Bacterial Genomes; Eukaryotic genomes with special reference to model organisms (Yeast, Drosophila, C. elegans, Rat, Mouse, Human, plants such as Arabidopsis thaliana, Rice, etc.)
Structure Database: PDB, NDB, PubChem, ChemBank
Derived Databases
Knowledge of the following databases with respect to: basic concept of derived databases, sources of primary data and basic principles of the method for deriving the secondary data, organization of data, contents and formats of database entries, identification of patterns in given sequences and interpretation of the same – Sequence: InterPro, Prosite, Pfam, ProDom; Structure: FSSP, DSSP
Extraction of knowledge from resources on Immunology, Plant, animal and infectious diseases: databases and servers published in the NAR Database and Web server Issues and other Bioinformatics journals viz. BMC Bioinformatics etc.
Sequence Analysis
Various file formats for bio-molecular sequences: GenBank, FASTA, GCG, MSF, etc.
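As an illustration of the FASTA format named above, the following is a minimal sketch of a reader, assuming the common convention that each record starts with a ">" header line followed by one or more sequence lines. The function name parse_fasta is hypothetical and the code is not taken from any particular toolkit.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Assumption: each record begins with a line starting with '>' and the
    sequence may be wrapped over several lines. Blank lines are skipped.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">seq1 test record\nACGT\nAC\n>seq2\nMKV\n"
for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Richer formats such as GenBank carry annotation fields and are usually read with an established library rather than a hand-written parser.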
Basic concepts of sequence similarity, identity and homology, definitions of homologues, orthologues, paralogues and xenologues
Scoring matrices: basic concept of a scoring matrix, matrices for nucleic acid and protein sequences, PAM and BLOSUM series, principles based on which these matrices are derived
Database Searches: Keyword-based Entrez and SRS; Sequence-based: BLAST & FASTA; Use of these methods for sequence analysis including the on-line use of the tools and interpretation of results from various sequence and structural as well as bibliographic databases
Pairwise sequence alignments: basic concepts of sequence alignment, Needleman and Wunsch, Smith and Waterman algorithms for pairwise alignments, gap penalties, use of pairwise alignments for analysis of Nucleic acid and protein sequences and interpretation of results
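The global alignment idea behind the Needleman and Wunsch algorithm can be sketched as a dynamic-programming table followed by a traceback. The example below is a minimal sketch, assuming a simple match/mismatch score and a linear gap penalty in place of a PAM or BLOSUM matrix. The function name and default scores are illustrative assumptions, not values prescribed by the syllabus.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring parameters are
    illustrative assumptions; real analyses normally use a substitution
    matrix and affine gap penalties.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the score table; row 0 and column 0 hold leading-gap scores.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


s, x, y = needleman_wunsch("ACGT", "AGT")
print(s)
print(x)
print(y)
```

The Smith and Waterman local alignment differs mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell rather than the corner.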
Multiple sequence alignments (MSA): the need for MSA, basic concepts of various approaches for MSA (e.g. progressive, hierarchical etc.). Algorithm of CLUSTALW and PileUp and their application for sequence analysis (including interpretation of results), concept of dendrogram and its interpretation
Sequence patterns and profiles: Basic concept and definition of sequence patterns, motifs and profiles, various types of pattern representations viz. consensus, regular expression (Prosite-type) and sequence profiles; profile-based database searches using PSI-BLAST, analysis and interpretation of profile-based searches
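A Prosite-type pattern can be read as a restricted regular expression. The sketch below converts the core Prosite syntax (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for terminus anchors) into a Python regular expression. The function name is hypothetical, and the conversion is an assumption-laden simplification: it does not handle anchors placed inside square brackets.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple Prosite-style pattern into a Python regex string.

    Simplifying assumption: anchors ('<', '>') appear only at the ends of
    the pattern, never inside a bracketed residue set.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        # {ABC} means "any residue except A, B or C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count.
        element = element.replace("(", "{").replace(")", "}")
        # Lower-case x is the wildcard position.
        element = element.replace("x", ".")
        # Terminus anchors.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


# N-glycosylation site pattern as written in Prosite notation.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)
match = re.search(regex, "AANGSAA")
print(match.start() if match else None)
```

A consensus string or a profile carries different information: a profile assigns position-specific scores rather than a yes/no match, which is what PSI-BLAST builds iteratively.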
Taxonomy and phylogeny: Basic concepts in systematics, taxonomy and phylogeny; molecular evolution; nature of data used in taxonomy and phylogeny; definition and description of phylogenetic trees and various types of trees
Protein and nucleic acid properties: Computation of various parameters using proteomics tools at the ExPASy server, GCG utilities and EMBOSS
Comparative genomics: Basic concepts and applications; whole-genome alignments and their significance, with Artemis as an example
Structural Biology
Proteins: Principles of protein structure; anatomy of proteins – Hierarchical organization of protein structure – Primary, Secondary, Super-secondary, Tertiary and Quaternary structure; Hydrophobicity of amino acids, Packing of protein structure, van der Waals and Solvent accessible surface, Internal coordinates of proteins; Derivation, significance and applications of Ramachandran Map, protein folding
DNA and RNA: types of base pairing – Watson-Crick and Hoogsteen; types of double helices A, B, Z and their geometrical as well as structural features; structural and geometrical parameters of each form and their comparison; various types of interactions of DNA with proteins, small molecules
RNA secondary and tertiary structures, tRNA tertiary structure
Carbohydrates: The various building blocks (monosaccharides), configurations and conformations of the building blocks; formations of polysaccharides and structural diversity due to the different types of linkages
Glyco-conjugates: various types of glycolipids and glycoproteins
Structure analysis and validation: PDB Goodies, Procheck, ProsaII, PDBsum
3-D structure visualization and simulation: Visualization of structures using Rasmol or SPDBV or CHIME or VMD
Basic concepts in molecular modeling: different types of computer representations of molecules. External coordinates and Internal Coordinates
Concepts of force fields: representations of atoms and atomic interactions, potential energy representation
Classification and comparison of protein 3D structures:
Purpose of 3-D structure comparison and concepts, Algorithms such as FSSP, CE, VAST and DALI, Fold Classes
Databases of structure-based classification; CATH and SCOP
Secondary structure prediction: Algorithms viz. Chou-Fasman, GOR methods; analysis of results and measuring the accuracy of predictions using Q3, Segment overlap, Matthews correlation coefficient
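Two of the accuracy measures listed above can be computed directly from a predicted and an observed three-state string. The following is a minimal sketch, assuming the usual encoding H (helix), E (strand) and C (coil); the function names are hypothetical. Segment overlap (SOV) is omitted because its definition is considerably more involved.

```python
import math


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def matthews_cc(predicted, observed, state):
    """Matthews correlation coefficient for one state (e.g. 'H') against the rest.

    Returns 0.0 when the denominator is zero, which is a common convention
    and an assumption of this sketch.
    """
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


observed = "HHHHEEEECC"
predicted = "HHHCEEEECC"
print(q3(predicted, observed))
print(round(matthews_cc(predicted, observed, "H"), 3))
```

Q3 treats all residues equally, so it can look high for a predictor that simply favours the most common state; the per-state correlation coefficient is less sensitive to that imbalance.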
Tertiary Structure prediction: Fundamentals of the methods for 3D structure prediction (sequence similarity/identity of target proteins to proteins of known structure, fundamental principles of protein folding etc.); Homology Modeling, fold recognition, threading approaches, and ab-initio structure prediction methods
Fundamentals of docking small and macromolecules to proteins and nucleic acids