BINC BioInformatics Syllabus – Basic

Major Bioinformatics Resources: NCBI, EBI, ExPASy, RCSB

The knowledge of various databases and bioinformatics tools available at these resources, organization of databases: data contents and formats, purpose and utility in Life Sciences

Open access bibliographic resources and literature databases:

Open access bibliographic resources related to Life Sciences viz., PubMed, BioMed Central, Public Library of Science (PLoS), CiteXplore

Sequence databases: Formats, querying and retrieval; Nucleic acid sequence databases: GenBank, EMBL, DDBJ; Protein sequence databases: UniProtKB: SWISS-PROT, TrEMBL, UniParc; Repositories for high throughput genomic sequences: EST, STS, GSS, etc.; Genome Databases at NCBI, EBI, TIGR, SANGER – Viral Genomes; Archaeal and Bacterial Genomes; Eukaryotic genomes with special reference to model organisms (Yeast, Drosophila, C. elegans, Rat, Mouse, Human, plants such as Arabidopsis thaliana, Rice, etc.)

Structure Databases: PDB, NDB, PubChem, ChemBank

Derived Databases

Knowledge of the following databases with respect to: basic concept of derived databases, sources of primary data and basic principles of the method for deriving the secondary data, organization of data, contents and formats of database entries, identification of patterns in given sequences and interpretation of the same – Sequence: InterPro, Prosite, Pfam, ProDom; Structure: FSSP, DSSP

Extraction of knowledge from resources on immunology and on plant, animal and infectious diseases: databases and servers published in the NAR Database and Web Server issues and in other Bioinformatics journals viz. BMC Bioinformatics etc.

Sequence Analysis

Various file formats for bio-molecular sequences: GenBank, FASTA, GCG, MSF etc
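As an illustration of the simplest of these formats, the following is a minimal sketch of a FASTA reader. The function name `parse_fasta` is hypothetical, and the sketch assumes plain multi-record FASTA text (a `>` header line followed by one or more sequence lines); it does not handle GenBank, GCG or MSF files.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, `parse_fasta(">seq1 demo\nACGT\nTTGA\n")` yields a single record whose sequence is the two lines joined together.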

Basic concepts of sequence similarity, identity and homology, definitions of homologues, orthologues, paralogues and xenologues

Scoring matrices: basic concept of a scoring matrix, matrices for nucleic acid and protein sequences, PAM and BLOSUM series, principles based on which these matrices are derived
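The log-odds principle behind such matrices can be sketched as follows. This is an illustrative simplification, not the full PAM or BLOSUM derivation: `log_odds_score` is a hypothetical helper, `q_ij` is assumed to be the observed probability of residue i aligned with residue j (as an ordered pair), `p_i` and `p_j` are assumed background frequencies, and the default scale of 2 corresponds to half-bit units.

```python
import math

def log_odds_score(q_ij, p_i, p_j, scale=2.0):
    """Rounded log-odds score: scale * log2(observed / expected by chance)."""
    expected = p_i * p_j  # chance of seeing the pair if residues were independent
    return round(scale * math.log2(q_ij / expected))

# A pair seen four times more often than chance gets a positive score,
# a pair seen four times less often than chance gets a negative one.
favoured = log_odds_score(0.25, 0.25, 0.25)
disfavoured = log_odds_score(0.015625, 0.25, 0.25)
```

Positive entries therefore mark substitutions observed more often than expected by chance, and negative entries mark substitutions observed less often.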

Database Searches: Keyword-based Entrez and SRS; Sequence-based: BLAST & FASTA; Use of these methods for sequence analysis including the on-line use of the tools and interpretation of results from various sequence and structural as well as bibliographic databases

Pairwise sequence alignments: basic concepts of sequence alignment, Needleman and Wunsch, Smith and Waterman algorithms for pairwise alignments, gap penalties, use of pairwise alignments for analysis of Nucleic acid and protein sequences and interpretation of results
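The difference between global (Needleman and Wunsch) and local (Smith and Waterman) alignment can be sketched with two short dynamic-programming functions. This is a minimal sketch under stated assumptions: the function names and the match/mismatch/gap values are illustrative, a simple linear gap penalty is used rather than an affine one, and only the optimal score is returned (no traceback of the alignment itself).

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty."""
    # First row: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: prefix of a against nothing
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                            prev[j] + gap,     # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]

def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman) with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at zero so an alignment can start anywhere.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

With these illustrative parameters, aligning `"ACGT"` with `"AGT"` globally requires one gap, while the local score of two sequences sharing only the segment `ACG` depends on that segment alone.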

Multiple sequence alignments (MSA): the need for MSA, basic concepts of various approaches for MSA (e.g. progressive, hierarchical etc.). Algorithm of CLUSTALW and PileUp and their application for sequence analysis (including interpretation of results), concept of dendrogram and its interpretation

Sequence patterns and profiles: Basic concept and definition of sequence patterns, motifs and profiles, various types of pattern representations viz. consensus, regular expression (Prosite-type) and sequence profiles; profile-based database searches using PSI-BLAST, analysis and interpretation of profile-based searches
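A Prosite-type pattern maps closely onto an ordinary regular expression, which the following sketch illustrates. The helper `prosite_to_regex` is hypothetical and covers only the common syntax elements (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors); rarer constructs are assumed not to occur in the input.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple Prosite-type pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # Prosite patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split the element into its core and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                   # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        repeat = ""
        if lo:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# The N-glycosylation site pattern as a regular expression.
n_glyc = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(n_glyc, "AANGSAA")
```

Here the pattern `N-{P}-[ST]-{P}.` reads as: Asn, then any residue except Pro, then Ser or Thr, then any residue except Pro.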

Taxonomy and phylogeny: Basic concepts in systematics, taxonomy and phylogeny; molecular evolution; nature of data used in taxonomy and phylogeny; definition and description of phylogenetic trees and various types of trees

Protein and nucleic acid properties: Computation of various parameters using proteomics tools at the ExPASy server, GCG utilities and EMBOSS

Comparative genomics: Basic concepts and applications, whole genome alignments: understanding significance. Artemis as an example

Structural Biology

Proteins: Principles of protein structure; anatomy of proteins – Hierarchical organization of protein structure – Primary, Secondary, Super secondary, Tertiary and Quaternary structure; Hydrophobicity of amino acids, Packing of protein structure, van der Waals and Solvent accessible surface, Internal coordinates of proteins; Derivation, significance and applications of Ramachandran Map, protein folding

DNA and RNA: types of base pairing – Watson-Crick and Hoogsteen; types of double helices A, B, Z and their geometrical as well as structural features; structural and geometrical parameters of each form and their comparison; various types of interactions of DNA with proteins, small molecules

RNA secondary and tertiary structures, tRNA tertiary structure

Carbohydrates: The various building blocks (monosaccharides), configurations and conformations of the building blocks; formations of polysaccharides and structural diversity due to the different types of linkages

Glyco-conjugates: various types of glycolipids and glycoproteins

Structure analysis and validation: PDB Goodies, Procheck, ProsaII, PDBsum

3-D structure visualization and simulation: Visualization of structures using RasMol or SPDBV or Chime or VMD

Basic concepts in molecular modeling: different types of computer representations of molecules. External coordinates and Internal Coordinates
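The relation between the two representations can be sketched by computing internal coordinates (bond length, bond angle, torsion angle) from external Cartesian coordinates. The function names are illustrative, atoms are assumed to be given as (x, y, z) tuples, and the torsion sign is intended to follow the usual convention in which a clockwise rotation of the far bond, viewed along the central bond, is positive.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_length(p1, p2):
    """Distance between two atoms."""
    return _norm(_sub(p2, p1))

def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 at the vertex."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    n = _norm(b1)
    b1 = tuple(x / n for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))
```

Applied to consecutive backbone atoms (C, N, CA, C for phi and N, CA, C, N for psi), a torsion function of this kind gives the two angles plotted on a Ramachandran map.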

Concepts of force fields: representations of atoms and atomic interactions, potential energy representation
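Two typical terms of a potential energy function can be sketched as below: a harmonic term for bond stretching and a Lennard-Jones 12-6 term for non-bonded van der Waals interactions. The function names and any parameter values passed to them are illustrative assumptions and are not taken from a particular force field.

```python
def harmonic_bond_energy(r, r0, k):
    """Bond-stretching energy: 0.5 * k * (r - r0)^2, minimal at r = r0."""
    return 0.5 * k * (r - r0) ** 2

def lennard_jones_energy(r, epsilon, sigma):
    """Lennard-Jones 12-6 energy: 4 * epsilon * ((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The Lennard-Jones energy is zero at r = sigma and reaches its minimum,
# -epsilon, at r = 2^(1/6) * sigma.
r_min = 2 ** (1 / 6) * 3.5
e_min = lennard_jones_energy(r_min, epsilon=0.2, sigma=3.5)
```

A complete force field sums terms of this kind over all bonds, angles, torsions and non-bonded atom pairs, usually together with an electrostatic term.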

Classification and comparison of protein 3D structures:

Purpose and concepts of 3-D structure comparison; algorithms such as FSSP, CE, VAST and DALI; fold classes

Databases of structure-based classification: CATH and SCOP

Secondary structure prediction: Algorithms viz. Chou-Fasman, GOR methods; analysis of results and measuring the accuracy of predictions using Q3, Segment overlap, Matthews correlation coefficient
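Two of these accuracy measures can be sketched directly from their definitions. The sketch assumes that observed and predicted secondary structures are given as equal-length strings over the three states H (helix), E (strand) and C (coil); the function names are illustrative, and the segment overlap measure is not covered.

```python
import math

def q3(observed, predicted):
    """Percentage of residues whose three-state assignment is predicted correctly."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)

def mcc(observed, predicted, state):
    """Matthews correlation coefficient for one state (e.g. 'H')."""
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (an empty row or column).
    return (tp * tn - fp * fn) / denom if denom else 0.0

observed = "HHHHEEEECC"
predicted = "HHHCEEECCC"
accuracy = q3(observed, predicted)          # 8 of 10 residues agree
helix_mcc = mcc(observed, predicted, "H")
```

Q3 is a single overall percentage, whereas the Matthews coefficient is computed per state and ranges from -1 to 1, so it also penalizes over-prediction of a state.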

Tertiary Structure prediction: Fundamentals of the methods for 3D structure prediction (sequence similarity/identity of target proteins to proteins of known structure, fundamental principles of protein folding etc.); homology modeling, fold recognition, threading approaches, and ab initio structure prediction methods

Fundamentals of docking small and macromolecules to proteins and nucleic acids

Bio-Informatics National Certification (BINC) 2019 Syllabus