Sequence analysis in a nutshell : a guide to common tools and databases / Scott Markel and Darryl León.



Markel, Scott.
1st ed. - Sebastopol, CA : Farnham : O'Reilly, 2003.
  • Book
  • xiv, 286 p. : ill. ; 24 cm.
"Covers EMBOSS 2.5.0"--Cover.
"Resources": p. 265-271. Includes bibliographical references and index.
  • Preface
    I. Data Formats
      1. FASTA Format: NCBI's Sequence Identifier Syntax; NCBI's Non-Redundant Database Syntax; References
      2. GenBank/EMBL/DDBJ: Example Flat Files; GenBank Example Flat File; DDBJ Example Flat File; GenBank/DDBJ Field Definitions; EMBL Example Flat File; EMBL Field Definitions; DDBJ/EMBL/GenBank Feature Table; References
      3. SWISS-PROT: SWISS-PROT Example Flat File; SWISS-PROT Field Definitions; SWISS-PROT Feature Table; References
      4. Pfam: Pfam Example Flat File; Pfam Field Definitions; References
      5. PROSITE: PROSITE Example Flat File; PROSITE Field Definitions; References
    II. Tools
      6. Readseq: Supported Formats; Command-Line Options; References
      7. BLAST: formatdb; blastall; megablast; blastpgp; PSI-BLAST; PHI-BLAST; bl2seq; References
      8. BLAT: Command-Line Options; References
      9. ClustalW: Command-Line Options; References
      10. HMMER: hmmalign; hmmbuild; hmmcalibrate; hmmconvert; hmmemit; hmmfetch; hmmindex; hmmpfam; hmmsearch; References
      11. MEME/MAST: MEME; MAST; References
      12. EMBOSS: Common Themes; List of All EMBOSS Programs; Details of EMBOSS Programs; References
    III. Appendixes
      A. Nucleotide and Amino Acid Tables
      B. Genetic Codes
      C. Resources
      D. Future Plans
    Index
  • (source: Nielsen Book Data)
Publisher's Summary:
Gene sequence data is the most abundant type of data available, and if you're interested in analyzing it, you'll find a wealth of computational methods and tools to help you. In fact, finding the data is not the challenge at all; rather, it is dealing with the plethora of flat file formats used to process the sequence entries and trying to remember what their specific field codes mean. This book is a handy resource, as well as an invaluable reference, for anyone who needs to know about the practical aspects and mechanics of sequence analysis. This reference pulls together all of the vital information about the most commonly used databases, analytical tools, and tables used in sequence analysis. The book is partitioned into three fundamental areas to help you maximize your use of the content. The first section, "Databases," contains examples of flat files from key databases (GenBank, EMBL, SWISS-PROT), the definitions of the codes or fields used in each database, and the sequence feature types/terms and qualifiers for the nucleotide and protein databases. The second section, "Tools," provides the command-line syntax for popular applications such as ReadSeq, MEME/MAST, BLAST, ClustalW, and the EMBOSS suite of analytical tools. The third section, "Appendixes," concentrates on information essential to understanding the individual components that make up a biological sequence. The tables in this section include nucleotide and protein codes, genetic codes, and other relevant information. Written in O'Reilly's straightforward "Nutshell" format, this book draws together essential information for bioinformaticians in industry and academia, as well as for students.
León, Darryl.

