SmotifTF

Version 0.05

Template-free modeling algorithm.

SYNOPSIS

SmotifTF carries out template-free structure prediction using a dynamic
library of supersecondary structure fragments obtained from a set of
remotely related PDB structures.

This README provides the information required for downloading,
installing and running the software package. For more information on
how to run the program, use "perldoc SmotifTF" after installation.

DOWNLOAD

Download the SmotifTF package from CPAN:
http://search.cpan.org/dist/SmotifTF/

INSTALLATION

To install the SmotifTF package, use one of the following methods:

1. Manually, in the location where standard Perl modules are stored:

    tar -zxvf SmotifTF-version.tar.gz
    cd SmotifTF-version/
    perl Makefile.PL
    make
    make test
    make install

2. Manually, in a custom location (/home/user/MyPerlLib):

    tar -zxvf SmotifTF-version.tar.gz
    cd SmotifTF-version/
    perl Makefile.PL PREFIX=/home/user/MyPerlLib/
    make
    make test
    make install

   Then add the following line to ./smotiftf.pl and
   ./smotiftf_prereq.pl:

    use lib "$ENV{HOME}/MyPerlLib/share/perl5/";

3. Using a CPAN client, as root:

    perl -MCPAN -e shell
    > install SmotifTF

4. Using a CPAN client and installing in a custom location
   (/home/user/MyPerlLib):

    perl -MCPAN -e shell
    > o conf makepl_arg PREFIX=/home/user/MyPerlLib/
    > install SmotifTF

   Then add the following line to ./smotiftf.pl and
   ./smotiftf_prereq.pl:

    use lib "$ENV{HOME}/MyPerlLib/share/perl5/";

PRE-REQUISITES

The Smotif-based modeling algorithm requires the query protein sequence
as input.

Software/data required:

1. Psipred (http://bioinf.cs.ucl.ac.uk/psipred/)
2. HHSuite (ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/)
3. Psiblast and Delta-blast
   (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
4. Modeller (version 9.14, https://salilab.org/modeller/)
5. DSSP (http://swift.cmbi.ru.nl/gv/dssp/)
6. Local PDB directory (central or user-designated, from
   http://www.rcsb.org).

Many PDB structures are incomplete, with missing residues. The SmotifTF
algorithm performs best when the PDB structures are complete. Hence, we
use Modeller (https://salilab.org/modeller/) to model the missing
residues and obtain complete structures. The algorithm can work with
incomplete PDB structures, but its performance may be worse than
expected. The SmotifTF software can handle gzipped (.gz) or unzipped
(.ent) PDB structure files.

The software for remodeling the missing residues can be obtained from
our website at:
http://fiserlab.org/remodel_pdb.tar.gz

It can be used to remodel missing residues in the entire PDB, and the
remodeled structures can then be used in the SmotifTF package. The
SmotifTF package can handle both regular and remodeled PDB databases.

Download and install the above-mentioned software and data according to
their instructions.

Note: Psipred may require legacy BLAST, whereas Psiblast and Delta-blast
are part of the BLAST+ package. A .ncbirc file may be required in the
home directory for Psipred.

DATABASES REQUIRED

1. The PDBAA BLAST database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/).
2. The HHsuite databases NR20 and PDB70
   (ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/databases/hhsuite_dbs/).

SET UP CONFIGURATION FILE

The configuration file, smotiftf_config.ini, has all the information
regarding the required library files and other pre-requisite software.
Set all the paths and executables in this file correctly.
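
A missing or mistyped path to this file is an easy mistake to make. The
sketch below assumes the scripts find smotiftf_config.ini through the
SMOTIFTF_CONFIG_FILE environment variable named in this section.
check_smotiftf_config is a hypothetical helper, not part of the
distribution. It only confirms that the variable is set and that the
file is readable; it does not validate the entries inside the file.

```shell
#!/bin/sh
# check_smotiftf_config: hypothetical helper, NOT part of SmotifTF.
# Assumption: SmotifTF reads its settings from the file named by
# SMOTIFTF_CONFIG_FILE. Returns 0 only if that variable is set and
# points to a readable file.
check_smotiftf_config() {
    if [ -z "${SMOTIFTF_CONFIG_FILE:-}" ]; then
        echo "SMOTIFTF_CONFIG_FILE is not set" >&2
        return 1
    fi
    if [ ! -r "$SMOTIFTF_CONFIG_FILE" ]; then
        echo "Cannot read $SMOTIFTF_CONFIG_FILE" >&2
        return 1
    fi
    echo "Using configuration file: $SMOTIFTF_CONFIG_FILE"
}
```

After defining the function in the current shell, run
check_smotiftf_config before smotiftf_prereq.pl. A non-zero return
status means the variable or the file path needs fixing.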

Set the environment variable in your .bashrc file:

    export SMOTIFTF_CONFIG_FILE=/home/user/MyPerlLib/share/perl5/SmotifTF-version/smotiftf_config.ini

MODELING ALGORITHM STEPS

----------------------------------------------------
| First run the Pre-requisites:                    |
|   Psipred, HHblits+HHsearch, Psiblast,           |
|   Delta-blast                                    |
|                                                  |
|   Single-core job                                |
|   Usage: perl smotiftf_prereq.pl --step=all      |
|          --sequence_file=1zzz.fasta --dir=1zzz   |
----------------------------------------------------

----------------------------------------------------
| Step 1:                                          |
|   Compare Smotifs                                |
|                                                  |
|   Multi-core / cluster job                       |
|   Usage: perl smotiftf.pl --step=1 --pdb=1zzz    |
----------------------------------------------------

----------------------------------------------------
| Step 2:                                          |
|   Rank Smotifs                                   |
|                                                  |
|   Multi-core / cluster job                       |
|   Usage: perl smotiftf.pl --step=2 --pdb=1zzz    |
----------------------------------------------------

----------------------------------------------------
| Step 3:                                          |
|   Enumerate all possible combinations of         |
|   Smotifs (about a million models)               |
|                                                  |
|   Multi-core / cluster job                       |
|   Usage: perl smotiftf.pl --step=3 --pdb=1zzz    |
----------------------------------------------------

----------------------------------------------------
| Step 4:                                          |
|   Rank enumerated structures using a             |
|   composite energy function                      |
|                                                  |
|   Single-core job                                |
|   Usage: perl smotiftf.pl --step=4 --pdb=1zzz    |
----------------------------------------------------

----------------------------------------------------
| Step 5:                                          |
|   Run Modeller to generate top 5 complete        |
|   models                                         |
|                                                  |
|   Single-core job                                |
|   Usage: perl smotiftf.pl --step=5 --pdb=1zzz    |
----------------------------------------------------

HOW TO RUN SMOTIFTF

1. The two Perl scripts needed to run SmotifTF are smotiftf_prereq.pl
   and smotiftf.pl. If SmotifTF is installed locally, the correct path
   to the SmotifTF Perl library must be provided in both scripts.

2. Create a subdirectory with a dummy PDB file name (e.g. 1abc or
   1zzz).

3. Put the query fasta file (1zzz.fasta) in this directory.

4. Run the pre-requisites first. This runs Psipred, HHblits+HHsearch,
   Psiblast and Delta-blast. The input is the query sequence in fasta
   format, and the outputs are (a) a dynamic database of Smotifs and
   (b) the putative Smotifs in the query protein. These are used in the
   subsequent modeling steps. Follow the instructions given in
   smotiftf_prereq.pl. For more information about the pre-requisites
   use:

    perl smotiftf_prereq.pl -help

   Usage:

    perl smotiftf_prereq.pl --step=all --sequence_file=1zzz.fasta --dir=1zzz

5. After the pre-requisites are completed, run steps 1 to 5 sequentially
   as given above. Output from previous steps is often required in
   subsequent steps. Wait for each step to complete without errors
   before going to the next step. Follow the instructions given in
   smotiftf.pl. For more information use:

    perl smotiftf.pl -help

   Usage:

    perl smotiftf.pl --step=[1-5] --pdb=1zzz

6. To run steps 1-5 together use:

    perl smotiftf.pl --step=all --pdb=1zzz

7. Use multiple cores or clusters, as available, for steps 1 & 3 above.
   These are computationally intensive steps.

Results:

The top 5 models are stored in the subdirectory (1abc or 1zzz) as
Model.1.pdb, Model.2.pdb, Model.3.pdb, Model.4.pdb & Model.5.pdb.

HOW TO TEST SMOTIFTF PACKAGE

A sample fasta sequence (4uzx.fasta) is provided with the distribution
and can be used to test the SmotifTF software installation. The fasta
file can be found at:

    /home/user/MyPerlLib/share/perl5/SmotifTF-version/t/Data/4uzx.fasta

Steps to perform the test:

1. Create a directory named 4uzx:

    mkdir 4uzx

2. Copy the fasta file into the directory:

    cp /home/user/MyPerlLib/share/perl5/SmotifTF-version/t/Data/4uzx.fasta 4uzx/

3. Run the pre-requisites:

    perl smotiftf_prereq.pl --step=all --sequence_file=4uzx.fasta --dir=4uzx

4. Run the modeling algorithm:

    perl smotiftf.pl --step=all --pdb=4uzx

REFERENCE

Vallat BK, Fiser A. Modularity of protein folds as a tool for
template-free modeling of sequences. Manuscript under review.
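
EXAMPLE: RUNNING THE STEPS ONE AT A TIME

HOW TO RUN SMOTIFTF asks that each step finish without errors before
the next one starts. When the steps are launched individually rather
than with --step=all, that rule can be sketched as a small shell
wrapper. This is a minimal sketch under two assumptions that this
README does not confirm: that smotiftf.pl exits with a non-zero status
when a step fails, and that the script is run from the directory that
contains it. run_steps and the SMOTIFTF_CMD override are hypothetical
names introduced only for this illustration.

```shell
#!/bin/sh
# run_steps: hypothetical wrapper, NOT part of the SmotifTF distribution.
# Runs steps 1 to 5 in order for one target and stops at the first
# failure. Assumption: smotiftf.pl returns a non-zero exit status when
# a step fails.
# SMOTIFTF_CMD is an assumed override hook for dry runs (for example
# SMOTIFTF_CMD=echo); it defaults to the command shown in this README.
run_steps() {
    pdb_id=$1
    for step in 1 2 3 4 5; do
        echo "Starting step $step for $pdb_id" >&2
        # SMOTIFTF_CMD is left unquoted on purpose so that
        # "perl smotiftf.pl" splits into a command and its argument.
        if ! ${SMOTIFTF_CMD:-perl smotiftf.pl} --step="$step" --pdb="$pdb_id"; then
            echo "Step $step failed; later steps were not run" >&2
            return 1
        fi
    done
    echo "All steps finished for $pdb_id" >&2
}
```

A dry run that only prints the arguments of each call is
"SMOTIFTF_CMD=echo; run_steps 1zzz". Unsetting SMOTIFTF_CMD and calling
"run_steps 1zzz" runs the real script.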

AUTHORS

Brinda Vallat, Carlos Madrid, Andras Fiser

SUPPORT AND DOCUMENTATION

After installing, you can find documentation for using the software
with the perldoc command:

    perldoc SmotifTF

You can also look for information at:

    RT, CPAN's request tracker (report bugs here)
        http://rt.cpan.org/NoAuth/Bugs.html?Dist=SmotifTF

    AnnoCPAN, Annotated CPAN documentation
        http://annocpan.org/dist/SmotifTF

    CPAN Ratings
        http://cpanratings.perl.org/d/SmotifTF

    Search CPAN
        http://search.cpan.org/dist/SmotifTF/

LICENSE AND COPYRIGHT

Copyright (C) 2015 Fiserlab Members

This program is free software; you can redistribute it and/or modify it
under the terms of the Artistic License (2.0). The full text of the
license is published by The Perl Foundation.

Any use, modification, and distribution of the Standard or Modified
Versions is governed by this Artistic License. By using, modifying or
distributing the Package, you accept this license. Do not use, modify,
or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made
by someone other than you, you are nevertheless required to ensure that
your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service
mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge
patent license to make, have made, use, offer to sell, sell, import and
otherwise transfer the Package with respect to any patent claims
licensable by the Copyright Holder that are necessarily infringed by the
Package. If you institute patent litigation (including a cross-claim or
counterclaim) against any party alleging that the Package constitutes
direct or contributory patent infringement, then this Artistic License
to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.