NAME Bio::DB::Das::Chado - DAS-style access to a chado database SYNOPSIS # Open up a feature database $db = Bio::DB::Das::Chado->new( -dsn => 'dbi:Pg:dbname=gadfly;host=lajolla' -user => 'jimbo', -pass => 'supersecret', ); @segments = $db->segment(-name => '2L', -start => 1, -end => 1000000); # segments are Bio::Das::SegmentI - compliant objects # fetch a list of features @features = $db->features(-type=>['type1','type2','type3']); # invoke a callback over features $db->features(-type=>['type1','type2','type3'], -callback => sub { ... } ); # get all feature types @types = $db->types; # count types %types = $db->types(-enumerate=>1); @feature = $db->get_feature_by_name($class=>$name); @feature = $db->get_feature_by_target($target_name); @feature = $db->get_feature_by_attribute($att1=>$value1,$att2=>$value2); $feature = $db->get_feature_by_id($id); $error = $db->error; DESCRIPTION Chado is the GMOD database schema, and chado is a specific instance of it. It is still somewhat of a moving target, so this package will probably require several updates over the coming months to keep it working. FEEDBACK Mailing Lists User feedback is an integral part of the evolution of this and other GMOD modules. Send your comments and suggestions preferably to one of the GMOD mailing lists. Your participation is much appreciated. gmod-gbrowse@lists.sourceforge.com Reporting Bugs Report bugs to the GMOD bug tracking system at SourceForge to help us keep track the bugs and their resolution. http://sourceforge.net/tracker/?group_id=27707&atid=391291 AUTHOR - Scott Cain Email scain@cpan.org LICENSE This software may be redistributed under the same license as perl. APPENDIX The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ new Title : new Usage : $db = Bio::DB::Das::Chado( -dsn => 'dbi:Pg:dbname=gadfly;host=lajolla' -user => 'jimbo', -pass => 'supersecret', ); Function: Open up a Bio::DB::DasI interface to a Chado database Returns : a new Bio::DB::Das::Chado object Args : use_all_feature_names Title : use_all_feature_names Usage : $obj->use_all_feature_names() Function: set or return flag indicating that all_feature_names view is present Returns : 1 if all_feature_names present, 0 if not Args : to return the flag, none; to set, 1 organism_id Title : organism_id Usage : $obj->organism_id() Function: set or return the organism_id Returns : the value of the id Args : to return the flag, none; to set, the common name of the organism If -organism is set when the Chado feature is instantiated, this method queries the database with the common name to cache the organism_id. inferCDS Title : inferCDS Usage : $obj->inferCDS() Function: set or return the inferCDS flag Returns : the value of the inferCDS flag Args : to return the flag, none; to set, 1 Often, chado databases will be populated without CDS features, since they can be inferred from a union of exons and polypeptide features. Setting this flag tells the adaptor to do the inferrence to get those derived CDS features (at some small performance penatly). allow_obsolete Title : allow_obsolete Usage : $obj->allow_obsolete() Function: set or return the allow_obsolete flag Returns : the value of the allow_obsolete flag Args : to return the flag, none; to set, 1 The chado feature table has a flag column called 'is_obsolete'. Normally, these features should be ignored by GBrowse, but the -allow_obsolete method is provided to allow displaying obsolete features. sofa_id Title : sofa_id Usage : $obj->sofa_id() Function: get or return the ID to use for SO terms Returns : the cv.cv_id for the SO ontology to use Args : to return the id, none; to determine the id, 1 recursivMapping Title : recursivMapping Usage : $obj->recursivMapping($newval) Function: Flag for activating the recursive mapping (desactivated by default) Returns : value of recursivMapping (a scalar) Args : on set, new value (a scalar or undef, optional) Goal : When we have a clone mapped on a chromosome, the recursive mapping maps the features of the clone on the chromosome. srcfeatureslice Title : srcfeatureslice Usage : $obj->srcfeatureslice Function: Flag for activating Returns : value of srcfeatureslice Args : on set, new value (a scalar or undef, optional) Desc : Allows to use a featureslice of type featureloc_slice(srcfeat_id, int, int) Important : this and recursivMapping are mutually exclusives do2Level Title : do2Level Usage : $obj->do2Level Function: Flag for activating the fetching of 2levels in segment->features Returns : value of do2Level Args : on set, new value (a scalar or undef, optional) dbh Title : dbh Usage : $obj->dbh($newval) Function: Returns : value of dbh (a scalar) Args : on set, new value (a scalar or undef, optional) term2name Title : term2name Usage : $obj->term2name($newval) Function: When called with a hashref, sets cvterm.cvterm_id to cvterm.name mapping hashref; when called with an int, returns the name corresponding to that cvterm_id; called with no arguments, returns the hashref. Returns : see above Args : on set, a hashref; to retrieve a name, an int; to retrieve the hashref, none. Note: should be replaced by Bio::GMOD::Util->term2name name2term Title : name2term Usage : $obj->name2term($newval) Function: When called with a hashref, sets cvterm.name to cvterm.cvterm_id mapping hashref; when called with a string, returns the cvterm_id corresponding to that name; called with no arguments, returns the hashref. Returns : see above Args : on set, a hashref; to retrieve a cvterm_id, a string; to retrieve the hashref, none. Note: Should be replaced by Bio::GMOD::Util->name2term segment Title : segment Usage : $db->segment(@args); Function: create a segment object Returns : segment object(s) Args : see below This method generates a Bio::Das::SegmentI object (see Bio::Das::SegmentI). The segment can be used to find overlapping features and the raw sequence. When making the segment() call, you specify the ID of a sequence landmark (e.g. an accession number, a clone or contig), and a positional range relative to the landmark. If no range is specified, then the entire region spanned by the landmark is used to generate the segment. Arguments are -option=>value pairs as follows: -name ID of the landmark sequence. -class A namespace qualifier. It is not necessary for the database to honor namespace qualifiers, but if it does, this is where the qualifier is indicated. -version Version number of the landmark. It is not necessary for the database to honor versions, but if it does, this is where the version is indicated. -start Start of the segment relative to landmark. Positions follow standard 1-based sequence rules. If not specified, defaults to the beginning of the landmark. -end End of the segment relative to the landmark. If not specified, defaults to the end of the landmark. The return value is a list of Bio::Das::SegmentI objects. If the method is called in a scalar context and there are no more than one segments that satisfy the request, then it is allowed to return the segment. Otherwise, the method must throw a "multiple segment exception". features Title : features Usage : $db->features(@args) Function: get all features, possibly filtered by type Returns : a list of Bio::SeqFeatureI objects Args : see below Status : public This routine will retrieve features in the database regardless of position. It can be used to return all features, or a subset based on their type Arguments are -option=>value pairs as follows: -type List of feature types to return. Argument is an array of Bio::Das::FeatureTypeI objects or a set of strings that can be converted into FeatureTypeI objects. -callback A callback to invoke on each feature. The subroutine will be passed each Bio::SeqFeatureI object in turn. -attributes A hash reference containing attributes to match. The -attributes argument is a hashref containing one or more attributes to match against: -attributes => { Gene => 'abc-1', Note => 'confirmed' } Attribute matching is simple exact string matching, and multiple attributes are ANDed together. If one provides a callback, it will be invoked on each feature in turn. If the callback returns a false value, iteration will be interrupted. When a callback is provided, the method returns undef. types Title : types Usage : $db->types(@args) Function: return list of feature types in database Returns : a list of Bio::Das::FeatureTypeI objects Args : see below This routine returns a list of feature types known to the database. It is also possible to find out how many times each feature occurs. Arguments are -option=>value pairs as follows: -enumerate if true, count the features The returned value will be a list of Bio::Das::FeatureTypeI objects (see Bio::Das::FeatureTypeI. If -enumerate is true, then the function returns a hash (not a hash reference) in which the keys are the stringified versions of Bio::Das::FeatureTypeI and the values are the number of times each feature appears in the database. NOTE: This currently raises a "not-implemented" exception, as the BioSQL API does not appear to provide this functionality. get_feature_by_alias, get_features_by_alias Title : get_features_by_alias Usage : $db->get_feature_by_alias(@args) Function: return list of feature whose name or synonyms match Returns : a list of Bio::Das::Chado::Segment::Feature objects Args : See below This method finds features matching the criteria outlined by the supplied arguments. Wildcards (*) are allowed. Valid arguments are: -name -class -ref (refrence sequence) -start -end get_feature_by_name, get_features_by_name Title : get_features_by_name Usage : $db->get_features_by_name(@args) Function: return list of feature whose names match Returns : a list of Bio::Das::Chado::Segment::Feature objects Args : See below This method finds features matching the criteria outlined by the supplied arguments. Wildcards (*) are allowed. Valid arguments are: -name -class -ref (refrence sequence) -start -end _by_alias_by_name Title : _by_alias_by_name Usage : $db->_by_alias_by_name(@args) Function: return list of feature whose names match Returns : a list of Bio::Das::Chado::Segment::Feature objects Args : See below A private method that implements the get_features_by_name and get_features_by_alias methods. It accepts the same args as those methods, plus an addtional on (-operation) which is either 'by_alias' or 'by_name' to indicate what rule it is to use for finding features. srcfeature2name returns a srcfeature name given a srcfeature_id gff_source_db_id Title : gff_source_db_id Function: caches the chado db_id from the chado db table gff_source_dbxref_id Gets dbxref_id for features that have a gff source associated dbxref2source returns the source (string) when given a dbxref_id source_dbxref_list Title : source_dbxref_list Usage : @all_dbxref_ids = $db->source_dbxref_list() Function: Gets a list of all dbxref_ids that are used for GFF sources Returns : a comma delimited string that is a list of dbxref_ids Args : none Status : public This method queries the database for all dbxref_ids that are used to store GFF source terms. search_notes Title : search_notes Usage : $db->search_notes($search_term,$max_results) Function: full-text search on features, ENSEMBL-style Returns : an array of [$name,$description,$score] Args : see below Status : public This routine performs a full-text search on feature attributes (which attributes depend on implementation) and returns a list of [$name,$description,$score], where $name is the feature ID (accession?), $description is a human-readable description such as a locus line, and $score is the match strength. ** NOT YET ACTIVE: search_notes IS IN TESTING STAGE ** sub search_notes { my $self = shift; my ($search_string,$limit) = @_; my $limit_str; if (defined $limit) { $limit_str = " LIMIT $limit "; } else { $limit_str = ""; } # so here's the plan: # if there is only 1 word, do 1-3 # 1. search for accessions like $string.'%'--if any are found, quit and return them # 2. search for feature.name like $string.'%'--if found, keep and continue # 3. search somewhere in analysis like $string.'%'--if found, keep and continue # if there is more than one word, don't search accessions # 4. search each word anded together like '%'.$string.'%' --if found, keep and continue # 5. search somewhere in analysis like '%'.$string.'%' # $self->dbh->trace(1); my @search_str = split /\s+/, $search_string; my $qsearch_term = $self->dbh->quote($search_str[0]); my $like_str = "( (dbx.accession ~* $qsearch_term OR \n" ." f.name ~* $qsearch_term) "; for (my $i=1;$i<(scalar @search_str);$i++) { $qsearch_term = $self->dbh->quote($search_str[$i]); $like_str .= "and \n"; $like_str .= " (dbx.accession ~* $qsearch_term OR \n" ." f.name ~* $qsearch_term) "; } $like_str .= ")"; my $sth = $self->dbh->prepare(" select dbx.accession,f.name,0 from feature f, dbxref dbx, feature_dbxref fd where f.feature_id = fd.feature_id and fd.dbxref_id = dbx.dbxref_id and $like_str $limit_str "); $sth->execute or throw ("couldn't execute keyword query"); my @results; while (my ($acc, $name, $score) = $sth->fetchrow_array) { $score = sprintf("%.2f",$score); push @results, [$acc, $name, $score]; } $sth->finish; return @results; } attributes Title : attributes Usage : @attributes = $db->attributes($id,$name) Function: get the "attributes" on a particular feature Returns : an array of string Args : feature ID [, attribute name] Status : public This method is intended as a "work-alike" to Bio::DB::GFF's attributes method, which has the following returns: Called in list context, it returns a list. If called in a scalar context, it returns the first value of the attribute if an attribute name is provided, otherwise it returns a hash reference in which the keys are attribute names and the values are anonymous arrays containing the values. _segclass Title : _segclass Usage : $class = $db->_segclass Function: returns the perl class that we use for segment() calls Returns : a string containing the segment class Args : none Status : reserved for subclass use chado_reference_class Title : chado_reference_class Usage : $obj->chado_reference_class() Function: get or return the ID to use for Gbrowse map reference class using cvtermprop table, value = MAP_REFERENCE_TYPE Returns : the cvterm.name Args : to return the id, none; to determine the id, 1 See also: default_class, refclass_feature_id Optionally test that user/config supplied ref class is indeed a proper chado feature type. refclass_feature_id Title : refclass_feature_id Usage : $self->refclass_srcfeature_id() Function: Used to store the feature_id of the reference class feature we are working on (e.g. contig, supercontig) With this feature we can filter out all the request to be sure we are extracting a feature located on the reference class feature. Returns : A scalar Args : The feature_id on setting LEFTOVERS FROM BIO::DB::GFF NEEDED FOR DAS these methods should probably be declared in an interface class that Bio::DB::GFF implements. for instance, the aggregator methods could be described in Bio::SeqFeature::AggregatorI END LEFTOVERS