EMBOSS: profit


Program profit

Function

Scan a sequence or database with a matrix or profile

Description

profit takes a simple frequency matrix produced by prophecy and uses it to search one or more input sequences for matches.

The score for a match is calculated from the simple frequency matrix: it is the sum of the scores at each position of the matrix.

A 'simple frequency matrix' is simply a count of the number of times any particular amino acid occurs at each position in the alignment used to create it. Simple frequency matrices are created using the program prophecy with the option '-type F' to create the correct type of matrix. The alignment should not have gaps in it.

The resulting matrix is moved to each position in the sequence(s) you are searching. At each position in the sequence, the frequencies of the amino acids or bases covered by the length of the matrix are read from the matrix. The sum of these frequencies over all positions of the matrix is the score for that position of the sequence. If this score is at or above the threshold percentage of the maximum possible score for that matrix, a hit is reported.
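
The scan described above can be sketched in Python. This is an illustration only, not profit's actual source code: the function names are hypothetical, the matrix is rebuilt directly from the four aligned sequences of the sample session rather than read from a prophecy output file, and truncating the percentage to a whole number is an assumption.

```python
from collections import Counter

# The ungapped alignment from the sample session (m.seq).
ALIGNED = [
    "DEVGGEALGRLLVVYPWTQR",
    "DEVGREALGRLLVVYPWTQR",
    "DEVGGEALGRILVVYPWTQR",
    "DEVGGEAAGRVLVVYPWTQR",
]


def build_frequency_matrix(aligned):
    """Count how often each residue occurs in each alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(seq[i] for seq in aligned) for i in range(length)]


def max_score(matrix):
    """Sum of the highest count in each column of the matrix."""
    return sum(max(column.values()) for column in matrix)


def scan(matrix, sequence, threshold=75):
    """Return (start, percent) for every window scoring >= threshold.

    start is 1-based. percent is truncated to an integer, which is
    an assumption made for this sketch.
    """
    best = max_score(matrix)
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # A residue never seen in a column contributes 0 (Counter default).
        score = sum(column[res] for column, res in zip(matrix, window))
        percent = score * 100 // best
        if percent >= threshold:
            hits.append((start + 1, percent))
    return hits
```

For the sample alignment, max_score(build_frequency_matrix(ALIGNED)) evaluates to 76, which agrees with the "max score 76" line in the header of the sample output.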

Usage

Here is a sample session with profit.

(My aligned set of sequences:)
% more m.seq
>one
DEVGGEALGRLLVVYPWTQR
>two
DEVGREALGRLLVVYPWTQR
>three
DEVGGEALGRILVVYPWTQR
>four
DEVGGEAAGRVLVVYPWTQR



(Make a simple frequency matrix using prophecy)
% prophecy
Creates matrices/profiles from multiple alignments
Input sequence set: m.seq
Profile type
         F : Frequency
         G : Gribskov
         H : Henikoff
Select type [F]: 
Enter a name for the profile [mymatrix]: 
Enter threshold reporting percentage [75]: 
Output file [outfile.prophecy]: 

(Search using profit)
% profit
Scan a sequence or database with a matrix or profile
Profile or matrix file: outfile.prophecy
Input sequence(s): sw:*
Output file [outfile.profit]: 

Command line arguments

   Mandatory qualifiers:
  [-infile]            infile     Profile or matrix file
  [-sequence]          seqall     Sequence database USA
  [-outfile]           outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


   Qualifier                    Description              Allowed values          Default

   Mandatory qualifiers
  [-infile] (Parameter 1)       Profile or matrix file   Input file              Required
  [-sequence] (Parameter 2)     Sequence database USA    Readable sequence(s)    Required
  [-outfile] (Parameter 3)      Output file name         Output file             <sequence>.profit

   Optional qualifiers
  (none)

   Advanced qualifiers
  (none)
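
When all three mandatory qualifiers are given on the command line, the run can be scripted instead of answered interactively. The following Python sketch assembles such a command from the qualifiers listed above. The helper names are hypothetical, and the sketch assumes that the profit executable is installed and on the PATH.

```python
import subprocess


def profit_command(profile, sequence_usa, outfile):
    """Build the argument list for a profit run (hypothetical helper)."""
    return [
        "profit",
        "-infile", profile,
        "-sequence", sequence_usa,
        "-outfile", outfile,
    ]


def run_profit(profile, sequence_usa, outfile):
    """Run profit; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(profit_command(profile, sequence_usa, outfile), check=True)
```

Passing the arguments as a list, rather than as a single shell string, keeps a USA such as sw:* from being expanded by the shell.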

Input file format

profit reads a simple frequency matrix produced by prophecy, together with the one or more sequences to be searched.

Output file format

The output is a list of three columns.

The first column is the name of the matching sequence found.
The second is the start position in the sequence of the match.
The third column (after the word 'Percentage:') is the percentage of the maximum possible score (sum of the highest value at each position in the frequency matrix).

The output from the above example follows:


# PROF scan using simple frequency matrix mymatrix
# Scores >= threshold 75 (max score 76)
#
HBB0_MOUSE 21 Percentage: 78
HBB1_MOUSE 21 Percentage: 100
HBB1_RAT 21 Percentage: 94
HBB1_SPHPU 21 Percentage: 84
HBB1_TAPTE 21 Percentage: 94
HBB2_MOUSE 21 Percentage: 100
HBB2_PANLE 21 Percentage: 100
HBB2_RAT 21 Percentage: 90
HBB2_SPHPU 21 Percentage: 89
HBB2_TAPTE 21 Percentage: 89
HBBA_BOSJA 20 Percentage: 100
HBBA_CAPHI 20 Percentage: 96
HBBC_CAPHI 16 Percentage: 96
HBBC_SHEEP 16 Percentage: 96
HBBF_BOVIN 20 Percentage: 100
HBBF_CAPHI 20 Percentage: 94
HBBF_SHEEP 20 Percentage: 94
HBBN_AMMLE 16 Percentage: 75
HBBZ_MOUSE 21 Percentage: 78
HBB_AILFU 21 Percentage: 100
HBB_AILME 21 Percentage: 100
HBB_ALCAA 20 Percentage: 100
HBB_ANTPA 21 Percentage: 94
HBB_AOTTR 21 Percentage: 100
HBB_APTFO 21 Percentage: 75
HBB_ATEGE 21 Percentage: 100
HBB_BALAC 21 Percentage: 94
HBB_BISBO 20 Percentage: 100
HBB_BOSGF 20 Percentage: 100
HBB_BOSMU 20 Percentage: 100
HBB_BOVIN 20 Percentage: 100
HBB_BRATR 21 Percentage: 84
HBB_CALAR 21 Percentage: 100
HBB_CAMDR 21 Percentage: 94
HBB_CANFA 21 Percentage: 94
HBB_CAVPO 21 Percentage: 85
HBB_CEBAL 21 Percentage: 100
HBB_CEBAP 21 Percentage: 100
HBB_CERAE 21 Percentage: 100
HBB_CERSI 21 Percentage: 100
HBB_CERTO 21 Percentage: 100
HBB_CHICK 21 Percentage: 75
HBB_CHRPI 21 Percentage: 75
HBB_CICCI 21 Percentage: 80
HBB_COLBA 21 Percentage: 100
HBB_COLPO 21 Percentage: 100
HBB_COTJA 21 Percentage: 75
HBB_CROCR 21 Percentage: 100
HBB_CTEGU 21 Percentage: 94
HBB_CYNSP 21 Percentage: 100
HBB_CYPCA 21 Percentage: 77
HBB_DASNO 21 Percentage: 78
HBB_DIDMA 21 Percentage: 81
HBB_ECHTE 21 Percentage: 78
HBB_ELEEL 21 Percentage: 75
HBB_ELEMA 21 Percentage: 78
HBB_EQUHE 21 Percentage: 94
HBB_ERIEU 21 Percentage: 89
HBB_EULFU 21 Percentage: 89
HBB_FELCA 21 Percentage: 100
HBB_FRAPO 21 Percentage: 75
HBB_GALCR 21 Percentage: 94
HBB_GORGO 21 Percentage: 100
HBB_HIPAM 21 Percentage: 94
HBB_HORSE 21 Percentage: 94
HBB_HUMAN 21 Percentage: 100
HBB_HYLLA 21 Percentage: 100
HBB_LAMGL 21 Percentage: 94
HBB_LATCH 21 Percentage: 76
HBB_LEMCA 21 Percentage: 89
HBB_LEMVA 21 Percentage: 89
HBB_LEPEU 21 Percentage: 89
HBB_LEPWE 21 Percentage: 100
HBB_LORTA 21 Percentage: 89
HBB_LOXAF 21 Percentage: 78
HBB_LUTLU 21 Percentage: 100
HBB_LYNLY 21 Percentage: 100
HBB_MACCA 21 Percentage: 89
HBB_MACFU 21 Percentage: 100
HBB_MACGG 21 Percentage: 94
HBB_MACMU 21 Percentage: 94
HBB_MANSP 21 Percentage: 100
HBB_MARMA 21 Percentage: 80
HBB_MEGLY 21 Percentage: 94
HBB_MELCA 21 Percentage: 100
HBB_MELME 21 Percentage: 100
HBB_MESAU 21 Percentage: 90
HBB_MESBR 21 Percentage: 85
HBB_MUSLU 21 Percentage: 100
HBB_MUSPF 21 Percentage: 94
HBB_MYOVE 21 Percentage: 100
HBB_NASNA 21 Percentage: 100
HBB_NYCCO 21 Percentage: 94
HBB_ODORO 21 Percentage: 100
HBB_ODOVI 20 Percentage: 90
HBB_ONDZI 21 Percentage: 90
HBB_ORNAN 21 Percentage: 89
HBB_OVIMU 20 Percentage: 96
HBB_PAGLA 21 Percentage: 100
HBB_PANPO 21 Percentage: 100
HBB_PANTS 21 Percentage: 100
HBB_PAPCY 21 Percentage: 100
HBB_PASMO 21 Percentage: 78
HBB_PHACA 21 Percentage: 75
HBB_PHOVI 21 Percentage: 100
HBB_PHYCA 21 Percentage: 89
HBB_PIG 21 Percentage: 100
HBB_PREEN 21 Percentage: 100
HBB_PROCR 21 Percentage: 100
HBB_PROHA 21 Percentage: 94
HBB_PROLO 21 Percentage: 94
HBB_PSIKR 21 Percentage: 75
HBB_PTEAL 21 Percentage: 100
HBB_PTEBR 21 Percentage: 100
HBB_PTEPO 21 Percentage: 100
HBB_RABIT 21 Percentage: 94
HBB_RANCA 15 Percentage: 78
HBB_RANES 15 Percentage: 78
HBB_RANTA 20 Percentage: 96
HBB_RHIUN 21 Percentage: 100
HBB_ROUAE 21 Percentage: 94
HBB_SAGFU 21 Percentage: 94
HBB_SAGMY 21 Percentage: 94
HBB_SAGNI 21 Percentage: 94
HBB_SAISC 21 Percentage: 89
HBB_SHEEP 20 Percentage: 96
HBB_SPAEH 21 Percentage: 94
HBB_SPECI 21 Percentage: 90
HBB_SPETO 21 Percentage: 85
HBB_STUVU 21 Percentage: 75
HBB_SUNMU 21 Percentage: 96
HBB_TACAC 21 Percentage: 89
HBB_TADBR 21 Percentage: 94
HBB_TALEU 21 Percentage: 100
HBB_TAPGE 21 Percentage: 100
HBB_TARBA 21 Percentage: 89
HBB_TARSY 21 Percentage: 89
HBB_THEGE 21 Percentage: 100
HBB_TRAST 20 Percentage: 100
HBB_TRIIN 21 Percentage: 89
HBB_TURME 21 Percentage: 78
HBB_TURTR 21 Percentage: 94
HBB_URSMA 21 Percentage: 100
HBB_VULGR 21 Percentage: 75
HBB_VULVU 21 Percentage: 94
HBD_AOTTR 21 Percentage: 94
HBD_ATEFU 21 Percentage: 100
HBD_ATEGE 21 Percentage: 100
HBD_COLPO 21 Percentage: 94
HBD_GALCR 21 Percentage: 94
HBD_HUMAN 21 Percentage: 94
HBD_PANTR 21 Percentage: 94
HBD_SAGMY 21 Percentage: 100
HBD_SAISC 21 Percentage: 100
HBD_TARSY 21 Percentage: 89
HBE1_CAPHI 21 Percentage: 89
HBE_AOTAZ 21 Percentage: 89
HBE_ATEBE 21 Percentage: 89
HBE_CAIMO 21 Percentage: 75
HBE_CALJA 21 Percentage: 89
HBE_CEBAL 21 Percentage: 89
HBE_CHEME 21 Percentage: 89
HBE_CHICK 21 Percentage: 75
HBE_DAUMA 21 Percentage: 89
HBE_DIDMA 21 Percentage: 78
HBE_EULFU 21 Percentage: 89
HBE_GALCR 21 Percentage: 89
HBE_HUMAN 21 Percentage: 89
HBE_HYLSY 21 Percentage: 89
HBE_LAGLA 21 Percentage: 89
HBE_LEORO 21 Percentage: 84
HBE_MACEU 21 Percentage: 89
HBE_MACMU 21 Percentage: 89
HBE_MICMU 21 Percentage: 89
HBE_MOUSE 21 Percentage: 94
HBE_PANPA 21 Percentage: 84
HBE_PIG 21 Percentage: 80
HBE_PITIR 21 Percentage: 89
HBE_PONPY 21 Percentage: 89
HBE_PROVE 21 Percentage: 89
HBE_RABIT 21 Percentage: 94
HBE_SAGMI 21 Percentage: 89
HBE_SAISC 21 Percentage: 89
HBE_SMICR 21 Percentage: 89
HBE_TARSY 21 Percentage: 94
HBG1_PONPY 21 Percentage: 78
HBG_ALOBE 21 Percentage: 78
HBG_ALOSE 21 Percentage: 78
HBG_ATEGE 21 Percentage: 78
HBG_CEBAP 21 Percentage: 78
HBG_CHEME 21 Percentage: 89
HBG_EULFU 21 Percentage: 89
HBG_GALCR 21 Percentage: 89
HBG_GORGO 21 Percentage: 78
HBG_HUMAN 21 Percentage: 78
HBG_HYLLA 21 Percentage: 78
HBG_MACMU 21 Percentage: 78
HBG_MACNE 21 Percentage: 78
HBG_RABIT 21 Percentage: 80
HBG_TARSY 21 Percentage: 89
HBO_MACEU 21 Percentage: 75
HBRH_CHICK 21 Percentage: 75
HBT_PIG 21 Percentage: 85
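
For downstream processing, output in the layout shown above can be read with a few lines of Python. This is a minimal sketch with a hypothetical function name. It assumes that every hit line has exactly the four whitespace-separated fields seen in the example and that header lines start with '#'.

```python
def parse_profit_output(lines):
    """Return a list of (name, start, percent) tuples from profit output."""
    hits = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4 or fields[2] != "Percentage:":
            raise ValueError("unexpected line: " + line)
        name, start, _label, percent = fields
        hits.append((name, int(start), int(percent)))
    return hits
```

A file can be passed directly, for example parse_profit_output(open("outfile.profit")), and the resulting list can then be filtered or sorted by percentage.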

Data files

None.

Notes

None.

References

None.

Warnings

The aligned set of sequences used to make the simple frequency matrix should not have gaps in it. profit will let you use a matrix made from a gapped alignment, but the results will probably not be sensible.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

Program name   Description
prophecy       Creates matrices/profiles from multiple alignments
prophet        Gapped alignment for profiles

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

History

Written (1999) - Alan Bleasby

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments