Filtering data in QuiXoT

Filter commands are a powerful way to manage data in QuiXoT. Filters are applied in four places:

  • the main window (the command line to the left of the filter button): filters the data presented in the table (and plotted in the graphs)
  • the statistics window (the window that opens when the stats button is pressed): filters the data that will be included in the statistical analysis
  • the variance calculations window (the window that opens when the var calc button is pressed): filters the data that will be used to estimate variances
  • the zoom filter: applied automatically, and only to the data presented in the table, when the zoom tool is used in a graph to select a data subset. Note that this fourth kind of filter is not visible to the user; its presence is indicated by the filter command turning green.

The first three filters are independent of each other (i.e., variances can be calculated from data different from those shown in the table or those used for the statistics).

Filter commands

If you are familiar with SQL, you essentially already know how filtering works. Otherwise, keep reading to get the gist of it.

Filter commands are composed of a list of "conditions" joined by "operators":

Condition1 Operator1 Condition2 Operator2 Condition3
The most commonly used operators are:
AND: selects the data that satisfy both the preceding and the following condition
OR: selects the data that satisfy either the preceding or the following condition

Although the OR operator takes precedence, it is strongly recommended, to avoid surprises, to use parentheses to state explicitly which conditions each operator links. For instance,

Condition1 AND (Condition2 OR Condition3) AND Condition4

means that the filtered data will satisfy Condition1 and Condition4, and either Condition2 or Condition3.
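To see why explicit parentheses matter, the following minimal Python sketch compares the parenthesised command above with one possible reading of the same conditions written without parentheses. It is only an illustration in plain Boolean logic and does not use QuiXoT's own parser; the function names are hypothetical, and the alternative reading (AND evaluated before OR, which moves the AND between Condition3 and Condition4 inside the second group) is an assumption chosen to show how the results can diverge.

```python
from itertools import product


def with_parentheses(c1, c2, c3, c4):
    # Condition1 AND (Condition2 OR Condition3) AND Condition4
    return c1 and (c2 or c3) and c4


def and_evaluated_first(c1, c2, c3, c4):
    # Condition1 AND Condition2 OR Condition3 AND Condition4,
    # read with AND binding tighter than OR (an assumed alternative reading)
    return (c1 and c2) or (c3 and c4)


# Try every combination of true/false for the four conditions and
# keep those where the two readings disagree.
differing = [
    combo
    for combo in product([False, True], repeat=4)
    if with_parentheses(*combo) != and_evaluated_first(*combo)
]

for combo in differing:
    print(combo)
```

The two readings disagree for some combinations of conditions, so a row could be shown or hidden depending on how the command is interpreted. Writing the parentheses removes the ambiguity.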

Conditions may be applied to any of QuiXoT's parameters. The most commonly used conditions are:
For numerical parameters:
= (equal to)
> (greater than)
< (less than)
>= (greater than or equal to)
<= (less than or equal to)
<> (not equal to)
For text parameters:
= 'text' (the parameter value is the word "text")
like '%text%' (the parameter contains the word "text")
not like '%text%' (the parameter does not contain the word "text")
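Because the conditions follow SQL conventions, they can be tried outside QuiXoT with any SQL engine. The sketch below uses Python's built-in sqlite3 module as a stand-in; it is not how QuiXoT evaluates filters, and details such as case sensitivity of like may differ between the two. The table name scans, the helper apply_filter and the three sample rows are hypothetical; FastaProteindescription and Vs are parameter names used in the examples on this page.

```python
import sqlite3

# Hypothetical sample rows: (protein description, Vs value).
rows = [
    ("Trypsin precursor", 4.2),
    ("Keratin, type II cytoskeletal 1", 2.0),
    ("Serum albumin", 5.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (FastaProteindescription TEXT, Vs REAL)")
conn.executemany("INSERT INTO scans VALUES (?, ?)", rows)


def apply_filter(condition):
    """Return the descriptions of the rows that satisfy the condition.

    The condition is pasted directly into the query, so this helper is
    only suitable for trusted, hand-written filter strings.
    """
    query = (
        "SELECT FastaProteindescription FROM scans "
        f"WHERE {condition} ORDER BY rowid"
    )
    return [row[0] for row in conn.execute(query)]


# Numerical condition.
print(apply_filter("Vs >= 4.2"))
# Text conditions; note that SQLite's like ignores ASCII case by default.
print(apply_filter("FastaProteindescription like '%albumin%'"))
print(apply_filter("FastaProteindescription not like '%trypsin%'"))
```

In a like pattern the % sign stands for any run of characters, so '%text%' matches the word anywhere in the value.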

Eliminating and detecting incorrect filters

A colour code is used for the three states of a filter:

  • blue colour: the filter is active. To eliminate a filter command, just delete the command line and press the filter button.
  • green colour: there is an active zoom filter; to remove it, locate the graph in which the zoom is active and unzoom it (zoom out option with the right mouse button).
  • red colour: the filter string is incorrectly typed, and no action is taken (so the datagrid shows all the rows).

Filter shortcuts

QuiXoT contains two parameters that are specifically designed for use within filter commands. These are:

  • st_excluded: this text parameter is empty when the scan was included in the last statistical analysis performed; if the parameter has been assigned any value, the scan was filtered out of the statistics. For instance, if you have run the statistics with a long list of conditions in the filter command, it is not necessary to copy them all to the main window; just write
st_excluded = ''
or check the "show only data used in stats" option, for the table to display only the data used in the statistics.
  • numLabel1: this numerical parameter is used to manually mark bad-quality scans by setting its value to 0 (zero). When the "bad quality scans" option is checked in the stats window, QuiXoT filters out the data that have a numLabel1 value of 0 (zero). For instance, if you do not want the table to show the manually discarded scans (regardless of whether they were included in the statistics), write
numLabel1 <> 0
or check the "hide manually discarded" option.

scans, peptides and proteins buttons

These three buttons can be found in the main window, above the datagrid. They control whether all the scans, only the peptides or only the proteins are listed in the table, respectively. To reset the filter command, delete the line and press the filter button again. The conditions introduced by these three buttons may be configured by the user in the configuration files.

Useful filter commands

See also the article: DataGrid information in QuiXoT.

The following are useful examples of filter commands accepted by QuiXoT:

Eliminating non-labelled peptides

This filter eliminates the C-terminal peptides of proteins (which cannot be labelled with 18O or SILAC).
st_Cterm = 0

Checking missed cleavages

This filter selects partially digested peptides (i.e., peptides containing missed cleavage sites). It is useful for analysing whether this subpopulation of peptides is evenly distributed.
st_PartialDig = 1 or st_PartialDig = 3

Removing missed cleavages

This command filters out all the peptides containing missed cleavages and all the related peptides. It is useful when an artefact of partial digestion is suspected.
st_PartialDig = 0

Selecting oxidized methionines

This filter selects all the peptides containing oxidized methionines.
st_Meth = 2

Selecting peptides with methionines

This filter selects all the peptides containing methionines (oxidized or not). These two filters are useful to analyse whether methionine oxidation is affecting quantification.
st_Meth <> 0

Removing peptides with methionines

This command filters out all the peptides containing methionines.
st_Meth = 0

Removing bad quantitations

This filter eliminates bad quantitations by selecting only the data that pass a set of criteria related to spectrum fitting (these are the filters used in our original 18O-stats paper[1]).
Vs > 3.1 and q_Alpha < 1.4 and q_Sigma < 0.14

Removing trypsins and keratins

This filter eliminates peptides produced from trypsin or keratins.

FastaProteindescription not like '%trypsin%' and FastaProteindescription not like '%keratin%'
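The example filters in this section can be combined into a single command. The following Python sketch builds such a command as a text string that could then be pasted into the filter command line; the helper combine is hypothetical and not part of QuiXoT. It wraps every individual filter in parentheses before joining them, following the recommendation given earlier for linking conditions.

```python
def combine(conditions, operator="and"):
    """Join filter conditions with one operator, wrapping each in parentheses."""
    return f" {operator} ".join(f"({condition})" for condition in conditions)


# Filters taken from the examples on this page.
quality = "Vs > 3.1 and q_Alpha < 1.4 and q_Sigma < 0.14"
contaminants = (
    "FastaProteindescription not like '%trypsin%' "
    "and FastaProteindescription not like '%keratin%'"
)
labelled = "st_Cterm = 0"

command = combine([quality, contaminants, labelled])
print(command)
```

Keeping each filter in its own parenthesised group means that a filter containing or, such as the one for checking missed cleavages, can be added to the list without changing the meaning of the others.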

References

<references>
  1. A. Ramos-Fernández, D. López-Ferrer, and J. Vázquez, «Improved method for differential expression Proteomics using trypsin-catalyzed 18O labeling with a correction for labeling efficiency», Mol Cell Proteomics 6, 1274-1286 (2007) (DOI: 10.1074/mcp.T600029-MCP200)