QuiXoT

From PROTEOMICA
Revision as of 16:10, 12 September 2013 by Mtrevisan
Screenshot of QuiXoT, depicting different spectra and graphs used.
Last release: v.1.4.00
Release date: 20th Aug 2013
Source code: QuiXoT at GitHub
Licence: Please read Licencing


QuiXoT is open-source software for the quantitation and statistical analysis of quantitative proteomics experiments. It was developed at the Cardiovascular Proteomics Laboratory of Prof Jesús Vázquez, at the Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain.

QuiXoT is written in Visual C#, so users must install the .NET Framework 2.0 or higher (not necessary on Windows 7, which already includes it), available as a free download from Microsoft.

Using QuiXoT

See also the article: DataGrid information in QuiXoT.

Part I: Checking an existing QuiXoT analysis

The QuiXML files

QuiXoT uses QuiXML files, an ad hoc XML format created to manage the three levels of information it handles: identification, quantitation and statistics. For a list of the fields used in QuiXML files (i.e., the columns shown in the main window of QuiXoT), see the article DataGrid information in QuiXoT.

After dragging and dropping the QuiXML file, you will have to choose the quantitation method.

If you only want to view an existing QuiXoT analysis, you need just the corresponding QuiXML file. Drag and drop it onto the main form and select the quantitation method used, which depends on the stable isotope labelling (SIL) method (such as 18O or SILAC) and on the type of spectrometer (high or low resolution).

The binStack folder

The spectra are saved in a folder called binStack, which contains one or more .bfr files and one index.idx file. You do not need this folder if you just want to check the results of a QuiXoT analysis (such as the statistics, identifications or the quantitative information).

However, you will need it if you want to requantitate a spectrum or view the spectrum itself (for example, to compare the theoretical and experimental isotopic envelopes, shown in red and blue, respectively). In that case, always keep the QuiXML file and its corresponding binStack folder in the same folder, and remember to move them together.

As long as the binStack folder and the QuiXML file are in the same place, you do not need to do anything else to load the spectral information.
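Before requantitating or opening spectra, it can help to confirm that the folder layout described above is in place. The following is a hypothetical Python sketch, not part of QuiXoT: the function name `check_binstack` and the example file name are illustrative, and it only checks for a `binStack` folder next to the QuiXML file containing an `index.idx` file and at least one `.bfr` file.

```python
from pathlib import Path


def check_binstack(quixml_path):
    """Return a list of layout problems next to a QuiXML file (empty if none).

    Hypothetical helper: assumes a 'binStack' folder in the same directory as
    the QuiXML file, holding one index.idx file and one or more .bfr files.
    """
    quixml = Path(quixml_path)
    binstack = quixml.parent / "binStack"
    if not binstack.is_dir():
        return [f"no binStack folder next to {quixml.name}"]

    problems = []
    if not (binstack / "index.idx").is_file():
        problems.append("index.idx is missing")
    if not any(binstack.glob("*.bfr")):
        problems.append("no .bfr files found")
    return problems


if __name__ == "__main__":
    # 'experiment.xml' is a placeholder name for your QuiXML file.
    print(check_binstack("experiment.xml") or "binStack layout looks fine")
```

An empty list only means the expected files are present; it does not validate their contents.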

The configuration files

The folder where you copied your version of QuiXoT contains a conf folder with the configuration files. It holds three kinds of file:

  • the QuantitationMethods.xml file, which contains the parameters of the different quantitation methods. It specifies, for instance, which labelling is associated with a method and which spectrum type contains the quantitative information (SILAC quantitation is performed in the full scan if a high-resolution spectrometer has been used, but in the zoom scan immediately preceding the MS2 scan on a low-resolution instrument). Other parameters that can be set in this file include:
      • width: the tolerance for high-resolution peaks
      • deltaR: the mass difference between 16O and 18O
      • sumSQtolerance_NG: the tolerance accepted for the sum of squares when comparing the theoretical and experimental spectra
      • the iTRAQ mass tags and their corrections
  • several XML files containing information such as the masses of each isotope, the composition of each amino acid and their post-translational modifications. They also map the different residues to their symbols; for instance, "Y" means tyrosine, while "*" may refer to an oxidised methionine or a SILAC-labelled arginine. Examples of these files are:
      • isotopes.xml
      • aminoacids.xml
      • aminoacids_SILAC.xml
  • several XSD files, the XML schemas that define the structure of the QuiXML file for each quantitation method. Examples of these files are:
      • identifications_schema_18Ohighres.xsd
      • identifications_schema_mascot_SILAC.xsd

Checking spectra by weight

Inspect quantifications with low Vs values. Sort the table by Vs and inspect the spectra using the spectrum button. At very low Vs values you will find useless spectra (bad fittings, mixtures, high background, etc.). You can either exclude these spectra from the statistics by marking them with numLabel1 = 0, or filter by a minimum Vs value (for instance Vs > 3). Non-quantifiable peptides (i.e. peptides lacking a C-terminal basic residue, Lys or Arg, in 18O labelling or in SILAC) must also be excluded when calculating variances or performing the statistics.
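QuiXoT does this interactively, but the same two exclusion rules can be expressed in a few lines. This is a hypothetical Python sketch that assumes the table has been exported as a list of dictionaries keyed by the QuiXML column names `Vs` and `numLabel1`; the function names are illustrative. It does not detect non-quantifiable peptides, which still need to be excluded separately.

```python
def rows_for_statistics(rows, min_vs=3.0):
    """Keep rows not marked with numLabel1 = 0 and whose Vs exceeds min_vs."""
    return [r for r in rows if r["numLabel1"] != 0 and r["Vs"] > min_vs]


def worst_first(rows):
    """Sort rows by ascending Vs, the order in which spectra are worth inspecting."""
    return sorted(rows, key=lambda r: r["Vs"])


rows = [
    {"Vs": 1.2, "numLabel1": 1},   # very low weight: likely a useless spectrum
    {"Vs": 8.0, "numLabel1": 1},
    {"Vs": 15.0, "numLabel1": 0},  # manually excluded
    {"Vs": 4.5, "numLabel1": 1},
]
print([r["Vs"] for r in worst_first(rows)])          # [1.2, 4.5, 8.0, 15.0]
print([r["Vs"] for r in rows_for_statistics(rows)])  # [8.0, 4.5]
```

The threshold of 3 mirrors the example in the text and is arbitrary; choose it after looking at the spectra.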

Labelling efficiency for 18O labelling

If you have used 18O labelling, you can check the labelling efficiency before other analyses. Plot q_f versus Xs (see how to create graphs). Since this plot does not distinguish between good and bad quantitations, and hence mixes more and less accurate estimates of q_f, it is a good idea to remove bad quantitations from the plot by keeping only data above an arbitrary minimum Vs value (for instance Vs > 30 for ZoomScan-quantitated spectra). Labelling efficiency must be above 0.8 for the vast majority of peptides. A cloud of points with q_f below 0.7 curving towards the right (increasing Xs values) indicates a poorly labelled experiment.
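The same check can be summarised numerically once the plot has been inspected. This is a hypothetical Python sketch, assuming rows exported as dictionaries with the QuiXML columns `Vs` and `q_f`; the function name and the returned keys are illustrative. It applies the Vs filter and then reports the fraction of peptides above 0.8 and below 0.7.

```python
def labelling_summary(rows, min_vs=30.0, good=0.8, poor=0.7):
    """Summarise q_f (18O labelling efficiency) for rows with Vs > min_vs."""
    kept = [r for r in rows if r["Vs"] > min_vs]
    if not kept:
        raise ValueError("no rows pass the Vs filter")
    n = len(kept)
    return {
        "n": n,
        # should be close to 1 in a well-labelled experiment
        "fraction_above_good": sum(r["q_f"] > good for r in kept) / n,
        # a sizeable fraction here points to poor labelling
        "fraction_below_poor": sum(r["q_f"] < poor for r in kept) / n,
    }


rows = [
    {"Vs": 50, "q_f": 0.95},
    {"Vs": 40, "q_f": 0.90},
    {"Vs": 35, "q_f": 0.65},
    {"Vs": 10, "q_f": 0.20},  # removed by the Vs filter
]
print(labelling_summary(rows))
```

These fractions complement the plot rather than replace it: the curvature of low-q_f points towards high Xs is only visible graphically.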

Part II: Analysing an experiment from scratch

Generating the files

Generate the QuiXML file containing the list of identified peptides. You can do this in different ways:

  • if you identified peptides with SEQUEST (including SEQUEST run within Proteome Discoverer), you can use pRatio on the .msf files (or on the .srf files, if you use an older version of SEQUEST)
  • if you identified peptides with Mascot, you can convert the .dat results file using MascotToQuiXML
  • if you used another program, you will need to convert your data into a tab-separated text file and then parse the resulting table using CSVToQuiXML

You will also need the binStack folder, containing the binary files with the spectral information. It can be generated in different ways:

  • if you used Thermo RAW files, you can convert the spectra using RawToBinStack
  • if you used Mascot generic files (mgf), you can convert the spectra using mgfToBinStack
  • if your spectra are stored in a different file format, use an external converter to obtain an mgf file, and then use mgfToBinStack

Performing the first statistics

Enter an initial set of statistical parameters (k and variances) for the null-hypothesis model using the change_values link. A list of typical values for these parameters is available. Make an initial estimate of the variances by pressing the var calc button. At this step you must tell QuiXoT which columns to use as Xs and Vs (this is useful for multichannel labelling approaches such as iTRAQ, whose data contain several Xs and Vs values depending on which labels are compared). Accept the newly calculated variances and run the statistical analysis by pressing the stats button.

Inspecting spectra and peptides

A high-resolution spectrum from an 18O-labelled experiment. Notice the light species (the four peaks on the left) and the heavy species (the 5th to 8th peaks). The theoretical peaks are shown in red and the original, experimental spectrum in blue. A contaminant (or perhaps another, less abundant peptide) is visible on the right side of the spectrum; it appears only in blue, as it does not match a theoretical spectrum.

Inspect the data for outliers at the scan and peptide levels by pressing the graphs button and plotting Ws (or Wp) as X against Xs (or Xp) as Y, to check whether outliers are influencing the variance calculation. Sort the data by FDRs (or FDRp) and check the rows with low FDR values (rows below 0.05 are statistically considered outliers). A negligible proportion of outliers (less than 1% of the total) is typical and normal. However, a high number of outliers may indicate quantification artefacts and/or problems in the labelling protocol.
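The outlier count can also be computed directly from the FDR column. This is a hypothetical Python sketch, assuming rows exported as dictionaries keyed by `FDRs` (or `FDRp` for peptides); `outlier_report` is an illustrative name. It uses the 0.05 cut-off and the 1% rule of thumb given above.

```python
def outlier_report(rows, fdr_field="FDRs", threshold=0.05, max_fraction=0.01):
    """Return (outliers sorted by FDR, outlier fraction, too_many flag)."""
    if not rows:
        return [], 0.0, False
    outliers = sorted(
        (r for r in rows if r[fdr_field] < threshold),
        key=lambda r: r[fdr_field],
    )
    fraction = len(outliers) / len(rows)
    # More than ~1% outliers suggests artefacts or labelling problems
    return outliers, fraction, fraction > max_fraction


rows = [{"FDRs": 0.5} for _ in range(199)] + [{"FDRs": 0.01}]
outliers, fraction, too_many = outlier_report(rows)
print(len(outliers), fraction, too_many)  # 1 0.005 False
```

Pass `fdr_field="FDRp"` to run the same check at the peptide level.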

Artefacts at the scan level are rare and may be caused by:

a) problems in mass calibration (spectra cannot be fitted to the theoretical mass envelope)
b) excessive noise and/or fluctuations in the detector
c) inadequate fitting parameters in the configuration files.

Artefacts at the peptide level, however, are much more frequent when peptides are labelled after digestion (which excludes SILAC). They include:

a) incomplete digestion of one of the samples (this can easily be checked by selecting peptide subpopulations with the st_PartialDig field and the filter tool in Vs versus Xp plots)
b) non-homogeneous methionine oxidation in the samples being compared (this can easily be checked by filtering on the st_Meth field)
c) partially labelled peptides (with 18O labelling, this is indicated by q_f)
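Checks (a) and (b) both amount to comparing peptide subpopulations. As a hypothetical Python sketch, assuming rows exported as dictionaries with an `Xp` column and a flag column such as `st_PartialDig` or `st_Meth` (the flag values shown are assumptions), you can compute the mean Xp of each subpopulation; a flagged group whose mean is clearly shifted from the rest suggests the corresponding artefact.

```python
from collections import defaultdict
from statistics import mean


def mean_xp_by_flag(rows, flag_field):
    """Group rows by the value of flag_field and return the mean Xp per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[flag_field]].append(r["Xp"])
    return {flag: mean(xs) for flag, xs in sorted(groups.items())}


# Hypothetical flag values: 0 = fully digested, 1 = partial digestion
rows = [
    {"st_PartialDig": 0, "Xp": 0.1},
    {"st_PartialDig": 0, "Xp": -0.1},
    {"st_PartialDig": 1, "Xp": 1.0},
    {"st_PartialDig": 1, "Xp": 1.2},
]
print(mean_xp_by_flag(rows, "st_PartialDig"))
```

A mean alone ignores weights; the Vs versus Xp plot described above remains the primary check.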

If any of these artefacts are encountered, outliers should be eliminated from variance calculation or statistics by using the filter tool.

Further inspection of proteins showing significant expression changes (low FDRq values) is recommended at this step, since keratins and other external contaminants such as trypsin may not be balanced between the two samples and can introduce artefactual variance at the protein level. Exclude all quantifications related to these contaminants from the statistics by applying an appropriate filter (see applying filters to the data).
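A contaminant filter is essentially a keyword match on the protein annotation. This is a hypothetical Python sketch: the column name `description` and the keyword list are assumptions, so replace them with the actual protein-description column of your QuiXML file and the contaminants you observe.

```python
# Hypothetical keyword list; extend it with the contaminants found in your data.
CONTAMINANT_KEYWORDS = ("keratin", "trypsin")


def drop_contaminants(rows, field="description", keywords=CONTAMINANT_KEYWORDS):
    """Remove rows whose protein description mentions a contaminant keyword."""
    def is_contaminant(row):
        text = row[field].lower()
        return any(k in text for k in keywords)

    return [r for r in rows if not is_contaminant(r)]


rows = [
    {"description": "Keratin, type II cytoskeletal 1"},
    {"description": "Trypsin precursor"},
    {"description": "Actin, cytoplasmic 1"},
]
print(drop_contaminants(rows))  # [{'description': 'Actin, cytoplasmic 1'}]
```

Substring matching is deliberately broad (it also catches, for example, "trypsinogen"), so review what it removes before rerunning the statistics.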

Variance calculation

Recalculate the variances (var calc button), accept the resulting values and repeat the statistics (stats button). Then check the null hypothesis against the data: press the graphs button, select Zs, Zp or Zq as X values and choose the sigmoidal normality plot option to check the null distribution at the scan, peptide or protein level, respectively. If the k constant and variances are properly calculated, these data (blue line) should follow the sigmoid of a normal distribution with mean zero and standard deviation one (red line). Deviations of the blue curve from the red curve indicate that the null hypothesis is not valid for the data. There are different kinds of deviation:

  • If the blue curve is less steep (lower slope) than the red curve, the variance has been underestimated.
  • If the blue curve is steeper (higher slope) than the red curve, the variance has been overestimated.
  • If the blue curve agrees with the red curve in the middle but lies higher at low values and/or lower at high values, this may indicate the presence of outliers.

Although the variances can be adjusted manually (using the change_values link) until the experimental and theoretical curves agree, these deviations usually indicate contaminants, artefacts or outliers that disturb normality and make variance estimation inaccurate. It is therefore advisable to investigate the underlying problem before adjusting the variances manually.