Difference between revisions of "Unit tests for QuiXoT"

From PROTEOMICA

Revision as of 09:57, 26 April 2017

We present here four experiments you may use to check QuiXoT. They are mainly presented to test whether the software is working as expected on your machine, but you can also use them as a practical introduction. Once you have confirmed this, you can also explore other QuiXoT features.

Test 1: 18O quantification and statistical analysis from Proteome Discoverer results

This first test uses several programs of the QuiXoT software package, such as pRatio or RAWToBinStack. In case you cannot use some of them (for example because you do not have Xcalibur installed), but you want to test only QuiXoT, you can skip this test and check tests 2, 3, and 4, which do not require satellite programs.

1) Download the zip with the files from here.

2) After unzipping the files, you should have three folders: dir, inv, and raw.

  • The dir folder should contain an XML file called modifications.xml and an MSF file called 110112_VSMC_EN_OG21.msf (the latter being an SQLite file containing the identifications of the target SEQUEST search from Proteome Discoverer 1.4; the decoy search is in the inv folder).
  • The inv folder should contain an MSF file called 110112_VSMC_EN_OG21-01.msf (an SQLite file containing the corresponding decoy SEQUEST search).
  • The raw folder should contain a Thermo RAW file called 110112_VSMC_EN_OG21.RAW, containing all the spectral information saved by the spectrometer.
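If you want to confirm the layout above before opening pRatio, a small script along the following lines can help. It is only a sketch in Python and is not part of the QuiXoT package; the function names missing_files and looks_like_sqlite are hypothetical. The header check relies on the fact that every SQLite 3 database file starts with the 16-byte string "SQLite format 3" followed by a null byte.

```python
from pathlib import Path

# Expected layout after unzipping, as listed in step 2.
EXPECTED_FILES = [
    Path("dir") / "modifications.xml",
    Path("dir") / "110112_VSMC_EN_OG21.msf",
    Path("inv") / "110112_VSMC_EN_OG21-01.msf",
    Path("raw") / "110112_VSMC_EN_OG21.RAW",
]

# Every SQLite 3 database file begins with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def missing_files(root):
    """Return the expected files that are absent under root."""
    root = Path(root)
    return [str(rel) for rel in EXPECTED_FILES if not (root / rel).is_file()]


def looks_like_sqlite(path):
    """Return True if the file starts with the SQLite 3 header."""
    with open(path, "rb") as handle:
        return handle.read(16) == SQLITE_MAGIC
```

Calling missing_files on the folder where you unzipped the download should return an empty list, and looks_like_sqlite should return True for both MSF files.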

3) After opening pRatio, drag and drop the dir and inv folders in the Target search and Decoy search fields, respectively. Press the button Run!.

4) When pRatio finishes, seven files should have been saved in the dir folder: four tab-separated text files (with an XLS extension), an XML file (the QuiXML file), and two TXT files.
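The file count in step 4 can be checked with a short script. This is a sketch under two assumptions that are not stated on this page: that the seven files are written in addition to the two input files already present in the dir folder, and that pRatio writes nothing else there. The names output_counts and matches_expected are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Input files that were already in the dir folder before running pRatio.
INPUT_FILES = {"modifications.xml", "110112_VSMC_EN_OG21.msf"}

# Output described in step 4: four XLS, one XML (the QuiXML) and two TXT files.
EXPECTED_COUNTS = {".xls": 4, ".xml": 1, ".txt": 2}


def output_counts(folder):
    """Count files by lower-cased extension, ignoring the known input files."""
    counts = Counter()
    for path in Path(folder).iterdir():
        if path.is_file() and path.name not in INPUT_FILES:
            counts[path.suffix.lower()] += 1
    return dict(counts)


def matches_expected(folder):
    """Return True if the folder holds exactly the output described in step 4."""
    return output_counts(folder) == EXPECTED_COUNTS
```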

5) Now we need to generate the binStack folder, extracting the information from all spectra. Using Thermo RAW files, you just need to open RAWToBinStack, and then drag and drop three different files:

  • The QuiXML generated in step #4
  • The QuiXML schema you are going to use to add the quantitative and statistical information. This is the XSD file within QuiXoT's conf folder, corresponding to the quantification method you are going to use. In this example, this is an 18O experiment using a high resolution mass spectrometer (Orbitrap), so you will need to drag and drop the file identifications_schema_18O_HR.xsd.
  • The folder where the RAW files are. In this example, this is the raw folder, containing the file 110112_VSMC_EN_OG21.RAW (note that you must drag and drop the folder, not just the file).

6) Fill the remaining information:

  • Spectrum type: the type of the spectrum where the quantitative information is. In the case of 18O this information is in the Full or ZoomScan; as this is a high resolution spectrometer, we should select the Full spectrum (ZoomScan is a type of scan used in low resolution machines). Note that for other techniques, such as iTRAQ, the quantitative information is in the MSMS spectra instead.
  • The position of the spectra containing quantitative information, relative to the spectra used to identify the peptide. For 18O experiments, this should be set to previous, as for these strategies the quantitative information is in the first Full scan prior to the identification (first a Full scan is taken, and then each of the most intense peaks is selected for fragmentation and identification). Note that other strategies, such as iTRAQ, are quantified in the same spectrum where peptides are identified.
  • Usually, importing the whole spectrum leads to huge files that are difficult to manage, so, for 18O experiments, we recommend checking the option import only window around parental mz with 12 m/z (which should be enough to cover the 4 isotopologues of the non-labelled feature + 4 isotopologues of the labelled feature + 4 more m/z to get some context for possible artefacts). Other strategies such as iTRAQ would require checking import between these mzs (for iTRAQ 8plex, importing between 112 and 122 should be enough to cover the 113-121 range).
  • In this example, we don't need advanced options, so we leave the average spectra feature unchecked.
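The two import options above amount to keeping only the peaks inside an m/z range. The following Python sketch illustrates the idea; it is not RAWToBinStack's code, the function names are hypothetical, and the peak lists are invented. In particular, centring the 12 m/z window on the parental m/z is an assumption made for this sketch, as the page does not say how the window is placed.

```python
# Window size for 18O as reasoned in the text:
# 4 unlabelled isotopologues + 4 labelled isotopologues + 4 m/z of context.
WINDOW_18O = 4 + 4 + 4


def peaks_between(peaks, low_mz, high_mz):
    """Keep (m/z, intensity) pairs with low_mz <= m/z <= high_mz."""
    return [(mz, inten) for mz, inten in peaks if low_mz <= mz <= high_mz]


def window_around(peaks, parent_mz, width=WINDOW_18O):
    """Keep peaks in a window of the given width around parent_mz.

    ASSUMPTION: the window is centred on the parental m/z.
    """
    half = width / 2.0
    return peaks_between(peaks, parent_mz - half, parent_mz + half)


# Invented MSMS peaks; the 112-122 range keeps the iTRAQ 8plex reporter region.
msms_peaks = [(100.0, 5), (113.1, 10), (121.1, 20), (130.0, 7)]
reporters = peaks_between(msms_peaks, 112, 122)
```

With the invented peaks above, reporters keeps only the entries at 113.1 and 121.1, which is why importing between 112 and 122 is enough for the 113-121 reporter range.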

7) Press create binStack. For this small example, the generation of the binStack should be fast (a few seconds), but for normal or large experiments it might take anything from a few minutes to several hours, depending on the experiment.

8) The output of the RAWToBinStack is:

  • A binStack folder, containing several BFR files and one index.idx file. As these are binary files, they cannot be read with text editors.
  • A *_QuiXML_bs.xml file, which is the same as the input QuiXML file, but containing indexation of spectra (it is written with a new name, instead of being overwritten, in order to prevent information loss in case the previous steps go wrong).

9) Now we can open this output file with QuiXoT: execute QuiXoT.exe, and then drag and drop the *_QuiXML_bs.xml file. Choose the strategy (for this example, 18O, HR, SEQUEST).

10) To quantify, select all the spectra (double click on the upper-left corner of the data grid, left of the headers), and click on the button quantitate. With this dataset the quantification should take less than a minute, but with other experiments this might take anything between a few seconds and several hours.

11) You can save the quantified results to compare your data with the results in the unit test: change the name by hand in the QuiXML File field (for example, changing it to VSMC_QuiXML_bs_quant.xml), and click on the write XML button.

12) Now we can perform the statistical analysis. We first need the variances and the calibration constant (K). We can either include previously estimated values (by clicking on change values), or calculate them from scratch.

13) To calculate them:

  • Click on the var calc button,
  • In Choose the field to be used as Xs, write q_log2Ratio (which contains the fold changes)
  • In Choose the field to be used as Vs, write Vs (which contains the weight associated to every fold-change)
  • Add K = 40 (as recommended for 18O, HR experiments, or alternatively calculate the calibration constant with an independent program)
  • In Filter, write
q_A > 0 and q_B > 0
(this ensures that both the labelled and the unlabelled peptide have been found). More complex filters might be needed for other experiments. For example, for spectra with lots of noise, which tend to be of bad quality when the peaks are not intense, it is good practice to filter out the corresponding low-weight quantifications by adding ... and Vs > 100. Some recommendations for the lower Vs threshold can be found here. As you can see, for 18O HR experiments we have not set a minimum, as for these experiments we had very little noise.
  • Press the OK button
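To make the effect of the filter in step 13 concrete, here is the same condition written as a Python predicate. This is only an illustration of the logic, not how QuiXoT evaluates filters; the function name and the example rows (with invented values) are hypothetical, while the column names q_A, q_B and Vs are the ones used above.

```python
def passes_variance_filter(row, min_vs=None):
    """Mirror of 'q_A > 0 and q_B > 0', optionally extended with 'and Vs > min_vs'."""
    ok = row["q_A"] > 0 and row["q_B"] > 0
    if min_vs is not None:
        ok = ok and row["Vs"] > min_vs
    return ok


# Invented example rows.
rows = [
    {"q_A": 1200.0, "q_B": 800.0, "Vs": 350.0},  # both quantities found
    {"q_A": 0.0, "q_B": 950.0, "Vs": 500.0},     # q_A is zero: rejected
    {"q_A": 300.0, "q_B": 150.0, "Vs": 40.0},    # both found, but low weight
]

kept_plain = [passes_variance_filter(r) for r in rows]
kept_with_threshold = [passes_variance_filter(r, min_vs=100) for r in rows]
```

The third row passes the plain filter but is rejected once the Vs > 100 threshold is added, which is the behaviour you would want for noisy spectra.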

14) You should obtain, using this example, these results:

  • sigma2s = 0.1692
  • sigma2p = 0.0029
  • sigma2q = 0
  • Once you have checked them, press the use values button.
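If you prefer to compare your numbers with the reference values of step 14 programmatically, a sketch like the following can be used. The function name compare_variances is hypothetical, and the tolerance (half a unit of the last printed digit) is an assumption for this sketch, since the page does not state how closely the values should match.

```python
import math

# Reference values listed in step 14.
EXPECTED = {"sigma2s": 0.1692, "sigma2p": 0.0029, "sigma2q": 0.0}


def compare_variances(obtained, expected=EXPECTED, abs_tol=5e-5):
    """Return the names whose obtained value is missing or differs by more than abs_tol."""
    mismatches = []
    for name, reference in expected.items():
        value = obtained.get(name, float("nan"))  # NaN never compares as close
        if not math.isclose(value, reference, rel_tol=0.0, abs_tol=abs_tol):
            mismatches.append(name)
    return mismatches
```

An empty list means that all three variances agree with the reference values within the chosen tolerance.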

15) Now we need to calculate the statistics. There are several options, but for the simplest case, just press load previous columns and load previous filter (the columns and filters used to calculate the variances will appear), and then press calculate statistics.

16) You will see that the datagrid has been filled with the Xs, Xp and Xq (spectrum, peptide and protein fold-changes, respectively), the Ws, Wp and Wq (the corresponding statistical weights), and further information that is described here.

17) You can save this file to compare your results with ours. Change the filename by hand to VSMC_QuiXML_bs_quant_stats.xml and click on write XML.

18) This unit test is finished, but you can explore more QuiXoT features at exploring QuiXoT features.


Test 2: 18O quantification and statistical analysis starting from QuiXML file + binStack

In the following examples, we start with QuiXML/binStack files, so we can skip some of the initial steps.

1) Download the zip with the files from here.

2) Open QuiXoT.exe, and drag and drop the O18_HR_bs.xml file on the program window to open it (keep in mind that, if you move this file, you also have to move the binStack folder if you want to see or use the spectra). Choose the strategy (for this example, 18O, HR, SEQUEST).

3) To quantify, select all the spectra (double click on the upper-left corner of the data grid, left of the headers), and click on the button quantitate. With this dataset the quantification should take about five minutes, but with other experiments this might take anything between a few seconds and several hours.

4) You can save the quantified results to compare your data with the results in the unit test: change the name by hand in the QuiXML File field (for example, changing it to O18_HR_bs_quant.xml), and click on the write XML button.

5) Now we will make the statistical analysis.

  • In this (simpler) experiment, the calibration constant (K) and variances have been prefilled, and some outliers have been prelabelled, so you can skip the variance calculation step (you can also calculate a new one if you prefer).
  • Hence, we can calculate the statistics directly by:
  • pressing the stats button, and then pressing load previous columns (q_log2Ratio and Vs),
  • pressing load previous filter. It should be
st_Cterm = 0 and q_f > 0.6 and q_f < 1.1 and (q_A > 0 or q_B > 0) and Vs > 135 and label4 not like '%s_%' and label4 not like '%p_%'
We will break this filter down (find more information at DataGrid information in QuiXoT):
  • st_Cterm = 0: this is to include only peptides that are not protein C-terminal (as most C-terminal peptides are not labelled, and lead to quantification artefacts).
  • q_f > 0.6 and q_f < 1.1: to ensure the peptide has been labelled properly (with labelling efficiency between 0.6 and 1.1).
  • (q_A > 0 or q_B > 0): at least one sample has been labelled.
  • Vs > 135: a lower threshold of 135 for the spectrum weights (to discard quantifications with bad fit).
  • label4 not like '%s_%': to exclude pre-labelled spectrum-to-peptide outliers.
  • label4 not like '%p_%': to exclude pre-labelled peptide-to-protein outliers.
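The breakdown above can also be read as a single predicate. The following Python sketch mirrors the Test 2 filter term by term; it is an illustration, not QuiXoT's implementation, and the function name, example values and example label are hypothetical. One assumption deserves attention: here % is treated as "any run of characters" and the underscore is matched literally, whereas in standard SQL LIKE an underscore would match any single character.

```python
def passes_test2_filter(row):
    """Mirror of the Test 2 filter, one condition per line."""
    label = row.get("label4") or ""  # treat a missing label as empty
    return (
        row["st_Cterm"] == 0                      # not a protein C-terminal peptide
        and 0.6 < row["q_f"] < 1.1                # labelling efficiency in range
        and (row["q_A"] > 0 or row["q_B"] > 0)    # at least one sample quantified
        and row["Vs"] > 135                       # minimum spectrum weight
        and "s_" not in label                     # ASSUMPTION: literal underscore
        and "p_" not in label                     # ASSUMPTION: literal underscore
    )


# Invented example rows.
good = {"st_Cterm": 0, "q_f": 0.95, "q_A": 1000.0, "q_B": 0.0, "Vs": 200.0, "label4": ""}
c_terminal = dict(good, st_Cterm=1)
low_weight = dict(good, Vs=135.0)          # 135 is not greater than 135
outlier = dict(good, label4="s_example")   # hypothetical outlier label
```

Only the first row passes; each of the other three fails exactly one of the conditions, which makes it easy to see what every term of the filter removes.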

6) You will see that the datagrid has been filled with the Xs, Xp and Xq (spectrum, peptide and protein fold-changes, respectively), the Ws, Wp and Wq (the corresponding statistical weights), and further information that is described here.

7) You can save this file to compare your results with ours. Change the filename by hand to O18_HR_bs_quant_stats.xml and click on write XML.


Test 3: iTRAQ statistical analysis starting from QuiXML file + binStack

As this test is very similar to Test 2, the parts that are different are highlighted in green colour for comparison.

1) Download the zip with the files from here.

2) Open QuiXoT.exe, and drag and drop the iTRAQ_4plex_Mascot_114vs116_bs.xml file on the program window to open it (keep in mind that, if you move this file, you also have to move the binStack folder if you want to see or use the spectra). Choose the strategy (for this example, iTRAQ, 4plex, Mascot).

3) To quantify, select all the spectra (double click on the upper-left corner of the data grid, left of the headers), and click on the button quantitate. With this dataset the quantification should take about one minute, but with other experiments this might take anything between a few seconds and several hours.

4) You can save the quantified results to compare your data with the results in the unit test: change the name by hand in the QuiXML File field (for example, changing it to iTRAQ_4plex_Mascot_114vs116_bs_quant.xml), and click on the write XML button.

5) Now we will make the statistical analysis.

  • In this (simpler) experiment, the calibration constant (K) and variances have been prefilled, and some outliers have been prelabelled, so you can skip the variance calculation step (you can also calculate a new one if you prefer).
  • Hence, we can calculate the statistics directly by:
  • pressing the stats button, and then pressing load previous columns (q_Xs_114_116 and q_Vs_114_116). Note that you can analyse other iTRAQ comparisons using the same pattern. To avoid too many columns, the reverse comparisons are omitted, but their result should be identical apart from the sign of the fold changes; if you nevertheless want a comparison that is not present, you can edit the "identifications_schema_iTRAQ_4plex_Mascot.xsd" file in the conf folder.
  • pressing load previous filter. It should be
q_fittedMass_114 > 0 and q_fittedMass_116 > 0 and q_Vs_114_116 > 155
and label4 not like '%s_%' and label4 not like '%p_%'
We will break this filter down (find more information at DataGrid information in QuiXoT):
  • q_fittedMass_114 > 0 and q_fittedMass_116 > 0: both samples have been labelled.
  • q_Vs_114_116 > 155: a lower threshold of 155 for the spectrum weights (to discard quantifications with bad fit).
  • label4 not like '%s_%': to exclude pre-labelled spectrum-to-peptide outliers.
  • label4 not like '%p_%': to exclude pre-labelled peptide-to-protein outliers.
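The remark that a reverse comparison only changes the sign of the fold changes follows from the properties of logarithms: log2(a/b) = -log2(b/a). A short worked example in Python, with invented intensities and a hypothetical helper name; which channel is the numerator in q_Xs_114_116 is not stated on this page, so the example only shows the symmetry.

```python
import math


def log2_ratio(numerator, denominator):
    """Log2 fold change between two positive intensities."""
    return math.log2(numerator / denominator)


# Invented reporter intensities for two channels.
intensity_a, intensity_b = 1500.0, 1000.0

forward = log2_ratio(intensity_a, intensity_b)
reverse = log2_ratio(intensity_b, intensity_a)

# Swapping the two channels flips the sign and keeps the magnitude.
same_magnitude = math.isclose(forward, -reverse)
```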

6) You will see that the datagrid has been filled with the Xs, Xp and Xq (spectrum, peptide and protein fold-changes, respectively), the Ws, Wp and Wq (the corresponding statistical weights), and further information that is described here.

7) You can save this file to compare your results with ours. Change the filename by hand to iTRAQ_4plex_Mascot_114vs116_bs_quant_stats.xml and click on write XML.

Test 4: SILAC statistical analysis starting from QuiXML file + binStack

As this test is very similar to Test 2, the parts that are different are highlighted in blue colour for comparison.

1) Download the zip with the files from here.

2) Open QuiXoT.exe, and drag and drop the SILAC_HR_bs.xml file on the program window to open it (keep in mind that, if you move this file, you also have to move the binStack folder if you want to see or use the spectra). Choose the strategy (for this example, SILAC, HR, SEQUEST).

3) To quantify, select all the spectra (double click on the upper-left corner of the data grid, left of the headers), and click on the button quantitate. With this dataset the quantification should take about 20 minutes, but with other experiments this might take anything between a few seconds and several hours.

4) You can save the quantified results to compare your data with the results in the unit test: change the name by hand in the QuiXML File field (for example, changing it to SILAC_HR_bs_quant.xml), and click on the write XML button.

5) Now we will make the statistical analysis.

  • In this (simpler) experiment, the calibration constant (K) and variances have been prefilled, and some outliers have been prelabelled, so you can skip the variance calculation step (you can also calculate a new one if you prefer).
  • Hence, we can calculate the statistics directly by:
  • pressing the stats button, and then pressing load previous columns (q_log2Ratio and Vs),
  • checking the SILAC: correction based on the Arg --> Pro conversion box, to include an internal statistical correction for the arginine-to-proline conversion artefact common in SILAC, which is due to arginine catabolism via the arginase pathway [1],
  • pressing load previous filter. It should be
q_A > 0 and q_B > 0 and not (charge = 1 and q_deltaR > 6) and not (charge = 2 and q_deltaR > 12)
and not (charge = 3 and q_deltaR > 18) and label4 not like '%s_%' and label4 not like '%p_%' and Vs > 100
We will break this filter down (find more information at DataGrid information in QuiXoT):
  • st_Cterm = 0: this is to include only peptides that are not protein C-terminal (as most C-terminal peptides are not labelled, and lead to quantification artefacts).
  • q_f > 0.6 and q_f < 1.1: to ensure the peptide has been labelled properly (with labelling efficiency between 0.6 and 1.1).
  • q_A > 0 and q_B > 0: both samples have been labelled.
  • not (charge = 1 and q_deltaR > 6): to remove from statistics mislabelled peptides with charge 1
  • not (charge = 2 and q_deltaR > 12): to remove from statistics mislabelled peptides with charge 2
  • not (charge = 3 and q_deltaR > 18): to remove from statistics mislabelled peptides with charge 3
  • Vs > 100: a lower threshold of 100 for the spectrum weights (to discard quantifications with bad fit).
  • label4 not like '%s_%': to exclude pre-labelled spectrum-to-peptide outliers.
  • label4 not like '%p_%': to exclude pre-labelled peptide-to-protein outliers.
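The charge-dependent part of the SILAC filter is easier to read as a lookup table: the q_deltaR limit is 6, 12 and 18 for charges 1, 2 and 3. The sketch below mirrors the filter string given above (which is why it has no st_Cterm or q_f terms); it is an illustration with hypothetical names and invented rows, not QuiXoT's code. As the filter only mentions charges 1 to 3, other charges are left unrestricted here, and the underscore in the label4 patterns is assumed to be matched literally.

```python
# Upper q_deltaR limit per charge state, taken from the filter string.
MAX_DELTA_R = {1: 6, 2: 12, 3: 18}


def passes_test4_filter(row):
    """Mirror of the Test 4 (SILAC) filter string."""
    label = row.get("label4") or ""  # treat a missing label as empty
    limit = MAX_DELTA_R.get(row["charge"])
    # 'not (charge = z and q_deltaR > limit)' for z in 1, 2, 3.
    delta_ok = limit is None or row["q_deltaR"] <= limit
    return (
        row["q_A"] > 0 and row["q_B"] > 0   # both samples quantified
        and delta_ok
        and "s_" not in label               # ASSUMPTION: literal underscore
        and "p_" not in label               # ASSUMPTION: literal underscore
        and row["Vs"] > 100                 # minimum spectrum weight
    )


# Invented example rows.
base = {"charge": 2, "q_deltaR": 5.0, "q_A": 10.0, "q_B": 20.0, "Vs": 150.0, "label4": ""}
too_far_for_2 = dict(base, q_deltaR=12.5)            # rejected for charge 2
fine_for_3 = dict(base, charge=3, q_deltaR=12.5)     # accepted for charge 3
```

The same q_deltaR of 12.5 is rejected for a doubly charged peptide but accepted for a triply charged one, which shows how the limit scales with charge.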

6) You will see that the datagrid has been filled with the Xs, Xp and Xq (spectrum, peptide and protein fold-changes, respectively), the Ws, Wp and Wq (the corresponding statistical weights), and further information that is described here.

7) You can save this file to compare your results with ours. Change the filename by hand to SILAC_HR_bs_quant_stats.xml and click on write XML.


References

  1. Shao-En Ong, Irina Kratchmarova, and Matthias Mann, «Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC)», Journal of Proteome Research (2003). DOI: 10.1021/pr0255708