Unit tests for SanXoT
We present here three unit tests for Windows that can be performed using the following programs in the SanXoT software package:
- Aljamia (data parser)
- Klibrate (weight calibrator)
- SanXoT (integration and variance calculation)
- SanXoTSieve (removal of outliers)
- Cardenio (merger of experiments)
- SanXoTSqueezer (category filter)
- SanXoTGauss (category cumulative gaussian graphs)
- Sanson (category clustering)
- Coordinometer (calculator of the degree of coordination)
For detailed explanations, check notes about the Unit test for SanXoT.
Test 1: The fundamental workflow
The fundamental workflow consists of the steps needed to quantify proteins from PSMs and quantitative information, based on the WSPP (Weighted Spectrum, Peptide and Protein) statistical model, as described[1], through the following steps:
- calibration of spectra
- integrating from spectra to peptides
- integrating from peptides to proteins
- quantifying proteins
This test makes use of four programs: Aljamia, Klibrate, SanXoT, and SanXoTSieve.
To run this unit test, follow these steps:
1) download the Windows executables, SanXoT.zip. Unzip them into a folder dedicated to the program.
2) download the files for the unit test, SanXoT_test1.zip. Unzip to a working folder. You should have four text files:
- commands_test1.bat, a Windows batch file with command lines to run this sample analysis.
- startingFile.xls, a tab-separated-values text file with identifications and quantitative data from a proteomics experiment.
3) get a) the path of your working folder (where you have unzipped startingFile.xls), b) the path of the folder where you have unzipped the Windows executables, and c) the drive letter of the latter (C:, D:, etc.); modify the following lines in the commands_test1.bat file accordingly:
set unit=D:
set programFolder="D:\SSP"
set workingFolder="D:\SanXoT_test1"
4) to execute the commands_test1.bat file, copy and paste the whole text of commands_test1.bat into a command prompt window (you can also just double-click the .bat file; however, if an error arises, the CMD window will close immediately, so the text of the error will not be available).
5) wait until it finishes. It should take 30-60 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).
6) compare your results with the data in the results folder of SanXoT_test1_results.zip (for example, using comparison software such as Beyond Compare). Only the files ending in *_log.txt and *_infoFile.txt should differ, and only in the file paths and the timestamp; everything else (variances, Levenberg-Marquardt steps, options) should be identical.
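The comparison in step 6 can also be scripted. The following is a minimal Python sketch, not part of the SanXoT package: the function name and the folder arguments are hypothetical, and it assumes that the reference results sit as plain files in a single folder. It walks the reference folder and reports every file that is missing from your working folder or whose content differs, skipping the two file-name patterns that step 6 says are expected to differ.

```python
import filecmp
from fnmatch import fnmatch
from pathlib import Path

# File-name patterns that step 6 lists as legitimately different between runs.
EXPECTED_TO_DIFFER = ("*_log.txt", "*_infoFile.txt")


def unexpected_differences(mine, reference, allowed=EXPECTED_TO_DIFFER):
    """Return sorted names of reference files that are missing or differ in `mine`.

    Hypothetical helper: `mine` is your working folder, `reference` is the
    unzipped results folder. Only files directly inside `reference` are checked.
    """
    mine, reference = Path(mine), Path(reference)
    problems = []
    for ref_file in sorted(p for p in reference.iterdir() if p.is_file()):
        name = ref_file.name
        if any(fnmatch(name, pattern) for pattern in allowed):
            continue  # paths and timestamps are expected to differ in these
        my_file = mine / name
        if not my_file.is_file():
            problems.append(name)  # file was not produced by your run
        elif not filecmp.cmp(my_file, ref_file, shallow=False):
            problems.append(name)  # byte-level content differs
    return problems


# Example call (both paths are placeholders for your own folders):
# print(unexpected_differences(r"D:\SanXoT_test1", r"D:\SanXoT_test1_results\results"))
```

An empty list means the run reproduced the reference results. If the reference folder also contains files you edited by hand, such as the .bat file, add their names to the allowed patterns.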
Test 2: Experiment merging
Protein quantifications from different technical or biological replicates can be merged into experiment-independent quantifications, giving the biological and technical variance. In this test we
- use the data from Test 1
- perform the fundamental workflow for another parallel experiment
- merge both experiments
This test makes use of four programs: Klibrate, SanXoT, SanXoTSieve, and Cardenio.
To run this unit test, follow these steps:
1) use the Windows executables unzipped in step 1 of Test 1.
2) download the files for the unit test, SanXoT_test2.zip. This contains all the files generated in Test 1, with the addition of the tag file (for Cardenio) and commands_test2.bat.
3) remember to change the following lines in the commands_test2.bat file, as in step 3 for Test 1:
set unit=D:
set programFolder="D:\SSP"
set workingFolder="D:\SanXoT_test2"
4) execute the commands_test2.bat file (do the same as in step 4 in Test 1).
5) wait until it finishes. As in Test 1, this test should take 30-60 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).
6) as in Test 1, the only files that are not identical should be the merge_logFile.txt from Cardenio, and the *_infoFile.txt from SanXoT, SanXoTSieve and Klibrate.
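Step 3 is the same edit in every test: three `set` lines in the batch file have to point at your own folders. As an illustration only, the Python sketch below rewrites those lines in the text of a .bat file. The function name is hypothetical, and it assumes that each `set` statement sits on its own line, as shown in step 3. The paths in the example call are placeholders.

```python
import re

# Matches one of the three `set` lines; group 1 is everything up to and
# including "=", group 2 is the variable name.
_SET_LINE = re.compile(
    r"^([ \t]*(?i:set)[ \t]+(unit|programFolder|workingFolder)=)[^\r\n]*",
    flags=re.MULTILINE,
)


def patch_bat_settings(bat_text, unit, program_folder, working_folder):
    """Return bat_text with the three `set` lines pointing at the given locations.

    Hypothetical helper: folder values are wrapped in double quotes, as in
    the original batch files; all other lines are left untouched.
    """
    new_values = {
        "unit": unit,
        "programFolder": f'"{program_folder}"',
        "workingFolder": f'"{working_folder}"',
    }

    def replace(match):
        # Keep the "set name=" prefix and swap only the value.
        return match.group(1) + new_values[match.group(2)]

    return _SET_LINE.sub(replace, bat_text)


# Example call (placeholder paths):
# text = open("commands_test2.bat").read()
# text = patch_bat_settings(text, "C:", r"C:\tools\SanXoT", r"C:\data\SanXoT_test2")
```

A function is used as the replacement so that backslashes in Windows paths are inserted literally rather than being interpreted as regular-expression escapes.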
Test 3: Systems biology
The SanXoT Software Package can be used to perform systems biology analyses using the SBT (Systems Biology Triangle), as described[2]. In this test we:
- generate a relations file using data from the DAVID bioinformatics resource
- use the data from Test 2
- calculate a category-level fold-change
- perform the systems biology analysis.
This test makes use of seven programs: Camacho, SanXoT, SanXoTSieve, SanXoTSqueezer, Sanson, SanXoTGauss, and Coordinometer.
To run this test, follow these steps:
1) use the Windows executables unzipped in step 1 of Test 1.
2) download the files for the unit test, SanXoT_test3.zip. This contains all the files generated in Test 1 and Test 2, with the addition of SB_Homo_19dic-2017.txt (a set of data downloaded from DAVID), fastaHeadersNoSequences.fasta (not essential, but important to provide fasta headers as protein descriptors instead of accession numbers, making the results more human readable for the biological interpretation; the whole FASTA file used for the identification can be used, but here we have removed the amino acid sequences for file size reasons), and commands_test3.bat.
3) remember to change the following lines in the commands_test3.bat file, as in step 3 for Test 1 and Test 2:
set unit=D:
set programFolder="D:\SSP"
set workingFolder="D:\SanXoT_test3"
Additionally, if you want to get the similarity graph from the category clustering algorithm (Sanson), you will need:
- to have installed Graphviz (an open source external software)
- in the program folder, to edit the dot.ini file so that the following line contains the path to the Graphviz bin folder, which may be different for each user:
dotlocation = C:\Program Files (x86)\Graphviz2.36\bin
4) execute the commands_test3.bat file (do the same as in step 4 in Test 1).
5) wait until it finishes. This test should take 10-20 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).
6) as in Test 1 and Test 2, the only files that are not identical should be the *_logFile.txt from Camacho, SanXoTSqueezer, Sanson, SanXoTGauss and the Coordinometer, as well as the *_infoFile.txt from SanXoT and SanXoTSieve.
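If Sanson does not produce the similarity graph, a common cause is a dotlocation line that does not point at the Graphviz bin folder. The Python sketch below is one way to check this before running the test. It is not part of SanXoT: both function names are hypothetical, and it assumes that dot.ini uses plain `key = value` lines like the one shown in step 3 and that the Graphviz executable in that folder is named dot.exe (or dot).

```python
from pathlib import Path


def read_dotlocation(ini_text):
    """Return the value of the `dotlocation` line, or None if it is absent."""
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "dotlocation":
            return value.strip()
    return None


def graphviz_bin_looks_valid(ini_text):
    """Return True if the configured folder contains a `dot` executable.

    Assumption: the executable is named dot.exe on Windows (dot elsewhere).
    """
    location = read_dotlocation(ini_text)
    if not location:
        return False
    folder = Path(location)
    return any((folder / name).is_file() for name in ("dot.exe", "dot"))


# Example call (placeholder path to the program folder):
# print(graphviz_bin_looks_valid(Path(r"D:\SSP\dot.ini").read_text()))
```

A result of False means that either the dotlocation line is missing or the folder it names does not contain the Graphviz executable.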
References
- [1] Navarro, P., Trevisan-Herraz, M., Bonzon-Kulichenko, E., Nunez, E., Martinez-Acedo, P., Perez-Hernandez, D., Jorge, I., Mesa, R., Calvo, E., Carrascal, M., Hernaez, M.L., Garcia, F., Barcena, J.A., Ashman, K., Abian, J., Gil, C., Redondo, J.M. and Vazquez, J. (2014) General statistical framework for quantitative proteomics by stable isotope labeling. Journal of Proteome Research, 13, 1234-1247.
- [2] Garcia-Marques, F., Trevisan-Herraz, M., Martinez-Martinez, S., Camafeita, E., Jorge, I., Lopez, J.A., Mendez-Barbero, N., Mendez-Ferrer, S., Del Pozo, M.A., Ibanez, B., Andres, V., Sanchez-Madrid, F., Redondo, J.M., Bonzon-Kulichenko, E. and Vazquez, J. (2016) A novel systems-biology algorithm for the analysis of coordinated protein responses using quantitative proteomics. Molecular & Cellular Proteomics.