Difference between revisions of "Unit tests for SanXoT"

From PROTEOMICA
(Unit test for Unix/Linux operating systems)
 
(29 intermediate revisions by the same user not shown)
Line 11: Line 11:
 
* '''[[Coordinometer]]''' (calculator of the degree of coordination)
 
* '''[[Coordinometer]]''' (calculator of the degree of coordination)
  
For detailed explanations, check '''[[notes about the Unit test for SanXoT]]'''.
+
For detailed explanations, check: '''[[exploring SanXoT features]]'''.
 +
 
 +
==Unit test for Windows==
 +
 
 +
Links to SanXoT windows standalone executables are provided in this unit test. Additionally, download links to the source code and up-to-date versions of SanXoT are available at: '''[[SanXoT software package]]'''.
 +
 
 +
To facilitate the user experience, the Windows executables are ready to be used with no need to install anything (not even Python!).
 +
 
 +
==Unit test for Unix/Linux operating systems==
 +
 
 +
Although SanXoT has been mainly tested and used in Windows, SanXoT has been developed in Python. As a multi-platform language, these unit tests can be performed also in Unix/Linux operating systems. To do so, you just need to:
 +
 
 +
# Install Python 2.7.x,
 +
# Install [https://en.wikipedia.org/wiki/NumPy NumPy] and [https://en.wikipedia.org/wiki/SciPy SciPy],
 +
# Install [https://en.wikipedia.org/wiki/Graphviz Graphviz] (used for interpreting the file *.gv in [https://en.wikipedia.org/wiki/DOT_(graph_description_language) DOT language] generating the '''[[Sanson]]''' clustering graphs in Test 3),
 +
# Replace the *.bat files within the unit tests by the files contained here: '''[[File:SanXoT commands for Unix shell.zip]]'''.
 +
# Use the source code from GitHub ('''[https://github.com/CNIC-Proteomics/SanXoT https://github.com/CNIC-Proteomics/SanXoT]'''), rather than the standalone exes provided below.
 +
 
 +
Step #2 can be done by using the following pip commands:
 +
 
 +
pip install numpy
 +
pip install matplotlib
 +
pip install scipy
  
 
==Test 1: The fundamental workflow==
 
==Test 1: The fundamental workflow==
  
The fundamental workflow consists in the steps to quantify proteins using PSMs and quantitative information, following the steps:
+
The '''[[fundamental workflow]]''' consists of the steps to quantify proteins using PSMs and quantitative information, based on the WSPP (''Weighted Spectrum, Peptide and Protein'') statistical model, as described<ref>Navarro, P., Trevisan-Herraz, M., Bonzon-Kulichenko, E., Núñez, E., Martínez-Acedo, P., Pérez-Hernández, D., Jorge, I., Mesa, R., Calvo, E., Carrascal, M., Hernáez, M.L., García, F., Bárcena, J.A., Ashman, K., Abian, J., Gil, C., Redondo, J.M. and Vázquez, J. (2014) '''[https://pubs.acs.org/doi/abs/10.1021/pr4006958 General statistical framework for quantitative proteomics by stable isotope labeling]'''. ''Journal of proteome research'', 13, 1234-1247.</ref>, using the following steps:
  
1) calibration of spectra
+
* calibration of spectra
2) integrating from spectra to peptides
+
* integrating from spectra to peptides
3) integrating from peptides to proteins
+
* integrating from peptides to proteins
4) quantifying proteins
+
* quantifying proteins
  
 
This test makes use of four programs: '''[[Aljamia]]''', '''[[Klibrate]]''', '''[[SanXoT]]''', and '''[[SanXoTSieve]]'''.
 
This test makes use of four programs: '''[[Aljamia]]''', '''[[Klibrate]]''', '''[[SanXoT]]''', and '''[[SanXoTSieve]]'''.
Line 26: Line 48:
 
To run this unit test, follow these steps:
 
To run this unit test, follow these steps:
  
1) download the windows executables, '''[[SanXoT.zip]]'''. Unzip them in a folder specific for the program.
+
1) download the windows executables, '''[ftp://ftp.cnic.es/ftpsvc/pub/SanXoT.zip SanXoT.zip]'''. Unzip the whole content in a folder specific for the program.
2) download the files for the unit test, '''[[SanXoT_test1.zip]]'''. Unzip to a working folder. You should have four text files:
+
 
::* commands_test1.bat, a windows batch file with command lines to run this sample analysis.
+
2) download the files for the unit test, '''[ftp://ftp.cnic.es/ftpsvc/pub/SanXoT_test1.zip SanXoT_test1.zip]'''. Unzip to a working folder. You should have two folders, one with the input for this test, and another with the expected results. In the former, you will find two text files:
::* startingFile.xls, a tab-separated-values text file with identifications and quantitative data from a proteomics experiment.
+
 
3) Get a) the path of your working folder (where you have unzipped the startingFile.xls), b) the path of the folder where you have unzipped the windows executables, and c) the unit of the latter (C:, D:, etc); modify the following lines in the command_test1.bat file accordingly:
+
::* '''''commands_test1.bat''''', a windows batch file with command lines to run this sample analysis.
 +
::* '''''170415_Marga_GBS_iTRAQ_PSMs.txt''''', a tab-separated-values text file with identifications and quantitative data from a proteomics experiment<ref>Mateos-Hernández, L., et al. (2016) '''[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5342688/ Quantitative proteomics reveals Piccolo as a candidate serological correlate of recovery from Guillain-Barré syndrome]'''. ''Oncotarget'', 7(46): 74582–74591.</ref>.
 +
 
 +
3) get a) the path of your working folder (where you have unzipped the file '''''170415_Marga_GBS_iTRAQ_PSMs.txt'''''), b) the path of the folder where you have unzipped the windows executables, and c) the unit of the latter (C:, D:, etc); modify the following lines in the command_test1.bat file accordingly:
  
 
  set unit=D:
 
  set unit=D:
Line 36: Line 61:
 
  set workingFolder="D:\SanXoT_test1"
 
  set workingFolder="D:\SanXoT_test1"
  
4) execute the bat file, copy and paste the whole commands_test1.bat text into a [https://en.wikipedia.org/wiki/Cmd.exe command prompt window] (you can also just double click the bat file; however, if an error arises, the CMD window created will close immediately, so the text of the error will not be available).
+
4) execute the '''''commands_test1.bat''''' file, copy and paste (or drag and drop) the whole commands_test1.bat text into a [https://en.wikipedia.org/wiki/Cmd.exe command prompt window] (you can also just double click the bat file; however, if an error arises, the CMD window created will close immediately, so the text of the error will not be available).
 +
 
 
5) wait until it finishes. It should take 30-60 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).
 
5) wait until it finishes. It should take 30-60 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).
6) compare your results with the data in the results folder of SanXoT_test1_results.zip (for example using a comparison software such as [https://en.wikipedia.org/wiki/Beyond_Compare Beyond Compare]). Only the files ending in *_log.txt and *_infoFile.txt should have differences (and only due to the different path for files, and the timestamp, all other features being identical, such as variances, Levenberg-Marquardt steps, and options).
+
 
 +
6) compare your results with the data in folder SanXoT_test1_results, at SanXoT_test1.zip (for example using a comparison software such as [https://en.wikipedia.org/wiki/Beyond_Compare Beyond Compare]). Only the files ending in *_log.txt and *_infoFile.txt should have differences (and only due to the different path for files, and the timestamp, all other features being identical, such as variances, Levenberg-Marquardt steps, and options).
  
 
==Test 2: Experiment merging==
 
==Test 2: Experiment merging==
Line 44: Line 71:
 
Protein quantifications from different technical or biological replicates can be merged into experiment-independent quantifications, giving the biological and technical variance. In this test we
 
Protein quantifications from different technical or biological replicates can be merged into experiment-independent quantifications, giving the biological and technical variance. In this test we
  
1) use the data from Test 1
+
* use the resulting data from Test 1
2) perform the fundamental workflow for another parallel experiment
+
* perform the fundamental workflow for another parallel experiment
3) we merge both experiments
+
* we merge both experiments
  
 
This test makes use of four programs: '''[[Klibrate]]''', '''[[SanXoT]]''', '''[[SanXoTSieve]]''', and '''[[Cardenio]]'''.
 
This test makes use of four programs: '''[[Klibrate]]''', '''[[SanXoT]]''', '''[[SanXoTSieve]]''', and '''[[Cardenio]]'''.
Line 52: Line 79:
 
To run this unit test, follow these steps:
 
To run this unit test, follow these steps:
  
1) use the windows executables unzipped in step 1 of Test 1.
+
1) use the windows executables unzipped in step 1 of Test 1 ('''[ftp://ftp.cnic.es/ftpsvc/pub/SanXoT.zip SanXoT.zip]''').
2) download the files for the unit test, '''[[SanXoT_test2.zip]]'''. This contains all the files generated in Test 1, with the addition of the tag file (for '''[[Cardenio]]''').
+
 
3) remember to change the following lines in the command_test2.bat file, as in step 3 for Test 1:
+
2) download the files for the unit test, '''[ftp://ftp.cnic.es/ftpsvc/pub/SanXoT_test2.zip SanXoT_test2.zip]'''. This contains:
 +
 
 +
::* all the files generated in Test 1
 +
::* the '''''tagFile.txt''''' (for '''[[Cardenio]]''')
 +
::* the '''''commands_test2.bat'''''.
 +
 
 +
3) remember to change the following lines in the '''''command_test2.bat''''' file, as in step 3 for Test 1:
  
 
  set unit=D:
 
  set unit=D:
 
  set programFolder="D:\SSP"
 
  set programFolder="D:\SSP"
  set workingFolder="D:\unitTest"
+
  set workingFolder="D:\SanXoT_test2"
 +
 
 +
4) execute the '''''commands_test2.bat''''' file (do the same as in step 4 in Test 1).
  
4) execute the bat file (same as step 4 in Test 1).
 
 
5) wait until it finishes. As in Test 1, this test should take 30-60 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).
 
5) wait until it finishes. As in Test 1, this test should take 30-60 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).
6) As in Test 1, the only files that are not identical should be the merge_logFile.txt from Cardenio, and the *_infoFiles.txt from SanXoT, SanXoTSieve and Klibrate.
+
 
 +
6) as in Test 1, the only files that are not identical should be the merge_logFile.txt from Cardenio, and the *_infoFile.txt from SanXoT, SanXoTSieve and Klibrate.
  
 
==Test 3: Systems biology==
 
==Test 3: Systems biology==
 +
 +
The '''[[SanXoT software package]]''' can be used to perform systems biology analyses using the SBT (''Systems Biology Triangle'') as described.<ref>García-Marqués, F., Trevisan-Herraz, M., Martínez-Martínez, S., Camafeita, E., Jorge, I., Lopez, J.A., Méndez-Barbero, N., Méndez-Ferrer, S., del Pozo, M.A., Ibáñez, B., Andrés, V., Sánchez-Madrid, F., Redondo, J.M., Bonzon-Kulichenko, E. and Vázquez, J. (2016) '''[http://www.mcponline.org/content/15/5/1740.long A novel systems-biology algorithm for the analysis of coordinated protein responses using quantitative proteomics]'''. ''Molecular & Cellular Proteomics'', 15(5):1740-60.</ref> In this test we:
 +
 +
* generate a relations file using data from the '''[https://en.wikipedia.org/wiki/DAVID DAVID bioinformatics resource]''' (you can download yourself the data from them, but for your convenience we already included it)
 +
* use the resulting data from Test 2
 +
* calculate a category-level fold-change
 +
* make the analysis of the systems biology.
 +
 +
This test makes use of seven programs: '''[[Camacho]]''', '''[[SanXoT]]''', '''[[SanXoTSieve]]''', '''[[SanXoTSqueezer]]''', '''[[Sanson]]''', '''[[SanXoTGauss]]''', and '''[[Coordinometer]]'''.
 +
 +
To run this test, follow these steps:
 +
 +
1) use the windows executables unzipped in step 1 of Test 1 ('''[ftp://ftp.cnic.es/ftpsvc/pub/SanXoT.zip SanXoT.zip]''').
 +
 +
2) download the files for the unit test, '''[ftp://ftp.cnic.es/ftpsvc/pub/SanXoT_test3.zip SanXoT_test3.zip]'''. This contains:
 +
 +
::* all the files generated in Test 1 and Test 2
 +
::* the tab-separated text file '''''SB_Homo_19dic-2017.txt''''' (a set of data downloaded from DAVID)
 +
::* the '''''commands_test3.bat'''''.
 +
 +
3) remember to change the following lines in the '''''command_test3.bat''''' file, as in step 3 for Test 1 and Test 2:
 +
 +
set unit=D:
 +
set programFolder="D:\SSP"
 +
set workingFolder="D:\SanXoT_test3"
 +
 +
Additionally, if you want to get the similarity graph from the category clustering algorithm ('''[[Sanson]]'''), you will need:
 +
 +
* to have installed '''[https://en.wikipedia.org/wiki/Graphviz Graphviz]''' (an open source external software, which includes the interpreter for the [https://en.wikipedia.org/wiki/DOT_(graph_description_language) DOT language], used by Sanson to generate the graphs for the clusters)
 +
* from the program folder, modify the ''dot.ini'' file changing the following line to include the path to the ''bin''-folder of Graphviz, which could be different for each user:
 +
 +
dotlocation = C:\Program Files (x86)\Graphviz2.36\bin
 +
 +
4) execute the '''''commands_test3.bat''''' file (do the same as in step 4 in Test 1).
 +
 +
5) wait until it finishes. This test should take 10-20 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).
 +
 +
6) as in Test 1 and Test 2, the only files that are not identical should be the *_logFile.txt from Camacho, SanXoTSqueezer, Sanson, SanXoTGauss and the Coordinometer, as well as the *_infoFile.txt from SanXoT and SanXoTSieve. Additionally, if you use different versions of Graphviz (this test used Graphviz 2.36) you might find differences in the file sanson_simGraph.pdf (which is a pdf containing the final graph with the category clusters).
 +
 +
==References==
 +
 +
<references/>
 +
 +
[[Category:SanXoT software package]]
 +
[[Category:unit tests]]

Latest revision as of 16:42, 17 August 2018

We present here three unit tests for windows that can be performed using the following programs in the SanXoT software package:

For detailed explanations, check: exploring SanXoT features.

Unit test for Windows

Links to SanXoT windows standalone executables are provided in this unit test. Additionally, download links to the source code and up-to-date versions of SanXoT are available at: SanXoT software package.

To facilitate the user experience, the Windows executables are ready to be used with no need to install anything (not even Python!).

Unit test for Unix/Linux operating systems

Although SanXoT has been mainly tested and used on Windows, it is developed in Python. Because Python is a multi-platform language, these unit tests can also be performed on Unix/Linux operating systems. To do so, you just need to:

  1. Install Python 2.7.x,
  2. Install NumPy and SciPy,
  3. Install Graphviz (used for interpreting the file *.gv in DOT language generating the Sanson clustering graphs in Test 3),
  4. Replace the *.bat files within the unit tests by the files contained here: File:SanXoT commands for Unix shell.zip.
  5. Use the source code from GitHub (https://github.com/CNIC-Proteomics/SanXoT), rather than the standalone exes provided below.

Step #2 can be done by using the following pip commands:

pip install numpy
pip install matplotlib
pip install scipy
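
Before running the tests, it can help to confirm that the packages installed above are importable from the Python interpreter you intend to use. The following is a minimal sketch; the helper name missing_modules is hypothetical and is not part of SanXoT.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The package is not installed for this interpreter.
            missing.append(name)
    return missing
```

Calling missing_modules(["numpy", "scipy", "matplotlib"]) returns the packages that still need to be installed; an empty list means all three pip installs are visible to this interpreter.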

Test 1: The fundamental workflow

The fundamental workflow consists of the steps to quantify proteins using PSMs and quantitative information, based on the WSPP (Weighted Spectrum, Peptide and Protein) statistical model, as described[1], using the following steps:

  • calibration of spectra
  • integrating from spectra to peptides
  • integrating from peptides to proteins
  • quantifying proteins

This test makes use of four programs: Aljamia, Klibrate, SanXoT, and SanXoTSieve.

To run this unit test, follow these steps:

1) download the windows executables, SanXoT.zip. Unzip the whole content in a folder specific for the program.

2) download the files for the unit test, SanXoT_test1.zip. Unzip to a working folder. You should have two folders, one with the input for this test, and another with the expected results. In the former, you will find two text files:

  • commands_test1.bat, a windows batch file with command lines to run this sample analysis.
  • 170415_Marga_GBS_iTRAQ_PSMs.txt, a tab-separated-values text file with identifications and quantitative data from a proteomics experiment[2].

3) get a) the path of your working folder (where you have unzipped the file 170415_Marga_GBS_iTRAQ_PSMs.txt), b) the path of the folder where you have unzipped the Windows executables, and c) the drive letter (unit) of the latter (C:, D:, etc.); then modify the following lines in the commands_test1.bat file accordingly:

set unit=D:
set programFolder="D:\SSP"
set workingFolder="D:\SanXoT_test1"

4) execute the commands_test1.bat file: copy and paste (or drag and drop) the whole commands_test1.bat text into a command prompt window. You can also just double-click the bat file; however, if an error arises, the CMD window will close immediately, so the error text will not be available.

5) wait until it finishes. It should take 30-60 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).

6) compare your results with the data in the folder SanXoT_test1_results, inside SanXoT_test1.zip (for example using comparison software such as Beyond Compare). Only the files ending in *_log.txt and *_infoFile.txt should differ, and only in the file paths and the timestamp; everything else, such as variances, Levenberg-Marquardt steps, and options, should be identical.
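
If no comparison software is at hand, the check in step 6 can be approximated with a short script. This is a sketch under assumptions: the function name differing_files is hypothetical, the comparison is byte-for-byte, and the ignored suffixes are simply the ones named in step 6 (adjust them if your log files are named differently).

```python
import filecmp
import os

# Suffixes of files that are expected to differ (paths and timestamps).
IGNORED_SUFFIXES = ("_log.txt", "_infoFile.txt")


def differing_files(actual_dir, expected_dir, ignored_suffixes=IGNORED_SUFFIXES):
    """Return sorted names of expected files that are missing or differ."""
    problems = []
    for name in sorted(os.listdir(expected_dir)):
        expected_path = os.path.join(expected_dir, name)
        # Skip subfolders and the files that legitimately differ.
        if not os.path.isfile(expected_path) or name.endswith(ignored_suffixes):
            continue
        actual_path = os.path.join(actual_dir, name)
        if not os.path.isfile(actual_path):
            problems.append(name)
        elif not filecmp.cmp(actual_path, expected_path, shallow=False):
            # shallow=False forces a comparison of the file contents.
            problems.append(name)
    return problems
```

An empty list from differing_files(your_working_folder, expected_results_folder) means every non-log file matches the expected results. The ignored files still need a manual look to confirm that they differ only in paths and timestamps.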

Test 2: Experiment merging

Protein quantifications from different technical or biological replicates can be merged into experiment-independent quantifications, giving the biological and technical variance. In this test we:

  • use the resulting data from Test 1
  • perform the fundamental workflow for another parallel experiment
  • merge both experiments

This test makes use of four programs: Klibrate, SanXoT, SanXoTSieve, and Cardenio.

To run this unit test, follow these steps:

1) use the windows executables unzipped in step 1 of Test 1 (SanXoT.zip).

2) download the files for the unit test, SanXoT_test2.zip. This contains:

  • all the files generated in Test 1
  • the tagFile.txt (for Cardenio)
  • the commands_test2.bat.

3) remember to change the following lines in the commands_test2.bat file, as in step 3 for Test 1:

set unit=D:
set programFolder="D:\SSP"
set workingFolder="D:\SanXoT_test2"

4) execute the commands_test2.bat file (do the same as in step 4 in Test 1).

5) wait until it finishes. As in Test 1, this test should take 30-60 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).

6) as in Test 1, the only files that are not identical should be the merge_logFile.txt from Cardenio, and the *_infoFile.txt from SanXoT, SanXoTSieve and Klibrate.
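
Step 3 of each test edits the same three set lines by hand. As a sketch, the edit could be scripted as below. It assumes the bat file contains plain `set NAME=value` lines exactly as shown in step 3; the function name update_set_lines is hypothetical and not part of SanXoT.

```python
def update_set_lines(bat_text, values):
    """Return bat_text with `set NAME=...` lines replaced for names in values."""
    updated = []
    for line in bat_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("set ") and "=" in stripped:
            # Text between "set " and the first "=" is the variable name.
            name = stripped[4:].split("=", 1)[0].strip()
            if name in values:
                line = "set {0}={1}".format(name, values[name])
        updated.append(line)
    return "\n".join(updated)
```

Read the bat file into a string, pass a dictionary such as {"unit": "C:"} with your own values, and write the result back. Lines that are not listed in the dictionary are left untouched.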

Test 3: Systems biology

The SanXoT software package can be used to perform systems biology analyses using the SBT (Systems Biology Triangle) as described.[3] In this test we:

  • generate a relations file using data from the DAVID bioinformatics resource (you can download the data from DAVID yourself, but for your convenience it is already included)
  • use the resulting data from Test 2
  • calculate a category-level fold-change
  • perform the systems biology analysis.

This test makes use of seven programs: Camacho, SanXoT, SanXoTSieve, SanXoTSqueezer, Sanson, SanXoTGauss, and Coordinometer.

To run this test, follow these steps:

1) use the windows executables unzipped in step 1 of Test 1 (SanXoT.zip).

2) download the files for the unit test, SanXoT_test3.zip. This contains:

  • all the files generated in Test 1 and Test 2
  • the tab-separated text file SB_Homo_19dic-2017.txt (a set of data downloaded from DAVID)
  • the commands_test3.bat.

3) remember to change the following lines in the commands_test3.bat file, as in step 3 for Test 1 and Test 2:

set unit=D:
set programFolder="D:\SSP"
set workingFolder="D:\SanXoT_test3"

Additionally, if you want to get the similarity graph from the category clustering algorithm (Sanson), you will need:

  • to have installed Graphviz (an open source external software, which includes the interpreter for the DOT language, used by Sanson to generate the graphs for the clusters)
  • from the program folder, modify the dot.ini file changing the following line to include the path to the bin-folder of Graphviz, which could be different for each user:
dotlocation = C:\Program Files (x86)\Graphviz2.36\bin

4) execute the commands_test3.bat file (do the same as in step 4 in Test 1).

5) wait until it finishes. This test should take 10-20 seconds (for a regular PC with 64-bit Windows 10, 3.4 GHz, 32 GB RAM).

6) as in Test 1 and Test 2, the only files that are not identical should be the *_logFile.txt from Camacho, SanXoTSqueezer, Sanson, SanXoTGauss and the Coordinometer, as well as the *_infoFile.txt from SanXoT and SanXoTSieve. Additionally, if you use different versions of Graphviz (this test used Graphviz 2.36) you might find differences in the file sanson_simGraph.pdf (which is a pdf containing the final graph with the category clusters).
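
If the similarity graph is not produced, a common cause is a dotlocation entry that does not point at the Graphviz bin folder. The sketch below checks this. It assumes dot.ini holds a single `dotlocation = path` line as shown in step 3, and both function names are hypothetical rather than part of SanXoT.

```python
import os


def read_dotlocation(ini_text):
    """Return the value of the `dotlocation` line, or None if absent."""
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "dotlocation":
            return value.strip()
    return None


def has_dot_executable(bin_folder):
    """Check whether bin_folder contains a file named dot or dot.exe."""
    return any(
        os.path.isfile(os.path.join(bin_folder, name))
        for name in ("dot", "dot.exe")
    )
```

Reading dot.ini into a string and passing the result of read_dotlocation to has_dot_executable indicates whether the configured folder actually contains the DOT interpreter.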

References

  1. Navarro, P., Trevisan-Herraz, M., Bonzon-Kulichenko, E., Núñez, E., Martínez-Acedo, P., Pérez-Hernández, D., Jorge, I., Mesa, R., Calvo, E., Carrascal, M., Hernáez, M.L., García, F., Bárcena, J.A., Ashman, K., Abian, J., Gil, C., Redondo, J.M. and Vázquez, J. (2014) General statistical framework for quantitative proteomics by stable isotope labeling. Journal of proteome research, 13, 1234-1247.
  2. Mateos-Hernández, L., et al. (2016) Quantitative proteomics reveals Piccolo as a candidate serological correlate of recovery from Guillain-Barré syndrome. Oncotarget, 7(46): 74582–74591.
  3. García-Marqués, F., Trevisan-Herraz, M., Martínez-Martínez, S., Camafeita, E., Jorge, I., Lopez, J.A., Méndez-Barbero, N., Méndez-Ferrer, S., del Pozo, M.A., Ibáñez, B., Andrés, V., Sánchez-Madrid, F., Redondo, J.M., Bonzon-Kulichenko, E. and Vázquez, J. (2016) A novel systems-biology algorithm for the analysis of coordinated protein responses using quantitative proteomics. Molecular & Cellular Proteomics, 15(5):1740-60.