Aljamia
Revision as of 15:43, 3 October 2017
Aljamia v1.10 is a program developed in the Jesus Vazquez Cardiovascular Proteomics Lab at the Centro Nacional de Investigaciones Cardiovasculares. It converts data in XML tables into a tab-separated values (TSV) file.
Aljamia needs an XML input file, and:
- up to four strings that combine information from the XML fields:
  - Options: -i, -j, -k and -l.
  - Usage: -i[FirstScan] -j[Sequence]
  - Fields can be combined, for example:
    -i[RAWFileName]-[FirstScan]_[Charge]
    which would deliver something like "sampleA.raw-1029_3". Everything outside the brackets is copied unchanged. Note that field names are case sensitive.
- the name of the table containing these fields (option -t). The default is "peptide_match".
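The bracket substitution described above can be sketched in Python as follows. This is a minimal illustration, not Aljamia's own code: `expand_template` is a hypothetical helper, each record is assumed to be a plain dict mapping field names to values, and the optional `allow_operations` flag is only a guess at how the -A option's "python-style operations" behave (evaluating the expanded text as a Python expression).

```python
import re

# Matches a tag such as [FirstScan]; tag names are case sensitive.
TAG = re.compile(r"\[([^\[\]]+)\]")


def expand_template(template, row, allow_operations=False):
    """Replace every [Tag] in template with row[Tag]; other text is copied as-is.

    row is a hypothetical stand-in for one record of the XML table:
    a dict mapping field names to values.
    """
    text = TAG.sub(lambda m: str(row[m.group(1)]), template)
    if allow_operations:
        # Assumed behaviour of -A: evaluate the expanded text as a Python
        # expression, with builtins disabled.
        text = str(eval(text, {"__builtins__": {}}, {}))
    return text


row = {"RAWFileName": "sampleA.raw", "FirstScan": "1029", "Charge": "3"}
print(expand_template("[RAWFileName]-[FirstScan]_[Charge]", row))
```

With Scan = 900 and Charge = 3, `expand_template("[Scan]-[Charge]", row, allow_operations=True)` gives "897", while without the flag it gives "900-3", in line with the -A example in the argument list. `eval` is used only for brevity; with untrusted input, a restricted arithmetic evaluator would be the safer choice.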
And delivers:
- an output data file with three columns (id, x, v), suitable as input for SanXoT.
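A minimal sketch of writing such a three-column, tab-separated file with Python's `csv` module, assuming the (id, x, v) values are already available as string triples. The `write_sanxot_table` name, the default header line and the duplicate handling (drop exact repeats unless duplicates are allowed, loosely mirroring the -d option) are assumptions, not taken from Aljamia's source; check which header SanXoT expects before relying on it.

```python
import csv
import io


def write_sanxot_table(rows, out, header=("id", "x", "v"), allow_duplicates=False):
    """Write (id, x, v) rows to the text stream `out` as tab-separated values.

    The header line is an assumption; pass header=None to omit it.
    Unless allow_duplicates is True, exact repeats of a row are skipped
    and only the first occurrence is kept.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    seen = set()
    for row in rows:
        key = tuple(row)
        if key in seen and not allow_duplicates:
            continue  # drop a duplicated relation
        seen.add(key)
        writer.writerow(key)


# Usage: write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_sanxot_table(
    [("pepA", "0.52", "12.0"), ("pepA", "0.52", "12.0"), ("pepB", "-0.10", "3.5")],
    buffer,
)
print(buffer.getvalue(), end="")
```

Keeping the first occurrence preserves the input order of the remaining rows, which makes the output easy to compare against the source table.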
Usage:
aljamia.py -x[xml file] -i[fold field] [-j[weight field] -k[id string], ...] [OPTIONS]
Arguments:
-h, --help  Display this help and exit.
-a, --analysis=string  Use a prefix for the output files. If this is not provided, the prefix is derived from the data file.
-A, --allow-operations  Allow Python-style operations for the indicated fields. Example: with Scan = 900 and Charge = 3, using
    -i"[Scan]-[Charge]" -j"[Scan]-[Charge]" -A"i"
  returns "897" and "900-3" for the i- and j-fields, respectively. By default, no operations are allowed.
-d, --allow-duplicates  Keep duplicated relations (by default they are removed).
-f, --filter=string  Filter the data to import. Use as in these examples:
    -f"[Charge]==2"
    -f"[st_excluded]!=excluded": st_excluded must NOT be equal to "excluded".
    -f"[Charge]=2&&[st_excluded]!=excluded": Charge must be 2, and st_excluded must not be equal to "excluded".
    -f"[FirstScan]>=1000"
    -f"[FASTAProteinDescription]~~clathrin": FASTAProteinDescription must contain "clathrin".
    -f"[Sequence]!~C": Sequence must NOT contain "C".
    -f"[Sequence]!=ABABABABK": Sequence must be different from "ABABABABK".
    -f"!([Sequence]~~C || [Sequence]~~M)": Sequence must not (via "!") contain "C" or (via "||") "M". Note that parentheses can be used.
    -f"[Sequence]~~C && [Sequence]~~M": Sequence must contain "C" and (via "&&") "M".
  Note that the filter is case sensitive. In forthcoming versions, filters will be available for numerical operations; currently the filter does not work with conditionals such as [Mass] > 565.2, only with (in)equalities such as [Mass] == 565.2.
-i, --id1=string  Identifier for the first column. XML tags must be in square brackets, while the rest of the text is kept unaltered. Some examples using tags such as "FirstScan", "Charge", "Mass", "Sequence" or "PTM":
    "ABCD" -> "ABCD" (no tags: unchanged, for all rows)
    "FS[FirstScan]_q=[Charge]" -> "FS2991_q=2"
    "ABCD-[Charge]" -> "ABCD-3"
    "ABCD_[Charge]_[Mass]" -> "ABCD_3_578.1684"
    "[Sequence]_[PTM]" -> "SAPEREAVDEK_15.994915"
  Note that tags are case sensitive.
-j, --id2=string  Identifier for the second column (see -i).
-k, --id3=string  Identifier for the third column (see -i).
-l, --id4=string  Identifier for the fourth column (see -i).
-L, --logfile=filename  Use a non-default name for the log file.
-o, --output=filename  Use a non-default name for the output file.
-p, --place, --folder=foldername  Use a different common folder for the output files. If this is not provided, the output folder is the same as the input folder.
-R, --initialrow=integer  Set the position of the header row (default is 1).
-t, --table=number  Select fields from a table other than QuiXML's peptide_match (which corresponds to the default, 3).
-x, --input=filename, --filename=filename  Input XML or txt (TSV) file.
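The -f filter syntax can be approximated by a small evaluator. The sketch below is hypothetical and deliberately limited; it is not how Aljamia itself parses filters. It handles only == and != (text equality) and ~~ / !~ (contains / does not contain), assumes that && binds more tightly than ||, and does not support parentheses, the leading "!" negation or comparisons such as >=. The names `check_condition` and `passes_filter` are illustrative.

```python
import re

# One condition: [Field] OP value, where OP is ==, !=, ~~ (contains) or !~ (does not contain).
CONDITION = re.compile(r"^\s*\[([^\[\]]+)\]\s*(==|!=|~~|!~)\s*(.*?)\s*$")


def check_condition(cond, row):
    """Evaluate a single condition against row, a dict of field name -> value."""
    match = CONDITION.match(cond)
    if match is None:
        raise ValueError(f"unsupported condition: {cond!r}")
    field, op, value = match.groups()
    actual = str(row[field])  # comparisons are on text and case sensitive
    if op == "==":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "~~":
        return value in actual
    return value not in actual  # op == "!~"


def passes_filter(expr, row):
    """True if row satisfies expr; && is assumed to bind more tightly than ||."""
    return any(
        all(check_condition(cond, row) for cond in clause.split("&&"))
        for clause in expr.split("||")
    )


row = {"Charge": "2", "st_excluded": "", "Sequence": "SAPEREAVDEK"}
print(passes_filter("[Charge]==2&&[st_excluded]!=excluded", row))  # True
```

Splitting on || first and then on && gives the usual precedence without writing a full parser; supporting parentheses and "!" would need a proper tokenizer.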