Difference between revisions of "Aljamia"
Revision as of 15:18, 19 February 2018
Aljamia v1.11 is a program developed at the Jesus Vazquez Cardiovascular Proteomics Lab, Centro Nacional de Investigaciones Cardiovasculares. It converts data from XML tables into a tab-separated values (TSV) file.
Aljamia needs an XML input file, and:
- up to four strings that combine information from the XML fields.
- Commands:
-i, -j, -k and -l.
- Usage:
-i[FirstScan] -j[Sequence]
- It is possible to combine fields:
-i[RAWFileName]-[FirstScan]_[Charge]
(which would deliver something like "sampleA.raw-1029_3"). Everything outside the brackets is copied unchanged. Note that field names are case sensitive.
- the name of the table where these fields are (command -t). Default is "peptide_match".
And delivers:
- an output data file with three columns (id, x, v) suitable to work as input for SanXoT.
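The way bracketed fields are combined into an identifier can be illustrated with a minimal Python sketch. This is not Aljamia's actual code: the function name expand_template and the dictionary-based row are assumptions made for illustration only.

```python
import re

# Matches one bracketed field name such as [FirstScan].
TAG = re.compile(r"\[([^\[\]]+)\]")

def expand_template(template, row):
    """Replace each [Field] with row[Field]; copy all other text unchanged.

    Hypothetical helper, not part of Aljamia itself.
    """
    def substitute(match):
        # Exact dictionary lookup, so field names are case sensitive.
        return str(row[match.group(1)])
    return TAG.sub(substitute, template)

# A made-up row standing in for one record of the input table.
row = {"RAWFileName": "sampleA.raw", "FirstScan": 1029, "Charge": 3}
print(expand_template("[RAWFileName]-[FirstScan]_[Charge]", row))
# prints: sampleA.raw-1029_3
```

In this sketch, a field name with the wrong case (for example [charge]) raises a KeyError; how Aljamia itself reports such a mismatch is not stated here.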
Usage:
aljamia.py -x[xml file] -i[fold field] [-j[weight field] -k[id string], ...] [OPTIONS]
Arguments:
-h, --help
    Display this help and exit.
-a, --analysis=string
    Use a prefix for the output files. If this is not provided, the prefix is taken from the data file.
-A, --allow-operations
    Allow Python-style operations for the indicated fields. Example: with Scan = 900 and Charge = 3, using
        -i"[Scan]-[Charge]" -j"[Scan]-[Charge]" -A"i"
    returns "897" for the i-field and "900-3" for the j-field. By default, no operations are allowed.
-d, --allow-duplicates
    Keep duplicated relations instead of removing them.
-f, --filter=string
    Filter the data to import. Examples:
        -f"[Charge]==2"
        -f"[st_excluded]!=excluded"
            st_excluded must NOT be equal to "excluded".
        -f"[Charge]=2&&[st_excluded]!=excluded"
            Charge must be 2, and st_excluded must not be equal to "excluded".
        -f"[FirstScan]>=1000"
        -f"[FASTAProteinDescription]~~clathrin"
            FASTAProteinDescription must include "clathrin".
        -f"[Sequence]!~C"
            Sequence must NOT include "C".
        -f"[Sequence]!=ABABABABK"
            Sequence must be different from "ABABABABK".
        -f"!([Sequence]~~C || [Sequence]~~M)"
            Sequence must not (via "!") contain "C" or (via "||") "M". Note that you can use parentheses.
        -f"[Sequence]~~C && [Sequence]~~M"
            Sequence must contain "C" and (via "&&") "M".
    Note that the filter is case sensitive.
    Warning: with this argument the filter is treated as text only, which means that [Mass] > 3 will not include Mass = 10, because in ASCII order "3" comes after "1". For numerical comparisons use -F.
-F, --filter-using-numbers
    Same as -f, but treating everything that looks like a number as a number. Note that this does not currently perform operations on those numbers; it can only be used for conditionals, such as [Mass] > 565.2. Whenever an error occurs (for example, when text cannot be converted to a number, or when the text-only conditions ~~ or !~ are used), the operation concerned is treated as text.
-i, --id1=string
    Identifier for the first column. XML tags must be in square brackets, while the rest of the text is kept unaltered. Some examples using tags such as "FirstScan", "Charge", "Mass", "Sequence" or "PTM":
        "ABCD" -> "ABCD" (no tags -> unchanged, applied to all rows)
        "FS[FirstScan]_q=[Charge]" -> "FS2991_q=2"
        "ABCD-[Charge]" -> "ABCD-3"
        "ABCD_[Charge]_[Mass]" -> "ABCD_3_578.1684"
        "[Sequence]_[PTM]" -> "SAPEREAVDEK_15.994915"
    Note that tags are case sensitive.
-j, --id2=string
    Identifier for the second column (see -i).
-k, --id3=string
    Identifier for the third column (see -i).
-l, --id4=string
    Identifier for the fourth column (see -i).
-L, --logfile=filename
    Use a non-default name for the log file.
-o, --output=filename
    Use a non-default name for the output file.
-p, --place, --folder=foldername
    Use a different common folder for the output files. If this is not provided, the folder used is the same as the input folder.
-R, --initialrow=integer
    Set the position of the row with headers (default is 1).
-t, --table=number
    Select fields from a table other than QuiXML's peptide_match (which corresponds to the default, 3).
-x, --input=filename, --filename=filename
    Input xml or txt (tsv) file.
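
The difference between the text-only filter (-f) and the numeric filter (-F) can be illustrated with a short Python sketch. It is only an approximation of the behaviour described above, not Aljamia's implementation: the function compare and its numeric parameter are hypothetical names, and only the comparison operators mentioned in this help text are covered.

```python
import operator

# Comparison operators that appear in the filter examples above.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}

def compare(left, op, right, numeric=False):
    """Compare two field values as text, or as numbers when numeric=True.

    Hypothetical helper: if either value cannot be converted to a number,
    the comparison falls back to text, mirroring the note on -F.
    """
    if numeric:
        try:
            return OPS[op](float(left), float(right))
        except ValueError:
            pass  # not a number: fall through to the text comparison
    return OPS[op](str(left), str(right))

print(compare("10", ">", "3"))                # False: "1" sorts before "3"
print(compare("10", ">", "3", numeric=True))  # True: 10.0 > 3.0
```

This is why a text-only condition such as [Mass] > 3 leaves out rows where Mass is 10, while the same condition under -F keeps them.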