Aljamia
Latest revision as of 13:39, 14 March 2018
Aljamia v1.12 is a program developed in the Jesus Vazquez Cardiovascular Proteomics Lab at Centro Nacional de Investigaciones Cardiovasculares. It converts data from XML tables into a tab-separated values (TSV) file.
Aljamia needs an XML input file, and:
- up to four strings that combine information from the XML fields.
- Commands:
-i, -j, -k and -l.
- Usage:
-i[FirstScan] -j[Sequence]
- It is possible to combine fields:
-i[RAWFileName]-[FirstScan]_[Charge]
(which would deliver something like "sampleA.raw-1029_3"). Everything outside the brackets is copied unchanged. Note that field names are case sensitive.
- the table where these fields are found (option -t). The default is "peptide_match".
And delivers:
- an output data file with three columns (id, x, v) suitable to work as input for SanXoT.
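The way bracketed field names are combined with literal text can be illustrated with a short Python sketch. This is not Aljamia's actual code: the helper `expand_template` and the sample row are hypothetical and only mirror the behaviour described above (tags in square brackets are replaced by the field value, everything else is copied unchanged, and field names are case sensitive).

```python
import re

# Hypothetical helper, not part of Aljamia: it only illustrates the
# "[Field]" substitution rule described above.
TAG = re.compile(r"\[([^\[\]]+)\]")

def expand_template(template, row):
    """Replace each [Field] tag with the row's value; copy other text as-is."""
    # A KeyError here means the tag matches no field (names are case sensitive).
    return TAG.sub(lambda match: str(row[match.group(1)]), template)

# Made-up sample row, standing in for one record of the input table.
row = {"RAWFileName": "sampleA.raw", "FirstScan": 1029, "Charge": 3}

print(expand_template("[RAWFileName]-[FirstScan]_[Charge]", row))  # sampleA.raw-1029_3
print(expand_template("FS[FirstScan]_q=[Charge]", row))            # FS1029_q=3
```

Note that a tag such as `[firstscan]` would raise a `KeyError` in this sketch, which is one simple way to make the case sensitivity visible.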
Usage:
aljamia.py -x[xml file] -i[fold field] [-j[weight field] -k[id string], ...] [OPTIONS]
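Square brackets, parentheses, `&&` and `||` all have a special meaning in POSIX shells, so arguments usually need quoting. The sketch below uses Python's standard `shlex` module to show how a list of arguments would be quoted. The file name `data.xml` is a made-up placeholder, and attaching the value directly to the short option is an assumption based on the usage line above.

```python
import shlex

# Assumed argument list; "data.xml" is a hypothetical input file name.
args = [
    "aljamia.py",
    "-xdata.xml",
    "-i[FirstScan]",
    "-j[Sequence]",
    "-f[Sequence]~~C && [Sequence]~~M",  # filter example from the Arguments list
]

# shlex.join quotes every argument that a POSIX shell could misinterpret.
command_line = shlex.join(args)
print(command_line)
# aljamia.py -xdata.xml '-i[FirstScan]' '-j[Sequence]' '-f[Sequence]~~C && [Sequence]~~M'
```

When the command is launched from Python with `subprocess.run(args)`, no shell is involved and the list can be passed without quoting.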
Arguments:
-h, --help
    Display this help and exit.
-a, --analysis=string
    Use a prefix for the output files. If this is not provided, the
    prefix will be taken from the data file.
-A, --allow-operations
    Allow Python-style operations for the indicated fields. Example:
    having Scan = 900, Charge = 3, and using
        -i"[Scan]-[Charge]" -j"[Scan]-[Charge]" -A"i"
    will return "897" and "900-3" as i- and j-fields, respectively.
    By default, no operations are allowed.
-c, --curly-brackets
    To use curly brackets, {}, instead of the default parentheses, (),
    when using the filters (see options -f and -F).
-d, --allow-duplicates
    To avoid removal of duplicated relations.
-f, --filter=string
    To filter the data to import. Use as in these examples:
        -f"[Charge]==2"
        -f"[st_excluded]!=excluded", which means st_excluded must NOT
            be equal to "excluded"
        -f"[Charge]==2&&[st_excluded]!=excluded", which means Charge
            must be 2, and st_excluded must not be equal to "excluded"
        -f"[FirstScan]>=1000"
        -f"[FASTAProteinDescription]~~clathrin", which means
            FASTAProteinDescription must include "clathrin"
        -f"[Sequence]!~C", which means Sequence must NOT include "C"
        -f"[Sequence]!=ABABABABK", which means Sequence must be
            different from "ABABABABK"
        -f"!([Sequence]~~C || [Sequence]~~M)", which means Sequence
            must not (via "!") contain "C" or (via "||") "M". Note that
            you can use parentheses
        -f"[Sequence]~~C && [Sequence]~~M", which means Sequence must
            contain "C" and (via "&&") "M"
    Note that the filter is case sensitive.
    Warning: when parentheses () generate conflicts in the command
    line, the option -c can be used to switch to curly-brackets {} mode.
    Warning: when && or || as logical operators generate conflicts in
    the command line, the option -w can be used to switch to word
    operators such as *and* or *or*.
    Warning: using this argument, the filter is seen as text-only,
    which means that [Mass] > 3 will not include Mass = 10, as in ASCII
    order "3" comes after "1". For numerical operations use -F.
-F, --filter-using-numbers
    Same as -f, but treating everything that looks like a number as a
    number. Note that this currently does not perform operations with
    those numbers; it can only be used for conditionals, such as
    [Mass] > 565.2
    Note that whenever an error occurs (for example, when text cannot
    be converted to a number, or when the text-only conditions ~~ or !~
    are used), the operation concerned will be treated as text in all
    cases.
-i, --id1=string
    Identifier for the first column. XML tags must be in square
    brackets, while the rest of the text will be kept unaltered. Here
    are some examples using tags such as "FirstScan", "Charge", "Mass",
    "Sequence" or "PTM":
        "ABCD" -> "ABCD" (no tags -> unchanged, to all rows)
        "FS[FirstScan]_q=[Charge]" -> "FS2991_q=2"
        "ABCD-[Charge]" -> "ABCD-3"
        "ABCD_[Charge]_[Mass]" -> "ABCD_3_578.1684"
        "[Sequence]_[PTM]" -> "SAPEREAVDEK_15.994915"
    Note that tags are case sensitive.
-j, --id2=string
    Identifier for the second column (see -i).
-k, --id3=string
    Identifier for the third column (see -i).
-l, --id4=string
    Identifier for the fourth column (see -i).
-L, --logfile=filename
    To use a non-default name for the log file.
-o, --output=filename
    To use a non-default name for the output file.
-p, --place, --folder=foldername
    To use a different common folder for the output files. If this is
    not provided, the folder used will be the same as the input folder.
-R, --initialrow=integer
    To set the position of the row with headers (default is 1).
-t, --table=number
    To select fields from a table different from QuiXML's peptide_match
    (which corresponds to the default, 3).
-w, --word-operators
    To use *and* and *or* (including asterisks) as logical operators
    instead of the default && and || in filters (see options -f and -F).
-x, --input=filename, --filename=filename
    Input xml or txt (tsv) file.
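The difference between the text-only filter (-f) and the numeric filter (-F) can be reproduced with a few lines of Python. This is only an illustration of the comparison rule described in the warnings, not Aljamia's implementation; both function names are hypothetical.

```python
def greater_as_text(value, threshold):
    """Text-only comparison, as described for -f: character by character."""
    return value > threshold

def greater_as_number(value, threshold):
    """Numeric comparison, as described for -F, falling back to text on error."""
    try:
        return float(value) > float(threshold)
    except ValueError:
        # Mirrors the documented behaviour: if the text cannot be converted
        # to a number, the operation is treated as text.
        return value > threshold

print(greater_as_text("10", "3"))            # False: "1" sorts before "3"
print(greater_as_number("10", "3"))          # True: 10 > 3
print(greater_as_number("565.3", "565.2"))   # True
```

The first call shows why a filter such as [Mass] > 3 excludes Mass = 10 under -f, and the second shows the result expected under -F.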