Aljamia

From PROTEOMICA
Revision as of 15:18, 19 February 2018 by Mtrevisan (talk | contribs)

Aljamia v1.11 is a program developed in the Jesus Vazquez Cardiovascular Proteomics Lab at the Centro Nacional de Investigaciones Cardiovasculares. It converts data stored in XML tables into a tab-separated values (TSV) file.

Aljamia needs an XML input file, and:

  • up to four strings that combine information from the XML fields.
Commands:
  -i, -j, -k and -l.
Usage:
  -i[FirstScan] -j[Sequence]
It is possible to combine fields:
  -i[RAWFileName]-[FirstScan]_[Charge]

(which would deliver something like "sampleA.raw-1029_3"). Everything outside the brackets is copied unchanged. Note that field names are case sensitive.
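
The bracket substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Aljamia's actual code: the function name expand_template and the sample row are hypothetical.

```python
import re

def expand_template(template, row):
    """Replace each [Tag] with row[Tag]; text outside brackets is copied as is.

    Hypothetical helper for illustration, not part of Aljamia.
    """
    def lookup(match):
        # Tags are case sensitive, so the lookup is an exact key match.
        return str(row[match.group(1)])
    return re.sub(r"\[([^\[\]]+)\]", lookup, template)

# Made-up row standing in for one record of the XML table.
row = {"RAWFileName": "sampleA.raw", "FirstScan": 1029, "Charge": 3}
result = expand_template("[RAWFileName]-[FirstScan]_[Charge]", row)
print(result)  # sampleA.raw-1029_3
```

In this sketch a tag written with the wrong case (for example "[charge]") raises a KeyError, which mirrors the case sensitivity noted above; how Aljamia itself reports such a mismatch is not stated here.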

  • the name of the table where these fields are (command -t). Default is "peptide_match".

And delivers:

  • an output data file with three columns (id, x, v) suitable to work as input for SanXoT.

Usage:

aljamia.py -x[xml file] -i[fold field] [-j[weight field] -k[id string], ...] [OPTIONS]

Arguments:

  -h, --help          Display this help and exit.
  -a, --analysis=string
                      Use a prefix for the output files. If this is not
                      provided, then the prefix will be taken from the data
                      file.
  -A, --allow-operations
                      Allow python-style operations for the indicated fields.
                      Example: having Scan = 900, Charge = 3, and using
                          -i"[Scan]-[Charge]" -j"[Scan]-[Charge]" -A"i"
                      will return "897" and "900-3" as the i- and j-fields,
                      respectively. By default, no operations are allowed.
  -d, --allow-duplicates
                      Keep duplicated relations (by default they are removed).
  -f, --filter=string To filter data to import. Use as in these examples:
                      
                          -f"[Charge]==2"
                          -f"[st_excluded]!=excluded", which means
                              st_excluded must NOT be equal to "excluded"
                          -f"[Charge]==2&&[st_excluded]!=excluded", which
                              means charge must be 2, and st_excluded must
                              not be equal to "excluded"
                          -f"[FirstScan]>=1000"
                          -f"[FASTAProteinDescription]~~clathrin", which means
                               FASTAProteinDescription must include "clathrin"
                          -f"[Sequence]!~C", which means
                               Sequence must NOT include "C"
                          -f"[Sequence]!=ABABABABK", which means
                               Sequence must be different from "ABABABABK"
                          -f"!([Sequence]~~C || [Sequence]~~M)", which means
                               Sequence must not (via "!") contain "C" or
                               (via "||") "M". Note you can use parentheses
                          -f"[Sequence]~~C && [Sequence]~~M", which means
                               Sequence must contain "C" and (via "&&") "M"
                      
                      Note that the filter is case sensitive.
                      Warning: using this argument, the filter is seen as
                      text-only, which means that [Mass] > 3 will not include
                      Mass = 10, as in ASCII order "3" comes after "1". For
                      numerical operations use -F.
                      
  -F, --filter-using-numbers
                      Same as -f, but anything that looks like a number is
                      compared as a number. Note that this does not currently
                      perform operations on those numbers; it can only be
                      used in conditionals, such as [Mass] > 565.2
                      
                      Note that whenever an error occurs (for example when
                      text cannot be converted to a number, or when the
                      text-only conditions ~~ or !~ are used), the operation
                      concerned will be treated as text in all cases.
                      
  -i, --id1=string    Identifier for the first column. XML tags must be in
                      square brackets, while the rest of the text will be kept
                      unaltered. Here are some examples using tags such as
                      "FirstScan", "Charge", "Mass", "Sequence" or "PTM":
                      
                         "ABCD" -> "ABCD" (no tags -> unchanged, for all rows)
                         "FS[FirstScan]_q=[Charge]" -> "FS2991_q=2"
                         "ABCD-[Charge]" -> "ABCD-3"
                         "ABCD_[Charge]_[Mass]" -> "ABCD_3_578.1684"
                         "[Sequence]_[PTM]" -> "SAPEREAVDEK_15.994915"
                      
                      Note that tags are case-sensitive.
                      
  -j, --id2=string    Identifier for the second column (see -i).
  -k, --id3=string    Identifier for the third column (see -i).
  -l, --id4=string    Identifier for the fourth column (see -i).
  -L, --logfile=filename
                      To use a non-default name for the log file.
  -o, --output=filename
                      To use a non-default name for the output file.
  -p, --place, --folder=foldername
                      To use a different common folder for the output files.
                      If this is not provided, the folder used will be the
                      same as the input folder.
  -R, --initialrow=integer
                      To set the position of the row with headers (default
                      is 1).
  -t, --table=number  To select fields from a table other than QuiXML's
                      peptide_match (which corresponds to the default, 3).
  -x, --input=filename, --filename=filename
                      Input xml or txt (tsv) file.
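
Two behaviours from the option list above are easy to misread: the arithmetic enabled by -A, and the difference between text comparison (-f) and numeric comparison (-F). The sketch below reproduces both in plain Python. It is a hedged illustration, not Aljamia's implementation: the function names evaluate_field and greater_than are hypothetical, and the use of eval with builtins disabled is an assumption about what "python-style operations" might look like.

```python
def evaluate_field(expanded, allow_operations=False):
    """Return the field as text, or evaluate it as a Python expression.

    Hypothetical stand-in for the -A behaviour; builtins are disabled so
    only plain expressions such as "900-3" are evaluated.
    """
    if not allow_operations:
        return expanded
    return str(eval(expanded, {"__builtins__": {}}, {}))


def greater_than(left, right, numeric=False):
    """Compare two values as text or, when numeric=True, as numbers.

    Hypothetical stand-in for -f (text) versus -F (numbers); if a value
    cannot be converted, the comparison falls back to text.
    """
    if numeric:
        try:
            return float(left) > float(right)
        except ValueError:
            pass  # not a number: fall through to the text comparison
    return left > right


# "[Scan]-[Charge]" with Scan = 900 and Charge = 3 expands to "900-3".
print(evaluate_field("900-3"))                         # 900-3
print(evaluate_field("900-3", allow_operations=True))  # 897

# As text, "10" sorts before "3"; as numbers, 10 is greater than 3.
print(greater_than("10", "3"))                # False
print(greater_than("10", "3", numeric=True))  # True
```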