Difference between revisions of "Aljamia"

From PROTEOMICA

Revision as of 15:43, 3 October 2017

Aljamia v1.10 is a program developed in the Jesus Vazquez Cardiovascular Proteomics Lab at Centro Nacional de Investigaciones Cardiovasculares. It converts data from XML tables into a tab-separated values (TSV) file.

Aljamia needs an XML input file, and:

  • up to four strings that combine information from the XML fields.
Commands:
  -i, -j, -k and -l.
Usage:
  -i[FirstScan] -j[Sequence]
It is possible to combine fields:
  -i[RAWFileName]-[FirstScan]_[Charge]

(which would deliver something like "sampleA.raw-1029_3"). Everything outside the brackets is copied unchanged. Note that the field names are case sensitive.

  • the name of the table where these fields are (command -t). Default is "peptide_match".

And delivers:

  • an output data file with three columns (id, x, v) suitable to work as input for SanXoT.
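
The way the -i, -j, -k and -l strings combine fields can be illustrated with a short Python sketch. This is not code from aljamia.py: the helper name expand_template and the row dictionary are assumptions made for illustration, and the real program may handle unknown tags differently.

```python
import re

# Hypothetical helper (not taken from aljamia.py): replace each [Tag] in a
# template with the matching value from one row of the XML table.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def expand_template(template, row):
    """Return template with every [Tag] replaced by str(row[Tag]).

    Tag lookup is case sensitive and an unknown tag raises KeyError.
    Text outside the brackets is copied unchanged.
    """
    return TAG_PATTERN.sub(lambda match: str(row[match.group(1)]), template)


# One illustrative row; the values are made up for this example.
row = {"RAWFileName": "sampleA.raw", "FirstScan": 1029, "Charge": 3}

print(expand_template("[RAWFileName]-[FirstScan]_[Charge]", row))
print(expand_template("FS[FirstScan]_q=[Charge]", row))
```

In this sketch a tag written with the wrong case (for example [charge]) fails with a KeyError rather than being silently left in place.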

Usage:

aljamia.py -x[xml file] -i[fold field] [-j[weight field] -k[id string], ...] [OPTIONS]

Arguments:

  -h, --help          Display this help and exit.
  -a, --analysis=string
                      Use a prefix for the output files. If this is not
                      provided, then the prefix will be taken from the data
                      file.
  -A, --allow-operations
                      Allow python-style operations for the indicated fields.
                      Example: having Scan = 900, Charge = 3, and using
                          -i"[Scan]-[Charge]" -j"[Scan]-[Charge]" -A"i"
                      will return "897" and "900-3" as the i- and j-fields,
                      respectively. By default, no operations are allowed.
  -d, --allow-duplicates
                      Keep duplicated relations instead of removing them.
  -f, --filter=string To filter data to import. Use as in these examples:
                      
                          -f"[Charge]==2"
                          -f"[st_excluded]!=excluded", which means
                              st_excluded must NOT be equal to "excluded"
                          -f"[Charge]==2&&[st_excluded]!=excluded", which
                              means charge must be 2, and st_excluded must
                              not be equal to "excluded"
                          -f"[FirstScan]>=1000"
                          -f"[FASTAProteinDescription]~~clathrin", which means
                               FASTAProteinDescription must include "clathrin"
                          -f"[Sequence]!~C", which means
                               Sequence must NOT include "C"
                          -f"[Sequence]!=ABABABABK", which means
                               Sequence must be different from "ABABABABK"
                          -f"!([Sequence]~~C || [Sequence]~~M)", which means
                               Sequence must not (via "!") contain "C" or
                               (via "||") "M". Note you can use parentheses.
                          -f"[Sequence]~~C && [Sequence]~~M", which means
                               Sequence must contain "C" and (via "&&") "M"
                      
                      Note that the filter is case sensitive.
                      Numerical comparisons are planned for forthcoming
                      versions, but currently the filter doesn't work with
                      conditions such as [Mass] > 565.2, only with
                      (in)equalities such as [Mass] == 565.2.
                      
  -i, --id1=string    Identifier for the first column. XML tags must be in
                      square brackets, while the rest of the text will be kept
                      unaltered. Here are some examples using tags such as
                      "FirstScan", "Charge", "Mass" or "Sequence" or "PTM":
                      
                         "ABCD" -> "ABCD" (no tags -> unchanged, to all rows)
                         "FS[FirstScan]_q=[Charge]" -> "FS2991_q=2"
                         "ABCD-[Charge]" -> "ABCD-3"
                         "ABCD_[Charge]_[Mass]" -> "ABCD_3_578.1684"
                         "[Sequence]_[PTM]" -> "SAPEREAVDEK_15.994915"
                      
                      Note that tags are case-sensitive.
                      
  -j, --id2=string    Identifier for the second column (see -i).
  -k, --id3=string    Identifier for the third column (see -i).
  -l, --id4=string    Identifier for the fourth column (see -i).
  -L, --logfile=filename
                      To use a non-default name for the log file.
  -o, --output=filename
                      To use a non-default name for the output file.
  -p, --place, --folder=foldername
                      To use a different common folder for the output files.
                      If this is not provided, the folder used will be the
                      same as the input folder.
  -R, --initialrow=integer
                      To set the position of the row with headers (default
                      is 1).
  -t, --table=number  To select fields from a table other than QuiXML's
                      peptide_match (which corresponds to the default, 3).
  -x, --input=filename, --filename=filename
                      Input xml or txt (tsv) file.