Sanson

Sanson v1.11 is a program made in the Jesus Vazquez Cardiovascular Proteomics Lab at Centro Nacional de Investigaciones Cardiovasculares, used to generate the similarity graph of a set of categories.

A similarity graph is a graph that shows the relationship between a set of categories by taking into account how many proteins they share. This is measured with a variable f such that for categories c1 and c2, we get:

  f(c1, c2) = (#proteins shared by c1 and c2) / (#proteins of c1)

for instance:

  • if c1 == c2, we get f(c1, c2) = f(c2, c1) = 1;
  • if c1 and c2 do not share any proteins, we get f(c1, c2) = f(c2, c1) = 0;
  • if c2 is contained in c1, we get f(c1, c2) <= 1 and f(c2, c1) = 1.
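
The three cases above can be checked with a minimal Python sketch of the formula. This is only an illustration, not Sanson's own code: the function name similarity and the protein identifiers are hypothetical, and each category is assumed to be given as a set of protein identifiers.

```python
def similarity(c1, c2):
    """Return f(c1, c2): the fraction of the proteins of c1 shared with c2."""
    c1, c2 = set(c1), set(c2)
    if not c1:
        raise ValueError("c1 must contain at least one protein")
    return len(c1 & c2) / len(c1)


# Hypothetical categories, each one a set of protein identifiers.
big = {"P1", "P2", "P3", "P4"}
small = {"P1", "P2"}   # contained in big
other = {"P9"}         # shares nothing with big

print(similarity(big, big))     # identical categories: 1.0
print(similarity(big, other))   # no shared proteins: 0.0
print(similarity(big, small))   # small is contained in big: 0.5
print(similarity(small, big))   # every protein of small is in big: 1.0
```

Note that f is not symmetric: the denominator is always the number of proteins of the first category.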

If no f number is given in the parameters (-e), then the program automatically calculates the best f number by maximising both the number of category clusters and the number of categories within each cluster.

Sanson needs three input files:

  • a stats file, the outStats file from SanXoT (using the -z option)
  • a higher level list to graph (using the -c option)
  • a relations file (using the -r option)

And delivers five output files:

  • the graph in PNG format (default suffix: "_simGraph.png")
  • the DOT language text file used to generate the graph (default suffix: "_simGraph.gv")
  • a table showing the clusters generated (default suffix: "_outClusters")
  • the similarity matrix used to generate the graph (default suffix: "_outSimilarities")
  • a log file (default suffix: "_logFile")

Usage:

sanson.py -z[stats file] -r[relations file] -c[higher level list file] [OPTIONS]
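
As an illustration, the command line can also be assembled from Python. This is a sketch under assumptions: the helper build_sanson_command and all file names are hypothetical, sanson.py is assumed to be in the current folder, and the long option forms listed under Arguments are used. The value "svg" for --graphformat is assumed to be accepted, since the -N option mentions the SVG format.

```python
import sys


def build_sanson_command(stats, relations, higher_list,
                         similarity=None, graph_format=None):
    """Build the argument list for a Sanson run (hypothetical helper)."""
    cmd = [
        sys.executable, "sanson.py",
        "--outstats=" + stats,        # same as -z
        "--relfile=" + relations,     # same as -r
        "--list=" + higher_list,      # same as -c
    ]
    if similarity is not None:
        # Override the automatic calculation of the f number (-e).
        cmd.append("--similarity=" + str(similarity))
    if graph_format is not None:
        # Change the graph file format (-g); the default is "png".
        cmd.append("--graphformat=" + graph_format)
    return cmd


# Hypothetical input file names.
cmd = build_sanson_command("exp_outStats.txt", "relations.txt",
                           "categories.txt", similarity=0.5,
                           graph_format="svg")
print(" ".join(cmd))

# To actually run it (needs sanson.py and the three input files):
#   import subprocess
#   subprocess.run(cmd, check=True)
```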

Arguments:

  -h, --help          Display this help and exit.
  -a, --analysis=string
                      Use a prefix for the output files. If this is not
                      provided, then the prefix will be taken from the
                      stats file.
  -b, --nosubstats    Do not colour the boxes according to the proteins in
                      the corresponding category. In this case, the box is
                      coloured using the Zij of the category, when this
                      information is available in the higher level list to
                      graph (see the -c option).
  -c, --list=filename The text file containing the higher level elements whose
                      categories we want to relate. If the first element is
                      not read, saving the file in ANSI format might help.
                      If a header is used, then it must be in the form
                      "id>n>Z>FDR" or "id>Z>n" (where ">" means "tab").
  -d, --dotfile=filename
                      To use a non-default name for the text file in DOT
                      language, which is used to generate the graph.
  -e, --similarity=float
                      To override the calculation of the optimal f number (see
                      above for more details).
  -g, --graphformat=string
                      File format for the similarity graph (default is "png").
  -G, --outgraph=filename
                      To use a non-default name for the graph file.
  -l, --graphlimits=integer
                      To set the +- limits of the most intense red/green
                      colours in the graph (default is 6).
  -L, --logfile=filename
                      To use a non-default name for the log file.
  -m, --simfile=string
                      To use a non-default name for the similarity matrix
                      file.
  -N, --altmax=integer
                      Maximum number of lower level elements that the alt text
                      of the higher level node will show per side. For
                      instance, for N = 3, alt text will show all the elements
                      up to six; beyond this, only the first and last three
                      will be shown. (Default is N = 5.) (Note that this
                      takes effect when the SVG format is used.)
  -p, --place, --folder=foldername
                      To use a different common folder for the output files.
                      If this is not provided, the folder used will be the
                      same as the stats file folder.
  -r, --relfile, --relationsfile=filename
                      Relations file, with identifiers of the higher level
                      in the first column, and identifiers of the lower
                      level in the second column.
  -s, --outcluster=filename
                      To use a non-default name for the file containing the
                      list of clusters.
  -z, --outstats=filename
                      The outStats file from a SanXoT integration.
                      
  --graphdpi=float    To define the resolution (in DPI, dots per inch) for the
                      output graph. (Default is 96.0)
  --graphratio=float  To define the height/width ratio in the output graph.
                      (Default is 0.0, which means the ratio is not adjusted,
                      so the ratio is automatically set by graphviz)
  --minfontsize=float To define the minimum font size in nodes. If larger than
                      maxfontsize, the maxfontsize will be used (so
                      minfontsize = maxfontsize). (Default is 10.0)
  --maxfontsize=float To define the maximum font size in nodes.
                      (Default is 70.0)
  --nonparetoopacity=float
                      To "downlight" nodes not part of the Pareto front.
                      (default = 0.5, 0.0 means node color = background,
                      1.0 means no difference between Pareto front and
                      non-Pareto front)
                      
  --selectednodecolor=#rrggbb, --selectednodecolour=#rrggbb
  --defaultnodecolor=#rrggbb, --defaultnodecolour=#rrggbb
  --defaultnodetextcolor=#rrggbb, --defaultnodetextcolour=#rrggbb
  --errornodecolor=#rrggbb, --errornodecolour=#rrggbb
  --middlecolor=#rrggbb, --middlecolour=#rrggbb
  --mincolor=#rrggbb, --mincolour=#rrggbb
  --maxcolor=#rrggbb, --maxcolour=#rrggbb
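
The shortening rule described for the -N (--altmax) option above can be sketched in Python as follows. This is an illustration only, not Sanson's own code: the function name shorten_for_alt_text is hypothetical, and the "..." marker placed between the two halves is an assumption, since the help text does not say how the omitted elements are indicated.

```python
def shorten_for_alt_text(elements, altmax=5, gap="..."):
    """Keep at most altmax elements per side (hypothetical helper).

    Lists of up to 2 * altmax elements are returned whole; longer lists
    keep only the first and last altmax elements, separated by `gap`.
    """
    if altmax < 1:
        raise ValueError("altmax must be at least 1")
    elements = list(elements)
    if len(elements) <= 2 * altmax:
        return elements
    return elements[:altmax] + [gap] + elements[-altmax:]


# With N = 3, six elements are shown in full...
print(shorten_for_alt_text(["a", "b", "c", "d", "e", "f"], altmax=3))
# ...but with seven, only the first and last three are kept.
print(shorten_for_alt_text(["a", "b", "c", "d", "e", "f", "g"], altmax=3))
```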