Updates:

Apr 18, 2017:
The ArbAlign paper and code are published in JCIM.

Dec 15, 2016:
An updated version of ArbAlign and its documentation is posted here.

Nov 01, 2016:
An updated version of ArbAlign is made available here. Please send in any questions, comments and suggestions.



ArbAlign - Usage

Web Server
The web server needs:
  1. The Cartesian coordinates (*.xyz) of the molecules to align. If the coordinates are in a different format, one can use OpenBabel to convert them to Cartesian coordinates.
  2. A selection of whether hydrogens are included in the alignment process or not. For larger molecules, ignoring the hydrogens will yield a faster alignment. For many molecules, the heavy-atom RMSD is much more important than the all-atom RMSD anyway. By default, all-atom alignments are performed.
  3. A selection of whether the Kuhn-Munkres algorithm is applied to the initial coordinate system only, or to all possible axis swaps and reflections of it. There are six possible axis swaps and eight reflections, yielding a total of 48 swap+reflection combinations worth considering. Considering all of these possibilities therefore takes roughly 48 times as long as using the initial coordinate system alone. The default is to consider all 48 possibilities and report the one that yields the lowest RMSD.
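The count of 48 can be checked with a short sketch. This is only an illustration and is not taken from the ArbAlign source; the function name `swap_reflection_matrices` is hypothetical. Each combination is written as a 3x3 signed permutation matrix: one of the 6 orderings of the x, y and z axes combined with one of the 8 sign patterns.

```python
from itertools import permutations, product

import numpy as np


def swap_reflection_matrices():
    """Return the 48 signed permutation matrices (6 axis orders x 8 sign patterns).

    Hypothetical helper for illustration; not part of ArbAlign itself.
    """
    matrices = []
    for order in permutations(range(3)):          # 6 axis swaps
        for signs in product((1, -1), repeat=3):  # 8 reflections
            m = np.zeros((3, 3))
            for row, (col, sign) in enumerate(zip(order, signs)):
                m[row, col] = sign
            matrices.append(m)
    return matrices


# Apply every swap+reflection to a single point (row-vector convention).
coords = np.array([[1.0, 2.0, 3.0]])
candidates = [coords @ m.T for m in swap_reflection_matrices()]
print(len(candidates))  # 48
```

In an alignment, each candidate would be scored and the one with the lowest RMSD kept, which is why the full search costs about 48 times as much as a single pass.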


Commandline Tool

The driver script, ArbAlign-driver.py, uses the Kuhn-Munkres (Hungarian) algorithm to optimally align two arbitrarily ordered isomers. Given two isomers A and B whose Cartesian coordinates are in XYZ format, it aligns B onto A so as to minimize the Kabsch root-mean-square deviation (RMSD) between the two structures.
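For readers unfamiliar with the Kabsch RMSD, here is a minimal NumPy sketch of the standard procedure, assuming the atoms of A and B are already listed in matching order. The function name `kabsch_rmsd` is hypothetical, and this is not the code used inside ArbAlign.py.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between a and b after optimally translating and rotating b onto a.

    a, b: (N, 3) arrays of Cartesian coordinates, atoms in matching order.
    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove translation by moving both centroids to the origin.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    # 3x3 covariance matrix and its singular value decomposition.
    h = b_c.T @ a_c
    u, _, vt = np.linalg.svd(h)
    # Force a proper rotation (determinant +1), never a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    b_rot = b_c @ rot
    return float(np.sqrt(np.mean(np.sum((a_c - b_rot) ** 2, axis=1))))
```

Because the rotation is restricted to proper rotations, mirror images are not superimposed by this step alone, and the atom order is taken as given. The Kuhn-Munkres step addresses the ordering.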

Here is some usage information:
Usage: ArbAlign-driver.py [-b/--by {l, t, c}] [-n/--noHydrogens] [-s/--simple] A.xyz B.xyz

-b {l,t,c}, --by {l,t,c}
  Match atoms by label (l), SYBYL atom type (t), or MNA connectivity (c).
  The default is to match by atom label (l)
-s, --simple
  Perform Kuhn-Munkres assignment reordering without axes swaps and reflections.
  The default is to perform axes swaps and reflections
-n, --noHydrogens
  Ignore hydrogens.
  The default is to include all atoms
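
To make the option semantics concrete, the sketch below mirrors the usage line with Python's argparse. It is an assumption about how such a parser could look, not the actual source of ArbAlign-driver.py. The flags and defaults come from the usage text above; the names `build_parser`, `xyz_a` and `xyz_b` and the example file names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented ArbAlign-driver.py options."""
    parser = argparse.ArgumentParser(prog="ArbAlign-driver.py")
    parser.add_argument("-b", "--by", choices=["l", "t", "c"], default="l",
                        help="match atoms by label (l), type (t) or connectivity (c)")
    parser.add_argument("-n", "--noHydrogens", action="store_true",
                        help="ignore hydrogens (default: include all atoms)")
    parser.add_argument("-s", "--simple", action="store_true",
                        help="skip axis swaps and reflections")
    parser.add_argument("xyz_a", metavar="A.xyz", help="reference structure")
    parser.add_argument("xyz_b", metavar="B.xyz", help="structure to align onto A")
    return parser


# Equivalent to: ArbAlign-driver.py -b t -n water_a.xyz water_b.xyz
args = build_parser().parse_args(["-b", "t", "-n", "water_a.xyz", "water_b.xyz"])
print(args.by, args.noHydrogens, args.simple)  # t True False
```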

If the pair of structures passes a sanity test, the tool aligns them optimally and reports the following information:
  1. The initial Kabsch RMSD,
  2. The Kuhn-Munkres reorderings for each atom and the corresponding RMSDs,
  3. The final Kabsch RMSD after the application of the Kuhn-Munkres algorithm, and
  4. The coordinates corresponding to the best alignment of the second structure with the first.
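
The reordering in item 2 can be sketched as a linear assignment problem solved separately for each atom label. The example below uses SciPy's `linear_sum_assignment` rather than the Hungarian module that ArbAlign itself relies on, and it assumes squared interatomic distances as the cost. The function name `reorder_by_label` is hypothetical, so treat this as an illustration and not as the tool's implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def reorder_by_label(labels_a, coords_a, labels_b, coords_b):
    """Return perm such that atom perm[i] of B is matched to atom i of A.

    Atoms are only matched to atoms carrying the same label.
    Hypothetical helper; the cost (squared distance) is an assumption.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    perm = np.empty(len(labels_a), dtype=int)
    for label in np.unique(labels_a):
        idx_a = np.flatnonzero(labels_a == label)
        idx_b = np.flatnonzero(labels_b == label)
        if len(idx_a) != len(idx_b):
            raise ValueError(f"label {label!r} occurs a different number of times")
        # cost[i, j] = squared distance between atom i of A and atom j of B
        diff = coords_a[idx_a, None, :] - coords_b[None, idx_b, :]
        cost = np.sum(diff ** 2, axis=2)
        rows, cols = linear_sum_assignment(cost)
        perm[idx_a[rows]] = idx_b[cols]
    return perm


labels_a = ["O", "H", "H"]
coords_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels_b = ["H", "O", "H"]          # same atoms, listed in a different order
coords_b = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
perm = reorder_by_label(labels_a, coords_a, labels_b, coords_b)
print(perm.tolist())  # [1, 0, 2]
```

After reordering B with `coords_b[perm]`, the Kabsch RMSD can be recomputed on the matched atoms.
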
What is Needed to Run the Commandline Tool?

The web server provides convenient access to all of ArbAlign's functionality, but the commandline tool is a better option for many users. The following tools are provided to enable commandline use:

  1. ArbAlign-driver.py - A Python driver script that calls the necessary script to do the alignments based on the options the user selects.
  2. ArbAlign.py - A Python script that ArbAlign-driver.py calls to do the alignments.
  3. PrinCoords.py - A Python script that converts molecules from an arbitrary coordinate system to their principal coordinate system.
  4. genTypes.csh - A small shell script that converts an XYZ file to SYBYL Mol2 (sy2) format and recasts the atom labels to contain atom type information.
  5. genConn.csh - A small shell script that converts an XYZ file to MNA (mna) format and recasts the atom labels to contain each atom's bonding/connectivity information.
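
As an illustration of what a conversion to the principal coordinate system involves (item 3), here is a minimal NumPy sketch based on diagonalizing the moment of inertia tensor. It is not the code in PrinCoords.py. The function name `to_principal_axes` is hypothetical, and the use of unit masses when none are supplied is an assumption.

```python
import numpy as np


def to_principal_axes(coords, masses=None):
    """Express coords in the frame of the principal axes of inertia.

    coords: (N, 3) array; masses: optional length-N weights.
    Hypothetical helper; unit masses are assumed when masses is None.
    """
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))
    masses = np.asarray(masses, dtype=float)
    # Shift the (mass-weighted) center to the origin.
    centered = coords - np.average(coords, axis=0, weights=masses)
    # Inertia tensor: sum_i m_i (|r_i|^2 * identity - r_i r_i^T)
    r2 = np.sum(centered ** 2, axis=1)
    outer = (centered * masses[:, None]).T @ centered
    inertia = np.sum(masses * r2) * np.eye(3) - outer
    # Columns of `axes` are the principal axes (eigenvectors).
    _, axes = np.linalg.eigh(inertia)
    return centered @ axes
```

The order and sign of the eigenvectors returned by a diagonalization are not unique, so two isomers can end up in differently oriented principal frames. The axis swaps and reflections described earlier cover those cases.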

While this tool is kept as standalone as possible to ensure ease of use and portability, it does require two Python packages beyond those included in a standard Python installation.

Python Modules
  1. The Python NumPy module
  2. The Python Hungarian module by Harold Cooper
    (Hungarian: Munkres' Algorithm for the Linear Assignment Problem in Python,
    https://github.com/Hrldcpr/Hungarian). This is a wrapper around a fast C++ implementation of the Kuhn-Munkres algorithm. The installation instructions are described clearly at the same address.
    Alternatively, one can use Brian Clapper's Munkres module or another similar module included in SciNumpy. This could require small changes to the current script. We'll provide a version that uses SciNumpy's Munkres module at a later time.
Other Tools Needed to Align by Atom Type or Connectivity
  1. OpenBabel - We use OpenBabel to convert Cartesian coordinates (XYZ) to formats containing atom types, including connectivity and hybridization information. It is necessary to use OpenBabel to convert the Cartesian coordinates to SYBYL Mol2 (sy2) and MNA (mna) formats.
  2. genTypes.csh - A small shell script that converts an XYZ file to SYBYL Mol2 (sy2) format and recasts the atom labels to contain atom type information.
  3. genConn.csh - A small shell script that converts an XYZ file to MNA (mna) format and recasts the atom labels to contain each atom's bonding/connectivity information.
As an example, you can look at how the atom labels are recast to contain atom type and connectivity information.
[Figure: alignment]

If you find this script useful for any publishable work, please cite the companion paper:

  Berhane Temelso, Joel M. Mabey, Toshiro Kubota, Nana Appiah-Padi, George C. Shields.
  J. Chem. Inf. Model. 2017, 57 (5), 1045–1054.