Updates:

Apr 18, 2017:
The ArbAlign paper and code are published in JCIM.

Dec 15, 2016:
An updated version of ArbAlign and documentation is posted here.

Nov 01, 2016:
An updated version of ArbAlign is made available here. Please send in any questions, comments and suggestions.



ArbAlign - Kuhn-Munkres alignment of any two isomers

Upload Cartesian (*.xyz) coordinates of the two isomers to align:


Options: by default, a) isomers are aligned by atom label, b) hydrogens are included, and c) all axes swaps and reflections are considered.
Exclude Hydrogens
Don't Consider Axes Swaps and Reflections


This site is a web implementation of ArbAlign, which uses the Kuhn-Munkres (Hungarian) algorithm to optimally align two arbitrarily ordered isomers. Given two isomers A and B whose Cartesian coordinates are provided in XYZ format, it aligns B on A to minimize the Kabsch root-mean-square deviation (RMSD) between structures A and B after either

  1. a Kuhn-Munkres assignment/reordering (quick)
  2. a Kuhn-Munkres assignment/reordering factoring in axes swaps and reflections (~48x slower) 
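The assignment/reordering step can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the ArbAlign source: the names `hungarian`, `sq_dist`, and `reorder_and_rmsd` are hypothetical, the cost is assumed to be the squared distance between atoms, all atoms are treated as a single class, and the RMSD is computed directly without the Kabsch rotation that the site applies.

```python
import math

def hungarian(cost):
    """Kuhn-Munkres assignment for a square cost matrix.
    Returns a list where entry i is the column assigned to row i."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-indexed)
    v = [0.0] * (m + 1)          # column potentials (1-indexed)
    p = [0] * (m + 1)            # p[j] = row currently matched to column j
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:       # reached an unmatched column
                break
        while j0 != 0:           # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def reorder_and_rmsd(ref, mobile):
    """Reorder `mobile` to best match `ref`; return (order, RMSD without rotation)."""
    cost = [[sq_dist(a, b) for b in mobile] for a in ref]
    order = hungarian(cost)
    reordered = [mobile[j] for j in order]
    msd = sum(sq_dist(a, b) for a, b in zip(ref, reordered)) / len(ref)
    return order, math.sqrt(msd)

# Made-up coordinates: `mobile` is `ref` with its atoms listed in a different order.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
mobile = [(0.0, 2.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
order, rmsd = reorder_and_rmsd(ref, mobile)
```

Here `order[i]` is the index of the atom in `mobile` that is matched to atom `i` of `ref`. In a full alignment this step would be combined with a Kabsch rotation and, for the second option, repeated for each axes swap and reflection.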


Cartesian Coordinates:

The user needs to provide the Cartesian coordinates (XYZ) of the two structures to be aligned. One can use OpenBabel to convert molecular coordinates from virtually any format to Cartesian (XYZ). The only limitation is that the coordinate file must not exceed 20 kB, which ensures timely alignment.
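For readers preparing input, the sketch below reads an XYZ file (atom count on the first line, a comment on the second, then one `label x y z` line per atom). It is an illustration only: `parse_xyz` and `MAX_BYTES` are hypothetical names, the 20 kB limit is assumed to mean 20 * 1024 bytes, and the water geometry is approximate.

```python
MAX_BYTES = 20 * 1024  # assumed interpretation of the site's 20 kB limit

def parse_xyz(text):
    """Parse XYZ-format text into (labels, coords)."""
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("coordinate file exceeds the 20 kB limit")
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])      # line 1: number of atoms
    atom_lines = lines[2:2 + n_atoms]       # line 2 is a free-form comment
    if len(atom_lines) != n_atoms:
        raise ValueError("fewer atom lines than the declared atom count")
    labels, coords = [], []
    for line in atom_lines:
        label, x, y, z = line.split()[:4]
        labels.append(label)
        coords.append((float(x), float(y), float(z)))
    return labels, coords

# Illustrative input: an approximate water geometry in angstroms.
water_xyz = """3
water (illustrative geometry)
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""
labels, coords = parse_xyz(water_xyz)
```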


Options: Aligning by atom label, type or connectivity:

The script attempts to find the best ordering of atoms in the second isomer to align optimally with the first (reference) isomer. That is done one atom label, atom type or atom connectivity class at a time. These atom classes will differ depending on the representation we choose. For sulfuric acid, the three representations used for alignment are shown below. The number of atom classes ranges from three (S, O, H) in the first case to four in the second (S-S.2, O-O.2, O-O.3, H-H) and third case (S-SOOOO, O-OS, O-OHS, H-HO).
[Figure: the three representations of sulfuric acid used for alignment]
  1. The most general case is to match the atoms of the same label. For example, sulfuric acid’s atom labels would be S, O, O, O, O, H, and H.
  2. The atom type can include other information including bonding and local environments as defined in the Tripos SYBYL Mol2 file format implemented in OpenBabel. For example, sulfuric acid’s atom types would be S.O2, O.2, O.2, O.3, O.3, H, and H, where S.O2, O.2 and O.3 are defined as sulphone sulfur, sp2 oxygen and sp3 oxygen, respectively.
  3. The atoms can also include connectivity information based on the multilevel neighborhoods of atoms (MNA) file format implemented in OpenBabel. In the current case, we only consider the connectivity to the nearest neighbor (level 1). Accordingly, sulfuric acid’s connectivity types are S-SOOOO, O-OS, O-OS, O-OHS, O-OHS, H-HO, and H-HO.
For stereoisomers (isomers with atoms bonded the same way), aligning by any of the three classes should yield the same RMSD, but aligning by atom type or connectivity should give a more meaningful alignment. However, for constitutional isomers, which have atoms bonded differently, alignment by atom type or atom connectivity would not work since the atom type and connectivity classes for the two isomers will likely be different. In that case, one would have to resort to alignment by atom label.
One can select which alignment option to use. By default, atoms are aligned by label.
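The idea of matching atoms one class at a time can be illustrated with the sulfuric acid classes listed above. The helper names `group_by_class` and `classes_compatible` are hypothetical and only sketch the bookkeeping; they are not taken from ArbAlign.

```python
from collections import Counter, defaultdict

def group_by_class(classes):
    """Map each atom class to the indices of the atoms that belong to it."""
    groups = defaultdict(list)
    for index, cls in enumerate(classes):
        groups[cls].append(index)
    return dict(groups)

def classes_compatible(classes_a, classes_b):
    """Two isomers can be matched class by class only if both
    contain the same classes the same number of times."""
    return Counter(classes_a) == Counter(classes_b)

# Sulfuric acid in the three representations described in the text.
labels = ["S", "O", "O", "O", "O", "H", "H"]
types = ["S.O2", "O.2", "O.2", "O.3", "O.3", "H", "H"]
connectivity = ["S-SOOOO", "O-OS", "O-OS", "O-OHS", "O-OHS", "H-HO", "H-HO"]

label_groups = group_by_class(labels)          # 3 classes: S, O, H
type_groups = group_by_class(types)            # 4 classes
connectivity_groups = group_by_class(connectivity)  # 4 classes
```

Within each group, the atoms of the second isomer would then be reordered independently. If `classes_compatible` is false for the type or connectivity classes, as is likely for constitutional isomers, alignment by atom label is the fallback.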

Options: Including or Excluding Hydrogens:

In some cases such as those involving large biological isomers, aligning the heavy atoms and ignoring hydrogens is acceptable. It will also result in significant time savings.
One can select the 'Exclude Hydrogens' radio button to align the heavy atoms only. By default, all atoms are aligned.
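Excluding hydrogens amounts to filtering the atom list before alignment. A minimal sketch, assuming hydrogens carry the label `H` (isotope labels such as `D` are not handled) and using the hypothetical helper name `strip_hydrogens`:

```python
def strip_hydrogens(labels, coords):
    """Return only the heavy atoms (every atom whose label is not 'H')."""
    kept = [(label, xyz) for label, xyz in zip(labels, coords)
            if label.strip().upper() != "H"]
    return [label for label, _ in kept], [xyz for _, xyz in kept]

# Illustrative input: an approximate water geometry.
labels = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.117), (0.0, 0.757, -0.469), (0.0, -0.757, -0.469)]
heavy_labels, heavy_coords = strip_hydrogens(labels, coords)
```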

Options: Considering Axes Swaps and Reflections:

When two isomers are transformed from an arbitrary (x,y,z) frame to their principal coordinate frame (a,b,c), two issues can arise.
  1. Because the definitions of the positive and negative axis directions are somewhat arbitrary, performing reflections may be necessary for optimal alignment.
  2. Because the order of the a, b, and c axes can be somewhat arbitrary, axes swaps may be necessary to obtain the best alignment.
To address these two issues, one can perform swaps and reflections:
  1. Swaps permute the (x,y,z) axes, for example (x,y,z) to (y,x,z). There are six such possible swaps, counting the identity.
  2. Reflections take (x,y,z) coordinates to (±x, ±y, ±z). There are eight possible reflections.

[Figure: axes swaps and reflections]
There are forty-eight combinations of swaps and reflections to consider. Some of these combinations are redundant, but we perform all of them for the sake of simplicity.
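The count of forty-eight follows from six axis permutations times eight sign patterns, which can be enumerated directly. The function names below are hypothetical and the snippet is only a sketch of the enumeration, not the ArbAlign implementation.

```python
from itertools import permutations, product

def swap_reflect_transforms():
    """All (permutation, signs) pairs: 6 axis swaps x 8 reflections."""
    return [(perm, signs)
            for perm in permutations(range(3))
            for signs in product((1, -1), repeat=3)]

def apply_transform(point, perm, signs):
    """New axis k takes the value of old axis perm[k], multiplied by signs[k]."""
    return tuple(s * point[k] for k, s in zip(perm, signs))

transforms = swap_reflect_transforms()

# Example: swap x and y, then reflect the new first axis.
example = apply_transform((1.0, 2.0, 3.0), (1, 0, 2), (-1, 1, 1))

# Images of one generic point under every transform.
images = {apply_transform((1.0, 2.0, 3.0), perm, signs)
          for perm, signs in transforms}
```

In a brute-force search, each transform would be applied to every atom of the second isomer before the assignment step, and the combination giving the lowest RMSD would be kept.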
By default, all swaps and reflections are performed. To skip them, select the 'Don't Consider Axes Swaps and Reflections' option.

  Berhane Temelso, Joel M. Mabey, Toshiro Kubota, Nana Appiah-Padi, George C. Shields. J. Chem. Inf. Model. 2017, 57 (5), 1045–1054.