| Search | Source | Introduction | History | Background | Implementation - Distance | Implementation - Searching | Comparison of Search Methods | Tables | References |

SAUC

Search of Alternate Unit Cells
Copyright Keith J. McGill 2013, 2014
Keith J McGill, Mojgan Asadi, Maria T. Karakasheva, Lawrence C. Andrews, Herbert J. Bernstein
Rev 0.8, 24 Apr 2014, Mojgan Asadi, Herbert J. Bernstein
Rev 0.9, 21 Jul 2015, Herbert J. Bernstein
Rev 1.0, 22 Nov 2016, Herbert J. Bernstein
Rev 1.1, 11 Apr 2019, Herbert J. Bernstein
Work Supported In Part By NIGMS
Original Contact Email: kjmcgill7@gmail.com
Support Contact Email: yayahjb@gmail.com
A crystallographic cell is a representation of a lattice, but each lattice can be represented just as well by any of an infinite number of such unit cells. Searching for matches to an experimentally determined crystallographic unit cell in a large collection of previously determined unit cells is a useful verification step in synchrotron data collection and can be a screen for "similar" substances, but it is more useful to search for a match to the lattice represented by the experimentally determined cell. For identification of substances with small cells, a unit cell match may be sufficient for unique identification. Due to experimental error and multiple choices of cells and differing choices of lattice centering representing the same lattice, simple searches based on raw cell edges and angles can miss similarities among lattices. A database of lattices using the G6 representation of the Niggli-reduced cell as the search key provides a more robust and complete search. A database using the S6 representation provides very similar results with much faster times both for database builds and searches, and is now the preferred metric,

Searching is implemented by finding the distance from the probe cell to related cells using a topological embedding of the Niggli reduction in G6, so that all cells representing similar lattices will be found. The space S6 uses Selling reduction to restrict the reduced cells considered to only cell with obtuse or 90 degree angles, avoiding much of the complexity of boundaries for Niggli reduced cells. The comparison of results with those from older cell-based search algorithms suggests significant value in the new approach.


Please read the NOTICE below before use of this web page


Lattice Centering:
Metric:
L1
L2
NCDist
V7
D7
S6
Cell Lengths and Angles:
Length of A Angle of Alpha
Length of B Angle of Beta
Length of C Angle of Gamma
Similarity: Limit Nearest or Sphere to Hits
Nearest
 
Sphere
Radius of Sphere     %     Angstroms
Sort by family  distance
Range


NOTICE

You may redistribute this program under the terms of the GPL.

Alternatively you may redistribute the functions and subroutines of this program as an API under the terms of the LGPL.

GPL NOTICES LGPL NOTICES
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of (the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Some of the software and documents included within this software package are the intellectual property of various parties, and placement in this package does not in anyway imply that any such rights have in any way been waived or diminished.

With respect to any software or documents for which a copyright exists, ALL RIGHTS ARE RESERVED TO THE OWNERS OF SUCH COPYRIGHT.

Even though the authors of the various documents and software found here have made a good faith effort to ensure that the documents are correct and that the software performs according to its documentation, and we would greatly appreciate hearing of any problems you may encounter, the programs and documents any files created by the programs are provided **AS IS** without any warrantee as to correctness, merchantability or fitness for any particular or general use.

THE RESPONSIBILITY FOR ANY ADVERSE CONSEQUENCES FROM THE USE OF PROGRAMS OR DOCUMENTS OR ANY FILE OR FILES CREATED BY USE OF THE PROGRAMS OR DOCUMENTS LIES SOLELY WITH THE USERS OF THE PROGRAMS OR DOCUMENTS OR FILE OR FILES AND NOT WITH AUTHORS OF THE PROGRAMS OR DOCUMENTS.


Access to the source of SAUC

This program and related scripts are available from github at https://github.com/yayahjb/sauc or as a zip file from https://github.com/yayahjb/sauc/archive/master.zip.
The information below is available for download from arxiv.org/abs/1307.1811.

Revison 1.0

Revision 1.0 of 22 November 2016 adds support for searching cells from the Crystallographic Open Database (COD), and has considerable speed improvements. Use of COD in place of CSD avoids license issues, but provides fewer small molecule cells.

Revision 0.9 of 11 August 2015 added support for searching cell from the Cambridge Structural Database from Cambridge Crystallographic Data Centre as a development effort in support of image identification for synchrotrons. That version was experimental. Support for specifying the sphere radius as a percentage and for specifying a limit on the number of hits are provided.

Revision 0.8 of 24 April 2014 had revised output, following the suggestions of Graeme Winter and the good example of Nearest-Cell (Ramraj et al., 2011) in giving some of the provenance of the cell and organizing long output lists into collapsed lists of families. Unlike Nearest-Cell we have chosen to organize the families on the PDB HEADER field, rather than the sequence. For clarity, many uses of the term "algorithm" have been changed to "metric". The output of the nearest cell search has been changed to include the distance and to conform to the style of the style of the Sphere search output. The NCDist metric algorithm has been updated to the most current release. -- MA & HJB

Introduction

(Andrews & Bernstein, 2012) introduced a topological embedding of the Niggli "cone" of reduced cells with the goal of calculating a meaningful distance between unit cells. In the second paper of this series, the embedding was used to determine likely Bravais lattices for a unit cell. Here we apply the embedding to searching within a database for lattices "close" to the lattice of a given probe cell.

A crystallographic cell is a representation of a lattice, but each lattice can be represented just as well by any of an infinite number of such unit cells. Searching for matches to an experimentally determined crystallographic unit cell in a large collection of previously determined unit cells is a useful verification step in synchrotron data collection and can be a screen for "similar" substances (Ramraj et al., 2011) (Mighell, 2002), but it is more useful to search for a match to the lattice represented by the experimentally determined cell, which may involve many more cells. For identification of substances with small cells, a unit cell match may be sufficient for unique identification (Mighell, 2001).

Due to experimental error and multiple cells representing the same lattice and differing choices of lattice centering, simple searches based on raw cell edges and angles can miss similarities. A database of lattices using the G6 representation of the Niggli-reduced cell as the search key provides a more robust and complete search. Searching is implemented by finding the distances from the probe cell to related cells using a topological embedding of the cone of Niggli reduced cells in G6. Comparison of results to those from older cell-based search algorithms suggests significant value in the new approach.

History

Tabulations of data for the identification of minerals dates to the 18th and 19th centuries. Data collected included interfacial angles of crystals (clearly related to unit cell parameters) and optical effects. See the historical review in (Burchard, 1998). With the discovery of x-ray diffraction, those tables were supplanted by new collections. Early compilations that included unit cell parameters arranged for material identification were "Crystal Structures" (Wyckoff, 1931), "Crystal Data Determinative Tables" (Donnay, 1943), and Handbook for Metals and Alloys (Pearson, 1958). Early computerized searches were created by JCPDS in the mid-1960's (Johnson, 2013) and the Cambridge Structural Data file and its search programs (Allen et al., 1973).

Those first searches were sensitive to the issues of differing equivalent presentations of the same lattice. The first effective algorithm for resolving that issue was (Andrews et al., 1980) using the V7 algorithm (NIH/EPA, 1980). Subsequently, other programs using the V7 algorithm have been described (see Table 1). The V7 algorithm has the advantage over simple Niggli-reduction based cell searches of being stable under experimental error. However, sensitivity to a change in an angle is reduced as that angle nears 90 degrees.

Background

An effective search method must find ways to search for related unit cells, even when they appear to be tabulated in ways that make them seem different. A trivial example is:
a = 10.0, b = 10.01, c = 20, α = 65, β = 75, γ = 90
versus
a = 10.0, b = 10.05, c = 20, α = 75, β = 65, γ = 90.

Clearly, these unit cells are almost identical, but simple tabulations might separate them. A somewhat more complex example includes the following primitive cells:
a = 3.1457, b = 3.1457, c = 3.1541, α = 60.089, β = 60.0887, γ = 60.104
versus
a = 3.1456, b = 3.1458, c = 3.1541, α = 90.089, β = 119.907, γ = 119.898.

Here the relationship is not as obvious. The embedding of (Andrews & Bernstein, 2012) can be used to show that the distance between these two cells is quite small in G6 (0.004 Ångstrom units squared in G6).

Implementation - Distance

The program SAUC is structured to allow use of several alternative metrics for searching among cells in an attempt to identify cells representing similar lattices. To simplify comparisons among results with the different metrics, all have been linearized and normalized, i.e. converted to Ångstrom units and scaled to be commensurate with the L2 norm given below:

A simple L1 or L2 norm based on
[a, b, c, α(b + c)/2, β(a + c)/2, γ(a + b)/2]
with the distance scaled by 1/√(6) in the case of the L1 norm and unscaled in case of the L2 norm. The angles as assumed to be in radians and the edges in Ångstroms. The angles were converted to Ångstroms by multiplying by the average to the relevant edge lengths.

The square root of the BGAOL Niggli cone embedding distance NCDist based on
[a2, b2, c2, 2bc cos(α), 2ac cos(β), 2ab cos(γ)].
Before taking the square root, the distances are scaled by √(6) divided by the average length of the cell edges. The overall square root linearizes the metric to Ångstrom units.

The V7 distances based on individual components linearized to Ångstrom units
[a, b, c, 1/a, 1/b, 1/c,V1/3]
and scaled by √(6/7). V is the volume.

These metrics are applied to reduced primitive cells [a, b, c, α, β, γ] and, when the reciprocal cell [a, b, c, α, β, γ] is needed for the V7 metric, that cell is also reduced.

In order to facilitate comparisons to older searches that just consider simple ranges in [a, b, c, α, β, γ], an option for such searches was also included in SAUC.

Validity of using the square root

The use of the square root on a metric preserves the triangle inequality, which is important in order to preserve the metric as a metric-space "metric". The triangle inequality states that for any triangle, the sum of the lengths of any two sides is greater than the length of the third side. In metric space terms, the metric d(x, y) of a metric space M satisfies d(x, z) ≤ d(x, y) + d(y, z), ∀x, y, z ∈ M. Suppose a function f satisfies the following conditions:
u ≥ v → f(u) ≥ f(v), ∀u, v
f(u + v) ≤ f(u) + f(v), ∀u, v
then, if d(x,y) satisfies the triangle inequality, f(d(x,y))will also satisfy the triangle inequality:
d(x, z) ≤ d(x, y) + d(y, z)
→ f(d(x, z)) ≤ f(d(x, y) + d(y, z)) ≤ f(d(x, y)) + f(d(y, z))

The square root satisfies the stated requirements. It is monotone, and
√(u + v) ≤ √(u) + √(v)
↔ u + v ≤ (√(u) + √(v))2 = u + v + 2√(uv)
which is clearly true.

Implementation - Searching

Range searching in amapped embedding needs to be done using a nearest-neighbor algorithm (or "post-office problem" algorithm (Knuth, 1973)). Exact matches are unlikely since most unit cells representing lattices in a database are experimental, and probe cells are also likely to be experimental data. Several efficient algorithms are available; we have used an implementation of neartree (Andrews, 2001).

The raw unit cell data is loaded into the tree once and serialized to a dump file on disk; subsequent searches do not need to wait for the O(Nlog(N)) tree build, which for the 70,000+ cells from the PDB can take half an hour in the BGAOL NCDist metric. The linearization makes the search space more compact and reduces the tree depth, thereby speeding searches. Because the PDB unit cell database contains many identical cells, we modified NearTree to handle the duplicates in auxiliary lists, further reducing the tree depth and speeding searches.

Comparison of Search Methods

The simplest approach to lattice searching is a simple box search on ranges in unit cell a, b and c and possibly on α, β, and γ, as for example in the "cell dimensions" option in the RCSB advanced search at http://www.rcsb.org/pdb/search/advSearch.do for the Protein Data Bank (Berman et al., 2000). In the following examples, we will call that type of search "Range". For the reasons discussed above, such simple searches can fail to find unit cells representing similar lattices but with very different edges that actually represent similar lattices. Such searches are best characterized as cell searches, rather than as lattice searches.

Searching on primitive reduced cells greatly improves the reliability of a search, as for example in (Ramraj et al., 2011) at http://www.strubi.ox.ac.uk/nearest-cell/nearest-cell.cgi, which uses a metric based on the reduced cell and all permutations of axes.While an improvement over simple range searches as discussed above, such searches can also miss similar lattices if the number of alternate lattice presentations considered is not complete. One way to reduce such gaps in searches is to use only parameters that do not depend on the choice of reduced presentation. The (Andrews et al., 1980) approach using 7 parameters (three reduced cell edges, three reduced reciprocal cell edges and the volume), "V7", helps, but has difficulty distinguishing cells with angles near 90 degrees. The NCDist approach used here, derived from (Andrews & Bernstein, 2012) both fills in the gaps and handles angles near to 90 degrees.

Consider, for example, the unit cells of phospholipase A2 discussed by (Le Trong & Stenkamp, 2007). They present three alternate cells from three different PDB entries that are actually for the same structure:
[57.98, 57.98, 57.98, 92.02, 92.02, 92.02] from entry 1FE5 (Singh et al., 2001) in space group R32,
[80.36, 80.36, 99.44, 90, 90, 120] from entry 1U4J (Singh et al., 2005) in space group R3 and
[80.949, 80.572, 57.098, 90.0, 90.35, 90.0] from entry 1G2X (Singh et al., 2004) in space group C2. No simple Range search can bring these three cells together. For example, if we use the PDB advanced cell dimensions search around the cell from IU4J with edge ranges of +-3 Ångstroms and angle ranges of +-1 degree, we get 28 hits: 1CG5, 1CNV, 1FW2, 1G0Z, 1GS7, 1GS8, 1HAU, 1ILD, 1ILZ, 1IM0, 1LR0, 1NDT, 1OE1, 1OE2, 1OE3, 1QD5, 1U4J, 2BM3, 2BO0, 2H8A, 2HZ5, 2OHG, 2REW, 2WCE, 3I06, 3KKU, 3Q98, 3RP2, of which only three actually have cells close to the target using the linearized NCDist metric : 2WCE at 2.96 Ångstroms, 1G0Z at 0 Ångstroms, and 1U4J, the target itself. The remaining cells are, as we will see, rejected under the Nearest-Cell and the V7 metric. The simple Range searches are not appropriate to this problem.

Table 2 shows partial results from a lattice search using Nearest-Cell, and a V7 search using SAUC and a NCDist search using SAUC. We have restricted the searches to NCDist distances ≤ 3.5 Ångstroms. The Nearest-Cell metric appears to be in Å2. The column with the square root of the Nearest- Cell metric facilitates comparison with the linearized SAUC V7 and NCDist metrics. The searches showed consistent behavior: The three cells noted by (Le Trong & Stenkamp, 2007) are found in the same relative positions by all three searches. All cells found by Nearest-Cell are also found by both V7 and NCDist. Of the 42 structures found by all three metrics within 3.5 Ångstroms under the NCDist metric, four (1G0Z, 1G2X, 1DPY and 1FE5) are E.C. class 3.1.1.4 Phospholipase A2 structures, and three (1PKR, 1SGC and 1VRI) are other hydrolases (E. C. classes 3.4.21.7, 3.4.21.80, and 3.4.19.2, respectively) However, ten cells found by V7 and NCDist were not found by Nearest-Cell (2OSN, 2CMP, 3MIJ, 2SGA, 2YZU, 3SGA, 4SGA, 5SGA, 1CDC and 2CVK). Of those ten, one (2OSN) is an E.C class 3.1.1.4 Phospholipase A2 structure and four (2SGA, 3SGA, 4SGA and 5SGA are hydrolases, specifically E.C. class 3.4.21.80 Proteinase A. Two of the ten (2YZU and 2CVK) are thioredoxin, for which the ProMOL (Craig et al., submitted) motif finder shows significant active site homologies to multiple hydorolase motifs (2YZU has site homologies to 132L, 135L and 1LZ1 in E.C. class 3.2.1.17 and to 4HOH in E.C. class 3.1.27.3, 2CVK to 1AMY in E.C. class 3.2.1.1, to 1BF2 in E.C. class 3.2.1.68, to 1EYI in class 3.2.3.11, etc.). For 1CDC, a "metastable structure of CD2", proMOL shows an active site homology to 1ALK of E.C. class 3.1.3.1, another hydrolase.

The significant gaps in the Nearest-Cell search do not appear to be a problem of the distance for the Nearest-Cell search having been cut off at a too-small value. For the common hits between the square root of the Nearest-Cell metric and the linearized NCDist metric, a linear fit is excellent, with R2 = 0.89 and no points are very far from the line. The agreement of the linearized V7 to the other two metrics is much noisier because of loss of sensitivity of the V7 metric for angles near 90 degrees and the inherent difficulty the V7 metric has in discriminating between the + + + and - - - parts of the Niggli cone. For example, 1GUT (Schüttelkopf et al., 2002) is at distances 1.2 and 3.7 from 1UJ4 in the Nearest-Cell and linearized NCDist metrics, respectively, but only 0.1 in the V7 metric. The 1GUT cell is

[78.961, 82.328, 57.031, 90.00, 93.44, 90.00] in C 1 2 1, Z=24, with a primitive cell

[57.031, 57.0367, 57.0367, 92.3918, 92.3804, 92.3804] which corresponds to a G6 vector

[3252.53, 3253.18, 3253.18,-271.53,-270.208,-270.208] and a linearized V7 vector

[52.8004, 52.8057, 52.8057, 52.7101, 52.7101, 52.7053, 52.7569]. The 1U4J cell is

[80.36, 80.36, 99.44, 90, 90, 120] in R3, Z=18, with a primitive cell

[57.02, 57.02, 57.02, 89.605, 89.605, 89.605] which corresponds to a G6 vector

[3251.28, 3251.28, 3251.28, 44.8265, 44.8265, 44.8265] and a linearized V7 vector

[52.7902, 52.7902, 52.7902, 52.7878, 52.7878, 52.7878, 52.789]

This is almost identical to the 1GUT V7 vector, even though the corresponding primitive cells and G6 cells differ significantly.

Tables

Table 1. Programs designed to perform effective searches in a unit cell database
ProgramReference
CrystAndrews et al., 1980
NIH/EPA, 1980
V7
cdsearchToby, 1994V7
QuestAllen et al., 1973Reduced cell
Nearest-CellRamraj et al., 2011Reduced cell
WebCSD, ConquestThomas et al., 2010G6 iterative
SAUCthis workG6, Niggli embedding

Table 2. Comparison of search results for cell [80.36, 80.36, 99.44, 90, 90, 120] from entry 1U4J in space group R3. See (Le Trong & Stenkamp, 2007). In each case the PDB ID (Bernstein et al., 1977) (Berman et al., 2000) found is shown with the distance metric for that method. In the case of Nearest-Cell (Ramraj et al., 2011) a second column with the square root of the metric is provided as well. The results are sorted by the NCDist distance. Results have been cut off at 3.5 Angstroms in the NCDist metric. The three alternate cells cited by (Le Trong & Stenkamp, 2007) are marked with "(*)". The Nearest Cell results are from the www.strubi.ox.ac.uk/nearest-cell/nearest-cell.cgi web site. The NCDist and V7 results are from SAUC.
PDB IDNearest-
Cell
metric
Sqrt of
Nearest-
Cell
V7
metric
NCDist
metric
Molecule E.C. Code
1U4J (*)0000Phospholipase A2 isoform 23.1.1.4
1G0Z0000Phospholipase A23.1.1.4
1G2X (*)0.110.330.20.9Phospholipase A2 3.1.1.4
2OSN0.20.9Phospholipase A2 isoform 33.1.1.4
2CMP0.71.5Terminase small subunit
3KP80.430.661.11.7VKORC1/Thioredoxin domain protein
3MIJ1.01.7RNA (5'-R(*UP*AP*GP*GP*GP*UP
          *UP*AP*GP*GP*GP*U)-3')
3E560.40.631.51.9Putative uncharacterized protein
1CSQ0.490.71.82.0Cold Shock Protein B (CSPB)
3SVI0.540.731.92.1Type III effector HopAB2
1FKF0.830.912.72.4FK506 binding protein5.2.1.8
1FKJ0.830.912.72.4FK506 binding protein5.2.1.8
1BKF0.910.952.82.5Subtilisin Carlsberg3.4.21.62
1FKD0.860.932.82.5FK506 binding protein5.2.1.8
2FKE0.910.952.92.6FK506 binding protein5.2.1.8
3TJY0.880.943.02.6Effector protein hopAB3
2I5L1.061.033.72.7Cold shock protein cspB
2WCE1.211.13.53.0Protein S100-A12
3P631.281.134.03.0Ferredoxin
1F9P1.371.174.73.1Connective tissue activating peptide-III
2CXD1.361.174.73.1Conserved hypothetical
        protein, TTHA0068
2SGA4.93.1Proteinase A3.4.21.80
2YZU4.83.1Thioredoxin
3SGA5.03.1Proteinase A (SGPA)3.4.21.80
4SGA4.83.1Proteinase A (SGPA)3.4.21.80
5SGA4.93.1Proteinase A (SGPA)3.4.21.80
1GUS1.151.070.33.2Molybdate binding protein II
1PKR1.441.24.73.2Plasminogen3.4.21.7
1SGC2.451.575.23.2Proteinase A3.4.21.80
2VRI1.51.225.23.2Non-structural protein 33.4.19.12
1CDC4.93.3CD2
1DPY1.241.112.33.3Phospholipase A23.1.1.4
1FE5 (*)1.241.112.33.3Phospholipase A23.1.1.4
1GUT1.21.10.13.37Molybdate binding protein II
2C9Q1.461.214.83.3Copper resistance protein C
2CVK5.23.3Thioredoxin
2HE21.481.224.83.3Discs large homolog 2
2IT51.621.275.63.3CD209 antigen, DCSIGN-CRD
3SU11.591.265.23.3Genome polyprotein
3SU51.581.265.13.3NS3 protease, NS4A protein
3SU61.521.235.03.3NS3 protease, NS4A protein
1SL41.681.35.83.4mDC-SIGN1B type I isoform
2IT61.731.326.03.4CD209 antigen
3CYO1.811.355.63.4Transmembrane protein
3SU21.61.265.23.4Genome polyprotein
3SU31.641.285.33.4NS3 protease, NS4A protein
1H9M1.181.091.13.5Molybdenum-binding-protein
1X901.341.165.23.5Invertase/pectin methylesterase
             inhibitor family protein
2E6L1.781.335.83.5Nitric oxide synthase, inducible1.14.13.39
3CP11.981.416.13.5Transmembrane protein
3SU01.751.325.73.5Genome polyprotein
3SV61.741.325.63.5NS3 protease, NS4A protein
3SV71.731.325.63.5NS3 protease, NS4A protein

References

Allen, F. H., Kennard, O., Motherwell, W. D. S., Town, W. G. & Watson, D. G. (1973). Journal of Chemical Documentation, 13(3), 119 -- 123.
Andrews, L. (2001). C/C++ Users Journal, 19, 40 -- 49. http://sf.net/projects/ neartree.
Andrews, L. C. & Bernstein, H. J. (2012). arXiv preprint arXiv:1203.5146. arxiv.org/abs/1203.5146.
Andrews, L. C. & Bernstein, H. J. (2013). arXiv preprint arXiv:1305.6561. arxiv.org/abs/1305.6561.
Andrews, L. C., Bernstein, H. J. & Pelletier, G. A. (1980). Acta Crystallogr. A36, 248 -- 252. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235 -- 242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, Jr., E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535 -- 542.
Burchard, U. (1998). The Mineralogical Record, 29(6), 517 -- 583.
Craig, P. A., Hanson, B., Westin, C., Rosa, M., Bernstein, H. J., Grier, A., Osipovitch, M., MacDonald, M., Dodge, G., Boli, P. M., Corwin, C. W. & Kessler, H. (submitted). B. M. C. Bioinformatics.
Donnay, J. D. H. (1943). The American Minerologist, 28, 313.
Johnson, G. G. (2013). Private Communication.
Knuth, D. E. (1973). Sorting and Searching (The Art of Computer Programming volume 3). Addison Wesley, Reading, MA.
Le Trong, I. & Stenkamp, R. E. (2007). Acta Crystallogr. D63, 548 -- 549.
Mighell, A. D. (2001). Journal of Research - National Institute of Standards and Technology, 106(6), 983 -- 996.
Mighell, A. D. (2002). Journal of Research - National Institute of Standards and Technology, 107(5), 425 -- 430.
NIH/EPA (1980). User's Manual NIH-EPA Chemical Information System, chap. User's Guide to CRYST The X-Ray Crystallographic Search System. National Institutes of Health, Environmental Protections Agency.
Pearson, W. B. (1958). Handbook of Lattice Spacings and Structures of Metals and Alloys. International Series of Monographs on Metal and Physics and Physical Metallurgy, G. V. Raynor (ed.). Pergamon Press.
Ramraj, V., Esnouf, R. & Diprose, J. (2011). Nearest-Cell A fast and easy tool for locating crystal matches in the PDB. Tech. rep. Division of Structural Biology, University of Oxford. http://www.strubi.ox.ac.uk/nearest-cell/nearest-cell.cgi.
Schüttelkopf, A. W., Harrison, J. A., Boxer, D. H. & Hunter, W. N. (2002). Journal of Biological Chemistry, 277(17), 15013 -- 15020.
Singh, G., Gourinath, S., Saravanan, K., Sharma, S., Bhanumathi, S., Betzel, C., Srinivasan, A. & Singh, T. P. (2004). Acta Crystallogr. F61(1), 8 -- 13.
Singh, G., Gourinath, S., Sarvanan, K., Sharma, S., Bhanumathi, S., Betzel, C., Yadav, S., Srinivasan, A. & Singh, T. P. (2005). J. Struct. Biol. 149(3), 264 -- 272.
Singh, G., Gourinath, S., Sharma, S., Paramasivam, M., Srinivasan, A. & Singh, T. P. (2001). J. Mol. Biol. 307(4), 1049 -- 1059.
Thomas, I. R., Bruno, I. J., Cole, J. C., Macrae, C. F., Pidcock, E. & Wood, P. A. (2010). J. Appl. Crystallogr. 43(2), 362 -- 366.
Toby, B. (1994). Private Communication.
Wyckoff, R. W. G. (1931). The structure of crystals. No. 19. The Chemical Catalog Company. inc.


| Search | Source | Introduction | History | Background | Implementation - Distance | Implementation - Searching | Comparison of Search Methods | Tables | References |

Updated 27 December 2017.