README.cif2xml

               Information for cif2xml 0.1.1 -- alpha, 29 November 2009

     ----------------------------------------------------------------------------

                     Before using this software, please read the
                                        NOTICE
                               and please read the IUCr
                                        Policy
              on the Use of the Crystallographic Information File (CIF)

     ----------------------------------------------------------------------------

     \ | /      \ | /
      \|/        \|/
     -- -->>>>>>-- --               c i f 2 x m l
 ...... CIF COPY PROGRAM
      /|\        /|\
     / | \      / | \                       Version 0.1.1 - alpha
                                              29 November 2009


  cif2xml  is a fortran program using CIFtbx2 to copy a CIF on standard
  -------  input to an equivalent XML on standard output, while checking
           data names against dictionaries and reformating numbers with
           esd's to conform to the rule of 19.  A quasar-style request
           list may be specified, otherwise the entire CIF is copied.
           The XML output may be literally derived from the CIF input,
           or transformations may be specified in a dictionary.
           The declarations required for the XML document may either be
           embedded in the new document, written to an external DTD, or
           referred to an existing file.

           This program is based on cif2cif, and differs from the program
           primarily in the format of the output.

            cif2xml
                   by

                   Copyright (c) 2000, 2009
                   Herbert J. Bernstein (yaya@bernstein-plus-sons.com)
                   Bernstein + Sons
                   5 Brewster Lane
                   Bellport, NY 11713, U.S.A.


   This program is part of the Bernstein+Sons xmlCIF project (H. J. Bernstein and
   F. C. Bernstein). See http://www.bernstein-plus-sons.com/software/xmlCIF.

   In order to ensure continuing availability of source code and documentation
   cif2xml and its documentation are subject to copyright. This does not prevent
   you from using the program, from making copies and changes, but prevents the
   creation of "closed source" versions out of the open source versions. See
   NOTICE.

   Science is best served when the tools we use are fully understood by those who
   wield those tools and by those who make used of results obtained with those
   tools. When a scientific tool exists as software, access to source code is an
   important element in achieving full understanding of that tool. As our field
   evolves and new versions of software are required, access to source allows us to
   adapt our tools quickly and effectively.

   In the early days of software development, most scientific software source code
   was freely and openly shared with a minimum of formalities. These days, it
   appears that carefully drawn legal documents are necessary to protect free
   access to the source code of scientific software. We are all deeply indebted to
   Richard Stallman for showing us how a creative combination of copyrights and
   seemingly restrictive licenses could give us truly unfettered freedom to use
   programs, to read their source code and to develop new versions. The GNU
   project, and the Linux project have shown that an open source approach works. We
   use the GNU General Public License (the "GPL") for our program. Older releases
   use the license from OpenRasmol. The OpenRasMol conditions for use have
   correctly been called "GPL-like".

   If you are a user of this program, you will find that the copyrights and notices
   ask little more of you than that you avoid mistakes by others by keeping the
   notices with copies, display scientific integrity by citing your sources
   properly and treating this like other shared scientific developments by not
   inferring a warranty. If you are a software developer and wish to incorporate
   what you find here into new code, or to pick up bits and pieces and used them in
   another context, the situation becomes more complex. Read the copyrights and
   notices carefully. You will find that they are "infectious". Whatever you make
   from our Open Source code must itself be offered as Open Source code. In
   addition, in order to allow users to understand what has changed and to ensure
   orderly development you have to describe your changes.

     ----------------------------------------------------------------------------

   cif2xml reads the input CIF from the standard input device (normally device 5).
   An optional STAR data name dictionary (in DDL format) is opened. A reformatted
   copy of the input CIF is written to standard output (device 6). Messages are
   output to the standard error device (normally device 0). Note that the PARAMETER
   'MAXBUF' should contain the maximum number of char- acters contained on a single
   text line. The default value is 200. If a request list (a file listing data_
   block names and tags) is provided that list controls the ordering and selection
   of tags and values to copy. Otherwise the entire CIF is copied in the order
   presented

   In a unix-like environment, the program is run as:

  cif2xml [-i input_cif] [-o output_xml] [-d dictionary] [-c catck]\
          [-f command_file] [-e esdlim_] [-a aliaso_] [-p prefix]\
          [-t tabl_] [-q request_list] [-b {row|col} ] [-x {xfer|keep|zap}]\
          [-u {drop|insert}]
          [-s {inline | referto spec_dtd | writeto spec_dtd} ] \
          [input_cif [output_xml [dictionary [request_list [spec_dtd ]]]]]

  where:
          input_cif defaults to $CIF2XML_INPUT_CIF or stdin
          output_xml defaults to $CIF2XML_OUTPUT_XML or stdout
          dictionary defaults to $CIF2XML_CHECK_DICTIONARY
            (multiple dictionaries may be specified)
          request_list defaults to $CIF2XML_REQUEST_LIST
          input_cif of "-" is stdin, output_xml of "-" is stdout
          request_list of "-" is stdin
          -e has integer values (e.g. 9, 19(default) o 29)
          -a has values of t or 1 or y vs. f or 0 or n
          -p has string values in which "_" is replaced by blank
          -t has values of t or 1 or y vs. f or 0 or n, default f
          -s defaults to inline, -b defaults to col
          -x defaults to zap, -u to drop

   Note: The options -s inline and -s writeto spec_dtd are not implemented in this
   release.

   The basic approach is to map categories into an outer level of XML tags and
   individual tags into the next level down the tree. Three new dictionary tags are
   defined to allow for mapping of CIF categories and tags to XML entity names:

   _xml_mapping.token                gives the CIF token to be mapped
   _xml_mapping.token_type           gives the type of CIF token
   _xml_mapping.target               gives the string to be used in xml

   The mapping is optionally by rows or by columns. Mapping by columns is the
   default because it allows a much high density of data versus tags.

   Here is the beginning of the cell information from 1crn as mapped by cif2xml:

 <cell.entry_id>                 1CRN
 </cell.entry_id>
 <float builtin="acell">         40.96
 </float>
 <float builtin="bcell">         18.65
 </float>
 <float builtin="ccell">         22.52
 </float>
 <float builtin="alpha">         90.
 </float>
 <float builtin="beta">          90.77
 </float>
 <float builtin="gamma">         90.
 </float>
 ...

   Note the non-CML tag cell.entry_id included. cif2xml allows for request lists so
   that such tags may be excluded, but, for use with Jmol, there is no need to
   exclude them.

   The output of cif2xml when used to produce data by columns agrees with the
   output of the BioDOM program pdb2xml [Moore 99] for such non-looped data. For
   coordinate lists the higher information density of the cif2xml output results in
   faster dataset reading and display when used with Jmol.

1. INSTALLATION

   Here is the recommended procedure for implementing and testing this version of
   cif2xml.

   1.0. Before you try to install this version of cif2xml

   
      *** ========================================================== ***
      *** ========================================================== ***
      *** ==>>> You must have ciftbx version 2.6.4 or greater  <<<== ***
      *** ==>>> installed in a directory named ciftbx.src.     <<<== ***
      *** ==>>> The scripts mkdecompln and rmdecompln, which   <<<== ***
      *** ==>>> come with ciftbx, must be installed in the     <<<== ***
      *** ==>>> top level directory and executable.            <<<== ***
      *** ==>>> To test cif2xml, you must have a compressed    <<<== ***
      *** ==>>> copy of the dictionary cif_mm.dic in a         <<<== ***
      *** ==>>> directory named dictionaries.                  <<<== ***
      *** ========================================================== ***
      *** ========================================================== ***

   The directory structure within which you will work is

                   top level directory
                   -------------------
                            |
                            |
             ------------------------------
             |              |             |
        dictionaries   ciftbx.src     cif2xml.src
        ------------   ----------     -----------

   You may have acquired this package in one of several forms. The most likely are
   as a "C-shell Archive," a "Shell Archive", or as separate files. The idea is to
   get to separate files, all in the same directory, named cif2xml.src, parallel to
   the directory ciftbx.src, but let's start with the possibility that you got the
   package as one big file, i.e. in one of the archive file formats. Place the
   archive in the top level directory.

    *** ========================================================== ***
    *** ========================================================== ***
    *** ==>>> The files in this kit will unpack into a       <<<== ***
    *** ==>>> directory named cif2xml.src.  It is a good idea<<<== ***
    *** ==>>> to save the current contents of cif2xml.src    <<<== ***
    *** ==>>> and then to make the directory empty           <<<== ***
    *** ========================================================== ***
    *** ========================================================== ***

   If you are on a machine which does not provide a unix-like shell, you will need
   to take apart the archive by hand using a text editor. We'll get to that in a
   moment.

   1.1. ON A UNIX MACHINE

   If you have the shell archive on a unix machine, follow the instructions at the
   front of the archive, i.e. save the uncompressed archive file as "file", then,
   if the archive is a "Shell Archive" execute "sh file". If the archive is a
   "C-Shell Archive" execute "csh file".

   1.2. IF YOU DON'T HAVE UNIX

   If sh or csh are not available, then it is best to start with the "C-Shell
   Archive" and do the steps that follow. If you must use the "Shell Archive" you
   should be aware that the lines you want to extract have been prefixed with "X",
   while most of the lines you want to discard have not. For a "C-Shell Archive"
   such prefixes are rare and the file is easier to read. Assume you have a
   "C-Shell Archive".

   Use your editor to separate the different parts of the file into individual
   files in your workspace. Each part starts with a lot of unixisms, then several
   blank lines and then two lines which identify the file, and most importantly,
   contain the text "CUT_HERE_CUT_HERE_CUT_HERE" You can look at the line before
   and the line after to see if you are at the head or tail of a file. Use your
   editor to search for the "CUT_HERE" lines. Each part is carefully labeled and
   indicates the recommended filename for the separated file. On some machines
   these filenames may need to be altered to suit the OS or compiler.

   1.3. MANIFEST

   The partitions are as follows:

    part  filename                   description

      1  COPYING                          GPL (GNU General Public License)
      2  NOTICE                           Notices
      3  cif2xml.src/README.cif2xml       additional information on cif2xml
      4  cif2xml.src/MANIFEST             a list of files in the kit
      5  cif2xml.src/Makefile             a preliminary control file for make
      6  cif2xml.src/4ins.cif             example mmcif file used to test cif2xml
      7  cif2xml.src/4ins.out             example XML output from test of cif2xml
      8  cif2xml.src/4ins.prt             example list file from test of cif2xml
      9  cif2xml.src/cif_cml.dic          example of CML mapping definitions
     10  cif2xml.src/cif2xml.cmn          cif2xml common block
     11  cif2xml.src/cif2xml.f            cif2xml fortran source
     12  cif2xml.src/xtalt2.cif           example cif file used to test cif2xml
     13  cif2xml.src/xtalt2.out           example XML output from test of cif2xml
     14  cif2xml.src/xte29.out            example XML output from test of cif2xml
     15  cif2xml.src/xttne9.out           example XML output from test of cif2xml

  2. COMPILING AND EXECUTING

   Here are the recommended steps for a UNIX system. Vary this according to the
   requirements of your OS and compiler. You will simplest to work if you place the
   cif2xml files together in a common subdirectory named 'cif2xml.src'. Be very
   careful if you place them in a directory with other files, since some of the
   build and test instructions remove or overwrite existing files, especially with
   extensions such as '.o', '.lst', or '.diff'. On a UNIX system, most of what you
   need to do to build and test cif2xml is laid out in Makefile. *** Be sure to
   examine and edit this file appropriately before using it.*** But, once properly
   edited, all you should need to do is 'make clean' to remove old object files,
   'make all' to build new version of 'cif2xml' and 'make tests' to test what you
   have built.

   For non-UNIX-like environments, you will have to provide replacements for iargc,
   getarg and getenv. The following are reasonable possibilities:

          integer function iargc(dummy)
          iargc=0
          return
          end

          subroutine getarg(narg,string)
          integer narg
          character*(*) string
          string=char(0)
          return
          end

          subroutine getenv(evar,string)
          character*(*) evar,string
          string=char(0)
          if(evar.eq.'CIF2XML_INPUT_CIF')
         *  string='INPCIF.CIF'//char(0)
          if(evar.eq.'CIF2XML_OUTPUT_XML')
         *  string='OUTXML.XML'//char(0)
          if(evar.eq.'CIF2XML_CHECK_DICTIONARY')
         *  string='CIF_CORE.DIC'//char(0)
          if(evar.eq.'CIF2XML_REQUEST_LIST')
         *  string='REQLST.DAT'//char(0)
          return
          end

   This combination of substitute routines would "wire-in" cif2xml to read its
   input cif from a file named INPCIF.CIF, write its output cif to a file named
   OUTXML.CIF, check names against CIF_CORE.DIC and use the tag names given in
   REQLST.DAT to selects the ones to copy

  FILES USED

        dictionary input         input   on device 2
        Reformatted CIF          output  on device 6 ('stdout')
        Input CIF                input   on device 2, if a file, 5  if 'stdin'
        Message device           output  on device 0 ('stderr')
        Direct access            in/out  on device 3
        Request list             input   on device 4, if a file, 5 if 'stdin'

TEST files

   Three test CIFs are provided. xtal2.cif is a test file borrowed from xtal_gx
   (file xtest2.cif at ftp://ftp.crystal.uwa.edu.au/free/test., provided by S. R.
   Hall. 4ins.cif is an mmCIF file created from the PDB entry 4INS by G.G. Dodson,
   E. J. Dodson, D. C. Hodgkin, N.W. Isaacs and M. Vijayan (1989) by the program
   pdb2cif (P.E. Bourne, F.C. Bernstein and H.J. Bernstein, 1996, see
   http://ndbserver.rutgers.edu/software).

   xtalt2.cif provides good test cases for the conversion of esd's. The command

     cif2xml -t y < xtalt2.cif > xtalt2.new

   ensures that all esd's follow the rule of 19, while

     cif2xml -t y -e 29 < xtalt2.cif > xte29.new

   converts esd's to the rule of 29. The difference between the two rules is that
   for the rule of 19, all esd's lie between 2 and 19, so that an esd of (1) has to
   be converted to (10), while for the rule of 29, all esd's lie between 3 and 29,
   so that an esd of (2) also has to be converted, in this case to (20). The option
   "-t y" tidies the output to tab stops.

   One last test with this file is to use the command

    cif2xml -e 9 < xtalt2.cif > xttne9.new

   to copy the original cif spacing and to use the rule of 9 on esd's

   4ins.cif has many comments, text fields and dense loops. The test in the
   Makefile tests handling of these items and adds the additional complication of
   processing a prefix ".._" with the command

    cif2xml -t y -p .._ < 4ins.cif > 4ins.new

   The output spacing is controlled by the program.

   If we wish to map tags to an essential subset of the CML XML tags, we can use
   the command

    cif2xml -d cif_mm.dic -d cif_sml.dic -s referto cml.dtd \
       < 4ins.cif > 4ins.new

  CHANGES

  KNOWN PROBLEMS

   cif2xml does not copy white space exactly, and will reformat some data values.
   Some aspect of this are inherent in the differences between CIF and XML. Always
   compare the original to the output.

   XML does not allow multiple root elements. cif2xml maps the first DATA_ block
   encountered to the root element. This can cause problems for XML parsers if
   multiple DATA_ block appear in the input CIF.

   The code used by xml2cif to write DTDs is not ready for release, and has not
   been included.

  References

     * [Bernstein et al. 98] Bernstein, H.J.,Bernstein, F.C., Bourne, P.E.
       "pdb2cif: Translating PDB Entries into mmCIF Format", J. Appl. Cryst., 31,
       pp. 282-295, 1998, software available from http://www.iucr.org/iucr-top/cif
       and http://ndbserver.rutgers.edu.
     * [Bray, Paoli, Sperberg-McQueen 98] Bray, T., Paoli, J., Sperberg, C. M.,
       eds, "Extensible Markup Language (XML)", W3C Recommendation 10-Feb-98,
       REC-xml-19980210, http://www.w3.org/TR/1998/REC-xml-19980210
     * [Fitzgerald et al. 96] Fitzgerald, P. M. D., Berman, H. M., Bourne, P. E.,
       McMahon, B., Watenpaugh, K., Westbrook, J. "The mmCIF Dictionary: Community
       Review and Final Approval", 17th IUCR Congress and General Assembly,
       Seattle, Washington, USA, 8-17 August 1996, Abstract E1226. Version 0.8.02
       available from http://ndbserver.rutgers.edu.
     * [Gezelter 99] Gezelter, D., "Jmol" and open source Java program. See
       http://www.openscience.org/jmol.
     * [Hall, Allen, Brown 91] Hall,S. R. Allen, F. H., Brown, I. D., "The
       Crystallographic Information File (CIF): A New Standard Archive File for
       Crystallography", Acta Cryst. A47, 655-685 (1991),
       http://www.us.iucr.org/iucr-top/cif/standard/cifstd1.html.
     * [Hall, Bernstein 96] Hall, S.R., Bernstein, H.J., "CIFtbx2: Extended Tool
       Box for Manipulating CIFs," J. Appl. Cryst., 29, pp 598-603 (1996).
     * [Hendrickson, Teeter 81] Hendrickson, W. A., Teeter, M. M., "Crambin", PDB
       Entry 1CRN. See also Teeter, M. M., "Water Structure Of A Hydrophobic
       Protein At Atomic Resolution. Pentagon Rings Of Water Molecules In Crystals
       Of Crambin", Proc. Nat. Acad. Sci., USA, 81, 6014 ff. (1984).
     * [Longridge 98] Longridge, J. J., "Tetrasodium Hexacyanoferrate(II)
       Decahydrate", Acta Cryst. C54, 1998, CIF-Access paper, IUCR9800028.cif.
     * [Moore 99] Moore, A., "pdb2xml", March 1999, released as pdb2xml-protbot.pl
       on the BioDOM website,
       http://ala.vsms.nottingham.ac.uk/biodom/software/protsuite-user-dist/
     * [Murray-Rust, Rzepa 99] Murray-Rust, P., Rzepa, H., "Chemical markup, XML
       and the WWW, Part I: Basic principles," J. Chem. Inf . Comp. Sci, 39 No. 6,
       928-942,(1999). See http://www.xml-cml.org

     ----------------------------------------------------------------------------

   Updated 29 November 2009

    yaya@bernstein-plus-sons.com