
CIFFOLD

CIFFOLD 0.5.4 Pre-Release


1 February 2006
by Kostadin Mitev, Georgi Todorov and Herbert J. Bernstein

User's Manual

Copyright © Kostadin Mitev 2005, 2006
Work funded in part by the International Union of Crystallography under a grant to Dowling College.
  1. Copyright and Distribution
  2. Introduction
  3. Installation
  4. Using CIFFOLD
  5. List of Options
  6. Default Options
  7. Logical integrity checks
  8. Terse Formatting
  9. Non-terse Formatting
  10. MAP
  11. Command-line Arguments
  12. How are files folded/wrapped
  13. How are files unfolded/unwrapped
  14. OTHER SOURCES
  15. Change Log
  16. Known Bugs

1. Copyright and Distribution

This software is covered by the GNU General Public License.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

2. INTRODUCTION

Until recently, information in Crystallographic Information File (CIF) format was limited to 80 characters per line and there was no way to represent longer data items and comments faithfully. With the release of CIF version 1.1, the maximum line size has been increased to 2048 characters and a protocol has been specified for folding and unfolding text fields and comments that exceed any given maximum line size. The C/C++ program CIFFOLD implements this line folding/unfolding protocol without loss of the semantic information in the files. This allows new, long-line CIF 1.1 files to be converted to a form suitable for processing by existing software for 80-character line CIF 1.0 files, and allows long-line CIF 1.1 files to be recovered from CIFs produced by CIF 1.0 software. In addition to folding and unfolding, the software performs logical integrity checks and lets the user set a variety of options that control the tradeoff between faithful and compact representations.

3. INSTALLATION

You must first obtain a copy of the source kit of CIFFOLD, CIFFOLD.tar.gz.

To unpack the file on a UNIX machine type the command

gunzip CIFFOLD.tar.gz
and then the command
tar -xvf CIFFOLD.tar
to extract the files in a subdirectory named CIFFOLD_0.5.4 under the current directory.

To create the executable run

make
in the CIFFOLD_0.5.4 directory, which will create the executable named "ciffold". To run the program interactively simply type the command "./ciffold -g" and hit enter.

4. USING CIFFOLD

To run ciffold's GUI from the UNIX prompt, type

./ciffold -g
in the CIFFOLD directory and you will be shown the startup menu. The menu consists of several windows that are shown one by one. The top frame of each window contains the option, while the bottom frame contains either the available choices, from which you have to select one, or the prompt "Enter:", after which you have to enter your choice and hit enter. You can select a choice by using the up and down arrow keys to highlight it and then hitting enter.

5. LIST OF OPTIONS:

6. DEFAULT OPTIONS

CIFFOLD falls back on default values for options that have not been selected. The defaults are also used if something goes wrong during processing of the file. For example, if the file should be formatted according to a MAP but does not contain one, or the MAP becomes invalid at some point, then the default options will be used and the user will be warned. The program uses the following default options:

7. LOGICAL INTEGRITY CHECKS

CIFFOLD checks the file for some basic logical integrity errors and generates warnings about them.

The checks performed are:

In addition to the logical integrity checks, CIFFOLD will detect and change the delimiter of a string with the following peculiarity: the same character as the delimiter appears right after the opening delimiter or right before the closing delimiter. The delimiter of such a string will be changed to its alternative (" to ' and vice versa), so the string "rambo"" will be changed to 'rambo"'. A warning will be issued about the change, and if the option "Output the warning messages ?" is selected then the warning will be written as a special comment at the end of the output file. A warning will also be issued if a non-delimited reserved character (such as [, ], _, etc.) is present.
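The delimiter change described above can be sketched in a few lines of Python. This is only an illustration, not CIFFOLD's own code: the function name `fix_delimiter` is hypothetical, and the decision to leave a string untouched when it already contains the alternative quote character is an assumption made here to keep the sketch safe.

```python
def fix_delimiter(token: str) -> str:
    """Swap the quote delimiter of a token when the same quote character
    sits right after the opening delimiter or right before the closing one.

    Hypothetical helper for illustration; not taken from the CIFFOLD source.
    """
    # Only tokens that open and close with the same quote character qualify.
    if len(token) < 2 or token[0] not in "'\"" or token[0] != token[-1]:
        return token
    quote = token[0]
    inner = token[1:-1]
    if not (inner.startswith(quote) or inner.endswith(quote)):
        return token
    other = "'" if quote == '"' else '"'
    # Assumption: if the alternative quote also occurs inside the string,
    # leave the token alone rather than risk creating a new ambiguity.
    if other in inner:
        return token
    return other + inner + other


print(fix_delimiter('"rambo""'))   # the manual's example
print(fix_delimiter('"plain"'))    # nothing to change
```

A real implementation would also have to record the warning that the manual says accompanies each change.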

8. TERSE FORMATTING

If the terse option is chosen, the program will attempt to reduce the amount of white space to a minimum by putting as much information as possible on one line while keeping the file a valid CIF. This option is considered user-unfriendly and is intended for reducing the size and length of the file. If a string is delimited with a single or double quote, and the same quote character appears immediately after the opening delimiter or immediately before the closing delimiter, then the delimiter is changed to its alternative: a single quote for a double quote and vice versa. For example, the string ""rambo" will be converted to '"rambo'. This is done to avoid ambiguity and improve the clarity of the content of CIF files. Any single hash mark will be put on a new line.

9. NON-TERSE FORMATTING

10. MAP

The optional MAP is used to save the original positions of the information in a file when the file is folded.

The MAP is a file built from three kinds of entries: "dh" for data, where h is the delimiter of the data (either ;, ', " or nothing if no delimiter is used); "sn" for spaces; and "tn" for tabs, where n is the number of spaces or tabs. For each line of the input file there is a line in the MAP file that shows the layout of that line. For example, d's7d shows that there is data delimited by a single quote, followed by 7 spaces and then non-delimited data. The MAP file is then concatenated to the output file, with each line prefixed by #_M# to indicate that the line is part of the MAP. The MAP is useful if a file is folded and it is later necessary to reconstruct exactly the same file by unfolding it. WARNING: As of version 0.3 of CIFFOLD, lines of the MAP may be of arbitrary length. This means that if there are 60 separate items on a single line of the input file, the corresponding line in the MAP file will be more than 60 characters long, and if the maximum line length has been set to 60 the MAP will exceed it.
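As a rough illustration of the notation, the following Python sketch encodes the layout of a single line. It is not CIFFOLD's implementation: the function name `map_line` is hypothetical, multi-line semicolon text fields are ignored, and a closing quote is assumed to be a quote character followed by white space or the end of the line.

```python
def map_line(line: str) -> str:
    """Encode one input line as d/s/t entries, prefixed with #_M#.

    Hypothetical sketch: handles blanks, tabs, quoted data and bare data
    on a single line only; semicolon text fields are not covered.
    """
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in " \t":
            # Count a run of identical white-space characters.
            j = i
            while j < len(line) and line[j] == ch:
                j += 1
            out.append(("s" if ch == " " else "t") + str(j - i))
            i = j
        elif ch in "'\"":
            # Quoted data: assume the closing quote is the same character
            # followed by white space or the end of the line.
            j = i + 1
            while j < len(line):
                if line[j] == ch and (j + 1 == len(line) or line[j + 1] in " \t"):
                    break
                j += 1
            out.append("d" + ch)
            i = j + 1
        else:
            # Non-delimited data runs up to the next blank or tab.
            j = i
            while j < len(line) and line[j] not in " \t":
                j += 1
            out.append("d")
            i = j
    return "#_M#" + "".join(out)


print(map_line("'abc'       xyz"))
```

For a line holding single-quoted data, seven spaces and a bare value, the sketch yields the `d's7d` layout used as the example in the text.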

11. Command-line Arguments

	ciffold [-i input_cif] [-o output_cif] [-x n-n,n-n]
         [-l n] [-m n] [-C n] [-p a[w][e]] [-v file_vers]
         [-c] [-d] [-e] [-g] [-w [-n]] [-u] [-L] [-t] [-h] [-M] [-V]
	

To run CIFFOLD with the options specified on the command line, type "./ciffold" followed by the options and hit enter. The options provided are:

    [-i input_cif]  corresponds to "ENTER INPUTFILE:" (see above)
			for command line use, a "-" indicates standard input
                        input_cif defaults to stdin
    [-o output_cif] corresponds to "ENTER OUTPUT FILE:" (see above)
			for command line use, a "-" indicates standard output
			output_cif defaults to stdout
    [-d ]           corresponds to "Is this a dictionary file:?" with 
                        value of "yes" (see above)
    [-u ]           corresponds to "Folding (Yes) or Unfolding (No):" 
                        with value of "no" (see above)
    [-w ]           corresponds to "Folding (Yes) or Unfolding (No):" 
                        with a value of "yes" (see above)
    [-n ]           corresponds to the "Minimal Folding (Yes or No)"
                        with a value of "yes" (see above)
    [-m  maxline]   corresponds to "Specify the maximum line length?:" 
                        (see above) Note: this option is considered
                        only when folding files. In unfolding the
                        maximum line length will be forced to be 2048.
    [-v file_version] corresponds to "File version:" valid 
                        file_versions are 1.0 or 1.1 (see above)
    [-t ]           corresponds to "Terse Folding/Unfolding?:" with a 
                        value of "yes" (see above)
    [-l integer]    corresponds to "Terse formatting on loops?:" with a 
                        value of "yes"; the integer corresponds to "How 
                        many items is a big loop? :" (see above)
    [-L]            corresponds to "Preserve leading blanks" with a
                        value of "yes" (see above)
    [-c]            corresponds to "Format only comments:" with a value 
                        of "yes" (see above)
    [-e]            corresponds to "Format everything except comments:" 
                        with a value of "yes" (see above)
    [-C integer]    corresponds to "The column with respect to which 
                        the data should be aligned:" (see above)
    [-p character]  Valid characters for "character" are:
                 "a"- corresponds to "Output the error messages:" with a 
                   value of "yes" and "Output the warnings:" with a value 
                   of "yes".
		 "w"- corresponds to "Output the warnings:" with a value of 
                   "yes".
	         "e"- corresponds to "Output the error messages:" with a 
		   value of "yes" (see above)
    [-g]          Takes no values and invokes the GUI interface
    [-M]          If folding corresponds to "Create a MAP?" with a 
                        value of "yes". 
                        If unfolding corresponds to "Read from a MAP?" 
			with a value of "yes" (see above)
    [-h]          Takes no values. Prints a help message and exits.
    [-x n-n,n-n]  corresponds to "Process the entire file?" with a value 
                        of "no". Each n-n corresponds to "Enter chunk pairs 
                        or END to continue:", where the first n is the 
                        starting integer and the second n is the ending 
                        integer. Example: to format only the chunks 
                        9-10 and 40-70, specify -x 9-10,40-70
     [-V]         Takes no values. Prints the current version and exits.
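
The n-n,n-n syntax of the -x option can be read as a comma-separated list of start-end pairs. The Python sketch below shows one way such a specification could be parsed; the function name `parse_chunks` and the rule that a start may not exceed its end are assumptions for illustration, not behaviour confirmed for CIFFOLD.

```python
def parse_chunks(spec: str) -> list[tuple[int, int]]:
    """Parse a chunk specification such as "9-10,40-70" into integer pairs.

    Hypothetical helper; the validation rules are assumptions.
    """
    pairs = []
    for part in spec.split(","):
        start_text, sep, end_text = part.partition("-")
        if not sep:
            raise ValueError(f"expected start-end, got {part!r}")
        start, end = int(start_text), int(end_text)
        if start > end:
            raise ValueError(f"start exceeds end in {part!r}")
        pairs.append((start, end))
    return pairs


print(parse_chunks("9-10,40-70"))
```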

12. How are files folded/wrapped

CIFFOLD will make two passes through the file. On the first pass it will perform logical integrity checks, issue the appropriate warnings and error messages and will create a temporary file where the input file will be stored. It will also create a MAP for the file if the MAP option is selected and will create a temporary file for the MAP. Some additional information about the file is gathered as well. On the second pass CIFFOLD will actually fold/wrap the file according to the following rules:

Lines will be folded/wrapped only if they exceed the maximum line length. Thus a text field whose lines are all shorter than the maximum allowed line length will not be folded/wrapped. Strings that are shorter than the maximum allowed line length but end beyond the column of the maximum allowed line length will either be brought back to the left by deleting blank characters or, if that is not possible, be placed on a new line. The loops will be formatted according to the following rules:

Every tag is placed on a new line. If possible the data tokens in the loop will be aligned into rows and columns such that each row contains as many data tokens as are the number of tags. If such alignment is not possible the original formatting will be preserved as much as possible.
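
The loop rule above can be pictured with a small Python sketch. It is a simplified illustration under stated assumptions: the name `align_loop` is hypothetical, columns are padded with blanks to the widest token, and "alignment is not possible" is taken to mean that the number of data tokens is not a multiple of the number of tags. CIFFOLD may apply further conditions, such as the maximum line length.

```python
def align_loop(tags: list[str], tokens: list[str]):
    """Lay out a loop with one tag per line and one row of data per line.

    Returns None when the tokens cannot be arranged into complete rows,
    in which case a caller would keep the original formatting.
    Hypothetical sketch; not the CIFFOLD implementation.
    """
    n = len(tags)
    if n == 0 or len(tokens) % n != 0:
        return None
    rows = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    # Width of each column is the length of its longest token.
    widths = [max((len(row[c]) for row in rows), default=0) for c in range(n)]
    lines = ["loop_"] + list(tags)
    for row in rows:
        padded = " ".join(tok.ljust(w) for tok, w in zip(row, widths))
        lines.append(padded.rstrip())  # trailing blanks are dropped
    return lines


for line in align_loop(["_a", "_b"], ["1", "22", "333", "4"]):
    print(line)
```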

The "Preserve leading blanks" option does not preserve the leading blanks of tokens that fall within a loop. Trailing blanks are deleted unless they fall within a text field.

When processing is finished, the temporary files are deleted.
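
For text fields, folding can be sketched as follows. The sketch assumes the CIF 1.1 line-folding convention in which a folded text field opens with a line containing ;\ and every folded line ends with a backslash; this detail comes from the CIF 1.1 specification rather than from this manual. The function name `fold_text_field` is hypothetical, and edge cases such as original lines that already end in a backslash, or a continuation line that would begin with a semicolon, are deliberately ignored.

```python
def fold_text_field(text: str, maxline: int) -> list[str]:
    """Fold the lines of a text field so that none exceeds maxline.

    Hypothetical sketch assuming the ;\\ opening marker and trailing
    backslash continuation of the CIF 1.1 folding protocol. Assumes no
    original line ends in a backslash.
    """
    if maxline < 2:
        raise ValueError("maxline must leave room for the backslash")
    out = [";\\"]
    for line in text.split("\n"):
        # Only lines that exceed the maximum line length are folded.
        while len(line) > maxline:
            out.append(line[:maxline - 1] + "\\")
            line = line[maxline - 1:]
        out.append(line)
    out.append(";")
    return out


for folded in fold_text_field("abcdefgh\nsecond", 6):
    print(folded)
```

With a maximum line length of 6, only the eight-character line is split; the six-character line is left alone, matching the rule that lines are folded only when they exceed the maximum.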

13. How are files unfolded/unwrapped

CIFFOLD will make two passes through the file. On the first pass it will perform logical integrity checks, issue the appropriate warnings and error messages and will create a temporary file where the input file will be stored. It will also create a temporary MAP file which will hold the MAP of the input file if it exists and the MAP option is selected. Some additional information about the file is gathered as well. On the second pass CIFFOLD will actually unfold/unwrap the file according to the following rules (if the default options are used):

Every tag will be placed on a new line. The data associated with a tag will be placed on the same line as the tag if the resulting line length does not exceed the maximum allowed line length and there is no more than one newline character between the tag and the data. Example:

_a_tag  
data
and
_a_tag data
will be unfolded/unwrapped as:
_a_tag data
but:
_a_tag

data
will be unfolded/unwrapped as:
_a_tag

data
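
The placement rule in this example can be written as a short Python sketch. The function name `place_value` and its parameters are hypothetical, and the tag and value are assumed to be already stripped of surrounding blanks.

```python
def place_value(tag: str, value: str, newlines_between: int, maxline: int) -> list[str]:
    """Return the output lines for a tag and its data.

    The data joins the tag's line only if at most one newline separated
    them in the input and the joined line fits within maxline.
    Hypothetical sketch; not the CIFFOLD implementation.
    """
    joined = f"{tag} {value}"
    if newlines_between <= 1 and len(joined) <= maxline:
        return [joined]
    # Otherwise keep them apart, preserving any blank lines in between.
    return [tag] + [""] * (newlines_between - 1) + [value]


print(place_value("_a_tag", "data", 1, 80))
print(place_value("_a_tag", "data", 2, 80))
```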

The loops will be formatted according to the following rules unless the -n option has been selected:

Every tag is placed on a new line. If possible the data tokens in the loop will be aligned into rows and columns such that each row contains as many data tokens as are the number of tags. If such alignment is not possible the original formatting will be preserved as much as possible.

Unless the trailing blanks fall within a text field they will be deleted.

The "Preserve leading blanks" option does not preserve the leading blanks of tokens that fall within a loop. The only way the original file can be exactly recovered is by using the MAP option.

When processing is finished, the temporary files are deleted.
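
Unfolding a text field reverses the folding step. The Python sketch below assumes the CIF 1.1 convention that a folded text field opens with ;\ and that a backslash as the last non-blank character of a line marks a continuation; this detail comes from the CIF 1.1 specification rather than from this manual. The name `unfold_text_field` is hypothetical, and the sketch does not use a MAP, so it restores content but not necessarily the exact original layout.

```python
def unfold_text_field(lines: list[str]) -> str:
    """Rejoin the folded lines of a text field into its original text.

    Hypothetical sketch assuming the ;\\ opening marker and trailing
    backslash continuation of the CIF 1.1 folding protocol.
    """
    if not lines or lines[0].rstrip() != ";\\":
        raise ValueError("not a folded text field")
    body = lines[1:]
    if body and body[-1].rstrip() == ";":
        body = body[:-1]  # drop the closing semicolon line
    pieces = []
    current = ""
    for line in body:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Continuation: drop the backslash and keep accumulating.
            current += stripped[:-1]
        else:
            pieces.append(current + line)
            current = ""
    if current:
        pieces.append(current)
    return "\n".join(pieces)


print(unfold_text_field([";\\", "abcde\\", "fgh", "second", ";"]))
```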

14. OTHER SOURCES:

For information about CIF files visit: http://www.iucr.org/. For information about the folding/unfolding protocol of CIFs visit: http://www.iucr.org/iucr-top/lists/cif-developers/msg00147.html

15. Change Log

16. Known Bugs

Written by K. Mitev, 15 April 2005,
revised, H. J. Bernstein, 16 April 2005, 19 April 2005,
K. Mitev, 22 April 2005,
H. J. Bernstein, K. Mitev, G. Todorov, 27 April 2005,
G. Todorov 28 April 2005,
K. Mitev 29 April 2005,
K. Mitev 6 May 2005,
H. J. Bernstein, 7 May 2005,
K. Mitev, H. J. Bernstein 14 May 2005,
K. Mitev 31 May 2005,
K. Mitev 11 July 2005, H. J. Bernstein 22 July 2005,
H. J. Bernstein 23 July 2005,
G. Todorov, H. J. Bernstein 25 July 2005,
H. J. Bernstein 1 August 2005, H. J. Bernstein 30 September 2005,
K. Mitev, H. J. Bernstein 1 February 2006