Programming Module

An Overview of Programming

This is an overview of programming. It is aimed at readers who want to understand how to solve problems with computers by writing new programs and scripts, rather than by using a canned application. It is not intended for those computer professionals who are only happy if they have made their own job as difficult as possible (see, for example, Real Programmers Don't Eat Quiche). Rather, the focus is on helping working scientists and other academics cope with computer programming as a necessary evil in getting other, much more important work done, much as a homeowner might learn to fix their own toilets or put in a new electrical outlet.

What is Programming

Programming is directing a computer to perform a specific sequence of actions (commands or instructions) in some specified order. Think of writing a recipe for somebody else to follow to bake a cake. If you assume anything or leave out any steps, the cake will probably not be what you intended it to be and may even be inedible.

In order to program a computer, it is helpful to understand what a computer is. A computer is a device that can be told to transform information, and information is just a bunch of numbers. As counter-intuitive as that may be, there is now a respectable history and sound theory behind the idea. You can get a sense of that history from any introduction to computer science (e.g. Introduction to Computer Science) but, if you are in a hurry and are not preparing to be a computer scientist, it is sufficient to understand what is called the "von Neumann architecture" of a computer.

The von Neumann Model

The classic von Neumann architecture for a computer consists of: a digital memory that holds both programs and data; a control unit that obeys the program instructions in memory, either in sequence or by jumping to specified locations in memory, and that modifies the contents of memory according to those instructions; various arithmetic units that perform the calculations required by the program as directed by the control unit; and input/output units.
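To make the model concrete, here is a minimal sketch in Python of a toy stored-program machine. The instruction names (LOAD, ADD, STORE, JUMP, PRINT, HALT), the single accumulator, and the memory layout are all invented for this illustration and do not correspond to any real computer. The point is only that one memory holds both the instructions and the data they act on.

```python
# Toy von Neumann machine: one memory holds both the program and the data.
# The instruction names and their encoding are hypothetical, chosen for illustration.

def run(memory):
    """Execute the program stored in memory, starting at address 0."""
    pc = 0    # program counter: address of the next instruction (control unit)
    acc = 0   # accumulator: working register of the arithmetic unit
    while True:
        op, arg = memory[pc]      # fetch the instruction at the current address
        pc += 1                   # by default, continue in sequence
        if op == "LOAD":
            acc = memory[arg]             # copy a number from memory
        elif op == "ADD":
            acc = acc + memory[arg]       # arithmetic on a number in memory
        elif op == "STORE":
            memory[arg] = acc             # modify the contents of memory
        elif op == "JUMP":
            pc = arg                      # continue from a specified location
        elif op == "PRINT":
            print(memory[arg])            # output unit
        elif op == "HALT":
            return memory
        else:
            raise ValueError("unknown instruction: %r" % (op,))

# Addresses 0-4 hold instructions; addresses 5-7 hold data.
memory = [
    ("LOAD", 5),    # 0: acc = memory[5]
    ("ADD", 6),     # 1: acc = acc + memory[6]
    ("STORE", 7),   # 2: memory[7] = acc
    ("PRINT", 7),   # 3: output memory[7]
    ("HALT", 0),    # 4: stop
    2,              # 5: data
    3,              # 6: data
    0,              # 7: the result is stored here
]

run(memory)
```

A real machine encodes the instructions themselves as numbers; tuples are used here only to keep the sketch readable.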

Despite many serious efforts to come up with a better, "non-von-Neumann" computer architecture, and the vast increase in power and complexity of computers over the years, this is still the basic structure of modern computers.

The Process of Programming

Everything has to be reduced to numbers. The memory has to be organized into places to hold those numbers, into "data structures". Some of the data structures will hold numbers that don't change during calculations -- those data structures are "constants". Some of the data structures will hold numbers that may change during calculations -- those data structures are "variables". Some portions of memory will hold the numbers representing the organized groups of instructions of the stored program to be executed -- "routines", "subroutines", "functions", "methods", "actions", .... We use the term "algorithm" to describe the steps that will be taken without having to be too specific about the exact representation of instructions in memory, and the term "program" to describe the exact sequence of instructions that will be followed.
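As an illustration of these terms, the following short Python sketch shows a constant, some variables, a simple data structure and a function. The temperature example and all of the names in it are assumptions made for this sketch, not part of any particular program.

```python
# Constant: a value that does not change during the calculation.
FREEZING_POINT_F = 32.0

def fahrenheit_to_celsius(temp_f):
    """A routine (function): a named, reusable group of instructions."""
    return (temp_f - FREEZING_POINT_F) * 5.0 / 9.0

# A data structure holding several numbers.
readings_f = [32.0, 50.0, 212.0]

# Variables: values that change as the program runs.
total_c = 0.0
for temp_f in readings_f:
    total_c = total_c + fahrenheit_to_celsius(temp_f)

mean_c = total_c / len(readings_f)
print(mean_c)
```

The algorithm here is "convert each reading and average the results"; the program is the exact sequence of Python statements that carries it out.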

Working with the von Neumann model in mind, the process of creating a computer program can be broken down into the following steps:

Design the output data structures: Work out what information the program is intended to produce
Design the input data structures: Work out what information the program will need to produce that output
Design the algorithms and internal data structures: Work out what we need to do to get from the specified input to the specified output
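The three steps above can be sketched in Python as follows. The task (summarizing a list of measurements) and the names Summary, measurements and summarize are hypothetical, chosen only to show the order in which the pieces are designed.

```python
from collections import namedtuple

# Step 1: the output data structure -- what the program is intended to produce.
Summary = namedtuple("Summary", ["count", "minimum", "maximum", "mean"])

# Step 2: the input data structure -- what is needed to produce that output.
# Here it is simply a list of numeric measurements.
measurements = [4.0, 8.0, 6.0]

# Step 3: the algorithm and any internal data structures.
def summarize(values):
    """Turn a non-empty list of numbers into a Summary."""
    total = 0.0                      # internal variable used by the algorithm
    for value in values:
        total += value
    return Summary(len(values), min(values), max(values), total / len(values))

print(summarize(measurements))
```

Designing the output first keeps the algorithm focused: anything that does not contribute to a field of Summary does not need to be computed.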

Fortunately, there are many programs available that simplify the programming process, letting us work with something a little easier to understand than strings of numbers. These programs, called "assemblers", "compilers" and "interpreters", take on the responsibility of translating from symbolic languages to strings of numbers. See Computer Programming Languages -- Recapitulation for more detail.

There are many useful computer programming languages. There is no one "right" language that can be used to solve all problems, no one language you need to learn. As computers and the problems to be solved evolve, the choices of languages that should be used change. Some currently useful languages are:

Python, a scripting language and general-purpose programming language created by Guido van Rossum in the late 1980s (see http://en.wikipedia.org/wiki/Python_(programming_language)). It has become very popular for a wide range of applications, often providing the framework for gluing powerful C-based algorithms to graphical user interfaces (see http://docs.python.org/release/2.5.2/ext/intro.html), making complete graphics programs by using Tcl/Tk (a rapid prototyping scripting language, http://en.wikipedia.org/wiki/Tcl) via an interface package called Tkinter (http://wiki.python.org/moin/TkInter).
C, C++ and Java, the "C-family" of languages, the modern work-horses of both systems and applications programming.
Fortran and Cobol, the original "higher-level" languages, and still critical to scientific and business programming respectively.
Basic, a simpler language than Fortran that became very popular for computer education in the 1970s and 1980s. It remains an important language for engineering programming and for writing scripts for spreadsheet programming.
Perl and Ruby, two of the most popular scripting languages, supporting a wide range of web service and database applications. Perl is particularly popular in programming for bioinformatics.

There are many ways in which to program, but for most scientists and academics, the approach that seems to work in most cases is to program from templates -- existing examples of data structures and algorithms that have some similarity to the new problem being solved.

Some useful tutorials

Python tutorial -- http://docs.python.org/tutorial/
Java tutorial -- http://download.oracle.com/javase/tutorial/
C tutorial
C++ tutorial -- http://www.cplusplus.com/doc/tutorial/
Perl tutorial -- http://www.perl.com/pub/2000/10/begperl1.html
Fortran 77 tutorial -- http://www.stanford.edu/class/me200c/tutorial_77/
Fortran 90 tutorial -- http://www.cs.mtu.edu/~shene/COURSES/cs201/NOTES/fortran.html
Technology Guide: Computer Programming History by Jonathan O'Brien -- http://www.certstaff.com/trainingcatalog/computer-programming-history.html

The last link, recommended by Zelda Kitchen, provides a great deal of useful additional information.

What to program

In order to learn to program you need some cases to try:

The standard first programming assignment is to write a "Hello World" program that will print out a line saying "Hello World". Here is what one looks like in several different languages.
- hello_world.py -- Python version of Hello World
- hello_world.java -- Java version of Hello World
- hello_world.c -- C version of Hello World
- hello_world.cpp -- C++ version of Hello World
- hello_world.pl -- Perl version of Hello World
- hello_world.f -- Fortran version of Hello World
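The linked files are not reproduced on this page. As a rough idea of what to expect, a minimal Python 3 sketch might look like the following; the function name greeting is an assumption of this sketch and may not match the linked hello_world.py.

```python
def greeting():
    """Return the text that the program prints."""
    return "Hello World"

print(greeting())
```

The whole program could be the single line print("Hello World"); the function is used here only to separate producing the text from printing it.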
Once you have an idea of how to produce output, you need to understand how to accept input. Let us look at a program that accepts a line of input and echoes it back.
- echo.py -- Python version of echo program
- echo.java -- Java version of echo program
- echo.c -- C version of echo program
- echo.cpp -- C++ version of echo program
- echo.pl -- Perl version of echo program
- echo.f -- Fortran version of echo program
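As a hedged illustration of the idea (not a copy of the linked echo.py), here is one way an echo program might be written in Python 3. The function name echo_line and the use of an in-memory stream for the demonstration are assumptions of this sketch.

```python
import io
import sys

def echo_line(instream, outstream):
    """Read one line from instream and write it back to outstream."""
    line = instream.readline()
    outstream.write(line)
    return line

# Demonstration with an in-memory stream standing in for the keyboard.
# For interactive use, call echo_line(sys.stdin, sys.stdout) instead.
echo_line(io.StringIO("some input\n"), sys.stdout)
```

Passing the streams in as arguments makes the same function usable both interactively and in automated tests.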

If you can accept input and produce output, then it is time to process the input to produce a related output. In the United States, the wind chill is computed according to the formula:
TWC = 35.74 + 0.6215 T - 35.75 V^0.16 + 0.4275 T V^0.16
where TWC is the wind chill temperature in degrees Fahrenheit, T is the air temperature in degrees Fahrenheit and V is the wind speed in miles per hour. The formula is only valid for wind speeds above 3 miles per hour. Here is a program to ask for a temperature and wind speed and produce a wind chill temperature.
- wind_chill.py -- Python version of wind chill program
- wind_chill.java -- Java version of wind chill program
- wind_chill.c -- C version of wind chill program
- wind_chill.cpp -- C++ version of wind chill program
- wind_chill.pl -- Perl version of wind chill program
- wind_chill.f -- Fortran version of wind chill program
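The linked programs are not reproduced here, but the calculation itself can be sketched in Python 3 directly from the formula above. The function name wind_chill and the choice to raise an error for wind speeds of 3 miles per hour or less are assumptions of this sketch; a real program would also prompt the user for the two inputs.

```python
def wind_chill(temp_f, wind_mph):
    """Return the wind chill temperature in degrees Fahrenheit.

    temp_f   -- air temperature T in degrees Fahrenheit
    wind_mph -- wind speed V in miles per hour (must be above 3)
    """
    if wind_mph <= 3.0:
        # The formula is only valid for wind speeds above 3 miles per hour.
        raise ValueError("wind speed must be above 3 miles per hour")
    v_term = wind_mph ** 0.16    # V^0.16 appears twice, so compute it once
    return 35.74 + 0.6215 * temp_f - 35.75 * v_term + 0.4275 * temp_f * v_term

# Example: 30 degrees Fahrenheit with a 10 mile per hour wind
# gives a wind chill of about 21 degrees Fahrenheit.
print("%.1f" % wind_chill(30.0, 10.0))
```

Checking the validity condition before computing is a small instance of the recipe idea from earlier: the program should not silently assume anything about its input.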

Graphics is an increasingly important aspect of program output. Unfortunately there is no general agreement on how to produce graphical output. Rather than being viewed as an intrinsic responsibility of each language, graphics is usually viewed as an add-on feature, with multiple competing approaches. Two languages that have developed at least a partial graphics repertoire are Python and Java. Python makes use of Tkinter to provide access to graphics for both input and output. Java comes with a simple 2D graphics library. Here are two simple examples of graphic output in Python and in Java.
- fish.py -- Simple fish swimming cartoon in Python
- FishWinks.java -- More complex fish cartoon in Java

Graphics is also an increasingly important mode of input. There are many alternative ways to allow an application to get its input in a graphical context. One important approach is to use a web browser to provide a graphical user interface to a program on a server, as in the first example below. In the second example, we have a program that displays a button and a text field in a modified version of the prior Python swimming fish example, to allow the user to provide a name for the fish and to push the button to start it swimming.
- python-based cgi forms example
- fishin.py -- Simple fish with button and name selection

Is this real?

Real programs, hopefully, do a lot more than make fish swim. ProMOL, at ProMOL.org, is a real program. Here are some components of that program:

promolglobals.py from the top level
__init__.py from the GUI directory
advanced_toolbox.py from GUI
ez_viz.py from GUI
motif_maker.py from GUI
movie_maker.py from GUI
toolbox.py from GUI
view.py from GUI
welcome.py from GUI
motif.py from Methods
movie.py from Methods
save.py from Methods
setting.py from Methods
utility.py from Methods
visual.py from Methods

Please look at welcome.py and work out what you would need to change to insert an informative paragraph, and then look at motif_maker.py to see what you would need to do to make more serious changes in motif_maker.py. We'll be adding more pieces of ProMOL to this web page below and looking at what is involved in making serious changes to this program, connecting the graphical input to the graphical output.

Once you have tried to extend welcome.py by yourself, see if you came up with something different from what another student did. They looked at the lines in welcome.py that say:



    canvas.create_text(10, 10, text = 'ProMOL', font='-*-new century schoolbook-bold-r-normal-*-34-*-*-*-*-*-*-*', anchor=tk.NW)
    canvas.create_text(50, 50, text = 'Developed by the SBEVSL Project', font='-*-new century schoolbook-bold-r-normal-*-25-*-*-*-*-*-*-*', anchor=tk.NW)
    canvas.create_text(50, 70, text = 'Licensed under GPL, No Warranty', font='-*-new century schoolbook-bold-r-normal-*-25-*-*-*-*-*-*-*', anchor=tk.NW)

and added a similar line, noticing that the first two arguments are an x and a y coordinate, and that the number in the middle of the font string is the size of the characters. They fit a full paragraph into a single call by using triple-quoted text.
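For readers who have not met triple-quoted text before, here is a small Python sketch of what it does. The variable names and the sample text are made up for illustration, and the sketch is independent of Tkinter and ProMOL.

```python
# A triple-quoted string may run over several lines of source code;
# the line breaks become newline characters inside the string.
paragraph = '''This text spans
several lines
without any explicit newline escapes.'''

lines = paragraph.split('\n')
print(len(lines))      # number of lines in the paragraph
print(lines[0])        # the first line only
```

Because the newlines are part of the string, a single text argument can carry a whole multi-line paragraph.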

One interesting problem that still needs to be resolved with this change is how to avoid any conflict with other uses of this screen real estate. Look down a few lines. If there are any motif loading errors, where will they appear? Maybe you should try to force an error into a motif and see. If the two uses of the same screen real estate are competing with each other, how should you resolve the conflict?

By moving the new line down into the branch of the code used when there are no motif errors to report, we get


import Tkinter as tk
import tkFont as tkF
from pmg_tk.startup.ProMol import promolglobals as glb

def initialise():
    canvas = tk.Canvas(glb.GUI.welcome['tab'],height=110, width=500)
    canvas.grid(row=0, column=0, sticky=tk.NW)
    canvas.create_text(10, 10, text = 'ProMOL', font='-*-new century schoolbook-bold-r-normal-*-34-*-*-*-*-*-*-*', anchor=tk.NW)
    canvas.create_text(50, 50, text = 'Developed by the SBEVSL Project', font='-*-new century schoolbook-bold-r-normal-*-25-*-*-*-*-*-*-*', anchor=tk.NW)
    canvas.create_text(50, 70, text = 'Licensed under GPL, No Warranty', font='-*-new century schoolbook-bold-r-normal-*-25-*-*-*-*-*-*-*', anchor=tk.NW)
    if len(glb.MOTIFS['errors']) != 0:
        errorbox = tk.LabelFrame(glb.GUI.welcome['tab'], text='Motif Loading Errors')
        errorbox.grid(row=1, column=0)
        xscroll = tk.Scrollbar(errorbox, orient=tk.HORIZONTAL)
        xscroll.grid(row=1, column=0, sticky=tk.E+tk.W)
        yscroll = tk.Scrollbar(errorbox, orient=tk.VERTICAL)
        yscroll.grid(row=0, column=1, sticky=tk.N+tk.S)
        errors = tk.Listbox(errorbox, height=10, width=70,
            xscrollcommand=xscroll.set, yscrollcommand=yscroll.set)
        errors.grid(row=0,column=0)
        xscroll["command"] = errors.xview
        yscroll["command"] = errors.yview
        for error in glb.MOTIFS['errors']:
            errors.insert(tk.END,error)
    else:
        canvas2 = tk.Canvas(glb.GUI.welcome['tab'], height=200, width=450)
        canvas2.grid(row=1, column=0)
        canvas2.create_text(10,10, text = '''This paragraph is inserted to test the ability of the
welcome.py file to display a paragraph on the welcome window.
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
''', font='-*-new century schoolbook-bold-r-normal-*-15-*-*-*-*-*-*-*', anchor=tk.NW)

The resulting welcome screen is then:

This modification was further changed to include grant credit lines and to read additional text from a file called motd, and was posted on the SourceForge repository for the SBEVSL project.

The full ProMOL program listings are available at ProMOL-4.0, organized by a source-code-to-HTML program called Pygments and the following script, written as a C-shell script:


#!/bin/csh
set verbose
if ($#argv > 0) then
  set target=$argv[1]
else
  set target=`pwd`
endif
if (-d $target) then
  cd $target
  rm -rf ./pygindex.html
  echo '<html>' > ./pygindex.html
  echo '<head>' >> ./pygindex.html
  echo '<title>Code in' ${target:t} '</title></head>'  >> ./pygindex.html
  echo '<body><font face="Helvetica,Arial,Times" size="4">' >> ./pygindex.html
  echo '<h2 align="center">'Code in ${target:t}'</h2><p><ul>' >> ./pygindex.html
  foreach file (${target}/*)
  echo $file
  if (-d $file) then
    echo '<li>subdirectory: <a href="'${file:t}'/pygindex.html">'${file}'/</a>' >> ./pygindex.html
  endif
  ~/bin/pygall $file
  end
  echo '</ul></font></body></html>' >> ./pygindex.html
  cd ..
else
  echo $target
  echo ${target:e}
  if (${target:e} == "py" || ${target:e} == "c" || ${target:e} == "cpp") then
     set otarget=`echo ${target:t}.html | sed 's/\//_/g'| sed 's/^_//'`
     echo $otarget
     ~/bin/pygmentize -f html -O full -o $otarget $target
     echo '<li><a href="'$otarget'">'$target'</a>' >> ./pygindex.html
  else
     if (${target:e} != "html") then
     echo '<li><a href="'${target:t}'">'$target'</a>' >> ./pygindex.html
     endif
  endif
endif

In this case the desired output was a structured tree of folders. Each folder contains the pygmentized versions of the Python code and a file called pygindex.html, with links to each pygmentized Python code module and direct links to the other files. Such a script is an example of the sort of quick and dirty reorganization of data done with scripting languages.
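The same kind of quick reorganization can also be written in Python. The following is a simplified sketch, not a translation of the script above: it uses only the standard library, it builds the pygindex.html files and their links, and it does not call Pygments or do any syntax highlighting. The function name build_index is an assumption of this sketch.

```python
import html
import os

def build_index(folder):
    """Write folder/pygindex.html linking to each entry; recurse into subfolders."""
    name = os.path.basename(os.path.abspath(folder))
    items = []
    for entry in sorted(os.listdir(folder)):
        if entry == "pygindex.html":
            continue                       # do not list the index itself
        path = os.path.join(folder, entry)
        if os.path.isdir(path):
            build_index(path)              # each subfolder gets its own index
            href, label = entry + "/pygindex.html", entry + "/"
        else:
            href, label = entry, entry     # direct link to an ordinary file
        items.append('<li><a href="%s">%s</a></li>'
                     % (html.escape(href, quote=True), html.escape(label)))
    page = ("<html><head><title>Code in %s</title></head>\n"
            "<body><h2>Code in %s</h2>\n<ul>\n%s\n</ul></body></html>\n"
            % (html.escape(name), html.escape(name), "\n".join(items)))
    index_path = os.path.join(folder, "pygindex.html")
    with open(index_path, "w") as out:
        out.write(page)
    return index_path
```

Calling build_index(".") would index the current folder and everything below it. Note that it overwrites any existing pygindex.html files, so it is best tried first on a scratch copy.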

What you need to do serious programming

In order to do serious programming you need a computer set up as a development environment. You will need tools to edit the source code of the program and, for a compiled program, a compiler for the language involved. Whether you are compiling or interpreting your code, you may need some packages to support the execution of your program. You will need an environment in which you can execute your program, some input data, and a plan to test whether the output is correct. Different systems use different approaches, but there are two basic choices -- use of command-line tools and use of Graphical User Interface (GUI) tools, often in what is called an Integrated Development Environment (IDE). For Microsoft Windows development, Visual Studio is a popular IDE. Java development under many systems is done with Eclipse. Python comes with its own integrated development environment, IDLE. However, most professional programmers, and non-professional programmers trying to avoid confusing program and system failures, find themselves having to make significant use of command-line tools. The main advantage of command-line tools is that they make it easier to see precisely what is going wrong. In almost all cases the tools that are needed are the ones provided under the various Unix operating systems, especially under Linux, and in particular the GNU Compiler Collection, gcc. Even under Microsoft Windows, these same tools can be used via Minimalist GNU for Windows, MinGW.

Once you have a reasonable development environment, you will need to acquire the software packages you will need. Different applications need different supporting packages, and each of those supporting packages may in turn need its own supporting packages. You can reduce the complexity of acquiring the packages you need by remembering, when you put together your system, to install as many standard libraries as you have room for. Don't make a "lean" system.

Packages tend to come in multiple forms -- as precompiled binaries for particular systems, such as Microsoft Windows, Apple Mac OS X, and various flavors of Unix. Packages may come as source kits, either preconfigured for particular systems, or as generic source kits for a wide range of systems. Many systems have preferred package management systems.

Revision Control

One of the most important resources you will need if you are going to maintain any significant program is a revision control system and a reliable place to maintain access to earlier versions. Two of the most important early revision control systems are RCS and CVS, which have since been joined by many others. One of the most important of these is Subversion.

In order to do revision control you need a place to store the revisions. You certainly can do that on your own computer, but do keep backups somewhere separate. When working collaboratively in a group, you need a server. If you are developing proprietary closed source software you will need your own server or you will have to buy time on a commercial server, but, if you are developing open source software, there are free open source development resource servers. See the Wikipedia review at en.wikipedia.org/wiki/Comparison_of_open_source_hosting_facilties. One of the most popular services is sourceforge.net.

In order to use these services for free, you will need to use an open source license.

Let us look at an open source project on SourceForge: NearTree. On the site you will find file releases and an SVN source code repository. This is a fairly typical C++ and C project that provides a utility library.

A source kit includes a README and a Makefile. The README provides information to help in understanding the package. The Makefile is used to compile and install the package in a Unix environment. More complex packages may have a separate INSTALL file.

In order to manage the complexity of different rules for managing libraries on different systems, GNU libtool is used.

For most open source packages, especially for libraries, such source kits are commonly used. For applications, however, compiled binaries are often provided for the most common platforms, especially for Microsoft Windows, Mac OS X and Linux. RasMol is another project on SourceForge. This is an application, so it provides compiled binaries. The Windows binary installer in this case was packaged by another SourceForge project, Nullsoft Scriptable Install System (NSIS). The open source software community provides a very complete and mutually supportive software development environment.

Conclusion

This has been a brief introduction to programming. To become a professional programmer, you need to learn the details of several programming languages and a great deal more lore about how to use those languages, but if you investigate the links from this page, you should be able to learn enough to tackle the programming issues involved in solving real problems. Remember to start by understanding what you wish your program to produce as output and what inputs will be provided. Then slowly and carefully, using existing code as templates, piece in one input-to-output transformation at a time, and you are very likely to be able to use programming as a useful tool.

Last Updated on 20 January 2011
By Herbert J. Bernstein
Email: yaya@bernstein-plus-sons.com