An Overview of Programming

© Copyright 2011 Herbert J. Bernstein

This is an overview of programming. It is targeted towards readers who want to understand how to solve problems with computers by writing new programs and scripts, rather than by using a canned application, but it is not intended for those computer professionals who are only happy if they have made their own job as difficult as possible. See, for example, Real Programmers Don't Eat Quiche. Rather, the focus is on helping working scientists and other academics cope with computer programming as a necessary evil in getting other, much more important work done, much as a homeowner might learn to fix their own toilets or put in a new electrical outlet.

What is Programming

Programming is directing a computer to perform a specific sequence of actions (commands or instructions) in some specified order. Think of writing a recipe for somebody else to follow to bake a cake. If you assume anything or leave out any steps, the cake will probably not be what you intended it to be and may even be inedible.

In order to program a computer, it is helpful to understand what a computer is. A computer is a device that can be told to transform information, and information is just a bunch of numbers. As counter-intuitive as that may be, there is now a respectable history and sound theory behind the idea. You can get a sense of that history from any introduction to computer science (e.g. Introduction to Computer Science) but, if you are in a hurry and are not trying to prepare to be a computer scientist, it is sufficient to understand what is called the "von Neumann architecture" of a computer.
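As a small illustration of the claim that information is just numbers, the following Python sketch shows the numbers behind a short piece of text. The word used is an arbitrary example chosen for this illustration.

```python
# Text is stored as numbers: each character has a numeric code.
message = "Cake"
codes = [ord(ch) for ch in message]
print(codes)       # the numbers behind the letters

# The same numbers can be turned back into the original text.
restored = "".join(chr(n) for n in codes)
print(restored)

# Those numbers are in turn held as patterns of binary digits.
bits = [format(n, "08b") for n in codes]
print(bits)
```

Images, sounds and measurements are handled the same way: some agreed rule maps them to numbers and back again.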

The von Neumann Model

The classic von Neumann architecture for a computer consists of a digital memory to hold both programs and data, a control unit to obey the program instructions in memory, either in sequence or jumping to specified locations in memory, modifying the contents of memory according to those instructions, various arithmetic units to perform calculations required by the program as directed by the control unit, and input/output units.
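The description above can be made concrete with a toy simulation. This is only a sketch under invented assumptions: the instruction numbers, the two-cell instruction layout and the single accumulator are made up for this example and do not correspond to any real machine. What it does show is one memory holding both the program and its data, and a control loop that fetches instructions in sequence or jumps to a specified location.

```python
# Invented instruction codes for a toy machine (not a real instruction set).
HALT, LOAD, ADD, SUB, STORE, JUMP_IF_NONZERO = 0, 1, 2, 3, 4, 5

def run(memory):
    """Control unit: fetch, decode and execute instructions until HALT."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the arithmetic unit's working value
    while True:
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2                      # by default, continue in sequence
        if opcode == LOAD:
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == SUB:
            acc -= memory[operand]
        elif opcode == STORE:
            memory[operand] = acc    # the program modifies memory
        elif opcode == JUMP_IF_NONZERO:
            if acc != 0:
                pc = operand         # jump to a specified location
        elif opcode == HALT:
            return memory
        else:
            raise ValueError("unknown instruction %r" % opcode)

# One memory holds the program (cells 0-15) and the data (cells 16-19).
# The program multiplies 3 by 4 by adding 3 to a total four times.
memory = [
    LOAD, 18,             # 0:  acc = total
    ADD, 17,              # 2:  acc = acc + step
    STORE, 18,            # 4:  total = acc
    LOAD, 16,             # 6:  acc = count
    SUB, 19,              # 8:  acc = acc - one
    STORE, 16,            # 10: count = acc
    JUMP_IF_NONZERO, 0,   # 12: repeat while count is not zero
    HALT, 0,              # 14: stop
    4,                    # 16: count
    3,                    # 17: step
    0,                    # 18: total
    1,                    # 19: one
]

run(memory)
print(memory[18])   # the total left in memory after the program halts
```

Note that nothing but position distinguishes the program cells from the data cells; both are just numbers in the same list.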

Despite many serious efforts to come up with a better, "non-von-Neumann" computer architecture, and the vast increase in power and complexity of computers over the years, this is still the basic structure of modern computers.

The Process of Programming

Everything has to be reduced to numbers. The memory has to be organized into places to hold those numbers, into "data structures". Some of the data structures will hold numbers that don't change during calculations -- those data structures are "constants". Some of the data structures will hold numbers that may change during calculations -- those data structures are "variables". Some portions of memory will hold the numbers representing the organized groups of instructions of the stored program to be executed -- "routines", "subroutines", "functions", "methods", "actions", .... We use the term "algorithm" to describe the steps that will be taken without having to be too specific about the exact representation of instructions in memory, and the term "program" to describe the exact sequence of instructions that will be followed.
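The vocabulary above can be seen in a few lines of Python. This is a minimal sketch with made-up temperature readings; the names are chosen only for illustration. The algorithm is "convert each reading and average the results"; the program is the exact text below.

```python
# A constant: a value that does not change during the calculation.
FREEZING_POINT_F = 32.0

def fahrenheit_to_celsius(temp_f):
    """A routine (function): a named, reusable group of instructions."""
    return (temp_f - FREEZING_POINT_F) * 5.0 / 9.0

# A data structure: a list holding several readings in order.
readings_f = [32.0, 212.0, 98.6]

# A variable: its contents change as the program runs.
total_c = 0.0
for temp_f in readings_f:
    total_c += fahrenheit_to_celsius(temp_f)

average_c = total_c / len(readings_f)
print(round(average_c, 1))
```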

Working with the von-Neumann model in mind, the process of creating a computer program can be broken down into the following steps:

Fortunately, there are many programs available that simplify the programming process, letting us work with something a little easier to understand than strings of numbers, making it the responsibility of those programs, called "assemblers", "compilers" and "interpreters", to translate from symbolic languages to strings of numbers. See Computer Programming Languages -- Recapitulation for more detail.
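Python itself can be used to watch such a translation happen. In this sketch the built-in compile() function turns one line of symbolic source text into instructions for the Python virtual machine, and those instructions are stored as a sequence of small numbers. This is translation to numbers for an interpreter rather than for the hardware directly, and the particular numbers printed differ from one Python version to another.

```python
source = "total = 2 + 3"

# Translate the symbolic source text into a code object.
code = compile(source, "<example>", "exec")

# The translated instructions are a sequence of small numbers.
numbers = list(code.co_code)
print(numbers)

# Executing the translated program has the effect the source describes.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```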

There are many useful computer programming languages. There is no one "right" language that can be used to solve all problems, no one language you need to learn. As computers and the problems to be solved evolve, the choices of languages that should be used change. Some currently useful languages are:

There are many ways in which to program, but for most scientists and academics, the approach that seems to work in most cases is to program from templates -- existing examples of data structures and algorithms that have some similarity to the new problem being solved.

Some useful tutorials

The last link, recommended by Zelda Kitchen, provides a great deal of useful additional information.

What to program

In order to learn to program you need some cases to try:

Is this real?

Real programs, hopefully, do a lot more than make fish swim. ProMOL, at ProMOL.org, is a real program. Here are some components of that program:

Please look at welcome.py and work out what you would need to change to insert an informative paragraph, and then look at motif_maker.py to see what you would need to do to make more serious changes in motif_maker.py. We'll be adding more pieces of ProMOL to this web page below and looking at what is involved in making serious changes to this program, connecting the graphical input to the graphical output.

Once you have tried to extend welcome.py by yourself, see if you came up with something different from what another student did. They looked at the lines in welcome.py that say:



    canvas.create_text(10, 10, text = 'ProMOL', font='-*-new century schoolbook-bold-r-normal-*-34-*-*-*-*-*-*-*', anchor=tk.NW)
    canvas.create_text(50, 50, text = 'Developed by the SBEVSL Project', font='-*-new century schoolbook-bold-r-normal-*-25-*-*-*-*-*-*-*', anchor=tk.NW)
    canvas.create_text(50, 70, text = 'Licensed under GPL, No Warranty', font='-*-new century schoolbook-bold-r-normal-*-25-*-*-*-*-*-*-*', anchor=tk.NW)


and added a similar line, noticing that the first two arguments are an x and a y coordinate, and that the number in the middle of the font string is the size of the characters. They fit a full paragraph into a single create_text call by using triple-quoted text.
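The two observations the student made can be tried out without running ProMOL at all. In this sketch, make_font and font_size are hypothetical helpers written for this illustration (they are not part of ProMOL); they build a font string of the form used in welcome.py and pick the size back out of it. The paragraph text is a placeholder.

```python
def make_font(size):
    """Build a font string like those in welcome.py for a given size.

    Hypothetical helper for illustration; not part of ProMOL.
    """
    return '-*-new century schoolbook-bold-r-normal-*-%d-*-*-*-*-*-*-*' % size

def font_size(font):
    """Pick the character size back out of such a font string."""
    return int(font.split('-')[7])

# Triple-quoted text may run over several lines yet is a single string,
# so it can be passed as one text argument.
paragraph = '''This text spans
several lines but is
a single string.'''

print(make_font(15))
print(font_size(make_font(15)))
print(paragraph.count('\n') + 1)   # number of lines in the paragraph
```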

One interesting problem that still needs to be resolved with this change is avoiding any conflict with other uses of this screen real estate. Look down a few lines. If there are any motif loading errors, where will they appear? Maybe you should try to force an error into a Motif and see. If the two uses of the same screen real estate are competing with each other, how should you resolve the conflict?

By moving the new line down into the branch of the code used when there are no motif errors to report, we get:


import Tkinter as tk
import tkFont as tkF
from pmg_tk.startup.ProMol import promolglobals as glb

def initialise():
    canvas = tk.Canvas(glb.GUI.welcome['tab'],height=110, width=500)
    canvas.grid(row=0, column=0, sticky=tk.NW)
    canvas.create_text(10, 10, text = 'ProMOL', font='-*-new century schoolbook-bold-r-normal-*-34-*-*-*-*-*-*-*', anchor=tk.NW)
    canvas.create_text(50, 50, text = 'Developed by the SBEVSL Project', font='-*-new century schoolbook-bold-r-normal-*-25-*-*-*-*-*-*-*', anchor=tk.NW)
    canvas.create_text(50, 70, text = 'Licensed under GPL, No Warranty', font='-*-new century schoolbook-bold-r-normal-*-25-*-*-*-*-*-*-*', anchor=tk.NW)
    if len(glb.MOTIFS['errors']) != 0:
        errorbox = tk.LabelFrame(glb.GUI.welcome['tab'], text='Motif Loading Errors')
        errorbox.grid(row=1, column=0)
        xscroll = tk.Scrollbar(errorbox, orient=tk.HORIZONTAL)
        xscroll.grid(row=1, column=0, sticky=tk.E+tk.W)
        yscroll = tk.Scrollbar(errorbox, orient=tk.VERTICAL)
        yscroll.grid(row=0, column=1, sticky=tk.N+tk.S)
        errors = tk.Listbox(errorbox, height=10, width=70,
            xscrollcommand=xscroll.set, yscrollcommand=yscroll.set)
        errors.grid(row=0,column=0)
        xscroll["command"] = errors.xview
        yscroll["command"] = errors.yview
        for error in glb.MOTIFS['errors']:
            errors.insert(tk.END,error)
    else:
        canvas2 = tk.Canvas(glb.GUI.welcome['tab'], height=200, width=450)
        canvas2.grid(row=1, column=0)
        canvas2.create_text(10,10, text = '''This paragraph is inserted to test the ability of the
welcome.py file to display a paragraph on the welcome window.
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
''', font='-*-new century schoolbook-bold-r-normal-*-15-*-*-*-*-*-*-*', anchor=tk.NW)


The resulting welcome screen is then:

This modification was further changed to include grant credit lines and to read additional text from a file called motd, and was posted on the SourceForge repository for the SBEVSL project.
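How ProMOL actually reads its motd file is not shown here. As a rough sketch of the general idea, the following reads extra text from a file and falls back to a default when the file is missing. The function name read_motd and the sample text are assumptions made for this example, and it is written for current Python rather than the older Python used in the listing above.

```python
import os
import tempfile

def read_motd(path, default=''):
    """Return the text of the file at path, or default if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return default

# Demonstrate with a throwaway folder so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    motd_path = os.path.join(folder, 'motd')
    with open(motd_path, 'w') as handle:
        handle.write('Welcome to the workshop.\n')
    found = read_motd(motd_path)
    missing = read_motd(os.path.join(folder, 'no_such_file'),
                        default='(no message)')

print(found)
print(missing)
```

Returning a default instead of failing means the welcome screen can still be drawn when the message file is absent.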

The full ProMOL program listings are available at ProMOL-4.0, organized by a Python-to-HTML program called Pygments and the following C-shell script:


#!/bin/csh
set verbose
if ($#argv > 0) then
  set target=$argv[1]
else
  set target=`pwd`
endif
if (-d $target) then
  cd $target
  rm -rf ./pygindex.html
  echo '<html>' > ./pygindex.html
  echo '<head>' >> ./pygindex.html
  echo '<title>Code in' ${target:t} '</title></head>' >> ./pygindex.html
  echo '<body><font face="Helvetica,Arial,Times" size="4">' >> ./pygindex.html
  echo '<h2 align="center">'Code in ${target:t}'</h2><p><ul>' >> ./pygindex.html
  foreach file (${target}/*)
  echo $file
  if (-d $file) then
    echo '<li>subdirectory: <a href="'${file:t}'/pygindex.html">'${file}'/</a>' >> ./pygindex.html
  endif
  ~/bin/pygall $file
  end
  echo '</ul></font></body></html>' >> ./pygindex.html
  cd ..
else
  echo $target
  echo ${target:e}
  if (${target:e} == "py" || ${target:e} == "c" || ${target:e} == "cpp") then
     set otarget=`echo ${target:t}.html | sed 's/\//_/g'| sed 's/^_//'`
     echo $otarget
     ~/bin/pygmentize -f html -O full -o $otarget $target
     echo '<li><a href="'$otarget'">'$target'</a>' >> ./pygindex.html
  else
     if (${target:e} != "html") then
     echo '<li><a href="'${target:t}'">'$target'</a>' >> ./pygindex.html
     endif
  endif
endif

In this case the desired output was a structured tree of folders containing the pygmentized versions of the Python code, with a file called pygindex.html in each folder holding links to each pygmentized Python code module and direct links to the other files. Such a script is an example of the sort of quick and dirty reorganization of data done with scripting languages.
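The same kind of quick reorganization can be sketched in Python. The version below is an illustration rather than a translation of the script above: it only builds the pygindex.html files, with links to each file and to the index of each subdirectory, and leaves out the syntax-highlighting step entirely. The function name build_index and the sample files are invented for this example.

```python
import html
import os
import tempfile

def build_index(folder):
    """Write pygindex.html in folder (and its subfolders); return its path."""
    name = html.escape(os.path.basename(os.path.abspath(folder)))
    lines = ['<html>', '<head>',
             '<title>Code in %s</title></head>' % name,
             '<body><h2 align="center">Code in %s</h2><p><ul>' % name]
    for entry in sorted(os.listdir(folder)):
        if entry == 'pygindex.html':
            continue                      # do not list the index itself
        path = os.path.join(folder, entry)
        if os.path.isdir(path):
            build_index(path)             # index the subdirectory first
            href = entry + '/pygindex.html'
            label = 'subdirectory: ' + entry + '/'
        else:
            href = entry
            label = entry
        lines.append('<li><a href="%s">%s</a>'
                     % (html.escape(href), html.escape(label)))
    lines.append('</ul></body></html>')
    index_path = os.path.join(folder, 'pygindex.html')
    with open(index_path, 'w') as handle:
        handle.write('\n'.join(lines) + '\n')
    return index_path

# Try it on a small throwaway tree of folders.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'sub'))
    for relative in ('a.py', os.path.join('sub', 'b.py')):
        with open(os.path.join(top, relative), 'w') as handle:
            handle.write('print("hello")\n')
    with open(build_index(top)) as handle:
        index_text = handle.read()
    sub_index_exists = os.path.exists(os.path.join(top, 'sub', 'pygindex.html'))

print(index_text)
```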

What you need to do serious programming

In order to do serious programming you need a computer set up as a development environment. You will need tools to edit the source code of the program and, for a compiled program, a compiler for the language involved. Whether you are compiling or interpreting your code, you may need some packages to support the execution of your program. You will need an environment in which you can execute your program, some input data, and a plan to test whether the output is correct.

Different systems use different approaches, but there are two basic choices -- use of command-line tools and use of Graphical User Interface (GUI) tools, often in what is called an Integrated Development Environment (IDE). For Microsoft Windows development, Visual Studio is a popular IDE. Java development under many systems is done with Eclipse. Python comes with its own integrated development environment, IDLE.

However, most programmers, professional or not, who are trying to avoid confusing program and system failures find themselves having to make significant use of command-line tools. The main advantage of command-line tools is that they make it easier to see precisely what is going wrong. In almost all cases the tools that are needed are the ones provided under the various Unix operating systems, especially under Linux, and especially the GNU Compiler Collection, gcc. Even under Microsoft Windows, these same tools can be used via Minimalist GNU for Windows, MinGW.
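A plan to test whether the output is correct can be as simple as a table of inputs paired with outputs worked out by hand before the program is run. The sketch below applies that idea to a small function invented for this example; it is one simple way to organize such checks, not the only one.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Each case pairs an input with the output worked out by hand beforehand.
test_cases = [
    ([2, 4, 6], 4.0),
    ([5], 5.0),
    ([-1, 1], 0.0),
]

failures = []
for values, expected in test_cases:
    actual = mean(values)
    if abs(actual - expected) > 1e-9:
        failures.append((values, expected, actual))

print('all tests passed' if not failures else failures)
```

Keeping the cases in a list makes it easy to add a new one each time a problem is found and fixed.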

Once you have a reasonable development environment, you will need to acquire the software packages you will need. Different applications need different supporting packages, and each of those supporting packages may in turn need its own supporting packages. You can reduce the complexity of acquiring the packages you need by remembering, when you put together your system, to install as many standard libraries as you have room for. Don't make a "lean" system.
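For Python programs, one quick way to find out which supporting packages are absent from a system is to ask Python whether it can locate each one. This sketch uses the standard importlib.util.find_spec function; the package names in the list are only examples, and the last one is deliberately a name that should not exist.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found for import on this system."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]

wanted = ['json', 'math', 'no_such_package_xyz']
print(missing_packages(wanted))
```

Running a check like this before starting work gives a list of what to install, rather than a failure partway through a long calculation.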

Packages tend to come in multiple forms -- as precompiled binaries for particular systems, such as Microsoft Windows, Apple Mac OS X, and various flavors of Unix. Packages may come as source kits, either preconfigured for particular systems, or as generic source kits for a wide range of systems. Many systems have preferred package management systems.

Revision Control

One of the most important resources you will need if you are going to maintain any significant program is a revision control system and a reliable place to maintain access to earlier versions. Two of the most important revision control systems are RCS and CVS, which have since been joined by many other revision control systems. One of the most important of these is Subversion.

In order to do revision control you need a place to store the revisions. You certainly can do that on your own computer, but do keep backups somewhere separate. When working in a collaborative way in a group, you need a server. If you are developing proprietary closed source software you will need your own server or you will have to buy time on a commercial server, but, if you are developing open source software, there are free open source development resource servers. See the Wikipedia review at en.wikipedia.org/wiki/Comparison_of_open_source_hosting_facilties. One of the most popular services is sourceforge.net.

In order to use these services for free, you will need to use an open source license.

Let us look at an open source project on SourceForge: NearTree. On the site you will find file releases and an SVN source code repository. This is a fairly typical C++ and C project that provides a utility library.

A source kit includes a README and a Makefile. The README provides information to read in understanding the package. The Makefile is used to compile and install the package in a Unix environment. More complex packages may have a separate INSTALL file.

In order to manage the complexity of different rules for managing libraries on different systems, GNU libtool is used.

For most open source packages, especially for libraries, such source kits are commonly used. For applications, however, compiled binaries are often provided for the most common platforms, especially for Microsoft Windows, Mac OS X and Linux. RasMol is another project on SourceForge. This is an application, so it provides compiled binaries. The Windows binary installer in this case was packaged by another SourceForge project, Nullsoft Scriptable Install System (NSIS). The open source software community provides a very complete and mutually supportive software development environment.

Conclusion

This has been a brief introduction to programming. To become a professional programmer, you need to learn the details of several programming languages and a great deal more lore about how to use those languages, but if you investigate the links from this page, you should be able to learn enough to tackle programming issues involved in solving real problems. Remember to start by understanding what you wish your program to produce as output and what inputs will be provided. Then slowly and carefully, using existing code as templates, piece in one input to output transformation at a time, and you are very likely to be able to use programming as a useful tool.


Last Updated on 20 January 2011
By Herbert J. Bernstein
Email: yaya@bernstein-plus-sons.com