Chapter 1 Introduction ====================== In this chapter you will be introduced to the history of the C language and where C falls in the hierarchy of higher-level languages. Also, the operating system and compiler the student will be working with will be discussed. In addition, the concept of structured programming and problem solving will be introduced. 1.1 History of C ~~~~~~~~~~~~~~~~ Dennis Ritchie of Bell Labs created C in 1972. He and Ken Thompson worked on designing the UNIX operating system. C came from Thompson's B language. C was created as a tool for working systems programmers that needed a more readable programming language than assembler but still needed the low level access capabilities of an assembler. C has rapidly become one of the most important and popular programming languages. Most of the UNIX operating system, and MS-DOS are written in C as are most compilers and other systems and applications software. 1.2 Higher Level Languages ~~~~~~~~~~~~~~~~~~~~~~~~~~ C is often called a middle-level computer language. Middle-level does not mean C is less powerful, harder to use, or less developed than high level languages such as BASIC or Pascal; nor is C similar to a low-level language such as assembly language. C combines elements of a high-level language with the functionalism of an assembler. : High Level: ADA, BASIC, COBOL, FORTRAN, Pascal, PL/I, Algol : Middle Level: C, FORTH, C++ : Low Level: Assembler A middle level language gives programmers a minimal set of control and data-manipulation statements that they can use to define high-level constructs. A high-level language is designed to try to give programmers everything they could possibly want already built into the language. A low-level language forces programmers to define all program functions directly because nothing is built in. Middle-level languages are sometimes thought of as building block languages, because the programmer first creates the routines to perform all the program's necessary functions and then puts them together. C and C++ allows a programmer to define routines to perform high-level commands. These routines are called functions and are very important to C and C++. You can tailor a library of C and C++ functions to perform tasks that are carried out by your program. C and C++ manipulates the bits, bytes and addresses with which the computer functions. Unlike BASIC which operates on strings of characters, C and C++ operates on characters. In BASIC there are built-in read and write statements. In C and C++ these procedures are performed by functions that are not part of the C and C++ language itself. These input-output functions are written in C and C++ to perform these operations. C and C++ has very few statements to remember and only 60 keywords as opposed to 159 in BASIC. This means that a C and C++ compiler can be written quite easily. Since C and C++ operates on the same data types as the computer, the code output by C and C++ is efficient and fast. C and C++ can be used in place of assembler for most tasks. C and C++ were first used for systems programming. Systems programming refers to a class of programs that either are part of or work closely with the operating system of the computer. C and C++ is used for systems programming when: #. The program must run quickly; C and C++ programs run almost as fast as ones in assembler. #. C and C++ is a programmers language, it lacks restrictions and easily manipulates bits, bytes, and addresses. #. A programmer needs direct control of I/O and memory management functions that C and C++ gives. 1.3 Operating Systems ~~~~~~~~~~~~~~~~~~~~~ The C and C++ language is available on every operating system in use today. The two most widely popular operating systems in use have the majority of their code written in the C language, UNIX and MS-DOS or PC-DOS. This class is conducted on a mini-computer using the UNIX operating system. The UNIX operating system is a multi-user, multi-tasking operating system. The term multi-user means that multiple people can access the system at the same time. The term multi-tasking means that each user can have the system running more than one task or job or program at the same time. Most modern day operating systems, those developed in the 1970's and later are patterned after the UNIX operating system. The student in this class needs to know how to logon to the UNIX system, use a text editor, use system utility programs, compile C and C++ programs, access the host system over a network for terminal emulation and file transfer. See Appendix A for details on how to use a UNIX system. 1.4 Structured Programming and Problem Solving ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C is a structured language as are ADA and PASCAL. BASIC, COBOL, and FORTRAN are non-structured languages. The most distinguishing feature of a structured language is that it uses blocks. A block is a set of statements that are logically connected. A structured language supports the concept of subroutines with local variables. A local variable is simply a variable that is known only to the subroutine or block in which it is declared. A structured language also supports several loop constructs, such as the **while**, **do- while**, and **for**. A structured language allows separately compiled subroutines or blocks to be used without being in the same program source file. This means that a library of useful, tested blocks or subroutines or functions can be accessed by any program written. A structured language is usually free form. People learn programming languages so they can use the computer as a problem-solving tool. At least four steps can be identified in the computer-aided problem-solving process: #. Problem analysis and specification. #. Algorithm development. >LI>Program coding. #. Program execution and testing. 1.4.1 Problem Analysis and Specification ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Most problems that are to be solved with a computer usually break down to an input component, a process component and an output component. Because the initial description of a problem may be somewhat vague and imprecise, the first step in the problem- solving process is to review the problem carefully in order to determine its **input** - what information is given and which items are important in solving the problem - and its **output** - what information must be produced to determine that the problem was solved. The **process** identifies what actions must be performed on the input inorder to produce the output. Input, process and output are the major parts of the problem's **specification**, and for a problem that appears in a programming text, they are usually not too difficult to identify. In a real-world problem encountered by a professional programmer, however, the specification of the problem often includes other items and considerable effort may be required to formulate it completely. 1.4.2 Algorithm Development ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Once a problem has been specified, a procedure or process to produce the required output from the given input must be designed. Since the computer is a machine possessing no inherent problem-solving capabilities, this procedure must be formulated as a detailed sequence of simple steps. Such a procedure is called an **algorithm**. The steps that comprise an algorithm must be organized in a logical, clear manner so that the program that implements this algorithm is similarly well structured. **Algorithms** and **programs** are designed using three basic methods of control: : Sequential:: Steps are performed in a strictly sequential manner, each step being executed exactly once. : Selection:: One of several alternative actions is selected and executed. : Repetition:: One or more steps is performed repeatedly. These three structures appear to be very simple, but in fact they are sufficiently powerful that any algorithm can be constructed using them. Programs to implement algorithms must be written in a language that the computer can understand. It is natural, therefore, to describe algorighms in a language that resembles, the language used to write computer programs, or as it is more commonly called, pseudocode. Unlike high-level programming languages such as Pascal or C, there is not a set of rules that defines precisely what is and what is not pseudocode. It varies from one programmer to another. Pseudocode is a mixture of natural language, such as English, and symbols, terms, and other features commonly used in one or more high-level languages. The following features are common to most pseudocodes: #. The usual computer symbols are used for arithmetic operations: **+** for addition, **-** for subtraction, ***** for multiplication, and **/** for division. #. Symbolic names (identifiers) are used to represent the quantities being processed by the algorithm. #. Some provision is made for including comments. This is usually done by enclosing each comment between a pair of special symbols such as **/*** and ***/**. #. Certain key words that are common in high-level languages may be used: for example, **read** or **enter** to indicate an input operation; **display**, **print**, or **write** for output operations. #. Indentation is used to set off certain key blocks of instructions. The structure of an algorithm can be displayed in a structure diagram or flowchart that shows the various tasks that must be performed and their relation to one another. These diagrams are especially useful in describing algorithms for more complex problems. 1.4.3 Program Coding ~~~~~~~~~~~~~~~~~~~~ The third step in using the computer to solve a problem is to express the algorithm in a programming language. In the second step, the algorithm may be described in English or pseudocode, but the program that implements that algorithm must be written in the vocabulary of a programming language and must conform to the syntax of that language. The major portion of this text is concerned with the vocabulary and syntax of the programming languages C and C++. In any programming language, names are used to identify various quantities. These names are called **variables**. In C and C++ variable names must begin with a letter or underscore character, which may be followed by any number of letters, digits and underscores up to a maximum length of thirty characters. This allows us to choose names that suggest what the variable represents. In the pseudocode description of an algorithm, words such as "enter" and "read" are used for input operations and "display", "print", and "write" are used for output operations. One C or C++ statement that may be used for input is **gets**, and one that may be used for output is **printf**. These two statements are not really part of the C and C++ language definition, but are functions that exist in external libraries provided by the compiler manufacturer. The C and C++ languages actually have no syntax for input and output statements, but through out the years people developing C compilers and now C++ compilers have reached agreement on certain external functions that are to be provided to support input and output along with other functions needed to develop meaningful programs. 1.4.4 Program Execution and Testing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The fourth step in using the computer to solve a problem is to execute and test the program. The procedure for entering a program into the computer varies from one machine to another. Additional details about input of program statements is provided by the instructor. Usually, a text editor is used to input statements. The program source file must be compiled to produce an object module, sometimes called a relocatable binary module(RBM). The RBM is then given to a linkage editor utility which binds the various RBMs together to make a load module. The load module can then be executed under control of the host operating system. If the load module fails to produce the desired results, there could be some type of logic fault in the algorithm, a poor implementation of the algorithm has been done, or simply a typing mistake has caused the meaning of some statement or statements to change. The program source file can be modified, compiled, linked and executed again for another test run. This process continues until the load module delivers the desired results. 1.4.5 Software Engineering ~~~~~~~~~~~~~~~~~~~~~~~~~~ Programming and problem solving is an art in that it requires a good deal of imagination, ingenuity, and creativity. But it is also a science in that certain techniques and methodologies are commonly used. The term **software engineering** has come to be applied to the study and use of these techniques. The **life cycle** of software, that is programs, consists of five basic phases: #. Problem analysis and specification. #. Algorithm development. #. Program coding. #. Program execution and testing. #. Program maintenance. This book will deal with algorithm development and program coding. The initial step in the software life cycle, problem analysis and specification, will be left to another class and another book.