COMS W4115
Programming Languages and Translators
Lecture 2: January 26, 2015
Basic Elements of Programming Languages
Overview
- Language processing tools
- Language specifications
- Programming paradigms
- Language design
- Language project
1. Language Processing Tools
- Compiler
- Interpreter
- Bytecode interpreter
- Just-in-time compiler
- Linker and loader
- Preprocessor
- For details see ALSU Section 1.1.
2. A Language Specification Defines
- the representation of programs
- the syntax and constraints of the language
- the semantic rules for interpreting programs
- the representation of input data to be processed by programs
- the representation of output data produced by programs
- other restrictions on programs (such as what makes a program portable
across different implementations and platforms)
3. Programming Paradigms
- A programming paradigm is a fundamental style for writing computer programs.
A paradigm is characterized by the way in which the structure and elements
of a computer program are put together.
Programming languages are often characterized by the typical programming paradigms
they are designed to support.
Many of today's programming languages have been designed to support more than
one programming paradigm.
Here is a list of programming paradigms commonly used in today's programs.
- Imperative
- An imperative program specifies how a computation is to be done.
An imperative program is a sequence of statements that change the state of a program.
- Example: C is a language most often used for imperative programming.
- Declarative
- A declarative program specifies what computation is to be done.
It expresses the logic of a computation without describing its control flow.
- Examples: The functional language Haskell and the domain specific language
Yacc are declarative languages.
- Object oriented
- An object-oriented program is one that does its computation with interacting objects, which
are data structures that contain fields and code often called methods.
- C++ and Smalltalk support object-oriented programming.
- Functional
- A functional language is one whose computation model is based on the lambda calculus.
A program written in a functional style treats computation as the evaluation of mathematical
functions and avoids side effects.
- Haskell is currently considered one of the purest functional languages.
- Scripting
- A scripting language is a dynamically typed, interpreted language with high-level
operators for
"gluing" computations together.
- AWK and Perl are scripting languages.
- Multi-paradigm
- A multi-paradigm language is one that supports more than one programming style.
- Example: Python and Ruby are multi-paradigm languages that support
object-oriented and functional programming.
4. Language Design Issues
- The design of a programming language is an exercise in engineering tradeoffs.
In 1973 Tony Hoare stated that there are so many important but conflicting criteria
in language design that their reconciliation and satisfaction is a major engineering
task.
Here are just some of the issues that need to be
considered in the design of a language.
- Domain of application
- What class of problems is the language designed to solve?
- For example, C was initially designed by Dennis Ritchie at Bell Labs
in the early 1970s as a programming language
for building the Unix operating system. It is still one of the world's
most widely used systems programming languages.
The C examples in this section refer to ANSI C (C89).
- Programming model
- What programming paradigms should the programming language support?
- C is an imperative language, designed around the
von Neumann model of computation.
- Program structure
- What is the basic translation unit for a program?
- In C, a translation unit is a sequence of function definitions and declarations
stored in a file.
- Character set and lexical conventions
- What are the source and target character sets? What are the lexical units of a
source program?
- The character set of C source programs is contained within seven-bit ASCII.
- Encodings of the Unicode Standard for character sets are implemented in many languages
and browsers.
The most commonly used encodings for Unicode characters are the Unicode
Transformation Formats UTF-8 (in which characters are from 1 to 4 bytes long)
and UTF-16 (which is a variable-length character encoding for the entire
Unicode character repertoire).
- A token is a meaningful sequence of characters in a source program.
- C has six classes of tokens: identifiers, keywords, constants, string literals,
operators, and separators.
- Names, scopes, bindings, and lifetimes
- A name is a string of characters used to identify some element in a program.
- In C identifiers are named with sequences of letters (here, underscore is
considered a letter) and digits.
The first character of an identifier name must be a letter. At least the first
31 characters in an identifier name are significant.
- The scope of a name is the region of the program in
which it is known (visible).
- A binding is an association between two things such as between a
variable and its type or between a symbol and the operation it represents.
The time at which this association is determined is called
the binding time. Bindings can take place at various times ranging
from language design time to run time.
- The lifetime of a variable is the time during which the variable
is bound to a specific memory location.
- Memory management
- One important function of a programming language is to provide facilities
for managing memory either implicitly or explicitly.
- C provides three ways to allocate memory:
- Static allocation where space for an object is provided before the program
begins to execute.
- Automatic allocation where temporary objects are stored by the compiler on
the runtime stack.
This space is automatically freed by the compiler after the block in which
the object is
declared is exited.
- Dynamic allocation where blocks of memory of arbitrary size can be requested
by a programmer using runtime library functions such as malloc from a region of
memory called the heap. These blocks
persist until freed by a call to a runtime library function such as free.
- Data types and operators
- A data type defines a set of data values and the operations allowed
on those values.
- C has a small number of basic types including
char, int, double,
long double, float,
enum, void
. The types char
and int
may be further qualified, e.g., unsigned char
.
- C has a potentially infinite number of recursively
defined derived types
such as arrays of objects of some type, functions returning objects of
some type, pointers to objects of some type, structures containing a sequence
of objects of various types, and unions containing any one of several
objects of various types.
- C has a rich set of arithmetic, relational, logical, and assignment
operators.
- Expressions and assignment statements
- Expressions are the primary means for specifying computations
in a programming language.
- Assignment statements are basic constructs in imperative programming languages.
Assignment statements allow the programmer to dynamically change
the bindings of values to variables.
- Control flow
- Flow of control refers to the sequence in which the operations specified in
a program are executed at run time. There are flow-of-control issues at
many levels such as flow of control within
expressions, among statements, and among program units.
Most programming languages have control statements and other control
structures for controlling the flow of control within a program.
- C has a variety of flow-of-control constructs such as blocks for grouping
together sequences of statements and
control statements such as
if-else, switch, while, for,
do-while, break, continue,
and goto
.
- Functions and process abstraction
- Functions are perhaps the most important building blocks of programs.
Functions are often called procedures, subroutines, or subprograms.
Functions break large computing tasks into smaller ones and facilitate
code reuse. Functions are such an important topic in programming
languages that we will talk about them in much more detail later in
this course.
- Data abstraction and object orientation
- Data abstraction in the form of abstract data types was introduced
into programming languages after process abstraction. The programming
language Simula67 was instrumental
in motivating constructs for supporting object-oriented programming
in modern programming languages such as C++, C#, Java, and Smalltalk.
Object orientation is characterized by encapsulation, polymorphism,
inheritance, and dynamic binding.
- Concurrency
- Concurrent execution of programs has assumed much more importance with
the widespread deployment of multi-core and many-core processors.
- Concurrency
in software execution can occur many levels of granularity: instruction,
statement, subprogram, and program.
- Concurrency can be achieved with libraries (like MPI for Fortran, pthreads
for C) or with direct language support (as in Cilk, X10).
- However, effective, robust exploitation of concurrency is still an open
research area in software.
- Exception and event handling
- Many languages have facilities for reacting to run-time error conditions.
C++ has the
try-catch
construct to catch exceptions raised by the
throw
statement.
- Event handling is like exception handling in that an event handler is called
by the occurence of an event. Implementing reactions to user interactions
with GUI components is a common application of event handling.
- Usability issues
- A good language design should facilitate learnability,
readability, writability, maintainability, and efficiency.
5. Language Project
- Project description
- Form a team of five by February 4, 2015 .
- Create an innovative (Turing-complete) little language of your own design.
- Write a compiler or interpreter for your language.
- Each team member must write at least 500 lines of
code of the compiler or interpreter.
- Project constitutes 40% of final grade.
- Project team: elect one person to serve each of the following functions.
- Project manager
- This person sets the project schedule, holds weekly meetings
with the entire team, gets the team to agree on a common style guide,
maintains the project log, and makes
sure the project deliverables get done on time.
- Language and tools guru
- This person defines the baseline process to track language changes
and maintain the intellectual integrity of the language.
- This person teaches the team how to use various tools
used to build the compiler.
- System architect
- This person defines the compiler architecture, modules, and
interfaces.
- System integrator
- This person defines the system integration environment, the runtime,
and makes sure the compiler components work together.
- Tester and validator
- This person defines the test suites and automates their execution
to make sure the compiler meets the language specification.
- Project due dates and deliverables:
- Feb. 25: Language white paper (written by entire team, 3-4 pages).
- Mar. 25: Language tutorial (written by entire team, around 15 pages).
- Chapter 1 of Kernighan and Ritchie is a good model of a
language tutorial.
- Describe a few representative programs that illustrate the
nature and scope of your language.
- A "hello, world" program is de rigueur.
- Show how programs in your language are to be written, compiled, and run.
- Mar. 25: Language reference manual (written by entire team, around 20 pages).
- Appendix A of Kernighan and Ritchie is a good model.
- Include a complete description of the lexical and syntactic
constructs of your language along with a description of their semantics.
- Include a full grammar for your language.
- May 10: Final project report due on Courseworks.
- May 11-13: Demo of language and working compiler to the teaching staff in the
Computer Science Conference Room.
6. Practice Problems
- Describe the von Neumann model of computation (computer architecture).
- Compare C and Python with regard to their (a) programming model and
(b) primitive data types.
7. Reading Assignment
8. References
- Brian Kernighan and Dennis Ritchie, The C Programming Language,
2nd edition, Prentice Hall, 1988.
This is the classic reference on ANSI C (C89).
-
Python Tutorial
- Robert W. Sebesta, Concepts of Programming Languages,
tenth edition, Pearson, 2012.
An easy-to-read book describing the important constructs found in many modern programming languages.
- C. A. R. Hoare, "Hints on Programming Language Design,"
Proceedings ACM SIGACT/SIGPLAN Conference on Principles of Programming Languages, 1973.
aho@cs.columbia.edu