
Programming language

From Wikipedia, the free encyclopedia

Computer code (HTML with JavaScript) in a tool that uses syntax highlighting (colors) to help the developer see the purpose of each piece of code.

A programming language is a stylized communication technique intended to control the behaviour of a machine (often a computer). Like human languages, programming languages have syntactic and semantic rules that define meaning.

Thousands[1] of different programming languages have been created, and new ones appear every year (see the list of programming languages). Few become popular enough to be used by more than a handful of people, but a professional programmer is likely to use dozens of different languages over a career.


Definitions of programming language

There is no universally agreed definition of the term programming language. The following are some of the criteria that have been used to classify a language as a programming language.

  • What it is used for. For instance, a programming language is a language used to write programs.
  • Those involved in the interaction. For instance, a programming language differs from natural languages in that natural languages are used for interaction between people, while programming languages are used for communication from people to machines (this rules out languages used for computer to computer interaction).
  • The constructs it contains. For instance, a programming language contains constructs for defining and manipulating data structures, and for controlling the flow of execution.
  • Its expressive power. The theory of computation classifies languages by the range of computations they can express, the most expressive being those that are Turing complete. Any algorithm that can be implemented in one Turing-complete language can also be implemented in any other. Examples of languages that are not Turing complete are pure HTML (embedding PHP or JavaScript makes it Turing complete) and standard SQL (though SQL vendors invariably add language extensions, such as PL/SQL, that yield a Turing-complete language).
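In practice, Turing completeness rests on conditional branching combined with unbounded iteration or recursion. A minimal sketch in Python (the `collatz_steps` function is a hypothetical example, not drawn from any particular language specification):

```python
# Conditional branching plus an unbounded loop: the two ingredients
# that make a language Turing complete in practice. The number of
# iterations cannot be bounded in advance by inspecting the input.
def collatz_steps(n):
    steps = 0
    while n != 1:                                  # unbounded loop
        n = n // 2 if n % 2 == 0 else 3 * n + 1    # branch
        steps += 1
    return steps

print(collatz_steps(6))  # 8
```

A language lacking such constructs, like pure HTML, can only describe a fixed amount of work per document.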

Specification

The specification of a language can take several forms, including:

  • a description of the language's syntax and semantics (e.g., C language). The latter usually lists the results of performing an operation and restrictions on the kinds of operations that can be performed.
  • a description of the behavior of a translator (e.g., C++). The syntax and semantics of the language can be inferred from this.
  • the language is defined in terms of an executable specification (e.g., Prolog), often written in that language.

Features of programming languages

Each programming language can be thought of as a set of formal specifications concerning syntax, vocabulary, and meaning.

Those languages that are widely used – or have been used for a considerable period of time – have standardization bodies that meet regularly to create and publish formal definitions of the language and discuss the extension of existing definitions.

Type system

Main article: Type system

A type system defines how a programming language classifies values and variables into types, how it can manipulate those types and how they interact. The design and study of type systems is known as type theory.

Internally, all data in modern digital computers are stored simply as zeros or ones (binary). The data typically represent information in the real world such as names, bank accounts and measurements, so the low-level binary data are organized by programming languages into these high-level concepts as data types. There are also more abstract types whose purpose is just to warn the programmer about semantically meaningless statements.
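The point that low-level binary data only gains meaning through a data type can be sketched in Python, whose standard `struct` module converts between raw bytes and typed values:

```python
import struct

# The same 32 bits can be read as different high-level types;
# the data type tells the language how to interpret them.
raw = struct.pack(">i", 1065353216)     # four bytes, big-endian
as_int = struct.unpack(">i", raw)[0]    # the bits read as an integer
as_float = struct.unpack(">f", raw)[0]  # the same bits as an IEEE 754 float

print(as_int)    # 1065353216
print(as_float)  # 1.0
```

The bytes themselves are identical; only the declared type changes the interpretation.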

Languages can be classified with respect to their type systems.

static vs. dynamic typing 
This distinction refers to when values have a type associated with them. In a dynamically typed language, types are an attribute of values, so a variable acquires a type by being assigned a particular value at run time. In statically typed languages, variables are given a type by the programmer at compile time, and data assigned to a variable must match its type.
strong vs. weak typing 
This distinction refers to the behavior of the language when values are operated on. In a strongly typed language, operations must match the types of the values operated on (addition must be performed on numbers, for example). In weakly typed languages, operations may be applied to values of inappropriate types.
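Python, classified as dynamically and strongly typed in the table below, illustrates both axes in a few lines:

```python
# Dynamic typing: the type belongs to the value, not the variable.
x = 42        # x currently holds an int
x = "hello"   # the same name may later hold a str

# Strong typing: operations must match the value's type.
try:
    result = "2" + 2      # mixing str and int is rejected
except TypeError:
    result = None

print(result)  # None: the mismatched operation raised an error
```

A weakly typed language would instead coerce one operand, e.g. producing the string "22" or the number 4.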

Type inference is a mechanism whereby type specifications can often be omitted entirely, when it is possible for the compiler to infer the types of values from the contexts in which they are used: for example, if a variable is assigned the value 1, a type-inferring compiler does not need to be told explicitly that the variable is an integer. There are, however, many different uses for integers; it might make sense in a program to prevent inadvertently adding a phone number to the number of apples in a box. Some languages, such as Ada, therefore allow the definition of distinct, mutually incompatible kinds of integer; this is a form of strong typing. Type-inferred languages can be more flexible to use, particularly when they also implement parametric polymorphism.
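The phone-numbers-and-apples idea can be sketched in Python; `PhoneNumber` is a hypothetical class invented here to mimic, loosely, Ada's distinct integer kinds:

```python
# Distinct "kinds" of integer, in the spirit of Ada's derived types.
# PhoneNumber is a hypothetical illustration, not a standard type.
class PhoneNumber(int):
    def __add__(self, other):
        if not isinstance(other, PhoneNumber):
            raise TypeError("cannot add a PhoneNumber to a plain number")
        return PhoneNumber(int(self) + int(other))

phone = PhoneNumber(5551234)
apples = 3
try:
    nonsense = phone + apples   # semantically meaningless; rejected
except TypeError:
    nonsense = None

print(nonsense)  # None
```

In Ada this check happens at compile time; the Python sketch only catches the error at run time, but the intent is the same.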

It is possible to perform type inference on programs written in a dynamically typed language, but it is entirely possible to write programs in these languages that make type inference infeasible. Sometimes dynamically typed languages are called latently typed.

Note that strong vs. weak is a continuum; Java is a strongly typed language relative to C, but is weakly typed relative to ML. Use of these terms is often a matter of perspective, much in the way that an assembly language programmer would consider C to be a high-level language while a Java programmer would consider C to be a low-level language.

Strong and static are orthogonal concepts. See the cross reference table below. But beware that some people incorrectly use the term strongly typed to mean strongly, statically typed, or, even more confusingly, to mean simply statically typed – in the latter usage, C would be called strongly typed, despite the fact that C doesn't catch that many type errors and that it's both trivial and common to defeat its type system (even accidentally).

Aside from when and how the correspondence between expressions and types is determined, there's also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well (see record). Functional languages often restrict names to denoting run-time computed values directly, instead of naming memory locations where values may be stored, and in some cases refuse to allow the value denoted by a name to be modified at all. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values.

Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.
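Python sits at the most permissive end of this spectrum: procedures are first-class values. A short sketch (`twice` is a hypothetical helper named here for illustration):

```python
# In Python, procedures are first-class values: they can be bound to
# new names, passed as parameters, returned from calls, and created
# at run time.
def twice(f):                    # a procedure passed as a parameter
    return lambda x: f(f(x))     # a new procedure created at run time

increment = lambda n: n + 1      # a procedure bound to a name
add_two = twice(increment)       # a procedure returned and re-bound

print(add_two(5))  # 7
```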

Type system cross reference list


Programming language   static/dynamic   strong/weak   safety     nominative/structural
Ada                    static           strong        safe       nominative
assembly language      none             strong        unsafe     structural
APL                    dynamic          weak          safe       nominative
BASIC                  static           weak          safe       nominative
C                      static           weak          unsafe     nominative
C++                    static           strong        unsafe     nominative
C# [2]                 static           strong        both [3]   nominative
Clipper                dynamic          weak          safe       duck
D                      static           strong        unsafe     nominative
Fortran                static           strong        safe       nominative
Haskell                static           strong        safe       structural
Io                     dynamic          strong        safe       duck
Java                   static           strong        safe       nominative
JavaScript             dynamic          weak          safe       nominative
Lisp                   dynamic          strong        safe       structural
ML                     static           strong        safe       structural
Objective-C [1]        dynamic          weak          safe       duck
Pascal                 static           strong        safe       nominative
Perl 1-5               dynamic          weak          safe       nominative
Perl 6                 hybrid           strong        safe       duck
PHP                    dynamic          strong        safe       ?
Pike                   static           weak          safe       nominative
Python                 dynamic          strong        safe       duck
Ruby                   dynamic          strong        safe       duck
Scheme                 dynamic          weak          safe       nominative
Smalltalk              dynamic          strong        safe       duck
xHarbour               dynamic          weak          safe       duck

  1. Applies to the Objective-C extensions only; the C basis is unchanged.
  2. C# 3.0 has hybrid typing with anonymous types.
  3. C# can be both unsafe and safe through 'unsafe' functions and code blocks.

Data structures

Main article: data structure

Most languages also provide ways to assemble complex data structures from built-in types and to associate names with these new combined types (using arrays, lists, stacks, files).

Object-oriented languages allow the programmer to define data types called objects, which carry their own intrinsic functions and variables (called methods and attributes respectively). A program containing objects lets them operate as independent but interacting sub-programs; this interaction can be designed at coding time to model or simulate real-life interacting objects. This is a very useful, and intuitive, functionality. Languages such as Python and Ruby were developed as object-oriented (OO) languages from the start. They are comparatively easy to learn and use, and are gaining popularity in professional programming circles as well as being accessible to non-professionals. Object orientation is commonly thought to make languages more intuitive, increasing the availability and power of customized computer applications.
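A minimal object definition in Python shows attributes and methods travelling together (the `Account` class is a hypothetical example, not from any particular library):

```python
# An object bundles data (attributes) with the functions that
# operate on that data (methods).
class Account:
    def __init__(self, owner, balance=0):
        self.owner = owner        # attributes: the object's own data
        self.balance = balance

    def deposit(self, amount):    # method: intrinsic behaviour
        self.balance += amount
        return self.balance

acct = Account("Ada")
print(acct.deposit(100))  # 100
```

Each `Account` instance carries its own state, so many such objects can interact as independent sub-programs.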

Instruction and control flow

Once data has been specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure.

Each language takes units of these well-behaved statements and combines them using some ordering system. Depending on the language, different methods of grouping these elementary statements exist. This allows programs to be written that cover a variety of inputs, instead of being limited to a small number of cases. Beyond the data-manipulation instructions, the other typical instructions in a language are those used for control flow (branches, definition by cases, loops, backtracking, functional composition).
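The common control-flow constructs can be sketched in Python (the `classify` function is a hypothetical example):

```python
# Elementary statements combined by control flow:
# a branch (definition by cases) and a loop over varied inputs.
def classify(n):
    if n < 0:                # branch
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"

labels = []
for n in [-2, 0, 3]:         # loop: the same code covers many cases
    labels.append(classify(n))

print(labels)  # ['negative', 'zero', 'positive']
```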

Design philosophy

Each language has been developed with a particular design or philosophy. Some aspect is particularly stressed by the way the language uses data structures, or its special notation encourages certain ways of solving problems or expressing their structure.

Since programming languages are artificial languages, they require a high degree of discipline to accurately specify which operations are desired. Programming languages are not error tolerant; however, the burden of recognizing and using the special vocabulary is reduced by help messages generated by the programming language implementation.

There are a few languages that offer a high degree of freedom by allowing self-modification, in which a program rewrites parts of itself to handle new cases. Typically, only machine language, Prolog, PostScript, and the members of the Lisp family (Common Lisp, Scheme) provide this capability. In the MUMPS language this technique is called dynamic recompilation; emulators and other virtual machines exploit it for greater performance.
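Python is not among the languages named above, but its `exec` builtin gives a flavour of the technique: a program builds new source text at run time and then executes it.

```python
# Run-time code generation: source text constructed and executed
# while the program is running.
source = "def greet(name):\n    return 'hello, ' + name\n"
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)

print(namespace["greet"]("world"))  # hello, world
```

Lisp's homoiconic syntax makes this far more natural, since programs and data share one representation; the Python sketch only shows the general idea.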

Compilation and interpretation

Main article: compiler

There are, broadly, two approaches to execute a program written in a given language. These approaches are known as compilation, done by a program known as a compiler; and interpretation, done by an interpreter. Some programming language implementations support both interpretation and compilation.

An interpreter parses a computer program and executes it directly. One can imagine this as following the instructions of the program line-by-line. In contrast, a compiler translates the program into machine code – the native instructions understood by the computer's processor. The compiled program can then be run by itself.

Compiled programs usually run faster than interpreted ones, because the overhead of understanding and translating the programming language syntax has already been done. However, interpreters are frequently easier to write than compilers, and can more easily support interactive debugging of a program.

Many modern languages use a mixture of compilation and interpretation. For example, the "compiler" for a bytecode-based language translates the source code into a partially compiled intermediate format, which is later run by a fast interpreter called a virtual machine. Some "interpreters" actually use a just-in-time compiler, which compiles the code to machine language immediately before running it. These techniques are often combined; like other aspects of programming languages, "compiled" and "interpreted" may be best understood as opposite ends of a spectrum rather than the only two options.
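CPython itself uses this mixed model: source code is compiled to bytecode, which a virtual machine then interprets. The standard `dis` module makes the intermediate form visible:

```python
import dis

# Compile source text to a bytecode object, inspect it, then let
# the virtual machine execute it.
code = compile("result = 2 + 3", "<example>", "exec")
dis.dis(code)               # show the intermediate bytecode

namespace = {}
exec(code, namespace)       # the virtual machine runs the bytecode
print(namespace["result"])  # 5
```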

History of programming languages

The development of programming languages follows closely the development of the physical and electronic processes used in today's computers.

Programming languages have been under development for decades and will continue to evolve. They got their start as lists of steps for wiring a computer to perform a task. These steps eventually found their way into software and began to acquire newer and better features. The first major languages were characterized by the fact that each was intended for one purpose only, while the languages of today can be used for almost any purpose and are differentiated instead by the style in which they are programmed. Perhaps the languages of tomorrow will be more natural, with the invention of quantum and biological computers.

Charles Babbage is often credited with designing the first computer-like machines, which had several programs written for them (in the equivalent of assembly language) by Ada Lovelace.

In the 1940s the first recognizably modern, electrically powered computers were created. Some military calculation needs were a driving force in early computer development, such as encryption, decryption, trajectory calculation and massive number crunching needed in the development of atomic bombs. At that time, computers were extremely large, slow and expensive: advances in electronic technology in the post-war years led to the construction of more practical electronic computers. At that time only Konrad Zuse imagined the use of a programming language (developed eventually as Plankalkül) like those of today for solving problems.

Subsequent breakthroughs in electronic technology (transistors, integrated circuits, and chips) drove the development of increasingly reliable and more usable computers. The first widely used high-level programming language was FORTRAN, developed during 1954–57 by an IBM team led by John W. Backus. It is still widely used for numerical work, with the latest international standard released in 2004.

Shortly after, Lisp was introduced. Lisp was based on lambda calculus, and is far more regular in its syntax than most non-Lisp derived languages.

Dennis Ritchie developed the C programming language in the early 1970s, initially for the DEC PDP-11.

During the 1970s, Xerox PARC developed Smalltalk, an object oriented language.

Building on Smalltalk and other object-oriented languages, Bjarne Stroustrup developed C++, a language based on the syntax of C, released in 1985.

Sun Microsystems released Java in 1995; it became very popular as an introductory programming language taught in universities. Microsoft introduced the C# programming language in 2001, which is very similar to C++ and Java. There are many, many other languages (see the list of programming languages).

Classifications of programming languages

Formal semantics

The rigorous definition of the meaning of programming languages is the subject of formal semantics.


Notes

  1. ^ As of March 2006, The Encyclopedia of Computer Languages maintained by Murdoch University, Australia, lists 8,276 computer languages.

Major programming languages

Industrial: ABAP | Ada | AWK | Assembly | C | C++ | C# | COBOL | Common Lisp | ColdFusion | D | Delphi | Eiffel | Fortran | JADE | Java | JavaScript | Lua | Objective-C | Pascal | Perl | PHP | Python | REALbasic | REBOL | RPG | Ruby | SQL | Tcl | Magic eDeveloper | Visual Basic | VB.NET | Visual FoxPro

Academic: APL / J | OCaml | Haskell | Scheme | Smalltalk | Logo | MATLAB | Mathematica | ML | Prolog

Other: ALGOL | BASIC | Clipper | Forth | Limbo | Modula-2/Modula-3 | MUMPS | PL/I | Simula
