The Wasteful Legacy of Programming as Language

  Carlos Oliveira        2011-11-28 10:36:34       2,692        0    

A few years ago I visited a friend who is a graduate student in linguistics. After some time he asked me if I was aware of the work by Chomsky on formal languages. I told him that yes, Chomsky work was a basis for much of the developments in theoretical computer science. More than that, I was glad to learn that there was something technical that I could share and discuss with other people in linguistics.

At the time I found this was just a great coincidence. It was only recently, though, that I started to think seriously about the implications of the idea that much of our understanding of computer programming stated with the study of human languages.

Language and Computers

One of the early ambitions in computer science was to understand, or at least be able to parse, human language. In order to do this, computer scientists explored models of how language worked in general. Based on these models, researchers figured out that we could classify languages in terms of complexity. Moreover, many computer languages could be entirely contained in some of the lower levels of this hierarchy, which is called the Chomsky hierarchy.

This was an incredible discovery for scientists working on compilers and interpreters, because its developments resulted in the necessary tools to efficiently parse languages. After that, instead of creating ad hoc parsers, which was the norm in the early period, one could apply a rigorous method to find a parser for a particular programming language.

The insidious side of this success, however, is that it completely changed the perceptions computer scientists have of how to give instructions to computers. We now have a frame of mind in which the important thing in the process of instructing a computer is to define a suitable language – and the more powerful the language the better.

The problem of this line of reasoning is that it creates a number of hurdles that are completely artificial, as we will soon discuss. For example, it makes us believe that there is something fundamentally different between programming languages, when in fact it doesn’t.

Computer Languages versus Human Languages

When we look at human languages, it does make sense that different languages exist. After all, language is a cultural concept. For example, as a speaker of two languages, it is very clear to me that speaking in English conveys different meaning and context, compared to things spoken in my mother language. There are a set of ideas that are better expressed in English, as well as there is a huge subset of thoughts that I wouldn’t be able to express well in anything other than my primary language.

As human beings, we want language to be inexact and able to convey several meanings. Contrary to the popular belief among mathematicians and engineers, this is a good feature of human languages, because it mimics the way we think. The truth about the study of language is that we are dealing with the whole culture of a community, not just with a stream of tokens that convey meaning.

The opposite, however, is true for computer languages. While we might strongly favor the use of human-like language, computers don’t really care if we use a bracket of a “begin” symbol to start a block of commands. Which means that we are, by introducing the subtleties of human language, complicating the task of how to instruct computers. Instead, we could just avoid the subtle differences in programming language and use a simple, uniform representation.

Unnecessary Work

The other problem with using current programming languages is that they create a lot of unnecessary work. Computer scientists have throughout the years devised baroque languages such as C++ and Java in an effort to provide programming instructions in a powerful way. It turns out, however, that the whole steps of writing a sequence of tokens, parsing that stream, and checking the syntax is unnecessary.

Everything in a computer program can be interpreted as a tree of expressions. If you’re encoding that tree in a block generated by a bracketed expression or a for loop it is not important for the computer. There is no reason why we cannot manipulate the syntax tree directly, for example, using a tree model on the screen, instead of parsing tokens used to recreate the tree every single time.

As a result, the use of a language is unnecessary in both sides: computers don’t care where the stream of bits is coming from. It could be coming from a direct binary representation or from a Java program with all variables in German.

Human beings, on the other hand, don’t necessarily need to use the language either. Although it may look like a useful feature, the existence of a language implies the manual labor of keeping a textual representation, as well as the algorithmic work of translating that representation back into a tree of expressions. The point is that, for all we care, we could just as well be moving boxes around the screen to represent variables — there is no necessity to create a textual document representing a program.

Divide and Conquer

Still, the other issue with programming languages is that it tends to divide programmers into artificial categories, one for each language or even flavor of a language. For example the only difference between a Java programmer and a C# programmer is that they use somewhat different keywords and have to program against a different run time library. The only reason C# and Java programmers think of themselves as different is that they use the concept of language to define themselves.

Being more polemic, the only difference between a C programmer and a Lisp programmer is in what their languages force them to do. A C programmer is forced by its language to write pointer-related code. While a Lisp programmer is forced to think in terms of lists of expressions (even when they’re using other data structures).

A language-less programming system, on the other hand, would avoid this kind of issue altogether by providing only the means to do work, without forcing programmers to use a pre-defined way of thinking. Whatever the form used to interact with the program, a language-less system doesn’t care if parts of the program are shaped as s-expressions or C blocks: it just combine them as needed to create a program.

In a program written in this way, using garbage collection is just a matter of plugging the right run time support, which you could get along with the system or you could buy from a third party. Instead, we erroneously consider garbage collection as a fundamental attribute of the language in use. As another example, in a language agnostic system we would would be able to create and call closures by just adding a simple module to the system, not by devising yet another language that supports closures as a first class citizen.

But the advantages don’t stop there. Without the need for a language, it also disappears the need for a compiler. Everything is such a system is already parsed and ready to use. When we combine pieces of code they already know how to glue themselves to each other, so there is no need for a linker either.

A system would regenerate a new executable in a fraction of a second, instead of going through the painful and wasteful process of language interpretation, code generation, linking. For example, what exactly is the need to recompile an entire file when we change only a single character? In a traditional language, that is entirely necessary, because you could be modifying the token “blass” to “class”, which changes entirely the meaning of the whole file. In a language-less system, this could never happen, because each transformation is related to the existing syntax tree, which is already stored in the system. Therefore, a single change like this can only have local effect in the program.

Why Such a System Doesn’t Exist?

There are several reasons, but the least important of them is technology. We have graphical capabilities in any modern desktop computer that are beyond what is necessary to implement the visualization features for such a programming environment. The main difficulty separating us from that goal is the experience we already have as programmers.

Each one of us have been trained to use textual languages as the main way of interact with computers. We have learned to use associated tools, such as grep and make, that are imperfect but that give good results if you are disciplined.

Moreover, it is satisfying to use the huge amount of code available with the existing technologies. Everyone of us have been conditioned for many years to look at what has been done in our programming language, and even reuse the code if needed.

I think that this is the main reason why visual languages have had such a hard time to establish themselves. After using such a language for a few minutes, you immediately feel that it is so much easier to go back and write yet another program in C or Java or Python. Of course, it will always be much more productive to use a tool that you’re used to.

However, we have never been able to really improve a language-less system enough to see their advantages as compared to traditional language-based systems. I believe that there will be a big jump in programming quality the day a group of people decide to develop a true language-agnostic programming system. And when programmers in general learn that they don’t need a language to create valid code they will think that we were just crazy for using, for such a long time, a concept as wasteful as that of programming language.

Source:http://coliveira.net/software/the-wasteful-legacy-of-programming-as-language/

PROGRAMMING LANGUAGE  HUMAN LANGUAGE  CHOMSKY 

       

  RELATED


  0 COMMENT


No comment for this article.



  RANDOM FUN

System crash at high load