I first planned to release this text as an appendix entry for Learn You Some Erlang, but considering this feels more like editorial content and not exactly something for a reference text, I decided it would fit better as a blog post.
Many newcomers to Erlang manage to understand the syntax and program
around it without ever getting used to it. I've read and heard many
complaints regarding the syntax and the 'ant turd tokens' (a
subjectively funny way to refer to the , ; and . tokens), how annoying it is, etc.
As mentioned at some point in the book, Erlang draws its syntax from Prolog. While this gives a reason for the current state of things, it doesn't magically make people like the syntax. I mean, I don't expect anyone to respond to this by saying "Oh, it's Prolog, I get it. Makes complete sense!" As such, I'll suggest three ways to read Erlang code to possibly make it easier to understand.
The Template
The template way is my personal favorite. To understand it, one must first get rid of the concept of lines of code and think in Expressions. An expression is any bit of Erlang code that returns something.
In the shell, the period (.) ends an expression. After writing 2 + 2, you must add a period (and then press <Enter>) for the expression to be run and return a value.
In modules, the period ends forms. Forms are module attributes and
function declarations. Forms are not expressions as they don't return
anything. This is why they're terminated in a different manner than
everything else. Given forms are not expressions, it could be argued
that the shell's use of . to terminate expressions is what is not standard here. Consequently, I'd suggest not caring about the shell for this method of reading Erlang.
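To make the idea of forms a bit more concrete, here is a minimal sketch of a module. The module name forms_demo and the function double/1 are made up for the illustration; the only point is that each form, whether attribute or function declaration, ends with a period.

```erlang
%% Hypothetical module, used only to show where forms begin and end.
-module(forms_demo).     % form 1: a module attribute, ended by a period
-export([double/1]).     % form 2: another module attribute

%% form 3: a function declaration. The period ends the whole form,
%% not the expression X * 2 in particular.
double(X) ->
    X * 2.
```

Once compiled, forms_demo:double(21) can be called from the shell, where the trailing period is the shell's own convention rather than part of a form.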
Alright. So the first rule is that the comma (,) separates expressions:
    C = A + B,
    D = A + C
This is easy enough. However, it should be noted that if ... end, case ... of ... end, begin ... end, fun() -> ... end and try ... of ... catch ... end are all expressions. As an example, it is possible to do:
    Var = if X > 0  -> valid;
             X =< 0 -> invalid
          end
And get a single value out of the if ... end. This
explains why we will sometimes see such language constructs followed by a
comma; it just means there is another expression to evaluate after it.
The second rule is that the semi-colon (;) has two roles. The first one is separating different function clauses:
    fac(0) -> 1;
    fac(N) -> N * fac(N-1).
The second one is separating different branches of expressions like if ... end, case ... of ... end and others:
    if X < 0  -> negative;
       X > 0  -> positive;
       X == 0 -> zero
    end
It's probably the most confusing role because the last branch of the
expression doesn't need to have the semi-colon following it. This is
because the ; separates branches, it doesn't terminate
them. Think in expressions, not lines. Some people find it easier to
illustrate the role of separator by writing the above expression in the
following way, which is arguably more readable:
    if X < 0  -> negative
     ; X > 0  -> positive
     ; X == 0 -> zero
    end
This makes the role of separator more explicit. It goes in between branches and clauses, not after them.
Now, because the semi-colon is used to separate expression branches and function clauses, it becomes possible to have an expression such as a case construct followed by , when followed by another expression, a ; when in the last position of a function clause, or a . when at the last position of a function.
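Here is a small sketch of those three situations in one function. The module token_demo and the function describe/1 are invented for the example and are not from the book; what matters is which token follows each end.

```erlang
%% Hypothetical module showing a case ... end followed by , ; and .
-module(token_demo).
-export([describe/1]).

describe(N) when is_integer(N) ->
    Sign = case N < 0 of
               true  -> negative;
               false -> non_negative
           end,          % comma: another expression follows in this clause
    case N rem 2 of
        0 -> {Sign, even};
        _ -> {Sign, odd}
    end;                 % semi-colon: last expression of a clause, another clause follows
describe(Other) ->
    case is_atom(Other) of
        true  -> atom;
        false -> unknown
    end.                 % period: last expression of the last clause
```

Inside each case, the branches follow the same separator rule: every branch but the last is followed by a semi-colon.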
The line-based logic for terminating lines such as in C or Java must go out the window. Instead, see your code as a generic template you fill (hence the name The Template):
    head1(Args) [Guard] ->
        Expression1, Expression2, ..., ExpressionN;
    head2(Args) [Guard] ->
        Expression1, Expression2, ..., ExpressionN;
    headN(Args) [Guard] ->
        Expression1, Expression2, ..., ExpressionN.
The rules make sense, but you need to get into a different reading
mode. That's where the heavy lifting needs to be done: moving from lines
and blocks towards a pre-defined template. I mean, if you think about
it, things like for (int i = 0; i < x; i++) { ... } (or even for (...);)
have a weird syntax when compared to most other constructs in languages
supporting them. We're just so used to seeing these constructs that we don't
mind them anymore.
The English Sentence
Although this manner is not the one I like the most, I do realize different people have different ways to make sense of logical concepts and this is one manner I've heard being praised many times.
This one is about comparing Erlang code to English. Imagine you're writing a list of things. Well, no. Don't imagine it, read it.
I will need a few items on my trip: if it's sunny, sunscreen, water, a hat; if it's rainy, an umbrella, a raincoat; if it's windy, a kite, a shirt.
An Erlang translation can remain a bit similar:
    trip_items(sunny) ->
        sunscreen, water, hat;
    trip_items(rainy) ->
        umbrella, raincoat;
    trip_items(windy) ->
        kite, shirt.
Here, just replace the items by expressions and you have it. Expressions such as if ... end can be seen as nested lists.
And, Or, Done.
Another variant of this one has been suggested to me on #erlang. The user simply reads , as 'and', ; as 'or' and . as being done. A function declaration can then be read as a series of nested logical statements and affirmations.
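As a rough sketch of how that reading might go, here is a small function annotated with the 'and', 'or' and 'done' words in comments. The module reading_demo and the function sign/1 are made up for the illustration.

```erlang
%% Hypothetical module, annotated with the and/or/done reading.
-module(reading_demo).
-export([sign/1]).

sign(X) when X < 0 ->    % when X is below zero:
    Abs = -X,            %   bind Abs, AND
    {negative, Abs};     %   give back this tuple; OR
sign(0) ->               % when the argument is 0:
    zero;                %   give back zero; OR
sign(X) ->               % in any other case:
    {positive, X}.       %   give back this tuple. DONE
```

Read aloud, the first clause becomes "bind Abs and return the tuple, or ...", which is close to the nested statements described above.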
In Conclusion...
Some people will just never like "ant turd tokens" or being unable to swap lines of code without changing the token at the end of the line. I guess there's not much to be done when it comes to style and preferences, but I still hope this text might have been useful. After all, "the syntax is only intimidating, it's far from difficult."