The ugliest C feature: <tgmath.h>

  Miloslav Trmač        2011-12-26 08:33:27

<tgmath.h> is a header provided by the standard C library, introduced in C99 to allow easier porting of Fortran numerical software to C.

Fortran, unlike C, provides “intrinsic functions”, which are a part of the language and behave more like operators. While ordinary (“external”) functions behave similarly to C functions with respect to types (the types of arguments and parameters must match and the result type is fixed), intrinsic functions accept arguments of several types and their return type may depend on the type of their arguments. For example, Fortran 77 provides, among others, an INT function which accepts Integer, Real, Double or Complex arguments and always returns an Integer, and a SIN function which accepts Real, Double or Complex arguments and returns a value of the same type.

This helps the programmer somewhat because the function calls don’t have to be changed if variable types change. On the other hand user-defined functions can’t behave this way, so the additional flexibility is really limited to single subroutines that don’t need to call user-defined functions.

Some C programmers would call the feature ugly from the above description already, for the same reason integrating printf into the language would be ugly.

This functionality was incorporated in C99 together with other features for better support of numerical computation, and it is provided in the above-mentioned <tgmath.h> header. It covers trigonometric and logarithmic functions, functions for rounding and a few others. The header defines macros that shadow the existing functions from <math.h>; e.g. the cos macro behaves like the cos function when its argument has type double, like cosf for float, cosl for long double, ccos for double _Complex, ccosf for float _Complex, and ccosl for long double _Complex. Finally, when the argument has any integer type, the cos function is called, as if the argument were implicitly converted to double.
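As an illustration of the dispatch just described, the following sketch reports the type that the cos macro yields for several argument types. It uses C11 _Generic purely as a way to name the type of an expression without evaluating it; that is an addition for this example (it postdates C99), and the names TYPE_NAME and show_cos_result_types are made up.

```c
#include <complex.h>
#include <stdio.h>
#include <tgmath.h>

/* C11 _Generic, used here only to name the type of an expression.
   The operand is not evaluated. TYPE_NAME is a hypothetical helper. */
#define TYPE_NAME(E) _Generic((E), \
    float: "float", \
    double: "double", \
    long double: "long double", \
    float _Complex: "float _Complex", \
    double _Complex: "double _Complex", \
    long double _Complex: "long double _Complex", \
    default: "other")

void show_cos_result_types(void)
{
    puts(TYPE_NAME(cos(0.5f)));            /* float: behaves like cosf */
    puts(TYPE_NAME(cos(0.5)));             /* double: behaves like cos */
    puts(TYPE_NAME(cos(0.5L)));            /* long double: behaves like cosl */
    puts(TYPE_NAME(cos(0.5 + 0.25 * I)));  /* double _Complex: behaves like ccos */
    puts(TYPE_NAME(cos(1)));               /* double: integers are treated as double */
}
```

Calling show_cos_result_types() from a program prints one type name per line, in the order given by the comments.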

The second reason why this feature is ugly is that it attempts to imitate functions, but the imitation is imperfect and even dangerous: If you try to pass the generic macro cos as a function parameter, you actually always supply the cos function operating on doubles because the macro expansion doesn’t happen when cos is not followed by a left parenthesis.
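A minimal sketch of this pitfall, assuming a hypothetical helper named apply that takes a pointer to a double function: the name cos on its own is not a macro invocation, so the float argument below is silently widened and processed by the double version.

```c
#include <tgmath.h>

/* Hypothetical helper: applies a double -> double function to x. */
static inline double apply(double (*fn)(double), double x)
{
    return fn(x);
}

/* Here cos is not followed by '(', so the function-like macro from
   <tgmath.h> is not expanded. The plain double function is passed,
   and the float argument is converted to double on the way in. */
static inline double cos_of_float_via_pointer(float x)
{
    return apply(cos, x);
}

/* The same effect in a direct call: parenthesizing the name
   suppresses the macro, so the result type is double, not float. */
static inline double cos_without_macro(float x)
{
    return (cos)(x);
}
```

A call such as cos(x) with a float x would have used cosf; neither helper above does.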

The final reason why this feature is ugly is that such macros can’t be implemented in strictly conforming C; they have to rely on some kind of compiler support. Experience (e.g. the speed with which bugs in the glibc implementation are discovered) seems to suggest that this feature is used very rarely and doesn’t deserve to be a part of the “core language”, especially because the underlying feature is not available to programs. (Contrast this with <stdarg.h>, which is available for portable use.)

Now, if the feature is both ugly and not used in practice, why mention it at all? I’m writing this article because I have examined the glibc implementation and it is such an ingenious hack that I feel it should be recorded for posterity, in some better way than this commit message:

2000-08-01  Ulrich Drepper  <drepper@redhat.com>
            Joseph S. Myers  <jsm28@cam.ac.uk>

* math/tgmath.h: Make standard compliant. Don't ask how.

The most straightforward way to mimic the Fortran compilers is to use a very simple macro: (I’ll be using cos for most examples; the semantics of other macros is similar.)

#define cos(X) __tgmath_cos(X)

The compiler would then treat __tgmath_cos as a built-in operator and transform it to one of the function calls in the frontend.

The cleanest solution I have seen proposed is to add basic function overloading support to the compiler frontend which can be activated using a vendor extension (the compiler would otherwise be required by the C standard to diagnose incompatible declarations of a single identifier):

#define cos(X) __tgmath_cos(X)
#pragma compiler_vendor overload __tgmath_cos
double __tgmath_cos (double x)
{return (cos) (x); }
float __tgmath_cos (float x)
{return cosf (x); }
long double __tgmath_cos (long double x)
{return cosl (x); }

…

(Warm-up exercise: Why are the parentheses around cos in the definition of __tgmath_cos (double) necessary?)

Of course implementing this for the sole purpose of <tgmath.h> is a lot of work (although it can perhaps share code with the C++ frontend, if any). Nobody should want to use such an unportable extension in their C programs, and there are probably very few programs using <tgmath.h>, so it is probably not worth the effort to extend the compiler this way.

The glibc implementation has had to rely on extensions already present in the gcc versions in use, so the implementation is rather complex.

First, let’s implement a macro that can choose the right function type. Because C doesn’t support conditional macro expansion, the conditions will have to be in the expanded code. We want something like

#define cos(X) \
  (X is real) ? ( \
    (X has type double \
     || X has an integer type) \
    ? (cos) (X) \
    : (X has type long double) \
    ? cosl (X) \
    : cosf (X) \
  ) : ( \
    (X has type double _Complex) \
    ?  ccos (X) \

…

and it turns out writing the conditions above is quite easy:

  • “X is real” is sizeof (X) == sizeof (__real__ (X))
  • “X has an integer type” is (__typeof__ (X))1.1 == 1.

    (Medium-difficulty exercise: (__typeof__ (X))0.1 == 0 is incorrect. Why?)

    (glibc actually uses __builtin_classify_type, an internal gcc built-in, in some cases, and something similar to the above tests in other cases.)

  • “X has type double/long double/float” can be determined using sizeof. This won’t work precisely on architectures where some C types are mapped to the same hardware type, but on those architectures there is no difference between operations on those types, and the outside C program can’t recognize the difference. By the C “as-if” rule, this is good enough.
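The conditions in the list above can be written down as macros. This is only a sketch in GNU C (it relies on the __real__ and __typeof__ extensions), and the macro and function names are invented for the example; they are not the names glibc uses.

```c
#include <complex.h>
#include <stdio.h>

/* "X is real": a complex value is twice as large as its real part. */
#define IS_REAL(X)    (sizeof (X) == sizeof (__real__ (X)))

/* "X has an integer type": converting 1.1 to an integer type gives 1. */
#define IS_INTEGER(X) ((__typeof__ (X)) 1.1 == 1)

/* "X has type T", by size. Only meaningful once IS_REAL and
   IS_INTEGER have been checked, because e.g. int and float
   often have the same size. */
#define HAS_SIZE_OF(X, T) (sizeof (X) == sizeof (T))

void show_conditions(void)
{
    printf("IS_REAL(1.5f)          = %d\n", IS_REAL(1.5f));
    printf("IS_REAL(1.0 + 2.0 * I) = %d\n", IS_REAL(1.0 + 2.0 * I));
    printf("IS_INTEGER(7)          = %d\n", IS_INTEGER(7));
    printf("IS_INTEGER(1.5)        = %d\n", IS_INTEGER(1.5));
    printf("HAS_SIZE_OF(1.5f, float) = %d\n", HAS_SIZE_OF(1.5f, float));
}
```

None of the macros evaluates X at run time: sizeof and __typeof__ only look at its type.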

Good, so our cos macro now can select the correct function and call it. Unfortunately it always returns a result of type long double _Complex because the ? : operators above return a value with the type of the “usual arithmetic conversions” between the second and third operands of the operator.
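The effect is easy to observe. The sketch below spells out the conditional-operator version of the macro under the hypothetical name NAIVE_COS (with made-up helper macros, using GNU C's __real__ and __typeof__) and checks its result type with C11 _Generic, which is used here only as a type inspection tool.

```c
#include <complex.h>
#include <math.h>

/* Hypothetical helper macros for the conditions (GNU C extensions). */
#define IS_REAL(X)    (sizeof (X) == sizeof (__real__ (X)))
#define IS_INTEGER(X) ((__typeof__ (X)) 1.1 == 1)

/* The conditional-operator version of the macro. Every branch is
   type-checked, and the usual arithmetic conversions are applied
   across all of them. */
#define NAIVE_COS(X) \
    (IS_REAL (X) \
     ? ((IS_INTEGER (X) || sizeof (X) == sizeof (double)) ? (cos) (X) \
        : sizeof (X) == sizeof (long double) ? cosl (X) \
        : cosf (X)) \
     : (sizeof (X) == sizeof (double _Complex) ? ccos (X) \
        : sizeof (X) == sizeof (long double _Complex) ? ccosl (X) \
        : ccosf (X)))

/* The operand of _Generic is not evaluated; only its type matters. */
_Static_assert(_Generic(NAIVE_COS(1.5f), long double _Complex: 1, default: 0),
               "a float argument still yields long double _Complex");
_Static_assert(_Generic(NAIVE_COS(1), long double _Complex: 1, default: 0),
               "an int argument yields long double _Complex as well");
```

The right function is selected in each case, but the caller cannot tell from the type of the result.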

We can avoid these type conversions in favor of a type we choose by using another gcc extension, statement expressions:

#define cos(X) ({ result_type __var; \
  if (X is real) { \
    if ((X has type double) \
        || (X has an integer type)) { \
      __var = (cos) (X); \
    } else if (X has type long double) \
      __var = cosl (X); \

…

  __var; })

Now the result of the macro will always have the type result_type, and the problem is solved.

Is it?

It’s not, actually. How do we define result_type? For floating-point types we can just use __typeof__ (X), but we want double for integer arguments, and C doesn’t have a ? : operator for types, does it?

The two exercises above are not there because I’m a teacher and I’d like to check your progress or something; they are there to prepare you for the final, difficult exercise, or to scare you off before you get to this part (well, I’m sure I have bored everyone to death and nobody is reading this anyway :) ). Although the context of the exercise is a big hint already, it is probably not enough, so here goes:

Difficult exercise: What is the difference between the results of

1 ? (int *)0 : (void *)0

and

1 ? (int *)0 : (void *)1

and why does the difference exist?

Unlike the first two exercises, which can be solved with a bit of research and experimentation, this one (especially the “why” part) probably requires reading the C standard, so I’ll explain it here.

First, it is necessary to explain the following concepts:

  • An integer constant expression is simply an integer expression with a constant value, from the point of view of the compiler: the compiler is able to compute the constant value without any optimization except for constant folding. In particular the expression doesn’t use values of any variables.
  • A null pointer is a pointer value that compares equal to the integer 0. A null pointer can have any pointer type.
  • A null pointer constant is a syntactic construct. The value of a null pointer constant, when converted to a pointer type, is a null pointer (“null pointer, the value”, described above). The NULL macro expands to a null pointer constant.

    Because a null pointer constant is a syntactic construct, it has a syntactic definition: it is either an integer constant expression with value 0, or such an expression cast to void *. For example 0, 0L, 1 - 1, (((1 - 1))), (void *)0 and (void *)(1 - 1) are null pointer constants, 1, (int *)0 and (void *)1 are not null pointer constants.

    (OK, so it is not really a syntactic construct when its definition refers to the value of an expression, but it’s best to think of it as if it were a syntactic construct. In most cases the “integer constant expression with value 0” is a literal 0 anyway.)

Now we can look at the standard’s section 6.5.15, paragraph 6, which refers to the conditional operator ? : and says, among other things:

If both the second and third operands are pointers …, the result type is a pointer … . Furthermore, if both operands are pointers to compatible types …, the result type is a pointer to … the composite type; if one operand is a null pointer constant, the result has the type of the other operand; otherwise, … the result type is a pointer to … void.

Thus, in

1 ? (int *)0 : (void *)0

the third operand is a null pointer constant, so the result is (int *)0. In

1 ? (int *)0 : (void *)1

the third operand is not a null pointer constant, so the result is (void *)0. This is our conditional operator for types; now we just need to polish it a bit.
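The difference between the two expressions can be observed directly. The sketch below uses C11 _Generic to name the type of each conditional expression; _Generic is an addition for this example, and POINTER_TYPE_NAME and the function name are made up.

```c
#include <stdio.h>

/* Hypothetical helper: names the pointer type of an expression.
   The operand of _Generic is not evaluated. */
#define POINTER_TYPE_NAME(P) _Generic((P), \
    int *: "int *", \
    void *: "void *", \
    default: "other")

void show_conditional_pointer_types(void)
{
    /* (void *)0 is a null pointer constant: the other operand wins. */
    puts(POINTER_TYPE_NAME(1 ? (int *)0 : (void *)0));        /* int * */

    /* (void *)(1 - 1) is a null pointer constant as well. */
    puts(POINTER_TYPE_NAME(1 ? (int *)0 : (void *)(1 - 1)));  /* int * */

    /* (void *)1 is an ordinary void pointer: the result is void *. */
    puts(POINTER_TYPE_NAME(1 ? (int *)0 : (void *)1));        /* void * */
}
```

The condition is 1 in every case, so the value of the third operand never matters at run time; only its status as a null pointer constant affects the type.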

Note that (X has an integer type) is an integer constant expression. Therefore the result of

1 ? (__typeof__ (X) *)0 : (void *)(X has an integer type)

is (void *)0 if X has an integer type and (__typeof__ (X) *)0 otherwise, and the result of

1 ? (double *)0 : (void *)(!(X has an integer type))

is (double *)0 if X has an integer type and (void *)0 otherwise (double, because integer arguments are to be treated as double). Note that in each case the result of one of the expressions is (void *)0.

Let’s call the above expressions E1 and E2. Then the result of

1 ? (__typeof__ (E1))0 : (__typeof__ (E2))0

is (double *)0 if X has an integer type and (__typeof__ (X) *)0 otherwise; again, note that one of the second and third expressions is a null pointer constant.

Finally, we can define result_type as

__typeof__ (*(1 ? (__typeof__ (E1))0 : (__typeof__ (E2))0))

That’s it. The rules for macros with more than one argument are a bit more complicated, but all the building blocks are described above.
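Put together, the construction can be sketched as follows. This is an illustration in GNU C with invented macro names, not the glibc source. Two choices here are assumptions of the sketch: E2 uses double * so that integer arguments get the result type double, and the “integer type” test is written with __builtin_classify_type (which the text above mentions glibc using) because the operand of the cast to void * has to be a genuine integer constant expression. The values 8 and 9 are gcc's type classes for real floating and complex types.

```c
#include <complex.h>
#include <stdio.h>

/* 8 = real floating, 9 = complex in gcc's type classification.
   Everything else is treated as an integer type in this sketch. */
#define IS_INTEGER(X) \
    (!(__builtin_classify_type (X) == 8 || __builtin_classify_type (X) == 9))

/* E1: void * for an integer X, (type of X) * otherwise. */
#define E1(X) (1 ? (__typeof__ (X) *) 0 : (void *) (IS_INTEGER (X)))
/* E2: double * for an integer X, void * otherwise. */
#define E2(X) (1 ? (double *) 0 : (void *) (!IS_INTEGER (X)))

/* double for an integer X, the type of X otherwise. */
#define RESULT_TYPE(X) \
    __typeof__ (*(1 ? (__typeof__ (E1 (X))) 0 : (__typeof__ (E2 (X))) 0))

/* C11 _Generic, used only to name a type (hypothetical helper). */
#define TYPE_NAME(E) _Generic((E), \
    float: "float", \
    double: "double", \
    long double: "long double", \
    double _Complex: "double _Complex", \
    default: "other")

void show_result_types(void)
{
    puts(TYPE_NAME((RESULT_TYPE (42)) 0));             /* double */
    puts(TYPE_NAME((RESULT_TYPE (1.5f)) 0));           /* float */
    puts(TYPE_NAME((RESULT_TYPE (1.5L)) 0));           /* long double */
    puts(TYPE_NAME((RESULT_TYPE (1.0 + 2.0 * I)) 0));  /* double _Complex */
}
```

RESULT_TYPE (X) can then serve as the declared type of __var in the statement-expression version of the macro.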

Source : http://carolina.mff.cuni.cz/~trmac/blog/2005/the-ugliest-c-feature-tgmathh/
