Wednesday 20 July 2016

Implementing A Programming Language In C - Update

Static or Dynamic

It’s been quite some time since my last post because I’ve been considering making the language statically typed. Most of the languages I’ve made are dynamically typed (except for my most recent one) and they’re perfectly fine. The problem is that it’s particularly difficult to optimize such languages for real-world use without greatly increasing the size or complexity of the implementation. Ultimately, most dynamically typed languages are still strongly typed (save a few: https://www.destroyallsoftware.com/talks/wat); the errors a static type checker would catch at compile time are simply caught at runtime instead. Much of this runtime overhead can be avoided by moving type-checking over to the compiler. The language runtime itself can still be “dynamic” while type errors are caught ahead of time (this is roughly what Python’s type hints allow when paired with an external checker such as mypy; the interpreter itself does not enforce them).
One of the most common arguments against static typing is that it adds a lot of superfluous noise to the source code:
BitBuffer* bitBuffer = new BitBuffer; // lots of typenames
And while that may be true, much of this noise can be removed by inferring types, making the code practically indistinguishable from its dynamically typed equivalent:
auto bitBuffer = new BitBuffer; // not very different from 'var bitBuffer = new BitBuffer;'
Do note that no type information is lost in the latter example.
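To make the inference idea a bit more concrete, here is a minimal sketch in C of how a compiler might work out the type of an initializer expression. The Expr and Type definitions and the infer_type function are hypothetical names made up for this illustration, not part of Tut's implementation, and a real compiler would handle far more node kinds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags and expression nodes (assumptions for this sketch). */
typedef enum { TYPE_ERROR, TYPE_INT, TYPE_FLOAT } Type;
typedef enum { EXPR_INT_LIT, EXPR_FLOAT_LIT, EXPR_ADD } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const struct Expr *lhs; /* used only by EXPR_ADD */
    const struct Expr *rhs; /* used only by EXPR_ADD */
} Expr;

/* Compute the static type of an expression at compile time.
   Returns TYPE_ERROR when the operands of an addition disagree. */
Type infer_type(const Expr *e)
{
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_INT_LIT:
        return TYPE_INT;
    case EXPR_FLOAT_LIT:
        return TYPE_FLOAT;
    case EXPR_ADD: {
        Type l = infer_type(e->lhs);
        Type r = infer_type(e->rhs);
        /* Mixed int/float addition is rejected rather than converted. */
        return (l == r) ? l : TYPE_ERROR;
    }
    }
    return TYPE_ERROR;
}
```

With something like this, a declaration without an explicit type can simply take whatever infer_type reports for its initializer, and a TYPE_ERROR result becomes a compile-time diagnostic instead of a runtime failure.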
Concerning performance, the code produced for a dynamically typed language will likely be much slower than for a statically typed one, because every operation has to inspect the types of its operands at runtime. For example:
def add(x, y):
    return x + y
would produce something like (hypothetical machine code):
.add
    type_of_x = typeof x                 ; types extracted at runtime
    type_of_y = typeof y
    
    beq type_of_x, int, add_x_int           ; if the type of x is 'int' goto ...
    beq type_of_x, float, add_x_float       ; ...
    
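.add_bad_type                               ; reached when an operand is neither int nor float
    print "Cannot add values"
    error

.add_x_int
    beq type_of_y, int, add_x_y_int
    beq type_of_y, float, add_x_int_y_float
    jmp add_bad_type                        ; otherwise control would fall through into add_x_float

.add_x_float
    beq type_of_y, int, add_x_float_y_int
    beq type_of_y, float, add_x_float_y_float
    jmp add_bad_type                        ; otherwise control would fall through into add_x_y_int
    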
.add_x_y_int
    result = addi x, y
    ret

.add_x_int_y_float
.add_x_float_y_int 
    print "attempted to add int and float"
    error

.add_x_float_y_float
    result = addf x, y
    ret
Whereas a statically typed program like the following:
int add(int x, int y)
{
    return x + y;
}
Would produce something like:
.add
    result = addi x, y
    ret
I’d say that’s a win.
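The same contrast can be sketched in C, since that is what the implementation is written in. The Value struct and the add_dynamic and add_static names below are assumptions for illustration only; the point is just that the dynamic version pays for tag checks on every call while the static one does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A hypothetical tagged value, as a dynamic runtime might represent it. */
typedef enum { VAL_INT, VAL_FLOAT } ValueTag;

typedef struct {
    ValueTag tag;
    union {
        int i;
        double f;
    } as;
} Value;

/* Dynamic add: the operand types are only known at runtime, so they
   must be checked on every call. Returns false on a type mismatch. */
bool add_dynamic(Value x, Value y, Value *result)
{
    assert(result != NULL);
    if (x.tag != y.tag)
        return false; /* e.g. attempted to add int and float */
    result->tag = x.tag;
    if (x.tag == VAL_INT)
        result->as.i = x.as.i + y.as.i;
    else
        result->as.f = x.as.f + y.as.f;
    return true;
}

/* Static add: the compiler already knows both operands are ints,
   so the body is a single addition with no checks. */
int add_static(int x, int y)
{
    return x + y;
}
```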

Memory Management

Okay so languages with dynamic runtimes pretty much have to have a means of automatically managing memory because they generally don’t know the type (and therefore the size) of a value at compile-time, so even simple values tend to end up dynamically allocated. If they didn’t, you’d have to write shit like:
def shit():
    x = 10
    free(x) # free the dynamically allocated integer value
Or at least:
def shit():
    x = 10 
    release(x) # manual ref counting
Keeping track of dynamically allocated values is hard enough. Having to keep track of values allocated on the “stack” would only exacerbate the problem. So, most languages have garbage collectors, or automatic reference counting plus a garbage collector for reference cycles (like Python).
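For a sense of what the runtime would be doing behind the scenes, here is a rough sketch of reference counting for a boxed integer in C. BoxedInt, box_int and retain are hypothetical names, and release is borrowed from the example above; an actual runtime would also need something to deal with reference cycles.

```c
#include <assert.h>
#include <stdlib.h>

/* A heap-allocated integer with a reference count (hypothetical layout). */
typedef struct {
    int refcount;
    int value;
} BoxedInt;

/* Allocate a boxed integer owned by one reference. Returns NULL on failure. */
BoxedInt *box_int(int value)
{
    BoxedInt *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->refcount = 1;
    b->value = value;
    return b;
}

/* Register one more owner of the box. */
void retain(BoxedInt *b)
{
    assert(b != NULL && b->refcount > 0);
    b->refcount++;
}

/* Drop one owner; frees the box when the last owner is gone.
   Returns 1 if the box was freed, 0 otherwise. */
int release(BoxedInt *b)
{
    assert(b != NULL && b->refcount > 0);
    if (--b->refcount == 0) {
        free(b);
        return 1;
    }
    return 0;
}
```

Automatic reference counting means the compiler or interpreter inserts the retain and release calls for you, which is exactly the bookkeeping nobody wants to do by hand for every integer.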

New Syntax

Taking all that I’ve discussed into consideration, I’ve revised the syntax of “Tut” (the programming language we’re making) to the following:
var pos : vec2; // do note that vec2 is used here before it is declared below

struct vec2
{
    x : int;
    y : int;
};

func make_vec2(x : int, y : int) : vec2
{
    var result : vec2;
    
    result.x = x;
    result.y = y;
    
    return result;
}

// functions which make use of strings should take cstrs as parameters
func puts(str : cstr) : void
{
    printf("%s\n", str);
}

func main() : int
{
    pos = make_vec2(10, 10);
    printf("(%i, %i)\n", pos.x, pos.y);
    
    var name : cstr = "Andy";                       // "cstr" is an immutable reference to a string
    var start_of_name : str = substr(name, 0, 2);   // "str" is a mutable heap-allocated string
    
    puts(name);
    puts(start_of_name);                            // "str"s are implicitly cast to cstr
    
    free_str(start_of_name);                        // substr allocates memory so it has to be freed
    return 0;
}
Yes, this is basically C with postfix types, but it has built-in strings and, more importantly, it’s ours to change as we see fit!
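As a rough idea of how the string built-ins could map onto the C runtime, here is a minimal sketch of substr and free_str. The typedefs and the argument meaning (a start index followed by a length) are my assumptions for this illustration; the Tut snippet above does not pin down what the second and third arguments mean.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef const char *cstr; /* immutable reference to a string */
typedef char *str;        /* mutable heap-allocated string */

/* Copy up to `length` characters of `s` starting at index `start`
   into a new heap-allocated string. Out-of-range requests are clamped
   to the end of `s`. Returns NULL if allocation fails.
   (start/length semantics are an assumption of this sketch.) */
str substr(cstr s, size_t start, size_t length)
{
    assert(s != NULL);
    size_t n = strlen(s);
    if (start > n)
        start = n;
    if (length > n - start)
        length = n - start;

    str out = malloc(length + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + start, length);
    out[length] = '\0';
    return out;
}

/* Release a string previously returned by substr. */
void free_str(str s)
{
    free(s);
}
```

In C a char * converts to a const char * without a cast, which lines up nicely with a str being accepted wherever a cstr is expected.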
Well, I'll get to work on an implementation ASAP.
