Showing posts with label erlang. Show all posts

Friday, December 25, 2009

teeterl fixes Erlang records

I think Erlang record syntax is neat: expressive, tight, and consistent. A bit repetitive sometimes, yes. Still, there is a big problem with Erlang records. Not with the syntax, but with record declarations shared between modules in .hrl files. This is fixed in teeterl.

teeterl now (partially) supports a new breed of records called 'named tuples'. Named tuples and records are very similar: at some point both are mapped to tagged tuples. The true difference is that named tuples are never declared. The inner structure of a named tuple is inferred from its use. No declarations, no .hrl files, no need to recompile.

Named tuples have a syntax separate from records, but this is not essential. Examples:

%% create a tuple named 'car' with fields 'model' and 'mpg'
Car1 = {car#model <- "Lexus", mpg <- 30},

%% create another tuple named 'car'; name is inferred
%% from use in the previous line
Car2 = {model <- "Honda"},

%% get index of the field in the tuple; needed for keyfind()
%% and the likes
Index = car#mpg,

%% access named tuple field; again name is inferred,
%% the alternative is Car1.car#mpg
Mpg = Car1.mpg,

Values:
Car1 = {car,"Lexus",30}
Car2 = {car,"Honda",undefined}
Index = 3
Mpg = 30

The trick is that field offsets of a named tuple are assigned during module loading not during preprocessing or some other compilation step. Thus the layout of a given named tuple may be different in different VMs.

Now if you need another field in a named tuple just assign a value to the field. Send the extended tuple to the old code and you will not break anything. Modules are compiled in complete isolation, no worry about record versions. Plus information about fields is accessible during runtime. And all these goodies come with no performance penalty. Named tuples work at exactly the same speed as conventional records.
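For concreteness, here is a hypothetical sketch of extending a named tuple with a new field. The update syntax shown is my assumption, extrapolated from the creation syntax above; teeterl may spell it differently:

```erlang
%% HYPOTHETICAL teeterl syntax: field update by assignment is assumed
%% to reuse the same '<-' arrow as creation.
Car3 = Car1.color <- "red",   %% 'car' tuples now gain a 'color' field
Mpg3 = Car3.mpg,              %% old fields are still readable by old code
```

Since field offsets are assigned at module load time, old modules that only touch 'model' and 'mpg' keep working on the extended tuple.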

EDIT: teeterl's source code is on GitHub

Sunday, November 29, 2009

New teeterl; 4x faster

Three months ago I decided to give teeterl a once over. The goal was to make it faster, an order of magnitude faster. Today, the new teeterl version has become mature enough to run estone_SUITE and produce a meaningful performance printout. The result: it is almost 4 times faster than the earlier teeterl. Not 10x, but definitely on the right track. BEAM is still a distant performance star, boasting more than 2 times more estones.

The new version of teeterl has a good deal of new machinery under the hood. The most noticeable changes are the new partial, generation-aware garbage collector and the fact that the teeterl VM is now register-based.

The new teeterl cannot do much more than run a few test suites now. Nevertheless, I will release the source on GitHub as soon as I figure out how to push it into the existing 'teeterl' repo without destroying the previous version.

stdin/stdout in Erlang

Erlang blogs and message boards are littered with messages expressing dissatisfaction with Erlang syntax, the language's handling of strings, the verbosity of records, etc. Members of the discontented crowd usually stop short when specifics are requested. I will try to make my own small rant about Erlang as specific as possible.

My grudge is about stdin/stdout. Why would modern software need these? The existence of stdin/stdout assumes that Erlang is started from a text-based shell. These days this is usually not the case. Why is io:format() pretty-printing crafted to output text on a screen of 80 columns? Erlang is all about non-blocking input/output, and then there is io:get_line() getting text from nowhere. I am aware of Erlang's telecom heritage and the RS-232 interface. But this kind of thing definitely slows the adoption of Erlang in today's world of the World-Wide Web.

The new version of teeterl does not have an io module, as it does not have the concept of stdin/stdout. On the other hand, the new teeterl has had an embedded web server with comet functionality from day one. All input/output in the new teeterl is done through a socket and HTTP. Instead of io:format() there is a format:to_json() call, and all actual formatting, including pretty-printing, is done on the client side with jQuery/Javascript. Well, a printout may still appear on stderr, but only as an epitaph on the tomb of the init process.

Laying out VM vitals

teeterl's poor performance felt like a cold shower to me. At an opportune moment I decided to have a closer look at why it is so bad. By then I had a scaffold implementation of a register-based teeterl VM. The new version was already churning out prime numbers in spadefuls without a hitch. But I decided to take two steps back and measure whether the decisions I had made were of any merit. It turned out many of them did more harm than good.

For about a week I was rewriting my shiny new VM into a new application, cleverly named teepad, that was doing nothing but creating random stack frames, touching various memory locations, allocating chunks of memory from the heap, and dispatching random instructions from dozens of garbage modules. It weighs heavily on a fellow's mind to write simulation software of this sort. When someone laboriously develops a sizable do-nothing system for a long time, thoughts start to creep in. I am lucky it did not take too long. Now I have speed readings of the garbage mincer and can make a few observations.

Do not reenter VM's main function
As everybody knows, the main function of a VM is where the huge switch statement is. I was toying with the idea of reentering the routine on every Erlang function call; thus register values would instantaneously come back into existence after a return. Well, an Erlang process should yield from time to time, so I added a separate stack which was filled in when the process was yielding. Boy, it is probably the worst stack management scheme out there. teepad was able to dispatch 2.5 times more commands when such nastiness is avoided.

Keep the number of opcodes moderate
With teepad it was possible to boost the number of different opcodes to a sizable number, increasing the number of cases in the master switch statement accordingly. I thought it would be neat to have an opcode for, say, 'move r1, r2' and another opcode for 'move r2, r3'. I was quite wrong again. teepad experiments showed that twice as many opcodes means 15% less throughput; 5 times as many shaves 25% off performance.

Packing/unpacking of arguments does not matter
In these days of multi-layered caches you may do any number of bitwise operations on a value already retrieved from memory. Take the highest 6 bits, mask out the 3 lower bits, add 7 to the result, do it on every dispatch, and you won't tax your performance noticeably. Bitwise operations help to reduce the memory footprint of a VM and thus stimulate better cache behavior that more than offsets the additional CPU cycles burnt on bit fiddling.

More details for those who care

1. Instruction set generated from the types below:
Type       Prob    #variants  #args  Options
test       0.2     10         1-3    [branches]
enter      0.05    1          1-1    [reduce,jumps]
enterf     0.01    2          3-4    [reduce,jumps,intermodule]
call       0.02    1          1-1    [reduce,calls]
callf      0.01    2          3-4    [reduce,calls,intermodule]
bif        0.01    8          2-3    [reduce,heap]
move       0.3     10         0-2    []
make       0.1     10         1-3    [heap]
select     0.1     5          1-3    [heap]
compute    0.1     20         2-3    []
apply      0.01    2          1-2    [reduce,jumps]
'receive'  0.001   1          1-1    [yields,messages]
error      0.0001  1          1-1    [exits]

The implementation of each opcode is randomly generated using the Options column; e.g. the 'receive' instruction has 1 variant and appears once per 1000 instructions. Its implementation may include unpacking its arguments, touching the mailbox memory and faking a context switch.

2. Modules are generated from the instruction set using the probabilities of commands of a given type.

# of modules is 10.
Module size ranges between 1000 and 100000.
Reductions before yield is, obviously, 1000.
Instruction arguments are packed (4 arguments into 1 word).

3. Generated program is compiled, run for 5sec and the number of instructions dispatched is reported.

The source code of teepad and raw measurement data are gladly provided upon an e-mail inquiry.

Monday, July 27, 2009

apr_pool_t with counter

The lack of tracking of the total size of all allocations made from a given apr_pool_t is a big pain for teeterl. I had to introduce yet another layer of indirection by adding an xpool_t structure, which is no more than a counter and an apr_pool_t reference. Now, most of the time, the sizes of allocations are added up neatly to check when the threshold is reached and gc should kick in.
Recently I transgressed to the point of putting the size counting inside apr_pool_t, thus producing a custom version of APR. Yes, it did make teeterl a bit smaller and cleaner, and auxiliary allotments were counted better. On the other hand, the impure act of meddling with APR code made the installation of teeterl much longer and trickier. I still do not know of a single person who has managed to build and run teeterl on his/her machine, and the recent change makes that possibility even more remote. I am rolling everything back to xpool_t's.

Friday, January 23, 2009

TLS 1.0 in Erlang, no OpenSSL

I was borrowing ideas from Erlang/OTP for my X Erlang implementation as a matter of routine. Sometimes the shortcut did not work, because I could not understand the Erlang/OTP sources or thought they were too convoluted.

The Erlang/OTP approach to crypto functions is a disappointment. It uses the external OpenSSL library through a port driver. Erlang has nice binary parsing, big numbers, hash function BIFs, etc., and all of these are replicated in OpenSSL. Plus, OpenSSL itself is three times bigger than the whole of X Erlang.

Now I have a working TLS library with all cryptographic primitives implemented in Erlang. No OpenSSL or anything of its ilk. The exceptions are the MD5 and SHA1 functions, which are BIFs done in C. I can barely believe it when my TLS server makes its way through the jungle of TLS handshake calculations and starts shoveling application data.
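As a flavour of what 'crypto in plain Erlang' looks like, here is a minimal HMAC-MD5 sketch per RFC 2104, built only on the erlang:md5/1 BIF. This is my own illustration, not the library's actual code; the module and function names are made up:

```erlang
-module(hmac_demo).
-export([hmac_md5/2]).

-define(BLOCK, 64).  %% MD5 block size in bytes (RFC 2104)

%% HMAC(K, M) = H((K' xor opad) ++ H((K' xor ipad) ++ M))
hmac_md5(Key, Msg) when byte_size(Key) > ?BLOCK ->
    hmac_md5(erlang:md5(Key), Msg);       %% long keys are hashed first
hmac_md5(Key, Msg) ->
    K = <<Key/binary, 0:((?BLOCK - byte_size(Key)) * 8)>>,  %% zero-pad key
    Ipad = << <<(B bxor 16#36)>> || <<B>> <= K >>,
    Opad = << <<(B bxor 16#5C)>> || <<B>> <= K >>,
    erlang:md5([Opad, erlang:md5([Ipad, Msg])]).
```

RFC 2202 test case 2 can be used to check it: hmac_md5(<<"Jefe">>, <<"what do ya want for nothing?">>) should be the 16-byte binary 750c783e6ab0b503eaa86e310a5db738.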

I testify, and give my oath, that TLS is implementable using RFC 2246 and RFC 2104 (and, of course, Wikipedia).

Sunday, January 18, 2009

Moved to Flex/Actionscript

Why was it never mentioned that the dojo toolkit is a reimplementation of the Flex framework? It would probably have saved me many months of poking in the dark corners of Javascript with mediocre results.

Today, one month after the 'discovery', my X Erlang IDE has a syntax-highlighting editor, a build system, a process viewer, an Erlang console, etc., running in Flash Player:



What I am still sorely missing is function call completion. The exact positioning of a small floating window near the cursor is nearly impossible to implement reliably in Javascript. Flex/Actionscript does this and many other things elegantly, is identical in all browsers, and is present on 90% of computers around the globe.

I have to mention that it is not possible to process right mouse button clicks with Flex/Actionscript. There is always, they say, something.

Thursday, December 11, 2008

Javascript talks Erlang

When extending my web-based IDE for X Erlang I stumbled on the fact that I was doing much the same thing over and over again: adding small snippets of code on the server and corresponding integration-only additions to Javascript on the client. Many times the result obtained would be an Erlang term represented by an opaque string, for the user (me), not the software, to understand and interpret.

The next big step was standardizing the way Javascript (JSON) objects are generated from a subset of possible Erlang terms. For instance, [{name,"Judas"},{occupation,"priest"}] would be converted to { name: 'Judas', occupation: 'priest' } on the Javascript side. So far so good, but the encoding scheme would work only in one direction: from the Erlang-based server to the web browser.

The enlightenment came later, when I recognized that it is not at all necessary to add a new server stub for each request the web application may want to send. The web application may just call Erlang functions directly by their names, providing properly-encoded parameters and expecting results as JSON messages. The question was a two-way encoding between Erlang terms and JSON objects. Here is the scheme:

1. Erlang strings, including the empty string "", are represented by Javascript strings

"" -> ""
"abc" -> "abc"

2. Atom 'true' and 'false' become Javascript boolean values

'true' -> true
'false' -> false

3. Numbers are just numbers

123 -> 123
123.5 -> 123.5

4. Erlang atoms become objects with a single property "atom"

'abc' -> {atom: "abc"}

5. Erlang binaries become similar objects with a single property, this time named "binary"

<<1,2,255>> -> {binary: "0102ff"}

Binary data is encoded into string, each byte represented by two hex digits
(more compact representations can be imagined)

6. PIDs, References and Ports are encoded as objects too

#Ref<1.2.3> -> {reference: 2, node: 1, creation: 3}
#Port<1.2.3> -> {port: 2, node: 1, creation: 3}
<1.2.3> -> {pid: 2, node: 1, creation: 3}

7. Tuples become objects with integer field names

{T1,T2,T3} -> {1: T1, 2: T2, 3: T3}

8. Lists (which happen not to be strings) are the easiest

[T1,T2,T3] -> [T1, T2, T3]
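The rules above can be collapsed into a small encoder. Below is my own minimal sketch, not the actual implementation: it emits the Javascript-literal notation used in this post (unquoted integer keys for tuples, so it is not strict JSON), elides PIDs, ports and references, and uses a crude printable-characters heuristic to tell strings from other lists:

```erlang
-module(term_json).
-export([encode/1]).

%% rules 2 and 4: booleans pass through, other atoms are tagged
encode(true)  -> "true";
encode(false) -> "false";
encode(A) when is_atom(A) ->
    ["{atom: \"", atom_to_list(A), "\"}"];
%% rule 3: numbers are just numbers
encode(N) when is_integer(N) -> integer_to_list(N);
encode(F) when is_float(F)   -> float_to_list(F);
%% rule 5: binaries become hex strings under "binary"
encode(B) when is_binary(B) ->
    Hex = [io_lib:format("~2.16.0b", [X]) || <<X>> <= B],
    ["{binary: \"", Hex, "\"}"];
%% rule 7: tuples become objects with integer field names
encode(T) when is_tuple(T) ->
    Fs = [[integer_to_list(I), ": ", encode(element(I, T))]
          || I <- lists:seq(1, tuple_size(T))],
    [${, join(Fs), $}];
%% rules 1 and 8: strings stay strings, other lists become arrays
encode(L) when is_list(L) ->
    Printable = fun(C) -> is_integer(C) andalso C >= 32 andalso C < 127 end,
    case lists:all(Printable, L) of
        true  -> [$", L, $"];
        false -> [$[, join([encode(E) || E <- L]), $]]
    end.

join([])       -> [];
join([X])      -> [X];
join([X | Xs]) -> [X, $,, $\s | join(Xs)].
```

Calling lists:flatten(term_json:encode({1, "abc", <<255>>})) under these assumptions yields "{1: 1, 2: \"abc\", 3: {binary: \"ff\"}}".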

The described encoding allows any Erlang term to be represented as a Javascript value (but not vice-versa). The small open question is funs. Today, they are encoded as {fun: name}, but this representation is not reversible. Besides, it is difficult to see how to reconstruct a fun object in Erlang (binary_to_term()?).

An additional bit of sugar on top of the above scheme is record descriptions. For known records, the representation of tuples can be made simpler and more manageable. For instance, the record 'req' is known to have 'module', 'function' and 'args' fields. Then, the Erlang tuple {req,Mod,Fun,Args} is represented by a more concise Javascript object with explicit field names: {record: "req", module: Mod, function: Fun, args: Args}. Record descriptions work both ways, during encoding and decoding.
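A sketch of how a record description could drive the tuple case; the records() table and all names here are my own illustration, not the actual code:

```erlang
-module(rec_desc).
-export([describe/1]).

%% Hypothetical table of known record layouts.
records() -> [{req, [module, function, args]}].

%% Pair a tagged tuple's elements with field names, if its layout is known.
describe(T) when is_tuple(T), tuple_size(T) > 0 ->
    Tag = element(1, T),
    case lists:keyfind(Tag, 1, records()) of
        {Tag, Fields} when length(Fields) =:= tuple_size(T) - 1 ->
            {record, Tag, lists:zip(Fields, tl(tuple_to_list(T)))};
        _ ->
            unknown
    end.
```

For {req,lists,seq,[1,10]} this yields {record, req, [{module,lists}, {function,seq}, {args,[1,10]}]}, which maps directly onto a {record: "req", ...} object; run in reverse, the same table turns field names back into tuple positions for decoding.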

The implementation of the described representation allowed my dojo-based web client to call Erlang functions on the server and pass complex structures back and forth with little or no hassle.

Now the Erlang console, which evaluates expressions using erl_parse and erl_eval, stores the resulting bindings on the client as Javascript objects and passes them to subsequent calls as a context, not heeding their alien nature.

Thursday, November 13, 2008

Web-based IDE for X Erlang

In one of my previous posts I described a console application that allows the user to interact with an X Erlang instance. As the logical next step I wanted to make it more of a 'Visual Studio' than a 'DOS prompt'.

After two months of petty bickering I finally settled on dojo, for portability among browsers and ready-to-go widget goodies. Currently, the application allows management of the in-memory filesystem, editing of Erlang code using the CodeMirror control, and compilation of the code. It even jumps to the correct line in the source code when an error message is clicked in the compiler output pane!

The progress is stalled by the capabilities of X Erlang itself. It is not currently possible to 'pause' a process to observe its internals.

The current version looks like this:


Wednesday, July 23, 2008

X Erlang shell application

To demonstrate the X Erlang system I quickly put together a simple web server which serves static files and runs scripts. On top of it I implemented a simple shell application which allows arbitrary Erlang expressions to be entered and evaluated from a web browser window. First I put the system up on an Amazon EC2 server, but then took it down after all my friends had had a look at how it works.

The typical shell session looks like this:

X Erlang v0.2 is here, type '?' for help

> test:primes(100).
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73, 79,83,89,97]


> [{X,Y} || X <- test:primes(100), Y <- test:primes(100), Y =:= X+2].
[{3,5}, {5,7}, {11,13}, {17,19}, {29,31}, {41,43}, {59,61}, {71,73}]


> code:all_loaded().
[lists,test,dict,orddict,queue,sets,erl_parse,regexp,gen_tcp,inet,io_lib_format,erl_lint,dashboard,stdio,gb_sets, io_lib,erlang,io,packages,io_lib_pretty,prim_erlang,code,erl_eval,string,init,erl_internal,files,ordsets,efile, erl_scan,spur,error_handler,x_internal]


$_

The characters after each '>' prompt are entered by the user; the rest is the output of the system. The first request is to calculate a few prime numbers. Every programming language is demonstrated using a prime number routine; why should I be different? The second request is trickier: the entered expression finds all 'twin primes' less than 100. As you may have noticed, X Erlang supports list comprehensions (but not yet binary comprehensions). The last expression is a maintenance job. It shows which modules are loaded.

Nothing too fancy, but it works. It took me a day or two to program a robust shell application on top of a web server. It might take even less for other programmers using other versions of Erlang.

X Erlang as a userland OS

An Erlang system has many features of an operating system. The most obvious is processes; the more subtle is modules.

X Erlang takes a few steps further toward an autonomous userland OS than OTP does. In X Erlang the operating system is never searched when the system needs to load a module. Two dozen modules are 'embedded' into the X Erlang executable. There is also an internal storage for module images. These images may be loaded on request. If the referenced module is not embedded and is not found in the internal storage, then {undef,...} is thrown. The file system of the underlying OS is never touched in the process.

To reassure readers, I should point out that there is a file system in X Erlang. It is internal and in-memory, simplistic, yet hierarchical. In X Erlang, file:write_file("temp1.txt", <<1,2,3>>) creates an in-memory 'file' (or rather a named binary). The external file system of the underlying OS is still accessible through the 'efile' interface. The compiler works on files and emits output files to the internal file system only.
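The round trip through the in-memory file system would look like this; the write_file call is from the post, and I am assuming the usual file:read_file/1 counterpart exists as well:

```erlang
%% write a named binary into X Erlang's internal, in-memory file system
ok = file:write_file("temp1.txt", <<1,2,3>>),
%% read it back (read_file/1 assumed); the host OS file system is never touched
{ok, <<1,2,3>>} = file:read_file("temp1.txt"),
```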

In X Erlang you need exactly one executable file copied to a freshly installed operating system to launch your application. No runtime, no libraries, a single file, typically less than 2 MB in size.

Tuesday, July 22, 2008

APR means tough life for garbage collector

X Erlang is designed to be as portable as possible; a few things could have been done differently, probably improving performance along the way. I decided not to do that and stuck to the original rule. When it comes to memory management, the rule is this: use only the memory pools provided by APR (Apache Portable Runtime).

If you have played with APR then you should be aware that the library uses a specific memory model which makes a lot of sense in a web server environment. Most of the time it prevents memory leaks. Memory is allocated extremely fast from "memory pools": internally, an allocation is just incrementing a free memory pointer. However, freeing memory is not that easy. It can only be done by destroying the whole memory pool at once.

So far so good. Following the rule, X Erlang creates an APR memory pool per process and quickly allocates values from it. When it comes to garbage collection, reachable values are copied to a new pool and the original pool is destroyed. It means that, yes, X Erlang uses stop-the-world garbage collection, and the VM is stalled until all garbage is collected for the overgrown process.

There are two points to watch here.

The first is that, in X Erlang, variables in many cases reference the same locations in memory. If values are immutable there is no reason to copy them around. Thus, during garbage collection, we should discern the values which have already been copied to the new pool. This would have been easy if APR had a way to say that a given pointer was allocated from a given pool. Unfortunately, this cannot be done in APR. The approach adopted by X Erlang is to substitute the value in the current pool with a special structure (I call it a 'grave') with a special tag (a 'cross') and a 'buried' pointer to the copy of the value in the new pool. The 'grave' structure occupies 4 bytes, and the whole scheme works because all referenced values are at least 4 bytes long. In addition, APR by default aligns allocated memory on an 8-byte boundary, but this behavior can be changed.

The second is that it is not easy to determine the 'size' of a memory pool. There is just no function for this. Technically, there is such a function in the debug version of the library, but it determines the size of the pool by scanning it element by element; no use for a high-performance virtual machine :-). Thus the X Erlang VM has to add up the sizes of all allocations to keep track of the size of the current memory pool. It is ugly, but there seems to be no way around it.

A garbage collection may not recover any memory at all; the size of the new pool may well be larger than the current one. This happens quite often in X Erlang (and in OTP too?) and results in immediate termination of the process grown out of proportion.

There is also a lot to be said about handling literal values but it will be a subject matter of another post.

Monday, July 21, 2008

Erlang reimplemented

For several years I was tinkering with various tools and languages trying to find the best mix for the software I was interested in (peer-to-peer). Many times such a quest ends up with a desire to write your own programming language. My resultant ugly duckling was a strange cross between C and IDL. And it was a dead end.

The light at the end of the tunnel was Joe Armstrong's paper describing exactly the problems I was fighting with, and his elegant solution: Erlang. I started to learn the language by taking the wrong path again: by reimplementing it.

A handful of major versions and a year and a half later, to my surprise, I had created a usable Erlang-based system. It starts, it runs, it even compiles itself.

The implementation is based on a simple stack-based virtual machine and is, most probably, suboptimal, especially when compared to OTP. But it is very portable and it is small. I call it X Erlang.