OSDev.org

The Place to Start for Operating System Developers

Posted: Fri May 25, 2007 8:47 am

Joined: Thu Oct 21, 2004 11:00 pm
Posts: 248
Do continuations in functional languages capture global state? I don't think so, but I wouldn't bet my life on that.


Posted: Fri May 25, 2007 9:37 am

Joined: Tue Oct 17, 2006 6:06 pm
Posts: 1437
Location: Vancouver, BC, Canada
Crazed123 wrote:
Do continuations in functional languages capture global state? I don't think so, but I wouldn't bet my life on that.


In pure functional languages, global state is immutable, so continuations wouldn't need to capture it.

_________________
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!


Posted: Fri May 25, 2007 11:11 am

Joined: Thu Oct 21, 2004 11:00 pm
Posts: 248
Yes, but what about impure functional languages, like the ever-popular Scheme?


Posted: Fri May 25, 2007 1:08 pm

Joined: Tue Oct 17, 2006 6:06 pm
Posts: 1437
Location: Vancouver, BC, Canada
Crazed123 wrote:
Yes, but what about impure functional languages, like the ever-popular Scheme?


I'm guessing they wouldn't capture global state... It's one of those cases where "if it hurts, stop doing it" (which is not a very robust solution :P ). Are there mutexes in Scheme?



Posted: Fri May 25, 2007 7:25 pm

Joined: Thu Oct 21, 2004 11:00 pm
Posts: 248
The basic Scheme spec doesn't include any form of concurrency, so no. Scheme implementations that add concurrency generally add mutexes along with it.

In an OS context, not capturing global state actually creates a useful area of shared mutable state that continuations can exploit to communicate; we all know that shared-memory communication runs faster than pure message passing.

Actually, I believe Scheme doesn't capture global state in continuations because global state isn't part of the theoretical continuation, but I really should learn more denotational semantics before saying that with certainty.
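
C programmers can see the same split with setjmp/longjmp, which capture control state (the registers and a point in the stack) but nothing in the data segment or heap. A tiny sketch, an analogy rather than Scheme semantics:

Code:
#include <setjmp.h>
#include <stdio.h>

static jmp_buf k;        /* a one-shot, escape-only poor man's continuation */
static int global = 0;   /* global state: NOT captured by setjmp */

int main(void)
{
    if (setjmp(k) == 0) {   /* "capture the continuation" */
        global = 42;        /* mutate global state... */
        longjmp(k, 1);      /* ...then "throw" to the continuation */
    }
    /* control re-enters here; the mutation is still visible */
    printf("global = %d\n", global);   /* prints: global = 42 */
    return 0;
}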


Posted: Fri Aug 31, 2007 5:43 am

Joined: Thu Aug 30, 2007 9:09 pm
Posts: 102
:D Umm...

Actually, internally, threads are very simple to switch between. You just need a single assembly-written interrupt handler, or an ordinary function if you wrap it slightly differently.

switch_context_interrupt_handler:
    pusha                           ; save general-purpose registers on this thread's stack
    mov ebx, [thread_stack_next]    ; ebx -> this thread's slot in the ring of saved ESPs
    mov [ebx], esp                  ; park the current stack pointer in its slot
    ; cycle the slot pointer around the ring: decrement, test against the floor,
    ; wrap back to the top on hitting it (thread_stack_floor/top bound the ring)
    sub ebx, 4
    cmp ebx, thread_stack_floor
    jae .no_wrap
    mov ebx, thread_stack_top
.no_wrap:
    mov [thread_stack_next], ebx
    mov esp, [ebx]                  ; switch to the next thread's stack
    popa                            ; restore the next thread's registers
    iret                            ; return into the next thread

By keeping a separate stack for each thread (a stack is just an array that grows from its end toward its beginning), we only need to swap stacks inside the handler to switch contexts.

The CPU pushes EIP (along with CS and EFLAGS) when the interrupt fires, so the return address is already on the stack when the handler starts; we then push the registers on top of it. Later, when the ring cycles back to this stack, we pop the registers off, EIP is sitting right where the CPU left it, and iret drops us back exactly where the thread was interrupted.

It doesn't get more elegant.

_________________
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.
- C. A. R. Hoare


Posted: Mon Sep 03, 2007 3:22 am

Joined: Tue Jul 10, 2007 5:27 am
Posts: 2935
Location: York, United Kingdom
Avarok: I can assure you that these guys know how to switch threads efficiently. They were discussing the theory of alternative methods.


Posted: Tue Sep 04, 2007 3:17 am

Joined: Tue Jul 10, 2007 12:19 am
Posts: 6
Hi guys,

I read this topic for the first time just now and, to be honest, I don't completely understand the continuation system. So perhaps I shouldn't write this, but since the topic is about thread-related abstractions, I'll give you my idea.

In many of my embedded projects I have used an alternative concept to threads. Instead of threads I use "modules".
While modules are running they have their own CPU context, so they basically look like threads.

Now, let me explain the differences:

1)
Threads are usually loops: you create the thread, then it waits for an event to process, then another one, and so on...
To be able to catch an event at all, you first have to create the thread, which means the thread is eating memory even while it is only waiting.

Modules are more like functions. First you declare the function so the OS and the rest of the world know about it, but that does not start it automatically; someone has to send a "request".
In other words, a module is a black box that accepts "requests" and, after processing them, may return "responses".
The OS is responsible for queueing all incoming requests and calling the module function (as a task). When the function returns, the OS either restarts it with the next request, if there is one, or kills the context.


2)
Threads have priorities. Most OSes are based on a rate-monotonic (RMA) scheduler, so if two threads are ready, the user knows which one will get the CPU; in most cases the system is quite predictable.
What people usually fail to understand here is that to make the system more efficient, threads should be assigned priorities according to their rate, not their importance!

Modules, on the other hand, can be assigned an importance. For example, imagine a mobile device with two modules: a UART and a battery charger.
We can say the charger is more important, because a failure in the charger can blow the device up, while a failure in the UART can only lose a few bytes. We can also attach a time budget to each request: if we want to send something on the UART we'd like it done within milliseconds, while checking the charger once every few seconds is enough. This time is effectively a "laxity" for the task.
One possible approach to deciding which module gains the CPU is to combine the laxity and the importance into a single number, where the laxity counts down toward zero. If the OS has two modules ready to execute, it prefers the one with the lower laxity, so usually the UART has priority over the charger. But once both become critical (zero laxity), the more important one wins.
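
For concreteness, here is roughly how that pick could look in C (a sketch with made-up names, not code from a real kernel):

Code:
#include <stddef.h>

struct module_task {
    int laxity;       /* ticks until the deadline; the timer tick counts it down */
    int importance;   /* tie-breaker once tasks become critical */
    int ready;        /* nonzero if a request is queued for this module */
};

/* Pick the ready task with the lowest laxity; on a tie (e.g. both
   have hit zero laxity), the higher importance wins. */
struct module_task *pick_next(struct module_task *tasks, size_t n)
{
    struct module_task *best = NULL;
    for (size_t i = 0; i < n; i++) {
        struct module_task *t = &tasks[i];
        if (!t->ready)
            continue;
        if (best == NULL
            || t->laxity < best->laxity
            || (t->laxity == best->laxity && t->importance > best->importance))
            best = t;
    }
    return best;
}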

3) Multitasking communication.
With threads, communication between the various tasks is usually hell. The OS provides some tools, but most of the effort remains yours - it is easier to go over TCP/IP than to talk between tasks.

Modules have addresses, so to make a request to another module all you need is its address. You don't care whether the other module is running, busy, etc. You just prepare a request containing the source, the destination, and the time allowed; then you either block the current execution until the response is ready, or you continue (checking for the response later, or declaring that you don't care about it). The OS is responsible for routing the request, even if it goes to a different CPU, host, or network.
Note that requests carry a time frame, so if a task sends a blocking request to a remote module, the local kernel will resume the task on timeout if no response arrives.
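
To make the shape concrete, a request and the two ways of sending it might look like this in C (hypothetical names and types, not from a real implementation):

Code:
#include <stdint.h>

typedef uint32_t module_addr;   /* opaque; could just as well name "sys/dev/serial/uart0/send" */

struct request {
    module_addr source;          /* where the response gets routed back to */
    module_addr destination;     /* which module handles this */
    uint32_t    time_allowed_ms; /* deadline for the whole round trip */
    const void *payload;
    uint32_t    payload_len;
};

/* Blocking send: the kernel routes the request, runs the destination module
   in its own context, and resumes us with the response -- or with a TIMEOUT
   or DESTINATION_UNREACHABLE status once time_allowed_ms expires. */
int send_request_blocking(const struct request *req, void *response, uint32_t *response_len);

/* Non-blocking variant: returns immediately; poll for the response later,
   or never, if you don't care about it. */
int send_request_async(const struct request *req);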



PS
In many cases such a modular system is more efficient than a traditional thread-based OS, especially for embedded systems: you get better scheduling, better use of the resources, and a simpler software design. Once you start writing modules, you define clear interfaces between them - you simply have to.
With threads, who defines what a thread does? Two threads written by different guys usually means trouble, because one of them will decide it is very important and never sleep.

Modules, by contrast, do not loop: they have a piece of work to do, and when they are done they exit, eventually releasing their memory and so on.
Since this is a general concept, much OS-related functionality can use it as well. For example, dynamic memory: when you want to allocate memory you send a request... If there is no free memory at the moment, you wait... If you are lucky, you get the requested memory before your deadline. No extra code is required, and everything stays simple and safe.
It is also very easy to debug such a system and interface to it. Since the OS serves as a router, you can change the routing tables and "replace" some modules (like the display or keyboard), moving them from the target to the host or wherever.

Please excuse me if you think this is not related to the topic ;-)


Posted: Tue Sep 04, 2007 4:31 am

Joined: Thu Aug 30, 2007 9:09 pm
Posts: 102
@James:
Okay, yeah, I just hadn't heard that one yet. I *am* fairly new and just thought it up as being the most obvious solution to me.

Quote:
I read this topic for the first time just now and, to be honest, I don't completely understand the continuation system. So perhaps I shouldn't write this, but since the topic is about thread-related abstractions, I'll give you my idea.


Likewise.

Essentially what you're describing is implementing multiple entry-points for processes, which can act as event handlers? You pass the handler to some other process as an entry-point address, and they simply blind-call it, and hope for the best?

My line of thinking has been that this places a great deal of trust in the called *thing* (for lack of a better word). Having an entry point, scheduler acknowledgement of the task switch, and a thread in the receiving task to handle the request alongside whatever else it was doing makes sense to me, though it sometimes results in a few dozen stacks and entry points in a single process.

That's the concept I hope to run with, unless something else I hear catches my interest.



Posted: Tue Sep 04, 2007 5:27 am

Joined: Tue Jul 10, 2007 12:19 am
Posts: 6
Avarok wrote:
Essentially what you're describing is implementing multiple entry-points for processes, which can act as event handlers? You pass the handler to some other process as an entry-point address, and they simply blind-call it, and hope for the best?


Not exactly.
First, a module needs a public "address", but this is not a memory address. Any addressing scheme can be used - in my stuff I use numbers, but you could use, say, "sys/dev/serial/uart0/send" (just as an example).
Second, you do not call other modules in your own context. Instead you call a system function, passing it the request; the OS then delivers the request to the receiving module, which executes in its own context.
No "blind calls", no hazard! The only thing the modules have in common is requests. If they are on the same machine, the requests/responses can live in physical memory accessible from both processes; otherwise the kernel is responsible for copying the data from machine to machine. In no case can one module access the private data of another module.

Quote:
My line of thinking has been that this employs a great deal of trust in the called *thing* (for lack of a better word).


You only need to trust the OS :-) The OS is responsible for returning you a "timeout" or "destination unreachable" response, so you never wait forever.
You don't care about the "called thing" - you only care about the response. If you like the response, great! If you don't, you have free will ;-)


Posted: Tue Sep 04, 2007 1:15 pm

Joined: Thu Oct 21, 2004 11:00 pm
Posts: 248
I've actually simplified the continuation system since posting it. So-called "virtual processors" are now seen only by the kernel, which at boot starts every processor it can find at the same entry point of the same init program.

A continuation is what remains of an unfinished computation. At the OS level, a continuation is the saved registers and stack of a suspended task. A running program can communicate with another program by sending data to the other program's saved continuation, which is called throwing the continuation.

The operation call_with_current_continuation(function) suspends the running program, turning it into a continuation. It then passes that continuation to the given function of one argument, which runs with that continuation as input. When it wants to, it can throw data to the continuation, thus ceasing to run itself as the continuation resumes on the current processor.
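
To make that concrete, here is a user-space toy of the operation built on POSIX ucontext (all names are mine; the kernel version would save registers and swap stacks directly, but the control flow -- one-shot capture, then throw -- is the same):

Code:
/* One-shot continuations in user space via POSIX ucontext. */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

typedef struct {
    ucontext_t ctx;   /* the saved registers and stack of a suspended computation */
    int used;         /* one-shot: throwing destroys the continuation */
} continuation;

static continuation *current_k;     /* continuation handed to the receiving function */
static volatile int thrown_value;   /* datum passed along with the throw */
static volatile int suspended;      /* distinguishes capture from resumption */
static ucontext_t receiver_ctx;
static char receiver_stack[64 * 1024];

/* Throwing runs the saved computation and destroys the continuation. */
static void throw_to(continuation *k, int value)
{
    if (k->used) { fprintf(stderr, "continuation already used\n"); exit(1); }
    k->used = 1;
    thrown_value = value;
    setcontext(&k->ctx);            /* never returns */
}

/* The "function of one argument": runs with the suspended caller as input. */
static void receiver(void)
{
    printf("receiver: running while the caller is suspended\n");
    throw_to(current_k, 42);        /* cease running; the continuation resumes */
}

static int call_cc(void)
{
    continuation k = { .used = 0 };
    suspended = 0;
    getcontext(&k.ctx);             /* capture: a throw re-enters right here */
    if (suspended)
        return thrown_value;        /* we were resumed by throw_to */
    suspended = 1;
    current_k = &k;
    getcontext(&receiver_ctx);      /* build a fresh context for the receiver */
    receiver_ctx.uc_stack.ss_sp = receiver_stack;
    receiver_ctx.uc_stack.ss_size = sizeof receiver_stack;
    receiver_ctx.uc_link = NULL;
    makecontext(&receiver_ctx, receiver, 0);
    setcontext(&receiver_ctx);      /* run the receiver; we come back via the throw */
    return -1;                      /* unreachable */
}

int main(void)
{
    printf("resumed with %d\n", call_cc());   /* prints: resumed with 42 */
    return 0;
}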

Events are an abstraction over interrupts. You can suspend the current continuation to wait for an event to fire, or you can make an already-saved continuation wait on the event; in this way, events work rather like condition variables. When an event fires, the extremely simple kernel scheduler decides whether to preempt the running computation to run an event handler (it won't preempt running event handlers, to avoid unbounded interrupt recursion).

These abstractions mean that high-level scheduling (beyond preempting normal code for event handlers) can be moved into user-land code that knows best how to handle its load. You could even run a system with 4 processors on which a different scheduler handled each processor.


Posted: Tue Sep 04, 2007 1:20 pm

Joined: Tue Oct 17, 2006 6:06 pm
Posts: 1437
Location: Vancouver, BC, Canada
@miro_atc:
(Aside) Are you by any chance Miro Samek?

Your idea reminds me a lot of "actors" in Erlang, perhaps better known as the "active object" pattern in more mainstream languages. Correct me if I'm wrong, but this is what I see as the two fundamental advantages of what you've described:
  1. By packaging up mutable state into discrete "modules", you avoid a lot of the hazards of conventional multi-threaded programming (race conditions, etc.) This is the basic idea behind all forms of message-passing concurrency.
  2. By decoupling the "modules" from the underlying concurrency mechanism (usually threads), the OS scheduler has a lot more freedom in how it assigns CPU time to the "modules".
Does this summarize it accurately?

From now on I will call your "modules" "actors" instead, since I think this is the generally accepted terminology for what you've described. :)

Where continuations might fit into the picture is in the underlying implementation of actors. If you were to implement each "actor" with a dedicated thread, there would be a lot of overhead, as you mentioned (memory use even if the actor were just waiting for messages, etc.). To make an actor-based system scalable to tens of thousands of actors or more, the system has to be able to run actors in different contexts transparently (e.g. -- sometimes in a dedicated thread, sometimes in the context of the thread sending the request).

Scala has an Actor library that uses a continuation-style technique (not true continuations, since Scala doesn't support them natively) to implement Actors without dedicated threads. An Actor in Scala is implemented as a loop that receives messages and dispatches to handler code based on pattern-matching of the incoming message type. In the lightweight implementation, that loop "suspends itself" by creating a closure of itself and storing it so that it can be resumed later (in Scala's case, the compiler implements closures as anonymous classes, similar to how anonymous delegates and iterators work in C# 2.0). When a client sends a message to the Actor, the system (the Actor library in this case) calls that closure to resume the Actor. If the Actor needs to suspend again, it throws an exception to unwind its execution back to the system "send" function, which can then do other things (dispatch queued messages or more likely send the reply to the original sender).
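
Stripped of Scala's machinery, the trick is small enough to sketch in C (toy names, with a function pointer plus explicit state standing in for a closure, since C has none): the actor "suspends" by storing how it should be resumed, and runs in the sender's context when a message arrives:

Code:
#include <stdio.h>

typedef struct actor actor;
typedef void (*behavior)(actor *self, int msg);

struct actor {
    behavior next;   /* the stored "closure": how to resume this actor */
    int count;       /* actor-local state */
};

/* The system's "send": no dedicated thread -- the actor runs in the
   sender's context, then suspends by storing its next behavior. */
static void send(actor *a, int msg)
{
    behavior k = a->next;
    a->next = NULL;          /* consume the stored continuation */
    k(a, msg);
}

/* An example behavior: handle one message, then "suspend". */
static void counting(actor *self, int msg)
{
    self->count += msg;
    printf("received %d, total %d\n", msg, self->count);
    self->next = counting;   /* suspend: store how to resume later */
}

int main(void)
{
    actor a = { .next = counting, .count = 0 };
    send(&a, 5);
    send(&a, 7);             /* prints: received 7, total 12 */
    return 0;
}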

These papers probably explain it better than I can, although they make more sense if you're familiar with Scala (or at least functional programming in general):

http://lamp.epfl.ch/~phaller/doc/haller07coord.pdf

http://lampwww.epfl.ch/~odersky/papers/jmlc06.pdf



Posted: Wed Sep 05, 2007 2:43 am

Joined: Tue Jul 10, 2007 12:19 am
Posts: 6
Colonel Kernel wrote:
@miro_atc:
(Aside) Are you by any chance Miro Samek?

No chance ;-) He is way more famous than I am. He recently posted an article on embedded.com about ARM, so it seems we work in the same area, but that's all.

About the advantages...

Yes, your summary of the advantages is correct, and probably better than mine; I tend to describe the side effects rather than the reasons. I started with a small embedded system with a slow CPU and very fast peripherals (network) that required fast responses. I wasn't happy with the conventional techniques: if I didn't assign high priority to the peripheral-related stuff, I lost the network under normal traffic; if I did, I killed the rest of the system when the network was heavily loaded.
So I needed adaptive priorities or scheduling... and I arrived at the idea above.
The only problem is that I never implemented a complete OS or RTOS around it.
A few years ago I tried to build an RTOS for ARM using this concept... and I failed ;-) The implementation wasn't as efficient as it had been on the old hardware; it turned out that on ARM the more conventional techniques work better. So now I use my own "conventional" RTOS, but I can't forget the good old days ;-)


About the terms...

Sure, "actors" sounds good. I know there are many "message-passing concurrency" concepts out there and different terms with the same meaning. I started with "modules" because I had to interface real hardware modules, and it was easy to keep the same term for the software interface.


Posted: Wed Sep 05, 2007 4:23 am

Joined: Thu Aug 30, 2007 9:09 pm
Posts: 102
Crazed123:
Quote:
At the OS level, a continuation is the saved registers and stack of a suspended task.


My threads involve exactly that, but I pusha, making it simply a stack, and then I store the ESP somewhere I can find it so I can load the stack up later, popa, and resume.

Quote:
A running program can communicate with another program by sending data to the other program's saved continuation, which is called throwing the continuation.


That sounds cool. If you modified the stack while the thread wasn't running, you could pass asynchronous messages to it that it would process immediately once it resumed. I don't know what might happen if it was already running on a different processor.

Quote:
The operation call_with_current_continuation(function) suspends the running program, turning it into a continuation. It then passes that continuation to the given function of one argument, which runs with that continuation as input. When it wants to, it can throw data to the continuation, thus ceasing to run itself as the continuation resumes on the current processor.


Would this be equivalent to having the thread sleep/wait itself (by interrupt/call etc.), and then having some other process mark it as "awake" again after changing its stack?

Quote:
When an event fires, the extremely-simple kernel scheduler decides whether to preempt the running computation to run an event handler (it won't preempt running event handlers to avoid infinitely recursive interrupts).


This sounds to me like what interrupts already tend to do, but with threaded/continuated interrupt handlers. That can be particularly useful, beyond simply 'running' them, when you do I/O APIC and LAPIC work.

Say you are running x. A low priority interrupt happens, takes over. You thread it. Then a higher priority (via LAPIC) interrupt happens, and you need to go further up again... you need to preserve the state of the low priority interrupt? (they're both using the ESP0 and same registers?)

Also, this might let another CPU pick up the low priority interrupt, simply by sending the thread's stack across?

Just some thoughts, I have never really implemented anything of this nature before.



Posted: Wed Sep 05, 2007 11:14 am

Joined: Thu Oct 21, 2004 11:00 pm
Posts: 248
Avarok wrote:
My threads involve exactly that, but I pusha, making it simply a stack, and then I store the ESP somewhere I can find it so I can load the stack up later, popa, and resume.

And for good reason. A continuation is the saved state of a suspended computation. A thread is a persistent object that comes in two states:

1) Running. The thread's computation is running.
2) Suspended. The thread's computation is saved as a continuation inside the thread.
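
In C terms, something like this (a sketch; the names are mine):

Code:
/* A thread built on top of continuations: a persistent object that is
   either running or holds the continuation of its suspended computation. */
enum thread_state { THREAD_RUNNING, THREAD_SUSPENDED };

struct continuation {
    void *saved_registers;   /* snapshot of the CPU state */
    void *stack;             /* the suspended computation's stack */
};

struct thread {
    enum thread_state state;
    struct continuation k;   /* valid only while state == THREAD_SUSPENDED */
};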

Quote:
That sounds cool. If you modified the stack while the thread wasn't running, you could pass asynchronous messages to it that it would process immediately once it resumed. I don't know what might happen if it was already running on a different processor.

Continuations do not run on top of threads. They run under threads. You can build threads on top of a continuation system.

Since a continuation is a suspended computation, a continuation can never be running. Throwing to a continuation runs the saved computation and destroys the continuation. So... references to "running continuations" are invalid on their face because there's no such thing as a running continuation.

Quote:
Would this be equivalent to having the thread sleep/wait itself (by interrupt/call etc.), and then having some other process mark it as "awake" again after changing its stack?

Just about, yes. The only difference is that continuations can only be used once. So the kernel scheduler doesn't hold a reference to every continuation on the system; it holds references to every event-handler continuation that wants the processor Right Now.

This is the reason for call/cc: your computation suspends itself and passes the created continuation to another computation. User-land manages the continuations instead of a kernel thread scheduler, and user-land can therefore create whatever it wants using these continuations.

Quote:
This sounds to me like what interrupts tend to already do, but with threaded/continuated interrupt handlers. This can be particularly useful to do above simply 'running' them when you do I/O APIC and LAPIC work.

Precisely. Continuations alone allow the creation of cooperative concurrency systems, so I added events in to abstract over machine interrupts and enable preemptive concurrency.

When an event handler runs, it receives the continuation of the computation it preempted, as though that computation had done call/cc(event_handler).

Quote:
Say you are running x. A low priority interrupt happens, takes over. You thread it. Then a higher priority (via LAPIC) interrupt happens, and you need to go further up again... you need to preserve the state of the low priority interrupt? (they're both using the ESP0 and same registers?)

See above. In my basic design, event handlers cannot preempt other event handlers. You could add priorities, in which case a high-priority event handler could preempt a low-priority one just like any other continuation. The high-priority handler would receive a continuation of the low-priority one. The low-priority event-handler's continuation would include the saved data naming the continuation of the non-event-handler computation it originally preempted.

Preempting computations and passing their continuations to event handlers works recursively.

Quote:
Also, this might let another CPU pick up the low priority interrupt, simply by sending the thread's stack across?

No. First of all, other CPUs have their own things to handle, including their own events. Part of the point of a continuation system is to separate the implementation of OS-provided concurrency from the use of hardware parallelism. Each processor runs its own instruction stream without asking a master processor for work.

Any processor can throw to any continuation that the current computation has a valid reference to, but one processor cannot order the others around. This reduces the need to synchronize the processors.

Quote:
Just some thoughts, I have never really implemented anything of this nature before.

Few have.

