OSDev.org

The Place to Start for Operating System Developers
It is currently Tue Apr 16, 2024 9:30 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 36 posts ]  Go to page 1, 2, 3  Next
Author Message
 Post subject: Newbie Memory Management
PostPosted: Sun Feb 26, 2006 6:37 pm 
Appologies if what I'm about to ask is appauling newbie-ish or covered by another topic (I did search first however), but as I've only just started to play with the OS Dev./Kernel Design I'm feeling a tiny bit out of my depth :)

I've got a very basic Hello World Kernel up and running from Joachim Nock and K.J.'s tutorial on Bona Fide (a fair task in itself seeing as I come from the rather pampered C dev route of Visual Studio). Now looking through various articles and tutorial kernels it would seem that my next task would be to implement some form of memory management (tho I'm open to other suggestions as to where I could move on from this point). I notice that geekOs and the first Linux Kernel both implement some form of malloc/free for their memory management, now I've dabbled with basic memory management in a few projects (nothing of any real scope tho) and from an ease of use POV (if not nessicarily ease of coding) I've always been very fond of Garbage Collectors. (There is quite a nice article here on various memory management stratergies http://www-128.ibm.com/developerworks/library/l-memory/)

My question here is are there any specific gotcha's I need to be aware of if I'm thinking of using GC as a memory management stratergy in a Kernel?

Many thanks

E


Top
  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 12:54 am 
Offline
Member
Member
User avatar

Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7614
Location: Germany
Non-deterministic behaviour. Effectively you rule out any kind of real-time support for user applications, because you cannot tell when the kernel garbage collector triggers.

My gut feeling regarding GC in kernel space is about the same as for kernels written in Visual Basic: Yuk... don't do it. 8)

_________________
Every good solution is obvious once you've found it.


Top
 Profile  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 1:13 am 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 6:06 pm
Posts: 1437
Location: Vancouver, BC, Canada
Solar wrote:
Non-deterministic behaviour. Effectively you rule out any kind of real-time support for user applications, because you cannot tell when the kernel garbage collector triggers.

My gut feeling regarding GC in kernel space is about the same as for kernels written in Visual Basic: Yuk... don't do it. 8)


Your gut needs to learn about more GC algorithms. :) I would agree that a "stop-the-world" GC in the kernel would be a terrible idea. Having a single GC for the entire OS is also a terrible idea.

However, I think it would be reasonable (depending on the architecture of the rest of the system) for the kernel to have its own local GC, if the right algorithm is chosen. For example, Singularity has a concurrent mark-sweep collector in its kernel that is responsible for reclaiming kernel memory only. Each process then has its own GC, with whatever algorithm is right for that process. Concurrent mark-sweep is supposedly more real-time friendly than other GC algorithms.

Read Singularity's technical report. It's worth it to get a change in perspective once in a while.

@elaverick:
I wouldn't take on learning how to implement a GC at the same time as learning general OSdev. If you're interested in GC, I'd recommend experimenting with it in user-space first before attempting to implement it for your OS.

_________________
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!


Top
 Profile  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 9:12 am 
Offline
Member
Member
User avatar

Joined: Wed Oct 18, 2006 2:31 am
Posts: 5964
Location: In a galaxy, far, far away
elaverick wrote:
My question here is are there any specific gotcha's I need to be aware of if I'm thinking of using GC as a memory management stratergy in a Kernel?


I may need more knowledge of GCs, but as far as i know, they rely heavily on the fact a given structure can be scanned for pointers ... and i don't think that will be practical without support from the compiler or the language.

I mean, let's say you have a structure such as
Code:
device_nfo {
   char *name;            // garbage-collectable pointer
   unsigned dev_id;    // just a raw int.
   vaddr_t logical_buffer; // another raw int, but which holds an address
   dev_info *info;       // another garbage-collectable pointer
   dev_fn *functions; // likely a pointer to in-kernel (statically allocated) stuff.
};


now your garbage collector learns of a pointer to be inspected at 0x1337d474, and it happens to be a device_nfo structure ...
* how does the collector knows it's a device_nfo structure in first place ?
* how does it know words 0 and 3 should be followed as pointers, but not words at 1,2 and 4
* if ever it does know, how does it know the type of pointed elements ?

Not mentionning the fact that a GC typically relocate objects while running...

_________________
Image May the source be with you.


Top
 Profile  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 10:23 am 
Pype.Clicker wrote:
Not mentionning the fact that a GC typically relocate objects while running...

How is that actually implemented in programming languages supporting this? That seems to be very slow, if you have to find out where a variable is, before you could use it, isn't it?


Top
  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 11:13 am 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 6:06 pm
Posts: 1437
Location: Vancouver, BC, Canada
Pype's point about tracking pointers is well taken. That's why most of Singularity is written in a dialect of C#. :)

I've heard of "conservative collectors" for C/C++ that can deal with such a loose type system, but I'm not sure how they work. I think there are papers out there if you google for them...

bluecode wrote:
How is that actually implemented in programming languages supporting this? That seems to be very slow, if you have to find out where a variable is, before you could use it, isn't it?


I don't think that mark-sweep collectors relocate, but I'm not 100% sure. For GCs that do relocate, for each object moved, all references to that object are also changed to point to the new location. There is no extra level of indirection involved.

_________________
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!


Top
 Profile  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 11:53 am 
Colonel Kernel wrote:
bluecode wrote:
How is that actually implemented in programming languages supporting this? That seems to be very slow, if you have to find out where a variable is, before you could use it, isn't it?


I don't think that mark-sweep collectors relocate, but I'm not 100% sure. For GCs that do relocate, for each object moved, all references to that object are also changed to point to the new location. There is no extra level of indirection involved.

I came up with a idea... If you got a local variable that an object A, the compiler makes a pointer to a pointer to object A out of this. If now the address of the object changes the gc just changes the pointer to Object And not the pointer to the pointer. An example:

void test()
{
objectA bar();
objectA foo = bar;
}

The compiler now makes an objectA on the heap. Then it creates the gc's pointer to the objectA on the heap. This pointer should be reference counted. The local variable bar is then assigned a pointer to the gc's pointer to the objectA. The reference count is also incremented. Same goes for the variable foo. Both of these references to the objectA are deleted after the function was executed, so the reference count of the gc's pointer is 0.
The gc's pointer and the real objectA are deleted when the next gc's cleanup cycle is reached, but this can be anytime.

Is this all correct?


Top
  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 3:13 pm 
bluecode wrote:
I came up with a idea... If you got a local variable that an object A, the compiler makes a pointer to a pointer to object A out of this. If now the address of the object changes the gc just changes the pointer to Object And not the pointer to the pointer.


Could be that some GC's out there implement a similar thing, but that is actually the 'extra level of indirection' that Colonel Kernel was talking about.

Whenever your code accesses an object, it will have to look up its real location in the gc's pointer (on the heap, in your case). Furthermore, it will have to lock that pointer somehow, so that the GC can't relocate the object while someone is working on it (or is that the reference counting you proposed?).

Wouldn't it also be better to put all those pointers into a coherent table instead of allocating them on the heap individually? Maybe one table per object type?

AFAIK, the example you provided resembles quite well how 'conservative GCs' for C++ work. But IMO the best results with garbage collection can only be achieved with a special language, or at least a special compiler, which handles locking, caching and reference loops mostly on its own.

cheers Joe


Top
  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 3:23 pm 
JoeKayza: at least in the .NET world you do not have to look up a pointer in your code (or what the JIT does for you), instead the GC temporarly halts your threads and updates *your* pointers when memory blocks move (since it is a compacting GC).

However, as previously indicated Microsoft's research OS, Singularity, uses a mark and sweep GC in its kernel (which I assume follows a non-moving strategy). Whereas each process has its own GC which can be of a different type.

In other words, depending on where you want to use such a construct it is probably wise to try different methods and see which works best in a kernel. You definitely do not want it to stall critical stuff. And since you own the code in the kernel it should be more "easy" to optimize the whole system.

With that said, I think it makes more sense in a .NET / Java (etc.) environment (what Singularity uses) than in a world where people can manipulate pointers themselves (and yes, I do know you can manipulate pointers in .NET, but that produces unsafe binaries which Singularity in this example won't run).


Top
  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 4:24 pm 
[edit: I'm not talking about interpreted language (including bytecode languages). I'm talking about garbage collection in native code. Just to make things clearer]

Rob wrote:
JoeKayza: at least in the .NET world you do not have to look up a pointer in your code (or what the JIT does for you), instead the GC temporarly halts your threads and updates *your* pointers when memory blocks move (since it is a compacting GC).

Well, that pretty much smells like **** :P Why does a garbage collector would want to save pointer to all pointer that are using one object and modify them when neccessary? Why not modify only one pointer?

Quote:
Whenever your code accesses an object, it will have to look up its real location in the gc's pointer (on the heap, in your case). Furthermore, it will have to lock that pointer somehow, so that the GC can't relocate the object while someone is working on it (or is that the reference counting you proposed?).

If you got e.g. two drivers in your kernel both using a reference to the same pci_driver object, then the gc's can only free the pci_driver object when both drivers release the pci_driver object. The reference counter just tells you how many references to this object exist. So the reference counting is just there that the gc knows which objects to free (if the reference counter hits 0).
As I said above, it is imho a very bad thing (TM) to save pointers to all the pointers referencing an object, instead I suggest using one "level of indirection", where every pointer you use in your program doesn't directly point to the object, but to the gc's pointer to the object. That makes it easy for the gc to relocate objects: It just needs to change one pointer and doesn't need to save all the pointers pointing to one object.
btw. Why would you want to lock an object, when it is used or how do you define "use"? Is having a reference to an object already "using" it or nor? If so how could the two drivers from above access the same pci_driver object?

Quote:
Wouldn't it also be better to put all those pointers into a coherent table instead of allocating them on the heap individually? Maybe one table per object type?

That's just a speed improvement.


Top
  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 4:32 pm 
Offline
Member
Member
User avatar

Joined: Wed Oct 18, 2006 2:31 am
Posts: 5964
Location: In a galaxy, far, far away
Rob wrote:
JoeKayza: at least in the .NET world you do not have to look up a pointer in your code (or what the JIT does for you), instead the GC temporarly halts your threads and updates *your* pointers when memory blocks move (since it is a compacting GC).


and here's where things get nasty for kernel level:
- what if your object is actually a buffer for which you grabbed the physical frames and wrote them in a DMA list ?
- what if a pointer is actually being "cached" into a register in another thread -- even if the thread is halted... or maybe a pointer derivated from the "base type" to an indexed element, etc.

_________________
Image May the source be with you.


Top
 Profile  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 4:51 pm 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 6:06 pm
Posts: 1437
Location: Vancouver, BC, Canada
bluecode wrote:
Why does a garbage collector would want to save pointer to all pointer that are using one object and modify them when neccessary? Why not modify only one pointer?


A compacting collector doesn't save "pointers to all pointers". It has to periodically scan the roots (static variables + registers) and the heap in order to determine which objects are currently being referenced and which are not. Those that are not are garbage. Those that are can be moved to cover up the holes created by the garbage. During this process, the GC sees every pointer anyway, so it need not store pointers to those pointers. I hope this makes sense.

Quote:
and here's where things get nasty for kernel level:
- what if your object is actually a buffer for which you grabbed the physical frames and wrote them in a DMA list ?
- what if a pointer is actually being "cached" into a register in another thread -- even if the thread is halted... or maybe a pointer derivated from the "base type" to an indexed element, etc.


And this is why you don't use a compacting collector in kernel-space.

_________________
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!


Top
 Profile  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 5:13 pm 
bluecode wrote:
Why would you want to lock an object, when it is used or how do you define "use"? Is having a reference to an object already "using" it or nor? If so how could the two drivers from above access the same pci_driver object?


see also:

Quote:
- what if a pointer is actually being "cached" into a register in another thread -- even if the thread is halted...


Whenever you access an object, you'd have to retrieve it from the table (I'm assuming the pointers are now put into a table :P), then do some manipulations on it. A logical optimization (that will probably be applied by some c++ compilers already) is to load the pointer once, cache it into a register and use it multiple times. And this is exactly where you need to lock it: as soon as you retrieve the pointer, the gc must not relocate the object until you are done (and release it again). If you access the object again after releasing it, you will have to do the table lookup again to make sure you got the most recent pointer to the object. With C++, this is quite heavyweight work, you'd need to do this mostly by yourself (to circumvent the compiler's optimizations).

But I agree with Colonel Kernel that a compacting garbage collector is probably not worth it in kernel land anyway. An allocator that reuses objects of the same type (something like a slab allocator) might do a better job to minimize fragmentation.

cheers Joe


Top
  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Mon Feb 27, 2006 5:18 pm 
Pype.Clicker wrote:
and here's where things get nasty for kernel level:
- what if your object is actually a buffer for which you grabbed the physical frames and wrote them in a DMA list ?
- what if a pointer is actually being "cached" into a register in another thread -- even if the thread is halted... or maybe a pointer derivated from the "base type" to an indexed element, etc.


That is actually no problem at all.

(Now remember I'm answering this all on how .NET works, just to give anyone an idea on an implementation that works *user-level*)

1) you can pin objects, while these are pinned their memory will not be moved (or collected, since you have a reference). Of course you wan't to minimize this pinning since it fragments your heap

2) during the .NET GC all threads are paused. The GC assumes all objects are available for collecting and starts walking registers, stacks etc. to see which objects still have references. Those are set to "active" and not collected. So yes, it will check registers (how this exactly works I don't know, but since the system generated the x86 code you can assume it knows at which EIP which registers are in use for what)

Again, this may not work nicely in the kernel, but there are different solutions to that.

For example, all memory that is send from one process to another in Singularity is allocated on a seperate heap called the exchange heap. This is not garbage collected (so classes are not moved in memory!) but reference counted.

I think any DMA is done through there as well (I know it can, just not sure if that is always the case) since such buffers will need to passed to other processes (than the driver) which can only be done through that mechanism [in Singularity]. Otherwise I assume they'll pin that memory block.

bluecode: yeah, we got a bit sidetracked there ;)

I think GC / reference counting in native code is a very tricky thing to do. But since you own the kernel it might be quite possible to do in a good and reliable way. You won't know until you try it (and design it on paper first)

bluecode wrote:
Well, that pretty much smells like **** :P Why does a garbage collector would want to save pointer to all pointer that are using one object and modify them when neccessary? Why not modify only one pointer?


Perhaps I (we) did a poor job explaining it. It is, of course, a difficult subject to discuss without writing a paper on it with pictures and such.

But I've looked quite extensively into how .NET creates its x86 code and how it handles all of this, including the GC.

For example, it is interesting to see the difference in code generated to get to a class in .NET 1.1 versus 2.0. The latter is more optimized than the former.

I can tell you that they (Microsoft) has done a lot of work on making this work and perform well (again, in *user-land* and with non-native [processor] languages). I've read numerous papers on the subject and a lot of thought and performance measurements went into that design.


Top
  
 
 Post subject: Re:Newbie Memory Management
PostPosted: Tue Feb 28, 2006 11:01 pm 
in my mind there is one particular issue with garbage collection and implementing it in the kernel (or in c/c++ in general).

pointers, the main reason why garbage collection is a relatively easily tackled problem in languages like java/c# and like is because they use references to heap objects, not pointers.

What does this mean? It means that every time a line of code that new is involved or more likely any line of code which assigns to a reference variable, the compiler can emit GC helper code. Since we are tracking all modifications to reference variables, we can update our reference counts when references are set to "null" or set to refer to a new object on the heap.

Now why is this a problem in c/c++, can't we use a similar scheme? well yes, but it is harder to be perfect. Things like pointer arithmetic make it a pain in the @$$. Some example code which would probably be annoying to handler properly:

Code:
int *o = new Object; // o pointer to an Object reference count of 1
++o; // oh, boy, o no longer points to that object, reduce it's reference count, may or may not get deleted before next line of code
--o; // hrmm, it is now a proper pointer to our object again, does it still exist? are we a valid reference again? lots of questions here


now, you could say, "why would i write such code?!" and of course you would be justified, but the language _allows_ it. with just references, things like pointer arithmetic simply aren't a factor.

So while I think it is doable, I wouldn't really encourage it, not in a language with pointers which can literally point to anywhere.

proxy


Top
  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 36 posts ]  Go to page 1, 2, 3  Next

All times are UTC - 6 hours


Who is online

Users browsing this forum: No registered users and 512 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group