OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




[ 33 posts ]  Page 1 of 3
 Post subject: sync on a smp system (again)
Posted: Tue Nov 15, 2005 2:36 am
OK, I haven't done much in the last few weeks, because I can't solve my problem. We had an earlier discussion about thread locking (the scheduler acquires a thread's spinlock when that thread is about to run and releases it when the thread has finished running), and I adopted Pype's idea. This worked at first.

I think I have to explain the problem again in detail. My OS works on an SMP machine, but not when I'm using semaphores :(

I have 2 threads, and both only print a number. I protect the print function with a semaphore. The code works for a short time, then I get a page fault because a field in a struct is NULL, and this field can't be NULL! The problem occurs when the semaphore code calls the function that adds the thread to the ready queue.

This is the function:
Code:
;----------------------------
PROC scheduler_add_scheduler, ptr2thread
;----------------------------
BEGIN
   cli

   CALL spinlock_acquire, schdl_spin
;----------------------------
;   add thread to ready queue
   mov esi,[ptr2thread]               ;esi= actThread
   mov ebx,1
   mov ecx,[esi+thread_t.dyn_prio]
   mov edi,[ready_queue_ptr]
   shl ebx,cl

   or [ready_queue_bitmap],ebx

   mov eax,[edi+4*ecx]                  ;eax= firstThread
   xor edx,edx

   test eax,eax
   jz .first

   mov ebx,[eax+thread_t.prev]            ;ebx= firstThread.prev= lastThread
   mov [esi+thread_t.prev],ebx            ;actThread.prev= lastThread
   mov [esi+thread_t.next],edx            ;actThread.next= NULL
   mov [eax+thread_t.prev],esi            ;firstThread.prev= actThread
   mov [ebx+thread_t.next],esi            ;lastThread.next= actThread

   jmp .end
;----------------------------
;   it is the 1st thread in this priority queue
align 4
.first:
   mov [esi+thread_t.prev],esi            ;actThread.prev= firstThread.prev= actThread
   mov [esi+thread_t.next],eax            ;actThread.next= firstThread.next= NULL
   mov [edi+4*ecx],esi
;----------------------------
align 4
.end:
   or dword[schdl_flags],SCHEDULER_RESCHEDULE

   CALL spinlock_release, schdl_spin

   sti

   RETURN
ENDP
;----------------------------

I load the head of the queue into eax and then test whether eax is NULL, and it isn't. So I load ebx with the end of the queue (firstThread.prev). This value can't be NULL, but it is, and so I get a page fault when the code uses ebx as a base pointer.
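For anyone following along: the ready queue here uses a compact convention where the head's prev points at the tail and the tail's next is NULL, so "firstThread.prev" should never be NULL while the queue is non-empty. A small checker like this C sketch (struct and function names are hypothetical, not from the posted code) could be run after every queue operation, so the corruption is caught where it happens instead of at the later page fault:

```c
#include <stddef.h>

/* Hypothetical C mirror of the thread_t link fields used in the asm. */
typedef struct thread {
    struct thread *prev;   /* head->prev = tail; otherwise the previous thread */
    struct thread *next;   /* NULL for the tail */
    int id;
} thread_t;

/* Returns 1 if the queue obeys the "head->prev is the tail" convention.
   max_len bounds the walk so a corrupted, cyclic list cannot hang the check. */
static int queue_is_consistent(const thread_t *head, size_t max_len)
{
    if (head == NULL)
        return 1;                          /* an empty queue is fine */
    const thread_t *tail = head->prev;
    if (tail == NULL || tail->next != NULL)
        return 0;                          /* the state scheduler_add_scheduler faults on */
    const thread_t *t = head;
    for (size_t n = 0; n < max_len; n++) {
        if (t->next == NULL)
            return t == tail;              /* reached the end: it must be the tail */
        if (t->next->prev != t)
            return 0;                      /* broken back link */
        t = t->next;
    }
    return 0;                              /* longer than max_len: probably a cycle */
}
```

The length bound matters: with this list layout, a thread that gets inserted twice ends up with next pointing at itself, and an unbounded walk would spin forever.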

-> see the next post


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 2:36 am
This is my semaphore code:
Code:
;----------------------------
PROC semaphore_acquire_smp, sem
;----------------------------
BEGIN
;----------------------------
;   make sure we are the only one to work on the semaphore
   cli

   CALL spinlock_acquire, dword[sem]

   lock sub dword[esi+semaphore_t.count],1
   jz .end
;----------------------------
;   look if we have to wait or if we can go right away
   cmp dword[esi+semaphore_t.count],0
   jg .end
;----------------------------
;   we have to wait
   APIC_GET_ID eax
   mov ebx,[cpu_ptr+4*eax]
   mov edi,[esi+semaphore_t.threads]      ;edi= firstThread
   mov eax,[ebx+cpu_t.schdl_act_thread]   ;eax= actThread

   test edi,edi
   jz .first

   mov ebx,[edi+thread_t.prev]            ;ebx= firstThread.prev= lastThread
   xor ecx,ecx
   mov [eax+thread_t.prev],ebx            ;actThread.prev= lastThread
   mov [eax+thread_t.next],ecx            ;actThread.next= NULL
   mov [edi+thread_t.prev],eax            ;firstThread.prev= actThread
   mov [ebx+thread_t.next],eax            ;lastThread.next= actThread

   jmp .scheduler
;----------------------------
;   we are the first thread
align 4
.first:
   mov [esi+semaphore_t.threads],eax

   mov [eax+thread_t.prev],eax            ;actThread.prev= firstThread.prev= actThread
   mov [eax+thread_t.next],edi            ;actThread.next= firstThread.next= NULL
;----------------------------
;   the scheduler has to know that this thread wants to wait
align 4
.scheduler:
   or dword[eax+thread_t.flags],THREAD_WAIT or THREAD_RESCHEDULE

   CALL spinlock_release, dword[sem]

   CALLINT scheduler_reschedule_smp

   sti

.end_wait:
   RETURN
;----------------------------
align 4
.end:
   CALL spinlock_release, dword[sem]

   sti

   RETURN
ENDP
;----------------------------

;----------------------------
PROC semaphore_release, sem
;----------------------------
BEGIN
;----------------------------
;   make sure we are the only one to work on the semaphore
   cli

   CALL spinlock_acquire, dword[sem]

   lock add dword[esi+semaphore_t.count],1
;----------------------------
;   look if we need to awake a thread
   cmp dword[esi+semaphore_t.count],0
   jg .end
;----------------------------
;   we have to awake the thread on the top of the queue
   mov eax,[esi+semaphore_t.threads]      ;eax= firstThread
   mov ebx,[eax+thread_t.next]            ;ebx= firstThread.next= secondThread
   mov ecx,[eax+thread_t.prev]            ;ecx= firstThread.prev= lastThread

   test ebx,ebx
   jz .last
;----------------------------
;   put the 2nd thread onto the top of the queue and put the last thread into the 2nd thread's prev ptr
   mov [ebx+thread_t.prev],ecx            ;secondThread.prev= lastThread
   mov [esi+semaphore_t.threads],ebx      ;firstThread= secondThread

   jmp .scheduler
;----------------------------
;   there is no more thread on the queue
align 4
.last:
   mov [esi+semaphore_t.threads],ebx
;----------------------------
;   scheduler needs to awaken the thread in eax
.scheduler:
   and dword[eax+thread_t.flags],not THREAD_WAIT

   push eax

   CALL spinlock_release, dword[sem]

   sti

   CALL scheduler_add_scheduler         ;par is in pushed eax
;----------------------------
.end_awaken:
   RETURN
;----------------------------
align 4
.end:
   CALL spinlock_release, dword[sem]

   sti

   RETURN
ENDP
;----------------------------

The only time my code writes NULL into the "thread_t.prev" and "thread_t.next" fields is when the thread is dequeued from the ready queue because it is the next to run! And with the spinlock for every thread, I thought this situation couldn't happen.


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 7:02 am
Member

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

I think the problem is in "semaphore_acquire_smp" where it adds a thread to the semaphore's queue, and "semaphore_release" where it takes a thread off the queue.

To be honest, I drew little diagrams on paper of what each thread's previous and next links would point to each time a new thread was added to the queue. Then I did it a second time, typing it out, and I'm still not quite sure what's wrong or what needs to change to make it right. I only know it's not right (grab some paper and try it!).

IMHO the main problem is that you're trying to implement a FIFO queue using a doubly linked list. Instead, try a singly linked list and record the "last" thread in the semaphore itself.
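A minimal C sketch of that idea, assuming hypothetical `link`, `head` and `tail` fields (the asm below reuses `thread_t.prev` as the link): new waiters are appended at the tail, the oldest waiter is removed from the head, and the tail is cleared when the queue becomes empty. The caller is assumed to hold the semaphore's spinlock.

```c
#include <stddef.h>

typedef struct thread {
    struct thread *link;   /* next (younger) waiter; NULL for the tail */
    int id;
} thread_t;

typedef struct {
    thread_t *head;        /* oldest waiter, woken first */
    thread_t *tail;        /* youngest waiter, new threads are appended here */
} wait_queue_t;

/* Append t at the tail. */
static void wq_push(wait_queue_t *q, thread_t *t)
{
    t->link = NULL;
    if (q->tail == NULL)
        q->head = t;           /* queue was empty: t is also the head */
    else
        q->tail->link = t;     /* old tail now points at t */
    q->tail = t;
}

/* Remove and return the oldest waiter, or NULL if the queue is empty. */
static thread_t *wq_pop(wait_queue_t *q)
{
    thread_t *t = q->head;
    if (t == NULL)
        return NULL;
    q->head = t->link;
    if (q->head == NULL)
        q->tail = NULL;        /* empty again: don't leave a stale tail behind */
    t->link = NULL;
    return t;
}
```

Keeping the tail pointer in the semaphore is what makes the append O(1) without needing a back link in every thread.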

For example, for the start of "semaphore_acquire":

Code:
;----------------------------
PROC semaphore_acquire_smp, sem
;----------------------------
BEGIN
;----------------------------

;Make sure we are the only one to work on the semaphore

  cli
  CALL spinlock_acquire, dword[sem]

;Try to acquire semaphore

  lock sub dword[esi+semaphore_t.count],1     ;Is it acquired?
  jge .end                                    ; yes, done

;Prepare to add current thread to semaphore's queue

  APIC_GET_ID eax
  mov ebx,[cpu_ptr+4*eax]                     ;ebx = address for this CPU's data
  mov edi,[esi+semaphore_t.tail]              ;edi = thread at tail of queue
  mov eax,[ebx+cpu_t.schdl_act_thread]        ;eax = current thread

  test edi,edi                                ;Is there a thread on the queue?
  jz .first                                   ; no, this thread is the first thread

;Add current thread to other threads on semaphore's queue

  mov [edi+thread_t.prev],eax                 ;Link last thread to the current thread
  mov [esi+semaphore_t.tail],eax              ;Set queue tail to current thread
  jmp .scheduler

;Create new semaphore queue with first thread

.first:
  mov [esi+semaphore_t.head],eax              ;Set queue head to current thread
  mov [esi+semaphore_t.tail],eax              ;Set queue tail to current thread
  jmp .scheduler

Then, for the start of "semaphore_release":

Code:
;----------------------------
PROC semaphore_release, sem
;----------------------------
BEGIN
;----------------------------

;Make sure we are the only one to work on the semaphore

  cli
  CALL spinlock_acquire, dword[sem]

;Release the semaphore and check if another thread needs to wake up

  add dword[esi+semaphore_t.count],1
  jg .end

;Remove thread from queue

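  mov eax,[esi+semaphore_t.head]              ;eax = oldest thread on queue
  mov ebx,[eax+thread_t.prev]                 ;ebx = next oldest thread on queue
  mov [esi+semaphore_t.head],ebx              ;Set queue head to next oldest thread
  test ebx,ebx                                ;Was that the only thread on the queue?
  jnz .wake                                   ; no, the tail is still valid
  mov [esi+semaphore_t.tail],ebx              ;Queue is now empty, so clear the tail too
.wake:

;Wake up thread in EAX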

I think that should work..
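The counting rule itself can be checked separately from the queue. This C sketch models the convention used by both versions above (an assumption read off the asm, not code from the thread): a positive count is the number of free units, a negative count is minus the number of queued waiters. The original `jz .end` followed by `cmp ...,0` / `jg .end` takes the same branches as the single `jge .end`.

```c
#include <stdbool.h>

/* Hypothetical counter-only model of the semaphore. The wait queue and the
   spinlock are left out; the caller is assumed to hold the lock. */
typedef struct { int count; } sem_model_t;

/* Mirrors "sub count,1 / jge .end": returns true if the caller must block. */
static bool sem_down_must_block(sem_model_t *s)
{
    s->count -= 1;
    return s->count < 0;
}

/* Mirrors "add count,1 / jg .end": returns true if a waiter must be woken. */
static bool sem_up_must_wake(sem_model_t *s)
{
    s->count += 1;
    return s->count <= 0;
}
```

Starting from a count of 1, the first down acquires, the second blocks, the first up wakes that waiter, and the second up just makes the unit free again.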


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 8:29 am
As I said, the problem is not the semaphore code but the function "scheduler_add_scheduler", because on a uniprocessor system the code works perfectly.

I don't know how it can happen that this function is running on one CPU while the code for dequeueing a thread is running on another CPU, and both work on the same thread!

These are 2 functions of my scheduler which work with threads:
Code:
;----------------------------
PROC scheduler_enqueue_smp
;----------------------------
BEGIN
;----------------------------
;   save esp
   mov ebx,cr0
   mov [eax+thread_t.esp3],ecx
;----------------------------
;   test if need to save the fpu
   test ebx,8
   jne .go_on

   mov edi,[eax+thread_t.ptr2fpu]
;----------------------------
;   save fpu env
.fpu_save:
   fxsave [edi]

   or ebx,8
   mov cr0,ebx
;----------------------------
;   look if we just ran the idle thread
align 4
.go_on:
   push eax

   CALL spinlock_acquire, schdl_spin

   pop eax

   cmp eax,dword[edx+cpu_t.idle_thread]
   je .end
;----------------------------
;   get and check the flags of the thread
   mov ebx,[eax+thread_t.flags]
   and dword[eax+thread_t.flags],not THREAD_RESCHEDULE

   test ebx,THREAD_KILL
   jne .kill
   test ebx,THREAD_WAIT
   jne .wait
   test ebx,THREAD_SLEEP
   jne .sleep
;----------------------------
;   enqueue the thread into the ready queue
;   test dword[cpu0.scheduler_flags],scheduler_reschedule
;   jne .do_it

;   cmp dword[eax+thread_t.dyn_prio],0
;   je .do_it

;   sub dword[eax+thread_t.dyn_prio],1
;----------------------------
;   put it onto the ready queue
.do_it:
   mov ecx,[eax+thread_t.dyn_prio]
   mov ebx,1
   mov esi,[ready_queue_ptr]
   shl ebx,cl

   or [ready_queue_bitmap],ebx

   mov edi,[esi+4*ecx]                  ;edi= firstThread
   xor edx,edx

   test edi,edi
   jz .first

   mov ebx,[edi+thread_t.prev]            ;ebx= firstThread.prev= lastThread
   mov [eax+thread_t.prev],ebx            ;actThread.prev= lastThread
   mov [eax+thread_t.next],edx            ;actThread.next= NULL
   mov [edi+thread_t.prev],eax            ;firstThread.prev= actThread
   mov [ebx+thread_t.next],eax            ;lastThread.next= actThread

   jmp .end
;----------------------------
;   it is the first thread in this priority
align 4
.first:
   mov [eax+thread_t.prev],eax            ;actThread.prev= firstThread.prev= actThread
   mov [eax+thread_t.next],edi            ;actThread.next= firstThread.next= NULL
   mov [esi+4*ecx],eax

   jmp .end
;----------------------------
;   the thread waits for something in a wait queue
align 4
.wait:
;   cmp dword[eax+thread_t.dyn_prio],31
;   je .test_schd_queue

;   add dword[eax+thread_t.dyn_prio],1

   jmp .end
;----------------------------
;   the owner of the thread is going to be killed
align 4
.kill:
   xor ecx,ecx
   mov [eax+thread_t.prev],ecx
   mov [eax+thread_t.next],ecx

   jmp .end
;----------------------------
;   the thread wants to sleep
align 4
.sleep:
   ;to-do

   jmp .end
;----------------------------
align 4
.end:
   and dword[schdl_flags],not SCHEDULER_RESCHEDULE

   add eax,thread_t.flags
   CALL spinlock_release, eax

   RETURN
ENDP
;----------------------------


-> see next post


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 8:29 am
Code:
;----------------------------
PROC scheduler_dequeue_intel_smp, this_cpu
;----------------------------
BEGIN
;----------------------------
;   get the thread with the highest priority
   mov eax,[run_queue_bitmap]
   mov esi,[run_queue_ptr]
;----------------------------
;   get the highest priority thread
.get_thread:
   bsr eax,eax
   jz .change_queues

   mov ebx,[esi+4*eax]                  ;ebx= firstThread
   mov edx,[ebx+thread_t.next]            ;edx= firstThread.next= secondThread
   mov ecx,[ebx+thread_t.prev]            ;ecx= firstThread.prev= lastThread

   test edx,edx
   jz .last

   mov [esi+4*eax],edx                  ;firstThread= secondThread
   mov [edx+thread_t.prev],ecx            ;secondThread.prev= lastThread

   mov eax,ebx
;----------------------------
;   move the needed values from the thread struc into the right regs
.init:
   mov edx,[this_cpu]
   cmp eax,[edx+cpu_t.schdl_act_thread]
   je .set_time

   mov [edx+cpu_t.schdl_act_thread],eax
;----------------------------
;   set base addr for fs and gs regs
   push eax

   CALL gdt_set_base, dword[edx+cpu_t.fs], eax

   pop eax

   add eax,thread_t.free_start
   CALL gdt_set_base, dword[edx+cpu_t.gs], eax
;----------------------------
;   write the esp value for the ring0 code into the msr reg
   mov eax,[fs:thread_t.esp0]
   mov ecx,176h
   xor edx,edx

   wrmsr

   jmp .set_time
;----------------------------
;   this is the last thread in the queue, so del the bit in the bitmap
align 4
.last:
   xor ecx,ecx
   mov edx,1
   mov [esi+4*eax],ecx
   mov ecx,eax
   shl edx,cl

   xor [run_queue_bitmap],edx

   mov eax,ebx

   jmp .init
;----------------------------
;   we have to change the queues
align 4
.change_queues:
   mov eax,[run_queue_bitmap]
   mov ebx,[run_queue_ptr]
   xchg eax,[ready_queue_bitmap]
   xchg ebx,[ready_queue_ptr]
   mov [run_queue_bitmap],eax
   mov [run_queue_ptr],ebx

   mov eax,[run_queue_bitmap]
   mov esi,[run_queue_ptr]

   test eax,eax
   jnz .get_thread
;----------------------------
;   there is no thread so we take the idle thread
   mov edx,[this_cpu]
   mov eax,[edx+cpu_t.idle_thread]

   jmp .init
;----------------------------
align 4
.set_time:
;----------------------------
;   we removed the thread from the queue and we need to give the thread some time to run
   CALL spinlock_release, schdl_spin

   mov edx,[this_cpu]
   xor ebx,ebx
   mov eax,[edx+cpu_t.schdl_act_thread]
   mov ecx,32

   mov [eax+thread_t.prev],ebx
   sub ecx,[eax+thread_t.dyn_prio]
   mov [eax+thread_t.next],ebx

   cmp ecx,32
   jne .end

   mov ecx,1
;----------------------------
.end:
   mov [eax+thread_t.time2run],ecx

   push eax

   add eax,thread_t.flags
   CALL spinlock_acquire, eax

   pop eax

   RETURN
ENDP
;----------------------------
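For readers less familiar with this design: there are two priority arrays, each with a 32-bit bitmap (bit p set when priority p has threads) and one list per priority. BSR finds the highest set bit, and when the run array is empty it is swapped with the ready array, similar to the active/expired arrays of the Linux 2.6 O(1) scheduler. A simplified C sketch of just the selection logic, with singly linked lists and hypothetical names, might look like this:

```c
#include <stddef.h>
#include <stdint.h>

#define NPRIO 32

/* Hypothetical, simplified thread: singly linked, priority 0..31. */
typedef struct thread {
    struct thread *next;
    int prio;
} thread_t;

/* One priority array: bit p of bitmap is set iff heads[p] is non-empty. */
typedef struct {
    uint32_t bitmap;
    thread_t *heads[NPRIO];
} prio_array_t;

/* Index of the highest set bit, like BSR. x must be non-zero. */
static int highest_prio(uint32_t x)
{
    int p = 31;
    while (!(x & (UINT32_C(1) << p)))
        p--;
    return p;
}

/* Append t to its priority list and set the matching bitmap bit. */
static void push(prio_array_t *q, thread_t *t)
{
    thread_t **pp = &q->heads[t->prio];
    while (*pp != NULL)
        pp = &(*pp)->next;
    t->next = NULL;
    *pp = t;
    q->bitmap |= UINT32_C(1) << t->prio;
}

/* Take the first thread of the highest non-empty priority from *run.
   If *run is empty, swap it with *ready (like .change_queues) and retry once.
   NULL means both are empty; the real code would pick the idle thread. */
static thread_t *pick_next(prio_array_t **run, prio_array_t **ready)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        prio_array_t *q = *run;
        if (q->bitmap != 0) {
            int p = highest_prio(q->bitmap);
            thread_t *t = q->heads[p];
            q->heads[p] = t->next;
            if (q->heads[p] == NULL)
                q->bitmap &= ~(UINT32_C(1) << p);  /* like the xor in .last */
            t->next = NULL;
            return t;
        }
        prio_array_t *tmp = *run;                   /* .change_queues */
        *run = *ready;
        *ready = tmp;
    }
    return NULL;
}
```

Note that, as in the posted code, newly enqueued threads only go to the ready array, so they cannot run again until the run array drains and the two are swapped.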


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 10:19 am
Quote:
As I said the problem is not the semaphore code [..] Because on an uni cpu system the code works perfectly.


It might just be me, but that assumption makes no sense. A multi-cpu system cares a lot more about semaphores than a single-cpu one.


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 11:30 am
Yeah, but the code can only run on one CPU at a time! That is guaranteed by the spinlock, so the linked-list code is working. Also, the problem appears when a CPU is running the "scheduler_add_scheduler" code.


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 12:09 pm
FlashBurn, just because code doesn't _appear_ to be related doesn't mean it isn't. I once had my console output function cause random data corruption and sometimes a triple fault, because I had a small flaw in my locking logic when outputting characters. And you know what, it didn't always seem like the console was the problem... sometimes it looked like I had a NULL pointer somewhere.

So please don't rule out broken synchronization primitives just because it looks like it might be something else. You may be right, but you may also be wrong.
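To make that concrete, here is one interleaving that seems consistent with the posted code, replayed in C with a hypothetical single-priority model of the ready queue (this is a reconstruction, not something confirmed in the thread). The per-thread spinlock only stops two CPUs from running the same thread; it does not stop "semaphore_release" on CPU 1 from putting a thread on the ready queue while that thread is still executing on CPU 0, after it released the semaphore lock but before it called "scheduler_reschedule_smp". With two CPUs and two threads:

```c
#include <stddef.h>

/* Hypothetical C model of the posted ready queue (one priority level only):
   head->prev is the tail, and the tail's next is NULL. */
typedef struct thread {
    struct thread *prev;
    struct thread *next;
} thread_t;

/* Same steps as scheduler_add_scheduler and the .do_it path of
   scheduler_enqueue_smp. */
static void enqueue(thread_t **head, thread_t *t)
{
    thread_t *first = *head;
    if (first == NULL) {            /* .first: t is the only thread */
        t->prev = t;
        t->next = NULL;
        *head = t;
        return;
    }
    thread_t *last = first->prev;   /* in the kernel this faults if NULL */
    t->prev = last;
    t->next = NULL;
    first->prev = t;
    last->next = t;
}

/* Same steps as .get_thread/.last in scheduler_dequeue_intel_smp, followed
   by the link clearing that .set_time does after releasing schdl_spin. */
static thread_t *dequeue(thread_t **head)
{
    thread_t *first = *head;
    thread_t *second = first->next;
    if (second != NULL) {
        *head = second;
        second->prev = first->prev;
    } else {
        *head = NULL;
    }
    first->prev = NULL;
    first->next = NULL;
    return first;
}

/* Replays one two-CPU interleaving and returns the ready-queue head. */
static thread_t *replay_race(thread_t *t)
{
    thread_t *ready = NULL;

    /* CPU 1: semaphore_release clears THREAD_WAIT on t and calls
       scheduler_add_scheduler, although t is still running on CPU 0. */
    enqueue(&ready, t);

    /* CPU 0: scheduler_enqueue_smp finds THREAD_WAIT already clear and
       enqueues t a second time; now t->next == t. */
    enqueue(&ready, t);

    /* CPU 0: the dequeue picks t, but since t->next == t the head is
       "advanced" to t itself, and then t's links are zeroed. */
    (void)dequeue(&ready);

    return ready;                   /* still t, now with t->prev == NULL */
}
```

After this sequence the ready-queue head is a thread whose prev is NULL, which is exactly what "scheduler_add_scheduler" trips over on the next wakeup. If this is what happens, one common way out is to make the waker skip (or defer) the ready-queue insertion while the woken thread has not yet been switched out, so that only one path ever enqueues it.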

proxy


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 12:57 pm
I don't know if I'm right, but my OS gives me a page fault in this function, and only 2 things write to those 2 fields of the thread struct: the scheduler and the semaphores.


 Post subject: Re:sync on a smp system (again)
Posted: Tue Nov 15, 2005 11:24 pm
Hi,

Sorry about my last post; I think you're right. I had a good sleep (> 12 hours) and then worked through the enqueue/dequeue code again.

To be specific, for "semaphore_acquire_smp", "spinlock_acquire", "scheduler_enqueue_smp" and "scheduler_dequeue_intel_smp" I checked the enqueue/dequeue code only. For example, for "scheduler_dequeue_intel_smp" I only checked the following parts:

Code:
   mov ebx,[esi+4*eax]                  ;ebx= firstThread
   mov edx,[ebx+thread_t.next]            ;edx= firstThread.next= secondThread
   mov ecx,[ebx+thread_t.prev]            ;ecx= firstThread.prev= lastThread

   test edx,edx
   jz .last

   mov [esi+4*eax],edx                  ;firstThread= secondThread
   mov [edx+thread_t.prev],ecx            ;secondThread.prev= lastThread

Code:
;----------------------------
;   this is the last thread in the queue, so del the bit in the bitmap
align 4
.last:
   xor ecx,ecx
   mov edx,1
   mov [esi+4*eax],ecx


And the equivalent parts of the other functions. This leaves a large amount of code I didn't check, but it does mean that what I described in my earlier post wasn't the problem.


Could you add debugging code and run it through Bochs? After 2 weeks of this, I'd be using Bochs' I/O port 0xE9 to log full information whenever any of the code changes anything...

For example, every time any of these functions is used you could display the function name, the CPU that is using it and which thread is running it. You could also dump the contents of each queue every time any queue changes.

I'd start by writing temporary/debugging code that creates something like:

semaphore_acquire_smp: entered - CPU=0, thread=0x1234
spinlock acquired - CPU=0, lock address=0x12345678
semaphore_acquire_smp: thread queued - CPU=0, thread=0x1234
spinlock released - CPU=0, lock address=0x12345678
scheduler_reschedule_smp: entered - CPU=0, thread=0x1234
THREAD SWITCH: CPU=0 now running thread=0x2345

It's a lot of work, but it's an extremely powerful method of finding bugs. It can also be a good idea to wrap some of it in conditional code so that it can be re-used each time there are bugs (rather than removing it all when the current problem is fixed)...
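A sketch of that kind of conditional trace in C (hypothetical names; the thread's code is assembly, so treat this only as an outline of the idea). In the kernel, `trace_write` would send each byte to Bochs' port 0xE9 with an `out` instruction; here it appends to a buffer so the output can be checked. Setting `DEBUG_TRACE` to 0 compiles every call away, so the calls can stay in the source between bug hunts.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical trace sink: appends to a buffer instead of port 0xE9. */
static char trace_buf[1024];
static size_t trace_len;

static void trace_write(const char *s)
{
    size_t n = strlen(s);
    if (n > sizeof trace_buf - 1 - trace_len)
        n = sizeof trace_buf - 1 - trace_len;   /* truncate instead of overflowing */
    memcpy(trace_buf + trace_len, s, n);
    trace_len += n;
    trace_buf[trace_len] = '\0';
}

#ifndef DEBUG_TRACE
#define DEBUG_TRACE 1   /* build with -DDEBUG_TRACE=0 to remove all tracing */
#endif

#if DEBUG_TRACE
#define TRACE(fn, what, cpu, thread) do {                                  \
        char line_[128];                                                    \
        snprintf(line_, sizeof line_, "%s: %s - CPU=%u, thread=0x%lx\n",  \
                 (fn), (what), (unsigned)(cpu), (unsigned long)(thread));   \
        trace_write(line_);                                                 \
    } while (0)
#else
#define TRACE(fn, what, cpu, thread) do { } while (0)
#endif
```

For example, `TRACE("semaphore_acquire_smp", "entered", 0, 0x1234);` produces the first line of the log above. On SMP the trace output itself needs its own lock (or one buffer per CPU), otherwise lines from different CPUs interleave mid-line; that extra locking also changes timing, which may make a race harder to reproduce.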

I've attached a file containing some "helper routines" - these routines are what I use as a basis for this style of debugging (it's a small start, I know)...


Cheers,

Brendan

 Post subject: Re:sync on a smp system (again)
Posted: Fri Nov 18, 2005 6:32 am
I will write such debug functions (I already have code for printing via port 0xE9 and use it for my exceptions) and test them. It will be some time before I post again, because I will only have internet access again in 2 weeks!


 Post subject: Re:sync on a smp system (again)
Posted: Sun Nov 20, 2005 2:59 am
OK, I've written that test code, and with it in place everything worked :( I've also tested my code on a real SMP machine, and it worked there too :( So the problem lies in Bochs!

Maybe some of you can test my code on an SMP system?!


 Post subject: Re:sync on a smp system (again)
Posted: Sun Nov 20, 2005 11:54 am
Do not assume that when code works on a real machine but not on an emulator, the problem is with the emulator.

The PC is a loose standard: not all computers are guaranteed to behave the same (in fact they usually don't).
Bochs emulates a single, specific (perfect) computer, and if your code doesn't work in Bochs it probably won't work on many real computers (even if it does work on most).


 Post subject: Re:sync on a smp system (again)
Posted: Sun Nov 20, 2005 2:25 pm
Hi,

JAAman wrote:
do not assume that when code works on a real machine and not on a emulator, that the problem is with the emulator

the pc is a loose standard: not all computers are gaurenteed to work the same (in fact they usually don't)
bochs emulates a single, specific (perfect) computer, and if it doesn't work in bochs it prob wont work on many real computers (even if it does work on most)


Especially for re-entrancy problems and the like, where the exact timing between things can make a huge difference: a tiny change in timing can make the problem (or at least its symptoms) go away.

@FlashBurn: I tried it on Bochs (with 2, 4 and 8 CPUs) and it seemed to work.

Have you got a bootable floppy image?

Just for fun I tried to convert it into a boot floppy, but all I got was:

error occured while reading disk
Press any key to reboot

I think this is because your boot sector uses int 0x13, AH = 0x42 (extended read) to read a sector, but that doesn't work on floppies. I know, it was never designed for floppies to start with... ::)
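For reference, a boot sector that also has to work from a floppy usually falls back to the CHS read (int 0x13, AH = 0x02) when the extensions aren't available (they can be probed first with AH = 0x41, BX = 0x55AA). That needs an LBA to cylinder/head/sector conversion; here is a minimal C sketch of the standard formula, using a 1.44 MB floppy's geometry (18 sectors per track, 2 heads, 80 cylinders) in the examples:

```c
#include <stdint.h>

/* CHS address as needed by the BIOS int 0x13, AH=0x02 read. */
typedef struct {
    uint16_t cylinder;
    uint8_t  head;
    uint8_t  sector;   /* 1-based, unlike LBA */
} chs_t;

/* Standard LBA -> CHS conversion for a given disk geometry. */
static chs_t lba_to_chs(uint32_t lba, uint32_t sectors_per_track, uint32_t heads)
{
    chs_t c;
    c.cylinder = (uint16_t)(lba / (sectors_per_track * heads));
    c.head     = (uint8_t)((lba / sectors_per_track) % heads);
    c.sector   = (uint8_t)(lba % sectors_per_track + 1);
    return c;
}
```

So `lba_to_chs(18, 18, 2)` is cylinder 0, head 1, sector 1, and the last sector of the disk (LBA 2879) is cylinder 79, head 1, sector 18. In real mode the same arithmetic is two DIV instructions; the 1-based sector number is the usual trap.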

BTW, have you ever done a hex dump of your CD image? I did: it contains the word "Microsoft" at least 3 times, and in one place it has:

MICROSOFT CORPORATION, ONE MICROSOFT WAY, REDMOND WA 98052, (425)882-8080

I thought it was funny: "One Microsoft Way" in an alternative OS's boot image.... ;D


Cheers,

Brendan

 Post subject: Re:sync on a smp system (again)
Posted: Sun Nov 20, 2005 3:01 pm
Member

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
Brendan wrote:
I thought it was funny - "One Microsoft Way" in an alternative OS 's boot image.... ;D


That's just Microsoft's home address... nothing wrong with that, except for using their code in your OS.

