OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject:
PostPosted: Sun Sep 09, 2007 3:38 pm 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

nooooooooos wrote:
Now I have a new problem with the APIC. I saw that for redirecting interrupts in the I/O APIC I can only use 4-bit local APIC IDs. But what if a system has more than 16 APICs?
Could it be that the specification wasn't updated when the local APIC ID was extended, and that I can simply use all 8 bits?


If I remember correctly, the 80486's APIC (or 82489DX chip) used 8-bit APIC IDs. Then for Pentium and P6 CPUs Intel changed to 4-bit APIC IDs (and added "cluster addressing" for computers with more than 16 APICs). Finally for P4 and later Intel changed back to 8-bit APIC IDs.

There are other differences too...

nooooooooos wrote:
EDIT: Which pin polarity do I have to program for ExtINT?


Appendix D (Programming The LINT0 and LINT1 Inputs) in Intel's System Programming Guide gives an example of programming the local APIC's extINT connection as "edge triggered active high". However, the local APIC inputs should be configured according to the information returned by ACPI and/or MP Specification tables, as there's no guarantee that the "extINT" signal (or any others) use any specific polarity or trigger mode.


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 Post subject:
PostPosted: Sat Sep 29, 2007 5:17 am 

Joined: Fri Jul 20, 2007 1:39 am
Posts: 45
Now I have set the destination mode in the I/O APIC for PIC interrupts to logical mode and the destination field to all 1s. Because I have enabled cluster mode, all interrupts should be delivered to all processors.
I have set the task priority to zero in each processor, so that I get all interrupts.

But I only receive interrupts on my BSP, and the AP never leaves its endless loop after the sti (even though the AP is enabled).

I also have a unique TSS descriptor for each processor (but not unique task state segments).

This is the code where I enable the local APIC on the AP:
Code:
   mov eax,[apicregs]
   mov ecx,[eax+0xF0]                     ;ENABLE APIC
   or ecx,0101001101b                     ;+spuriousvector
   mov [eax+0xF0],ecx


   mov DWORD [eax+0xD0],0xFF000000            ;logical id


   mov DWORD [eax+0x80],0x0                  ;taskpriority
   

   mov DWORD ecx,[eax+0xE0]                 ;destinationformat
   or ecx,0xF0000000
   mov DWORD [eax+0xE0],ecx
   

   mov ecx,[eax+0x350]                     ;lint0:  extint
   and ecx,0xFFFE58FF
   or ecx,0x700
   mov [eax+0x350],ecx


   mov ecx,[eax+0x360]                     ;lint1: nmi
   and ecx,0xFFFE58FF
   or ecx,0x400
   mov [eax+0x360],ecx


The code that sets the entry for the INTR signal (the first entry of the I/O APIC redirection table) is:
Code:
   mov DWORD [ecx],0x10
   mov eax,[ecx+0x10]
   and eax,00101000000000000b
   or eax,0xF00
   mov [ecx+0x10],eax

   mov DWORD [ecx],0x11
   mov eax,[ecx+0x10]
   or eax,0xFF000000
   mov [ecx+0x10],eax

ECX is loaded with the memory-mapped address of the I/O APIC, read from the corresponding I/O APIC entry in the MP Configuration Table.

EDIT: With this code, I set the IMCR bit:
Code:
   and al,10000000b
   cmp al,0x0
   je noimcr

   mov al,0x70
   out 0x22,al
   mov al,0x1
   out 0x23,al

noimcr:
In al I have saved the value of the second feature byte from the MP Floating Pointer Structure.


Thanks
Noooooooooos


 Post subject:
PostPosted: Sat Sep 29, 2007 6:56 am 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

Some general notes. If your OS is "long mode only" or something, then some of these things may not apply...

Don't use cluster mode addressing, especially "hierarchical cluster mode". AFAIK it was intended for large NUMA systems, where there's a "node controller" for each NUMA domain that forwarded interrupts to CPUs within that NUMA domain (with a separate APIC bus for each NUMA domain). Unless your chipset has these "node controllers" (or "cluster managers" as Intel calls them) it won't work, and no modern computers have them (AFAIK there are only a few obscure Pentium III/P6 NUMA systems that ever did). You want to use "flat model" for normal SMP and for most NUMA systems (including AMD's).

For compatibility with P6 family CPUs, it's best to use a spurious interrupt vector that has the lowest 4 bits set (e.g. 0x1F, 0x2F, 0x3F, etc) because these bits are hardwired to '1' in P6 CPUs. At the moment you're doing "spurious = spurious | 0101001101b", so on a P4 you'd be using vector 0x4D and on a Pentium III you'd be using vector 0x4F.

For most local APIC registers, there isn't much point doing "register = register | stuff" and it's much simpler to just do "register = stuff". For example, if (for some strange reason) the BIOS left the spurious interrupt register set to 0x000000FF, then you'd end up doing "spurious = 0xFF | 0x14D" and using vector 0xFF for the spurious interrupt.
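
To make the difference concrete, here is a minimal sketch in C (not code from the thread; the helper names `svr_value` and `svr_vector` are made up for illustration). It only computes the register value, assuming bit 8 is the APIC software enable bit and bits 0 to 7 hold the vector; the actual write to offset 0xF0 is left out so the arithmetic can be checked on any host.

```c
#include <stdint.h>

#define SVR_APIC_ENABLE 0x100u   /* bit 8: APIC software enable */

/* Build the spurious interrupt vector register value from scratch
 * ("register = stuff"), ignoring whatever the BIOS left behind. */
static uint32_t svr_value(uint8_t vector)
{
    return SVR_APIC_ENABLE | vector;
}

/* Vector the local APIC would use for a given register value. */
static uint8_t svr_vector(uint32_t svr)
{
    return (uint8_t)(svr & 0xFFu);
}
```

With these helpers, `svr_value(0x3F)` gives 0x13F, while OR-ing 0x14D into a stale 0xFF gives `svr_vector(0xFF | 0x14D) == 0xFF`, which is the surprise described above.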

The "logical ID" should be treated as a bit mask, as the CPU does the equivalent of "if( (APIC_message_destination & local_APIC_logical_ID) != 0 )" when it decides if it should accept the message or not. Some OSs set one bit for each CPU, so that (for e.g.) sending a message to the logical destination 10101010b would send the message to every second CPU. Of course this doesn't work if there's more than 8 CPUs. I give special meanings to each bit, for e.g.:
    bit 0 - set in all running CPUs
    bit 1 - set if the CPU is the first CPU in the core
    bit 2 - set if the CPU is the first CPU in the chip
    bit 3 - set if the CPU is the first CPU in the NUMA domain
This means I can use logical destinations to send an IPI to every CPU, to one CPU in each core, to one CPU in each chip, and to one CPU in each NUMA domain. It's potentially less efficient than "one bit per CPU", but it works regardless of how many CPUs are present (but these may not be appropriate meanings for these bits in other OS designs).
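
The acceptance test and the bit meanings above can be sketched in C like this. It is only an illustration for the flat model: the `LDEST_*` names and both helper functions are hypothetical, and the one assumption taken from the hardware is that the 8-bit logical ID sits in bits 24 to 31 of the logical destination register (offset 0xD0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the bit meanings listed above. */
enum {
    LDEST_RUNNING       = 1u << 0,  /* set in all running CPUs */
    LDEST_FIRST_IN_CORE = 1u << 1,  /* first CPU in the core */
    LDEST_FIRST_IN_CHIP = 1u << 2,  /* first CPU in the chip */
    LDEST_FIRST_IN_NODE = 1u << 3   /* first CPU in the NUMA domain */
};

/* Flat model: a CPU accepts the message if any destination bit
 * matches a bit in its own logical ID. */
static bool lapic_accepts(uint8_t message_destination, uint8_t logical_id)
{
    return (message_destination & logical_id) != 0;
}

/* Value to write to the logical destination register for a logical ID. */
static uint32_t ldr_value(uint8_t logical_id)
{
    return (uint32_t)logical_id << 24;
}
```

A second logical CPU in a core would get just `LDEST_RUNNING`, so it receives "all CPUs" messages but not the "one CPU per core" ones.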

LINT0 and LINT1 should be setup dynamically (according to whatever the MPS table or ACPI says), rather than hardcoded. Hardcoding them (like in the example in Intel manuals) is only really appropriate for BIOSs (where they know how the inputs are connected in advance). The same applies to I/O APIC inputs (for e.g. the first entry in the I/O APIC redirection table might not be the PIT chip).

Also, for older systems, you may need to disable the PIC chips using the IMCR. For e.g.:
Code:
   test dword [SIBBOOTdetectionFlags],DETECTFLAGhasIMCRP   ;Does IMCR need to be disabled?
   je .noIMCR                  ; no
   mov al,0x70
   out 0x22,al                  ;Select IMCR register
   mov al,1
   out 0x23,al                  ;Disable PICs
.noIMCR:

The MPS table or ACPI tables will tell you if this is necessary or not.

For testing purposes, start by sending an IPI using "broadcast to all including self" destination shorthand - it doesn't rely on things like the I/O APIC or half the local APIC configuration, which makes it a good place to start. If that works, then try sending IPIs to specific CPUs (you can do these steps while you're still using the PIC). Once you're sure the local APIC is setup right and working correctly, then try setting up I/O APIC IRQs. The idea is to minimise the things that could be wrong at each step, so that if something is wrong there's less you need to check... ;)
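
As a rough sketch of that first test step, the C helpers below compute the two dwords of the interrupt command register for a fixed-delivery IPI. The function and enum names are made up; the field positions (vector in bits 0 to 7, level/assert in bit 14, destination shorthand in bits 18 to 19, destination APIC ID in bits 24 to 31 of the high dword) are my reading of Intel's manual, so check them against it. Writing the high dword to offset 0x310 and then the low dword to offset 0x300 is what actually sends the IPI, and is not shown.

```c
#include <stdint.h>

enum icr_shorthand {
    ICR_NO_SHORTHAND  = 0,
    ICR_SELF          = 1,
    ICR_ALL_INCL_SELF = 2,
    ICR_ALL_EXCL_SELF = 3
};

#define ICR_LEVEL_ASSERT (1u << 14)

/* Low dword for a fixed-delivery IPI (delivery mode 000b,
 * physical destination mode, edge triggered). */
static uint32_t icr_low_fixed(uint8_t vector, enum icr_shorthand shorthand)
{
    return ((uint32_t)shorthand << 18) | ICR_LEVEL_ASSERT | vector;
}

/* High dword selecting one CPU by physical APIC ID; only needed
 * when no shorthand is used. */
static uint32_t icr_high_physical(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}
```

For the broadcast test, `icr_low_fixed(0x40, ICR_ALL_INCL_SELF)` (vector 0x40 is an arbitrary choice) needs nothing in the high dword, which is why it depends on so little of the rest of the setup.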


Cheers,

Brendan

 Post subject:
PostPosted: Sat Sep 29, 2007 12:24 pm 

Joined: Fri Jul 20, 2007 1:39 am
Posts: 45
About setting the IMCR: I added that later with an edit. Perhaps you didn't see it because of the forum; I sometimes don't see my own edits either.

About the MPS: how do I have to set the polarity and the trigger mode when the corresponding field in the MPS is set to zero (conforms to the specifications of the bus)? What is the bus for the PIC or the NMI signal?

Thanks
Noooooooooooos


 Post subject:
PostPosted: Sat Sep 29, 2007 1:22 pm 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

nooooooooos wrote:
About the MPS: how do I have to set the polarity and the trigger mode when the corresponding field in the MPS is set to zero (conforms to the specifications of the bus)? What is the bus for the PIC or the NMI signal?


The local APIC ignores the trigger mode for all local APIC interrupts and uses edge triggered, *unless* they're programmed as "fixed", so you don't need to worry about the trigger mode for "extINT" or "NMI".

For the polarity, I think both of them are active high (same as ISA bus).


Cheers,

Brendan

 Post subject:
PostPosted: Sat Sep 29, 2007 3:37 pm 

Joined: Fri Jul 20, 2007 1:39 am
Posts: 45
But if the trigger mode and polarity are fixed, what's the reason to set all of this up dynamically? Could there be an SMI or a fixed interrupt on a LINT pin? And if so, how do I then get interrupts from the PIC? Then I would need much more code just to program the LINTs, not to mention that I would also have to provide a corresponding ISR. How do you (not only Brendan ;)) handle the possibility that there could be an SMI or a fixed interrupt on the LINTs?

You say that the local APIC almost always uses edge triggered. But what about the I/O APIC? What do I have to use there when I get 00 in the MPS for the trigger mode?

With the I/O APIC I have to set 64 bits. Now I don't know exactly which DWORD (the one with the destination field, or the one with all the other bits) should be written to e.g. 0x10 and which to 0x11.

Cheers
Noooooooooooooos


 Post subject:
PostPosted: Sat Sep 29, 2007 10:22 pm 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

nooooooooos wrote:
But if the trigger mode and polarity are fixed, what's the reason to set all of this up dynamically? Could there be an SMI or a fixed interrupt on a LINT pin? And if so, how do I then get interrupts from the PIC? Then I would need much more code just to program the LINTs, not to mention that I would also have to provide a corresponding ISR. How do you (not only Brendan ;)) handle the possibility that there could be an SMI or a fixed interrupt on the LINTs?


There's nothing in any of the specifications that prevents a hardware manufacturer from connecting NMI to LINT0 and ExtINT to LINT1 (or vice versa), or connecting ExtINT to the I/O APIC (and not connecting it to the local APIC at all), or using active low instead of active high.

For an obscure/extreme example, I could imagine an embedded system (that wouldn't be "PC compatible" but does use an 80x86 CPU) where several PCI devices share a single IRQ and are connected directly to LINT0 or LINT1, and where there is no PIC or I/O APIC anywhere. Of course you don't design an OS to handle this sort of thing, but you might find yourself adapting the OS for something like this one day and wish the original code was a little more flexible.

nooooooooos wrote:
You say that the local APIC almost always uses edge triggered. But what about the I/O APIC? What do I have to use there when I get 00 in the MPS for the trigger mode?


In this case you'd get the "bus number" from the MP table's "I/O APIC interrupt assignment entry" and use it to find the associated "bus entry" to see what sort of bus the IRQ comes from. Then you'd configure the trigger mode and polarity according to the bus type (IIRC edge triggered active high for ISA, level triggered active low for PCI, etc). To be honest, it's a pain in the neck and I really can't see why the BIOSs couldn't be specific and tell us the trigger mode and polarity directly.
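
A minimal sketch of that lookup in C, with several assumptions: the names `irq_signal` and `resolve_irq_signal` are invented here, the flag encoding (bits 0 to 1 polarity, bits 2 to 3 trigger mode, with 00 = conforms to bus, 01 = active high / edge, 11 = active low / level) is my reading of the MP Specification, the bus type is the 6-character space-padded string from the bus entry, and every bus other than PCI is simply treated like ISA (which is too crude for EISA).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct irq_signal {
    bool active_low;
    bool level_triggered;
};

/* Resolve polarity and trigger mode from an interrupt entry's flags,
 * falling back to the bus default when a field says "conforms". */
static struct irq_signal resolve_irq_signal(uint16_t flags, const char bus_type[6])
{
    bool is_pci = memcmp(bus_type, "PCI   ", 6) == 0;
    /* Bus defaults: PCI is level triggered active low,
     * ISA is edge triggered active high. */
    struct irq_signal s = { .active_low = is_pci, .level_triggered = is_pci };
    unsigned polarity = flags & 3u;
    unsigned trigger = (flags >> 2) & 3u;

    if (polarity == 1) s.active_low = false;
    else if (polarity == 3) s.active_low = true;

    if (trigger == 1) s.level_triggered = false;
    else if (trigger == 3) s.level_triggered = true;

    return s;
}
```

Doing this once while parsing the tables means the rest of the kernel only ever sees explicit polarity and trigger settings.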

nooooooooos wrote:
With the I/O APIC I have to set 64 bits. Now I don't know exactly which DWORD (the one with the destination field, or the one with all the other bits) should be written to e.g. 0x10 and which to 0x11.


The high dword goes to the higher location (e.g. 0x11) and the low dword goes to the lower location (e.g. 0x10).... ;)
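
Spelled out as a C sketch (helper names are made up, and the bit positions are taken from the I/O APIC datasheet as I remember it, so verify them): input n uses register index 0x10 + 2*n for the low dword and 0x11 + 2*n for the high dword, the destination field is the top byte of the 64-bit entry, and everything else lives in the low dword.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOAPIC_REDTBL_BASE 0x10u

/* Register indexes (written to the index register) for input n. */
static uint32_t redtbl_low_index(unsigned input)  { return IOAPIC_REDTBL_BASE + 2u * input; }
static uint32_t redtbl_high_index(unsigned input) { return IOAPIC_REDTBL_BASE + 2u * input + 1u; }

/* Build a 64-bit redirection entry with fixed delivery mode. */
static uint64_t redtbl_entry(uint8_t vector, uint8_t destination, bool logical_mode,
                             bool active_low, bool level_triggered, bool masked)
{
    uint64_t entry = vector;                    /* bits 0-7 */
    entry |= (uint64_t)logical_mode << 11;      /* destination mode */
    entry |= (uint64_t)active_low << 13;        /* pin polarity */
    entry |= (uint64_t)level_triggered << 15;   /* trigger mode */
    entry |= (uint64_t)masked << 16;            /* interrupt mask */
    entry |= (uint64_t)destination << 56;       /* bits 56-63 */
    return entry;
}

static uint32_t low_dword(uint64_t entry)  { return (uint32_t)entry; }
static uint32_t high_dword(uint64_t entry) { return (uint32_t)(entry >> 32); }
```

When changing a live entry it is probably safest to mask it first, write the high dword, and write the low dword (with the mask bit cleared) last.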


Cheers,

Brendan

 Post subject:
PostPosted: Sun Sep 30, 2007 1:01 am 

Joined: Fri Jul 20, 2007 1:39 am
Posts: 45
Now I have tested IPIs, and the IPI to all processors worked as well as the specific IPI. Is there another step between sending an IPI and trying to get a timer interrupt?

I don't know what else I can do...

EDIT: What about the EOI? Do I have to send an EOI to the APIC and the PIC, or only to the PIC? Do I have to send only one EOI if both processors get the interrupt?

EDIT2: I also checked the ESR, but it didn't show any errors...

Cheers
Nooooooooooos


 Post subject:
PostPosted: Sun Sep 30, 2007 2:05 pm 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

nooooooooos wrote:
Now I have tested IPIs, and the IPI to all processors worked as well as the specific IPI. Is there another step between sending an IPI and trying to get a timer interrupt?

I don't know what else I can do...


Cool :)

If one CPU can send an interrupt to another CPU, then the next step is to see if the I/O APIC can send an interrupt.

nooooooooos wrote:
EDIT: What about the EOI? Do I have to send an EOI to the APIC and the PIC, or only to the PIC? Do I have to send only one EOI if both processors get the interrupt?


For all normal interrupts and IPIs (but not for NMI, SMI, INIT or spurious interrupts) you need to send an EOI to the local APIC (or the PIC chip if the interrupt came from there, but they'd be disabled/unused if you're using the I/O APIC).

The first step would be to make sure the PIC chips are disabled by doing the IMCR thing (if necessary) and masking all IRQs in the PIC. Then you'd setup the I/O APIC. At this stage I leave most IRQs disabled in the I/O APIC except whatever the kernel is ready to use (e.g. the timer IRQ that I use for keeping track of real time). The other IRQs can be setup but left disabled until later (e.g. until your kernel is ready to start device drivers).

nooooooooos wrote:
EDIT2: I also checked the ESR, but it didn't show any errors...


For temporary errors (like checksum errors) the APICs automatically retry. The rest are programming errors - using an invalid interrupt vector (vectors 0x00 to 0x1F) or trying to access a local APIC register that doesn't exist. Also note that different errors are supported by different CPUs - Pentium only supports the illegal interrupt vector error, while later CPUs support more of the error flags.

There's a few more things I could/should mention. If your OS supports Pentium CPUs then you need to be careful of a bug in the local APIC where writes to the local APIC registers can be lost/ignored. This bug affects 75 MHz and faster Pentiums (with and without MMX), where family = 5, model = 2 and the stepping is either 11 or below 5 (e.g. steppings 0, 1, 2, 3, 4 and 11).
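
The family/model/stepping test can be sketched in C as below. The function name is hypothetical, and it assumes the caller already has the CPUID signature (EAX from CPUID leaf 1, with stepping in bits 0 to 3, model in bits 4 to 7 and family in bits 8 to 11) and has confirmed the vendor is Intel; the clock speed condition can't be checked this way and is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the CPUID signature matches the affected Pentiums:
 * family 5, model 2, stepping 11 or below 5. */
static bool has_lapic_write_erratum(uint32_t cpuid_signature)
{
    unsigned stepping = cpuid_signature & 0xFu;
    unsigned model    = (cpuid_signature >> 4) & 0xFu;
    unsigned family   = (cpuid_signature >> 8) & 0xFu;

    if (family != 5 || model != 2)
        return false;
    return stepping < 5 || stepping == 11;
}
```

The result would be stored once per CPU during boot as a flag that the write path can test cheaply.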

To avoid this bug you need to do a dummy read from a local APIC register before doing a write to any local APIC register. For an example (from my previous kernel):

Code:
%macro LAPICwrite 2
%ifdef SUPPORT_PENTIUM
   test dword [gs:CPULISTentryStruct.errata1],CPUERRATA1badLAPIC   ;Is this CPU affected by the bug?
   je %%l1                                      ; no, skip the dummy read
   cmp dword [SMEMlocalAPIC+(%1)],0x1234567     ;DUMMY READ BEFORE WRITE
%endif
%%l1:   mov dword [SMEMlocalAPIC+(%1)],%2       ;Write the value (%2) to the register at offset (%1)
%endmacro


Also there's several differences between the local APICs used in more modern CPUs and the local APICs (the external 82489DX chips) used in 80486 (and some early Pentium systems IIRC). I don't worry about these differences - if the OS detects that the local APIC is an 82489DX chip then the OS boots as a single CPU machine without APICs. These machines are very rare, and IMHO they aren't worth the trouble (even though I do support single CPU 80486).

Lastly, some single CPU systems have local APICs and no I/O APICs. In this case you'd probably still want to use the local APIC for its timer (and performance monitor interrupts and thermal sensor interrupt, if present). There may be multi-CPU machines with local APICs and no I/O APICs (or with broken/dodgy I/O APICs), but (like the 82489DX chips) they aren't worth bothering with IMHO. This gives 4 combinations - single CPU with no local APIC, single CPU with local APIC, single CPU with local APIC and I/O APIC, and multi-CPU.


Cheers,

Brendan

 Post subject:
PostPosted: Sun Sep 30, 2007 2:55 pm 

Joined: Fri Jul 20, 2007 1:39 am
Posts: 45
Good, I'm going to check these points tomorrow...
For now I'm still using Bochs, so the bug with the registers can't be the cause of my problem. But it's good to know for the future. Thx

Why should I mask all PIC interrupts? My goal is to get a timer interrupt from the PIC, which is connected to LINTx... Is this not possible?
Now I've masked all interrupts from the PIC except the timer and unmasked the INTR entry in the I/O APIC...

Quote:
then the next step is to see if the I/O APIC can send an interrupt.
What do you mean by checking whether the I/O APIC sends an interrupt? That is what I'm doing with the I/O APIC interrupt for INTR, but it doesn't work... Should I try the I/O APIC with another interrupt, like the keyboard?

Cheers
Noooooooooooooos


 Post subject:
PostPosted: Sun Sep 30, 2007 4:07 pm 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

nooooooooos wrote:
Why should I mask all PIC interrupts? My goal is to get a timer interrupt from the PIC, which is connected to LINTx... Is this not possible?
Now I've masked all interrupts from the PIC except the timer and unmasked the INTR entry in the I/O APIC...


Devices that generate IRQs (e.g. the PIT, RTC, floppy, hard drive, etc) are connected to the PIC chips and the I/O APIC. If you enable the I/O APIC and also leave the PIC enabled, then (depending on other things, like if there's an IMCR) you get 2 interrupts instead of one - one interrupt from the PIC and another interrupt from the I/O APIC.

There's 3 ways to handle this:
    1) disable the PIC and enable the I/O APIC
    2) leave the I/O APIC disabled and use the PIC
    3) use "mixed mode" where individual IRQs are either enabled in the PIC or the I/O APIC (but never both).
I'd strongly recommend avoiding mixed mode, as it doesn't work in some cases (e.g. if there's an IMCR) and is just messy and unnecessary. I'd also recommend using the I/O APIC if it's present - it's more efficient, causes less IRQ sharing and works for multi-CPU (the PIC chips don't work for multi-CPU unless the system is configured so that only one CPU handles all IRQs, which is bad for IRQ balancing).

So the question now is, does your PIT send an IRQ to the PIC and I/O APIC where the PIC forwards it to a CPU and the I/O APIC ignores it; or, does your PIT send an IRQ to the PIC and I/O APIC where the I/O APIC forwards it to the CPU/s and the PIC ignores it?

If the PIC ignores the IRQ, then you want to mask the IRQ in the PIC. If the I/O APIC ignores the IRQ then you want to leave the I/O APIC disabled (e.g. don't do the IMCR thing and don't enable any of the I/O APIC inputs in the IO-APIC redirection table).

To confuse things, for some (mostly older) motherboards the PIT is not connected to the I/O APIC. For SMP systems (and single CPU systems that have a local APIC) I'd recommend using the local APIC timer/s for scheduling and the RTC/CMOS IRQ for keeping track of real time so that the OS doesn't need to care if the PIT is connected to the I/O APIC or not. The alternative for these computers is to either leave the I/O APIC disabled, or use "mixed mode" to get the PIT to interrupt the CPU (or don't support them and refuse to boot).

Also, there's an easy to fix problem with the way FPU exceptions are reported. There's the old dodgy (default) way where the FPU sends an IRQ to the PIC chip and the PIC chip causes an interrupt (which doesn't work if there's more than one CPU), and (in general) can cause race conditions, etc. Then there's the new way where the OS enables "native FPU exceptions" (set bit 5 in CR0) and the FPU generates an exception (interrupt #16) instead of using the PIC. In general, all OSs that aren't DOS should always enable "native FPU exceptions" (unless the CPU is 80386 or older and doesn't support it, in which case you'd want to support both methods and it's a pain in the neck). For SMP systems and single CPU systems that have an I/O APIC, it's likely that none of the FPU error signals (FERR#) from any CPU (including the BSP) is connected to the I/O APIC (if you're using the I/O APIC you need to enable "native FPU exceptions").
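
In code this is a one-bit change. The C sketch below (names invented for illustration) only computes the new CR0 value; reading and writing CR0 itself needs privileged "mov" instructions and is left to the kernel's own wrappers.

```c
#include <stdint.h>

#define CR0_NE (1u << 5)   /* numeric error: report FPU errors as exception 16 */

/* Return the CR0 value with native FPU exceptions enabled,
 * leaving every other bit untouched. */
static uint32_t cr0_with_native_fpu_exceptions(uint32_t cr0)
{
    return cr0 | CR0_NE;
}
```

On SMP this would have to be done on every CPU (the BSP and each AP), since each CPU has its own CR0.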

Finally, if you support I/O APICs don't be surprised if there's more than one. The computer I'm using to type this reply (a dual Pentium III system) has 2 separate I/O APICs with 16 inputs each, where the first I/O APIC is mostly used for ISA IRQs and the second I/O APIC is mostly used for PCI IRQs. I have seen datasheets, etc for large servers with 4 I/O APICs. The MP Specification tables and/or ACPI tables will tell you how many I/O APICs there are and what addresses they use (and details for each input).


Cheers,

Brendan

 Post subject:
PostPosted: Sun Oct 07, 2007 1:39 am 

Joined: Fri Jul 20, 2007 1:39 am
Posts: 45
Good, now I think I'm going to use only APIC mode... But when I want to connect, for example, the keyboard IRQ, which bus connects the keyboard controller to the APIC, and which IRQ of this bus should I take? And how do I do this in general?

EDIT: Hmm, now I sometimes get a hardware IRQ 7, though I have masked all IRQs. Do you have any idea why that could be? I haven't worked with LPT yet...



Cheers
Nooooooooooos


 Post subject:
PostPosted: Sun Oct 07, 2007 5:52 pm 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

nooooooooos wrote:
Good, now I think I'm going to use only APIC mode... But when I want to connect, for example, the keyboard IRQ, which bus connects the keyboard controller to the APIC, and which IRQ of this bus should I take? And how do I do this in general?


Devices

In general, there's 3 different types of devices:
    ISA Devices With Fixed IRQs - these devices always have the same ISA IRQ numbers for historical reasons (backwards compatibility). This includes the PS/2 controller (IRQ 1 and IRQ 12), the PIT chip, the CMOS/RTC, the first floppy controller, the first 2 serial ports and the first parallel port. The first 2 PATA/ATAPI/SATA hard disk controllers might also fit in this category - modern hard disk controllers have a "legacy mode" and a "native mode" where they use ISA IRQ 14 or 15 in legacy mode and behave like PCI devices in native mode.
    ISA Devices Without Fixed IRQs - these devices use ISA IRQs but there are no predefined IRQs for them. Examples include network cards, sound cards, SCSI controllers, video cards, additional serial ports, additional parallel ports, additional floppy controllers and additional PATA/ATAPI/SATA hard disk controllers; but only if they aren't PCI devices. For these devices you might be able to use "ISA Plug and Play" to detect (and reconfigure) the resources they use, but in general you need to use configuration scripts or something that are setup by the end-user.
    PCI Devices - for these devices the BIOS/firmware/OS can automatically detect (and reconfigure) resources the devices use. For the sake of completeness, MCA devices also fit in this category (I'm not sure about EISA).


The I/O APIC

For the I/O APIC, the MP specification and/or ACPI tables will tell you which IRQ each PCI device uses. This means you'll find an "I/O Interrupt Assignment Entry" for every PCI device (and if 2 or more PCI devices share the same IRQ line you'll get 2 or more "I/O Interrupt Assignment Entries" for that IRQ line rather than just one).

In addition, you will (should) also find an "I/O Interrupt Assignment Entry" for every possible ISA IRQ line, regardless of how many devices are connected to that ISA IRQ line (if any).

How To Make Sense Of It - The Lazy Way

The lazy way to do things is to do very little during boot. When an ISA device driver wants to install an IRQ handler you search the MP specification and/or ACPI tables for the "I/O Interrupt Assignment Entry" that corresponds to the ISA IRQ, then install the IRQ handler and enable the IRQ in the I/O APIC.

When a PCI device driver wants to install an IRQ handler you can do something very similar - search for the PCI device in the MP specification and/or ACPI tables (e.g. using its "bus:device:function") to find out which I/O APIC input it's using, and then either (if nothing is already using the IRQ) install a new IRQ handler and enable the IRQ in the I/O APIC, or (if something is already using the IRQ) find the existing IRQ handler and add the new device's IRQ handler to it. [Note: I'm not going to describe methods of handling PCI IRQ sharing here]
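
For the ISA case, the search might look like the C sketch below. It assumes the 8-byte "I/O Interrupt Assignment Entries" have already been copied out of the MP table into a plain array (the real table mixes entry types of different sizes), that the field layout matches my reading of the MP Specification, and that the ISA bus ID was found from the bus entries beforehand. The struct name, function name and the sample table are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct mp_io_interrupt {
    uint8_t  entry_type;         /* 3 = I/O interrupt assignment entry */
    uint8_t  interrupt_type;     /* 0 = INT, 1 = NMI, 2 = SMI, 3 = ExtINT */
    uint16_t flags;              /* polarity and trigger mode */
    uint8_t  source_bus_id;
    uint8_t  source_bus_irq;
    uint8_t  dest_ioapic_id;
    uint8_t  dest_ioapic_intin;  /* I/O APIC input the IRQ is wired to */
};

/* Find the entry for a normal (INT type) ISA IRQ, or NULL if none. */
static const struct mp_io_interrupt *
find_isa_irq(const struct mp_io_interrupt *entries, size_t count,
             uint8_t isa_bus_id, uint8_t irq)
{
    for (size_t i = 0; i < count; i++) {
        const struct mp_io_interrupt *e = &entries[i];
        if (e->entry_type == 3 && e->interrupt_type == 0 &&
            e->source_bus_id == isa_bus_id && e->source_bus_irq == irq)
            return e;
    }
    return NULL;
}

/* Hypothetical sample: ISA bus ID 0, PIT (IRQ 0) on input 2,
 * keyboard (IRQ 1) on input 1 of the I/O APIC with ID 2. */
static const struct mp_io_interrupt sample_entries[] = {
    { 3, 0, 0, 0, 0, 2, 2 },
    { 3, 0, 0, 0, 1, 2, 1 },
};
```

With the sample table, `find_isa_irq(sample_entries, 2, 0, 0)` finds input 2, which shows why the ISA IRQ number can't be assumed to equal the I/O APIC input number.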

How To Make Sense Of It - The Compatible Way

Another way to do things would be to setup dummy IRQ handlers for ISA devices during boot, so that (for example) if the PIC chips are being used then ISA IRQs generate interrupts 0x20 to 0x2F, and if I/O APICs are being used then ISA IRQs still generate interrupts 0x20 to 0x2F. In this case, an ISA device driver (e.g. a keyboard driver) just needs to install an IRQ handler (e.g. interrupt 0x21) without caring if the OS is using PIC chips or I/O APICs (except for the EOI).

You could handle PCI devices the same way as you would for the lazy way, or you could install dummy handlers and configure these in the I/O APIC during boot too.

Notes

Copy the information you need from the MP specification and/or ACPI tables into your own data structure/s so that you don't need to care which (MP or ACPI) was used after boot, and can reclaim (free and reuse) any RAM that the ACPI tables were using. I'd recommend a hierarchical tree with one entry for each device that describes everything about the device (I/O ports, IRQs, DMA channels, device driver name, status, etc).

Next, realise that (if you're using I/O APICs) the interrupt vector you set (and the IDT entry you use) determines the priority of the interrupt (but makes no difference otherwise). For this reason the "Compatible Way" is probably a bad way. For example, where possible you might want to ensure that some IRQs are higher priority than others (e.g. for me the RTC/CMOS IRQ is the highest priority IRQ when I/O APICs are used).
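
As an illustration of how the vector decides priority, here is a small C sketch (helper names are hypothetical). It assumes the local APIC's usual rule that the priority class of an interrupt is the upper 4 bits of its vector, with a higher class meaning a higher priority.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector: its upper 4 bits (0 to 15). */
static unsigned vector_priority_class(uint8_t vector)
{
    return vector >> 4;
}

/* True if vector a belongs to a higher priority class than vector b. */
static bool outranks(uint8_t a, uint8_t b)
{
    return vector_priority_class(a) > vector_priority_class(b);
}
```

So an RTC/CMOS IRQ placed at a vector like 0xF0 would outrank a keyboard IRQ at 0x21, whichever I/O APIC input either of them is wired to.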

Also, some PCI devices support MSI (Message Signalled Interrupts) where the device can send an IRQ directly to the CPU/s (without using an I/O APIC input or being configured in the I/O APIC). I assume this is intended to reduce the need for IRQ sharing.

Lastly, for NUMA and SMP systems you might want to consider some intelligent IRQ balancing. There's several conflicting ideas here - attempt to reduce the overhead of interrupts on CPUs running high priority tasks by sending IRQs to CPUs running low priority tasks; make sure IRQs are sent to CPUs where the corresponding IRQ handler and its data is still in the CPU's cache; don't send IRQs to sleeping/halted CPUs to avoid waking them up (to improve power management/consumption). Stock Linux kernels are crap when it comes to IRQ balancing (there isn't any), but people (AFAIK Intel is a contributor) are trying to fix it with an add-on daemon called "irqbalance". The irqbalance website does have good ideas and explanations that are worth reading (but I wouldn't recommend fixing bad design with add-on daemons).


Phantom IRQ 7

This is probably a spurious IRQ from the PIC chips. Even though all of the IRQs in the PIC chips are masked/disabled, you can still get spurious IRQs from the PIC chips on IRQ 7 and IRQ 15 (depending on the motherboard).
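
A common way to recognise these is to read the PIC's in-service register (write 0x0B to command port 0x20 or 0xA0, then read the same port) and check bit 7. The C sketch below is only the decision logic, with made-up names; it assumes the caller has already read both ISR values, and it matters mainly while the PICs are still in use, since a kernel running purely on the I/O APIC can just install handlers that ignore these two vectors.

```c
#include <stdint.h>

enum pic_eoi_action {
    PIC_EOI_NONE,         /* spurious IRQ 7: send no EOI at all */
    PIC_EOI_MASTER_ONLY,  /* spurious IRQ 15: the master still saw the cascade */
    PIC_EOI_NORMAL        /* real IRQ: handle it and send the usual EOI(s) */
};

/* Decide how to acknowledge an IRQ given both in-service registers. */
static enum pic_eoi_action classify_pic_irq(unsigned irq, uint8_t master_isr,
                                            uint8_t slave_isr)
{
    if (irq == 7 && !(master_isr & 0x80u))
        return PIC_EOI_NONE;
    if (irq == 15 && !(slave_isr & 0x80u))
        return PIC_EOI_MASTER_ONLY;
    return PIC_EOI_NORMAL;
}
```

Sending a normal EOI for a spurious IRQ 7 could acknowledge some other, real interrupt by mistake, which is why the two cases are kept apart.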


Cheers,

Brendan

 Post subject: IRQ balancing
PostPosted: Mon Oct 08, 2007 6:06 pm 

Joined: Wed Feb 07, 2007 1:45 pm
Posts: 1401
Location: Eugene, OR, US
Brendan's info here is sterling, vitally needed, and much appreciated,

Brendan wrote:
for NUMA and SMP systems you might want to consider some intelligent IRQ balancing.


But on just that one point, I would like to strongly disagree. I would recommend sending all your IRQs to one CPU. The efficiency benefits that you gain from IRQ balancing do not even equal the costs of the concurrency issues that you acquire in implementing it.


 Post subject: Re: IRQ balancing
PostPosted: Tue Oct 09, 2007 12:20 am 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

bewing wrote:
Brendan's info here is sterling, vitally needed, and much appreciated,

Brendan wrote:
for NUMA and SMP systems you might want to consider some intelligent IRQ balancing.


But on just that one point, I would like to strongly disagree. I would recommend sending all your IRQs to one CPU. The efficiency benefits that you gain from IRQ balancing do not even equal the costs of the concurrency issues that you acquire in implementing it.


Imagine a NUMA system like this:

IO1 <----> CPU1/Mem1 <----> CPU2/Mem2 <----> IO2

Does it make sense for devices connected to the first I/O hub to be handled by CPU2, or for the devices connected to the second I/O hub to be handled by CPU1?

Next, think about how many interrupts you get per second. Something simple like a serial port is capable of operating at 115200 bits per second, which (using one start bit, 7 data bits, no parity, one stop bit and no FIFO) is 9 bits per character and works out to 12800 interrupts per second, but it can send and receive at the same time (and generate a "transmitter empty" IRQ) so you're looking at a maximum of 25600 interrupts per second. If it takes 1000 cycles to handle the IRQ it'll cost 25600000 cycles per second - enough to consume more than 100% of CPU time on a 25 MHz 80486.

Things like high speed ethernet adapters and fibrechannel can be worse than boring old serial ports. Let's say you've got a server with a pair of 1 GHz CPUs - how many ethernet cards can it handle before it chokes if each ethernet card generates 20000 interrupts per second and if it costs 5000 cycles to handle each IRQ? At 10 ethernet cards there's enough load to use 100% of one CPU - will your OS handle 15 of these ethernet cards going flat out?
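
The ethernet numbers can be checked with a few lines of C. This is just the arithmetic from the paragraph above (interrupts per second times cycles per interrupt, as a percentage of one CPU's clock rate); the function name is made up.

```c
#include <stdint.h>

/* Percentage of one CPU consumed by interrupt handling. */
static uint64_t irq_load_percent(uint64_t irqs_per_second,
                                 uint64_t cycles_per_irq,
                                 uint64_t cpu_hz)
{
    return irqs_per_second * cycles_per_irq * 100u / cpu_hz;
}
```

For 10 cards, `irq_load_percent(10 * 20000, 5000, 1000000000)` is 100, and for 15 cards it is 150, so one CPU can't keep up on its own.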

Lastly, does your OS support interrupt nesting? Even if it does, (with one CPU handling all IRQs) a lower priority IRQ would need to wait for a higher priority IRQ to be serviced before it can be serviced, which affects the worst case interrupt latency. Spreading IRQs across all CPUs reduces this problem and improves interrupt latency - instead of waiting for a higher priority IRQ to be serviced, another CPU can handle the lower priority IRQ immediately.

So, IMHO making one CPU handle all IRQs doesn't make sense, but why not configure IRQs during boot and avoid dynamic IRQ balancing? Imagine a plain SMP machine:

IO1
|
|__CPU1
|__CPU2

If CPU2 is idle, but CPU1 is hot and is operating with thermal throttling (e.g. running at 25% the speed it normally would), then it'd make sense to shift CPU load from CPU1 to CPU2 and place CPU1 in a power saving state. Trying to use CPU1 to handle the IRQs would increase IRQ latency (partly because it's running slow, and partly because it takes time to bring it out of the sleep state) and also make it harder for CPU1 to cool down and return to normal speed. For example, if CPU1 is doing HLT to save power and cool down, then the IRQs would be constantly taking it out of the HLT state.

Of course you don't need to take my word for it. Instead, have a look at who the main contributor is for the "irqbalance" daemon, and ask yourself why a company like that would do this work if it didn't have advantages... ;)


Cheers,

Brendan
