Yes and no.
Using the processor's hardware threads is a better option than using software threads. You can then take advantage of things such as truly simultaneous thread execution, just as Windows and Linux do.
However, I have run into a problem with hardware threads. Say a Sync() function synchronizes the two threads (each one waits for the other to catch up). If thread one contained the code:
Code:
int *acoolpointer = (int *)0xABCDEF;
Sync();
*acoolpointer = 10;
and thread two contained the code:
Code:
int *acoolpointer = (int *)0xABCDEF;
Sync();
*acoolpointer = 15;
After the above code executes, 0xABCDEF will contain 15. I will explain why. 0xABCDEF cannot contain both values, so we'll have to examine this store (*acoolpointer = 15) closely.
The processor would fetch and pipeline the instructions (to move 10/15 into a register, then into [0xABCDEF]) at the same time. Of course, the location can't hold more than one value, so we'll have to split this instruction up.
We know computers are made up of bits, and in this simplified model moving a value takes 8 cycles (4 to move it into a register, 4 to move it into memory). Thus we observe the following:
- cycle one: send the instruction to the processor
- cycle two: seek the memory location of values
- cycle three: read the memory location
- cycle four: send to register
- cycle five: send the next instruction to the processor
- cycle six: read the register
- cycle seven: seek the new memory location
- cycle eight: write value to the new memory location
But wait!! We are trying to write two values simultaneously. In this model, the processor sets the memory on a byte-by-byte basis: it first clears the byte, then sets it. With two writers, the effect is a binary OR of the two values.
An int takes up four bytes (32 bits). This is how the new value at 0xABCDEF is built up:
clear byte 1: 0000 0000
after thread 1: 0000 0000
after thread 2: 0000 0000
clear byte 2: 0000 0000
after thread 1: 0000 0000
after thread 2: 0000 0000
clear byte 3: 0000 0000
after thread 1: 0000 0000
after thread 2: 0000 0000
clear byte 4: 0000 0000
after thread 1: 0000 1010
after thread 2: 0000 1111 (1111 (15) OR-ed with 1010 (10))
Therefore, after the memory has been written, 0xABCDEF will contain an int value of 15.
This can cause complications. Say a network program has states 0 (00 in binary), 1 (01 in binary), and 2 (10 in binary). A simple binary OR of 01 and 10 results in 11 (3 in decimal). However, there is no program state 3.
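The arithmetic of the clear-then-OR model above can be checked with a small C sketch. The function name modeled_simultaneous_write is hypothetical, and the code only reproduces the model described in this post; it makes no claim about how any particular processor actually resolves two colliding stores.

```c
#include <stdint.h>

/* Models the description above: each of the four bytes is cleared,
   then thread 1's byte and thread 2's byte are both OR-ed into it.
   This is an illustration of the model only, not of real hardware. */
uint32_t modeled_simultaneous_write(uint32_t value1, uint32_t value2)
{
    uint32_t result = 0;

    for (int byte = 0; byte < 4; byte++) {
        uint32_t mask = (uint32_t)0xFF << (8 * byte);

        result &= ~mask;          /* clear the byte   */
        result |= value1 & mask;  /* after thread 1   */
        result |= value2 & mask;  /* after thread 2   */
    }
    return result;
}
```

Under this model, writing 10 and 15 gives 15, while writing states 1 and 2 gives the nonexistent state 3.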
There are two solutions to this problem:
- Use a stack, so state changes are only processed one at a time.
- Place one thread into a loop until the other thread has finished changing the state.
The first solution can cause difficulties of its own: what if two threads simultaneously try to allocate memory, create a structure, and add it to the linked list? This could possibly result in an overflow, not to mention that the *next pointer in the linked list will become corrupt. The only way out of this problem is the second solution.
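To make the linked-list hazard concrete, here is a minimal sketch of how the insertion could be guarded by making one thread loop until the other has finished. It assumes a C11 compiler with <stdatomic.h>; the names state_node, push_state, and pending_states are hypothetical, and nodes are supplied by the caller so that no allocator is involved.

```c
#include <stdatomic.h>
#include <stddef.h>

struct state_node {
    int state;
    struct state_node *next;
};

static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static struct state_node *list_head = NULL;

/* Add a pending state change to the front of the list.
   Without the lock, two threads could both read the same list_head
   and one of the two nodes would be lost from the list. */
void push_state(struct state_node *node, int state)
{
    node->state = state;

    /* Loop until the other thread has released the list. */
    while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire)) {
        /* spin */
    }
    node->next = list_head;
    list_head = node;
    atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* Count the queued state changes, holding the same lock while walking. */
size_t pending_states(void)
{
    size_t count = 0;

    while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire)) {
        /* spin */
    }
    for (struct state_node *n = list_head; n != NULL; n = n->next) {
        count++;
    }
    atomic_flag_clear_explicit(&list_lock, memory_order_release);
    return count;
}
```

Only the two pointer updates sit inside the locked region, so a waiting thread should spin for just a handful of instructions.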
The second solution is to call the scheduler and have it place the thread in a loop until the other thread has finished. The dilemma of which thread should wait for the other is removed by having the thread manager keep a list of threads and their states. When a thread calls the loop function, a well-designed thread manager should know which thread called it and set its state to paused. The error of two threads writing to the same memory location is eliminated, since each thread's state is stored in a different location. The thread itself keeps looping for as long as its state is set to paused. The thread manager has another loop that goes through each thread and unpauses them one by one. This should ensure that the two threads fall out of step and do not try to write to the same memory location at once (the process of comparing a state, setting the state, comparing another state, and setting that state puts one thread a few cycles behind the other, enough to eliminate the error).
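The scheme described above could look roughly like the following sketch. It assumes C11 atomics; MAX_THREADS and the function names mark_paused, wait_while_paused, and unpause_next are hypothetical and only illustrate the idea of one state slot per thread plus a manager that releases paused threads one at a time.

```c
#include <stdatomic.h>

#define MAX_THREADS 4   /* hypothetical size of the thread table */

enum thread_state { THREAD_RUNNING = 0, THREAD_PAUSED = 1 };

/* One slot per thread, so no two threads ever write the same slot.
   Static storage starts out zeroed, i.e. every thread is RUNNING. */
static _Atomic int thread_states[MAX_THREADS];

/* Called on behalf of a thread that must back off. */
void mark_paused(int id)
{
    atomic_store(&thread_states[id], THREAD_PAUSED);
}

/* The paused thread spins here until the manager clears its slot. */
void wait_while_paused(int id)
{
    while (atomic_load(&thread_states[id]) == THREAD_PAUSED) {
        /* busy-wait; a real kernel might yield to the scheduler here */
    }
}

/* Manager side: release the lowest-numbered paused thread.
   Returns the id that was released, or -1 if nothing was paused. */
int unpause_next(void)
{
    for (int id = 0; id < MAX_THREADS; id++) {
        if (atomic_load(&thread_states[id]) == THREAD_PAUSED) {
            atomic_store(&thread_states[id], THREAD_RUNNING);
            return id;
        }
    }
    return -1;
}
```

Because unpause_next releases a single thread per call, the manager's loop decides the order in which the paused threads resume.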
As far as I know, however, mainstream processors such as x86 do not raise any interrupt when two threads store to the same location, so the operating system cannot rely on a warning from the hardware and has to arrange the unsynchronising loops detailed above itself.
This seems like a lot of work, but it is EXTREMELY unlikely that two threads will try to write to the same memory location at exactly the same time, so your operating system will rarely have to do anything. However, if you are using hardware threads, there is a chance that it could happen, and you should always have a handler for these events. I find that this is a feature that is often overlooked.