Hi,
Colonel Kernel wrote:
Quote:
The "commit()" has "per accessed location" overhead.
Only if the policy is "pessimistic" (i.e. -- lazy writes). For an "optimistic" policy (i.e. -- eager writes), the overhead is on abort, not commit.
I discount all optimistic policies, as they tend to allow transactions to read inconsistent state (see this
Wikipedia page). While it's technically possible to detect this in exception handlers and recover (e.g. abort transactions if an exception occurs while a transaction is working on inconsistent data), it's messy and doesn't help if the code goes into an infinite loop.
Of course programmers could be very careful and (try to) avoid writing code that can't handle inconsistent state, but this introduces the possibility of bugs that result in race conditions that are very hard to find and debug. It just doesn't make any sense considering the main idea of STM is to make concurrency easier than traditional locking (rather than harder).
Colonel Kernel wrote:
Quote:
Is unlikely to allow programmers to mix languages
I disagree -- since most of the complexity is in the run-time environment, I think it would be possible for a cross-language environment like the CLR to support STM for any of the supported languages. The compilers would have to support it too, to some degree (it may be possible to implement a lot of the instrumentation at the step where native code is generated from IL).
I was thinking more of using GCC to compile some C and linking it with object files compiled by Delphi and NASM. I was also thinking of the current implementations of STM (which are mostly libraries, not run-time environments).
You are right though - it'd be technically possible for someone like Microsoft to create a suite of compatible compilers that all support the same STM implementation.
Colonel Kernel wrote:
Quote:
HTM - doesn't allow different performance trade-offs to be made by different libraries and/or different compilers.
While this is true, it may be possible to have different policies in the OS support code that goes along with the special HTM hardware... I haven't read enough about it to say for sure.
I guess it'd also be technically possible to have different policies in an OS's page-based STM support code too...
Colonel Kernel wrote:
Quote:
Is unlikely to be usable in practice without linear memory manager assistance.
Probably more than just the linear memory manager. What part of the OS maintains transaction state across interrupts, thread switches, and thread migrations?
I don't know, but whatever part of the OS does would probably just call the linear memory manager routines for shifting data from the CPUs tranactional cache to temporary pages of RAM (where the temporary pages of RAM would need to be handled by the linear memory manager during "transaction abort exceptions" and "transaction commit exceptions").
Colonel Kernel wrote:
Quote:
May never exist unless the CPU manufacturers consider HTM to be a profitable/marketable feature.
Given the massive industry-wide push into multi-core, and the realization that new techniques are required to make multi-threaded programming easier (necessary in order to sell more multi-core chips), I think Intel and AMD will be all over this as soon as the research has stabilized somewhat and reached some consensus.
I see 4 possibilities:
1) STM reaches a point where the performance improvements that HTM could bring aren't enough for CPU manufacturers to bother. Given that some of the papers have benchmarks showing STM outperforming traditional locking, I'd consider this likely.
2) Programmers just keep using traditional locking, and everyone forgets about STM/HTM for another 8 years (until some more researchers decide it's a good idea and write a new batch of papers about it that everyone forgets). This wouldn't really surprise me either (it's not like HTM, STM or SMP is new), and it fits the pattern (1989, 1995, 2003, ..., ?).
3) The industry shifts to something else that makes traditional locking and STM/HTM obsolete (e.g. automatic parallelization).
4) CPU manufacturers actually do implement HTM one day (possibly when the patent on the original HTM research expires).
Colonel Kernel wrote:
Besides the issue of conflicts, which IMO are both expensive and likely with this scheme, it has another big flaw: Only a certain region of the address space can be used for transactional memory. Programs that have large, complex data structures allocated in the heap that are shared by multiple threads would need extra support from the user-level heap allocator as well (to avoid "false conflicts", it would have to try and interleave "related" allocations across many different pages -- but it doesn't know which allocations are "related"!). The compiler and linker would also have to arrange to have global data loaded in the transactional area. Suddenly this technique doesn't look very transparent to the rest of the system.
I'm used to having one heap for process space and another heap for thread space. Transactional space would need at least another heap, but there's no reason transactional space can't have several more heaps (one for each set of related data). How processes manage their heaps is up to the processes (and to be honest, I have a tendancy to use statically allocated space with large sparse arrays, "allocation on demand" and explicit "freeLinearPage()" calls, with no heap managers at all).
The compiler and linker doesn't have to arrange for global data to be loaded into the transactional area. It works the same as thread space and the ".bss" section in process space - it's all just uninitialised data (conceptually filled with zeros).
Colonel Kernel wrote:
Quote:
It would also have different performance characteristics, but I wouldn't consider that a critical flaw (that'd be like refusing to consider interpretted languages because compiled code is faster, even though there's good reasons to use interpretted languages for some things).
If it's terribly slow under normal conditions, I would say it isn't really helping.
The overhead (currently) consists of:
- a page fault when a page is accessed for the first time
- a page fault, allocating a new page and copying 4 KB of data the first time a page is written to
- freeing pages (eventually) that were allocated the first time a page is written to
- checking page table entries for transactions when a transaction commits (only for transactions that modified data and weren't marked as "should abort" before the commit)
- possibly some lock contention (see note)
Note: the spinlock ensures that only one CPU can be messing with page table entries in (any thread's) transactional space at a time. It's used in the page fault handler (for any page faults caused by accesses to transactional space) and in the "commit()" code (only for transactions that modified data and weren't marked as "should abort" before the commit).
There's also overhead when transactions need to be retried. For my current algorithm, transactions that don't modify data never need to be retried, and when transactions that do modify data conflict the newest transaction is aborted.
Of course compiler-based STM has been improved by different researchers over time, and I'm still thinking about possible optimizations for page-based STM.
Colonel Kernel wrote:
Quote:
Of course just because we can't find papers talking about the page-based approach doesn't necessarily mean that they've never been written...
True, but given the efficiency of Google and the popularity of the topic, I don't find this to be very likely.
I'm not convinced that the infinite powers of Google extend to hardcopies sitting on shelves in university libraries that were never released in electronic form to the general public, or to work done by large commercial companies like Intel or IBM who may not want competitors to see their results (regardless of whether the results are actually useful or not). As for the papers that Google does list, I've probably only looked at 10% of them...
Cheers,
Brendan