Research Proposal
I submitted my research proposal today. To be honest, it isn't really well written, but I was told not to spend too much time on it. So here it goes:-
Hardware Supported Software Transactional Memory
Moore’s law is dead – or at least that is what some experts (including Gordon Moore himself) have been saying for a while. While it is true that many of the claims of the demise of Moore’s law are exaggerated, it does seem that the trend as drawn by Moore’s law is slowing down, and that sooner or later it will not hold any more. [1]
Due to this and other economical considerations, the idea of multiprocessor and multi-threading machines is becoming more attractive to processor manufactures. That way, instead of trying to speed up a single processor, all that has to be done is to add more processors to a machine in hopes of getting it to work faster.
The main problem with this scheme is how to get these processors to actually work faster when there are shared resources involved. Like coordinating humans working on similar tasks that involve shared resources, it is difficult to coordinate the work of more than one processor to effectively share resources while avoiding conflicts.
The way conflicts are usually avoided is by designating certain portions of the code where conflicts may arise as critical sections. These critical sections are protected by using locks which prohibit more than one running process to enter this section thus eliminating any conflicts that may arise. [2]
While critical sections do eliminate such conflict, they create problems of their own. These problems include priority inversion, where a lower priority thread would gain precedence over a higher priority one because it managed to acquire the lock first, thus forcing the higher priority one to wait for it until it is done. Moreover, with the use of locks problems such as deadlocks are always an issue. To complicate things further, if a thread holds a lock and for some reason is terminated, then the lock will remain held indefinitely, thus keeping all other threads from ever accessing this critical section or any other critical section from accessing the data guarded by that lock. [3]
Another consideration when using locks is the trade-off between performance and ease of programming. Writing a multi-threaded program is a difficult task as it is, and having to use locks makes it even more complicated. It would be easier if programmers would just use coarse-grained locks and not have to worry about any potential conflicts within their code, but this would result in more serialized code which runs slowly. If they were to use finer grained locks then that would make the program more difficult to reason about. [2]
The database community has solved this problem long ago by introducing the concept of a transaction. A transaction is essentially a series of operations that appear to execute instantaneously, as far as the outside world is concerned. If a transaction fails, it appears to the outside world as if it had never executed to begin with – in other words the state of the world (or the resources said transaction touched) is never in an inconsistent state. [4]
The concept of transactional memory was first introduced to the computer architecture community by Herlihy and Moss. Their implementation was done completely in hardware which suffered from the need to add new instructions and adding a special cache or buffer to keep track of all the changes [3]. Rajwar and Goodman further refined the concept of hardware transactions in their work on Speculative Lock Elision (SLE) and Transactional Lock Removal (TRL). Their solution required no new instructions to be added to the architecture, and utilized certain hardware properties that are already implemented within most modern high performance processors to do most of the work [2]. However, a drawback to most hardware transactional memory implementations is the fact that transactions are limited by how much physical memory or buffering space is allocated for their use. Moreover, handling external events such as operating system context switching can prove to be problematic [5].
On the other hand, transactional memory was also implemented in software. Such an implementation was first introduced by Shavit and Toutoui [6] and then expanded upon by many others such as Herlihy et al in their work on Dynamic Software Transactional Memory [7]. Implementing transactions in software removed the spatial and temporal limitations imposed by hardware, yet the problem is that software implementations are inherently much slower than hardware implementations.
It has been noticed however that most critical sections are short and require little resources to run. Yet the few that actually exceed these resources can impose a significant performance penalty or worse, never succeed at all [8]. Given these observations, what we are hoping to achieve is to have a solution which makes the common case fast, by running it in hardware. When it fails to run in hardware, this transaction should fall back gracefully and run as a software transaction.
Another area that we might investigate is optimizing the performance of software transactional memory by having it take advantage of certain hardware facilities such as SLE if available, yet run correctly (albeit slower) if not. This would make the concept of transactions widely implemented on all architectures, with the ones supporting it running faster.
“He who every morning plans the transaction of the day and follows out that plan, carries a thread that will guide him through the maze of the most busy life. But where no plan is laid, where the disposal of time is surrendered merely to the chance of incidence, chaos will soon reign.”
Victor Hugo
French dramatist, novelist, & poet (1802 - 1885)
References
[1] M. Dubash, "Moore's Law is dead, says Gordon Moore," in Techworld, 2005.
http://www.techworld.com/opsys/news/index.cfm?NewsID=3477
[2] R. Ravi and R. G. James, "Transactional lock-free execution of lock-based programs," in Proceedings of the 10th international conference on Architectural support for programming languages and operating systems. San Jose, California: ACM Press, 2002.
[3] M. Herlihy and J. E. B. Moss, "Transactional memory: architectural support for lock-free data structures," in Proceedings of the 20th annual international symposium on Computer architecture. San Diego, California, United States: ACM Press, 1993.
[4] T. Kyte, Expert One-on-One Oracle, 1st Edition ed: Apress, 2003.
[5] R. Ravi, H. Maurice, and L. Konrad, "Virtualizing Transactional Memory," in Proceedings of the 32nd Annual International Symposium on Computer Architecture: IEEE Computer Society, 2005.
[6] N. Shavit and D. Touitou, "Software transactional memory," in Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing. Ottowa, Ontario, Canada: ACM Press, 1995.
[7] H. Maurice, L. Victor, M. Mark, and William N. Scherer, III, "Software transactional memory for dynamic-sized data structures," in Proceedings of the twenty-second annual symposium on Principles of distributed computing. Boston, Massachusetts: ACM Press, 2003.
[8] C. Scott Ananian, K. Asanovi'c, B. C. Kuszmaul, C. E. Leiserson, and S. Lie, "Unbounded Transactional Memory," presented at 11th International Symposium on High-Performance Computer Architecture (HPCA'05), San Franscisco, California, 2005.
Hardware Supported Software Transactional Memory
Moore’s law is dead – or at least that is what some experts (including Gordon Moore himself) have been saying for a while. While it is true that many of the claims of the demise of Moore’s law are exaggerated, it does seem that the trend as drawn by Moore’s law is slowing down, and that sooner or later it will not hold any more. [1]
Due to this and other economical considerations, the idea of multiprocessor and multi-threading machines is becoming more attractive to processor manufactures. That way, instead of trying to speed up a single processor, all that has to be done is to add more processors to a machine in hopes of getting it to work faster.
The main problem with this scheme is how to get these processors to actually work faster when there are shared resources involved. Like coordinating humans working on similar tasks that involve shared resources, it is difficult to coordinate the work of more than one processor to effectively share resources while avoiding conflicts.
The way conflicts are usually avoided is by designating certain portions of the code where conflicts may arise as critical sections. These critical sections are protected by using locks which prohibit more than one running process to enter this section thus eliminating any conflicts that may arise. [2]
While critical sections do eliminate such conflict, they create problems of their own. These problems include priority inversion, where a lower priority thread would gain precedence over a higher priority one because it managed to acquire the lock first, thus forcing the higher priority one to wait for it until it is done. Moreover, with the use of locks problems such as deadlocks are always an issue. To complicate things further, if a thread holds a lock and for some reason is terminated, then the lock will remain held indefinitely, thus keeping all other threads from ever accessing this critical section or any other critical section from accessing the data guarded by that lock. [3]
Another consideration when using locks is the trade-off between performance and ease of programming. Writing a multi-threaded program is a difficult task as it is, and having to use locks makes it even more complicated. It would be easier if programmers would just use coarse-grained locks and not have to worry about any potential conflicts within their code, but this would result in more serialized code which runs slowly. If they were to use finer grained locks then that would make the program more difficult to reason about. [2]
The database community has solved this problem long ago by introducing the concept of a transaction. A transaction is essentially a series of operations that appear to execute instantaneously, as far as the outside world is concerned. If a transaction fails, it appears to the outside world as if it had never executed to begin with – in other words the state of the world (or the resources said transaction touched) is never in an inconsistent state. [4]
The concept of transactional memory was first introduced to the computer architecture community by Herlihy and Moss. Their implementation was done completely in hardware which suffered from the need to add new instructions and adding a special cache or buffer to keep track of all the changes [3]. Rajwar and Goodman further refined the concept of hardware transactions in their work on Speculative Lock Elision (SLE) and Transactional Lock Removal (TRL). Their solution required no new instructions to be added to the architecture, and utilized certain hardware properties that are already implemented within most modern high performance processors to do most of the work [2]. However, a drawback to most hardware transactional memory implementations is the fact that transactions are limited by how much physical memory or buffering space is allocated for their use. Moreover, handling external events such as operating system context switching can prove to be problematic [5].
On the other hand, transactional memory was also implemented in software. Such an implementation was first introduced by Shavit and Toutoui [6] and then expanded upon by many others such as Herlihy et al in their work on Dynamic Software Transactional Memory [7]. Implementing transactions in software removed the spatial and temporal limitations imposed by hardware, yet the problem is that software implementations are inherently much slower than hardware implementations.
It has been noticed however that most critical sections are short and require little resources to run. Yet the few that actually exceed these resources can impose a significant performance penalty or worse, never succeed at all [8]. Given these observations, what we are hoping to achieve is to have a solution which makes the common case fast, by running it in hardware. When it fails to run in hardware, this transaction should fall back gracefully and run as a software transaction.
Another area that we might investigate is optimizing the performance of software transactional memory by having it take advantage of certain hardware facilities such as SLE if available, yet run correctly (albeit slower) if not. This would make the concept of transactions widely implemented on all architectures, with the ones supporting it running faster.
“He who every morning plans the transaction of the day and follows out that plan, carries a thread that will guide him through the maze of the most busy life. But where no plan is laid, where the disposal of time is surrendered merely to the chance of incidence, chaos will soon reign.”
Victor Hugo
French dramatist, novelist, & poet (1802 - 1885)
References
[1] M. Dubash, "Moore's Law is dead, says Gordon Moore," in Techworld, 2005.
http://www.techworld.com/opsys/news/index.cfm?NewsID=3477
[2] R. Ravi and R. G. James, "Transactional lock-free execution of lock-based programs," in Proceedings of the 10th international conference on Architectural support for programming languages and operating systems. San Jose, California: ACM Press, 2002.
[3] M. Herlihy and J. E. B. Moss, "Transactional memory: architectural support for lock-free data structures," in Proceedings of the 20th annual international symposium on Computer architecture. San Diego, California, United States: ACM Press, 1993.
[4] T. Kyte, Expert One-on-One Oracle, 1st Edition ed: Apress, 2003.
[5] R. Ravi, H. Maurice, and L. Konrad, "Virtualizing Transactional Memory," in Proceedings of the 32nd Annual International Symposium on Computer Architecture: IEEE Computer Society, 2005.
[6] N. Shavit and D. Touitou, "Software transactional memory," in Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing. Ottowa, Ontario, Canada: ACM Press, 1995.
[7] H. Maurice, L. Victor, M. Mark, and William N. Scherer, III, "Software transactional memory for dynamic-sized data structures," in Proceedings of the twenty-second annual symposium on Principles of distributed computing. Boston, Massachusetts: ACM Press, 2003.
[8] C. Scott Ananian, K. Asanovi'c, B. C. Kuszmaul, C. E. Leiserson, and S. Lie, "Unbounded Transactional Memory," presented at 11th International Symposium on High-Performance Computer Architecture (HPCA'05), San Franscisco, California, 2005.
0 Comments:
Post a Comment
<< Home