TM Research Journal

Friday, March 31, 2006

DSTM with GC is Working

Yay!

I finally got DSTM working with garbage collection under Solaris. I almost cried tears of joy! Fortuantely, since I abided by the ANSI C standard, I didn't have much trouble porting the code to Solaris.

For the question on whether to use or not to use garbage collection. What I had in mind what to test the code which is running correctly with garbage collection, and compare it to the leaky manual memory allocation code then see which version is performing better.

If the leaky version performed better than the garbage collected one, then it would be worthwhile trying to figure out where the leaks were and fix them. But if the garbage collected one performed better, then it would be pointless to fix the leaks but we should just stick to the garbage collected one.

So, what results did I get from my tests? (I should note that the manual memory allocation was was even relaxed a bit more to allow for more memory leaks in hopes of better performance).

I used my first DSTM test case which basically increments and decrements a counter equally (thus the counter should end up with the same value it started with). I tried it on both with 1000 increments and 1000 decrements. Then tried it again with 991000 increments and decrements. The result was that it performs signifinctly better (twice as much) with garbage collection rather than manual memory allocations. I was expecting better performance but this was way better than i thought.

What this means is that we should definitely proceed with the Garbage Collector based implementation.

Things to do: compare the performance of DSTM in Java vs C and see if there are any significant differences. I would think that the Java implementation might actually perform better than the C one simply because of the better Garbage Management available there.

Another thing to do: write a makefile to do all the compiling and building.

Results

Quick note of the system I was using: Sun One Studio 10 on a Solaris Sparc machine.

Manual Memory Management

1000 Increments and 990 Decrements
RSS (Memory used): 1064 kbytes
Time: 0:00

991000 Increments and 990990 Decrements
RSS (Memory used): 47736 kbytes
Time: 0:15

Garbage Collection

1000 Increments and 990 Decrements
RSS (Memory used): 1728 kbytes
Time: 0:00

991000 Increments and 990990 Decrements
RSS (Memory used): 1792 kbytes
Time: 0:08

Some useful commads that I used

To find out the running processses execution time as well as their memory utilization:-
ps -ly

To compile the code (for every file):-
cc -c test.c

To build the project with garbage collection:-
cc atomic.o locator.o readtable.o thread.o transaction.o contention.o readset.o test.o tobject.o utility.o -lgc -z muldefs -o test

The -z muldefs is to allow for the multiple definition of some global constants, and the -lgc is to load the garbage collector library (this should come AFTER all the source files)

Note: You definitely should create a makefile instead.

Why is Garbage Collection Faster

A couple of articles that might shed some light:-
http://www.memorymanagement.org/faq.html#delete.speed
http://www.memorymanagement.org/faq.html#gc.speed

Good Garbage Collection References:-
http://www.iecc.com/gclist/GC-faq.html
http://www.memorymanagement.org/faq.html
http://www.hpl.hp.co.uk/personal/Hans_Boehm/gc/

Read this paper:-
ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps

Thursday, March 30, 2006

Garbage Collection in Solaris cc

Since our target environment is Solaris, I thought I should use Solari's libgc which deals with garbage collection for c.

http://developers.sun.com/prodtech/cc/articles/libgc.html

I had a hard time geting it to work, was following the instructions and adding -lgc as a compiler flag. Finally, after a lot of blood, sweat and tears, it runs out that I should add the flag AFTER the source file! Go figure!

e.g. cc testmem.c -lgc -o testmem

http://docs.sun.com/source/819-3688/cc_ops.app.html

Eventually I got it working. It is doing garbage collection and all. The only thing which I need to ascertain whether it is thread-safe or not. If it's not, then I'm back to square one. I posted a question on Sun's forums and am waiting for an answer.

http://forum.sun.com/thread.jspa?threadID=29907

RSTM Paper

Had a quick look at the RSTM paper. Need to go through it again. What I noticed is that they do say that freeing memory that's not being used anymore is a hassle. Which is a problem I discovered the hard way.

Fixing the Memory Leaks

I haven't worked on fixing the memory leaks (for the reasons mentioned my previous post). But I would like to note that adding a reference counter to the user data (or objects) most probably will neccessitate adding one more level of indirection, which will generate more cache misses thus making performance even worse.

That was an argument in favour of using a garbage collector (or an argument in favour of me being lazy and not wanting to fix the leak :) )

Wednesday, March 29, 2006

Compare with and without GC

I just thought of something - I should not waste any more time trying to fix the memory leaks in DSTM until I get the GC version of DSTM working. After doing that, I should compare the performance of the GC version with the current leaky version.

If GC performs better than the current version, then fixing the memory leaks will definitely NOT improve performance, thus would be a moot point.

More reference counting?

After fixing the program as I mentioned in the last post, it worked fine (not immediately, had a couple of bugs to quash).

Yet from about 500,000 memory allocation I still had about 30 memory leaks. Why?

After working on this problem all yesterday and today, I realized that it had to do with the acutal user objects that are being referenced. This is too complicated to explain here but in certain cases, when a thread manages to advance a bit more than I'd anticipated, it would not delete an object it should delete (no way for it to know that it should).

To make a long story short, it seems that the only way around this problem is to use a reference counter for the objects. This way I would in effect be having a reference counter for every datastructure used in DSTM. Moreover, I believe that there are still a couple of memory leaks that I still have unaccounted for.

I am having a hard-time getting the garbage collector to work. Will continue working on that now.

I am not happy :(

Monday, March 27, 2006

I think I can see the light

Ok. It seems that I've almost got it done. My eyes hurt so I won't really write much.

Things I want to say:-

First of all, the solution to the memory management: C requires the programmer to manage their memory by themselves - so this is how I should do it. Whenever I open an object for write, it should be the programmer who's responsible for deallocating the memory for that object when they're done with it.

I started reading one of the papers Jim sent, here's an interesting quote from the abstract of the first one:

Most either employ locks in the underlying implementation or depend on thread-safe general-purpose garbage collection to collect stale data and metadata.


[Source: Lowering the Overhead of Nonblocking Software Transactional Memory, Marathe et al, 2006]

What this shows is that I beleive it will be really difficult to have a lock-free and garbage collector free STM implementation. But remember to investigate the CAS possibility.

To be honest, I haven't read the papers yet. I will read them tomorrow. You hear me Fuad, READ THE PAPERS TOMORROW! :)

DSTM: Found the Problem - Don't know the Solution

I figured out why reference counting the locators isn't working. The problem is, I don't know how to solve this problem.

Basically, when a thread opens an object for writing, and if it manages to compare and swap the start pointer, it decrements the number of references to the newly created locator (since the local reference is going to fall out of scope), and it is possible that in that instant when the number of references has been decremented to 1 (since the start pointer is still poiting to it), and before it actually returning the reference to the object that another transaction might jump in, abort this transaction and swap the pointer. This other transaction will see that the number of references to the locator is one, would decrement it since it it switched the start pointer, and clear it since there are no more pointers to it.

Hope the above made sense.

How to solve this problem?

First solution which comes to mind is to add reference counters to the objects. I fear though that this would require even more locks and that there would be even more unforseen problems here.

Another one would be to somehow (haven't thought this through yet) invalidate the old object (setting the pointer to null) - don't think this'll work though.

James mentioned that the paper Jim sent today deals with Garbage Collection and this sort of thing, so I guess that this afternoon I'll read those papers and maybe see if it is possible to come up with a better solution.

Peace Out!

Saturday, March 25, 2006

Garbage Collection is Inevitable

This is making me depressed. Today I realized that even if reference counting for Locators were to work properly, there is still a problem. Somehow, malloc sometimes crashes! Malloc! I honestly don't know what to do now other than use a garbage collector...

The fact that a garbage collector is required, to be honest, makes me question DSTM even more.

I would like it if Jim or James were to find out how the C++ version of DSTM (which we don't have access to) deals with garbage collection or memory allocation.

Friday, March 24, 2006

Meeting with Robert

We just metup with Robert Strandh since he's an expert when it comes to Garbage Collectors. It seems that the best thing we could do is use a Boehm-Demers-Weiser garbage collector (like the HP one I found the other day). The thing is there garbage collectors aren't incrementatl, they arctually stop the world when they do their stuff. Thus a Java GC would have actually been better.

Memory Leaks

Like I anticipated yesterday - there are memory leaks. Plus I discovered yet another portion of the code that needs to be locked.

The cause of the memory leak is obvious, but the solution isn't. What's causing the leak is when I increment the reference counter for the Locator objects, I do not decrement it if the object fails to aquire it. The simple solution might seem to be just to decrement it, but if I do decrement it later if I fail, all hell breaks loose. I don't know why exactly but judging by previous similar problems this indicates that I actually need to lock yet another portion of the code.

Another portion which had to be locked was where you get the committed version of the object from the locator. When that happens, you attempt to clear the old object from memory (since if it's committed then it's no longer valid). However, if two threads enter that portion of code simultaniously and attempt to clear that portion of code concurrently then we have a problem - which I'm afraid can only be solved by locking.

Nick Hay pointed out an interesting research paper about multi-threaded Garbage Collection at http://www.cs.technion.ac.il/~erez/Papers/cgc9.pdf . GC for multithreaded programs is not trivial. But Nick poitned out that Transactional Memory behaves in a certain predictable pattern, so a specific garbage collecotor for TM might be written.

Moreover, I was thinking that an interesting area of research would be hardware assisted garbage collectors. Just a thought.

All in all, my current DSTM implementation is looking ugly, and I am not sure whether I should proceed as it is or find a garbage collector.

You know what... I think for the time being I will trying to get it working with the minimum number of locks possible, then when that (hopefully) works, try to figure out a way to elude all these locks.

Moreover, investigating garbage collectors,, and maybe finding a simple one might be a good idea.

special case nick

Thursday, March 23, 2006

DSTM Memory Management

Ok. I've decided against using a garbage collector since it will involve using external libraries and I'd like the DSTM implementation to be completely independant of any library used.

Anyway, two problems I've had (and since have solved) are below:-

Clearning the transactions

I have used reference counting to keep track of the transactions like I mentioned in a previous post. But there were still some issues when it came time to clear the transactions. After a lot of work, it turns out that the reference counting is working fine - BUT if two threads get to the point where the number of references to the transaction is 0 at the same time, one will free the transaction and succeed, while the other will fail.

The way around this problem was to implement a check just before you attempt to free thr transaction, and if someone else has freed it then don't try to free it. That sim ple.

I wish I could say the same for the test though....

Clearning Old Locators

Now this one is bad. Really BAD.

The problem here is this. We have two threads which attempt to open an object for writing. Both ones point to the old locator, and both create a new locator which they attempt to install. Now let's say transaction A just pointed to the old locator and was switched out, while transaction B managed to switch the old locator to the new one, and delete the old locator.

Now the problem is, when transaction A continues working, the locator it is poiting to has its memory deallocated, and this will obviously lead to a runtime error. The problem is, there is no way for inform the first transaction that it has been deallocated.

In java, what DSTM does, is that it doesn't free the memory, and in time the transaction will realize that its not valid any more and the garbage collector will take care of all the problems.

In C, we have to manually do all the garbage collection though. The idea would be in this case to implement reference counting again to keep track of how many transactions are pointing to each object. Which is all well until you consider the following scenrario:

Transaction A points to the old locator, it sees that it's there so it attempts to increment the reference count of the locator. Before it actually does that. Transaction B manages to successfully swap the old locator with the new. It attempts to clear the locator but it checks the reference count beforehand to make sure that it should. Having checked the reference count, it finds that its 0 (since A hasn't actually updated it yet). So it deallocates it.

Now Transaction A is up and running again, it attempts to increment the transaction count (which points to a random place in memory now), and hell breaks loose.

The problem here is that a compare-and-swap will not work, since you do not know what you should compare with.

After a lot of thought, I realized that the best way to handle it would be, dare I say, a LOCK! Yes a lock. At first this idea seemed repulsive, since the whole point here is to avoid locks! But as I thought about it more, I realized that if each transactional object (TMObject) implements its own lock, and we just lock incrementing the reference counter as well as decrementing/deallocating then it will actually be at the finest level of granularity possible. In other words, such a lock will only serialize what really has to be serialzed, and it will not serialize the portions that are fine if they ran together.

Anyway, that's what I did. James doubts that Jim would be happy with this solution. But in my humble opinion, this solution is better than using an external library GC. Moreover, even if SLE was used for this particular portion a lock would actually be better since SLE will try, and in case of contention will fail and fall back to a lock. While here we know for a fact that it WILL fail so we just go ahead and lock it from the start.

Rogue Frees

The remaining problem now is that it seems that somewhere I am freeing something I shouldn't free. I do not know if I'm actually up to investigating this today or tomorrow.

Moreover, I have this nagging feeling that I might have a memory leak somewhere. This might be cause by (this will not make any sense to people not familiar with the code) by line 245 in tobject.c . This is just a reminder in case I forget it.

Tuesday, March 21, 2006

DSTM and Memory Management in C

I figured out the problem... the thing is it seems that the solution won't be pretty.

The problem basically is that when I wrote the program to run in C, I had to manually manage memory. I thought about memory management as a single threaded thing, and it was working fine back then.

Now though, when the program is actually multi-threaded, different pointers are mingling and interacting in ways I didn't except, and to be honest, even now, I cannot see how to be able to forsee every possible interaction. Thus from the looks of it, it seems that I need to implement garbage collection.

Of course I'm not gonna write my own collector since that's an involved process. I found one developed by some people at HP which seems to be quiet a respectable GC:
http://www.hpl.hp.co.uk/personal/Hans_Boehm/gc/

So, what I'm going to do now is try to get my DSTM working with a garbace collector. Once I do that I will see if it's possible for me to get rid of the collector.

DSTM Bug Yesterday

I need to write this down before lunch.

James found a bug in my DSTM implementation yesterday. As I mentioned earlier, to memory manage the trasnactions, each transaction has a counter of the number of objects poiting to it.

Now as each transaction aquires an object, this counter is incremented. While as each transaction has an object dissassociated with it, the counter is decremented.

Now this counter can only be incremented by one thread (since only one thread can assocaite objects with a transaction). What I didn't think of is that more than one thread can actually decrement the counter. This is how that could happen:

Let's say a transaction has more than one object assocaited with it, then commits. Now two different new transactions running under different threads could associate different objects that were assocaited with the old transaction, thus decrementing the counter simultaniously. If the decrement isn't atomic, this could cause problems.

This is easily solved by either using an atomic decrement instruction or an atomic compare and swap. (or maybe SLE)

DSTM - You, you, you make a grown man cry

I feel like I'm going to break down and cry!

I was really careful in coding my C DSTM implementation, made sure to follow proper programming practices. Tested each component individually.

Now when I try to run it as a multi-threaded program - it fails! I have no idea where to begin in debugging the program! I think I will go for lunch now, clear my head a bit and then come back with an attack plan...

Monday, March 20, 2006

DSTM Testing

Ok, so far it seems that my C DSTM is working. I need to devise some test scenarios and double check that it works on a single thread:-

Committed transaction with:
- a read and a write
- a read
- a write
- a read upgraded to a write
- a write upgraded to a read
- Nothing
- Several read and written objects with a mix of upgrades

Aborted transactions with the same.

Run more than one transaction in a row and see what values you get.

Verify that all the values are what you expect them to be.

Saturday, March 18, 2006

DSTM Obstacle Solved

I finally managed to solve that problem. The way I went about it was similar to reference couting. I just kept a counter of the number of objects referencing each transaction, and whenever that number would reach zero AND the transaction would be committed or aborted all the memory used by it would be freed.

There are no issues in terms of contention since each transaction is associated with only one thread.

The next step would be to find a proper memory leak tester and perform such a test. The one with Visual Studio isn't working properly for some reason (keeps crashing)!

Friday, March 17, 2006

DSTM C Coding Obstable

I have encountered an obstacle in the coding of DSTM in C. I thought taht maybe writing it down will help me see the big picture and hopefully get around it.

In DSTM, when a transaction begins, each thread creates an object of type Transaction. This transaction object keeps track of the current status of the transaction and whether it's Active, Committed or Aborted.

Now all objects point to the last transaction which has accessed them. When an object is opened for writing, if it is pointing to a committed or aborted transaction then access is granted.

The problem is, in Java, whenever a transaction is created the programmer doesn't worry about deallocating the memory its using since the garbage collector is responsible for doing that when no more objects or threads are pointing to that transaction. While in C, we should deallocate the transaction manually.

The question is, when do we deallocate it? If we deallocate it when the transaction commits, then all the objects that are committed and poiting to this transaction will point to some unknown place in memory. We cannot deallocate it when the object stops pointing to it since we don't know how many objects are pointing to it.

The first idea I have in mind to to create a write list, but after thinking about it more I thought that a counter for every transaction might be better.

Let's have an object counter for every transaction. Whenever an object points to it we increment the counter. Whenever an object stops pointing to it we decrement it. When the counter reaches zero we deallocate the transaction. A new trasnaction is always allocated by every thread at the start of a transaction.

I just realized that this is actually a method of garbage collection known as reference counting. In this case though, this will be more efficient than a generally garbage collected environment since it is limited to a certain portion.

Thursday, March 16, 2006

DSTM Coding Done

I have finished the code for DSTM in C. Now is the time to test it. I hope that there aren't any serious bugs, and if that works well, it should be done by tomorrow.

Monday, March 13, 2006

DSTM Contention Manager

I figured out the contention manager today - it was easier than I expected. I have to say things are progressign quiet smoothly with DSTM. Of course I have only tested tyhe modules and didn't actually test the thing as a whole yet. So it will get exciting when I have to do that.

I need to find a toold which checks for memory leaks and stuff like that. I am kind of uncomfortable with the fact that C doesn't perform any garbage collection (I'm so spoilt, I'm used to living without garbage collection but not any more). Anyway, I have to make suret hat it's 100% accurate it that sense.

Other than that, I had a look at the Hybrid Transactional Memory paper that James sent me, I need to reread it.

I attended Clark's Master's orientation - it was quiet informative.

Saturday, March 11, 2006

Nested Transactions for LogTM and DSTM Progress

I think it would be easy to extend LogTM to support a practically unlimited number of nested transcations - where if a transaction were to fail, you'd only roll back that one. Basically implement some sort of stack and markers in the nested transcation thing.

Expert one-on-one Oracle

Today I read certain chapters from Thomas Kyte's Expert one-on-one Oracle. Good book which talks about how Oracle works - with a chapter on transcations.

DSTM Progress
Code is going well. I need to understand the cotnention manager since that part hasn't been explained in the paper. I had a look at William's port to C, I believe that the main problem he had was (from what I could tell) lack of experience in C programming. I am not a C expert by any means, but C was the first programming langauage I worked with at Uni so I believe that that gives me an advantage.

Tomorrow (Sunday) or Monday I will work on finishing the code and understanding the contention manager. I believe that if the contention manager takes me a day to figure out, I should finish the port in a week.

Thursday, March 09, 2006

DSTM in C Tables

Yesterday and today I continued working on DSTM in C. The implementation is progressing slowly yet surely. Getting back to programming in C after working with C# for over 2 months full-time is more difficult than I thought. Having to worry about pointers, memory allocation, etc....

Anyway, to keep the readset, instead of using a Hashtable I created a normal array based table. I did it that way because if the table is small, due to better memory locality, arrays with linear search perform better than hashtables (due to the cash misses generated by hash tables). The implementation is flexibile enough though to allow it to be changed to a hashtable.

If things continue going the way they are, I hope to get this done in about a week from now. I am not really sure, I should (and will) do a better guesstimate tomorrow.

As a sidenote, fire alarm went off twice today - once in the morning and once in the afternoon. Had to leave our desks and all... very disruptive.

Tuesday, March 07, 2006

pthreads and DSTM in C

Not a very productive day. My computer broke down this morning and it took them a while to fix it.

Anyway, I managed to get pthreads working under Windows using MS Visual Studio (http://sourceware.org/pthreads-win32/). Why? To be honest, I have been using visual studio for about 6 years now and I am comfortable with it. Too lazy to learn how to use emacs or vi and gcc unless I have to. so plan is to develop DSTM using MS VS and then port it to SPARC from there.

Still don't know how to handle the version control thing of old and new objects in DSTM. Problem is, in Java DSTM just compares object - which isn't that straighforward in C. Maybe I'll implement a version number which gets incremented.

Phd

Talked to Cris Calude about the PhD program - need to work on that ASAP and need to prove that I have material worthy of a PhD. Will talk to Jim about that when he's back.

Monday, March 06, 2006

Logging Cache Overflows

The more I think of it, the more I feel that using SLE and logging cache overflows is the way to go...

DSTM in C

Today I started with implementing DSTM in C. From the looks of it, it doesn't seem so difficult. But a really brilliant student tried that before me and faced some problems. Thus I am trying to be really careful about it and do things carefully.

What I did today basically was re-read the paper, focusing on the data-structures used. I drew a diagram on my whtieboard of the whole thing (I hope no one erases it :) ), then I went through the Java DSTM code.

The question is, should I do a direct translation, or just re-implement DSTM with the help of the Java code?

If I translate, I have to figure out how to translate exceptions and things like constructors. If I re-implementing it, I could be reinventing the wheel and might fall into problems which the guys that implemented DSTM have already encountered and solved.

I have a feeling my approach is gonna be a mix of both. I need a better understanding of pthreads though. But I am planning on having this implementation be independant of what kind of threading is used (forks, pthreads or whatever it is that MS does....).

Thursday, March 02, 2006

Virtual Copies

An idea which was just inspired by James - how about we log the changes to the object (to be done by the hardware). And from the software's perspective have it appear as if you had two copies, an old one and a new one?

Milestones in Program Execution

A good thing to do, if we're mixing hardware and software TMs would be to have the hardware TM, after it fails (due to a context switch or an abort), handle the work it has done so far to the software TM.

The problem is, how should the software know where the hardware is at?

My half-baked idea, which I plan on pursuing later, is to have milestones. Like a statement which saves a milestone (number or something like that) every N statements. This when the work is handed to the software, we know where we should proceed from.

This half-baked idea also involves a log somehow which rolls back the last few statements to the milestone.

LogTM and SLE

I came up with this idea today - which appearantly James had actually thought of first, to implement a LogTM and SLE hybrid. Basically the processor would run with SLE as long as there are sufficient resoucres. When it runs out of resources, it only logs the memory values of the data items it evicts.

This scheme, as I first thought, would save a lot on aborts. Since all the abort would have to do is to rollback onthe they (hopefully) few log entries it creates (which should be significantly less than the ones with LogTM), and clear the cache. Moreover, it should reduce the overhead of logging.

The problem is though, the way LogTM does it, the values it saves in the log will most probably have been in the cache already, while the scheme I proposed here will definitely have to go to memory to fetch the old value to log. Thus it would end up being slower than LogTM.

Moreover, aborts are very infrequent, so there is no real need for optimising them.

The problem with LogTM is essentially that it cannot handle context switches. Plus James thinks that even though the idea is cool, the actual implementation is not elegant. I am not familair enough with LogTM yet to make any judgements personally...