Until now, programmers have relied on Moore’s law to speed up their sequential programs with each new generation of processors. But due to physical limitations in chip design, the programmers’ “free lunch” is over: multi-core architectures – multiple CPUs on a single chip – bring improved power efficiency, while each individual core may be slower than the latest single-core processor when running traditional applications. This means the performance of sequentially written applications may not just stagnate – they might even slow down. Rajesh Karmani discusses challenges, dead-ends and possible directions for developers.
One important question, of course, is what kinds of applications we can and need to parallelize. How much can we parallelize our word processor or our virus scanner? Scientific applications, simulations and numerical computations have been the focus of the parallel computing community for some time. But as parallel platforms (like multi-core CPUs) become mainstream, mainstream applications and their programmers have to rise to the challenge. That is an interesting discussion in itself and demands a separate article, so I will leave it aside for later.
Unfortunately, the problem gets worse when chips with hundreds of cores (sometimes called manycore) are rolled out, as is widely predicted. This has two implications for programmers:
1) Programs will be expected to continue to scale (run faster) as the number of cores increases.
2) Programs based on the shared memory model will suffer serious bottlenecks on the memory bus.
Shared-memory concurrent programming (in Java)
Java is a widely used programming language. It supports concurrent programming through threads, but its concurrency features feel like an add-on rather than the product of setting out to design an elegant concurrent programming language. This shows up in quite a few of its constructs. For example, consider the following statement in Java:

x++;
Depending on whether x is a local variable or a class/instance variable, this statement compiles into one or three bytecode instructions. Crucially, when x is an instance variable, the sequence of three instructions reads the value from the heap, increments it, and writes it back to the heap. If this code is called from two different threads, there is a small window in which the two sequences can interleave so that only one increment is observed instead of two, which is incorrect. Of course, the Java memory model specification tells us what can go wrong, but I am arguing for constructs and abstractions which are simple, intuitive and elegant.
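To make that lost-update window concrete, here is a small self-contained sketch (the class and field names are mine, purely for illustration). Two threads hammer a plain int field with ++ and, for contrast, an AtomicInteger whose increment is a single atomic operation:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates the lost-update race behind "x++" on a shared field,
// and one fix: java.util.concurrent.atomic.AtomicInteger.
public class LostUpdate {
    public static int unsafeCounter = 0;                      // racy read-modify-write
    public static final AtomicInteger safeCounter = new AtomicInteger();

    public static void run() throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCounter++;                  // three steps: may lose increments
                safeCounter.incrementAndGet();    // one atomic step: never loses one
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
    }

    public static void main(String[] args) throws InterruptedException {
        run();
        // safeCounter is always 200000; unsafeCounter is frequently less.
        System.out.println(unsafeCounter + " vs " + safeCounter.get());
    }
}
```

Running this a few times will usually show the plain counter coming up short, while the atomic counter always lands on 200000.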
Similarly, the locking mechanism in Java is bug-prone, as illustrated by a commonly cited example. This piece of code was part of the StringBuffer class:
public synchronized StringBuffer append(StringBuffer sb) {
    int len = sb.length();
    // ... expand the internal buffer if needed, then:
    sb.getChars(0, len, value, count);  // len may be stale by now
    ...
}
Although the method is declared synchronized, another thread can modify the sb object after the first statement so that its length shrinks below len. When getChars is then called on sb with the old value of the length, the program raises an exception.
Once again, the semantics of synchronized methods in Java are not intuitive enough. On entering a synchronized method, the runtime obtains a lock on the object on which the method is being called. These semantics guarantee isolation of this block of code (the method body) with respect to other blocks holding the lock on the same object. This raises multiple problems: what if the programmer forgets to obtain the lock somewhere, and what about the consistency of other objects (sb here) that are accessed (and modified) inside the synchronized method?
To fix this code, the programmer should have obtained an additional lock on sb using the synchronized block construct:
public synchronized StringBuffer append(StringBuffer sb) {
    synchronized (sb) {
        int len = sb.length();
        // ... sb can no longer change between length() and getChars() ...
    }
    return this;
}
But things are not as simple as they seem. Such an implementation can result in deadlock. Consider the following two threads:

// Thread 1:
sbObj.append(sb);

// Thread 2:
sb.append(sbObj);
There is a window in which Thread 1 obtains the lock on sbObj and Thread 2 the lock on sb. Each thread then waits forever for the other to release the lock it needs before it can proceed. In fact, the deadlock can occur even without our modification, since the methods length() and getChars() are themselves declared synchronized in StringBuffer. For a more detailed discussion of this problem, see references 6 and 7 below.
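One classic way out of this particular trap is to always acquire the two locks in a globally consistent order, so no cycle of lock requests can form. The sketch below (class and method names are mine, not from StringBuffer) orders the two buffers by their identity hash codes before locking:

```java
// Sketch: avoid the append/append deadlock by locking the two buffers
// in a fixed global order. Caveat: identityHashCode ties are possible
// (though rare); a production version needs a tie-breaker.
public class OrderedBuffer {
    private final StringBuilder data = new StringBuilder();

    public synchronized void add(String s) {
        data.append(s);
    }

    public void append(OrderedBuffer other) {
        OrderedBuffer first = this, second = other;
        if (System.identityHashCode(first) > System.identityHashCode(second)) {
            first = other;
            second = this;
        }
        synchronized (first) {          // every thread locks in the same order,
            synchronized (second) {     // so no circular wait can arise
                data.append(other.data);
            }
        }
    }

    public synchronized String contents() {
        return data.toString();
    }
}
```

With this ordering, two threads running a.append(b) and b.append(a) concurrently contend for the same lock first, so one simply waits for the other instead of deadlocking.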
To conclude, shared-memory concurrent programs can have data races, which are conflicting accesses to shared variables. Low-level mechanisms such as the locks and semaphores provided in languages like C/C++/Java to prevent data races put too much burden on programmers, and can still lead to errors like deadlock. Reference 8 details the problems with shared memory programs.
Software transactions
Software transactions are an elegant mechanism proposed to reduce the complexity of concurrent programs in the shared memory model. The idea is to annotate blocks of code that are required to execute in isolation with a keyword such as atomic or transaction. The onus is then on the compiler or runtime to enforce these semantics, using one of a few different mechanisms.
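As a sketch of the intended programming model, the append method from earlier might be written as follows. Note that the atomic keyword is hypothetical – Java has no such construct – and making the body appear isolated becomes the runtime’s job, not the programmer’s:

```
// Hypothetical syntax -- not legal Java.
public StringBuffer append(StringBuffer sb) {
    atomic {
        int len = sb.length();
        // ... copy len characters from sb into this buffer ...
    }
    return this;
}
```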
A pessimistic approach is to replace the atomic construct with lock-acquisition statements in a way that guarantees freedom from deadlock. Although this reduces the burden on the programmer by taking care of the locks, it has certain disadvantages:
1) The approach is based on static code analysis and tends to over-approximate the set of locks it needs to obtain. Hence, it restricts the potential parallelism in the application.
2) It cannot guarantee isolation of atomic blocks with respect to non-atomic blocks. So, if the programmer forgets an atomic annotation, inconsistencies can result.
3) Composing separately compiled code can still cause deadlocks.
Another approach taken by researchers is based on the conjecture that conflicts are timing-dependent and usually rare in practice. Atomic blocks are therefore executed optimistically, and before the results are committed to shared memory, the runtime validates that there was no conflict with other code. This approach is called transactional memory (TM); it can be implemented in software (STM), in hardware (HTM), or both (hybrid). If a conflict is detected, the transaction is rolled back and executed again. Although TM can provide speed-ups in some cases, it can perform poorly in others. Some of its disadvantages are:
1) It is difficult to roll back system and I/O calls, so these cannot be part of atomic blocks.
2) It can perform poorly when conflicts are frequent, due to repeated roll-backs.
3) There is a very high overhead in ensuring isolation with respect to non-atomic code. Combined with easily misunderstood constructs in the language (such as the increment operator discussed above), the behavior of the program is not very intuitive.
Software transactions remain an active research area because of the simplicity of their semantics, but many questions remain unanswered at the implementation level. A good introduction can be found in references 3 and 5. On the other hand, a heated debate on STM versus other models took place on Lambda the Ultimate (LtU)4.
The mainstreaming of concurrent programming has woken up language designers. Efforts such as software transactions, library support in Java5 and Intel’s TBB are elegant workarounds, but I would argue they are essentially patch-work. If multi-cores follow Moore’s law, the number of cores on a chip will grow exponentially and there will be serious bottlenecks at the shared memory bus. I believe programming to a shared memory model will not remain feasible. Moreover, in order to continue their “free lunch” in the multicore era (scalability with the number of cores instead of GHz), developers may well have to jilt their current love. But there is plenty of room for optimism – we have yet to discuss message-passing models :-)
The Java Memory Model. Jeremy Manson, William Pugh and Sarita Adve. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2005), Long Beach, California, January 2005.
“Beautiful Concurrency”. Simon Peyton Jones. Chapter in “Beautiful Code”, edited by Greg Wilson, O’Reilly, 2007.
Static Deadlock Detection for Java Libraries. Amy Williams, William Thies and Michael D. Ernst. In Proceedings of the 2005 European Conference on Object-Oriented Programming (ECOOP), Lecture Notes in Computer Science, Springer-Verlag, July 2005.
The Problem with Threads. Edward A. Lee. EECS Department, University of California, Berkeley, Technical Report No. UCB/EECS-2006-1.
Disclaimer: The views expressed in this article are the author’s personal views and do not represent those of the University of Illinois, or the UPCRC at the University of Illinois.
About the author: The author is a graduate student in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He is a recipient of the Sohaib and Sarah Abbasi Fellowship. His current area of interest is programming languages and software engineering. Previously, he has worked in wireless sensor networks and multi-agent systems.
Share your thoughts by writing a comment below!