I attended a talk tonight on Java concurrency presented by Stuart Halloway at the Northern Virginia JUG that provided a refresher on the java.util.concurrent package. Stuart is one of the founders of Relevance, author of Component Development for the Java Platform, a frequent speaker at technical symposiums, co-author of Rails for Java Developers, and a great technical speaker.

Stuart Halloway

Stuart spent the first few minutes telling us why he now focuses more on Ruby and Rails than on Java. Paraphrasing the title of a Java book written by his Relevance partner Justin Gehtland and Bruce Tate, Stuart says, “We can go even ‘betterer,’ ‘fasterer’ and ‘lighterer’ with some other technologies” like Rails and the Streamlined framework. However, when multithreading and concurrency are needed, Java way outshines the current state of Ruby and Rails, he said.

When considering multi-threading in order to increase the speed of a process, it is important to consider whether the slowness is due to the application being suspended while waiting for an external resource (e.g. a database, user input, disk), or whether the process is suspended while waiting for free CPU cycles. If the process is waiting for an external resource, Stuart said, the language and the number of CPUs won’t matter much. “Java, assembly language and PHP all wait at the same speed,” he said.

Stuart’s talk covered:

  • Threads
  • Tasks and scheduling
  • Locking
  • Concurrent collections
  • Alternatives to threads

The key point to remember about Java threads is that they share code, data, resources, and heap storage, while each thread has its own instruction pointer and stack. Threading isn’t often needed in server-side programming because components like EJBs and JEE container services abstract the multi-threading away from the developer. But threading is often needed when you need to:

  • Keep a user interface responsive (think Swing)
  • Take advantage of multiple processors in compute-heavy applications
  • Simplify code that would otherwise need to keep checking if other tasks need to be performed (implementing their own task-scheduling loop)

Before Java 1.5, the Java language used Thread objects as the main way to achieve concurrency. Developers would write a class that implements Runnable and pass an instance to a Thread. Two shortcomings of the Runnable interface are that its single method, run, doesn’t return anything and that it isn’t declared to throw an exception to indicate that something went wrong. “It’s completely wrong,” Stuart said.
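To make the Runnable shortcomings concrete, here is a minimal sketch of the pre-1.5 style. The SumTask class and its method names are made up for illustration and are not from Stuart’s slides. Because run can’t return a value, the result has to be passed out through a field and read after the thread finishes.

```java
// Pre-Java 5 style: run() returns nothing and declares no checked exception,
// so the result is handed back through a field. SumTask is a hypothetical example.
class SumTask implements Runnable {
    private final int[] numbers;
    private long result; // written by the worker thread, read after join()

    SumTask(int[] numbers) {
        this.numbers = numbers;
    }

    public void run() {
        long sum = 0;
        for (int n : numbers) {
            sum += n;
        }
        result = sum;
    }

    long getResult() {
        return result;
    }

    // Runs the task on its own thread and waits for it to finish.
    static long sumOnNewThread(int[] numbers) {
        SumTask task = new SumTask(numbers);
        Thread worker = new Thread(task);
        worker.start();
        try {
            worker.join(); // after join() returns, the worker's write to result is visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        }
        return task.getResult();
    }

    public static void main(String[] args) {
        System.out.println(sumOnNewThread(new int[] {1, 2, 3, 4})); // prints 10
    }
}
```

Note that if run had failed partway through, the caller would have no direct way to find out. It would just read whatever happened to be in the field.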

Java 1.5 introduced higher-level classes to allow more abstraction away from Thread objects. It introduced the Callable interface, whose call method does return something and is declared to throw an Exception. Programmers write Callable classes and pass instances to an ExecutorService, obtained from the Executors factory class or perhaps from an external library. The Executors factory provides three kinds of ExecutorService: single-threaded execution, and execution by two types of thread pools, either a cached, expandable thread pool or a fixed-size thread pool.

When you give a Callable to an ExecutorService, you get back a Future object that will hold the outcome of the Callable’s execution. The outcome can be a result object or an exception, which is thrown when you ask the Future for the result. Stuart demonstrated code that exercised the new threading objects and showed how to use them. The code and the slides from his presentation are available online.
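The Callable, ExecutorService, and Future pieces fit together roughly like this. This is my own sketch rather than Stuart’s demo code; the WordCount and Failing tasks and the describe helper are hypothetical names. It uses a fixed-size pool, but Executors.newSingleThreadExecutor() or Executors.newCachedThreadPool() could be swapped in.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallableDemo {
    // A task that returns a value (hypothetical example).
    static class WordCount implements Callable<Integer> {
        private final String text;

        WordCount(String text) {
            this.text = text;
        }

        public Integer call() {
            return text.trim().split("\\s+").length;
        }
    }

    // A task that fails; call() is allowed to throw a checked exception.
    static class Failing implements Callable<Integer> {
        public Integer call() throws Exception {
            throw new Exception("simulated failure");
        }
    }

    // Submits the task and reports either its result or its failure.
    static String describe(Callable<Integer> task) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> future = pool.submit(task);
            return "result: " + future.get(); // blocks until the task completes
        } catch (ExecutionException e) {
            // The task's own exception is wrapped and available as the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new WordCount("Java waits at the same speed"))); // prints result: 6
        System.out.println(describe(new Failing())); // prints failed: simulated failure
    }
}
```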

The Need for Locking

You don’t need locks if you’re just telling separate tasks to run concurrently. You need locking code when multiple threads access the same data at the same time. Java provides lock support with the:

  • synchronized keyword and blocks
  • Java 1.5 Lock interface objects, which offer an improvement over a straight synchronized block because you can tell the code to give up its attempt to acquire a lock after a timeout period expires.
  • ReadWriteLock interface, which offers separate locks for reading the data and for altering it.

If you want, you can tweak how the ReadWriteLock operates, such as defining whether readers or writers get lock priority.
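As a rough illustration of the last two options, the sketch below guards a balance with a ReentrantReadWriteLock: reads take the read lock, and writes try for the write lock but give up after a timeout. The Account class and its method names are assumptions of mine, not code from the talk.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example class, not taken from the presentation.
class Account {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Lock readLock = rw.readLock();
    private final Lock writeLock = rw.writeLock();
    private int balance;

    // Many readers may hold the read lock at the same time.
    int balance() {
        readLock.lock();
        try {
            return balance;
        } finally {
            readLock.unlock();
        }
    }

    // Returns false instead of blocking forever if the write lock
    // cannot be acquired within the timeout.
    boolean deposit(int amount, long timeoutMillis) {
        try {
            if (!writeLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            balance += amount;
            return true;
        } finally {
            writeLock.unlock();
        }
    }

    // Demonstrates the timeout: this thread holds the read lock while
    // another thread attempts a write with a 50 ms limit.
    static boolean depositWhileReaderHoldsLock() {
        final Account account = new Account();
        final boolean[] deposited = new boolean[1];
        account.readLock.lock();
        try {
            Thread writer = new Thread(new Runnable() {
                public void run() {
                    deposited[0] = account.deposit(10, 50);
                }
            });
            writer.start();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            account.readLock.unlock();
        }
        return deposited[0];
    }

    public static void main(String[] args) {
        Account account = new Account();
        System.out.println(account.deposit(25, 100));      // prints true
        System.out.println(account.balance());             // prints 25
        System.out.println(depositWhileReaderHoldsLock()); // prints false
    }
}
```

With a plain synchronized block, the writer in the last case would simply wait until the reader let go. The timed tryLock is what lets it walk away.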

    Concurrent Collection Options

    Strategies for concurrent collections and their implications:

    • Do nothing

      It’s fast, simple, but not thread-safe

    • Fail-fast iterators (introduced in Java 1.2)

      Fast, not thread-safe. Misuse of concurrent access probably will cause a fast failure. Fail-fast uses optimistic locking: It assumes everyone can access a shared resource and uses clean-up code if something goes wrong with multi-threaded writes. Java collections implement the fail-fast strategy by keeping version numbers that iterators check to see whether the collection has changed.

    • Lock the entire collection: Simple, slow, might be thread-safe (like Hashtable).
    • Lock partial collection: Complex, maybe faster, maybe thread-safe.
    • Copy on write: Fast read access, may read stale data. When you write to a collection, you get a new copy, so your write can proceed. Iterators held by reading threads point to the older collection, so their data can be stale.
    • Immutable: Fast, simple, thread-safe, cannot change objects.
    • Application-controlled locking: Difficult, allows any combination of the above strategies.
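The fail-fast behavior is easy to see even without a second thread. In this small sketch of my own (the FailFastDemo name is made up), the list is changed while a for-each loop is iterating over it, and the iterator notices that the list’s version number has moved. The JDK documents this check as best-effort, so it is a debugging aid rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical single-threaded demonstration of a fail-fast iterator.
class FailFastDemo {
    // Returns true if the iterator detected the modification.
    static boolean modifyWhileIterating() {
        List<String> names = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        try {
            for (String name : names) {
                if (name.equals("a")) {
                    names.add("d"); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the next call to the iterator saw the changed version number
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // prints true
    }
}
```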

    Java Collections Design Choices

    Collections and strategies

    • Legacy (pre-Java 1.2): Lock entire collection
    • Collections (1.2) API: Lock none, fail-fast iterators
    • Synchronized wrappers (1.2): Lock entire collection
    • ConcurrentHashMap: Lock partial collection. Uses “lock striping” so that threads working on different buckets of the hash table don’t block one another.
    • CopyOnWriteArrayList: Copy-on-write

      Very expensive if using big arrays that are written to regularly. Every write to the collection copies it again. Only advantageous if data is read-mostly.

    • String: immutable
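Two of these design choices can be seen in a few lines. The sketch below is mine, with made-up names and data. The first method shows that a CopyOnWriteArrayList iterator keeps reading the older copy after a write. The second uses ConcurrentHashMap’s atomic putIfAbsent, so no external lock is needed to decide which writer wins.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical example; the listener and job names are invented.
class ConcurrentCollectionsDemo {
    // Counts what an iterator sees when the list is written to after
    // the iterator was created.
    static int snapshotSize() {
        List<String> listeners = new CopyOnWriteArrayList<String>();
        listeners.add("first");
        listeners.add("second");
        Iterator<String> it = listeners.iterator(); // bound to the current two-element array
        listeners.add("third");                     // the write goes to a fresh copy
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen; // the iterator never sees "third"
    }

    // putIfAbsent is a single atomic operation on the map.
    static String firstWriterWins() {
        ConcurrentMap<String, String> owners = new ConcurrentHashMap<String, String>();
        owners.putIfAbsent("job-1", "worker-A");
        owners.putIfAbsent("job-1", "worker-B"); // ignored, a value is already present
        return owners.get("job-1");
    }

    public static void main(String[] args) {
        System.out.println(snapshotSize());    // prints 2
        System.out.println(firstWriterWins()); // prints worker-A
    }
}
```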

    Alternatives to Threads

    Alternatives, pros and cons:

    • Container-managed threads: Pro: simple. Con: inflexible

      Like J2EE containers. You write applications as if you are the only user of the object. Scales well because most data on the server side is in the database. The DB controls concurrency.

    • Non-blocking I/O: Pro: do work when it is available. Con: as complex as using threads

      For example, the java.nio (1.4) package. Pro: Do multiple operations and notify me when done. Con: As complicated as threads. Oriented around blocking waits. Tends to get ignored when you’re coding on the server-side.

    • Use multiple processes: Pro: simple. Con: inflexible

      When you need to perform more work, start more processes.

    • Event-driven code: Con: as complex as threads
    • Do nothing: Pro: simple. Con: slow (but performance might not matter for the application)

      “Probably more time has been wasted by optimizing code that doesn’t need to be optimized.”

    Stuart also discussed the double-checked locking Java anti-pattern and why it is a problem. Heck, the perils surrounding the use of double-checked locking in Java have been known since what, 1997, when I think Java Developer’s Journal published an article on it. But I’ve seen wickedly smart developers insert this potentially evil anti-pattern into their code out of ignorance of the subtle problem. I’m glad Stuart mentioned it as a reminder.
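For anyone who hasn’t run into it, here is my own sketch of the idiom and two common repairs (not code from the talk; Config and the class names are hypothetical). The trouble with the broken version is that, without volatile, nothing stops another thread from seeing a non-null reference before the object’s fields have been fully written. Declaring the field volatile fixes this under the memory model that took effect with Java 5, and the holder-class idiom avoids the explicit locking altogether.

```java
// Hypothetical sketch of double-checked locking and two common fixes.
class LazyInitDemo {
    static class Config {
        final String name = "default";
    }

    // BROKEN: the unsynchronized first check may observe a reference to a
    // Config whose construction is not yet visible to the reading thread.
    static class BrokenLazy {
        private static Config instance; // not volatile

        static Config get() {
            if (instance == null) {
                synchronized (BrokenLazy.class) {
                    if (instance == null) {
                        instance = new Config();
                    }
                }
            }
            return instance;
        }
    }

    // Fix 1 (Java 5 and later): a volatile field makes the write to
    // instance safely visible to threads that read it.
    static class VolatileLazy {
        private static volatile Config instance;

        static Config get() {
            Config local = instance;
            if (local == null) {
                synchronized (VolatileLazy.class) {
                    local = instance;
                    if (local == null) {
                        local = new Config();
                        instance = local;
                    }
                }
            }
            return local;
        }
    }

    // Fix 2: the JVM initializes Holder lazily and thread-safely the
    // first time get() touches it.
    static class HolderLazy {
        private static class Holder {
            static final Config INSTANCE = new Config();
        }

        static Config get() {
            return Holder.INSTANCE;
        }
    }

    public static void main(String[] args) {
        System.out.println(VolatileLazy.get() == VolatileLazy.get()); // prints true
        System.out.println(HolderLazy.get() == HolderLazy.get());     // prints true
    }
}
```

The broken version will usually pass a single-threaded test, which is a big part of why it keeps sneaking into code.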

    For Java developers interested in learning more about programming using concurrency, Stuart recommended Java Concurrency in Practice by Brian Goetz. The book mixes academic rigor on threading with practical implications for Java developers, he said.

    Stuart also will be in town Wednesday night to speak at the Northern Virginia Ruby User’s Group. He’ll be talking about the Streamlined framework for rapidly developing CRUD applications in Rails.