Coherence 3.5 Book Review

11 Oct


I’ve noticed Oracle Coherence is a technology that is frequently mentioned in job postings, and I’ve decided to get to know it better.  To be honest, I’ve been avoiding these distributed cache solutions for a while now.  I was involved in a project that used Terracotta and the experience left me rather cynical about the approach.  I have always loved my relational database and believed in achieving scale through a clustered RDBMS and a load balanced, stateless web application.

I’ve believed that the whole NoSQL movement could safely be ignored.  For some reason the Object Oriented community has always had a problem with Relational Databases, despite lacking a credible alternative.  Different approaches have come and they’ve gone.  The early efforts, such as Smalltalk’s image files and Python’s ‘pickling’, always seemed to me to be like reinventing the wheel and ending up with a triangle.

Over the years the wheels have gained more facets and become almost bearable.  The funny thing is that once you add enough sides, you start to get something that begins to resemble a circle.  A ten-sided decagon looks very much like a circle, and a thousand-sided chiliagon is indistinguishable from a circle to the human eye.

I tell you this so that you can understand that I am a reluctant reader of this book.  We developers are always having to learn new technologies to keep our skills fresh.  I love reading a good book, but I approach this one as a utilitarian task that must be completed.  I have to learn about this technology, and I look for a book written by a competent author who will share with me not just the flawed theory but, more importantly, their experience of dealing with all the subsequent difficulties.

Packt Publishing have a good record in this area, producing niche books written by people with an enthusiasm for their esoteric areas, which is why I chose the book “Oracle Coherence 3.5” by Aleksandar Seovic, Mark Falco and Patrick Peralta.  Seovic wrote the majority of the book, and Falco and Peralta each contributed individual chapters.  They did a great job, far exceeding what I had hoped for.  The book helped me to see that I have been wrong.

The first chapter sets off to an excellent start.  It begins by looking at the problem that we want to solve: the challenges of achieving performance, scalability and availability.  The problem is well framed, with the underlying issues of latency, bandwidth, performance and state quickly introduced.  There follows a brief survey of the database solutions of replication, clustering and sharding.  It is assumed that the reader is an architect who is already familiar with these concepts, but that doesn’t stop the authors from providing an excellent overview.  As they provide an objective review of the pros and cons of each solution, it is clear that they have a solid grasp of the subject.

With the groundwork done, Oracle Coherence is introduced, and here the objectivity disappears.  The author’s bias for their chosen subject is clear, but this isn’t a problem.  The whole reason for wanting to read this book is to get the expert’s perspective, and we should expect experts to be biased.  Thankfully the author does not attempt to hide their bias behind a dry tone.  Instead they allow their enthusiasm to shine through with a conversational style and flowing text.

The following pages are preoccupied with Coherence’s pros.  This worried me, as I had once inherited a system where the architect had believed the ‘Snap In, Speed Up and Scale Out’ claims of the Terracotta marketing.  They had used it to solve performance problems without addressing issues with the application architecture and database design.  It didn’t work.  If the author attempted to claim that the problems of speed and scalability could be solved simply by introducing Coherence, they would lose all credibility.  I was pleased to see that they did not:

Coherence can take you a long way towards meeting your performance, scalability and availability objectives, but that doesn’t mean that you can simply add it to the system as an afterthought and expect all your problems will be solved… Doing this requires careful consideration of the problems and evaluation of the alternatives.

Page 30

The chapter then concludes by considering the importance of design, monitoring and team education.  Quite right.  The author had won me over and I was looking forward to what was to follow.

Moving to the second chapter involves a shift in gears: from discussing the high level architectural issues to the very low level activities of downloading and running Coherence locally. So many books fall down in this regard, providing instructions that simply don’t work and forcing the reader to solve difficult problems in order to keep up. This is the first hurdle where readers are lost.

First I have to download Coherence, then get it up and running and finally start writing some code.  At each step anything can go wrong.  Installation involves finding the distribution, signing up to Oracle’s developer network and unzipping the content.  This all goes smoothly.  The links provided still work and signing up to the Oracle Developer Network was painless.  The book told me everything I needed to know.  It seems strange that the author uses JetBrains’ IDEA rather than Eclipse, but this doesn’t cause me any problems.  The dependencies are simple and the ideas are easily adapted.

Some hands-on tutorials follow and I’m impressed by Coherence’s simplicity.  It only requires one jar, with some optional extras.  I can create and populate caches either from the command line or through the simple API.  It’s all very simple, perhaps too simple.  The chapter concludes with a useful cache loader example and some sage advice for testing and debugging.  Comments here directly address my concerns regarding over-simplification.

However, you should bear in mind that by doing so you are not testing your code in the same environment it will eventually run in. All kinds of things change internally when you move from a one-node cluster to two-node cluster.

Page 72
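
To give a flavour of that simplicity, here is a minimal sketch of the kind of cache access the tutorials walk through.  The cache name and values are my own invention rather than the book’s examples, but the CacheFactory and NamedCache calls are the standard Coherence API.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class HelloCoherence {
    public static void main(String[] args) {
        // Joins (or starts) a cluster and obtains a named cache.
        NamedCache countries = CacheFactory.getCache("countries");

        // The cache behaves like a java.util.Map, but the entries live in the cluster.
        countries.put("SRB", "Serbia");
        countries.put("GBR", "United Kingdom");
        System.out.println(countries.get("GBR"));

        // Leave the cluster cleanly.
        CacheFactory.shutdown();
    }
}

Running a second copy of this class is typically enough to see a two-node cluster form, which is exactly where the warning above starts to matter.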

If I am to use Coherence within my own architecture I need to understand what lies beneath.  What concerns need to be addressed?  What strategies might I adopt?  The two chapters that follow, regarding cache planning and implementing a domain model, set all of these things out with clarity.  After the gentle warm-up of the preceding chapters the reader has to work hard to get through, but the effort is well worth it.  Some of the concepts were familiar to me from database clustering, such as the replicated and partitioned topologies.  New concepts, such as backing maps and the near cache, are also introduced in the third chapter.

In the fourth chapter a Domain Driven strategy is presented.  Familiarity with Eric Evans’ book is assumed here, and I would hate to have to work through this chapter without knowing it well.  The concepts from Chapter 3 are given practical application through the Domain Driven patterns such as entities, aggregates and repositories.

The discussion around Entities is worth the price of the book alone.  Consider, for example, the following observation:

One of the most common mistakes that beginners make is to treat Coherence as an in-memory database and create caches that are too finely grained. For example, they might configure one cache for orders and a separate cache for line items.

While this makes perfect sense when using a relational database, it isn’t the best approach when using Coherence. Aggregates represent units of consistency from a business perspective, and the easiest way to achieve atomicity and consistency when using Coherence is to limit the scope of mutating operations to a single cache entry.

Page 118

As an architect looking to use Coherence, this is exactly the type of knowledge I am looking for.  Learning this the hard way could be so very expensive.  It also challenges my own perception of Coherence as a type of database.
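
To make the point concrete, here is a rough sketch of what the advice implies.  The Order and LineItem classes are my own illustration, not the book’s sample code: the line items live inside the order, so the whole aggregate is read and written as a single entry in one orders cache, rather than being split across an orders cache and a line-items cache.

import java.io.Serializable;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// A hypothetical Order aggregate.  The line items are part of the order's
// state, so a mutation to the order and its items touches one cache entry
// and remains atomic from Coherence's point of view.
public class Order implements Serializable {
    private final Long orderId;
    private final Long customerId;
    private final List<LineItem> lineItems = new ArrayList<LineItem>();

    public Order(Long orderId, Long customerId) {
        this.orderId = orderId;
        this.customerId = customerId;
    }

    public void addLineItem(LineItem item) {
        lineItems.add(item);
    }

    public Long getOrderId()    { return orderId; }
    public Long getCustomerId() { return customerId; }

    public BigDecimal getTotal() {
        BigDecimal total = BigDecimal.ZERO;
        for (LineItem item : lineItems) {
            total = total.add(item.getPrice());
        }
        return total;
    }

    // Line items have no cache of their own; they only exist inside an Order.
    public static class LineItem implements Serializable {
        private final String sku;
        private final BigDecimal price;

        public LineItem(String sku, BigDecimal price) {
            this.sku = sku;
            this.price = price;
        }

        public String getSku()       { return sku; }
        public BigDecimal getPrice() { return price; }
    }
}

The whole aggregate would then be stored with a single put, for example ordersCache.put(order.getOrderId(), order), rather than scattering line items across a second cache.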

The chapter continues to discuss deep issues such as identity management and data affinity.  The chapter concludes with a discussion of the implications of Object Serialisation and schema evolution.  It’s tough going, and it took me a long time to get through.  I found myself regularly having to go back and reread sections before I could begin to understand them.  This does not reflect badly on the authors; they have made this information as accessible as they possibly could without losing substance.

Making my way through these chapters was a rewarding experience.  I learned a lot, but I couldn’t help the nagging doubt that all of this detail justified my belief in the Relational Database approach.  The Relational Model provides abstractions that allow a developer to avoid having to understand these things.

Chapter 5 reinforced this opinion.  Querying the data grid involves the definition of Value Extractors and Aggregators, which are clearly explained.  Practical strategies are introduced to lighten the load, such as the definition of a FilterBuilder that enables a query in the following format:

Filter filter = new FilterBuilder(ReflectionExtractor.class)
        .equals("getCustomerId", 123)
        .greater("getTotal", 1000.0)
        .build();
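
For comparison, the same query could presumably be composed directly from the built-in filter classes in com.tangosol.util.filter, which is roughly what such a builder assembles.  A sketch, with the cache name assumed:

import java.util.Set;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.Filter;
import com.tangosol.util.filter.AndFilter;
import com.tangosol.util.filter.EqualsFilter;
import com.tangosol.util.filter.GreaterFilter;

public class BigSpendersQuery {
    public static void main(String[] args) {
        NamedCache customers = CacheFactory.getCache("customers");

        // Entries whose getCustomerId() is 123 and whose getTotal() exceeds 1000.
        Filter filter = new AndFilter(
                new EqualsFilter("getCustomerId", 123),
                new GreaterFilter("getTotal", 1000.0));

        // The filter is evaluated on the cluster members that own the data.
        Set results = customers.entrySet(filter);
        System.out.println(results.size() + " matching entries");
    }
}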

That’s nice, but isn’t it easier to just use SQL?  Isn’t this a lot of hard work to reinvent the database?  Compare these queries with the equivalent SQL:

select * from Customer
where id = 123
and total > 1000

Isn’t this exactly the case I was talking about earlier, where the many faceted complexity begins to resemble the simple solution?


Had all my hard work been a waste, just the type of accidental complexity I’ve been trying to avoid?  Perhaps not.  As I read on, the use of Aggregators shows one possible benefit:

By using an aggregator, we limit the amount of data that needs to be moved across the wire to the aggregator instance itself, the partial results returned by each Coherence node the aggregator is evaluated on, and the final result. This reduces the network traffic significantly and ensures that we use the network as efficiently as possible. It also allows us to perform the aggregation in parallel, using full processing power of the Coherence cluster.

Page 184

Certainly, this is also possible with an RDBMS and some good design, but the developer does not have direct control over it.  Anybody who has ever spent their days poring over execution plans and statistics, trying to introduce just the right indexes and hints to persuade a reluctant optimiser, will know just how frustrating it can be.  The ability to directly define the parallel paths to take is powerful and desirable.
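
As a rough illustration of what that looks like in code, here is a sketch using one of the built-in aggregators, DoubleSum, to total the orders for a single customer.  The cache and method names are my own, carried over from the earlier order sketch, not the book’s example:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.aggregator.DoubleSum;
import com.tangosol.util.filter.EqualsFilter;

public class CustomerOrderTotal {
    public static void main(String[] args) {
        NamedCache orders = CacheFactory.getCache("orders");

        // Each storage node sums getTotal() over the orders it owns, in
        // parallel; only the partial sums and the final result cross the wire.
        Double total = (Double) orders.aggregate(
                new EqualsFilter("getCustomerId", 123),
                new DoubleSum("getTotal"));

        System.out.println("Customer 123 total: " + total);
    }
}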

The next chapter, 6, builds on this by introducing Parallel and In-Place Processing, a powerful technique that shows just why Coherence might be chosen over an RDBMS.  Three methods are provided: Entry Processors, the Invocation Service and the CommonJ Work Manager specification.  These methods allow the processing to be distributed across the cluster as well as the data.  Not only does this avoid the need to move data across the network, it also allows processing to be completed in parallel.  Chapter 7 discusses the processing of Data Grid Events and expands further on the potential for an alternative architecture based on processing map events.  Listeners can be registered to respond after a change has occurred.  Triggers can execute before the event, with the option of transforming or rejecting the update.
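
A brief sketch of the first of those, an entry processor, may help show why this is so appealing.  The stock-level example is my own, not the book’s; the EntryProcessor contract is Coherence’s:

import java.io.Serializable;

import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// A hypothetical processor that decrements a stock level held as an Integer.
// It executes on the cluster member that owns the entry, so the value never
// crosses the network, and concurrent updates to the same key are serialised
// by Coherence rather than by application-level locking.
public class DecrementStockProcessor extends AbstractProcessor
        implements Serializable {

    public Object process(InvocableMap.Entry entry) {
        Integer stock = (Integer) entry.getValue();
        if (stock == null || stock.intValue() == 0) {
            return Boolean.FALSE;               // nothing left to sell
        }
        entry.setValue(Integer.valueOf(stock.intValue() - 1));
        return Boolean.TRUE;
    }
}

// Usage, invoked against the member that owns the key:
//   Boolean sold = (Boolean) stockCache.invoke("SKU-1234",
//           new DecrementStockProcessor());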

It was reading these chapters that I finally started to understand the author’s enthusiasm for Coherence.  The passion was clear in the previous chapters, but unfathomable.  I had considered it a piece of middleware, something to be introduced later in the development cycle to improve performance and scalability.  The potential for concurrency changes all of this.  Coherence becomes a platform to be targeted, an alternative architecture.  To be honest, I can already think of a project or two that I could have used it on.  I am now regretting my past reluctance to consider new ideas.

When I compare SQL with the Coherence filter I fail to take into consideration all of the JDBC code or Hibernate configuration that is needed to join the Object and Relational worlds, and the cultural separation that has grown between the developers and the database administrators.  Working through the examples, the hard work of the preceding chapters pays off as I get to build upon the foundations laid.  True, the result is almost relational, but the integration between code and data is so much more elegant.  It provides the architect and developers with more control and more opportunities to discover a clean and effective solution.  It shows a path towards a simpler solution that combines both the data and the code.

The relational database returns in chapter 8, with a discussion of how the persistence layer might be implemented.  The patterns of cache-aside, read-through, write-through and write-behind are all familiar.  The relevant implementation details are described and practical matters considered.  The relevant low-level details of Coherence are introduced.  The RDBMS isn’t the only possible persistent store; the backing map could also sit in front of other services or a legacy application.
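
To ground this, here is a sketch of a read-through/write-through store for the earlier order example.  The OrderDao is an assumed JDBC-backed class of my own and Order is the class from the earlier sketch; the CacheStore interface, and the way Coherence calls it on cache misses and updates, are as the chapter describes.  The class would be wired to the cache’s backing map in the cache configuration file.

import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import com.tangosol.net.cache.CacheStore;

// A hypothetical CacheStore bridging the 'orders' cache to a relational
// database.  Coherence calls load() on a cache miss (read-through) and
// store()/erase() when entries change (write-through or, if the backing
// map is configured that way, write-behind).
public class OrderCacheStore implements CacheStore {
    private final OrderDao dao = new OrderDao();   // assumed JDBC-backed DAO

    public Object load(Object key) {
        return dao.findById((Long) key);
    }

    public Map loadAll(Collection keys) {
        Map results = new HashMap();
        for (Iterator it = keys.iterator(); it.hasNext(); ) {
            Object key = it.next();
            Object value = load(key);
            if (value != null) {
                results.put(key, value);
            }
        }
        return results;
    }

    public void store(Object key, Object value) {
        dao.save((Order) value);
    }

    public void storeAll(Map entries) {
        for (Iterator it = entries.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry entry = (Map.Entry) it.next();
            store(entry.getKey(), entry.getValue());
        }
    }

    public void erase(Object key) {
        dao.delete((Long) key);
    }

    public void eraseAll(Collection keys) {
        for (Iterator it = keys.iterator(); it.hasNext(); ) {
            erase(it.next());
        }
    }
}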

Chapters 9 to 11 introduce the details of the transport layer.  There are two proprietary protocols available: the Tangosol Cluster Message Protocol (TCMP) and Coherence*Extend.  TCMP is UDP-based and used internally by Coherence.  It is intended for clusters that sit together on a single LAN.  Coherence*Extend is intended for access across wide area networks, and can be used to access a Coherence cluster from .NET or C++.

Chapter 12 concludes the book with sage advice regarding selecting the right tool to achieve performance, scalability and high availability.  This chapter puts Coherence into its context within the software architect’s toolset.  The last page quotes Charles Connell on the topic of beautiful software.

Beautiful programs work better, cost less, match user needs, have fewer bugs, run faster, are easier to fix, and have a longer life span.

Beautiful software is achieved by creating a wonderful whole which is more than the sum of its parts. Beautiful software is the right solution, both internally and externally, to the problem presented to its designers.

http://www.chc-3.com/pub/beautifulsoftware.htm

The author concludes that Coherence is beautiful software, and he has made a strong case.  I began the book with a utilitarian purpose but I have finished with an aesthetic appreciation.  I had hoped that the book might help move my career forward a few steps; instead it has set me upon a whole new path.
