Tuesday, 10 June 2014

Writing and testing software for data integrity

Last week I hosted a presentation under the auspices of JUG Łódź (of which I happen to be a founding member): "Writing and testing software for data integrity". Data integrity is a broad topic, so I touched on only a few chosen aspects. Things that made it into the presentation include:

  • Physical vs logical integrity
  • ACID (strong) consistency
  • BASE (weak) consistency
  • Synchronous vs asynchronous replication
  • Distributed systems limitations - CAP theorem
  • Examples of data consistency violation (like Photo Privacy Violation or Double Money Withdrawal described in "Don't settle for eventual consistency" article)
  • Strong consistency comes with a performance penalty. Choosing performance and availability over consistency might be justified and lead to improved revenues (as is the case with Amazon), or lead to spectacular failures, as in the case of Flexcoin

Local vs distributed transactions

The second part of the presentation was slightly different, though. It included a live demonstration of a situation where local transactions are not sufficient to guarantee data consistency across multiple resources, and of how distributed transactions come to the rescue. The demonstration was based on the scenario below:

The application consumes messages from a JMS queue (TEST.QUEUE), stores message content in a database, does some processing inside VeryUsefulBean and finally sends a message to another JMS queue (OUT.QUEUE).
The application was a web application deployed on JBoss EAP 6.2. JMS broker functionality was provided by ActiveMQ and MySQL acted as a database. Web application logic was built with Spring and Apache Camel.
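The route diagram shown during the presentation is not reproduced in this post. As a rough sketch, the Camel routing configuration for such a flow could look like the snippet below - the queue names and VeryUsefulBean come from the demo, while the endpoint URIs and the bean name are illustrative:

```java
import org.apache.camel.builder.RouteBuilder;

// Sketch of the message flow, not the actual route from the demo:
// consume from TEST.QUEUE, store in DB, process, send to OUT.QUEUE.
public class MessageFlowRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:queue:TEST.QUEUE")                               // consume incoming message
            .to("sql:insert into messages (body) values (:#body)") // store message content in DB
            .beanRef("veryUsefulBean")                             // processing that may fail
            .to("jms:queue:OUT.QUEUE");                            // send result onwards
    }
}
```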
So let's assume that the processing inside VeryUsefulBean fails. The exception was injected using a JBoss Byteman rule:
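The rule itself was embedded in the original post; a minimal Byteman rule achieving this could look as follows (the method name is illustrative - only VeryUsefulBean is taken from the demo):

```
RULE fail inside VeryUsefulBean
CLASS VeryUsefulBean
METHOD process
AT ENTRY
IF true
DO THROW new java.lang.RuntimeException("Byteman-injected failure")
ENDRULE
```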

As expected, with local transactions the system state was inconsistent after the processing failure. The expected state would be:

  • Incoming message not lost
  • No data saved in database
  • Nothing sent to outbound queue (OUT.QUEUE).

Basically, one would expect that the system state would not change as a result of the processing failure. However, the actual behaviour was:

  • Incoming message not lost
  • Data saved to DB (as many times as the message was (re)delivered).
  • Nothing sent to outbound queue (OUT.QUEUE).

The reason for this behaviour was that the local transaction that saved data to the database was committed independently of the JMS message consumption transaction, leaving the system in an inconsistent state.
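The difference between the two commit strategies can be illustrated with a toy simulation in plain Java. This is not the application code and not a real transaction manager - the "database" and "queue" are just lists, and "rollback" is modelled by never applying buffered changes - but it shows why an independently committed DB write survives a later failure while an all-or-nothing unit of work does not:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: real XA uses two-phase commit across resource managers.
public class TxDemo {
    static final List<String> db = new ArrayList<>();
    static final List<String> outQueue = new ArrayList<>();

    // Stand-in for VeryUsefulBean that always fails,
    // like the Byteman-injected exception in the demo.
    static void process(String msg) {
        throw new RuntimeException("processing failed");
    }

    // Local transactions: the DB insert commits on its own,
    // so the failure that follows cannot undo it.
    static void handleWithLocalTx(String msg) {
        db.add(msg);          // committed immediately, independent transaction
        process(msg);         // fails -> message redelivered, but DB row stays
        outQueue.add(msg);    // never reached
    }

    // Distributed transaction: every change belongs to one unit of work
    // and is applied only if all steps succeed.
    static void handleWithXaTx(String msg) {
        List<String> dbTx = new ArrayList<>();
        dbTx.add(msg);        // change buffered, not yet visible
        process(msg);         // fails -> nothing below runs ("rollback")
        db.addAll(dbTx);      // commit phase, never reached on failure
        outQueue.add(msg);
    }

    public static void main(String[] args) {
        try { handleWithLocalTx("m1"); } catch (RuntimeException ignored) { }
        System.out.println("local tx -> rows in DB: " + db.size());   // 1, inconsistent

        db.clear();
        try { handleWithXaTx("m1"); } catch (RuntimeException ignored) { }
        System.out.println("XA tx    -> rows in DB: " + db.size());   // 0, consistent
    }
}
```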

Then the experiment was repeated with a JTA transaction manager and XA resources set up. The outcome was correct this time - no data was saved to the database. JMS message consumption and all processing, including the database inserts, were handled as part of the same distributed transaction, and all changes were rolled back upon failure, as expected.

Automated integration test

The test proved that the application worked correctly with a JTA transaction manager and XA resources (an XA connection factory for JMS, an XA data source for JDBC). However, the test was manual and time-consuming. Ideally this behaviour would be tested automatically, and that was the topic of the final part of the presentation. We walked through an integration test that verified the transactional behaviour automatically.

First, the test cases were defined as JBehave scenarios:
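The scenarios were embedded in the original post; one of them could plausibly read like this (the wording below is illustrative - only the queue names and the expected outcomes come from this post):

```
Scenario: failed processing leaves system state unchanged

Given an empty database and empty queues
When a message is sent to TEST.QUEUE and processing in VeryUsefulBean fails
Then the incoming message is not lost
And no data is stored in the database
And no message appears on OUT.QUEUE
```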
JUnit was used to execute the scenarios:

The Spring context for the web application was split into two parts:
  • spring-jmsconsumer-core.xml - contained beans with application logic and definition of Camel context.
  • spring-jmsconsumer-infrastructure.xml - contained beans used to access external resources, like JMS connection factory or JDBC data source.
In order to execute the application logic in a fully controlled environment, the test had to be completely autonomous. This means that all external interfaces and infrastructure had to be recreated by the test harness:
  • ActiveMQ - replaced by embedded ActiveMQ.
  • Arjuna Transaction Manager provided by JBoss EAP - replaced by standalone Atomikos.
  • MySQL - replaced by embedded HSQLDB.
Both embedded ActiveMQ and HSQLDB support XA protocol, so they could be used to verify transactional behaviour.
While the core context could, and had to, be reused during test execution, the infrastructure context made sense only when the application was deployed on a real JEE server, as it retrieved the necessary resources from JNDI.

Therefore the infrastructure part of the context had to be rewritten for automated test execution.

Note the beans with ids amq-broker and hsqldbServer - they are responsible for starting the embedded JMS broker and the DB server needed during test execution.
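The context file itself was embedded in the original post; a sketch of what those infrastructure beans could look like is below. The bean ids amq-broker and hsqldbServer come from this post, and the ActiveMQ and Atomikos class names are the usual ones for embedding those components; EmbeddedHsqldbServer is a hypothetical helper wrapping org.hsqldb.server.Server start/stop, and the remaining details are illustrative:

```xml
<!-- Embedded JMS broker started for test execution -->
<bean id="amq-broker" class="org.apache.activemq.broker.BrokerService"
      init-method="start" destroy-method="stop">
    <property name="persistent" value="false"/>
</bean>

<!-- XA-capable connection factory pointing at the embedded broker -->
<bean id="connectionFactory" class="org.apache.activemq.ActiveMQXAConnectionFactory"
      depends-on="amq-broker">
    <property name="brokerURL" value="vm://localhost"/>
</bean>

<!-- Embedded DB server; EmbeddedHsqldbServer is a hypothetical wrapper
     around org.hsqldb.server.Server -->
<bean id="hsqldbServer" class="com.example.test.EmbeddedHsqldbServer"
      init-method="start" destroy-method="stop"/>

<!-- Standalone Atomikos JTA transaction manager replacing the
     container-provided Arjuna -->
<bean id="atomikosTransactionManager" class="com.atomikos.icatch.jta.UserTransactionManager"
      init-method="init" destroy-method="close"/>
```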

With the infrastructure in place, it is quite simple to implement the test steps defined in the JBehave scenarios.

There are obviously a few other details that need to be worked out, but since this post has already grown too long, have a look at the complete code on GitHub: https://github.com/mstrejczek/dataintegrity.

Monday, 9 June 2014

Review of Scalar - Scala Conference

This is a long, long overdue review of the Scalar conference that took place on 5 April 2014 in Warsaw. It's been over two months since the event, so I've forgotten a lot - this post is based mostly on the notes I took during the event. Unfortunately, I lost most of those notes when transferring them between devices...

About the event

Scalar was a one-day conference dedicated to Scala and organized by SoftwareMill - a company whose people are well known in the Java/Scala community. It was a free event, and the venue was the National Library - a lovely location next to Pole Mokotowskie park. There was no breakfast served at all - not great, but acceptable given it was a free event. A warning on the website to have a proper breakfast in the morning would have been good, though; it was difficult to stay focused during the first presentation while in starvation mode. The situation was saved to some extent by cookies that were available in large quantities, and I appreciated 100% juice being available instead of sugar-enriched nectars or cola. There was one more issue with logistics - the lobby with the cookies and drinks was too small and therefore incredibly crowded during the first breaks. Later the situation improved, as more and more people started to go outside to enjoy the lovely weather.

Content

There was only one track, so I didn't have to make difficult choices. I missed a few sessions, though - for personal reasons I could not attend all of them or stay focused at all times. The first session was about "Database Access with Slick". Slick is a functional-relational mapping library offering a direct mapping between Scala code and SQL with little impedance mismatch. In the SQL world, extensions like PL/SQL are proprietary and use an ugly imperative form; with Slick one can enrich queries using Java code (or was it Scala code?). There is free support for open-source DBs, while commercial extensions are needed to use Slick with proprietary DBs like Oracle. I remember a complex type system and operators presented during the session - I didn't find that aspect of Slick too appealing. Also, support for the upsert operation is not available yet. Summing up - the presentation was really solid, but I was not impressed by the Slick library itself. One more note - one can try Slick out on Typesafe Activator.

The second session covered "ReactJS and Scala.js". ReactJS is a JavaScript library for building UIs that was developed by Facebook and is used by Instagram. Scala.js allows developing code in Scala and compiling it to JavaScript. During the presentation there was an extensive walk through the code of a chat application (Play on the backend, ReactJS + Scala.js on the frontend). What I noted was that Scala collections work nicely in a web browser. The problem is that the JavaScript implementation of the Scala standard library is heavy (20 MB), but it can be stripped down to include only the necessary bits, e.g. to around 200 kB. Another issue was that conversions to and from JavaScript types were a pain - especially if rapid prototyping is what you're after. Another conclusion: ReactJS works great with immutable data. Regarding the Scala-to-JavaScript compiler: normal compilation is fast, but the produced files are huge; Closure compilation is slow but produces small output files. From now on I'm writing from memory only, as my surviving notes covered only the first two sessions.

The next session was "Simple, Fast & Agile REST with Spray" by Adam Warski from SoftwareMill. The presentation was one of the better ones: well-paced, informative and with really smooth live coding. Live coding, if done well, makes presentations more interesting in my opinion. Not only was the presentation itself good, but the subject too - you really can expose a RESTful interface (including starting up an HTTP server) with a few lines of code using Spray. Definitely recommended if you are looking for a Scala-based REST framework.

After that came "Doing Crazy Algebra With Scala Types". There were some interesting analogies drawn between Scala types and mathematical formulae. That was probably the first time I had seen a Taylor series since I left university... Mildly amusing - I found it a curiosity but could not identify practical uses.

The last session before lunch was "Scaling with Akka", which involved a demo of an Akka cluster running on a few Raspberry Pi nodes. I must admit I don't remember much from this session apart from the fact that Akka cluster worked indeed and Raspberry Pis were painfully slow.

The first session after lunch was devoted to Scala weaknesses and peculiarities: "The Dark Side of Scala" by Tomasz Nurkiewicz. It was a good, fast-paced presentation, and it showed some interesting kinks that did not repeat those found in other well-known presentations (e.g. the famous "We're Doing It All Wrong" by Paul Phillips).

The next session was a solid introduction to event sourcing: "Event Sourcing with Akka-Persistence" by Konrad Malawski.

I couldn't focus fully during the next three presentations, so I won't write about them. The last one that I remember was "Lambda implementation in Scala 2.11 and Java 8". It included a comparison of the bytecode generated for a Java 8 lambda expression with the bytecode generated by Scala, plus an excellent explanation of how invokedynamic works in Java 8. Java 8 uses invokedynamic to call the lambda expression code, while Scala generates an additional anonymous class and invokes its overridden method virtually. The bytecode for Java 8 looks much more concise, although the performance is not necessarily better, as invokedynamic in Java 8 leads to the generation of an anonymous class at runtime. So effectively an anonymous class is used anyway - with Scala it is generated at compile time, with Java 8 at runtime. Currently the main benefit is the smaller size of Java 8 jar files compared to Scala-generated ones. However, if in Java 9 or Java 10 the anonymous class generation gets optimized away entirely, then invokedynamic will get a significant runtime performance boost - without any need to touch the source code! Scala is going to migrate to invokedynamic in the future.
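This was not part of my notes, but the difference is easy to inspect yourself: compile a trivial lambda like the one below and run `javap -c -p` on the class file to see the invokedynamic instruction and the synthetic `lambda$...` method, instead of a separate anonymous class file:

```java
import java.util.function.IntUnaryOperator;

public class LambdaDemo {
    public static void main(String[] args) {
        // javac desugars the lambda body into a private synthetic method and
        // emits an invokedynamic call site here; the class implementing
        // IntUnaryOperator is only spun up at runtime by LambdaMetafactory.
        IntUnaryOperator inc = x -> x + 1;
        System.out.println(inc.applyAsInt(41)); // prints 42
    }
}
```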

There was one more session at the end, but I missed it. Summing up - the conference was all right. The sessions were short, which made room for many different topics. The level of the presentations was satisfactory on average - not all of them were polished and interesting, but for the first edition of a free conference - well done.