Batch to the future–Part 2


The batch is dead, long live the batch job

In our previous blog entry, we argued that the Cloud is a unique opportunity to promote smart architectures in enough detail to address larger issues than the established Cloud Native and DevOps guidelines for deploying new applications, and we closed with a question: how can cloud capabilities help revive and enhance legacy applications?

Today, legacy back-office applications, or systems of record, remain predominant even though the internet has changed how we access applications and what we expect from the user experience. The internet created new systems of engagement (social media, powerful phone applications using GPS as a door to virtual reality, and so on), but fundamentally we still rely on the same systems of record to manage stock, maintain our checking accounts, book flights, or calculate our tax bills.

Those systems of record have existed for a long time to support the associated business logic. Therefore any benefit, be it cost savings or business agility through IT, would be a game changer if applied to the existing legacy portfolio.

Indeed, application development was not born with Google and Facebook but decades before, and even today most of the code running worldwide does not fit natively into the internet picture. It is estimated, for instance, that 75% of active running code worldwide is written in COBOL (measured in lines of code). Adding other mainframe-born languages (PL/1, Natural, etc.) to the count, we may consider that more than 85% of worldwide code is mainframe oriented and could benefit from moving to the Cloud.

While there are effective solutions to modernize those applications to modern programming languages and up-to-date target architectures, one typical fear of decision makers and their technical advisors is that some critical application capabilities cannot be satisfied with Cloud technologies. Many CXOs still believe mainframes have unique capabilities, such as performance and availability, which cannot be matched by distributed computing.

The main belief, for instance, is that it is not possible to execute massive batch programs outside of mainframes and COBOL-like languages. And batch cannot always be replaced by transactions: accounting, fraud detection, and big-data analytics all require massive computations to run in cycles.

It is not so much the transformation capability (from legacy architecture to new architecture) that is questioned, but rather whether a given modern architecture can do the original job at least as efficiently as the legacy one.

In fact, Cloud architecture has all it takes, not only to enable batch computing but also to deliver additional benefits such as better cost for performance and greater maintainability and agility.

Our point here is to come up with a list of topics and questions to be answered, and to highlight how Cloud technology addresses them. We will then consider a technical and financial model that works to the advantage of application owners.

So the big question is: What are the main capabilities a Cloud batch architecture must provide to run any existing batch application?

Let us start with the batch features that we want to sustain:

1. A batch operates as a stand-alone process. As such, it must run in silent mode and be considered an application of its own, executed in a lightweight container, with its sole input and output going through permanent data stores, either flat files or any kind of database.

2. A batch operates on very large data sets, which typically do not fit in memory, and some logic requires the equivalent of an algorithmic join (not a SQL statement!) over collections of 10,000,000 members or more within a very short time frame. Such data sets cannot be expected to be of a single type (flat file, relational database, etc.) nor of a single encoding (data may be EBCDIC, UTF-8, compressed, etc.). We need to preserve the capability of legacy batches to merge and/or sort very large data sets prior to performing the computations, using optimized algorithms and optimized hardware (a minimal merge-join sketch appears after this list).

3. Performance must preserve or even reduce the global elapsed execution time of batches. Batches typically must finish within a dedicated and rather small time window, because transactions are usually forbidden while a batch executes, in order to avoid data lifecycle conflicts. Batches therefore require quick execution in both elapsed time and CPU time. In the legacy world, they benefit from the mainframe's extreme vertical scalability, due both to its CPU power and to the optimization of its embedded utilities for handling data operations (sort, merge, split, sequential and indexed reads, etc.).

4. Batches require data access mechanisms that are no longer promoted since the rise of SQL and relational databases, such as sequential file access or indexed sequential file access. Those access modes must support loops reading records in sequence while preserving performance (total elapsed time); batch logic requires this. Accessing data in random access mode inside loops is a performance breaker, though, so different solutions are required.

5. A batch process may be halted, paused, and restarted by receiving signals from the outside. A paused batch must be able to save all required data states so that it can later be restarted from the last save point; the batch then resumes as if it had never been stopped, with no data loss and no errors (see the restartable-reader sketch after this list).

6. It is also common practice in the legacy world to run the same algorithm both as a batch and as a transaction. This can be a performance killer in a modern architecture. The target architecture and application design must therefore rely on middleware and design patterns that avoid duplicating services merely because of their execution modes.
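Point 2 deserves a concrete illustration. The sketch below is our own minimal example rather than anything product-specific: it joins two files pre-sorted on their key without ever loading either into memory, which is the classic way legacy batches join collections of tens of millions of records. The file names and the '|'-separated record layout are assumptions.

```java
import java.io.*;

// Minimal merge join over two files pre-sorted by key (first '|'-separated field).
// Assumes unique keys on each side; real batches also handle duplicates and grouping.
public class MergeJoin {
    public static void main(String[] args) throws IOException {
        try (BufferedReader accounts = new BufferedReader(new FileReader("accounts.sorted"));
             BufferedReader moves = new BufferedReader(new FileReader("movements.sorted"));
             PrintWriter out = new PrintWriter(new FileWriter("joined.out"))) {
            String a = accounts.readLine();
            String m = moves.readLine();
            while (a != null && m != null) {
                String ka = a.split("\\|", 2)[0];
                String km = m.split("\\|", 2)[0];
                int cmp = ka.compareTo(km);
                if (cmp == 0) {             // matching keys: emit the joined record
                    out.println(a + "|" + m.split("\\|", 2)[1]);
                    a = accounts.readLine();
                    m = moves.readLine();
                } else if (cmp < 0) {       // account without movement: skip it
                    a = accounts.readLine();
                } else {                    // movement without account: skip (or report)
                    m = moves.readLine();
                }
            }
        }
    }
}
```

Memory use stays constant regardless of file size, which is exactly why legacy batches sort before they join; the same property is what Hadoop will give us at cluster scale in point E below.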
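Point 5 maps directly to the save-point mechanics that a framework such as Spring Batch exposes through its ExecutionContext: at each chunk commit the framework persists whatever state the reader registers, and on restart it hands that state back. Here is a minimal sketch of a restartable reader; the class name, key name, and demo data are ours.

```java
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamReader;

// A reader that records its position so a halted job restarts at the last save point.
public class RestartableLineReader implements ItemStreamReader<String> {
    private static final String KEY = "line.position";
    private final String[] lines;    // demo data; a real reader would stream a file
    private int position;

    public RestartableLineReader(String[] lines) { this.lines = lines; }

    @Override
    public void open(ExecutionContext ctx) {
        // On restart, Spring Batch passes back the last persisted context.
        position = ctx.containsKey(KEY) ? ctx.getInt(KEY) : 0;
    }

    @Override
    public void update(ExecutionContext ctx) {
        // Called at each chunk commit: persist the current position.
        ctx.putInt(KEY, position);
    }

    @Override
    public String read() {
        return position < lines.length ? lines[position++] : null; // null ends the step
    }

    @Override
    public void close() { }
}
```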

Let us now have a look at the cloud technology features that we want to leverage. A batch, by definition, is a job executed in a defined time slot; it can benefit greatly from the cloud by consuming a high volume of resources during its allocated time window and then releasing those resources when it completes.

A. Obviously, all customers are looking at the cloud's ability to scale both horizontally and vertically. Two scaling strategies may be used in a Cloud architecture: vertical scaling (increasing the CPU power of the machine and using appropriate programming techniques to fully exploit that power) and horizontal scaling (distributing the load to different computing units and using appropriate programming techniques so that the computation remains correct when distributed).

Vertical scaling is easily addressed by choosing the machine size for the price. But the increase in CPU power comes mostly from the number of cores, so the performance boost requires multithreaded programming.
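As a brief illustration of that point: once records are grouped into independent chunks, saturating the cores is standard multithreaded programming, for instance with a thread pool sized to the machine. The chunk contents below are stand-in data.

```java
import java.util.List;
import java.util.concurrent.*;

// Processing independent chunks in parallel to use all the cores of a large VM.
public class ParallelChunks {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<List<String>> chunks = List.of(
                List.of("rec1", "rec2"), List.of("rec3", "rec4")); // stand-in data
        for (List<String> chunk : chunks) {
            pool.submit(() -> chunk.forEach(r -> { /* compute on record r */ }));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for all chunks to finish
    }
}
```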

Horizontal scaling is addressed by load distribution. For online user transactions this is easily achieved through user session affinity on web clusters: each user is assigned to a web server, which allocates a thread for that session. This approach does not apply to a single batch, however, because there is only one process, so it cannot be distributed natively across different threads. It can nevertheless be achieved by splitting the data into blocks (data chunking) and streaming the chunks continuously to stateless services. Due to the original 4K limit on data maps in the mainframe world, almost all (if not all) COBOL batches already chunk data into small blocks that can be processed by stand-alone stateless processes. One way to stream data to a grid is message queuing, which ensures data transport at the best performance; the computation nodes in charge of processing may then be located anywhere in the network with no latency cost. It is therefore suggested to pre-deploy those machines so that they receive data and react to it: when no data is pushed to them, they do not consume CPU power.
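Here is a minimal sketch of the master-read side of that pattern, using the plain RabbitMQ Java client; the broker host, queue name, input file, and chunk size of 1,000 records are all assumptions.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

// Master reader: streams fixed-size chunks of records to a work queue,
// where any number of stateless workers consume them.
public class ChunkPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                       // assumption: local broker
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel();
             BufferedReader in = new BufferedReader(new FileReader("master.dat"))) {
            channel.queueDeclare("batch.chunks", true, false, false, null);
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == 1000) { publish(channel, chunk); chunk.clear(); }
            }
            if (!chunk.isEmpty()) publish(channel, chunk);  // trailing partial chunk
        }
    }

    private static void publish(Channel ch, List<String> chunk) throws Exception {
        ch.basicPublish("", "batch.chunks", null,
                String.join("\n", chunk).getBytes("UTF-8"));
    }
}
```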

B. While most articles and papers explain how to scale up, they often do not consider scaling down. Yet one key source of budget savings depends on this feature: paying only for the resources you really use. Indeed, the Cloud business model is based on what is used rather than what is deployed. The elements described in the previous point allow for scaling down: every member of the message-queuing network is designed to execute stateless services, so removing a node from the queues will not affect the batch job; the data is simply computed by a different system, without error or delay, thanks to message-routing features.
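The safety of removing a node relies on message acknowledgements: a worker acknowledges a chunk only after processing it, so any chunk in flight on a node that disappears is redelivered to the surviving workers. A minimal competing-consumer sketch, under the same assumptions as the publisher above:

```java
import com.rabbitmq.client.*;

// Stateless worker: competing consumer on the chunk queue. If this node is
// removed, its unacknowledged chunks are requeued and picked up by other workers.
public class ChunkWorker {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                    // assumption: local broker
        Connection conn = factory.newConnection();       // kept open: long-lived worker
        Channel channel = conn.createChannel();
        channel.queueDeclare("batch.chunks", true, false, false, null);
        channel.basicQos(1);                             // one chunk in flight per worker
        DeliverCallback onChunk = (tag, delivery) -> {
            String chunk = new String(delivery.getBody(), "UTF-8");
            // ... compute over the records of the chunk ...
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume("batch.chunks", false, onChunk, tag -> { });
    }
}
```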

C. Configuration of batches should benefit from "modern" programming techniques: everything required for operations should be exposed through APIs, declarative and simple configuration files, and databases. No operations feature should live in compiled code. The following conditions must therefore be met (see the sketch after this list):

  • Batch pause and restart must be available through APIs that can be called from the outside.
  • The execution context must conform to a schema, and its data must be available for CRUD operations. Configuration must allow in-memory or persistent data storage (see http://docs.spring.io/spring-batch/reference/html/metaDataSchema.html for instance).
  • Multithreading must be a matter of external configuration, requiring no change to the source code.
  • For efficient I/O, batch frameworks must support partitioning and data chunking.
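Here is a Spring Batch 4-style sketch of the last two points of this list: chunk size and thread count come from external configuration, and switching the step to multithreaded execution is a configuration change rather than a code change. The property names and the trivial reader, processor, and writer are our own stand-ins.

```java
import java.util.List;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

// Operational knobs (chunk size, degree of multithreading) live in external
// configuration, not in compiled code.
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Bean
    public Step computeStep(StepBuilderFactory steps,
                            @Value("${batch.chunk-size:1000}") int chunkSize,
                            @Value("${batch.threads:8}") int threads) {
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("batch-");
        executor.setConcurrencyLimit(threads);           // multithreading by configuration
        ItemProcessor<String, String> processor = item -> item.toUpperCase();
        return steps.get("computeStep")
                .<String, String>chunk(chunkSize)        // data chunking by configuration
                .reader(new ListItemReader<>(List.of("rec1", "rec2", "rec3")))
                .processor(processor)                    // stand-in business logic
                .writer(items -> items.forEach(System.out::println))
                .taskExecutor(executor)                  // drop this line for single-threaded runs
                .build();
    }
}
```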

D. As stated previously, scaling is important for batch performance. By distributing the logic across multiple processing units, we can shrink the total elapsed time significantly. Implementing batches as stateless, multithreaded services receiving data from a data-chunking service promotes distribution and fits the microservice approach perfectly: typically one microservice is in charge of the master read and data chunking, another is in charge of computation without performing any I/O, and the resulting data is eventually saved by a microservice dedicated to writes. If data transfer is enabled with message queuing, for instance, data flows in continuously and is consumed as soon as a thread or microservice is available.

E. While legacy batch relies on optimized sort/merge utilities, big-data technology brings something even more powerful to the table: sort and merge algorithms can be replicated easily with map/reduce. Technologies such as Hadoop are not memory bound and can scale to any data set volume, and they execute natively on clusters to distribute the load and achieve performance.
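A sketch of that idea: in MapReduce, the shuffle phase sorts by key between map and reduce, so an identity-style job is already a distributed sort. We assume input records carry the sort key as their first '|'-separated field; with a single reducer the output is one globally sorted file, and Hadoop's TotalOrderPartitioner extends the same idea to many reducers.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

// Distributed sort: map emits (sortKey, record); the shuffle sorts by key;
// the reducer writes records back in key order. The data set never needs
// to fit in memory.
public class DistributedSort {
    public static class KeyMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String key = line.toString().split("\\|", 2)[0]; // first field is the sort key
            ctx.write(new Text(key), line);
        }
    }

    public static class PassThroughReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> records, Context ctx)
                throws IOException, InterruptedException {
            for (Text r : records) ctx.write(key, r);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "distributed-sort");
        job.setJarByClass(DistributedSort.class);
        job.setMapperClass(KeyMapper.class);
        job.setReducerClass(PassThroughReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1); // single reducer => one globally sorted output file
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```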

F. We want to build applications that take advantage of the various cloud landing and billing zones (private, virtual private, or public). We will develop this in more detail in a subsequent blog article, but with the proposed architecture, where message queuing sends data from a master read process toward cloned services that are natively parallelized and deployed over a distributed architecture, it becomes possible to deploy on very small computation units and to use them only when they receive data. For that reason, two interesting cloud landing zones may be selected.

The first option is to select a cloud provider that charges only for the CPU percentage and the other resources consumed at execution time. There you can deploy all computational units and pay only for effectively consumed resources. Let us review the pros and cons of this approach.

Pros:

  • You may deploy all required resources in advance, because the batch logic runs on very small units that carry a small cost. This is due to their very small configuration and to the fact that data is sent by message queuing, so nodes consume no resources when no data is sent to them.
  • Performance peaks do not require deploying new services at run time when the load increases sharply; instead, the peak is absorbed by otherwise idle computing units.
  • Configuration management of deployed services is easier, as relocating services or deploying new instances is rare.

Cons:

  • You need a provider that charges only for the consumed percentage of CPU, at a very low cost, when small units show less than 5-10% activity.
  • You need to forecast your peak scenario.

The second option is to deploy microservices quickly and scale up and down on demand: a load-monitoring process deploys additional resources when CPU usage reaches a trigger (80% of CPU, for instance). The pros and cons of this approach are:

Pros:

  • The use of stateless multithreaded services coupled through message queuing allows transparent scaling, both up and down, without loss or errors.
  • You are charged only for the units actually deployed and can scale down as soon as need be, to avoid paying for idle services.

Cons:

  • Configuration management must be automated, because hosting servers are allocated and deallocated automatically and any service may be deployed to, or removed from, those servers. In case of failure, or when the need to scale down arises, strong discipline and management capabilities are required to understand how an application is deployed across the grid of computational nodes.
  • Cost savings are tied to the capability to scale down, that is, to remove application instances. While scaling down is easily performed thanks to message queuing, selecting which application instance to scale down is a different matter. A central process is needed, capable of accessing configuration management and identifying servers hosting services that can be switched off without exceeding again the performance threshold that would trigger a new deployment. Without a smart algorithm that understands how workloads are distributed, we may end up with an application being deployed and undeployed continuously across the day; and of course a system that only ever scales up would kill the cost savings. A simplified sketch of such a scale-down controller follows this list.
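In the deliberately naive sketch below, MetricsApi and ProvisioningApi are hypothetical stand-ins for whatever monitoring and provisioning interfaces the chosen cloud actually exposes; the thresholds and hysteresis margin are assumptions.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical interfaces standing in for a real cloud provider's APIs.
interface MetricsApi { double cpuPercent(String node); int queueDepth(String queue); }
interface ProvisioningApi { List<String> workerNodes(); void undeploy(String node); }

// A naive scale-down controller: removes the least-loaded worker only when the
// fleet would stay comfortably below the scale-up trigger, to avoid the
// deploy/undeploy oscillation described above.
public class ScaleDownController {
    private static final double SCALE_UP_CPU = 80.0;   // trigger from the text
    private static final double SAFETY_MARGIN = 30.0;  // assumption: hysteresis band

    static void tick(MetricsApi metrics, ProvisioningApi cloud) {
        List<String> nodes = cloud.workerNodes();
        if (nodes.size() <= 1 || metrics.queueDepth("batch.chunks") > 0) return;
        double avgCpu = nodes.stream().mapToDouble(metrics::cpuPercent).average().orElse(0);
        // Estimate the load after removal; only undeploy if we stay under the trigger.
        double projected = avgCpu * nodes.size() / (nodes.size() - 1);
        if (projected < SCALE_UP_CPU - SAFETY_MARGIN) {
            String idlest = nodes.stream()
                    .min(Comparator.comparingDouble(metrics::cpuPercent)).get();
            cloud.undeploy(idlest);  // in-flight chunks reroute to remaining consumers
        }
    }
}
```

The safety margin is the important design choice here: scaling down only when the projected load stays well under the scale-up trigger is what prevents instances from being deployed and undeployed continuously across the day.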

Target Architecture design:

Based on the previous sections, here is a set of open-source technology components that satisfies those constraints and objectives. The table after the list summarizes the coverage status of each requirement.

  • Spring Boot
    • Benefit: creates stand-alone batch applications, ready to deploy on any Java platform.
  • Spring Batch
    • Benefit: provides the reusable functions that are essential for processing large volumes of records.
  • Spring (stateless services)
    • Benefit: executes everywhere you can deploy a JVM.
  • RabbitMQ
    • Benefit: acts as a generic middleware for cloud and host interconnectivity.
  • PostgreSQL / Greenplum (+ Hibernate)
    • Benefits:
      • An optimized query planner for large data sets, with native parallel query processing.
      • Data distribution enabled at configuration level for better cluster performance.
  • B+ index trees
    • Benefit: make it possible to mimic indexed sequential I/O (see the sketch after this list).
  • Hadoop
    • Benefit: enables distributed parallel sorts over files which do not fit in memory.
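As announced in the B+ index trees entry, any ordered index that supports range scans can mimic the COBOL START ... READ NEXT idiom from batch feature 4. A minimal sketch using Java's TreeMap, a red-black tree standing in here for a disk-based B+ tree; the keys and values are stand-in data.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Emulating "START key >= K, then READ NEXT in sequence" on an ordered index.
public class IndexedSequentialDemo {
    public static void main(String[] args) {
        NavigableMap<String, String> index = new TreeMap<>();
        index.put("ACC-000101", "record A");
        index.put("ACC-000257", "record B");
        index.put("ACC-014032", "record C");

        // START: position the cursor at the first key >= "ACC-000200".
        // READ NEXT: iterate records in key order from that position.
        for (Map.Entry<String, String> e : index.tailMap("ACC-000200", true).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```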

 

| Requirement | Status |
| --- | --- |
| Batch features to sustain: | |
| 1. Stand-alone execution | Ok |
| 2. Very large data sets | Ok |
| 3. Execution-time performance | Ok |
| 4. Sequential and indexed access | Ok |
| 5. Pause and restart | Ok |
| 6. Shared batch/transaction logic | Ok |
| Cloud features to leverage: | |
| A. Vertical and horizontal scaling | Ok |
| B. Scaling down | Ok |
| C. Configuration over code | Ok |
| D. Distributed processing | Ok |
| E. Sort/merge at scale | Ok |
| F. Landing-zone flexibility | Ok |

We now have a clear view of the requirements and of the components that will help us not only maintain our batch capabilities in a new world, but also take advantage of cloud principles to optimize the porting of existing legacy mainframe batch applications to the cloud. In our next article, we will cover, through some real-life examples, the definition and deployment of an architecture that supported the transformation and deployment of legacy COBOL batch applications to a modern platform.

 

About the authors:

Alexis Henry, Chief Technology & Innovation Officer, Netfective Technology

Alexis is the Global Lead for Innovation, Research and Development of Blu Age product suites.

His primary responsibilities are to design and lead the implementation of disruptive technologies in the fields of application modernization, Cloud, and Big Data. Alexis has over 20 years of experience within the IT industry, which helped him build a broad knowledge of the software and service industry. He has held various leadership positions, both in Europe and North America, leading transformation projects and engineering teams for software vendors. Furthermore, Alexis is involved in R&D projects funded by the European Commission (Horizon 2020 [DICE project], FP7 [REMICS project]).

Luc Vogeleer, Global Chief Technologist – Application Transformation, Hewlett Packard Enterprise

Luc is the Global Chief Technologist for Applications Transformation in HPE Enterprise Services. His primary responsibilities include research, development, and deployment of application modernization and transformation strategies, technologies, methods, and tools focused on HP cloud offerings. Luc has over 34 years of experience within the IT industry. He joined Hewlett-Packard in 2000 and has held various leadership positions in the service organization at both European and worldwide levels.
