AWS Batch – Part 1 (Introduction)

Batch processing is a good fit when data must be aggregated before it can be processed, or when keeping the cost of processing low decides whether the work is feasible at all. Typical use cases include Machine Learning model training, web scraping, generating recommendations from customer usage patterns, and complex scientific inference.

About this series

There are 3 articles in this series on AWS Batch.

  1. AWS Batch – Part 1 (Introduction)
  2. AWS Batch – Part 2 (Design Patterns)
  3. AWS Batch – Part 3 (A Real Life Example)

What is desired in Batch jobs

  1. Ability to process huge volumes of data with good throughput
  2. Ability to process data in parallel, often with processes that interact with each other
  3. Dependent and sequential jobs
  4. Ability to retry if there is a failure
  5. Low cost computation
  6. Availability of special processors such as GPU
  7. Monitoring and notification for failed or successful jobs

AWS Batch provides the ecosystem to fulfill the above requirements.

What AWS Batch Provides

1. Ability to process huge volumes of data with good throughput

You may create as many Array Jobs as you need. You specify the compute resources, and AWS Batch manages provisioning and execution. For one customer, we provisioned 2,048 CPUs for a single job and it worked like a charm: the instances were created, the job ran, and the instances were terminated.
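As a rough illustration of the Array Job idea, the sketch below builds the parameters for a job submission and shows how each child job might pick its share of the work. It assumes the boto3 `submit_job` parameter names (`jobName`, `jobQueue`, `jobDefinition`, `arrayProperties`) and the `AWS_BATCH_JOB_ARRAY_INDEX` variable that Batch sets inside each child job. The helper names and the striding scheme are hypothetical, chosen only for this example.

```python
import os

# Array size limits for an AWS Batch array job.
MIN_ARRAY_SIZE = 2
MAX_ARRAY_SIZE = 10_000


def build_array_job_request(job_name, job_queue, job_definition, array_size):
    """Build keyword arguments for a Batch job submission as an array job.

    The returned dict is intended to be passed to
    boto3.client("batch").submit_job(**request); no AWS call is made here.
    """
    if not MIN_ARRAY_SIZE <= array_size <= MAX_ARRAY_SIZE:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "arrayProperties": {"size": array_size},
    }


def shard_for_this_child(items, array_size):
    """Return the items this child job should process.

    Each child reads its own index from AWS_BATCH_JOB_ARRAY_INDEX and
    takes every array_size-th item starting at that index (hypothetical
    work-splitting scheme).
    """
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return items[index::array_size]
```

With this scheme the job code stays identical for every child; only the index differs, so no coordination between children is needed.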

2. Ability to process data in parallel, often with processes that interact with each other

You may submit multi-node parallel jobs. A single job then spans multiple instances (nodes). One of the nodes becomes the “main node” and the others become child nodes. A child node can use the environment variables AWS_BATCH_JOB_MAIN_NODE_INDEX and AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS to locate and communicate with the main node.

An AWS Batch multi-node parallel job is compatible with any framework that supports IP-based, inter-node communication, such as Apache MXNet, TensorFlow, Caffe2, or Message Passing Interface (MPI).

Please read about Multi Node Parallel Jobs.
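A minimal sketch of how a node might work out its role from these environment variables is shown below. Besides the two variables named above, it assumes `AWS_BATCH_JOB_NODE_INDEX`, which Batch sets to the index of the current node. The function name and the shape of the returned dict are hypothetical.

```python
import os


def resolve_node_role(env=None):
    """Work out whether this node is the main node and where the main node is.

    env defaults to os.environ; a plain dict can be passed for local testing.
    """
    env = os.environ if env is None else env
    node_index = int(env["AWS_BATCH_JOB_NODE_INDEX"])
    main_index = int(env["AWS_BATCH_JOB_MAIN_NODE_INDEX"])
    is_main = node_index == main_index
    # The main node's address is only needed (and only looked up) on child nodes.
    main_address = None
    if not is_main:
        main_address = env.get("AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS")
    return {
        "node_index": node_index,
        "is_main": is_main,
        "main_address": main_address,
    }
```

A child node could then pass `main_address` to whatever framework handles the inter-node communication, while the main node starts listening.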

3. Dependent and sequential jobs

An Array job can have up to 10,000 steps (child jobs), and a job can depend on up to 20 other Batch jobs, so a single job can wait on as many as 200,000 steps (20 × 10,000) that run in parallel. An Array job can also have an N to N dependency, which means each step of the Array job depends on the step with the same index in another Array job; this requires both Array jobs to have the same array size. Read more about Array jobs.
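The dependency rules above can be sketched as a small helper that builds the `dependsOn` list for a job submission. It assumes the boto3 `submit_job` shape, where each entry is a dict with a `jobId` and an optional `type` of `N_TO_N`. The helper name and the job IDs in the usage note are hypothetical.

```python
# Limits quoted in the text above.
MAX_DEPENDENCIES = 20
MAX_ARRAY_SIZE = 10_000


def build_depends_on(job_ids, n_to_n=False):
    """Build the dependsOn list for a Batch job submission.

    With n_to_n=True each child of the submitted array job waits only for
    the child with the same index in the job it depends on.
    """
    if len(job_ids) > MAX_DEPENDENCIES:
        raise ValueError("a job can depend on at most 20 jobs")
    dependencies = []
    for job_id in job_ids:
        entry = {"jobId": job_id}
        if n_to_n:
            entry["type"] = "N_TO_N"
        dependencies.append(entry)
    return dependencies
```

For example, `build_depends_on(["job-a"], n_to_n=True)` gives the value one might pass as `dependsOn` when submitting a second Array job of the same size as `job-a`.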

4. Ability to retry if there is a failure

You may specify the number of retries for each job. A failed job or job step is retried automatically. Please note that the following kinds of job failure are not retried.

  • Jobs that have been cancelled or terminated are not retried.
  • Jobs that fail due to an invalid job definition are not retried.
  • Jobs that time out are not retried.

Read more about automatic retries.
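As a hedged sketch, the retry count and a per-attempt timeout could be expressed as follows. It assumes the boto3 `submit_job` parameter names `retryStrategy` and `timeout`, an allowed range of 1 to 10 attempts, and a minimum timeout of 60 seconds. The helper name is hypothetical.

```python
def build_retry_settings(attempts, timeout_seconds=None):
    """Build retry and timeout parameters for a Batch job submission.

    attempts counts every run of the job, so attempts=3 means the first
    run plus up to two retries.
    """
    if not 1 <= attempts <= 10:
        raise ValueError("attempts must be between 1 and 10")
    settings = {"retryStrategy": {"attempts": attempts}}
    if timeout_seconds is not None:
        if timeout_seconds < 60:
            raise ValueError("timeout must be at least 60 seconds")
        # A job stopped for exceeding this duration is not retried.
        settings["timeout"] = {"attemptDurationSeconds": timeout_seconds}
    return settings
```

Because a timed-out job is not retried, it is worth choosing the timeout generously enough that a slow but healthy attempt is not cut off.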

5. Low cost computation

AWS Batch uses Job Queues to hold jobs until they can be executed. Each Job Queue must have one or more compute environments. These compute environments can use On-Demand, Reserved or Spot instances. You can also specify the order in which a Job Queue tries its compute environments. We recommend placing a Spot compute environment first for the lowest cost.
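The ordering of compute environments can be sketched as below. The example assumes the boto3 `create_job_queue` parameter names (`jobQueueName`, `state`, `priority`, `computeEnvironmentOrder`). The helper name and the compute environment names in the usage note are hypothetical.

```python
def build_job_queue_request(queue_name, compute_environments, priority=1):
    """Build keyword arguments for creating a Batch Job Queue.

    compute_environments lists compute environment names in the order
    they should be tried, for example a Spot environment first and an
    On-Demand environment as the fallback.
    """
    if not compute_environments:
        raise ValueError("a Job Queue needs at least one compute environment")
    order = [
        {"order": position, "computeEnvironment": name}
        for position, name in enumerate(compute_environments, start=1)
    ]
    return {
        "jobQueueName": queue_name,
        "state": "ENABLED",
        "priority": priority,
        "computeEnvironmentOrder": order,
    }
```

Calling `build_job_queue_request("training", ["spot-ce", "ondemand-ce"])` would put the Spot environment first, in line with the recommendation above.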

6. Availability of special processors such as GPU

You may use GPU instances for your Machine Learning model training. Please read more here.
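One way a job might request GPUs is through resource requirements, sketched below. It assumes the boto3 `resourceRequirements` shape, a list of dicts with `type` set to `GPU`, `VCPU` or `MEMORY` and a string `value`, with memory given in MiB. The helper name and the numbers in the usage note are hypothetical.

```python
def build_gpu_overrides(gpus, vcpus, memory_mib):
    """Build container overrides that request GPUs, vCPUs and memory.

    The result is meant to be passed as containerOverrides when
    submitting a job; all values are sent as strings.
    """
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "resourceRequirements": [
            {"type": "GPU", "value": str(gpus)},
            {"type": "VCPU", "value": str(vcpus)},
            {"type": "MEMORY", "value": str(memory_mib)},
        ]
    }
```

For instance, `build_gpu_overrides(1, 4, 16384)` asks for one GPU, four vCPUs and 16 GiB of memory; the job will only be placed if the compute environment offers a GPU instance type.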

7. Monitoring and notification for failed or successful jobs

You may create a CloudWatch Events rule for failed jobs. The rule can send an SNS notification or trigger an alarm. The SNS notification can in turn trigger communication (email, SMS) or an action through a Lambda invocation (creating a service desk ticket, running some automation).
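A minimal sketch of such a rule is shown below. It assumes the event fields that AWS Batch emits for job state changes (`source` of `aws.batch`, `detail-type` of `Batch Job State Change`, and a `status` inside `detail`) and the boto3 `put_rule` parameter names (`Name`, `EventPattern`, `State`). The `matches` function is a simplified local stand-in for checking the pattern, not the service's full matching logic, and all helper names are hypothetical.

```python
import json


def failed_job_event_pattern():
    """Event pattern that selects Batch jobs moving to the FAILED state."""
    return {
        "source": ["aws.batch"],
        "detail-type": ["Batch Job State Change"],
        "detail": {"status": ["FAILED"]},
    }


def build_rule_request(rule_name):
    """Build keyword arguments for creating the rule (pattern as a JSON string)."""
    return {
        "Name": rule_name,
        "EventPattern": json.dumps(failed_job_event_pattern()),
        "State": "ENABLED",
    }


def matches(pattern, event):
    """Simplified check: every pattern key must exist in the event, and the
    event value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True
```

The SNS topic or Lambda function would then be attached to the rule as a target; that step is left out here because it depends on resources specific to each account.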

Back to you

Do your batch processes meet customer requirements while remaining low cost and maintainable? What would you like to hear from us about Batch in subsequent articles? We would love to hear from you.

About VisionFirst Technologies Pvt. Ltd.

We are a group of researchers and practitioners of cutting-edge technology. We are an AWS Registered Partner. Our tech stack includes Machine Learning, offline/2G-tolerant mobile apps, web applications, IoT and Analytics. Please contact us to learn how we may help you.
