Monday, September 14, 2015

Three ways to create a micro-service on Google Cloud - Part 1: App Engine

So I had this function, written in Java. And I wanted to turn it into a micro-service. The function is basically an AI compiler. It takes a set of business constraints and compiles them into a more tractable format. It takes one big string as input and returns one big string as output. The function is stateless and has no side-effects.  It is a perfect candidate for a micro-service.

I have a few reasons for wanting to do this:
  1. Scalability and Performance. This function is compute-intensive, and it's a performance bottleneck for the application. I would like to scale this function independently from the rest of the application. The stateless nature of this function makes it a good candidate for replication and load balancing.
  2. Non-Java Clients. By turning the function into an HTTP service, non-Java clients can easily access it.
  3. Versioning. It's easy to run multiple versions of a service side by side; juggling multiple jar files is harder. I can deploy an updated version of the service without having to redeploy the apps that consume it.
As an experiment I decided to create three distinct deployments of this micro-service, using three distinct Google Cloud offerings:
  1. Google App Engine (GAE)
  2. Google Compute Engine
  3. Google Container Engine
The fact is, I could use any of these three technologies to create my micro-service. Each would work. But there are pros and cons to each. One of the three ended up working best for our needs. By sharing this experiment, I thought I could provide some useful insight for those facing similar choices.

Note: this is not meant to be a thorough tutorial on any one of these technologies, but rather to give you a feel for how the three offerings differ.

Part 1: Google App Engine
The simplest and easiest of the three is Google App Engine. GAE is the highest-level of the three, requiring the least amount of system administration. You can sort of think of GAE as a Tomcat hosting service. Well, not technically: we don't really know if they are using Tomcat, but they are using something similar (I have heard GAE is based on Jetty). All you really need to know is that App Engine is a Servlet 2.5 Web Container.

Here is what you need to do:
  1. Create a new project (I called ours cspService) in Google Cloud's Admin Console.
  2. Create a Java WebApp (based on the Servlet 2.5 spec).
  3. Add a minimal GAE config file (WEB-INF/appengine-web.xml).
  4. Deploy the exploded WebApp to GAE using Google's command line tool:

    appcfg.sh -A cspservice-1065 update target/cspService

    where:

      cspservice-1065 is my project id
      target/cspService is the path to my exploded WebApp on my local system.

    The command line tools come as part of the Google App Engine SDK for Java.
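For step 3, a minimal WEB-INF/appengine-web.xml might look something like the sketch below. The version label is just a placeholder, and the -A flag on the appcfg.sh command line overrides the application id given here.

```xml
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- Project id; overridden by -A on the appcfg.sh command line -->
  <application>cspservice-1065</application>
  <!-- Placeholder version label (an assumption, pick your own) -->
  <version>1</version>
  <!-- Allow one instance to serve several requests concurrently -->
  <threadsafe>true</threadsafe>
</appengine-web-app>
```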
Here is what you don't have to do:
  • Own a server computer
  • Rent a machine or a virtual machine (at least not explicitly)
  • Know what Linux is or how to use it
  • Know anything about VMs
  • Create a ticket with your web ops group
  • Most types of system admin work
Now that Google is hosting your WebApp you get:
  • Google scaling
    • Load balancing
    • Multiple replicas of your app to handle load
    • Google automatically scales the number of instances (replicas) based on usage patterns
  • Google reliability
  • Google system administration
  • Google monitoring
  • Multiple concurrent versions
To put things into perspective timewise: it took me about 20 minutes to wrap my Java function in a servlet and WebApp. After that, it took about 5 minutes to deploy the exploded WebApp to GAE. I now have a live, running micro-service, ready to handle thousands of concurrent users. All in under half an hour.
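To give a feel for how little wrapping is involved, here is a rough sketch of the idea, not our actual code. CompileEndpoint, handle and the toy compile body are all made-up stand-ins. The stream-based handle method uses only the JDK, and the servlet's doPost (shown in the trailing comment) just delegates to it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class CompileEndpoint {

    // Hypothetical stand-in for the real constraint compiler:
    // one big string in, one big string out, no state, no side-effects.
    static String compile(String constraints) {
        return "compiled:" + constraints.trim();
    }

    // Reads the whole request body, compiles it, and writes the result.
    static void handle(InputStream requestBody, OutputStream responseBody) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = requestBody.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String input = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        responseBody.write(compile(input).getBytes(StandardCharsets.UTF_8));
        responseBody.flush();
    }

    // In a Servlet 2.5 HttpServlet subclass, doPost would delegate like this:
    //
    //   protected void doPost(HttpServletRequest req, HttpServletResponse resp)
    //           throws IOException {
    //       resp.setContentType("text/plain; charset=UTF-8");
    //       CompileEndpoint.handle(req.getInputStream(), resp.getOutputStream());
    //   }
}
```

Keeping the servlet this thin also means the compile path can be exercised in a plain unit test, with no container involved.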

At first, everything was awesome. But after some further testing, we ran up against a few issues:

DeadlineExceededException
Some of our larger input files take a very long time to compile. It turns out that GAE, by default, limits the amount of time a servlet is allowed to spend handling any one request: 60 seconds. That covers about 90% of our use-cases, but that's not good enough. Fortunately, this limit can be circumvented easily on GAE. We have two options:
  1. Switch from GAE's default setting of automatic-scaling to basic-scaling.  This is done in WEB-INF/appengine-web.xml. This has other consequences that I won't go into, but suffice it to say, it works for our app.
  2. Our other option is to use a feature of GAE called Task Queues. These are meant for long-running background jobs, so as to not slow down client requests. Well, in this micro-service, compiling is not a background job, it's the only job. So it doesn't seem appropriate to use this option. Another point to note is that Task Queues require the use of some Google-specific APIs. So if WebApp portability is important to you, this may be a factor.
In our case, we decided that option #1 was the simplest way to go.
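
Option #1 comes down to a few extra lines in WEB-INF/appengine-web.xml. What follows is a sketch rather than our actual file: the version label, instance class, max-instances and idle-timeout values are placeholders to tune for your own load.

```xml
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <application>cspservice-1065</application>
  <version>1</version>
  <threadsafe>true</threadsafe>
  <!-- Basic scaling uses the B-series instance classes (placeholder choice) -->
  <instance-class>B4</instance-class>
  <!-- Replaces the default automatic scaling -->
  <basic-scaling>
    <max-instances>5</max-instances>
    <idle-timeout>10m</idle-timeout>
  </basic-scaling>
</appengine-web-app>
```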

Servlet API 2.5
I had inadvertently taken advantage of a feature from Servlet API 3.0 in our servlet. Not every execution path uses this feature, so it didn't come up right away. But eventually we got a java.lang.NoSuchMethodError. The reason: GAE does not support the Servlet 3.0 API. It supports 2.5.

The problem here is a classic one: my test environment was different from the production environment (stay tuned for part 3 of this post, which addresses this issue).

Anyway, this issue was easy to work around. We just modified the servlet to not use that Servlet 3.0-specific method. It was not a deal breaker.
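
One cheap safeguard against this kind of surprise is to probe for a method at startup, before any request depends on it. The sketch below uses plain reflection. ApiProbe and hasPublicMethod are names made up for illustration, and HttpServletResponse.getStatus() appears in the comment only as an example of a method that was added in Servlet 3.0.

```java
import java.lang.reflect.Method;

final class ApiProbe {

    // Returns true if the named class can be loaded and exposes a public
    // method with the given name; false if the class or the method is missing.
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Example check, e.g. from a servlet's init():
    //   boolean servlet3 = ApiProbe.hasPublicMethod(
    //           "javax.servlet.http.HttpServletResponse", "getStatus");
}
```

Inside a servlet, ServletContext.getMajorVersion() reports the container's API level more directly. The reflective probe is just a variant that needs nothing beyond the JDK to compile.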

Memory Limits
Our constraint compiler uses a lot of memory, and the larger the input file, the more memory it uses. The maximum heap space allowed by GAE is 1024 MB. That is enough for about 90% of our use-cases, but for the remaining 10% it's a deal breaker. Unlike the "long running" issue, there appears to be no simple work-around for the memory issue within GAE.
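
Short of a real fix, the service can at least fail fast instead of dying with an OutOfMemoryError halfway through a compile. Here is a rough sketch that compares an estimated memory need against the heap the JVM reports. HeapGuard is a made-up name, and the bytes-per-character multiplier is invented purely for illustration; the real number would have to be measured against the actual compiler.

```java
final class HeapGuard {

    // Hypothetical multiplier: heap bytes assumed per input character.
    // This is an assumption, not a measured value.
    static final long ASSUMED_HEAP_BYTES_PER_CHAR = 200L;

    // Rough estimate of the heap a compile of this input size would need.
    static long estimatedHeapBytes(int inputLength) {
        return inputLength * ASSUMED_HEAP_BYTES_PER_CHAR;
    }

    // True if the estimate fits within the given heap ceiling.
    static boolean fitsInHeap(int inputLength, long maxHeapBytes) {
        return estimatedHeapBytes(inputLength) <= maxHeapBytes;
    }

    // Same check against the maximum heap of the JVM we are running in.
    static boolean fitsInThisJvm(int inputLength) {
        return fitsInHeap(inputLength, Runtime.getRuntime().maxMemory());
    }
}
```

A servlet could call fitsInThisJvm on the request body length and turn away oversized inputs up front. That doesn't make the big files compile, of course; it only makes the failure predictable.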

GAE Wrap Up
Before moving on to the other two Google Cloud offerings, let me note that there are a few other restrictions beyond what I have already mentioned (for example, GAE only supports web apps written in Java, Python, PHP or Go), but for the most part they are few. I believe that our high memory requirements are unusual. For many, many applications, GAE is the easiest and smartest choice.

Google Compute Engine and Google Container Engine are much more flexible and powerful than GAE. But that comes with additional complexity and administration.

The other day, a client shared with me an internal debate they were having. They were trying to decide which of these three offerings to choose. This was a big shot managerial person who oversees dozens of projects. So this was not a project-specific decision she was trying to make. She was trying to come up with some sort of policy. She was leaning more toward Google Compute Engine and less toward GAE. 

I didn't think to say this at the time, but I think the correct advice is this: don't standardize on just one of these offerings. If you are overseeing a dozen projects, why not run some apps on GAE and others on Google Compute Engine or Google Container Engine? If your requirements fit within the limitations of GAE, then go with GAE. Otherwise, go with Google Compute Engine or Google Container Engine.
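
If it helps, that rule of thumb can be written down as a tiny per-project check. This is only a sketch: OfferingChooser and its parameters are made up, and the limits it encodes are just the ones that came up in this post (1024 MB of heap, Servlet 2.5, four supported languages), not a complete list of GAE restrictions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class OfferingChooser {

    // Limits taken from this post; not an exhaustive list.
    static final int GAE_MAX_HEAP_MB = 1024;
    static final List<String> GAE_LANGUAGES = Arrays.asList("java", "python", "php", "go");

    // Picks GAE when the project fits inside its limits, otherwise one of the other two.
    static String choose(String language, int peakHeapMb, boolean needsServlet3) {
        boolean fitsGae = GAE_LANGUAGES.contains(language.toLowerCase(Locale.ROOT))
                && peakHeapMb <= GAE_MAX_HEAP_MB
                && !needsServlet3;
        return fitsGae
                ? "Google App Engine"
                : "Google Compute Engine or Google Container Engine";
    }
}
```

The point is that the decision is made per project, from that project's own requirements, rather than once for the whole organization.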
