Thorsten Schaeff has been studying Computer Science and Media at the Media University in Stuttgart and the Institute of Art, Design and Technology in Dublin. For the past six months he’s been interning with the Maps for Work Team in London, researching best practice architectures for working with big geo data in the cloud.
Google’s Cloud Platform offers a great set of tools for creating easily scalable applications in the cloud. In this post, I’ll explore some of the special challenges of working with geospatial data in a cloud environment, and how Google’s Cloud can help. I’ve found that there aren’t many options to do this, especially when dealing with complicated operations like geofencing with multiple complex polygons. You can find the complete code for my approach on GitHub.
Geofencing is the procedure of identifying if a location lies within a certain fence, e.g. neighborhood boundaries, school attendance zones or even the outline of a shop in a mall. It’s particularly useful in mobile applications that need to apply this extra context to someone’s exact location. This process isn’t actually as straight forward as you’d hope and, depending on the complexity of your fences, can include some intense calculations and if your app gets a lot of use, you need to make sure this doesn’t impact performance.
In order to simplify this problem this blogpost outlines the process of creating a scalable but affordable geofencing API on Google’s App Engine.
And the best part? It’s completely free to start playing around.
If you want to dive right in you can download the finished geofencing API from my GitHub account.
The requirements are:
Storing complex fences (represented as polygons) and some metadata like a name and a description. For this I use Cloud Datastore, a scalable, fully managed, schemaless database for storing non-relational data. You can even use this to store and serve GeoJSON to your frontend.
Indexing these polygons for fast querying in a spatial index. I use an STR-Tree (part of JTS) which I store as a Java Object in memcache for fast access.
Serving results to multiple platforms through HTTP requests. To achieve this I use Google Cloud Endpoints, a set of tools and libraries that allow you to generate APIs and client libraries.
That’s all you need to get started – so let’s start cooking!
Creating the Project
To set up the project simply use Apache Maven and follow the instructions here. This creates the correct folder structure, sets up the routing in the web.xml file for use with Google’s API Explorer and creates a Java file with a sample endpoint.
Adding additional Libraries
I’m using the Java Topology Suite to find out which polygon a certain latitude-longitude-pair is in. JTS is an open source library that offers a nice set of geometric functions.
To include this library into your build path you simply add the JTS Maven dependency to the pom.xml file in your project’s root directory.
In addition I’m using the GSON library to handle JSON within the Java backend. You can basically use any JSON library you want to. If you want to use GSON import this dependency.
Writing your Endpoints
Adding Fences to Cloud Datastore
For the sake of convenience you’re only storing the fences’ vertices and some metadata. To send and receive data through the endpoints you need an object model which you need to create in a little Java Bean called MyFence.java.
Now you need to create an endpoint called add. This endpoint expects a string for the group name, a boolean indicating whether to rebuild the spatial index, and a JSON object representing the fence’s object model. From this App Engine creates a new fence and writes it to Cloud Datastore.
For some use cases it makes sense to fetch all the fences at once in the beginning, therefore you want to have an endpoint to list all fences from a certain group.
Cloud Datastore uses internal indexes to speed up queries. If you deploy the API directly to App Engine you’re probably going to get an error message, saying that the Datastore query you’re using needs an index. The App Engine Development server can auto-generate the indexes, therefore I’d recommend testing all your endpoints on the development server before pushing it to App Engine.
To speed up the geofencing queries you put the fences in an STR tree. The JTS library does most of the heavy lifting here, so you only need to fetch all your fences from the Datastore, create a polygon object for each one and add the polygon’s bounding box to the index.
You want to have an endpoint to test any latitude-longitude-pair against all your fences. This is the actual geofencing part. That’s so you will be able to know, if the point falls into any of your fences and if so, it should return the ids of the fences the point is in.
For this you first need to retrieve the index from memcache. Then query the index with the bounding box of the point which returns a list of polygons. Since the index only tests against the bounding boxes of the polygons, you need to iterate through the list and test if the point actually lies within the polygon.
The process of testing for a point can easily be adapted to test the fences against polylines and polygons. In the case of polylines you query the index with the polyline’s bounding box and then test if the polyline actually crosses the returned fences.
When testing for a polygon you want to get back all fences that are either completely or partly contained in the polygon. Therefore you test if the returned fences are within the polygon or are not disjoint. For some use cases you only want to return fences that are completely contained within the polygon. In that case you want to delete the not disjoint test in the if clause.
The beauty of this is, since it’s running on App Engine, Google’s platform as a service offering, it scales automatically and you only pay for what you use.
If you want to insure best performance and great scalability you should consider to switch from the free and shared memcache to a dedicated memcache for your application. This guarantees enough capacity for your spatial index and therefore ensures enough space even for a large amount of complex fences.
That’s it – that’s all you need to create a fast and scalable geofencing API. Preview: Processing Big Spatial Data in the Cloud with Dataflow In my next post I will show you how I geofenced more than 340 million NYC Taxi locations in the cloud using Google’s new Big Data tool called Cloud Dataflow.