Geolocation and geospatial search are hot topics, and many people are building web and mobile applications that use them. Companies like Qype are building databases of points of interest (POIs) such as shops and restaurants. The upcoming HTML5 standard additions will make building such applications even easier. In this article you will learn:
- which tools you can use to perform geospatial search
- what Sphinx is and how it fits into the picture
- how to feed POI data into a Sphinx index
- how to perform geospatial search with Sphinx and Ruby
What tools can I use to perform geospatial search?
- PostGIS is probably the most mature implementation. This PostgreSQL extension is, however, quite hard to install and configure. Sometimes you also don’t want to use an SQL database, or don’t need a database at all, because all you need is a search index.
- MySQL’s geospatial support is limited compared to PostGIS. It’s getting better, but it is not quite there yet in terms of performance and functionality.
- MongoDB can perform geospatial search. Its authors haven’t implemented support for spherical surfaces yet, so the database currently treats the Earth as if it were flat when searching. This makes geospatial search more accurate near the equator and less accurate near the poles.
- Local Lucene / Solr is a very promising project. The functionality of Local Lucene is currently being ported directly into Solr, but not much works yet.
- Sphinx is a search engine that is lighter, easy to set up and use, and quite fast.
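To see why a flat-Earth model loses accuracy away from the equator, here is a small sketch that computes how long one degree of longitude is on the ground at different latitudes. The helper name and the Earth radius constant are assumptions made for illustration; nothing here comes from any of the tools listed above.

```ruby
# Mean Earth radius in metres (an assumed round figure).
EARTH_RADIUS_M = 6_371_000.0

# Ground length, in metres, of one degree of longitude at a given latitude.
# On a sphere this shrinks with the cosine of the latitude; a flat model
# treats it as constant, which is where the error comes from.
def metres_per_degree_of_longitude(lat_degrees)
  lat_radians = lat_degrees * Math::PI / 180.0
  (Math::PI / 180.0) * EARTH_RADIUS_M * Math.cos(lat_radians)
end

[0, 30, 60, 80].each do |lat|
  km = metres_per_degree_of_longitude(lat) / 1000.0
  puts format("latitude %2d: one degree of longitude is about %.1f km", lat, km)
end
```

At 60 degrees of latitude a degree of longitude is only half as long as at the equator, so treating degrees as a flat grid distorts east-west distances more and more towards the poles.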
Sphinx
Sphinx is a full-text search engine, and if you have already built a Rails app with full-text search, you may have used it. Sphinx also supports geospatial search, and both types of search can be combined in the same query.
Installation is pretty simple. All you have to do is grab the binaries from the Sphinx downloads site or install it with your operating system’s package manager.
I won’t describe how to connect Sphinx to an ActiveRecord-enabled Rails application. Instead, I will focus on using it with an XML data source and performing searches with the Riddle client library.
Getting data into Sphinx
Sphinx provides a few ways to get data into its search index. You can either point it at your SQL database (MySQL or PostgreSQL) or feed the indexer directly with XML. We will use the second approach.
The Sphinx xmlpipe2 data format is pretty easy to understand:
File: pois.xml
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
<sphinx:schema>
<sphinx:field name="name"/>
<sphinx:attr name="lat" type="float"/>
<sphinx:attr name="lng" type="float"/>
</sphinx:schema>
<sphinx:document id="1">
<name><![CDATA[AmberBit HQ]]></name>
<lat>0.927042715037538</lat>
<lng>0.403538937710426</lng>
</sphinx:document>
<sphinx:document id="2">
<name><![CDATA[Google HQ]]></name>
<lat>0.656188367092825</lat>
<lng>-2.13395902872886</lng>
</sphinx:document>
<sphinx:document id="3">
<name><![CDATA[Hewlett-Packard HQ]]></name>
<lat>0.657782603191474</lat>
<lng>-2.13395902872886</lng>
</sphinx:document>
</sphinx:docset>
As you can see, we are doing two things here: first we define the schema for the documents, and then we print out the documents in the required format, according to that schema. The important thing is that latitude and longitude must be converted from degrees to radians. To do so, we use a simple formula: radians = (degrees * Pi) / 180.
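If your POIs are stored with coordinates in degrees, a file like pois.xml can be generated with a few lines of Ruby. The following is only a sketch: the helper names `to_radians` and `pois_to_xmlpipe2` and the sample POI are made up for illustration, and only the standard library is used.

```ruby
# Convert degrees to radians using the formula from the text.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Build an xmlpipe2 document from an array of hashes with :id, :name, :lat
# and :lng keys (coordinates in degrees). The schema mirrors pois.xml.
def pois_to_xmlpipe2(pois)
  documents = pois.map do |poi|
    # Split any "]]>" in the name so it cannot end the CDATA section early.
    safe_name = poi[:name].to_s.gsub("]]>", "]]]]><![CDATA[>")
    <<~XML
      <sphinx:document id="#{Integer(poi[:id])}">
      <name><![CDATA[#{safe_name}]]></name>
      <lat>#{format('%.15f', to_radians(poi[:lat]))}</lat>
      <lng>#{format('%.15f', to_radians(poi[:lng]))}</lng>
      </sphinx:document>
    XML
  end

  <<~XML
    <?xml version="1.0" encoding="utf-8"?>
    <sphinx:docset>
    <sphinx:schema>
    <sphinx:field name="name"/>
    <sphinx:attr name="lat" type="float"/>
    <sphinx:attr name="lng" type="float"/>
    </sphinx:schema>
    #{documents.join}</sphinx:docset>
  XML
end

# Hypothetical sample POI; the coordinates are illustrative only.
sample = [{ id: 1, name: "Example Cafe", lat: 52.0, lng: 21.0 }]
puts pois_to_xmlpipe2(sample)
```

Because an xmlpipe2 source reads whatever its command prints to standard output, a script along these lines could stand in for a static XML file.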
We also need to provide a Sphinx configuration file:
# sphinx.conf
source dummy
{
type = xmlpipe2
xmlpipe_command = bash -c "cat pois.xml"
}
index pois
{
source = dummy
path = tmp/places
docinfo = extern
mlock = 0
charset_type = utf-8
html_strip = 0
}
indexer
{
mem_limit = 32M
}
searchd
{
listen = 127.0.0.1:5000
log = tmp/searchd.log
query_log = tmp/query.log
read_timeout = 1
client_timeout = 1
max_children = 60
pid_file = tmp/searchd.pid
max_matches = 10000000
seamless_rotate = 1
preopen_indexes = 0
unlink_old = 1
mva_updates_pool = 1M
max_packet_size = 8M
max_filters = 256
max_filter_values = 4096
}
In this file, we specify a “dummy” data source, which reads POI data from the pois.xml file we created in the previous step.
Let’s run the indexer:
$ indexer -c sphinx.conf --rotate --all
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'sphinx.conf'...
indexing index 'pois'...
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 38 bytes
total 0.002 sec, 17025 bytes/sec, 1344.08 docs/sec
total 2 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
...and start the Sphinx search engine:
$ searchd -c sphinx.conf
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'sphinx.conf'...
listening on 127.0.0.1:5000
As you can see, we have indexed a collection of 3 documents: the locations of the AmberBit office, Google HQ and HP HQ. We’re in good company ;). Now, let’s try a search.
Riddle itself is quite an easy library to use, but in a real app you will probably want to write a wrapper class to manage conditions and filters and to generate the final query. Here we will just use the simplest code to find IT companies near your location. Our script will ask for the user’s location as latitude and longitude and output a list of companies ordered by distance from that location.
# We require riddle library:
require 'riddle'
require 'riddle/0.9.9'
# Define company names, because Sphinx returns only document IDs:
companies = %w(AmberBit Google HP)
# Let's connect to Sphinx and use the most robust search algorithms:
client = Riddle::Client.new "localhost", 5000
client.match_mode = :extended
client.sort_mode = :extended
# To return records in order of distance, we use this setting:
client.sort_by = "@geodist ASC"
# We need to ask the user for their location and convert it from latitude
# and longitude in degrees to radians:
puts "What's your latitude: "
lat = gets.to_f * Math::PI / 180.0
puts "What's your longitude: "
lng = gets.to_f * Math::PI / 180.0
# And we perform the search and print the companies in the desired order to
# the console:
puts "Top IT companies near you, ordered by distance: "
client.set_anchor "lat", lat, "lng", lng
# Each match carries its Sphinx document ID under :doc (IDs start at 1 in pois.xml):
client.query("", "pois")[:matches].each do |record|
  puts companies[record[:doc].to_i - 1]
end
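To sanity-check the order Sphinx returns, the distances can also be computed in plain Ruby. The sketch below uses the haversine formula, a common great-circle distance formula, with an assumed mean Earth radius; it is not necessarily the exact computation Sphinx performs, so absolute values may differ slightly. The names `haversine_m`, `nearest` and `POIS` are made up for this example, and the coordinates are the radian values from pois.xml.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # assumed mean Earth radius in metres

# Great-circle distance in metres between two points given in radians.
def haversine_m(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  # Clamp to 1.0 to guard against floating-point overshoot before asin.
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt([a, 1.0].min))
end

# The same three documents as in pois.xml, coordinates already in radians.
POIS = {
  "AmberBit" => [0.927042715037538, 0.403538937710426],
  "Google"   => [0.656188367092825, -2.13395902872886],
  "HP"       => [0.657782603191474, -2.13395902872886]
}

# Return [name, distance_in_metres] pairs, nearest first.
# An optional radius (in metres) drops everything farther away.
def nearest(lat, lng, radius_m: nil)
  ranked = POIS.map { |name, (plat, plng)| [name, haversine_m(lat, lng, plat, plng)] }
  ranked = ranked.select { |_, metres| metres <= radius_m } if radius_m
  ranked.sort_by { |_, metres| metres }
end

# Example: stand at the Google HQ coordinates from pois.xml.
nearest(0.656188367092825, -2.13395902872886).each do |name, metres|
  puts format("%-8s %.1f km", name, metres / 1000.0)
end
```

Standing at the Google HQ coordinates, this lists Google first, HP about 10 km away, and AmberBit last. Passing `radius_m: 20_000` keeps only the first two.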
Where to go from here?
The Riddle documentation is a great place to look for help. There you can find out how to perform complex searches and how to re-order or filter returned records by their attributes. You can also specify a radius in metres or retrieve the distance from a given POI. Also, check out this gist for the sources of this example.
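The wrapper class mentioned earlier could start out as small as the sketch below. `NearbySearch` and its `Result` struct are hypothetical names. The client is passed in and is assumed to behave like `Riddle::Client`, with matches exposing the document ID under `:doc` and the computed distance under the `"@geodist"` attribute. Please verify both against the Riddle documentation before relying on them.

```ruby
# Thin wrapper around a Sphinx client for "nearest POIs" queries.
# The client is injected and assumed to respond to set_anchor, match_mode=,
# sort_mode=, sort_by= and query, like the client used in the script above.
class NearbySearch
  Result = Struct.new(:id, :distance_m)

  def initialize(client, index: "pois", lat_attr: "lat", lng_attr: "lng")
    @client = client
    @index = index
    @lat_attr = lat_attr
    @lng_attr = lng_attr
  end

  # lat/lng are given in degrees; an optional text query narrows results by name.
  def near(lat_degrees, lng_degrees, text: "")
    @client.match_mode = :extended
    @client.sort_mode = :extended
    @client.sort_by = "@geodist ASC"
    @client.set_anchor(@lat_attr, radians(lat_degrees), @lng_attr, radians(lng_degrees))

    response = @client.query(text, @index)
    Array(response[:matches]).map do |match|
      # Assumption: :doc is the document ID, "@geodist" the distance in metres.
      Result.new(match[:doc], match.fetch(:attributes, {})["@geodist"])
    end
  end

  private

  def radians(degrees)
    degrees * Math::PI / 180.0
  end
end

# Usage with Riddle (needs the gem and a running searchd; coordinates are
# illustrative):
#   require 'riddle'
#   search = NearbySearch.new(Riddle::Client.new("localhost", 5000))
#   search.near(53.1, 23.1).each { |r| puts "#{r.id}: #{r.distance_m} m" }
```

Injecting the client keeps the degree-to-radian conversion and sort settings in one place and makes the class easy to exercise with a stub in tests.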
Post by Hubert Łępicki
Hubert is partner at AmberBit. Rails, Elixir and functional programming are his areas of expertise.