Geocoding: Concepts, Techniques & Secrets from SGSI

What is Geocoding?

Before geocoding:

Name        Address      Zip
Bill Smith  123 Orca St  98000

After geocoding:

[Image: GrnLake_Geocoded]

Geocoding is making data “mappable”.

The Technical Process of Geocoding:

  • Matching records in two databases: your database (without map coordinates but with some street, Zip Code or other geographic reference) and a reference map (*with* map coordinates).
  • Geocoding software (e.g., MapInfo Pro, MapMarker, or Envinsa) links records in the two databases by matching text fields: e.g., Zip Codes, street names and address numbers.
  • When matched to a reference map, your records are tagged with their map positions, typically lat-lon coordinates.
  • Typically, after geocoding, your data contains its own position information and can be mapped without the reference street map or address dictionary.
  • MapInfo Pro hides the map coordinates of a geocoded table: i.e., it does not display the lat-lon values in the table by default. But the coordinates are there behind the scenes.

Before geocoding:

Name        Address      Zip    XCoord   YCoord
Bill Smith  123 Orca St  98000  (blank)  (blank)

After geocoding:

Name        Address      Zip    XCoord       YCoord
Bill Smith  123 Orca St  98000  -122.345678  47.234567
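
The matching step described above can be sketched in a few lines of Python. This is only an illustration, not how MapInfo Pro or MapMarker work internally: the `REFERENCE` dictionary stands in for a reference map or address dictionary, and the names `normalize` and `geocode` are hypothetical.

```python
# Hypothetical reference "address dictionary": (address, zip) -> (lon, lat).
# A real reference map holds street segments and address ranges instead.
REFERENCE = {
    ("123 ORCA ST", "98000"): (-122.345678, 47.234567),
}


def normalize(text):
    """Uppercase and collapse whitespace so text fields compare reliably."""
    return " ".join(text.upper().split())


def geocode(record, reference=REFERENCE):
    """Return a copy of the record with XCoord/YCoord filled in on a match.

    Unmatched records keep None in both coordinate fields.
    """
    key = (normalize(record["Address"]), normalize(record["Zip"]))
    lon, lat = reference.get(key, (None, None))
    return {**record, "XCoord": lon, "YCoord": lat}


customer = {"Name": "Bill Smith", "Address": "123 Orca St", "Zip": "98000"}
geocoded = geocode(customer)
print(geocoded)
```

Once the coordinates are stored on the record, the record can be mapped without consulting the reference table again.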

Why geocoding precision matters

 

Precision options

 

You can map data at many levels of spatial precision depending on your data and your available maps or geocoding software:

  • Zip Code (+/- 1 mile);
  • Zip + 4 Code (+/- 1 block);
  • Street Address (approx. position on block);
  • Land Parcel (centered on real estate parcel);
  • Building (in building footprint)
  • Exact (hydrant, electric meter, mail box, building doorway, gate, etc.)
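
Street-address matches are only approximate because street-level geocoders commonly estimate a position from the address range stored on each street segment. The sketch below shows that interpolation under simplifying assumptions: a straight segment, evenly spaced house numbers, and no offset from the centerline. The function name and parameters are hypothetical, not taken from any product.

```python
def interpolate_address(house_number, from_number, to_number, start, end):
    """Estimate a (lon, lat) position along a straight street segment.

    start and end are the (lon, lat) endpoints of the segment;
    from_number and to_number are the address range assigned to it.
    """
    if to_number == from_number:
        fraction = 0.5  # single-address segment: use the midpoint
    else:
        fraction = (house_number - from_number) / (to_number - from_number)
    # Clamp so out-of-range numbers still land on the segment.
    fraction = min(max(fraction, 0.0), 1.0)
    lon = start[0] + fraction * (end[0] - start[0])
    lat = start[1] + fraction * (end[1] - start[1])
    return lon, lat


# House number 150 on a segment addressed 100 to 200 lands at the midpoint.
position = interpolate_address(150, 100, 200, (0.0, 0.0), (10.0, 20.0))
print(position)
```

Because real houses are not evenly spaced, the estimate can be off by part of a block, which is why parcel-level or building-level data is needed for higher precision.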

Geocoding precision affects analytic precision

 

Maps showing values by Zip Code paint a very coarse picture compared to an address-level point map.

[Images: Zip_Geo (geocoding by Zip Code) and PointGeo (geocoding by address)]

We may make assumptions about households based on the characteristics of the neighborhoods where they live. If geocoded by street address, we can associate household records with smaller, more homogeneous areas such as block groups, so those neighborhood-based assumptions are more accurate.

As shown below, the demographics of block group areas can be dramatically different from the demographics of Zip Codes.

[Images: Sea_Zip (higher-income Zip Codes) and Sea_BG (higher-income block groups)]

Spatial precision: What’s “correct” depends on your application

 

What is the appropriate geocoding precision for projects such as the following?

  • Health: Mapping and analyzing national patterns of chicken pox?
  • Public safety: Planning volcano evacuation routes? Planning electric meter reader routes?
  • Transit: Determining the number of residents within 200 meters of a proposed bus route?
  • Telecom: Selecting wireless phone tower sites? Selecting WiFi sites?
  • Door-to-door delivery: Creating efficient route maps for newspaper or UPS deliveries?
  • Retail: Choosing best location for new clinic, bank branch or retail store?
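
The transit question shows why precision matters: a 200-meter test is meaningless if each resident is placed at a Zip Code centroid that may be a mile from the real address. The sketch below counts geocoded points within a given distance of a route. It assumes a short route (so a flat, equirectangular approximation is acceptable), a route of at least two vertices, and made-up coordinates; all names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0


def to_local_xy(lon, lat, ref_lon, ref_lat):
    """Project lon/lat to approximate meters around a reference point."""
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def count_within(points, route, max_distance_m=200.0):
    """Count (lon, lat) points within max_distance_m of a route polyline."""
    ref_lon, ref_lat = route[0]
    route_xy = [to_local_xy(lon, lat, ref_lon, ref_lat) for lon, lat in route]
    count = 0
    for lon, lat in points:
        p = to_local_xy(lon, lat, ref_lon, ref_lat)
        nearest = min(distance_to_segment(p, a, b)
                      for a, b in zip(route_xy, route_xy[1:]))
        if nearest <= max_distance_m:
            count += 1
    return count


# Made-up route and residents: one about 111 m away, one about 556 m away.
route = [(-122.35, 47.60), (-122.34, 47.60)]
residents = [(-122.345, 47.601), (-122.345, 47.605)]
print(count_within(residents, route))
```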

Under-appreciated reason for geocoding precision – Summarizing data by custom regions

 

  • Which district office should handle the new client? Geocode client addresses by street address and see which territory region they fall within.
  • Are capital improvement projects benefiting all legislative districts equally? Find out by geocoding the street addresses of the capital projects then overlaying the legislative districts.
  • Banks and insurance companies often report statistics by census tract. To determine the census tract for a loan or depositor, geocode the street addresses, overlay the tract map, then use Query > SQL Select to summarize by tract.
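
Each of these tasks is a point-in-polygon overlay followed by a summary. In MapInfo Pro this is done with map layers and Query > SQL Select; the Python sketch below only illustrates the underlying idea with a standard ray-casting test. The region shapes, client points and function names are made up for the example.

```python
from collections import Counter


def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def summarize_by_region(points, regions):
    """Count points per region; points in no region are '(unassigned)'."""
    counts = Counter()
    for point in points:
        for name, polygon in regions.items():
            if point_in_polygon(point, polygon):
                counts[name] += 1
                break
        else:
            counts["(unassigned)"] += 1
    return counts


# Hypothetical territories and geocoded client locations.
REGIONS = {
    "North": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "South": [(0, 0), (10, 0), (10, 5), (0, 5)],
}
CLIENTS = [(2, 7), (8, 9), (5, 1), (20, 20)]
summary = summarize_by_region(CLIENTS, REGIONS)
print(summary)
```

A Zip Code centroid can fall in the wrong territory when a Zip Code straddles a boundary, which is why address-level geocoding is the safer input for this kind of summary.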

Geocoding: Part of modern corporate customer information systems

 

More and more Fortune 1000 corporations geocode all their customer addresses as a standard practice. By making map position part of their overall corporate information system design, all departments take advantage of spatial analysis techniques.

Comparing Geocoders

 

Before buying new geocoding software, understand the three technical “legs” of the geocoding “stool”: data, algorithms, and interface.

  • Data: To what street map or other map content does the software match your data? I.e., how good is the underlying “address dictionary”? Is it updated quarterly as streets are built? Does it place the streets in the right place on the map? Does it have current street names? Does it have alternative street names? Can you edit or add your own data to the prepackaged, out-of-the-box data? Is data available nationally off the shelf? Internationally?
  • Algorithms & features: How effectively does it standardize and match your addresses to its internal address dictionary? Can it optionally fall back to Zip + 4 or other less precise matches if an exact match is impossible? How fast is it? Is there a scoring system for ranking possible matches? Is there a user-configurable emphasis on exact match of Zip Code, city name or house number?
  • User interface: How easy is it to use? Must you translate your data to/from a proprietary format or can you geocode addresses “in place” in SQL Server or other DBMS? What is the interface for deciding among “close match” alternatives, when there is no exact match? Can you call geocoding functions programmatically from SQL Server, MapBasic, or even batch files?
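
To make the "scoring system" and "user-configurable emphasis" questions concrete, here is a minimal sketch of weighted candidate scoring. It is not the algorithm of any particular geocoder: the field names, the weights and the use of Python's `difflib.SequenceMatcher` for string similarity are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical weights: raise "zip" to emphasize an exact Zip Code match.
DEFAULT_WEIGHTS = {"house_number": 0.3, "street": 0.4, "city": 0.1, "zip": 0.2}


def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def score_candidate(query, candidate, weights=DEFAULT_WEIGHTS):
    """Weighted similarity across address fields, scaled to 0-100."""
    total = sum(weights.values())
    score = sum(w * similarity(query[f], candidate[f]) for f, w in weights.items())
    return 100.0 * score / total


def rank_candidates(query, candidates, weights=DEFAULT_WEIGHTS):
    """Return candidates ordered from best to worst score."""
    return sorted(candidates,
                  key=lambda c: score_candidate(query, c, weights),
                  reverse=True)


query = {"house_number": "123", "street": "Orca St", "city": "Seattle", "zip": "98000"}
candidates = [
    {"house_number": "123", "street": "Orchard St", "city": "Seattle", "zip": "98000"},
    {"house_number": "123", "street": "Orca St", "city": "Seattle", "zip": "98000"},
]
best = rank_candidates(query, candidates)[0]
print(best["street"], round(score_candidate(query, best), 1))
```

When comparing products, the useful questions are which fields feed the score, whether the weights can be adjusted, and what score threshold separates an automatic match from a "close match" that needs review.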

Non-technical comparison factors

 

  • Price: Number of transactions per year? Server or desktop use? Number of users who geocode?
  • License rights: Do you need to distribute geocoded data outside your organization?
  • Maintenance required: How much work is it to update the address dictionaries or other data?

MapMarker: A stand-alone geocoder

 

  • MapMarker is a standalone geocoder software program, introduced by MapInfo Corp. in 1996. Version 14.2 was released in February 2009. It runs independent of MapInfo Pro and is optimized to do one thing: geocode quickly.
  • It is a combined software & data product: i.e., it comes with its own street address reference data files (aka “address dictionary”), and regular data updates.
  • It is available in three versions, each with different pricing and different levels of precision: MapMarker Zip + 4, MapMarker, and MapMarker Plus.

Advantages of MapMarker

 

  • Cheaper than MapInfo Pro plus a detailed street map, except for single-county applications.
  • One-step geocoding, even if addresses are in many different counties across the US.
  • Cleans and freshens addresses, Zip Codes, & Zip + 4 Codes on the fly. MapMarker Plus produces USPS “CASS-certified” mailing lists. [CASS-certified means lower postage rates for bulk mailers.]
  • Includes Zip + 4 reference files and can automatically compute the Zip + 4 code and geocode to the Zip + 4 location as a fallback, if it cannot match to exact street address.
  • Very fast: Can process millions of records per hour.
  • Appends census tract, block group, and block codes to the address record as part of its single-pass processing. No separate SQL processing required.
  • Friendly, straightforward user-interface. Easy to use.
  • Is engineered as a true client-server application and can be used as the geocoding engine for a web server, for example.
  • More sophisticated “rule based” matching algorithms are better than those in MapInfo Professional. Offers a better set of “close-match” candidates than MapInfo Pro.
  • Can geocode address lists “in place” when they are stored in a remote server database (Oracle, Informix, etc.)
  • Address and street data is not directly user-editable. However, power users can build & compile their own custom MapMarker-format databases and use them to supplement MapMarker’s standard databases.
  • Can process sets of address files, running in “batch” mode.
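
The fallback behavior listed above (street address first, then Zip + 4) can be pictured as a cascade that records which precision level produced each match. The sketch below is a generic illustration, not MapMarker's implementation: the lookup tables hold made-up centroids, and unlike MapMarker it expects the Zip + 4 suffix to be supplied rather than computing it.

```python
# Hypothetical lookup tables, ordered from most to least precise.
STREET_INDEX = {("123 ORCA ST", "98000"): (-122.345678, 47.234567)}
ZIP4_CENTROIDS = {"98000-1234": (-122.3450, 47.2340)}
ZIP_CENTROIDS = {"98000": (-122.3500, 47.2300)}


def geocode_with_fallback(address, zip5, zip4=None):
    """Return (lon, lat, precision) from the most precise level that matches."""
    key = (" ".join(address.upper().split()), zip5)
    if key in STREET_INDEX:
        return (*STREET_INDEX[key], "street address")
    if zip4 is not None:
        zip9 = f"{zip5}-{zip4}"
        if zip9 in ZIP4_CENTROIDS:
            return (*ZIP4_CENTROIDS[zip9], "Zip + 4")
    if zip5 in ZIP_CENTROIDS:
        return (*ZIP_CENTROIDS[zip5], "Zip Code")
    return (None, None, "unmatched")


print(geocode_with_fallback("123 Orca St", "98000"))
print(geocode_with_fallback("77 Unknown Ave", "98000", zip4="1234"))
print(geocode_with_fallback("77 Unknown Ave", "98000"))
```

Keeping the precision label with each record lets later analysis exclude or down-weight the coarser matches.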

MapInfo Pro + MapMarker or Envinsa Server

 

This option became available in 2006, with the release of MapInfo Pro v8.0. It is essentially MapMarker Standalone, but now with the advantage of full integration within the MapInfo Pro menu system.

  • Geocoding is accessible via the MapInfo menu option “Table > Geocode using Server…”

        [Image: MI85_Geocoding_400]

  • Requires MapMarker installed locally as a Server, or an online Envinsa account for “per Transaction” usage.

MapInfo Pro + Detailed Street map

 

The original way to geocode used MapInfo’s built-in geocoding engine along with a street map. MapInfo Corp pioneered this technique for PCs, releasing MapInfo (for DOS) in 1987. MapInfo for Windows, released in 1992, also offered street level geocoding using displayable and user-editable street maps.

  • Geocoding is accessible via the MapInfo menu option Table > Geocode … Just add street map data.
  • Requires a street map with street names & address ranges: e.g., “StreetPro”, “StreetInfo” or comparable.
  • Uses an “abbreviation file” (e.g., mapinfow.abb) to standardize some address components: e.g., “Street”, “Str”, and “St”.
  • Can use a City or Zip Code boundary map to distinguish addresses that occur in more than one place in a county.
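
The role of the abbreviation file can be illustrated with a small token-replacement sketch. The mapping below is a hypothetical Python dictionary, not the actual contents or format of mapinfow.abb; it only shows why “Street”, “Str” and “St” need to collapse to one form before text matching.

```python
# Hypothetical abbreviation table; the real mapinfow.abb has its own format.
ABBREVIATIONS = {
    "STREET": "ST",
    "STR": "ST",
    "AVENUE": "AVE",
    "AV": "AVE",
    "NORTH": "N",
}


def standardize(address, abbreviations=ABBREVIATIONS):
    """Uppercase, strip simple punctuation, and replace known tokens."""
    tokens = address.upper().replace(".", "").replace(",", " ").split()
    return " ".join(abbreviations.get(token, token) for token in tokens)


for raw in ("123 Orca Street", "123 orca str.", "123 Orca St"):
    print(standardize(raw))
```

All three spellings standardize to the same string, so they match the same record in the street table.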

Advantages of MapInfo Pro + Detailed Street map

 

  • Can be cheaper than MapMarker, if all addresses are in a single or few counties, especially if local government street data is available.
  • Street address information in street tables is completely editable. Missing a street? Just draw it and fill in the street name and address fields. You can match against this new street immediately.
  • Can easily use new street map information generated by local governments.
  • Suitable for use in Mexico & other countries for which MapMarker versions may not yet be available.
  • Can be used for reverse-geocoding (EAL version only): i.e., looking up the nearest address based on lat-lon. This requires customization.
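
Reverse-geocoding amounts to a nearest-neighbor search over known address locations. The sketch below uses the haversine great-circle distance and a brute-force scan. It is a generic illustration rather than the MapInfo customization mentioned above, and the second address and its coordinates are made up.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# Hypothetical geocoded addresses: (address, lon, lat).
KNOWN_ADDRESSES = [
    ("123 Orca St", -122.345678, 47.234567),
    ("200 Orca St", -122.344000, 47.234600),
]


def reverse_geocode(lon, lat, addresses=KNOWN_ADDRESSES):
    """Return the known address record nearest to the given position."""
    return min(addresses, key=lambda rec: haversine_m(lon, lat, rec[1], rec[2]))


nearest = reverse_geocode(-122.3456, 47.2346)
print(nearest[0])
```

A brute-force scan is fine for a few thousand addresses; larger tables would need a spatial index.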

References:

 

MapInfo Professional User Guide, pp. 211ff.
MapMarker Plus User Guide.
MapMarker Plus Developer Guide.

SGSI