MapInfo Pro is fast. It uses spatial indexing as well as tabular indexing to speed up tabular (non-map) and map queries. It also has a truly superior ability to align coincident maps on screen on-the-fly, even if the map layers have dramatically different map projections and coordinates.
On the other hand, no one’s ever satisfied that any software is “fast enough”, particularly when using tables of 100,000’s or 1,000,000’s of records. For some users, e.g., wireless telecommunications users, the total number of records is not that large, but coverage region or other spatial objects are extremely complex, having 100,000’s of nodes.
MapInfo Pro’s built-in database engine can manage record sets of many millions of records, exceeding the capacity of Access by far. Its map rendering and map object analysis operations allow us to work with regions composed of 1,000’s of individual polygons, including deeply nested and overlapping polygons. A single region object may be comprised of more than 1,000,000 nodes.
This discussion will outline some ideas for how to improve our performance, especially when using large data sets. It is a discussion based on user experience and has not been reviewed or approved by Precisely Software (maker of MapInfo software).
Faster hardware may be the cheapest solution
Map and non-map queries are both CPU and hard disk intensive. This is your chance to justify a faster PC.
- Faster CPU speed is good.
- Faster CPU and fast hard disk is better. A 10,000 or 15,000 RPM disk will be noticeably faster than a 5400 RPM. A caching controller is best but rare on a desktop system.
- Laptops have notoriously slow disk access times. Ie., the rotation speed may be good, but the data access may still be slow. Use a desktop if possible.
- Is your network or file server slow? Copy the data to your local disk.
Work in MapInfo TAB/DAT format.
It’s convenient to be able to open and map tables opened in native XLS, Text, Shapefile, Access, RDBMS or other formats. But processing is significantly faster in MapInfo’s native DAT format.
Use File > Save As … to convert to TAB/DAT format. Remember to open the newly saved file (as it does not re-open automatically). XLS and plain Text format data files are exceptionally slow.
Many deleted records in your table? Pack the table before executing your queries
Deleted records are still part of your table, although grey-ed out. Your queries will process faster without them. Packing the table removes these records permanently.
Warning: Beware of Packing if you have Workspaces referencing this table *and* the Workspaces include custom text labels (labels you have manually moved, changed fonts, deleted or otherwise adjusted).
Non-map (tabular) Queries
Use Table > Maintenance > Table Structure … to set tabular indexes for columns you’ll query against.
- If your table is in Access or DBF format, it is not enough to create Access or DBF indices. MapInfo won’t use them.
- Indexing helps most when columns hold highly-varied data values: e.g., a unique ID column vs. a “Customer/Non-customer” column.
Note: SGSI has found cases where the effect of indexing columns is minimal.
Optimize your query syntax: put the indexed column reference last in your ‘where’ clause
For example, if column ‘Sales’ in table ‘Customer_Points’ is Indexed:
Syntax option #1: (Slower) Select * from Customer_Points, Cov_Regions where Sales > 10000 and Customer_Points.Obj within Cov_Regions.Obj into qSales10000_Within1
Syntax option #1: (Faster) Select * from Customer_Points, Cov_Regions where Customer_Points.Obj within Cov_Regions.Obj and and Customer_Points.Sales > 10000 into qSales10000_Within2
Map (spatial) Queries
Use ‘within’ or ‘contains’ instead of ‘partly within’, ‘entirely within’, or ‘intersects’
The ‘within’ and ‘contains’ operators are somewhat less precise than ‘intersect’. etc. but much faster. This is because they consider a region ‘within’ if its centroid is within. Do you really need the extra accuracy of ‘partly within’ or ‘entirely within’. On average, the simple ‘within’ syntax produce very very similar results. Run comparative tests with your data to see if the speed benefits are worth the small reduction in accuracy.
If regions have very complex shapes, remove unneeded detail
Telecom coverage maps commonly have hundreds of thousands of nodes and 10,000+ polygons in a SINGLE region. Each extra node that defines the region edge adds to the processing drain. Do you need them all? You can use MapInfo’s Object processing menu to, for example, remove coincident nodes or generalize the region edges.
Warning: ‘Thinning’ can create data problems if you have adjacent regions that are generalized differently. Thinning can also change your query results, because it changes the literal shape of the edge. Hence a point on the very edge may be *outside* a thinned region but *inside* the un-thinned region or vice versa.
Split complex regions, even if you don’t simplify them on the edges
Before running your queries, manually “split” complex region shapes by creating a “cookie cutter” grid or other object (cf the “Grid Maker” MBX in Tools > Tool Manager) then use “Object > Split. You can automate this via a simple MBX that tests the number of nodes in a region, then splits it if and only if the number of nodes exceeds a threshold you set. SGSI customers (only) should contact us for details and sample code.
Splitting does not change the shape of the region overall, so your point-in-polygon operations will return the same results. However split regions process much much faster in practice. This is probably because the point-in-polygon code will operate against much simpler (ie, split out pieces) regions.
Use the ‘region overlay’ feature of Vertical Mapper 3 instead of a MapInfo Pro spatial query.
This feature can produce results up to 10x faster than MapInfo’s built-in vector overlay process. It is particularly valuable when computing ‘area proportion’ values based on the overlap of two complex region map layers, e.g, if computing the population falling within a complex telecom coverage region.
Vertical Mapper currently lists for $1195/user and has many other features including 3D visualization, line-of-sight/intervisibility, DEM and other grid analysis, contouring.
Optimize your query syntax: e.g., alter the order of table references
For example: All these queries (below) create the same results, but query syntax #3 is 50% slower. Your results may be different, so experiment.
- Syntax option #1: (Fast) Select * from Cov_Regions, Customer_Points where Cov_Regions.Obj Contains Customer_Points.Obj into qRegionPoint_Contains
- Syntax option #2: (As fast) Select * from Cov_Regions, Customer_Points where Customer_Points.Obj Within Cov_Regions.Obj into qRegionPoint_Within
- Syntax option #3: (Much slower) Select * from Customer_Points, Cov_Regions where Customer_Points.Obj within Cov_Regions.Obj into qPointRegion_Within
Process records in sub-sets.
For example, let’s say you have 2,000,000+ points that need to be assigned a County name, based on location. The following queries will process at least 2x as fast if you process the points in chunks of 200,000 records at a time.
- Select * from CountyRegions, MillionsOfPoints where CountyRegions.Obj Contains MillionsOfPoints.Obj
Using “Table > Update Column…” when tables have many columns?
Two-table queries often join tables for purposes of updating one table with values from the other, e.g.,
Select * from Table1, Table2 where Table1.ID=Table2.ID into qMyJoinedTables Update qMyJoinedTables Set Table1.Name = Table2.Name
This “Table Update” process can be dramatically slower if Table1 and Table2 have many columns. To improve performance:
- Narrow overly-wide columns
- Permanently remove unneeded columns
Adding records or updating values one record at a time? Drop indexes.
Drop the indexes, add/edit all records, then re-build the indexes. Why? It’s more efficient for MapInfo to index an entire table once than to repeatedly index an large number of individual records.
Adding or editing records? Set FastEdit ON (Advanced users only).
When table records are edited or added, MapInfo builds a separate transaction file holding the edit. This file disappears when you explicitly Save. By turning FastEdit ON, you command MapInfo to skip the transaction file and make all changes directly to the master table.
Use the MapBasic window to issue this command:
Set Table [tablename] FASTEDIT ON
- FASTEDIT setting apply to a single table. Issue separate commands for faster editing with multiple tables.
- Once ON, FASTEDIT remains ON until the table is closed or you explicitly reverse things via the following command: Set Table [tablename] FASTEDIT OFF
- Without a transaction file, you lose the option to Revert your edits. This is a risk.
The speed at which the screen refreshes certainly depends in part on the system hardware. Here, though, we will mention issues that are more user-controllable.
Map layers can cover huge areas without hurting performance.
MapInfo’s spatial indexing is good, so it takes approximately the same time to re-draw a neighborhood street map whether the street map layer is state-wide or only county-wide. MapInfo Pro does not spend any energy drawing street lines or other layer features outside the screen view.
Complex map features can hurt performance.
MapInfo spends energy drawing an entire map region or other feature even if only one small part of it is in view. This is particularly noticeable with MapX and MapXtreme, when drawing highly-complex region shapes with 10’s of thousands of nodes. For best re-draw speed, split these regions into separate records and limit the maximum number of nodes.
Map features with a display style of “None” still take work to draw.
Sometimes you need a map layer to be “Visible” only so that the text labels are drawn. You can make the map lines or regions invisible, but still see the text labels, by setting the layer Display Style to “None”. However, the map lines and regions are still manipulated internally in memory by the code, whether visible or not. For best performance, be sure the zoom layering for the map shapes is narrowed to just the zoom ranges set for the text labels.
Two other options: (Test only on a backup copy)
1. Update the map features, converting lines, regions, etc to points: e.g., in the MapBasic window:
Update [tablename] Set Obj = Create Point (Centroidx(obj), CentroidY(obj))
2. Use the Labeler tool in Tool Manager to create and save a permanent “text-only” layer, dropping the shapes entirely.
Sort address table by zip code in advance
Put records in ascending zip code order. Mapmarker’s address dictionary is apparently in zip code order. Thus, if your input address table is also in zip order, the processing will be much quicker.
Don’t create MapInfo point objects.
Fill lat-lon columns initially. Afterwards, use MapInfo’s Table > Create Point menu option to create the point symbols.
Relaxing ZIP Codes slows performance.
When ZIP Code matching is relaxed, MapMarker searches a wider area for possible matches so must consider more records in the address data dictionary.
Relaxing City Name slows MapMarker only slightly.
With city names relaxed, MapMarker evaluates more records in its data dictionary. It evaluates street addresses with the specified ZIP Code for all cities, not just the the city name in the input address. Most users will relax city name because city names used by the USPS often are different from the city names used ‘on the street’ or the legal incorporated city name.
Relaxing Street Name slows down MapMarker.
If street name is ‘relaxed’, MapMarker considers all street names that sound alike (soundex).
Relaxing House Number doesn’t hurt performance much.
Avoid matching to street Intersections.
If you opt to Match to Street Intersections, MapMarker must compare two street names for every record, not one. Choose this option only if your table contains a large number of street intersections.
If you must geocode to intersections, do this as a second pass. i.e., geocode all possible records without it, then turn on intersection matching specifying ‘only unmatched records’. Note: Remember that Street1 and Street2 must be contained in the same field and be separated by an “&&”. If you use a single ampersand, expect strange results.
Don’t leave MapMarker address files on the MapMarker install CDs.
Don’t ask MapMarker to fill in Output columns and don’t request MapMarker to blank out columns if address is not geocode-able.
Delete MapInfo indexes if geocoding DBF-format tables.
Even if the indexed column is *not* relevant to geocoding, the mere presence of a single indexed column in the DBF table slows geocoding down by a factor of 10 times or more.
Traps and dead ends
There are always those steps we try which we think may help, but which don’t. Here are some things not worth trying.
Don’t combine/merge adjacent regions.
You might think that reducing the number of regions would help, e.g., if your telecom coverage data defines antennae sectors as separate regions, you could use Object > Combine to merge them. Don’t. It is the complexity of each region that makes the most difference. Merging sectors will slow down processing because it results in even more complex regions.
Don’t bother disaggregating complex regions.
A region can be composed of one polygon or thousands of them. The Object > Disaggregate command lets us convert a highly complex region into a set of simpler regions with one polygon each. Don’t bother. It doesn’t speed things up.
This material is based on a presentation by SGSI at the Northwest MapInfo User Group, May 11, 2004.
©Copyright 2004 by SGSI, All rights reserved. Material may not be reproduced in whole or in part without written SGSI permission.