Open SkyQuery


The XMatch Algorithm

The cross-matching algorithm is the heart of the SkyQuery distributed query engine and is critical to the success of SkyQuery as a distributed astronomical query service. We devoted a great deal of thought to its design. It is encapsulated in a SQL stored procedure called spGetMatch, which is invoked by the XMatch() function of the SkyNode in the C# implementation. Below we discuss the cross-matching strategy we adopted and the reasons for adopting it.

Likelihood analysis

The elements of our cross-matching strategy are as follows:

  • To limit network traffic, we first obtain a count of the objects in each database that satisfy the user-entered constraints applying to that database. Running the count also brings those objects into the cache, so when we access them a second time to do the cross-matching (see below), the query should be served largely from the cache and run much faster. Most of the time is spent moving objects between the nodes, so we do not want to do it twice.
  • In addition to the non-spatial constraints, we require the user to specify the following:
  • The spatial extent of the search specified as a circular area in terms of the astronomical coordinates (right ascension = ra, declination = dec) of the center and a search radius r (in minutes of arc). This allows us to limit the spatial extent of the search to a well-defined area. The syntax for this in ADQL/s is
    Region(Circle <ra> <dec> <search-radius>)
  • Which catalogs the user wants to select matches in, and which to select dropouts in (see below).
  • A tolerance parameter for the match, sigma. We have developed a SQL Server stored procedure called spGetMatch() to perform the cross-match between databases.
    It performs a probabilistic join of the data between two databases based on the value of sigma.
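
The circular region constraint can be illustrated with a minimal Python sketch. It is not the SkyQuery implementation: the function names `radec_to_unit_vector` and `in_circle` are hypothetical, and the sketch assumes ra and dec are given in degrees, with the search radius in minutes of arc as stated above.

```python
import math

def radec_to_unit_vector(ra_deg, dec_deg):
    """Convert (ra, dec), assumed to be in degrees, to a Cartesian unit vector."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def in_circle(ra_deg, dec_deg, center_ra_deg, center_dec_deg, radius_arcmin):
    """Return True if the object lies within radius_arcmin of the circle center."""
    obj = radec_to_unit_vector(ra_deg, dec_deg)
    center = radec_to_unit_vector(center_ra_deg, center_dec_deg)
    dot = sum(o * c for o, c in zip(obj, center))
    # Clamp to guard against rounding slightly outside [-1, 1].
    dot = max(-1.0, min(1.0, dot))
    separation_arcmin = math.degrees(math.acos(dot)) * 60.0
    return separation_arcmin <= radius_arcmin
```

For example, `in_circle(0.9, 0.804, 0.9, 0.8, 0.3)` tests an object 0.24 arcminutes from the center of a circle of radius 0.3 arcminutes. Working with unit vectors rather than raw angles avoids special handling near the poles and at the ra wrap-around.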

The cross-matching algorithm encoded in the stored procedure is a probabilistic calculation that minimizes a chi-square parameter, defined by:
[equation not preserved in this copy]
where x, y, z are the Cartesian coordinates corresponding to the ra and dec specified by the user, a is a weighting parameter calculated from the astrometric precision of the survey, and λ is the Lagrange multiplier in the minimization, which ensures that (x, y, z) is a unit vector. The code for spGetMatch is included in the code listings.

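We compute four cumulative quantities at each cross-identification step:
[equations not preserved in this copy]
The best position is given by the direction of [expression not preserved in this copy]. The log-likelihood at that point is given by
[equation not preserved in this copy]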
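
The formulas for this step can be sketched as follows. This is a reconstruction under stated assumptions, not the original notation: it assumes a weighted least-squares chi-square with per-survey weights a_i = 1/σ_i² (σ_i being the astrometric precision of survey i), unit-vector positions (x_i, y_i, z_i) for each detection, and the sign convention shown for the Lagrange multiplier λ. The names a, a_x, a_y, a_z for the four cumulative sums are chosen here for illustration.

```latex
% Assumed form of the constrained chi-square
\chi^2 = \sum_i a_i \left[ (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 \right]
         - \lambda \left( x^2 + y^2 + z^2 - 1 \right),
\qquad a_i = \frac{1}{\sigma_i^2}

% Four cumulative quantities, updated as each survey is added
a = \sum_i a_i, \quad
a_x = \sum_i a_i x_i, \quad
a_y = \sum_i a_i y_i, \quad
a_z = \sum_i a_i z_i

% Setting the derivatives to zero gives (a - \lambda)\,(x, y, z) = (a_x, a_y, a_z),
% so the best position is the unit vector along (a_x, a_y, a_z)
(x, y, z) = \frac{(a_x, a_y, a_z)}{\sqrt{a_x^2 + a_y^2 + a_z^2}}

% Value at the minimum, using x_i^2 + y_i^2 + z_i^2 = 1
\chi^2_{\min} = 2 \left( a - \sqrt{a_x^2 + a_y^2 + a_z^2} \right)
```

Under these assumptions only the four sums need to be carried from node to node, which is consistent with the description of cumulative quantities computed at each cross-identification step.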

This value is divided by the number of surveys considered up to that point and compared to the tolerance. If a tuple’s log-likelihood exceeds this threshold, the tuple is discarded. The cross-identification process is fully symmetric: the order of matching does not matter. The portal applies the cross-matching to each node recursively when it runs the query execution plan.
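
The accumulate-and-test step can be sketched in Python. This is an illustration only, not the spGetMatch code: all names are hypothetical, the weights are assumed to be inverse variances with sigma expressed in radians, and the chi-square is assumed to take the value 2(a − |(ax, ay, az)|) at its minimum. How the threshold relates to the tolerance written in the query is not specified here, so it is passed in as a plain number.

```python
import math

# Running sums (a, ax, ay, az) plus the number of surveys folded in so far.
EMPTY_ACCUMULATOR = (0.0, 0.0, 0.0, 0.0, 0)

def update_accumulators(acc, position, sigma):
    """Fold one survey's detection (a unit vector) into the running sums."""
    a, ax, ay, az, n = acc
    w = 1.0 / (sigma * sigma)  # assumption: inverse-variance weight
    x, y, z = position
    return (a + w, ax + w * x, ay + w * y, az + w * z, n + 1)

def best_position(acc):
    """Unit vector along (ax, ay, az)."""
    _, ax, ay, az, _ = acc
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    return (ax / norm, ay / norm, az / norm)

def chi2_per_survey(acc):
    """Assumed minimum chi-square divided by the number of surveys so far."""
    a, ax, ay, az, n = acc
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    chi2 = max(2.0 * (a - norm), 0.0)  # clamp tiny negative rounding errors
    return chi2 / n

def keep_tuple(acc, threshold):
    """A tuple survives only if its per-survey value does not exceed the threshold."""
    return chi2_per_survey(acc) <= threshold
```

Because the sums are plain additions, folding surveys in a different order yields the same accumulators up to rounding, which reflects the symmetry described above. Note that a and the vector norm are nearly equal for close matches, so the subtraction loses precision when the weights are very large.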

Mandatory Matches and Dropouts

The majority of cross-matching queries search for objects that match in every one of the selected catalogs. This is the mandatory match mode: objects must meet the matching criterion in every archive that the query is run on. However, users may instead want objects that exist in one or more archives and not in the others. These dropouts are often as important scientifically as the matches, e.g. quasars that appear in an optical source catalog but not in a radio source catalog.

Our algorithm is designed to handle both cases. The special syntax that we have introduced into ADQL/s to achieve this is the XMATCH construct, as illustrated in the example below:

SELECT …
FROM SDSS:photoobj p, 2MASS:photoobj t, FIRST:obj r
WHERE XMATCH( p, t, !r ) < 3 AND Region('Circle J2000 0.9 0.8 0.3')
AND (…)
--remaining constraints

which means “find all objects that satisfy the remaining constraints in the archives represented by p and t, but not in the archive represented by r”. Hence we are selecting mandatory matches in the first two archives but dropouts in the third.
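
The mandatory/dropout semantics of the XMATCH argument list can be sketched in Python. This is not the SkyQuery parser: the function names are hypothetical, and the sketch assumes only what the example shows, namely that an alias prefixed with `!` marks a dropout archive and any other alias marks a mandatory match.

```python
def split_xmatch_aliases(arguments):
    """Split XMATCH arguments such as "p, t, !r" into (mandatory, dropouts)."""
    mandatory, dropouts = [], []
    for raw in arguments.split(","):
        alias = raw.strip()
        if not alias:
            continue  # ignore empty entries
        if alias.startswith("!"):
            dropouts.append(alias[1:].strip())
        else:
            mandatory.append(alias)
    return mandatory, dropouts

def passes_xmatch(matched_in, mandatory, dropouts):
    """True if the object matches in every mandatory archive and in no dropout archive."""
    matched = set(matched_in)
    in_all_mandatory = all(alias in matched for alias in mandatory)
    in_any_dropout = any(alias in matched for alias in dropouts)
    return in_all_mandatory and not in_any_dropout
```

With the example above, `split_xmatch_aliases("p, t, !r")` separates p and t from r, and an object that matches in p and t but has no counterpart in r passes the test.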

 

Developed with support from the National Science Foundation (under Cooperative Agreement
AST0122449 with the Johns Hopkins University), and NASA AISRP (awards NAG5-17042
and NAG5-12092). The NVO is a member of the International Virtual Observatory Alliance.

This NVO Application is hosted by the JHU Department of Physics & Astronomy.


Last Modified: Tuesday, February 10, 2009 at 6:19:59 PM $Name: $Revision 1.1.1.2