OpenSkyQuery

Likelihood analysis

The elements of our cross-matching strategy are as follows:

To optimize network traffic, we first count the objects in each database that match the user-entered constraints applying to that database. Running the count also brings those objects into the cache, so when we access them a second time for the cross-matching (see below) we expect to hit the cache, which is much faster. Most of the time is spent moving objects between the nodes, so we do not want to do it twice.

The spatial extent of the search is specified as a circular area, given by the astronomical coordinates (right ascension = ra, declination = dec) of the center and a search radius r (in minutes of arc). This limits the search to a well-defined area. The syntax for this in ADQL/s is

Region(Circle <ra> <dec> <search-radius>)
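As an illustration of what the circular region constraint selects, the following is a minimal Python sketch of a point-in-circle test on the sky. It is not the SkyQuery implementation: the function names are hypothetical, ra and dec are assumed to be in degrees, and the radius is taken in minutes of arc as described above.

```python
import math

def radec_to_unit_vector(ra_deg, dec_deg):
    """Convert right ascension and declination (assumed degrees) to a Cartesian unit vector."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def in_circle(ra_deg, dec_deg, center_ra_deg, center_dec_deg, radius_arcmin):
    """Return True if the point lies within radius_arcmin of the circle center."""
    p = radec_to_unit_vector(ra_deg, dec_deg)
    c = radec_to_unit_vector(center_ra_deg, center_dec_deg)
    # The dot product of two unit vectors is the cosine of their angular separation.
    dot = sum(pi * ci for pi, ci in zip(p, c))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    separation_arcmin = math.degrees(math.acos(dot)) * 60.0
    return separation_arcmin <= radius_arcmin
```

For example, `in_circle(0.9, 0.803, 0.9, 0.8, 0.3)` tests an object 0.18 minutes of arc from the center of a circle with a 0.3 arcminute radius.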

Which catalogs the user wants to select matches in, and which to select dropouts in (see below).

A parameter that encodes the tolerance of the match. We have developed a SQL Server stored procedure called spGetMatch() to perform the cross-match between databases. It performs a probabilistic join of the data between two databases based on the value of sigma.

The cross-matching algorithm encoded in the stored procedure is a probabilistic calculation that minimizes the chi-square parameter as defined by:

where x, y, z are the Cartesian coordinates corresponding to the ra and dec specified by the user, a is a weighting parameter calculated from the astrometric precision of the survey, and λ is the Lagrange multiplier in the minimization that ensures (x, y, z) is a unit vector. The code for spGetMatch is included in the code listings.

We compute four cumulative quantities at each cross-identification step – these are

The best position is given by the direction of

The log-likelihood at that point is given by
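One way to write out the quantities named above is sketched here. It is an assumption consistent with the stated definitions rather than a transcription of spGetMatch: the index i over surveys, the per-survey weights a_i and unit vectors (x_i, y_i, z_i), and the overall normalization are all introduced for illustration.

```latex
\begin{align}
% Weighted squared distance to each survey's position, with the
% unit-vector constraint enforced by the Lagrange multiplier \lambda.
\chi^2 &= \sum_i a_i \left[ (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 \right]
          - \lambda \left( x^2 + y^2 + z^2 - 1 \right) \\
% Four cumulative quantities, updated as each survey is added.
a &= \sum_i a_i, \qquad
a_x = \sum_i a_i x_i, \qquad
a_y = \sum_i a_i y_i, \qquad
a_z = \sum_i a_i z_i \\
% Setting the derivatives to zero gives (a - \lambda)(x, y, z) = (a_x, a_y, a_z),
% so the best position points along the accumulated vector.
(x, y, z) &= \frac{(a_x, a_y, a_z)}{\sqrt{a_x^2 + a_y^2 + a_z^2}} \\
% Substituting back, using the fact that every (x_i, y_i, z_i) is a unit vector.
\chi^2_{\min} &= 2 \left( a - \sqrt{a_x^2 + a_y^2 + a_z^2} \right)
\end{align}
```

Under these assumptions only the four running sums need to be carried from one node to the next, which is what makes a step-by-step evaluation possible.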

This is divided by the number of surveys considered up to that point, and compared to the tolerance. If a tuple’s log-likelihood exceeds this threshold, the tuple is discarded. This cross-identification process is fully symmetric: the particular order of matching does not matter. The cross-matching is applied to each node recursively by the portal when it runs the query execution plan.
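The step-by-step accumulation and tolerance test can be sketched in Python as follows. This is a hedged illustration, not the spGetMatch code: it assumes the weight of each survey is 1/sigma² with sigma in radians, that the four cumulative quantities are the weighted sums a, ax, ay, az, and that the minimum chi-square is 2(a − |(ax, ay, az)|). The function names and the tuple layout are hypothetical.

```python
import math

ARCSEC = math.radians(1.0 / 3600.0)  # one arcsecond expressed in radians

def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a position given in degrees (assumed units)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def cross_identify(detections, tolerance):
    """Accumulate one detection per survey; return (best_vector, chi2) or None.

    detections: iterable of (ra_deg, dec_deg, sigma_rad), one entry per survey.
    tolerance:  threshold compared against chi2 divided by the survey count.
    """
    a = ax = ay = az = 0.0
    best, chi2 = None, None
    for n, (ra_deg, dec_deg, sigma_rad) in enumerate(detections, start=1):
        w = 1.0 / sigma_rad ** 2          # assumed weighting from astrometric precision
        x, y, z = unit_vector(ra_deg, dec_deg)
        a += w
        ax += w * x
        ay += w * y
        az += w * z
        norm = math.sqrt(ax * ax + ay * ay + az * az)
        chi2 = 2.0 * (a - norm)           # assumed form of the minimum chi-square
        if chi2 / n > tolerance:          # tuple exceeds the threshold: drop it
            return None
        best = (ax / norm, ay / norm, az / norm)
    if best is None:
        return None
    return best, chi2
```

Because the four running sums are plain additions, the values obtained after all surveys have been added do not depend on the order in which the surveys were visited.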

Mandatory Matches and Dropouts

The majority of cross-matching queries search for objects that match in every one of the selected catalogs. This is the mandatory match mode, meaning that objects must meet the matching criterion in every archive that the query is run on. However, users may instead want those objects that exist in one or more archives and not in the others. These dropouts are often as important scientifically as the matches, e.g. quasars that appear in an optical source catalog but not in a radio source catalog.

Our algorithm is designed to handle both cases. The special syntax that we have introduced into ADQL/s to achieve this is the XMATCH construct, as illustrated in the example below:

SELECT …
FROM SDSS:photoobj p, 2MASS:photoobj t, FIRST:obj r
WHERE XMATCH( p, t, !r ) < 3
AND Region('Circle J2000 0.9 0.8 0.3')
AND (…) --remaining constraints

which means “find all objects that satisfy the remaining constraints in the archives represented by p and t, but not in the archive represented by r”. Hence we are selecting mandatory matches in the first two archives but dropouts in the third.

Developed with support from the National Science Foundation (under Cooperative Agreement
AST0122449 with the Johns Hopkins University), and NASA AISRP (awards NAG5-17042
and NAG5-12092). The NVO is a member of the International Virtual Observatory Alliance.

This NVO Application is hosted by the JHU Department of Physics & Astronomy.
