Open SkyQuery Limitations
Open SkyQuery is a facility that allows users to access individual
astronomical catalogs, as well as to compare them by finding positional
cross-matches, subject to any other conditions or constraints the user
wishes to define based on the data in the catalogs. The catalogs/databases
available for use are listed on the left side of the Query Screen, under
the title Nodes. We call them SkyNodes.
Users should be aware that queries on SkyNodes (including MyData), whether
on a single node or across several, are always limited to a maximum of 5000
rows. We apply this restriction so that Web access remains practical and
large queries do not swamp the systems.
What does this 5000 rows limit mean?
- Single-node queries will be limited to 5000 rows.
- Cross-matches between query sets that contain about 5000 objects
or more are likely to be incomplete.
Why?
The way Open SkyQuery works is as follows:
- First, SkyNodes are queried for the number of rows that meet the query
region constraints.
- Then, a query plan is created in such a way that the smallest SkyNode
is queried first; it sends its results on to the next-largest SkyNode, which
performs the first cross-match, and so on.
- If a SkyNode has more than 5000 objects in the specified region, the
first cut is applied here. This is likely to happen when the REGION
constraint covers a large area with many objects. The REGION constraint is
applied first to determine how many objects lie within it, and if there are
more than 5000, only the first 5000 are selected. The remaining WHERE clause
constraints are then applied to those 5000 objects only, so the final number
of rows can end up well below the 5000 limit, even though other objects in the
full region would have satisfied the query.
- Since the first 5000 objects are selected using the SQL "TOP 5000" construct,
no ordering is implied and there is no guarantee that the same 5000 objects
will be selected if the query is repeated. Consecutive runs of exactly the same
query may therefore return different results. This is an annoying "feature" that
we are addressing in our ongoing development.
- Additional cuts may happen when a cross-match is performed and the
results are sent to the next node. During the cross-match process, each
object from the prior node is compared to the current catalog, looking
for matches. If the prior node provided about 5000 rows and a one-to-one
match is expected, the xmatch table might still end up with more than 5000
rows, depending on how restrictive the confidence level in "XMATCH ()
< confidence_level" is and on the catalog astrometric precision, sigma. In
that case the xmatch table is cut to 5000 rows again.
- More details: http://openskyquery.net/Sky/SkySite/help/algo.aspx
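The query plan described above can be illustrated with a toy, in-memory Python simulation. Everything in it is an assumption made for illustration, not Open SkyQuery code: `Obj`, `top_n`, `separation_arcsec`, `run_query` and the `where` predicates are hypothetical names, each node's data is a plain list of objects already inside the REGION, and a fixed matching radius stands in for the real "XMATCH () < confidence_level" test with its per-catalog sigma. It only shows where the 5000-row cuts fall and why an unordered TOP can make repeated runs differ.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

ROW_LIMIT = 5000  # the per-step row cap described in the text


@dataclass(frozen=True)
class Obj:
    node: str    # which (hypothetical) SkyNode the object comes from
    obj_id: int
    ra: float    # degrees
    dec: float   # degrees


def top_n(rows, n: int, rng: random.Random) -> list:
    """Mimic SQL 'TOP n' without ORDER BY: an arbitrary n rows survive."""
    rows = list(rows)
    rng.shuffle(rows)  # no ordering is implied, so which rows are kept is arbitrary
    return rows[:n]


def separation_arcsec(a: Obj, b: Obj) -> float:
    """Small-angle separation; a stand-in for the real XMATCH statistic."""
    dra = (a.ra - b.ra) * math.cos(math.radians((a.dec + b.dec) / 2))
    return math.hypot(dra, a.dec - b.dec) * 3600.0


def run_query(
    region_rows: Dict[str, List[Obj]],
    where: Optional[Dict[str, Callable[[Obj], bool]]] = None,
    radius_arcsec: float = 1.0,
    limit: int = ROW_LIMIT,
    seed: Optional[int] = None,
) -> List[Tuple[Obj, ...]]:
    """Toy plan: region_rows holds, per node, the objects inside the REGION."""
    rng = random.Random(seed)
    where = where or {}
    # Count rows in the REGION per node; the smallest node is executed first.
    order = sorted(region_rows, key=lambda name: len(region_rows[name]))

    def node_rows(name: str) -> List[Obj]:
        kept = top_n(region_rows[name], limit, rng)  # REGION cut to `limit` rows
        keep = where.get(name, lambda o: True)
        return [o for o in kept if keep(o)]          # rest of WHERE applied afterwards

    result = [(o,) for o in node_rows(order[0])]
    for name in order[1:]:
        candidates = node_rows(name)
        # Simplification: every node is matched against the first node's object.
        matches = [t + (c,) for t in result for c in candidates
                   if separation_arcsec(t[0], c) <= radius_arcsec]
        result = top_n(matches, limit, rng)          # cut on the xmatch output
    return result
```

Running `run_query` with a small `limit` and different `seed` values shows the number of returned matches changing from run to run, which mirrors the unrepeatable results mentioned above; with a `limit` larger than every node's region count, all true matches are found.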
So what should you do to avoid this limitation?
At the moment, the only way to get around the region cut is to use regions
small enough that fewer than 5000 objects fall inside them. We are already
working on a parallel framework capable of doing full catalog-to-catalog
cross-matches. Thank you for your patience!
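One way to apply this advice to a large area is to split it into tiles that each stay under the limit and query them one at a time. The Python sketch below is an assumption, not part of Open SkyQuery: `Box`, `tile_region` and the `count` callback are hypothetical names, `count` stands for whatever per-region object count you can obtain (for example from the counting step described above), and the boxes are simple RA/Dec rectangles that do not wrap through RA = 0.

```python
from dataclasses import dataclass
from typing import Callable, List

ROW_LIMIT = 5000  # per-region row cap described in the text


@dataclass(frozen=True)
class Box:
    """Hypothetical RA/Dec rectangle in degrees; does not wrap through RA = 0."""
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float

    def quadrants(self) -> List["Box"]:
        """Split the box into four equal RA/Dec quadrants."""
        ra_mid = (self.ra_min + self.ra_max) / 2
        dec_mid = (self.dec_min + self.dec_max) / 2
        return [
            Box(self.ra_min, ra_mid, self.dec_min, dec_mid),
            Box(ra_mid, self.ra_max, self.dec_min, dec_mid),
            Box(self.ra_min, ra_mid, dec_mid, self.dec_max),
            Box(ra_mid, self.ra_max, dec_mid, self.dec_max),
        ]


def tile_region(box: Box, count: Callable[[Box], int],
                limit: int = ROW_LIMIT, min_size_deg: float = 1e-3) -> List[Box]:
    """Recursively split `box` until every tile holds fewer than `limit` objects."""
    if count(box) < limit:
        return [box]
    if max(box.ra_max - box.ra_min, box.dec_max - box.dec_min) < min_size_deg:
        # Too many objects in a tiny area: splitting further will not help.
        raise ValueError(f"cannot split {box} below {limit} objects")
    tiles: List[Box] = []
    for quadrant in box.quadrants():
        tiles.extend(tile_region(quadrant, count, limit, min_size_deg))
    return tiles
```

Each tile can then be queried separately and the results concatenated. Pairs that straddle a tile boundary may be missed, so in practice the tiles would need a small overlap plus duplicate removal, which this sketch leaves out; how objects exactly on a boundary are counted depends on the `count` function you supply.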