The Ecoengine uses ElasticSearch as its search back end, which can be used through the ?q= query parameter. The search engine returns results much more quickly than filtering does.
Searching returns faceted results and works across all indexed resources (those marked as searchable). However, there are trade-offs.
The search index might not reflect new data immediately, since an update needs to be triggered. Because Ecoengine’s resources are mostly static, this should rarely matter. A second trade-off stems from the fact that not all information is contained in the indexes. If additional fields are requested, the search result needs to be translated into a rather expensive database request, which can be slower than filtering. The Ecoengine handles this automatically and will delay the database request as long as possible.
Whether it makes sense to request non-indexed fields within searches depends on the number of returned records. For small result sets, using the search engine in combination with database queries and filtering can be beneficial; for bigger result sets it might be better to use filtering alone. This follows from the nature of search engines, which are geared toward finding a few needles in a haystack. Carefully targeted search requests perform better and reduce the amount of returned data.
The best and fastest approach is to request only fields stored in the search index and use follow-up requests to query details where needed.
Note
Profiling the performance of different request strategies is a good approach when building applications. It is important to remember that the API uses a caching layer (memcached) that stores responses keyed by the full URL. If the same URL is requested again, the response might come from the cache without the Ecoengine itself being hit. Requests issued repeatedly will consequently perform better but will not reflect the speed of the original request. To circumvent this behavior, add a different random number to every profiling request, e.g. /api/observations/?q=lynx&9383947349 or /api/observations/?genus=lynx&34686384
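The cache-busting approach described in the note can be sketched in a few lines of Python. This is a minimal illustration, not part of the Ecoengine itself: the helper names cache_busted and profile are hypothetical, and fetch stands for whatever function your application uses to issue the HTTP request.

```python
import random
import time
from urllib.parse import urlsplit, urlunsplit


def cache_busted(url, rng=random):
    """Return url with a random number appended as an extra query component.

    The number only makes the full URL unique so that the caching layer
    does not answer the request; it carries no other meaning.
    """
    parts = urlsplit(url)
    token = str(rng.randrange(10**6, 10**10))
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def profile(fetch, url, runs=3):
    """Time `runs` requests, each against a freshly cache-busted URL.

    `fetch` is any callable that takes a URL and performs the request
    (hypothetical; supply your own HTTP client call).
    """
    timings = []
    for _ in range(runs):
        target = cache_busted(url)
        start = time.perf_counter()
        fetch(target)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling profile with the search variant and the filter variant of the same request gives timings that are not distorted by cached responses.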
All resources marked as “searchable” in the last chapter can be used with the Search Engine. All fields that appear in the unmodified list views are contained in the search index.
The goal of the development of the Ecoengine is to use the search engine as the main avenue to the data and data aggregation and reduce direct database querying as much as possible.
See also Search Feedback
GET
/api/search/
The search endpoint returns a faceted summary of results. The search string is passed in the ?q= query parameter. There are different ways to specify and narrow searches within that string; in some respects these duplicate the functionality of filters but might behave somewhat differently.
Warning
The results of filtering and searching might differ. Compare /api/observations/?genus=lynx and /api/search/?q=genus:lynx
The search parameter ?q= provides full-text search across indexed content (currently Observations and Media). This makes it easy to discover content across resources. However, it might be harder to narrow the search precisely if the same word appears in different contexts.
For example, /api/search/?q=california finds all records located in California but also species with “california” as part of their scientific names.
Warning
The search engine will find only information that is also contained in the data. Not all observation records have complete taxonomies. For this reason a search for “aves” does NOT return all birds. (We plan to include taxonomy resolvers in the Ecoengine.)
The search back end facets on the resource itself. To search a particular resource, e.g. photos, use /api/search/?selected_facets=resource:photos&q=
Another way to search is by a particular field, e.g. /api/search/?q=state_province:California or /api/search/?q=state_province_exact:California.
Searches can be combined (AND) by separating them with commas, e.g. /api/search/?q=puma,state_province_exact:California.
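The query syntax above (free-text terms, field:value pairs, comma-separated AND combinations, and the resource facet) can be assembled programmatically. The following Python sketch uses only the standard library. The helper names build_query and search_url are hypothetical, and the code only builds relative URLs; it does not contact the API.

```python
from urllib.parse import urlencode


def build_query(*terms, **fields):
    """Combine free-text terms and field:value pairs with commas (AND)."""
    parts = list(terms) + [f"{name}:{value}" for name, value in fields.items()]
    return ",".join(parts)


def search_url(path, q, selected_facets=None):
    """Build a relative URL carrying ?q= and optional ?selected_facets=."""
    params = [("q", q)]
    for facet in selected_facets or []:
        params.append(("selected_facets", facet))
    # Keep ':' and ',' unescaped so the URL reads like the examples in the text.
    return f"{path}?{urlencode(params, safe=':,')}"


q = build_query("puma", state_province_exact="California")
print(search_url("/api/search/", q))
print(search_url("/api/search/", "", selected_facets=["resource:photos"]))
```

Keeping the query construction in one place makes it easier to issue the same search against different endpoints later.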
The fields that are searchable and available to be retrieved from the index without hitting the database are the ones that are by default visible in the list view of the resource in question. Other fields that are contained in the detail view or even further hidden are not represented in the search index.
Time and space are concepts central to the Ecoengine. For this reason, time and geographic space received special attention in the design of the search indexes.
(extend)
The goal of a search can be the aggregation (e.g. count) of data with certain attributes, a functionality that is well supported by faceting. However, more often the result of a search will be requested as a list of items for download or further processing. The Ecoengine API makes it easy to retrieve the results of a search by applying the exact same ?q= and ?selected_facets parameters to the resource in question. For example, /api/observations/?q=puma,state_province_exact:California returns all puma observations in California.
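Because the resource endpoints accept the same parameters as the search endpoint, a search URL can be turned into a list request by swapping the path and keeping the query string unchanged. A minimal Python sketch, in which the helper name apply_search_to is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def apply_search_to(resource_path, search_url):
    """Reuse the query string of a search URL on a resource endpoint."""
    parts = urlsplit(search_url)
    return urlunsplit((parts.scheme, parts.netloc, resource_path, parts.query, ""))


summary = "/api/search/?q=puma,state_province_exact:California"
print(apply_search_to("/api/observations/", summary))
```

This keeps the faceted summary and the record list in sync, since both are derived from one query string.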
There are a few things to remember to get the best performance from the search engine. In the example, the database will not be hit. In the background, the API delays resolving the search result against the database until it no longer has enough information to return the requested data. As long as the request can be satisfied with data in the search engine (which parallels the list views), it will be able to do so without any use of the database.
Performance will be very different for the following request:
/api/observations/?q=puma,state_province_exact:California&extra=sex
In order to populate the sex field, the API needs to hit the database.
The same is true if filters are added, e.g.
/api/observations/?q=puma,state_province_exact:California&sex=male
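When profiling, it can help to flag requests that are likely to fall back to the database. The Python sketch below encodes only the rule described in this section: requests that use nothing but ?q= and ?selected_facets can be served from the index, while extra fields and filters cannot. Treating every other parameter as database-bound is a simplifying assumption, and the function name needs_database is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Parameters that, per this section, are answered from the search index alone.
INDEX_ONLY_PARAMS = {"q", "selected_facets"}


def needs_database(url):
    """Guess whether a request forces the API to query the database.

    Assumption: any parameter other than q and selected_facets
    (e.g. extra=... or a filter such as sex=male) requires the database.
    """
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(name not in INDEX_ONLY_PARAMS for name, _ in params)


base = "/api/observations/?q=puma,state_province_exact:California"
for url in (base, base + "&extra=sex", base + "&sex=male"):
    print(url, needs_database(url))
```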