Monday, 11 May 2009

Painful simple error configuring full-text SQL search engine Sphinx

After years of Lucene/Solr use as a search engine, I started experimenting with Sphinx as a potential replacement when used with MySQL.
Once installed and the tutorial done, I started plugging in the same SQL query used during the Solr indexing and of course, the index would not run.
The infamous
sql_query_range: : macro '$start' not found in match fetch query
popped up.
The only change I made in the sphinx.conf was the sql_query parameter. I tested my query in the command line and obviously everything worked.
I googled for a (long) while, trying different syntaxes, putting carriage returns/line feed in UNIX, using table aliases to have a shorter query but nothing would fix it.
Finally, I found the problem. I was trying:
sql_query = \
select b.business_id, b.business_name, b.short_description, b.phone, GROUP_CONCAT(DISTINCT(bc.tncat) SEPARATOR '#') as categories from tbl_business b, tbl_business_category_map bcm, tbl_business_category bc where b.business_id = bcm.business_id and bcm.category_id = bc.category_id and b.business_id BETWEEN $start AND $end group by bcm.business_id;

and the problem was the '#' which was used as a legacy separator in a denormalised database field.
The proper syntax is actually:
sql_query = \
select b.business_id, b.business_name, b.short_description, b.phone, GROUP_CONCAT(DISTINCT(bc.tncat) SEPARATOR '\#') as categories from tbl_business b, tbl_business_category_map bcm, tbl_business_category bc where b.business_id = bcm.business_id and bcm.category_id = bc.category_id and b.business_id BETWEEN $start AND $end group by bcm.business_id;

Sphinx seems to parse its conf as a typical Unix syntax so you must escape the characters that could shadow the conf parsing, which is not really documented (at least I could not find it).

That's it. Next target: indexing our full db and comparing index time to Solr :-)


No comments: