These are the only two tables that are germane. No need to bother you with the others.
mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO | MUL | NULL | |
| job_id | int(11) | NO | MUL | NULL | |
+----------+---------+------+-----+---------+-------+
mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| skill_id | int(11) | NO | PRI | NULL | auto_increment |
| skill_name | char(32) | NO | MUL | NULL | |
+------------+----------+------+-----+---------+----------------+
Basically, users enter a boolean search string, using skill names.
I will convert the skill snames to skill_id, and then want to generate a MySql query to get all matching job_id from table skill_usage, by parsing the user's search string.
Strings can contain skill name, the operators AND and OR, plus brackets for precedence.
Some examples might be
- C
- C or C++
- C++ AND UML
- (C AND kernel) OR (C++ AND UML)
But there is no limit to the complexity of expression allowed - and that's my problem.
I'm no SQL guru, so correct me if I am wrong. I think that I want to start SELECT job_id FROM skill_usage then parse, and build up the rest of the query.
For the first example, just skill name C, I want to add WHERE skillId = X, where X is gotten from the table skill_names.
For a simple OR, like C OR C++, I can use an IN clause - WHERE skillId IN (X, Y) (again, X & Y are lookups of the skill names to get a skill_id).
For a simple AND, like C++ AND UML, I reckon I need an INNER JOIN, something like WHERE skill_id = X INNER JOIN skill_usage ON skill_usage.skill_id = Y (where X is the skill_id for C++ and Y for UML).
I think that is roughly correct, for those simple cases (?).
But, when I get to even slightly more complex cases like (C AND kernel) OR (C++ AND UML), I get confused.
Would a regex or an algorithm be appropriate here?
@AnthonyVallée-Dubois answer to this question looks like I might be able to modify it, but it also seems very complex. I am hoping to make something simpler, but am unsure how to start (the PHP coding is not my problem, just the regex or algorithm).
Update
I am trying to separate the parsing from the queries, and am using this question to sort out the queries.
I am getting answers like
SELECT job_id
FROM skill_usage
WHERE skill_id IN (3, 4)
GROUP BY job_id
HAVING MIN(skill_id) <> MAX(skill_id);
and
select s1.job_id
from skill_usage s1
where s1.skill_id = 3
and s1.job_id in (
select s2.job_id
from skill_usage s2
where s2.skill_id = 4
)
Where the latter looks more extensible.
Whereas my pseudo-code for the PHP to convert a search string to an SQL query is roughly
fail if mis-matched brackets
reduce multiple spaces to single
removes spaces before and after closing/opening bracket "( " & " )"
foreach c in string
if c == (
else
if c === )
else
if AND
else
if OE
else
# it's a skill name