I would expect this to be the most frequently asked question, but I didn’t find any hits; most results were about search failing or bad installs. I’m having trouble wrapping my head around how analyzed search works. I’m basically inches away from editing the custom index and disabling analysis on full_message.
My problem is that this oversimplification keeps me from returning accurate hits. Before I get into that, I want to make sure I understand the setup correctly. A message comes in (full_message), and some form of truncation is performed on it (message). Both fields are then tokenized by the analyzer and turned into search terms. For example:
full_message:
<30>1 DATE TIME XXXX vmdird - - - t@139824883271424: Modify Entry (cn=XXXX@false@urn%3Aacl%3Aglobal%3Apermissions,cn=AclModel,cn=VmwAuthz,cn=services,dc=vsphere,dc=local)(from )(by )(via Rep)(USN XXXX)

message:
t@139824883271424: Modify Entry (cn=XXXX@false@urn%3Aacl%3Aglobal%3Apermissions,cn=AclModel,cn=VmwAuthz,cn=services,dc=vsphere,dc=local)(from )(by )(via Rep)(USN XXXX)

Field terms:
t 139824883271424 modify entry cn XXXX false urn 3aacl 3aglobal 3apermissions cn aclmodel cn vmwauthz cn services dc vsphere dc local from by via rep usn XXXX
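To check my understanding of the tokenization step, here is a rough sketch of it. It assumes the field uses the default "standard" analyzer and approximates that analyzer as "split on anything that isn't a letter or digit, then lowercase". The real tokenizer follows Unicode segmentation rules and differs in edge cases, so treat this as an approximation, not the actual implementation.

```python
import re
from typing import List

# Approximation (assumption) of the standard analyzer: keep runs of ASCII
# letters/digits, drop everything else ("@", "%", "=", ",", parentheses), lowercase.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

def analyze(text: str) -> List[str]:
    """Return the lowercased terms the field would (roughly) be indexed as."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]

message = (
    "t@139824883271424: Modify Entry "
    "(cn=XXXX@false@urn%3Aacl%3Aglobal%3Apermissions,cn=AclModel,"
    "cn=VmwAuthz,cn=services,dc=vsphere,dc=local)(from )(by )(via Rep)(USN XXXX)"
)

print(" ".join(analyze(message)))
```

This reproduces the field terms above (with the XXXX placeholders lowercased). The key point is that none of the punctuation that gives the DN its structure survives into the index.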
Now, the search only looks at these tokens, and on top of that the default operator between terms is OR. This means none of the following searches work:
message: /\(cn=.*,dc=vsphere,dc=local\)/
message: /.* add|modify|delete entry cn=.*,cn=user|group,dc=vsphere,dc=local/
message: "(cn=*,dc=vsphere,dc=local)"
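My reading of why the regex searches fail: a regexp query on an analyzed field is evaluated against each indexed term separately, and the pattern has to match the whole term (it is implicitly anchored). The sketch below models that with Python's `re.fullmatch` over the approximate terms. Python's regex dialect is only standing in for Lucene's here; the patterns used happen to mean the same thing in both.

```python
import re
from typing import List

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

def analyze(text: str) -> List[str]:
    # Rough stand-in for the standard analyzer: alphanumeric runs, lowercased.
    return [tok.lower() for tok in TOKEN_RE.findall(text)]

def regexp_query_matches(pattern: str, text: str) -> bool:
    # A regexp query is checked against each indexed term on its own, and the
    # pattern must match the entire term, so model it with fullmatch per term.
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(term) for term in analyze(text))

message = (
    "t@139824883271424: Modify Entry "
    "(cn=XXXX@false@urn%3Aacl%3Aglobal%3Apermissions,cn=AclModel,"
    "cn=VmwAuthz,cn=services,dc=vsphere,dc=local)(from )(by )(via Rep)(USN XXXX)"
)

# No single term contains "(", "=" or ",", so this can never match.
print(regexp_query_matches(r"\(cn=.*,dc=vsphere,dc=local\)", message))  # False
# Whole-term alternatives do match: "modify" is an indexed term.
print(regexp_query_matches(r"add|modify|remove", message))  # True
# Anchoring: a partial term needs an explicit wildcard.
print(regexp_query_matches(r"vsph", message))    # False
print(regexp_query_matches(r"vsph.*", message))  # True
```

Under this model, any pattern that spans the separators between terms is dead on arrival against the analyzed field, no matter how it is written.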
The only searches that work here are:
message: "dc vsphere dc local"
message: /add|modify|remove/ AND message: "entry" AND message: "dc vsphere dc local" AND NOT message: "cn licensing" AND NOT message: "cn service"
(I’m still finding new bad hits with this approach.)
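The phrase searches behave differently: as I understand it, a quoted phrase is run through the same analyzer and its terms must appear consecutively in the field. That explains why "dc vsphere dc local" works while "(cn=*,dc=vsphere,dc=local)" doesn't (inside quotes the `*` is not a wildcard, so the phrase collapses to "cn dc vsphere dc local", and no "cn" is directly followed by "dc"). The sketch below models phrase matching and the combined query, using the same approximate analyzer as an assumption.

```python
import re
from typing import List

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

def analyze(text: str) -> List[str]:
    # Rough stand-in for the standard analyzer: alphanumeric runs, lowercased.
    return [tok.lower() for tok in TOKEN_RE.findall(text)]

def regexp_query_matches(pattern: str, text: str) -> bool:
    # Per-term, whole-term (anchored) regex match.
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(term) for term in analyze(text))

def phrase_matches(phrase: str, text: str) -> bool:
    # The phrase is analyzed too; its terms must occur as a consecutive run
    # of terms in the field (no slop).
    needle = analyze(phrase)
    hay = analyze(text)
    n = len(needle)
    return n > 0 and any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))

def combined_query(text: str) -> bool:
    # The AND / AND NOT query from above, term by term.
    return (
        regexp_query_matches(r"add|modify|remove", text)
        and phrase_matches("entry", text)
        and phrase_matches("dc vsphere dc local", text)
        and not phrase_matches("cn licensing", text)
        and not phrase_matches("cn service", text)
    )

message = (
    "t@139824883271424: Modify Entry "
    "(cn=XXXX@false@urn%3Aacl%3Aglobal%3Apermissions,cn=AclModel,"
    "cn=VmwAuthz,cn=services,dc=vsphere,dc=local)(from )(by )(via Rep)(USN XXXX)"
)

print(phrase_matches("dc vsphere dc local", message))         # True
print(phrase_matches("(cn=*,dc=vsphere,dc=local)", message))  # False
print(combined_query(message))                                # True
```

One side effect under this model: the standard analyzer does no stemming, so `"cn service"` would not exclude a DN containing `cn=services`. That may account for some of the bad hits, although I haven't confirmed it.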
However, when I’m presented with a dynamic hierarchy of multiple CN components of varying lengths, I can’t easily restrict the results to an accurate subset that I can confidently alert on.
Examples:
(cn=important1,dc=vsphere,dc=local)
(cn=notimportant1,cn=configuration,dc=vsphere,dc=local)
(cn=alsonotimportant,cn=licensing,dc=vsphere,dc=local)
Totally different message I don't care about (cn=notimportant,dc=vsphere,dc=local)
(cn=important2,dc=vsphere,dc=local)
So the question here is whether I’m missing something. Should I handle this problem using massive amounts of AND/OR statements? Should I remove full_message from the analyzer? Is this a scenario where I should create an extractor to populate a non-analyzed field that I then scan via my own regexp?
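For the extractor option, here is a rough sketch of the idea in Python rather than as an actual Graylog extractor. The field names `entry_op` and `entry_dn`, the operation verbs, and the "important" rule (a single cn directly under dc=vsphere,dc=local, in an Add/Modify/Delete/Remove Entry message) are all hypothetical, inferred from the examples above. It also ignores escaped commas inside DN values.

```python
import re
from typing import Dict

# Hypothetical extractor pattern: "<Verb> Entry (<DN>)". Verbs are assumed.
ENTRY_RE = re.compile(r"\b(Add|Modify|Delete|Remove) Entry \(([^)]*)\)")

# Hypothetical rule: exactly one cn directly under dc=vsphere,dc=local.
IMPORTANT_DN_RE = re.compile(r"cn=[^,]+,dc=vsphere,dc=local")

def extract_fields(raw: str) -> Dict[str, str]:
    """Populate made-up, non-analyzed fields from the raw message text."""
    m = ENTRY_RE.search(raw)
    if m is None:
        return {}
    return {"entry_op": m.group(1), "entry_dn": m.group(2)}

def is_important(fields: Dict[str, str]) -> bool:
    # Whole-value match on the extracted DN, like a regex on a keyword field.
    dn = fields.get("entry_dn")
    return dn is not None and IMPORTANT_DN_RE.fullmatch(dn) is not None

# Sample lines modelled on the examples above (message wording is assumed).
samples = {
    "t@1: Add Entry (cn=important1,dc=vsphere,dc=local)(from )": True,
    "t@1: Modify Entry (cn=notimportant1,cn=configuration,dc=vsphere,dc=local)(from )": False,
    "t@1: Delete Entry (cn=alsonotimportant,cn=licensing,dc=vsphere,dc=local)(from )": False,
    "Totally different message I don't care about (cn=notimportant,dc=vsphere,dc=local)": False,
    "t@1: Modify Entry (cn=important2,dc=vsphere,dc=local)(from )": True,
}

for raw in samples:
    print(is_important(extract_fields(raw)), raw)
```

The appeal of this approach is that the DN keeps its commas and equals signs in its own field, so one anchored pattern can express "directly under the domain" instead of a growing list of AND NOT exclusions. In Graylog terms, it would presumably be an extractor writing to a field that isn't analyzed, followed by a regex search on that field. I haven't verified the exact query syntax for that.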
The main problem I’m running into is that the search examples in the 3.0 docs (which, for what it’s worth, are an incredibly nice set of docs) are extremely simplistic compared to real-world scenarios. Searching for a simple address, GET, or SSH is very, very simple, but most of the things you would build dashboards or alerts on sit at the opposite end of the spectrum.