Wildcard search question


(Claus Koell) #1

Hi !

We use the newest version of Graylog and we see a behavior that we did not expect.

According to the documentation, only message, full_message and source are analyzed, so we expected the
wildcard feature to work only on those fields.

We now have a field called application where we store string values.

We are able to find values with wildcard …

Is this a new feature?

We have not created any index templates.

Greets
Claus


(Jochen) #2

Hi Claus,

Wildcard search (using * and ? from the Lucene query language) also works on non-analyzed fields, but the behavior is different. On analyzed fields, the wildcard pattern is matched against the individual terms produced by the analyzer, whereas on non-analyzed fields the pattern has to match the complete field value.

See https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_wildcard_and_regexp_queries.html for details.
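
To illustrate the difference, here is a rough Python sketch. It is not how Elasticsearch is implemented: `analyze` is a hypothetical stand-in for an analyzer (lowercase, then split on non-alphanumeric characters), and Python's `fnmatchcase` stands in for Lucene wildcard matching (it also understands `[...]` character classes, which are not used here).

```python
import re
from fnmatch import fnmatchcase


def analyze(value):
    """Hypothetical stand-in for an analyzer: lowercase and split into terms."""
    return [term for term in re.split(r"[^0-9a-z]+", value.lower()) if term]


def matches_analyzed(value, pattern):
    """Analyzed field: the pattern only has to match one of the terms."""
    return any(fnmatchcase(term, pattern) for term in analyze(value))


def matches_not_analyzed(value, pattern):
    """Non-analyzed field: the pattern has to match the complete value."""
    return fnmatchcase(value, pattern)


message = "Connection refused by backend"

print(matches_analyzed(message, "refus*"))           # True, matches the term "refused"
print(matches_not_analyzed(message, "refus*"))       # False, value starts with "Connection"
print(matches_not_analyzed(message, "Connection*"))  # True, pattern covers the whole value
```

In this model a pattern such as `refus*` finds the message only when the field is analyzed, because only then does a separate term `refused` exist to match against.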

Could you please point out exactly where in the documentation this is mentioned?

Cheers,
Jochen


(Claus Koell) #3

Hi Jochen!

Thank you very much for the info!

It is on http://docs.graylog.org/en/latest/pages/queries.html:
“Also note that message, full_message, and source are the only fields that can be searched via wildcard by default.”

Greets
Claus


(Jochen) #4

Thanks, Claus!

I’ve clarified that sentence in the documentation.


(Claus Koell) #5

Thank you very much, Jochen!

This is not directly related, but maybe you can explain another “feature” to me :wink:

We have a field called path where we store values like

/app/web/test.do
/TestWeb/something.do

As I understand it now, the field is not analyzed and therefore is not lowercased …?
So my assumption is that the search should be case-sensitive.
In combination with a wildcard search, I get a result I cannot explain.

If I search for “path:/app/we*” I get results, but if I try “/TestWeb/som*” I get no results.

If I search with the exact value “/TestWeb/something.do” I get a result.

Can you give me a hint as to why this happens?

greets
claus


(Claus Koell) #6

Hi again :slight_smile:

I tried searching with the head plugin. If I execute the following query

{"query":{"bool":{"must":[{"wildcard":{"path":"/TestWeb/some*"}},{"match_all":{}}],"must_not":[],"should":[]}},"from":0,"size":250,"sort":[],"aggs":{}}

I get valid results, and as expected it is case-sensitive: /testWeb/* returns no results.

Can somebody explain why Graylog has problems finding values that contain uppercase characters when wildcards are used?

Another question: is it possible to enable some debug logging to see what Graylog is sending to Elasticsearch?

Thanks very much and greets
claus


(Claus Koell) #7

Hi Graylog Team !

Can someone please help me understand why the search described above is not working?

greets
claus


(Michael Brown) #8

I came here to post this issue. Wildcard searches don’t work at all with uppercase characters. There is something very wrong with the regex/search that is implemented.

Being able to search loosely is rather important for a logging app to be useful. At the very least, there should be a basic wildcard search that isn’t limited to lowercase characters!


(Michael Brown) #9

Here is an example of how this doesn’t work, and it’s not a case-insensitivity issue:

Searching data that is only lowercase works with wildcards:
env:myserver*

Searching data that contains any uppercase character doesn’t work:
env:Myotherthing*
env:myOther*
env:myother* - also won’t match if value is myotherThing

In many cases you don’t have control over casing, such as when loggers set the class name or basically any normal property. You have to resort to exact matches, which makes diagnostics really difficult.


(Jochen) #10

The “standard” analyzer of Elasticsearch preprocesses the message fields before indexing them. Among other things, it converts the messages to lowercase.

See https://www.elastic.co/guide/en/elasticsearch/reference/5.6/analysis-standard-analyzer.html for details about the “standard” analyzer.

By default, Graylog creates an index mapping which will instruct Elasticsearch to analyze the “message”, “full_message”, and “source” fields. Other fields are not automatically analyzed.
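
One possible reading of the symptoms reported earlier in this thread can be sketched in Python. This is a simplified model, not Graylog or Elasticsearch code. It assumes that wildcard patterns entered in the Graylog search bar are lowercased before they are matched (Elasticsearch versions of that era document a `lowercase_expanded_terms` option for `query_string` queries that defaults to true), while a raw `wildcard` query sent directly to Elasticsearch is not. That assumption is not confirmed in this thread. The `search` helper and its `lowercase_wildcards` flag are hypothetical names.

```python
from fnmatch import fnmatchcase

# A non-analyzed field keeps its values verbatim, including uppercase letters.
stored_paths = ["/app/web/test.do", "/TestWeb/something.do"]


def search(pattern, lowercase_wildcards):
    """Match a pattern against whole stored values, case-sensitively.

    ASSUMPTION: when lowercase_wildcards is True, patterns that contain
    * or ? are lowercased first; exact values are left untouched.
    """
    is_wildcard = "*" in pattern or "?" in pattern
    if is_wildcard and lowercase_wildcards:
        pattern = pattern.lower()
    return [path for path in stored_paths if fnmatchcase(path, pattern)]


print(search("/app/we*", True))               # ['/app/web/test.do']
print(search("/TestWeb/som*", True))          # []
print(search("/TestWeb/something.do", True))  # ['/TestWeb/something.do']
print(search("/TestWeb/some*", False))        # ['/TestWeb/something.do']
```

Under that assumption, the lowercased pattern `/testweb/som*` can never match the stored value `/TestWeb/something.do`, whereas the exact value and the raw wildcard query still match.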

You can change the analyzer which Elasticsearch is applying to a message field by creating your own custom index mapping:
http://docs.graylog.org/en/2.4/pages/configuration/elasticsearch.html#custom-index-mappings
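
As an untested sketch of what such a custom mapping could look like, the following Python snippet builds an index template body and prints it as JSON. It assumes Elasticsearch 5.x and the default `graylog_*` index prefix. The analyzer name `lowercase_keyword` and the helper `build_template` are made up for this example, and `path` is the field from earlier in this thread. Please check the result against the documentation linked above before using it.

```python
import json


def build_template(field, index_pattern="graylog_*"):
    """Build a template that indexes `field` as one single lowercased term."""
    return {
        "template": index_pattern,
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical name: keep the whole value as one token,
                    # but lowercase it at index time.
                    "lowercase_keyword": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "message": {
                "properties": {
                    field: {"type": "text", "analyzer": "lowercase_keyword"}
                }
            }
        },
    }


template = build_template("path")
print(json.dumps(template, indent=2))
```

An index template only applies to indices created after it has been stored, so existing indices keep their old mapping until the index is rotated.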