Avoid splitting words/tokens by "-"

We face multiple issues when searching for special strings containing “-” (dash).

Take the following log line, for example:

2018-03-06 10:26:57,046 INFO [unique_key=journal/application/SOME_Special-47D0DE38-1094-11E8-8122-9140EB0135AA.csv.zip] [application_id=1301dfsdf4678] [ReportDataService]

The same UUID also appears in:
2018-03-06 10:26:26,319 INFO [integration_api] New journal HiredScore_application_recovery_sq-47D0DE38-1094-11E8-8122-9140EB0135AA - Complete

In this case I’d expect the search term:
“47D0DE38-1094-11E8-8122-9140EB0135AA”

to find both lines, but I can’t find a simple search term that does.

I can do:
47D0DE38 AND 1094 AND 11E8 AND 8122 AND 9140EB0135AA*

But this seems very unintuitive.
Any suggestions?

The dash character is being used as a token separator in the standard tokenizer for analyzed fields.

See https://www.elastic.co/guide/en/elasticsearch/reference/5.6/analysis-standard-tokenizer.html for details.
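
To make the effect concrete, here is a rough Python sketch of how the two example lines might be split. It is only an approximation of the standard analyzer (the regex and the `approx_tokens` helper are my own assumptions, not Elasticsearch code): runs of letters, digits and underscores stay together, a “.” between two such runs does not split, everything else (including “-”) does, and tokens are lowercased. The “Show terms of message” view in Graylog remains the authoritative source for the real terms.

```python
import re

# Rough stand-in for the standard analyzer, NOT the real UAX #29 rules:
# keep runs of letters, digits and underscores together, allow a "."
# between two such runs, and split on everything else (including "-").
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*")


def approx_tokens(text):
    """Return lowercased approximate tokens for a piece of log text."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


# Fragments of the two log lines from the question.
line_a = ("journal/application/"
          "SOME_Special-47D0DE38-1094-11E8-8122-9140EB0135AA.csv.zip")
line_b = ("New journal HiredScore_application_recovery_sq-"
          "47D0DE38-1094-11E8-8122-9140EB0135AA - Complete")

tokens_a = approx_tokens(line_a)
tokens_b = approx_tokens(line_b)

print(tokens_a)
print(tokens_b)
```

If this approximation holds, the last part of the UUID ends up as its own term in the second line but stays attached to “.csv.zip” in the first one. That would explain why the quoted UUID does not match both lines and why the AND query needs the trailing wildcard.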

There are some fields (“message”, “full_message”, and “source”) which are analyzed by default in Graylog.
If you don’t want Elasticsearch to analyze these fields or want to use a different tokenizer or analyzer configuration, you’ll have to create a custom index template:
http://docs.graylog.org/en/2.4/pages/configuration/elasticsearch.html#custom-index-mappings
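
As a minimal sketch of what such a template could look like, the Python snippet below builds the JSON body. Everything specific in it is an assumption for illustration: Elasticsearch 5.x syntax (as in the link above), the default `graylog_*` index pattern, the hypothetical analyzer name `analyzer_whitespace_lowercase`, and the choice of the whitespace tokenizer. Please check the Graylog documentation linked above for the exact procedure for your versions.

```python
import json

# Hypothetical analyzer name; the tokenizer choice is only illustrative.
ANALYZER_NAME = "analyzer_whitespace_lowercase"


def build_custom_template(index_pattern="graylog_*"):
    """Build an index template body (Elasticsearch 5.x style, assumed)."""
    return {
        # Indices whose names match this pattern pick up the template.
        "template": index_pattern,
        "settings": {
            "analysis": {
                "analyzer": {
                    ANALYZER_NAME: {
                        "type": "custom",
                        # Split on whitespace only, so "-" is kept.
                        "tokenizer": "whitespace",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "message": {
                "properties": {
                    # Use the custom analyzer for the "message" field.
                    "message": {"type": "text", "analyzer": ANALYZER_NAME}
                }
            }
        },
    }


template = build_custom_template()
template_json = json.dumps(template, indent=2)
print(template_json)
```

Note that a whitespace tokenizer also keeps the UUID glued to its neighbours (for example the “HiredScore_application_recovery_sq-” prefix), so this only shows the shape of a template and is not a ready-made fix for the search above. A new template normally applies only to indices created after it was added.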

You can also check out the tokenization of the “message” field of your messages in Graylog by clicking on one of the messages on the search page, then click on the dropdown menu of the “message” field (next to the magnifying glass) and select “Show terms of message”.

With the latest version, I found that Graylog renames a GELF extension field named like “_fieldname1.fieldname2” to “fieldname_fieldname2”, both in the Graylog web interface and in Elasticsearch. Would we get the same Graylog behavior if we named the GELF extension field “fieldname1_fieldname2” directly?

@cdeng Please don’t hijack unrelated topics.

Thanks.
We use multiple UUIDs in the system, many of which contain batch, and not being able to cleanly search for the unique identifier of a workflow is a major issue.

Appreciate your input.
