Special chars, searching and the documentation

Hi

Sorry if this has been covered elsewhere, and sorry for the wall of text below. I’ve spent the last few days trying to figure this out, trawling various forums along the way, and haven’t found anything specific to my situation.

Simply put, are there any special characters beyond those covered here?
https://docs.graylog.org/en/3.1/pages/queries.html#search-query-language
The page advises that & | : \ / + - ! ( ) { } ^ " ~ * ? need to be escaped with a backslash.
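A minimal sketch of applying that escaping rule, assuming the list quoted from the docs is complete. `escape_query_value` is a hypothetical helper, not a Graylog API. It escapes each character individually, and it shows the gap being asked about: `=`, `<`, `>` and `@` pass through untouched because they are not on the list. Escaping only changes how the query parser reads the search string; it cannot change how the stored log text was split up when it was indexed.

```python
# Characters listed in the Graylog 3.1 query docs (as quoted above) as needing
# a backslash escape. escape_query_value is a hypothetical helper, not a Graylog API.
SPECIAL_CHARS = set('&|:\\/+-!(){}^"~*?')


def escape_query_value(value: str) -> str:
    """Prefix each listed special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


print(escape_query_value("*@domain.tld"))          # \*@domain.tld
print(escape_query_value("to=<user@domain.tld>"))  # to=<user@domain.tld> (nothing escaped)
```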

The reason I ask is that I’m having difficulty matching search strings such as email addresses (for example *@domain.tld). I’ve enabled leading wildcards in the config (allow_leading_wildcard_searches) and experimented with escaping the characters as well. Setting that aside, I then searched for just the string @domain.tld. Whilst this returns results, when I reviewed them I saw that it was matching messages more broadly than anticipated: I was matching email logs as well as DNS logs, where the at sign (@) definitely doesn’t appear in the log line, even though the domain.tld component does.
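One possible explanation, sketched under an assumption not confirmed in this thread: if the message field is indexed with Elasticsearch’s "standard" analyzer, the search text @domain.tld is analyzed the same way as the logs, the @ is dropped, and the query becomes just the token domain.tld, which any log containing domain.tld as a word will match. The tokenizer below is a rough regex approximation, not the real Unicode-based one, and `term_query_matches` is a hypothetical illustration of that behaviour.

```python
import re

# Rough approximation of Elasticsearch's "standard" analyzer, which is assumed
# (not confirmed in this thread) to be what indexes the message field:
# lowercase, keep runs of word characters, and keep '.' or "'" only between
# word characters. The real tokenizer follows Unicode UAX #29 and differs in
# edge cases.
TOKEN_RE = re.compile(r"\w+(?:[.']\w+)*")


def analyze(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def term_query_matches(query_text: str, message: str) -> bool:
    """True if every token of the analyzed query appears in the message.

    In this sketch a query that analyzes to no tokens matches nothing, which
    would fit the empty results reported in this thread for a bare @.
    """
    query_tokens = analyze(query_text)
    message_tokens = set(analyze(message))
    return bool(query_tokens) and all(t in message_tokens for t in query_tokens)


mail_log = "postfix/smtp[123]: to=<alice@domain.tld>, status=sent"
dns_log = "client 10.0.0.5#53: query: domain.tld IN MX"

print(analyze("@domain.tld"))                       # ['domain.tld']
print(term_query_matches("@domain.tld", mail_log))  # True
print(term_query_matches("@domain.tld", dns_log))   # True, although there is no @
print(term_query_matches("@", mail_log))            # False
```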

Note that the wildcard does appear to work as expected, as long as it isn’t combined with the @. For example, *domain.tld matches the expected messages. Insert the @ and I get nothing, even though the scope is exactly the same.
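A sketch of why that could happen, assuming (as in Lucene’s default behaviour) that a wildcard term is not analyzed but is compared, lowercased, against each individual indexed token. If the @ never makes it into any token, a pattern containing @ can never match, while *domain.tld matches the token domain.tld. The tokenizer is the same rough approximation of the standard analyzer, and `wildcard_matches` is a hypothetical name.

```python
import fnmatch
import re

# Rough approximation of the standard tokenizer (an assumption, see above).
TOKEN_RE = re.compile(r"\w+(?:[.']\w+)*")


def analyze(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def wildcard_matches(pattern: str, message: str) -> bool:
    # Assumption: the wildcard pattern is matched against each single token,
    # never against the original, unsplit log line.
    pattern = pattern.lower()
    return any(fnmatch.fnmatchcase(tok, pattern) for tok in analyze(message))


mail_log = "postfix/smtp[123]: to=<alice@domain.tld>, status=sent"
print(wildcard_matches("*domain.tld", mail_log))   # True
print(wildcard_matches("*@domain.tld", mail_log))  # False: no token contains '@'
```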

Another search I was attempting was based on Postfix logs, similar to to=*@domain.tld. No matter where I put backslashes (or leave them out), I get unintended results or none at all. When the initial searches didn’t work, I assumed that the < > = symbols needed to be escaped, and I went as far as escaping the @ as well. Whenever I include the < or > symbols, the search seems to interpret them as ranges (for lack of a better explanation). Even if I escape or quote them, they still seem to do “something”.

While searching, I found a post on this forum mentioning that the period (.) acts as a separator between tokens, so in the above case domain.tld would be seen as two distinct strings. Is that true? If so, it isn’t really covered in the article above, unless I’ve skipped over it!
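A way to check the period question, sketched with the same rough approximation of Elasticsearch’s standard tokenizer (assuming that is what indexes the message field). Under this approximation, and as far as I understand the real tokenizer too, a period between letters does not split a token, so domain.tld stays whole, while @, <, > and = are dropped entirely. If that holds, it would also fit those characters never being highlighted: they are not part of any indexed token. It would also mean a search for domain.tld does not match www.domain.tld, which is one different token.

```python
import re

# Rough approximation of the standard tokenizer (an assumption): '.' is kept
# when it sits between word characters, while '@', '<', '>', '=' and '-'
# always split tokens and are dropped.
TOKEN_RE = re.compile(r"\w+(?:[.']\w+)*")


def analyze(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


print(analyze("to=<user@domain.tld>"))
# ['to', 'user', 'domain.tld']
print(analyze("Email me at john.smith@global-international.com"))
# ['email', 'me', 'at', 'john.smith', 'global', 'international.com']
print(analyze("query: www.domain.tld"))
# ['query', 'www.domain.tld']
```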

Regardless, this led me to set allow_highlighting = true. When I did and tried the same sorts of searches, I found that the @ < > characters never get highlighted, which leads me to conclude that search doesn’t treat them the way I assumed, and that they seem to have a special meaning.

It might be asked whether I’ve enabled pipelines or extractors: the answer is not to any great extent. Rather, I had intended to work through the logs and develop them as needed.

Any ideas?

Thanks

Did you search in a specific field, or did you use the full-text search on message/full_message/source for this?

It makes a difference.

Hi Jan

I believe it was the full-text search, i.e. the default view available via the Search tab. I have put feeds into streams and have limited my searches to streams and sources, but I also experimented without those limits so that I could be sure I was getting the expected results, or at least understand them (or not). Another reason was that I was in the process of implementing pipelines and extractors, but realised the messages were not always being parsed into fields (a separate issue). That’s why I went looking for messages the way I did, hoping to peel this back bit by bit and understand the logs so that I could adjust the extractors or pipelines.

Thanks

To add: what is the significance of the @?

I just tried searching for the following:
@
“@”
‘@’
@
@

All searches returned no results, even though I know for certain that matching messages exist.

You run a full-text search when you don’t specify a field, i.e. when you just type text into the search bar. By default this searches the fields message, full_message and source.

When you specify the field you want to search upfront, like headers: "*domain.com", you search only in that specific field.

This is all based on Lucene: http://www.lucenetutorial.com/lucene-query-syntax.html

I don’t know whether the @ is something special.

Hi Jan

Yes, confirmed: I’m doing full-text searches. Basically, I was trying to pull out the logs I was specifically interested in (based on known strings like email addresses) and use them as the basis for the extractors I was starting to build. But it seems the only way forward is to set up the extractors first so I can search more specifically via fields.
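A sketch of that field-based approach. It assumes a hypothetical extractor that copies the Postfix recipient into a field named postfix_to (an illustrative name), and it assumes such custom fields are stored as unanalyzed keyword values, which is Graylog’s default mapping for non-message string fields as far as I know. Because the value is not tokenized, the @ survives and a wildcard is compared against the whole address; the corresponding search would presumably look something like postfix_to:*@domain.tld. Keyword matching is case-sensitive in this sketch.

```python
import fnmatch
import re

# Hypothetical extractor for Postfix "to=<...>" entries; postfix_to is an
# illustrative field name, not something Graylog defines.
POSTFIX_TO_RE = re.compile(r"\bto=<([^>]*)>")


def extract_fields(message: str) -> dict[str, str]:
    """Return the fields this hypothetical extractor would add to a message."""
    fields = {}
    match = POSTFIX_TO_RE.search(message)
    if match:
        fields["postfix_to"] = match.group(1)
    return fields


def keyword_wildcard_matches(value: str, pattern: str) -> bool:
    # Assumes the field is an unanalyzed keyword: the pattern is compared with
    # the whole stored value (case-sensitively), so '@' is still there to match.
    return fnmatch.fnmatchcase(value, pattern)


mail_log = "postfix/smtp[123]: to=<alice@domain.tld>, status=sent"
fields = extract_fields(mail_log)
print(fields)  # {'postfix_to': 'alice@domain.tld'}
print(keyword_wildcard_matches(fields["postfix_to"], "*@domain.tld"))  # True
```

The `\b` in the pattern keeps Postfix’s separate orig_to=<...> entries from being captured as the recipient.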

Thanks for pointing me toward Lucene. I found a bunch of sites and articles, all mentioning the same special characters discussed in the Graylog documentation.
But I just now stumbled on this:
https://help.relativity.com/9.6/Content/Relativity/Data_Grid/Data_Grid_search.htm
It states that + - = && || > < ! ( ) { } ^ " ~ * ? : \ / @ are special characters, that you can’t search for the @ symbol at all, and that special characters may actually be interpreted as white space. I know this describes another product, but it seems to be what is occurring in Graylog. I’m also not sure whether Lucene is Lucene is Lucene, if you get my drift, but if so, then at least the article mentions the characters I’ve been having issues with (> < and @), and it kind of explains why I can’t do what I’m trying to do, at least not the way I’m trying.
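If Graylog’s message field does use a standard-style analyzer (an assumption, as above), the "interpreted as white space" description fits: a quoted search such as "alice@domain.tld" would become a phrase of the tokens alice and domain.tld that must appear next to each other. The sketch below, with hypothetical names and the same rough tokenizer, shows that such a phrase matches the real address but would equally match alice domain.tld with a space instead of the @.

```python
import re

# Rough approximation of the standard tokenizer (an assumption).
TOKEN_RE = re.compile(r"\w+(?:[.']\w+)*")


def analyze(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def phrase_matches(phrase: str, message: str) -> bool:
    """True if the phrase's tokens appear consecutively, in order, in the message."""
    wanted = analyze(phrase)
    tokens = analyze(message)
    n = len(wanted)
    if n == 0:
        return False
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


print(phrase_matches("alice@domain.tld", "to=<alice@domain.tld>"))       # True
print(phrase_matches("alice@domain.tld", "alice domain.tld"))            # True: @ acts like a space
print(phrase_matches("alice@domain.tld", "alice@other.tld domain.tld"))  # False: not adjacent
```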

Is anyone able to confirm this behavior/assumption in comparison to the above-mentioned article?

Thanks
