Some file names in my FTP log parsed by Graylog contain annoying German characters like ä, ö, ü. It seems the grok pattern “UNIXPATH” does not like them: the file name “/home/my/blümchen.txt” is recognized as “/home/my/bl”, which causes some other problems. My brilliant idea was to fix the UNIXPATH pattern to work “better”. I tried the Logstash pattern (/[[[:alnum:]]_%!$@:.,+~-]*)+, which did the job when tested in the grok tester (https://grokdebug.herokuapp.com): my “blümchen.txt” was recognized as expected.
But it seems this pattern won’t work in Graylog. Am I right that [:alnum:] is not recognized by Graylog? Is there any other way to “fix” the UNIXPATH pattern? Or is my whole approach to this problem wrong?
CentOS 7 / Graylog 3.3.7; UTF-8 in the locale seems to be set correctly.
Thank you. I shall remember this. However… no. Changing \w to \p{Alnum} still gave me the same wrong result.
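For illustration, the truncation at “bl” can be reproduced outside Graylog. This is a minimal sketch that assumes Graylog's grok patterns are ultimately compiled by Java's java.util.regex (an assumption, not checked against the Graylog source). The class and method names are made up for the example, and whether Graylog accepts the inline (?U) flag inside a grok pattern is not verified here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AsciiVsUnicodeDemo {
    // Returns the part of `input` that `regex` matches from the start, or "" if nothing matches.
    static String matchedPrefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        // \u00fc is "ü"; the escape keeps this source file independent of its encoding.
        String path = "/home/my/bl\u00fcmchen.txt";

        // \w is ASCII-only by default in java.util.regex, so the match stops before "ü".
        System.out.println(matchedPrefix("(/[\\w_%!$@:.,+~-]*)+", path));

        // \p{Alnum} is a POSIX class that is also ASCII-only by default: same truncation.
        System.out.println(matchedPrefix("(/[\\p{Alnum}_%!$@:.,+~-]*)+", path));

        // (?U) enables UNICODE_CHARACTER_CLASS, which makes \w Unicode-aware.
        System.out.println(matchedPrefix("(?U)(/[\\w_%!$@:.,+~-]*)+", path));
    }
}
```

In plain Java the first two calls return “/home/my/bl” and the third returns the whole path, which is consistent with swapping \w for \p{Alnum} making no difference.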
After playing with this pattern, this one (/[\w_%!$@:.,+~-[^ ]]*)+ worked fine, and (/[.[^ ]]*)+ would do the job as well. But it matches ANYTHING except a space. Even though it works with the sample data I have provided, I feel it’s somehow wrong to make it so “loose” and less restrictive.
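A possibly tighter alternative, sketched under the same assumption that the pattern ends up in java.util.regex, is to swap \w for the Unicode categories \p{L} (letters) and \p{N} (numbers). These are Unicode-aware in Java without any flag. The class name, constants and sample inputs below are hypothetical, and the pattern has not been tried inside Graylog itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StricterPathDemo {
    // Hypothetical replacement: Unicode letters and digits plus the punctuation from the original class.
    static final Pattern STRICT_PATH = Pattern.compile("(/[\\p{L}\\p{N}_%!$@:.,+~-]*)+");

    // The "loose" variant for comparison: anything except a space.
    static final Pattern LOOSE_PATH = Pattern.compile("(/[^ ]*)+");

    // Returns the part of `input` that `p` matches from the start, or "" if nothing matches.
    static String matchedPrefix(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Made-up log fragment: a path followed by a quote, a comma and a size field.
        String fragment = "/home/my/bl\u00fcmchen.txt\", 1024";

        // Stops at the quote, because '"' is not in the character class.
        System.out.println(matchedPrefix(STRICT_PATH, fragment));

        // Swallows the quote and the comma, stopping only at the space.
        System.out.println(matchedPrefix(LOOSE_PATH, fragment));
    }
}
```

The strict pattern keeps the umlaut but still rejects characters outside the listed set, so it stays closer in spirit to the original UNIXPATH than the anything-but-space version.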
Why is there such a difference in how \p{Alnum} or [:alnum:] behave between different systems? Are any mysterious system settings involved here? Throughout the whole parsing/storage process, all such “unusual” characters (French and Polish as well) are displayed correctly, so I assume the system locale, encoding and so on are set correctly in my environment.
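One likely explanation is the regex engine rather than any system setting. Logstash's grok is documented as using Oniguruma regular expressions, which understand POSIX bracket expressions such as [[:alnum:]]. Java's java.util.regex does not: it reads [[:alnum:]] as a nested character class made of the literal characters ':', 'a', 'l', 'n', 'u', 'm'. The sketch below shows that behaviour in plain Java. That Graylog uses this engine is an assumption here, and the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class PosixBracketDemo {
    // True if `regex` matches the whole of `text`.
    static boolean matchesAll(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Java treats [[:alnum:]] as the literal set { ':', 'a', 'l', 'n', 'u', 'm' }.
        System.out.println(matchesAll("[[:alnum:]]", "a")); // 'a' is one of the literal characters
        System.out.println(matchesAll("[[:alnum:]]", "b")); // 'b' is not, although it is alphanumeric
        System.out.println(matchesAll("[[:alnum:]]", ":")); // ':' is one of the literal characters

        // \p{Alnum} is Java's spelling of the POSIX class, ASCII-only by default.
        System.out.println(matchesAll("\\p{Alnum}", "b"));
        System.out.println(matchesAll("\\p{Alnum}", "\u00fc")); // "ü" is outside ASCII

        // \p{L} is a Unicode category and covers "ü" without any flag.
        System.out.println(matchesAll("\\p{L}", "\u00fc"));
    }
}
```

If this is what happens in Graylog, the locale and encoding settings are not the culprit: the same pattern text simply means different things to the two engines.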