Split a path into multiple fields

  1. I have a field called fname containing a fully qualified path. Example entry:
    /home/Templates_in_Word/Projects/SmallProjects_English.dotx

I would like to create multiple fields from this field, one for every folder level, by running one or more rules in a pipeline. The number of levels is not predefined, but I would stop at lvl9. For the example above I would end up with the following fields:
lvl0: home
lvl1: Templates_in_Word
lvl2: Projects
filename: SmallProjects_English.dotx

If there are more levels, I would like to have fields up to lvl9.

  2. Environment:
    Ubuntu 11.0.16
    Graylog 4.3.8

  3. The problem I have (I am a newbie) is related to the dynamic number of folder levels; otherwise I could use the split function and also put the last entry (the file name) in a separate field. Am I overlooking something, or is this not possible to achieve?

I searched on the internet but didn’t find an answer, which is why I am posting this here.

Thank you!

There is a similar article here that should get you on your way using regex. I would extract the file name off the end first, then perhaps run nine further regex commands, each working on the progressively smaller remainder of the previous match. Eventually a regex will find nothing, so the corresponding set_field() call won’t actually create a field. There are also plenty of posts outside of Graylog on finding filenames and paths (like here) that might be helpful. Note how the first post references the first regex match with ["0"]:

set_field("file_name", to_string(file_name["0"]));

You can test out regex at regex101.com

Also note that regex in a pipeline rule needs doubled escapes any time you are escaping something.

Lastly - post up your solution, or if it’s not quite working, post up the code and we’ll see what we can help with! :smiley:

Epilogue: Highlight any code/logs using the </> forum tool to keep them readable and avoid losing characters. That way, instead of the hard-to-read formatting that drops characters, as in:

([^/:*?"<>|\r\n])+$

you get something easier to read:

([^\/:*?"<>|\r\n])+$

Thank you for your hints. I had thought that such problems could be solved by other means than regex. I must admit that I find regex powerful but not that intuitive, so I tend to avoid it :pleading_face:

What I tried today is to use the following rule instead:

rule "extract folders from fname"
when
    has_field("fname")
then
    let msg = to_string($message.fname);
    let splitfields = split("/", msg);
    set_field("fname_1", splitfields[1]);
    set_field("fname_2", splitfields[2]);
    set_field("fname_3", splitfields[3]);
    set_field("fname_4", splitfields[4]);
    set_field("fname_5", splitfields[5]);
    set_field("fname_6", splitfields[6]);
    set_field("fname_7", splitfields[7]);
    set_field("fname_8", splitfields[8]);
    set_field("fname_9", splitfields[9]);
end

This works, but:
a) The file name is not in a separate field but ends up in the last populated index, as long as there aren’t more than 8 folder levels. This can be fixed by first extracting the text after the last "/" (probably with regex :slight_smile:).
b) If there are fewer than 8 folder levels, Graylog fills in the fname_n fields that exist but fails as soon as the index exceeds the number of items, and the message gets a gl2_processing_error.

e.g.
Error evaluating action for rule <extract folders from fname/634ad1071e8cc07c3bf1d240> (pipeline <Synology /6345ce421e8cc07c3be75cfe>) - In call to function ‘set_field’ at 14:4 an exception was thrown: Index 8 out of bounds for length 8

I am missing the following:

  1. Either an algorithm that lets me iterate through the list, stop at the last populated item and skip the rest,
  2. or a follow-up rule that acts a bit like an error handler: it checks the gl2_processing_error field and extracts the index number from it. With that number I could set the right field with set_field(fname_filename,…) and then call remove_field("gl2_processing_error"). But this doesn’t seem to work.

Any suggestions?
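
For reference, the behaviour asked for in option 1 (walk the list, stop at the last populated item, keep the file name separate) can be modelled outside Graylog. This is a minimal Python sketch of the expected result only, not something you can paste into a pipeline rule; the names `split_path` and `max_levels` are made up for illustration, and it assumes the path always ends in a file name.

```python
def split_path(path, max_levels=10):
    """Split a path into lvl0..lvl9 folder fields plus a filename field.

    Assumption: the last path component is always the file name.
    """
    # Drop the empty piece produced by the leading "/".
    parts = [p for p in path.split("/") if p]
    if not parts:
        return {}
    *folders, filename = parts
    # Only as many lvlN fields as there are folders, capped at max_levels.
    fields = {f"lvl{i}": folder for i, folder in enumerate(folders[:max_levels])}
    fields["filename"] = filename
    return fields


example = split_path("/home/Templates_in_Word/Projects/SmallProjects_English.dotx")
print(example)
```

For the example path this yields lvl0, lvl1, lvl2 and filename, and no field is created for levels that do not exist, so there is no out-of-bounds access to handle.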

I hear you on regex, but the ONLY way to start getting good at it is to play with it… and if you want to get serious about managing messages… you should play with regex. :smiley: :smiley: You can’t really get around your issue without it.

Suggestions:

a) Use the regex() and/or regex_replace() functions to pull out the actual file name first. You can test with the following regex, which gets just the path: ^(.*\/)[^\/]+$ Again, I suggest going to regex101.com and plugging in that regex (which I just googled for you) along with a couple of test paths and filenames. The panel in the upper right-hand corner explains, step by step, what the regex does to find the info. Here is an untested example of how I might start working in the pipeline rule:

   let the_filename   =   regex("^(.*\\/)[^\\/]+$",to_string($message.file_name_and_path))["0"];
   let the_path       =   regex("^.*\\/([^\\/]+)$",to_string($message.file_name_and_path))["0"];

I can’t find an easy way of breaking out each directory when you have an undetermined number of them.

Hopefully this gets you on your way though!
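
If you would rather check the two patterns locally than on regex101.com, a short Python sketch like the one below prints what each one captures for the example path from the first post. The helper name `first_group` is made up for illustration. The patterns are written as Python raw strings with single backslashes, whereas the Graylog rule string needs them doubled; Python's regex engine is not the one Graylog uses, though simple patterns like these should behave the same.

```python
import re

# The two patterns from the rule snippet, with single (undoubled) backslashes.
PATTERN_A = r"^(.*\/)[^\/]+$"
PATTERN_B = r"^.*\/([^\/]+)$"


def first_group(pattern, value):
    """Return the first capture group, or None if the pattern does not match."""
    match = re.search(pattern, value)
    return match.group(1) if match else None


path = "/home/Templates_in_Word/Projects/SmallProjects_English.dotx"
dir_part = first_group(PATTERN_A, path)
file_part = first_group(PATTERN_B, path)
print("pattern A captures:", dir_part)
print("pattern B captures:", file_part)
```

Pattern A captures the directory part (/home/Templates_in_Word/Projects/) and pattern B captures the file name (SmallProjects_English.dotx), which is worth comparing against the variable names each one is assigned to.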

Thank you, once more. The regex you had provided helped me :+1: :pray:

By the way: the regexes for path and filename were swapped.

Based on your answer I assume that it is possible neither to iterate through a list nor to analyze and overwrite the gl2_processing_error. I believe that an iteration feature in particular would be valuable for many use cases.

And here comes the explanation of how I was able to make it work:
I extracted the folders with a grok pattern in which every entry is optional, to ensure that I wouldn’t get a gl2_processing_error, and extracted the file name with the regex you had kindly provided.

Below is the code for the rule, which might be helpful for somebody at some stage:

rule "extract folders from fname"
when
    has_field("fname")
then
    let msg = to_string($message.fname);
    // Every folder level is optional, so short paths do not cause a gl2_processing_error
    let pattern = "(?:/%{DATA:_1;string}/)?(?:%{DATA:_2;string}/)?(?:%{DATA:_3;string}/)?(?:%{DATA:_4;string}/)?(?:%{DATA:_5;string}/)?(?:%{DATA:_6;string}/)?(?:%{DATA:_7;string}/)?(?:%{DATA:_8;string}/)?(?:%{DATA:_9;string}/)?";
    let folders = grok(pattern, msg);
    // The "fname" prefix turns the captures _1.._9 into fname_1..fname_9
    set_fields(folders, "fname");
    // Capture group "0" holds everything after the last "/"
    let fieldname = regex("^.*\\/([^\\/]+)$", msg)["0"];
    set_field("fname_file",fieldname);
end
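
To see why the all-optional pattern avoids the out-of-bounds error, here is a rough Python emulation of it. It is only a sketch: it assumes Grok's DATA is the lazy regex `.*?` (as in the standard grok pattern set) and mimics the prefixing done by set_fields(); Graylog's own grok() may differ in details such as type conversion or how unmatched captures are reported. The names `extract_folders`, `PATTERN` and `LEVELS` are made up for illustration.

```python
import re

LEVELS = 9

# Same shape as the grok pattern: each level is an optional "<lazy data>/" group,
# and only the first one includes the leading "/".
PATTERN = re.compile(
    "(?:/(?P<_1>.*?)/)?"
    + "".join(f"(?:(?P<_{i}>.*?)/)?" for i in range(2, LEVELS + 1))
)


def extract_folders(path, prefix="fname"):
    """Return only the folder levels that actually matched, with prefixed names."""
    match = PATTERN.match(path)  # always matches, because every group is optional
    return {
        prefix + name: value
        for name, value in match.groupdict().items()
        if value is not None
    }


folders = extract_folders("/home/Templates_in_Word/Projects/SmallProjects_English.dotx")
print(folders)
```

For the example path only fname_1, fname_2 and fname_3 come back; the remaining optional groups simply do not match, so nothing tries to read a level that is not there.
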

Mark the solution for future searchers! :smiley: Glad you were able to take my flipped up regex and make something of it!
