Pipelines with regex and special characters


(Rayees Namathponna) #1

Hi All,

I am facing an issue with a pipeline and a regex. My regex contains the special character [, and to escape it I am using \\[ (the complete regex is below). When I run the pipeline simulator, I get an extra \ in the results. Any idea how to avoid this?

Pipeline

rule "file categorize"
when
    has_field("message")
then
  set_field("pipeline", "message");
  let message_field = to_string($message.message);
  //set_field("xy", key_value(message_field));
  set_field("Pal_Inputs", regex("inputs=(\\[(.*?)\\])", to_string(message_field), ["rayees"]));
  //set_fields(fields: key_value(value: "pipeline"));
  set_field("pipeline2", to_string($message.transaction_date));
end

Message

2016-09-29 00:55:24,261 level=INFO tag="run_pal_workflow.py" msg="Run complete for appname=locationJoiner, job_date=20160912, status=Passed starttime=Thu Sep 29 00:10:31 2016, endtime=Thu Sep 29 00:55:19 2016, duration=0:44:47, inputs=[{"path": "/processed/pal/parse//staticDataJoin/latest", "tag": "static", "stats": {"size": "6.59GB"}}, {"path": "/processed/pal/parse/location/date=20160909", "tag": "location", "stats": {"size": "216.78GB"}}], outputs=[{"path": "/processed/pal/parse//test/locationStaticDataJoin", "tag": "locationstaticjoiner.output.path", "stats": {"diffSize": "45.96GB", "newFiles": ["/processed/pal/parse/test/locationStaticDataJoin/date=20160909"], "endSize": "218.80GB", "startSize": "172.84GB"}}]"

Result

{"1":"{\"path\": \"/processed/pal/parse//staticDataJoin/latest\", \"tag\": \"static\", \"stats\": {\"size\": \"6.59GB\"}}, {\"path\": \"/processed/pal/parse/location/date=20160909\", \"tag\": \"location\", \"stats\": {\"size\": \"216.78GB\"}}","rayees":"[{\"path\": \"/processed/pal/parse//staticDataJoin/latest\", \"tag\": \"static\", \"stats\": {\"size\": \"6.59GB\"}}, {\"path\": \"/processed/pal/parse/location/date=20160909\", \"tag\": \"location\", \"stats\": {\"size\": \"216.78GB\"}}]"}
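
For comparison, here is a minimal sketch of the same pattern outside Graylog, written in Python rather than the pipeline rule language. The message is a shortened, hypothetical version of the one above. In a Python raw string the bracket is escaped with a single backslash; the \\[ described above is the same escape written inside a quoted rule string, where the backslash itself has to be doubled. The sketch only illustrates what the two capture groups contain, not how Graylog evaluates the rule.

```python
import re

# Shortened, hypothetical version of the log message from the post.
message = (
    'status=Passed, inputs=[{"path": "/processed/pal/parse/location/date=20160909", '
    '"tag": "location"}], outputs=[{"tag": "locationstaticjoiner.output.path"}]'
)

# Same pattern as in the rule: group 1 includes the square brackets,
# group 2 is the content between them. The lazy .*? stops at the first ].
pattern = re.compile(r"inputs=(\[(.*?)\])")

match = pattern.search(message)
with_brackets = match.group(1)
without_brackets = match.group(2)

print(with_brackets)
print(without_brackets)
```

Neither captured string contains a backslash; the quotes inside them are plain " characters.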

(Philipp Ruland) #2

Hey,

do you mean the extra \ (backslash) in front of the " (quote)?

I guess this is because Graylog pipelines use “safe” strings that escape quotes with a backslash, so that the resulting string can be parsed without errors.
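
The escaping itself can be reproduced outside Graylog. The following is a small Python sketch, assuming the simulator shows the result map as JSON (which the Result above suggests, but which is not confirmed here). The variable names and the shortened value are made up for illustration.

```python
import json

# Hypothetical stand-in for the captured value: plain double quotes,
# no backslashes anywhere in the string.
captured = '[{"path": "/processed/pal/parse/location/date=20160909", "tag": "location"}]'

# Serializing a map that holds this string, as a JSON-based display would.
# JSON has to escape every " inside a string value as \".
rendered = json.dumps({"rayees": captured})
print(rendered)

# Parsing the JSON again returns the original string unchanged.
restored = json.loads(rendered)["rayees"]
print(restored == captured)
```

If that assumption holds, the backslashes belong to the JSON rendering of the map and not to the field value.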

AFAIK there is no way to properly remove them. Maybe a replaceAll() function would be nice for pipelines…

Greetings - Phil


(Rayees Namathponna) #3

Yes, I mean the \ (backslash) in front of the ". I can't find any function in pipelines to replace \" with ".


(Rayees Namathponna) #4

Is it a bug in Graylog? I was in the process of converting extractors to pipelines, but I am blocked now.

I also tried it like above, but I still have the same issue.


(Philipp Ruland) #5

Well, there is no replace() or replaceAll() function in pipelines (yet). Maybe this will get added sooner or later, or you could write a plugin for that.


(Jan Doberstein) #6

You should file a feature request in the pipeline repo that describes what should be possible with the function.

Or you can write this functionality yourself.


(Jochen) #7

For reference: