Suggestions for processing a complex message


(Rayees Namathponna) #1

Hi All,

I am looking for suggestions or thoughts on the use case below. This is my message:

2016-09-29 05:11:44,644 level=INFO tag="run_pallow.py" msg="Run complete for appname=ofileAgg, job_date=20160912, status=Passed starttime=Thu Sep 29 03:13:25 2016, endtime=Thu Sep 29 05:11:34 2016, duration=1:58:8, inputs=[{"path": "/path/oneday/profile/date=20160909", "tag": "oneday_profile", "stats": {"size": "1.01GB"}}, {"path": "/path/oneday/feature", "tag": "ay_feature", "stats": {"size": "934.28GB"}}, {"path": "/path/latest", "tag": "static_data_join", "stats": {"size": "6.59GB"}}, {"path": "//path/agg/profileAgg/date=20160908", "tag": "hist_profile", "stats": {"size": "4.20GB"}}], outputs=[{"path": "/path/oneday/feature", "tag": "features.output.folder", "stats": {"diffSize": "0B", "newFiles": [], "endSize": "934.28GB", "startSize": "934.28GB"}}, {"path": "/path/oneday/profile", "tag": "subprofile.output.folder", "stats": {"diffSize": "0B", "newFiles": [], "endSize": "25.57GB", "startSize": "25.57GB"}}, {"path": "/path/agg", "tag": "subprofileAgg.output.folder", "stats": {"diffSize": "185.00GB", "newFiles": [], "endSize": "1.80TB", "startSize": "1.62TB"}}]"

From this message I want to extract fields so I can build a table from them.

For that I need the following fields:

Set 1

level=INFO
status=Passed
duration=1:58:8

Set 2

starttime=Thu Sep 29 03:13:25 2016
endtime=Thu Sep 29 05:11:34 2016

Set 3

tag:oneday_feature (rename this to Input_App_Name_01)
size:934.28GB (rename this to Input_App_Size_01)

tag:static_data_join (rename this to Input_App_Name_02)
size:6.59GB (rename this to Input_App_Size_02)
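
To make the Set 3 requirement concrete, here is a minimal Python sketch, outside of any extractor. It decodes the inputs=[...] array as JSON and numbers each tag and size. The function name extract_inputs is hypothetical, and numbering every entry in order of appearance is an assumption, since the example above lists only some of the inputs.

```python
import json


def extract_inputs(message):
    """Return Input_App_Name_NN / Input_App_Size_NN fields from the inputs=[...] array."""
    # Normalise typographic quotes, in case the message was pasted with them.
    text = message.replace("\u201c", '"').replace("\u201d", '"')

    marker = "inputs="
    start = text.find(marker)
    if start == -1:
        return {}

    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here: ", outputs=[...]").
    items, _ = json.JSONDecoder().raw_decode(text, start + len(marker))

    fields = {}
    for i, item in enumerate(items, start=1):
        # Assumption: number every entry in order of appearance.
        fields[f"Input_App_Name_{i:02d}"] = item.get("tag")
        fields[f"Input_App_Size_{i:02d}"] = item.get("stats", {}).get("size")
    return fields
```

Decoding the array as JSON, rather than matching each tag and size with a separate pattern, keeps the number of generated fields tied to the number of entries in the array.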

Currently I use a copy input key-value extractor for set 1, a regular expression extractor for set 2, and another regular expression for set 3 that pulls out the JSON arrays (inputs and outputs in the example above). Once I have inputs and outputs, I apply a grok pattern to get each tag and size. There can be multiple tags and sizes, so each one has to be named uniquely.

My concern is performance, given the large number of extractors and the mix of types (regex and grok). If any of you have experience with this kind of query, please share your suggestions.
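
One idea that might reduce the number of passes over the message is to capture the scalar fields of sets 1 and 2 with a single compiled pattern. The Python sketch below only illustrates that idea outside Graylog and is not an extractor configuration. The names SCALARS, TIME_FORMAT and parse_scalars are hypothetical, and the pattern assumes the fields always appear in the order shown in the sample message.

```python
import re
from datetime import datetime

# One compiled pattern for level, status, starttime, endtime and duration.
# Assumption: the fields always appear in this order, as in the sample.
SCALARS = re.compile(
    r"level=(?P<level>\w+).*?"
    r"status=(?P<status>\w+)\s+"
    r"starttime=(?P<starttime>[^,]+),\s+"
    r"endtime=(?P<endtime>[^,]+),\s+"
    r"duration=(?P<duration>\d+:\d+:\d+)"
)

# Matches timestamps such as "Thu Sep 29 03:13:25 2016" (English day/month names).
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_scalars(line):
    """Return the set 1 and set 2 fields as a dict, or None if the line does not match."""
    match = SCALARS.search(line)
    if match is None:
        return None

    fields = match.groupdict()
    for key in ("starttime", "endtime"):
        fields[key] = datetime.strptime(fields[key].strip(), TIME_FORMAT)

    # Convert "h:m:s" into seconds so the duration can be aggregated.
    hours, minutes, seconds = (int(part) for part in fields["duration"].split(":"))
    fields["duration_seconds"] = hours * 3600 + minutes * 60 + seconds
    return fields
```

Whether a single pattern is actually faster than several extractors would need to be measured on real traffic. The sketch only shows that the five fields can be captured in one match.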