Printer tracking pipeline - $message_ field names not working for set or rename

I have been trying to duplicate this print tracking pipeline rule from a challenge a few months ago:
Tracking Print Jobs - Templates and Rules Exchange / Miscellaneous - Graylog Community

While my print logs are filtering into a separate stream correctly, I can’t get the rename or set_field calls to work as shown in the example. I have tried multiple options, but I can’t find any documentation on the $message variable and how it ties into the fields from the event log message. I tried several approaches, which is why some of the field names below differ from each other and don’t match the original. Hopefully @tmacgbay sees this and has some advice. Once I get this sorted, I would also love to know how to add the entries (I’m guessing into a table) and create the cool dashboard he did.

I had to change the to_string() condition from:
to_string($message.winlog_event_id) == "307"
to:
to_string($message.winlogbeat_event_id) == "307"

The equivalent change doesn’t help with set_field. If tmacgbay or someone else knows, I am also curious why he chose set_field rather than renaming the fields, since the originals are deleted later in the rule. There is likely a reason and I’m trying to learn :slight_smile:

rule "Printer_Tracking"
when
    // Function converts generic fields names to useful ones
    // then removes the unhelpful fieldnames because we don't want them
    to_string($message.winlogbeat_event_id) == "307"
then
    // change fields to something that makes sense.
    set_field("print_user",              $message.winlogbeat_winlog_user_data_Param3);
    set_field("printed_from",            $message._winlog_user_data_Param4);
    set_field("printer_name",            $message._winlog_user_data_Param5);
    set_field("printed_from_ip",         $message._winlog_user_data_Param6);
    set_field("page_count",      to_long($message.user_data_Param8));
    remove_field("winlogbeat_winlog_user_data_Param1"); // document number
    remove_field("winlog_user_data_Param2"); // action i.e.  "Print Document"
    remove_field("winlog_user_data_Param3");
    remove_field("winlog_user_data_Param4");
    remove_field("winlog_user_data_Param5");
    remove_field("winlog_user_data_Param6");
    remove_field("winlog_user_data_Param7");  //size in bytes
    remove_field("winlog_user_data_Param8"); 
    remove_field("winlog_process_thread_id"); // who cares about the thread id? Not me. 
    remove_field("winlog_process_pid");       // who cares about the pid?       Also Not me. 
    remove_field("winlog_opcode"); //  
    // Pull out for reporting
    route_to_stream("Printing_reports_stream");      

end

  1. Yay!!! Someone is using it! :smiley: :smiley:

  2. re: winlogbeat prefixed - On the Input for Beats if you edit it, you will see at the bottom that you can remove the beat type from the fields - yours is likely unchecked.

  3. Look at the raw message coming in to see what the actual field names are when you are setting fields. If your raw field name is winlogbeat_user_data_Param3, then to reference it in the pipeline you would use $message.winlogbeat_user_data_Param3

  4. I started down the path of just creating the fields, and a while after it was done I was on a rule kick to reduce the number of messages and fields that are processed by Graylog. Rather than messing with the existing code, I just added the removes. You can rename instead and that would work fine. Also of note, if you are going to force the type (page_count to a long), that works better in set_field() than renaming.
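
Point 4 could be sketched roughly as follows: a variant of the rule that uses rename_field() for the string fields and keeps set_field() with to_long() for page_count. This is only a sketch under assumptions, not the original author's rule. It assumes the beat-type prefix has been removed on the input so the fields arrive as winlog_user_data_ParamN, and the rule name is made up for illustration. Check the field names against an actual message before using it.

```
rule "Printer_Tracking_rename"
when
    // assumes the field is winlog_event_id (beat prefix removed on the input)
    to_string($message.winlog_event_id) == "307"
then
    // rename_field(old_field, new_field) moves the value to the new name,
    // so no separate remove_field() is needed for these
    rename_field("winlog_user_data_Param3", "print_user");
    rename_field("winlog_user_data_Param4", "printed_from");
    rename_field("winlog_user_data_Param5", "printer_name");
    rename_field("winlog_user_data_Param6", "printed_from_ip");
    // force the type with set_field(), then drop the original string field
    set_field("page_count", to_long($message.winlog_user_data_Param8));
    remove_field("winlog_user_data_Param8");
    // the stream name must match the existing stream exactly
    route_to_stream(name: "Printing_reports_stream");
end
```

The remaining remove_field() calls from the original rule (Param1, Param2, Param7 and the process fields) would still be needed if you want those fields gone.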

Hopefully that all helps?

@tmacgbay thanks for your quick response!!!

  1. I’m trying to use it :slight_smile: This is a fresh install of the latest stable/“supported” version of graylog and I only have 3 servers feeding to it so far, I want to sort out some bugs before I point everything “to it”.

  2. OK, I’m missing something. I’m sure it is something silly, so I appreciate your patience.
    I did not have that checkbox checked, so I checked it.
    The issue I have now is that the events aren’t going to the stream. I copied in your original condition, hoping that would fix it:
    to_string($message.winlog_event_id) == "307"
    That still didn’t get these events into their own stream, so I copied the entire pipeline rule, and still no luck. I created my stream with the exact same name you used, “Printing_reports_stream”. I’m sure I’m doing something wrong here, like the checkbox I left unchecked, but I’m not sure what it is now. Thoughts?

  3. A lot of this might clear up if I understood how to view the “raw” data. How do I view the raw incoming data?

  4. If I can get it to work as written to save time I won’t reinvent the wheel and just use your code. But… I need to get it to actually work.

  5. So after all of that is done do I add these “Results” to a table to create the dashboard you did or? That is down the road and I’m not as worried about this right now. I want the data and format coming in to work first. I can worry about this later…

  1. I guess I kind of know how to view the raw log data, i.e. by clicking on a message, but when I’ve used those field names it doesn’t seem to change anything. That was true even before I checked the checkbox to stop prepending winlogbeat to every field. Is there a “more” raw version than that?

Check out the stream rule and make sure it is connected to the input that is receiving the beats logs. To do this, look at the Beats input you set up and click on the “Show Messages” button. In the resulting search you will see a query up top that includes the Input ID, something like: gl2_source_input:5c507d5150f8c604b0e9965c

To make sure a stream sees everything coming into an input, you can then set up the stream rule to capture anything coming into the Input ID you just found.

Once you have that, you don’t need to look at the raw data, just the fields created by the beats agent (and/or you)… While you are looking at the stream messages that are…streaming in… haha (use that search you created on the Beats input), you can click on one of the messages and it should open up and show you its fields and data. For example, if you wanted to reference the value of agent_hostname in the pipeline, it would be $message.agent_hostname
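
A quick way to confirm a field name from inside a pipeline is a throwaway rule that writes the value to the Graylog server log. This is a minimal sketch, assuming the field is agent_hostname as in the example above. The rule name is made up for illustration.

```
rule "debug_agent_hostname"
when
    // only fire when the field actually exists on the message
    has_field("agent_hostname")
then
    // debug() prints the value to the Graylog server log
    debug(to_string($message.agent_hostname));
end
```

If nothing shows up in the server log, the likely causes are that the field name does not match what the input produces, or that the pipeline is not connected to the stream the messages are in.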

Let’s get the data going right, then I can show snippets of the dash. :stuck_out_tongue:

@tmacgbay I’m trying to learn, so this may be a self-inflicted wound and might be where things are going wonky. As I said, I have 3 servers feeding in so far, one of each type (domain controller, file & print server, and Exchange server). I have a separate stream for each type of server and a rule set up on each stream, e.g. "source must match exactly XXXXX-FPS", to move each server type into the correct stream. I also have each stream configured to remove its messages from the “All messages” stream. My goal is to manage each type of server log data independently, so that if I need to change retention for one type I don’t have to affect them all. For example, if the mail server logs grow too quickly, I can keep them for less time. I assumed there was no reason to leave the messages in the “All messages” stream. Do I need to uncheck that checkbox and leave the messages in the “All messages” stream? I had the pipeline connected to the FPS stream and have since added the “All messages” stream, but the events still aren’t flowing into the Printer_reports_stream.
If that is the issue, how do I fix it while retaining the ability to manage the types of logs separately? If that isn’t the issue, I’m not sure what to do; it still isn’t working even with the pipeline connected to both the FPS stream and “All messages”. Thoughts?

I have my system set up so that anything being worked on is removed from the “All messages” stream; that way, if something shows up there, I know it didn’t get captured in a stream/pipeline properly. It’s a personal choice, the system doesn’t require it. Streams are a representation of data flowing through Graylog that you can apply a pipeline to… once processing of a stream is finished, the data is sent to the connected index… which is a searchable storage area on Elasticsearch/OpenSearch. Here is a thread where a bunch of us discussed and drew up the message path through Graylog. Hopefully that will clarify things a bit.

I am confused about the separate streams you describe. You can have one input going to one stream and apply multiple pipelines, stages, and rules to it. It wouldn’t be efficient to create a separate stream and/or index for each machine… You can have streams associated with indexes that have different retention times/counts and use the pipeline function route_to_stream() to route messages to a different stream (and so a different retention) when needed… I feel like there is something missing from the explanation but I can’t put my finger on it…
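
As a rough illustration of routing by retention, the sketch below sends messages from one source to a stream that is attached to a shorter-retention index set. The stream name, source name, and rule name are all hypothetical, and the stream must already exist and be linked to the index set you want.

```
rule "route_chatty_to_short_retention"
when
    // hypothetical source name; replace with whatever identifies the chatty system
    to_string($message.source) == "XXXXX-EXCH"
then
    // hypothetical stream name; remove_from_default also takes the message
    // out of the default "All messages" stream
    route_to_stream(name: "Short_retention_stream", remove_from_default: true);
end
```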

@tmacgbay YAY!!! It is working and flowing into the stream now… I had a typo, Printer_reports_stream vs your Printing_reports_stream, fixed it and they are now at least flowing into the stream correctly and everything is working as far as renaming and removing the fields. UGH… It was a pebkac / typo. By fixed I mean I renamed my stream to match what you were using just in case that would cause other problems later with the dashboard or something.

So… Now what RE: dashboard?

A little background, if you care to advise on best practices. We had a Graylog server back in the version ~2 days, which was over 5 years ago if memory serves. It worked pretty well but had accumulated quite a few issues after running for several years, so much so that we were better off starting over and setting things up “correctly” from scratch with lessons learned. I understand what you mean about not having a stream/index per server. We have quite a few domain controllers (5+) and FPS servers (again 5+) which will eventually flow into the system, so we thought it prudent to break down the streams and indexes by system type. Granted, we only have one Exchange server, but mail servers are “chatty” so we felt we should separate it. We plan to set up separate streams and indexes for firewalls (we currently have 6) and similar breakdowns. Again, we are trying to set it up “correctly” this time, with different streams and indexes based on the types of systems we plan to feed into this server. Previously we fed everything into a single stream and index, and it was a nightmare in many ways.

For the streams we have rules set up. Currently at the most basic level we have
4 indexes: default, exchange, dc, fps
7 streams but only using 4,
The three default “All” streams(events/messages/system events)
The 4 streams we are using: DC, Exchange, and FPS from before this project, plus the new Print reports stream (created for this; we wanted to see some real-world return before setting up and pointing everything at the system). There is no separate index for this stream…

The three main streams each have stream rules that match on source, with the stream set to match “at least one of the following rules”, so I can route each DC/FPS/Exchange server into the correct stream. Each rule contains the server name (“source”) to direct that server to the correct stream and corresponding index. We also have each of these streams set to remove matches from the “All messages” stream. So, eventually, we should hopefully have nothing showing up in the “All messages” stream/default index.

We only have 2 inputs: one syslog and one Beats (for Winlogbeat).
Currently only one pipeline and 1 stage(stage 0) with 1 rule in the stage Print_Tracking

Does that all sound good? Anything we should be doing differently as far as best practices you can tell so far?

That all makes sense to me. One thing that would help is to look at the Graylog Schema so you can be consistent with naming conventions for field names and the like. Also, think about the messages you want to drop, particularly chatty ones. Try to standardize on either dropping them or shunting them to an index that has a shorter life span. Consider trimming parts of Windows messages too… some have a paragraph or two of explanation that comes with each message. :slight_smile:
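
A minimal sketch of the “drop it” option, assuming the beat prefix is removed so the field is winlog_event_id. The event ID here is only an example of a commonly noisy Windows event (5156, a Windows Filtering Platform “connection permitted” event), and the rule name is made up. Substitute whatever you actually decide to discard.

```
rule "drop_noisy_event"
when
    // example event ID only; pick the ones that are noise in your environment
    to_string($message.winlog_event_id) == "5156"
then
    // drop_message() discards the message so it is never written to an index
    drop_message();
end
```

Dropped messages are gone for good, so routing them to a short-retention index is the safer choice when you are unsure.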

So how would we go about creating a dashboard similar to the one you showed in the Dashboard contest? Do I need to create a table and put some of the labels and things into it, or can I have a dashboard directly query the stream, or?

Dashboards query your Elasticsearch/OpenSearch. Here are some screenshots of the widget details I created for the dash.





ALSO: Moved the conversation to Graylog Central where questions should be… :slight_smile:

Sweet! I got my own working. THANK YOU for your help.
I may need to start a new topic but…

I’m trying to get the AD Content pack working and I can’t figure out what I’m doing wrong. Since I removed the winlogbeat prefix on the Beats input, I tried modifying the queries, but that didn’t work. I also have a REALLY strange issue occurring with my searches. For example, if I search for:
winlog_event_id: == “4733”

I get results but… they are for event ID 4624 and others, it just so happens that there is 4733 in the message field under Logon GUID and under the winlog_event_data_LogonGuid fields. Why is a search of winlog_event_id pulling data from other fields?

Here is the content pack that started all of this and is thoroughly confusing me. I suspect if I can fix the search pulling back odd results I can fix the winlogbeats_ by replacing it with winlog_ but… I could be wrong:
Active Directory Auditing (WinLogBeats) - Graylog 3.0.2+ - the NEW Marketplace / Content Pack - Graylog Community

Yes, please start a new thread in Graylog Central.

The Content Pack is pretty old, it will take a bunch of reworking.

If you are searching in the Graylog GUI it should be:

winlog_event_id:4733