Office 365 input keeps stopping

Hi dear members of the Graylog community,

Since updating from Graylog 4.2.10 to 4.3.2, my Office 365 input keeps “stopping” at random times, several times a day. This is really annoying because I have some important alerts triggered by events on this input.

I have to restart the input manually: it does not actually stop, but no messages come in from it until it is restarted.

Is this a known problem?

My environment :

  • OS Information: Debian 11.0.15 on Linux 4.19.0-20-amd64

  • Package Version: Graylog 4.3.2+313b6bc

I currently have 2 inputs running: 1 Office 365 and 1 Syslog. I use one pipeline to check whether IP addresses are bad with the Threat Intel Plugin, and another to get geographic coordinates from the GeoIP lookup.

Has anyone already encountered this issue and solved it? If so, I would be happy to understand why it happens!

Thanks in advance for your answers,
Best regards,
G. Morin

I recall seeing this post that suggests that you increase your polling interval. I never like increasing polling intervals because I want to know RIGHT NOW… but it is at least something to experiment with. :slight_smile:

Well, I’m kinda like you: I don’t like increasing the polling interval…

I deleted the pipeline that checks IPs with the Threat Intel Plugin, and it seems to work flawlessly again. Maybe my rules are bad? :face_with_monocle:

You could post the rule for review if you like.

Well, thanks, but even after deactivating all the pipeline rules and increasing the polling interval as you suggested, the input keeps stopping. This time it does not stop after a few minutes but after a few hours.

Could this be happening because I upgraded from 4.2.10 straight to 4.3.2, skipping 4.3 and 4.3.1, without generating a new server.conf file?

Do you see anything in the Graylog logs when the Input stops working? Is there anything on the Office365 logs side?

No, nothing special on either side. The input is still running, but nothing seems to be ingested from it. I have to restart it manually for the logs to be downloaded.
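Since the input still shows as running, the only visible symptom is silence. A minimal sketch of how a watchdog could flag that, assuming you can obtain the timestamp of the last message received on the input (how you fetch it is left out here, and the 30-minute threshold and the dates are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def is_stalled(last_message_at, now, max_silence=timedelta(minutes=30)):
    """Return True when no message has arrived for longer than max_silence."""
    return now - last_message_at > max_silence


# Hypothetical timestamps, for illustration only.
now = datetime(2022, 7, 1, 12, 0, tzinfo=timezone.utc)
print(is_stalled(now - timedelta(hours=2), now))    # True
print(is_stalled(now - timedelta(minutes=5), now))  # False
```

This only detects the stall; it says nothing about the cause, but it would at least tell you when the input went quiet.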

By chance, do you have a firewall or any application that might be blocking the port used? Just an idea.

Hi @gsmith !

Yes, I have firewalls, but I made sure to allow this kind of traffic. What I don’t understand is: why does it “stop” unexpectedly like this?

If my firewall configuration were bad, I don’t think I would get any logs at all from this input, would I?
I’m not very sure about what I’m saying, but it seems logical to me :slight_smile:

A tough one when you don’t have any clues showing up in the logs. You had asked about the server.conf - I am not aware of any changes that would break an input… No logs on either side say anything of import, and it’s just that the Graylog Input stops working? Is it the Input that stops, or is it Microsoft that stops sending? How can you tell one way or the other?

YES, YES & YES !

→ First of all, there is nothing in the logs on the Graylog side. When I go to the “System > Inputs” menu, the Office 365 input is neither in a disabled state nor in a failed state. It looks just as normal as the Syslog input listed right after the problematic one.

When I quickly deactivate and reactivate the input manually, the logs start downloading again as if everything were normal. Between these two states nothing changes on my side, from my MAC address to my public IP address, which is why in my opinion Microsoft is not blocking the log download. :slight_smile:

I’m a newbie in the Graylog world, and I am not very familiar with the way Graylog requests the logs from Microsoft. Given the number of secrets and tokens generated through the Microsoft web UI, I think it’s a REST API, and I’m not familiar with these tools yet. :grimacing:

But what is certain is that the Graylog server does manage to download some logs, which - for me - rules out a firewall issue.

The Graylog logs can be watched with the command:

tail -f /var/log/graylog-server/server.log

They should show the transition of the input from started to stopped and the reverse - can you post that portion and anything else that looks like it is related?
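If the log is too noisy to read live, something like the following could pull out only the suspicious lines with a little context. This is a rough sketch: the keyword list is an assumption about what is worth looking at, not something Graylog defines, and `scan_file` is a hypothetical helper name.

```python
import re
from collections import deque

# Assumed keywords; adjust to whatever shows up in your own log.
PATTERN = re.compile(r"\b(ERROR|WARN)\b|Exception|Caused by:")


def interesting_lines(lines, before=2):
    """Return matching lines, each preceded by up to `before` lines of context."""
    recent = deque(maxlen=before)
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if PATTERN.search(line):
            out.extend(recent)   # context lines seen just before the match
            recent.clear()
            out.append(line)
        else:
            recent.append(line)
    return out


def scan_file(path):
    """Print the interesting lines of a log file (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in interesting_lines(fh):
            print(line)
```

Calling `scan_file("/var/log/graylog-server/server.log")` after the input has gone quiet should make it easier to post the relevant portion, including the first lines of any stack trace.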

Graylog listens for what is sent to it unless you have a plugin or a script that does otherwise. Do you have extractors running on that input - if so post up detail… there could be a regex or GROK that could hang up the Input

Here is the output :

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.io.EOFException: SSL peer shut down incorrectly
        at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:483) ~[?:?]
        at sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472) ~[?:?]
        at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:160) ~[?:?]
        at sun.security.ssl.SSLTransport.decode(SSLTransport.java:111) ~[?:?]
        at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1506) ~[?:?]
        ... 33 more

That’s what I understood from reading the documentation, but is it the same with cloud-based apps like Office 365? Is there no polling of Azure triggered every X seconds?

I have this extractor running on the input, composed of several grok patterns:

%{DATA:UNWANTED}\"trc":"%{EMAILADDRESS:recipient}"\,"tsd":"%{EMAILADDRESS:sender}"

I also have this unique pipeline that uses the geoip plugin. I reused the one given in this tutorial, here is the code :

rule "GeoIP lookup: source_ip"
when
  has_field("source_ip")
then
    let geo = lookup("geoip", to_string($message.source_ip));
    set_field("source_ip_geo_location", geo["coordinates"]);
    set_field("source_ip_geo_country", geo["country"].iso_code);
    set_field("source_ip_geo_city", geo["city"].names.en);
end

Don’t know if this has anything to do with it but your GROK statement looks as though it is missing some escape characters - any quotes should be escaped \" or if you are in a pipeline they need to be double escaped \\" … the colon as well

Graylog’s list of characters that need to be escaped:

& | : \ / + - ! ( ) { } [ ] ^ " ~ * ?

Also of note: it is good form to use the ^ at the start of a GROK/regex search, and sometimes even the $ at the end, to make sure that GROK/regex isn’t sliding its search around trying to fit your pattern in wherever possible… that could slow things down more than needed, particularly at high volume.

I just tried to run an “escaped” version against one log in the indexes as a test, but it will not run even though every character that needs escaping is escaped. Could you give me an example for the extractor, please?

I don’t have an example message to build/check from… You can plug it into an online GROK debugger and see what the results are. The linked one has worked well for me…

Hi @tmacgbay ,

I rewrote my regex to use escaped characters, exact matches for some terms, and the ^ and $ anchors.

^%{DATA:UNWANTED}\"\btrc\b\"\:\"%{EMAILADDRESS:recipient}\"\,\"\btsd\b\"\:\"%{EMAILADDRESS:sender}\"%{DATA:UNWANTED}$

It seems better but the input continues to crash randomly.
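For checking the pattern outside Graylog, here is a rough Python approximation of it. It assumes that Grok’s DATA behaves like a lazy `.*?` and replaces EMAILADDRESS with a simplified expression, so it is not identical to what the extractor does; the sample message is made up.

```python
import re

# Simplified stand-in for Grok's EMAILADDRESS (an approximation, not the real pattern).
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+"

# Anchored with ^ and $; in Python the quotes, colons and commas are plain literals.
PATTERN = re.compile(
    r'^.*?"trc":"(?P<recipient>' + EMAIL + r')","tsd":"(?P<sender>' + EMAIL + r')".*?$'
)

# Hypothetical log fragment, for illustration only.
sample = '{"id":"1","trc":"alice@example.com","tsd":"bob@example.org","x":"y"}'

m = PATTERN.search(sample)
print(m.group("recipient"), m.group("sender"))  # alice@example.com bob@example.org
```

If a real message from the input does not match here either, the problem is more likely in the pattern itself than in the escaping.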

I didn’t see this before - I think I was on the train when I was reviewing… :crazy_face: That says to me that Office 365 stopped sending in a way that pissed off the Input you have, and it is possibly related to SSL. I would start hunting on the Office 365 side…
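One way to start hunting is to check the TLS handshake from the Graylog host itself. A minimal sketch using only the Python standard library follows; the host name `manage.office.com` in the usage note is an assumption about which endpoint the input talks to, so substitute whatever your setup actually contacts.

```python
import socket
import ssl


def make_context(min_version=ssl.TLSVersion.TLSv1_2):
    """Default verifying context that refuses anything older than min_version."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = min_version
    return ctx


def probe(host, port=443, timeout=10.0):
    """Open a TLS connection and return (protocol version, cipher name)."""
    ctx = make_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Running `probe("manage.office.com")` from the Graylog server shows which protocol version and cipher get negotiated. Since the failure is intermittent, a single successful handshake proves little; it would have to be repeated over time, ideally around the moment the input goes quiet.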

Well, I have activated support for TLS 1.3 and removed SSLv3 and TLS 1.0 & 1.1, but it’s not better.

I’ve seen nothing on the Office 365 side, the parameters are very limited in the UI. :grimacing:

I don’t know how to proceed. It’s weird, because it worked flawlessly a month ago and I really don’t know why it crashes randomly like that. It’s so frustrating! :face_with_spiral_eyes:

Hello @gmorin

I went back over this post to try to find a clue what’s going on. This has me puzzled.

Combined the following statements for clarity.

I need to ask a couple of questions; not sure if this is it, but it was not stated above.

1. Are you using the built-in INPUT “Office 365 Log events”, or are you using a third-party plugin? Just checking.

2. Normally, when it’s some type of service issue, the journalctl command should show something.

3. The logs you showed above are not the full message. Is it possible for you to show the full log file during the crash of this input and its startup? This would be better for troubleshooting this issue.

4. What is the configuration of this input? If you’re using the default, built-in Office 365 INPUT, it should look something like this.