Problems with limit of total fields greater than 1000


We're using NXLog Community Edition to grab logs from our Windows machines. We have a specific input for those machines, and there are no extractors set up on it. However, we keep getting:

ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]

We have tried changing the templates to expand the limit to 2000 (based on what was shown here: "Limit of total fields [1000] in index [windows_327] has been exceeded"). The template does not seem to be taking effect.

We have run the following command:

sudo curl -X PUT -d @'index_limit_90day-template.json' 'http://localhost:9200/_template/90day-template?pretty' -H 'Content-Type: application/json'

which fixes it for the day, but as soon as the next index is created it reverts to a template without the fix applied.
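A way to double-check both pieces, using the template and index names from above, is to read the template back and then inspect the settings that actually landed on the newest index:

# Confirm the template exists and its index_patterns match the index names Graylog creates
curl 'http://localhost:9200/_template/90day-template?pretty'

# See which settings were actually applied when the newest index was created
curl 'http://localhost:9200/windows_327/_settings?pretty'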

Description: Ubuntu 20.04.3 LTS
Version: 4.2.1+5442e44, codename Noir
JVM: PID 41413, Private Build 1.8.0_292 on Linux 5.4.0-89-generic

I don't know NXLog, but I do know that the majority of the time when people hit the 1000 (2000...?!?!?) field limit, it means they are capturing data into their field names. So for instance, capturing the timestamp into a field called timestamp might look like this:

timestamp: 2021-11-30 16:26:08.269 -05:00

But if you did it the other way around, where the name of the field IS the timestamp:

2021-11-30 16:26:08.269 -05:00 : timestamp

Then every message would create a new field called "2021-11-30 16:26:08.269 -05:00" (or its increment) containing the data "timestamp", and every message would add a new, differently named field.

While the example is not technically possible, it illustrates the idea: you likely have random names coming in as fields, and that is what is causing the overload.
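To make the mechanism concrete: with dynamic mapping on, every previously unseen key is materialized in the index mapping, and it is those mapped fields that count against the 1000 limit. A simplified sketch of the mapping after just two such messages (real dynamic string mappings are more verbose, but the growth is the point):

{
  "mappings": {
    "properties": {
      "2021-11-30 16:26:08.269 -05:00": { "type": "keyword" },
      "2021-11-30 16:26:09.114 -05:00": { "type": "keyword" }
    }
  }
}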

The command you posted doesn't really say much other than that you applied the contents of index_limit_90day-template.json to Elasticsearch…

Here is a slightly older post that gives more detail on how to handle it. There are some reasonably significant differences between Elasticsearch 6 and 7. I am guessing the article was written for version 6, and you may have version 7… or maybe not… :slight_smile: …so you may need to take that into account.
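The biggest template-related difference between those two versions is the removal of mapping types: a 6.x template nests its mappings under a type name, while 7.x is typeless. A minimal sketch of both forms (the pattern and field are examples):

# Elasticsearch 6.x: mappings nested under a type name such as "_doc"
{
  "index_patterns": ["graylog_*"],
  "mappings": { "_doc": { "properties": { "favoriteNumber": { "type": "long" } } } }
}

# Elasticsearch 7.x: typeless mappings
{
  "index_patterns": ["graylog_*"],
  "mappings": { "properties": { "favoriteNumber": { "type": "long" } } }
}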


A couple of things I don't understand: how is the data even being parsed if there is a separate input for our Windows machines and it has no extractors? Does Graylog have some default parsing engine built in, or is that something happening because of NXLog?

What I'm seeing from Graylog is that it doesn't appear to be greatly over 1000 fields; in fact it's reporting 1021 fields, and it seems to stay consistent at that. If that's the case, wouldn't that mean this is less an issue of incrementing field names, and more that there really are that many fields being pulled in?

With that said, assuming I don't want to fix the fields now (or maybe later), is there a way to resolve this by increasing that limit in a template, so that I don't have to go and set it manually every time?

Elasticsearch is version 7.10.2

Thanks.

With Beats there are by default several standard fields that are extracted before the message is sent to Graylog; I would imagine the same is true of NXLog. You can also modify the agent to add/parse more fields if you want, so that each message is broken out (or data suppressed) before reaching Graylog. Perhaps it is in your NXLog config? We are a small shop with only ~30 Windows servers, but we are nowhere near 1000 fields. But if you are happy with the fields you are receiving for the moment, then onward! :slight_smile:

Without seeing what you are applying in the template, my first guess is that when you define the template you aren't wildcarding the tail end of the index pattern and are instead explicitly naming the index (instead of graylog_22, use graylog_* to catch all future index increments/instances). For instance, if you were customizing a type:

{
  "index_patterns": ["graylog_*"],
  "mappings": {
    "properties": {
      "favoriteNumber": {
        "type": "long"
      }
    }
  }
}
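Then load it with the legacy template API and read it back to confirm; any index created afterwards whose name matches graylog_* should pick it up (the file name here is an example):

curl -X PUT -H 'Content-Type: application/json' \
     -d @graylog-template.json 'http://localhost:9200/_template/graylog-template?pretty'

curl 'http://localhost:9200/_template/graylog-template?pretty'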

Hello,

Be careful: there may be some instability when increasing the limit.
What we found is that GELF inputs, depending on which options are left unchecked, can create a lot of fields.

Other approaches would be:

- Be explicit with your mapping, and turn off dynamic mapping by setting dynamic = false (or even dynamic = strict).
- Use the flattened field type. You can find more about this here.
- Split your index. The benefit here is that you can then reindex older data into the new index and mapping, optimizing your overall index size.
- Increase index.mapping.total_fields.limit (I see you have done this already).

A sketch of the first two options follows below.
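Something like this would combine the explicit mapping, dynamic = false, and a flattened catch-all in a single legacy template. A sketch only: the pattern and the winlog_data field name are placeholders, and the flattened type needs the default (non-OSS) distribution, 7.3 or later:

{
  "index_patterns": ["windowsserverindex_*"],
  "settings": {
    "index.mapping.total_fields.limit": 2000
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "message":     { "type": "text" },
      "winlog_data": { "type": "flattened" }
    }
  }
}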

Just a suggestion: I think having 1000 different fields in one index is a bit excessive, and you should think about splitting your messages into different index sets so that each index has fewer than 1000 distinct fields. What we have done in this situation is have an index each for Linux, Windows, and firewall/router devices. If that's not feasible, maybe consider adjusting your nxlog-ce config file so you only collect what you want; otherwise NXLog will grab all the logs (DEBUG, INFO, WARN, ERROR).

Example: this will help decrease the number of fields. You may need to adjust it to your needs.

<Input in>
    Module          im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id='1'>
                <Select Path='Application'>*</Select>
                <Select Path='Security'>*[System/Level=4]</Select>
                <Select Path='System'>*</Select>
            </Query>
        </QueryList>
    </QueryXML>
</Input>

Or something like this, if you just want specific EventIDs:

<Input MonitorWindowsSecurityEvents>
    Module    im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id="0">
                <Select Path="Security">*[System[(Level=1  or Level=2 or Level=3 or Level=4 or Level=0) and (EventID=1102 or EventID=4719 or EventID=4704 or EventID=4717 or EventID=4738 or EventID=4798 or EventID=4705 or EventID=4674 or EventID=4697 or EventID=4648 or EventID=4723 or EventID=4946 or EventID=4950 or EventID=6416 or EventID=6424 or EventID=4732)]]</Select>
            </Query>
         </QueryList>
    </QueryXML>
</Input>

Hope that helps

EDIT: Maybe adjust your index rotation and see if that helps.

Hello there

I don't think you want to increase the total fields beyond 1000; that is a crazy number of fields. This sounds much more like the logs either arriving in a mangled format, or being fed into an inappropriate input type which is parsing them incorrectly.

Can you post the NXLog config (sanitized if you have sensitive info in there)?

May I ask what type of input the NXLog messages are being received on in Graylog?

Graylog inputs contain parsing rules appropriate to their type; e.g. the Syslog input will parse logs in the expected syslog format.

Panic Soft
#NoFreeOnExit TRUE

define ROOT     C:\Program Files (x86)\nxlog
define CERTDIR  %ROOT%\cert
define CONFDIR  %ROOT%\conf
define LOGDIR   %ROOT%\data
define LOGFILE  %LOGDIR%\nxlog.log
LogFile %LOGFILE%

Moduledir %ROOT%\modules
CacheDir  %ROOT%\data
Pidfile   %ROOT%\data\nxlog.pid
SpoolDir  %ROOT%\data

<Extension gelf>
    #Module     xm_syslog
    Module      xm_gelf
</Extension>

<Extension _charconv>
    Module      xm_charconv
    AutodetectCharsets iso8859-2, utf-8, utf-16, utf-32
</Extension>

<Extension _exec>
    Module      xm_exec
</Extension>

<Extension _fileop>
    Module      xm_fileop

    # Check the size of our log file hourly, rotate if larger than 5MB
    <Schedule>
        Every   1 hour
        Exec    if (file_exists('%LOGFILE%') and \
                   (file_size('%LOGFILE%') >= 5M)) \
                    file_cycle('%LOGFILE%', 8);
    </Schedule>

    # Rotate our log file every week on Sunday at midnight
    <Schedule>
        When    @weekly
        Exec    if file_exists('%LOGFILE%') file_cycle('%LOGFILE%', 8);
    </Schedule>
</Extension>

# Snare compatible example configuration
# Collecting event log
<Input in>
    Module      im_msvistalog

Please use the forum tools like </> when posting code; it makes it MUCH easier to read…

Also - as asked, what type of input are the NXLog messages being sent to?

I have a file on the server that has the following values:

{
  "order" : -1,
  "index_patterns" : [
    "windowsserverindex_*"
  ],
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "analyzer_keyword" : {
            "filter" : "lowercase",
            "tokenizer" : "keyword"
          }
        }
      }
    },
    "index.mapping.total_fields.limit": 2000
  },

When I run:

sudo curl -X PUT -d @'index_limit_windowserverindex.json' 'http://localhost:9200/_template/windowsserverindex-template?pretty' -H 'Content-Type: application/json'

followed by:

sudo curl -X GET 'http://localhost:9200/_template/windowserverindex-template?pretty'

I get:

{
  "windowsdesktopindex-template" : {
    "order" : -1,
    "index_patterns" : [
      "windowsdesktopindex_*"
    ],
    "settings" : {
      "index" : {
        "analysis" : {
          "analyzer" : {
            "analyzer_keyword" : {
              "filter" : "lowercase",
              "tokenizer" : "keyword"
            }
          }
        },
        "mapping" : {
          "total_fields" : {
            "limit" : "2000"
          }
        }
      }
    }
  }
}

Will it be a problem that, in this format specifically, the 2000 appears to be stored as a string?

Hi Chase

I don’t follow what these configurations you’ve pasted relate to or what you are trying to do here, can you explain long-hand please?

The configs are for Elasticsearch, to expand the field limit from 1000 to 2000. What I have is a text file I created on the server; I then run the curl -X PUT command to import it into Elasticsearch. Once I've imported it, you can see that it comes back configured differently: for instance, the 2000 now appears to be set as a string, noted by the quotes around it. I'm not sure if that is supposed to be the case or if I'm doing something wrong. Either way, I've added it to Elasticsearch and it still hasn't resolved my issue.

It is very hard to tell because your code is not formatted in a readable manner… :thinking:

If I take some extra time to push your code through a JSON formatter, I see that I need to remove a comma and add a pair of closing brackets… but after all that, I get:

{
   "order":-1,
   "index_patterns":[
      "windowsserverindex_*"
   ],
   "settings":{
      "index":{
         "analysis":{
            "analyzer":{
               "analyzer_keyword":{
                  "filter":"lowercase",
                  "tokenizer":"keyword"
               }
            }
         }
      },
      "index.mapping.total_fields.limit":2000
   }
}

With proper indentation and presentation (even using the forum tools like </> !!), it looks as though you might be hitting "index" twice: in both "index": { --and-- index.mapping.total_fields.limit.
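If that is the culprit, a sketch of the corrected file would keep everything under a single "index" object and close the braces, with the same content as yours otherwise:

{
  "order": -1,
  "index_patterns": ["windowsserverindex_*"],
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "filter": "lowercase",
            "tokenizer": "keyword"
          }
        }
      },
      "mapping": {
        "total_fields": { "limit": 2000 }
      }
    }
  }
}

As an aside, Elasticsearch normalizes settings values to strings when it returns them, so the quotes around "2000" in your GET output are expected and not the problem.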


@Chase - Hopefully there was a solution in here for you; please mark it as the solution for anyone who searches for this in the future! :slight_smile:
