Deleted fields still appear in search

Hi,
I had an issue with an input that led to the creation of thousands of fields. At first I made the mistake of increasing the number of allowed fields; I then realized my error, since Graylog has now become really slow.
So I did my homework: I removed the culprit messages from the indexes and reindexed everything, and now the number of fields is under control again. But only from Elasticsearch’s perspective, since Graylog is still showing me all of them.
How is Graylog getting this list of fields? Is it refreshed every now and then? How can I force Graylog to “rescan” the Elasticsearch indexes?

Thanks,
Matteo

I don’t think Graylog’s MongoDB holds on to fields for display. Could it be that the extra fields are showing up because they were in a separate stream/index?

Hi,
I’ve checked all the indexes with a for loop:

for i in $(curl -s -X GET "localhost:9200/_cat/indices/?v" | sed 's/  / /g' | cut -d ' ' -f 4); do
curl -s -X GET "localhost:9200/$i/_mapping?pretty";
done

and now they are all clean.

Nevertheless, the search page still shows me all the old fields. It’s really slowing the whole web interface… any idea?

Thanks,
Matteo

Hello,

These fields should drop out depending on your log retention.
Have you tried to manually rotate your indices, or to recalculate your indices?
If you are using the default index set and have older indices, the fields are coming from there. This is because you’re probably using a dynamic index template on your Default index set.

Hi,
I’m using 4 Index sets (Default + 3) and I have created different streams with different rules to use different index sets for different data. Using the web interface I can see that the “unwanted” fields appear only when selecting one specific stream called “3m” (and when selecting “All Messages”).
This specific stream is sending data to a specific index set made of 4 indices, and each one has been cleaned of the unwanted messages and reindexed (created new temp index, reindexed, deleted old index, created it again, reindexed, deleted temp index).
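For future readers, the round trip described above can be sketched as a list of Elasticsearch REST calls. This is only an illustration of the sequence, not the exact commands that were run: the `reindex_plan` helper and the `_tmp` suffix are made up for the example, and the step that removes the unwanted messages is not shown.

```python
def reindex_plan(index, temp_suffix="_tmp"):
    """Return the REST calls for a round-trip reindex through a temporary index.

    Each entry is (HTTP method, path, JSON body or None). The temp_suffix is a
    hypothetical naming choice for this sketch.
    """
    temp = index + temp_suffix
    return [
        ("PUT", f"/{temp}", None),                                   # create temp index
        ("POST", "/_reindex", {"source": {"index": index},
                               "dest": {"index": temp}}),            # copy old -> temp
        ("DELETE", f"/{index}", None),                               # delete old index
        ("PUT", f"/{index}", None),                                  # create it again
        ("POST", "/_reindex", {"source": {"index": temp},
                               "dest": {"index": index}}),           # copy temp -> new
        ("DELETE", f"/{temp}", None),                                # delete temp index
    ]


if __name__ == "__main__":
    # Print the plan for one of the indices of the "3m" index set
    for method, path, body in reindex_plan("3m_1"):
        print(method, path, body or "")
```

Printing the plan first, rather than sending the requests right away, makes it easy to review the order before anything gets deleted.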

This is the status now: no fields containing “snmp” in their name in these indexes:

~ for i in $(curl -s -X GET "localhost:9200/_cat/indices/?v" | sed 's/  / /g' | cut -d ' ' -f 4); do 
echo $i; 
curl -s -X GET "localhost:9200/$i/_mapping?pretty" | grep 'snmp' | cut -d '"' -f 2 | wc -l; 
done | \
grep '3m' -A1 

3m_1
10
--
3m_2
15
--
3m_4
0
--
3m_3
10

Nevertheless, the web interface still shows me hundreds of “snmp” fields when selecting the stream that sends the data to the “3m” index set.
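A note for anyone repeating this check: grep counts every line of the mapping output that contains “snmp”, not only field names. A more precise count can be obtained by walking the mapping JSON itself. The following is a minimal sketch, assuming the mapping was already fetched and parsed (for example from “localhost:9200/indexName/_mapping”); the `collect_field_names` helper and the sample mapping are made up for illustration.

```python
def collect_field_names(node, prefix=""):
    """Recursively collect dotted field names found under any 'properties' key."""
    names = []
    if not isinstance(node, dict):
        return names
    for key, value in node.items():
        if key == "properties" and isinstance(value, dict):
            for field, spec in value.items():
                full_name = prefix + field
                names.append(full_name)
                # Object fields carry their own nested 'properties'
                names.extend(collect_field_names(spec, full_name + "."))
        else:
            # Index name, 'mappings', optional type level, and so on
            names.extend(collect_field_names(value, prefix))
    return names


if __name__ == "__main__":
    # Hypothetical mapping, shaped like a parsed _mapping response
    sample = {
        "3m_1": {"mappings": {"properties": {
            "source": {"type": "keyword"},
            "snmp_1_3_6": {"type": "keyword"},
        }}}
    }
    snmp_fields = [n for n in collect_field_names(sample) if n.startswith("snmp")]
    print(len(snmp_fields))
```

The walk does not care whether a mapping type level sits between “mappings” and “properties”, so it should cope with both older and newer Elasticsearch response layouts.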

So, if the individual indexes that are part of the index set have no mapping containing “snmp” in the name, where is Graylog getting these fields from if “localhost:9200/indexName/_mapping” shows none?
I’m really clueless now.

Thanks,
Matteo

Hi,
I’ve discovered something: the fields are present in the MongoDB collection “index_field_types”, with a reference to the index where they were “seen”. Any idea how to get this cleaned? I’m thinking of deleting them from there, but I don’t want to screw everything up…

Thanks,
Matteo

There is the following in the Graylog server.conf file, but it suggests that the field maintenance should have happened already… have you modified the timing on this? One would hope it is not just additive.

# Time interval to trigger a full refresh of the index field types for all indexes. This will query ES for all indexes
# and populate any missing field type information to the database.
# Default: 5m
#index_field_type_periodical_full_refresh_interval = 5m

Hi,
mine was set to 1h and commented out. I’ve uncommented it and reloaded the graylog-service, but after more than 1 hour nothing has happened yet.

Instead, I’ve identified the number of fields that are clogging my server:

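import requests
from requests.auth import HTTPBasicAuth

# host (base URL ending in '/'), username and password must be defined beforehand
query = host + 'api/views/fields'
# verify=False skips TLS certificate verification
response = requests.get(query, auth=HTTPBasicAuth(username, password), verify=False).json()

# Count the fields whose name starts with 'snmp'
num = sum(1 for field in response if field['name'].startswith('snmp'))
print(num)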

It’s 5716. It looks like the MongoDB collection only ever gets filled and is never cleared.

Regards,
Matteo

Hi,
I’ve decided to move on and delete all these unwanted fields from MongoDB and… so far so good. Graylog GUI has become responsive again, let’s hope this will not create an issue in the long run.

Regards,
Matteo

Could you post up your process for future searchers? I haven’t played much in Mongo so it would be interesting to see… :smiley:

@matteo.comisso

Nice, glad you resolved your issue :slight_smile:

Hi,
sure I can, even if I’m not really proud of it, it has been a “quick and dirty” one.

To get all the fields used in each index:

root@graylog01:~# mongo
> use graylog;
> db.getCollection("index_field_types").find({});

This gave me many JSON dictionaries, one per index, like:

{ "_id" : ObjectId(...), "index_set_id" : "...", "index_name" : "index_0", "fields" : []}
{ "_id" : ObjectId(...), "index_set_id" : "...", "index_name" : "index_1", "fields" : []}
{ "_id" : ObjectId(...), "index_set_id" : "...", "index_name" : "index_2", "fields" : []}
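To see how many unwanted fields each of these documents holds, something like the following could be used. It is only a sketch that works on documents shaped like the ones above, with a "field_name" key in each entry of "fields"; the `count_prefixed_fields` helper and the sample data are made up for illustration.

```python
from collections import Counter


def count_prefixed_fields(documents, prefix):
    """Count, per index_name, the field entries whose field_name starts with prefix."""
    counts = Counter()
    for doc in documents:
        for field in doc.get("fields", []):
            if field.get("field_name", "").startswith(prefix):
                counts[doc["index_name"]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical documents in the shape of the index_field_types collection
    docs = [
        {"index_name": "index_0", "fields": [{"field_name": "snmp_1"},
                                             {"field_name": "source"}]},
        {"index_name": "index_1", "fields": []},
    ]
    print(count_prefixed_fields(docs, "snmp"))
```

Indices without any matching field are simply left out of the result.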

I then put this output into a script (transformed into a list of dicts) and generated the Mongo commands:

# Stub so that the pasted mongo shell output (which uses ObjectId(...)) parses as Python
def ObjectId(string):
    return string

data = [
    { "_id" : ObjectId(...), "index_set_id" : "...", "index_name" : "index_0", "fields" : []},
    { "_id" : ObjectId(...), "index_set_id" : "...", "index_name" : "index_1", "fields" : []},
    { "_id" : ObjectId(...), "index_set_id" : "...", "index_name" : "index_2", "fields" : []}
]

# Track (index, field) pairs so that every index gets its own $pull for a given field
seen = set()
for index in data:
    for field in index['fields']:
        name = field['field_name']
        if name.startswith('snmp') and (index['index_name'], name) not in seen:
            seen.add((index['index_name'], name))
            print("db.index_field_types.update({'index_name' : '" + index['index_name'] + "'}, {$pull : { 'fields' : { 'field_name' : '" + name +  "' }}});")

Back to Mongo, I’ve used these commands to delete the unwanted fields, like:

> db.index_field_types.update({'index_name' : 'index_0'}, {$pull : { 'fields' : { 'field_name' : 'snmp_1_3_6_1_6_3_1_1_4_1_0' }}});

Again, not the most elegant but it worked!
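A possible shortcut, which is not what was actually run here: MongoDB’s $pull accepts a condition, so a single regex-based update might replace the thousands of per-field commands. The sketch below only builds and prints such a command; the `build_prefix_pull` helper is made up for illustration, and the result has not been tested against a Graylog database, so back up the collection first.

```python
import json
import re


def build_prefix_pull(prefix):
    """Build (filter, update) documents that pull every field whose name starts with prefix."""
    condition = {"field_name": {"$regex": "^" + re.escape(prefix)}}
    # Only touch documents that contain at least one matching field
    filter_doc = {"fields": {"$elemMatch": condition}}
    update_doc = {"$pull": {"fields": condition}}
    return filter_doc, update_doc


if __name__ == "__main__":
    filter_doc, update_doc = build_prefix_pull("snmp")
    # Print a mongo shell command instead of executing anything
    print("db.index_field_types.updateMany(%s, %s);"
          % (json.dumps(filter_doc), json.dumps(update_doc)))
```

Printing the command keeps the same workflow as above: review it, then paste it into the mongo shell.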

Regards,
Matteo
