Hi there, apologies in advance if any of these questions have already been answered or are considered common knowledge, but big data is fairly new to me. A few questions have been tormenting me for quite a while, and I can’t seem to find concrete answers to them.
The situation is that I have roughly 4TB+ of data at rest in various formats (JSON, TXT, RTF, SQL, CSV, TSV, etc.) which I use daily for work in online investigations. The trouble is that a single ripgrep search through the larger items can take 15 to 20 minutes, which severely hinders productivity; hence the need to index it all for much faster searches, and how I stumbled from ELK (what a headache that was) to Graylog.
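To give a sense of the current workflow, a typical search is just a literal, case-insensitive match run over a directory of dumps, something like this (the path and search term are made up for illustration):

```
rg -iF "person@example.com" /data/dumps/2021_exports/
```

On the larger files, that is where the 15 to 20 minute waits come from.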
My questions are (assuming the data is indexed via Graylog):
- For data at rest, are SSDs still significantly quicker for performing searches?
- Why is it not recommended to use NFS/SMB for ingesting data?
- Is Grok the best way to “parse” databases without prior cleaning? (A rough example of what I mean is below.)
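For context on the Grok question, this is roughly what I have in mind: a pattern applied to raw lines through an extractor or pipeline rule. The sample line and field names are just placeholders, not my actual schema:

```
# one raw line from a hypothetical CSV export
2023-01-15 08:30:00,jdoe,203.0.113.7,"some free text"

# the sort of Grok pattern I'd point at it
%{TIMESTAMP_ISO8601:record_time},%{USERNAME:username},%{IP:src_ip},%{QUOTEDSTRING:notes}
```

A lot of my sources don’t have anywhere near that consistent a line structure, which is why I’m wondering whether Grok is realistic without cleaning the data first.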
I already have 32TB of enterprise spinning disks and would rather not shell out another few thousand for SSDs unless the performance is VASTLY better.
Any help is much appreciated.