How do I efficiently write a huge amount of data to a file?

Hi all. I'm an aspiring Java developer and I need advice from specialists. The task: a huge number of files is being processed (searching, indexing, etc.). Along the way I need to write 10^6 - 20^6 rows (about the old files) to a file, and that is the part I'm trying to solve. Please advise how to do this as fast as possible.
I would be very grateful for any help).
June 3rd 19 at 20:56
3 answers
June 3rd 19 at 20:58
Do a test run and see which part of the system is the bottleneck (a quick timing sketch is below).
If it's stuck on the write medium, compress the data.
If it's stuck on processing the results, get a more powerful CPU.
If it's stuck on gathering piles of information about the files, speed up the file system.
And so on.
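A minimal sketch of such a test run, assuming the rows are plain text (the file name and row contents here are placeholders): time the write step in isolation to see whether the disk is really the limiting factor.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteBenchmark {
    public static void main(String[] args) throws IOException {
        Path out = Paths.get("index-output.txt"); // placeholder output file

        long start = System.nanoTime();
        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
            // Write a million synthetic rows to measure pure write throughput.
            for (int i = 0; i < 1_000_000; i++) {
                writer.write("row " + i + ": some file metadata");
                writer.newLine();
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Wrote 1M rows in " + elapsedMs + " ms");
    }
}
```

If this step is much faster than your full program, the bottleneck is elsewhere (gathering the file info, processing), not the write itself.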
Sorry, compressing data to speed things up - that's the first time I've come across the idea since 2400-baud modems. Is it actually used anywhere these days? - samantha commented on June 3rd 19 at 21:01
It is used :) but usually people just throw money at the problem instead - upgrade the storage or set up RAID 0. That's considered cooler.

Strange as it may seem, the most telling example is video and audio compression) rare is the medium that can record a raw 1080p60 stream, yet compressed it writes just fine even to cheap storage. - Sandra_Kautzer42 commented on June 3rd 19 at 21:04
Sorry, compressing data to speed things up - that's the first time I've come across the idea since 2400-baud modems. Is it actually used anywhere these days?

For example, on this very page: Content-Encoding: gzip.
If the disk really is the bottleneck (which is pretty damn likely), then compressing the repetitive rows (i.e. by several times) will certainly speed up the process.

And I strongly suspect that in Java this compression will cost the programmer one or two lines (wrapping the output stream in a compressing stream). - chelsea13 commented on June 3rd 19 at 21:07
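To illustrate that "one or two lines" claim, here is a minimal sketch using the standard java.util.zip.GZIPOutputStream; the file name and row contents are placeholders.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipWriteExample {
    public static void main(String[] args) throws IOException {
        // The compression "wrap" is the single GZIPOutputStream line;
        // everything else is the usual buffered text-writing boilerplate.
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(
                        new GZIPOutputStream(new FileOutputStream("rows.txt.gz")),
                        StandardCharsets.UTF_8))) {
            for (int i = 0; i < 1_000_000; i++) {
                writer.write("row " + i + ": some file metadata");
                writer.newLine();
            }
        }
    }
}
```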
Yes, very helpful, especially if the data is textual or sufficiently sparse.
I recommend looking at LZ4 - https://github.com/lz4/lz4-java or Snappy - https://github.com/xerial/snappy-java - Kasey.Cruickshank commented on June 3rd 19 at 21:10
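A sketch of the same idea with one of those libraries, assuming lz4-java (or snappy-java) is on the classpath; the class names come from the linked repositories, everything else is a placeholder.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

import net.jpountz.lz4.LZ4BlockOutputStream;   // lz4-java
// import org.xerial.snappy.SnappyOutputStream; // snappy-java alternative

public class Lz4WriteExample {
    public static void main(String[] args) throws IOException {
        // Same pattern as with gzip: only the wrapping stream changes.
        // For Snappy, swap LZ4BlockOutputStream for SnappyOutputStream.
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(
                        new LZ4BlockOutputStream(new FileOutputStream("rows.lz4")),
                        StandardCharsets.UTF_8))) {
            for (int i = 0; i < 1_000_000; i++) {
                writer.write("row " + i + ": some file metadata");
                writer.newLine();
            }
        }
    }
}
```

Both libraries trade a little compression ratio for much lower CPU cost than gzip, which is the point when the disk, not the processor, is the bottleneck.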
June 3rd 19 at 21:00
Buffer your writes. Writing to disk byte by byte is wildly slow.
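A minimal sketch of what that means in practice, assuming plain text rows; the file name and the 1 MB buffer size are just illustrative.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BufferedWriteExample {
    public static void main(String[] args) throws IOException {
        // The 1 MB buffer collects rows in memory and flushes them to disk
        // in large chunks instead of issuing a tiny write per row.
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream("rows.txt"), 1 << 20)) {
            for (int i = 0; i < 1_000_000; i++) {
                byte[] row = ("row " + i + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(row);
            }
        }
    }
}
```

For character data, a BufferedWriter around a FileWriter gives the same effect.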
June 3rd 19 at 21:02
Also, you can use distributed processing... look at things like Hadoop or Spark, Hazelcast, Ignite.
And, as was written in the comments to the other answer, apply LZ4 or Snappy compression.
