Good day.
There is a large amount of data: ~40-45 GB per day, 25-30K rows per second, written continuously.
The total volume may reach 50-70 TB.
Queries mostly have the form timestamp > date_1 && timestamp < date_2 && data == value*
What DB would you recommend, and what response time can I expect?
A very nice bonus would be if the database can compress the data.
I'll add a second option: a compression scheme with so-called free seek through the file.
Write one file per hour (for example): new hour, new file. Then compress each file.
For the timestamp you can get away with 2 bytes (as an offset within the hour). Also see whether value can be shrunk.
Even at 16 bytes per record, a modern HDD (150 MB/s) can write ~9M records per second (against your 30K).
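The record layout above can be sketched roughly like this. The field sizes are an assumption for illustration (2-byte second-within-the-hour plus an 8-byte value, padded to 16 bytes); the point is just the arithmetic of record size versus disk throughput:

```python
import struct

# Assumed 16-byte record: 2-byte unsigned offset within the hour
# (0..3599 seconds fits in a short), an 8-byte float value, 6 pad bytes.
RECORD = struct.Struct("<Hd6x")

def pack_record(second_in_hour: int, value: float) -> bytes:
    """Pack one fixed-size record for an hourly file."""
    return RECORD.pack(second_in_hour, value)

print(RECORD.size)  # 16 bytes per record

# Throughput estimate from above: ~150 MB/s divided by 16 bytes/record
# gives roughly 9-10 million records per second.
records_per_sec = 150 * 1024 * 1024 // 16
print(records_per_sec)
```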
All that's left is to write a small tool that fetches the data according to your requirements.
The files can be stored on disk or in a file DB like GridFS, which will distribute them across the cluster.
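Such a retrieval tool could be sketched like this. The file naming and the record layout (2-byte second-within-the-hour plus an 8-byte value) are assumptions; the "free seek" here is coarse-grained: hour-granularity files mean a range query only decompresses the hours it overlaps:

```python
import gzip
import os
import struct
from datetime import datetime, timedelta

# Assumed layout: 2-byte second-in-hour, 8-byte value, 6 pad bytes.
RECORD = struct.Struct("<Hd6x")

def hour_file(base_dir: str, ts: datetime) -> str:
    # One compressed file per hour, e.g. 2019-07-09_13.bin.gz (naming is an assumption).
    return os.path.join(base_dir, ts.strftime("%Y-%m-%d_%H") + ".bin.gz")

def query(base_dir, date_1, date_2, predicate):
    """Yield (timestamp, value) pairs with date_1 < ts < date_2 matching predicate."""
    hour = date_1.replace(minute=0, second=0, microsecond=0)
    while hour < date_2:
        path = hour_file(base_dir, hour)
        if os.path.exists(path):
            # Decompress only the hours the query range overlaps.
            with gzip.open(path, "rb") as f:
                data = f.read()
            for off in range(0, len(data), RECORD.size):
                sec, value = RECORD.unpack_from(data, off)
                ts = hour + timedelta(seconds=sec)
                if date_1 < ts < date_2 and predicate(value):
                    yield ts, value
        hour += timedelta(hours=1)
```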
Yandex Elliptics (currently compiles easily only under Ubuntu 14.04 and the corresponding generation of Debian) is not a database but distributed DHT storage. Still, it knows how to scale, replicate, and recover. Your job will only be to connect new servers (or new drives on a server) to it.
Aida.Windler answered on July 9th 19 at 13:23
You need to deploy a Hadoop cluster with the required architecture, design the delivery of messages into the database through a broker, and keep all your terabytes gzipped in logs, or cluster them in Hive. In short, this question is not for Toster: you need a freelance platform to find devops people with experience in exactly these issues.