To start with, check your algorithm again. Most likely you have a good number of duplicates: even if the data does not repeat 100%, some pieces will repeat exactly. I do not believe that 100% of it is truly unique.
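A quick way to check how much actually repeats is to hash every piece and count exact matches. This is only a rough sketch: the function name `duplicate_ratio` and the sample data are made up for illustration, and I am assuming your pieces are available as byte strings.

```python
import hashlib
from collections import Counter


def duplicate_ratio(pieces):
    """Return the fraction of pieces that exactly repeat an earlier piece."""
    # Hash each piece so that large pieces are compared by a fixed-size digest.
    counts = Counter(hashlib.sha256(p).digest() for p in pieces)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every occurrence beyond the first one of each digest is a repeat.
    repeats = total - len(counts)
    return repeats / total


# Hypothetical sample: "header" occurs three times, so two of five pieces are repeats.
sample = [b"header", b"body-1", b"header", b"body-2", b"header"]
ratio = duplicate_ratio(sample)
```

If the ratio comes out noticeably above zero on a real sample, deduplication is worth the effort.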
The first thing to do is move the common pieces of information into a separate field. Are you familiar with the tree data structure? The common piece sits at the top of the tree. Below it, each node keeps links to nodes holding other, unique data, and so on. In principle you can have as many levels of nesting as you like.
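Roughly, the tree idea could look like the sketch below. The names `Node`, `insert` and `count_nodes` are mine, and I am assuming each record can be split into an ordered list of parts with the shared part first; adapt that to however your data really splits.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    data: str
    # Children are keyed by their data, so an identical piece is stored only once.
    children: dict = field(default_factory=dict)

    def child(self, data):
        """Return the child holding `data`, creating it if it does not exist yet."""
        if data not in self.children:
            self.children[data] = Node(data)
        return self.children[data]


def insert(root, parts):
    """Walk down from the root, reusing existing nodes for parts already stored."""
    node = root
    for part in parts:
        node = node.child(part)
    return node


def count_nodes(node):
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


# The common piece lives at the top; the unique pieces hang below it.
root = Node("common")
insert(root, ["a", "x"])
insert(root, ["a", "y"])  # reuses the existing "a" node
insert(root, ["b", "x"])
total_nodes = count_nodes(root)
```

Three records with two parts each end up as six nodes including the root, because the shared "a" is kept once.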
As for where to store it: nothing better than hard drives has been invented for this yet. In your case it would be wiser to use hybrid storage, SATA + SSD + RAM. The data that is accessed most often lives in Redis (i.e. in RAM), data that is used fairly often goes on SSD, and what is rarely needed goes on SATA. You will have to write the frequency-counting algorithm yourself, deciding for yourself what counts as often, not very often, and rarely.
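The frequency counting can start out very simple. Below is a minimal sketch, not a finished solution: the thresholds `HOT_THRESHOLD` and `WARM_THRESHOLD`, the class `AccessTracker` and the key names are all assumptions of mine, and a real version would also need to age the counters over time and actually move the data between Redis, SSD and SATA.

```python
from collections import Counter

# Assumed cut-offs; tune them against your own access statistics.
HOT_THRESHOLD = 100   # at least this many hits: keep in RAM (Redis)
WARM_THRESHOLD = 10   # at least this many hits: keep on SSD


class AccessTracker:
    """Counts accesses per key and suggests a storage tier for each key."""

    def __init__(self):
        self.hits = Counter()

    def record(self, key):
        """Register one access to `key`."""
        self.hits[key] += 1

    def tier(self, key):
        """Return "ram", "ssd" or "sata" depending on how often `key` was accessed."""
        n = self.hits[key]  # a Counter returns 0 for keys it has never seen
        if n >= HOT_THRESHOLD:
            return "ram"
        if n >= WARM_THRESHOLD:
            return "ssd"
        return "sata"


tracker = AccessTracker()
for _ in range(150):
    tracker.record("user:1")
for _ in range(20):
    tracker.record("user:2")
tracker.record("user:3")

tiers = {key: tracker.tier(key) for key in ("user:1", "user:2", "user:3")}
```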
A hosting provider can give you this: DigitalOcean has plans with hybrid SATA + SSD drives, take a look at it. I would also recommend looking into Docker. In your case, I think, you will need 10+ machines for storage, and Docker will make it easier to manage their configuration.
As for retrieval time, search and so on: google "storage trees", "tree search", etc. Stay away from complete graphs, and try to avoid even cycles. I will say more: DON'T build a complete graph or a cycle in the graph at this volume, you will just shoot yourself in the foot.
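To show why a strict tree is easier to live with, here is a small sketch in which every node is allowed exactly one parent, so a cycle cannot appear and the search needs no "visited" bookkeeping. The class `Tree` and its methods are hypothetical names, and plain in-memory dictionaries stand in for whatever storage you end up using.

```python
class Tree:
    """A tree where each node has exactly one parent, so cycles are impossible."""

    def __init__(self, root):
        self.root = root
        self.parent = {root: None}
        self.children = {root: []}

    def add(self, parent, child):
        """Attach `child` under `parent`; refuse anything that would break the tree."""
        if parent not in self.parent:
            raise KeyError(f"unknown parent: {parent!r}")
        if child in self.parent:
            # Re-attaching an existing node would create a cycle or a second parent.
            raise ValueError(f"{child!r} is already in the tree")
        self.parent[child] = parent
        self.children[child] = []
        self.children[parent].append(child)

    def find(self, target):
        """Depth-first search; return the path from the root to `target`, or None."""
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node == target:
                # Rebuild the path by following parent links back to the root.
                path = []
                while node is not None:
                    path.append(node)
                    node = self.parent[node]
                return path[::-1]
            stack.extend(self.children[node])
        return None


tree = Tree("root")
tree.add("root", "a")
tree.add("root", "b")
tree.add("a", "c")
path_to_c = tree.find("c")
```

Trying `tree.add("c", "root")` raises `ValueError`, which is exactly the kind of edge that would turn the tree into a graph with a cycle.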