How to store the "connectivity" between users?


There is an algorithm that calculates the index (number), characterizes the relationship of the user A to the user B.
The algorithm is transitive, symmetric, i.e. A->B = B->A.

With the growing number of users will need to store the results and count them (for example, if you change some of the settings user A needs to convert his attitude to everyone else).

The question is how to efficiently store, quickly update and read this data? SQL, NoSQL? What engine/base better? After all, only 1,000 users will have 1 000 000 records.

Thank you!
September 19th 19 at 12:15
5 answers
September 19th 19 at 12:17
  • Transitivity, in my understanding means
    A->B + B->C = A->C
  • Among thousands of users it is unlikely that all connected.

In such assumptions we are talking about connectedness of the graph. Hierarchical data structure(tree, graph) foreign to the relational algebra of SQL. Known solutions Adjacency list and Nested sets. Adjacency list requires WITH RECURSIVE. Nested sets involves a lot of resampling operations(especially INSERT).
You need a data model it would be more convenient to implement in hierarchical DB (e.g. simple hierarchical key value Redis or levelDB will do) or graph DB or certificated DB (for example just in the XML, which in essence is hierarchical)
Thanks, will read! - alycia_Haag commented on September 19th 19 at 12:20
September 19th 19 at 12:19
You just said that this parameter has two dependencies. NoSQL is not a choice.

After all, only 1,000 users will have 1 000 000 records.
First, will not all these users will be connected.
Secondly, it is a small number.
Do not store and calculate as needed - not an option? - alycia_Haag commented on September 19th 19 at 12:22
calculation processes a large number of parameters and are very expensive. Plus requires fast output, sort, filter by various conditions. - tess.Simonis3 commented on September 19th 19 at 12:25
September 19th 19 at 12:21
If the algorithm assigns a metric, and are willing to sacrifice accuracy, it is possible for all users to display in O(log n)-dimensional space. In the worst case the distance deteriorate to O(log n), but you can store only O(n log n) bits of information.
September 19th 19 at 12:23
You just need to keep this value the same for all? If only for those who have it less/more of a threshold, so it's O(N) or O(NlogN) that will fit in regular SQL and not O(N^2), which is really bad.
A threshold is, but data is expected to be very a lot :) Not less than (N^2) / 2 - alycia_Haag commented on September 19th 19 at 12:26
September 19th 19 at 12:25
I think neo4j is exactly what You need

Find more questions by tags Data storageDatabasesAlgorithms