1) Each user has friends (in the extreme case, every other user is a friend)
2) Each user can join an unlimited number of groups
Users can generate content themselves (a personal "wall", as on VKontakte or Facebook), and content can also be generated inside components of the project.
I want to build a consolidated feed of updates — the user's own statuses/thoughts, those of their friends, internal events, and group events — and show them as a single list with paging.
The backend runs on a MySQL database, but it seems obvious to me that the "normal" relational approach is not applicable to the general case because of the constraints involved: if a user has, say, 2500+ friends, a single WHERE clause grows to monstrous proportions.
So I'm looking toward NoSQL solutions, but first I just want to understand the algorithm behind such a feed, and only then choose tools.
Should I denormalize and maintain a separate feed per user (though in that case, leaving a group or unfriending someone could make updating that list take a very long time)?
Maybe there are articles on this, or recognized implementation patterns?
antonette.Gislas answered on October 8th 19 at 01:27
On Facebook and VKontakte the feed data is generated by a daemon written in a compiled language, which also caches a lot of information in memory. Can you manage that? If you don't know C, you can try writing it in Node or Java, or at least PHP — if you plan for scale, it will hold up for a while.
The algorithm works like this: 1) select the IDs of the user's friends; 2) select the latest events for each of them; 3) filter out events protected by privacy settings; 4) merge the lists, sort by date, and send the result to the browser.
In MySQL the load grows with the number of friends and will quickly overwhelm the server. A simple calculation: if the average user generates 10 events per day and you have even a few thousand users, the events table quickly grows to millions of rows. And constructions like JOIN, firstly, load the server, and secondly, don't shard well.
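The four steps above can be sketched in memory. This is a toy illustration with hypothetical data structures (a real daemon would read per-friend event lists from a cache or the database); the per-friend lists are kept newest-first so they can be merged lazily:

```python
import heapq
from itertools import islice

# Hypothetical in-memory store: author_id -> events (ts, author_id, text),
# each list already sorted newest-first.
events_by_user = {
    1: [(300, 1, "status A"), (100, 1, "status B")],
    2: [(250, 2, "joined a group"), (50, 2, "photo")],
    3: [(400, 3, "private note")],
}
friends = {42: [1, 2, 3]}
blocked = {(42, 3)}  # viewer 42 may not see user 3's events (privacy)

def feed(viewer, page=0, per_page=3):
    # 1) friend ids  2) each friend's recent events
    streams = [events_by_user.get(f, []) for f in friends[viewer]]
    # 4) merge the already-sorted lists, newest first
    merged = heapq.merge(*streams, key=lambda e: e[0], reverse=True)
    # 3) drop events the viewer is not allowed to see
    visible = (e for e in merged if (viewer, e[1]) not in blocked)
    # paging: skip ahead and take one page
    return list(islice(visible, page * per_page, (page + 1) * per_page))
```

Because every per-friend list is already sorted, `heapq.merge` produces the combined feed without re-sorting everything, which is what makes the daemon-plus-cache approach cheap at read time.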
aniya.Kertzmann65 answered on October 8th 19 at 01:29
There is such a thing as a cache. Caches come in different types; it's not just query results stored in a file. A trivial example is karma on Habr: I'm fairly sure the database stores every individual plus and minus — who poked whom and how many times — but we only see the total (+4 − 1 = 3). So cache the data you need in a separate place. Store the relationships separately, specifically for this feed; don't be afraid to duplicate data — duplicates sometimes help performance. To start, let's make an activity table:
id, user_id, event_desc, event_date
Insert a row here on every activity, and this impossibly simple table lets a single query fetch a user's activity in primitive form. From there, develop the idea however is convenient =)
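A sketch of that table and the one-query fetch. SQLite is used here for brevity (the MySQL DDL is analogous); the column names come from the answer above, the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        event_desc TEXT,
        event_date TEXT
    )
""")

# Write a row on every user action, duplicating whatever the source tables hold.
rows = [
    (1, "posted a status", "2012-05-01 10:00"),
    (2, "joined a group", "2012-05-01 10:05"),
    (1, "added a friend", "2012-05-01 10:10"),
]
conn.executemany(
    "INSERT INTO activity (user_id, event_desc, event_date) VALUES (?, ?, ?)",
    rows,
)

def user_activity(user_id, limit=10, offset=0):
    # One query returns a user's activity, newest first, with paging.
    cur = conn.execute(
        "SELECT event_desc, event_date FROM activity "
        "WHERE user_id = ? ORDER BY event_date DESC LIMIT ? OFFSET ?",
        (user_id, limit, offset),
    )
    return cur.fetchall()
```

An index on `(user_id, event_date)` would keep this query fast as the table grows.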
odessa.Yundt52 answered on October 8th 19 at 01:31
To be honest, I don't see the difficulty in the task. It doesn't look complex.
In pseudocode it's something like:
SELECT news.* FROM news, users_users AS friends
WHERE friends.left_user = :currentUser
  AND friends.right_user = news.user_id
UNION
SELECT news.* FROM news, groups_users
WHERE groups_users.user_id = :currentUser
  AND groups_users.group_id = news.group_id
Or is there some condition I haven't taken into account?
Keara_Hessel answered on October 8th 19 at 01:33
Here is another case where an RDBMS is not the best fit and you should look at something like MongoDB/CouchDB, or Neo4j if the data is highly connected.
First build an array of all the ObjectIds the user is associated with (one query), then fetch all events whose author is in that array. Alternatively, use DBRef to find events that link to the current user. The first option would be faster, but takes more code.
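The two-step pattern can be sketched in plain Python; with pymongo the second step would be a single `find()` with an `$in` filter. The collection layout and field names (`author_id`, `text`) are hypothetical:

```python
# Step 1: one query collects every id the user is connected to
# (friends plus groups); here it is just a hardcoded result.
connected_ids = [2, 3, "group:7"]

# Step 2: fetch events whose author is in that array.
# The equivalent MongoDB filter document would be:
mongo_filter = {"author_id": {"$in": connected_ids}}

events = [
    {"author_id": 2, "text": "hi"},
    {"author_id": 5, "text": "not visible"},
    {"author_id": "group:7", "text": "group event"},
]

def matches(doc, flt):
    # Minimal evaluator for the single $in operator used above.
    field, cond = next(iter(flt.items()))
    return doc.get(field) in cond["$in"]

feed = [d for d in events if matches(d, mongo_filter)]
```

With a real MongoDB instance the last line would instead be `db.events.find(mongo_filter).sort("date", -1).skip(page * n).limit(n)` for paging.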
koby_Zboncak82 answered on October 8th 19 at 01:35
Facebook and Twitter use denormalization, and feeds are "cleaned" after leaving a group / unfriending only as resources allow. Bluntly: whatever feed you already received is what you received. Additionally, the FB feed seems to be time-limited, which is reasonable. Twitter also had a problem with users whose feeds were huge — 20k+ subscriptions; at one point such users simply got banned. In general, store data the way you will fetch it; there should be few options at read time. Normalization is not for large volumes. And denormalized MySQL doesn't scale to large volumes either, unless you split instances (say, at most 100 users with all their feed data per database).
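A toy sketch of that write-time denormalization ("store it the way you will fetch it"), with the lazy cleanup this answer describes; all names and structures here are illustrative, not anyone's real implementation:

```python
from collections import defaultdict

followers = defaultdict(set)   # author -> set of follower ids
feeds = defaultdict(list)      # follower -> list of (ts, author, text)

def follow(user, author):
    followers[author].add(user)

def post(author, ts, text):
    # Fan-out on write: copy the event into every follower's feed.
    for user in followers[author]:
        feeds[user].append((ts, author, text))

def unfollow(user, author):
    followers[author].discard(user)
    # Cleanup of already-delivered entries may be deferred:
    # "whatever feed you already received is what you received".

def read_feed(user, limit=20):
    # Reading is a plain slice of a precomputed list; stale entries from
    # unfollowed authors linger until a background job prunes them.
    return sorted(feeds[user], reverse=True)[:limit]
```

The trade-off is exactly the one in the question: writes fan out to every follower (expensive for a popular author), but reads become a cheap sequential fetch.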
murray.Cummera answered on October 8th 19 at 01:37
Look toward Redis sorted sets. One set per person's feed, with the event ID (which also encodes time) as the score. Redis can handle around 200,000 queries per second, and since the set is kept sorted, you can page with the usual limit/offset or with queries of the form ID < x.
The event object itself is a serialized array (or whatever is convenient for you), fetched by ID from memcache, for example. Essentially, as a previous answer said, use denormalization.
The described approach runs successfully on fairly high-traffic sites and easily handles the case where a person with 30,000 followers creates a post.
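With redis-py the two operations would be `r.zadd(feed_key, {event_id: event_id})` on write and `r.zrevrangebyscore(feed_key, max_id, "-inf", start=0, num=20)` on read. Below is an in-memory stand-in for the same pair of operations (not Redis itself), so the paging pattern can be seen end to end:

```python
import bisect

class FeedSortedSet:
    """In-memory stand-in for one per-user Redis sorted set."""
    def __init__(self):
        self._scores = []   # ascending scores (event ids / timestamps)
        self._members = []  # parallel list of event ids

    def zadd(self, score, member):
        i = bisect.bisect_left(self._scores, score)
        self._scores.insert(i, score)
        self._members.insert(i, member)

    def zrevrangebyscore(self, max_score, offset=0, num=20):
        # Newest first, only events with score < max_score ("ID < x").
        i = bisect.bisect_left(self._scores, max_score)
        newest_first = self._members[:i][::-1]
        return newest_first[offset:offset + num]

feed = FeedSortedSet()
for event_id in (10, 30, 20, 40):
    feed.zadd(event_id, event_id)

first_page = feed.zrevrangebyscore(max_score=100, num=2)  # two newest events
next_page = feed.zrevrangebyscore(max_score=30, num=2)    # events older than id 30
```

Paging by "ID < x" rather than offset is what keeps reads stable while new events keep arriving at the head of the feed.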