There is evidence of moments of entry of users within the app during the year. Two columns:

I want to find patterns, recurring around large groups of users.

Assume that there is a window of 1 week any similar images will be plotted (once a day in the beginning of the week, 5 times this weekend) and in the window of the month (closer to the salary of attenuation, a sharp jump after).

Which side to approach the study of the data in the key?

`user_id`

and `timestamp`

.I want to find patterns, recurring around large groups of users.

Assume that there is a window of 1 week any similar images will be plotted (once a day in the beginning of the week, 5 times this weekend) and in the window of the month (closer to the salary of attenuation, a sharp jump after).

Which side to approach the study of the data in the key?

asked June 10th 19 at 14:43

2 answers

answered on June 10th 19 at 14:45

One of the options is a time - series analysis. Python has a good library from Facebook - Prophet. The library works well out of the box, has built-in tools for visualization.

On Habre it is possible to find an article about the Prophet. I was trying to use the library for prediction of fluctuations of the currency pair - example on github-e.

On Habre it is possible to find an article about the Prophet. I was trying to use the library for prediction of fluctuations of the currency pair - example on github-e.

answered on June 10th 19 at 14:47

Overall sounds like a clustering problem. There are different methods you can choose, the one thing as a metric set.

Vskidku, you can try to determine the distance between users Alex and Bob in the following way. For each call Bob looking for the nearest sunset Petit and see how they vary in time. Get a set of the times t_1, t_2, ... t_В. Do the same for Peter. Further consider the mean value as the distance. The difference can be considered on a looped period (the week between Sunday and Monday distance -- 1 one day).

Distance between Bob and Bob will be 0. The triangle inequality like is performed. If Bob and Peter came at the same time, the distance between them is small.

P. S. Distance can be computed in line with the number of visits using linear Marj sorted arrays.

Vskidku, you can try to determine the distance between users Alex and Bob in the following way. For each call Bob looking for the nearest sunset Petit and see how they vary in time. Get a set of the times t_1, t_2, ... t_В. Do the same for Peter. Further consider the mean value as the distance. The difference can be considered on a looped period (the week between Sunday and Monday distance -- 1 one day).

Distance between Bob and Bob will be 0. The triangle inequality like is performed. If Bob and Peter came at the same time, the distance between them is small.

P. S. Distance can be computed in line with the number of visits using linear Marj sorted arrays.

with hundred thousand users every pair to check well this is a game.. Like, once go according, and vzhuh!, surfaced resonance patterns. commented on June 10th 19 at 14:50

Not necessarily a distance between each pair. Can any DBSCAN to take: https://en.wikipedia.org/wiki/DBSCAN

The question is, as the distance to be determined. commented on June 10th 19 at 14:53

The question is, as the distance to be determined. commented on June 10th 19 at 14:53

Find more questions by tags ClusteringMachine learning

But in General, abstracting from the library, a mate. the method here need to be applied?

Choose the window width and correlate all id with all? – long-difficult. - Lorena commented on June 10th 19 at 14:48

What is the purpose of the study is to predict values, search patterns, search anomaly? - reinhold.Rau commented on June 10th 19 at 14:51