There is a large archive of correspondence with clients. The task is to segment the dialogs automatically and group them by emotional tone and topic (all good, problem, sales, competitors, etc.). In other words, sort all the dialogs by the presence of certain keywords. How do I do this properly without losing much speed?
P.S. Or does it make sense to use a tree with Python big data tools?
Or should I read up on, for example, Elasticsearch or Sphinx?
Isaiah_Hauck answered on March 12th 20 at 08:09
It is hardly necessary to take the whole dialog, because keywords unrelated to the essence of the request can come up during the conversation.
It makes sense to take only the first message from the client, where he states the essence of the request. That reduces the amount of data to process. Even if you do process every message, it will not complicate things much: even with millions of messages it is a one-time pass. After that, new incoming requests are processed on the fly. So:
1. Select the first message of every dialog from the database (in batches using LIMIT, marking each dialog as processed once it has been handled).
2. Loop over all the results. Inside that loop, run a foreach over the keyword array, and inside it call stripos(). If a keyword matches, add the dialog ID and the keyword to a matches array.
3. After going through all the dialogs, write the dialog IDs and keywords to a separate table in a single query.
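Since the question mentions Python, here is a rough sketch of step 2 in Python rather than PHP. The case-insensitive substring check plays the role of stripos(). The category names, keyword lists, and sample messages are made up for illustration; substitute your own.

```python
# Hypothetical category -> keywords map; replace with your own lists.
KEYWORDS = {
    "problem": ["not working", "error", "refund"],
    "sales": ["price", "buy", "invoice"],
    "competitors": ["cheaper elsewhere", "other provider"],
}


def find_matches(dialogs, keywords=KEYWORDS):
    """Return (dialog_id, category, keyword) for each keyword found.

    `dialogs` is an iterable of (dialog_id, first_message) pairs,
    e.g. rows fetched from the database in step 1.
    """
    matches = []
    for dialog_id, first_message in dialogs:
        # Lowercase once per message so every check is case-insensitive,
        # similar to what stripos() does in PHP.
        text = first_message.casefold()
        for category, words in keywords.items():
            for word in words:
                if word.casefold() in text:
                    matches.append((dialog_id, category, word))
    return matches


# Made-up sample data standing in for the first messages of three dialogs.
sample = [
    (1, "Hello, I get an ERROR when I log in"),
    (2, "What is the price of the annual plan?"),
    (3, "Thanks, everything is great"),
]
result = find_matches(sample)
print(result)
```

Plain substring matching also hits inside longer words (for example "buy" inside "buyer"), so for short keywords you may want word-boundary matching instead.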
After that, all new requests are processed the same way. The list of dialogs is fetched the same way as before, but with a JOIN to the new table to display the corresponding label.
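A minimal sketch of the storage side, using Python's built-in sqlite3 with an in-memory database so it can be run as-is. The table and column names (dialogs, dialog_keywords, first_message) are assumptions, not your real schema, and executemany() stands in here for the single multi-row INSERT.

```python
import sqlite3

# In-memory database for illustration; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dialogs (id INTEGER PRIMARY KEY, first_message TEXT);
    CREATE TABLE dialog_keywords (dialog_id INTEGER, keyword TEXT);
    """
)
conn.executemany(
    "INSERT INTO dialogs (id, first_message) VALUES (?, ?)",
    [
        (1, "I get an error when I log in"),
        (2, "What is the price of the annual plan?"),
        (3, "Thanks, everything is great"),
    ],
)

# Matches collected by the keyword scan: (dialog_id, keyword).
matches = [(1, "error"), (2, "price")]

# Write all matches in one batch.
conn.executemany(
    "INSERT INTO dialog_keywords (dialog_id, keyword) VALUES (?, ?)",
    matches,
)
conn.commit()

# Fetch the dialog list as usual, joined to the labels table.
# LEFT JOIN keeps dialogs that have no keyword at all.
rows = conn.execute(
    """
    SELECT d.id, d.first_message, GROUP_CONCAT(k.keyword) AS labels
    FROM dialogs d
    LEFT JOIN dialog_keywords k ON k.dialog_id = d.id
    GROUP BY d.id
    ORDER BY d.id
    """
).fetchall()

for row in rows:
    print(row)
```

Dialogs without any match come back with an empty label (NULL), which is handy for spotting requests that no keyword list covers yet.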