How can tags be assigned to articles automatically?

Colleagues, please advise on the following question:
We run an online publication that produces 5+ articles a day; the total is currently over 1500.

Currently each article's tags are assigned by its author. We periodically review them, and it is clear that this approach doesn't work well: many tags are simply missing (some authors just can't be bothered).

Question: are there systems that could simplify this process, for example by suggesting tags based on the article's content? And how do the major media content providers (RIA etc.) handle it?

PS. We use Elastic as our search engine, and I've seen that it might help with this task somehow, but my knowledge isn't really sufficient (or rather, nonexistent).

Thank you!
April 4th 20 at 13:09
3 answers
April 4th 20 at 13:11
Reuters seems to have been the pioneer here. Their solution is based on machine learning: first you train a classifier on a suitable set of already-tagged articles, then use it to assign each new article to a category or rubric, which is exactly the tagging problem.
Off the top of my head, just as an example:
Elastic has little to do with this; at most it serves as storage for the data.
By the way, Reuters boasted that implementing this approach saves them millions, mostly on payroll: they let go of almost a hundred employees in the department that used to tag news manually.
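As a rough illustration of the classifier approach (not Reuters' actual system, whose details aren't given here), the sketch below trains one yes/no multinomial Naive Bayes model per tag on already-tagged articles and suggests every tag whose "yes" model scores higher than its "no" model. The class name `NaiveBayesTagger`, the smoothing constant `alpha=1.0`, the regex tokenizer and the sample headlines are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of word characters.
    return re.findall(r"\w+", text.lower())


class NaiveBayesTagger:
    """One-vs-rest multinomial Naive Bayes: one yes/no model per tag."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, tag_sets):
        tokenized = [tokenize(d) for d in docs]
        self.vocab = {w for words in tokenized for w in words}
        n_docs = len(docs)
        self.models = {}
        for tag in sorted({t for ts in tag_sets for t in ts}):
            counts = {True: Counter(), False: Counter()}
            n_pos = 0
            for words, ts in zip(tokenized, tag_sets):
                has_tag = tag in ts
                counts[has_tag].update(words)
                n_pos += int(has_tag)
            # Smoothed priors, so a tag present on every (or no) article
            # never produces log(0).
            priors = {True: (n_pos + 1) / (n_docs + 2),
                      False: (n_docs - n_pos + 1) / (n_docs + 2)}
            self.models[tag] = (priors, counts)
        return self

    def _log_score(self, words, prior, counts):
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(prior)
        for w in words:
            if w in self.vocab:  # words never seen in training carry no evidence
                score += math.log((counts[w] + self.alpha) / total)
        return score

    def suggest(self, text):
        words = tokenize(text)
        suggested = []
        for tag, (priors, counts) in self.models.items():
            yes = self._log_score(words, priors[True], counts[True])
            no = self._log_score(words, priors[False], counts[False])
            if yes > no:
                suggested.append(tag)
        return suggested


# Hypothetical training data: article texts and their manually assigned tags.
docs = [
    "central bank raises interest rates to fight inflation",
    "stock market falls as inflation fears grow",
    "football club wins the championship final",
    "star striker scores twice in cup match",
]
tags = [{"economy"}, {"economy", "markets"}, {"sport"}, {"sport"}]

tagger = NaiveBayesTagger().fit(docs, tags)
print(tagger.suggest("bank warns inflation will hit the stock market"))
```

Naive Bayes is used here only because it fits in a few lines; with a real archive of 1500+ articles, a library classifier and lemmatization of the (Russian) text would likely give better suggestions.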
April 4th 20 at 13:13
I don't know exactly how this is done in practice, but I would do the following:
1. Define a finite set of tags.
2. Build a glossary of keywords for each tag: synonyms, terms from the subject area, etc.
3. Scan each article for these keywords and, when enough of them match, suggest adding that tag to the article.
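The three steps above could look roughly like this in Python. The `GLOSSARY` contents, the function name `suggest_tags` and the `min_matches=2` cutoff are hypothetical; "enough matches" is taken here to mean at least two distinct glossary words found in the article.

```python
import re

# Hypothetical glossary: each tag maps to keywords, synonyms and domain terms.
GLOSSARY = {
    "economy": {"inflation", "gdp", "bank", "budget", "tax"},
    "sport": {"match", "goal", "championship", "coach", "league"},
}


def suggest_tags(text, glossary, min_matches=2):
    """Suggest every tag with at least min_matches distinct keyword hits."""
    # Lowercase and split the article into a set of distinct words.
    words = set(re.findall(r"\w+", text.lower()))
    suggestions = []
    for tag, keywords in glossary.items():
        hits = words & keywords
        if len(hits) >= min_matches:
            suggestions.append(tag)
    return suggestions


print(suggest_tags("The bank expects inflation to slow", GLOSSARY))
```

Exact word matching like this misses inflected forms, so for Russian-language articles both the text and the glossary would probably need to be lemmatized or stemmed first.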
April 4th 20 at 13:15
Such a system is very simple to build:
1. Split the article into words and convert them to lowercase.
2. Build an index: the list of these words and, for a given article, the percentage of its words that match each tag's group of words.
3. Keep tagging articles manually until the match percentage rises above a threshold value.
4. When the next article is checked, the comparison is run and tags are assigned automatically.
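One possible reading of these four steps, sketched in Python: each tag's "group of words" is taken to be every word seen in articles manually tagged with it, and the match percentage is the share of a new article's distinct words found in that group. The class `TagIndex`, the stopword list and the 50% threshold are assumptions for illustration, not part of the answer.

```python
import re
from collections import defaultdict


def words_of(text):
    # Step 1: split the article into words and lowercase them.
    return set(re.findall(r"\w+", text.lower()))


class TagIndex:
    """Step 2: for every tag, the set of words seen in articles carrying it."""

    def __init__(self, stopwords=()):
        self.groups = defaultdict(set)
        self.stopwords = set(stopwords)

    def add_tagged(self, text, tags):
        # Step 3: feed in manually tagged articles to grow the word groups.
        words = words_of(text) - self.stopwords
        for tag in tags:
            self.groups[tag] |= words

    def match_percent(self, text, tag):
        words = words_of(text) - self.stopwords
        if not words:
            return 0.0
        # .get avoids creating an empty group for an unknown tag.
        return 100.0 * len(words & self.groups.get(tag, set())) / len(words)

    def auto_tags(self, text, threshold=50.0):
        # Step 4: tag a new article with every tag whose match exceeds threshold.
        return sorted(tag for tag in self.groups
                      if self.match_percent(text, tag) > threshold)


index = TagIndex(stopwords={"the", "a", "in", "of", "to"})
index.add_tagged("Central bank raises rates to curb inflation", ["economy"])
index.add_tagged("Inflation slows as bank holds rates", ["economy"])
index.add_tagged("Home team wins the league match", ["sport"])
print(index.auto_tags("Bank holds rates despite inflation"))
```

Without a stopword list, frequent words such as "the" would end up in every tag's group and inflate all scores, so the threshold has to be tuned on articles that were already tagged by hand.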

