Finding hidden links in the text
Spammers write links in text in different ways to get past filters, for example:
v k c o m, vk7com, vk.c*om
Is there a more or less universal way to find such links in text?
October 8th 19 at 01:16
October 8th 19 at 01:18
1) Take a Russian dictionary and an English dictionary. 2) Remove all dictionary words from the text. 3) Glue together whatever is left.
Of course, the dictionary will have to be supplemented with typos, neologisms, profanity, etc.
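A minimal Python sketch of the three steps above. It assumes a tiny hypothetical `dictionary` set standing in for real Russian and English word lists, and the function name `leftover_text` is illustrative. Words are lower-cased before lookup, and punctuation and whitespace are dropped, so spaced-out or dotted letters end up glued together:

```python
import re


def leftover_text(text: str, dictionary: set[str]) -> str:
    """Remove every dictionary word from the text and glue together what is left."""
    # \w+ is Unicode-aware, so both Cyrillic and Latin words are tokenized;
    # whitespace and punctuation (".", "*", ...) are discarded here.
    tokens = re.findall(r"\w+", text)
    # Step 2: drop known words. Step 3: glue the remainder with no separators.
    return "".join(tok.lower() for tok in tokens if tok.lower() not in dictionary)
```

With the toy dictionary {"buy", "cheap", "stuff", "at", "today"}, the message "Buy cheap stuff at v k c o m today" reduces to "vkcom". Any unknown word, including typos and names, also survives this step, which is why the dictionary needs the supplements mentioned above.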
And yes, you are fighting spam the wrong way: fighting it with regexps is hopeless.
commented on October 8th 19 at 01:21
Thanks for the reply!
What is the complexity of this approach? And what if the text contains words that aren't in our dictionary, yet it isn't spam?
commented on October 8th 19 at 01:24
The complexity is high; such checks are best done by a separate daemon that keeps the dictionary in memory at all times. Words that fail the dictionary check are not necessarily spam; they need to be handled separately, for example by gluing their letters together and checking the result against blacklists.
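The blacklist check described in this answer could look roughly like the following Python sketch. It assumes a hypothetical blacklist of domains stored with everything except letters removed, so that `vk.com`, `v k c o m`, `vk.c*om` and `vk7com` all reduce to the same key `vkcom`; the names `BLACKLIST`, `normalize` and `blacklist_hits` are illustrative. In a long-running daemon the blacklist would be loaded once and kept in memory, as the answer suggests.

```python
# Hypothetical blacklist: domains reduced to their letters only.
BLACKLIST = frozenset({"vkcom"})


def normalize(fragment: str) -> str:
    """Keep only letters, lower-cased, so separators of any kind disappear."""
    return "".join(ch for ch in fragment.lower() if ch.isalpha())


def blacklist_hits(glued: str, blacklist: frozenset[str] = BLACKLIST) -> list[str]:
    """Return the blacklisted entries found anywhere inside the glued leftovers."""
    norm = normalize(glued)
    # Substring test against every entry: simple, but linear in blacklist size.
    return sorted(entry for entry in blacklist if entry in norm)
```

Dropping digits as well as punctuation is what catches `vk7com`, at the cost of some false positives. Each message is scanned against every blacklist entry, so a large blacklist would call for a multi-pattern search algorithm such as Aho-Corasick.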
But in general there are plenty of other methods: tying accounts to phone numbers (VKontakte), karma and invites (Habr), limits on the number of messages sent per unit of time, and limits on sending the same message repeatedly (social networks). Building a 100% reliable filter based on message text alone is still difficult.
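The last two ideas (a cap on messages per unit of time and a cap on repeating the same message) can be sketched in Python as a per-user sliding-window limiter. The class name `SendLimiter` and all three thresholds are hypothetical defaults chosen for illustration, not values used by VKontakte, Habr or any other site. Messages are compared after lower-casing and collapsing whitespace.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SendLimiter:
    """Per-user limits inside a sliding time window: at most `max_messages`
    messages, and at most `max_duplicates` copies of the same normalized text.
    The default thresholds are illustrative only."""

    def __init__(self, max_messages: int = 5, max_duplicates: int = 2,
                 window_seconds: float = 60.0) -> None:
        self.max_messages = max_messages
        self.max_duplicates = max_duplicates
        self.window = window_seconds
        # user id -> deque of (timestamp, normalized text), oldest first
        self.history: dict = defaultdict(deque)

    def allow(self, user: str, text: str, now: Optional[float] = None) -> bool:
        """Return True and record the message if it passes both limits."""
        now = time.monotonic() if now is None else now
        events = self.history[user]
        # Forget messages that have slid out of the window.
        while events and now - events[0][0] >= self.window:
            events.popleft()
        if len(events) >= self.max_messages:
            return False  # too many messages per unit of time
        key = " ".join(text.lower().split())  # ignore case and extra whitespace
        if sum(1 for _, seen in events if seen == key) >= self.max_duplicates:
            return False  # the same message was sent too often
        events.append((now, key))
        return True
```

The optional `now` argument makes the limiter easy to test; when it is omitted, the limiter falls back to `time.monotonic()`, which is unaffected by wall-clock changes. Rejected messages are not recorded, so a blocked user is not penalized further for retrying.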
commented on October 8th 19 at 01:27
October 8th 19 at 01:20
A database of spammers, moderation, regexps, captcha.