Format (normalization) a postal address to search for duplicates?
In the database there are 100500 records with the email address (Moscow, Mira Ave., 10-204). You want to find duplicates. How can I do that? Is there any ready solutions, or will need to reinvent the wheel?
cheap labor will save you :)
but in General I can normalize address through dadata.ru and then try to ometzit
only houses, buildings and apartments will have to Tinker
coralie.Grady answered on October 3rd 19 at 04:40
Can be done in case of absence of any normalization very strange way — to find a API maps and to compare the coordinates returned by hash for example.
Jude_Orn37 answered on October 3rd 19 at 04:42
Try to use API Yandex.Cards, comparing the results of the geocoder. Documentation.
Bella.Witting answered on October 3rd 19 at 04:44
Well, you will save cheap children labor.
Hire freelancers for a dollar per hour and let sorted.
jodie37 answered on October 3rd 19 at 04:46
I have once upon a time (12-13 years ago) there was a similar task. I was led to normalized view in several passes
1) Moved everything in one register (upper)
2) Replaced all "St.", "Avenue" and "prospect" for one thing. The same is done with flats, houses, buildings and other things.
3) based On all this, already made 3 norms. form.
4) have Dealt with the addresses that could not be normalized
ON the base with 25,000 to 30,000 addresses I spent 2 or 3 days.
It is clear that the decision in the forehead and can be quite effective, but the alternative was manual perezhevyvaya all this information that some did not suit me :)
anya.Medhurst answered on October 3rd 19 at 04:48
System Papyrus (www.petroglif.ru) is able to do. But your problem is — customized: import-parse-exports. For little money you can easily carry out.