Parsing sites by criteria: where can I find a database of sites that match them?

The task is to collect a database of sites in the Runet and the CIS that meet the following criteria:
1. No SSL on the site.
2. No corporate e-mail (an address on the site's own domain); the contacts on the website are mailboxes like XXX@mail.ru, xxx@yandex.ru, etc.
3. No modern CMS on the site; this criterion is not the main one.
April 18th 20 at 12:56
2 answers
April 18th 20 at 12:58
The problem consists of several parts:
1. Collecting a list of domains and information about them
2. Analyzing the pages

Let's dive in:
1. Collecting a list of domains and information about them


Well, no such database exists. You can see why just from knowing how DNS works. Of course, you could start enumerating all domain names in order, but even with an unlimited number of proxies, going through every possible combination would take forever, and this information is constantly changing. You could go the other way and become Google. The cost of either approach, I guess you can imagine: trillions of dollars.
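To give a feel for the scale of brute-force enumeration, here is a back-of-the-envelope sketch. It assumes plain ASCII labels (letters, digits, and a hyphen that may not be the first or last character) and ignores IDN names and registry-specific rules; the function names and the lookup rate are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def count_labels(max_len: int) -> int:
    """Count candidate second-level labels of length 1..max_len.

    Assumption: 36 characters (a-z, 0-9) are allowed at the edges,
    37 (the same plus '-') in the middle.
    """
    edge, inner = 36, 37
    total = 0
    for length in range(1, max_len + 1):
        if length == 1:
            total += edge
        else:
            total += edge * inner ** (length - 2) * edge
    return total


def years_to_probe(candidates: int, lookups_per_second: float) -> float:
    """Years needed to look up every candidate once at a fixed rate."""
    return candidates / lookups_per_second / SECONDS_PER_YEAR


# Example: labels up to 10 characters in a single zone,
# at a hypothetical 100,000 lookups per second.
# print(years_to_probe(count_labels(10), 100_000))
```

Each extra character multiplies the number of candidates by roughly 37, and a DNS label may be up to 63 characters long, so the estimate explodes long before that limit, and this is for one zone only.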

2. Analyzing the pages

There are ready-made tools, and some may even have APIs (otherwise it is painful to write parsers and keep them updated, plus you need a lot of proxies and so on). You can write your own tools, but people work on these for months on end and keep modifying them. In principle, once you have a database of domain names, collecting this information becomes possible, but the processing will take much more time: dig/whois is one thing, but parsing or calling an API is quite another.
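As an illustration of what the page analysis might look like for the second criterion (no corporate e-mail), here is a minimal sketch. It assumes the page HTML has already been downloaded; the `FREE_MAIL` set is a short illustrative list rather than a complete one, and `lacks_corporate_email` is a hypothetical helper name.

```python
import re

# Illustrative (incomplete) set of free mailbox providers.
FREE_MAIL = {
    "mail.ru", "bk.ru", "list.ru", "inbox.ru",
    "yandex.ru", "ya.ru", "rambler.ru", "gmail.com",
}

# Captures the domain part of anything that looks like an e-mail address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def lacks_corporate_email(html: str, site_domain: str) -> bool:
    """True if the page lists free-mail contacts and none on the site's domain."""
    site = site_domain.lower()
    if site.startswith("www."):
        site = site[4:]
    domains = {d.lower() for d in EMAIL_RE.findall(html)}
    if not domains:
        # No addresses found at all: the criterion cannot be decided.
        return False
    has_corporate = any(d == site or d.endswith("." + site) for d in domains)
    has_free = any(d in FREE_MAIL for d in domains)
    return has_free and not has_corporate
```

A regex over raw HTML will miss addresses that are obfuscated or rendered by JavaScript, so treat the result as a first-pass filter.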

In general, I wouldn't recommend it.
There is a complete free database of .ru domains (nearly 5 million). There are also databases of domains in other zones (for a fee) - Virginia34 commented on April 18th 20 at 13:01
@Virginia34, it is never up to date. Yes, people collect this kind of information, but every day new domains appear and old ones die. This is happening right now as we speak) - Lavonne.Homenick32 commented on April 18th 20 at 13:04
@Virginia34, moreover, "Runet" and "CIS" are abstract concepts. I know a number of zones that simply aren't on that list, yet they host a lot of Russian websites - Lavonne.Homenick32 commented on April 18th 20 at 13:07
@Lavonne.Homenick32, the .ru database is exactly the one that is up to date. I'll get it. - Virginia34 commented on April 18th 20 at 13:10
@Virginia34, the .ru zone is just one of many) And if I go and register a new domain right now, how long will it take to appear there? Unknown - Lavonne.Homenick32 commented on April 18th 20 at 13:13
@Lavonne.Homenick32, I don't need it to be 100% up to date - valentine31 commented on April 18th 20 at 13:16
@Virginia34, how can I get this database for free? - valentine31 commented on April 18th 20 at 13:19
@valentine31, only the Russian zone; the other zones are not available. - Virginia34 commented on April 18th 20 at 13:22
@Virginia34, more than enough. - valentine31 commented on April 18th 20 at 13:25
April 18th 20 at 13:00
I disagree with the first criterion

No SSL on the site.


Currently, everyone is moving to HTTPS (HTTP over the SSL/TLS layer), with certificates being issued for domain names across the board. So it's not that unencrypted sites will disappear altogether, but their total number will shrink rapidly.

So the task gets harder and comes down to searching for those half-dead sites that linger on only because of their technological backwardness.
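For what it's worth, checking the first criterion for a single site can be sketched with the standard library alone. This is a rough probe under stated assumptions: it treats "no SSL" as "no successful TLS handshake with a valid certificate on port 443", so a site with an expired or self-signed certificate is also reported as having no SSL. `serves_tls` is a hypothetical name.

```python
import socket
import ssl


def serves_tls(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """True if a TLS handshake with certificate verification succeeds."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # The handshake and hostname check happen inside wrap_socket.
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and
        # ssl.SSLError (certificate or handshake problems).
        return False


# Example (requires network access):
# print(serves_tls("example.com"))
```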

Those are my thoughts.
Well, it's not a mandatory criterion, and neither is the 3rd.
The most important one for me is the second, and then variations are possible. - valentine31 commented on April 18th 20 at 13:03
How do you determine the absence of a CMS?

You know, even in science, proving absence is the hardest task. It's like an atheist trying to prove there is no God. - Rocky.Kassulke commented on April 18th 20 at 13:06
@Rocky.Kassulke, why idealize everything?
The absence of any commonly known CMS is good enough for me. - valentine31 commented on April 18th 20 at 13:09
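In that weaker sense ("none of the well-known CMSs was recognized"), a check could be sketched roughly like this. The marker table is a small illustrative sample, not an exhaustive fingerprint database, and `detect_cms` is a hypothetical name; a `None` result only means that none of the listed markers matched.

```python
import re

# Illustrative path markers for a few widespread CMSs.
CMS_MARKERS = {
    "WordPress": ("wp-content/", "wp-includes/"),
    "1C-Bitrix": ("/bitrix/",),
    "Joomla": ("/components/com_",),
    "Drupal": ("/sites/default/files/",),
}

# Many CMSs announce themselves in a <meta name="generator"> tag.
GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def detect_cms(html: str):
    """Return a CMS name or generator string, or None if nothing matched."""
    match = GENERATOR_RE.search(html)
    if match:
        return match.group(1).strip()
    lowered = html.lower()
    for name, markers in CMS_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return name
    return None
```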
And how many of them are there (approximately)? - Rocky.Kassulke commented on April 18th 20 at 13:12
@Rocky.Kassulke,
I don't know exactly.
But you can use services such as:
https://webdatastats.com/ru - valentine31 commented on April 18th 20 at 13:15
How does this answer the question?

A terms of reference is no place for pointers to other resources. Nor for quantifiers and infinities.

You don't have a definition of done. - Rocky.Kassulke commented on April 18th 20 at 13:18
