I was parsing a wholesaler's catalog and got blocked, what can I do?

I send requests like this:
function request($url) {
    // str_get_html() comes from the Simple HTML DOM library
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_POST, 0);              // plain GET request
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);    // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/6.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36");

    $result = curl_exec($ch);
    $info   = curl_getinfo($ch);
    if ($info['http_code'] != 200) {
        curl_close($ch);
        return false;
    }
    curl_close($ch);
    return str_get_html($result); // parse the HTML with Simple HTML DOM
}

Is there a way to get around the block?
My first thought was that it's by IP, but apparently not: I switched to a new one and it did not help.
Tried a different browser - did not help.
Sending additional headers - did not help.
--
Are there any ideas I could try?)
Thanks in advance for your answer.

P. S. On subsequent requests they serve a stub page saying something like "no need to do that")

UPD:
print_r(curl_getinfo($ch)):
<code lang="php">
Array
(
 [url] => https://site.ru/
 [content_type] => text/html; charset=utf-8
 [http_code] => 403
 [header_size] => 308
 [request_size] => 240
 [filetime] => -1
 [ssl_verify_result] => 20
 [redirect_count] => 0
 [total_time] => 0.020305
 [namelookup_time] => 4.4E-5
 [connect_time] => 0.000747
 [pretransfer_time] => 0.007452
 [size_upload] => 0
 [size_download] => 600
 [speed_download] => 30000
 [speed_upload] => 0
 [download_content_length] => -1
 [upload_content_length] => -1
 [starttransfer_time] => 0.020198
 [redirect_time] => 0
 [redirect_url] => 
 [primary_ip] => 36.32.116.75
 [certinfo] => Array
(
)

 [primary_port] => 443
 [local_ip] => 31.170.122.143
 [local_port] => 42420
)
</code>
March 23rd 20 at 18:56
5 answers
March 23rd 20 at 18:58
Solution
1) Rotate CURLOPT_USERAGENT every N requests.
2) The site may also be writing something to a cookie; check that and send the cookies back. (A sketch of both ideas follows below.)
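A minimal sketch of both suggestions, built on top of the asker's request() approach; the User-Agent strings, URL and cookie-file path are placeholders I made up, not something from the thread:
<code lang="php">
<?php
// Rotate the User-Agent and carry cookies between requests.
// The UA strings, URL and cookie file are placeholders.
$userAgents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 Safari/605.1.15",
];

function requestRotating($url, array $userAgents, $cookieFile) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Pick a different User-Agent on each call.
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgents[array_rand($userAgents)]);
    // Save received cookies and send them back on subsequent requests.
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    $result = curl_exec($ch);
    $code   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code === 200 ? $result : false;
}

$html = requestRotating("https://site.ru/", $userAgents, __DIR__ . "/cookies.txt");
</code>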
1. Right now I can't make even a single request.
2. I haven't tried cookies, but now the question is: will it write anything to a cookie if I'm already blocked?! - Maynard_Will commented on March 23rd 20 at 19:01
From another host I sent a request, immediately stored the cookies and kept sending them with the queries; I don't know for how long that lasts.
Tried those cookies from my own host - does not work. - Maynard_Will commented on March 23rd 20 at 19:04
March 23rd 20 at 19:00
Solution
Open the website with the browser DevTools open, take the request line and HTTP headers from there, and repeat them in the cURL request (a sketch follows after this answer). Also check curl_error($ch); there may be something meaningful in it.

As a fallback, try fsockopen($ip, 80, $errno, $errstr); followed by echo($errno.":".$errstr); for debugging.

And if the site detects the absence of a hostname in the GET request, you will have to build the requests manually on the socket; cURL cannot produce that kind of "violation of the spec".
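A hedged sketch of replaying the browser's request from cURL and surfacing curl_error(); the header values below are placeholders, the real ones should be copied from the DevTools Network tab:
<code lang="php">
<?php
// Replay browser-like headers and check curl_error() on failure.
// Header values are placeholders; copy the real ones from DevTools.
$ch = curl_init("https://site.ru/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "paste the browser's exact User-Agent here");
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
    "Connection: keep-alive",
]);

$result = curl_exec($ch);
if ($result === false) {
    // Transport-level failure: DNS, TLS, timeout, etc.
    echo "cURL error: " . curl_error($ch) . PHP_EOL;
} else {
    echo "HTTP " . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
}
curl_close($ch);
</code>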
curl_error() returns nothing, since the request itself goes through.
I tried with the same headers the browser sends; it does not respond to those either.
I'll keep trying... - Maynard_Will commented on March 23rd 20 at 19:03
print_r(curl_getinfo($ch)): see the output added to the question above as UPD (http_code is 403). - Maynard_Will commented on March 23rd 20 at 19:06
Hmm, HTTP code 403, Forbidden.
It's possible an anti-bot system has banned your IP or subnet.
Then try downloading from another IP, i.e. through a proxy or through TOR. - Heaven commented on March 23rd 20 at 19:09
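A minimal sketch of routing the same request through a proxy or a local Tor endpoint; the proxy address is a placeholder, and 127.0.0.1:9050 is only Tor's usual default SOCKS port, not something taken from the thread:
<code lang="php">
<?php
// Route the request through a proxy. proxy.example:3128 is a placeholder
// HTTP proxy; 127.0.0.1:9050 is Tor's default local SOCKS port.
$ch = curl_init("https://site.ru/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Variant 1: an ordinary HTTP proxy
// curl_setopt($ch, CURLOPT_PROXY, "proxy.example:3128");

// Variant 2: Tor running locally as a SOCKS5 proxy
curl_setopt($ch, CURLOPT_PROXY, "127.0.0.1:9050");
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);

$result = curl_exec($ch);
echo curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
curl_close($ch);
</code>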
@Heaven, I tried changing local_ip to another one, it did not help. The IPs were different and had nothing in common with the current one. - Maynard_Will commented on March 23rd 20 at 19:12
@Heaven, plus it alternates the 403 error with its own stub page saying roughly "no need to parse this". - Maynard_Will commented on March 23rd 20 at 19:15
@Heaven, actually, my local_ip stayed the same one; the domain did start resolving to a new IP, though. - Maynard_Will commented on March 23rd 20 at 19:18
@Heaven, thanks for the help, the problem is solved: the blocking was by IP after all. - Maynard_Will commented on March 23rd 20 at 19:21
I think there will be more blocks ahead.
Either use Tor, or a public free proxy.
However, if the hosting has protection against this, it will be hard... - Heaven commented on March 23rd 20 at 19:24
@Heaven, I've just bought several IPs; I'll pull about 10 records per minute through them and see what happens. - Maynard_Will commented on March 23rd 20 at 19:27
And also enable cookies in cURL, just in case.
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__)."/cookies.txt");
- Heaven commented on March 23rd 20 at 19:30
@Heaven, thanks for the help with this - enabled them; there is one cookie that counts hits, but I always send the same value and don't update it every time.
I assume the blocking is done at the nginx level by IP.
The only question is over what unit of time they set the visit limit.
I want to take a fresh IP and, with a counter, parse as many pages as I can to see their limits.
Plus I'm now rotating 3 IPs between each subsequent request; so far it seems to work) - Maynard_Will commented on March 23rd 20 at 19:33
March 23rd 20 at 19:02
Solution
Take 10 items at a time, with intervals. The same happened to me when I was parsing: request 100 at once and it bans you too. I just broke the work into chunks and changed the IP (a sketch follows below).
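A rough sketch of that chunked approach, assuming the request() function from the question is available; $pageUrls, the chunk size of 10 and the 60-second pause are placeholders:
<code lang="php">
<?php
// Fetch the catalog in chunks of 10 with a pause between chunks; the pause
// is also a natural place to switch the IP or proxy.
$pageUrls = [/* product-card URLs collected from the catalog */];

foreach (array_chunk($pageUrls, 10) as $chunk) {
    foreach ($chunk as $url) {
        $html = request($url);   // request() from the question: simple_html_dom object or false
        if ($html !== false) {
            // ... parse and store the data here ...
        }
    }
    sleep(60); // wait between chunks; switch IP/proxy here if needed
}
</code>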
I was taking 10 at a time too; the only problem while writing the parser was that I often had to refresh the page.
The general plan was to hang it on cron at 10 items per minute. - Maynard_Will commented on March 23rd 20 at 19:05
Actually, I did change the IP, and it gave me the same response, which is very strange. I'll try buying one more IP; the first time I didn't change the headers when changing the IP, maybe that's related... - Maynard_Will commented on March 23rd 20 at 19:08
@Maynard_Will, by the way, have you thought about what exactly they detect you by? - burley.Mosciski commented on March 23rd 20 at 19:11
@burley.Mosciski, um, well, every so often instead of the page they serve a message saying "no need to parse".
But I think that explanation is plausible.
Still, I haven't figured out how to get around it yet. - Maynard_Will commented on March 23rd 20 at 19:14
@Maynard_Will, google anti-scraping protection systems and read up on the principles used nowadays,
for example here: https://parsio.ru/docs/tutorials/parse-protection/ - burley.Mosciski commented on March 23rd 20 at 19:17
@burley.Mosciski, thanks, I'll read it)) - Maynard_Will commented on March 23rd 20 at 19:20
@burley.Mosciski, it seems they did ban my IP after all.
I changed the IP for the domain, and it started resolving to the new IP, but the IP of the outgoing request did not change: it is still the server's original IP, which fell under the ban. - Maynard_Will commented on March 23rd 20 at 19:23
@burley.Mosciski, thanks for the help, the problem is solved: the blocking was by IP after all. - Maynard_Will commented on March 23rd 20 at 19:26
@Maynard_Will, and how did you solve it? The civilized way - "guys, unban me, I meant no harm, give me an API" - or just by brute-force changing IPs?
Another option is node.js with puppeteer or some other headless browser. Although they know how to ban those too. I never managed to beat Aliexpress )) - shaun_Labadie commented on March 23rd 20 at 19:29
@shaun_Labadie, yeah (I wrote to the guys; they said 150K and they would provide an XML export, so buying IPs was cheaper). I bought several IPs from my ISP, 6 to be exact.
Each request switches to a new IP; at first I did 6 requests per 5 minutes, now I've raised it to 9 requests per 5 minutes and it keeps running.
In general, a rate of 11 requests per 5 minutes (1 request to the page with the item cards, 10 for the cards themselves) would fully satisfy me, but I don't want to push my luck. - Maynard_Will commented on March 23rd 20 at 19:32
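If those extra IPs are bound to the same server, one way this per-request rotation could look from PHP is CURLOPT_INTERFACE, which makes the outgoing connection from a chosen local address. A sketch under that assumption; the addresses, URLs and 30-second pause are placeholders:
<code lang="php">
<?php
// Round-robin over several local IPs that are assumed to be configured
// on this server's network interfaces. All values are placeholders.
$localIps = ["203.0.113.10", "203.0.113.11", "203.0.113.12"];
$urls     = ["https://site.ru/catalog?page=1", "https://site.ru/catalog?page=2"];

function requestFrom($url, $localIp) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_INTERFACE, $localIp); // connect from this local IP
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

foreach ($urls as $n => $url) {
    $html = requestFrom($url, $localIps[$n % count($localIps)]);
    sleep(30); // keep the rate well below the site's apparent limit
}
</code>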
@Maynard_Will, why not just get a separate line with a dynamic IP and run the parser on that? Honestly, I sometimes don't understand these wholesalers. Usually everyone needs sales, but here they do their best to push the customer away. - shaun_Labadie commented on March 23rd 20 at 19:35
@shaun_Labadie, I've never dealt with dynamic IPs, no experience there.
Besides, this solves the problem fairly cheaply for now; we'll see how it goes.
I estimate about a month to pull the whole catalog, so I don't think I'll even need to renew the IPs)
If I were doing this constantly then of course, I agree, I would need a different setup. - Maynard_Will commented on March 23rd 20 at 19:38
March 23rd 20 at 19:04
Solution
A typical approach to such problems is to set up an HTTP debugging proxy, make the request from the browser and from your software, and then compare what differs between the two requests (see the sketch below). That is, of course, assuming the site opens normally in your browser.
Alternatively, you can use a headless browser if you don't want to bother with any of that; everything will work right away, but that has its own drawbacks.
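One hedged way to do that comparison from the PHP side: route cURL through the local debugging proxy and keep a copy of the headers cURL actually sent. The proxy address 127.0.0.1:8080 is only the usual default of such tools, not something from the answer:
<code lang="php">
<?php
// Send the request through a local debugging proxy (placeholder address)
// and record the outgoing request headers for comparison with the browser.
$ch = curl_init("https://site.ru/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROXY, "127.0.0.1:8080");
// Debugging proxies present their own TLS certificate; relax verification
// only for this one-off debugging run, never in production.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
// Ask cURL to keep the request headers it sends.
curl_setopt($ch, CURLINFO_HEADER_OUT, true);

curl_exec($ch);
echo curl_getinfo($ch, CURLINFO_HEADER_OUT); // compare with the browser's request
curl_close($ch);
</code>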
March 23rd 20 at 19:06
Write them a letter. Introduce yourself, explain that you have nothing bad in mind, and ask in a civilized manner either for permission to parse as-is or for API access.
But does that ever actually work? =)
Seriously though, I've never tried it myself. If you have experience, please share. - sylvan_Park commented on March 23rd 20 at 19:09
@sylvan_Park, it depends on the purpose. If it's competitors who want to set their prices lower, of course they will be turned away.
If it's to build a catalog that can then potentially bring in buyers, then why not.

Actually, my answer is a bit of a trick. Personally, I openly despise all this riffraff trying to scrape and spam. They are real parasites, purely an ecological problem: they foul up our Internet, and because of them normal people can't open a site without solving a captcha on every page. I think such questions have no place in this community. But all the authors of such questions insist they are doing nothing wrong. Hence my answer: if you really aren't doing anything wrong, then write to the site's owners.
And if you are, then you shouldn't be handed an answer on a silver platter, as is customary here on Toster, but driven off with a dirty rag. - jaunita_Schaden56 commented on March 23rd 20 at 19:12

Find more questions by tags: Parsing, PHP