Questions tagged [Crawling] (67)

1
answer

How can a single connection be used to send multiple GET requests?

I open a connection like so: HttpURLConnection con = (HttpURLConnection) startUrl.openConnection(); The connection is assigned a sessionID and a token (a unique token per session). Through this connection I get the data needed to construct the GET request, but if I call .openConnection() again, it becomes another ...
Mohammad asked April 16th 20 at 11:18
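The question's code is Java, but the underlying idea is the same in any language: build one session-aware client and reuse it, so the server-assigned sessionID/token persists across requests. A minimal sketch in Python using only the standard library (an analogous pattern, not the asker's code; the URL would be a placeholder):

```python
# Sketch: one cookie-aware opener shared by every GET request, so the
# session cookie set by the first response is sent with later requests.
import http.cookiejar
import urllib.request

def make_session_opener():
    """Build one opener whose cookie jar is shared by all requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def fetch(opener, url):
    # Reusing the same opener keeps the session alive; calling
    # build_opener() (or Java's openConnection()) anew would start over.
    with opener.open(url) as resp:
        return resp.read()
```

In Java the closest equivalent is to keep a single `CookieManager` installed via `CookieHandler.setDefault(...)` so successive `HttpURLConnection` instances share the session cookies.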
0
answers

Why is a Telegram parser bot not working on Heroku?

I deployed the bot to Heroku and installed all the required libraries, but the function below refuses to run, although when I start the bot on my PC everything works fine. What could be the problem? def asos_parser_bot(linksJs, all_urls, headers, valuet, session, soup): goods = [] conuntryList = ['EN', 'GB'...
Adalberto.Effer asked April 9th 20 at 09:49
2
answers

Selenium WebDriver: StaleElementReferenceException. How to get rid of it?

I get the list of web elements I need: List<WebElement> temp = driver.findElements(By.cssSelector("span.selection-link")); Next I want to read attribute values from that list, but while doing so I get a StaleElementReferenceException. As I understand it, the driver loses its connection to the web elements and throws this excep...
Gaylord_Quitzon3 asked April 8th 20 at 02:44
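The usual fix is to stop holding element references across DOM re-renders: re-locate the elements and read the attribute in one pass, retrying when a stale reference is thrown. A hedged sketch in Python (the question uses Java Selenium, but the API is parallel); the selector comes from the question, `driver` is any Selenium-like object, and with real Selenium you would pass `StaleElementReferenceException` as `stale_exc`:

```python
# Sketch: re-find elements on every attempt instead of reusing a list
# found before the page re-rendered, which is what makes them stale.
def collect_attributes(driver, css, attr, stale_exc=Exception, retries=3):
    last_err = None
    for _ in range(retries):
        try:
            # Fresh lookup + immediate read: no element reference is
            # kept across a possible DOM update.
            return [el.get_attribute(attr)
                    for el in driver.find_elements("css selector", css)]
        except stale_exc as err:
            last_err = err  # DOM changed mid-read; re-find and retry
    raise last_err
```

Usage with real Selenium would look like `collect_attributes(driver, "span.selection-link", "href", StaleElementReferenceException)`.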
1
answer

How to close the browser in a parser?

To parse the links from a menu page I am using the Puppeteer framework. The resulting links are printed to the console, but then the console shows the following error message: (node:24126) UnhandledPromiseRejectionWarning: Error: Protocol error (Target.closeTarget): Target closed. at /home/md/.MINT18/code/js/puppeteer_books_1/node_m...
jermain.McGlynn asked April 8th 20 at 02:34
1
answer

What is the optimal way of collecting data from third-party sites?

Help me clarify the picture; the information on the Internet is not very specific. The task: gather data from certain sites (double values), compare them with a reference number, and get the difference; the values change many times per minute. The figures are on different pages of the site. The speed of data ...
neil.Hettinger1 asked April 7th 20 at 10:57
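Whatever fetch strategy is chosen (polling, concurrent requests, an API if the site has one), the compare step itself is simple. A minimal sketch, assuming the fetch layer already returns the current number per page; the function name, the `{url: value}` shape, and the threshold are all hypothetical:

```python
# Sketch of the compare step: difference of each fetched value from the
# reference, keeping only values that moved beyond a threshold.
def diff_against_reference(values, reference, threshold=0.0):
    """values: {page_url: float}; returns {page_url: difference}."""
    return {url: v - reference
            for url, v in values.items()
            if abs(v - reference) > threshold}
```

Since the values change many times per minute, the real bottleneck is the fetch layer; running the per-page requests concurrently (e.g. with `asyncio` or a thread pool) is the usual way to keep the polling interval short.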
1
answer

What is the best way to optimize a crawler for the site?

The task is to search for images on the ibb.co website whose EXIF data includes a specific phone model (this information is displayed directly on the page). When an image is uploaded, its link takes the form ibb.co + 7 characters (0-9, a-z, A-Z). I wrote the script <?php $string = "HUAWEI"; $permitted_chars = '012...
Kamren86 asked April 4th 20 at 14:05
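Before optimizing, it is worth quantifying the search space: 7 characters from [0-9a-zA-Z] gives 62^7 ≈ 3.5 trillion IDs, so blind enumeration or random probing is impractical regardless of how fast the script is. A sketch of the ID-generation part in Python rather than the question's PHP, just to make the math concrete:

```python
# Sketch: random candidate IDs for ibb.co-style links. The 62**7 search
# space is the real obstacle, not the generation code.
import random
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters

def random_id(length=7, rng=random):
    """One random candidate ID, e.g. for probing (rate-limit in practice!)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))
```

With roughly 3.5 × 10^12 possible IDs, even a million probes per day would cover a negligible fraction; narrowing the input (e.g. crawling pages that already link to images) is the only realistic optimization.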
1
answer

How to send a request through requests from another IP?

The server has 2 external IP addresses: one v4, the other v6. I use MechanicalSoup (requests) on Python 3.7 for scraping, and by default it sends requests over IPv4. How do I configure it to send requests over IPv6? UPD: It doesn't matter whether it's IPv6 or IPv4. For example, suppose the server has 2 external IPv4 addresses. How can I choose which IP to send requests from?
ethyl.Hagenes asked April 4th 20 at 13:49
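The general mechanism is binding the outgoing socket to a chosen local (source) address. A standard-library sketch with `http.client`, which exposes this directly via the `source_address` parameter; `203.0.113.5` is a placeholder documentation address to be replaced with one of the server's own IPs:

```python
# Sketch: pin the local source IP for an outgoing HTTP request.
import http.client

def get_via_source_ip(host, path, source_ip):
    # source_address=(ip, 0): bind the socket to a chosen local IP and
    # let the OS pick an ephemeral port. Use HTTPSConnection for HTTPS.
    conn = http.client.HTTPConnection(host, source_address=(source_ip, 0))
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()

# e.g. get_via_source_ip("example.com", "/", "203.0.113.5")
```

For requests/MechanicalSoup specifically, the same effect is usually achieved by mounting a custom transport adapter that passes `source_address` down to urllib3's connection pool; the exact adapter code depends on the requests version, so check the transport-adapter documentation for yours.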
1
answer

Crawlera: does 1 request equal one parsed web page, or one tag with text?

I want to order the Crawlera proxy, but when I looked at the prices I saw a "1 million requests per month" package for $100. I don't understand: are queries for individual tags counted separately (e.g. parsing a div with a price text inside counts as 1 request), or is it per page (1 page = 1 request)? How exactly does it work?
mireya_Huel asked April 4th 20 at 01:37
0
answers

Scrapy twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion. How to solve it?

Today the site parser started giving this error instead of collecting information: [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://site.com> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection lost.>] Turning on Psiphon does not help. Disable\ch...
Eva.Schoen asked April 2nd 20 at 17:14
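A `ConnectionLost` that appears suddenly on a previously working spider often means the site started dropping bot-like connections, so settings alone may not be enough. Still, the first things commonly tried are more retries, a longer timeout, and a realistic User-Agent. A hedged `settings.py` sketch with real Scrapy setting names; the values are illustrative, not known-good for this site:

```python
# Sketch of settings.py tweaks for flaky connections (illustrative values).
RETRY_TIMES = 5                     # retry each failed request more times
DOWNLOAD_TIMEOUT = 30               # give slow responses longer to arrive
DOWNLOAD_DELAY = 1.0                # be gentler; rapid requests invite bans
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # browser-like UA
```

If the drops persist with these settings, the server is likely rejecting the client at the TLS/fingerprint level, and a different client stack or proxy is the usual next step.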
1
answer

Is it possible to parse data into automatically created directories?

My task is to parse the website so that each item's card ends up in its own directory, and each folder contains the product's pictures. What would you advise for building a parser that auto-creates directories?
Grant.Veum30 asked April 1st 20 at 16:22
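The directory-creation part is straightforward with the standard library; the harder part is the scraping itself. A minimal sketch, assuming the scraper already yields a product name and its downloaded images (the function name and the `(filename, bytes)` shape are hypothetical):

```python
# Sketch: one directory per product card, pictures saved inside it.
import os

def save_product(base_dir, product_name, images):
    """images: iterable of (filename, bytes) pairs for one product."""
    product_dir = os.path.join(base_dir, product_name)
    # exist_ok=True makes re-runs idempotent instead of crashing.
    os.makedirs(product_dir, exist_ok=True)
    for filename, data in images:
        with open(os.path.join(product_dir, filename), "wb") as fh:
            fh.write(data)
    return product_dir
```

In practice the product name should be sanitized before use as a path component (slashes, reserved characters), since it comes from scraped page content.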