Questions tagged [Scrapy] (84)

1
answer

BS4 does not see new videos, how do I fix it?

I need to get the videos that a channel presents, but bs4 does not see them. Here is the code: def parseYoutubeChanel(channelurl): urlPage = urlopen(channelurl) xmlData = urlPage.read().decode('utf-8') new_video = BeautifulSoup(xmlData, 'lxml') for lnk in new_video.findAll('a', href=True): linkspage = lnk['href'] startLink = l...
Elva_Braun asked April 16th 20 at 10:43
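The usual reason bs4 "does not see" new videos is that the channel page builds its video list with JavaScript, so the HTML that urlopen downloads simply does not contain those links. A minimal sketch of one workaround, assuming the channel's public RSS feed is an acceptable source of new videos; parse_youtube_channel is a hypothetical helper, not the asker's code:

# A minimal sketch: the channel RSS feed is static XML, so BeautifulSoup sees
# every recent video without executing any JavaScript.
from urllib.request import urlopen
from bs4 import BeautifulSoup

def parse_youtube_channel(channel_id):
    feed_url = f'https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}'
    xml_data = urlopen(feed_url).read().decode('utf-8')
    soup = BeautifulSoup(xml_data, 'xml')   # parse as XML rather than HTML
    # each <entry> in the feed carries a <link href="..."> to the video
    return [entry.find('link')['href'] for entry in soup.find_all('entry')]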
1
answer

How to merge two dictionaries?

The task: I need to merge what is found by two functions, preferably into a single list of dictionaries. I tried to hack something together, but it refuses to work: data = [] class FunpaySpider(scrapy.Spider): dict1 = {} name = 'funpay' start_urls = ['https://funpay.ru'] def parse(self, response: HtmlResponse): # Ask for ga...
Shemar90 asked April 8th 20 at 18:12
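If the goal is simply to combine two dictionaries produced by two callbacks into one item, plain Python dict merging is enough. A minimal sketch; the example keys are placeholders, not the spider's real fields:

# Two ways to merge two dictionaries into one item.
dict1 = {'title': 'Some lot', 'price': 100}
dict2 = {'seller': 'user42', 'online': True}

merged = {**dict1, **dict2}    # Python 3.5+: keys from dict2 win on conflict

# or, merging in place into a copy:
combined = dict(dict1)
combined.update(dict2)

data = []
data.append(merged)            # collect merged items into a single list of dictionaries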
0
answer

How to get data from the page using scrapy?

The task was to scrape the cards, then go into each card and get the necessary data from there, so that everything is output together. Right now I can only output the parts separately and, apparently, in a jumble. I tried merging them into one variable, but I get an error. I tried different things, but with no result. What can I try to do? And is tha...
Khalid6 asked April 8th 20 at 02:19
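The usual Scrapy pattern for "listing page plus detail page in one item" is to pass the partially filled item from the first callback to the second via cb_kwargs (or meta) and yield the completed dict only in the detail callback. A sketch with a hypothetical spider and placeholder selectors, not the asker's actual code:

import scrapy

class CardSpider(scrapy.Spider):              # hypothetical spider, selectors are placeholders
    name = 'cards'
    start_urls = ['https://example.com/catalog']

    def parse(self, response):
        for card in response.css('.card'):
            item = {'title': card.css('.title::text').get()}
            url = card.css('a::attr(href)').get()
            # hand the partially filled item to the detail-page callback
            yield response.follow(url, callback=self.parse_card, cb_kwargs={'item': item})

    def parse_card(self, response, item):
        item['description'] = response.css('.description::text').get()
        yield item                            # listing data and detail data come out together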
1
answer

Crawlera: is 1 request equal to one scraped web page, or to one tag with text?

I want to order a Crawlera proxy, but when I looked at the prices I saw a "1 million requests per month" package for $100. I don't understand: is each query for a particular tag counted separately (e.g. parsing a div with the price text inside counts as 1 request), or is it per page (1 page = 1 request)? How exactly does it work?
mireya_Huel asked April 4th 20 at 01:37
1
answer

How to write a CSS selector for "paragraphs that come between two headers"?

I understand the question sounds strange, but I need this for a Scrapy parser, in order to add some of my own classes to the elements. I want to select all paragraphs that come between two h2 tags. How do I do this correctly? If I do something like response.css('h2:contains("Story") + p').get() it will only select the first paragraph af...
Willis_Hessel52 asked April 4th 20 at 01:12
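CSS alone cannot express "between two headers", but Scrapy selectors also support XPath, where this is a standard sibling-intersection pattern. A sketch, assuming the first header is the h2 containing "Story" and you want every p up to the next h2:

# All <p> siblings that come after the h2 containing "Story"
# and whose nearest preceding h2 is still that same one
# (i.e. we have not passed the next h2 yet).
paragraphs = response.xpath(
    '//h2[contains(., "Story")]/following-sibling::p'
    '[preceding-sibling::h2[1][contains(., "Story")]]/text()'
).getall()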
0
answer

Scrapy twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion. How to solve?

Today the site parser started giving this error instead of collecting information: [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://site.com> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection lost.>] Turning on Psiphon does not help. Disable\ch...
Eva.Schoen asked April 2nd 20 at 17:14
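ConnectionLost on a site that worked yesterday most often means the server is dropping connections for requests that look like a bot (blocked User-Agent, rate limiting), so the usual first steps are to slow down and send browser-like headers. A sketch of generic settings to try, assuming default middlewares; these are mitigations to experiment with, not a guaranteed fix:

# settings.py - generic mitigations to try
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
DOWNLOAD_DELAY = 2        # slow down so the server is less likely to cut the connection
CONCURRENT_REQUESTS = 4
RETRY_TIMES = 5           # the retry middleware already catches ConnectionLost; give it more attempts
COOKIES_ENABLED = True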
0
answer

How to parse all events from Facebook?

How do I get all events for the coming week using the Facebook Graph API?
Patrick_Shields asked April 2nd 20 at 15:41
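A sketch of querying a page's events edge of the Graph API, assuming you have an access token with permission to read that page's events and that the endpoint is still available to your app (Facebook has restricted events access over time); the page id, token, API version and the since/until filtering are assumptions:

# A sketch, not a verified recipe: placeholders throughout.
import time
import requests

PAGE_ID = 'some_page_id'        # placeholder
ACCESS_TOKEN = 'your_token'     # placeholder

now = int(time.time())
week_later = now + 7 * 24 * 3600
resp = requests.get(
    f'https://graph.facebook.com/v12.0/{PAGE_ID}/events',
    params={'access_token': ACCESS_TOKEN, 'since': now, 'until': week_later},
)
for event in resp.json().get('data', []):
    print(event.get('name'), event.get('start_time'))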
1
answer

Where do I insert a timer in Python so that this works?

I think I have found a solution for getting the desired HTML page. Earlier the scraper received blank HTML. That is, now, before it starts collecting data from the page, the scraper should wait 5 seconds (during this time the JS code will have time to request the needed HTML) and the necessary information will be collected. I just can't figure out how to insert a t...
Cristobal_Pur asked April 1st 20 at 17:41
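A plain time.sleep() inside a Scrapy callback will not help, because Scrapy never executes JavaScript; the wait only makes sense when a real browser is loading the page. A sketch with Selenium, assuming a JS-rendered page and an installed Chrome driver; the URL is a placeholder:

# Wait 5 seconds in the browser, then hand the rendered HTML to bs4.
import time
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()                # assumes chromedriver / Chrome is installed
driver.get('https://example.com/page')     # placeholder URL
time.sleep(5)                              # give the JS five seconds to load the content
html = driver.page_source                  # now the HTML contains the JS-generated markup
driver.quit()

soup = BeautifulSoup(html, 'lxml')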
0
answer

How do I specify the selector path to get the values of the elements?

I can't get the values of the elements on the website https://www.greatcircus.ru/ to come out. Let's say I need to extract the event names. I run: scrapy shell https://www.greatcircus.ru/ response.css('.schedule-main-tickets-show-title::text').extract() Output: [] What is the error? Please point me in the right direction.
nico_Dibbe asked April 1st 20 at 17:02
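An empty list usually means the data is not in the HTML that Scrapy downloaded (it is rendered by JavaScript) or the selector does not match the markup as actually served. A quick way to check from inside scrapy shell https://www.greatcircus.ru/; the class name is the one from the question:

# Is the data even present in the downloaded HTML?
'schedule-main-tickets-show-title' in response.text   # False -> the markup is built by JavaScript
view(response)                                        # opens the downloaded HTML in a browser
# If the class is missing from response.text, no selector will find it; the data has
# to come from the site's JSON/API requests or a JS-rendering backend (e.g. Splash).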
3
answers

Why does scrapy return gibberish instead of the page, and how do I get the actual HTML of the page (the question is not simple)?

I make a request to that damn site, i.e. booking (https://www.booking.com). The response that comes back is strange gibberish, something like this: [sV7eV>Td{TZ 7UO_/ ϟU9/PDK4kE i6lLuCspPLݢٺ The full text (for those not afraid to face the unimaginable) is here: https://drive.google.com/open?id=1xeGxThHw919zk3l1... the answers are always d...
destini78 asked March 31st 20 at 20:35
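Binary-looking bytes instead of text usually means a compressed response body (e.g. Brotli 'br' or gzip) that never got decompressed on the Scrapy side. A sketch of how to diagnose it and only advertise encodings the crawler can certainly decode; the spider, header values and User-Agent are assumptions, not a verified fix for booking.com:

import scrapy

class BookingCheckSpider(scrapy.Spider):           # hypothetical spider for diagnosis
    name = 'booking_check'
    start_urls = ['https://www.booking.com']
    custom_settings = {
        'DEFAULT_REQUEST_HEADERS': {
            'Accept': 'text/html,application/xhtml+xml',
            'Accept-Encoding': 'gzip, deflate',    # do not request 'br' unless brotli support is installed
            'Accept-Language': 'en-US,en;q=0.9',
        },
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    }

    def parse(self, response):
        # if this shows 'br' or a non-HTML type, the body was never decoded into HTML
        self.logger.info('Content-Encoding: %s', response.headers.get('Content-Encoding'))
        self.logger.info('First bytes: %r', response.body[:100])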