Parsing Yandex.Market: problem with changing pages?

Hi!
I need a large list of ASRock motherboards, and I decided to use Yandex.Market as the source. I ran into a ban plus a very strange thing: from page 7 onward, each page returns only 33 products, even though the setting is still "Show 48". When switching to page 7 and beyond, the page parameter seems to stop having any effect, so the .csv file ends up with duplicates of the same cards (pages 1 to 6 all work OK). Can you tell me what exactly the problem is?

(I use Tor and change the IP every 10 seconds, hence time.sleep(10).)

I'm a Python beginner; if you find the reason, please help and explain in as much detail as possible. Thank you!

import requests
from bs4 import BeautifulSoup
import time
import socks
import socket
import csv

socks.set_default_proxy(socks.SOCKS5, "localhost", 9150)
socket.socket = socks.socksocket

URL = 'https://market.yandex.ru/catalog--materinskie-platy/55323/list?hid=91020&page=1&glfilter=4923257%3A12108404%2C12108414&glfilter=7774847%3A1&glfilter=7893318%3A762104&onstock=0&local-offers-first=0'
HEADERS = {'User-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:75.0) Gecko/20100101 Firefox/75.0','Accept': '*/*'}

HOST = 'https://market.yandex.ru'

FILE = 'AsRock.csv'

def get_html(url, params=None):
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def save_files(items, path):
    with open(path, 'w', newline='') as file:
        writer = csv.writer(file, delimiter=';')
        writer.writerow(['NAME', 'LINK'])
        for item in items:
            writer.writerow([item['title'], item['link']])

def str_subtract(s1, s2):
    # Remove from s1 one occurrence of each character found in s2
    for ch in s2:
        if ch in s1:
            s1 = s1.replace(ch, '', 1)
    return s1

def get_content(html):
    try:
        plate = []
        soup = BeautifulSoup(html, 'html.parser')
        items = soup.find_all('div', class_='n-snippet-card2__part n-snippet-card2__part_type_center')
        for item in items:
            plate.append({
                'title': str_subtract(item.find('h3', class_='n-snippet-card2__title').get_text(strip=True), 'ASRock Motherboard '),
                'link': HOST + item.find('a', class_='link').get('href')
            })
        return plate
    except AttributeError:
        # Return an empty list so plate.extend() in parse() does not crash
        return []

def parse():
    html = get_html(URL)
    if html.status_code == 200:
        plate = []
        for page1 in range(1, 28):
            print(f'Parsing page {page1} of 27...')
            # Pages 1-6 return 48 items each; pages 7+ return 33 each
            if page1 < 7:
                target = page1 * 48
            else:
                target = 288 + (page1 - 6) * 33
            while len(plate) < target:
                html = get_html(URL, params={'page': page1})
                plate.extend(get_content(html.text))
                print(len(plate))
                time.sleep(10)
        save_files(plate, FILE)
    else:
        print('Error')

parse()
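One detail worth noting in the script above: URL already hardcodes page=1 in its query string, and requests appends params to an existing query string rather than replacing it, so every request actually carries both page=1 and page=N; which value the server honors is up to Yandex. A small stdlib sketch of a helper that rewrites the parameter instead (the helper name with_page is my own):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

def with_page(url, page):
    """Return url with its 'page' query parameter replaced (not appended)."""
    parts = urlsplit(url)
    # Drop any existing page=... pair, then add the new one
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != 'page']
    query.append(('page', str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))

base = 'https://market.yandex.ru/catalog--materinskie-platy/55323/list?hid=91020&page=1&onstock=0'
print(with_page(base, 7))
# hid and onstock survive; page=1 is gone, page=7 is the only page value
```

With something like this, the loop could call get_html(with_page(URL, page1)) with no params argument, so each request carries a single unambiguous page value.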
April 19th 20 at 12:44
1 answer
April 19th 20 at 12:46
a very strange thing: from page 7 onward each page returns only 33 products, although the setting is still "Show 48"

I suggest you check for yourself: the Market finds enough items for 6 full pages (48 products each), and page 7 holds only the remaining 33 products. The Market simply has no more results. Naturally, when you request page 8, it immediately falls back to the last page, page 7, and its products.
But on request it returns 27 pages, and pages 7 through 27 each show 33 products - ron_Kilba commented on April 19th 20 at 12:49
By default the Market shows 12 products per page and computes the number of pages from that: 6 * 48 + 33 = 321, and 321 / 12 = 26.75, rounded up to 27. - Cornell.Gerhold commented on April 19th 20 at 12:52
@Cornell.Gerhold, sounds doubtful: the program gets 48 products per page on the first 6 pages in a single request - ron_Kilba commented on April 19th 20 at 12:55
Apparently the request asks for 48 products per page, while the page count is still calculated from 12. - Cornell.Gerhold commented on April 19th 20 at 12:58
@Cornell.Gerhold, I did these calculations too)) and they work out with 48 as well - ron_Kilba commented on April 19th 20 at 13:01
The Market finds 321 products. That is 26 * 12 + 9, or 6 * 48 + 33. This is confirmed by viewing the query results with "show 12" and with "show 48". - Cornell.Gerhold commented on April 19th 20 at 13:04
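The arithmetic in this thread is easy to check; the quoted figures (321 items, 48 per page requested, page count computed at 12 per page) are taken from the comments above:

```python
import math

total_items = 6 * 48 + 33                    # 321 products the Market reports
pages_at_12 = math.ceil(total_items / 12)    # page count as the Market computes it
pages_at_48 = math.ceil(total_items / 48)    # pages actually needed at 48 per page

print(total_items, pages_at_12, pages_at_48)  # 321 27 7
```

So at 48 products per page only 7 pages have real content, and requesting pages 8 through 27 can only re-serve what is already there, which matches the duplicates seen in the CSV.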
@Cornell.Gerhold, take the URL: https://market.yandex.ru/catalog--materinskie-plat...
change the page parameter to page=23, and the link works and returns products for the same request.
Maybe I don't understand something... - ron_Kilba commented on April 19th 20 at 13:07
@Cornell.Gerhold, because I need a list of not only the items in stock; there is no stock filter, and that is reflected in the URL - ron_Kilba commented on April 19th 20 at 13:10
Below the products there is a "Show more" button. Clicking it appends the items of the next page to the list already shown. I took the time to do this and got a combined list of 21 pages, after which the button was hidden. Maybe the program needs to "press" this button. - Cornell.Gerhold commented on April 19th 20 at 13:13
@Cornell.Gerhold, in my opinion that is somewhat incoherent: first you say there are 7 pages, then 21, while by hand I can reach up to 27 pages. I just don't understand what you mean... - ron_Kilba commented on April 19th 20 at 13:16
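Whatever the true page count, the script can defend itself against the duplicate rows described in the question: if a page adds no links that have not been seen before, the real last page has been passed. A minimal sketch (item dicts shaped as in get_content above; the name collect_unique is my own):

```python
def collect_unique(pages):
    """Accumulate item dicts page by page, stopping once a page adds no new links."""
    seen = set()
    result = []
    for items in pages:
        new = [it for it in items if it['link'] not in seen]
        if not new:  # only duplicates: the server re-served the last page
            break
        seen.update(it['link'] for it in new)
        result.extend(new)
    return result

# Toy demonstration: the third "page" is the last page served again
page1 = [{'title': 'A', 'link': '/a'}, {'title': 'B', 'link': '/b'}]
page2 = [{'title': 'C', 'link': '/c'}]
print(len(collect_unique([page1, page2, page2])))  # 3
```

Plugged into parse(), this would end the loop as soon as Yandex starts re-serving the last page, instead of relying on hardcoded per-page counts.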
