Questions tagged [Parsing] (1687)

2 answers

How best to implement a check for the latest news in a parser?

I have a parser that scrapes 10 resources. It works from a database, that is, it takes the rules for pulling content from a table. After that I can get the link to each news item, its title, date, and content. Collecting links, titles, dates, and content is each done by a separate function. And after the obtained links to news, headlines, dates, conte...
Prince.Rodriguez asked March 27th 20 at 13:40
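One common way to check for the latest news is to remember which article URLs have already been processed and pass only the new ones to the title/date/content functions. Below is a minimal sketch. It assumes each article is identified by its URL. The table name `seen_news` and the functions `make_store` and `filter_new` are hypothetical, and an in-memory SQLite database stands in for the parser's real DB:

```python
import sqlite3

def make_store():
    """Create an in-memory table of already-processed news URLs (stand-in for the real DB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seen_news (url TEXT PRIMARY KEY)")
    return conn

def filter_new(conn, links):
    """Return only the links not seen on earlier runs, and remember them."""
    new_links = []
    for url in links:
        # The PRIMARY KEY makes a duplicate insert a no-op instead of an error.
        cur = conn.execute("INSERT OR IGNORE INTO seen_news (url) VALUES (?)", (url,))
        if cur.rowcount == 1:  # a row was inserted, so this URL is new
            new_links.append(url)
    conn.commit()
    return new_links

conn = make_store()
print(filter_new(conn, ["/news/1", "/news/2"]))             # ['/news/1', '/news/2']
print(filter_new(conn, ["/news/2", "/news/3", "/news/3"]))  # ['/news/3']
```

With MySQL the same idea works through a UNIQUE index on the URL column combined with `INSERT IGNORE`.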
1 answer

How to properly organize work with a large amount of frequently updated data?

The problem: there is a dataset of a few hundred thousand rows (tens of megabytes if placed in a MySQL table). This data must be fetched from a third-party source several times per minute, reformatted on the fly with some calculations, and then saved; the cached data must be available at any ...
river_Volkman asked March 27th 20 at 13:39
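If cached data must stay readable while a refresh is running, a common pattern is to build the new copy off to the side and then swap it in with a single step. In MySQL, the equivalent is loading into a staging table and swapping it in with one `RENAME TABLE` statement, which MySQL performs atomically. Below is a minimal in-memory sketch of the same idea. The class name `SnapshotCache` and the `rows`/`transform` arguments are hypothetical:

```python
import threading

class SnapshotCache:
    """Readers always see a complete snapshot; refresh builds a new one off to the side."""

    def __init__(self):
        self._snapshot = {}
        self._refresh_lock = threading.Lock()  # allow only one refresh at a time

    def get(self, key, default=None):
        # Reads the current dict reference once; a refresh never mutates that dict.
        return self._snapshot.get(key, default)

    def refresh(self, rows, transform):
        with self._refresh_lock:
            new_snapshot = {}
            for row in rows:
                key, value = transform(row)  # on-the-fly reformatting/calculation
                new_snapshot[key] = value
            # A single rebinding: readers see either the old dict or the new one, never a half-built one.
            self._snapshot = new_snapshot

cache = SnapshotCache()
cache.refresh([("a", "1"), ("b", "2")], lambda r: (r[0], int(r[1])))
print(cache.get("b"))  # 2
```

The design choice is that readers never take a lock. Only writers serialize among themselves, which suits a workload that refreshes a few times per minute but is read constantly.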
1 answer

How to trim extra text from parsed data in Python?

Hello! My question is how to trim parsed data. I parse dates from 10 news resources, and when parsing one of them I run into trouble. When I parse the date from this resource, it comes out like this: How can I make the date come out as: 23 Aug 2019 10:25 Code: # < Collect date pages. def get_item_dateti...
mafalda.Wilkinson asked March 27th 20 at 13:36
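The excerpt does not show what the unwanted output looks like. The input string below is therefore hypothetical: a date surrounded by extra text. The sketch cuts the date out with a regular expression and then re-formats it with `datetime` into the `23 Aug 2019 10:25` shape from the question. `clean_date` and `DATE_RE` are illustrative names:

```python
import re
from datetime import datetime

# Assumption: the date sits inside noisy text such as "Published: 23 Aug 2019, 10:25 | Politics".
DATE_RE = re.compile(r"(\d{1,2} [A-Za-z]{3} \d{4}),?\s+(\d{1,2}:\d{2})")

def clean_date(raw):
    """Cut the '23 Aug 2019 10:25' part out of a noisy string, or return None."""
    match = DATE_RE.search(raw)
    if not match:
        return None
    # Parsing validates the value; strftime then produces one consistent format.
    parsed = datetime.strptime(f"{match.group(1)} {match.group(2)}", "%d %b %Y %H:%M")
    return parsed.strftime("%d %b %Y %H:%M")

print(clean_date("Published: 23 Aug 2019, 10:25 | Politics"))  # 23 Aug 2019 10:25
```

`%b` matches English month abbreviations under the default C locale. Sites that print month names in another language would need a different pattern or a library such as `dateparser`.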
2 answers

How to extract the contents of a script tag in PHP?

I need to get data from a script tag on the page. For example: <script> window.runParams = { data: {"actionModule": 123} } </script> I haven't found this functionality in Simple HTML DOM. What can I use for this, so as not to reinvent the wheel?
justus.Mayert asked March 27th 20 at 13:30
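The question is about PHP, but the approach does not depend on the language. First find the script element. Then cut the JSON object out of the JavaScript with a regular expression, and finally decode it as JSON. In PHP, the last two steps map to `preg_match` and `json_decode`. Below is a minimal Python sketch using only the standard library. The regex assumes the exact `window.runParams = { data: {...} }` shape from the question and a flat object; nested braces inside `data` would need a real parser. `ScriptCollector` and `extract_run_params_data` are hypothetical names:

```python
import json
import re
from html.parser import HTMLParser

class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data

# Assumed shape: the JSON object follows "data:" inside window.runParams (flat object only).
RUN_PARAMS_RE = re.compile(r"window\.runParams\s*=\s*\{\s*data:\s*(\{.*?\})", re.S)

def extract_run_params_data(html):
    """Return the decoded data object from window.runParams, or None if absent."""
    parser = ScriptCollector()
    parser.feed(html)
    for script in parser.scripts:
        match = RUN_PARAMS_RE.search(script)
        if match:
            return json.loads(match.group(1))
    return None

page = '<script> window.runParams = { data: {"actionModule": 123} } </script>'
print(extract_run_params_data(page))  # {'actionModule': 123}
```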
2 answers

How to convert a date to my desired format?

Hello! I parse the date from one page, and the parsing result looks like this: 2019-08-22T00:01:00Z Code: def get_data(html): soup = BeautifulSoup(html, 'lxml') item_datetime=soup.find('meta',{'itemprop':'dateCreated'}) item_datetime=dateparser.parse(item_datetime,date_formats=['%d %B %Y %H']) print(item_datetime) This format for my...
bonita_Little asked March 27th 20 at 13:28
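Two details in the excerpt are worth noting. First, `soup.find('meta', ...)` returns a tag object, and the timestamp itself lives in the tag's `content` attribute (for example `item_datetime['content']`). Second, `date_formats` tells `dateparser` how to read the input; it does not control how the result is printed. Because `2019-08-22T00:01:00Z` is standard ISO 8601, the standard library can parse it directly, and `strftime` produces the output. The desired format is cut off in the excerpt, so the `%d %b %Y %H:%M` default below is an assumption:

```python
from datetime import datetime

def reformat_iso(value, fmt="%d %b %Y %H:%M"):
    """Turn '2019-08-22T00:01:00Z' into e.g. '22 Aug 2019 00:01' (target format assumed)."""
    # %z accepts a literal 'Z' (UTC) since Python 3.7.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    return parsed.strftime(fmt)

# With BeautifulSoup, the string to pass in would be item_datetime['content'], not the tag.
print(reformat_iso("2019-08-22T00:01:00Z"))  # 22 Aug 2019 00:01
```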
2 answers

How to parse HTML that constantly changes (structure, tags, classes, etc.) with each request?

For example: <div class="DFsfE5qr"> <div class="etgF_2">300 UAH</div> <div class="etgFsdf">USB flashlight</div> </div> Or it may look like this: <div class="DFghrtqr"> <div></div> <div class="eerg_2">300 UAH</div> <div class="etergf">USB flashlight</div> &...
leonor76 asked March 27th 20 at 13:13
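When class names are randomized, one option is to ignore them and anchor on the content instead. In both variants above, the price text matches a stable pattern, and the product name is the next piece of text. Below is a minimal heuristic sketch built on that assumption. It relies on two hypothetical rules: prices look like `300 UAH`, and the name directly follows the price. `TextCollector`, `PRICE_RE`, and `extract_items` are illustrative names:

```python
import re
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects non-empty text nodes in document order, ignoring tags and classes."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)

PRICE_RE = re.compile(r"^\d+(?:[.,]\d+)?\s*UAH$")  # assumption: prices look like "300 UAH"

def extract_items(html):
    """Return (price, name) pairs, taking the text right after a price as the name."""
    collector = TextCollector()
    collector.feed(html)
    texts = collector.texts
    items = []
    for i, text in enumerate(texts):
        if PRICE_RE.match(text) and i + 1 < len(texts):
            items.append((text, texts[i + 1]))
    return items

a = '<div class="DFsfE5qr"><div class="etgF_2">300 UAH</div><div class="etgFsdf">USB flashlight</div></div>'
b = '<div class="DFghrtqr"><div></div><div class="eerg_2">300 UAH</div><div class="etergf">USB flashlight</div></div>'
print(extract_items(a))  # [('300 UAH', 'USB flashlight')]
print(extract_items(b))  # [('300 UAH', 'USB flashlight')]
```

The heuristic breaks if the site reorders the price and name, so it is worth logging pages where no price is found.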
1 answer

How to extract data from XML?

I have this XML: <timedtext format="3"> <head> <pen id="1" fc="#E5E5E5"/> <pen id="2" fc="#CCCCCC"/> <ws id="0"/> <ws id="1" mh="2" ju="0" sd="3"/> <wp id="0"/> <wp ...
Edd asked March 27th 20 at 13:12
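Below is a minimal sketch using the standard-library `xml.etree.ElementTree`. The `<head>` part mirrors the XML in the question. The `<body>` with `<p t="..." d="...">` caption lines is an assumption, because the excerpt is cut off before the body. `read_pens` and `read_captions` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# <head> mirrors the question; the <body> with <p t=".." d=".."> is an ASSUMED structure.
SAMPLE = """<timedtext format="3">
  <head>
    <pen id="1" fc="#E5E5E5"/>
    <pen id="2" fc="#CCCCCC"/>
    <ws id="0"/>
  </head>
  <body>
    <p t="0" d="1500">Hello</p>
    <p t="1500" d="2000"><s>two</s><s> words</s></p>
  </body>
</timedtext>"""

def read_pens(root):
    """Map pen id -> foreground colour (the fc attribute)."""
    return {pen.get("id"): pen.get("fc") for pen in root.iter("pen")}

def read_captions(root):
    """Return (start, duration, text) for every <p>, with t and d taken as integers."""
    captions = []
    for p in root.iter("p"):
        text = "".join(p.itertext())  # also joins text from nested <s> elements
        captions.append((int(p.get("t", 0)), int(p.get("d", 0)), text))
    return captions

root = ET.fromstring(SAMPLE)
print(read_pens(root))      # {'1': '#E5E5E5', '2': '#CCCCCC'}
print(read_captions(root))  # [(0, 1500, 'Hello'), (1500, 2000, 'two words')]
```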
2 answers

How to fix IndexError: list index out of range in Python?

I have code that takes the parsing data (the rules for pulling content from a page) from the database. This data is then passed to different functions. import requests from bs4 import BeautifulSoup import pymysql def get_html(url): r = requests.get(url) return r.text # < Get links def get_resource_links(resource_p...
Enoch_Keeli asked March 27th 20 at 13:07
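The excerpt is cut off before the failing line, so the exact cause is not visible. In a DB-driven parser, two frequent sources of this error are indexing into an empty `findAll()` result and indexing a rule row that has fewer fields than expected. Below is a minimal sketch that guards against both. The helper names and the three-field rule layout (tag, attribute, value) are hypothetical:

```python
def first_or_default(items, default=None):
    """Return items[0] for a non-empty list, else default, instead of raising IndexError."""
    return items[0] if items else default

def unpack_rule(row, expected=3):
    """Check a rule row from the DB before indexing into it (hypothetical 3-field layout)."""
    if row is None or len(row) < expected:
        size = 0 if row is None else len(row)
        raise ValueError(f"rule row has {size} fields, expected {expected}: {row!r}")
    return tuple(row[:expected])

# Usage: an empty search result no longer crashes, and a short DB row fails with a clear message.
links = []  # e.g. what soup.findAll(...) returns when nothing matches
print(first_or_default(links))                     # None
print(unpack_rule(("div", "class", "news-link")))  # ('div', 'class', 'news-link')
```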
1 answer

How to fix NameError: name 'html' is not defined?

Hello! I have code like this: import requests from bs4 import BeautifulSoup import pymysql def get_html(url): r = requests.get(url) return r.text # < Collection links. def get_links(html): soup = BeautifulSoup(html, 'lxml') links=soup.findAll(link_container_array[0],{link_container_array[1]:link_container_array[2]})...
shanny88 asked March 27th 20 at 12:57
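This error usually means `html` is used at module level, for example in a call like `get_links(html)`, while it only exists as a function parameter. The fix is to create the name in the calling scope first, typically with `html = get_html(url)`, and then pass it in. Below is a minimal sketch of that flow. It uses the standard-library `HTMLParser` instead of BeautifulSoup and a stubbed `get_html` so it runs offline. The URL and markup are hypothetical:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects href attributes of <a> tags (stdlib stand-in for soup.findAll)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def get_html(url):
    # Offline stand-in for requests.get(url).text (assumption for this sketch).
    return '<div class="news"><a href="/news/1">One</a><a href="/news/2">Two</a></div>'

def get_links(html):
    # 'html' is a parameter: it exists only inside this function.
    collector = HrefCollector()
    collector.feed(html)
    return collector.links

def main():
    url = "https://example.com/news"  # hypothetical URL
    html = get_html(url)              # 1) create the name in the caller's scope
    return get_links(html)            # 2) pass it in explicitly

print(main())  # ['/news/1', '/news/2']
```

The same applies to `link_container_array` in the question: passing the rule into `get_links` as a second argument avoids the same kind of NameError for globals that were never defined.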