Questions tagged [Beautiful Soup] (60)

4
answers

Beautiful Soup, html5lib or lxml?

It is intended for parsing user content, so the main requirement is correct handling of broken HTML. Speed is not critical. The lxml documentation lists: BeautifulSoup Parser, html5lib Parser. That is, lxml can parse using these libraries and return an lxml tree. The html5lib docs say: Suppo...
weston72 asked October 7th 19 at 11:15
3
answers

How to parse a large XML file (>500 MB) that contains errors in Python?

Conditions: 1) There is a large XML file, so loading it all into memory is not an option. 2) The file contains errors: the text inside tags can include unescaped HTML tags that are not closed. lxml and SAX can parse the file incrementally, but they fail on unclosed tags inside tags. BeautifulSoup does not fail on unclosed ta...
Willow17 asked October 3rd 19 at 17:09
1
answer

How to parse a specific piece of code?

<style type="text/css"> ..... </style> <p> Time </p> Two <> Three <p></p> to
Erich5 asked September 26th 19 at 15:31
2
answers

How to remove multiple line breaks?

Text example: Lorem Ipsum is text-"fish", often used in print and web design. Lorem Ipsum is standard "fish" for texts in Latin from the beginning of the XVI century. (1) While some unnamed printer created a large collection of sizes and forms of fonts, using Lorem Ipsum for printing samples. Lorem Ipsum has survived not on...
Erich5 asked September 26th 19 at 15:18
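
For the line-break question above, a minimal sketch using only the standard library. It assumes the goal is to collapse every run of consecutive line breaks (including whitespace-only lines between them) into a single line break; the excerpt is truncated, so the exact requirement is a guess. The function name collapse_blank_lines and the sample string are illustrative, not from the question.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse runs of line breaks (and whitespace-only lines) into one."""
    # Normalize Windows and old Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline followed by one or more whitespace-only lines
    # is replaced by a single newline.
    return re.sub(r"\n(?:[ \t]*\n)+", "\n", text)


if __name__ == "__main__":
    sample = "Lorem Ipsum\n\n\nis filler text.\r\n\r\n  \nEnd."
    print(collapse_blank_lines(sample))
```

If paragraph gaps should survive, replacing with "\n\n" instead of "\n" keeps one empty line between blocks.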
3
answers

How to parse the text in a div, ignoring nested tags, with BeautifulSoup?

How do I extract part of a div like this: <div class="example"> <p>bla-bla-bla</p> the <div>something not important</div> <a>SomeText</strong> <br / > Desired text <span style="color:red">Also need a text</span> Desired text </div> The problem is that the text that yo...
Jamel.Hudson24 asked September 18th 19 at 12:12
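
One possible reading of the div question above is "keep only the text that sits directly inside the div, and skip everything inside nested tags". A rough sketch under that assumption follows; it uses a simplified, well-formed version of the markup, and the helper name direct_text is hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString


def direct_text(tag):
    """Return the non-empty text nodes that are direct children of tag."""
    return [
        str(child).strip()
        for child in tag.children
        # NavigableString children are bare text; nested tags are skipped.
        if isinstance(child, NavigableString) and child.strip()
    ]


# Simplified sample markup (an assumption, not the asker's exact HTML).
html = (
    '<div class="example"><p>bla-bla-bla</p>Desired text'
    "<span>nested</span> more desired text</div>"
)
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="example")

if __name__ == "__main__":
    print(direct_text(div))
```

If the text of some nested tags (such as the span) is also wanted, the filter would need to allow those tags explicitly.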
1
answer

How to enter data in textarea?

Good afternoon. I want to enter data in a textarea. The problem is that it doesn't have a name attribute. Here is the code: <textarea dir="ltr" tabindex="-1" role="textbox" aria-label="Rich text editor, vB_Editor_QR_editor, press ALT 0 for help." class="cke_source cke_enable_context_menu" style="width: 100%; height: 10...
Nicholas.Schinner asked September 16th 19 at 17:51
1
answer

How to use Python to highlight words in a document without breaking the DOM?

I need to highlight specified words in an HTML document. I'm trying to do this with BeautifulSoup and regular expressions. But if you simply replace the words and build a new document, it can end up "broken": for example, pieces of JavaScript code may appear. import urllib2 import re from bs4 import BeautifulSoup html = urllib2.urlo...
sylvia3 asked September 15th 19 at 23:41
0
answers

How do I work with a ResultSet?

I'm trying to write a parser with BeautifulSoup, but I ran into a problem: after the findall method, the soup returns a ResultSet. Which methods work with it, and how do I convert it into a list or a string?
Tia.Parisian asked September 13th 19 at 15:37
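
For the ResultSet question above: in bs4, a ResultSet behaves like a regular Python list of Tag objects, so the usual list operations apply. The sketch below shows a few common conversions; the sample HTML and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the question.
html = "<ul><li>one</li><li>two</li></ul>"
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all("li")  # ResultSet: a list subclass holding Tag objects

as_list = list(items)                          # plain list of Tag objects
texts = [li.get_text() for li in items]        # text content of each tag
as_string = "".join(str(li) for li in items)   # markup joined into one string
first = items[0]                               # indexing works as on a list

if __name__ == "__main__":
    print(texts)
    print(as_string)
```

Methods such as get_text() belong to the individual tags, not to the ResultSet, so they have to be called per element, as in the list comprehension.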
1
answer

Encoding error in python3.5 when writing data to csv?

#!/usr/bin/env python # -*- coding: utf-8 -*- import urllib.request from bs4 import BeautifulSoup import csv #url0 = 'http://journals.indexcopernicus.com/masterlist.php?page=%s' %(page) p = [] url1 = '&2&1&1&cntr%5B%5D=UKR&icv_from=0&icv_to=176' url0 = 'http://journals.indexcopernicus.com/masterlis...
Kaela12 asked September 4th 19 at 20:44
1
answer

How to get the content attribute, when the attribute has another attribute?

There is code that finds the desired part of the HTML: online_status = soup.find('div', {'id': 'profile_online_lv'}) If the element does not exist in the soup, online_status is None. If it does, the matching part of the markup is returned. The question is how to check the contents of this attribute (and an...
jessy_Bartoletti asked September 4th 19 at 20:13
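
For the question above, a hedged sketch of checking the result of find() before reading anything from it. The excerpt is cut off, so what exactly needs to be read is an assumption: here the element's text and its class attribute. The sample markup and the function name describe_status are hypothetical.

```python
from bs4 import BeautifulSoup


def describe_status(html):
    """Return (text, css_classes) of the status div, or (None, []) if absent."""
    soup = BeautifulSoup(html, "html.parser")
    online_status = soup.find("div", {"id": "profile_online_lv"})
    if online_status is None:
        # find() returns None when nothing matches.
        return None, []
    text = online_status.get_text(strip=True)
    # .get() returns the default instead of raising if the attribute is missing;
    # "class" is multi-valued, so bs4 returns it as a list.
    css_classes = online_status.get("class", [])
    return text, css_classes


# Assumed sample markup for demonstration only.
sample = '<div id="profile_online_lv" class="online"><b>Online</b></div>'

if __name__ == "__main__":
    print(describe_status(sample))
    print(describe_status("<p>no status here</p>"))
```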