How do I find substrings in a string between given characters?

Hello!
Please tell me how to find all substrings located between a known substring (HTML tags). I need to get the first two and the last two occurrences (including the tags).
Here's an example:
<br / >Lorem Ipsum is simply dummy text of the printing and typesetting industry. <br / >Lorem Ipsum has been the industry''s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. br><br>It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.<br> It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker <br>including versions of Lorem Ipsum.<br>Contrary to popular belief, Lorem Ipsum is not simply random text.<br><br> It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.<br><br> Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. br><br>The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.<br>


If two tags appear in a row, they should not be counted. How do I write a regular expression for this?
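Since the question asks for a regular expression, here is a minimal sketch of one, assuming the only tags in the text are `<br>` variants (`<br>`, `<br/>`, `<br / >`) and that the framed text itself never contains `<`. The names `BR`, `FRAGMENT`, `framed_fragments` and `first_and_last_two` are hypothetical, chosen for illustration. Each match is an opening tag, some non-blank text, and a closing tag captured inside a lookahead, so the closing tag can also open the next fragment. Two tags in a row produce no match because the text part must contain at least one non-whitespace character.

```python
import re

# Hypothetical pattern for a <br> tag in the spellings seen in the sample:
# <br>, <br/>, <br / > (case-insensitive).
BR = r"<\s*br\s*/?\s*>"

# Group 1: opening tag. Group 2: non-blank text containing no "<".
# Group 3: closing tag, captured inside a lookahead so it is not consumed
# and can serve as the opening tag of the next fragment.
FRAGMENT = re.compile(rf"({BR})([^<]*[^<\s][^<]*)(?=({BR}))", re.IGNORECASE)


def framed_fragments(text: str) -> list[str]:
    """Return every 'tag + text + tag' fragment; empty '<br><br>' pairs are skipped."""
    return [m.group(1) + m.group(2) + m.group(3) for m in FRAGMENT.finditer(text)]


def first_and_last_two(text: str) -> list[str]:
    """First two and last two fragments (they overlap if there are fewer than four)."""
    parts = framed_fragments(text)
    return parts[:2] + parts[-2:]
```

For example, `first_and_last_two("<br / >First.<br>Second.<br><br>Third.<br>Fourth.<br>Fifth.<br>")` yields the `First.`, `Second.`, `Fourth.` and `Fifth.` fragments with their tags. As soon as the text may contain other markup, this breaks down, which is why a parser is the more robust route.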
June 10th 19 at 15:26
1 answer
June 10th 19 at 15:28
Solution
Use a DOM parser and XSLT. For parsing HTML, that is much simpler than regexps.
Thank you. Could you tell me how to do it properly? I'm now trying with BS and lxml, but it hasn't worked out yet. - arch_Zboncak commented on June 10th 19 at 15:31
Oops, my mistake in the answer: not XSLT, but XPath.
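A minimal sketch of the DOM parser + XPath approach, assuming `lxml` is installed. The XPath expression and the helper name `xpath_texts_between_br` are my own illustration, not taken from the thread. It selects non-blank text nodes whose nearest preceding and following sibling nodes are both `<br>` elements; two tags in a row have no text node between them, so they are skipped automatically.

```python
from lxml import html as lxml_html

# XPath 1.0: non-blank text nodes whose nearest preceding sibling node and
# nearest following sibling node are both <br> elements.
BETWEEN_BR = (
    ".//text()[normalize-space()]"
    "[preceding-sibling::node()[1][self::br]]"
    "[following-sibling::node()[1][self::br]]"
)


def xpath_texts_between_br(fragment: str) -> list[str]:
    """Return the text framed by <br> tags, in document order."""
    # Wrap the HTML fragment in a <div> so it has a single root element.
    root = lxml_html.fragment_fromstring(fragment, create_parent="div")
    # XPath text results are str subclasses; convert them to plain str.
    return [str(t) for t in root.xpath(BETWEEN_BR)]
```

The result is a plain list, so the first two and last two fragments are `parts[:2] + parts[-2:]`.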

I don't understand which part of this string you need to select. - kavon.Murphy commented on June 10th 19 at 15:34
I need a list of the text fragments (framed by br) between the tags; after that I'll pick the ones I need by index.
Since br is not a paired tag, this:
from bs4 import BeautifulSoup
with open("filename.html", "r") as f:
    bsobj = BeautifulSoup(f, 'lxml')
bsobj.find("br").text  # same result with find_all

returns an empty string. Or am I doing something wrong? - arch_Zboncak commented on June 10th 19 at 15:37
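The empty string is expected: `<br>` is a void element with no children, so its `.text` is always empty. The text lives in the sibling node that follows each `<br>`. A minimal sketch with BeautifulSoup, assuming `bs4` is installed; `html.parser` is swapped in for `lxml` only to avoid an extra dependency, and the helper name `bs_texts_between_br` is hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def bs_texts_between_br(html: str) -> list[str]:
    """Return non-blank text nodes that sit between two <br> tags."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for br in soup.find_all("br"):
        node = br.next_sibling  # the node right after this <br>
        if isinstance(node, NavigableString) and node.strip():
            after = node.next_sibling
            # Keep the text only if another <br> closes the fragment.
            if isinstance(after, Tag) and after.name == "br":
                result.append(str(node))
    return result
```

With `parts = bs_texts_between_br(html)`, the first two and last two fragments are `parts[:2] + parts[-2:]`.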
https://ideone.com/iVFwj7, if I understand correctly. - kavon.Murphy commented on June 10th 19 at 15:40
Thank you very much! I'll check tomorrow, but it looks like what I need! - arch_Zboncak commented on June 10th 19 at 15:43
So? - kavon.Murphy commented on June 10th 19 at 15:46
Only got home now, will check. - arch_Zboncak commented on June 10th 19 at 15:49
Thank you, it works. The output is a bit strange and will need some post-processing later, but that's not a problem; I'll deal with it.
Thank you again! - arch_Zboncak commented on June 10th 19 at 15:52
You're welcome :) - kavon.Murphy commented on June 10th 19 at 15:55
