Is there a way to remove the extra closing tags when parsing?

I'm parsing content from a website, and the result contains a lot of extra closing tags, which breaks my own layout.
I tried this:
$content = preg_replace("/<\/?div[^>]*\>/i", "", $content);
but it does not work... Has anyone run into this?
June 8th 19 at 17:01
2 answers
June 8th 19 at 17:03
You need to filter the HTML markup.
HTML Purifier will do the job if it is configured properly.
June 8th 19 at 17:05
You can also parse the page with DOMDocument and get the body content without tags:
$url = 'http://yandex.ru';
$result = file_get_contents($url);

$dom = new \DOMDocument();
libxml_use_internal_errors(true);
/* By default loadHTML uses iso-8859-1, so explicitly specify the conversion */
$dom->loadHTML(mb_convert_encoding($result, 'HTML-ENTITIES', 'UTF-8'));
libxml_use_internal_errors(false);
$bodyContent = $dom->getElementsByTagName('body')[0]->textContent;


The text will still contain unwanted parts, such as scripts and styles, but you can remove them from the HTML with a regular expression before creating the DOMDocument.
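A minimal sketch of that pre-cleaning step, assuming the scripts and styles are ordinary paired tags. The helper name stripScriptsAndStyles and the sample markup are made up for illustration, and a regular expression like this is only a rough filter, not a full HTML parser:

```php
<?php
// Hypothetical helper: removes <script>...</script> and <style>...</style>
// blocks from raw HTML before it is handed to DOMDocument.
function stripScriptsAndStyles(string $html): string
{
    // "i" ignores case, "s" lets "." match newlines; the lazy ".*?" stops at
    // the first closing tag that matches the opening one (backreference \1).
    // preg_replace returns null on error, so fall back to the original input.
    return preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $html) ?? $html;
}

// Sample input, invented for this example.
$html  = '<body><style>p{color:red}</style><p>Hello</p><script>alert(1);</script></body>';
$clean = stripScriptsAndStyles($html);

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($clean);
libxml_clear_errors();
libxml_use_internal_errors(false);

$bodyText = $dom->getElementsByTagName('body')->item(0)->textContent;
echo $bodyText, PHP_EOL;
```

For real pages, the encoding conversion shown in the snippet above would still be needed before loadHTML.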

If you don't need the entire body, you can get the content of individual elements instead.
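A rough sketch of picking out a single element, assuming it can be found by an id attribute. The id "content" and the sample markup are invented for this example. textContent gives the plain text, while saveHTML with a node argument returns the element's markup as re-serialized by the parser, so a stray closing tag in the input does not survive; libxml may add line breaks of its own to that output:

```php
<?php
// Hypothetical input: note the stray closing </div> at the end.
$html = '<div id="content"><p>First</p><p>Second</p></div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // the stray tag is reported as a parser error; keep it quiet
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

// Locate the element with XPath; the id "content" is an assumption of this example.
$xpath = new DOMXPath($dom);
$node  = $xpath->query('//div[@id="content"]')->item(0);

$text      = $node->textContent;     // plain text only, no tags
$outerHtml = $dom->saveHTML($node);  // markup of this element, rebuilt from the DOM tree

echo $text, PHP_EOL;
echo $outerHtml, PHP_EOL;
```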
The thing is, that is exactly how I do the parsing: I take the contents of the required element, and that is where the extra closing div ends up - santina61 commented on June 8th 19 at 17:08
Also, I need the HTML content, so textContent will not work - santina61 commented on June 8th 19 at 17:11
