How do I remove the text between two tags in XML?

Good evening!

Please help me remove the text between the description tags:
pastebin.com/bm31r1sh

I tried it myself this way, but without success:
sed 's#\(<description>\).*\(</description>\)#\1'xxxxx'\2#g' test.xml > test2.xml
sed '/<description>/,/<\/description>/{//!d}' test.xml > test2.xml


The file itself is very large, and XML utilities crash while processing it.

Thanks in advance!
July 8th 19 at 11:51
3 answers
July 8th 19 at 11:53
First the text is prepared by inserting markers, then it is processed. That is why sed is invoked twice.
text="\
a
b
c
x1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<description>
 Toilet water spray
 the <br>EASTERN GLASS
 the <br>Man style is focused on success, self-motivated and creative. Masculinity, dignity, confidence – its main features, along with sensuality and romance. He loves comfort, beautiful things, luxurious life, and expresses itself in the classic pok$
 <br / >Keywords: Confident, dynamic, courageous, noble, elegant, high-status successful</description>y1yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
x2xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<description>
 Toilet water spray
 the <br>EASTERN GLASS
 the <br>Man style is focused on success, self-motivated and creative. Masculinity, dignity, confidence – its main features, along with sensuality and romance. He loves comfort, beautiful things, luxurious life, and expresses itself in the classic pok$
 <br / >Keywords: Confident, dynamic, courageous, noble, elegant, high-status successful</description>y2yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
d
e
f
"

echo -n "$text"

echo -n "$text" | sed 's%<description>%&x|||%g; s%</description>%|||x&%g' \
 | sed '/x|||/ { :join N; /|||x/! b join ; s/x|||.*|||x// }'

Output
[guest@localhost ~]$ text="\
> a
> b
> c
> x1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<description>
> Toilet water spray
> <br>ORIENTAL FOUGERES
> <br>Man style is focused on success, self-motivated and creative. Masculinity, dignity, confidence – its main features, along with sensuality and romance. He loves comfort, beautiful things, luxurious life, and expresses itself in the classic pok$
> . <br>Keywords: Confident, dynamic, courageous, noble, elegant, high-status successful</description>y1yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
> x2xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<description>
> Toilet water spray
> <br>ORIENTAL FOUGERES
> <br>Man style is focused on success, self-motivated and creative. Masculinity, dignity, confidence – its main features, along with sensuality and romance. He loves comfort, beautiful things, luxurious life, and expresses itself in the classic pok$
> . <br>Keywords: Confident, dynamic, courageous, noble, elegant, high-status successful</description>y2yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
> d
> e
> f
> "
[guest@localhost ~]$ 
[guest@localhost ~]$ echo -n "$text"
a
b
c
x1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<description>
 Toilet water spray
 the <br>EASTERN GLASS
 the <br>Man style is focused on success, self-motivated and creative. Masculinity, dignity, confidence – its main features, along with sensuality and romance. He loves comfort, beautiful things, luxurious life, and expresses itself in the classic pok$
 <br / >Keywords: Confident, dynamic, courageous, noble, elegant, high-status successful</description>y1yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
x2xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<description>
 Toilet water spray
 the <br>EASTERN GLASS
 the <br>Man style is focused on success, self-motivated and creative. Masculinity, dignity, confidence – its main features, along with sensuality and romance. He loves comfort, beautiful things, luxurious life, and expresses itself in the classic pok$
 <br / >Keywords: Confident, dynamic, courageous, noble, elegant, high-status successful</description>y2yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
d
e
f
[guest@localhost ~]$ 
[guest@localhost ~]$ echo -n "$text" | sed 's%<description>%&x|||%g; s%</description>%|||x&%g' \
> | sed '/x|||/ { :join N; /|||x/! b join ; s/x|||.*|||x// }'
a
b
c
x1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<description></description>y1yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
x2xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx<description></description>y2yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
d
e
f
[guest@localhost ~]$

July 8th 19 at 11:55


Thank you very much. When I run your example in the console, everything works as expected.
But on the original file it deletes too much:
gross-trading.com/feed.xml

I am going by the number of tags:
cat feed.xml|grep "|wc -l
"But on the original file it deletes too much"

I took a look:
[guest@localhost t]$ head -1 feed.xml | wc -c
999692
[guest@localhost t]$

The first line of the file is far too long; sed is not meant for lines of that size. The standard only guarantees lines up to 8 KB (POSIX requires sed's pattern space to hold at least 8192 bytes). Whatever an implementation accepts beyond that is not guaranteed, so errors such as truncation are possible.

Most likely you need to load the data into some kind of database (SQLite, for example) and do the operations there.
(You could of course break up the long lines and then process them with sed, but it is not worth it; sed simply does not fit here.) - Ramon.Stracke commented on July 8th 19 at 11:58
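The check done above with `head -1 feed.xml | wc -c` can be extended to every line of the file before deciding whether sed is a safe choice. This is only a minimal sketch: it assumes your awk copes with very long lines, and the function name `longest_line` is made up for illustration.

```shell
# Hypothetical helper: print the length of the longest line read from stdin.
# length() counts characters in the current locale, which may differ from bytes.
longest_line() {
  awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Possible usage (feed.xml is the file from the thread):
# longest_line < feed.xml
```

If the number printed is well above 8192, the behaviour of sed on that file depends on the implementation.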
July 8th 19 at 11:57
If I understand the question correctly, in Perl it could look like this:

use strict;
use warnings;
use utf8;

my $text = <<'EOF';
<description>
 Toilet water spray
 the <br>EASTERN GLASS
 the <br>Man style is focused on success, self-motivated and creative. Masculinity, dignity, confidence – its main features, along with sensuality and romance. It
 loves comfort, beautiful things, luxurious life, and expresses itself in the classic pok$
 <br / >Keywords: Confident, dynamic, courageous, noble, elegant, high-status successful</description>

EOF

$text =~ s{<description>.*?</description>}{<description></description>}gsi;

print $text;


The result:
<description></description>
If I apply it to the whole file, it breaks the XML structure. Here is what I tried:
#!/usr/bin/perl

use strict;
use warnings;
use utf8;

open(INDAT, "alex.xml");
my $text = <INDAT>;
$text =~ s{<description>.*?</description>}{<description></description>}gsi;

print $text;
- Ramon.Stracke commented on July 8th 19 at 12:00
You are not doing it right.
If you want to read the file into a variable, use something like this:

$text = join('', <FILE>); - Jimmy.Hudson commented on July 8th 19 at 12:03
my $text = join('', <FILE>); - Carroll_Cass commented on July 8th 19 at 12:06
Thanks, I tried it and it worked, but it removes too much :))

I suspect this is because a single line contains several tags, so other content gets removed as well.
File:
gross-trading.com/file.gz
Do you perhaps have a Perl or PHP script that can delete the contents of a tag directly in the file, without loading it into memory?

The only thing I have found is the XML::Twig library, which seems able to cope with this task...

Thank you - Carroll_Cass commented on July 8th 19 at 12:09
The file is only 300 MB, that is nothing. It can be processed entirely in RAM.
To speed things up, you can try this module:
use File::Slurp;
my $text = read_file( 'firms.xml', binmode => ':utf8' );


Could you give an example of text on which it works incorrectly?
That is, a piece of text from the file where the regular expression cuts out something that is needed. - Jimmy.Hudson commented on July 8th 19 at 12:12
I downloaded your big file and ran this from the console.
It reads the file line by line
and then saves the result to the file out.

perl -e 'open(InFile, "file"); while(<InFile>){s!<description>.*?</description>!<description></description>!gsi; print;}' > out


Run time: 2 seconds.

The file was 351M before processing and 331M after. - Carroll_Cass commented on July 8th 19 at 12:15
perl -e 'open(InFile, "file"); while(<InFile>){s/\n|\r//g; s!<description>.*?</description>!<description></description>!gsi; print;}' > out


This version also removes line breaks. - Carroll_Cass commented on July 8th 19 at 12:18


It worked, thank you!

Could you write a parser in Perl for the file linked above (I will pay)?
It needs to be converted to CSV.
My email: bikalexander@gmail.com - Carroll_Cass commented on July 8th 19 at 12:21
Well, I can try. Write a spec describing how it should work and give me a sample. - Jimmy.Hudson commented on July 8th 19 at 12:24
Can I have your email? - Carroll_Cass commented on July 8th 19 at 12:27
I have written to you, quoting your message. - Jimmy.Hudson commented on July 8th 19 at 12:30
