Is this algorithm implemented professionally?

Hello!

I'm working on text parsing and set myself the task of extracting all links (the link addresses) from a string.
I implemented it this way:

#include <regex>
#include <string>
#include <vector>

std::string str = "<a href=\"url1\">link1 name</a>\n"
                  "<a href=\"url2\">link2 name</a>";
std::regex reg("(<a href=\")([\\w\\s]*)(\">)(.*)(</a>)");
std::smatch res;
std::vector<std::string> arr;
std::string tmp_str = str;
// Collect group 2 (the href value), then continue searching from where it started.
while (std::regex_search(tmp_str, res, reg)) {
    arr.push_back(res[2]);
    tmp_str = tmp_str.substr(res.position(2));
}

I'm interested in the "professional" approach, so to speak. Of course, all of this will later be wrapped in a class and tidied up; for now I'm interested in the algorithm itself. Can it be made faster or better?
July 2nd 19 at 16:54
5 answers
July 2nd 19 at 16:56
Solution
A more universal checklist:
  • It works
  • It doesn't crash and doesn't break
  • It isn't slow, and it's safe
  • It's easy to extend


That is all. Everything else follows from it.
July 2nd 19 at 16:58
Clearly unprofessional, because you haven't had a real problem that someone would pay you money to solve. No, I haven't confused the two meanings of the word "professional"; they are closely related.
And if you do have such a problem but can't test whether a solution fits it, and ask for advice on Toster instead, then that is unprofessional.

On topic:
1) use a ready-made library instead of regular expressions, which are inflexible, amount to reinventing the wheel (though you can largely ignore that point), and are hard to read
2) it's probably not worth using C++; it isn't fast

That holds for most tasks, but not for all of them.
July 2nd 19 at 17:00
About using someone else's code and not reinventing the wheel:
https://github.com/google/gumbo-parser
Just copy-pasting is the beginner stage, still far from pro. - Dalton57 commented on July 2nd 19 at 17:03
So you don't use the OS API and write everything from scratch? - marilyne_Roh commented on July 2nd 19 at 17:06
Well, I'm doing this for learning, not for commercial purposes. So is it OK? - Chadd_Hane commented on July 2nd 19 at 17:09
: no
Regexes are used on HTML only in extreme cases - marilyne_Roh commented on July 2nd 19 at 17:12
: Hmm, why? It came out compact and simple. How else would you do it? With string::search? That would be even more of a headache - Chadd_Hane commented on July 2nd 19 at 17:15
It is one thing to develop a new module using the OS API and another to copy; the level of professionalism will be different - Dalton57 commented on July 2nd 19 at 17:18
: Answer the question :) What do you think of the algorithm? - Chadd_Hane commented on July 2nd 19 at 17:21
and it is precisely the professionals who reinvent wheels rather than buy ready-made solutions - Dalton57 commented on July 2nd 19 at 17:24
:
Because your regex will not work if the quotes are different (or missing), if the URL is relative or incomplete, or if it has parameters
And so on and so forth

To write a regex you need to be a guru in them
And to read a regex written by a guru takes two gurus and a lot of alcohol

With html/xml, a tree is built first, and you then work with the tree - marilyne_Roh commented on July 2nd 19 at 17:27
you got your answer, yet a debate has broken out - marilyne_Roh commented on July 2nd 19 at 17:30
Ah, okay, I see what you mean. No, I only need to parse certain links that meet a condition, not all of them. - Chadd_Hane commented on July 2nd 19 at 17:33
: Yes, you have already been answered - Dalton57 commented on July 2nd 19 at 17:36
: you asked and you were answered; what didn't you like? - Dalton57 commented on July 2nd 19 at 17:39
: if it's just a temporary solution, then no class is needed and everything is fine as it is
Temporary code cannot and should not be judged in terms of approach
It is needed to solve the task once and be deleted immediately afterwards - marilyne_Roh commented on July 2nd 19 at 17:42
: there was no answer - marilyne_Roh commented on July 2nd 19 at 17:45
July 2nd 19 at 17:02
No. Test examples:
<a href="#hello">hello</a>
<a href="site.EN?12">hello</a>
<a href="hello-1">hello</a><a href="hello-2">hello</a>
<a href="hello">hello</a><a href="hello-2">hello</a>
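To try these examples yourself, here is a minimal sketch that wraps the loop from the question in a function. The function name `extract_with_question_regex` is made up for illustration; the regex and the loop are taken from the question unchanged.

```cpp
#include <regex>
#include <string>
#include <vector>

// Reproduces the loop from the question so it can be run against any input.
// The function name is hypothetical; the pattern is the one from the question.
std::vector<std::string> extract_with_question_regex(const std::string& input) {
    const std::regex reg("(<a href=\")([\\w\\s]*)(\">)(.*)(</a>)");
    std::vector<std::string> found;
    std::string rest = input;
    std::smatch res;
    while (std::regex_search(rest, res, reg)) {
        found.push_back(res.str(2));          // group 2 is the href value
        rest = rest.substr(res.position(2));  // continue from the start of group 2
    }
    return found;
}
```

The first three examples come back empty, because `#`, `.`, `?` and `-` are not in `[\w\s]`. The fourth yields only `hello`; the second link is lost for the same reason.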

And the capture groups are piled on for nothing.
P.S. A couple more examples:
<a title="hi" href="#hello">hello</a>
<a href="test">test</a>
href="#hello" style=""
href=hello - Dalton57 commented on July 2nd 19 at 17:05
It has already been written that the regex is tailored to parsing links of one particular type - marilyne_Roh commented on July 2nd 19 at 17:08
July 2nd 19 at 17:04
A program (code) must be correct, understandable, and easy to modify.

Correctness suffers here, because the code does not take all the links it should, and it takes things that are not links. (If you insert an HTML comment with a link inside, it will happily treat that link as valid. This happens often in real pages: markup gets commented out from time to time, and all of it is still sent to the client.)
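One way to address the comment problem, if you stay with plain string processing, is to drop comments before searching for links. A minimal sketch, assuming comments are the usual `<!-- ... -->` form; the function name `strip_html_comments` is made up, and a real HTML parser handles many more cases (scripts, CDATA and so on):

```cpp
#include <cstddef>
#include <string>

// Returns a copy of `html` with every <!-- ... --> block removed.
// An unterminated comment swallows the rest of the input.
std::string strip_html_comments(const std::string& html) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        const std::size_t open = html.find("<!--", pos);
        if (open == std::string::npos) {
            out.append(html, pos, std::string::npos);  // no more comments: keep the tail
            break;
        }
        out.append(html, pos, open - pos);             // keep the text before the comment
        const std::size_t close = html.find("-->", open + 4);
        if (close == std::string::npos) {
            break;                                     // unterminated comment: drop the rest
        }
        pos = close + 3;                               // resume right after "-->"
    }
    return out;
}
```

Run the link extraction on the result of this function instead of on the raw page text.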

Clarity suffers from poor, non-descriptive variable names and a confusing regular expression. If you are extracting the link, then only the link should be in a capture group, not every part of the string. (Because this regular expression is complicated, a mistake can easily slip in without you noticing, since reading it each time is a chore and you simply won't want to.)
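For illustration, a sketch of what that could look like with a single capture group, descriptive names, and `std::sregex_iterator` instead of re-cutting the string by hand. The pattern and the name `extract_hrefs` are my assumptions, not a complete solution: it only handles double-quoted `href` values and still knows nothing about comments, single quotes, or unquoted attributes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the value of every double-quoted href attribute found in an <a ...> tag.
// Only the href value is captured (group 1); other attributes before it are skipped.
std::vector<std::string> extract_hrefs(const std::string& html) {
    static const std::regex anchor_href(
        R"re(<a\s+(?:[^>]*?\s+)?href="([^"]*)")re",
        std::regex::icase);

    std::vector<std::string> hrefs;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(html.begin(), html.end(), anchor_href); it != end; ++it) {
        hrefs.push_back(it->str(1));
    }
    return hrefs;
}
```

The iterator moves on from the end of each match by itself, so there is no `substr` bookkeeping to get wrong, and the name of the single group's content is obvious from the function name.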

Ease of modification is not particularly harmed, but only because the code is small. If it were larger, this too would make itself felt.

In general, adding std:: everywhere does not make the code professional; it is just amateur code with std:: added everywhere.
