1) Use cURL, or is there something better?
2) After receiving the page (e.g. via cURL), should I extract the needed data immediately, or write the page to a file first and parse it afterwards? What is the difference in memory consumption?
3) Use regular expressions, or something like PHP Simple HTML DOM Parser? And if the latter, what does it use internally? There isn't much data to parse on each page and execution speed doesn't matter; memory consumption is what interests me.
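On question 2: streaming the response body to a file in fixed-size chunks keeps peak memory roughly constant, while reading the whole body first costs memory proportional to page size. A minimal Python sketch of the idea (the thread is about PHP, but the trade-off is the same; the `save_stream` helper and the simulated 1 MB response are made up for illustration):

```python
import io
import shutil
import tempfile

def save_stream(resp, chunk=64 * 1024):
    """Copy a response body to a temp file in fixed-size chunks,
    so peak memory stays around `chunk` regardless of page size."""
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".html")
    with tmp:
        shutil.copyfileobj(resp, tmp, length=chunk)
    return tmp.name

# Demo with a simulated 1 MB response body; real code would pass
# something like urllib.request.urlopen(url) instead.
path = save_stream(io.BytesIO(b"x" * 1_000_000))
```

Parsing the saved file later still loads it into memory, so the saving only pays off if you parse pages one at a time or need the raw copies anyway.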
1) I use Guzzle; it's the same curl under the hood, but in a convenient wrapper.
2) I save the pages to a file and parse them from there; if there are few pages and they aren't heavy, I don't save them at all.
3) Regular expressions, but only where the usual library methods can't reach the data, for example on sites with a table-based layout and no classes or IDs. Otherwise I use the phpQuery library; it's quick to pick up.
Rick.Satterfie answered on March 12th 20 at 08:06
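The regex-vs-parser distinction in the answer above can be sketched in a few lines. This is a hypothetical Python illustration using only the standard library (not the phpQuery API the answer recommends); the `SAMPLE` markup and `CellCollector` class are invented for the demo:

```python
import re
from html.parser import HTMLParser

# A table-layout page with no classes or IDs, as described in the answer
SAMPLE = "<table><tr><td>Alice</td><td>30</td></tr><tr><td>Bob</td><td>25</td></tr></table>"

# Regex approach: short, but brittle if attributes or whitespace vary
cells_re = re.findall(r"<td>(.*?)</td>", SAMPLE)

# Parser approach: tolerant of markup variations the regex would miss
class CellCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells.append(data)

p = CellCollector()
p.feed(SAMPLE)
# Both approaches yield ['Alice', '30', 'Bob', '25'] on this input
```

On rigid, machine-generated markup the regex wins on brevity; as soon as `<td class="...">` or nested tags appear, the parser approach degrades far more gracefully.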
1. curl is excellent for this purpose; I see no obstacles.
2. Either way works. In my case the page goes into the database, and the parser takes it from there. I can't speak to memory consumption; I didn't measure it.
3. I use PHP Simple HTML DOM Parser: friendly interface, works fine. Regular expressions only when nothing else will do.
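The store-then-parse workflow in point 2 amounts to a fetch cache: download each page once, then let the parser read the stored copy. A minimal Python sketch under that assumption (the `fetch_cached` helper and the stubbed fetcher are made up for illustration; the answer itself uses a database rather than files):

```python
import pathlib
import tempfile
import urllib.parse
import urllib.request

def fetch_cached(url, cache_dir, fetcher=None):
    """Download a URL once; later calls reuse the copy on disk."""
    cache = pathlib.Path(cache_dir) / urllib.parse.quote(url, safe="")
    if cache.exists():
        return cache.read_bytes()
    fetcher = fetcher or (lambda u: urllib.request.urlopen(u).read())
    data = fetcher(url)
    cache.write_bytes(data)
    return data

# Demo with a stubbed fetcher so the sketch runs without network access
calls = []
def stub(url):
    calls.append(url)
    return b"<html>stub</html>"

with tempfile.TemporaryDirectory() as d:
    first = fetch_cached("http://example.com/page", d, fetcher=stub)
    second = fetch_cached("http://example.com/page", d, fetcher=stub)
    # The second call is served from the cache; the network is hit only once
```

Swapping the file cache for a database table, as in the answer, changes only where `cache.exists()` / `read_bytes()` / `write_bytes()` look.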
Actually, looking at the bigger picture: Python has better options for parsing sites (BeautifulSoup, Scrapy), and headless browsers make scraping even more pleasant; they let you get at the data more naturally and bypass some levels of anti-scraping protection.
I signed up for a service that delivers data of any complexity from any source. I'd suggest it to anyone who needs to parse data often, though one-off use seems to be possible there too. https://sssoydoff.wixsite.com/scraper