Need help parsing a WordPress site?

I have a website and need to extract the photo and title of each post, from the first page to the last. Which frameworks do I need? Can it be done with jsoup alone? Are there resources where I can find an algorithm for walking through the articles and pages?
March 23rd 20 at 19:44
1 answer
March 23rd 20 at 19:46
Solution
Hello!
1) Is authorization required to access the content? If so, read up on how to log in to a website using jsoup.
2) It doesn't matter which CMS you are parsing, WP or anything else.
3) Jsoup cannot work with dynamic content (AJAX pagination, content loaded on scroll, etc.). If there is no dynamic content, jsoup alone is usually enough.
4) If you do have dynamic content, use Selenium with a browser (Firefox, Chrome, etc.).
5) "Are there resources where I can find an algorithm for walking through the articles and pages?"

There are plenty of resources; you just need to search. As for the general principle of walking through articles and pages, it really comes down to loops.
6) You can also scrape the data without any programming language, for example with Visual Web Ripper.

A rough plan for parsing:
- Determine the content type (see points 3 and 4).
- Determine whether authorization is required (and implement the login if it is).
- Determine the entry point, for example a WP category page.
- Determine the type of pagination. In WP it is usually /page/1, /page/2, /page/3, and so on. From there it depends on your goal: you can simply increment the page number up to the maximum (check what the last page is), or keep incrementing until the page no longer contains the typical post blocks (this depends on the layout).
Then, in a do {} while () or while () {} loop, collect the links to the existing posts and add them to a List.
- Then loop over that list again, open each URL, and parse the content of the post page itself. You can also add Apache POI to export the data to xlsx after parsing.
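
The pagination loop from the plan above could look roughly like the sketch below. It is only an illustration under assumptions: the class name, the method names and the example.com URLs are hypothetical, and the page fetcher is passed in as a function so the loop itself needs nothing beyond the standard library. In a real run that function would typically load the page with jsoup (for example via Jsoup.connect(url).get()) and select the post links with a CSS selector that matches the site's layout. It should also return an empty list if the site answers with an error for a page past the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class PaginationSketch {

    // Builds a WordPress-style listing URL such as <base>/page/3/ (assumed URL scheme).
    static String pageUrl(String baseUrl, int page) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + "page/" + page + "/";
    }

    // Walks pages 1, 2, 3, ... and stops at the first page without post links,
    // or once maxPages has been reached (a safety limit).
    static List<String> collectPostLinks(String baseUrl,
                                         Function<String, List<String>> fetchLinks,
                                         int maxPages) {
        List<String> all = new ArrayList<>();
        int page = 1;
        while (page <= maxPages) {
            List<String> links = fetchLinks.apply(pageUrl(baseUrl, page));
            if (links.isEmpty()) {
                break; // no typical post blocks on this page: we are past the last page
            }
            all.addAll(links);
            page++;
        }
        return all;
    }

    public static void main(String[] args) {
        // Stand-in fetcher with canned data; replace with a jsoup-based one for a real site.
        Function<String, List<String>> fakeFetcher = url -> {
            if (url.endsWith("/page/1/")) {
                return Arrays.asList("https://example.com/post-a/", "https://example.com/post-b/");
            }
            if (url.endsWith("/page/2/")) {
                return Arrays.asList("https://example.com/post-c/");
            }
            return Collections.emptyList();
        };
        List<String> links = collectPostLinks("https://example.com/category/news", fakeFetcher, 50);
        System.out.println(links);
    }
}
```

Passing the fetcher in as a function keeps the loop independent of how a page is loaded, so the same loop works whether the pages come from jsoup or from Selenium.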
Usually, for convenience, I create an object (title, text, image link, etc.), add all the objects to a List, and then export that list to xls.
Here is a good snippet for exporting a List to Excel:
https://www.jeejava.com/generic-way-of-reading-exc...
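
As a rough illustration of the "object per post, then a List, then export" idea, here is a minimal sketch. Note the swap: it writes CSV with the standard library instead of xlsx through Apache POI, purely to keep the example dependency-free (Excel opens CSV files too). The class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PostExportSketch {

    // One scraped post with the fields mentioned above: title, text, image link.
    static final class Post {
        final String title;
        final String text;
        final String imageUrl;

        Post(String title, String text, String imageUrl) {
            this.title = title;
            this.text = text;
            this.imageUrl = imageUrl;
        }
    }

    // Wraps a value in double quotes and doubles any embedded quotes,
    // so commas and quotes inside the text do not break the columns.
    static String csvField(String value) {
        String v = value == null ? "" : value;
        return "\"" + v.replace("\"", "\"\"") + "\"";
    }

    // Renders a header row plus one row per post.
    static String toCsv(List<Post> posts) {
        StringBuilder sb = new StringBuilder("\"title\",\"text\",\"image\"\n");
        for (Post p : posts) {
            sb.append(csvField(p.title)).append(',')
              .append(csvField(p.text)).append(',')
              .append(csvField(p.imageUrl)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Post> posts = new ArrayList<>();
        posts.add(new Post("First post", "Some text, with a comma", "https://example.com/a.jpg"));
        posts.add(new Post("Second \"quoted\" post", "More text", "https://example.com/b.jpg"));
        System.out.print(toCsv(posts));
    }
}
```

If you need a real xlsx file, the same List of Post objects can be handed to an Apache POI based exporter instead of toCsv.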

If you need to import the data into a WP site, use the WP All Import plugin. The xlsx files you created are a perfect fit for it.
Respect for such a detailed answer! Thank you! - emilio_Bro commented on March 23rd 20 at 19:49
