Fetching and parsing HTML, sending JSON requests, Tor anonymization: help me choose the tools


Circumstances far from the IT world, plus natural curiosity, have left me with one simple but specific task. In short, I need to periodically request a page from a remote server, analyze the received HTML (pick out a few links located in a certain place and extract certain pieces from those links), and then, based on the results of the analysis and some internal logic, send JSON requests back through a local Tor proxy.

I have no web development skills at all, but I do have programming experience and a desire to learn. So I would be grateful to experienced web developers, and to anyone else who knows the subject, for ideas and thoughts about the tools with which this task could be solved. The solution does not have to be universal or beautiful; a couple of quick hacks that let me get a working version relatively fast will do.

Personally, I came up with two possible directions. The first is a JS script that I could run inside Firefox. The script would parse the HTML and send the JSON requests, and routing it through Tor could be handled by configuring FF. But, as I understand it, pure JS cannot fetch the source of a remote page.
The second is to write a script in PHP or Python that would do all the work. Googling showed that the problem is solvable in principle. But I can't decide what to use, and it is not clear how to use Tor in this case.

In general, if anyone has done something similar, please share your experience, even if not in detail; I'll try to figure out the rest myself =).

Thank you!
October 8th 19 at 03:19
1 answer
October 8th 19 at 03:21
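In PHP:
// Get the desired page into the variable $data
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
// Write the inner logic here and put the resulting JSON, for example, into the variable $json
$ch = curl_init();
// Where to send
curl_setopt($ch, CURLOPT_URL, "");
// IP and port of the Tor proxy
curl_setopt($ch, CURLOPT_PROXY, "");
// The login and password for the proxy, if there are any
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'username:pass');
// Send $json as the body of a POST request
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $json);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);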
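
The "inner logic" step between the two requests is left open in the answer. As a rough sketch of what it might look like, the snippet below uses PHP's bundled DOM extension to pick links out of one part of the page and pull a piece out of each link. Everything specific here is an assumption made for illustration: the sample HTML, the container id "links", the query parameter "id", and the helper name extractIds are all hypothetical and would need to be adapted to the real page.

```php
<?php
// Hypothetical sample standing in for the $data returned by curl_exec().
$data = '<html><body>
  <div id="links"><a href="/item?id=10">A</a> <a href="/item?id=25">B</a></div>
  <a href="/other?id=99">ignored</a>
</body></html>';

// Hypothetical helper: collect the "id" query parameter from every link
// found inside the <div> with the given id.
function extractIds(string $html, string $containerId): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ids = [];
    foreach ($xpath->query("//div[@id='$containerId']//a[@href]") as $a) {
        // parse_url() isolates the query string, parse_str() splits it up.
        $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
        if (!is_string($query)) {
            continue;
        }
        parse_str($query, $params);
        if (isset($params['id'])) {
            $ids[] = $params['id'];
        }
    }
    return $ids;
}

$ids = extractIds($data, 'links');
// The JSON body for the second request.
$json = json_encode(['ids' => $ids]);
echo $json, "\n";
```

The XPath expression is what restricts the search to links "lying in a certain place"; changing it is usually enough to target a different part of the page.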

The proxy credentials line should use $ch, not $curl:
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'username:pass'); - Adrien.Tremblay commented on October 8th 19 at 03:24
Thank you very much!

curl also came up in my search, but I didn't think it would be this easy. - elaina commented on October 8th 19 at 03:27
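
On the open question of how to use Tor from a script: Tor exposes a SOCKS proxy, so curl most likely needs to be told the proxy type as well as its address. The sketch below assumes a local Tor client on its default SOCKS port, 127.0.0.1:9050 (Tor Browser uses 9150 instead). The helper name torCurlOptions and the URL https://example.com/api are hypothetical placeholders, not part of the original answer.

```php
<?php
// Hypothetical helper: build the curl options for a JSON POST sent via Tor.
// Assumption: Tor listens on its default SOCKS port, 127.0.0.1:9050.
function torCurlOptions(string $url, string $json, string $proxy = '127.0.0.1:9050'): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,
        // SOCKS5 with host names resolved by the proxy, so DNS lookups
        // also go through Tor rather than the local resolver.
        CURLOPT_PROXYTYPE      => CURLPROXY_SOCKS5_HOSTNAME,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $json,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Placeholder URL and body, for illustration only.
$options = torCurlOptions('https://example.com/api', '{"ids":["10","25"]}');
$ch = curl_init();
curl_setopt_array($ch, $options);
// $response = curl_exec($ch);  // uncomment to actually send the request
```

Without CURLOPT_PROXYTYPE, curl treats the proxy as an HTTP proxy, which Tor's SOCKS port does not speak.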
