PHP script with a long run time: how do I write this properly?

Hello, fellow Habr users!

There is a script that parses an XML file, writes the data to a database, and downloads the images whose URLs are listed in that file. The script uses SimpleXML. I should say right away that I didn't write it; I'm just bringing it up to the state we need by adding the missing pieces.
The problem is that the file we parse is very large: 8000-8500 records, and each record has 3 to 5 images that the script downloads. As a result, the script parses roughly 6,500 records and then silently stops. I tried running it on hosting where the maximum execution time can be raised; that helps, but not completely: it parses about 7500-7800 records and then stops. Maybe there are other limits that need to be raised?

Please advise me on techniques for writing scripts like this that process large amounts of data. Running it via cron won't work, because this is an extension for one of the CMSs.
I would be grateful for any thoughts and ideas.
October 3rd 19 at 04:26
10 answers
October 3rd 19 at 04:28
I implement such scripts as a PHP console command.
Everything works on this principle (a minimal sketch follows the list):
1) The command is launched and records in the database (in a dedicated table) that it is running.
2) On the frontend, an AJAX request periodically checks the command's status in that table by its ID.
3) If the command fails, it writes its status to the table ("error", for example) along with the error message; the AJAX request sees this and reports it in the web interface.
4) On success, the same as in step 3.
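A minimal sketch of the status-table side of this scheme; the import_jobs table, its columns, and runImport() are my assumptions for illustration, not the answerer's actual code:

<?php
// Sketch: a console command that records its own status in a table.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// 1) Mark the job as running so the frontend can find it by ID.
$db->prepare("INSERT INTO import_jobs (status, message) VALUES ('running', '')")
   ->execute();
$jobId = (int) $db->lastInsertId();

try {
    runImport();                       // the actual parse/download work
    $status  = 'success';
    $message = '';
} catch (Throwable $e) {
    // 3) On failure, store the error so the AJAX poll can report it.
    $status  = 'error';
    $message = $e->getMessage();
}

// 4) Record the final state; the frontend's periodic AJAX request
// simply reads this row: SELECT status, message FROM import_jobs WHERE id = ?
$db->prepare("UPDATE import_jobs SET status = ?, message = ? WHERE id = ?")
   ->execute([$status, $message, $jobId]);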
October 3rd 19 at 04:30
And how is it launched? Automatically, or does a person start it?
If the latter: AJAX plus register_shutdown_function() to catch the moment of failure and send back a flag that the import is incomplete, along with the number of the last processed record. Repeat the requests automatically until the task is completed.
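A minimal sketch of that idea, assuming the import runs in response to an AJAX request; the JSON response shape is an assumption:

<?php
// Sketch: on a fatal stop (timeout, memory limit), report the last
// processed record so the client can re-request from that point.
$lastId = 0; // updated after each successfully imported record

register_shutdown_function(function () use (&$lastId) {
    $err = error_get_last();
    if ($err !== null && $err['type'] === E_ERROR) {
        // The script died mid-import: hand the resume point back.
        echo json_encode(['done' => false, 'last_id' => $lastId]);
    }
});

// ... import loop: after each record, set $lastId to that record's number ...
// On normal completion: echo json_encode(['done' => true]);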
A person launches the script via AJAX.
Thanks for the function tip; catching the moment it stops is the crucial part. - Myrna.Hickle40 commented on October 3rd 19 at 04:33
October 3rd 19 at 04:32
I realize that giving this kind of advice once the software is already written is not the best idea, but I would still suggest you look into distributing the work across processes, for example with Gearman; it will improve your script's throughput.
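Roughly, the Gearman split could look like this (a sketch using the pecl/gearman extension; the 'download_image' job name and payload format are my assumptions):

<?php
// Sketch: hand image downloads to Gearman workers so the parser
// is not blocked by slow fetches.

// --- producer side, inside the parsing script ---
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$imageUrl = 'http://example.com/img.jpg';   // placeholder
$client->doBackground('download_image', json_encode(['url' => $imageUrl]));

// --- worker side, run several of these processes in parallel ---
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('download_image', function (GearmanJob $job) {
    $data = json_decode($job->workload(), true);
    file_put_contents(basename($data['url']), file_get_contents($data['url']));
});
while ($worker->work());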
October 3rd 19 at 04:34
If the script is run from the browser (via AJAX, or just by following a link), don't forget to add
ignore_user_abort(true);
October 3rd 19 at 04:36
A SAX (streaming) parser will be far more memory-efficient than SimpleXML, and most likely a lot faster too.
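PHP's classic SAX functions (xml_parser_create() and friends) would work; XMLReader is the more convenient stream parser with the same memory profile. A minimal sketch, where the <record> element name and feed.xml path are hypothetical stand-ins for the real feed:

<?php
// Sketch: stream-parse a large XML file instead of loading it all at once.
$reader = new XMLReader();
$reader->open('feed.xml');
$dom = new DOMDocument();

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'record') {
        // Expand only this one record; memory use stays flat no matter
        // how many records the file holds.
        $record = simplexml_import_dom($dom->importNode($reader->expand(), true));
        // ... write $record to the DB, queue its image URLs ...
    }
}
$reader->close();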
October 3rd 19 at 04:38
You can write the script so that it parses, say, 100 records at a time, saves its current position, and on restart resumes parsing from the stored position.

That solves the run-time problem along with everything else.
You can keep the position in GET and reload the page via a meta refresh; that's how I parsed large sites. - Myrna.Hickle40 commented on October 3rd 19 at 04:41
I split the parsing into parts, in PHP. Each part receives its range via GET parameters and redirects with Header(). Everything works fine, but at about the tenth redirect the browser stops everything and complains about too many redirects: ERR_TOO_MANY_REDIRECTS. How do I get around that? - tamara_Fra commented on October 3rd 19 at 04:44
Solved the problem with a meta refresh. - Zaria_Quigley commented on October 3rd 19 at 04:47
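A minimal sketch of this batched approach, with the position in GET and a meta refresh for the reload (a plain Header() redirect loop trips ERR_TOO_MANY_REDIRECTS, as the comments note). The batch size and the loadRecords()/importRecord() helpers are assumptions:

<?php
// Sketch: process BATCH records per request, then reload to continue.
const BATCH = 100;
$offset  = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$records = loadRecords('feed.xml');            // hypothetical helper
$total   = count($records);

$end = min($offset + BATCH, $total);
for ($i = $offset; $i < $end; $i++) {
    importRecord($records[$i]);                // hypothetical helper
}

if ($end < $total) {
    // Reload the same script with the saved position in GET.
    echo "<meta http-equiv=\"refresh\" content=\"0;url=?offset={$end}\">";
    echo "Processed {$end} of {$total}...";
} else {
    echo "Done: {$total} records imported.";
}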
October 3rd 19 at 04:40
Here's another option. It's a hack, but in some cases it's almost indispensable (requires basic knowledge of PHP and HTML/JavaScript); a sketch of step 2 follows the list:

1. Write a script that parses the XML and builds an HTML table with the image URLs.
2. Write a script that takes two parameters: a row value from the table in step 1 and that row's number (index). The script's job: in the PHP part, download the images and write what's needed to the database; in the body onload (JavaScript), redirect to itself with the value and the index of the next row of the table. If the index equals the number of rows, show an alert that the import has finished.
3. Write a simple HTML page with two frames: the first frame loads the script from step 1, the second loads the script from step 2 with starting values. Open this page in the browser, then you can go to sleep. The computer should preferably be on a reliable Internet connection and behind a UPS so the process isn't interrupted.
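A minimal sketch of step 2; loadRowsFromStep1() and downloadImagesAndSave() are hypothetical helpers standing in for the real work:

<?php
// Sketch: process one row, then let the loaded page redirect to itself
// with the next index.
$rows  = loadRowsFromStep1();
$index = isset($_GET['i']) ? (int) $_GET['i'] : 0;

downloadImagesAndSave($rows[$index]);   // PHP part: images + DB writes

$next = $index + 1;
if ($next < count($rows)) {
    // JavaScript part: self-redirect once the page has loaded.
    echo "<body onload=\"location.href='?i={$next}'\">Row {$index} done</body>";
} else {
    echo "<body onload=\"alert('Import finished')\"></body>";
}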
Ugh. Wouldn't it be easier to do a JS loop with AJAX requests? - Myrna.Hickle40 commented on October 3rd 19 at 04:43
I'm not sure. - tamara_Fra commented on October 3rd 19 at 04:46
October 3rd 19 at 04:42
I have PHP scripts running in the background via cron, sometimes for a week at a stretch (working with a third-party API + downloading images + writing to the database).

One line is enough:
set_time_limit(0);

If that doesn't help, look in the web server logs; your DB connection may be dropping because the session runs too long. The error will look something like
MySQL server has gone away
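One common way to survive that during a very long run is to re-check the link and reopen it when it has dropped. A minimal sketch, assuming mysqli; the credentials are placeholders:

<?php
// Sketch: reopen a MySQL connection that has gone away mid-run.
function ensureConnected(mysqli $db): mysqli
{
    // ping() verifies the connection is still alive.
    if (@$db->ping()) {
        return $db;
    }
    return new mysqli('localhost', 'user', 'pass', 'dbname');
}

// inside the long loop, before each batch of queries:
// $db = ensureConnected($db);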
; max_execution_time
; Note: This directive is hardcoded to 0 for the CLI SAPI

so under CLI, set_time_limit(0) changes nothing - Myrna.Hickle40 commented on October 3rd 19 at 04:45
October 3rd 19 at 04:44
Downloading the images in the same script is entirely optional. I usually just save the URLs to a file, which is then fed to wget (the -i flag). It also happens that an external engine (CMS) stores files in its own directory structure, so for that case you can pre-fetch the images with wget into the local filesystem. To make each image's path in the FS mirror its URL, use the -x option. Here is a typical invocation:

wget -x -b --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" --referer="http://examplke.com/" -i img_url_list.txt

It downloads very fast and is easy to parallelize if you split the file into pieces and run it from several servers. On two servers, 200,000 images totaling ~8GB took me about an hour to download.
October 3rd 19 at 04:46
I tried it like this:

set_time_limit(0);
ini_set('max_execution_time', 0);

or like this:

set_time_limit(0);

or like this:

set_time_limit(9000);

and after a while it still returns 504 Gateway Time-out.

How can this be overcome, so that the script finishes its work no matter how long it takes, and the page then loads without this error?

Or how do I split the script into chunks: say, 15K database rows get written, and as the timeout approaches, a counter restarts the run before the timeout hits?
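For what it's worth, a 504 comes from the gateway in front of PHP (nginx or a proxy), so set_time_limit() alone cannot silence it; the chunk-and-restart idea from the question is the usual way out. A minimal sketch, where the 50-second budget and the fetchNextRow()/insertRow() helpers are assumptions:

<?php
// Sketch: stop cleanly under the gateway's limit and reload to resume.
$budget  = 50;                    // stay under a typical 60s gateway timeout
$started = microtime(true);
$offset  = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

while (($row = fetchNextRow($offset)) !== null) {   // hypothetical helper
    insertRow($row);                                // hypothetical helper
    $offset++;
    if (microtime(true) - $started > $budget) {
        // Out of time: reload the page to continue from $offset.
        echo "<meta http-equiv=\"refresh\" content=\"0;url=?offset={$offset}\">";
        exit;
    }
}
echo 'All rows written.';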