Cyrillic characters in the filename parameter of the Content-Disposition header

Good morning, dear Habra-users.

I ran into the following problem. I have a web application (client: JS, server: a PHP script) and a file storage. To download a file from the storage, the client part of the web application sends a request to the server, which runs the business logic (authorization, etc.) and, if everything is OK, returns a link to the file. The client then redirects to this link. The file storage server runs nginx, which serves the files. The file names are meaningless strings of characters (just a GUID), which the users don't like much. They wanted a downloaded file to have the same name as the corresponding entity in the web application. Since renaming the files themselves is undesirable for several reasons, I came up with the following:

1. When the backend of the web application builds the link, it appends a GET parameter containing a user-friendly file name.
2. In the nginx configuration, this parameter is inserted into the Content-Disposition header when the file is served.
(add_header Content-Disposition 'attachment;Filename=$args';)

The problems started when Russian text was substituted into Content-Disposition.
First, when Firefox redirects to the storage, it urlencodes the link, and nginx inserts the encoded string into Content-Disposition. As a result, Firefox offers to save the file under the encoded name.
Second, even if Content-Disposition contains an unencoded string, but in UTF-8 with Cyrillic, IE doesn't want to know anything about UTF-8. It interprets the string as cp1251, and the file name comes out as gibberish.

Overall, this scheme works properly only in Chrome. If I'm not mistaken, problem number 1 (urlencode) can be overcome by recompiling nginx from source with the ngx_set_misc module. Then set_unescape_uri can be used in the nginx configuration to urldecode the file name before inserting it into the Content-Disposition header. But I would leave this option as a last resort.
And I don't know how to solve the problem with IE.
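The first problem and the urldecode idea described above can be illustrated with a small sketch. It is only a JavaScript imitation of what happens between the backend, the browser and nginx, not nginx itself. The storage host, the GUID and all function names are assumptions made up for the example, and encodeURIComponent stands in for the percent-encoding that happens before the request reaches nginx.

```javascript
// Hypothetical storage host, used only for illustration.
const STORAGE_BASE = 'https://files.example.com';

// Builds the link: the whole query string is the user-friendly name,
// percent-encoded the way it arrives at the storage server.
function buildDownloadLink(guid, friendlyName) {
  return `${STORAGE_BASE}/${guid}?${encodeURIComponent(friendlyName)}`;
}

// Imitates: add_header Content-Disposition 'attachment;Filename=$args';
// The query string is copied into the header as-is, still encoded.
function headerFromArgs(link) {
  const args = link.slice(link.indexOf('?') + 1);
  return `attachment;Filename=${args}`;
}

// Imitates the urldecode step applied before the name goes into the header.
function headerFromDecodedArgs(link) {
  const args = link.slice(link.indexOf('?') + 1);
  return `attachment;Filename=${decodeURIComponent(args)}`;
}

const link = buildDownloadLink('3f2a9c', 'Отчет.xls'); // hypothetical GUID
console.log(headerFromArgs(link));        // attachment;Filename=%D0%9E%D1%82%D1%87%D0%B5%D1%82.xls
console.log(headerFromDecodedArgs(link)); // attachment;Filename=Отчет.xls
```

The second line of output is the raw UTF-8 name, which is exactly the case where the IE problem described above begins.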

Overall, I'm stumped. I would be very grateful for advice; maybe there is a much simpler way to solve my problem that I just don't see.
October 3rd 19 at 02:52
4 answers
October 3rd 19 at 02:54
Solution
You cannot use Content-Disposition to specify the file name, as it only supports ASCII. The solution is to use download links of the form:

/download/12345/Report недвижимости.xls

And everything will work as it should.
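A minimal sketch of how a server could handle such a link, assuming the identifier is numeric as in the example path. The function name and the returned field names are hypothetical. Only the identifier is used to find the file; the last segment is there so the browser takes it as the file name.

```javascript
// Splits a path of the form /download/<id>/<friendly name>.
// Returns null when the path does not have that shape.
function parseDownloadPath(path) {
  const match = /^\/download\/(\d+)\/([^/]+)$/.exec(path);
  if (!match) return null;
  return {
    id: match[1],                                // used to locate the stored file
    friendlyName: decodeURIComponent(match[2])   // shown to the user, not used for lookup
  };
}

// The browser sends the path percent-encoded; encodeURI imitates that here.
const requestPath = encodeURI('/download/12345/Report недвижимости.xls');
console.log(parseDownloadPath(requestPath));
```

Note that decodeURIComponent throws on malformed percent-sequences, so a real handler would need to catch that.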
See RFC5987 - Beaulah.Kutch10 commented on October 3rd 19 at 02:57
It was written above that this doesn't work everywhere. Besides, the solution in the RFC looks like a terrible kludge, an ugly sort of quoted-printable like the one used in email. Also, a link that ends with the file name looks nicer and is more readable than some download.php?id=XXX - marcus.Champlin53 commented on October 3rd 19 at 03:00
I gave the link because you wrote that C-D only supports ASCII, which is false.

The disadvantages of this method are quite clear. By the way, MIME uses a different scheme for encoding non-ASCII strings in headers (it allows different representations, at least qp and b64). - Beaulah.Kutch10 commented on October 3rd 19 at 03:03
I looked at the RFC once again: how did something this ugly manage to get pushed into a standard? Instead of having the HTTP standard specify a single character encoding for headers, they bolted on a separate RFC for encoding "Header Field Parameters", so each field can now use its own encoding, with a bunch of ways of placing the quotes. This RFC gives the impression that Microsoft wrote it.

> By the way, MIME uses a different scheme for encoding non-ASCII strings

These are all crooked kludges from the era of ancient 7-bit communication channels, dragged along for 20 years, which no one dares to throw away and which everyone is supposed to support. It's sad. - marcus.Champlin53 commented on October 3rd 19 at 03:06
It is quite frustrating that different protocols use completely different encoding methods (MIME, HTTP, HTTP headers, LDAP).

MIME by itself is not so bad. What is much worse is that some comrades (IBM and MS) quietly ignore MIME. That's when parsing messages becomes hell.

Examples:
1. unencoded data in headers,
2. duplicated headers (for example, two Subject headers, the first with data in cp1251 and the second in utf8, and both unencoded),
3. splitting the address into parts and encoding, in the =?UTF-8?B?.....?= format, pieces of the email address together with the angle brackets rather than the sender/recipient name,
4. a mismatch between Content-Transfer-Encoding and the actual encoding method.

That's what I remembered offhand. But there are plenty of other "nice" details. - Beaulah.Kutch10 commented on October 3rd 19 at 03:09
Thank you, that's what I did (the URL is the file path, a slash, plus the friendly name; a rewrite removes the last segment).
The RFC5987 option works, but only starting with IE 9, and I need IE 8. - maureen.Greenholt commented on October 3rd 19 at 03:12
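For reference, the RFC5987 form discussed in the comments above can be sketched roughly as follows. The function names are hypothetical, and the plain ASCII fallback in the regular filename parameter is an assumption added for browsers that ignore filename*. As noted in the comment above, this reportedly works in IE only from version 9.

```javascript
// Percent-encodes a UTF-8 string for use in an RFC 5987 extended parameter.
// encodeURIComponent leaves ' ( ) * unescaped, so they are escaped by hand.
function encodeRfc5987(value) {
  return encodeURIComponent(value)
    .replace(/['()*]/g, (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());
}

// Builds the header value with an ASCII fallback and the extended form.
// asciiFallback is assumed to contain no quotes or non-ASCII characters.
function contentDisposition(utf8Name, asciiFallback) {
  return `attachment; filename="${asciiFallback}"; ` +
         `filename*=UTF-8''${encodeRfc5987(utf8Name)}`;
}

console.log(contentDisposition('Отчет (1).xls', 'report.xls'));
// attachment; filename="report.xls"; filename*=UTF-8''%D0%9E%D1%82%D1%87%D0%B5%D1%82%20%281%29.xls
```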
October 3rd 19 at 02:56
Transliterate from Cyrillic to the Latin alphabet, and replace whitespace characters with underscores.
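A minimal sketch of this approach. The transliteration table below is just one of several common romanization conventions and is an assumption, as is the function name.

```javascript
// One possible Russian-to-Latin table (lowercase letters only).
const TRANSLIT = new Map(Object.entries({
  а: 'a', б: 'b', в: 'v', г: 'g', д: 'd', е: 'e', ё: 'yo', ж: 'zh', з: 'z',
  и: 'i', й: 'y', к: 'k', л: 'l', м: 'm', н: 'n', о: 'o', п: 'p', р: 'r',
  с: 's', т: 't', у: 'u', ф: 'f', х: 'kh', ц: 'ts', ч: 'ch', ш: 'sh',
  щ: 'shch', ъ: '', ы: 'y', ь: '', э: 'e', ю: 'yu', я: 'ya'
}));

// Replaces Cyrillic letters with Latin ones and whitespace runs with "_".
// Characters that are not in the table are left unchanged.
function transliterate(name) {
  const latin = Array.from(name, (ch) => {
    const lower = ch.toLowerCase();
    if (!TRANSLIT.has(lower)) return ch;
    const mapped = TRANSLIT.get(lower);
    // Keep capitalization: capitalize the first Latin letter for uppercase input.
    return ch === lower ? mapped : mapped.charAt(0).toUpperCase() + mapped.slice(1);
  }).join('');
  return latin.replace(/\s+/g, '_');
}

console.log(transliterate('Отчет недвижимости.xls')); // Otchet_nedvizhimosti.xls
```

The result is pure ASCII, so it can go into a plain filename parameter without any encoding tricks.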
October 3rd 19 at 02:58
October 3rd 19 at 03:00
For IE to handle file names in Russian properly, I have to do the following trick on the server side.
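// Hypothetical helper (not in the original snippet): detect IE by User-Agent.
function isIE(userAgent) { return /MSIE |Trident\//.test(userAgent || ''); }

var fileName = '.....';
// IE gets a percent-encoded name; other browsers get the raw name.
var name = isIE(request.headers['user-agent']) ? encodeURIComponent(fileName) : fileName;
response.set({
  'Content-Type': contentType,
  'Content-Disposition': 'attachment; filename="' + name + '"'
});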

Most likely, the key part for you will be using encodeURIComponent.
