Effective transfer codes Huffman?

My scientific work is related with data compression and now I'm trying to solve one very specific task.


There is an input alphabet of N m-bit numbers (N>=10^6; m=24). For each of the N symbols are known, the frequency of its occurrence. On the basis of this information for all symbols of the received Huffman codes.


The challenge was to create a program which can uniquely reconstruct themselves as m-bit numbers and their Goffmanesque codes. Of course, the transmission needs to have the minimum possible volume.


Option, which I myself thought requires N*(m+2)-1 bits. If someone will manage to improve my score, please share your solution.
October 8th 19 at 03:15
3 answers
October 8th 19 at 03:17
You save the whole tree as it is, if you estimate the amount of information in it, less than I already have, will not work.
But you can make a feint ears and to save the tree itself and the depth of the elements in it.
This will allow you to save up to N-log2(N) bits.
but it will be necessary to re-build the tree before compression. That it was identical to what will be built when unpacking.
October 8th 19 at 03:19
Something is wrong in the condition. 24-bit integers are only 16*10^6, not typed 2^9.
And question — you need to get it set codes Huffman coding, or enough to get any equivalent length (and the algorithms of packing and unpacking themselves agree to the extracted codes were exactly those used when packing the main message)?
* meant not 2^9 and 10^9 - Brett79 commented on October 8th 19 at 03:22
Of course, I meant 10^6 characters. But the problem is actually more General: to effectively convey the trees of arbitrary or very large m and N. - aliyah_Dickens commented on October 8th 19 at 03:25
10^6 is already good. Then the question remains whether it is sufficient to encode any of the equivalent codes and what is the maximum depth of the tree (it depends on the size of the original statistics). If a single code will suffice, and depth of, say, not more than 31, it is enough to 21 bits per character (more precisely, 2^m+N*5 on the whole primer): mask my presence symbols and the code length of each of them. - Brett79 commented on October 8th 19 at 03:28
You need to convey exactly Hoffmannesque codes. The maximum height of the tree can be equal to N-1. And such a situation will be exactly to meet. - aliyah_Dickens commented on October 8th 19 at 03:31
The depth was equal to N-1, it is necessary that the frequency of one of the characters was no more phi^(-N), i.e. 1 character for 10^200000. I believe that someday we will work with such volumes of information — but when?
And "we Need to convey exactly Hoffmannesque codes" is the answer to the wrong question. The question was: if a given code — {0,10,11}, and the algorithm will output {1,00,01}, it will be unacceptable, or somehow agree with the code generator? - Brett79 commented on October 8th 19 at 03:34
October 8th 19 at 03:21
If you need to pass is arbitrary Huffman code, then to win almost anything. Code is determined by the N-element vector of m-bit integers (with the condition that all elements are distinct) is a letter, sorted by codes, and the arrangement of brackets in an expression with N operands is a binary tree. The number of vectors equal to (2^m)!/(2^m-N)!, the number of the trees C(N,2*N)/(4*N-2). If N is much smaller than 2^m, then the first of these numbers can be evaluated as 2^(m*N), and the latter as 2^(2*N)/(4*N-2)/sqrt(pi*N). The total number of bits in their work — about m*N+2*N.
The closer N to 2^m there is an opportunity to win up to 1.4*N bits (due to the fact that (2^m)!< (2^m)^(2^m).

Find more questions by tags Algorithmsdata recovery