My scientific work is related with data compression and now I'm trying to solve one very specific task.

There is an input alphabet of N m-bit numbers (N>=10^6; m=24). For each of the N symbols are known, the frequency of its occurrence. On the basis of this information for all symbols of the received Huffman codes.

The challenge was to create a program which can uniquely reconstruct themselves as m-bit numbers and their Goffmanesque codes. Of course, the transmission needs to have the minimum possible volume.

Option, which I myself thought requires N*(m+2)-1 bits. If someone will manage to improve my score, please share your solution.

There is an input alphabet of N m-bit numbers (N>=10^6; m=24). For each of the N symbols are known, the frequency of its occurrence. On the basis of this information for all symbols of the received Huffman codes.

The challenge was to create a program which can uniquely reconstruct themselves as m-bit numbers and their Goffmanesque codes. Of course, the transmission needs to have the minimum possible volume.

Option, which I myself thought requires N*(m+2)-1 bits. If someone will manage to improve my score, please share your solution.

asked October 8th 19 at 03:15

3 answers

answered on October 8th 19 at 03:17

You save the whole tree as it is, if you estimate the amount of information in it, less than I already have, will not work.

But you can make a feint ears and to save the tree itself and the depth of the elements in it.

This will allow you to save up to N-log2(N) bits.

but it will be necessary to re-build the tree before compression. That it was identical to what will be built when unpacking.

But you can make a feint ears and to save the tree itself and the depth of the elements in it.

This will allow you to save up to N-log2(N) bits.

but it will be necessary to re-build the tree before compression. That it was identical to what will be built when unpacking.

answered on October 8th 19 at 03:19

Something is wrong in the condition. 24-bit integers are only 16*10^6, not typed 2^9.

And question — you need to get it set codes Huffman coding, or enough to get any equivalent length (and the algorithms of packing and unpacking themselves agree to the extracted codes were exactly those used when packing the main message)?

And question — you need to get it set codes Huffman coding, or enough to get any equivalent length (and the algorithms of packing and unpacking themselves agree to the extracted codes were exactly those used when packing the main message)?

answered on October 8th 19 at 03:21

If you need to pass is arbitrary Huffman code, then to win almost anything. Code is determined by the N-element vector of m-bit integers (with the condition that all elements are distinct) is a letter, sorted by codes, and the arrangement of brackets in an expression with N operands is a binary tree. The number of vectors equal to (2^m)!/(2^m-N)!, the number of the trees C(N,2*N)/(4*N-2). If N is much smaller than 2^m, then the first of these numbers can be evaluated as 2^(m*N), and the latter as 2^(2*N)/(4*N-2)/sqrt(pi*N). The total number of bits in their work — about m*N+2*N.

The closer N to 2^m there is an opportunity to win up to 1.4*N bits (due to the fact that (2^m)!< (2^m)^(2^m).

The closer N to 2^m there is an opportunity to win up to 1.4*N bits (due to the fact that (2^m)!< (2^m)^(2^m).

Find more questions by tags Algorithmsdata recovery

And "we Need to convey exactly Hoffmannesque codes" is the answer to the wrong question. The question was: if a given code — {0,10,11}, and the algorithm will output {1,00,01}, it will be unacceptable, or somehow agree with the code generator? - Brett79 commented on October 8th 19 at 03:34