Hello, I am studying neural networks and a few questions have come up. Should the input parameters be normalized to the range 0 to 1 or -1 to 1, and what range of values should the weights have? In the library I downloaded and looked at, the weights are always in the range 0 to 1.

Another question about the weights. As I understand it, the output value of a neuron should be between 0 and 1. But if there are many neurons, for example 1000 (one neuron receives 1000 inputs from the previous layer), the output always overflows, and it only comes down to something reasonable when the weights are like 0.00..., i.e. start in the thousandths. Am I doing something wrong, or is this how it should be? And how many iterations should it take for a weight to go from 0.1... down to, say, 0.0001?

Please explain in simple terms, step by step. Thank you very much.

asked June 5th 19 at 21:03

3 answers

answered on June 5th 19 at 21:05

answered on June 5th 19 at 21:07

This may come as a surprise, but neither the magnitude of the weights, nor their range, nor normalizing them matters at all.

The only thing that matters is the decision function, which uses error backpropagation to adjust the weights, whatever their values, until it responds with the fewest mistakes. The actual values of the weights make absolutely no difference: whether they lie between 0.01 and 0.02 (in steps of 0.0000001), say, or between -1000000000 and +10000000000, the result will be the same (tuning the weights to get the desired response is the job of the decision function).
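To make the mechanism concrete, here is a minimal sketch in plain Python of gradient descent adjusting the weights of a single sigmoid neuron. The toy task (logical OR), the learning rate, the epoch count and the two starting points are all assumptions chosen for illustration. It covers only the simplest case: one neuron on a linearly separable problem, started once from small weights and once from large ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy task: logical OR of two binary inputs.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def predict(w, b, x):
    """Output of one sigmoid neuron with weights w and bias b."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def train(w, b, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the cross-entropy loss."""
    w = list(w)
    for _ in range(epochs):
        for x, target in DATA:
            y = predict(w, b, x)
            # For sigmoid + cross-entropy, dLoss/dz simplifies to (y - target).
            err = y - target
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

# Start once from tiny weights and once from large ones.
for start in [(0.01, 0.02), (5.0, -5.0)]:
    w, b = train(start, 0.0)
    print(start, [round(predict(w, b, x)) for x, _ in DATA])  # [0, 1, 1, 1] for both
```

The final weights differ between the two starts, but both runs end up with a neuron that classifies all four cases correctly.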

As for normalization, it is generally a meaningless operation: say you divide the "input" value coming from all neurons by the number of neurons (and that number is always constant). A constant has no effect on fitting the coefficients (a coefficient will simply come out larger or smaller by that constant), and, as I said, we are NOT interested in the absolute value of a coefficient; what matters is how the range over which the values vary relates to the decision function.
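A small sketch of the constant-scaling argument, assuming a neuron that simply sums weighted inputs (no bias, no activation). If every input is divided by a constant c, multiplying every weight by the same c gives the same weighted sum, so training can absorb the constant into the weights. The value c = 1000 and the numbers are hypothetical.

```python
import math

def weighted_sum(weights, inputs):
    """Pre-activation of a neuron without bias: sum of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))

inputs = [3.0, -1.5, 8.0]
weights = [0.2, 0.7, -0.1]
c = 1000.0  # hypothetical constant, e.g. the number of neurons

scaled_inputs = [x / c for x in inputs]         # "normalized" inputs
compensated_weights = [w * c for w in weights]  # weights that absorb the constant

original = weighted_sum(weights, inputs)
rescaled = weighted_sum(compensated_weights, scaled_inputs)
print(original, rescaled, math.isclose(original, rescaled))  # equal up to float rounding
```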

I hope the idea is clear.

Write your own network and try computing the coefficients by hand; you will see it all for yourself.

answered on June 5th 19 at 21:09

The kind of input and output values depends on what meaning you give them and on the network architecture (in particular, on the activation functions at the neurons' outputs).

In text processing, for example, the input is often the sequence of word IDs in a sentence, i.e. integers from 0 to <the number of words in the dictionary>.
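A minimal sketch of turning sentences into word IDs. The whitespace tokenizer, the lowercasing and the helper names build_vocab and encode are assumptions for illustration; real pipelines use proper tokenizers and usually reserve extra IDs for padding and unknown words.

```python
def build_vocab(sentences):
    """Assign each new word the next free integer ID, starting at 0."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Replace every word with its ID (raises KeyError for unseen words)."""
    return [vocab[word] for word in sentence.lower().split()]

corpus = ["The cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
print(vocab)                         # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'down': 4}
print(encode("the dog sat", vocab))  # [0, 3, 2]
```

The IDs run from 0 to len(vocab) - 1; they label categories rather than measure quantities.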

In image processing, the ReLU activation function is often used; its output is a non-negative number.
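For comparison with the question's assumption that a neuron's output lies between 0 and 1, this sketch evaluates ReLU next to the sigmoid. Only the sigmoid squashes its output into (0, 1); ReLU is bounded below by 0 but not above. The sample inputs are arbitrary.

```python
import math

def relu(z):
    # ReLU passes positive values through unchanged and clips negatives to 0.
    return max(0.0, z)

def sigmoid(z):
    # Sigmoid maps any real number into (0, 1) mathematically;
    # in floating point very large z rounds to exactly 1.0.
    return 1.0 / (1.0 + math.exp(-z))

for z in [-3.0, -0.5, 0.0, 2.0, 50.0]:
    print(z, relu(z), sigmoid(z))
```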

Normalizing the inputs is useful when the features have very different original ranges but are roughly equally important, and each feature is a real number. For example, if the inputs are the length of icicles on the roof in millimetres and the outdoor temperature in degrees, the first feature is on the order of hundreds to thousands and the second is in the units to tens.
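A sketch of the two common ways to rescale such features, using invented icicle and temperature values (hypothetical, not measurements). standardize rescales a feature to mean 0 and standard deviation 1; min_max_scale maps it linearly onto a chosen range such as [0, 1] or [-1, 1], which is what the question asks about. In practice the mean, deviation, minimum and maximum are computed on the training data and then reused unchanged for new data.

```python
import statistics

def standardize(column):
    """Z-score: subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0] * len(column)
    return [(v - mean) / std for v in column]

def min_max_scale(column, lo=0.0, hi=1.0):
    """Linearly map the smallest value to lo and the largest to hi."""
    v_min, v_max = min(column), max(column)
    if v_max == v_min:
        return [(lo + hi) / 2] * len(column)
    return [lo + (v - v_min) * (hi - lo) / (v_max - v_min) for v in column]

# Hypothetical measurements, invented for illustration only.
icicle_mm = [120.0, 450.0, 800.0, 1500.0]  # hundreds to thousands
temp_c = [-2.0, -8.0, -15.0, -20.0]        # units to tens

print(standardize(icicle_mm))
print(standardize(temp_c))
print(min_max_scale(icicle_mm))          # range [0, 1]
print(min_max_scale(temp_c, -1.0, 1.0))  # range [-1, 1]
```

After either transformation both features live on comparable scales, so neither dominates the weighted sum just because of its units.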

How the weights in the layers are initialized matters a great deal for how well backprop will work. But this area is already well understood, and standard default solutions such as Glorot initialization or orthogonal initialization are used everywhere, so there is not much to worry about there.
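The following sketch illustrates why initialization matters for the question's case of a neuron with 1000 inputs. It compares weights drawn uniformly from [0, 1] (as in the library described in the question) with Glorot (Xavier) uniform initialization, which samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The layer sizes, the random seed and the inputs in [0, 1] are assumptions; orthogonal initialization is not shown.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    """Weight matrix (fan_out rows of fan_in weights) drawn from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
fan_in, fan_out = 1000, 10
x = [rng.random() for _ in range(fan_in)]  # inputs already scaled to [0, 1]

# Naive init: every weight in [0, 1]. The weighted sum is around 1000 * 0.5 * 0.5 = 250.
naive_w = [rng.random() for _ in range(fan_in)]
z_naive = sum(w * xi for w, xi in zip(naive_w, x))

# Glorot init: weights of the first output neuron; the sum stays of order 1.
glorot_w = glorot_uniform(fan_in, fan_out, rng)[0]
z_glorot = sum(w * xi for w, xi in zip(glorot_w, x))

print(f"naive:  z = {z_naive:.1f}, sigmoid = {sigmoid(z_naive)}")   # saturated: 1.0
print(f"glorot: z = {z_glorot:.3f}, sigmoid = {sigmoid(z_glorot)}")
```

With the naive weights the sigmoid is stuck at 1.0, which matches what the question describes; the Glorot-scaled weights keep the weighted sum in a range where the sigmoid still responds to changes.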

"How many iterations should it take for a weight to go from 0.1 down to 0.0001" can be rephrased as "why is backprop slow, and how can it be sped up?" This is one of the fundamental problems in deep learning. Weight initialization is one partial solution. Different activation functions are another. New layer architectures are a third. Modifying the training data is a fourth. And so on.

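One reason backprop can be slow in the question's setup can be sketched directly. For a hidden sigmoid neuron, or an output neuron trained with squared error, backprop multiplies the gradient reaching its incoming weights by the sigmoid's derivative s(z)(1 - s(z)). That factor is at most 0.25 and shrinks towards zero as the neuron saturates. The z values below are arbitrary; z = 250 is roughly the weighted sum you would expect from 1000 inputs and 1000 weights both drawn uniformly from [0, 1].

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), largest (0.25) at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The gradient reaching the weights of such a neuron is multiplied by
# sigmoid_grad(z), so a saturated neuron barely changes its weights.
for z in [0.0, 2.0, 5.0, 10.0, 20.0, 250.0]:
    print(f"z = {z:6.1f}   gradient factor = {sigmoid_grad(z):.3e}")
```

At z = 250 the factor is exactly 0.0 in floating point, so no number of iterations will move those weights.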