How to store/select data from PostgreSQL?

Kind time of day dear!

In continuation of my previous questions
[PostgreSQL] How to put strings in INT and other data types?
How to cast an array of numbers from a VARCHAR field to INTEGER in PostgreSQL?

The problem: there is a set of data (from 1 to 1000+) records, strictly tied to another record (simple connection, one-to-many). In view of the fact that these data are of mixed type (can be strings, with all the consequences) and numbers. The data stored in VARCHAR format, in view of the above described reasons.

At the same time, work with the data we need depending on the situation and as strings and as numbers. That is, if the search condition is set to "search by rows" - we're looking for around the right and by rows and numbers, the Bud is all the rows (technically it is). If the condition is to search for ranges, for example:
... WHERE n >= 10 AND n <= 100;

we have to choose only numbers and respectively compare them.

I see a solution:
Option 1 We keep lowercase the data in the table for the rows of numeric data in table for numbers (and apparently, for the fractional'll have to have a personal table), and depending on the search terms make a selection from two tables. There is a small problem:
a) fragmented Data
b) the System will work so that identifying the format of the input data, will record them in the appropriate table, in this case there is some probability of an erroneous determination, because it is not the fact that "333555" - is the amount of something, not a phone number or something else other than this amount.

Option 2.1 We store all the data in one table, format VARCHAR, and overhead sign do discharge number, for example:
SELECT field1::integer FROM table1 WHERE field1 ~ E'^\\d+$' AND field1::integer > 3;

in this embodiment confuses me reguraly... although She is very small, but still the regular season.

The version 2.2 We add another field-flag (number -> true/false) which will determine what is stored in this string, a number or a string. Accordingly, the search for rows works in a regular mode, and search by numbers works is not in the format of a regular expression, and based on the flags.

In this version I don't like the additional entity and the additional logic, but we already got rid of the regular (though very small).

Please tell me which option is better, how much better and why? What are the objective pros and cons of each approach be? How slow primitive of the regular season will slow down the system and/or how they are better/worse variants with an extra field?

P. S. I understand that you can "take a check", but as having no ideas about how and why the database will behave in different situations and not being able to simulate such situations on different hardware with a different set (volume) of data - I would like to hear the opinion of someone who understands how it works logically.
July 2nd 19 at 13:31
3 answers
July 2nd 19 at 13:33

And maybe you don't know which set of fields you have in reality
don't know what queries you prevail
do not know the set of field types
unclear what is meant by "better", "fast" and so on
The set of fields is a dynamic object attributes, there could potentially be anything from arithmetic mean income and number of paws to the names of the grandparents in the 8th generation. To predict "what will happen tomorrow and in what quantity?" is unfortunately impossible.

The amount of data - about 1.5 million records (already/now), each of which is bound an average of 20 values (i.e. 30m. records of values), of them at the moment: 70% - integers, 20% row, 10% fractional (approximately). - Salvado commented on July 2nd 19 at 13:36
In what area of expertise is running your app?

While the database size is not more memory - number of entries does not matter - marilyne_Roh commented on July 2nd 19 at 13:39
Memory 8Gb, 2xSSD, server FreeBSD + Posrgres. Concerning the volume database can't access the server from where I am now, but as long as it works within normal limits. Requests within 1s. run.

Don't know how to define an area of knowledge and how it affects storage format, but it is the base of technical products, with all possible parameters, from the authors of the drawing to objects where these products can be applied. Is stored of course, only the important parameters (which it is advisable/you want to store).

Regarding the memory capacity and the size of the database. As long as it works and it works for me, but for example, we can assume that the volume DB is 10 times more memory capacity and build on this in the calculations. - Salvado commented on July 2nd 19 at 13:42
0. You should begin from RAM = volume DB

1. While the rate of answers like - use 3NF
No isonow, * PostgreSQL hstore
EAV - field specify the type of the element
+ nice ORM backend - marilyne_Roh commented on July 2nd 19 at 13:45
: I disagree about "no isonow". In PostgreSQL, to implement the EAV is a very convenient type JSONB, since it is well indexed. Once tested the solution:
1. with "classic" EAV related tables
2. JSONB field in the main table.
Option 2 was noticeably faster, if you don't need any tricky validation. - Markus_Langworth commented on July 2nd 19 at 13:48
those select conditions for EAV slower than same select on Janu? - marilyne_Roh commented on July 2nd 19 at 13:51
: Thanks for the links.
Look at in the paragraph where the author compares the speed SELECT and. I was critical of it.
--- Quote: ---
That's 15000x faster then EAV - Markus_Langworth commented on July 2nd 19 at 13:57
: without indexes - marilyne_Roh commented on July 2nd 19 at 14:00
: Look closely at the chart
The last strip of the maroon color with the operator JSONB @> and indexes. You can hardly see it: "it had a huge effect on the timing: only 0.153 ms!"
In the end, check it out on the examples from the article. I checked on their data. And I did that JSONB is significantly faster than the EAV. - Markus_Langworth commented on July 2nd 19 at 14:03
July 2nd 19 at 13:35
It seems to me that all the problems that you store completely different entities in the same column. As long as you don't lay out to different places, you will suffer. And even regular season/flags/attempts to keep the number separately from the lines will not solve the problem 100% - you yourself gave an example, when it is not clear, the amount is in line or phone. You have something wrong in the storage schema.... Subject area know only you, so you decide what and where)
If it's dynamic attributes, then you may want to look in the direction jsonb, filled with gin. But we need more detail to understand what the analyst is on these fields and why it is important number.
July 2nd 19 at 13:37
Join . In your case perhaps the best solution would be JSONB. Of the minuses - denormalize your data. If you need to control referential integrity within the data JSONB, then the task gets complicated. It can be solved, for example, the hanging of triggers with control functions.
If this control can be neglected, I recommend to look at JSONB.

Find more questions by tags PostgreSQL