How best to design a database for the history of changes to text documents?

Good afternoon.

We need to design a service whose key feature will be a version history for document text. The closest analog with this feature is the version history of a document in Google Docs. Accordingly, we need to choose the most suitable database for this task.
I have proposed using a combination of two databases: PostgreSQL for storing metadata and the data needed to run the website, and HBase as storage for the text documents, including their versions.

Is this approach sound, or are there other generally accepted solutions to this problem? If so, which ones?
March 19th 20 at 08:31
2 answers
March 19th 20 at 08:33
Take a look at Alfresco.
Unfortunately, Alfresco does not fit by its very nature. It is a complete product for organizing document workflow in an enterprise. In essence it is an ECM system, and in this case, despite all of Alfresco's advantages, it is not what we need. We need a database, not a finished product.

Still, it is a good answer; thanks for the recommendation. - hanna40 commented on March 19th 20 at 08:36
))

You need to look at HOW it is done; sorry, I did not put that clearly. - Aliyah.Braun commented on March 19th 20 at 08:39
March 19th 20 at 08:35
You can look at the CQRS + Event Sourcing architecture. Instead of storing the state of an entity, you store the events that have happened to it. By applying these events to the entity in order, you can always get its state at any point in time.
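
As a rough illustration of replaying events to get the text at any version, here is a minimal sketch in Python. The event shape (`TextEvent` with `insert` and `delete` kinds) and the function names are assumptions made up for this example; they do not come from prooph or any other library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TextEvent:
    """One change to a document (hypothetical event shape)."""
    kind: str         # "insert" or "delete"
    pos: int          # character offset where the change applies
    text: str = ""    # inserted text, used by "insert"
    length: int = 0   # number of characters removed, used by "delete"


def apply_event(state: str, event: TextEvent) -> str:
    """Return the document text after applying a single event."""
    if event.kind == "insert":
        return state[:event.pos] + event.text + state[event.pos:]
    if event.kind == "delete":
        return state[:event.pos] + state[event.pos + event.length:]
    raise ValueError(f"unknown event kind: {event.kind}")


def state_at(events: List[TextEvent], version: Optional[int] = None) -> str:
    """Replay the first `version` events (all of them if version is None)."""
    state = ""
    for event in events[:version]:
        state = apply_event(state, event)
    return state


# The stored history is just the ordered list of events.
log = [
    TextEvent("insert", 0, text="Hello world"),
    TextEvent("insert", 5, text=","),
    TextEvent("delete", 0, length=7),
]
```

Calling `state_at(log, 2)` replays only the first two events, which is how an older version would be shown; `state_at(log)` gives the current text. A real service would probably also store periodic snapshots so that long histories do not have to be replayed from the beginning.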

This simplifies writing data to the store but complicates reading it. That is why CQRS is usually used with two repositories: one for writing, which stores the sequence of events, and one for reading, which stores the current state of the entity. The WriteModel sends a message about each event through a broker; the ReadModel picks up this message and updates its state in its database. The ReadModel can always be rebuilt from scratch from the existing events. You can denormalize the data and write it to the read database in whatever shape lets you fetch what you need with simple queries.
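
The split between the two models could look roughly like the sketch below. It is only an in-memory illustration: the class names, the `(document_id, appended_text)` event shape, and the direct callback that stands in for a message broker are all assumptions for the example. In a real system the message would travel through a broker and the read side would lag slightly behind the write side.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: (document_id, appended_text).
Event = Tuple[str, str]


class WriteModel:
    """Append-only event log; subscriber callbacks stand in for a broker."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def append(self, doc_id: str, text: str) -> None:
        event = (doc_id, text)
        self.events.append(event)          # store the event itself
        for handler in self._subscribers:  # then "publish" it
            handler(event)


class ReadModel:
    """Denormalized current text per document, read with a plain lookup."""

    def __init__(self) -> None:
        self.current: Dict[str, str] = defaultdict(str)

    def handle(self, event: Event) -> None:
        doc_id, text = event
        self.current[doc_id] += text

    @classmethod
    def rebuild(cls, events: List[Event]) -> "ReadModel":
        """Recreate the read side from scratch by replaying all events."""
        model = cls()
        for event in events:
            model.handle(event)
        return model


write_model = WriteModel()
read_model = ReadModel()
write_model.subscribe(read_model.handle)

write_model.append("doc-1", "Hello")
write_model.append("doc-1", ", world")
write_model.append("doc-2", "Draft")
```

Reading the current text is then just `read_model.current["doc-1"]`, and `ReadModel.rebuild(write_model.events)` shows that the read side can be thrown away and recreated from the event log at any time.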

You can use any database, though the ReadModel usually uses a standard relational or document-oriented one, while for the WriteModel you can pick something more specialized, such as an event store, simply because this part of the application does not need all the features of an RDBMS.

You can find an example implementation of event sourcing at https://github.com/prooph/event-sourcing
Event Sourcing is a great approach; I have nothing against it. However, as mentioned, it makes reading more complicated. With two repositories, going by the CAP theorem, we give up either availability or data consistency while keeping the ability to scale, so we would need to come up with some workaround. In addition, the main operation in our case will be reading, and that is the one we need to keep simple. Writes will not be nearly as frequent.
Nevertheless, thank you for the reminder about this approach. If we do not find another option, it may well be a good fit, especially since we need to be able to recover from a failure without data loss, and this approach makes that quite convenient to implement. - hanna40 commented on March 19th 20 at 08:38
