Redefine spec properly

This commit is contained in:
2025-09-30 09:22:10 +02:00
parent 045a9d5e84
commit 2a01aa8d96

Spec.md

@@ -1,103 +1,87 @@
# Event Log Based Store

All data rows are reconstructed exclusively from an event log. All interactions with data rows must occur via events in the log. For performance, data rows are cached for quick lookup.

Events are defined as:
```go
type Event struct {
    Seq        int       `json:"seq"`        // Server-generated sequence number (applied order)
    Hash       string    `json:"hash"`       // Server-generated hash, guarantees the event was processed
    ItemID     string    `json:"item_id"`    // Client-defined identifier of the item to manipulate
    EventID    string    `json:"event_id"`   // Server-generated event identifier (uuid-v4)
    Collection string    `json:"collection"` // Client-defined collection/table name
    Data       string    `json:"data"`       // JSON array of RFC6902 patches
    Timestamp  time.Time `json:"timestamp"`  // Server-generated timestamp (when processed)
}
```
When creating an event, only Data, Collection, and ItemID are required from the client. Hash, EventID, Seq, and Timestamp are computed server-side.
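For illustration, a minimal client payload could look like this (all values are hypothetical). Note that Data is itself a JSON-encoded string holding an RFC6902 patch array; an `add` with path `""` replaces the whole document, which is one way to express a create:

```json
{
  "item_id": "user-42",
  "collection": "users",
  "data": "[{\"op\": \"add\", \"path\": \"\", \"value\": {\"name\": \"Ada\", \"age\": 36}}]"
}
```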
Server-side event processing (a sketch follows this list):
- Retrieve the latest event for the collection.
- Assign the next sequence number (incremented from the latest).
- Generate a new EventID (uuid-v4).
- Assign the current timestamp.
- Compute the event hash from the current event's fields plus the previous event's hash.
- Serialize the event for hashing manually (not via json.Marshal or %+v) to guarantee a stable field order.
- Only then apply the patch to the cached data row.
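A minimal sketch of the stamping and hashing steps, assuming SHA-256 and the github.com/google/uuid package (both are assumptions; the spec fixes neither the hash function nor the separator format):

```go
import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "time"

    "github.com/google/uuid"
)

// stampEvent fills in the server-generated fields of an incoming event.
// prev is the latest event in the collection's log, or nil if the log is empty.
func stampEvent(e *Event, prev *Event) {
    prevSeq, prevHash := 0, ""
    if prev != nil {
        prevSeq, prevHash = prev.Seq, prev.Hash
    }
    e.Seq = prevSeq + 1
    e.EventID = uuid.NewString()
    e.Timestamp = time.Now().UTC()
    e.Hash = hashEvent(e, prevHash)
}

// hashEvent strings the fields together by hand (no json.Marshal, no %+v)
// so the byte layout, and therefore the hash, stays stable across versions.
func hashEvent(e *Event, prevHash string) string {
    payload := fmt.Sprintf("%d|%s|%s|%s|%s|%s|%s",
        e.Seq, e.ItemID, e.EventID, e.Collection, e.Data,
        e.Timestamp.Format(time.RFC3339Nano), prevHash)
    sum := sha256.Sum256([]byte(payload))
    return hex.EncodeToString(sum[:])
}
```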
Event log compaction:
- Every 2 days, merge and compact the event log for each collection (see the sketch after this list).
- All events older than 2 days are resolved into a new minimal event log that produces the same state.
- Sequence numbers (Seq) are never reset and always increment from the last value.
- Before merging or deleting old events, save the original event log as a timestamped backup file.
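A sketch of the compaction pass. The replay helper and the use of github.com/evanphx/json-patch are assumptions for illustration; the spec mandates only the backup, the cutoff, and that Seq is never reset:

```go
import (
    "encoding/json"
    "fmt"
    "os"
    "time"

    jsonpatch "github.com/evanphx/json-patch"
)

// replay folds each event's RFC6902 patch array into per-item JSON documents.
func replay(events []Event) map[string]json.RawMessage {
    docs := map[string]json.RawMessage{}
    for _, e := range events {
        patch, err := jsonpatch.DecodePatch([]byte(e.Data))
        if err != nil {
            continue // sketch only; real code should surface bad patches
        }
        doc, ok := docs[e.ItemID]
        if !ok {
            doc = json.RawMessage(`{}`)
        }
        if out, err := patch.Apply(doc); err == nil {
            docs[e.ItemID] = out
        }
    }
    return docs
}

// compact backs up events older than the cutoff, then replaces them with one
// synthetic create event per item so the resulting state is unchanged.
func compact(collection string, log []Event, cutoff time.Duration) ([]Event, error) {
    split := time.Now().Add(-cutoff)
    var old, recent []Event
    for _, e := range log {
        if e.Timestamp.Before(split) {
            old = append(old, e)
        } else {
            recent = append(recent, e)
        }
    }
    // Backup first: old events go into a timestamped file, never just deleted.
    backup, _ := json.Marshal(old)
    name := fmt.Sprintf("%s-events-%s.json.bak", collection, time.Now().Format("20060102-150405"))
    if err := os.WriteFile(name, backup, 0o644); err != nil {
        return nil, err
    }
    merged := make([]Event, 0, len(recent))
    for itemID, doc := range replay(old) {
        data, _ := json.Marshal([]map[string]any{{"op": "add", "path": "", "value": doc}})
        // Server-generated fields (Seq, Hash, EventID, Timestamp) are re-stamped
        // when the merged log is written; Seq keeps incrementing, never resets.
        merged = append(merged, Event{ItemID: itemID, Collection: collection, Data: string(data)})
    }
    return append(merged, recent...), nil
}
```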
Client requirements:
- Must be able to apply patches and fetch objects.
- Must store:
  - last_seq: sequence number of the last processed event
  - last_hash: hash of the last processed event
  - events: local event log of all processed events
  - pending_events: locally generated events not yet sent to the server
- On startup, fetch new events from the server since last_seq and apply them.
- When modifying objects, generate events and append them to pending_events.
- Periodically or opportunistically send pending_events to the server.
- Persist the event log (events and pending_events) locally.
- If the server merges the event log, the client detects divergence by comparing last_seq and last_hash.
- If the sequence matches but the hash differs, the server sends the full event log and the client reconstructs its state from it.
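A possible shape for the client-side state; the names are illustrative, only the four stored fields are required by the spec:

```go
// ClientStore is everything the client persists locally.
type ClientStore struct {
    LastSeq       int     `json:"last_seq"`       // sequence number of the last processed event
    LastHash      string  `json:"last_hash"`      // hash of the last processed event
    Events        []Event `json:"events"`         // local event log of all processed events
    PendingEvents []Event `json:"pending_events"` // generated locally, not yet sent to the server
}

// applyServerEvents folds freshly fetched events into the local log and cursor.
func (s *ClientStore) applyServerEvents(evs []Event) {
    for _, e := range evs {
        // Apply e.Data (RFC6902 patches) to the locally cached object here.
        s.Events = append(s.Events, e)
        s.LastSeq = e.Seq
        s.LastHash = e.Hash
    }
}
```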
If the server merges the event log and the client has unsent local events:
- Client fetches the merged events from the server.
- Applies merged events to local state.
- Reapplies unsent local events on top of the updated state.
- Resends these events to the server.
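Put together, the recovery flow could look like this; the send callback stands in for whatever transport the client uses (an assumption):

```go
// resync rebuilds local state from the server's merged log, then reapplies
// and resends the locally generated events that were never accepted.
func (s *ClientStore) resync(mergedLog []Event, send func([]Event) error) error {
    pending := s.PendingEvents
    // Throw away the divergent local log and rebuild from the merged one.
    s.Events, s.LastSeq, s.LastHash, s.PendingEvents = nil, 0, "", nil
    s.applyServerEvents(mergedLog)
    // Reapply our unsent events on top of the rebuilt state and retry sending.
    s.PendingEvents = pending
    return send(pending)
}
```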
If a client sends events after the event log has been merged:
- The server accepts and applies these events as usual, regardless of the client's log state.

Merging the event log must not alter the resulting data state.
Required endpoints:

GET /api/<collection>/sync?last_seq=<last_seq>&last_hash=<last_hash>
- Returns all events after the specified last_seq and last_hash.
- If the provided seq and hash do not match the server's, the entire event log is returned (the client is out of sync).

PATCH /api/<collection>/events
- Accepts a JSON array of RFC6902 patch objects.
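An illustrative sync exchange; all values are hypothetical and the response body shape is an assumption, since the spec fixes only the endpoint and its parameters:

```
GET /api/users/sync?last_seq=127&last_hash=abcd123

200 OK
[
  {
    "seq": 128,
    "hash": "efgh456",
    "item_id": "user-42",
    "event_id": "9b2f0c1e-7d4a-4b48-9c2d-1f2a3b4c5d6e",
    "collection": "users",
    "data": "[{\"op\": \"replace\", \"path\": \"/age\", \"value\": 37}]",
    "timestamp": "2025-09-30T07:22:10Z"
  }
]
```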
Server processing:
- As new events arrive, process the event log and update the cached state for the collection.
- The current state is available for clients that do not wish to process the event log.
- Only new events need to be applied to the current state; no need to reprocess the entire log each time.
- Track the last event processed for each collection (sequence number and hash).
On startup, the server must:
- Automatically create required collections: one for events and one for items (data state).
- Events must be collection-agnostic and support any collection; at least one example collection is created at startup.
- Ensure required columns exist in collections; if missing, reject PATCH requests with an error.
- Each collection maintains its own sequence number, hash, and event log.
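A minimal shape for that per-collection bookkeeping (illustrative, not mandated):

```go
import "encoding/json"

// CollectionState is what the server tracks for each collection.
type CollectionState struct {
    LastSeq  int    // sequence number of the last processed event
    LastHash string // hash of the last processed event
    // Cached current state, item ID -> JSON document, so new events can be
    // applied incrementally instead of replaying the whole log each time.
    Items map[string]json.RawMessage
}
```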
---
## RFC6902
https://datatracker.ietf.org/doc/html/rfc6902
Operation objects MUST have exactly one "op" member, whose value
indicates the operation to perform. Its value MUST be one of "add",
"remove", "replace", "move", "copy", or "test"; other values are
@@ -228,8 +212,3 @@ Some highlights:
## test
I think we don't care about this one