Redefine spec properly

2025-09-30 09:22:10 +02:00
parent 045a9d5e84
commit 2a01aa8d96

Spec.md

@@ -1,103 +1,87 @@
Event log based store
# Event Log Based Store
The data rows of our table are to be recreated from an event log
All interactions with the rows are to happen exclusively via events from/in the log
For performance reasons we are to cache these data rows as well for quick lookup
All data rows are reconstructed exclusively from an event log. All interactions with data rows must occur via events in the log. For performance, data rows are cached for quick lookup.
Events in the log are to take form of:
Events are defined as:
type Event struct {
// Server generated sequence number of the event - ie when it was applied
Seq int `json:"seq"`
// Type of the event - create, update, delete, defined by the client
Type string `json:"type"`
// Hash of the event - server generated, guarantees the event was processed
Hash string `json:"hash"`
// ItemID of the item that is to be manipulated, defined by the client
ItemID string `json:"item_id"`
// EventID of the event - server generated, guarantees the event was processed
EventID string `json:"event_id"`
// Collection of the item that is to be manipulated, defined by the client
Collection string `json:"collection"`
// Data that is to be used for manipulation; for create events that's the full objects and for update events that's the diff
Data map[string]interface{} `json:"data"`
// Timestamp of the event - server generated, when the event was processed
Timestamp time.Time `json:"timestamp"`
Seq int `json:"seq"` // Server-generated sequence number (applied order)
Hash string `json:"hash"` // Server-generated hash, guarantees event was processed
ItemID string `json:"item_id"` // Client-defined item identifier
EventID string `json:"event_id"` // Server-generated event identifier (uuid-v4)
Collection string `json:"collection"` // Client-defined collection/table name
Data string `json:"data"` // JSON array of RFC6902 patches
Timestamp time.Time `json:"timestamp"` // Server-generated timestamp (when processed)
}
Events are divided into 3 types: create, update and delete events
Create events simply create the object as given in Data
Delete events simply mark an object as deleted (not actually delete!) via its ItemID
This simply means we set the DeletedAt field of the object to the current timestamp
Updates to deleted items are processed as usual; we have no concept of a "deleted" item, only an item with a field set to a value
Which is then filtered against when fetching and which happens to be named "DeletedAt"
Update events are to modify a field of a row and never more than one field
Therefore its data is only the diff in the form of "age = 3"
When creating an event only the Type and ItemID must be provided
Data is optional (delete events have no data)
Hash, EventID and Seq are to be computed server side
When creating an event, only Data, Collection, and ItemID are required from the client. Hash, EventID, Seq, and Timestamp are computed server-side.
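For illustration, a client-submitted update event might look like the sketch below; the collection and item names are made up, and the patch encodes the "age = 3" example as an RFC6902 operation.

```go
// Hypothetical client-submitted event: only Collection, ItemID and Data are set;
// Seq, Hash, EventID and Timestamp are filled in by the server.
var incoming = Event{
	Collection: "tasks",                                     // made-up collection name
	ItemID:     "task-42",                                    // made-up item id
	Data:       `[{"op":"replace","path":"/age","value":3}]`, // the "age = 3" diff as an RFC6902 patch
}
```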
On the server side with an incoming event:
Grab the latest event
Assign the event a sequence number that is incremented from the latest
Create its EventID (generate a uuid-v4)
Assign it a Timestamp
Compute the hash from the dump of the current event PLUS the previous event's hash
When serializing the event, write the serialization function manually to ensure field order
Do not use JSON serialization or %+v but manually string together the fields
And only then apply the patch
For create events that is insert objects
For delete events that is mark objects as deleted
For update events get the object, apply the diff and save the object
Server-side event processing:
- Retrieve the latest event for the collection.
- Assign the next sequence number (incremented from the latest).
- Generate a new EventID (uuid-v4).
- Assign the current timestamp.
- Compute the event hash as a function of the current event's data and the previous event's hash.
- Serialize the event manually (not via json.Marshal or %+v) to ensure field order for hashing.
- Apply the patch to the cached data row.
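A minimal sketch of these steps, assuming SHA-256 for the hash and one particular field order for the manual dump (the spec fixes neither), with github.com/google/uuid standing in for any uuid-v4 generator:

```go
import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"

	"github.com/google/uuid" // any uuid-v4 generator will do; this one is just an example
)

// process fills in the server-side fields of an incoming event and returns it
// ready to be appended to the log. prev is the latest event of the collection
// (nil if the log is empty). The patch in Data is applied to the cached row
// only after this step.
func process(prev *Event, in Event) Event {
	in.Seq = 1
	prevHash := ""
	if prev != nil {
		in.Seq = prev.Seq + 1
		prevHash = prev.Hash
	}
	in.EventID = uuid.NewString()
	in.Timestamp = time.Now().UTC()

	// Manual, fixed-order serialization (no json.Marshal or %+v) so the hash is stable.
	dump := fmt.Sprintf("%d|%s|%s|%s|%s|%s",
		in.Seq, in.ItemID, in.EventID, in.Collection, in.Data,
		in.Timestamp.Format(time.RFC3339Nano))
	sum := sha256.Sum256([]byte(dump + prevHash))
	in.Hash = hex.EncodeToString(sum[:])
	return in
}
```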
Events are to be periodically merged on the server
Maybe set this cutoff to 2 or 3 days
This means resolve all the events, delete the event log, and generate a new event log containing only create events with the data we resolved
Hopefully we will never have more than a few hundred events
Do NOT reset the seq number at any point, always increment from last
Event log compaction:
- Every 2 days, merge and compact the event log for each collection.
- All events older than 2 days are resolved, and a new minimal event log is generated that produces the same state.
- Sequence numbers (Seq) are never reset and always increment from the last value.
- Before merging or deleting old events, save the original event log as a timestamped backup file.
Maybe instead of deleting the event log save it somewhere just to have a backup
Maybe cram them into a text file and save with timestamp
Maybe don't delete/merge the whole event log but only "old" events like >2d
While keeping the "new" events (<2d)
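A rough sketch of such a compaction pass for one collection, assuming the 2-day cutoff; resolveToCreateEvents (folding old events into one create event per item) is a hypothetical helper, and the backup file name is just one possible format:

```go
import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// compact splits the log at the cutoff, backs up the old part to a timestamped
// file and replaces it with resolved create events. Seq values keep increasing
// from the last assigned number and are never reset.
func compact(events []Event, now time.Time) ([]Event, error) {
	cutoff := now.Add(-48 * time.Hour)
	var old, recent []Event
	for _, e := range events {
		if e.Timestamp.Before(cutoff) {
			old = append(old, e)
		} else {
			recent = append(recent, e)
		}
	}
	if len(old) == 0 {
		return events, nil // nothing old enough to merge
	}

	// Save the original old events as a backup before deleting them.
	backup, err := json.Marshal(old)
	if err != nil {
		return nil, err
	}
	name := fmt.Sprintf("eventlog-backup-%s.json", now.Format("20060102-150405"))
	if err := os.WriteFile(name, backup, 0o644); err != nil {
		return nil, err
	}

	// Hypothetical helper: resolves the old events into one create event per item.
	compacted := resolveToCreateEvents(old)
	return append(compacted, recent...), nil
}
```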
Client requirements:
- Must be able to apply patches and fetch objects.
- Must store:
- last_seq: sequence number of the last processed event
- last_hash: hash of the last processed event
- events: local event log of all processed events
- pending_events: locally generated events not yet sent to the server
- On startup, fetch new events from the server since last_seq and apply them.
- When modifying objects, generate events and append to pending_events.
- Periodically or opportunistically send pending_events to the server.
- Persist the event log (events and pending_events) locally.
- If the server merges the event log, the client detects divergence by comparing last_seq and last_hash.
- If sequence matches but hash differs, the server sends the full event log; the client reconstructs its state from this log.
If the server merges the event log and the client has unsent local events:
- Client fetches the merged events from the server.
- Applies merged events to local state.
- Reapplies unsent local events on top of the updated state.
- Resends these events to the server.
On the client side we have to be able to apply patches and fetch objects
The client is to keep a sequence number and hash of the last event it has processed
When starting up ask the server for any new events since its last sequence number
Get any new events and apply them to the local state
When modifying objects generate events and append them to our local event log
Periodically or when possible try to send those events to the server
This means we have to keep the event log saved locally
When the event log is merged on the server, our local log will diverge
We will only know this by comparing the client hash and seq with the server hash and seq
For example the client may have seq 127 and hash "abcd123" while the server, after merging, has seq 127 and hash "efgh456"
Since on the server the seq 127 will have no previous event (merged - deleted)
While on the client it will have some event
At that point the server is to send the whole event log again and the client is to reconstruct it again
If a client sends events after the event log has been merged:
- The server accepts and applies these events as usual, regardless of the client's log state.
Merging the event log must not alter the resulting data state.
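As a small sketch, the client-side bookkeeping and the divergence check described above could look like this (field names mirror the list above; the persistence format is up to the client):

```go
// Local client state, persisted between runs.
type ClientStore struct {
	LastSeq       int     `json:"last_seq"`
	LastHash      string  `json:"last_hash"`
	Events        []Event `json:"events"`         // all processed events
	PendingEvents []Event `json:"pending_events"` // generated locally, not yet sent
}

// diverged reports whether the server has merged its log out from under us:
// e.g. client seq 127 / hash "abcd123" vs server seq 127 / hash "efgh456".
// In that case the client asks for the full log, rebuilds its state, replays
// PendingEvents on top and re-sends them.
func (s *ClientStore) diverged(serverSeq int, serverHash string) bool {
	return s.LastSeq == serverSeq && s.LastHash != serverHash
}
```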
Required endpoints:
GET /api/<collection>/sync?last_seq=<last_seq>&last_hash=<last_hash>
- Returns all events after the specified last_seq and last_hash.
- If the provided seq and hash do not match the server's, returns the entire event log (client is out of sync).
PATCH /api/<collection>/events
- Accepts a JSON array of RFC6902 patch objects.
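A hedged sketch of calling these two endpoints from Go; the base URL and collection name are placeholders, and the patch body reuses the earlier example:

```go
import (
	"fmt"
	"net/http"
	"strings"
)

const base = "https://example.local" // placeholder server address

// fetchSince calls the sync endpoint with the client's last seq and hash.
func fetchSince(collection string, lastSeq int, lastHash string) (*http.Response, error) {
	url := fmt.Sprintf("%s/api/%s/sync?last_seq=%d&last_hash=%s", base, collection, lastSeq, lastHash)
	return http.Get(url)
}

// sendPatches submits a JSON array of RFC6902 patch objects for a collection.
func sendPatches(collection, patches string) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodPatch, base+"/api/"+collection+"/events",
		strings.NewReader(patches)) // e.g. `[{"op":"replace","path":"/age","value":3}]`
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return http.DefaultClient.Do(req)
}
```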
Server processing:
- As new events arrive, process the event log and update the cached state for the collection.
- The current state is available for clients that do not wish to process the event log.
- Only new events need to be applied to the current state; no need to reprocess the entire log each time.
- Track the last event processed for each collection (sequence number and hash).
On startup, the server must:
- Automatically create required collections: one for events and one for items (data state).
- Events must be collection-agnostic and support any collection; at least one example collection is created at startup.
- Ensure required columns exist in collections; if missing, reject PATCH requests with an error.
- Each collection maintains its own sequence number, hash, and event log.
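The per-collection bookkeeping implied by the last point could be as simple as the following sketch (names are illustrative):

```go
// One entry per collection, set up at startup alongside the events and items collections.
type collectionState struct {
	LastSeq  int
	LastHash string
	Events   []Event // this collection's event log
}

var collections = map[string]*collectionState{}
```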
IF the server merged the event log and our client has events that have not yet been sent
Then get the new events from server, apply them, and apply our local events on top of those
And try to send them to server again
On the server side if a client sends us events after we merged the event log
We may simply naturally apply them even if the client was not operating on the merged event log
At the end of the day merging the event log should make no changes to the data
---
Actually for pocketbase we might want to generalize this
Maybe create a "Collection" field as well and allow the events to manipulate any table...
That way our Data isn't tied to a table...
## RFC6902
Wait actually we can use RFC6902
"JSON patch standard"
It defines a way to apply patches to JSON documents...
Exactly what we need
https://datatracker.ietf.org/doc/html/rfc6902
Some highlights:
Operation objects MUST have exactly one "op" member, whose value
indicates the operation to perform. Its value MUST be one of "add",
"remove", "replace", "move", "copy", or "test"; other values are
@@ -228,8 +212,3 @@ Some highlights:
## test
I think we don't care about this one
For this, use the PATCH HTTP method
And simply submit patches one by one
"Patch" here is synonymous with our "event"
Well nearly, a patch is part of an event
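Since Data carries a JSON array of RFC6902 patches, applying an event to a cached row can lean on an existing implementation; the sketch below uses the evanphx/json-patch Go library, which is an implementation choice rather than something this spec mandates.

```go
import (
	"fmt"

	jsonpatch "github.com/evanphx/json-patch"
)

// applyEvent applies the RFC6902 patches in e.Data to the JSON-encoded row and
// returns the patched document, e.g. {"age":2} -> {"age":3} for a replace on /age.
func applyEvent(row []byte, e Event) ([]byte, error) {
	patch, err := jsonpatch.DecodePatch([]byte(e.Data))
	if err != nil {
		return nil, fmt.Errorf("decode patch: %w", err)
	}
	return patch.Apply(row)
}
```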