Compare commits: master...ff0d0b3bd3 (1 commit: 2a01aa8d96)

Spec.md
# Event Log Based Store
All data rows are reconstructed exclusively from an event log. All interactions with data rows must occur via events in the log. For performance, data rows are also cached for quick lookup.
Events are defined as:
```go
type Event struct {
	Seq        int       `json:"seq"`        // Server-generated sequence number (applied order)
	Hash       string    `json:"hash"`       // Server-generated hash, guarantees the event was processed
	ItemID     string    `json:"item_id"`    // Client-defined item identifier
	EventID    string    `json:"event_id"`   // Server-generated event identifier (uuid-v4)
	Collection string    `json:"collection"` // Client-defined collection/table name
	Data       string    `json:"data"`       // JSON array of RFC6902 patches
	Timestamp  time.Time `json:"timestamp"`  // Server-generated timestamp (when processed)
}
```
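For concreteness, a sketch of a client-constructed event (the item and collection values are hypothetical): the server-generated fields are left zero-valued and Data carries the serialized patch array.

```go
// Hypothetical example values; Seq, Hash, EventID, and Timestamp stay
// zero because the server fills them in during processing.
var example = Event{
	ItemID:     "item-42",
	Collection: "tasks",
	Data:       `[{"op":"replace","path":"/age","value":3}]`, // RFC6902 patch array
}
```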
When creating an event, only Data, Collection, and ItemID are required from the client; Hash, EventID, Seq, and Timestamp are computed server-side.

Deletion never physically removes an object: a delete sets the object's DeletedAt field to the current timestamp. There is no special concept of a "deleted" item, only an item with a field set to a value, which fetches then filter against (the field just happens to be named "DeletedAt"). Events that modify a deleted item are processed as usual.
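Under this model, a delete is just another patch. A hypothetical sketch (the JSON field name and timestamp value are assumptions; the spec only names the Go field DeletedAt):

```go
// Hypothetical: a "delete" expressed as an RFC6902 patch that sets the
// item's deleted_at field; fetches then filter on this field.
var deleteExample = Event{
	ItemID:     "item-42",
	Collection: "tasks",
	Data:       `[{"op":"add","path":"/deleted_at","value":"2024-01-02T15:04:05Z"}]`,
}
```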
Server-side processing of an incoming event (a sketch follows this list):

- Retrieve the latest event for the collection.
- Assign the next sequence number (incremented from the latest).
- Generate a new EventID (uuid-v4).
- Assign the current timestamp.
- Compute the event hash from the serialized current event plus the previous event's hash.
- Serialize the event for hashing manually (not via json.Marshal or %+v), stringing the fields together to guarantee field order.
- Only then apply the patch to the cached data row.
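A minimal sketch of these steps, assuming sha256 for the hash and the github.com/google/uuid package; the storage helpers (latestEvent, applyPatch, appendEvent) are hypothetical stand-ins:

```go
package store

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"

	"github.com/google/uuid"
)

// processEvent fills in the server-generated fields, chains the hash,
// and only then applies the patch to the cached data row.
func processEvent(e Event) (Event, error) {
	prev, err := latestEvent(e.Collection) // hypothetical storage helper
	if err != nil {
		return Event{}, err
	}
	e.Seq = prev.Seq + 1
	e.EventID = uuid.NewString()
	e.Timestamp = time.Now().UTC()
	e.Hash = hashEvent(e, prev.Hash)
	if err := applyPatch(e); err != nil { // hypothetical: update cached row
		return Event{}, err
	}
	return e, appendEvent(e) // hypothetical: persist to the event log
}

// hashEvent strings the fields together manually in a fixed order
// (no json.Marshal or %+v) and mixes in the previous event's hash.
func hashEvent(e Event, prevHash string) string {
	s := fmt.Sprintf("%d|%s|%s|%s|%s|%s|%s",
		e.Seq, e.ItemID, e.EventID, e.Collection, e.Data,
		e.Timestamp.Format(time.RFC3339Nano), prevHash)
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}
```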
Event log compaction:

- Every 2 days, merge and compact the event log for each collection.
- All events older than 2 days are resolved, and a new minimal event log is generated that produces the same state (in effect, only create-style events carrying the resolved data); events newer than 2 days are kept as-is.
- Sequence numbers (Seq) are never reset and always increment from the last value.
- Before merging or deleting old events, save the original event log as a timestamped backup file (for example, a plain text file).
- The log is expected to stay small, hopefully never more than a few hundred events.
Client requirements:

- Must be able to apply patches and fetch objects.
- Must store:
  - last_seq: sequence number of the last processed event
  - last_hash: hash of the last processed event
  - events: local event log of all processed events
  - pending_events: locally generated events not yet sent to the server
- On startup, fetch new events from the server since last_seq and apply them to local state.
- When modifying objects, generate events and append them to pending_events.
- Periodically or opportunistically send pending_events to the server.
- Persist the event log (events and pending_events) locally.
- When the server merges the event log, the local log diverges. The client can detect this only by comparing its last_seq and last_hash with the server's: for example, the client may have seq 127 and hash "abcd123" while the server, after merging, has seq 127 and hash "efgh456", because on the server the event at seq 127 no longer has the same previous event (it was merged away) while on the client it does.
- If the sequence matches but the hash differs, the server sends the full event log and the client reconstructs its state from that log.

If the server merges the event log while the client has unsent local events (sketched below):

- The client fetches the merged events from the server.
- Applies the merged events to local state.
- Reapplies its unsent local events on top of the updated state.
- Resends those events to the server.

If a client sends events after the event log has been merged, the server accepts and applies them as usual, regardless of which version of the log the client was operating on. At the end of the day, merging the event log must not alter the resulting data state.
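A sketch of the client's startup sync and divergence handling. The response shape (the full_log flag) and the helpers (fetchSync, resetState, apply, sendPending) are assumptions; the spec fixes only the endpoint and the compare-seq-and-hash rule.

```go
// Hypothetical client-side state; names are illustrative. The spec only
// requires that these four pieces be stored and persisted locally.
type Client struct {
	LastSeq       int
	LastHash      string
	Events        []Event // local log of all processed events
	PendingEvents []Event // locally generated, not yet sent
}

// syncOnStartup pulls new events since last_seq/last_hash, rebuilds from
// the full log when the server reports divergence, then reapplies and
// resends pending local events.
func syncOnStartup(c *Client) error {
	resp, err := fetchSync(c.LastSeq, c.LastHash) // GET /api/<collection>/sync
	if err != nil {
		return err
	}
	if resp.FullLog { // assumed flag: seq matched but hash differed
		resetState(c) // drop local events and cached rows
	}
	for _, e := range resp.Events {
		apply(c, e) // apply patch, advance LastSeq/LastHash, append to Events
	}
	for _, e := range c.PendingEvents {
		apply(c, e) // reapply unsent local events on top of merged state
	}
	return sendPending(c) // resend PendingEvents to the server
}
```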
Required endpoints (example requests are sketched below):

GET /api/<collection>/sync?last_seq=<last_seq>&last_hash=<last_hash>

- Returns all events after the specified last_seq and last_hash.
- If the provided seq and hash do not match the server's, returns the entire event log (the client is out of sync).

PATCH /api/<collection>/events

- Accepts a JSON array of RFC6902 patch objects.

Server processing:

- As new events arrive, process the event log and update the cached state for the collection.
- The current state is available for clients that do not wish to process the event log.
- Only new events need to be applied to the current state; there is no need to reprocess the entire log each time.
- Track the last event processed for each collection (sequence number and hash).

On startup, the server must:

- Automatically create the required collections: one for events and one for items (data state).
- Keep events collection-agnostic so they can target any collection; at least one example collection is created at startup.
- Ensure required columns exist in collections; if a column is missing, reject PATCH requests with an error.
- Each collection maintains its own sequence number, hash, and event log.
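For illustration, exercising both endpoints with net/http. The base URL and collection name are hypothetical, and the exact PATCH request body is underspecified; it is shown here as the bare patch array per the endpoint description.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	base := "http://localhost:8090" // hypothetical server address

	// Fetch events after our last known position (values from the spec's example).
	resp, err := http.Get(base + "/api/tasks/sync?last_seq=127&last_hash=abcd123")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // events after seq 127, or the full log if out of sync

	// Submit an RFC6902 patch array for one item.
	patch := `[{"op":"replace","path":"/age","value":3}]`
	req, err := http.NewRequest(http.MethodPatch, base+"/api/tasks/events",
		strings.NewReader(patch))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
}
```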
---

Actually, for PocketBase we might want to generalize this: maybe create a "Collection" field as well and allow the events to manipulate any table... That way our Data isn't tied to a table...

## RFC6902

Wait, actually we can use RFC6902, the "JSON Patch" standard. It defines a way to apply patches to JSON documents... exactly what we need.

https://datatracker.ietf.org/doc/html/rfc6902

Some highlights:

> Operation objects MUST have exactly one "op" member, whose value
> indicates the operation to perform. Its value MUST be one of "add",
> "remove", "replace", "move", "copy", or "test"; other values are
> errors.

## test

I think we don't care about this one.

For this, then, use the PATCH HTTP method and simply submit patches one by one. "Patch" here is nearly synonymous with our "event"; more precisely, a patch is part of an event.
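Since both the client and the server need to apply patches, here is a sketch using github.com/evanphx/json-patch, one existing Go implementation of RFC6902 (the document and patch values are illustrative):

```go
package main

import (
	"fmt"

	jsonpatch "github.com/evanphx/json-patch"
)

func main() {
	doc := []byte(`{"name":"bob","age":2}`)
	patch, err := jsonpatch.DecodePatch(
		[]byte(`[{"op":"replace","path":"/age","value":3}]`))
	if err != nil {
		panic(err)
	}
	patched, err := patch.Apply(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patched)) // {"name":"bob","age":3} (key order may vary)
}
```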