ETL
The ETL framework is most commonly used in staged sync.
It implements a pattern in which we extract data from a database, transform it, write it to temporary files, and then insert it back into the database in sorted order.
Inserting entries into our KV storage sorted by key helps minimize write amplification, so it is much faster, even accounting for the additional I/O generated by writing the files.
It behaves similarly to enterprise Extract, Transform, Load frameworks, hence the name. We use temporary files because they keep RAM usage predictable and allow ETL to be used on large amounts of data.
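The pattern described above can be illustrated with a minimal, self-contained sketch. It is not the `etl` package's implementation: the names `entry`, `spill`, `readField`, `transform`, and `runSketch` are hypothetical, an in-memory map stands in for the source database, and a single temporary file is assumed to be enough. A larger data set would need several sorted files merged at load time, which this sketch omits.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"os"
	"sort"
)

// entry is one key/value pair moving through the pipeline (hypothetical type).
type entry struct{ k, v []byte }

// spill sorts entries by key and writes them to a temp file in dir as
// length-prefixed records. It returns the path of the file.
func spill(dir string, entries []entry) (string, error) {
	sort.Slice(entries, func(i, j int) bool {
		return bytes.Compare(entries[i].k, entries[j].k) < 0
	})
	f, err := os.CreateTemp(dir, "etl-sketch-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	for _, e := range entries {
		for _, b := range [][]byte{e.k, e.v} {
			if err := binary.Write(f, binary.BigEndian, uint32(len(b))); err != nil {
				return "", err
			}
			if _, err := f.Write(b); err != nil {
				return "", err
			}
		}
	}
	return f.Name(), nil
}

// readField reads one length-prefixed byte slice written by spill.
func readField(r io.Reader) ([]byte, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, err
	}
	b := make([]byte, n)
	_, err := io.ReadFull(r, b)
	return b, err
}

// transform extracts every pair from src, rewrites its key with tr, spills the
// sorted result to a temp file, then hands the pairs to load in key order.
func transform(src map[string]string, dir string, tr func([]byte) []byte, load func(k, v []byte) error) error {
	entries := make([]entry, 0, len(src))
	for k, v := range src {
		entries = append(entries, entry{tr([]byte(k)), []byte(v)})
	}
	path, err := spill(dir, entries)
	if err != nil {
		return err
	}
	defer os.Remove(path)
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	for {
		k, err := readField(f)
		if err == io.EOF {
			return nil // clean end of file: every record was loaded
		}
		if err != nil {
			return err
		}
		v, err := readField(f)
		if err != nil {
			return err
		}
		if err := load(k, v); err != nil {
			return err
		}
	}
}

// runSketch reverses each key and records the order in which pairs are loaded.
func runSketch() ([]string, error) {
	src := map[string]string{"ab": "1", "ba": "2", "ca": "3"}
	reverse := func(k []byte) []byte {
		out := make([]byte, len(k))
		for i := range k {
			out[len(k)-1-i] = k[i]
		}
		return out
	}
	var loaded []string
	err := transform(src, os.TempDir(), reverse, func(k, v []byte) error {
		loaded = append(loaded, string(k)+"="+string(v))
		return nil
	})
	return loaded, err
}

func main() {
	fmt.Println(runSketch())
}
```

The load callback sees the pairs ordered by their transformed keys (`ab`, `ac`, `ba`), regardless of the order in which the source was iterated. That ordering is what lets the final insert proceed sequentially through the destination.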
Example
    // keyTransformExtractFunc wraps a key transformation in an etl.ExtractFunc.
    func keyTransformExtractFunc(transformKey func([]byte) ([]byte, error)) etl.ExtractFunc {
        return func(k, v []byte, next etl.ExtractNextFunc) error {
            newK, err := transformKey(k)
            if err != nil {
                return err
            }
            // Pass on the original key, the transformed key, and the value.
            return next(k, newK, v)
        }
    }

    err := etl.Transform(
        db,                         // database
        dbutils.PlainStateBucket,   // "from" bucket
        dbutils.CurrentStateBucket, // "to" bucket
        datadir,                    // where to store temp files
        keyTransformExtractFunc(transformPlainStateKey), // transformFunc on extraction
        etl.IdentityLoadFunc,       // transform on load
        etl.TransformArgs{          // additional arguments
            Quit: quit,
        },
    )
    if err != nil {
        return err
    }
Data Transformation
The whole flow is shown in the image