ETL

The ETL framework is most commonly used in staged sync.

It implements a pattern in which we extract data from a database, transform it, write it to temporary files, and then insert it back into the database in sorted order.

Inserting entries into our KV storage in key order helps minimize write amplification, so it is much faster, even accounting for the additional I/O generated by writing and reading the temporary files.

It behaves similarly to enterprise Extract, Transform, Load frameworks, hence the name. We use temporary files because they keep RAM usage predictable and allow ETL to be used on large amounts of data.
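The pattern described above can be illustrated with a small standalone sketch. It is not the `etl` package's implementation: the names `entry`, `flushSorted`, `splitLine`, and `sortedLoad` are hypothetical, the file format (one `key<TAB>value` line per entry) is an assumption chosen for brevity, and the merge uses a simple linear scan where a real implementation would likely use a heap. It only shows the idea of bounded in-memory buffers spilled to sorted temp files and then merged in key order.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// entry is one key-value pair emitted by the extract step (hypothetical type).
type entry struct{ k, v string }

// flushSorted sorts buf by key and writes it as "key\tvalue" lines to a new
// temp file. Assumes keys and values contain no tabs or newlines.
func flushSorted(dir string, buf []entry) (*os.File, error) {
	sort.SliceStable(buf, func(i, j int) bool { return buf[i].k < buf[j].k })
	f, err := os.CreateTemp(dir, "etl-sketch-*")
	if err != nil {
		return nil, err
	}
	w := bufio.NewWriter(f)
	for _, e := range buf {
		fmt.Fprintf(w, "%s\t%s\n", e.k, e.v)
	}
	err = w.Flush()
	if err == nil {
		_, err = f.Seek(0, 0) // rewind so the same handle can be read back
	}
	if err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, err
	}
	return f, nil
}

// splitLine splits a "key\tvalue" line back into its two parts.
func splitLine(line string) (k, v string) {
	p := strings.SplitN(line, "\t", 2)
	return p[0], p[1]
}

// sortedLoad keeps at most bufSize entries in memory at a time, spills each
// buffer to a sorted temp file, then merges the files and calls load in
// ascending key order.
func sortedLoad(dir string, in []entry, bufSize int, load func(k, v string) error) error {
	if bufSize < 1 {
		bufSize = 1
	}
	var scanners []*bufio.Scanner
	for start := 0; start < len(in); start += bufSize {
		end := start + bufSize
		if end > len(in) {
			end = len(in)
		}
		f, err := flushSorted(dir, append([]entry(nil), in[start:end]...))
		if err != nil {
			return err
		}
		defer os.Remove(f.Name()) // runs after the Close below (defers are LIFO)
		defer f.Close()
		if s := bufio.NewScanner(f); s.Scan() {
			scanners = append(scanners, s)
		}
	}
	for len(scanners) > 0 {
		// Pick the file whose current line has the smallest key.
		best := 0
		for i := range scanners {
			ki, _ := splitLine(scanners[i].Text())
			kb, _ := splitLine(scanners[best].Text())
			if ki < kb {
				best = i
			}
		}
		k, v := splitLine(scanners[best].Text())
		if err := load(k, v); err != nil {
			return err
		}
		if !scanners[best].Scan() { // file exhausted (or read error)
			if err := scanners[best].Err(); err != nil {
				return err
			}
			scanners = append(scanners[:best], scanners[best+1:]...)
		}
	}
	return nil
}

func main() {
	in := []entry{{"c", "3"}, {"a", "1"}, {"d", "4"}, {"b", "2"}, {"e", "5"}}
	// An empty dir makes os.CreateTemp use the default temp directory.
	err := sortedLoad("", in, 2, func(k, v string) error {
		fmt.Println(k, v) // stand-in for an insert into the KV store
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

With a buffer size of 2, the five unsorted entries are spilled to three temp files, and `load` receives them in ascending key order. The buffer size bounds memory use regardless of how much data passes through, which is the trade-off the paragraph above describes.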

Example

func keyTransformExtractFunc(transformKey func([]byte) ([]byte, error)) etl.ExtractFunc {
	return func(k, v []byte, next etl.ExtractNextFunc) error {
		newK, err := transformKey(k)
		if err != nil {
			return err
		}
		return next(k, newK, v)
	}
}

err := etl.Transform(
	db,                                              // database
	dbutils.PlainStateBucket,                        // "from" bucket
	dbutils.CurrentStateBucket,                      // "to" bucket
	datadir,                                         // where to store temp files
	keyTransformExtractFunc(transformPlainStateKey), // transformFunc on extraction
	etl.IdentityLoadFunc,                            // transform on load
	etl.TransformArgs{                               // additional arguments
		Quit: quit,
	},
)
if err != nil {
	return err
}

Data Transformation

The whole flow is shown in the image