Interface SampleEstimatorFactory
-
public interface SampleEstimatorFactory
-
Nested Class Summary
Nested Classes
Modifier and Type    Interface                                Description
static class         SampleEstimatorFactory.EstimationType
-
Field Summary
Fields
Modifier and Type                        Field    Description
static org.apache.commons.logging.Log    LOG
-
Method Summary
Static Methods
Modifier and Type    Method and Description
static int           distinctCount(int[] frequencies, int nRows, int sampleSize, SampleEstimatorFactory.EstimationType type)
                     Estimates the number of distinct values based on frequencies.
static int           distinctCount(int[] frequencies, int nRows, int sampleSize, SampleEstimatorFactory.EstimationType type, HashMap<Integer,Double> solveCache)
                     Estimates the number of distinct values based on frequencies.
-
Method Detail
-
distinctCount
static int distinctCount(int[] frequencies, int nRows, int sampleSize, SampleEstimatorFactory.EstimationType type)
Estimates the number of distinct values based on frequencies.
Parameters:
frequencies - The frequencies of the unique values. NOTE: every value should be greater than zero.
nRows - The total number of rows to consider. NOTE: should always be greater than or equal to sum(frequencies).
sampleSize - The size of the sample. NOTE: this should ideally be scaled to match sum(frequencies) and should always be less than or equal to nRows.
type - The type of estimator to use.
Returns:
An estimated number of unique values.
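The parameter notes above amount to three checks on the arguments. The following is a minimal sketch of how a caller might build the frequencies array from a sample and verify those constraints before calling distinctCount. The class FrequencyInputs and its methods frequenciesOf and check are hypothetical helpers introduced for illustration; they are not part of SampleEstimatorFactory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of SampleEstimatorFactory.
final class FrequencyInputs {

    /** Counts how often each distinct value occurs in the sample, so every count is at least 1. */
    static int[] frequenciesOf(int[] sample) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : sample)
            counts.merge(v, 1, Integer::sum);
        int[] frequencies = new int[counts.size()];
        int i = 0;
        for (int c : counts.values())
            frequencies[i++] = c;
        return frequencies;
    }

    /** Throws if any of the documented constraints on the arguments is violated. */
    static void check(int[] frequencies, int nRows, int sampleSize) {
        long sum = 0;
        for (int f : frequencies) {
            if (f <= 0)
                throw new IllegalArgumentException("every frequency must be greater than zero");
            sum += f;
        }
        if (nRows < sum)
            throw new IllegalArgumentException("nRows must be >= sum(frequencies)");
        if (sampleSize > nRows)
            throw new IllegalArgumentException("sampleSize must be <= nRows");
    }

    public static void main(String[] args) {
        int[] sample = {7, 7, 3, 9, 3, 7};
        int[] frequencies = frequenciesOf(sample);
        // Assumed scenario: the sample was drawn from a column with 100 rows.
        check(frequencies, 100, sample.length);
        System.out.println(frequencies.length + " distinct values seen in the sample");
    }
}
```

Here sampleSize equals sum(frequencies) because the frequencies come directly from the sample, which matches the "ideally scaled" note on the sampleSize parameter.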
-
distinctCount
static int distinctCount(int[] frequencies, int nRows, int sampleSize, SampleEstimatorFactory.EstimationType type, HashMap<Integer,Double> solveCache)
Estimates the number of distinct values based on frequencies.
Parameters:
frequencies - The frequencies of the unique values. NOTE: every value should be greater than zero.
nRows - The total number of rows to consider. NOTE: should always be greater than or equal to sum(frequencies).
sampleSize - The size of the sample. NOTE: this should ideally be scaled to match sum(frequencies) and should always be less than or equal to nRows.
type - The type of estimator to use.
solveCache - A solve cache used to avoid repeated calculations.
Returns:
An estimated number of unique values.
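This page does not say which intermediate results the solve cache holds. As a rough illustration of the general idea only, the sketch below memoizes a costly function in a HashMap<Integer,Double>, so that a repeated key is looked up rather than recomputed. The class SolveCacheSketch, the method cachedSolve, and the stand-in computation expensiveSolve are all assumptions made for this example and do not reflect the library's internals.

```java
import java.util.HashMap;

// Hypothetical illustration of a HashMap-based solve cache; not the library's implementation.
final class SolveCacheSketch {

    // Counts how many times the costly computation actually runs.
    static int solveCalls = 0;

    /** Stand-in for an expensive numeric solve; the real computation is not documented here. */
    static double expensiveSolve(int key) {
        solveCalls++;
        return Math.sqrt(key);
    }

    /** Returns the cached result for key, computing and storing it only on the first request. */
    static double cachedSolve(int key, HashMap<Integer, Double> solveCache) {
        return solveCache.computeIfAbsent(key, SolveCacheSketch::expensiveSolve);
    }

    public static void main(String[] args) {
        HashMap<Integer, Double> solveCache = new HashMap<>();
        cachedSolve(16, solveCache); // computed and stored
        cachedSolve(16, solveCache); // served from the cache
        System.out.println("solve invoked " + solveCalls + " time(s)");
    }
}
```

Sharing one map across several estimation calls is what lets later calls reuse earlier results; a fresh map per call would give no benefit.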
-