pyspark.sql.functions.theta_sketch_estimate

pyspark.sql.functions.theta_sketch_estimate(col)

Returns the estimated number of unique values given the binary representation of a DataSketches ThetaSketch.

New in version 4.1.0.

Parameters
col : Column or column name
    The binary representation of a ThetaSketch, for example the output of theta_sketch_agg.
Returns
Column

The estimated number of unique values for the ThetaSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([1, 2, 2, 3], "INT")
>>> df.agg(sf.theta_sketch_estimate(sf.theta_sketch_agg("value"))).show()
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 12))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
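The example above returns exactly 3 rather than an approximation. The following is a simplified, pure-Python illustration of the general idea behind a theta-style estimate, not Spark's or DataSketches' actual implementation. The helper names (`hash_to_unit`, `build_sketch`, `estimate`) and the use of SHA-256 are assumptions made for illustration only. The capacity `k = 4096` assumes that the `12` shown in the column name above is a base-2 logarithm of the nominal number of retained entries.

```python
import hashlib


def hash_to_unit(value):
    """Map a value to a stable pseudo-random float in [0, 1). Hypothetical helper."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_sketch(values, k):
    """Return (theta, retained), where every retained hash is below theta."""
    hashes = sorted({hash_to_unit(v) for v in values})
    if len(hashes) <= k:
        # Under capacity: every distinct hash is kept and theta stays at 1.0,
        # so the estimate equals the exact distinct count.
        return 1.0, hashes
    # Over capacity: theta drops to the (k+1)-th smallest hash and only the
    # k hashes below it are retained.
    return hashes[k], hashes[:k]


def estimate(theta, retained):
    """Estimated distinct count: retained entries scaled by the sampled fraction."""
    return len(retained) / theta


# Few distinct values: the sketch is effectively exact.
small = estimate(*build_sketch([1, 2, 2, 3], k=4096))

# Many distinct values: the result is an approximation of 100000.
large = estimate(*build_sketch(range(100_000), k=4096))
```

In this illustration `small` is exactly `3.0`, because the three distinct values fit within the capacity, while `large` is only close to 100000. A larger capacity tightens the approximation at the cost of a larger sketch.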