Class AdaptiveHistogram

  • All Implemented Interfaces:
    Serializable

    public class AdaptiveHistogram
    extends Object
    implements Serializable
    This class implements a histogram that adapts to an unknown data distribution. It keeps a roughly constant resolution throughout the data range by increasing the resolution where the data is denser. For example, if most of the values lie in the 0-5 range and only a few lie in the 5-10 range, the histogram adapts by assigning more counting buckets to the 0-5 range and fewer to the 5-10 range. This implementation provides a method to obtain the cumulative distribution function for a given data point, and a method to obtain the data point that splits the data set at a given percentile.
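    The adaptive bucketing described above can be illustrated with a much simplified sketch. It is not the implementation of this class: the class name SplittingHistogramSketch, the fixed initial range, the midpoint split and the even division of counts between the two halves are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified illustration; NOT the AdaptiveHistogram implementation. */
final class SplittingHistogramSketch {
    /** One bucket covering [lo, hi) with an approximate count. */
    static final class Bucket {
        final double lo;
        final double hi;
        long count;

        Bucket(double lo, double hi, long count) {
            this.lo = lo;
            this.hi = hi;
            this.count = count;
        }
    }

    private final List<Bucket> buckets = new ArrayList<>();
    private final long limit; // assumed per-bucket limit that triggers a split

    /** Assumption: the data range [min, max] is known up front (the real class does not need this). */
    SplittingHistogramSketch(double min, double max, long limit) {
        this.limit = limit;
        buckets.add(new Bucket(min, max, 0));
    }

    /** Counts the value in its bucket and splits that bucket once it exceeds the limit. */
    void addValue(double value) {
        for (int i = 0; i < buckets.size(); i++) {
            Bucket b = buckets.get(i);
            boolean last = i == buckets.size() - 1;
            // The last bucket also accepts its upper bound so that max itself is counted.
            if (value >= b.lo && (value < b.hi || (last && value <= b.hi))) {
                b.count++;
                if (b.count > limit) {
                    split(i);
                }
                return;
            }
        }
        throw new IllegalArgumentException("value outside the assumed range: " + value);
    }

    /** Splits a bucket at its midpoint, dividing its count evenly (an approximation). */
    private void split(int i) {
        Bucket b = buckets.get(i);
        double mid = (b.lo + b.hi) / 2;
        long left = b.count / 2;
        buckets.set(i, new Bucket(b.lo, mid, left));
        buckets.add(i + 1, new Bucket(mid, b.hi, b.count - left));
    }

    List<Bucket> buckets() {
        return buckets;
    }
}
```

    Feeding this sketch many values in the 0-5 range and only a few in the 5-10 range leaves it with more, narrower buckets below 5 than above, which is the behavior the class description refers to. Because counts are divided evenly on a split, per-bucket counts are approximate while the total is preserved.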
    Author:
    Jorge Handl
    See Also:
    Serialized Form
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      protected static interface  AdaptiveHistogram.ValueConversion
      Auxiliary interface for inline functor object.
    • Constructor Summary

      Constructors 
      Constructor Description
      AdaptiveHistogram()
      Class constructor.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      void addValue​(double value)
      Adds a data point to the histogram.
      long getAccumCount​(double value)
      Returns the cumulative count (the unnormalized cumulative distribution function) for a given data point.
      long getCount​(double value)
      Returns the number of data points stored in the same bucket as a given value.
      protected int getCountPerNodeLimit()
      Used by the histogram's internal data structure to obtain the maximum number of data points that should be counted in one bucket.
      double getValueForPercentile​(int percentile)
      Returns the data point that splits the data set at a given percentile.
      void normalize​(double targetMin, double targetMax)
      Normalizes all the values to the desired range.
      void reset()
      Erases all data from the histogram.
      void show()
      Shows the histogram's underlying data structure.
      ArrayList<Cell> toTable()
      Returns a table representing the data in this histogram.
    • Constructor Detail

      • AdaptiveHistogram

        public AdaptiveHistogram()
        Class constructor.
    • Method Detail

      • reset

        public void reset()
        Erases all data from the histogram.
      • addValue

        public void addValue​(double value)
        Adds a data point to the histogram.
        Parameters:
        value - the data point to add.
      • getCount

        public long getCount​(double value)
        Returns the number of data points stored in the same bucket as a given value.
        Parameters:
        value - the reference data point.
        Returns:
        the number of data points stored in the same bucket as the reference point.
      • getAccumCount

        public long getAccumCount​(double value)
        Returns the cumulative count (the unnormalized cumulative distribution function) for a given data point.
        Parameters:
        value - the reference data point.
        Returns:
        the cumulative count (the unnormalized cumulative distribution function) for the reference point.
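        Since this method returns a long, its result reads most naturally as a count of data points rather than a probability. The sketch below computes that quantity exactly on raw data to show what the histogram approximates. CumulativeCountSketch and accumCount are hypothetical names, and the inclusive comparison (less than or equal to) is an assumption, since the boundary behavior is not documented here.

```java
/** Exact reference computation on raw data; hypothetical helper, not part of AdaptiveHistogram. */
final class CumulativeCountSketch {
    /** Counts the data points that are less than or equal to value (inclusive bound is an assumption). */
    static long accumCount(double[] data, double value) {
        long count = 0;
        for (double d : data) {
            if (d <= value) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 2.0, 3.5, 7.0};
        long below = accumCount(data, 2.0);              // 3 points are <= 2.0
        double fraction = (double) below / data.length;  // normalized value in [0, 1]
        System.out.println(below + " of " + data.length + " (" + fraction + ")");
    }
}
```

        Dividing the count by the total number of data points gives the normalized cumulative distribution value in the range 0 to 1.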
      • getValueForPercentile

        public double getValueForPercentile​(int percentile)
        Returns the data point that splits the data set at a given percentile.
        Parameters:
        percentile - the percentile at which the data set is split.
        Returns:
        the data point that splits the data set at the given percentile.
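        To make the percentile split concrete, the sketch below computes it exactly on raw data with the nearest-rank method. It is only a reference for what the histogram estimates: PercentileSketch is a hypothetical name, and both the 0 to 100 scale of the percentile argument and the nearest-rank definition are assumptions not confirmed by this page.

```java
import java.util.Arrays;

/** Exact nearest-rank percentile on raw data; hypothetical helper, not part of AdaptiveHistogram. */
final class PercentileSketch {
    /** Returns the smallest value with at least percentile% of the data at or below it. */
    static double valueForPercentile(double[] data, int percentile) {
        if (data.length == 0) {
            throw new IllegalArgumentException("no data");
        }
        if (percentile < 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in 0..100");
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percentile * n / 100, avoiding floating-point rounding.
        int rank = (int) (((long) percentile * sorted.length + 99) / 100);
        // Rank 0 (percentile 0) is mapped to the minimum.
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

        An exact computation like this needs every data point sorted in memory, whereas a histogram only keeps bucket counts, so its answer is an approximation whose precision depends on the bucket width around the requested percentile.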
      • getCountPerNodeLimit

        protected int getCountPerNodeLimit()
        Used by the histogram's internal data structure to obtain the maximum number of data points that should be counted in one bucket.
        Returns:
        the maximum number of data points to store in one bucket.
      • normalize

        public void normalize​(double targetMin,
                              double targetMax)
        Normalizes all the values to the desired range.
        Parameters:
        targetMin - the target new minimum value.
        targetMax - the target new maximum value.
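        One common reading of normalizing to a target range is linear min-max rescaling. The sketch below shows that mapping on a plain array. Whether this class uses exactly this mapping is an assumption, and NormalizeSketch is a hypothetical name.

```java
/** Linear min-max rescaling on raw data; hypothetical helper, not part of AdaptiveHistogram. */
final class NormalizeSketch {
    /** Maps the smallest value to targetMin, the largest to targetMax, and the rest linearly in between. */
    static double[] normalize(double[] data, double targetMin, double targetMax) {
        if (data.length == 0) {
            return new double[0];
        }
        double min = data[0];
        double max = data[0];
        for (double d : data) {
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        double range = max - min;
        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            // All-equal data has no spread; mapping it to targetMin is an arbitrary choice for this sketch.
            out[i] = range == 0
                    ? targetMin
                    : targetMin + (data[i] - min) / range * (targetMax - targetMin);
        }
        return out;
    }
}
```

        A linear mapping preserves the order of the values and their relative spacing, so bucket counts and percentile ranks are unchanged and only the bucket boundaries move.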
      • show

        public void show()
        Shows the histogram's underlying data structure.
      • toTable

        public ArrayList<Cell> toTable()
        Returns a table representing the data in this histogram. Each element is a table cell containing the range limits and the count for that range.