Google Summer of Code 2020

14 minute read

Published:

Bringing Boost.Real to review-ready state

boost_logo.png

Boost.Real Documentation

Project Details

Introduction

Boost.Real is a C++17 library which aims at developing numerical data-type for real number representation. The library provides the flexibility of performing arbitrary-precision arithmetic. The foundational work was done by Laouen belloli in GSoC’18, after that, several improvements and new additions were made in GSoC’19 by Sagnik Dey and Kimberly Swanson. For GSoC’20, I and Vikram Chundawat were selected to finish the details and add few functionalities and hence make the library review-ready. Since Vikram and mine’s proposal had several overlaps, we distributed the work among ourselves. After the distribution, I got following tasks:

  • Changing internal base of Boost.Real numbers to a more efficient value.
  • Implementation of optimized long division algorithm.
  • Implementation of division operator.
  • Implementation of integral power operation.
  • Implementation of method to evaluate Pi digits.
  • Implementaion of % operator.
  • Implementation of Karatsuba Multiplication algorithm.
  • Implementation of tests and documentation.

Before the start of official coding period, we decided on not changing the base of the internal representation as changing it would make few operations more expensive resulting in poor performance.

Phase I

Milestones:

  • Implementation of long division algorithm.
  • Implementation of division operator.
  • Implementation of integral power operation.
  • Adding tests and documentation.

Long division algorithm

Long division was already implemented in the library for base 10, since it was required for the purpose of base change. I had to implement it for any general base as I was going to need it while implementing % operator.
I used Knuth’s long division algorithm as it is quite efficient and easy to implement as well. The Knuth’s algorithm optimizes the part where we search for quotient digit (can be anything between 0 to base - 1) in the traditional algorithm. In the traditional algorithm, the search is performed linearly from 0 to Base - 1 making the complexity O(base) (a naive optimization would be to use binary search) whereas the Knuth’s algorithm does this in constant time, which provides significant optimization as our internal base increases (our internal base can go upto 1018).
Initial Commit regarding Knuth’s long division: https://github.com/BoostGSoC20/Real/commit/25f1a06a7ea91bed2413a6417f842ed10dee7c3a
Refrence to Knuth’s Algorithm: algorithm D of section 4.3.2 of volume 2 of The Art of Computer Programming by D. E. Knuth.

Division Operator

This was the most important task in bringing the library to review-ready state. I used Newton-Raphson method for the division algorithm, it involves evaluating reciprocal of the divisor and then multiplying it with the dividend to get the result. The method is quite efficient because of the quadratic convergence of the Newton-Raphson algorithm. The initial guess that I used for Newton-Raphson algorithm involved division therefore I implemented an approximate division algorithm to evaluate the initial guess. Later on the initial guess can replaced with something that doesn’t involve division.
Intial commit regarding division operator: https://github.com/BoostGSoC20/Real/commit/599890e5fdf9290bb1197d7b054d0517a18f58a3

Integral Power Operation

Integral Power was implemented as new operation between two reals. The implementation was based on const_precision_iterators which makes it more general as were the other operations already present. Binary exponentiation was used to evaluate the powers, making the complexity O(C * log(exponent)) where C is the time taken to multiply two reals.
Following is overview of how power operation has been implemented (few lines of code has been skipped) :

case OPERATION::INTEGER_POWER: {
	ro.get_rhs_itr().iterate_n_times(ro.get_rhs_itr().maximum_precision());

	if (ro.get_rhs_itr().get_interval().lower_bound != ro.get_rhs_itr().get_interval().upper_bound ||
	(int) ro.get_rhs_itr().get_interval().lower_bound.digits.size() > ro.get_rhs_itr().get_interval().lower_bound.exponent) {
		throw non_integral_exponent_exception();
	}

	if (ro.get_rhs_itr().get_interval().upper_bound.positive == false) {
		throw negative_integers_not_supported();
	}

	if (ro.get_lhs_itr().get_interval().positive()) {
		this->_approximation_interval.upper_bound = 
			tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().upper_bound, exponent);
		this->_approximation_interval.lower_bound =
			tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().lower_bound, exponent);
	} else if (ro.get_lhs_itr().get_interval().negative()) {
		if (exponent_is_even) {
			this->_approximation_interval.upper_bound =
			    tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().lower_bound, exponent);
			this->_approximation_interval.lower_bound =
			    tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().upper_bound, exponent);
		} else {
			this->_approximation_interval.upper_bound =
			    tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().upper_bound, exponent);
			this->_approximation_interval.lower_bound =
			    tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().lower_bound, exponent);
		}
	} else {
		if (exponent_is_even) {
			if (ro.get_lhs_itr().get_interval().upper_bound.abs() > ro.get_lhs_itr().get_interval().lower_bound.abs()) {
				this->_approximation_interval.upper_bound =
					tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().upper_bound, exponent);
				this->_approximation_interval.lower_bound = zero;
			} else {
				this->_approximation_interval.upper_bound =
					tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().lower_bound, exponent);
				this->_approximation_interval.lower_bound = zero;
			}
		} else {
			this->_approximation_interval.upper_bound =
				tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().upper_bound, exponent);
			this->_approximation_interval.lower_bound =
				tmp.binary_exponentiation(ro.get_lhs_itr().get_interval().lower_bound, exponent);
		}
	}

	break;
} 

Intial commit regarding integral power operation: https://github.com/BoostGSoC20/Real/commit/84f36e9e2451f15d1ebf42178944e91eb34b7147

Phase II

Milestones:

  • Implementing method to evaluate digits of Pi.
  • Implementing % operator.
  • Implementing calculation of few digits of real_algorithm number during compile time

Pi

Pi digits were calculated using Chudnovsky algorithm. The implementation is not the most efficient one therefore after few digits the algorithm runs slow. Chudnovsky algorithm is an iterative algorithm which is being used to calculate the nth digit of the real_algorithm number. There are redundant calculations being performed as for calculating (n + 1)th digit, the algorithm recalculates the previously calculated values as this time it needs more precise values than previous. There is an algorithm which calculates nth digit without calculating previous digits, but the algorithm gives output in hexadecimal base, which doesn’t fulfill our requirement.
Following is overview of how Chudnovsky algorithm has been implemented (few lines of code has been skipped) :

template <typename T = int>
T pi_nth_digit(unsigned int n) {
	// Chudnovsky Algorithm
	// pi = C * ( sum_from_k=0_to_k=x (Mk * Lk / Xk) )^(-1) 
	// increasing x you get more precise pi

	static const boost::real::real_explicit<T> real_k("6");
	static const boost::real::real_explicit<T> real_m("1");
	static const boost::real::real_explicit<T> real_l("13591409");
	static const boost::real::real_explicit<T> real_l0("545140134");
	static const boost::real::real_explicit<T> real_x("1");
	static const boost::real::real_explicit<T> real_x0("-262537412640768000");
	static const boost::real::real_explicit<T> real_s("13591409");

	static exact_number<T> L0 = real_l0.get_exact_number();
	static exact_number<T> X0 = real_x0.get_exact_number();

	bool nth_digit_found = false;
	bool first_iteration_over = false;

	exact_number<T> iteration_number = one;
	exact_number<T> prev_pi;
	exact_number<T> pi;
	exact_number<T> error;
	const exact_number<T> max_error(std::vector<T> {1}, -(n + 1), true);

	do {  
	    exact_number<T> temp = K * K * K - _16 * K;
	    temp.divide_vector(iteration_number * iteration_number * iteration_number, n + 1, true);
	    M *= temp;
	    X *= X0;
	    L += L0;
	    K += _12;

	    temp = M * L;
	    temp.divide_vector(X, n + 1, false);
	    S += temp;

	    temp = C;
	    temp.divide_vector(S, n + 1, false);

	    pi = temp;
	    if (!first_iteration_over) {
		prev_pi = pi;
		first_iteration_over = true;
		iteration_number += one;
	    } else {
		error = pi - prev_pi;
		error.positive = true;

		if (error < max_error) {
		    nth_digit_found = true;
		}
		iteration_number += one;
		prev_pi = pi;
	    }

	} while (!nth_digit_found);

	return pi[n];
}

Intial commit regarding Pi: https://github.com/BoostGSoC20/Real/commit/34ac759f128cd2ffb5d684c5ff7a5a0e54025360

% operator

As % operator is only defined for integers, this operator was overloaded by Vikram in his integer_number class. The operator used the Knuth’s long division algorithm to find the remainder.

Calculation of digits of real_algorithm number during compile-time

This task was proposed to provide an optimization for real_algorithm numbers. The task involved calculating the starting few digits of real_algorithm number during compilation, if the function for n’th digit provided by the user is constexpr. It seemed that the task wasn’t doable since it was not possible to detect whether the function provided by the user is constepxr or not. (In normal scenarios it is possible to check whether the function is constexpr or not, but in our case the since the function was passed between many functions before the check was performed it’s constexpr’ness is lost in most of the cases.)

Phase III

Milestone:

  • Implement Karatsuba algorithm.
  • Tests and Documentation.

Karatsuba algorithm

I implemented the Karatsuba algorithm which improved the performance of the multiplication operation for vectors of much bigger sizes. The implementation is bit more optimised than the traditional implementation as we are not storing zeroes (required to make the length of operands equal) in vectors. This small optimisation saves a lot of memory as well as time. Benchmarking of the algorithm was done by multiplying incrementally larger examples using both standard multiplication algorithm and Karatsuba algorithm and comparing the performances. The examples used for the benchmarking purpose were vectors of same length, in general the benchmarking becomes difficult if we consider examples of different sizes which is more practical situation.

Following is overview of how karatsuba algorithm has been implemented (few lines of code has been skipped) :

template <typename T = int>
void karatsuba_multiplication (
    exact_number<T> &other, 
    const T base = (std::numeric_limits<T>::max() / 4) * 2
) {

	// this --- a, other --- b
	const int a_size = this->digits.size();
	const int b_size = other.digits.size();
	const int a_exponent = this->exponent;
	const int b_exponent = other.exponent;
	const bool a_sign = this->positive;
	const bool b_sign = other.positive;

	const int max_length = std::max(a_size, b_size);

	if (max_length <= KARATSUBA_BASE_CASE_THRESHOLD || std::abs(a_size - b_size) > std::min(a_size, b_size)) {
	    this->standard_multiplication(other, base);
	    return;
	}

	// appending zeroes in front to make sizes of a & b equal
	int a_pref_zeroes = 0, b_pref_zeroes = 0;
	if (a_size < max_length) {
	    a_pref_zeroes = max_length - a_size;
	} else if (b_size < max_length) {
	    b_pref_zeroes = max_length - b_size;
	} 

	const int left_half_length = max_length / 2;
	const int right_half_length = max_length - left_half_length;

	/*
		Variable Explanation
	    a is vector representation of "this", b is vector representation of "other"
	    a = exact_al * base^(right_half_length) + exact_ar
	    b = exact_bl * base^(right_half_length) + exact_br
	*/
	
	// This is where we perform optimization (not appending zeroes in vector)
	if (a_pref_zeroes > 0) {
	    if (a_pref_zeroes >= left_half_length) {
		exact_al = exact_number(std::vector<T> (), true);
		exact_ar = exact_number(this->digits, true);
	    } else {
		exact_al = exact_number(std::vector<T> (this->digits.begin(), this->digits.begin() + left_half_length - a_pref_zeroes), true);
		exact_ar = exact_number(std::vector<T> (this->digits.begin() + left_half_length - a_pref_zeroes, this->digits.end()), true);
	    }
	    exact_bl = exact_number(std::vector<T> (other.digits.begin(), other.digits.begin() + left_half_length), true);
	    exact_br = exact_number(std::vector<T> (other.digits.begin() + left_half_length, other.digits.end()), true);
	} else if (b_pref_zeroes > 0) {
	    if (b_pref_zeroes >= left_half_length) {
		exact_bl = exact_number(std::vector<T> (), true);
		exact_br = exact_number(other.digits, true);
	    } else {
		exact_bl = exact_number(std::vector<T> (other.digits.begin(), other.digits.begin() + left_half_length - b_pref_zeroes), true);
		exact_br = exact_number(std::vector<T> (other.digits.begin() + left_half_length - b_pref_zeroes, other.digits.end()), true);
	    }
	    exact_al = exact_number(std::vector<T> (this->digits.begin(), this->digits.begin() + left_half_length), true);
	    exact_ar = exact_number(std::vector<T> (this->digits.begin() + left_half_length, this->digits.end()), true);
	} else {
	    exact_al = exact_number(std::vector<T> (this->digits.begin(), this->digits.begin() + left_half_length), true);
	    exact_ar = exact_number(std::vector<T> (this->digits.begin() + left_half_length, this->digits.end()), true);
	    exact_bl = exact_number(std::vector<T> (other.digits.begin(), other.digits.begin() + left_half_length), true);
	    exact_br = exact_number(std::vector<T> (other.digits.begin() + left_half_length, other.digits.end()), true);
	}

	exact_number<T> sum_al_ar (exact_al);
	exact_number<T> sum_bl_br (exact_bl);

	/*
	    Variable explanation
	    sum_al_ar = al + ar
	    sum_bl_br = bl + br
	*/
	sum_al_ar.add_vector(exact_ar, base - 1);
	sum_bl_br.add_vector(exact_br, base - 1);

	exact_al.karatsuba_multiplication(exact_bl, base);
	sum_al_ar.karatsuba_multiplication(sum_bl_br, base);
	exact_ar.karatsuba_multiplication(exact_br, base);
	/*
	    Variable explanation
	    exact_al = al * bl
	    sum_al_ar = (al + ar) * (bl + br)
	    exact_ar = ar * br
	*/

	sum_al_ar.subtract_vector(exact_al, base - 1);
	sum_al_ar.subtract_vector(exact_ar, base - 1);
	/*
	    sum_al_ar = al * br + ar * bl
	*/

	// multiply exact_al by base^(2 * right_half_length)
	exact_al.exponent += 2 * right_half_length;

	// multiply sum_al_ar by base^(right_half_length)
	sum_al_ar.exponent += right_half_length;

	exact_al.add_vector(sum_al_ar, base - 1);
	exact_al.add_vector(exact_ar, base - 1);

	*this = exact_al;
	this->exponent += -(a_size + b_size) + (a_exponent + b_exponent);
	this->positive = (a_sign == b_sign);
	this->normalize();

}

Initial commit regarding Karatsuba algorithm: https://github.com/BoostGSoC20/Real/commit/83e43dede5e2d101240777fc1eb0dd14a64f3eba

Tests and Documentation

All required tests and documentation were made in parallel with the code implementation. Following is overview of how test for pi has been implemented :

SECTION("TRIGONOMETRIC PROPERTIES") {

	SECTION("sin(pi) == 0") {

		real sin_pi = real::sin(pi);
		boost::real::exact_number zero("0");

		auto sin_pi_iterator = sin_pi.get_real_itr().cbegin();
		++sin_pi_iterator;

		CHECK(sin_pi_iterator.get_interval().lower_bound <= sin_pi_iterator.get_interval().upper_bound);
		CHECK(!sin_pi_iterator.get_interval().lower_bound.positive);
		CHECK(sin_pi_iterator.get_interval().upper_bound.positive);
		++sin_pi_iterator;

		CHECK(sin_pi_iterator.get_interval().lower_bound <= sin_pi_iterator.get_interval().upper_bound);
		CHECK(!sin_pi_iterator.get_interval().lower_bound.positive);
		CHECK(sin_pi_iterator.get_interval().upper_bound.positive);
		++sin_pi_iterator;

		CHECK(sin_pi_iterator.get_interval().lower_bound <= sin_pi_iterator.get_interval().upper_bound);
		CHECK(!sin_pi_iterator.get_interval().lower_bound.positive);
		CHECK(sin_pi_iterator.get_interval().upper_bound.positive);
		++sin_pi_iterator;

		CHECK(sin_pi_iterator.get_interval().lower_bound <= sin_pi_iterator.get_interval().upper_bound);
		CHECK(!sin_pi_iterator.get_interval().lower_bound.positive);
		CHECK(sin_pi_iterator.get_interval().upper_bound.positive);
		++sin_pi_iterator;
	}

}

SECTION("e^pi > pi^e") {

	real one("1");
	real e("2.718281828459045235360287471352662497757");
	real e_raised_to_pi = real::power(e, pi);
	real pi_raised_to_e = real::power(pi, e);

	auto itr1 = e_raised_to_pi.get_real_itr().cbegin();
	auto itr2 = pi_raised_to_e.get_real_itr().cbegin();

	++itr1;
	++itr2;

	CHECK(itr1.get_interval().lower_bound >= itr2.get_interval().upper_bound);

	++itr1;
	++itr2;

	CHECK(itr1.get_interval().lower_bound >= itr2.get_interval().upper_bound);

	++itr1;
	++itr2;

	CHECK(itr1.get_interval().lower_bound >= itr2.get_interval().upper_bound);

}

Commit History

Here is the list of all the commits that were merged in Boost.Real master during or before the start of GSoC 2020 coding period.

Areas for future improvements

There are few things which can be optimized in the library.

  • The implementation of chudnovsky algorithm for nth digit of pi can be more optimal as it performs redundant calculations in evaluating each new digit of pi. If the maximum number of digits of pi user is going to use is known prior, then the redundent calculations could be avoided but then all the calculations would be performed at higher precision (depending on the number of digits of pi user needs) therefore calculation of each digit becomes slower but each new digit is obtained quickly using the previous calculation. Something more smarter or the idea I suggested can be used to optimize calculation of pi digits.
  • Similarly, evaluation of taylor series involves redundant calculation and that is why all the trigonometric functions, logarithm, fractional power, etc are very slow.
  • The Newton-Raphson division algorithm uses an initial guess evaluation of which involves division itself. Due to this circular dependancy I implemented an approximate division method to calculate the initial guess roughly, as the intial guess was crucial in ensuring faster convergence of the newton-raphson algorithm. This initial guess can be replaced with something that doesn’t involve division and hence the use of the approximate division algorithm can be avoided.

Remarks

It was an amazing three months where I got to work with experienced mentors Laouen Belloli and Damian Vicino and also a very talented colleague Vikram Chundawat. It was a great learning experience in terms of writing clean and efficient cpp codes. All those critical code reviews by the mentors improved us a lot. At last, I want to thank Google and Boost c++ for providing me the opportunity to work with the Boost community. :)