Added by: ~James, 2015 I 19

Hi,

I came across this library while trying to solve a problem. I am summing a very large set of numbers, and the floating-point rounding errors I am getting are just too big. To eliminate these errors, I convert the double-precision numbers into the form Int * 2^(constant Int). I want to use as few high-precision numbers as possible to keep the runtime down: ideally a high-precision integer only as a store for the running sum, and then for one multiplication at the end. It is this final multiplication that is hurting me.
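For readers following the thread, the scaling scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual code: the name FixedPointSum and the constant kScaleBits (here 20) are hypothetical, and a plain 64-bit accumulator stands in for the high-precision integer, so it only works while the running total fits in 64 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical scale: each input is stored as an integer multiple of 2^-kScaleBits.
constexpr int kScaleBits = 20;

// Convert every double to a scaled integer, add the integers exactly,
// and scale back to double once at the very end.
double FixedPointSum(const std::vector<double>& values)
{
    std::int64_t acc = 0;  // assumption: the running total fits in 64 bits
    for (double v : values) {
        assert(std::isfinite(v));
        // v * 2^kScaleBits, rounded to the nearest integer (one rounding per input)
        acc += std::llround(std::ldexp(v, kScaleBits));
    }
    // acc * 2^-kScaleBits: the single conversion back to floating point
    return std::ldexp(static_cast<double>(acc), -kScaleBits);
}
```

With this sketch, FixedPointSum({1e9, 0.5, -1e9}) returns 0.5, because the additions themselves are exact. Inputs that are not multiples of 2^-kScaleBits are still rounded once on conversion, and the accumulator overflows for large enough totals, which is the point where a wider integer type becomes necessary.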

My code briefly looks like:

    #include <cstdarg>
    #include <ttmath.h>

    double FunctionSum(int numArgs, ... /* doubles */) {
        __int64 var1, var2;  // these are working variables and arrays, they will fit into __int64
        double dbl1, dbl2;   // unavoidable
        ttmath::Int<4> sum1(0), sum2(0);

        // code ...

        return FinalSum (= sum1*dbl1 + sum2*dbl2);
    }

The final line is where I am having the problem. If I use __int64 for the sums, this line is accepted by the compiler and the algorithm works well, except in cases where I get integer overflow. If I use the ttmath::Int<4> type from the library instead, I get the following error:

binary '*' : no operator found which takes a right-hand operand of type 'double'

Can anyone help me with a workaround, please? I looked for a manual and didn't find one, and I have a really hard time understanding the header files, so I don't know what else to do.

Declaring everything as high-precision variables is also out of the question, because 5-minute simulation runtimes turn into hours or days.

Thanks

Added by: tomek, 2015 I 20

> return FinalSum (= sum1*dbl1 + sum2*dbl2);

Even if there were such an operator, you would lose precision. Consider this:

10001 (int) * 0.00001 (double): in integer arithmetic you end up with zero. First change Int<> to Big<> with a suitably sized mantissa, and then do the multiplication. Sample:

    #include <ttmath/ttmath.h>
    #include <stdint.h>
    #include <iostream>

    double FunctionSum()
    {
        int64_t var1, var2;
        double dbl1, dbl2;
        ttmath::Int<4> sum1(0), sum2(0);

        sum1 = "100000000000000000000000";
        sum2 = "200000000000000000000000";
        dbl1 = 0.000456;
        dbl2 = 0.000123;

        ttmath::Big<1, 4> dsum1 = sum1;
        ttmath::Big<1, 4> dsum2 = sum2;
        ttmath::Big<1, 4> res = dsum1 * dbl1 + dsum2 * dbl2;

        return res.ToDouble();
    }

    int main()
    {
        std::cout << FunctionSum() << std::endl;
    }
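The precision loss in the 10001 * 0.00001 example can also be seen with built-in types alone. The sketch below is only an illustration and does not use ttmath: the function names MultiplyAsInteger and MultiplyAsFloating are hypothetical, and int64_t and double stand in for Int<> and Big<>. It assumes the hypothetical integer operator would convert the double factor to an integer before multiplying.

```cpp
#include <cstdint>

// Convert the double factor to an integer first: the fractional part is
// discarded before the multiplication, so small factors become zero.
std::int64_t MultiplyAsInteger(std::int64_t value, double factor)
{
    return value * static_cast<std::int64_t>(factor);
}

// Promote the integer to floating point first, then multiply. This is the
// same order of operations as converting Int<> to Big<> before multiplying.
double MultiplyAsFloating(std::int64_t value, double factor)
{
    return static_cast<double>(value) * factor;
}
```

Here MultiplyAsInteger(10001, 0.00001) gives 0, while MultiplyAsFloating(10001, 0.00001) gives approximately 0.10001, which is why the conversion to a floating-point type has to happen before the multiplication.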

Added by: ~James, 2015 I 21

Thanks Tomek, I will try this :)

J