Some of you might remember an article I published about a year ago, called “Mathematical Curiosities”. In that article I showed a genuinely curious formula: the 1964 “Formula for Primes” by C. P. Willans, which looks like this:
$$ p_n = 1 + \sum_{i=1}^{2^n} \Bigg\lfloor \left( \frac{n}{\sum_{j=1}^{i} \Big\lfloor \cos^2 \left( \pi \, \frac{(j-1)! + 1}{j} \right) \Big\rfloor} \right)^{\frac{1}{n}} \Bigg\rfloor $$

Here $p_n$ denotes the $n$-th prime. It’s not really an efficient formula (hint: those sum signs mean loops…), but it works.
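For readers wondering why such a thing can work at all (this explanation is mine, not part of the original article): the formula is a direct encoding of Wilson’s theorem.

```latex
% Wilson's theorem: an integer j > 1 is prime iff (j-1)! \equiv -1 \pmod{j},
% i.e. iff ((j-1)! + 1)/j is an integer, in which case \cos^2 of \pi times it is 1.
% The floor therefore acts as a primality indicator, and the inner sum counts primes:
\sum_{j=1}^{i} \Big\lfloor \cos^2\!\Big(\pi\,\frac{(j-1)!+1}{j}\Big) \Big\rfloor \;=\; 1 + \pi(i),
% where \pi(i) is the prime-counting function (the leading 1 comes from j = 1).
% The floored n-th root of n/(1 + \pi(i)) is 1 exactly when \pi(i) < n and 0 otherwise,
% so the outer sum counts how many i \le 2^n lie strictly below the n-th prime,
% which is p_n - 1; adding 1 yields p_n. (Bertrand's postulate guarantees p_n \le 2^n.)
```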
In that article I provided two implementations: one in Python and another in Ruby. The Ruby version was slightly faster than the Python one, but the running times were quite long in both cases.
I wanted to have a working C++ program using the Boost.Multiprecision library that would do exactly the same thing as the Ruby one, and if possible, faster.
So what’s a developer to do in 2025? Well, I asked ChatGPT to do the translation, and then, to make the app parallelizable across multiple cores.
TL;DR: I’m astonished. For reference, this was the starting point: the Ruby code.
require 'bigdecimal'
require 'bigdecimal/math'

class BigDecimal
  def fact
    (2..self).reduce(1, :*)
  end
end

PI = BigMath.PI(10)

def prime(n)
  sum = BigDecimal(0)
  limit = 2 ** n
  for i in 1..limit do
    inner_sum = BigDecimal(0)
    for j in 1..i do
      div = (BigDecimal(j - 1).fact + 1) / BigDecimal(j)
      inner_term = BigMath.cos(PI * div, 10) ** BigDecimal(2)
      inner_sum = inner_sum + inner_term.floor
    end
    term = (n / inner_sum) ** Rational(1, n)
    sum = sum + term.floor
  end
  sum + BigDecimal(1)
end

for i in 1..10 do
  puts(prime(i).to_s('F'))
end
Let’s see what AI can do for us today.
Make it Work
ChatGPT (the free version, as I don’t have a paid account) translated it into working code after only two corrections.
First it returned a version that included <cmath> and pretended that cos(cpp_dec_float_100) would just work. Of course, it doesn’t; Boost ships its own cos() overload for its multiprecision types.
Then it returned a version that made the same mistake, but this time with floor().
The third time, after I politely pointed out the misconceptions above, ChatGPT returned code that not only compiled without warnings, it ran without issues.
This is what the translated C++ code looked like in the end (compare with the Ruby code above). I find it funny that it introduced a type alias, using BigDecimal = cpp_dec_float_100, to keep some sense of readability in the final product, reminiscent of the original:
#include <iostream>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/factorials.hpp>

using namespace boost::multiprecision;
using namespace boost::math;

using BigDecimal = cpp_dec_float_100;

BigDecimal prime(unsigned int n) {
    BigDecimal sum = 0;
    BigDecimal PI = constants::pi<BigDecimal>();
    BigDecimal limit = pow(BigDecimal(2), n);
    for (long i = 1; i <= limit; i++) {
        BigDecimal inner_sum = 0;
        for (long j = 1; j <= i; j++) {
            BigDecimal div = (factorial<BigDecimal>(j - 1) + 1) / j;
            BigDecimal inner_term = cos(PI * div);
            inner_term = inner_term * inner_term;
            inner_sum += floor(inner_term);
        }
        BigDecimal term = pow(n / inner_sum, BigDecimal(1) / n);
        sum += floor(term);
    }
    return sum + 1;
}

int main() {
    for (unsigned int i = 1; i <= 10; ++i) {
        std::cout << "Prime(" << i << "): " << prime(i) << std::endl;
    }
    return 0;
}
Not only that, but I also asked ChatGPT for the CMakeLists.txt file required to drive the build process with CMake, and it produced the one below, which worked out of the box at the first try:
cmake_minimum_required(VERSION 3.15)
project(PrimeCalculator)

# Set C++ standard
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Set vcpkg toolchain file (adjust the path to your vcpkg installation)
if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
  set(CMAKE_TOOLCHAIN_FILE "${CMAKE_SOURCE_DIR}/vcpkg/scripts/buildsystems/vcpkg.cmake")
endif()

# Find Boost
find_package(Boost REQUIRED COMPONENTS multiprecision math)
if(Boost_FOUND)
  message(STATUS "Boost found: ${Boost_INCLUDE_DIRS}")
else()
  message(FATAL_ERROR "Boost not found. Please install Boost using vcpkg.")
endif()

# Add the executable
add_executable(PrimeCalculator main.cpp)

# Link Boost libraries
target_include_directories(PrimeCalculator PRIVATE ${Boost_INCLUDE_DIRS})
target_link_libraries(PrimeCalculator PRIVATE Boost::multiprecision Boost::math)
I’m using vcpkg to manage C++ libraries, which means that I needed to do the following first:
$ git clone https://github.com/Microsoft/vcpkg.git ~/.vcpkg
$ cd ~/.vcpkg
$ ./bootstrap-vcpkg.sh
$ ./vcpkg integrate install
$ ./vcpkg install boost-multiprecision
I used the following commands to compile the application with CMake:
$ mkdir build
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -S . "-DCMAKE_TOOLCHAIN_FILE=~/.vcpkg/scripts/buildsystems/vcpkg.cmake"
$ cmake --build build
$ build/PrimeCalculator
It just worked. All in all, I’m very impressed.
Make it Faster
This algorithm looks naturally parallelizable; after all, it’s a loop within a loop with a lot of counting at the end. A bit of map() and reduce(), anyone?
I asked ChatGPT to provide a parallelized version of the C++ program, and also, one that would take an integer from the invocation, instead of looping across the first 10 primes.
The result is the following C++ code.
#include <iostream>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/factorials.hpp>
#include <cmath>
#include <future>
#include <thread>  // for std::thread::hardware_concurrency
#include <vector>
#include <numeric>
#include <cstdlib>

using namespace boost::multiprecision;
using namespace boost::math;

using BigDecimal = cpp_dec_float_100;

// Computes the outer sum for i in [start, end] only.
BigDecimal prime_part(unsigned int n, long start, long end) {
    BigDecimal sum = 0;
    BigDecimal PI = constants::pi<BigDecimal>();
    for (long i = start; i <= end; i++) {
        BigDecimal inner_sum = 0;
        for (long j = 1; j <= i; j++) {
            BigDecimal div = (factorial<BigDecimal>(j - 1) + 1) / j;
            BigDecimal inner_term = cos(PI * div);
            inner_term = inner_term * inner_term;
            inner_sum += floor(inner_term);
        }
        BigDecimal term = pow(n / inner_sum, BigDecimal(1) / n);
        sum += floor(term);
    }
    return sum;
}

BigDecimal prime(unsigned int n) {
    BigDecimal sum = 0;
    BigDecimal limit = pow(BigDecimal(2), n);
    long limit_as_long = static_cast<long>(round(limit));
    unsigned int num_threads = std::thread::hardware_concurrency();
    std::vector<std::future<BigDecimal>> futures;
    long chunk_size = limit_as_long / num_threads;
    for (unsigned int i = 0; i < num_threads; ++i) {
        long start = i * chunk_size + 1;
        long end = (i == num_threads - 1) ? limit_as_long : (i + 1) * chunk_size;
        futures.push_back(std::async(std::launch::async, prime_part, n, start, end));
    }
    for (auto& future : futures) {
        sum += future.get();
    }
    return sum + 1;
}

int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <n>\n";
        return 1;
    }
    int n = std::atoi(argv[1]);
    if (n <= 0) {
        std::cerr << "Please provide a positive integer greater than 0.\n";
        return 1;
    }
    std::cout << "Prime(" << n << "): " << prime(n) << std::endl;
    return 0;
}
The formula by C. P. Willans involves two loops (one per sum), and ChatGPT, in a very simple but effective approach, broke the outer loop into a separate function called prime_part that takes boundaries as arguments, calculated according to the number of CPU cores. It then uses std::async and futures to run that function in parallel over the chunks. Check out all those CPU cores being active at the same time in htop!

At the end of the process, we sum all the individual results, and get the final answer. Which, by the way, was always exact.
This is a nice example of divide and conquer to solve the problem using concurrency.
But is it Faster?
In short, yes; C++ is faster, but caveat programmator.
In terms of single-core performance, the C++ version is faster than the Ruby version¹ for values of n smaller than 8… but, rather surprisingly, it then becomes slower than the Ruby version, and finally performs faster again when calculating the 11th prime!
But of course, the multicore implementation dramatically cuts the times required for longer calculations, starting from the 4th prime: for example, it calculates the 11th entry in about 4 minutes instead of the almost 45 the Ruby code needs! The multicore C++ implementation was also the first one that could calculate the 12th prime in a reasonably normal time. The 13th took much longer even on the multicore implementation, so I won’t try the following ones.
An interesting observation: on my machine, the multicore C++ implementation is consistently around five to six times faster than the single-core C++ implementation. This is quite evident from the 8th to the 12th prime calculations. Given the time taken for the 13th prime, I estimate the single-core C++ implementation would have needed around 6 or 7 hours to complete it.
Check the results in the table below (in hours:minutes:seconds, as reported by the time utility), where the fastest times are highlighted in bold, and in italic, the disparities between the single-core implementations in C++ and Ruby between the 7th and the 11th prime.
| $n$ | Prime | Python | Ruby | C++ single | C++ multi |
|---|---|---|---|---|---|
| 1 | 2 | 0:00.086 | 0:00.014 | **0:00.005** | 0:00.006 |
| 2 | 3 | 0:00.086 | 0:00.116 | **0:00.008** | 0:00.009 |
| 3 | 5 | 0:00.090 | 0:00.118 | **0:00.009** | 0:00.013 |
| 4 | 7 | 0:00.087 | 0:00.140 | 0:00.019 | **0:00.010** |
| 5 | 11 | 0:00.098 | 0:00.196 | 0:00.046 | **0:00.015** |
| 6 | 13 | 0:00.085 | 0:00.429 | 0:00.142 | **0:00.044** |
| 7 | 17 | Error | *0:01.487* | *0:00.787* | **0:00.281** |
| 8 | 19 | Overflow | *0:07.430* | *0:12.274* | **0:02.695** |
| 9 | 23 | n/a | *0:50.484* | *1:16.440* | **0:13.826** |
| 10 | 29 | n/a | *3:56.610* | *6:10.820* | **1:04.560** |
| 11 | 31 | n/a | *44:31.390* | *26:57.300* | **4:06.190** |
| 12 | 37 | n/a | n/a | 1:43:14.870 | **20:21.310** |
| 13 | 41 | n/a | n/a | n/a | **1:19:22.930** |
The Python version returns the wrong result for $n = 7$ and throws an overflow exception for $n = 8$. I should modify it to use an arbitrary-precision decimal library, like the ones used in the Ruby and C++ versions, but I haven’t done that yet.
Also, a disclaimer: I had never used the Boost.Multiprecision library before, so I don’t know how good the code above is or what kind of optimizations could be applied. This exercise consists of simply taking the code as ChatGPT provides it, making sure it runs correctly, and then measuring the time it takes to complete.
Finally, I don’t have an ARM CPU at hand, but I’d love to run these benchmarks in such hardware one day. If you do that, let me know how fast you can calculate those primes! For reference, the source code for the C++, Ruby, and Python versions is available on my GitLab profile.
These times were all measured on the same machine, my personal laptop, under the same conditions: 10th generation Lenovo ThinkPad X1 Carbon, with a 12-core (16-thread) 12th Gen Intel Core i7-1270P, 32 GB RAM, running Fedora 41. Programming language versions: Python 3.13.1, Ruby 3.3.0, and C++ compiled with GCC 14.2.1. ↩︎