Mathematics, Number Theory

Fermat’s factoring method

In my last post I discussed the RSA problem. Involved with the RSA problem is the fact that being able to easily factor the product of two large primes would allow one to easily derive the RSA private key from the corresponding public key, rendering RSA useless.

So how would one go about factoring m, the product of two large primes, p and q?

The most naive approach is called trial division. It begins by dividing m by first 2, then 3, etc., until you find a number that has no remainder when m is divided by it. If m is not divided evenly by 2 then you can skip all multiples of 2, so in fact you can skip all composite numbers. This method will eventually find the smaller factor which leads directly to the larger one no later than when you try the square root of n. This is far from the most efficient method.

For certain subsets of integer factoring problems, Fermat’s method can be vastly more efficient than trial division. The closer p is to q, the more efficient Fermat’s method is. Fermat’s method was first described by Peirre de Fermat in the 17th century.

Fermat’s method is based on the fact that all positive odd integers can be represented as the difference of two squares. This is easy to show using simple algebra. If we say that

m = pq

and we make the qualification that m is odd, we can perform some substitutions that will allow us to show that m is the difference of two squares. If we arbitrarily say that p is larger or equal to q then we can define

u = (p + q) / 2

and

v = (pq) / 2

then we can say that

m = u2v2

which can easily be shown to be true as

u2v2 = ((p + q) / 2)2 – ((pq) / 2)2
= ((p2 + 2pq + q2) / 4) – ((p2 – 2pq + q2) / 4) = 4pq / 4 = pq = m

But saying that m is the difference of u2 and v2 is not necessarily the same as saying that m is the difference of two squares, since u and v may not be integers. We did, after all, define them as fractions. However, if u and v are integers then m is the difference of two squares. It is easy to show that u and v are integers.

We started with the qualification that m is odd. If that’s the case then both p and q are odd since if either p or q were even then 2 would be a factor of one of them which would mean that 2 would be a factor of m. Since both p and q are odd then both their sum and difference must be even because and odd number plus an odd number is an even number and an odd number minus an odd number is an even number. Therefore, both their sum and difference are evenly divided by 2 which means that both u and v are integers. m is therefore the difference of two squares.

So what is Fermat’s method? How does it work? It’s pretty simple, in fact. It involves searching through the possible values of u and v. u must be greater than or equal to √m. u is equal to √m when v is 0 and m is a square. It is always true, even when m is prime, that if u is (m + 1) / 2 and v is (m – 1) / 2, which corresponds to the factors of m which are m and 1.

((m + 1) / 2)2 – ((m – 1) / 2)2 = ((m2 + 2m + 1) / 4) – ((m2 – 2m + 1) / 4) = 4m / 4 = m

Fermat’s method begins by first choosing a value, we’ll call u’ with initial value √m. If mu’2 is also a square then u = u’ and v = mu’2. Otherwise, we increment u’ and test it again. Eventually this method will find u in u – √m iterations. Having found u and v, we can easily derive p and q as

p = u + v

and

q = uv

Why is this an effective method? It isn’t always. If p and q are far apart then Fermat’s method may be much less effective than trial division. In the worst case, when m is prime and p = m and q = 1, the number of iterations required is ((m + 1) / 2) – √m. However, when p and q are close together, the number of iterations can be very small. Let’s look at an example. Let’s say that p is 2153, q is 1933, and m is 4161749.

u = (2153 + 1933) / 2 = 2043

ceil(√m) = 2041

In this example, we will find that u is 2043 in only 3 iterations. We’ve just used Fermat’s method to factor a 22-bit semiprime in only 3 steps!

Let’s make sure everything adds up.

v = 20432m = 12100 = 1102
p = 2043 + 110 = 2153
q = 2043 – 110 = 1933

What about when p and q are further apart? Let’s try a different example. This time we’ll say that p is 1373347, q is 3, and m is 4120041. Notice that in this example, m has very similar magnitude to the previous example.

u = (1373347 + 3) / 2 = 686675

ceil(√m) = 2030

Look what a difference there is in this example. This time using Fermat’s method we would have go through 684646 iterations to find u! That’s much worse than trial division, which would require at most 2029 iterations even without skipping composite numbers.

Clearly Fermat’s method is not useful for all factoring problems. Also, clearly it’s rather good for a subset of them. But why?

This graph illustrates the relationship between u and v for the function u = f(v) = √(m – v2). The slope of the curve is 1 at v = 0 and as v increases, the curve is asymptotic to the line u = v.

fermat_u_v

Fermat’s method is effective at factoring numbers with factors that are close together because in that case, v is relatively small. Because the slope of the curve of u and v is close to 1 for small values of v, a small change in u translates to a large change in v. Incrementing u by 1 when u is close to √m means skipping many possible values for v.

So this is great, we can use Fermat’s method to solve real factoring problems. These days the recommendation is to use a public key with 2048 bits, which means we would need to factor a 2048 bit number to obtain the private key from the public key. We’re going to examine a much simpler problem. Let’s see how well Fermat’s method does on the RSA-100 challenge, which is a challenge to factor a 100 decimal digit number – a number with only 330 bits. In this example

p = 40094690950920881030683735292761468389214899724061
q = 37975227936943673922808872755445627854565536638199
m = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139

p and q are very close together right? This should be no problem for Fermat’s method. Let’s see how many iterations we would have to use.

u = (40094690950920881030683735292761468389214899724061 + 37975227936943673922808872755445627854565536638199) / 2
= 39034959443932277476746304024103548121890218181130

ceil(√m) = 39020571855401265512289573339484371018905006900195

So to factor Fermat’s method requires 14387588531011964456730684619177102985211280935 iterations to find u. That’s much better than trial division. In fact, it’s approximately 2712 times better. But it still takes 1.44e+45 iterations! That’s a lot of iterations. If you could check 1,000,000 values of u per second per CPU core, that would still take 1.44e+39 seconds. That’s 4.56e+35 CPU-years. Remember, this was only a 330-bit number. Clearly Fermat’s method is not very useful to factor 2048-bit RSA keys.

There are actually much better methods of factoring large semiprimes than Fermat’s method. The two currently fastest known methods are called the Quadratic seive and the General number field seive. As it happens, both of these methods are rooted in Fermat’s method. Look for a future post to read a hopefully simple explanation about how they work.

Cryptography, Security

The RSA problem

The RSA algorithm is heavily relied upon to secure communication on the Internet. Primarily this is done in the form of certificates used to secure SSL/TLS connections as is done to secure the HTTPS protocol. But would you believe that the RSA algorithm cannot be shown mathematically to be secure? It’s true! This fact is known as the RSA problem.

To understand why the RSA problem exists, it is necessary to understand how the RSA algorithm works. It turns out there’s not that much to it. RSA is a public/private key encryption algorithm, also known as an asymmetric key encryption algorithm. With a public/private key pair, a value can be encrypted using the public key and can only decrypted using the corresponding private key. I’ll first explain how to generate and use an RSA public/private key pair and then attempt to describe the mathematics behind why it works. Once that’s out of the way, it will be possible to explain the challenge of mathematically proving that the RSA algorithm is secure.

To generate an RSA public/private key pair, you first choose two large prime numbers. Let’s call them p and q. The product of p and q we’ll call n. Using p and q, knowing they are prime, we can easily compute φ(n) = (pq)(q – 1) where φ is Euler’s totient function. Next, we’ll choose an exponent which we’ll call e, which is less than φ(n) and coprime with φ(n). Next we’ll calculate the multiplicative modualar inverse of e (mod φ(n)), which we’ll call d. The combination of n and e is the public key. The combination of n and d is the private key.

To encrypt a value m, which must be less than n, using the public key we calculate

c = me (mod n)

To decrypt c using the private key we can obtain the original value m by calculating

m = cd (mod n)

Why does this work? The answer lies in Euler’s theorem. Euler’s theorem proves that for any coprime integers a and n,

aφ(n) ≡ 1 (mod n)

If you’re not familiar with modular congruence, this states that if you divide aφ(n) by n, the remainder will be 1.

What we need to do is show that (me)dm (mod n). Note that (me)d can also be written as med.

Since we chose d such that ed ≡ 1 (mod φ(n)), we can also say that ed = 1 + hφ(n) where h is some positive integer. Therefore

medm1 + hφ(n)m(mhφ(n)) ≡ m(mφ(n))h (mod n)

Now, using Euler’s theorem we can convert the last term into

m(1)hm (mod n)

which is what we intended to show. As long as m is a positive integer less than n then m is equal to the least positive member of the congruence class of (me)d (mod n), meaning m is the remainder when dividing by n. Note that if m is too small then me may be less than n and it may be possible to derive m from me by taking the eth root of me.

So what’s the problem? The problem is that the security of the RSA algorithm rests on an unproven assumption. The assumption is that it is easier to find two very large prime numbers than it is to factor their product. It turns out it’s very easy to find large primes, or rather to find very large integers that are very likely prime. For example, choosing two random 512-bit numbers that are likely prime can be done in a small fraction of a second with a modern computer. However, using the most efficient currently known integer factorization algorithm, factoring the 1024-bit product of those two primes would take many thousands of CPU-years of computation.

But why does the complexity of factorizing the product of two primes matter with respect to the security of RSA? The reason is that if the two primes chosen, p and q can be obtained by factoring their product, n, which is part of the public key, then it becomes trivial to calculate φ(n), and then it becomes trivial to calculate d. Being able to easily calculate d given n means that the private key can easily be recovered from the public key. Being able to easily recover the private key from the public key would completely invalidate the security of the RSA algorithm.

So, again, what’s the problem? It takes far longer to factor the product of two large prime numbers than it takes to choose them. We’re safe using RSA, right? Well, the problem is that while the currently known algorithms for factoring the product of two large primes take far longer to factor than it takes to choose them, nobody has so far been able to prove that a much faster algorithm doesn’t exist. That doesn’t mean that one does exist, just that we can’t say for sure that one does not.

In fact, at least one such algorithm is known to exist, but it is currently impractical to implement on the scale necessary to defeat RSA. The algorithm is known as Shor’s algorithm and it relies on quantum computers able to process numbers as large as n, which currently do not exist for values of n currently used in RSA public keys. Quantum computers are getting more and more powerful all the time, however, so it’s a good bet that the days of RSA are numbered since eventually quantum computers will be able to factor numbers large enough to make the use of RSA impractical.