===================
== Andrew's Site ==
===================

Cycle Detection using Floyd's Tortoise and Hare

algo

Finding cycles in graphs is an important problem in computer science. I’m ripping straight from wikipedia, but you can use it to check for infinite loops (e.g. function calls accidentally loop, which is a cycle in the call graph) or checking cryptographic hash functions (since they should be close to random functions, we can calculate statistics of random functions, and if the hash function we are testing is too far off, we have problems)1.

One algorithm to find cycles is Floyd’s tortoise and hare algorithm. It is attributed to Floyd, even though this may be folklore. One way to state the problem is “functionally”, which is done in the wikipedia article. Then, you can prove something using iterations of the function f. In my opinion, this proof doesn’t “translate” well to finding cycles in linked lists. I think you can do better with some simple modular arithmetic.

To be precise, for linked lists, the output for the cycle finding algorithm is NULL if there is no cycle, or the address of the first node of the cycle. The idea of the tortoise and hare algorithm is to have a slow pointer (the tortoise) and fast pointer (the hare), and iterate through a linked list. On each iteration, the tortoise advances one node, and the hare advances two nodes. If there is not a cycle, the fast pointer will reach NULL first. If there is a cycle, eventually fast and slow will be at the same node. Then, with a little more thought, we can calculate the first node in the cycle.

Obviously, since the fast pointer moves faster, it will reach NULL first if there is no cycle. So, we only have to consider the case there is a cycle.

Let’s label nodes starting from 1. Let node k be the node where the cycle begins, and assume the cycle length is j. Thus, the full set of nodes is {1,2,…,k,k+1,…,k+j-1}. Each node i’s next pointer points to node i+1, with the exception that node k+j-1’s next pointer points to node k.

Thus, consider a infinite walk starting at node 1. The node visit sequence looks like: (1,2,3,…,k,k+1,k+2,…,k+j-1,k,k+1,k+2,….,k+j-1,k,k+1,k+2,…)

After the first iteration, the slow pointer points to node 1, and the fast points at node 2. After the second iteration, slow points to node 2, and fast node 4 (or the equivalent one in the cycle). Continuing like this, consider the state after k iterations. The slow pointer is at node k. The fast pointer is at the 2k position in the sequence, which is inside the cycle, specifically the k mod j node inside the cycle. In other words, we can relabel nodes k,k+1,….k+j-1 as 0,1,…j. Then the slow pointer is at 0, the fast is at k mod j. Now, if we continue iterating, when will they meet? After i iterations, the slow pointer is at i mod j, and the fast pointer is at k+2i mod j. Solving for i, they will meet after -k mod j iterations.

So we have a proof they meet, but how do we get the first node of the cycle? Well, if we iterate k more iterations after they meet, we will be at node -k + k = 0 mod j, which, before relabeling, is at node k, which is where the cycle begins!

But in a linked list, we don’t actually know what k is (or else why would we even walk the linked list to find the cycle). However, we can actually count k more iterations if we think carefully. Remember, the linked list starts at node 1, and the cycle starts at node k by assumption. So, if we save the head pointer (node 1), and then walk (k-1) iterations, the head pointer will point to node k, which is the start of the cycle. And, from above, if we walk k iterations from the slow node, after the slow node and fast node meet, we will be at the start of the cycle, node k. Thus, we iterate once for the slow node, and then iterate both the slow and head nodes until they are the same, we will meet at the start of the cycle.

So the basic algorithm is clear. Iterate the slow and fast pointers (by 2 and 1, respectively) until they meet. Once they meet, iterate the slow pointer once, and then iterate both the slow and head pointer until they meet, which we proved is the start of the cycle.

Here’s some code.

struct ListNode {
    int val;
    ListNode *next;
    ListNode(int x) : val(x), next(NULL) {}
};
ListNode *detectCycle(ListNode *head) {
    if (head == nullptr) {
        return nullptr;
    }
    ListNode *slow = head;
    ListNode *fast = slow->next;
    while(fast != nullptr) {
        // safe since slow lags behind fast
        slow = slow->next;
        fast = fast->next;
        if(fast) fast = fast->next;
        if (fast == slow) { //there is a cycle
            //slow node needs one more iteration
            slow = slow->next;
            //head and slow will meet at cycle start
            while(head != slow) {
                slow = slow->next;
                head = head->next;
            }
            return slow;
        }
    }
    return fast;
}