Teaching Kids Programming – Find Gene Mutation Groups via Recursive Depth First Search Algorithm


Teaching Kids Programming: Videos on Data Structures and Algorithms

You are given a list of unique strings genes where each element has the same length and contains characters “A”, “C”, “G” and/or “T”.

If strings a and b are the same string except for one character, then a and b are in the same mutation group.
If strings a and b are in a group and b and c are in a group, then a and c are in the same group.
Return the total number of mutation groups.

Constraints
n ≤ 10,000
k ≤ 20 where k is the length of a string in genes
Example 1
Input
genes = [“ACGT”, “ACCT”, “AGGT”, “TTTT”, “TTTG”]
Output
2
Explanation
There are two mutation groups:
[“ACGT”, “ACCT”, “AGGT”]
[“TTTT”, “TTTG”]

Hint:
How can we connect one word with another? Is there a better way to connect than simply comparing each pair? After connecting words with each other, return the groups they form.

Human has 23 chromosomes (one of them controls the gender, where XX means girls and XY means boys). Each chromosome has hundreds of genes.

Gene Mutation Groups via Recursive Depth First Search Algorithm

We can treat each Gene Group as a vertex, and the edges are connected between two groups if they differ only 1 Gene. We can use a hash set to mark the nodes that we have seen. And then we can start Depth first search on each vertex that is not seen/visited.

The counter is incremented when we start a new Depth first search (DFS) on a vertex. The answer will be the number of components for the Graph.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class Solution:
    def geneMutationGroup(self, genes):        
        seen = set()
        dna = set(genes)
        def dfs(a):
            seen.add(a)
            for i in range(len(a)):
                for x in ('A', 'G', 'T', 'C'):
                    c = a[:i] + x + a[i + 1:]
                    if c not in seen and c in dna:
                        dfs(c)
 
        ans = 0
        for i in genes:
            if i not in seen:
                ans += 1
                dfs(i)
 
        return ans
class Solution:
    def geneMutationGroup(self, genes):        
        seen = set()
        dna = set(genes)
        def dfs(a):
            seen.add(a)
            for i in range(len(a)):
                for x in ('A', 'G', 'T', 'C'):
                    c = a[:i] + x + a[i + 1:]
                    if c not in seen and c in dna:
                        dfs(c)

        ans = 0
        for i in genes:
            if i not in seen:
                ans += 1
                dfs(i)

        return ans

Another alternative implementation of Depth First Search algorithm would be to have all vertices in a hash set, and then remove it as we follow the DFS process. The time complexity is O(N*K*K) where N is the number of Genes, and K is the longest Gene sequence. The space complexity is O(NK)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class Solution:
    def geneMutationGroup(self, genes):        
        left = set(genes)
 
        def dfs(a):
            left.remove(a)
            for i in range(len(a)):
                for x in ('A', 'G', 'T', 'C'):
                    c = a[:i] + x + a[i + 1:]
                    if c in left:
                        dfs(c)
 
        ans = 0
        for i in genes:
            if i in left:
                ans += 1
                dfs(i)
 
        return ans
class Solution:
    def geneMutationGroup(self, genes):        
        left = set(genes)

        def dfs(a):
            left.remove(a)
            for i in range(len(a)):
                for x in ('A', 'G', 'T', 'C'):
                    c = a[:i] + x + a[i + 1:]
                    if c in left:
                        dfs(c)

        ans = 0
        for i in genes:
            if i in left:
                ans += 1
                dfs(i)

        return ans

Note, this problem can also be solved via Union Find + Disjoint Set.

See also – traversing a Graph can also be done in Breadth First Search Algorithm: Teaching Kids Programming – Find Gene Mutation Groups via Breadth First Search Algorithm

Find Gene Mutation Groups

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
loading...
780 words
Last Post: Teaching Kids Programming - Algorithm to Truncate Sentence via Split Function
Next Post: Teaching Kids Programming - Find Gene Mutation Groups via Breadth First Search Algorithm

The Permanent URL is: Teaching Kids Programming – Find Gene Mutation Groups via Recursive Depth First Search Algorithm

Leave a Reply