[LC 187] Repeated DNA sequence

Intuitive solution: 
Take each character as the start, and check the substring of length of 10. Store the substring if it is not in a set. Otherwise, if it is in the set, meaning the substring has appeared before, this substring should be included in the result list. Since this substring may appear once more later, we use a set to store the substrings that appear twice or more. Finally, we change the storing data structure from set to list and return.
Solution with less space, and faster speed: 
The substring is a string type variable of length 10, which requires 2byte * 10 = 20 bytes (character requires 2 bytes, and there are 10 characters). We reduce the size of each substring using bit manipulation. Only four characters may appear in the substring, A, T, G, and C. We code A = 0, C = 1, T = 2, G = 3. Therefore each character needs only 2 bits, and the substring requires 20bits, which is 8 times less than before.
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        char[] map = new char[26];  // this is to encode the four possible characters into numbers
        map[‘A’ – ‘A’] = 0;
        map[‘C’ – ‘A’] = 1;
        map[‘T’ – ‘A’] = 2;
        map[‘G’ – ‘A’] = 3;
        Set<Integer> set = new HashSet<Integer>(); // probably using a map here to store how many times the pattern appear is more intuitive, but is unnecessary, since we only care which pattern appears more than once
        Set<Integer> more = new HashSet<Integer>(); // need the second set, so that the same pattern appearing more than twice can be stored only once
        for (int i = 0; i < s.length() – 9; i++) {
            int v = 0;
            for (int j = i; j < i+10; j++) {
                v = v << 2;
                v = v | map[s.charAt(j) – ‘A’]; // add 2 bits each time for the next character
            if (!set.add(v) && more.add(v)) res.add(s.substring(i, i+10));
        return res;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s