[Scala] Traits, Classes, Objects, and Collections

Trait:
1. Shared interfaces/fields between classes
2. Cannot be instantiated
3. Abstract methods that can be implemented by subtraits and classes
4. Classes and Objects can extend traits
5. One class can extend any number of traits

Class
Object:
Singleton–classes only have one instance

Mutable and immutable collections:
1. All collection classes are found in the package scala.collection or one of its sub-packages mutable, immutable, and generic.
2. Immutable is default, to get mutable, have to write explicitly collection.mutable.XX
3. Clients of an immutable collection have a guarantee that nobody can mutate the collection; clients of a root collection only promise not to change the collection themselves.

Full collection hierarchy:
Root collection: scala.collection

Immutable collection: scala.collection.immutable

Mutable collection: scala.collection.mutable

Ref:
1. Traits: https://docs.scala-lang.org/tour/traits.html
2. Collections: https://docs.scala-lang.org/overviews/collections/overview.html

 

[Scala] Group values based on multiple key columns in RDD

Data and problem description is similar to the following:
https://stackoverflow.com/questions/46530868/what-is-the-fastest-way-to-group-values-based-on-multiple-key-columns-in-rdd-usi?noredirect=1#comment80018811_46530868

参考:
fold in spark: http://blog.madhukaraphatak.com/spark-rdd-fold/

[bst] 501. Find Mode in Binary Search Tree

Given a binary search tree (BST) with duplicates, find all the mode(s) (the most frequently occurred element) in the given BST.

Assume a BST is defined as follows:

The left subtree of a node contains only nodes with keys less than or equal to the node’s key.
The right subtree of a node contains only nodes with keys greater than or equal to the node’s key.
Both the left and right subtrees must also be binary search trees.
For example:
Given BST [1,null,2,2],
1
\
2
/
2
return [2].

Note: If a tree has more than one mode, you can return them in any order.

Follow up: Could you do that without using any extra space? (Assume that the implicit stack space incurred due to recursion does not count).

首先想到最原始的方法,并不需要利用bst的特点,直接遍历,然后存map记录每个node的个数,记住mode(即最大的)count是多少,再从map中找到所有符合mode count的node

    public int[] findMode(TreeNode root) {
        Map<Integer, Integer> map = new HashMap();
        int max = helper(root, map, 0);
        List<Integer> list = new ArrayList();
        for (int val : map.keySet()) {
            if (map.get(val) == max) {
                list.add(val);
            }
        }
        int[] res = new int[list.size()];
        for (int i = 0; i < list.size(); i++) res[i] = list.get(i);
        
        return res;
    }
    
    private int helper(TreeNode root, Map<Integer, Integer> map, int max) {
        if (root == null) return max;
        map.put(root.val, map.getOrDefault(root.val, 0) + 1);
        max = Math.max(max, map.get(root.val));
        if (root.left != null) {
            int lmax = helper(root.left, map, max);
            max = Math.max(max, lmax);
        }
        
        if (root.right != null) {
            int rmax = helper(root.right, map, max);
            max = Math.max(max, rmax);
        }
        
        return max;
    }

比较啰嗦,运行速度慢,还需要extra space

因为follow up中要求不能使用extra space。想到bst是可以按大小顺序遍历tree的,如果需要排序,则需要inorder遍历。这样遍历一遍就可以知道mode的count是多少,有几个。再遍历第二遍,找到符合mode count的node即可。

    int curval = 0;
    int curcount = 0;
    int[] mode;
    int maxcount = 0;
    int modecount = 0;
    public int[] findMode(TreeNode root) {
        first(root);
        mode = new int[modecount];
        curcount = 0;
        modecount = 0;
        second(root);
        return mode;
    }
    
    private void first(TreeNode root) {
        if (root == null) return;
        first(root.left);
        
        // handle value
        int val = root.val;
        if (curval != val) {
            curval = val;
            curcount = 0;
        }
        curcount++;
        if (curcount > maxcount) {
            maxcount = curcount;
            modecount = 1;
        } else if (curcount == maxcount) {
            modecount++;
        }
        first(root.right);
        
    }
    
    private void second(TreeNode root) {
        if (root == null) return;
        second(root.left);
        
        // handle value
        int val = root.val;
        if (curval != val) {
            curval = val;
            curcount = 0;
        }
        curcount++;
        
        if (curcount == maxcount) {
            mode[modecount] = curval;
            modecount++;
        }
        
        second(root.right);
    }

space,time complexity都提高了,code很啰嗦,再improve一下,发现两个pass的handle value部分都差不多,合并成一个

    int curval = 0;
    int curcount = 0;
    int[] mode;
    int maxcount = 0;
    int modecount = 0;
    public int[] findMode(TreeNode root) {
        inorder(root);
        mode = new int[modecount];
        curcount = 0;
        modecount = 0;
        inorder(root);
        return mode;
    }
    
    private void handleval(int val) {
        if (curval != val) {
            curval = val;
            curcount = 0;
        }
        curcount++;
        if (curcount > maxcount) {
            maxcount = curcount;
            modecount = 1;
        } else if (curcount == maxcount) {
            if (mode != null) {
                mode[modecount] = curval;
            }
            modecount++;
        }
    }
    
    private void inorder(TreeNode root) {
        if (root == null) return;
        inorder(root.left);
        handleval(root.val);
        inorder(root.right);
    }

[dfs] how to deal with duplicates? 491. Increasing Subsequences

Given an integer array, your task is to find all the different possible increasing subsequences of the given array, and the length of an increasing subsequence should be at least 2 .

Example:
Input: [4, 6, 7, 7]
Output: [[4, 6], [4, 7], [4, 6, 7], [4, 6, 7, 7], [6, 7], [6, 7, 7], [7,7], [4,7,7]]
Note:
The length of the given array will not exceed 15.
The range of integer in the given array is [-100,100].
The given array may contain duplicates, and two equal integers should also be considered as a special case of increasing sequence.

容易看出这是dfs的问题。
容易写出以下dfs的过程:

    public List<List<Integer>> findSubsequences(int[] nums) {
        List<List<Integer>> res = new ArrayList<List<Integer>>();
        helper(nums, 0, res, new ArrayList<Integer>());
        return res;
    }
    
    private void helper(int[] nums, int index, List<List<Integer>> res, List<Integer> list) {
        if (list.size() >= 2) {
            res.add(new ArrayList<Integer>(list));
        } 
        
        for (int i = index; i < nums.length; i++) {
            if (list.size() == 0 || nums[i] >= list.get(list.size() - 1)) {
                list.add(nums[i]);
                helper(nums, i + 1, res, list);
                list.remove(list.size() - 1); // 有和没有当前数字参与两种选择
            }
        }
    }

一个问题,如何避免duplicate?例如[4,6,7,7],[4,6,7]已经加上了,如何避免再加一次?
可以用以下简单方法:不同层recursion中,保证单次recursion调用每个数字只循环到一次。例如,[4,6,7,7]中,当前这次调用的循环中,已经循环到了4,6,7,再遇到7,就不管了。看以下code:

    public List<List<Integer>> findSubsequences(int[] nums) {
        List<List<Integer>> res = new ArrayList<List<Integer>>();
        helper(nums, 0, res, new ArrayList<Integer>());
        return res;
    }
    
    private void helper(int[] nums, int index, List<List<Integer>> res, List<Integer> list) {
        if (list.size() >= 2) {
            res.add(new ArrayList<Integer>(list));
        } 
        
        Set<Integer> unique = new HashSet<Integer>();
        for (int i = index; i < nums.length; i++) {
            if (list.size() == 0 || nums[i] >= list.get(list.size() - 1)) {
                if (!unique.add(nums[i])) continue;//对当前循环相同数字只能用一次 
                unique.add(nums[i]); //记录当前循环到的数字,用当前list再去递归的不用想
                list.add(nums[i]);
                helper(nums, i + 1, res, list);
                list.remove(list.size() - 1);
            }
        }
    }

[dfs, dp] 494. Target Sum

You are given a list of non-negative integers, a1, a2, …, an, and a target, S. Now you have 2 symbols + and -. For each integer, you should choose one from + and – as its new symbol.

Find out how many ways to assign symbols to make sum of integers equal to target S.

Example 1:
Input: nums is [1, 1, 1, 1, 1], S is 3.
Output: 5
Explanation:

-1+1+1+1+1 = 3
+1-1+1+1+1 = 3
+1+1-1+1+1 = 3
+1+1+1-1+1 = 3
+1+1+1+1-1 = 3

There are 5 ways to assign symbols to make the sum of nums be target 3.
Note:
The length of the given array is positive and will not exceed 20.
The sum of elements in the given array will not exceed 1000.
Your output answer is guaranteed to be fitted in a 32-bit integer.

DFS的思路首先想出来,顺利写出来:

    public int findTargetSumWays(int[] nums, int S) {
        return helper(nums, S, 0, 0, 0);
    }
    
    private int helper (int[] nums, int S, int index, int sum, int count) {
        if (index == nums.length) {
            if (sum == S) return count + 1;
            else return count;
        } 
        int count1 = helper(nums, S, index + 1, sum + nums[index], count);
        int count2 = helper(nums, S, index + 1, sum - nums[index], count);
        return count1 + count2;
    }

这种recusion其实都类似,一般都在helper函数里需要给index,记录当前看到哪个元素;原函数需要什么就return什么(需要count就return count,需要所有的排列组合就return list或者set之类的)。

但是,recusion很慢,写的时候也能感觉到,各种排列组合应该有很多重复的时候。

一般,dfs的问题想improve,可以考虑dp。只是这个问题直接从dfs的code改成dp,无从下手。

要及时抛弃已经有的dfs code,重新想思路。

新思路:题目转化找到一个subset都是positive,剩下的subset都是negative,使得 P – N == target. 已知 P + N == sum, 则 2P = target + sum, p = (target + sum) / 2。题目转化成找到一个subset,使其和等于(target + sum) / 2。这个新问题,用dp做。

    private int helper(int[] nums, int target) {
        int[] dp = new int[target + 1]; //存有几种组合可以达到当前目标(1...target)
        dp[0] = 1;
        for (int n : nums) {
            for (int i = target; i >= n; i--) { //倒着看
                dp[i] += dp[i - n]; //新来的数是n,现在只要可以达到i-n就可以达到i
            }
        }
        return dp[target];
    }

可以考虑dp解决这个新问题也是一个模版,而且是典型的dp模版。

[dfs] 473. Matchsticks to Square

Remember the story of Little Match Girl? By now, you know exactly what matchsticks the little match girl has, please find out a way you can make one square by using up all those matchsticks. You should not break any stick, but you can link them up, and each matchstick must be used exactly one time.

Your input will be several matchsticks the girl has, represented with their stick length. Your output will either be true or false, to represent whether you could make one square using all the matchsticks the little match girl has.

Example 1:
Input: [1,1,2,2,2]
Output: true

Explanation: You can form a square with length 2, one side of the square came two sticks with length 1.
Example 2:
Input: [3,3,3,3,4]
Output: false

Explanation: You cannot find a way to form a square with all the matchsticks.
Note:
The length sum of the given matchsticks is in the range of 0 to 10^9.
The length of the given matchstick array will not exceed 15.

想到的部分:把所有数字分为4组,使每组和为sum / 4。如果sum % 4 != 0 或者一共有小于4个数字就不可能形成正方形。分组的过程用dfs。
没想清楚的部分:dfs怎么实现。

public boolean makesquare(int[] nums) {
    if (nums == null || nums.length < 4) return false;
    int total = 0;
    for (int num : nums) total += num;
    if (total % 4 != 0) return false;
    return dfs(nums, total / 4, new int[4], 0);
}
    
private boolean dfs(int[] nums, int target, int[] sums, int index) {
    if (index == nums.length) {
        if (sums[0] == target && sums[1] == target && sums[2] == target) return true;
        return false;
    }
        
    for (int i = 0; i < 4; i++) { //尝试当前数字要加到哪个组
        if (sums[i] + nums[index] > target) continue;
        sums[i] += nums[index];
        if (dfs(nums, target, sums, index + 1)) return true;
        sums[i] -= nums[index];
    }
        
    return false;
}

本题需要找到4个sum,其实相对找到1个sum,增加的复杂度并不大,以下是另外一种实现,更加直接。
另,本来想到了,但是没有好好加进去的optimization:先sort数组,从最大数字开始尝试,这样如果不行会尽早发现。

    public boolean makesquare(int[] nums) {
        if (nums == null || nums.length < 4) return false;
        int total = 0;
        for (int num : nums) total += num;
        if (total % 4 != 0) return false;
        Arrays.sort(nums);
        long sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0;
        return dfs(nums, total / 4, sum1, sum2, sum3, sum4, nums.length - 1);
    }
    
    private boolean dfs(int[] nums, int target, long sum1, long sum2, long sum3, long sum4, int i) {
        if (sum1 > target || sum2 > target || sum3 > target || sum4 > target) return false;
        if (i == -1 && sum1 == target && sum2 == target && sum3 == target) return true;
        else if (i == -1) return false;
        
        return dfs(nums, target, sum1+nums[i], sum2, sum3, sum4, i-1) ||
               dfs(nums, target, sum1, sum2+nums[i], sum3, sum4, i-1) ||
               dfs(nums, target, sum1, sum2, sum3+nums[i], sum4, i-1) ||
               dfs(nums, target, sum1, sum2, sum3, sum4+nums[i], i-1);
    }

[bit manipulation] 476. Number Complement

Given a positive integer, output its complement number. The complement strategy is to flip the bits of its binary representation.

Note:
The given integer is guaranteed to fit within the range of a 32-bit signed integer.
You could assume no leading zero bit in the integer’s binary representation.
Example 1:
Input: 5
Output: 2
Explanation: The binary representation of 5 is 101 (no leading zero bits), and its complement is 010. So you need to output 2.
Example 2:
Input: 1
Output: 0
Explanation: The binary representation of 1 is 1 (no leading zero bits), and its complement is 0. So you need to output 0.

~num取反 但是把所有leading 0都跟着取反了。例如5: 101, ~5 不是010, 是1111..010。
所以制作一个mask,111, 使~num & mask即可。
如何制作mask?一个可以到处用的trick:

(Integer.highestOneBit(num) << 1) – 1

例如5,Integer.highestOneBit(num) = 0b100, 右移一位变成 1000,再减1变成111

[bit manipulation] 421. Maximum XOR of Two Numbers in an Array

Given a non-empty array of numbers, a0, a1, a2, … , an-1, where 0 ≤ ai < 231.

Find the maximum result of ai XOR aj, where 0 ≤ i, j < n. Could you do this in O(n) runtime? Example: Input: [3, 10, 5, 25, 2, 8] Output: 28 Explanation: The maximum result is 5 ^ 25 = 28. 分别计算每一位应该是什么

 public int findMaximumXOR(int[] nums) {        int max = 0, mask = 0;        for (int i = 31; i >= 0; i--) { //从最重要的位MSB看起
           mask = mask | 1 << i; // 每次只看一位,用mask控制看哪位
           Set<Integer> set = new HashSet();
           for (int num : nums) {
               set.add(num & mask); //记录所有nums在这一位是什么情况
           }
           int tmp = max | (1 << i); //结果在当前位暂时设为1
           for (int prefix : set) {
               if (set.contains(tmp ^ prefix)) {
                   max = tmp; //如果有在这一位有和max不同的,那么max在这一位就真的可以是1
                   break;
               }
           }
       }
       return max;

    }

[bit manipulation]477. Total Hamming Distance

The Hamming distance between two integers is the number of positions at which the corresponding bits are different.

Now your job is to find the total Hamming distance between all pairs of the given numbers.

Example:
Input: 4, 14, 2

Output: 6

Explanation: In binary representation, the 4 is 0100, 14 is 1110, and 2 is 0010 (just
showing the four bits relevant in this case). So the answer will be:
HammingDistance(4, 14) + HammingDistance(4, 2) + HammingDistance(14, 2) = 2 + 2 + 2 = 6.
Note:
Elements of the given array are in the range of 0 to 10^9
Length of the array will not exceed 10^4.

Brute force: time limit exceed
思路:integer一共32位, int[] count = new int[32]记录每个bit一共在几个数字中on(一共n = nums.length个数字)。对于一个bit,on的次数 k 次,off的次数 n-k 次,则对于total distance的贡献是 k * (n-k),因为每一个(on, off)的组合都会给total distance加1.

    public int totalHammingDistance(int[] nums) {
        int N = nums.length;
        int[] count = new int[32]; // count for when the bit is on
        for (int n : nums) {
            int i = 0;
            while (n > 0) {
                count[i++] += n & 1;
                n >>= 1;
            }
        }
        
        int total = 0;
        for (int i = 0; i < 32; i++) {
            total += count[i] * (N - count[i]);
        }
        return total;
    }

[sliding window] 424. Longest Repeating Character Replacement

Given a string that consists of only uppercase English letters, you can replace any letter in the string with another letter at most k times. Find the length of a longest substring containing all repeating letters you can get after performing the above operations.

Note:
Both the string’s length and k will not exceed 104.

Example 1:

Input:
s = "ABAB", k = 2

Output:
4

Explanation:
Replace the two 'A's with two 'B's or vice versa.

Example 2:

Input:
s = "AABABBA", k = 1

Output:
4

Explanation:
Replace the one 'A' in the middle with 'B' and form "AABBBBA".
The substring "BBBB" has the longest repeating letters, which is 4.

用sliding window,但是几点要注意。
什么时候往右扩大window?只要当前 窗口长度 不超过 最大的重复字母count + k(k可以理解为容错度)
什么时候调整window的起始?当以上条件不符时
maxcount代表什么?不是当前窗口内的 最大重复字母count,而是史上最大的重复字母count,我们只关心史上出现的最大valid串长度

public int characterReplacement(String s, int k) {
    int start = 0, maxlen = 0, maxcount = 0; 
    int[] count = new int[26];
    for (int end = 0; end < s.length(); end++) {
        maxcount = Math.max(maxcount, ++count[s.charAt(end) - 'A']);
        while (end - start + 1 > maxcount + k) {
            count[s.charAt(start++) - 'A']--;
        } 
        maxlen = Math.max(maxlen, end - start + 1);
    }
    return maxlen;
}