Generating the Rationals

Here is the problem for tonight:

A set S contains 0 and 1, and the mean of every finite non-empty subset of S.

Prove that S contains all the rational numbers in the unit interval.

Ok....

Let's clarify what the rational number is. Maybe we'll get some ideas out of it.

A rational number is in the form of a/b where a and b are integer.

I still got nothing.

So, S contains all the rational numbers in [0,1]. It implies that it might also contain non-rational numbers, as well, but we might not care about it.

I still got nothing.

So, S contains 0 and 1, and the mean of a set, whose members are also the member of S.

It surely sounds like the barber paradox.

A mean of { a, b, c, d } is (a + b + c + d) / 4. A mean is guaranteed to be rational if a, b, c, and d are all rational.

Walla! This is good shit.

S contains only rational numbers (Wrong!)

Let's assume that S contains an irrational number a.

a must be the mean of a subset A.

Ok, this is a wrong direction. I cannot prove anything out of it.

Let's try again

How about I claim that we can derive any a/b where a <= b

Base case

0 comes from 0/1. And 1 comes from 1/1.

Complex case

a/b comes from....

Still got nothing.

Let's try again

Ok, we are using a contradiction this time.

Let's assume that a/b is not a member of S.

This means that there is no set { c[1], ..., c[n] } that has its mean equal to a/b.

a/b != (c[1] + c[2] + ... + c[n]) / n

Still got nothing.

Let's try again

Induction!

If we have 0/n, 1/n, ..., n/n, we can calculate 0/(n+1), 1/(n+1), ..., (n+1)/(n+1)

0/(n+1) is 0/n. Ok!

1/(n+1) is ...

Let's rephrase it to:

How do we calculate a/n from b/m where m > n?

If n = 2k (an even number), assuming that a = c + d (c != d), we get that a/n = c/2k + d/2k = (c/k + d/k)/2. This case is ok.

If n = 2k+1 (an odd number), assuming that a = c + d (c != d), we get nothing...

a/(2k+1) = c/(2k+1) + d/(2k+1). It practically goes nowhere.

I gave up

So.. The book starts with S contains all p/(2^n).

Ok, that's easy to see.

Basically, it claims that p/q can by the mean of x/(2^n).

Now this is getting a little weird.

p/q is the sum of p ones and (q-p) zeroes.

We choose a very very large n and replace 1 with 1-1/(2^n), 1+1/(2^n), 1-2/(2^n), and so on.

For example, 1/7 = (1-1/(2^n) + 0+1/(2^n) + 0-2/(2^n) + 0+2/(2^n) + 0-3/(2^n) + 0+3/(2^n) + 0) / 7

And that's it.

This is tooo creative.

Sum and Differences

The problem sounds familiar. I might already solved it. But that doesn't matter. I'll try to solve it again.

Here is the problem statement:

Given 25 different positive numbers, prove that you can choose two of them such that none of the other numbers equals either their sum or their difference.

Ok, let's rephrase the problem:

There are 0 < a[0] < a[1] < ... < a[24]. They are all positives.

We can find a[i] and a[j] where
a[i] > a[j] and,
a[i] + a[j] != a[k] and,
a[i] - a[j] != a[m] 

where (k is not i or j) and (m is not i or j)

Ok, so how do we begin? I somehow would really like to use contradiction.

For every i and j, I will assume a[i] + a[j] = a[k] where a[k] > a[i] > a[j]

I should contradict it... but how? You can see that if I choose i=24 regardless of j, the assumption is false.

Actually, we can choose i=23, and it still works. Because if a[23] + a[j] = a[24], we can easily adjust to a[23] + a[j-1] or a[23] + a[j+1].

Therefore, for some i and j, a[i] + a[j] != a[k].

Let's move on to another one:

For every i and j, I will assume a[i] - a[j] = a[m] where a[i] > a[j],a[m]

We should choose i=24, as it is ok for the first condition.

  1. If a[24]-a[23] is valid, we got it!
  2. If not, try a[24]-a[22] is valid. If ok, we got it.
  3. Repeat until finding the valid one

But how to prove that there will be a valid one?

We can see that a[24] - a[23] = a[c], and a[24] - a[22] = a[d]. We can see that a[d] > a[c]

Let's generalize it.

a[24] - a[x] = a[y]. As x decreases, y increases.

If there is x that a[x] == a[y], then a[24] = 2 * a[x]. If there is a[x] = a[24] / 2, we should choose it and we're done.

If there is no x that a[x] == a[y], then ...

I cannot contradict it because 1, 2, 3, ..., 25 will be just fine here. There is no a[24]/2 and a[24] - a[11] = a[12] and a[24] - a[12] = a[11].

Try it again

For every i and j, I will assume a[i] - a[j] = a[m] where a[i] > a[j],a[m]

How about choosing i=23?

Choosing a[23] is perfect because

  • If there is x that a[x] == a[y] and a[23] + a[x] != a[24], then we got it!
  • If there is x that a[x] == a[y] and a[23] + a[x] = a[24], then we can choose a[24] and a[0] to be an answer instead...
  • If there is no x that a[x] == a[y], then it will terminate because, in a[0] to a[22], there will be one a[j] that doesn't have its pair.
    • If a[23] + a[j] = a[24], then we can choose a[24] and a[0] instead...
    • If a[23] + a[j] != a[24], then we got the answer.

That's it.

Zeroes, Ones, and Twos

I thought I would practice Google CodeJam on 22dn of March. Today is 27th of March already. Gosh, I'm so dead.

Anyway, here is the today's problem:

Problem Statement

Let n be a natural number.

Prove that:

  • n has a nonzero multiple whose representation (base 10) contains only zeroes and ones
  • 2^n has a multiple whose representation contains only ones and twos.

The 1st axiom

First of all, a natural number is 1, 2, 3, and so on.

Let's say x = ny. We say that x is a multiple of n, or n has a multiple x.

We need to prove that x multiplies some number will result in something look like 11011101.

How on earth do we prove that?

Let's try some weird number, for example, 7. 7 multiplies something and results in x.

x is in the form of 10^a + 10^b + 10^c + ... where 0 <= a < b < c.

Wow, this is difficult.

x mod 7 = 0 means [(10^a mod 7) + (10^b mod 7) + (10^c mod 7) + ...] mod 7 is 0.

Let's analyze a single term:

  • 10 mod 7 = 3
  • 100 mod 7 = 6
  • 1000 mod 7 = 6
  • ...

What is the point? Let's think in a different way. Ok, I don't have any other idea... Let's look at the solution.

The solution

It's actually quite genius. This is how the book finds a wanted multiple.

Let's say we have a set of {1, 11, 111, 1111, 11111, 111111, 1111111, 11111111}.

The set contains 8 numbers. x mod 7 can generate only 7 different residues.

Therefore, there must be at least 2 numbers having the same residue.

In this case, they are 1111111 and 1 having the same residue, which is 1. Therefore, 1111111 - 1 = 1111110 is a multiple of 7. (On a side note, 111111 also happens to be a multiple of 7.)

This also shows that we won't go beyond n+1 digits. Genius!

The 2nd axiom

Ok, this time I'll prove it myself.

We probably can use the same technique. It's called the pigeon-hole theory, by the way.

Here is the technique:

  1. We construct a set of (2^n+1) numbers. It guarantees that at least 2 of them have the same residue.
  2. Any pair subtracting each other must result in something like 12121221.

A subtraction of 2222222 and 1010101 will result in a valid number. Oh god, I'm stupid.

ํYou know? There are 2^n number from 0 to 1111... (n digits). What a coincidence!

Plan A

What if I prove that the set {0 ,1 , 10, ..., 1111...(n digits)} can generate every number from 0 to 2^n-1?

(10^a + 10^b + 10^c + ...) mod 2^n = x where (n-1) >= a > b > c > ... > 0.

Why the fuck do I keep coming back to this equation? You can see how desperate I am...

Let's analyze 10^a mod 2^n where a < n.

10^a / 2^n = (5^a 2^a) / 2^n = 5^a / (2^(n-a)). Where is the fucking point?!?!?!

I've got nothing.

Plan B

If {0 ,1 , 10, ..., 1111...(n digits)} can generate every number from 0 to 2^n-1, it must map 1-1.

From i-th to (i+1)-th, we shall add 1 or 8..89.

a[i+1] = a[i] + 1 if (a[i]%2) == 0

a[i+1] = a[i] + 88..89 if (a[i]%2) == 1

Still going nowhere.

Giving up

Let's look at the solution.

So, the book uses induction...

Let's say we have a number x containing only 1s and 2s, and assume that x is divisible by 2^k.

If we put 1 in front, we will have x + (2^k 5^k).

If we put 2 in front, we will have x + 2(2^k 5^k) = x + 2^k 5^k + 2^k 5^k.

The book claims that either of them must be divisible by 2^(k+1).

Since x = y 2^k, then (y 2^k) + (2^k 5^k) = (y + 5^k) 2^k

And x + 2(2^k 5^k) = (y 2^k) + (2^k 5^k) + (2^k 5^k) = (y + 2 5^k) 2^k

if y is divisible by 2, then (y + 2 5^k) 2^k is divisible by 2^(k+1).

if y is NOT divisible by 2, then (y + 5^k) is divisible by 2, which means (y + 5^k) 2^k is divisible by 2^(k+1).

Wow, what the heck that is! This just proves that we can start with k=1 and walk to any k...

So, the 2nd axiom is proved.

Reflection

I have no chance whatsoever to solve this problem.

The solution seems arbitrary to me. The two axioms are so similar, but the solutions are very different.

And the solutions require more than a few mental steps.

For example, when the book says that it is divisible by 2^(k-1), it took me like 15 minutes to figure out why it is the case....

Gosh, I'm stupid.

Min and Max

Min and Max

Given an array, find Min and Max with the least comparison.

Now the classic way to do is to loop. That's 2n comparisons on the worse case. The best case is using n comparisons.

This kind of problem is somewhat classic. The technique that hits my mind is divide-and-conquer.

Let's try divide them into 2 groups.

We find Max and Min on each group; let's say it take k comparisons. Now to find the global Max and Min, it takes 2 comparisons.

The smallest group consists of 2 elements. It takes only 1 comparison to find both Min and Max.

Let's analyze it. First, assume that n = 2^m for the sake of simplicity.

This means that at the 1st level it takes 2 comparisons. The 2nd level it takes 4 comparisons (2 groups). The 3rd level it takes 2 x 2^2 comparisons (4 groups) . The (m-2)-st level it takes 2 x 2^(m-2) comparison (2^(m-2) groups). The (m-1)-st level, it takes 2^(m-1) comparison (2^(m-1) groups).

To sum it up, 2 + 2^2 + 2^3 + ... + 2^(m-1) + 2^(m-1) comparisons That is 2^m - 1 - 1 + 2^(m-1).

That is a little bit less than (n + 1/2n) comparisons, which is the wanted number.

2SUM problem

2SUM problem

 Given an array a[0] ... a[n-1], find 2 elements that sum up to k

Now this is a classic problem. Everybody will certainly say it is O(n^2).

It turns out that there is a O(n log n) solution.

First of all, sort the array. Next, we put pointer p at a[0] and pointer q at a[n-1].

Let's try every possible case:

  • If a[p] + a[q] > k, we should reduce the sum, and the only way to do it is move q backward.
  • If a[p] + a[q] < k, we should increase the sum, and the only way to do it is move p forward.

Now let's see some proof.

If a[p] + a[q] > k, then a[p+i] + a[q] is always greater than k. Therefore, q will never be part of the answer.

If a[p] + a[q] < k, then a[p] + a[q-i] is always less than k. Therefore, p will never be the answer.

And that's it.

Facebook Interview Questions

Facebook Interview Questions

Right now I'm preparing for a phone interview from Facebook.

I have done all questions from XORSwap

They are not that difficult. I code on average 10 mins for each question.

I'll practice more questions from Glassdoor.

Here are the questions:

Coding Problems

“3SUM problem”

“Write a class providing an LRU cache”

“FInd the maximum sum of a sub-sequence from an positive integer array where any two numbers of sub-sequence are not adjacent to each other in the original sequence. E.g 1 2 3 4 5 6 --> 2 4 6”

“Compute the square root of a number down to a certain precision. ie sqrt(num, precision) returns a number that is in-between sqrt(num) - precision and sqrt(num) + precision.”

“Find the min and max in an array. Now do it in less than 2n comparisons. (they were looking for the solution that finds both max and min in about 3/2 n comparisons).”

“Given a tree, print the values contained at each level on the same line. So if you had the tree with root A, and children B and C, you would print: A B C”

“Given an array of integers, now we want to erase all 0's (can be other value), and we want the result array condensed, meaning no empty cell in the array.”

“Find a needle in the haystack: given two c-strings (null terminated), return a pointer to the location of where the needle occurs in the haystack, NULL otherwise.”

“given a list of words with a same size and a big string that contains one of the permutation of all the words combined(say p), find the startindex of the string p in the big string”

“given a set of words, return a set of anagrams . example : abc , cba , def, fed answer : (abc , cba) , (def , fed)”

“given a string, print every pair of consecutive characters example : abcd --> ab, bc, cd”

“Write the actual code to parse a regular expression including "*", which stands for 0 or more characters, "+", which stands for 1 or more characters, and ".", which stands for 1 exact character.”

“print out all prime numbers in a given string. abc2134kd31 -> 2, 13, 3, 3”

“Implement division without using multiplication or division. It should work most efficient and fast.”

“Implement a suggestion function that generates alternative strings for given password strings like "facebook" => "F@ceß00k" and "fæc€Bo0K" or sth.”

“Implement a function string balanceParanthesis(string s); which given a string s consisting of some parenthesis returns a string s1 in which parenthesis are balanced and differences between s and s1 are minimum. Eg - "(ab(xy)u)2)" -> "(ab(xy)u)2" ")))(((" -> ""”

“Implement a function rotateArray(vector arr, int r) which rotates the array by r places. Eg 1 2 3 4 5 on being rotated by 2 gives 4 5 1 2 3.”

“Given a Binary Search Tree, iterate over the elements without using recursion.”

“Given a set of non-overlapping integer ranges (1,3) (5,8), etc., and an input integer, what is the best way to organize the data and allow for quick search based on the input, etc.”

“Given a telephone number, find all the permutations of the letters assuming 1=abc, 2=def, etc.”

“Given a string, remove all the duplicate characters (not necessarily consecutive)”

“Given a set of words, group them into sets of anagrams.”

“Convert a binary search tree to a sorted, circular, doubly-linked list, in place (using the tree nodes as the new list nodes).”

“Print a singly-linked list backwards, in constant space and linear time.”

“Given a list of n objects, write a function that outputs the minimum set of numbers that sum to at least K. FOLLOW UP: can you beat O(n ln n)?”

“Write a function to prettify Json objects”

“Given sorted arrays of length n and 2n with n elements each, merge first array into second array.”

“Explain in detail what a hash table is, how would you implement it, what would be a good hash function, and when is it good to use a hash table.”

“Write a function that finds the minimum and maximum values within an unsorted array using divide-and-conquer.”

“Given a String containing java-script assets, write a parser which will output the String with proper indentation.”

“Print a binary tree in infix order. Recursive and iterative.”

“Fibonacci Numbers - Iteratively and Recursively”

“How would you implement a method to tell whether or not a string matches a regex that consists of lower case letters and *s.”

“You are trying to rob houses on a street. Each house has some +ve amount of cash. Your goal is to rob houses such that you maximize the total robbed amount. The constraint is once you rob a house you cannot rob a house adjascent to that house.”

“Given a matrix print it clockwise from the first element to the very inner element.”

“You are going to take some numbers as an input from a file. You need to write a program to find longest increasing sequence. You should process it as soon as you are taking an input. After finishing the last input immediately you should be able to tell the sequence. Input: 1 5 3 4 6 4 Output: 3 4 6”

“Given a set of integers, print out all its subsets. Write C/C++ code to solve it.”

“First question: for a random-ordered bucket of numbers 1 through 3000 with one number missing, how would you detect which number is missing?”

“Please write a program to merge 2 sorted arrays”

“Given a string, remove all chars from the string that are present in say another string called filter.”

“Given two binary trees, return true if they have same elements (irrespective of tree structure)”

“Implement stack using a queue”

“Print out all combinations of k numbers out of 1...N e.g. when k = 2, n = 4 Print out 12, 13, 14, 23, 24, 34”

“25 racehorses, no stopwatch. 5 tracks. Figure out the top three fastest horses in the fewest number of races.”

“Write a list class where the only data structure available is a stack”

“Given a function for a fair coin, write a function for a biased coin that returns heads 1/n times (n is a param).”

“Given a score S, and individual points p1,p2,...,pn. give all combinations of p that add up to s.”

“How do you find sequences of consecutive integers in a list that add to a particular number.”

“Write a function that takes in an integer and returns the number of ones set in the binary representation.”

“Find the first letter in a string that does not have a pair.”

“Given a binary tree, print out the elements in order. Without recursion.”

Conceptual Questions

“Open-ended systems/design question on storing and searching zillions of status updates”

“What do you see as fb's biggest challenge in the next 5 years?”

“Design a system to detect typos and provide suggestions to users.”

“Design and implement an algorithm that would correct typos: for example, if an extra letter is added, what would you do?”

“Design the Facebook Credit system which is a application where users can buy/trade virtual currency and can use the virtual currency to purchase Facebook services, like paid apps.”

“Given a 1TB file of serialized 4 byte integers, and 2GB of ram, sort the integers into a resulting 1TB file. My interviewer was very collaborative in entertaining various solution ideas until we came up with a combo that would work performantly and reduce the number of passes over the 1TB file and intermediate files.”

“A file contains 10 billions of Strings and need to find duplicate Strings. You have N number of systems available. How will you find duplicates?”

“Suppose you have a matrix of numbers. How can you easily compute the sum of any rectangle (i.e. a range [rowstart, rowend, colstart, colend]) of those numbers? How would you code this?”

“Given the numbers 1 to 1000, what is the minimum numbers guesses needed to find a specific number if you are given the hint "higher" or "lower" for each guess you make.”

“You have two lightbulbs and a 100-storey building. You want to find the floor at which the bulbs will break when dropped. Find the floor using the least number of drops.”

“Explain the difference between a LEFT and RIGHT SQL JOIN”

“How can you possibly do technical management if you're a very strong senior engineer?”

"You have a database with n normalized tables (the schema for which are shown on the board). The CEO complains that a particular daily report generated from it "is slow"; how would you deal with this issue?"

N independent decisions

N independent decisions

About 5 days ago, I have an phone interview with a Twitter engineer. He asked me to:

Given a string, for example, aa-(1,2)-(4,5)-xx, generate
aa-1-4-xx
aa-1-5-xx
aa-2-4-xx
aa-2-5-xx

Now that's the n independent choices. You must either use a depth-first search or a breadth-first search.

But that's kind of boring.

This morning I've just come up with a meta-programming solution. Basically, it generates a source code with n nested loops (for n independent decisions) and run that code.

This requires only memory to hold a source code. It should run faster than a depth-first search and a breadth-first search because it's just a loop.

Now let's see my new solution:

# Given a string s
statics = s.split(/\([^\)]+\)/)
decisions = s.match(/\([^\)]+\)/)[1..-1].map { |unit|
    choices = unit[1..-2].split(",")
    choices
}

code = ""

decisions.each_with_index { |choices,index|
    code += "[#{choices.join(',')}].each { |a#{index}|\n";
}

decisions.each_with_index { |choices,index|
    code += 'print "' + statics[index]} + '#{a' + index +'}"';
}

code += 'print "' + statics.last + '"';

decisions.each_with_index { |choices,index|
    code += "}\n";
}

eval(code)

The code does not seem to use any large memory as breadth-first search does.

But I think nested loops need memory. It's just that we don't manage it...

An even better solution

My friend at Facebook gave me an even better solution, which is similar to BigInt implementation.

# Given a string s
statics = s.split(/\([^\)]+\)/)
decisions = s.match(/\([^\)]+\)/)[1..-1].map { |unit|
    choices = unit[1..-2].split(",")
    choices
}

counter = []
decisions.length.times { counter.push(0) }

max = 1
decisions.each { |choices| max = max * choices.length }

max.times {

    end_index = 0
    while true

         counter[end_index] += 1

         break if counter[end_index] < decisions[end_index].length

         counter[end_index] = 0
         end_index += 1

    end

    s = ""
    counter.each_with_index { |i, index|
        s += "#{statics[index]}#{decisions[index][i]}"
    }

    s += statics.last
    puts s
}

That's a good one. Ok, next time, if there are n independent decisions, just use this technique.

Problem Solving Dichotomy

Problem Solving Dichotomy

I do believe that given a problem,

You either solve it or you don't

If you don't, you need to seek for a solution elsewhere.

It's probably a little exaggerated. But if you see a problem and you cannot solve it within an hour, you should give it up and find a solution elsewhere instead.

A solution emerges from your mind; It's not something you can deduce from.

As Polya used to say,

All mathematics theories are first guessed, then proved

Theories emerge in mathematicians' minds, then they try to prove it. If they succeed, then they are famous. If they don't, then they wait until another hits their minds, and they try again.

So, if you cannot solve any problem within an hour or so, find the solution on the internet or talk to other people about it.

Sliding Window Maximum

Sliding Window Maximum

A long array A[] is given to you. 
There is a sliding window of size w which is moving from the very left of 
the array to the very right. 
You can only see the w numbers in the window. 
Each time the sliding window moves rightwards by one position.

Input: A long array A[], and a window width w
Output: An array B[], B[i] is the maximum value of from A[i] to A[i+w-1]
Requirement: Find a good optimal way to get B[i]

Ok, the most obvious solution is to loop each window, which takes O(nw).

Of course, there is a better solution. Let's observe:

Observation 1

Let's say we have:

5 2 1 3 4 9, and w is 3

The windows are:

[5 2 1] 3 4 9
5 [2 1 3] 4 9
5 2 [1 3 4] 9
5 2 1 [3 4 9]

The first and second windows are highly related. As the window shifts, it removes one element and add another. However, as long as the maximum is not removed, we do not care.

And the solution seems to emerge. We have a pointer on the maximum element in the window. After we shift the window, we do the following:

  • If the maximum is not removed, compare it with the new element and adjust the pointer accordingly. O(1)
  • If the maximum is removed, we have to find the new maximum and update the pointer. O(w) with simple loop, or O(log w) with heap.

And we have to delete element from heap, as the window shifts. So, we need to track indices of elements in a heap. (So complicated)

The best solution we've got so far takes O(n log w). It's not good enough. Let's try to find O(n). The worst case is shown as follows:

6 5 4 3 2 1

We have to update the pointer every time the window shifts.

Basically, we need a data structure that:

  • Get maximum with O(1)
  • Remove element by its index with O(1)
  • Add element with O(1)

... ...

Ok, I give up. Let's look at Leetcode.com.

Leetcode.com's solution

It uses double-ended queue. The queue is kept sorted at all time in a very smart way.

The maximum index is in the front, 
the minimum index is in the back of the queue, and 
the queue stores indices.

When the window shifts, we:

  1. Remove all elements that is less than the new element from the back.
  2. Remove all elements that are out of the window from the front.
  3. Then add the new element to the back of the queue.
  4. Get the maximum of the current window from the front.

This means that some elements, which are not in the windows, can still be in the queue. But they are not in the front or the back. There are 2 observations here.

Observation 1

We do not remove the element immediately when it is out of the window. We only need remove it when it's in the front.

Observation 2

Let's see this example:

[5 1 3] 4

We don't have to keep 1 because 3 is always a better choice. And when we shift the window, we don't have to keep 3 because 4 is always a better choice.

That's why we just remove all lesser elements from the back of the queue and add the new element at the back.

That is a very very smart trick. Each element is added and removed at most once, and there is no loop over the queue. Therefore, the time complexity is O(n).

That is very very smart!

The k-th Smallest Element in 2 Sorted Arrays

The k-th Smallest Element in 2 Sorted Arrays

It's using the same old technique. I'm bored. Let future Tanin solve it.