Several students ask us how to prepare for the Ph.D. interviews. There is a lot of misunderstanding about what we are looking for, and how we ensure that students that make it have not just "seen" the answers to the questions we ask. We are testing for the depth of your grasp on Computer Science concepts. We ask conceptual, creative questions to test whether you are reasoning based on first principles, whether you can think critically, and can solve problems intelligently. Below are some resources that should help you prepare.

Here are some sample questions we have asked students in the past. Many of these questions are quite simple, but students falter still, mainly because much of the education in our society is based on memorization rather than reasoning. We have also tried to provide sketches of expected solutions. Please do not email us for solutions. You can be sure your email will be ignored.

### First Principle

Suppose 8 of us are sitting around a table in random order. What is the probability you will sit next to me?
Each of 15 red balls and 15 green balls is marked with an integer between 1 and 100 inclusive; no integer appears on more than one ball. The value of a pair of balls is the sum of the numbers on the balls. Show there are at least two pairs, consisting of one red and one green ball, with the same value. Show that this is not true if there are 13 balls of each color.
There is a bag containing 5 white and 5 black balls. You repeat the following experiment till you see a white ball : take a ball uniformly at random out of the bag. If it is white, stop. Otherwise, put it back in the bag. What is the expected number of times you will need to draw a ball from the bag ?
What is the value of (14^14) mod 10?
### Algorithms and Data Structures

Derive the running time of the binary search algorithm. If I modify binary search to break the interval size into 1/3, 2/3 rather than 1/2, 1/2, then what is the worst case running time?
You have a mixed pile of N nuts and N bolts and need to quickly find the corresponding pairs of nuts and bolts. Each nut matches exactly one bolt, and each bolt matches exactly one nut. By fitting a nut and bolt together, you can see which is bigger. But it is not possible to directly compare two nuts or two bolts. Given an efficient method for solving the problem.
You are given a heap containing N elements. Write a procedure which takes as input a parameter k, and outputs the k'th smallest number in the heap. The running time of the procedure must depend on k alone.
Given a vertex pair (s,t) in a graph G, how can you find all edges e in G that lie on an some (s,t)-shortest path.
Suppose we use a hash function h to hash n distinct keys into an array T of length m. Assuming simple uniform hashing, what is the expected number of collisions?
### Cryptography

Describe the RSA algorithm. Suppose there is an efficient black box algorithm for factoring, then how would you use this algorithm to break RSA?
Why cannot one use a perfect square as a RSA modulus?
What is one time pad, and why are they useful?
### Database

Describe what happens when a query is submitted to the DBMS. Expected answer: A precise and technical description of parsing of the query, conversion into a query plan (physical and logical query plans), query optimization and execution.
What is transaction management? How can ACID properties be enforced? Expected answer: A precise and technical description of what a transaction is, and various methods (locking, time stamps, etc.) for enforcing it. Please note the words, "technical" and "precise".
Rewriting relational algebra expressions, writing SQL queries and query plans.
Analysis of index structures and algorithms for relational algebra operators.
### Networks

Which layers do the following protocols belong to: CSMA, TCP, IP, HTTP, SMTP, DNS, UDP, TDMA, NRZI?
Why is there is a minimum frame size defined in Ethernet?
Suppose a network path runs over a series of 3 links with available bandwidths 512Kbps, 256Kbps, and 128Kbps. The RTT for this path is 300ms. What would be a congestion window that can work on this network?
Suppose you are given a wired network with lossy links. A packet transmitted on link "i" is dropped on that link with probability "p_i". The network does not try to retransmit lost packets. Given a pair of source and destination nodes for a particular packet, how would you find the optimum path on which to transmit the packet so that the probability of the packet getting lost on that path is the least among all paths between the source and destination?
What is the "Additive Increase Multiplicative Decrease (AIMD)" principle used in TCP Reno? What purpose does it serve?
### Architecture

Consider a uniprocessor computer system A running a 1 GHz CPU. In system B, we replace the CPU with a 2 GHz CPU, all other components being the same. The logic design of the two CPUs is identical, with the difference in speed caused only by faster devices. Do you expect system B to execute programs at twice the speed of A? Why or why not?
A Finite State Machine (FSM) generates outputs 100, 200, and 300 in a cyclical order, one output in every clock cycle: 100, 200, 300, 100, 200, 300, 100, 200,...,etc. How many flip-flops are needed to implement this FSM? Why?
### Operating Systems

Consider the following service provided by a bank to transfer one rupee from account A to account B, implemented by the following software function:
void transfer(account *a, account *b)

{

if (a->money > 0) {

a->money--;

b->money++; }

}

The transaction could be initiated by users through an ATM machine that would invoke this function at a central server. When there was only one ATM machine, things were working fine. However, when multiple ATM machines were installed and users could simultaneously invoke the transfer() function, problems started happening due to software concurrency issues.

a. What is the problem if multiple users at multiple ATM machines could invoke this function simultaneously?

b. How can this problem be solved? Recall that this code is running at the central server. Your solution should involve changing this code so that it becomes correct even in the presence of multiple simultaneous requests.

c. Follow-up questions: is your solution efficient? How would you ensure that the maximum number of users can be served simultaneously? How would you ensure deadlock freedom?

(Possible solution: use locks; use fine-grained locks for efficiency; use global lock ordering to avoid deadlocks)

On modern computers, how much time does it take on average for the following operations (just provide a rough estimate, e.g., 1nanosecond, 10nanoseconds, 100nanoseconds, 1microsecond, 10microseconds, 100microseconds, 1millisecond, 10milliseconds, 100milliseconds, 1second, ....) a. to execute an instruction on the CPU b. to read a disk block c. to change one byte on the disk d. to overwrite 10 disk blocks. e. to handle a page fault in virtual memory to implement demand paging f. to context switch between two processes g. to execute "grep -R hello /usr/bin" (searches for the string "hello" in all files and subdirectories of "/usr/bin" recursively)
A computer system is expected to run long-running computational jobs like multiplication of large matrices, etc. and is also used for interactive services like web browsing, terminal input, etc. What should be the desirable characteristics of the CPU scheduler to ensure that all these activities can occur simultaneously with a good user experience and high overall throughput?
What is a file-system? What happens when you format a disk? Why do some OSes run a long-running filesystem check on startup if the computer was previously shutdown abruptly (due to power failure, for example).
Below are some resources that may be helpful for you to prepare. Again, we make up our own questions so these are just for you to get some practice in solving problems.

This website is useful for practice in algorithms.
This MIT course is very useful for practicing problems
This website has lots of problems for learning how to think from first principles.
Some useful websites for basic problem solving practice may be code chef, leetcode, Hacker Rank, Geeks for Geeks.
A useful book for problems in Data Structures and Algorithms is this one by Narasimha Karumanchi.
MIT's open courseware and Stanford's MOOCs are great online resources to help you practice.
Here are some interesting and fun links that talk about experiences of Ph.D. students from around the world.

The PhD storybook put together by Rijurekha.
The PhD comics!, known and loved by all.