How to find the duplicate in an array using Big O of N?
Question
Write an algorithm that finds the duplicate in an array using Big O of N, not Big O of N^2.
Information
The array is an integer array and has a MAX_VALUE items (lets just use 1000 as an example). All integers are between 0 and MAX_VALUE (pretend the input is limited to values between 0 and MAX_VALUE). There is only one duplicate and you are not sure where it is. What is the most efficient way to find it.
int[] i = new int[1000];
There is one duplicate and it is the worst case duplicate. You can simulate a worst case duplicate as follows:
for (int i = 0; i < 999; i++) { array[i] = i+1; } array[999] = 999;
So the duplicates are the second to last and the last.
Answer
This is one possible answer that assumes the values are between 0 and MAX_VALUE.
/// <summary> /// Efficient because it has a Big O of n. /// </summary> public int FindDuplcate(int[] inArray) { bool[] tmpArray = new bool[1000]; for (int i = 0; i < inArray.Length; i++) { if (tmpArray[inArray[i]] == true) return i; tmpArray[inArray[i]] = true; } return -1; }
Incorrect answers
Here is a little test app with two incorrect answers and two possible correct answer. Feel free to comment with your answers.
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Collections; namespace FindDuplicateInArray { class Program { const int MAX_VALUE = 1000; static void Main(string[] args) { // Worst case int[] array = new int[MAX_VALUE]; for (int i = 0; i < MAX_VALUE - 1; i++) { array[i] = i + 1; } array[MAX_VALUE - 1] = MAX_VALUE - 1; // Incorrect answer: Big O of N^2 //IFindADuplicateInArrays dupFinder1 = new BruteForceDupFinder(); //int dup = dupFinder1.FindDuplcate(array); // Incorrect answer: Big O of N^2 //IFindADuplicateInArrays dupFinder2 = new LessBruteForceDupFinder(); //int dup = dupFinder2.FindDuplcate(array); // Possible correct answer: Big O of N but with a limitation //IFindADuplicateInArrays dupFinder3 = new EfficientDupFinderWithLimitation(); //int dup = dupFinder3.FindDuplcate(array); // Possbile correct answer: Big O of N IFindADuplicateInArrays dupFinder3 = new EfficientDupFinder(); int dup = dupFinder3.FindDuplcate(array); Console.WriteLine(dup); } public interface IFindADuplicateInArrays { int FindDuplcate(int[] inArray); } /// <summary> /// Incorrect Answer: This is simple N*N alogithm. Big 0 = N^2 /// </summary> public class BruteForceDupFinder : IFindADuplicateInArrays { #region IFindADuplicateInArrays Members public int FindDuplcate(int[] inArray) { for (int i = 0; i < inArray.Length; i++) { for (int j = 0; j < inArray.Length; j++) { if (i != j && inArray[i] == inArray[j]) return j; } } return -1; } #endregion } /// <summary> /// Incorrect Answer: This is N(N-1)/2 which is N*N/2-n/2 so remove the constants N^2-N. /// When there is N^2-N you can keep the N that grows fastest: Big O = N2 /// But the original alrogithm is less N^2 than just N^2. /// </summary> public class LessBruteForceDupFinder : IFindADuplicateInArrays { #region IFindADuplicateInArrays Members public int FindDuplcate(int[] inArray) { for (int i = 0; i < inArray.Length; i++) { for (int j = i + 1; j < inArray.Length; j++) { if (inArray[i] == inArray[j]) return j; } } return -1; } #endregion } /// <summary> /// Possible Answer: Efficient because it has a Big O of n. /// Limitation: If an item in the array is greater than the MAX_VALUE, /// an ArrayIndexOutOfBoundsException occurs. /// </summary> public class EfficientDupFinderWithLimitation : IFindADuplicateInArrays { #region IFindADuplicateInArrays Members public int FindDuplcate(int[] inArray) { bool[] tmpArray = new bool[1000]; for (int i = 0; i < inArray.Length; i++) { if (tmpArray[inArray[i]] == true) return i; tmpArray[inArray[i]] = true; } return -1; } #endregion } /// <summary> /// Possible Answer: Efficient because it has a Big O of n. /// </summary> public class EfficientDupFinder : IFindADuplicateInArrays { #region IFindADuplicateInArrays Members public int FindDuplcate(int[] inArray) { Dictionary<int, bool> table = new Dictionary<int, bool>(); for (int i = 0; i < inArray.Length; i++) { if (table.Keys.Contains(inArray[i])) return i; table.Add(inArray[i], true); } return -1; } #endregion } } }
Run this through Visual Studio’s Performance Analysis tool once for each answer (comment and uncomment). To really see the difference, change MAX_VALUE to 1 million.