Last time I have described a theory about finding typos in a set of data in an Excel file without using spell checker e.g finding typos in accounts numbers. Let’s assume that in most cases our data is correct (often true) and we collected number of transactions made by our clients. The data might look like below.
To compute probability we will use two simple Excel functions COUNTIFS and COUNT.
COUNTIFS allows to count number of elements that satisfied some condition. It counts number of surnames, cities and accounts numbers that equal a specified value stored in each cell. On the other hand, COUNTA function counts number of non-empty cells from a specified range. Dividing the results of these two functions gives us a probability. The last thing is to add a filter to our database and sort our data to catch entries with the lowest probability instead of reviewing all data manually.