Correcting data in Excel

Last time I have described a theory about finding typos in a set of data in an Excel file without using spell checker e.g finding typos in accounts numbers. Let’s assume that in most cases our data is correct (often true) and we collected number of transactions made by our clients. The data might look like below.

Typos

To compute probability we will use two simple Excel functions COUNTIFS and COUNT.

= COUNTIFS(B$2:B$21;B2)/COUNTA(B$2:B$21)

COUNTIFS allows to count number of elements that satisfied some condition. It counts number of surnames, cities and accounts numbers that equal a specified value stored in each cell. On the other hand, COUNTA function counts number of non-empty cells from a specified range. Dividing the results of these two functions gives us a probability. The last thing is to add a filter to our database and sort our data to catch entries with the lowest probability instead of reviewing all data manually.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s