Data Preparation in Excel

Back in excel on Easter Sunday working through data prep. This chapter started pretty easy and then continued on and got into some conditional formulas, lookups, and data transformation which all went pretty well.

Started off with loading data from a separate CSV. You can do this under the Data tab and then ‘Get Data’. This allows you to bring in a few different types of data depending on what type of data you’re importing. You can bring in other excel workbooks, Text/CSV, XML, JSON,SharePoint Online list, OData, Blank table, and a Blank query. I have no idea what most of these are; I imagine they are just other forms of data/tables you’re bringing over to excel to clean and prep the data.

The first chapter went through other forms of data cleaning like ‘deduping’ (removing duplicates), flash filling, and series, filling. As a quick overview, these are all ways to quickly manipulate data to make it more seamless or organized.

The second chapter of this course went through data prep functions. Some functions I learned:

  1. LEN( ): counts the charachets in the text string. Useful for ID’ing data anomalies or ensuring data consistencies. Example: checking the length of phone numbers.

  2. CONCAT( ): Joins text from multiple inputs without adding delimiters. Ideal for merging information into a single cell/column. In this course we joined parts of addresses to put them in 1 cell so we could delete un-needed columns.

  3. TEXTJOIN( ): Similar to CONCAT but allows specifying a delimiter and including empty cells in the output. Useful for creating URLs or unique identifiers.

  4. TODAY( ): Displays current date. Useful for date calculation and indicating when a report was last updated.

  5. WEEKDAY( ): Returns a number representing the day of the week for a given date; adjustable to match your week start day. This can help in scheduling or analysis tracking.

  6. WORKDAY( ): Calculates a date based on a number of working days from a start date, excluding specified holidays. This would be helping in project planning and deadline tracking. Other than that, unsure how/why you would use it.

First two chapters I would chalk up as a bit boring but I imagine can be pretty helpful depending on the data you’re working with. The last two chapters went into logical functions, lookups and pivot tables. Found this more interesting and also more difficult.

For logical functions, I learned/used IF, AND, OR, and NOT functions. A few notes I took on these fuctions:

  1. I was instructed to use a NOT function to get a greater than value. With that, I had to use the less than call. Looked like this: =NOT(logic1<=logic2).

  2. For an IF(AND( ) ) statement, you will always call the logic with the outputs you want to return. Ex.: IF(AND(A1=B1, A2=B2), “Return Text True”,“Return Text False”. When typing in the function in excel it shows the needed values as well.

I ran into trouble with lookups during the last chapter of this course. I was trying to cal an HLOOKUP( ) when a VLOOKUP( ) was needed. Also was calling the wrong first cell when trying to add the function. For a lookup function you need at a minimum 3 values: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]).

  1. lookup_value = the value you are looking for

  2. table_array = the ranger where excel should search

  3. col_index_num = the column number (in the table) to return data from

  4. [range_lookup] = optional: FALSE for exact match, TRUE for approximate match. Typically this is FALSE

And last but certainly not least…pivot tables. Pivot table is like an MBA buzzword for excel. Unless you work in excel no one really knows what a pivot table is but it kind of has this artificial aura around it that they are wicked hard to put together. (Psst…the MBA’s LOVE knowing how to do a pivot table when other people don’t). Here’s a fun TJ fact for you: they aren’t that hard. As a matter of fact, they’re extremely easy once your data looks good. Here’s how to do it:

  1. Highlight your data

  2. Create pivot table: Insert tab -> PivotTable

  3. Drag and drop your column headers (i.e. data) into the fields. Fields are Filters, Columns, Rows, Values. You see the column headers above the fields you drag them to. You literally cannot miss that.

  4. BAM, your pivot table is created!

With all of that said, maybe it’ll take a few seconds to work with the values and what goes where in the fields but that’s all drag and drop. And it’s easy to take values away. You also drag and drop the values you want to remove. Easy way to do baseline analysis of your data. These are the first ‘visualizations’ made in excel so far which is cool. It’s only a table that gets output but still a type of visualization nonetheless.

One thing I did in the pivot tables wrong: the course called for using filters. For the life of me, I couldn’t find where to put in filters. So, I inserted a slicer. This broke down sales data from the different days of the week. I still was able to get the same answers and complete the work. But, it was technically ‘wrong’ as I didn’t use filters. Using the slicer was cool though. I realized that after but I got a more interactive filter if you will so I kept it.

And that’s it from the data prep course. Pretty cool stuff. Excited to jump into the next excel course this week which is Data Visualization in Excel.

-TJ

Previous
Previous

Intermediate SQL

Next
Next

Intro to R