Step 1 Download the file for your city (This would be the city you chose a neigh
Step 1 Download the file for your city (This would be the city you chose a neighborhood on earlier in the semester)
you will be working on the sandiego excel sheet that i have uploaded
San DiegoDownload San Diego (you can find it in the files update)
You will only accomplish one task…
Find which host has the MOST Air BnB Properties and graph the 5 highest
In the example we used data I made up for an example – here I will be using real data from the website ‘Inside Air BnB’ which collects data from Air BnB to show how they contribute to issues of affordable housing.
You should follow along as we go, but just in case you get lost or mess up I have completed the example for you. It always helps as you can chart your progress and use the formulas here as a template for YOUR CITY.
It’s up to you how you want to tackle this – I’m the worst so I usually try to skip ahead – lift the heaviest weight – take the greatest challenge – and then fail and try the tutorial.
I am integrating concepts from the game design theory of metroidvania… Look it up if you are interested.
Here is my completed Excel Worksheet for Portland Download Here is my completed Excel Worksheet for Portland. (example chart in files)
Step 1 Poke around…
EACH ROW is a different Air BnB property. There are a lot of properties! probably more than 4,000.
There are 8 data columns – there are only 1 of them you will need to consider for now (3 if you are doing the ‘A’ assignment). I needed to create Latitude and Longitude so I could make the files for next week (for mapping)
HOST ID – Uniquely identifies each host. This is helpful because…
HOST NAME – Only identifies the name of the host (it is possible there are different host_id and host_names because hosts can have the same name)
NEIGHBORHOOD – What neighborhood is the property located in? Sometimes these will be names – other times numbers. If they are numbers they simply represent zipcodes.
ROOM_TYPE – Air BnB has three categories – Entire Home, Private Room, or Shared Apartment are the options.
PRICE – The price in dollars of an Air BnB per night
Reviews – The number of reviews the property had
Step 2 Set up the spreadsheet
Click ‘A’ in the top row to highlight the whole host_id column
Copy it and click on J (the whole column) and paste
The final copy and paste will be a column which you will populate.
In K1 write ‘count’
In K2 enter ‘1’
highlight K2 and click on the green square in the bottom right hand portion of the box
DRAG IT DOWN ALL THE WAY TO THE LAST CELL (sorry this is going to take a while!)
For me it was ‘4603’
Once you have 1’s all the way down to the last row of Air BnB Properties.
Step 3: Create and populate a new sheet.
Name it ‘host id’
Copy and Paste columns J and K (Host ID and the count column you created) into the host id sheet (columns A and B)’
Step 4: SUMIF
In C1 of the host id sheet type ‘SUMIF’
This is a new formula! SUMIF will help us calculate which hosts have the most reviews
Copy this formula into C2 of the host id sheet
SUMIF($A$2:$A$4603, A2, $B$2:$B$4603)
I KNOW THERE IS no =(equals) sign. The reason for this is because you will have to change some values based on the spreadsheet of the city you have.
First change the 4603 (look it appears twice in the SUMIF formula above) and change it to the the number of the last row you have data entry (in my case it’s the 4603th row – there are 4603 properties in Portland – there will a similar number in the thousands for your city)
You can scroll to the bottom of the document to find this number.
After you paste into the cell put an equals sign at the begging
Once you have changed 4603 to the number unique to your spreadsheet…
PUT IN THE EQUAL SIGN AT THE BEGINNING OF THE EQUATION. It won’t work otherwise
Then… drag it down (the method we used for count above) – highlighting C2 and dragging it all the way down to that last row you just input into the formula.
IF YOU ARE GETTING LOST DON’T FORGET TO PRESS UNDO
PLEASE REFER TO MY SHEET IN PORTLAND TO SEE HOW ALL OF THIS WORKS (LINK ABOVE)
It looks something like this…
What does this tell us. It counts the number of times each host id (that unique identifier for each host) appears in a row
From this information we will find out who had the the most properties.
Step 5: Copying for Sorting and Analysis
Copy the whole row A into Column E
Copy the whole row C —- THEN right click on F to highlight the whole column — right click — go to paste special — values (Remember this from last lab?)
Now we can sort to find the five highest Host Id’s
Left click on F and drag over to E so both columns are highlighted
SORT AND FILTER FROM LARGEST TO SMALLEST! (Z through A)
If a screen pops up just press ok…
Step 6: Get Rid of the Duplicates
Great! The problem is we have SO many duplicates – let’s get rid of them
Highlight E AND F and find ‘Data’ in the top ribbon
From here find ‘remove duplicates’ – that little Neapolitan ice cream looking thing with the red x next to text to columns
Again – if this isn’t popping up full screen Excel or look up the tool on Google.
Just press ok when the menu comes up
Great we got rid of all the duplicate values
If those exponents look weird simply expand the E column by hovering over the border and dragging it a little larger
Step 7 Analysis
The next step is to simply look up these numbers in the original spreadsheet
Go back to Sheet 1
Press Command (or Control) F and simply type in 5615582 (this is the host id with the most properties) and press enter (make sure nothing is highlighted in the sheet or it will only search what is highlighted
Press enter several times in the entry in the host_id column doesn’t come up
Look at that! It’s called RoomPicks!
Let’s enter that on the ‘host_id’ spread sheet into G2
Rinse and Repeat and it will look something like this
Step 8 Preparing for a Chart
Finally – copy the first five entries for the names you just recorded (G1-G6) Into I2 – make sure you type ‘host’ into I1
and F1-F6 (the sum of properties for each of the top five hosts) into J2 – put the ‘number of properties into J1
If you’ve made it this far you are in good shape!
Next highlight J2 and scroll over to include J2 – J6 and I2 – I6 DO NOT INCLUDE THE NAMES AT TOP
Once you have highlighted this table sort in reverse order z – a (see the values above)
We do this so our chart will look nice
Step 9: Make a Chart
Simply highlight the entire table with the title like below and then click on ‘insert’ on the top ribbon and find the vertical bar chart
2d column will do
It will look something like this
Just Change the title and you are good to go
Step 10 – YOU DON’T HAVE TO HAND IN ANYTHING YOU JUST DID – after all I gave you answers above, which you should totally use to make the same chart for your city!
AshevilleDownload Asheville
AustinDownload Austin
BostonDownload Boston
ChicagoDownload Chicago
NashvilleDownload Nashville
San DiegoDownload San Diego
Using SUMIF in the method described above:
Find the the total number of properties for each Host ID
The names of the hosts with the 5 most properties
Create a graph that features the names of these top five hosts and how many properties they own (should go from left to right on the graph from lowest to highest – like shown in my graph for Portland)