Public Data Deep Dive: Construction Insights
I had previously worked in the construction industry, and was curious what kind of insights could be drawn from public records.
After some research, I found that permit data for construction jobs in New York City is publicly available, and should serve as a reliable, official data source to indicate the quantity of construction jobs, as well as provide some qualitative data – such as the type of construction work performed, the company performing it, etc.
I also found that aggregate data from tax returns could help us segment on specific areas of the city – and that postal codes were included in both data sets, allowing us to be fairly granular.
This was a fairly in-depth exercise to see what kind of (potentially) commercially valuable conclusions could be gained from that data.
Links below to show SQL queries / data definitions
- Condata - this table is the raw data that we extracted which contains the construction permit data for New York City.
- Taxdata - this table is the raw data that we extracted for income tax returns across the state of New York. We can join on the zip codes in the NYC permit data to get just the entries for the boroughs.
- Wavgsalary (Materialized View) - We use this materialized view to pre-calculate the weighted average salary (based on income tax data) for each zip code. This gives us an approximation of the potential purchasing power of the residents of each zip.
- Wagranks (Materialized View) - We use this materialized view to pre-calculate the (percentile) rankings of each zip code based on weighted average salary. That is, zip codes with a higher weighted average salary are have a higher score, which gives us one data point to evaluate them.
- Countranks (Materialized View) - We use this materialized view to pre-calculate the (percentile) rankings of each zipe code based on count of construction permits pulled. Zip codes with more permits pulled show a potentially greater demand for construction services overall. This is the other data point we can use to evaluate each zipe code, along with the 'Wagranks' view above. The reason to have both, is that some zip codes may have a high weighted average salary, but few permits issued, indicating low demand. Others may have a high number of permits, but lower average salary, potentially indicating lower purchasing power, and thus, may represent less opportunity for total revenue captured.
- Targetzip (Materialized View) - This is the materialized view that is the final output for the data. It outputs the columns that we would potentially want to see in a report. By zip code, we can see the calculated values, such as weighted average salary, number of permits, the percentile ranks of salaries, the percentile ranks of number of permits, and, crucially, a combined score of those percentile ranks. This combined score gives us a more complete picture of how to prioritize and evaluate each zip code, by both their capability and propensity to seek out construction services.
zipcode | count | wags | countscore | wagscore | score |
---|---|---|---|---|---|
10022 | 993 | 122273 | 99.44 | 96.08 | 195.53 |
10024 | 916 | 115053 | 98.32 | 90.50 | 188.82 |
11201 | 1046 | 110029 | 100 | 88.26 | 188.26 |
10011 | 870 | 115127 | 97.20 | 91.06 | 188.26 |
10023 | 784 | 117515 | 93.29 | 93.85 | 187.15 |
10014 | 710 | 119355 | 92.17 | 94.97 | 187.15 |
10019 | 865 | 104173 | 96.64 | 86.03 | 182.68 |
10016 | 806 | 107066 | 95.53 | 87.15 | 182.68 |
10013 | 880 | 99609 | 97.76 | 84.35 | 182.12 |
10017 | 611 | 115763 | 90.50 | 91.62 | 182.12 |