On DataQuick's Changed Methodology
Welcome back to the bubble blogging world!! I am sure you have had lots of positive comments and replies, but it is nice to see you back after such a wild set of circumstances. My name is XXX and I work for a Bay Area finance and real estate site based out of Foster City. I read your blog and the Sonoma bubble blog quite often to see what's happening in the North Bay. Living in the Bay Area myself (San Jose), I also have a similar personal interest in the housing bubble. I've actually created a section of our site dedicated to Bay Area and some national housing bubble news. I also feel it's important to get the word out, especially amongst the misinformation that's currently out there, and we've had surprisingly good responses from readers and other bloggers. I'd like to help support your blog by dropping you a donation but notice you don't have a donate button or tip jar on your site. Let me know if there's a way to donate (PayPal perhaps?)
I also want to encourage you to read some of our posts. We cover a lot of Bay Area as well as National topics that you specifically blog about. Some of the articles that may be of interest to you are:
Marin County Housing Data Conflict
Gloom and Doom in California: Subprime Loan Foreclosure Projections
San Francisco Bay Area Home Sales Fall Again, Prices Still Flat
So let me know how to donate to your blog. I'm certainly glad you're back. And I hope you may find some of our information useful. Please feel free to use any of the articles or data/graphs for your blog.
Regarding that first article cited in the email: If you didn't notice, when DataQuick announced their January, 2007 results they slipped in a little footnote indicating that they were changing the way they calculate the median sale price for their publicly consumed reports (the ones with the data tables I like to show on this blog) but not the ones given out to realtors. They are also changing the way they count a transaction as a sale.
Regarding the New Calculation of the Median
Starting with the January, 2007 data, DataQuick is calculating a straight-up median based on everything that sells in the region of interest. A straight-up median is what I had always assumed they were calculating all along. But apparently, prior to January, 2007, they were calculating a weighted average of medians across property types. Here is how someone at DataQuick described to me what they were doing in the past:
For a weighted average of medians, we multiplied the number of sales by the median price, for each home type, added the results together and divided by the total number of sales.
This is an average of medians. Because averages are more strongly subject to the biasing effects of "outliers" as compared to medians, DataQuick's previous data was potentially noisier than if they had just calculated a plain-ol-vanilla median. In Marin, we have a very positively skewed distribution of sales -- the vast majority of sales bunch up in the low end of the pricing distribution. So our outliers are the über-expensive, multimillion dollar houses.
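To make the difference between the two calculations concrete, here is a minimal Python sketch. The sale prices and the two property categories are made up for illustration (they are not DataQuick data), and the function names are my own. The old method follows the description quoted above: multiply the number of sales by the median for each home type, add the results, and divide by the total number of sales.

```python
from statistics import median

def weighted_average_of_medians(sales_by_type):
    """Old method as described: sum(count * median) per home type / total sales."""
    total_sales = sum(len(prices) for prices in sales_by_type.values())
    weighted_sum = sum(len(prices) * median(prices)
                       for prices in sales_by_type.values())
    return weighted_sum / total_sales

def straight_median(sales_by_type):
    """New method: one median over every sale, ignoring home type."""
    all_prices = [p for prices in sales_by_type.values() for p in prices]
    return median(all_prices)

# Hypothetical month of sales in a small, skewed market.
sales = {
    "condo": [400_000, 450_000, 500_000, 500_000, 550_000, 600_000],
    "house": [900_000, 1_200_000, 20_000_000],
}

print(f"weighted average of medians: {weighted_average_of_medians(sales):,.0f}")
print(f"straight median:             {straight_median(sales):,.0f}")
```

With these invented numbers the two methods give clearly different figures for the same set of sales. How large the gap is, and which way it goes, depends on the mix of sales in a given month, so treat this only as an illustration of why the two series are not directly comparable.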
DataQuick assures us that there will only be about a 1% discrepancy when comparing current calculations of the median to pre-January, 2007 calculations of the median. I believe their claim, but only for large counties that have large sample sizes of sold properties and whose sale prices are more normally distributed.
But what about a small county like Marin where a small number of properties sell in any given month and where there is a wide disparity between the "low end" and the "high end"? You should already have a sense of how "noisy" (i.e., variable) our data is. Recall those plots of mine showing DataQuick's year-over-year percent price appreciation (here is one example). Remember how Marin's data is all over the place whereas the data for the entire Bay Area forms a nice, fairly smooth curve? That's what I am talking about when I say "noisy". The noise is due to our small sample size where a small number of hugely expensive houses can throw off the calculations for the whole county.
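The small-sample effect can be sketched in a few lines of Python. Everything here is hypothetical: two invented markets with the same price range, one with 5 sales in the month and one with 41, each then receiving the same two $5 million sales. The helper name is my own.

```python
from statistics import median

def median_shift_pct(prices, extra_sales):
    """Percent change in the median after adding a few extra sales."""
    before = median(prices)
    after = median(prices + extra_sales)
    return (after - before) / before * 100

# Hypothetical markets spanning the same $500K-$900K range.
small_market = [500_000 + 100_000 * i for i in range(5)]   # 5 sales
large_market = [500_000 + 10_000 * i for i in range(41)]   # 41 sales

# Two very expensive houses happen to close in the same month.
expensive = [5_000_000, 5_000_000]

print(f"small market: {median_shift_pct(small_market, expensive):+.1f}%")
print(f"large market: {median_shift_pct(large_market, expensive):+.1f}%")
```

In this made-up case the same two sales move the small market's median by roughly ten times as much as the large market's, which is the kind of month-to-month jumpiness described above.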
And because of our positively skewed distribution here in Marin, it can be seen from the formula quoted above that greater weight is being given to the medians derived from the lower priced property categories as compared to the higher priced categories, since each category's median is weighted by its number of sales. Perhaps DataQuick did this in an attempt to make the reported county median more representative of what most people were buying. I can understand the rationale for that. But now, with the new straight-up calculation of the median, the "lowly" $500K condo and the $20 million house are both contributing equally to the calculation of the median. So I think we can expect the reported county medians since January, 2007 to be higher than what they would have been under the old weighted average of medians calculation simply because greater weight is no longer being given to the lower end of the pricing distribution.
Furthermore, when calculating year-over-year percent appreciation for Marin, the appreciation rate will be biased upward when a figure calculated under the new scheme is compared to one calculated under the old scheme. This explains a lot of the recent appreciation activity in Marin compared to the larger counties in the Bay Area. It won't be until January, 2008 that the year-over-year comparison is unbiased.
Regarding the New Calculation of Sales
This is what DataQuick says about the new method of counting the number of sales:
To count as an "arm's-length" sale for our sales counts, the logic we've used insisted that there be a seller, a buyer, and that money changed hands. We've now expanded this to include transactions where there was a purchase loan if no price was apparent.
We're also now including multiple sales transactions. If three homes were bought in the same transaction, we now count them as three home sales, not one sale.
These changes increase monthly sales counts by an average of 10 percent. Intra-family transfers are not included, nor are foreclosures until a home is re-sold to a new buyer.
So it should be clear from the above that the number of sales DataQuick now reports each month will be significantly higher (10% is huge) than it would have been prior to January, 2007. Again, what they are doing now vis-à-vis sales makes a lot of sense and I have to wonder why they weren't doing it before.
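Here is a rough Python sketch of what a 10 percent bump in counted sales does to a year-over-year comparison. The sales counts are hypothetical, and I am assuming the average 10 percent increase applies uniformly, which will not be exactly true for any particular county or month.

```python
def yoy_pct_change(previous, current):
    """Year-over-year percent change."""
    return (current - previous) / previous * 100

# Hypothetical counts for one county (not DataQuick figures).
sales_jan_2006_old_method = 200   # counted the old way
sales_jan_2007_old_method = 180   # what the old way would have counted
uplift = 0.10                     # assumed average effect of the new counting rules

sales_jan_2007_new_method = sales_jan_2007_old_method * (1 + uplift)

like_for_like = yoy_pct_change(sales_jan_2006_old_method, sales_jan_2007_old_method)
mixed_methods = yoy_pct_change(sales_jan_2006_old_method, sales_jan_2007_new_method)

print(f"old vs. old method: {like_for_like:+.1f}%")
print(f"old vs. new method: {mixed_methods:+.1f}%")
```

In this invented example a 10 percent drop in sales shows up as a drop of only about 1 percent once the 2007 figure is counted the new way and the 2006 figure the old way.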
The Bottom Line
Overall it should be obvious that comparing DataQuick's current calculations of the median and number of sales with those of before January, 2007 will be problematic at best. For Marin, reported year-over-year appreciation rates will be biased in the direction of being higher. Similarly, year-over-year comparisons of sales will be too high, not just for Marin but for most counties. The net effect will be temporarily biased data in the direction of suggesting that things are healthier than they really are. Don't get me wrong: I don't think DataQuick is purposely biasing in that direction; it is just a natural consequence of their (needed and sensible) changes in methodology. But nevertheless, their year-over-year statistics will remain inaccurate until January, 2008 rolls around when they will again be comparing apples to apples, and oranges to oranges.
As a result of all this, I don't think I will be relying on their data as much as I was before. I am now back to where I was originally, wondering if I should be accepting donations to obtain for-pay data services.