I just realized it has been 2 months since the last post in this series.  I’m glad you are all patiently waiting.

In Part I, I explained why it is advantageous to buy aged/neglected sites directly from their original owners.

In Part II, I showed you how to estimate a site’s traffic to help you get an idea of its value.

In this post, I’ll show you how to narrow every site on the internet down to a short list of just the ones you might want to acquire.

Warning: This is a highly theoretical post.  There are MANY ways to build this list.  I programmed it myself in PHP/MySQL.  There are definitely other ways to do it.  Also, this is not a programming class.  There are a billion resources on the net if you want to learn how to program.  You’ll have to Google it.  Don’t ask programming questions here.  If you’re afraid to learn how to program, you can always find someone on Odesk or Freelancer to do the programming stuff for you.

Find Every Domain that Might be Worth Something

There are over 200 million domain names currently registered, so it wouldn’t make sense to visit each one and estimate its traffic by hand. Instead, we are going to get a list of domains and build a robot to create a traffic estimate for every single one.

Start with the Alexa Top 1,000,000 domain list: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Next, grab the Quantcast top 1,000,000 domain list: http://ak.quantcast.com/quantcast-top-million.zip

There will definitely be some overlap between these two lists, so you’ll want a script that merges them and removes the duplicates.
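
Here is a minimal Python sketch of the merge step (not the original PHP script). It assumes Alexa’s top-1m.csv holds rank,domain rows. It also ASSUMES the Quantcast file is tab-separated rank/domain lines with "#" comment lines and a header row, so check the real file before relying on read_quantcast. Treating www.example.com and example.com as the same site is another assumption you may want to change.

```python
import csv

def normalize(domain: str) -> str:
    """Lower-case a domain and drop a leading 'www.' so variants compare equal."""
    d = domain.strip().lower().rstrip(".")
    if d.startswith("www."):
        d = d[4:]
    return d

def read_alexa(lines):
    """Yield domains from Alexa's top-1m.csv rows, e.g. '1,google.com'."""
    for row in csv.reader(lines):
        if len(row) >= 2:
            yield row[1]

def read_quantcast(lines):
    """ASSUMED format: 'rank<TAB>domain' rows, '#' comments and a header row."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) >= 2 and parts[0].isdigit():  # skips the 'Rank<TAB>Site' header
            yield parts[1]

def merge_unique(*sources):
    """Merge several domain iterables, keeping the first copy of each domain."""
    seen = set()
    merged = []
    for source in sources:
        for domain in source:
            d = normalize(domain)
            # Skip blanks and entries that don't look like a domain name.
            if not d or "." not in d or " " in d:
                continue
            if d not in seen:
                seen.add(d)
                merged.append(d)
    return merged
```

Pass it open file handles, e.g. merge_unique(read_alexa(open("top-1m.csv")), read_quantcast(open(...))), using whatever the unzipped Quantcast file is called. Keeping the first copy means Alexa’s ordering wins for domains that appear on both lists.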

Narrow Down by TLD

At this point, you have a database of over a million unique domain names.  Each domain must have a morsel of value since it made it on either the Alexa or Quantcast top million list.

The first problem is that there are probably a bunch of international domains that you don’t want to waste your time with.  Or maybe there are international domains that you want to focus on.  Build a script to narrow down your list to just your target TLDs.  For example, I just focused on domains ending in: .com, .net, .org, .biz, .info, etc.  I removed any with the ending .cn, .se, etc.
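
A minimal sketch of the TLD filter, assuming you only care about the last label of the name. The allowed set below uses the TLDs named above; the "etc." is left for you to fill in. Country suffixes like .co.uk would need the Public Suffix List rather than this simple split.

```python
# The TLDs named in the post; add your own to the set.
TARGET_TLDS = {"com", "net", "org", "biz", "info"}

def tld_of(domain: str) -> str:
    """Return the last dot-separated label, e.g. 'example.co.uk' -> 'uk'."""
    return domain.lower().rstrip(".").rsplit(".", 1)[-1]

def filter_by_tld(domains, allowed=TARGET_TLDS):
    """Keep only domains whose final label is in the allowed set."""
    return [d for d in domains if tld_of(d) in allowed]
```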

Collect all Traffic Indicators for Every Domain

In terms of programming, this is probably the most difficult step. You are going to want to build some sort of web scraper/API puller/etc. to get data on every domain remaining on your list. Your goal is a database full of information about every single domain on it. There are probably still 800k websites on your list, so doing this manually would be an epic waste of time.
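
One way to lay out that database is one row per domain with a nullable column for each indicator, so separate scrapers can fill in their own column whenever they get to it. The sketch below uses Python’s built-in sqlite3 as a stand-in for MySQL, and every column name in it is hypothetical.

```python
import sqlite3
import time

# HYPOTHETICAL column names: one nullable column per traffic indicator.
METRIC_COLUMNS = (
    "first_archived_year",
    "alexa_rank",
    "semrush_organic_traffic",
    "semrush_adwords_traffic",
    "compete_traffic",
    "quantcast_rank",
)

def create_schema(conn: sqlite3.Connection) -> None:
    """Create the domains table if it does not exist yet."""
    columns = ",\n".join(f"    {name} INTEGER" for name in METRIC_COLUMNS)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domains (\n"
        "    domain TEXT PRIMARY KEY,\n"
        f"{columns},\n"
        "    updated_at REAL\n"
        ")"
    )

def save_metric(conn, domain, column, value):
    """Record one indicator for one domain, creating the row on first sight."""
    if column not in METRIC_COLUMNS:
        # Column names can't be bound as SQL parameters, so whitelist them.
        raise ValueError(f"unknown metric column: {column}")
    conn.execute("INSERT OR IGNORE INTO domains (domain) VALUES (?)", (domain,))
    conn.execute(
        f"UPDATE domains SET {column} = ?, updated_at = ? WHERE domain = ?",
        (value, time.time(), domain),
    )
    conn.commit()
```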

For each site, you should collect:

Domain Age

The year the site was registered is sufficient. My script looked up each site in the Wayback Machine and used the year its first copy was archived as a stand-in for the registration year. I’m 100% positive there are a dozen better ways to do this; however, that is for you to figure out.
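
A minimal sketch of that lookup using the Wayback Machine’s CDX search endpoint. It assumes that, with output=json, the response is a list whose first row holds the field names and whose data rows come oldest-first, so the first data row is the earliest capture. The network call is kept separate from the parsing so you can test the parsing offline.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(domain: str) -> str:
    """Build a query asking for the earliest single capture of the domain."""
    params = {"url": domain, "output": "json", "limit": "1"}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)

def first_archived_year(rows):
    """rows: decoded CDX JSON; row 0 holds the field names, row 1 the capture."""
    if len(rows) < 2:
        return None  # no captures: the site was never archived
    ts_index = rows[0].index("timestamp")
    return int(rows[1][ts_index][:4])  # timestamps look like YYYYMMDDhhmmss

def lookup_year(domain: str, timeout: float = 30.0):
    """Fetch and parse the earliest capture year (performs a network request)."""
    with urllib.request.urlopen(cdx_query_url(domain), timeout=timeout) as resp:
        body = resp.read().decode("utf-8").strip()
    rows = json.loads(body) if body else []
    return first_archived_year(rows)
```

Run lookups slowly and in batches; hammering the archive with 800k requests at once is a good way to get blocked.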

Alexa Rank

I built my script two years ago, so things have probably changed. You can get an API to pull Alexa information, but it looks like it is no longer completely free. I’m sure there’s a way to pull the Alexa rank for each site for free if you keep working at it.

Check out: http://www.alexa.com/siteowners/data
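
One free starting point: the top-1m.csv file you already downloaded lists rank and domain on every row, so it gives you a snapshot of the Alexa rank for each domain that came from it. A minimal sketch that builds a lookup table from it (it won’t give you fresh ranks, or ranks for domains that only came from the Quantcast list):

```python
import csv

def alexa_ranks(lines):
    """Build {domain: rank} from top-1m.csv rows such as '1,google.com'."""
    ranks = {}
    for row in csv.reader(lines):
        if len(row) >= 2 and row[0].isdigit():
            domain = row[1].strip().lower()
            ranks.setdefault(domain, int(row[0]))  # keep the first (best) rank seen
    return ranks
```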

SEMRush Data

You could get away with grabbing just the Organic Search Traffic estimate, but you might want to save more than that in your database. For example, if you also save the SEM data, it will tell you whether the site is currently paying for ads on Google AdWords. If a site is still paying for ads, that is a tell-tale sign it is NOT neglected.

Check out: http://www.semrush.com/api.html
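
The ad-spend rule above can be written as a simple filter over whatever you saved. In this sketch the field names are HYPOTHETICAL (map them from the columns your SEMRush export actually returns), and the 1,000-visit minimum is an arbitrary placeholder.

```python
from dataclasses import dataclass

@dataclass
class SemrushSnapshot:
    # HYPOTHETICAL field names; map them from your actual SEMRush data.
    domain: str
    organic_traffic: int   # estimated monthly visits from organic search
    adwords_traffic: int   # estimated monthly visits from paid ads
    adwords_keywords: int  # number of keywords the site is bidding on

def paying_for_ads(s: SemrushSnapshot) -> bool:
    """Any paid traffic or paid keywords suggests an active owner."""
    return s.adwords_traffic > 0 or s.adwords_keywords > 0

def worth_keeping(s: SemrushSnapshot, min_organic: int = 1000) -> bool:
    """Keep sites with real organic traffic and no sign of an ad budget."""
    return s.organic_traffic >= min_organic and not paying_for_ads(s)
```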

Compete Traffic

You’ll also want to gather Compete’s estimated traffic figures for every site on your list. From what I remember, the free version was limited to 1,000 API calls, so I set up a cron job to make sure I used my 1,000 calls each day. I might have only grabbed Compete data for sites that already had decent ages, Alexa ranks, and acceptable SEMRush data.

Check out: https://www.compete.com/developer/
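
A sketch of how a daily cron job might pick its batch under a fixed quota: only pre-qualified sites that don’t have Compete data yet, best Alexa rank first. The record keys, the 2005 cut-off year, and the rank ceiling are all HYPOTHETICAL placeholders, and the quota should match whatever the API’s terms actually allow.

```python
DAILY_QUOTA = 1000  # the free-tier limit remembered in the post; check the current terms

def pick_daily_batch(records, quota=DAILY_QUOTA, max_year=2005, max_alexa_rank=500_000):
    """Choose today's lookups: pre-qualified sites with no Compete data yet."""
    candidates = [
        r for r in records
        if r.get("compete_traffic") is None               # not fetched yet
        and r.get("first_archived_year") is not None
        and r["first_archived_year"] <= max_year          # old enough
        and r.get("alexa_rank") is not None
        and r["alexa_rank"] <= max_alexa_rank             # decent rank
        and r.get("semrush_ok", False)                    # passed the SEMRush filter
    ]
    candidates.sort(key=lambda r: r["alexa_rank"])        # spend the quota on the best first
    return [r["domain"] for r in candidates[:quota]]
```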

Quantcast Rank

From what I remember, I simply could not get the Quantcast API to work. I figured I had enough data about each site already, but it would definitely be nice to also have a Quantcast rank for each site in your database.

Again, I built my script about 2 years ago.  Today, you might want to check out: http://www.quantcast.com/learning-center/guides/quantcast-silverlight-api-guide/

Other Info Worth Gathering

If you can fill your database with the above stats, you’ll have a great resource for narrowing your list down to the 20-80k sites that might be worth buying. However, there is a ton more info you can scrape from each site to narrow your list further:

  • AdSense – If a site already has AdSense on it, odds are the owner knows roughly what its income potential is and therefore won’t be willing to sell it super cheap.
  • Guestbooks/Webrings/Frontpage – If you come up with a list of archaic web buzzwords, you could automatically check every site to see which sites on your list still have these words present.
  • Whois Contact – It would definitely keep things organized if you can gather the contact info for every single site on your list.
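
The first two checks can run against each site’s downloaded homepage HTML. In this sketch the AdSense markers are strings that commonly appear in Google’s ad code, and the buzzword list is a HYPOTHETICAL starter set for you to extend.

```python
import re

# Strings that commonly appear in AdSense ad code (older and newer snippets).
ADSENSE_MARKERS = ("pagead2.googlesyndication.com", "adsbygoogle", "google_ad_client")

# HYPOTHETICAL starter list of archaic web buzzwords; extend it with your own.
ARCHAIC_PATTERNS = {
    "guestbook": re.compile(r"guest\s*book", re.I),
    "webring": re.compile(r"web\s*ring", re.I),
    "frontpage": re.compile(r"microsoft frontpage", re.I),
    "hit_counter": re.compile(r"hit counter|visitor counter", re.I),
}

def page_signals(html: str) -> dict:
    """Return True/False flags for AdSense and each archaic buzzword."""
    lowered = html.lower()
    signals = {"adsense": any(m in lowered for m in ADSENSE_MARKERS)}
    for name, pattern in ARCHAIC_PATTERNS.items():
        signals[name] = bool(pattern.search(html))
    return signals
```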

Complete Your Database in a Timely Manner

My database is 2 years old.  It is pretty much useless at this point.  Things change quickly on the internet, so you’ll want to either complete your database quickly, or program your scripts in such a way that everything is always auto-updating.
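
For the auto-updating approach, one option is to store a last-updated timestamp on every record and have a scheduled job re-scrape anything older than some limit. A minimal sketch, where the updated_at key (seconds since the epoch) and the 30-day limit are assumptions:

```python
import time

def stale_domains(records, max_age_days=30, now=None):
    """List domains never scraped, or last scraped more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86,400 seconds per day
    return [
        r["domain"]
        for r in records
        if r.get("updated_at") is None or r["updated_at"] < cutoff
    ]
```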

Using your Database to Identify Potential Acquisitions

Your database will take you from a million target domains down to a much more manageable number by eliminating the 900k sites that don’t receive enough traffic to justify an acquisition. Even a list of 50k potential targets is still WAY too many to contact every single owner. In the next post, I’ll show you my method for streamlining the manual review that should take place before you decide to target a site.
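
To make the narrowing step concrete, here is a sketch of a single query that applies the filters discussed in this post, run against SQLite as a stand-in for MySQL. The table layout, column names, and default thresholds are HYPOTHETICAL; the real cut-offs are your call.

```python
import sqlite3

# HYPOTHETICAL table layout; adapt the column names to your own database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS domains (
    domain TEXT PRIMARY KEY,
    first_archived_year INTEGER,
    organic_traffic INTEGER,
    adwords_traffic INTEGER,
    has_adsense INTEGER
)
"""

SHORTLIST_SQL = """
SELECT domain
FROM domains
WHERE first_archived_year <= :max_year   -- old enough
  AND organic_traffic >= :min_organic    -- still gets search traffic
  AND adwords_traffic = 0                -- nobody is paying for ads
  AND has_adsense = 0                    -- owner may not know its value
ORDER BY organic_traffic DESC
LIMIT :limit
"""

def shortlist(conn, max_year=2005, min_organic=1000, limit=50_000):
    """Return the highest-traffic domains that pass every filter, best first."""
    params = {"max_year": max_year, "min_organic": min_organic, "limit": limit}
    return [domain for (domain,) in conn.execute(SHORTLIST_SQL, params)]
```

Rows with a missing value in any filtered column drop out, because comparisons against NULL are never true in SQL.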


5 Responses to “Finding and Buying Aged and Neglected Websites. Part III – Building your Million-Domain List”

  1. Scott Paterson says:

    Incredible! Well done. The funny thing is back 5 or so years ago I wrote my own search engine in php that curled content off of sites to cache on my server… and I never thought once about throwing in a db and using it for something like this… wow do I feel stupid.. and poor because of it. Awesome stuff.

  2. Jeff says:

    Thanks for continuing this series. Definitely good info.

  3. motley says:

    I agree with Scott, awesome stuff. I’d been hoping for something other than writing a script; I thought you had some simple manual strategy for finding neglected domains. Now I understand why nobody else does this.

  4. Rich says:

    Thanks for this series. I can’t wait for the next installment. About 6 years ago, I bought a handful of Myspace resource sites with a buddy. He paid for them and I optimized them for revenue. When the sites were paid off, we split the subsequent revenue 50/50. They churned out a decent revenue stream for years. I haven’t bought any new sites since then but I think I’ll get into the game again using your system.

    Motley: You don’t need to know how to build the script as long as you can describe what it should do clearly. There are sites out there like vworker.com where you can hire a programmer to create any script imaginable.

  5. Craig says:

    Thank you again, Tom!
    Will you be doing Part IV soon?!
    I’ve found a real gem and need some guidance on how to contact the owner. My emails are going unnoticed. The WHOIS does list the address and phone number. But what’s the best way to go about this? The whole contacting the owner part….
