Archive

Posts Tagged ‘search-engine’

Help test some next-generation infrastructure

August 11th, 2009

Official Google Webmaster Central Blog: Help test some next-generation infrastructure

To build a great web search engine, you need to:

1. Crawl a large chunk of the web.
2. Index the resulting pages and compute how reputable those pages are.
3. Rank and return the most relevant pages for users’ queries as quickly as possible.
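To make the three steps above concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the tiny in-memory PAGES "web", the function names, and the use of inbound link counts as a stand-in for reputation. It is not how Google's infrastructure works, only the general shape of crawl, index and rank.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("fast web search engine", ["b.example"]),
    "b.example": ("web crawl and index", ["a.example", "c.example"]),
    "c.example": ("cooking recipes", ["b.example"]),
}

def crawl(seed):
    """Step 1: follow links from a seed URL and collect every reachable page."""
    seen, frontier = set(), [seed]
    while frontier:
        url = frontier.pop()
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        frontier.extend(PAGES[url][1])
    return seen

def build_index(urls):
    """Step 2: build an inverted index plus a crude reputation score.

    Reputation here is just the number of inbound links (an assumption).
    """
    index = defaultdict(set)
    reputation = defaultdict(int)
    for url in urls:
        text, links = PAGES[url]
        for word in text.split():
            index[word].add(url)
        for target in links:
            reputation[target] += 1
    return index, reputation

def search(query, index, reputation):
    """Step 3: return pages containing every query word, most reputable first."""
    words = query.split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda u: (-reputation[u], u))

urls = crawl("a.example")
index, reputation = build_index(urls)
print(search("web", index, reputation))
```

In this toy data, b.example has two inbound links and a.example has one, so a search for "web" lists b.example first.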

For the last several months, a large team of Googlers has been working on a secret project: a next-generation architecture for Google’s web search. It’s the first step in a process that will let us push the envelope on size, indexing speed, accuracy, comprehensiveness and other dimensions. The new infrastructure sits “under the hood” of Google’s search engine, which means that most users won’t notice a difference in search results. But web developers and power searchers might notice a few differences, so we’re opening up a web developer preview to collect feedback.

Some parts of this system aren’t completely finished yet, so we’d welcome feedback on any issues you see. We invite you to visit the web developer preview of Google’s new infrastructure at http://www2.sandbox.google.com/ and try searches there.

Right now, we only want feedback on the differences between Google’s current search results and our new system. We’re also interested in higher-level feedback (“These types of sites seem to rank better or worse in the new system”) in addition to “This specific site should or shouldn’t rank for this query.” Engineers will be reading the feedback, but we won’t have the cycles to send replies.

Here’s how to give us feedback: Do a search at http://www2.sandbox.google.com/ and look on the search results page for a link at the bottom of the page that says “Dissatisfied? Help us improve.” Click on that link, type your feedback in the text box and then include the word caffeine somewhere in the text box. Thanks in advance for your feedback!

Posted by Sitaram Iyer, Staff Software Engineer, and Matt Cutts, Principal Engineer


SEO and SEM: Truth in numbers

August 1st, 2009

There is an ongoing debate in search marketing about whether Search Engine Marketing or Search Engine Optimization is more effective for driving traffic to company sites.

Search Engine Optimization, or SEO, involves getting links to web sites to rank higher in natural, or so-called organic, search results for certain keywords. Search Engine Marketing, or SEM, involves buying paid search ads for specific keywords. Both are designed to catch users’ attention when they are searching, and prompt them to click through the ad or the link to a web site.

Through research, I have discovered that while articles that come up in organic listings get more clicks than do the paid listings on Google, individual business web sites receive more clicks from the paid listings than from the organic listings.

Confused? SEO companies like to use the numbers to prove that SEO is the way to go. However, they do this by not giving you all of the variables. If you look at the whole picture, then you can find the truth in the numbers.

Commercial Searches

Search results

Let’s say you are in the market for a BMW car. You go to Google and do a search for “new bmw 525i.” You would get results that look similar to this:

If you examine the paid listings (labeled “Sponsored Links” on the right and at the top), you will see that the paid listings list dealerships, price quotes, etc. The organic listings show concept cars, BMW 5 series information and more. So, in this search, which we call a “commercial search,” users tend to click on the paid listings.

Informational Searches

Informational search results

Now, let’s say you’re interested to learn more about a MySQL programming topic such as “mysql database table structure.” So you search Google for that phrase and get results that look like this:

Informational search results

Here you will not see any paid listings; instead, it’s all organic listings. This is called an “informational search,” which happens more often than “commercial searches.”

The last few times you used a search engine, were you looking to buy something or find information? If you answer that question, then you will understand the first part of my theory that organic listings receive more clicks than paid listings on search engines.

Applied to Business Needs

Let’s find out how individual businesses receive more clicks from the paid listings than from organic listings. The share of all clicks that goes to paid listings rather than organic listings ranges from 30% to 75%, depending on the search engine used. Let’s stick with Google, where the share ranges from 30% to 35%, and be conservative and use 30%. So, of all clicks that take place on Google, 30% go to paid listings.

Since it is difficult to know the number of clicks performed on Google each month, let’s use one billion clicks per month for this example.

Now, I know what you are saying to yourself: 700 million clicks go to organic listings and 300 million clicks go to paid listings — that’s what the SEO advocates show you. But what they don’t show you is how many sites share those clicks. That’s a huge part of the equation.

To keep it simple, Google indexes billions of sites but has only a few hundred thousand paid-listing customers, which means there is much more competition for organic clicks than for paid clicks. As a result, your business will, on average, receive more clicks on paid ads than on organic listings.
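The arithmetic behind this argument can be worked through in a few lines of Python. The one billion clicks and the 30% paid share come from the example above; the competitor counts are assumptions chosen only to match "billions of sites" and "a few hundred thousand customers" (one billion and 300,000 respectively). The result is a plain average, so it says nothing about how unevenly clicks are actually spread across sites.

```python
TOTAL_CLICKS = 1_000_000_000   # monthly clicks, from the example above
PAID_PERCENT = 30              # conservative paid share for Google, from the text

# Assumptions for illustration only; the article gives no exact counts.
INDEXED_SITES = 1_000_000_000  # "billions of sites", taken here as one billion
ADVERTISERS = 300_000          # "a few hundred thousand customers"

# Split the total into paid and organic clicks using integer arithmetic.
paid_clicks = TOTAL_CLICKS * PAID_PERCENT // 100
organic_clicks = TOTAL_CLICKS - paid_clicks

# Average clicks available to each competitor in each channel.
clicks_per_advertiser = paid_clicks / ADVERTISERS
clicks_per_site = organic_clicks / INDEXED_SITES

print(f"Paid: {paid_clicks:,} clicks shared by {ADVERTISERS:,} advertisers")
print(f"Organic: {organic_clicks:,} clicks shared by {INDEXED_SITES:,} sites")
print(f"Average per advertiser: {clicks_per_advertiser:,.1f}")
print(f"Average per indexed site: {clicks_per_site:,.1f}")
```

Under these assumed counts, the average advertiser sees about 1,000 clicks a month while the average indexed site sees less than one, even though organic listings receive 70% of all clicks.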

If your business sells something, you want to bring in customers who are conducting commercial searches. So when an SEO firm says only 30% of the clicks that happen on Google go to paid customers, they are not looking at the full picture.

SEM Extends Beyond Search Engines

There is another very important variable to look at with regard to SEO and SEM: Google gets most of its traffic from sites other than Google!

Google puts its paid listings on sites such as PC Magazine, Wired, MLB.com, MySpace and YouTube. You can find more at https://adwords.google.com/select/afc/partners.html.

Now, imagine how many clicks are generated on Google alone; your organic listings benefit only from that one site. Meanwhile, SEM benefits from all of those top-ranked sites plus Google. When you add that into the mix, you’ll understand why SEM customers have access to so many more clicks than SEO could ever access.

Richard Kahn, CEO of eZanga (www.eZanga.com), founded the company in 2003 following a decade of experience in the online advertising industry. During this time, Kahn founded an ISP called First Street Corporation, which he later sold. He then held the COO position at PPC advertising network AdOrigin. eZanga specializes in helping companies drive their bottom line through online marketing, including search marketing and contextual advertising.


Lifestream Blog Redesign Now Live

February 21st, 2009

Lifestream Blog Redesign Now Live | Lifestream Blog

So late last year I decided it was time for a site re-design. The blog turns 3 years old in March and I thought it was time so I spent the last month or so working on it.

I started the process looking for a base WordPress theme I could use. This took a looong time which is kind of strange since all I wanted was a pretty minimal theme. Luckily I finally found what I wanted in the Blue White theme by AskGraphics. And even though it offered a design I really liked, I ended up editing almost all images and template files, as well as the stylesheet. The theme has a very rigid method for forcing several elements so I had to spend way more time in Photoshop than I would have liked to. I also inserted, moved around, and removed tons of WordPress functions from the templates. In the end, I’m pretty happy with the outcome.

Along with the design changes, I also wanted to make some content changes. I removed the widgets for Google Friend Connect and MyBlogLog and opted to add the Twitter Remote widget, which provides a list of the latest Twitter users who visit the site. I wasn’t seeing much value in Google Friend Connect, I really only wanted one recent readers widget, and I currently feel Twitter users cast a wider net for inclusion on the site.

I also added a great little customizable Twitter search widget to display the latest tweets from anyone using the term “Lifestreaming”. So if you find something on the web and want to share it with people and appear on the sidebar, send a tweet.

I had previously offered both the internal WordPress search and the Lijit search service. I tweaked some of the settings for Lijit and opted to make it my only search mechanism. Lijit is, in my mind, the best search service on the web and offers functionality which resonates well with my content and readers.

In addition to the Lifestreaming Tweets widget, I also wanted to include any other useful data I could aggregate on the site. I use many tools and methods to track Lifestreaming news and wanted to find ways to share that on the site. I’m a big fan of sites that do a good job of providing a comprehensive way to aggregate data. One such site that caught my eye recently is the Lifestreaming page over at Daymix.com.

For now I have added an RSS feed to my Delicious Lifestream tag. I use this to track all the latest info so it’s probably the best way to get my real-time feed. I’m currently just using the RSS widget included with WordPress, but as I add more feeds I hope to switch to SimplePie, which is a more powerful and flexible RSS parser.

I also plan to revamp several of the other navigation pages as well. I really want to add many more services to the Define area of the site and make it more comprehensive. Many of the other pages need attention as well. But I grow tired of postponing getting this live, especially as it’s eating into what I should be doing…which is writing more posts. So here it is.

Anyway, while this is not a story on Lifestreaming, I hope it has provided some valuable insight into my re-design and the process. I always find it very useful to hear others provide this type of information. I hope you find this useful in the spirit of sharing.

A few days ago I posted a preview of the current design on FriendFeed. I wanted to thank everyone who provided me with feedback. I really appreciate it. On that same note, please leave a comment or send me an email if there is something else you’d like to have me add to the site.

If you enjoyed this post, make sure you subscribe to my RSS feed!


Twitter Finally Integrates Its Real-Time Search Engine

February 21st, 2009

Twitter Finally Integrates Its Real-Time Search Engine – ReadWriteWeb

Twitter just announced that it is slowly releasing a new interface to a subset of its users that will put Twitter Search and Trends right on users’ profiles. Until now, Twitter’s real-time search function, which was acquired from Summize last year, lived on a separate subdomain and was not fully integrated into Twitter. Clearly, Twitter has realized that real-time search is one of its core features if it wants to monetize its service successfully.

The only search function that was available from the profile pages until now was Twitter’s people search. For users who already have access to the new interface, a search box will appear at the top of the page, as well as a link to Twitter’s trends page.

According to Biz Stone, the integrated search will become available to all users in the near future, after “a bunch of us have kicked the tires a bit.”

Until now, a lot of users were probably not even aware of the fact that Twitter had a search engine, and Biz Stone explicitly mentions that private accounts will not be indexed by Twitter’s search engine. According to Stone, over 90% of all Twitter users make their updates public.

Twitter’s real-time search is probably one of Twitter’s most valuable features, as it allows you to keep track of an event as it unfolds in real-time. Here at RWW, we use it daily – either through Tweetdeck’s or Twhirl’s built-in search, or directly on the web.

Are you on Twitter? Here are the accounts of ReadWriteWeb’s writers if you’d like to follow us.


The Several Habits of Wildly Successful Twitter Users

February 19th, 2009

Twitter is a deceptively simple utility. That said, Twitter isn’t for everyone. In fact it’s probably not even for half of everyone. But for those who have the patience to find their personal sweet spot, Twitter can be quite good indeed. If you’re not familiar with Twitter, it’s pretty easy to describe. It’s instant messaging with a group. You post a short message via IM, web or other utility (see below) and other Twitterers who are “following” you will see your message. Some have called Twitter a form of microblogging and I think that’s a helpful way of looking at it. Most people don’t quite “get” Twitter at first. This post aims to give you a leg up on the learning curve. For starters, you’ll need an account and you can do that here (yeah, it’s free). Once you’ve got your account, you might feel lonely. I’m here for you, buddy. Just click “add”.

Just a little more intro before we jump into the Habits. Most Twitter newbies are underwhelmed by what they find there. It seems…useless. And unless you persevere a bit, you’ll probably walk away from your account wondering what the fuss was all about. One last note: if you’ve never used instant messaging before, this may change your mind. It may also be the last straw that convinces you to become Amish. You’ve been warned.

Habit One: Make the right friends
This habit is pretty much the engine that drives many of the other habits. It is so easy to “follow” someone in Twitter that many folks go a little bit overboard at first. If you can handle the firehose of twittering, go for it. Otherwise, don’t get wrapped up in the “more is better” trap.

* Really think about what kind of info you want pushed your way. If you’re interested in what all the cool geek kids are doing, then go ahead and follow them–they’re all there.
* You can add friends on the web by going to their profile and clicking the “add” link.
* If you know their username, you can add friends via IM by sending an IM to the twitter@twitter.com contact and typing: follow USERNAME
* You can add your friends’ cell phone numbers by sending an IM to the twitter@twitter.com contact and typing: add CELLNUMBER (where CELLNUMBER is their number)
* Alternatively, you may be the only person in your social circle who does IM, in which case you might only want to get particular types of news pushed at you. There are currently several Twitter accounts that will serve up various flavors of news. You can follow the BBC (there are a bunch of flavors, just search for ‘bbc’ in the Twitter search box), the Digg frontpage, or CNN breaking news (again, type ‘cnn’ in the search box to get the most updated options). There’s no ESPN, Fark or Slashdot yet (at least that I can find, but I’m sure they’re coming).
* The search box is your friend, so use it. Also, don’t be shy about bouncing around and looking at who’s following who. You’ll discover a lot of interesting people out there. Just remember that you can always turn down or turn off the firehose of information if it becomes overwhelming. To turn it off, just send this IM to Twitter: off. And I bet you can guess how to turn it back on…

Habit Two: Put it where you want it
In its most basic form, Twitter is a webpage with a text box where you can enter a little message. But you don’t need to stop there.

* Once you’ve got your account set up, you can add twitter@twitter.com as a contact in whatever IM software you use (I use Adium on the Mac, and it’s awesome).
* Additionally, you can get your Twitter pushed to your phone, if that’s how you want it. Just check the appropriate bits in your device’s settings. You can send SMS to 40404 if you’re in the US. Outside the US, use +44 7781 488126.
* If you use Gmail and GoogleTalk, you can add twitter@twitter.com as a contact and get your stuff there.
* If you’ve got a Blackberry, you should check out the Google Talk client and use it there. As if you needed another reason to fiddle with that thing in meetings.
* Same deal with Treos, actually. Any of the Treo IM clients will work with Twitter. If you must, just Google “Treo IM” and you’ll get a lot of leads. I’ve tried a few, but it’s really more than I can take. Good luck to you.
* On the Mac you’ve got a couple of standalone options. There’s Twitterific and Twitterpost. Regardless of which one you use, be sure to add the corresponding Twitter user to your friends list in Twitter. Twitterific is at http://twitter.com/twitterific and Twitterpost is at http://twitter.com/twitterpost. You’ll get updates on bugfixes, etc by adding them.
* If you’re rocking the PC you might check out Twitteroo. As with the Mac options, add the Twitteroo user as a friend in Twitter: http://twitter.com/twitteroo. There are no updates as I write this, but I’m sure that’ll change.
* If you want to take all this just a little bit slower, you can just grab the RSS feed for all the folks that you’re following, or for individual users. If you don’t IM, and you just want to follow a few people, you can grab their feeds–just go to their Twitter page and look at the bottom left corner for the RSS link. Subscribing to your own RSS feed (the one that contains all the twitters from the people you’re following) is a good idea. It acts as an archive of the content which can be quite helpful.

Habit Three: Own it
If you’re unafraid of spreading yourself around the web, be sure to claim your Twitter page with Technorati and expand your digital empire. If you don’t have a Technorati account, just sign up (free). Once you’ve got your account go to your blog settings and at the bottom of the page there’s a place to put in your blog url. This is just http://twitter.com/YOUR-USERNAME. Replace YOUR-USERNAME with, you know, your Twitter username. You’ll then be presented with two options for claiming your blog (which is just your Twitter home page). Choose the posting method. Copy the code and send it to Twitter via any of the methods described above. Done.

Habit Four: Address your followers
By default, when you send a note via Twitter it goes to everyone that’s following you. If you just want to send a note directly to someone, you can reach them via the direct messages web interface. Additionally, if you’re twittering via IM, you can use the direct command to send private-ish messages to a contact. Just type: D USERNAME your message here. That’ll send a message directly to that person without bothering all your other followers. If you don’t mind everyone else seeing the message, you can publicly address that person by following the standard “@” convention. That is, just type: @USERNAME: your message here. That way everyone knows who you’re talking to.
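The commands mentioned so far (follow, off, D and the "@" convention) all follow the same pattern: a keyword or prefix, a username, and sometimes a message. A minimal Python sketch of how a client might tell them apart is below. The parse_message function and its return format are hypothetical, invented for this illustration; this is not how Twitter itself processes messages.

```python
import re

def parse_message(text):
    """Classify a message using the conventions described in this post.

    Returns a (kind, username, body) tuple. The function name and the
    tuple layout are assumptions made for illustration.
    """
    text = text.strip()

    # "off" turns updates off.
    if text.lower() == "off":
        return ("off", None, None)

    # "follow USERNAME" adds a friend.
    m = re.match(r"follow\s+(\w+)$", text, re.IGNORECASE)
    if m:
        return ("follow", m.group(1), None)

    # "D USERNAME your message here" is a direct, private-ish message.
    m = re.match(r"d\s+(\w+)\s+(.+)$", text, re.IGNORECASE | re.DOTALL)
    if m:
        return ("direct", m.group(1), m.group(2))

    # "@USERNAME: your message here" publicly addresses one person.
    m = re.match(r"@(\w+):?\s+(.+)$", text, re.DOTALL)
    if m:
        return ("reply", m.group(1), m.group(2))

    # Anything else is an ordinary update sent to all followers.
    return ("update", None, text)

for sample in ["follow carol", "D alice lunch at noon?",
               "@bob: nice post", "Just landed"]:
    print(parse_message(sample))
```

One design consequence worth noticing: under this scheme any ordinary update that happens to start with "d" followed by a space would be treated as a direct message, so prefix commands need care.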

Habit Five: Hack it
People are doing interesting and innovative stuff with Twitter. Feel free to leave a comment if you know about cool Twitter hacks that everyone should know about!

* Check out this recent post over at Lifehack.org that covers “five ways to use Twitter for good.” I particularly like the ideas of friendsourcing and quick human answers.
* Also, various mashups of Twitter search results, RSS feeds and jedi tricks with stuff like Yahoo Pipes can result in some very interesting customized info streams. Check out Christopher S. Penn’s post on Twitter Power Tips.
* Use your skills to take your favorite RSS feed (or spliced feed) and HTTP POST (via API) to create a custom Twitter account that anyone can befriend. (Hint: this can be a nice traffic builder…but only if you have the skillz.)

Habit Six: Play with it

* Use the Firefox Search Plugin to post directly to your Twitter account (so cool)
* Use the Twitter confessional to get clean
* Participate in the Twitter fan wiki (tons more tools in there)
* View info about Twitterers by posting: whois USERNAME This will return the standard Twitter bio for that user
* Try Steve’s Twitter search engine
* Search for friends and coworkers. Heck, search for the guy that just sent you his resume.
* Check out the Twitter help page for more tips on commands. Also see this Library Clips post for more of the same.

So there you go. This is by no means a comprehensive list of Twitterliciousness, but I hope it gets you thinking in new directions, and helps give you a leg up on figuring out Twitter. As a final sign off, here’s a brief wishlist that I’d like to see built into Twitter:

* A way to create an account with an RSS feed for auto-posting (rather than having to work with the API)
* A decent integrated search engine
* Some kind of groups feature with privacy options. It’d be nice to have a Twitter group for my office. I know this can be hacked, but we are not a technical people at my office.


On the Edge

February 6th, 2009

Op-Ed Columnist – On the Edge – NYTimes.com

A not-so-funny thing happened on the way to economic recovery. Over the last two weeks, what should have been a deadly serious debate about how to save an economy in desperate straits turned, instead, into hackneyed political theater, with Republicans spouting all the old clichés about wasteful government spending and the wonders of tax cuts.

Paul Krugman

It’s as if the dismal economic failure of the last eight years never happened — yet Democrats have, incredibly, been on the defensive. Even if a major stimulus bill does pass the Senate, there’s a real risk that important parts of the original plan, especially aid to state and local governments, will have been emasculated.

Somehow, Washington has lost any sense of what’s at stake — of the reality that we may well be falling into an economic abyss, and that if we do, it will be very hard to get out again.

It’s hard to exaggerate how much economic trouble we’re in. The crisis began with housing, but the implosion of the Bush-era housing bubble has set economic dominoes falling not just in the United States, but around the world.

Consumers, their wealth decimated and their optimism shattered by collapsing home prices and a sliding stock market, have cut back their spending and sharply increased their saving — a good thing in the long run, but a huge blow to the economy right now. Developers of commercial real estate, watching rents fall and financing costs soar, are slashing their investment plans. Businesses are canceling plans to expand capacity, since they aren’t selling enough to use the capacity they have. And exports, which were one of the U.S. economy’s few areas of strength over the past couple of years, are now plunging as the financial crisis hits our trading partners.

Meanwhile, our main line of defense against recessions — the Federal Reserve’s usual ability to support the economy by cutting interest rates — has already been overrun. The Fed has cut the rates it controls basically to zero, yet the economy is still in free fall.

It’s no wonder, then, that most economic forecasts warn that in the absence of government action we’re headed for a deep, prolonged slump. Some private analysts predict double-digit unemployment. The Congressional Budget Office is slightly more sanguine, but its director, nonetheless, recently warned that “absent a change in fiscal policy … the shortfall in the nation’s output relative to potential levels will be the largest — in duration and depth — since the Depression of the 1930s.”

Worst of all is the possibility that the economy will, as it did in the ’30s, end up stuck in a prolonged deflationary trap.

We’re already closer to outright deflation than at any point since the Great Depression. In particular, the private sector is experiencing widespread wage cuts for the first time since the 1930s, and there will be much more of that if the economy continues to weaken.

As the great American economist Irving Fisher pointed out almost 80 years ago, deflation, once started, tends to feed on itself. As dollar incomes fall in the face of a depressed economy, the burden of debt becomes harder to bear, while the expectation of further price declines discourages investment spending. These effects of deflation depress the economy further, which leads to more deflation, and so on.

And deflationary traps can go on for a long time. Japan experienced a “lost decade” of deflation and stagnation in the 1990s — and the only thing that let Japan escape from its trap was a global boom that boosted the nation’s exports. Who will rescue America from a similar trap now that the whole world is slumping at the same time?

Would the Obama economic plan, if enacted, ensure that America won’t have its own lost decade? Not necessarily: a number of economists, myself included, think the plan falls short and should be substantially bigger. But the Obama plan would certainly improve our odds. And that’s why the efforts of Republicans to make the plan smaller and less effective — to turn it into little more than another round of Bush-style tax cuts — are so destructive.

So what should Mr. Obama do? Count me among those who think that the president made a big mistake in his initial approach, that his attempts to transcend partisanship ended up empowering politicians who take their marching orders from Rush Limbaugh. What matters now, however, is what he does next.

It’s time for Mr. Obama to go on the offensive. Above all, he must not shy away from pointing out that those who stand in the way of his plan, in the name of a discredited economic philosophy, are putting the nation’s future at risk. The American economy is on the edge of catastrophe, and much of the Republican Party is trying to push it over that edge.


Search Engine Journal: how to market your Facebook pages

February 6th, 2009
February 3rd, 2009 by Ann Smarty

To me Facebook is the place that unites my relatives, friends and colleagues – people that might build a great community around my website. Here are some tools that can help you share your website with your Facebook buddies:

1. The Networked Blogs application lets you create a community around your blog.

What you can do:

  • Verify your authorship by asking your friends to confirm it (you can send a maximum of 20 confirmation requests in bulk):
  • Invite friends: you can invite all your friends at once (of course, they will have to sign up for the app to both confirm your authorship and join your blog network).
  • Add your blog details (blog name, URL, tagline and description, feed URL, up to 3 tags).

Networked blogs application

2. The RSS connect application lets you add a blog or any RSS feed to your wall or boxes tab, or create a fully customizable tab dedicated to your feeds.

You will be able to:

  1. Add a box to your profile wall;
  2. Add a box to your boxes tab;
  3. Create a customizable tab for your feed.

RSS connect application

3. Simplaris Blogcast (there are many similar apps but this one is my favorite) integrates an existing blog into Facebook “to increase your visibility and share your content with all your friends”. Make sure to configure it to make the profile links go to your site directly:

Simplaris Application - settings

4. “Share a link” option can be accessed from your profile page:

  • Specify the link;
  • Hit “preview”;
  • (Important) Choose the correct image;
  • Click “post”:

Share a link - Facebook

Check out this post for more awesome Facebook apps for doing business.


How Google crawls the deep web

February 1st, 2009

A googol of Googlers published a paper at VLDB 2008, “Google’s Deep-Web Crawl” (PDF), that describes how Google pokes and prods at web forms to see if it can find things to submit in the form that yield interesting data from the underlying database.

An excerpt from the paper:

This paper describes a system for surfacing Deep-Web content; i.e., pre-computing submissions for each HTML form and adding the resulting HTML pages into a search engine index.

Our objective is to select queries for millions of diverse forms such that we are able to achieve good (but perhaps incomplete) coverage through a small number of submissions per site and the surfaced pages are good candidates for selection into a search engine’s index.

We adopt an iterative probing approach to identify the candidate keywords for a [generic] text box. At a high level, we assign an initial seed set of words as values for the text box … [and then] extract additional keywords from the resulting documents … We repeat the process until we are unable to extract further keywords or have reached an alternate stopping condition.

A typed text box will produce reasonable result pages only with type-appropriate values. We use … [sampling of] known values for popular types … e.g. zip codes … state abbreviations … city … date … [and] price.

Table 5 in the paper shows the effectiveness of the technique: the authors are able to retrieve a significant fraction of the records in small and normally hidden databases across the Web with 500 or fewer submissions to the form. The authors also say that “the impact on our search traffic is a significant validation of the value of Deep-Web content.”
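The iterative probing loop quoted in the excerpt can be sketched roughly as follows. This is only an illustration, not the paper's implementation: `iterative_probe`, `toy_submit`, `RECORDS`, and the "most frequent words on the page" heuristic for picking new keywords are all assumptions made for the example, and the 500-submission budget simply mirrors the figure mentioned above.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple


def iterative_probe(
    submit: Callable[[str], str],
    seeds: Iterable[str],
    max_submissions: int = 500,
    keywords_per_page: int = 5,
) -> Tuple[Set[str], List[str]]:
    """Probe one text box: submit keywords, mine results for new keywords."""
    tried: Set[str] = set()
    queue: List[str] = list(dict.fromkeys(seeds))  # de-duplicated, order kept
    pages: List[str] = []

    # Stop when no keywords are left or the submission budget is spent.
    while queue and len(tried) < max_submissions:
        word = queue.pop(0)
        tried.add(word)
        page = submit(word)
        if not page:
            continue  # empty result page: nothing to index or mine
        pages.append(page)

        # Assumed heuristic: the most frequent words on the result page
        # become candidate keywords for later submissions.
        counts = Counter(re.findall(r"[a-z]+", page.lower()))
        for candidate, _ in counts.most_common(keywords_per_page):
            if candidate not in tried and candidate not in queue:
                queue.append(candidate)
    return tried, pages


# Toy stand-in for a form backed by a hidden database (hypothetical data).
RECORDS = ["red apple pie", "green apple tart", "tart cherry jam", "plum jam"]


def toy_submit(word: str) -> str:
    """Return every record containing the word, joined into one 'page'."""
    return " ".join(r for r in RECORDS if word in r.split())


keywords, pages = iterative_probe(toy_submit, seeds=["apple"])
surfaced = {r for r in RECORDS if any(r in p for p in pages)}
print(sorted(keywords))
print(len(surfaced), "of", len(RECORDS), "records surfaced")
```

Starting from the single seed “apple”, the loop reaches records that never mention the seed (such as “plum jam”) by following keywords found on earlier result pages. A real crawler would need a far more careful keyword-selection step and a politeness policy toward the sites it probes.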

Please see also my April 2008 post, “GoogleBot starts on the deep web”.

SOURCE

View post:
How Google crawls the deep web

the future of search

September 11th, 2008 No comments
9/10/2008 12:15:00 PM

The Internet has had an enormous impact on people’s lives around the world in the 10 years since Google’s founding. It has changed politics, entertainment, culture, business, health care, the environment and just about every other topic you can think of. Which got us to thinking, what’s going to happen in the next 10 years? How will this phenomenal technology evolve, how will we adapt, and (more importantly) how will it adapt to us? We asked 10 of our top experts this very question, and over the next three weeks we will present their responses. As computer scientist Alan Kay has famously observed, the best way to predict the future is to invent it, so we will be doing our best to make good on our experts’ words every day. – Karen Wickre and Alan Eagle, series editors.

I am a search addict. I’m naturally inquisitive – I’ve always liked finding things out. Plus, I’ve worked at Google on search for the past 9 years and 3 months. Of course I search – a lot. Yet I would guess that on any given day, I only do about 20% of the searches that I could. This past Saturday, I kept track of the things that came up in conversation that I wanted to search for right then but couldn’t:

  • Are “fab,” “goy” and “eely” words? (There was a Scrabble game going on.)
  • What time does J.C. Penney open on Saturday?
  • Which school has a team called the Banana Slugs?
  • What is the team mascot for San Jose State?
  • How much power does that hydroelectric dam generate?
  • What do you call a group of turkeys?
  • What time does Tropic Thunder show?
  • What’s the name of that great Irish flute player, first name James?
  • What’s the name of the largest city in Russia after Moscow and St. Petersburg?
  • Which is older, a redwood or a cypress?
  • What’s the oldest living thing and how old is it?
  • Who sings “Queen of Hearts”?
  • What kind of bird is that flying over there?
  • Is the “LF” in San Francisco on Union Square or Union Street?
  • What are the dance steps to the Charleston?
  • What day of the week was The Lawrence Welk Show on?
  • What are the lyrics to “In the Mood”?
  • How does Coumadin differ from aspirin in its blood thinning effects?
  • What was the story behind the naming of the number “googol”?

And those are just the ones that I remember. Looking at this list, two things are very clear: (1) I could do a lot more searches and (2) search still has a lot of opportunity for innovation, change, and progress. There are lots of ways that search will need to evolve in order to easily meet user needs. Let’s look at some of my unanswered questions from Saturday and consider how search might change over the next 10 years.

Modes
First, why couldn’t I do these searches right then, when I needed to? Because search still isn’t accessible enough or easy enough. Search needs to be more mobile – it should be available and easy to use in cell phones and in cars and on handheld, wearable devices that we don’t even have yet. For example, when the topic of the oldest living thing came up during a boat ride, everyone in the conversation was curious about it, but no one wanted to break out an awkward, slow device to do a search. It would be much nicer if we had a device with great connectivity that could do searches without interruption. One far-fetched idea: how about a wearable device that does searches in the background based on the words it picks up from conversations, and then flashes relevant facts?

This notion brings up yet another way that “modes” of search will change – voice and natural language search. You should be able to talk to a search engine in your voice. You should also be able to ask questions verbally or by typing them in as natural language expressions. You shouldn’t have to break everything down into keywords.

Further, why should a search be words at all? Why can’t I enter my query as a picture of the birds overhead and have the search engine identify what kind of bird it is? Why can’t I capture a snippet of audio and have the search engine identify and analyze it (a song or a stream of conversation) and tell me any relevant information about it? Services that do parts of that are available today, but not in an easy-to-use, integrated way.

In the next 10 years, we will see radical advances in modes of search: mobile devices offering us easier search, Internet capabilities deployed in more devices, and different ways of entering and expressing your queries by voice, natural language, picture, or song, just to name a few. It’s clear that while keyword-based searching is incredibly powerful, it’s also incredibly limiting. These new modes will be one of the most sweeping changes in search.

Media
Then there’s the media aspect. The 10 blue links offered as results for Internet search can be amazing and even life-changing, but when you are trying to remember the steps to the Charleston, a textual web page isn’t going to be nearly as helpful as a video. The media of the results matters.

Universal search, which we released last May, was an important first step that included images, videos, news, books, and maps/local information in our main Google search results. Yet our presentation is still very linear (the results are just a list) and even (no one result is more important or larger than the next). What if the results page began to transform radically to really harness these different types of results into something that felt much more like an answer rather than just 10 independent guesses? What if results pages pulled the best media together and laid it out such that the most useful content was not only first but largest? What if we laid out content in columns to use more of the width available on newer, wider screens?

We’ve barely scratched the surface with universal search, but it’s an important first step to exploring the full range of what we can do with rich media. For the past year, our goal has been to take advantage of these new types of results and evolve the interface design and user experience in response. You’ll see the fruits of this experimentation in the coming months, but even these changes are just the beginning. The face of search will change dramatically over the next 10 years. Maybe it should contain even more videos and images, maybe it should sharply differentiate the relative weight and accuracy of the results more, maybe it should be more interactive in terms of refinements? We’re not sure yet, but we do know that the one thing that the search experience can’t be – especially in the face of the online media explosion we’re currently experiencing – is stagnant.

Personalization
Search engines 10 years from now will be a lot better than the ones we have now. We know this because Google itself gets a little better each day. We’re constantly writing and revising new notions of search relevance, and we release improvements almost daily. Those improvements add up for us and for other search engines, so it follows that search engines 10 years from now will be markedly better. Therefore, the real question is not will search be better, but rather how will it be better?

One answer is clear: search engines of the future will be better in part because they will understand more about you, the individual user. Of course, you will be in control of your personal information, and whatever personal information the search engine uses will be with your permission and will be transparent to you. But even with the most rudimentary user information, search engines can and will provide drastically better search results. Maybe the search engines of the future will know where you are located, maybe they will know what you know already or what you learned earlier today, or maybe they will fully understand your preferences because you have chosen to share that information with us. We aren’t sure which personal signals will be most valuable, but we’re investing in research and experimentation on personalized search now because we think this will be very important later.

Location
Your location is one potentially useful facet of personalized information. Looking at my questions, the answers to a number of them (What time does J.C. Penney open? How much power does that hydroelectric dam generate? What time does Tropic Thunder play?) require the search engine to know that I was in Yankton, South Dakota and Crofton, Nebraska when I asked. Since location is relevant to a lot of searches, incorporating user location and context will be pivotal in increasing the relevance and ease of search in the future.

Social
Another element of personalization is social context. Who am I friends with, and how do I relate to them? How can I harness their knowledge more efficiently? For example, I have a friend who works at a store called LF in Los Angeles (hence, the question about LF in San Francisco). By itself, “LF” is a very ambiguous acronym. According to the first page of search results on Google, it could refer to my friend’s trendy fashion store, but it could also refer to Leapfrog Enterprises, low frequency, Lebhar-Friedman, Li & Fung Investment Group, LF Driscoll Construction Management, large format, or a future concept car design from Lexus. Today, the person typing “LF” has to figure out which is the right result – to “disambiguate” the ambiguous term – but this is something that the search engine needs to get better at. Perhaps we’ll understand the semantics of the question about where LF in San Francisco is, and infer that LF is a store. Or maybe, search could analyze my social graph and realize that one of my friends works at LF, that I saw that friend this weekend, and that in that context “LF” refers to her place of employment. Algorithmic analysis of the user’s social graph to further refine a query or disambiguate it could prove very useful in the future.

In addition, there are searches where actually asking a friend helps. I was having a hard time finding out the answer to the question about aspirin versus Coumadin because I was spelling it ‘cumitin’ and Google wasn’t correcting me. A quick email to a doctor friend, and I was back on the right track – equipped with the right spelling and his explanation of the difference, so I could search and learn even more about how these two drugs are used to thin blood. There’s a lot of expertise, knowledge, and context in users’ social graphs, so putting tools in place to make “friend-augmented” search easy could make search more efficient and more relevant.

Language
The above examples show how modes, media, and various forms of personalization have the potential to vastly improve search – but what about language? We know there are cases where an answer exists on the web, but not in a language you read. This is why Google is investing in machine translation. We want to be able to unlock the power of web search for anyone speaking any language. The basic concept is – if the answer exists online anywhere in any language, we’ll go get it for you, translate it and bring it back in your native tongue. This is an incredibly empowering idea that could really change the way that users experience the web and communicate with each other, particularly in languages where not a lot of native content is available. You can see our early explorations in this space here, by visiting our cross-language information retrieval tool.

Conclusion
We’re all familiar with 80-20 problems, where the last 20% of the solution is 80% of the work. Search is a 90-10 problem. Today, we have a 90% solution: I could answer all of my unanswered Saturday questions, not ideally or easily, but I could get it done with today’s search tool. (If you’re curious, the answers are below.) However, that remaining 10% of the problem really represents 90% (in fact, more than 90%) of the work. Coming up with elegant, fitting and relevant solutions to meet the challenges of mobility, modes, media, personalization, location, socialization, and language will take decades. Search is a science that will develop and advance over hundreds of years. Think of it like biology and physics in the 1500s or 1600s: it’s a new science where we make big and exciting breakthroughs all the time. However, it could be a hundred years or more before we have microscopes and an understanding of the proverbial molecules and atoms of search. Just like biology and physics several hundred years ago, the biggest advances are yet to come. That’s what makes the field of Internet search so exciting.

So what’s our straightforward definition of the ideal search engine? Your best friend with instant access to all the world’s facts and a photographic memory of everything you’ve seen and know. That search engine could tailor answers to you based on your preferences, your existing knowledge and the best available information; it could ask for clarification and present the answers in whatever setting or media worked best. That ideal search engine could have easily and elegantly quenched my withdrawal and fueled my addiction on Saturday. I’m very proud that Google in its first 10 years has changed expectations around information and how quickly and easily it should be able to be retrieved. But I’m even more excited about what Google search can achieve in the future.

And here, in order, are the answers to my Saturday questions.

Are fab, goy, and eely words? Yes, yes, and yes, according to Merriam-Webster:
Search: [fab site:m-w.com]
Result:
http://dev.m-w.com/dictionary/fab
Search: [goy site:m-w.com]
Result:
http://dev.m-w.com/dictionary/goy
Search: [eely site:m-w.com]
Result:
http://dev.m-w.com/dictionary/eely

What time does J.C. Penney open on Saturday? 10 a.m.
Search: [jc penney yankton]
Hours on results page:
http://www.google.com/search?q=jcpenney+yankton

Which school has a team called the Banana Slugs? University of California, Santa Cruz
Search: [banana slugs]
Result:
http://en.wikipedia.org/wiki/University_of_California,_Santa_Cruz

What is the team mascot for San Jose State? The San Jose State Spartans
Search: [san jose state mascot]
On results page:
http://www.google.com/search?q=san+jose+state+mascot

How much power does that hydroelectric dam generate? $35M of electricity annually
Search: [hydroelectric dam crofton yankton]
Search: [gavins point dam]
Result:
https://www.nwo.usace.army.mil/html/Lake_Proj/gavinspoint/welcome.html

What do you call a group of turkeys? A rafter of turkeys
Search: [group of turkeys]
On results page:
http://www.google.com/search?q=group+of+turkeys

What time does Tropic Thunder show? 7 p.m.
Search: [movies yankton mall]
Result:
http://www.moviefone.com/theater/carmike-cinemas-yankton-mall-5/9346/showtimes

What’s the name of that great Irish flute player, first name James? James Galway
Search: [irish flute player james]
On results page:
http://www.google.com/search?q=irish+flute+player+james

What’s the name of the largest city in Russia after Moscow and St. Petersburg? Novosibirsk
Search: [largest Russian cities]
Result:
http://en.wikipedia.org/wiki/List_of_cities_and_towns_in_Russia_by_population

What’s older, a redwood or a cypress? Cypresses (4500 years old is oldest known) are older than redwoods (2200 years old is oldest known)
Search: [cypress tree age]
Result:
http://www.payvand.com/news/08/apr/1253.html
Search: [redwood tree age]
Result:
http://www.sempervirens.org/sequoiasemp.htm

What’s the oldest living thing and how old is it? The bristlecone pine, living for 5,000-11,000 years
Search: [oldest living thing]
Result:
http://waynesword.palomar.edu/ww0601.htm
http://hubpages.com/hub/Oldest_living_thing

Who sings “Queen of Hearts”? Juice Newton
Search: ["queen of hearts" song]
On results page:
http://www.google.com/search?q=%22queen+of+hearts%22+song

What kind of bird is that flying over there? A turkey vulture
Search: [turkey vulture flying] on Google image search
Pictures that match on results page:
http://images.google.com/images?q=turkey%20vulture%20flying

Is the LF in San Francisco on Union Square or Union Street? 1870 Union Street
Search: [lf san francisco]
Address on results page:
http://www.google.com/search?q=lf+san+francisco

What are the dance steps to the Charleston? Shown in the video below
Search: [Charleston dance demonstration]
Video result:
http://uk.youtube.com/watch?v=zzyg7l6qxNQ

What day of the week was The Lawrence Welk Show on? Saturday
Search: [lawrence welk show]
Result:
http://en.wikipedia.org/wiki/The_Lawrence_Welk_Show

What are the lyrics to “In the Mood”?
“In the mood, that’s what he told me,
In the mood, and when he told me,
In the mood, my heart was skippin’,
It didn’t take me long to say “I’m in the mood now”.”
Search: [“in the mood” lyrics]
Result:
http://www.lyricsdepot.com/glenn-miller/in-the-mood.html

How does Coumadin differ from aspirin in its blood thinning effects? Aspirin is an anti-platelet agent that prevents clotting. Coumadin also prevents clotting but the mechanism is different. Both thin the blood, but Coumadin is stronger and much more effective in certain instances like atrial fibrillation.
Search: [aspirin Coumadin how different]
Result:
http://www.stmaryhealthcare.org/body.cfm?id=250

Go here to read the rest:
the future of search

Former Employees of Google Prepare Rival Search Engine

July 29th, 2008 No comments

SAN FRANCISCO — In her two years at Google, Anna Patterson helped design and build some of the pillars of the company’s search engine, including its large index of Web pages and some of the formulas it uses for ranking search results.

The makers of the Cuil search engine say it should provide better results and show them in a more attractive manner.

Now, along with her husband, Tom Costello, and a few other Google alumni, she is trying to upstage her former employer.

On Monday, their company, Cuil, is unveiling a search engine that they promise will be more comprehensive than Google’s and that they hope will give its users more relevant results.

“I think it will be better,” Mr. Costello said in an interview. “But there is no question that the public has to decide.”

Cuil, pronounced “cool,” is only the latest in a long string of start-up companies that have been founded and financed with the goal of competing with Google, as well as Yahoo and Microsoft. (In June, Google accounted for 61.5 percent of search queries in the United States, while Yahoo held 20.9 percent and Microsoft had 9.2 percent, according to comScore.) Some of the most prominent include Powerset, which Microsoft recently bought, and Wikia, which was founded by Jimmy Wales, one of the creators of Wikipedia. So far, none have managed to make a dent in the search market.

But some analysts say Cuil has potential, in part because of the pedigree of its founders.

“This is the most promising thing I’ve seen in a while,” said Danny Sullivan, who has followed the online search business for more than a decade and is the editor of Search Engine Land. “Whether they are going to threaten Microsoft, much less Google, that’s another story.”

Mr. Costello, a former researcher at Stanford, said that with 120 billion Web pages, Cuil’s search index is larger than any other. The company uses a form of data mining to group Web pages by content, which makes the search engine more efficient, he said. Instead of showing results as short snippets of text and images with links, it displays longer entries and uses more pictures. It also provides tools to help users further refine their queries.

Google would not comment on Cuil and would not disclose the size of its own index. But in an e-mail statement, Google said that it maintained “the largest collection of documents searchable on the Web” and welcomed competition.

Mr. Sullivan said he was unimpressed by Cuil’s claim that its index includes more Web pages, noting that that could mean users are “overwhelmed by a whole bunch of junk.” But he said that Cuil’s new approach to ranking pages and presenting results could prove to be a hit with some users.

“If it turns out that they have good relevancy, I could see that the word of mouth” would bring Cuil some popularity, he said.

Ms. Patterson left Google in 2006 to found Cuil. The new company has other prominent ex-Google employees, including Russell Power, who worked with Ms. Patterson on the large Google index, and Louis Monier, a former chief technology officer at AltaVista, a pioneering search engine. Cuil, which has about 30 employees and is in Menlo Park, Calif., has raised $33 million from venture investors.

Continued here:
Former Employees of Google Prepare Rival Search Engine
