Why is searching on the web so retarded?



Postby gramps » Mon Apr 15, 2019 7:18 pm

Caligari87 wrote: If Google negates your "exact search in quotations" it's because it legitimately couldn't find that result, usually meaning that it hasn't indexed whatever page you think it might be on.


gramps wrote: Here's an interesting experiment: Try searching "bud big" -- you'll get pages and pages of "big bud" with no "bud big" in sight. You'd think the words "bud big" had never appeared on the web in that order.

But try searching "bud big red" - suddenly you'll see "Bud Big Red Box 24," "my best Bud Big Red," and so on.


As you can see here, it has indexed pages with "bud big," but doesn't return them when searching for that term (at least, not before returning pages and pages of "big bud").

It's more than just not having indexed pages. It has decided that "big bud" is a common phrase, and "bud big" is not (which is true). So it decides I must actually have meant to search for "big bud" (which is not necessarily true). This is not how exact search always behaved -- it may have always thrown out symbols when indexing stuff, but this transposition thing is a new problem.

If any amount of advanced searching or search operators will bring back exact search, I'd love to hear about it. Exclusion gets you maybe halfway there but has its own issues.

Postby Caligari87 » Tue Apr 16, 2019 5:01 am

But "bud big" in quotes does return exact results: https://www.google.com/search?q=%22bud+big%22



So far as I know, symbols and punctuation have always been excluded from exact text searches.

8-)

Postby gramps » Tue Apr 16, 2019 8:55 am

Maybe that search is misleading; try this one:

"blue green red" monitor

It does an alright job of finding blue, green, red in that order, but there are also results where those words never appear in that order. The sixth one down for me (using a private browsing / incognito window) is a link to Reddit where those words never appear in that order. There are more on the next few pages of results. Most of the results do contain "blue green red," but some contain only "red green blue," and these are ranked above further "blue green red" results.

It's the same for "bud big": once you get past the first page or so of "big bud big bud" matches, it's just finding "big bud" straight out, even though we already know from specially crafted searches that there are more "bud big" pages it hasn't shown. This word transposition is definitely a thing now, and it's annoying.

This page for advertisers gives an idea of what Google's notion of "exact search" is: https://support.google.com/google-ads/a ... 7825?hl=en

Of course we don't know if they use the exact same magic sauce for organic search results, but it sure looks like something close. Advertisers have taken to using that same exclusion trick, going to great lengths to automate the process of generating large lists of exclusions by creating scripts to generate all possible transposition combinations and so on.
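To give an idea of what such a script might look like, here's a minimal sketch in Python. The function names `transposition_exclusions` and `build_query` are made up for illustration, and the only search syntax assumed is the usual quoted phrase plus the `-"..."` exclusion operator. It is not what any advertiser tool actually does, just the basic idea of generating every reordering of a phrase and excluding each one.

```python
from itertools import permutations


def transposition_exclusions(phrase):
    """Return every reordering of the phrase's words except the original order."""
    words = phrase.split()
    original = tuple(words)
    # set() drops duplicate orderings when the phrase repeats a word.
    reordered = sorted(set(permutations(words)) - {original})
    return [" ".join(p) for p in reordered]


def build_query(phrase):
    """Quote the wanted phrase and exclude each transposed variant (hypothetical helper)."""
    exclusions = " ".join(f'-"{v}"' for v in transposition_exclusions(phrase))
    return f'"{phrase}" {exclusions}'.strip()


print(build_query("bud big"))
print(len(transposition_exclusions("blue green red")))
```

The number of variants grows factorially with the number of words, and excluding "big bud" also throws away any page that happens to contain both orderings, which is one of the issues with the exclusion trick.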

Postby kb1 » Fri Apr 26, 2019 2:45 am

Here are the problems, as I see them:
* There's too much data out there
* Too many people *want* Google to help them. These are the people that put spaces at the beginning of text fields, can't tell the difference between a comma and a semicolon, and click on every "your PC is infected" link they see.
* My feeling is that they are using some incredibly complex methods to speed up search. Searches are massively multi-processed, and they must be lightning fast. Multiple-depth searches are costly.

If you think of how common certain search criteria are:
* Single words are extremely common.
* Groups of certain words are common, and probably searched as if they were one word. My guess is that they rearrange words with an algorithm that matches how pages are indexed.

For example, imagine if they did something as retarded as rearranging words in alphabetical order.
"Discussion about ZDoom" becomes "about discussion zdoom". Then, they drop useless words, like "about". That leaves "discussion zdoom". If they indexed the page this way, the page could be found by searching for:
* "Discussion about ZDoom"
* "ZDoom Discussion"
* "About ZDoom Discussion"
* "Discussion"

They may also convert all "discussion" words into the root: "discussion, discussing, discusses, discussed" into "discuss". Then, you could search "Discuss ZDoom" and find this page.
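Here's a rough Python sketch of that imagined scheme, purely to illustrate the guess above. Nothing here reflects how Google actually indexes anything: the stop-word list, the suffix rules, and the names `crude_stem`, `index_key` and `matches` are all invented for the example.

```python
STOP_WORDS = {"about", "the", "a", "an", "of"}  # illustrative list only


def crude_stem(word):
    """Very crude suffix stripping; real stemmers are far more careful."""
    for suffix in ("ing", "ion", "ed", "es"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    # Strip a plural "s" but leave words like "discuss" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 4:
        return word[:-1]
    return word


def index_key(phrase):
    """Lowercase, drop stop words, stem, then sort alphabetically."""
    words = [w.lower() for w in phrase.split()]
    kept = [crude_stem(w) for w in words if w not in STOP_WORDS]
    return tuple(sorted(set(kept)))


def matches(query, page_title):
    """A page matches when every normalized query word is in its key."""
    return set(index_key(query)) <= set(index_key(page_title))


print(index_key("Discussion about ZDoom"))
print(matches("ZDoom Discussion", "Discussion about ZDoom"))
```

With an index like this, word order is gone before the search even starts, so an exact phrase could only be checked in some later stage.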

A secondary page search may try harder to match your exact phrase, and bump up the ranking. There are hundreds of tricks like this, and Google uses lots of them. Searching for your exact phrase can happen in any stage of a multiple-stage process... who knows which stage they'll use for your particular phrase? Obviously not the first stage :)

It is actually pretty amazing just how well it does, at the speed it does it. But yes, it does have tons of flaws. Finding an exact phrase requires that each word found by Google is indexed, in order, for every page, and stored in a database. This is in addition to all the other types of indexes that need to be stored. Maybe, instead, it stores data using some sort of word tree (all pages about "discuss", cross-referenced with all pages about zdoom, doom, gzdoom, boom, etc.)
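For what "each word indexed, in order, for every page" could mean in practice, here's a small sketch of a positional index in Python. It's a textbook-style illustration under my own assumptions (whitespace tokenizing, tiny in-memory dictionaries, invented names `build_index` and `phrase_search`), not a description of any real search engine.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to {page_id: [positions of that word on the page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][page_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return page ids where the words appear consecutively, in order."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    # Only pages containing every word can contain the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for page_id in sorted(candidates):
        following = [set(index[w][page_id]) for w in words[1:]]
        for start in index[words[0]][page_id]:
            # Word i+1 of the phrase must sit exactly i+1 positions after the first.
            if all(start + i + 1 in pos for i, pos in enumerate(following)):
                hits.append(page_id)
                break
    return hits


pages = {
    "a": "my best Bud Big Red box",
    "b": "big bud seeds for sale",
}
index = build_index(pages)
print(phrase_search(index, "bud big"))
print(phrase_search(index, "big bud"))
```

The cost is visible even in this toy: the index has to keep a position for every occurrence of every word on every page, on top of whatever else is stored.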

Summary
Exact Phrase Search is probably pretty low on their priorities, and it may be technically difficult to accomplish. Does anybody believe this claim: "About 5,410,000,000 results (0.51 seconds)"? 5.4 billion hits for the word 'cat'? That would take 43 gigabytes just to store NUMERIC pointers to that many hits/webpages (assuming 64-bit pointers). I have a hard time believing that it figured that out in 1/2 second. Some fakery is in play here...
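The 43 gigabyte figure is just the result count times 8 bytes. A quick check in Python, assuming one 64-bit pointer per result and nothing else stored:

```python
results = 5_410_000_000   # count quoted from the results page
bytes_per_pointer = 8     # assumption: one 64-bit pointer per result

total_bytes = results * bytes_per_pointer
print(total_bytes)             # raw byte count
print(total_bytes / 10**9)     # decimal gigabytes
print(total_bytes / 2**30)     # binary gibibytes, a bit smaller
```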

Exact Phrase Search sounds like a good product :lol:

Postby Rachael » Fri Apr 26, 2019 9:54 am

If you know *anything* about DBs and hash tables, it should not be the least bit surprising that the searches come up so quickly. You can easily categorize words so that the first 3 letters of each word form a distinct subcategory, resulting in a total of 17,576 (26 x 26 x 26) subcategories, even in just plain English. That's a *LOT* easier to search through than the hundreds of thousands of words available in the English language, including those that are not officially recognized by Oxford or whatever other control freak standards authority that wants us to all use "proper English".
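As a minimal sketch of that three-letter bucketing idea in Python (the names `bucket_key` and `build_buckets` are invented for the example, and a real database would use proper hashing and on-disk structures rather than a dictionary of lists):

```python
from collections import defaultdict
import string


def bucket_key(word):
    """Use the first three letters of the word as its subcategory."""
    return word.lower()[:3]


def build_buckets(words):
    """Group words by their three-letter prefix."""
    buckets = defaultdict(list)
    for word in words:
        buckets[bucket_key(word)].append(word)
    return buckets


# 26 letters in each of 3 positions gives the number of possible buckets.
print(len(string.ascii_lowercase) ** 3)

buckets = build_buckets(["doom", "doomsday", "door", "zdoom"])
print(buckets["doo"])
```

A lookup then only has to scan the one bucket that shares the query's prefix instead of the whole word list.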

They then track the pages which get the most hits and those are the results that get returned first.
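Returning the most-hit pages first doesn't require sorting everything, either. A small sketch using Python's standard `heapq.nlargest`, with a made-up `hit_counts` dictionary and a hypothetical `top_results` helper standing in for whatever ranking data a real engine keeps:

```python
import heapq


def top_results(hit_counts, k=10):
    """Return the k page ids with the highest hit counts, best first."""
    return heapq.nlargest(k, hit_counts, key=hit_counts.get)


hit_counts = {"page-a": 5, "page-b": 42, "page-c": 17}  # invented numbers
print(top_results(hit_counts, k=2))
```

Only the best k entries are kept while scanning, so the cost depends mostly on how many results you actually want to show.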

Postby kb1 » Fri Apr 26, 2019 8:24 pm

People that do know something about search might trivialize what's being done by citing the various technologies that might be at work. People that know more about it realize the depth of the problem and the sheer amount of structuring and planning involved in setting up something on the scale required to make it work as well as it does. Of course various techniques are being used to make it happen - it simply would not work otherwise. None of that lessens the scope of what is being done.

And, no, I do not believe that any one of the popular search engines will happily present me with 8 billion results, page by page, as I click Next. To simply store those results for each user, for each search would require a staggering amount of memory/drive space. Yes, it could be done, and it could be done incrementally. But, again, I don't believe the time and effort required to make that work would be a priority.

You state that they track the pages with the most hits and return those first. That suggests a sort being done on a Rank field...on 8 billion records. Do you still claim that it's not "the least bit surprising"? I find it discouraging how things which should be considered bewilderingly amazing are so easily trivialized. So, yes, people that know "anything" find the technology amazing.

Postby Rachael » Fri Apr 26, 2019 10:10 pm

I don't know why you marvel at it so much. It's a mix of programming techniques learned well over the decades and massive data centers. No single computer can handle as many queries as Google processes, but there are many data centers around the world with what's reputed to be around 2.5 million servers, at least.

Put two and two together - a single server is only serving maybe a couple thousand requests per minute at most, thanks to load-balancing techniques. Add some decent database software, and you have a relatively decent search engine that can handle heavy demand.

Also, there are videos on YouTube showing off one of Google's data centers. It's pretty much how you might imagine one - except with better lighting than most companies use.

Really - it's not that amazing what Google is doing. What's more amazing is that they knew in advance the ways they would need to mitigate the demand of millions upon millions of search queries.

Also - if you scroll to the end of the search results list - it's not always the "8 million" they advertise.

Postby kb1 » Sat Apr 27, 2019 2:52 am

I don't understand how someone like you, who obviously does understand some of the technology, does not marvel at what's happening. I can appreciate the scope of what is happening, and I am grateful and amazed at what has been accomplished. The sheer number of components involved is staggering. Just being able to transmit the search query and receive results is utterly amazing, and involves dozens of innovations, invented by hundreds...no, thousands of engineers over many decades. None of it is automatic, or free. Every algorithm, every protocol, every schematic, every plan was built, and revised, studied, profiled, measured, and tweaked, over and over, over many years.

Do you have any kind of estimate of how many bits are being read and written to serve a single request? Here are some technologies you didn't mention:
Encryption/decryption, compression/decompression, digital-to-analog, analog-to-digital, error-correction, handshaking, transmission protocols, linked lists, b-trees, caching, billions of transistors on a chip the size of a postage stamp, getting that signal from your bedroom to anywhere on planet Earth, and back. I've just scratched the surface. It's not a mix of well-learned techniques - it's a chorus of intertwined ideas and innovations from a massive number of genius minds, built empirically from dozens of varied disciplines, over many years, all working in unison, in harmony. Every man, woman, and child, working 24/7 could not produce the same results of a single search in 100 years.

But I get it - it's easy to build a hash table, and find a string quickly. And, it's easy to go buy a couple of PCs, slap a DB engine on one, and a browser on the other, write some "glue" code, and get a search working. Maybe that's not so amazing to many people. But this is possible, because all of those "black boxes" are already there, and have been made to work. What's happening inside that makes it all work *is* very fascinating. And, believe me, Google is doing a whole lot more than routing a request to a DB engine sitting in a server farm.

I don't know - I don't think I ever want to get to the stage where I've lost the ability to be fascinated watching a well-oiled engine run fast and lean - that's all I'm saying. Just rolling out an update is probably a massive process. Those guys deserve a lot of credit for what they've accomplished - there's nothing trivial about it. I would love to see a functional schematic on the process as a whole, going from high to low level. I think you might be surprised to see it as well.