programming4us
programming4us
WEBSITE

The Invisible Web (Part 1) - Deep searching & InfoMine

- Free product key for windows 10
- Free Product Key for Microsoft office 365
- Malwarebytes Premium 3.7.1 Serial Keys (LifeTime) 2019

David Hayward goes deep, deep undercover. Into the hidden depths of the invisible web

Description: Description:  The Deep Web (also called Deepnet, the invisible Web, DarkNet, ...

The Deep Web (also called Deepnet, the invisible Web, DarkNet, ...

‘How on earth do we access the invisible areas of the internet?’

Many of us are quite content to go about our business on the web, happily searching away, safe in the knowledge that pretty much everything we’re looking for is right at our finger tips.

However, despite how web savvy you may think you are, you’re only scratching the surface of the World Wide Web. Much like the analogy of the iceberg, the web only shows us its upper third portion, while below, the other two thirds exist in a dark and hidden realm, known collectively as the invisible web.

The term ‘invisible web’ was first coined by Like Bergman the founder of BrightPlanet, a site dedicated to harvesting this hidden invisible web – or deep web as it’s also known.

Basically, the invisible web is classed as the area of the internet that’s not scanned by the web-bots, spiders and indexing tools of the likes of Google, Yahoo! or any of the other popular search engines. These normal indexing agents are capable of grabbing content via the many hyperlinks, keywords and other methods that make up the surface web, or the area of the internet that we normally surf on an average day.

However, when we talk about the invisible web we’re referring to the likes of scientific journals, minutes taken during meetings, vast databases of libraries, public-fronted intranets of companies – the list goes on. In fact, it’s estimated that the size of the surface web is in the region of 300TB, whereas the estimated size of the invisible web is 98PB (that’s petabytes, by the way) and growing at a considerable rate. In 1997 the Library of Congress was estimated to have close to 4PB of ‘invisible’ data within its searchable, queried databases.

How on earth do we access these invisible areas of the internet, where the web crawlers fear to tread? It’s relatively easy, as it happens.

Deep searching

Description: Description: Search the deep web

Search the deep web


What we need to start with is a search engine that can collate and penetrate deep into the invisible web, of which there are many. Some are used for the greater good, providing us with the latest scientific knowledge and theses, while others are used by unscrupulous individuals who inhabit the darkened recesses of the internet and host sites that will not be mentioned in this magazine. Obviously, it’s the former we’ll be looking at (leaving the latter to the business end of a stout stick) here, and to do so we’ll have a look at a variety of different, unique, search engines that allow us to delve into the invisible web.

InfoMine

 

Description: Description: Infomine, a well-made search engine that’s pretty accurate, when trawling the invisible web

Infomine, a well-made search engine that’s pretty accurate, when trawling the invisible web


The first of these are the ‘Scholarly Internet Resource Collections’ known as InfoMine. A virtual library of resources relevant to scientists, students and researchers, InfoMine can trawl the otherwise hidden databases, e-journals, e-books, bulletin boards, mailing lists and online library card catalogues for information that would normally be well and truly obscured within the traditional confines of a surface Google search.

InfoMinehas been built by librarians from the University of California, Wake Forest University, the California State University and the University of Detroit using a range of web crawlers and metadata assignment. The web crawlers used in this project are the NalandaiVia Focused Crawler, the InfoMine Virtual Library Crawler and the InfoMine Automatic Focused Crawler, with each of these crawlers targeting a different arm of the internet’s available resources within the iVia software project (ivia.urc.edu).

The metadata assignment modules classify fields from the incredibly powerful Library of Congress databases, with some clever algorithms involving keyphrase identification and extraction. All in all, the effect is both stunning and vast – see for yourself by pointing your browser to infomine.urc.edu. As an example, enter the following into the search bar: Manhattan Project. The results shown range from a scientific journal of the physics behind the first nuclear weapon to other papers concerning the cost of nuclear weaponry to the US, congressional research reports, technical information from OSTI (Office of Scientific and Technical Information) and the Hiroshima Archives.

It’s fascinating stuff, ideal for those who are trying to drill down to more specific and not often viewed information.
Other  
 
Top 10
Free Mobile And Desktop Apps For Accessing Restricted Websites
MASERATI QUATTROPORTE; DIESEL : Lure of Italian limos
TOYOTA CAMRY 2; 2.5 : Camry now more comely
KIA SORENTO 2.2CRDi : Fuel-sipping slugger
How To Setup, Password Protect & Encrypt Wireless Internet Connection
Emulate And Run iPad Apps On Windows, Mac OS X & Linux With iPadian
Backup & Restore Game Progress From Any Game With SaveGameProgress
Generate A Facebook Timeline Cover Using A Free App
New App for Women ‘Remix’ Offers Fashion Advice & Style Tips
SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th
- Messages forwarded by Outlook rule go nowhere
- Create and Deploy Windows 7 Image
- How do I check to see if my exchange 2003 is an open relay? (not using a open relay tester tool online, but on the console)
- Creating and using an unencrypted cookie in ASP.NET
- Directories
- Poor Performance on Sharepoint 2010 Server
- SBS 2008 ~ The e-mail alias already exists...
- Public to Private IP - DNS Changes
- Send Email from Winform application
- How to create a .mdb file from ms sql server database.......
programming4us programming4us
programming4us
 
 
programming4us