
Deep Web Research 2012

Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering over the last several years, and much of my information comes from the extensive research that I have completed over the years into the "invisible" or what I like to call the "deep" web. The Deep Web covers somewhere in the vicinity of 1 trillion plus pages of information located through the world wide web in various files and formats that the current search engines on the Internet either cannot find or have difficulty accessing. At the time of this writing, the current search engines find hundreds of billions of pages. In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the world wide web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps and others. This Deep Web Research 2012 report and guide is divided into the following sections: Bot Research

The Deep Web and Open Source Intelligence (OSINT): Two Peas in a Pod At BrightPlanet we often throw around the acronym OSINT and talk about open source intelligence, but what is it, what (if anything) does it have to do with the Deep Web, and how is it being used? We answer those questions in this post. What is OSINT? For the purposes of this post, we’ll keep the definition of OSINT at a high level. If you want to dig deeper into OSINT, check out a past post of ours focused on OSINT. The term OSINT was first, and still is, employed by government agencies to refer to any unclassified, publicly available information. The Deep Web and OSINT If you are looking to get your hands on open source intelligence, look no further than the Deep Web. If you want to go beyond the Surface Web information found by Google search, you need to utilize a Deep Web harvest to get more open source intelligence. We often use the example of grants.gov. Using OSINT Organizations often have a handle on the data inside their organization, but what about outside of it?

11 Unknown Ways Of Using Google Search - Curious Mob Thinking what more is there to know about Google search? I mean, it’s Google search after all: type whatever you want to search, press enter, and everything in the world related to your topic is displayed in front of your eyes. But believe it or not, the search engine has plenty of tricks up its sleeve. Here’s an overview of 11 Google Tricks That Will Change the Way You Search: 11. One well-known, simple trick is searching for a phrase in quotes, which will yield only pages with the same words in the same order as what’s in the quotes.

25 Sneaky Online Tools and Gadgets to Help You Spy on Your Competitors Even before you entered into the world of “business”, you were watching your competition. Whether it was in a classroom or on a sports team, you not only wanted to keep up, you wanted to know where the marker was set so you could go one step further. It was about finding new opportunities and setting new goals based on someone you aspired to beat. At this time, when search is so important and detailed, and the Internet has grown so extensively, you have tons of different factors to consider when spying on your competition. This is where marketing tools come into play. In many cases, tools that help you monitor your own web performance also can help you gather data on your competition. 1. This is a very simple and easy-to-use tool that will send reports right to your inbox. Best Ways to Use This Tool: Get competitors’ backlinks; monitor social (or other website) mentions of your company; monitor keyword mentions. Price: Free 3. This is a tool that’s all about Twitter.

Verification Handbook for Investigative Reporting Craig Silverman is the founder of Emergent, a real-time rumor tracker and debunker. He was a fellow with the Tow Center for Digital Journalism at Columbia University, and is a leading expert on media errors, accuracy and verification. Craig is also the founder and editor of Regret the Error, a blog about media accuracy and the discipline of verification that is now a part of the Poynter Institute. He edited the Verification Handbook, previously served as director of content for Spundge, and helped launch OpenFile, an online local news startup that delivered community-driven reporting in six Canadian cities. Craig is also the former managing editor of PBS MediaShift and has been a columnist for The Globe And Mail, Toronto Star, and Columbia Journalism Review. He tweets at @craigsilverman. Rina Tsubaki leads and manages the "Verification Handbook" and "Emergency Journalism" initiatives at the European Journalism Centre in the Netherlands.

Research Beyond Google: 119 Authoritative, Invisible, and Compre Got a research paper or thesis to write for school or an online class? Want to research using the Internet? Good luck. There’s a lot of junk out there — outdated pages, broken links, and inaccurate information. Using Google or Wikipedia may lead you to some results, but you can’t always be sure of accuracy. And what’s more, you’ll only be searching a fraction of all of the resources available to you. Google, the largest search database on the planet, currently has around 50 billion web pages indexed. Do you think your local or university librarian uses Google? Topics Covered in this Article Deep Web Search Engines | Art | Books Online | Business | Consumer | Economic and Job Data | Finance and Investing | General Research | Government Data | International | Law and Politics | Library of Congress | Medical and Health | STEM | Transportation Deep Web Search Engines To get started, try using a search engine that specializes in scouring the invisible web for results. Art Books Online Business

How to search like a spy: Google's secret hacks revealed The National Security Agency just declassified a hefty 643-page research manual called Untangling the Web: A Guide to Internet Research (PDF) that, at least at first, doesn't appear all that interesting. That is, except for one section on page 73: "Google Hacking." "Say you're a cyberspy for the NSA and you want sensitive inside information on companies in South Africa," explains Kim Zetter at Wired. "What do you do?" Well, you could type the following advanced search into Google — "filetype:xls site:za confidential" — to uncover a trove of seemingly private spreadsheets. These are just two examples of the numerous private files that are inadvertently uploaded to the Internet, and can be accessed if you know the right Google search terms. Here are a few more: Pretty neat, huh? And even if keyboard espionage isn't really your thing, the document contains a number of practical tips anyone can use to become a better Googler: * Repeating a word will help you find more relevant hits.
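The advanced search quoted above strings several operators together with plain keywords. As a minimal sketch of how such a query could be assembled and turned into a search URL, the Python below uses only the `filetype:` and `site:` operators and quoted phrases, all of which appear in the snippets on this page. The helper names `build_query` and `search_url` are hypothetical, and the base URL is an assumption about Google's public search endpoint rather than anything taken from the NSA manual.

```python
from urllib.parse import urlencode


def build_query(terms, filetype=None, site=None, exact_phrase=None):
    """Compose a search string from operators and plain keywords.

    Hypothetical helper for illustration: it only concatenates text and
    does not contact any search engine.
    """
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict to a file extension
    if site:
        parts.append(f"site:{site}")           # restrict to a domain or TLD
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')      # same words, same order
    parts.extend(terms)                        # ordinary keywords go last
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query into the `q` parameter (base URL is assumed)."""
    return f"{base}?{urlencode({'q': query})}"


# The example query from the article: spreadsheets on .za sites
# that mention the word "confidential".
query = build_query(["confidential"], filetype="xls", site="za")
url = search_url(query)
print(query)
print(url)
```

Keeping the query construction separate from the URL encoding means the same string can be pasted directly into a search box or reused with a different search engine's base URL.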

Deep Web Search Engines Deciding where to start a deep web search is easy. You hit Google.com, and when you brick wall it, you go to scholar.google.com, which is the academic database of Google. After you brick wall there, your true deep web search begins. To all the 35Fs and 35Gs out there at Fort Huachuca and elsewhere, you will find some useful links here to home in on your AO. If you find a bad link, comment with the link below. Last updated July 12, 2016 – updated reverse image lookup. Multi Search engines Deeperweb.com – (broken as of Sept 2016, hopefully not dead) This is my favorite search engine. Surfwax – They have a 2011 interface for RSS and a 2009 interface I think is better. www.findsmarter.com – You can filter the search by domain extension or by topic, which is quite neat. Cluster Analysis Engine TouchGraph – A brilliant clustering tool that shows you relationships in your search results using a damn spiffy visualization. Yippy.com – A useful, non-graphical clustering of results. Speciality Deep Web Engines General

Pathways | Finding | Effective searching | Being Digital | Open University Library Services When you select a pathway, you will see a number of activities on a particular theme. Pathways allow you to develop a deeper understanding of a topic. You can work through the activities in your chosen pathway in any order. Activities will open in a new tab or window. The icon next to each activity helps you to identify the format used (e.g. activity, video, audio, or external resource). Viewing all pathways This is a list of all the pathways available. Assess your skills: Assess your familiarity and confidence with online tools and environments and find out which activities can help you develop your skills further. Avoiding plagiarism: Learn to recognise what plagiarism is, the forms it can take and how to avoid it by developing your skills. Communicating online: How can you ensure your interactions with others online are appropriate and effective? Effective searching. Exploring your information landscape. Keeping up-to-date. Using

99 Resources to Research & Mine the Invisible Web College researchers often need more than Google and Wikipedia to get the job done. To find what you're looking for, it may be necessary to tap into the invisible web, the sites that don't get indexed by broad search engines. The following resources were designed to help you do just that, offering specialized search engines, directories, and more places to find the complex and obscure. Search Engines Whether you're looking for specific science research or business data, these search engines will point you in the right direction. Turbo10: On Turbo10, you'll be able to search more than 800 deep web search engines at a time. Databases Tap into these databases to access government information, business data, demographics, and beyond. GPOAccess: If you're looking for US government information, tap into this tool that searches multiple databases at a time. Catalogs If you're looking for something specific, but just don't know where to find it, these catalogs will offer some assistance. Directories

6 common misconceptions when doing advanced Google Searching As librarians we are often called upon to teach not just library databases but also Google and Google Scholar. Teaching Google is often trickier than teaching other search tools: with library databases we can get insider access through our friendly product support representative, but as librarians we have no more and no less insight than anyone else into Google, which is legendary for being secretive. Still, given that Google has become synonymous with search, we should be decently good at teaching it. I've noticed, though, that when people teach Google, particularly advanced searching of Google, they often fall prey to 2 main types of errors. The first type of error involves not keeping up to date: given the rapid speed at which Google changes, we often end up teaching things that no longer work. The second type of error is perhaps more common to us librarians. Also, the typical Google search brings back an estimated count of results. The 6 are: 1. About tilde (~) About plus operator (+)

100 Useful Tips and Tools to Research the Deep Web By Alisa Miller Experts say that typical search engines like Yahoo! and Google only pick up about 1% of the information available on the Internet. The rest of that information is considered to be hidden in the deep web, also referred to as the invisible web. So how can you find all the rest of this information? This list offers 100 tips and tools to help you get the most out of your Internet searches. Meta-Search Engines Meta-search engines use the resources of many different search engines to gather the most results possible. SurfWax. Semantic Search Tools and Databases Semantic search tools depend on replicating the way the human brain thinks and categorizes information to ensure more relevant searches. Hakia. General Search Engines and Databases These databases and search engines for databases will provide information from places on the Internet most typical search engines cannot. DeepDyve. Academic Search Engines and Databases Google Scholar. Scientific Search Engines and Databases
