Forum Activity for @soaringeagle

soaringeagle
@soaringeagle
01/27/17 09:46:06AM
3,304 posts

weird sitemap crawl results after urlscanner update


Installation and Configuration

and i appreciate that your efforts were far and beyond what ning offered and i have high praise for it
it lacked a little of the fine tuning i got used to
otherwise it was a great improvement
i am going to test it again because an on-server solution is by far superior to crawling over the net

what i will do is activate it on freedomswings, then i can email you both sets of sitemaps as a comparison
soaringeagle
@soaringeagle
01/27/17 09:42:36AM
3,304 posts

weird sitemap crawl results after urlscanner update


Installation and Configuration

i can get you those malformed urls but it will take time
the easiest way would be to run a crawl without the exclusions and find them that way
i was looking around randomly but got distracted
give me a lil time to find a few
i'll also compare them to what's on youtube itself to verify the urls aren't being imported already corrupted

i'll try to do as much of the debugging as i can before sending ya the info
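
something along these lines is what i mean by finding them in the output.. just a rough python sketch, assuming a standard sitemap file (the "sitemap.xml" path and the checks are placeholders, not jr's actual code):

# rough sketch: pull every <loc> out of a sitemap and flag entries
# that don't parse as clean absolute urls. "sitemap.xml" is a
# placeholder path, not where jr actually keeps the file.
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"

for loc in ET.parse("sitemap.xml").getroot().iter(LOC):
    url = (loc.text or "").strip()
    parts = urlparse(url)
    # flag anything missing a scheme or host, or with stray spaces --
    # the kind of malformed urls i'm seeing
    if parts.scheme not in ("http", "https") or not parts.netloc or " " in url:
        print("suspect:", url)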
soaringeagle
@soaringeagle
01/27/17 09:27:22AM
3,304 posts

weird sitemap crawl results after urlscanner update


Installation and Configuration

google can follow a url of course, but can google tell which urls it should be following, and how often?
example
ning had a "share this" page for every page of content on the site
without a sitemap these share pages often ranked higher than the pages they were meant to share
google got lost crawling share pages and indexed a million of them
changing their priority to 0 and their changefreq to never drastically decreased how often they were crawled, and whether they were indexed or ranked above the actual pages
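
an entry along these lines is what did it.. just a python sketch, and the share url is made up for illustration:

# sketch: a sitemap entry that tells the bots a share page is
# unimportant and never changes. the url is a made-up example.
import xml.etree.ElementTree as ET

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
url = ET.SubElement(urlset, "url")
ET.SubElement(url, "loc").text = "http://example.com/share/some-page"
ET.SubElement(url, "changefreq").text = "never"  # don't bother re-crawling
ET.SubElement(url, "priority").text = "0.0"      # lowest priority the spec allows
print(ET.tostring(urlset, encoding="unicode"))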

and like i said, in the case of a calendar of events googlebot can crawl millions of empty useless pages, looking for events from 1900 to 2525
googlebot's doing its job following links, but it doesn't do it efficiently.. it gets lost in useless pages and doesn't get to the ones that are important

case in point: i have dating.dreadlockssite.com, and the old version had a calendar
for weeks and weeks i watched the bot activity.. every last bot was lost in the calendar, constantly; only once every few days did i see any other page on the site being crawled
consequently it took months before any pages of importance were indexed
soaringeagle
@soaringeagle
01/27/17 09:18:27AM
3,304 posts

weird sitemap crawl results after urlscanner update


Installation and Configuration

it's also to guide them through your site, assigning importance to some pages, recommending others be ignored, etc
but yes it does happen
while on ning i wrote a post on sitemap use and why ning's sitemaps were useless
i had about 30 people implement proper sitemaps, and in 2 weeks they saw a 50% increase in pages indexed and traffic
after a couple months most were up by 200%

a proper sitemap including all urls is the second thing seo experts check for
in seo scoring on automated seo checks it ranks almost as high in importance as proper and unique titles
soaringeagle
@soaringeagle
01/27/17 09:10:02AM
3,304 posts

weird sitemap crawl results after urlscanner update


Installation and Configuration

Benefits to using an XML sitemap
The first set of benefits revolves around being able to pass extra information to the search engines.

Your sitemap can list all URLs from your site. This could include pages that aren't otherwise discoverable by the search engines.
Giving the search engines priority information. There is an optional tag in the sitemap for the priority of the page. This is an indication of how important a given page is relative to all the others on your site. This allows the search engines to order their crawling of your website based on priority information.
Passing temporal information. Two other optional tags (lastmod and changefreq) pass more information to the search engines that should help them crawl your site in a more optimal way. "lastmod" tells them when a page last changed, and "changefreq" indicates how often the page is likely to change.

Being able to pass extra information to the search engines *should* result in them crawling your site in a more optimal way. Google itself points out that the information you pass is treated as hints, though it would appear to benefit both webmasters and the search engines if they were to crawl the pages of your site according to the pages you think have a high priority. There is a further benefit, which is that you get information back.
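
As a minimal sketch of what generating a sitemap with those three optional tags might look like (Python standard library only; the URLs, dates, and values here are invented for illustration):

# minimal sketch of writing a sitemap with the three optional tags
# described above (lastmod, changefreq, priority). all values are
# made-up examples.
import xml.etree.ElementTree as ET

pages = [
    ("http://example.com/",       "2017-01-27", "daily",  "1.0"),
    ("http://example.com/forum/", "2017-01-26", "hourly", "0.8"),
    ("http://example.com/about",  "2016-06-01", "yearly", "0.3"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod, changefreq, priority in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod        # when the page last changed
    ET.SubElement(url, "changefreq").text = changefreq  # how often it's likely to change
    ET.SubElement(url, "priority").text = priority      # relative importance, 0.0-1.0
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)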

Google Webmaster Central gives some useful information back when you have a sitemap. For example, it can show a graph of googlebot activity over the last 90 days.
soaringeagle
@soaringeagle
01/27/17 09:04:12AM
3,304 posts

weird sitemap crawl results after urlscanner update


Installation and Configuration

actually that goes completely against what google says and against the purpose of sitemaps
sitemaps are meant to list all pages in your site, since google may not find all the links, or might get lost in link structures. example: a site containing a calendar. you add /calendar to the sitemap and the bot gets lost crawling 30 years, 12 months a year, 30-31 days a month, all empty pages
sitemaps guide the bots to where pages are
tell the bots these pages are important, these are worthless so ignore them, these change constantly, this never changes so only visit it once every couple years
the entire reason for sitemaps is to tell the bots THE COMPLETE LINK STRUCTURE. after all, the domain is a starting point that the bots should find all the links from
but that was not the case, so they created sitemaps to tell the bots what all the pages are.. from there the bot might find a few new links that were not included in the sitemap

this was my biggest issue with ning's sitemap
it included /profiles /forums /blogs /video /photos
only the main features, 100-140 urls for a site that at that point contained 4 million urls

ps at that time google had indexed about 2.5 million of them
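
for scale, a site that size can't even fit in one sitemap file.. the sitemap protocol caps each file at 50,000 urls, so you split into numbered files and point a sitemap index at them. a rough python sketch, with the domain and filenames made up:

# rough sketch: split a big url list into 50,000-url sitemap files
# (the protocol's per-file cap) and write a sitemap index pointing
# at them. the domain and filenames are placeholders.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def write_sitemaps(urls, base="http://example.com"):
    index = ET.Element("sitemapindex", xmlns=NS)
    for i in range(0, len(urls), 50000):
        name = "sitemap-%d.xml" % (i // 50000)
        urlset = ET.Element("urlset", xmlns=NS)
        for loc in urls[i:i + 50000]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
        ET.ElementTree(urlset).write(name, encoding="utf-8", xml_declaration=True)
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = base + "/" + name
    ET.ElementTree(index).write("sitemap-index.xml", encoding="utf-8", xml_declaration=True)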
soaringeagle
@soaringeagle
01/27/17 08:54:25AM
3,304 posts

understanding and debugging the core update process


Installation and Configuration

brian:
No - there's no strain. I downloaded that file from my system here and get 30mb/s. I'm pretty certain this is the root of your issue - you've got some downstream throttling going on, or your hosting provider is not giving you a good connection to the wider net - i.e. they are only using a single backbone provider such as Cogent, which is a "cheap", typically oversold provider that many low-cost hosting providers go with.

they have multiple redundant backbone providers and allow you to use any others you choose
they are running a test on it
in a few days i will be getting a free, much more powerful server with an awesome connection, so likely it will be fixed then either way

there's no way that it's a universal issue, as my sitemap crawler alone often hits it at 1.2 mb a sec (we have crappy internet here, that's about the best we get), so it's gotta be in the connection between jr and my server, since i can transfer a lot faster from my home pc to the server.. at the max speed my crappy connection allows

but they are testing it, will let ya know later
thanks, this has been a big help (i think)
soaringeagle
@soaringeagle
01/27/17 08:43:31AM
3,304 posts

weird sitemap crawl results after urlscanner update


Installation and Configuration

brian:
Just re-reading this thread and I see you say:

Quote:
yes cause jr sitemap creator i don't believe will list all pages correctly

What pages do you feel it is not listing? Thanks!

hard to say, but i remember the last time i used it (it's been a long time) it listed a few thousand pages while the inspyder one listed over a million
i can activate it, run it, and see how many urls it picks up since i removed the rewrite condition from htaccess
where are the jr-created sitemaps stored on the server?
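
once i have both files this is the kind of comparison i'll run.. a quick python sketch, and both file paths are placeholders (i still need to know where jr keeps its copy):

# quick sketch: count and diff the urls in the two generated sitemaps.
# both paths are placeholders, not where jr actually stores its file.
import xml.etree.ElementTree as ET

LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"

def urls_in(path):
    return {(loc.text or "").strip() for loc in ET.parse(path).getroot().iter(LOC)}

jr = urls_in("jr-sitemap.xml")
inspyder = urls_in("inspyder-sitemap.xml")
print(len(jr), "urls from jr,", len(inspyder), "from inspyder")
print("missing from jr:", len(inspyder - jr))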