solved Archive and Purge

TiG
TiG
@tig
8 years ago
184 posts
Given the size of our Comments, Discussion and Blog tables it would be advantageous to eliminate stale data. Considering this, we believe we can purge all records older than 6/1/2015. That would eliminate about 40% of the comments table. The idea of course would be to archive the data, delete and then reorganize the db.

Ideally JR hosting has tested tools that can be safely used to perform this task. This post is to present our intent and get the best advice on how to proceed.


--
TiG

updated by @tig: 02/14/18 10:12:36PM
brian
@brian
8 years ago
10,148 posts
We have tools for deleting items, but not archiving items. For comments you could just export an SQL dump of the item and item_key tables as your archive - I think that would work.

Are you wanting to do it for performance reasons?

Thanks!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
TiG
TiG
@tig
8 years ago
184 posts
Yes, this is for performance reasons. No value bogging the indexes with stale records that are likely never accessed (and in reality we can live without anyway). A 40% reduction gives us some breathing room at least with respect to db activity.

The archiving certainly can be done in several ways. We would do it only to be prudent since there is no expectation of ever revisiting those records. My key concern is slicing off interrelated records with integrity. This is not something we want to do without high confidence that we will have no side-effects.

Logically, we are looking for the equivalent of deleting all discussions, blogs and associated comments older than 6/1/2015. Further, if we were to remove ALL comments older than 6/1/2015 that would remove comments posted on the 'walls', etc. If we delete comments using the tools, I presume that referential integrity is maintained.


--
TiG

updated by @tig: 11/16/17 10:01:36AM
paul
@paul
8 years ago
4,335 posts
Is it just the Comments DataStore you want to truncate?


--
Paul Asher - JR Developer and System Import Specialist
TiG
TiG
@tig
8 years ago
184 posts
@paul

Comments is the 800lb Gorilla. So if there is a reason to not do articles and discussions, just doing comments (with referential integrity of course) would be 90% of the intent.


--
TiG
TiG
TiG
@tig
8 years ago
184 posts
@paul

Addressing your question from a different perspective, we could also remove older items such as actions. In a sense, the site could realistically purge user content (and shadows of same) older than 6/1/2015.


--
TiG
brian
@brian
8 years ago
10,148 posts
Note that we have sites that have more than twice as many comments as your site that are working well. What you have to do as you get bigger is start to reduce/refine the number of different ways you are presenting the data. A good example is moving to "simplepagebreak" instead of "pagebreak". Once you have thousands of pages it just doesn't make sense to have a jumper - no one is saying "I want to jump directly to page 1,267". The simple pager is much, much faster since it does not have to know exactly how many pages there are - it just adds prev/next page links.

I know on your site you are using the jrDiscussion module, which to be honest is always going to have more performance issues than our Forum module since it is a combining of 2 different datastores. However, the recent comment 2.0.0 update should have sped things up considerably for most "normal" views.

Any tool that we would make that archives comments would work correctly. Just note that I'm not sure why you would want to archive them instead of just deleting them, unless at some point you want to be able to browse through the old comments i the DB.


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
TiG
TiG
@tig
8 years ago
184 posts
@brian

As noted, we are indeed focused on the delete. The 'archive', per se, is simply a measure for prudence.

Every site will have different functionality. NT could easily support many more comments if the comments were not so verbose. Plus, as we have seen, NT's number of comments imposes rather severe limitations on functionality. When a site must refrain from JOIN operations, etc. one naturally looks for a way to reduce the number of records under consideration. So a site with basic comment functionality could handle far more comments than one that gets 'fancy' with its functionality.

In short, if one is JOIN-challenged because of the number of records in its tables would it not be valuable to reduce the number of records by 40%?


--
TiG

updated by @tig: 11/16/17 10:24:33AM
TiG
TiG
@tig
8 years ago
184 posts
@brian

Now, on this in particular:

Quote:
I know on your site you are using the jrDiscussion module, which to be honest is always going to have more performance issues than our Forum module since it is a combining of 2 different datastores. However, the recent comment 2.0.0 update should have sped things up considerably for most "normal" views.

Would it be practical for us to consider migrating from jrDiscussion to jrForum?


--
TiG
brian
@brian
8 years ago
10,148 posts
TiG:
So a site with basic comment functionality could handle far more comments than one that gets 'fancy' with its functionality.

Exactly. So when you start facing performance issues, the first thing to do is figure out how to be "less fancy". For example, the front page of your site right now has 13 different site builder blocks on it - 8 of them are lists. Are they all really needed? Look at Reddit - it's ONE list. If they tried to slice their data 8 different ways on every page they could never scale.

I know this isn't something you can decide on immediately, but it's something you can do over time that can really help your site scale better. Focus on reducing the number of unique listings - this gives you a better chance of having a smaller cache with a higher hit rate.


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
brian
@brian
8 years ago
10,148 posts
TiG:
Would it be practical for us to consider migrating from jrDiscussion to jrForum?

It could, but it's not that clear cut.

To be honest the calls on your site that are slowing things down are not the comments that are being listed in a specific topic - that's no problem - those will remain fast even with millions of comments. It's the lists of the type "here's the latest comments on discussions in our community profile that are related to US politics" - that requires more search conditions, which means it is going to be slower.

So if you were to attempt to do those types of lists using the Forum, you might run into the same issues.


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
TiG
TiG
@tig
8 years ago
184 posts
@brian

Quote:
I know this isn't something you can decide on immediately, but it's something you can do over time that can really help your site scale better. Focus on reducing the number of unique listings - this gives you a better chance of having a smaller cache with a higher hit rate.

We have established an entirely different UX for the next major version of the site. It will be quite different from what exists today (from scratch redesign) and is indeed partitioned for performance (and a modern UX material style). In particular, the clutter and redundancy will be gone and the design is far more role-based usage centric.

But I am faced with a current site that is what it is. We (NT) can certainly discuss reducing the clutter to a degree but the last thing I want to do is try to refactor this website instead of developing its more modern incarnation. So our changes would necessarily be modest. We will, nevertheless, seriously investigate this.

Now, just to be clear, you are telling me that the best way to deal with being JOIN-challenged is avoid doing JOINs. Of course I agree with that. But you also seem to be saying that reducing the number of comment records by 40% is not valuable for those cases where we cannot avoid JOINs (and any other db operation that ultimately involves serial searches). Seems to me that a small comments table would perform well even with inefficient db queries but as the record count grows the inefficiencies will manifest as performance problems. Ergo, the fewer records the better. How is that not the case?


--
TiG
brian
@brian
8 years ago
10,148 posts
TiG:
Seems to me that a small comments table would perform well even with inefficient db queries but as the record count grows the inefficiencies will manifest as performance problems. Ergo, the fewer records the better. How is that not the case?

Sorry if I gave that impression - of course a smaller DS will always perform better on inefficient queries so it makes sense to archive records if your use case requires lists that use a larger number of search conditions.


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
TiG
TiG
@tig
8 years ago
184 posts
@brian

We have discussed trimming operations. Actions is the first on the chopping block. The home page widgets seem to be tuned rather well at this point. Columnists would be the most likely to impact the db. But I would like to look at hard metrics to identify misbehaving widgets. I could insert my own timer code but would rather use existing tools. Do I have access to tools you use (or at least a portion of them) to identify poor performing widgets? Is so, where might I find them?

Thanks, by the way, for your excellent support.


--
TiG
brian
@brian
8 years ago
10,148 posts
For the Timeline, make sure the ACP -> Profiles -> Timeline -> Prune System Entries is enabled and set to something like "90" - this will prune out all the "system added" timeline entries - i.e. "so and so commented on..", etc. Timeline entries that are made by the actual user are never purged. This can usually take care of any Timeline DS issues.

As for performance profiling, you can get as involved as you want - most of them I would not recommend running on the production site.

One that you CAN turn on is set the Developer Tools -> Slow Query Log to like 2 seconds - any jrCore_list or jrCore_db_search_items calls that take longer than 2 seconds to return will be logged to the Activity Log along with information about the URL, the user and the parameters and constructed query. This can be used to spot where the "trouble spots" are and can really help narrow down your work to those areas that need it most.

Let me know if that helps.

P.S. I've sent Perrie a reply about sponsoring an archive/deleting module - Paul and I have been discussing it in chat and I think we could build something really cool.


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
TiG
TiG
@tig
8 years ago
184 posts
@brian - Thanks. This will substantially improve our understanding of our site dynamics.


--
TiG

Tags