[Hummingbird] Clean orphan meta-data

In the DB cleanup tool, it would be nice to have options to clear orphan meta-data (post meta, comments, post comments, taxonomies, etc.) and more generally any unused stuff.

For instance, clearing out old Yoast tables when using SmartCrawl.

Also, it would be great if we could run "Repair" and "Optimize" queries automatically from HB.

Regards,

Greg

  • Adam Czajczyk
    • Support Gorilla

    Hi Greg

    I hope you’re well today and thank you for suggesting new features!

    The “repair” and “optimize” queries are quite complex (not in term of queries themselves but in terms of what they “do to the database”:wink: and they are/should be used in very rare cases. I tend to say that in case of most sites there no need to use them ever. Not all the sites and not all the cases of course but in my opinion that’s something rather “extreme”. But clearing “orphaned data” might indeed be a handy tool!

    However, I’m wondering about the “tech” side of that. In theory, finding orphaned post/terms etc meta data is not that difficult – all you’d need to do would be to read a post/term/etc ID from the meta db row and check if related post/term/etc exists. But that’s still quite a risky operation on the database as there’s always a chance that due to some unexpected and unpredictable reasons something might go wrong. In such case without full db backup it might cause really serious troubles. So maybe it would be better to do it as a “semi automate” tool? I mean: instead of clearing them up right away provide some sort of “grouped” list with an option to decide? With some additional warning and some sort of double-confirmation before deleting them?

    What do you think about that?

    As for “orphaned tables”, that’s a bit different as it’s not that easy to identify those. One “theoretically” reliable way would be to actually monitor “on the fly” what tables are created by some plugins and store a log of those but that’s something that I’ve seen attempted in the past and while it works great in theory, in reality it never works exactly as intended – mostly because first thing you’d need to do after installing clean WordPress would be to install such “monitoring tool” and then you’d have to keep it enabled and running all the time, never switching it on and making sure that it actually is working properly. If it’s working “inside” your WordPress there’s always a chance that e.g. some conflict would break such monitoring tool making it essentially useless. The other option would be to monitor that with a separate standalone script on a server directly on DB level but then it makes it “unaware” of everything that happens “under WP hood” and actually becomes even unrelated to WordPress in general. There are db monitoring tools around on the market already.

    One other way that comes to my mind though would be to, instead, scan db and for each and every table that’s not a WP core table try to scan all the plugins and try to somehow automatically find out which of them are actually calling out these tables. I’m not sure how this should be done “code-wise” and if it’s entirely possible but that could possibly be “some” way. On the other hand, that could probably be quite a “resource hungry” process.

    Do you think there might be some other way? Or maybe you think that one of those that I suggested might in fact be more reliable than I think it is?

    I’d love to hear more feedback on this from you. I’m talking to our developers about this so I’d just want to better describe your point of view to them and also pass as many ideas related to your suggestions to them as I can :slight_smile:

    Best regards,

    Adam

    • Greg
      • Syntax Hero

      Thank you very much for your answer.

      Actually, I’m not an expert on this domain (by far), but I’ve come accross many “dirty” websites and these are steps I often come along.

      For orphan meta-data, we could simply use a compound request, such as : SELECT * FROM post_meta WHERE post_id NOT IN (SELECT * FROM posts)

      For tables, it would be quite a “heavy” process. Maybe you could identify plug-ins that “compete” with yours (for instance Yoast vs SmartCrawl) and add to each of your plug-in a list of tables to “verify” once your plug-in is up and running. For instance, after the SmartCrawl import, having a pop-up that says something like “Yoast data won’t be necessary anymore, would you like to delete related tables in the DB ?”

      I hope this helps.

      Greg

Thank NAME, for their help.

Let NAME know exactly why they deserved these points.

Gift a custom amount of points.