anti-splog export/import, mass add, and check:*

I run several large networks and absolutely LOVE the new regex support for anti-splog.

But adding a new rule still involves several hurdles that could be removed. Here's a short list of feature requests specific to this functionality.

1) PLEASE add export and import capabilities so I can easily duplicate my new rules to my other sites.

2) I assume this would be possible (manually) if the ability to export/import were added - but in any case, I'd love the ability to use my own blacklist sources to help tune the regex for import. Effectively creating my own bulk filters.

3) Currently you have to individually select "site domain", "site title", "username" OR "email" in the "check" field. Please either add an option to check "all" of these, or make them checkboxes instead so I can selectively apply my new pattern to more than one value (username + email + title).

4) Cosmetic: please move the "add pattern" box to the top of the page. It'll make things easier once there are more than a dozen or so patterns.

  • Aaron

    1) PLEASE add export and import capabilities so I can easily duplicate my new rules to my other sites.

    You can do this fairly easily right now, as the entire list is stored in one site option (a row in your sitemeta table). Just use phpMyAdmin to copy and paste between installs.

    Good ideas. I thought about doing checkboxes from the start, but then realized that the vast majority of rules would need to be different for each type, because each type has a different format.

  • Shawn

    @Aaron - you're absolutely right about "most" needing only one or the other. But there are some terms and patterns that almost always appear in *either* the username or email, and it would be best to be able to make a single rule that applied to both - especially if it's all stored in one field.

    A couple more requests:

    5) Add the ability to test the IP address. Certain patterns (*.*.*.255 and *.*.*.0) are always spam, and it would be best to be able to purge these from the get-go. The native filters don't provide the ability to filter RTL, only LTR or "anywhere" and using that kind of a static string results in many false positives.
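
    To illustrate the kind of right-anchored test meant here, this is a minimal Python sketch. The plugin itself is not written in Python, and the names `LAST_OCTET_RULE` and `ends_in_0_or_255` are made up for this example; it only shows how a regex anchored at the end of the address avoids the false positives of an "anywhere" string match.

```python
import re

# Hypothetical rule: flag dotted-quad IPv4 addresses whose final octet is 0 or 255.
# The leading octets are consumed first and "$" anchors the match at the end,
# so a 255 in any other position (e.g. "255.0.113.7") is not flagged.
LAST_OCTET_RULE = re.compile(r"^(?:\d{1,3}\.){3}(?:0|255)$")


def ends_in_0_or_255(ip: str) -> bool:
    """Return True when the last octet of a dotted-quad IP is 0 or 255."""
    return LAST_OCTET_RULE.search(ip) is not None
```

    With this rule, "203.0.113.255" and "203.0.113.0" are flagged, while "203.0.113.250" and "255.0.113.7" are not.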

    6) Add insertion variables to the patterns so that I can perform complicated testing with data from other variables. For example, I could use "/(?P=email)/" to test "username" to represent the email prefix. If they're a match, then I could act on it. The data I would like to be able to use:
    * ip address
    * email prefix (everything before the @)
    * domain (complete email domain)
    * domainroot (portion of the email domain before the first dot - 'sample' in 'sample.example.com', 'example' in 'example.com')
    * blogpath (either the subdomain or foldername they've submitted for their blog)
    * username
    * firstname
    * lastname

    I think these would be sufficient details to address pretty much any pattern matching need.

  • Aaron

    The native filters don't provide the ability to filter RTL, only LTR or "anywhere" and using that kind of a static string results in many false positives.

    Not exactly sure what you mean, but regexes can test basically any pattern you could imagine.

    * email prefix (everything before the @)
    * domain (complete email domain)
    * domainroot (portion of the email domain before the first dot - 'sample' in 'sample.example.com', 'example' in 'example.com')
    * blogpath (either the subdomain or foldername they've submitted for their blog)
    * username

    All these can already be tested. Can't check names as those aren't normally a part of signup.

    Here is an example of a rule we used to stop a bot:
    /[A-Z]{1}[[:alpha:]]*_[A-Z]{1}[[:alpha:]]*[0-9]+@(gmail\.com|yahoo\.com|hotmail\.com)/
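
    For anyone who wants to try that rule outside the plugin, here is a rough Python equivalent. It is a sketch, not the plugin's code: Python's `re` module has no POSIX classes, so `[[:alpha:]]` is rewritten as `[A-Za-z]`, the redundant `{1}` is dropped, and the function name is invented for illustration.

```python
import re

# Python translation of the rule above. [[:alpha:]] becomes [A-Za-z] because
# Python's re module does not support POSIX character classes.
BOT_EMAIL_RULE = re.compile(
    r"[A-Z][A-Za-z]*_[A-Z][A-Za-z]*[0-9]+@(gmail\.com|yahoo\.com|hotmail\.com)"
)


def looks_like_bot_email(email: str) -> bool:
    """Return True for addresses shaped like Firstname_Lastname123@<webmail>."""
    return BOT_EMAIL_RULE.search(email) is not None
```

    An address such as "John_Smith42@gmail.com" matches; "John_Smith@gmail.com" (no trailing digits) and "John_Smith42@example.com" (different domain) do not.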

  • Shawn

    RTL (right to left) vs LTR (left to right) testing is the issue with IP filtering. Currently, the IP filtering options on an MS site only work LTR: they match an IP starting from its leftmost octet. You can prevent some signups if the first part of the IP matches the string, but it isn't capable of matching from the last octet.

    The examples I gave were not *direct* testing examples, but examples of submitted data I would like to be able to include in a test against another field. For example, when "imaspammer1234@gmail.com" creates the blog "imaspammer1234", I want to be able to automatically test the email prefix (imaspammer1234) against the blog path (imaspammer1234) - and if they match, splog 'em.

    The PCRE syntax "(?P=namedgroup)" is normally a backreference that inserts the capture from an earlier named group in the same pattern. In this situation it would instead be used (through a simple text replacement) as a placeholder for inserting other submitted values into the RX pattern. For the example in the paragraph above, I would test "site domain" (the blog path) against "/(?P=email)/" and if they match I could act on it. In that example, the string (?P=email) would be replaced with "imaspammer1234" so the actual tested pattern would be "/imaspammer1234/".

    Another example of variable insertion - testing "site domain" for email address domain root:
    "/(?P=domroot)/"
    This would block those spammers creating blogs like "wzsalestoday" from an email address like "randomstuff@wzsalestoday.example.com".
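
    The text-replacement idea could be sketched roughly like this in Python. Everything here is hypothetical (the helper names, the set of variables, and the decision to escape inserted values); it is only meant to show the substitution step, not how the plugin would actually implement it.

```python
import re

# Matches placeholders written like (?P=email) or (?P=domroot).
PLACEHOLDER = re.compile(r"\(\?P=(\w+)\)")


def build_values(email: str, blogpath: str, username: str) -> dict:
    """Derive the insertable values from one signup (hypothetical variable set)."""
    prefix, _, domain = email.partition("@")
    return {
        "email": prefix,                  # everything before the @
        "domain": domain,                 # complete email domain
        "domroot": domain.split(".")[0],  # portion before the first dot
        "blogpath": blogpath,
        "username": username,
    }


def expand_rule(rule: str, values: dict) -> str:
    """Replace each known placeholder with its regex-escaped value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # leave unknown placeholders untouched
        return re.escape(values[name])
    return PLACEHOLDER.sub(substitute, rule)


def rule_matches(rule: str, subject: str, values: dict) -> bool:
    """Expand the rule for this signup, then test it against the subject."""
    return re.search(expand_rule(rule, values), subject) is not None
```

    For the signup "imaspammer1234@gmail.com" with blog path "imaspammer1234", the rule "(?P=email)" expands to "imaspammer1234" and matches the blog path. Escaping the inserted value keeps characters such as dots in an email prefix from being read as regex syntax.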

    Another request...or two!

    7) Allow re-ordering of rules. While for the most part these rules would be treated on their own and we assume would be few enough that prioritization wouldn't be important - some rules are far more likely to match than others. If we can easily drag them to the top so they're tested first, it would improve performance.

    8) Grouping. Allow me to group two or more tests together in positive or negative assertions so that they only perform 'action' if both rules match. "imaspammer1234" is a common username pattern for spammers, but "joeblow1982" is sadly common for valid usernames, too, so it would be best to be able to effectively whitelist joeblow1982 based on a different value (negating the numeric match), or to add an additional test that reinforces the spamminess (title matches ".{20,}", for example). Ideally this would be another drag & drop thing, maybe implemented like the menu editor interface with indentations to present rule groups.
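
    As a rough illustration of the grouping logic (not of the drag & drop interface), a group could be treated as a list of conditions that must all hold, each optionally negated. This is a Python sketch under assumptions: the `Condition` class, the field names, and the sample patterns are invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Condition:
    """One test in a group: a regex applied to one signup field."""
    field: str
    pattern: str
    negate: bool = False  # True means "this pattern must NOT match"

    def passes(self, signup: dict) -> bool:
        hit = re.search(self.pattern, signup.get(self.field, "")) is not None
        return hit != self.negate


def group_matches(conditions: list, signup: dict) -> bool:
    """A group triggers its action only when every condition passes."""
    return all(c.passes(signup) for c in conditions)


# Letters followed by four digits in the username AND a title of 20+ characters.
spam_group = [
    Condition("username", r"^[a-z]+\d{4}$"),
    Condition("title", r".{20,}"),
]
```

    Here a signup with username "imaspammer1234" and a long title triggers the group, while "joeblow1982" with the title "Joe's blog" does not, because the second condition fails.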

    9) Duplicate rule - so I can use it as the basis for another.