[Autoblog] Strange Characters in AutoBlog Feed

I’ve seen strange behaviour when importing posts from one WordPress site I own into another site I also own, both on the same server. All tables in both WordPress installations are set to use utf8mb4.

Here is a sample RSS item from the source site:

<item>
<title>Google to appeal €50 million GDPR fine</title>
<link>https://brownglock.com/library/2019/01/24/google-to-appeal-e50-million-gdpr-fine/</link>
<comments>https://brownglock.com/library/2019/01/24/google-to-appeal-e50-million-gdpr-fine/#respond</comments>
<pubDate>Thu, 24 Jan 2019 08:58:35 +0000</pubDate>
<dc:creator><![CDATA[Peter Glock]]></dc:creator>
<category><![CDATA[Cyberlaw]]></category>
<category><![CDATA[Big Data]]></category>
<category><![CDATA[data flows]]></category>
<category><![CDATA[data protection]]></category>
<category><![CDATA[google]]></category>
<category><![CDATA[PoliticoEU]]></category>
<category><![CDATA[privacy]]></category>
<category><![CDATA[Technology]]></category>

<guid isPermaLink="false">https://brownglock.com/library/?p=287656</guid>
<description><![CDATA[<img width="150" height="150" src="https://brownglock.com/library/wp-content/uploads/sites/5/2016/10/google_1475566878-e1475567003121-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="float: left; margin-right: 5px;" link_thumbnail="" />Not really surprising, it’s such a large fine that it’s worth paying the lawyers and going to appeal. I don’t see Google winning this one though…: U.S. tech giant Google said Wednesday it would appeal the €50 million fine for privacy violations issued by the French data protection authority earlier this week. “We’ve worked hard...]]></description>
<content:encoded><![CDATA[<img width="150" height="150" src="https://brownglock.com/library/wp-content/uploads/sites/5/2016/10/google_1475566878-e1475567003121-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="float: left; margin-right: 5px;" link_thumbnail="" /><p>Not really surprising, it’s such a large fine that it’s worth paying the lawyers and going to appeal. I don’t see Google winning this one though…:</p>
<blockquote><p>U.S. tech giant Google said Wednesday it would appeal the €50 million fine for privacy violations issued by the French data protection authority earlier this week.</p>
<p>“We’ve worked hard to create a GDPR consent process for personalized ads that is as transparent and straightforward as possible, based on regulatory guidance and user experience testing. We’re also concerned about the impact of this ruling on publishers, original content creators and tech companies in Europe and beyond. For all these reasons, we’ve now decided to appeal,” a Google spokesperson said in an emailed statement.</p>
<p>The fine, <a href="https://www.politico.eu/pro/google-fine-privacy-enforcement-france-gdpr/">issued by France’s CNIL on Monday</a>, is considered the first major financial penalty on a large technology company since the EU’s General Data Protection Regulation entered into force last May.</p>
<p>The French data protection watchdog said Google had violated EU privacy rules because it did not properly ask its users for consent on how to use their personal data. Google’s challenge before the Council of State — France’s top administrative court — would further define how the tech sector interprets requirements on consent under the GDPR.</p></blockquote>
<p><a href="https://www.politico.eu/article/google-appeals-e50-million-gdpr-fine/?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication" target="_blank">Original article here</a></p>]]></content:encoded>
<wfw:commentRss>https://brownglock.com/library/2019/01/24/google-to-appeal-e50-million-gdpr-fine/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">287656</post-id> </item>

When that gets imported into site #2, the feed looks like this:

<item>
<title>Google to appeal €50 million GDPR fine</title>
<link>https://glock.co.uk/blog/google-to-appeal-a%c2%ac50-million-gdpr-fine/</link>
<comments>https://glock.co.uk/blog/google-to-appeal-a%c2%ac50-million-gdpr-fine/#respond</comments>
<pubDate>Sat, 26 Jan 2019 11:03:43 +0000</pubDate>
<dc:creator><![CDATA[peterglock]]></dc:creator>
<category><![CDATA[Big Data]]></category>
<category><![CDATA[Cyberlaw]]></category>
<category><![CDATA[data flows]]></category>
<category><![CDATA[data protection]]></category>
<category><![CDATA[google]]></category>
<category><![CDATA[PoliticoEU]]></category>
<category><![CDATA[privacy]]></category>
<category><![CDATA[Technology]]></category>
<category><![CDATA[Glock Takes Stock]]></category>

<guid isPermaLink="false">https://glock.co.uk/blog/google-to-appeal-a%c2%ac50-million-gdpr-fine/</guid>
<description><![CDATA[Not really surprising, it’s such a large fine that it’s worth paying the lawyers and going to appeal. I don’t see Google winning this one though…: U.S. tech giant Google […]]]></description>
<content:encoded><![CDATA[<p><img width="150" height="150" src="https://brownglock.com/library/wp-content/uploads/sites/5/2016/10/google_1475566878-e1475567003121-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="float: left;margin-right: 5px" /></p>
<p>Not really surprising, it’s such a large fine that it’s worth paying the lawyers and going to appeal. I don’t see Google winning this one though…:</p>
<blockquote>
<p>U.S. tech giant Google said Wednesday it would appeal the €50 million fine for privacy violations issued by the French data protection authority earlier this week.</p>
<p>“Weâ€<img src="https://s.w.org/images/core/emoji/11/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />ve worked hard to create a GDPR consent process for personalized ads that is as transparent and straightforward as possible, based on regulatory guidance and user experience testing. Weâ€<img src="https://s.w.org/images/core/emoji/11/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />re also concerned about the impact of this ruling on publishers, original content creators and tech companies in Europe and beyond. For all these reasons, weâ€<img src="https://s.w.org/images/core/emoji/11/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />ve now decided to appeal,â€? a Google spokesperson said in an emailed statement.</p>
<p>The fine, <a href="https://www.politico.eu/pro/google-fine-privacy-enforcement-france-gdpr/">issued by Franceâ€<img src="https://s.w.org/images/core/emoji/11/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />s CNIL on Monday</a>, is considered the first major financial penalty on a large technology company since the EUâ€<img src="https://s.w.org/images/core/emoji/11/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />s General Data Protection Regulation entered into force last May.</p>
<p>The French data protection watchdog said Google had violated EU privacy rules because it did not properly ask its users for consent on how to use their personal data. Googleâ€<img src="https://s.w.org/images/core/emoji/11/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />s challenge before the Council of State — Franceâ€<img src="https://s.w.org/images/core/emoji/11/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />s top administrative court — would further define how the tech sector interprets requirements on consent under the GDPR.</p>
</blockquote>
<p><a href="https://www.politico.eu/article/google-appeals-e50-million-gdpr-fine/?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication" target="_blank">Original article here</a></p>
]]></content:encoded>
<wfw:commentRss>https://glock.co.uk/blog/google-to-appeal-a%c2%ac50-million-gdpr-fine/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>

I’ve checked that both sites have the same encoding (utf8mb4) set in wp-config and that the database tables are indeed utf8mb4. What else can I check?

I have opened the site for support in the WPMU Dev dashboard.

  • Adam Czajczyk
    • Support Gorilla

    Hello Peter

    I hope you’re fine today!

    Both sites are properly set to use UTF-8 encoding, and the same encoding is used for both feeds (which is good, as UTF-8 is the standard encoding for XML feeds), so I admit it’s a bit of a “weird” issue. The Autoblog plugin won’t change that encoding “on its own”, as it uses the SimplePie library to fetch feed contents, and SimplePie is a long-established solution that is also part of WP core.
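
    To double-check which encoding each feed actually declares, a minimal Python sketch along these lines reads the `encoding` attribute from the feed's XML declaration. It assumes you have saved the raw feed bytes from each site; the helper name is hypothetical, and it does not look at the charset in the HTTP `Content-Type` header, which a feed parser may also take into account.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-8"?>  (leading whitespace tolerated)
_XML_DECL = re.compile(rb"""^\s*<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")

UTF8_BOM = b"\xef\xbb\xbf"


def declared_xml_encoding(feed_bytes: bytes) -> str:
    """Return the encoding named in a feed's XML declaration.

    Falls back to "UTF-8" when there is no declaration, which is the XML
    default for a document without a BOM or external charset information.
    """
    # Skip a UTF-8 byte-order mark so the declaration can be matched.
    if feed_bytes.startswith(UTF8_BOM):
        feed_bytes = feed_bytes[len(UTF8_BOM):]
    match = _XML_DECL.match(feed_bytes)
    return match.group(1).decode("ascii") if match else "UTF-8"


print(declared_xml_encoding(b'<?xml version="1.0" encoding="UTF-8"?><rss/>'))  # UTF-8
```

    Run against a saved copy of each site's feed, both calls should return the same value if the feeds match as expected.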

    But it definitely is an encoding issue, so we first need to identify where the problem actually happens. We can assume that everything is fine on the source site, as it displays the posts correctly and they are also correct in its feed. Therefore it must be happening “later”, with the feed content that has already been fetched.
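
    The garbled sequences themselves hint at what kind of misreading is involved. The short Python sketch below (the function name is made up for illustration) shows what text looks like when its UTF-8 bytes are decoded as Windows-1252: `’` becomes `â€™`, whose final `™` WordPress then appears to have replaced with the emoji image seen in the destination feed, and `€` becomes `â‚¬`. The closing `”` ends in byte 0x9D, which has no Windows-1252 mapping; that would account for the `â€?`. The slug `a%c2%ac50` also fits, since `%c2%ac` is the URL-encoded UTF-8 form of `¬`, the last character of `â‚¬`. Not every character was affected (the euro sign in the body and the em dashes came through intact), so this only identifies what the broken sequences correspond to, not where in the import they were produced.

```python
def misread_as_cp1252(text: str) -> str:
    """Return `text` as it looks when its UTF-8 bytes are decoded as Windows-1252.

    Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
    errors="replace" turns them into U+FFFD, where other software may show "?".
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


print(misread_as_cp1252("We’ve"))  # Weâ€™ve
print(misread_as_cp1252("€50"))    # â‚¬50
```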

    That said, before we go any further, could you please check one more thing? I can see that the “weird characters” shown in the feed also appear on the site and in the post editor. So we need to narrow down whether the problem happens before or after the post data is saved to the database. Could you please look at the “post_content” column of the “wp_posts” table and check whether the affected posts contain those characters there too, or whether the content is stored with all the correct characters?
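
    Rather than eyeballing each `post_content` value, you could search it for the tell-tale sequences. The hypothetical helper below flags `â€` and `â‚`, which is how the UTF-8 encoding of curly quotes, dashes, the ellipsis and the euro sign starts when misread as Windows-1252. It is only a heuristic (a genuine `â€` in a post would be flagged too), and it expects the column value as a string, however you export it (for example, from phpMyAdmin).

```python
import re

# UTF-8 for U+2000..U+203F (curly quotes, dashes, ellipsis) starts E2 80, and
# for the euro sign E2 82; read as Windows-1252, those bytes become "â€" / "â‚".
_SUSPECT = re.compile("â[€‚]")


def find_mojibake(text: str, context: int = 15) -> list[tuple[int, str]]:
    """Return (offset, snippet) pairs for each suspicious sequence in `text`."""
    hits = []
    for match in _SUSPECT.finditer(text):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        hits.append((match.start(), text[start:end]))
    return hits


print(find_mojibake("We’ve worked hard"))    # []
print(find_mojibake("Weâ€™ve worked hard"))  # [(2, 'Weâ€™ve worked hard')]
```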

    Please let me know and we’ll then decide what to check or do next.

    Best regards,

    Adam

  • Peter
    • Design Lord, Child of Thor

    I’ve dug into both databases. Let’s call them ‘source’ and ‘destination’.

    This is an affected post as stored in the ‘destination’ database:

    <img width="150" height="150" src="https://glock.co.uk/wp-content/uploads/2019/02/GettyImages-85595134-800x509-e1517235387640-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="float: left;margin-right: 5px" /><p>Thinking of sharing some snippet of your life on social media?</p>
    <blockquote><p>[…] According to <a href="https://www.willistowerswatson.com/en/insights/2017/09/Cyber-risk-its-a-people-problem-too">insurance claim data</a> of businesses based in the UK, over 66% of cyber incidents are caused by employee error. Although the data attributes only 3% of these attacks to social engineering, our experience suggests the majority of these attacks would have started this way.</p>
    <p>For example, by employees not following dedicated IT and information security policies, not being informed of how much of their digital footprint has been exposed online, or simply being taken advantage of. Merely posting what you are having for dinner on social media can open you up to attack from a well trained social engineer.</p>
    <p>[…]</p></blockquote>
    <p><a href="https://www.google.com/url?rct=j&sa=t&url=http://theconversation.com/dont-click-that-link-how-criminals-access-your-digital-devices-and-what-happens-when-they-do-109802&ct=ga&cd=CAIyHGE3NGI4YThiYWMzODhlNDA6Y28udWs6ZW46R0I&usg=AFQjCNF1cVgSp4L1qplB6Qh6NAZkhD_GGQ" target="_blank">Original article here</a></p>

    And this is how the same content is stored in the ‘source’ database:

    Thinking of sharing some snippet of your life on social media?
    <blockquote>[...] According to <a href="https://www.willistowerswatson.com/en/insights/2017/09/Cyber-risk-its-a-people-problem-too">insurance claim data</a> of businesses based in the UK, over 66% of cyber incidents are caused by employee error. Although the data attributes only 3% of these attacks to social engineering, our experience suggests the majority of these attacks would have started this way.

    For example, by employees not following dedicated IT and information security policies, not being informed of how much of their digital footprint has been exposed online, or simply being taken advantage of. Merely posting what you are having for dinner on social media can open you up to attack from a well trained social engineer.

    [...]</blockquote>

    This is the RSS feed of the same article from the ‘source’:

    <item>
    <title>Don’t click that link! How criminals access your digital devices and what happens when they do</title>
    <link>https://brownglock.com/library/2019/02/11/dont-click-that-link-how-criminals-access-your-digital-devices-and-what-happens-when-they-do/</link>
    <comments>https://brownglock.com/library/2019/02/11/dont-click-that-link-how-criminals-access-your-digital-devices-and-what-happens-when-they-do/#respond</comments>
    <pubDate>Mon, 11 Feb 2019 06:43:30 +0000</pubDate>
    <dc:creator><![CDATA[Peter Glock]]></dc:creator>
    <category><![CDATA[Be Aware]]></category>
    <category><![CDATA[Information Security]]></category>

    <guid isPermaLink="false">https://brownglock.com/library/?p=289686</guid>
    <description><![CDATA[<img width="150" height="150" src="https://brownglock.com/library/wp-content/uploads/sites/5/2017/10/GettyImages-85595134-800x509-e1517235387640-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="float: left; margin-right: 5px;" link_thumbnail="" />Thinking of sharing some snippet of your life on social media? […] According to insurance claim data of businesses based in the UK, over 66% of cyber incidents are caused by employee error. Although the data attributes only 3% of these attacks to social engineering, our experience suggests the majority of these attacks would have started this way....]]></description>
    <content:encoded><![CDATA[<img width="150" height="150" src="https://brownglock.com/library/wp-content/uploads/sites/5/2017/10/GettyImages-85595134-800x509-e1517235387640-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="float: left; margin-right: 5px;" link_thumbnail="" /><p>Thinking of sharing some snippet of your life on social media?</p>
    <blockquote><p>[…] According to <a href="https://www.willistowerswatson.com/en/insights/2017/09/Cyber-risk-its-a-people-problem-too">insurance claim data</a> of businesses based in the UK, over 66% of cyber incidents are caused by employee error. Although the data attributes only 3% of these attacks to social engineering, our experience suggests the majority of these attacks would have started this way.</p>
    <p>For example, by employees not following dedicated IT and information security policies, not being informed of how much of their digital footprint has been exposed online, or simply being taken advantage of. Merely posting what you are having for dinner on social media can open you up to attack from a well trained social engineer.</p>
    <p>[…]</p></blockquote>
    <p><a href="https://www.google.com/url?rct=j&sa=t&url=http://theconversation.com/dont-click-that-link-how-criminals-access-your-digital-devices-and-what-happens-when-they-do-109802&ct=ga&cd=CAIyHGE3NGI4YThiYWMzODhlNDA6Y28udWs6ZW46R0I&usg=AFQjCNF1cVgSp4L1qplB6Qh6NAZkhD_GGQ" target="_blank">Original article here</a></p>]]></content:encoded>
    <wfw:commentRss>https://brownglock.com/library/2019/02/11/dont-click-that-link-how-criminals-access-your-digital-devices-and-what-happens-when-they-do/feed/</wfw:commentRss>
    <slash:comments>0</slash:comments>
    <post-id xmlns="com-wordpress:feed-additions:1">289686</post-id> </item>

    Any ideas?

  • Adam Czajczyk
    • Support Gorilla

    Hi Peter

    Thanks for checking this and getting back to me.

    This seems to confirm that the encoding breaks either while the feed is being processed on the “destination” site or right when the content is written to the database, but no later. I’ve tested the feed from the site, made sure that it validates properly, and tried importing it into my own setup with various settings, but with no luck: I couldn’t replicate this.

    I also looked for possible explanations of the issue, but apart from what we already know (that “something” is “breaking” the encoding) I didn’t find anything relevant. I’ll need some help from our developers, so I’ve already passed the case to them and am awaiting their feedback. They’ll look into it and we’ll update you here as soon as we know more.

    Kind regards,

    Adam

  • Konstantinos Xenos
    • Rubber Duck Debugger

    Peter,

    I checked the “source” site and I can see that all the characters you’re finding on the destination site are either HTML entities ( i.e.   ) or characters such as the Euro symbol.

    I’ve also run some tests on my setup, importing posts that contain the same characters, but I couldn’t reproduce the problem: everything was imported perfectly.

    Unfortunately, the ‘source’ site doesn’t contain any of those characters in its latest feed either, and since the affected posts are no longer in the current feed, I can’t re-import them on my end to cross-check against an RSS feed “identical” to the one you had been using from the source site.

    In any case, since it only affects specific characters, I’m pretty sure something is going wrong with the encoding during the import on your end, or in the WP installation in general.

    Could you please share access with me so I can take a look as well? Also, please let me know whether I may run some import tests myself to test this theory. I’ll most likely add an RSS feed from one of my own test sites containing various characters to see if they are encoded correctly.
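
    For that kind of test, a small feed that mixes literal non-ASCII characters with HTML entities is handy. The Python sketch below builds a minimal RSS 2.0 document with the standard library's `xml.etree.ElementTree` and serializes it as UTF-8. The function name, sample strings and `example.com` URLs are placeholders, and nothing here reflects what Autoblog itself requires.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("content", CONTENT_NS)

# Placeholder body mixing literal UTF-8 characters (curly quotes, em dash,
# euro sign, ellipsis, accented letter) with the equivalent HTML entities.
SAMPLE_BODY = (
    "<p>We’ve said “quoted” \u2014 €50 … café</p>"
    "<p>Entities: &#8217; &#8220;&#8221; &euro; &hellip; &nbsp;</p>"
)


def build_test_feed(site_url: str = "https://example.com") -> bytes:
    """Return a minimal RSS 2.0 feed (UTF-8 bytes) containing one test item."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Encoding test feed"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Import test for special characters"

    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = "Encoding test: We’ve got €50 \u2014 “quotes”"
    ET.SubElement(item, "link").text = f"{site_url}/encoding-test/"
    ET.SubElement(item, "guid", isPermaLink="false").text = f"{site_url}/?p=1"
    # ElementTree escapes the markup (< to &lt;, & to &amp;), a valid
    # alternative to the CDATA sections WordPress puts in its own feeds.
    ET.SubElement(item, f"{{{CONTENT_NS}}}encoded").text = SAMPLE_BODY

    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)
```

    Writing the result to a file on a test site, e.g. `open("encoding-test.xml", "wb").write(build_test_feed())`, gives a feed URL to point the import at.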

    You can send me the required information privately through our contact form at https://premium.wpmudev.org/contact/#i-have-a-different-question using this template:

    Subject: "Attn: Konstantinos Xenos"

    - Admin login (if Multisite, please provide Super Admin details):
    Admin Username:
    Admin Password:
    Login URL:

    - FTP credentials
    Hostname:
    Username:
    Password:
    Port:
    Key file (and password), if needed

    - Server admin (cPanel / Plesk)
    Username:
    Password:
    Login URL:

    - Link back to this thread for reference
    - Any other relevant URLs or information regarding the issue that was not included in this thread

    Regards,

    Konstantinos

  • Adam Czajczyk
    • Support Gorilla

    Hi Peter

    You wrote: “… did not have ‘mbstring’ enabled. I’ve now enabled it.”

    Did that help and solve the issue? If not, and you still need help with this, could you please provide us with the access credentials my colleague asked for previously?

    It seems we never received a message from you, so I’m not sure whether you didn’t send it or it somehow didn’t go through. Please follow the instructions in this post and we’ll be ready to assist you further:

    https://premium.wpmudev.org/forums/topic/autoblog-strange-characters-in-autoblog-feed#post-1376600

    Best regards,

    Adam
