Extracting images from a non-standard feed

Hi,

I have asked for help with a feed numerous times, to no avail, so I decided to modify the plugin myself so that I can extract the images from a non-standard feed. Here is the feed:
http://www.f5haber.com/rss/magazin_haber.xml

It encapsulates the image in an " <img><![CDATA[ ... ]]></img> " tag.

What I would like to do is add the following code while the feed is being processed:
//Clean Up CDATA Start
$content = str_replace ('<![CDATA[','',$details);
$content = str_replace (']]>','',$content);
$content = str_replace ('<img>','<media:thumbnail url="',$content);
$content = str_replace ('</img>','" />',$content);
//Clean Up CDATA End

So that those tags are now converted to "<media:thumbnail>" tags and so that I can use their images.
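
An alternative to string replacement is to parse the raw feed XML and read the non-standard element directly. The following is only a minimal sketch, assuming each <item> carries its image URL as <img><![CDATA[...]]></img>; the function name and the sample XML are made up for illustration and are not part of Autoblog:

```php
<?php
// Hypothetical helper (not Autoblog code): collect the URL stored in each
// item's non-standard <img> element. Assumes the feed is well-formed XML.
function extract_img_tag_urls( $xml ) {
    // LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
    $feed = simplexml_load_string( $xml, 'SimpleXMLElement', LIBXML_NOCDATA );
    if ( false === $feed ) {
        return array(); // input could not be parsed as XML
    }
    $urls = array();
    foreach ( $feed->channel->item as $item ) {
        // A missing <img> child casts to an empty string.
        $url = trim( (string) $item->img );
        if ( '' !== $url ) {
            $urls[] = $url;
        }
    }
    return $urls;
}

// Made-up sample shaped like the feed described above.
$sample = '<rss version="2.0"><channel>'
    . '<item><title>A</title><img><![CDATA[ http://example.com/a.jpg ]]></img></item>'
    . '<item><title>B</title></item>'
    . '</channel></rss>';

print_r( extract_img_tag_urls( $sample ) );
```

The URLs collected this way could then be written back into the feed as <media:thumbnail> elements or passed on to whatever imports the images.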

My question is: how does Autoblog pre-process the feeds, and which file do I need to edit? Thanks.

  • Predrag Dubajic
    • Support

    Hi Alp,

    Hope you're doing well today.

    We do try to help with all the issues users are having, but some modifications that take longer to implement go beyond the support we offer here.

    I would need to check with our second level support dev guys about your request and see if this is something that could be implemented fairly easily.

    In the meantime you may want to check /wp-content/plugins/autoblog/autoblogincludes/addons/cacheandfeatureimages.php and /wp-content/plugins/autoblog/autoblogincludes/addons/cacheimages.php files which contain code for image import add-ons.

    Best regards,
    Predrag

  • Alp
    • Site Builder, Child of Zeus

    Hi Predrag,

    Thanks for your feedback. I have already tried that and it does not work. The reason is that the feed needs to be pre-processed before it is sent to the image extraction functions, so that those functions work the way we want.

    What I need is the file that first processes the feed, and gets the content from the feed. I'd like to try adding the code before the images are processed. Thanks for your support.

    Alp

  • Alp
    • Site Builder, Child of Zeus

    Hello again,

    What I'm trying now to do is, in plugins > autoblog > autoblogincludes > classes > autoblog > addon > image.php:

    in protected function _get_remote_images_from_content( $content );

    I'm trying to add a regex condition to the if statement to extract the images on the XML feed http://www.f5haber.com/rss/magazin_haber.xml

    Could you please ask the developer how to do that, as it's very hard to do? The images are wrapped in CDATA, line breaks and all sorts of other stuff. Thanks a lot.

  • Milan
    • WordPress Wizard

    Hello Alp

    Hope you are well today, and sorry for the delay here.

    I've asked our SLS team again to update you here. They are currently dealing with lots of issues and bugs at once, so they are a little late in replying. Hopefully they will be quick this time. :) We appreciate your patience. :)

    Cheers,
    Milan

  • Alp
    • Site Builder, Child of Zeus

    Hi Milan,

    Thanks for letting me know. The feeds that we are using have lots of images embedded in non-standard RSS tags, such as <image><url> .... </url></image>, that Autoblog cannot handle.

    Adding a regex-based method for extracting image URLs from non-standard feeds would be a fantastic feature for both me and lots of other users, because I'm sure everyone using this plugin runs into non-standard RSS feeds at some point.

    I'm looking forward to your feedback. Thanks,

    Alp

  • Adam Czajczyk
    • Support Gorilla

    Hello Alp!

    We're waiting for some suggestions from our Second Line Support team, but I think the issue is a bit more complex than just modifying Autoblog code. The plugin does not process feeds entirely "on its own". It uses some built-in WordPress feed handling tools, so the feeds go through the wp_kses() filter, but more importantly the basic processing is done by the SimplePie library.

    SimplePie is probably the most popular feed processing solution for PHP, and all feed handling in WordPress is based on it. My suggestion would be to dig a bit deeper into how SimplePie works and how it integrates with WordPress. It may turn out to be necessary to change the way it works as well.

    I realize it's not a "solution", but I hope it will point you in the right direction and help you find a way. When we get any suggestions from our SLS we'll share them with you immediately.

    Best regards,
    Adam

  • Alp
    • Site Builder, Child of Zeus

    Hi Adam,

    Thanks for the response. Yes, I'm aware of that, but what I did was:

    In autoblog > autoblogincludes > classes > addon > image.php

    Line 50, I have modified the code:

    if ( preg_match_all( '|<img.*?src=[\'"](.*?)[\'"].*?>|is', $content, $matches ) ) {
    			foreach ( $matches[1] as $url ) {
    				$url  = str_replace( ' ', '%20', current( explode( '?', $url, 2 ) ) );
    				$purl = autoblog_parse_mb_url( $url );
    				if ( ! isset( $purl['host'] ) || $purl['host'] != $siteurl['host'] && preg_match( '/[^\?]+\.(jpe?g|jpe|gif|png)\b/i', $url ) ) {
    					// we seem to have an external images
    					$images[] = $url;
    				}
    			}
    		}

    with the following:

    if ( preg_match_all( '|<img.*?src=[\'"](.*?)[\'"].*?>|is', $content, $matches ) ) {
    			foreach ( $matches[1] as $url ) {
    				$url  = str_replace( ' ', '%20', current( explode( '?', $url, 2 ) ) );
    				$purl = autoblog_parse_mb_url( $url );
    				if ( ! isset( $purl['host'] ) || $purl['host'] != $siteurl['host'] && preg_match( '/[^\?]+\.(jpe?g|jpe|gif|png)\b/i', $url ) ) {
    					// we seem to have an external images
    					$images[] = $url;
    				}
    			}
    		} elseif ( preg_match_all( '/([a-z\-_0-9\/\:\.]*\.(jpg|jpeg|png|gif))/i', $content, $matches ) ) {
    			foreach ( $matches[1] as $url ) {
    				$url  = str_replace( ' ', '%20', current( explode( '?', $url, 2 ) ) );
    				$purl = autoblog_parse_mb_url( $url );
    				if ( ! isset( $purl['host'] ) || $purl['host'] != $siteurl['host'] && preg_match( '/[^\?]+\.(jpe?g|jpe|gif|png)\b/i', $url ) ) {
    					// we seem to have an external images
    					$images[] = $url;
    				}
    			}
    		}

    The reasoning is this: if DOM does not work, the code falls back to a regex that looks for an image tag with a src attribute. What I'm trying to do is make it look for a bare URL like http:// .... .jpg when it does not find anything with a src attribute.

    This totally makes sense with SimplePie, but I could not get it to work. Can you ask your developers how to make this work? It would also be an additional feature for Autoblog itself, because this way it would be able to capture a featured image that has no "src" attribute.
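
    The bare-URL fallback described above can be tried in isolation, outside the plugin. This is only a sketch under assumptions: the function name and the sample string are hypothetical, and the pattern requires an absolute http(s) URL, which is stricter than the character class used in the elseif branch:

    ```php
    <?php
    // Hypothetical helper (not Autoblog code): find bare image URLs in a string.
    function find_bare_image_urls( $content ) {
        // An absolute http(s) URL ending in a common image extension. The
        // negated class stops at whitespace, quotes, and tag or CDATA brackets.
        $pattern = '~https?://[^\s"\'<>\[\]]+\.(?:jpe?g|png|gif)\b~i';
        if ( ! preg_match_all( $pattern, $content, $matches ) ) {
            return array();
        }
        // Drop duplicates and reindex from 0.
        return array_values( array_unique( $matches[0] ) );
    }

    // Made-up sample: one URL inside CDATA, one with a query string, one repeat.
    $sample = '<img><![CDATA[http://example.com/a.jpg]]></img> '
        . 'see http://example.com/b.PNG?w=100 and http://example.com/a.jpg';

    print_r( find_bare_image_urls( $sample ) );
    ```

    If a pattern like this matches when tested on the raw feed text but not inside the plugin, one possible explanation (a guess, not confirmed here) is that $content never contains the feed's <img> element by the time this function runs.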

  • Adam Czajczyk
    • Support Gorilla

    Hello Alp,

    Thanks for sharing this additional information. I have already passed it to our 2nd-line support team along with the previous questions. They have found some ways to alter SimplePie's operations, but it seems there's currently no way to hook this into the Autoblog plugin. Your approach is a bit different, so I'm eager to find out what they think about it.

    I'll let you know once I get a reply.

    Best regards,
    Adam

  • Panos
    • SLS

    Hello Alp,

    Sincere apologies for the delay here!

    Have you tried something like the following:
    if ( preg_match_all( '|<image[^>]*>(.*)</image>|Us', $content, $matches ) ) {

    Also, I think it's possible to use PHP DOM:

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false; // set before loading so it takes effect
    @$dom->loadHTML($content); // @ suppresses warnings about unknown tags
    //for img: $dom_images = $dom->getElementsByTagName('img');
    $dom_images = $dom->getElementsByTagName('image');
    $images = array();
    foreach ($dom_images as $image) {
      //for img: $images[] =  $image->getAttribute('src');
      $images[] = $image->nodeValue;
    }
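
    For the <image><url> .... </url></image> layout mentioned earlier in the thread, the same DOM idea can be applied to the feed as XML. This is a rough sketch only: the function name and sample XML are hypothetical, it uses loadXML() instead of loadHTML() on the assumption that the raw feed XML is available, and it is not Autoblog code:

    ```php
    <?php
    // Hypothetical helper (not Autoblog code): read <image><url>...</url></image>.
    function extract_image_element_urls( $xml ) {
        $dom = new DOMDocument();
        // loadXML() rather than loadHTML(): the input is feed XML, not a web page.
        if ( ! @$dom->loadXML( $xml ) ) {
            return array(); // input could not be parsed as XML
        }
        $urls = array();
        foreach ( $dom->getElementsByTagName( 'image' ) as $image ) {
            // Take only the <url> children, not the whole text of <image>.
            foreach ( $image->getElementsByTagName( 'url' ) as $url ) {
                $urls[] = trim( $url->textContent );
            }
        }
        return $urls;
    }

    // Made-up sample with one non-standard <image> element inside an item.
    $sample = '<rss version="2.0"><channel><item>'
        . '<image><url>http://example.com/c.png</url></image>'
        . '</item></channel></rss>';

    print_r( extract_image_element_urls( $sample ) );
    ```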

    Hope this helps!

    Kind regards,
    Panos

  • Alp
    • Site Builder, Child of Zeus

    Hi Panos,

    I have solved the issue using a different plugin. Nevertheless, I'll try both the regex and the PHP DOM code that you have supplied, just for the sake of learning. Thanks a lot for your support.

    Alp
