OSQA accepts image files for display as part of an article. By default, the files are stored in the upfiles folder.

When an article that contains images is deleted, its image files remain in the upfiles folder. Is there a tool or script that can garbage-collect them?

Thanks,

Ted

asked 28 May '10, 00:44

Ted Wada

edited 30 May '10, 09:36

rickross ♦♦

Tagged feature-request.

(28 May '10, 18:50) ripper234 ♦

The problem with deleting those files automatically is that the article may not be the only place that references an image. We have no reasonable way to track whether it is, so deleting the file could break other articles.
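One way that tracking could be done on demand is sketched below. This is an assumption, not something OSQA provides. The idea is to rescan every post, record which upload file names each post references, and treat a file as deletable only when the post being removed is its sole referrer. The `posts` mapping (post id to the set of referenced file names) is hypothetical. Building it means extracting image references from every post body, which is what the script in the later answer does for `<img>` tags.

```python
from collections import Counter


def reference_counts(posts):
    """Count how many posts reference each file name.

    posts: hypothetical mapping of post id -> iterable of referenced upload file names.
    """
    counts = Counter()
    for names in posts.values():
        # A post counts once per file, however many times it embeds it.
        counts.update(set(names))
    return counts


def exclusively_referenced(posts, post_id):
    """Return the files referenced by post_id and by no other post."""
    counts = reference_counts(posts)
    return {name for name in posts[post_id] if counts[name] == 1}


posts = {1: {"a.png", "b.png"}, 2: {"b.png"}, 3: set()}
print(sorted(exclusively_referenced(posts, 1)))  # ['a.png']
```

The counts are rebuilt from the current posts on each call, so there is no separate reference table to keep in sync. The trade-off is a full scan for every deletion. References the scan never sees, such as other tables or external sites linking to the file, are still invisible, which is the risk described above.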

This is probably a rare case anyway. One option would be to detect locally uploaded images in the article being deleted and ask the user to confirm whether those files should be deleted too.
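The detect-and-confirm idea could be sketched in Python 3 as follows. The sketch makes several assumptions. Locally uploaded images are taken to be URLs whose path contains `/upfiles/` (the default folder; `UPLOAD_PREFIX` is a hypothetical setting). A body may contain either HTML `<img src="...">` tags or Markdown `![alt](url "title")` images. Both are matched with regular expressions, which is approximate rather than a full HTML or Markdown parse. `confirm_deletion` and its `ask` parameter are illustrative names.

```python
import re

# Assumption: local uploads are referenced by a URL path containing this prefix.
UPLOAD_PREFIX = "/upfiles/"

# Approximate patterns for <img ... src="..."> and Markdown ![alt](url "title").
_HTML_SRC = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
_MD_IMG = re.compile(r'!\[[^\]]*\]\(\s*<?([^)\s>]+)')


def local_uploads_in(body):
    """Return the file names of locally uploaded images referenced in body."""
    names = set()
    for url in _HTML_SRC.findall(body) + _MD_IMG.findall(body):
        # Drop any query string or fragment before taking the file name.
        path = url.split("?", 1)[0].split("#", 1)[0]
        if UPLOAD_PREFIX in path:
            names.add(path.rsplit("/", 1)[-1])
    return names


def confirm_deletion(names, ask=input):
    """Ask about each candidate file and return the ones the user approves."""
    approved = set()
    for name in sorted(names):
        answer = ask("Also delete uploaded image %s? [y/N] " % name)
        if answer.strip().lower() == "y":
            approved.add(name)
    return approved
```

In practice, the candidate set passed to `confirm_deletion` would first be narrowed to files that no other post references, so the user is only asked about images that deleting this article would actually orphan.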

answered 30 May '10, 09:35

rickross ♦♦

I have written a simple script that tries to garbage-collect images. It uses psycopg2, but it can easily be tweaked to work with an XML dump instead. Run it from inside the upfiles folder. It only prints the images it finds that are not referenced in any post; it does not delete anything.

The limitation is that it only checks the forum_node table for image references. If other tables also need to be checked, please tell me which ones.
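If more tables turn out to matter, the single `SELECT body FROM forum_node` query could become a loop over (table, column) pairs. The sketch below assumes a DB-API cursor such as psycopg2's. `TEXT_COLUMNS` and `iter_text_values` are hypothetical names, and every entry other than forum_node.body is a placeholder to fill in from your own schema. Table and column names cannot be passed as query parameters, so they are checked against a strict identifier pattern before being formatted into the SQL.

```python
import re

# Hypothetical list of (table, column) pairs that may contain image references.
TEXT_COLUMNS = [
    ("forum_node", "body"),
    # ("some_other_table", "some_text_column"),  # placeholder: check your schema
]

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def iter_text_values(cursor, columns=TEXT_COLUMNS):
    """Yield every non-empty text value from the given (table, column) pairs."""
    for table, column in columns:
        # Identifiers are formatted into the query, so reject anything unusual.
        if not (_IDENT.fullmatch(table) and _IDENT.fullmatch(column)):
            raise ValueError("unsafe identifier: %r.%r" % (table, column))
        cursor.execute("SELECT %s FROM %s" % (column, table))
        for (value,) in cursor:
            if value:  # skip NULL and empty values
                yield value
```

Each yielded value could then go through the same image extraction as the bodies from forum_node, for example `for text in iter_text_values(cur): collect_images(images, text)`.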

Use this at your own risk.

```python
#!/usr/bin/env python2.7
# Lists upload files that no post body references. Run it from inside the
# upfiles folder. It only prints its findings; it does not delete anything.
# The __future__ import and the HTMLParser fallback also let it run on Python 3.
from __future__ import print_function

import os

import psycopg2

try:
    from HTMLParser import HTMLParser   # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3

credentials = {'db_name': 'your_db_name',
               'db_username': 'your_db_login_name',
               'db_password': 'your_db_login_pass'}
# Files you know you want to keep...
ignore = set(['dump.xml', 'favicon.ico', 'logo.png', 'README'])

images = set()   # file names referenced by <img src="..."> in post bodies
uploads = set()  # file names found in the upload folder


def collect_uploads(uploads):
    # Regular, non-hidden files only; subdirectories are skipped.
    for f in os.listdir('.'):
        if not f.startswith('.') and os.path.isfile(f):
            uploads.add(f)


class MyHTMLParser(HTMLParser):

    def __init__(self, images):
        HTMLParser.__init__(self)
        self.images = images

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        for name, value in attrs:
            if name == 'src' and value:
                # Keep only the file name: "/upfiles/a.png?v=1" -> "a.png"
                self.images.add(value.split('?')[0].split('/')[-1])


def collect_images(images, body):
    parser = MyHTMLParser(images)
    parser.feed(body)
    parser.close()


# Use this instead of psycopg2 if you have a dump.xml of your database's table.
#from xml.etree.ElementTree import ElementTree
#tree = ElementTree()
#tree.parse("dump.xml")
#for row in tree.findall('records/row'):
#    for c in row.findall('column'):
#        if c.get('name') == 'body' and c.text:
#            collect_images(images, c.text)

conn = psycopg2.connect(dbname=credentials['db_name'],
                        user=credentials['db_username'],
                        password=credentials['db_password'],
                        host='localhost')
cur = conn.cursor()
cur.execute("SELECT body FROM forum_node")
print(cur.rowcount, 'posts scanned')
for (body,) in cur:
    if body:  # skip NULL or empty bodies
        collect_images(images, body)
cur.close()
conn.close()

collect_uploads(uploads)

print('to delete:')
print(sorted(uploads - images - ignore))

print('to keep:')
print(sorted(images | ignore))
```
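The script deliberately stops at printing. As the first answer points out, some references may live where the script does not look. A cautious next step is therefore to move the listed files aside instead of deleting them, so that any broken image can be restored by moving its file back. The following is a sketch that runs under Python 2 or 3. The function name, the `../upfiles_quarantine` folder, and the dry-run default are assumptions for illustration, not part of OSQA or of the script above.

```python
import os
import shutil


def quarantine(orphans, upload_dir=".", quarantine_dir="../upfiles_quarantine",
               dry_run=True):
    """Move orphaned upload files into quarantine_dir instead of deleting them.

    quarantine_dir is a hypothetical location outside upfiles, so moved files are
    neither served nor listed again on the next run. Returns the (source,
    destination) pairs; nothing is moved while dry_run is True.
    """
    moves = []
    for name in sorted(orphans):
        # Only bare file names are expected; refuse anything with a path part.
        if os.path.basename(name) != name:
            raise ValueError("unexpected file name: %r" % name)
        src = os.path.join(upload_dir, name)
        if not os.path.isfile(src):
            continue  # already gone, or not a regular file
        dst = os.path.join(quarantine_dir, name)
        moves.append((src, dst))
        if not dry_run:
            if not os.path.isdir(quarantine_dir):
                os.makedirs(quarantine_dir)
            shutil.move(src, dst)
    return moves
```

For example, after the script has filled its sets, `quarantine(uploads.difference(images).difference(ignore))` would list what would be moved, and passing `dry_run=False` would move those files.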

answered 09 Aug '11, 03:07

mgiann

This deserves to be an answer in its own right, so I've converted it.

(09 Aug '11, 06:27) Andrew_S ♦

Asked: 28 May '10, 00:44

Seen: 623 times

Last updated: 07 Oct '11, 04:52
