Is there a script that lets me bulk upload users and question-and-answer data into OSQA? If so, what formats are supported?

asked 24 Oct '11, 11:17

openbob


What I have done so far: There are two importer.py scripts:

  1. /apps/osqa/forum_modules/exporter/importer.py <- for osqa XML export (I guess)
  2. /apps/osqa/forum_modules/sximporter/importer.py <- for stack overflow export (I guess)

However, I do not know how to invoke them :-(

As a result, I followed "http://meta.osqa.net/questions/4080/how-can-i-import-data-from-another-question-and-answer-script", whose instructions are about 95% complete. I have created 4 simple scripts so that I can pre-seed my OSQA instance before it opens for business.

1) import user (imports a user test_import_user, which gets e.g. user-id=4)

cat import_user.py

from django.core.management.base import NoArgsCommand
from forum.models import User
from forum.actions import UserJoinsAction
import datetime
class Command(NoArgsCommand):
    def handle_noargs(self, **options):
        #load a user data from somewhere
        username = 'test_import_user'
        email = 'bob@openbob.com'
        #and if you want to fix membership dates
        join_date = datetime.datetime(2011, 10, 25)
        #then create the user
        user = User(username=username, email=email, date_joined=join_date)
        user.email_isvalid = True
        user.set_unusable_password()
        user.save()
        #now make sure the user is correctly created issuing the proper action
        UserJoinsAction(user=user).save()

2) add question (adds a question asked by test_import_user, user-id=4. This creates a question with e.g. question-id=6)

cat add_question.py

from django.core.management.base import NoArgsCommand
from forum.models import *
from forum.actions import *
class Command(NoArgsCommand):
    def handle_noargs(self, **options):
        #retrieve the "asker" from the database
        # "4" for test_import_user
        user = User.objects.get(id=4)
        #prepare question data
        qdata = dict(
           title = "How do I stop x?",
           text = "How do i stop x? please help me",
           tags = "how stop help",
        )
        #save the question, everything will be handled internally,
        #like creating the tags if they don't exist, etc 
        AskAction(user=user).save(data=qdata)
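
For a bulk import, the qdata dict has to be built from each source record rather than typed by hand. The following is a minimal sketch of that step only, in plain Python with no OSQA imports. The helper name build_qdata is hypothetical, and lowercasing tags and hyphenating multi-word tags are assumptions, based only on the script above passing tags as one space-separated string.

```python
def build_qdata(title, body, tags):
    """Return a dict shaped like the qdata literal in add_question.py."""
    cleaned = []
    for tag in tags:
        # Assumption: tags are space-separated, so a multi-word tag
        # is joined with hyphens to stay a single tag.
        tag = "-".join(tag.strip().lower().split())
        if tag and tag not in cleaned:
            cleaned.append(tag)
    return dict(
        title=title.strip(),
        text=body.strip(),
        tags=" ".join(cleaned),
    )


qdata = build_qdata(
    "  How do I stop x? ",
    "How do I stop x? Please help me",
    ["How", "stop", "help", "how"],
)
```

Inside handle_noargs, the resulting dict would be passed to AskAction(user=user).save(data=qdata) exactly as in the script above.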

3) add answer (creates an answer, e.g. answer-id=9, to the question with question-id=6 as test_import_user, user-id=4)

cat add_answer.py

from django.core.management.base import NoArgsCommand
from forum.models import *
from forum.actions import *

class Command(NoArgsCommand):
    def handle_noargs(self, **options):
        #retrieve the "answerer" from the database
        # "4" for test_import_user
        user = User.objects.get(id=4)
        #prepare answer data
        adata = dict(
            text = "just ask openbob",
            question = Question.objects.get(id=6),
        )
        #save the answer, everything will be handled internally
        AnswerAction(user=user).save(data=adata)

4) add accept (as the asker, user-id=4, accept the answer with answer-id=9)

cat add_accept.py

from django.core.management.base import NoArgsCommand
from forum.models import *
from forum.actions import *
class Command(NoArgsCommand):
    def handle_noargs(self, **options):
        #retrieve the "asker" from the database
        # "4" for test_import_user
        user = User.objects.get(id=4)
        # locate the answer; a management command has no HTTP request,
        # so a plain lookup is used instead of get_object_or_404
        answer = Answer.objects.get(id=9)
        # push in the acceptance
        AcceptAnswerAction(node=answer, user=user).save()

Notes:

  • These are template scripts that illustrate the principle. You will need to write your own custom scripts along the same lines to perform the ETL and bulk upload.
  • These scripts will need to be put in "apps/osqa/forum/management/commands".

To run them, use "python manage.py <script_name_without_.py>". For example:

python manage.py import_user
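
The ETL step mentioned in the notes could start from a CSV file of users. This is a minimal sketch of the parsing half only, in plain Python. The column names username, email and joined, the ISO date format, and the helper name parse_users are all assumptions. It is written for a current Python, so on the Python 2.6 stack discussed in this thread the CSV handling would need adapting.

```python
import csv
import datetime
import io


def parse_users(csv_text):
    """Turn CSV text with username,email,joined columns into a list of dicts."""
    users = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        users.append({
            "username": row["username"].strip(),
            "email": row["email"].strip(),
            # Assumption: joined is an ISO date such as 2011-10-25
            "date_joined": datetime.datetime.strptime(
                row["joined"].strip(), "%Y-%m-%d"),
        })
    return users


sample = "username,email,joined\ntest_import_user,bob@openbob.com,2011-10-25\n"
users = parse_users(sample)
```

Each dict carries the same three values that import_user.py passes to User(...), so a bulk command would loop over the list and repeat the create, save and UserJoinsAction steps for every entry.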

Hope these help.

(P.S. In an ideal world, it would be easier to get importer.py working.)

answered 25 Oct '11, 12:31

openbob

edited 25 Oct '11, 12:40

Nice, this will definitely help me do the same thing. Thank you!

(13 Feb, 09:34) dahlo

Hi there. Did you find a clever way to determine the ID of the question you want to post answers to?

I am hoping to import questions from a ticketing system (OTRS) and will need to create the original questions, answers, and comments, all based on the mails in the ticketing system.

I am thinking I'll just get a list of all the questions my 'GenericCustomer' account (which will be posting all the imported questions) has posted and take the maximum question ID, which should belong to the latest question asked (i.e. my most recently imported question).

Now that you know my life's story, did you find a better way of knowing which ID imported questions get? :)

(14 Feb, 02:52) dahlo

I will first of all import all the questions as the generic user. Then I will dump out all the questions (and their ids) from the database (see SQL below). Finally, I will add the answers and comments to the questions as I now know their corresponding question ids.

/osqa/postgresql/bin/psql -d bitnami_osqa -U bitnami -q -A -c "SELECT id,author_id,title,added_at FROM forum_node WHERE node_type = 'question' ORDER BY id"

The database password is stored in /osqa/apps/osqa/settings_local.py under DATABASES -> 'PASSWORD'.

(14 Feb, 04:10) openbob
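
The output of the psql command above can be turned into a title-to-id lookup before the answers and comments are added. This is a minimal sketch, assuming the default "|" field separator of unaligned (-A) output, a header row, and a trailing "(N rows)" footer. The function name and the sample rows are made up for illustration.

```python
def parse_question_dump(dump_text):
    """Map question title -> id from unaligned (-A) psql output."""
    title_to_id = {}
    for line in dump_text.splitlines():
        line = line.strip()
        # Skip blanks, the header row, and the "(N rows)" footer.
        if not line or line.startswith("id|") or line.startswith("("):
            continue
        # id and author_id come first and added_at comes last; the title
        # in between may itself contain "|", so split from both ends.
        qid, _author, rest = line.split("|", 2)
        title, _added_at = rest.rsplit("|", 1)
        title_to_id[title] = int(qid)
    return title_to_id


sample = """id|author_id|title|added_at
6|4|How do I stop x?|2012-02-14 04:00:00
7|4|a|b question|2012-02-14 04:01:00
(2 rows)
"""
ids = parse_question_dump(sample)
```

Two imported questions with the same title would collide in this lookup, so a real import would need an extra key such as added_at to tell them apart.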

This is really weird. I copied your add_question.py from above, and it won't run for me.

It adds a new question with the correct title and tags, but no contents. The command gives a bunch of Python messages, http://pastebin.com/KYUwvaFN , and nothing more happens. Has this ever happened to you?

(14 Feb, 11:53) dahlo

(I also changed the user id to one I had.)

(14 Feb, 11:53) dahlo

I did get some warning messages, but the script managed to add all of TITLE, TEXT and TAGS. For example: /osqa/python/lib/python2.6/site-packages/markdown/__init__.py:114: MarkdownWarning: Failed loading extension 'auto_linker' from 'markdown.extensions.auto_linker' or 'mdx_auto_linker' warnings.warn(text, MarkdownWarning)

I am using the Bitnami stack. You need to be careful about quoting special characters such as ", ., - etc.

(15 Feb, 04:38) openbob

I never got the internal question adding via AskAction etc. working, so I wrote a couple of Python functions that insert the question straight into the MySQL database. I followed the work karem did tracking SQL changes in this topic: http://meta.osqa.net/questions/10251/how-can-i-import-300k-questions

I also wrote similar functions for adding comments and answers.

Question: http://pastebin.com/eeq4hPFR

Answer: http://pastebin.com/UgY9Wvd2

Comment: http://pastebin.com/LUV08HXi

I noticed that OSQA doesn't treat \n (newline) as I had expected. This function replaces every \n in a string with <p></p> blocks.

http://pastebin.com/xTfMvTLS
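
For readers who cannot open the pastebin link, the idea can be sketched as follows. This is not the linked function, only a minimal illustration of the same approach, and the name newlines_to_paragraphs is made up here. It wraps every non-empty line in its own paragraph block and does no HTML escaping.

```python
def newlines_to_paragraphs(text):
    """Wrap each non-empty line of text in its own <p>...</p> block."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join("<p>%s</p>" % line for line in lines if line)


html = newlines_to_paragraphs("first line\n\nsecond line\n")
```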

answered 17 Feb, 07:39

dahlo

edited 17 Feb, 11:35

Asked: 24 Oct '11, 11:17

Seen: 518 times

Last updated: 17 Feb, 11:35