|
I checked http://meta.osqa.net/questions/4080/how-can-i-import-data-from-another-question-and-answer-script to no avail. I imported users with the previously mentioned script but questions are taking too slow. AskAction(user=user).save(data=qdata) I can add 1 question in 1.1 second with the script. Things I tried:
Currently my best speed is ~0.25 seconds per question adding. How can I add 300K+ questions, fast, I am thinking directly to the database without python/django/osqa? Current speed of 4 questions per second limit my testing and explorations. I have messed around in the python codes but I am new to python, so I could not figure out which queries to execute. |
|
Well after digging in mysql logs, I have found that using the following steps, you can import your users, tags and questions by generating an appropriate SQL script. UsersSince there are a manageable amount of users for my site, I have used the previously mentioned AskAction(user=user).save(data=qdata) python script. TagsBefore adding questions I chose to add all of the tags belonging to all questions first. Doing tags is the easiest part. After getting your tags into a list or better a set, just use the following python script and do not worry about use counts of the tags yet. for tag in allTags:
fTags.write("INSERT INTO forum\\\\_tag(name, created\\\\_by\\\\_id, used\\\\_count) VALUES ('%s', 1, 1);\\\\n" % tag)
Note: created\\_by\\_id = 1 is for my user. You can get the used ID from auth_user table with the query: Within mysql console:
QuestionsFor the question part I will give you the order of inserts to tables and give my sample SQL script but it may not work for your input.
For each question I wrote the following to an .sql file:
After running the question script, you have to update forum\\_tag.used\\_count it will take a looong time because the way mySQL handles COUNT(*) and the way that query is. UPDATE forum_tag SET used_count = (SELECT COUNT(*) FROM forum_node_tags WHERE tag_id = forum_tag.id); If you notice anything odd please leave a comment. 1
Forgot to update last activity pointers. UPDATE forum_node SET last_activity_by_id = author_id, last_activity_at = added_at;
(12 Sep '11, 16:01)
kerem
Nice work with the SQL tracking :) I wrote python functions for adding questions, answers, and comments based on this in this thread: http://meta.osqa.net/questions/10600/how-do-i-bulk-upload-users-and-questionsanswers-data-into-osqa Question: http://pastebin.com/eeq4hPFR Answer: http://pastebin.com/UgY9Wvd2 Comment: http://pastebin.com/LUV08HXi
(17 Feb, 07:41)
dahlo
|
|
If you are trying to manage 300K+ questions in a serious endeavor, then you almost certainly want to talk with DZone about using our enterprise-scale product, "Qato" instead of our entry-level OSQA product. Send a note to sales@dzone.com describing what you want to accomplish, and one of our team members will get back in touch with you right away. 1
I am just evaluating, I would like to use the open source OSQA database for this evaluation and not qato but thanks for your recommendation.
(04 Sep '11, 08:59)
kerem
@karem, best of luck in your exploration. I'm sure you will understand that, as the creators of both OSQA and Qato, we at DZone have a good sense of what our products relative strengths and weaknesses are. For many reasons, Qato would be the right choice to operate at this scale.
(04 Sep '11, 09:10)
rickross ♦♦
|
@kerem, all meta.osqa.net postings are required to have a clear, specific question in their title field. I have edited this one for you, but please make sure you observe this rule in future postings. Thanks.