#archiveteam 2012-09-03,Mon


Time Nickname Message
14:00 🔗 Schbirid does someone know what format you use to add dates to IA items via s3? http://archive.org/help/abouts3.txt only has an example with one year, i have the date and time to minutes. x-archive-meta-date:2009
15:27 🔗 godane i'm doing a panic download of theblaze.com
15:28 🔗 godane i know its not going anyway but i feel the need to mirror it
15:33 🔗 alard Schbirid: I'd use "yyyy-mm-dd HH:MM:SS", as that's what they use for the publicdate/addeddate fields, e.g., http://ia601400.us.archive.org/18/items/archiveteam-mobileme-hero-2021x/archiveteam-mobileme-hero-2021x_meta.xml
15:34 🔗 alard The s3 api probably doesn't even parse the date, just copies it to the meta.xml
15:56 🔗 Schbirid ok, thanks
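
A minimal sketch of alard's suggestion above, assuming the value is passed through unchanged (alard only guesses that the s3 api does not parse it). The header name `x-archive-meta-date` comes from Schbirid's message; the helper name `ia_date_header` and the sample timestamp are hypothetical.

```python
from datetime import datetime


def ia_date_header(when: datetime) -> dict:
    """Build an x-archive-meta-date header in 'yyyy-mm-dd HH:MM:SS' form."""
    # Same layout alard points to for the publicdate/addeddate fields.
    return {"x-archive-meta-date": when.strftime("%Y-%m-%d %H:%M:%S")}


# Hypothetical item date known to the minute; seconds default to 00.
headers = ia_date_header(datetime(2009, 3, 14, 15, 9))
print(headers)
```

A date known only to the minute still fits the format, since the seconds field is simply zero.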
16:55 🔗 Nemo_bis alard or someone else: can you help me with some grab debugging?
16:55 🔗 Nemo_bis wikitravel gives me an error 403 and I don't understand why
16:56 🔗 Nemo_bis http://p.defau.lt/?FcCw7XBpT8MTBNo4u9CRjQ
16:57 🔗 Nemo_bis the script is supposed to set a UA and I tried to change it but it didn't help https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py#268
16:57 🔗 Nemo_bis I also tried different IPs
17:18 🔗 SketchCow Pip Pip tally ho
17:18 🔗 SketchCow So, where are all you London Archive Team people
17:18 🔗 balrog_ hi SketchCow
17:20 🔗 SketchCow Someone's grabbing City of Heroes, right?
17:29 🔗 godane i'm grabbing theblaze.com
17:29 🔗 godane for a panic download
17:29 🔗 balrog_ SketchCow: you mean the mmo world? how would be the best way to do that? :/
17:50 🔗 alard Nemo_bis: Sorry, I know nothing about the wikiteam scripts.
17:50 🔗 alard And no, I don't think anyone is grabbing http://boards.cityofheroes.com/
17:51 🔗 alard They didn't mention it here, anyway.
17:51 🔗 alard Should we be saving it?
17:52 🔗 Nemo_bis alard: not even urllib?
17:52 🔗 Nemo_bis can a 403 be about user-agent
17:53 🔗 alard Nemo_bis: What is the URL you get the error on? Should we do this on another channel?
17:53 🔗 alard 403 can be about anything.
17:56 🔗 Nemo_bis alard: it's about wikiteam so seems related, unless you join #wikiteam
17:56 🔗 Nemo_bis alard: AFAICS it just fails on retrieving http://wikitravel.org/wiki/ar/index.php
17:56 🔗 alard Yes, it's obviously an archiveteam topic, but I fear this could be a longer discussion. Don't want to post all the debugging stuff here.
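
One way to check whether a 403 depends on the user agent is to send the same request with different `User-Agent` values and compare status codes. This is only a sketch using Python 3's `urllib.request`. It is not the dumpgenerator.py code, and the helper names and the sample agent string are hypothetical. As alard says, a 403 can have many other causes.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(req):
    """Return the HTTP status code, including error codes such as 403."""
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is still available here.
        return err.code


# URL taken from Nemo_bis's message; the agent string is made up.
req = build_request("http://wikitravel.org/wiki/ar/index.php", "ExampleBot/0.1")
```

Calling `fetch_status(req)` with two or three different agent strings shows whether the server's answer changes with the header.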
17:58 🔗 alard SketchCow: Is boards.cityofheroes.com a warrior candidate? It looks quite big.
18:13 🔗 Schbirid alard: does not look that big really. <200k posts unless i dont see some boards. should be easy enough for one downloader once you figure out what to reject and how to start
18:32 🔗 alard Schbirid: 4.4 million posts, it says at the bottom.
18:33 🔗 alard That's probably more than Wget can comfortably handle, and it will take a while.
18:34 🔗 Schbirid hm, i checked each forums page and the counts there were not that much
18:41 🔗 alard I could help write a small Lua script that walks the Forum > Thread > Post structure.
18:41 🔗 alard But I can't run it.
19:04 🔗 Swizzle Anyone have the capabilities/scripts to import ~3,000 items into an archive.org collection for me? I've been toiling away at moving my content over, but doing it manually by copying fields is taking forever. In 6 months of work I've moved 700 items and I have about 3,000 to go
19:05 🔗 Schbirid Swizzle: http://archive.org/help/abouts3.txt
19:06 🔗 Swizzle Yea - I did see that earlier, but it's well over my head
19:06 🔗 Nemo_bis or https://wiki.archive.org/twiki/bin/view/Main/IAS3BulkUploader
19:06 🔗 Nemo_bis Swizzle: this is easy ^
19:06 🔗 Nemo_bis you only have to fill the fields in a spreadsheet editor and run the command
19:06 🔗 Swizzle the bulk uploader looks better to follow - thanks - I will check it out
19:07 🔗 Nemo_bis it also manages failures and various errors
19:27 🔗 dashcloud another site to add to the archive pile: touchatag.net- site goes down end of the month
20:07 🔗 alard So, any thoughts on http://boards.cityofheroes.com/ ?
20:08 🔗 alard I've now got a Lua script that downloads things, how shall we run it?
20:13 🔗 alard I'd say: on the warrior, assigning blocks of thread IDs.
20:13 🔗 alard Unless there is a volunteer who can do it all, of course. :)
20:15 🔗 Aragan Archive Team Warrior can be used by anyone who isn't savvy on archiving methods, yeah? If you guys incorporate AT Warrior into this I can put a call out to the people on the CoH forums.
20:17 🔗 alard Yes. Although I'd be very careful with calls-to-action, since we don't want so many volunteers that we crash the site. (Known to happen when archive team gets up to speed. We have some very competitive downloading going on.)
20:19 🔗 Coderjoe indeed
20:19 🔗 Coderjoe and forums are fairly lightweight things compared to other kinds of sites
20:19 🔗 Coderjoe such as photos or video
20:20 🔗 alard It's even giving occasional errors right now, without us downloading. (Wget says "no data received".)
20:30 🔗 Aragan Oy @_x;
20:31 🔗 Aragan I wonder if it's related to the "forum logout bug" people have complained about for a while, or what's been going on with the forum accounts.
20:34 🔗 Aragan When City of Heroes went free to play last year, they incorporated "paying customer" and "free customer" distinctions into the forums as well. They reserved some of the original forum features only for paying customers (anyone with a gold name on the forums). So if, for example, you had an avatar beforehand, it'll only show up whenever you were actively subscribing to the game.
20:35 🔗 Aragan When NCSoft pulled the plug and fired Paragon Studios a few days ago, they also terminated all the related billing stuff, and it's caused people's accounts on the forums to fluctuate between "paid" and "unpaid."
20:38 🔗 godane trs-80.org is downloaded
20:38 🔗 alard SketchCow: You here?
20:39 🔗 alard Aragan: Are the City of Heroes forums actually going down, or is this a pre-emptive copy?
20:40 🔗 Aragan alard: NCSoft has only announced for certain that they are shooting for November 30 as the date when City of Heroes goes dead, but the forums may go down earlier than that.
20:40 🔗 Aragan Paragon Studios' employees have been saying as much, they don't know how long the forums will remain up.
20:40 🔗 Aragan Or former employees I should say.
20:40 🔗 alard Hmm. (Thinking of a good deadline date to show in the warrior.)
20:43 🔗 Aragan Unrelated, but Titan Network's announced that they're working on an archive project in the game itself:
20:43 🔗 Aragan http://boards.cityofheroes.com/showthread.php?t=296586
20:55 🔗 alard The warrior project is almost ready.
20:55 🔗 Aragan Awesome. :D
20:56 🔗 alard Any idea what the highest thread ID is?
20:56 🔗 Aragan Hrm ...
20:56 🔗 * Aragan tries looking
20:57 🔗 alard It's probably in the RSS feed or something, if you can't find it I'll try it next.
20:57 🔗 chronomex you can always register and make a new thread
20:58 🔗 alard Yes. And maybe we don't want to start archiving up to the latest thread: we could do the older threads first.
20:58 🔗 Aragan Doesn't look like there's a thread ID higher than 297000 at the most.
20:59 🔗 Aragan One recently made thread in the City Life forum has an ID of 296610.
21:00 🔗 alard Okay, thanks.
21:00 🔗 alard Do you know if the counting starts at 1?
21:01 🔗 Aragan There's no thread with the ID value of 1 if that's what you mean.
21:03 🔗 alard Well, I meant something like "what's the id of the oldest thread". We're going to download ranges of threads, so if we know for certain that there's nothing with an ID lower than 100000 that would help.
21:03 🔗 alard We probably don't know that, but perhaps they've had a clean up at some point.
21:04 🔗 Aragan Ahh ... yeah, there was a board purge early on. The earliest threads I'm seeing offhand date back to 2004.
21:07 🔗 Aragan http://boards.cityofheroes.com/showthread.php?t=111494
21:07 🔗 Aragan This MIGHT be one of the earliest remaining.
21:07 🔗 Aragan Lowest ID'd thread I've spotted so far.
21:08 🔗 alard Ah, that's great.
21:10 🔗 Aragan Playing "Battleship" with URLs I got this as the lowest value so far:
21:10 🔗 Aragan http://boards.cityofheroes.com/showthread.php?t=111406
21:10 🔗 Aragan Nothing below that.
21:11 🔗 alard So maybe we could start with the range 111400..250000
21:12 🔗 alard Then do 250000..297000 when we're done with the first range.
21:16 🔗 Aragan Yeah ... if there's anything with a lower value ID I'm not coming across it. 11400 seems like a good starting point.
21:16 🔗 Aragan Er, 111400
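
The two-pass plan above (111400..250000 first, then 250000..297000) can be sketched as splitting each range into blocks of thread IDs to hand out. The block size of 1000 and the function names are assumptions for illustration. The log does not say how the tracker actually divides the work.

```python
def id_blocks(start, stop, size):
    """Yield (first, last) inclusive ID blocks covering start..stop-1."""
    for first in range(start, stop, size):
        yield first, min(first + size, stop) - 1


def thread_url(thread_id):
    """URL pattern as seen in the thread links posted in the channel."""
    return f"http://boards.cityofheroes.com/showthread.php?t={thread_id}"


# Hypothetical block size of 1000 IDs per work item.
first_pass = list(id_blocks(111400, 250000, 1000))
second_pass = list(id_blocks(250000, 297000, 1000))

print(len(first_pass), len(second_pass))
print(thread_url(first_pass[0][0]))
```

Doing the older range first matches alard's point that the newest threads do not need to be archived right away.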
21:36 🔗 Coderjoe ugh
21:37 🔗 Coderjoe board purge
21:37 🔗 Coderjoe why?
21:37 🔗 Coderjoe let's arbitrarily wipe stuff out
21:38 🔗 chronomex yaaay
21:38 🔗 chronomex delete ALL the things
21:38 🔗 chronomex I'm not sure, it seems to be traditional in webforums
21:38 🔗 chronomex maybe it's part of the php cult
21:38 🔗 chronomex "we get too much traffic for this crappy shared host, let's delete something and see if it goes away"
21:39 🔗 Coderjoe (also not happy about all the games needing central servers and then seeing those servers vanish. even less happy about DLC)
21:40 🔗 Coderjoe perhaps, with the shared host, the db files are too big and they need to be made smaller before the host runs out of space. (especially in the VPS case)
21:41 🔗 chronomex yeah that's plausible
21:41 🔗 chronomex still not something I can approve of
21:42 🔗 godane Coderjoe: DLC only is good when its to fix bugs and there is a 'Game of the Year' Edition of the game 6 months later
21:43 🔗 chronomex yes, that's when DLC is reasonable
21:43 🔗 Coderjoe (moving to -bs)
21:45 🔗 alard On your marks... Go! http://warriorhq.archiveteam.org/
21:46 🔗 alard http://tracker.archiveteam.org/cityofheroes/
21:46 🔗 chronomex supplementary graphs: http://zeppelin.xrtc.net/corp.xrtc.net/shilling.corp.xrtc.net/index.html#archiveteam
21:46 🔗 alard Ah, yes. Though all graphs look very unimpressive right now.
21:47 🔗 chronomex yes, and these ones accumulate very slowly
21:47 🔗 Coderjoe alard: i don't understand the map. i can't seem to do anything other than look at the pretty map
21:47 🔗 alard Coderjoe: No, that's true. You can see where the warriors are, that's all for now.
21:48 🔗 alard What would you like to see?
21:48 🔗 Coderjoe i was expecting being able to click the project icon and go to the tracker page or a wiki page or something
21:48 🔗 alard That's a good idea (and very easy, to). I'll make a note.
21:48 🔗 alard *too
22:07 🔗 Aragan alard: Would it be alright if I told people about the archive efforts over at the CoH forums? That way, if there's anything they think needs archiving beyond the forums themselves, I can pass it on to you guys.
22:07 🔗 Aragan Or if you need more warriors for the job, etc.
22:08 🔗 * Aragan was thinking about linking the tracker and the Archive Team Warrior program link
22:10 🔗 alard Aragan: That's always nice, of course.
22:10 🔗 Aragan Awesome.
22:11 🔗 alard I've limited the number of IDs the tracker gives out to 3 per minute, so I hope we won't cause too many problems.
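
A limit like "3 IDs per minute" can be sketched as a sliding-window counter. This is not the tracker's actual implementation, which the log does not show. The class name, its parameters, and the injectable clock are all hypothetical.

```python
import time
from collections import deque


class PerMinuteLimiter:
    """Allow at most `limit` hand-outs within any `window` seconds."""

    def __init__(self, limit=3, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.issued = deque()   # timestamps of recent hand-outs

    def try_issue(self):
        """Return True and record the hand-out if the limit allows it."""
        now = self.clock()
        # Forget hand-outs that have left the window.
        while self.issued and now - self.issued[0] >= self.window:
            self.issued.popleft()
        if len(self.issued) < self.limit:
            self.issued.append(now)
            return True
        return False
```

A warrior that gets `False` would simply ask again later, which keeps the total load on the forum bounded no matter how many volunteers join.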
22:11 🔗 alard Aragan: On the other hand, archiving is sometimes best done quietly.
22:12 🔗 alard Are there board owners reading those posts who might not like to be copied?
22:17 🔗 S[h]O[r]T alard any non warrior instructions? or no point since ids is so small/rate limiting
22:26 🔗 Aragan alard: Well, all I know is what's public, and the only board owners are NCSoft/Paragon Studios themselves.
22:26 🔗 Aragan No real private boards/forums there.
22:34 🔗 alard S[h]O[r]T: They're the same as with Cinch.fm, make sure wget-lua works and run run-pipeline pipeline.py YOURNAME
22:35 🔗 alard But if you fancy a bit of diy, perhaps you could download the non-thread pages.
22:35 🔗 alard The warrior is not getting the forum index pages, just the individual threads.
22:37 🔗 alard Aragan: Okay. Do what you think best. If you think there'll be an uproar it's probably better to wait until we're done. If you think people will like it, tell them now and ask them to join.
22:46 🔗 alard http://boards.cityofheroes.com/showpost.php?p=4369390&postcount=7
22:51 🔗 Aragan Alright.
22:51 🔗 Aragan I'm thinking people will be mostly supportive of this effort, since the City of Heroes forums have a lot of fan works posted here. Things like character backstories, fan fiction, etc.
23:44 🔗 Nintendud oh cool. another thing to watch my warrior do.
23:50 🔗 SketchCow Bing hello
23:51 🔗 Nintendud hello.
23:56 🔗 Aragan Yo, SketchCow.
23:56 🔗 Aragan There's a player vigil on City of Heroes taking place right now; I'm gonna be joining it, though I'm probably going to get BSoD'd or have a system lockup (video card problems since June 5th). I'll be back later when it's over.
