Time |
Nickname |
Message |
14:00
🔗
|
Schbirid |
does someone know what format you use to add dates to IA items via s3? http://archive.org/help/abouts3.txt only has an example with just a year, i have the date and time down to the minute. x-archive-meta-date:2009 |
15:27
🔗
|
godane |
i'm doing a panic download of theblaze.com |
15:28
🔗
|
godane |
i know it's not going away but i feel the need to mirror it |
15:33
🔗
|
alard |
Schbirid: I'd use "yyyy-mm-dd HH:MM:SS", as that's what they use for the publicdate/addeddate fields, e.g., http://ia601400.us.archive.org/18/items/archiveteam-mobileme-hero-2021x/archiveteam-mobileme-hero-2021x_meta.xml |
15:34
🔗
|
alard |
The s3 api probably doesn't even parse the date, just copies it to the meta.xml |
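A minimal sketch of building that header in the "yyyy-mm-dd HH:MM:SS" form alard suggests. The helper name `ia_meta_date_header` is hypothetical, and whether the S3 API validates the value at all is, as alard says, only a guess:

```python
from datetime import datetime


def ia_meta_date_header(moment: datetime) -> dict:
    """Return an x-archive-meta-date header formatted as yyyy-mm-dd HH:MM:SS.

    Hypothetical helper: the format follows alard's suggestion, not a
    documented requirement of the archive.org S3 API.
    """
    return {"x-archive-meta-date": moment.strftime("%Y-%m-%d %H:%M:%S")}


# Example value with a date and time known to the minute (seconds left at 0).
headers = ia_meta_date_header(datetime(2009, 3, 14, 15, 9))
```

The resulting dict can be merged into whatever headers the upload request already sends.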
15:56
🔗
|
Schbirid |
ok, thanks |
16:55
🔗
|
Nemo_bis |
alard or someone else: can you help me with some grab debugging? |
16:55
🔗
|
Nemo_bis |
wikitravel gives me an error 403 and I don't understand why |
16:56
🔗
|
Nemo_bis |
http://p.defau.lt/?FcCw7XBpT8MTBNo4u9CRjQ |
16:57
🔗
|
Nemo_bis |
the script is supposed to set a UA and I tried to change it but it didn't help https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py#268 |
16:57
🔗
|
Nemo_bis |
I also tried different IPs |
17:18
🔗
|
SketchCow |
Pip Pip tally ho |
17:18
🔗
|
SketchCow |
So, where are all you London Archive Team people |
17:18
🔗
|
balrog_ |
hi SketchCow |
17:20
🔗
|
SketchCow |
Someone's grabbing City of Heroes, right? |
17:29
🔗
|
godane |
i'm grabbing theblaze.com |
17:29
🔗
|
godane |
for a panic download |
17:29
🔗
|
balrog_ |
SketchCow: you mean the mmo world? how would be the best way to do that? :/ |
17:50
🔗
|
alard |
Nemo_bis: Sorry, I know nothing about the wikiteam scripts. |
17:50
🔗
|
alard |
And no, I don't think anyone is grabbing http://boards.cityofheroes.com/ |
17:51
🔗
|
alard |
They didn't mention it here, anyway. |
17:51
🔗
|
alard |
Should we be saving it? |
17:52
🔗
|
Nemo_bis |
alard: not even urllib? |
17:52
🔗
|
Nemo_bis |
can a 403 be about user-agent |
17:53
🔗
|
alard |
Nemo_bis: What is the URL you get the error on? Should we do this on another channel? |
17:53
🔗
|
alard |
403 can be about anything. |
17:56
🔗
|
Nemo_bis |
alard: it's about wikiteam so seems related, unless you join #wikiteam |
17:56
🔗
|
Nemo_bis |
alard: AFAICS it just fails on retrieving http://wikitravel.org/wiki/ar/index.php |
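One way to rule the User-Agent in or out is to build the request and inspect the header before anything is sent. This is only a sketch using Python 3's `urllib.request`, not the wikiteam script itself; the UA string and the `build_request` helper are assumptions made for illustration:

```python
import urllib.request

# Hypothetical UA string, used only for illustration.
USER_AGENT = "Mozilla/5.0 (compatible; example-archiver/0.1)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Build (but do not send) a request carrying an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("http://wikitravel.org/wiki/ar/index.php")

# urllib stores header names capitalised, so the key is "User-agent".
sent_user_agent = request.get_header("User-agent")
```

If the header is set as expected and the server still answers 403, the cause is probably elsewhere, since a 403 can be about anything.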
17:56
🔗
|
alard |
Yes, it's obviously an archiveteam topic, but I fear this could be a longer discussion. Don't want to post all the debugging stuff here. |
17:58
🔗
|
alard |
SketchCow: Is boards.cityofheroes.com a warrior candidate? It looks quite big. |
18:13
🔗
|
Schbirid |
alard: does not look that big really. <200k posts unless i dont see some boards. should be easy enough for one downloader once you figure out what to reject and how to start |
18:32
🔗
|
alard |
Schbirid: 4.4 million posts, it says at the bottom. |
18:33
🔗
|
alard |
That's probably more than Wget can comfortably handle, and it will take a while. |
18:34
🔗
|
Schbirid |
hm, i checked each forums page and the counts there were not that much |
18:41
🔗
|
alard |
I could help write a small Lua script that walks the Forum > Thread > Post structure. |
18:41
🔗
|
alard |
But I can't run it. |
19:04
🔗
|
Swizzle |
Anyone have the capabilities/scripts to import ~3,000 items into an archive.org collection for me? I've been toiling away at moving my content over, but doing it manually by copying fields is taking forever. In 6 months of work I've moved 700 items and I have about 3,000 to go |
19:05
🔗
|
Schbirid |
Swizzle: http://archive.org/help/abouts3.txt |
19:06
🔗
|
Swizzle |
Yea - I did see that earlier, but it's well over my head |
19:06
🔗
|
Nemo_bis |
or https://wiki.archive.org/twiki/bin/view/Main/IAS3BulkUploader |
19:06
🔗
|
Nemo_bis |
Swizzle: this is easy ^ |
19:06
🔗
|
Nemo_bis |
you only have to fill the fields in a spreadsheet editor and run the command |
19:06
🔗
|
Swizzle |
the bulk uploader looks better to follow - thanks - I will check it out |
19:07
🔗
|
Nemo_bis |
it also manages failures and various errors |
19:27
🔗
|
dashcloud |
another site to add to the archive pile: touchatag.net- site goes down end of the month |
20:07
🔗
|
alard |
So, any thoughts on http://boards.cityofheroes.com/ ? |
20:08
🔗
|
alard |
I've now got a Lua script that downloads things, how shall we run it? |
20:13
🔗
|
alard |
I'd say: on the warrior, assigning blocks of thread IDs. |
20:13
🔗
|
alard |
Unless there is a volunteer who can do it all, of course. :) |
20:15
🔗
|
Aragan |
Archive Team Warrior can be used by anyone who isn't savvy on archiving methods, yeah? If you guys incorporate AT Warrior into this I can put a call out to the people on the CoH forums. |
20:17
🔗
|
alard |
Yes. Although I'd be very careful with calls-to-action, since we don't want so many volunteers that we crash the site. (Known to happen when archive team gets up to speed. We have some very competitive downloading going on.) |
20:19
🔗
|
Coderjoe |
indeed |
20:19
🔗
|
Coderjoe |
and forums are fairly lightweight things compared to other kinds of sites |
20:19
🔗
|
Coderjoe |
such as photos or video |
20:20
🔗
|
alard |
It's even giving occasional errors right now, without us downloading. (Wget says "no data received".) |
20:30
🔗
|
Aragan |
Oy @_x; |
20:31
🔗
|
Aragan |
I wonder if it's related to the "forum logout bug" people have complained about for a while, or what's been going on with the forum accounts. |
20:34
🔗
|
Aragan |
When City of Heroes went free to play last year, they incorporated "paying customer" and "free customer" distinctions into the forums as well. They reserved some of the original forum features only for paying customers (anyone with a gold name on the forums). So if, for example, you had an avatar beforehand, it'll only show up whenever you were actively subscribing to the game. |
20:35
🔗
|
Aragan |
When NCSoft pulled the plug and fired Paragon Studios a few days ago, they also terminated all the related billing stuff, and it's caused people's accounts on the forums to fluctuate between "paid" and "unpaid." |
20:38
🔗
|
godane |
trs-80.org is downloaded |
20:38
🔗
|
alard |
SketchCow: You here? |
20:39
🔗
|
alard |
Aragan: Are the City of Heroes forums actually going down, or is this a pre-emptive copy? |
20:40
🔗
|
Aragan |
alard: NCSoft has only announced for certain that they are shooting for November 30 as the date when City of Heroes goes dead, but the forums may go down earlier than that. |
20:40
🔗
|
Aragan |
Paragon Studios' employees have been saying as much, they don't know how long the forums will remain up. |
20:40
🔗
|
Aragan |
Or former employees I should say. |
20:40
🔗
|
alard |
Hmm. (Thinking of a good deadline date to show in the warrior.) |
20:43
🔗
|
Aragan |
Unrelated, but Titan Network's announced that they're working on an archive project in the game itself: |
20:43
🔗
|
Aragan |
http://boards.cityofheroes.com/showthread.php?t=296586 |
20:55
🔗
|
alard |
The warrior project is almost ready. |
20:55
🔗
|
Aragan |
Awesome. :D |
20:56
🔗
|
alard |
Any idea what the highest thread ID is? |
20:56
🔗
|
Aragan |
Hrm ... |
20:56
🔗
|
* |
Aragan tries looking |
20:57
🔗
|
alard |
It's probably in the RSS feed or something, if you can't find it I'll try it next. |
20:57
🔗
|
chronomex |
you can always register and make a new thread |
20:58
🔗
|
alard |
Yes. And maybe we don't want to start archiving up to the latest thread: we could do the older threads first. |
20:58
🔗
|
Aragan |
Doesn't look like there's a thread ID higher than 297000 at the most. |
20:59
🔗
|
Aragan |
One recently made thread in the City Life forum has an ID of 296610. |
21:00
🔗
|
alard |
Okay, thanks. |
21:00
🔗
|
alard |
Do you know if the counting starts at 1? |
21:01
🔗
|
Aragan |
There's no thread with the ID value of 1 if that's what you mean. |
21:03
🔗
|
alard |
Well, I meant something like "what's the id of the oldest thread". We're going to download ranges of threads, so if we know for certain that there's nothing with an ID lower than 100000 that would help. |
21:03
🔗
|
alard |
We probably don't know that, but perhaps they've had a clean up at some point. |
21:04
🔗
|
Aragan |
Ahh ... yeah, there was a board purge early on. The earliest threads I'm seeing offhand date back to 2004. |
21:07
🔗
|
Aragan |
http://boards.cityofheroes.com/showthread.php?t=111494 |
21:07
🔗
|
Aragan |
This MIGHT be one of the earliest remaining. |
21:07
🔗
|
Aragan |
Lowest ID'd thread I've spotted so far. |
21:08
🔗
|
alard |
Ah, that's great. |
21:10
🔗
|
Aragan |
Playing "Battleship" with URLs I got this as the lowest value so far: |
21:10
🔗
|
Aragan |
http://boards.cityofheroes.com/showthread.php?t=111406 |
21:10
🔗
|
Aragan |
Nothing below that. |
21:11
🔗
|
alard |
So maybe we could start with the range 111400..250000 |
21:12
🔗
|
alard |
Then do 250000..297000 when we're done with the first range. |
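A rough sketch of how those two ranges could be cut into blocks of thread IDs for handing out. The block size of 100 and the function name `thread_id_blocks` are assumptions, not what the tracker actually uses, and each range is treated as half-open so that 250000 is not fetched twice:

```python
from typing import Iterator, Tuple


def thread_id_blocks(start: int, stop: int, block_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (first_id, last_id) pairs covering start..stop-1 in fixed-size blocks."""
    for low in range(start, stop, block_size):
        # The final block is clipped so it never runs past stop - 1.
        yield low, min(low + block_size, stop) - 1


# First pass: the older threads. Second pass: the newer ones.
first_pass = list(thread_id_blocks(111400, 250000, 100))
second_pass = list(thread_id_blocks(250000, 297000, 100))
```

Doing the older range first matches alard's earlier point that the newest threads do not need to be archived right away.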
21:16
🔗
|
Aragan |
Yeah ... if there's anything with a lower value ID I'm not coming across it. 11400 seems like a good starting point. |
21:16
🔗
|
Aragan |
Er, 111400 |
21:36
🔗
|
Coderjoe |
ugh |
21:37
🔗
|
Coderjoe |
board purge |
21:37
🔗
|
Coderjoe |
why? |
21:37
🔗
|
Coderjoe |
let's arbitrarily wipe stuff out |
21:38
🔗
|
chronomex |
yaaay |
21:38
🔗
|
chronomex |
delete ALL the things |
21:38
🔗
|
chronomex |
I'm not sure, it seems to be traditional in webforums |
21:38
🔗
|
chronomex |
maybe it's part of the php cult |
21:38
🔗
|
chronomex |
"we get too much traffic for this crappy shared host, let's delete something and see if it goes away" |
21:39
🔗
|
Coderjoe |
(also not happy about all the games needing central servers and then seeing those servers vanish. even less happy about DLC) |
21:40
🔗
|
Coderjoe |
perhaps, with the shared host, the db files are too big and they need to be made smaller before the host runs out of space. (especially in the VPS case) |
21:41
🔗
|
chronomex |
yeah that's plausible |
21:41
🔗
|
chronomex |
still not something I can approve of |
21:42
🔗
|
godane |
Coderjoe: DLC is only good when it's to fix bugs and there is a 'Game of the Year' Edition of the game 6 months later |
21:43
🔗
|
chronomex |
yes, that's when DLC is reasonable |
21:43
🔗
|
Coderjoe |
(moving to -bs) |
21:45
🔗
|
alard |
On your marks... Go! http://warriorhq.archiveteam.org/ |
21:46
🔗
|
alard |
http://tracker.archiveteam.org/cityofheroes/ |
21:46
🔗
|
chronomex |
supplementary graphs: http://zeppelin.xrtc.net/corp.xrtc.net/shilling.corp.xrtc.net/index.html#archiveteam |
21:46
🔗
|
alard |
Ah, yes. Though all graphs look very unimpressive right now. |
21:47
🔗
|
chronomex |
yes, and these ones accumulate very slowly |
21:47
🔗
|
Coderjoe |
alard: i don't understand the map. i can't seem to do anything other than look at the pretty map |
21:47
🔗
|
alard |
Coderjoe: No, that's true. You can see where the warriors are, that's all for now. |
21:48
🔗
|
alard |
What would you like to see? |
21:48
🔗
|
Coderjoe |
i was expecting being able to click the project icon and go to the tracker page or a wiki page or something |
21:48
🔗
|
alard |
That's a good idea (and very easy, to). I'll make a note. |
21:48
🔗
|
alard |
*too |
22:07
🔗
|
Aragan |
alard: Would it be alright if I told people about the archive efforts over at the CoH forums? That way, if there's anything they think needs archiving beyond the forums themselves, I can pass it on to you guys. |
22:07
🔗
|
Aragan |
Or if you need more warriors for the job, etc. |
22:08
🔗
|
* |
Aragan was thinking about linking the tracker and the Archive Team Warrior program link |
22:10
🔗
|
alard |
Aragan: That's always nice, of course. |
22:10
🔗
|
Aragan |
Awesome. |
22:11
🔗
|
alard |
I've limited the number of IDs the tracker gives out to 3 per minute, so I hope we won't cause too many problems. |
22:11
🔗
|
alard |
Aragan: On the other hand, archiving is sometimes best done quietly. |
22:12
🔗
|
alard |
Are there board owners reading those posts who might not like to be copied? |
22:17
🔗
|
S[h]O[r]T |
alard any non warrior instructions? or no point since ids is so small/rate limiting |
22:26
🔗
|
Aragan |
alard: Well, all I know is what's public, and the only board owners are NCSoft/Paragon Studios themselves. |
22:26
🔗
|
Aragan |
No real private boards/forums there. |
22:34
🔗
|
alard |
S[h]O[r]T: They're the same as with Cinch.fm, make sure wget-lua works and run run-pipeline pipeline.py YOURNAME |
22:35
🔗
|
alard |
But if you fancy a bit of diy, perhaps you could download the non-thread pages. |
22:35
🔗
|
alard |
The warrior is not getting the forum index pages, just the individual threads. |
22:37
🔗
|
alard |
Aragan: Okay. Do what you think best. If you think there'll be an uproar it's probably better to wait until we're done. If you think people will like it, tell them now and ask them to join. |
22:46
🔗
|
alard |
http://boards.cityofheroes.com/showpost.php?p=4369390&postcount=7 |
22:51
🔗
|
Aragan |
Alright. |
22:51
🔗
|
Aragan |
I'm thinking people will be mostly supportive of this effort, since the City of Heroes forums have a lot of fan works posted here. Things like character backstories, fan fiction, etc. |
23:44
🔗
|
Nintendud |
oh cool. another thing to watch my warrior do. |
23:50
🔗
|
SketchCow |
Bing hello |
23:51
🔗
|
Nintendud |
hello. |
23:56
🔗
|
Aragan |
Yo, SketchCow. |
23:56
🔗
|
Aragan |
There's a player vigil on City of Heroes taking place right now; I'm gonna be joining it, though I'm probably going to get BSoD'd or have a system lockup (video card problems since June 5th). I'll be back later when it's over. |