#archiveteam 2011-11-28,Mon


Time Nickname Message
00:36 dnova can I begin rsyncing splinder while still downloading?
00:37 chronomex yes
00:37 dnova ok I need to get on that asap. gotta catch sketchcow for a slot?
00:37 chronomex indeed
00:37 dnova thanks
01:38 SketchCow BACK
01:38 Coderjoe dnova: use the upload script
01:39 dnova # (ask SketchCow for a module name)
01:39 dnova lol
01:40 Coderjoe i know. I meant when you do get a module name, use the upload script
01:40 dnova SketchCow: I want to start uploading my splinders
01:40 dnova Coderjoe: mos def
02:16 underscor http://i.imgur.com/1fcec.png
02:20 chronomex hah
02:26 BlueMax *facepalm*
02:28 Coderjoe never put dicks in your ears
02:58 RedType Coderjoe: you think cleaning out earwax is hard?
03:08 SketchCow Hello, everyone.
03:08 SketchCow There are two reporters, Eva Talmadge and Matt/Matthias Schwartz, trying to do a story on Archive Team.
03:08 SketchCow Please do not talk to them.
03:08 SketchCow Let's put that in the lines.
03:08 SketchCow --------------------------------------
03:08 SketchCow Hello, everyone.
03:08 SketchCow There are two reporters, Eva Talmadge and Matt/Matthias Schwartz, trying to do a story on Archive Team.
03:08 SketchCow Let's put that in the lines.
03:08 SketchCow Please do not talk to them.
03:08 SketchCow --------------------------------------
03:25 kennethre SketchCow: Channel Topic, perhaps?
03:28 SketchCow I expect some people will ignore.
03:28 SketchCow But I did want to say it.
03:31 db48x out of curiosity, what's your reasoning there?
03:51 SketchCow http://www.mattathiasschwartz.com/
03:51 SketchCow Go read the other articles
03:51 SketchCow tell me how we'll fare.
03:58 godane it looks like there is no way to simply turn a wikipedia dump into a wikipedia website
03:59 godane are there any tools you guys use to read wiki dumps like a full index website?
04:12 chronomex godane: what's the goal?
04:34 dashcloud someone at some point in this channel asked for a copy of Coming Soon (online magazine) (www.csoon.com) - I tried to use wget-warc to make a copy of it
04:38 Paradoks dashcloud: Any idea what it'd require to verify that you did things correctly? I'd love to help, but know very little about wget-warc.
04:41 dashcloud here's the command I used to grab it: http://pastebin.com/Yzzw28ep, and the site's still up, minus 10-20 pages
04:43 dashcloud I'm short on time right now, but I'm happy to send over my copy tomorrow
04:44 Paradoks Cool. I'll take a stab at it if no one else more qualified steps forward.
04:44 dashcloud the only other thing I think you need is to make sure all the directories mentioned in the command exist
04:45 dashcloud (i.e. don't rely on wget to create them)
04:45 dashcloud good night folks!
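Note on dashcloud's advice above: the directories named in the grab command can be created up front rather than left to wget. The following is only a minimal sketch of that idea. The actual command lives in the pastebin link and is not reproduced here, so the function name `ensure_directories` and any paths passed to it are hypothetical placeholders.

```python
import os


def ensure_directories(paths):
    """Create each directory in `paths` if it is missing.

    Returns the list of directories that had to be created, so the
    caller can see what was absent before the grab started.
    The paths are assumptions; substitute the ones from the real command.
    """
    created = []
    for path in paths:
        if not os.path.isdir(path):
            # exist_ok guards against a race with another process
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

Running it a second time with the same list returns an empty list, which is a quick way to confirm everything is in place before starting wget.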
05:19 godane chronomex: was trying to host a local lan version of wikipedia
05:20 chronomex ah, hm.
05:20 chronomex wow this matt guy is really artsy-fartsy with his writing
05:55 NotGLaDOS Can't we just throw them off instead?
05:59 SketchCow Today has been catch-up day.
05:59 NotGLaDOS Also, have you got that rsync slot set up for me?
06:01 SketchCow I can do that.
06:06 Zebranky While you're around, I'd like to throw out an "archive.org was the only source for an extremely helpful page" testimonial, as if you needed more
06:06 Zebranky So much good for the Internet.
06:07 NotGLaDOS note: this is archiveteam.org. We only have access to archive.org, we don't run it.
06:08 chronomex and by access we mean we have no more access than anyone else with an account
06:08 chronomex for the most part
06:08 NotGLaDOS What he said.
06:14 Zebranky I know. That was directed at SketchCow.
06:15 Zebranky Since this is a convenient way to throw quick thoughts at him
06:23 NotGLaDOS He's the same as what we said: only has member access.
06:39 Zebranky Fair enough. My understanding was that he worked a bit closer with them.
08:47 kin37ik so, google buzz is going down soonish i hear
08:50 yipdw that's the buzz
10:08 BlueMax lol
10:37 emijrp sharing info about the libraries damaged by hurricane irene through facebook pages (see last 3 paragraphs), looks like a long term solution hell yeah http://www.librarian.net/stax/3652/helping-libraries-damaged-by-hurricane-irene/
10:38 emijrp webcite allows uploading link batches http://www.webcitation.org/comb
10:38 emijrp it is very useful to archive tons
10:39 emijrp Archive-It tool from Internet Archive is not free, so, in this case, IA sucks
10:40 BlueMax :/
10:43 emijrp metadata for 15000+ knols complete
10:44 emijrp the channel is #klol
11:15 emijrp trying to archive all AT wiki using the webcite comb www.webcitation.org/comb
11:17 emijrp clicked the submit button, but the process is slow... waiting
11:19 db48x you put 15000 urls into www.webcitation.org/comb?
11:19 emijrp no
11:19 emijrp 15000 metadata from knols downloaded, really 20000 now
11:20 emijrp the webcite submit is this http://www.archiveteam.org/index.php?title=Template:Navigation_box
11:20 emijrp less than 100 links
11:20 emijrp it scrapes all the links, you checkbox the desired links and archive
11:20 db48x ooh
11:21 db48x that makes more sense
11:22 emijrp by the way, uploading knol link batches to webcite is a choice
11:23 emijrp just downloading all knols, tar gzip and upload to IA is shit
11:23 emijrp most AT projects are not viewable
11:23 emijrp just huge packs
11:31 db48x yea, given their size that's been the easiest way to go
11:43 underscor <NotGLaDOS> He's the same as what we said: only has member access.
11:43 underscor Actually, he's a full admin, iirc
12:23 emijrp ~25k metadata chunk for your tests http://www.sendspace.com/file/o8fthv
12:23 emijrp tab delimited
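Note on the tab-delimited metadata chunk above: a file like that can be loaded with Python's standard `csv` module. This is a hedged sketch only. The log does not say what columns the file has or whether it has a header row, so rows are returned as plain lists, and the function name `read_tab_delimited` and the UTF-8 encoding are assumptions.

```python
import csv


def read_tab_delimited(path, encoding="utf-8"):
    """Read a tab-delimited file into a list of rows (lists of strings).

    No column names are assumed, because the layout of the knol
    metadata file is not described in the log. Quote characters are
    treated as ordinary text, which suits scraped titles.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]
```

A quick sanity check after loading is to compare `len(rows)` with the expected record count and to look for rows whose field count differs from the first row.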
13:54 NotGLaDOS underscor: interesting.
15:06 SketchCow Brp
17:47 emijrp how many sites have closed this year?
17:48 Schbirid milliona
17:48 Schbirid s
17:51 tef emijrp: check out the wiki for the deathwatch pages
17:51 emijrp i mean, i feel that this year has been very bad
17:52 tef it can only get worse
18:18 emijrp SketchCow: why doesn't IA set up anything like this to allow people to transcribe books? https://es.wikisource.org/w/index.php?title=P%C3%A1gina:Plat%C3%B3n_-_La_Rep%C3%BAblica_%281805%29,_Tomo_1.djvu/322&action=edit&redlink=1 IA OCR is worst ever
18:18 Schbirid that would be really cool
18:20 ersi emijrp: that text made little sense
18:21 emijrp text on the left is OCR autofill, later a person rewrites the phrases that need it
18:23 emijrp a corrected page is this https://es.wikisource.org/wiki/P%C3%A1gina:Plat%C3%B3n_-_La_Rep%C3%BAblica_%281805%29,_Tomo_1.djvu/89
18:40 emijrp https://twitter.com/#!/brewster_kahle
18:42 ersi emijrp: point being? A specific tweet? click the "X hours ago" to direct link
18:42 emijrp no, just recommending that twitter account
18:42 ersi Okay
18:43 * Schbirid sends ersi into the fresh air
18:43 ersi Weeeee!
19:11 emijrp The digital materials, we can make copies of. And we've -- we have two copies within the United States, and we have a partial copy in Alexandria, Egypt, which is, I guess, fitting, as we have a large-scale swap agreement with them to archive their materials, and they archive ours. And also in Amsterdam, we have a partial copy. If there are five or six copies of these materials worldwide, I think I'd feel safe.
19:11 emijrp http://www.democracynow.org/2011/8/24/pioneering_internet_archivists_brewster_kahle_and
19:12 emijrp So, ...
22:42 Wyatt So at what point should I just give up and kill a wget?
23:30 DoubleJ Wyatt: Check the files directory. If the most recent directory was recently modified it's still going.
23:31 DoubleJ And if there are a crapload of domains it's trying to download all of Splinder/MobileMe/whatever.
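Note on DoubleJ's two checks above: both can be approximated with a short script. This is a rough sketch under assumptions, not the project's own tooling. It assumes wget is writing into a single output directory with one subdirectory per host, and the names `newest_mtime`, `looks_stalled`, `host_directories` and the one-hour idle threshold are all made up for illustration.

```python
import os
import time


def newest_mtime(root):
    """Return the latest modification time of anything under `root`."""
    latest = os.path.getmtime(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                latest = max(latest, os.path.getmtime(os.path.join(dirpath, name)))
            except OSError:
                # File vanished or is unreadable while wget is still writing
                continue
    return latest


def looks_stalled(root, max_idle_seconds=3600, now=None):
    """True if nothing under `root` changed within `max_idle_seconds`.

    The threshold is an arbitrary assumption; pick one that fits the
    size of the pages being fetched.
    """
    now = time.time() if now is None else now
    return (now - newest_mtime(root)) > max_idle_seconds


def host_directories(root):
    """List top-level subdirectories, assumed to be one per downloaded host."""
    return sorted(
        entry for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )
```

If `looks_stalled` returns False the grab is probably still making progress, and an unexpectedly long list from `host_directories` suggests the crawl has wandered onto other domains.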
