Time |
Nickname |
Message |
00:36
|
dnova |
can I begin rsyncing splinder while still downloading? |
00:37
|
chronomex |
yes |
00:37
|
dnova |
ok I need to get on that asap. gotta catch sketchcow for a slot? |
00:37
|
chronomex |
indeed |
00:37
|
dnova |
thanks |
01:38
|
SketchCow |
BACK |
01:38
|
Coderjoe |
dnova: use the upload script |
01:39
|
dnova |
# (ask SketchCow for a module name) |
01:39
|
dnova |
lol |
01:40
|
Coderjoe |
i know. I meant when you do get a module name, use the upload script |
01:40
|
dnova |
SketchCow: I want to start uploading my splinders |
01:40
|
dnova |
Coderjoe: mos def |
02:16
|
underscor |
http://i.imgur.com/1fcec.png |
02:20
|
chronomex |
hah |
02:26
|
BlueMax |
*facepalm* |
02:28
|
Coderjoe |
never put dicks in your ears |
02:58
|
RedType |
Coderjoe: you think cleaning out earwax is hard? |
03:08
|
SketchCow |
Hello, everyone. |
03:08
|
SketchCow |
There are two reporters, Eva Talmadge and Matt/Matthias Schwartz, trying to do a story on Archive Team. |
03:08
|
SketchCow |
Please do not talk to them. |
03:08
|
SketchCow |
Let's put that in the lines. |
03:08
|
SketchCow |
-------------------------------------- |
03:08
|
SketchCow |
Hello, everyone. |
03:08
|
SketchCow |
There are two reporters, Eva Talmadge and Matt/Matthias Schwartz, trying to do a story on Archive Team. |
03:08
|
SketchCow |
Let's put that in the lines. |
03:08
|
SketchCow |
Please do not talk to them. |
03:08
|
SketchCow |
-------------------------------------- |
03:25
|
kennethre |
SketchCow: Channel Topic, perhaps? |
03:28
|
SketchCow |
I expect some people will ignore. |
03:28
|
SketchCow |
But I did want to say it. |
03:31
|
db48x |
out of curiosity, what's your reasoning there? |
03:51
|
SketchCow |
http://www.mattathiasschwartz.com/ |
03:51
|
SketchCow |
Go read the other articles |
03:51
|
SketchCow |
tell me how we'll fare. |
03:58
|
godane |
it looks like there is no way to simply turn a wikipedia dump into a wikipedia website |
03:59
|
godane |
are there any tools you guys use to read wiki dumps like a full index website? |
04:12
|
chronomex |
godane: what's the goal? |
04:34
|
dashcloud |
someone at some point in this channel asked for a copy of Coming Soon (online magazine) (www.csoon.com). I tried to use wget-warc to make a copy of it |
04:38
|
Paradoks |
dashcloud: Any idea what it'd require to verify that you did things correctly? I'd love to help, but know very little about wget-warc. |
04:41
|
dashcloud |
here's the command I used to grab it: http://pastebin.com/Yzzw28ep, and the site's still up, minus 10-20 pages |
04:43
|
dashcloud |
I'm short on time right now, but I'm happy to send over my copy tomorrow |
04:44
|
Paradoks |
Cool. I'll take a stab at it if no one else more qualified steps forward. |
04:44
|
dashcloud |
the only other thing I think you need is to make sure all the directories mentioned in the command exist |
04:45
|
dashcloud |
(i.e. don't rely on wget to create them) |
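dashcloud's advice is to create every output directory named in the wget-warc command before starting it. The actual command lives on pastebin and is not reproduced here, so the directory names below are hypothetical placeholders. This is a minimal Python sketch of the pre-flight step under that assumption, not part of any Archive Team script.

```python
import os

# Hypothetical directory names: the real ones are in dashcloud's pastebin
# command, which this log does not reproduce.
WGET_DIRS = ["csoon-warc", "csoon-logs"]


def ensure_dirs(base, dirs=WGET_DIRS):
    """Create each output directory up front instead of relying on wget.

    Returns the full paths so they can be checked or logged before the
    crawl starts. Nested names such as "a/b" are created recursively.
    """
    created = []
    for d in dirs:
        path = os.path.join(base, d)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Run `ensure_dirs(".")` (or with your own list) right before launching wget; because `exist_ok=True` is set, re-running it on a resumed grab is harmless.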
04:45
|
dashcloud |
good night folks! |
05:19
|
godane |
chronomex: was trying to host a local lan version of wikipedia |
05:20
|
chronomex |
ah, hm. |
05:20
|
chronomex |
wow this matt guy is really artsy-fartsy with his writing |
05:55
|
NotGLaDOS |
Can't we just throw them off instead? |
05:59
|
SketchCow |
Today has been catch-up day. |
05:59
|
NotGLaDOS |
Also, have you got that rsync slot set up for me? |
06:01
|
SketchCow |
I can do that. |
06:06
|
Zebranky |
While you're around, I'd like to throw out an "archive.org was the only source for an extremely helpful page" testimonial, as if you needed more |
06:06
|
Zebranky |
So much good for the Internet. |
06:07
|
NotGLaDOS |
note: this is archiveteam.org. We only have access to archive.org, we don't run it. |
06:08
|
chronomex |
and by access we mean we have no more access than anyone else with an account |
06:08
|
chronomex |
for the most part |
06:08
|
NotGLaDOS |
What he said. |
06:14
|
Zebranky |
I know. That was directed at SketchCow. |
06:15
|
Zebranky |
Since this is a convenient way to throw quick thoughts at him |
06:23
|
NotGLaDOS |
He's the same as what we said: only has member access. |
06:39
|
Zebranky |
Fair enough. My understanding was that he worked a bit closer with them. |
08:47
|
kin37ik |
so, google buzz is going down soonish i hear |
08:50
|
yipdw |
that's the buzz |
10:08
|
BlueMax |
lol |
10:37
|
emijrp |
sharing info about libraries damaged by hurricane irene through facebook pages (see last 3 paragraphs), looks like a long term solution hell yeah http://www.librarian.net/stax/3652/helping-libraries-damaged-by-hurricane-irene/ |
10:38
|
emijrp |
webcite allows uploading link batches http://www.webcitation.org/comb |
10:38
|
emijrp |
it is very useful to archive tons |
10:39
|
emijrp |
Archive-It tool from Internet Archive is not free, so, in this case, IA sucks |
10:40
|
BlueMax |
:/ |
10:43
|
emijrp |
metadata for 15000+ knols complete |
10:44
|
emijrp |
the channel is #klol |
11:15
|
emijrp |
trying to archive all AT wiki using the webcite comb www.webcitation.org/comb |
11:17
|
emijrp |
clicked the submit button, but the process is slow... waiting |
11:19
|
db48x |
you put 15000 urls into www.webcitation.org/comb? |
11:19
|
emijrp |
no |
11:19
|
emijrp |
15000 metadata from knols downloaded, really 20000 now |
11:20
|
emijrp |
the webcite submit is this http://www.archiveteam.org/index.php?title=Template:Navigation_box |
11:20
|
emijrp |
less than 100 links |
11:20
|
emijrp |
it scrapes all the links, you checkbox the desired links and archive |
11:20
|
db48x |
ooh |
11:21
|
db48x |
that makes more sense |
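emijrp describes the WebCite comb workflow as: it scrapes all the links on a page, you tick the ones you want, and they get archived. When preparing such a batch by hand, the scraping step can be approximated locally. This minimal sketch uses only Python's standard `html.parser` and does not talk to WebCite at all; the filtering rules (keep only http/https targets, drop `#fragments`, drop duplicates) are assumptions made for this sketch, not a description of what WebCite itself does.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # keeps first-seen order, which is easier to review

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then strip any #fragment.
                url, _fragment = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript: and similar non-web targets.
                if url.startswith(("http://", "https://")) and url not in self.links:
                    self.links.append(url)


def collect_links(html, base_url):
    """Return the unique absolute http(s) links found in an HTML page."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Feed it the HTML of a page such as the AT wiki navigation box, review the returned list, and paste the ones you want into the batch form.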
11:22
|
emijrp |
by the way, uploading knol links batches to webcite is a choice |
11:23
|
emijrp |
just downloading all knols, tar gzip and upload to IA is a shit |
11:23
|
emijrp |
most of AT projects are not viewable |
11:23
|
emijrp |
just huge packs |
11:31
|
db48x |
yea, given their size that's been the easiest way to go |
11:43
|
underscor |
<NotGLaDOS> He's the same as what we said: only has member access. |
11:43
|
underscor |
Actually, he's a full admin, iirc |
12:23
|
emijrp |
~25k metadata chunk for your tests http://www.sendspace.com/file/o8fthv |
12:23
|
emijrp |
tab delimited |
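The knol metadata chunk emijrp shares is tab delimited. The log does not say what the columns are, so this minimal sketch just reads each line into a list of fields with Python's `csv` module. Treating quote characters literally (`QUOTE_NONE`) is an assumption: it suits raw dumps where titles may contain stray `"` characters, but would be wrong if the file actually uses CSV-style quoting.

```python
import csv


def read_tsv(stream):
    """Read tab-delimited rows from a text stream, skipping blank lines.

    Quote characters are kept as-is (QUOTE_NONE) because the dump's
    quoting convention is unknown; column meanings are not documented.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]
```

For a downloaded file, open it with `newline=""` and an explicit encoding, e.g. `with open("knol-metadata.tsv", newline="", encoding="utf-8") as f: rows = read_tsv(f)` (the filename is hypothetical).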
13:54
|
NotGLaDOS |
underscor: interesting. |
15:06
|
SketchCow |
Brp |
17:47
|
emijrp |
how many sites have closed this year? |
17:48
|
Schbirid |
milliona |
17:48
|
Schbirid |
s |
17:51
|
tef |
emijrp: check out the wiki for the deathwatch pages |
17:51
|
emijrp |
i mean, i feel that this year has been very bad |
17:52
|
tef |
it can only get worse |
18:18
|
emijrp |
SketchCow: why IA doesnt setup anything like this to allow people transcript books? https://es.wikisource.org/w/index.php?title=P%C3%A1gina:Plat%C3%B3n_-_La_Rep%C3%BAblica_%281805%29,_Tomo_1.djvu/322&action=edit&redlink=1 IA OCR is worst ever |
18:18
|
Schbirid |
that would be really cool |
18:20
|
ersi |
emijrp: that text made little sense |
18:21
|
emijrp |
text on the left is OCR autofill, later a person rewrite needed phrases |
18:23
|
emijrp |
a corrected page is this https://es.wikisource.org/wiki/P%C3%A1gina:Plat%C3%B3n_-_La_Rep%C3%BAblica_%281805%29,_Tomo_1.djvu/89 |
18:40
|
emijrp |
https://twitter.com/#!/brewster_kahle |
18:42
|
ersi |
emijrp: point being? A specific tweet? click the "X hours ago" to direct link |
18:42
|
emijrp |
no, just recommending that twitter account |
18:42
|
ersi |
Okay |
18:43
|
* |
Schbirid sends ersi into the fresh air |
18:43
|
ersi |
Weeeee! |
19:11
|
emijrp |
The digital materials, we can make copies of. And we've... we have two copies within the United States, and we have a partial copy in Alexandria, Egypt, which is, I guess, fitting, as we have a large-scale swap agreement with them to archive their materials, and they archive ours. And also in Amsterdam, we have a partial copy. If there are five or six copies of these materials worldwide, I think I'd feel safe. |
19:11
|
emijrp |
http://www.democracynow.org/2011/8/24/pioneering_internet_archivists_brewster_kahle_and |
19:12
|
emijrp |
So, ... |
22:42
|
Wyatt |
So at what point should I just give up and kill a wget? |
23:30
|
DoubleJ |
Wyatt: Check the files directory. If the most recent directory was recently modified it's still going. |
23:31
|
DoubleJ |
And if there are a crapload of domains it's trying to download all of Splinder/MobileMe/whatever. |
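DoubleJ's two rules of thumb for a wget that seems stuck are: look at the files directory, and if the most recently modified subdirectory changed recently the grab is still running; and if there are a huge number of per-domain directories, wget has wandered off into all of Splinder/MobileMe. This minimal Python sketch turns both checks into functions. The 10-minute idle threshold is an assumption, and note that a directory's mtime only changes when entries directly inside it are added, removed, or renamed, so writes deep in a tree may not show up; treat this as a rough heuristic, not proof either way.

```python
import os
import time


def newest_subdir(files_dir):
    """Return (name, mtime) of the most recently modified subdirectory, or None."""
    newest = None
    with os.scandir(files_dir) as it:
        for entry in it:
            if entry.is_dir():
                mtime = entry.stat().st_mtime
                if newest is None or mtime > newest[1]:
                    newest = (entry.name, mtime)
    return newest


def looks_alive(files_dir, max_idle_seconds=600, now=None):
    """Heuristic: wget is probably still going if its newest directory changed recently.

    max_idle_seconds (10 minutes by default) is an assumed threshold.
    """
    newest = newest_subdir(files_dir)
    if newest is None:
        return False
    now = time.time() if now is None else now
    return now - newest[1] <= max_idle_seconds


def domain_count(files_dir):
    """Count per-host directories; a very large count suggests wget is spidering too widely."""
    with os.scandir(files_dir) as it:
        return sum(1 for entry in it if entry.is_dir())
```

If `looks_alive` keeps returning False across several checks, or `domain_count` is far larger than the number of sites you meant to grab, that is a reasonable point to kill the wget.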