Time |
Nickname |
Message |
00:10
🔗
|
balrog |
xmc: :( |
00:10
🔗
|
balrog |
where was this? |
00:10
🔗
|
balrog |
ahh ... http://en.wikipedia.org/wiki/University_of_Washington_firebombing_incident |
00:11
🔗
|
xmc |
bingo |
00:23
🔗
|
balrog |
yeah what.cd has been under ntp and traditional DDoS since early january. |
00:24
🔗
|
BiggieJon |
ouch |
00:28
🔗
|
dashcloud |
here's an article from Brian Krebs on the NTP attacks: http://krebsonsecurity.com/2014/02/the-new-normal-200-400-gbps-ddos-attacks/ (he got hit with 200 Gbps or more) |
00:31
🔗
|
balrog |
oh yeah |
00:31
🔗
|
balrog |
if you're on iOS, update now |
00:52
🔗
|
DFJustin |
ouch, was gonna ask what could impact ia with a 20gbit pipe |
01:03
🔗
|
ivan` |
if you like f.lux on your iOS don't update ;) |
01:03
🔗
|
ivan` |
or not. huh. http://appadvice.com/appnn/2014/02/apples-ios-7-0-6-breaks-evasi0n7-but-it-can-easily-be-fixed |
01:04
🔗
|
balrog |
ivan`: evasi0n has been updated. |
01:05
🔗
|
ivan` |
yeah, incredible that Apple has let the jb ecosystem be happy for over 2 months already |
01:38
🔗
|
namespace |
So on the subject of cool obscurish websites, has anybody ever grabbed this? |
01:38
🔗
|
namespace |
http://www.wideweb.com/phonetrips/ |
01:38
🔗
|
dashcloud |
grabbing it now! |
01:39
🔗
|
namespace |
dashcloud: Make sure to get the groupbell ftp server, that's where the really cool stuff is. |
01:39
🔗
|
namespace |
Those realaudio files don't work anymore. |
01:41
🔗
|
namespace |
ftp://ftp.wideweb.com/GroupBell/ |
01:42
🔗
|
namespace |
(In case you're wondering what they are, old phreaker recordings. Pretty cool to listen to. Also includes my favorite hacker story where Evan Doorbell narrates how he became a phone phreak.) |
01:43
🔗
|
namespace |
dashcloud: So what tools do you use for this? Is there a wget script you have lying around? |
01:44
🔗
|
dashcloud |
there is a wget command, but for smaller things, there's ArchiveBot |
01:44
🔗
|
dashcloud |
which grabs sites in the same way every time, and packages them up for upload to IA, where they are then put in a collection |
01:44
🔗
|
dashcloud |
eventually the pages make their way into the Wayback Machine |
01:44
🔗
|
namespace |
Ah. |
01:45
🔗
|
namespace |
Does it do ftp? |
01:45
🔗
|
dashcloud |
no- but there's a standard command set for FTP |
01:45
🔗
|
dashcloud |
http://archiveteam.org/index.php?title=FTP |
01:45
🔗
|
namespace |
Yeah, then this probably hasn't been grabbed yet, I searched evan doorbell on archive.org and got zip. Any other ways to figure out if somebody's gotten something yet? |
01:46
🔗
|
dashcloud |
not really- if you know the site address, you can obviously check wayback, but other than that, no |
01:47
🔗
|
DFJustin |
pretty much all our grabs our going into wayback with a small delay so that's gonna be your best bet |
01:47
🔗
|
DFJustin |
*are going |
01:47
🔗
|
DFJustin |
you can also search for the domain in the archive.org collections |
01:48
🔗
|
dashcloud |
I'm not entirely sure what command archivebot is using, but here's ones we've used in the past to grab sites for inclusion into the wayback machine: http://www.archiveteam.org/index.php?title=Wget |
01:48
🔗
|
BiggieJon |
.ram file ? vlc plays them fine |
01:48
🔗
|
namespace |
BiggieJon: Yeah but they're streams. |
01:48
🔗
|
namespace |
Not the actual audio. |
01:48
🔗
|
namespace |
And the streaming server seems to have died last time I checked. |
01:52
🔗
|
namespace |
Interesting, I seem to have found another source for these tapes. |
01:52
🔗
|
namespace |
http://www.evan-doorbell.com/ |
01:55
🔗
|
namespace |
Let's say I wanted to grab the stuff on that site, where would I start? |
01:56
🔗
|
* |
namespace is reading the linked page on wget now |
02:01
🔗
|
namespace |
So the wget man page says that -c isn't required to have the current instance of wget try to redownload a file if it stops downloading. |
02:01
🔗
|
namespace |
But the wiki claims otherwise. Should an edit be made? |
02:05
🔗
|
DFJustin |
you can just have archivebot grab them |
02:05
🔗
|
DFJustin |
come to #archivebot |
02:06
🔗
|
godane |
i just did that |
02:06
🔗
|
dashcloud |
not sure, but do feel free to edit the wiki |
02:07
🔗
|
namespace |
I thought you said archivebot doesn't do ftp though. |
02:07
🔗
|
dashcloud |
archivebot doesn't- there's separate wget commands for that |
02:08
🔗
|
godane |
the files look to be on the website |
02:08
🔗
|
godane |
not a ftp |
02:08
🔗
|
namespace |
godane: So archivebot will grab files even if they're like 60MB each off a web server? |
02:08
🔗
|
godane |
yes |
02:09
🔗
|
namespace |
godane: And you already started the process for the website I linked? |
02:09
🔗
|
namespace |
(So I shouldn't go start it again...) |
02:09
🔗
|
DFJustin |
it's in the queue |
02:09
🔗
|
namespace |
Okay. |
02:10
🔗
|
namespace |
How do I add a website to the queue? |
02:10
🔗
|
DFJustin |
read the topic |
02:10
🔗
|
dashcloud |
join #archivebot first |
02:10
🔗
|
namespace |
I did. |
02:10
🔗
|
namespace |
" This dashboard is a Javascript monstrosity, yo." Nice. |
02:15
🔗
|
namespace |
So I'm looking at an example of a similar website on archive.org (Ravearchive.com) and it doesn't seem to include the tapes, just the skeleton of the website. Of course ravearchive's value is in the tapes. |
02:15
🔗
|
dashcloud |
namespace: I've checked the FTP you mentioned, and it's still up- maybe 4-5 GB space needed to download it all. Here's the instructions on downloading an FTP site: http://archiveteam.org/index.php?title=FTP |
02:19
🔗
|
namespace |
dashcloud: Which one? Wideweb or evan-doorbell? |
02:19
🔗
|
dashcloud |
wideweb |
02:21
🔗
|
namespace |
I'm not sure I understand the two lines about tar'ing the files after wgetting them. |
02:24
🔗
|
namespace |
Okay okay, that third line is making a list of files included. |
02:25
🔗
|
namespace |
And the second line is making a tar file, but I'm not sure I understand what of. It uses the ftp link of the website, but that should already be handled by wget, unless of course the link is really a directory named after the website. |
02:27
🔗
|
DFJustin |
yeah wget creates a directory with the domain name |
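The tar-and-list step discussed above can be sketched in Python. This is only an illustration, not the commands from the wiki page (the log does not quote them): the function name `pack_mirror` and its arguments are hypothetical, and it assumes wget has already mirrored the site into a directory named after the host. It writes member names only, so the listing carries no user/group columns.

```python
import tarfile
from pathlib import Path


def pack_mirror(mirror_dir, tar_path, listing_path):
    """Tar a mirrored directory and write a plain-text list of its members.

    Assumption: mirror_dir is the directory wget created, named after the
    host, and tar_path lies outside that directory.
    """
    mirror_dir = Path(mirror_dir)
    with tarfile.open(tar_path, "w") as tar:
        # arcname keeps the host-named top-level directory inside the tar
        tar.add(str(mirror_dir), arcname=mirror_dir.name)
    with tarfile.open(tar_path, "r") as tar:
        names = tar.getnames()
    # One member name per line, in archive order
    Path(listing_path).write_text("\n".join(names) + "\n")
    return names
```

The tar and the listing end up as two separate files, which matches the later advice in this log to upload them as separate files under one item.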
02:27
🔗
|
namespace |
DFJustin: Got it. I'd start grabbing but I think dashcloud already said he was. |
02:28
🔗
|
dashcloud |
if you do that, make sure to check the file listing in case you need to change/remove the user/group part of the file |
02:28
🔗
|
dashcloud |
I haven't started the FTP site |
02:28
🔗
|
namespace |
dashcloud: Cool, I got this then. |
03:12
🔗
|
namespace |
https://news.ycombinator.com/item?id=7284541 |
03:12
🔗
|
namespace |
Interesting. |
03:12
🔗
|
namespace |
HN is hard to grab because of the Arc server's built-in limitations on page loading. Do we already have a copy anyway? |
03:16
🔗
|
namespace |
Also I'm finished with the wideweb grab, what does Jason Scott mean by adding metadata? |
04:16
🔗
|
namespace |
Like when I go to upload to archive.org (which I'm basically ready to do once I've got this sorted out) how should the files be arranged? One zip file containing the file list and the tar, or should those be uploaded as two separate files? Is there anything I should add to the tar to describe the files? Or do I add that info to the archive.org website after it's uploaded? |
04:21
🔗
|
chfoo |
namespace: i think the file list file and tar file should be separate files but uploaded under one item. info about the item should be entered as the item description. |
04:22
🔗
|
namespace |
chfoo: Thanks. |
04:22
🔗
|
namespace |
Doing it now. |
04:26
🔗
|
SketchCow |
s3 seems back. |
04:28
🔗
|
godane |
i hope so |
04:28
🔗
|
godane |
cause i'm uploading items for cnnmoney archive |
04:29
🔗
|
namespace |
SketchCow: Hey, how do I get involved in the bitsavers/etc metadata project? |
04:29
🔗
|
namespace |
I have experience in this area, sort of. |
04:29
🔗
|
godane |
but looks like it since my new items are appearing there |
04:35
🔗
|
SketchCow |
Well, bitsavers does itself except the descriptions. |
04:36
🔗
|
namespace |
SketchCow: Yes, I meant the entire metadata thing you were talking about being a bottleneck back in 2011, and more recently. |
04:36
🔗
|
namespace |
You said you had all this stuff with no metadata, and needed people to help tag it. |
04:36
🔗
|
SketchCow |
Do you mean you specifically want to do metadata for bitsavers or general metadata in a variety of collections I have? |
04:37
🔗
|
SketchCow |
Just understanding your interests. |
04:37
🔗
|
namespace |
SketchCow: General Metadata. |
04:37
🔗
|
SketchCow |
It's lonely work. |
04:37
🔗
|
namespace |
SketchCow: I know, I've sat hunched over a table for hours at a time taking photos of books. |
04:38
🔗
|
SketchCow |
I'm thinking of something you can take on. |
04:38
🔗
|
namespace |
Okay. |
04:38
🔗
|
SketchCow |
Do you have an internet archive.org account? |
04:38
🔗
|
namespace |
Yes. |
04:38
🔗
|
namespace |
Just signed up. |
04:39
🔗
|
namespace |
(It's 'bandpass') |
04:40
🔗
|
dashcloud |
SketchCow: is the Apple II Asimov stuff mirrored on IA somewhere such that I can point to individual files, rather than having to link to the FTP? |
04:40
🔗
|
SketchCow |
Well, it's in a big collection that can be linked to within |
04:43
🔗
|
dashcloud |
okay |
04:43
🔗
|
SketchCow |
namespace: Do you want to describe items you can read, or research metadata from the world? |
04:43
🔗
|
namespace |
SketchCow: What would the latter look like? |
04:43
🔗
|
SketchCow |
http://ia601704.us.archive.org/zipview.php?zip=/13/items/asimov.apple.archive.2013.03/asimov.apple.archive.2013.03.zip |
04:44
🔗
|
SketchCow |
dashcloud: Example: |
04:45
🔗
|
SketchCow |
http://archive.org/download/asimov.apple.archive.2013.03/asimov.apple.archive.2013.03.zip/apple_II%2Fimages%2Fgames%2Ffile_based%2Fdiamondmine_dungbeetles_ladytut_minotaur.dsk |
04:45
🔗
|
dashcloud |
that's pretty awesome |
04:46
🔗
|
DFJustin |
yeah just add a / on the end of any zip file on an archive.org item to get a browse view |
04:46
🔗
|
DFJustin |
they really need to add a link on the page to that |
04:46
🔗
|
SketchCow |
That's why I make these sites zipped and sitting around. |
04:46
🔗
|
namespace |
How about tar files? |
04:46
🔗
|
DFJustin |
works on zip, tar, and iso/cdr |
04:46
🔗
|
SketchCow |
They're huge ice blocks I can use |
04:47
🔗
|
DFJustin |
does not work on tar.*, rar, bin |
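The URL pattern described above can be sketched as follows. The function names are hypothetical; the extension list and the trailing-slash rule come from the messages above, and the percent-encoding of the inner path is inferred from the example link SketchCow posted, so treat it as an assumption about how archive.org expects the path.

```python
from urllib.parse import quote

# Per the discussion: browsing works on zip, tar, and iso/cdr,
# and not on tar.*, rar, or bin.
BROWSABLE = (".zip", ".tar", ".iso", ".cdr")


def browse_url(item, archive_name):
    """Browse view for an archive in an item: the download URL plus '/'."""
    if not archive_name.lower().endswith(BROWSABLE):
        raise ValueError("no browse view for this file type: " + archive_name)
    return "http://archive.org/download/{}/{}/".format(item, archive_name)


def inner_file_url(item, archive_name, inner_path):
    """Direct link to one file inside the archive.

    Assumption: the inner path is percent-encoded as a single segment,
    with '/' written as %2F, as in the example link above.
    """
    return browse_url(item, archive_name) + quote(inner_path, safe="")
```

A name like `site.tar.gz` ends in `.gz`, so `browse_url` rejects it, in line with the "does not work on tar.*" remark.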
04:47
🔗
|
SketchCow |
namespace: Example: https://archive.org/details/computermagazines <--- description of each magazine |
04:48
🔗
|
SketchCow |
Not each ISSUE, just each title, so it has some context |
04:51
🔗
|
namespace |
Hmm. So you mean I'd have to go research what these magazines are? |
04:52
🔗
|
namespace |
I don't see how that's different from just reading them and then describing the contents. |
04:52
🔗
|
namespace |
Or rather, why that approach wouldn't work here. |
04:52
🔗
|
dashcloud |
check out this collection of magazines: http://archive.org/search.php?query=collection%3Abig-k-magazine&sort=-publicdate |
04:53
🔗
|
namespace |
dashcloud: What about them? |
04:53
🔗
|
dashcloud |
they have metadata entered for them- the table of contents page(s) fully transcribed |
04:53
🔗
|
namespace |
And you want me to do that to a different collection of magazines? |
04:54
🔗
|
dashcloud |
if you're interested in that sort of thing |
04:54
🔗
|
namespace |
dashcloud: Okay, I think I can do that. Not right now though. |
04:54
🔗
|
dashcloud |
I think the other thing SketchCow was talking about is metadata work on projects like Console Living room or Business Case |
04:55
🔗
|
* |
namespace idly wonders if we could kickstart a fund to pay people on mechanical turk or whatever to do this |
04:57
🔗
|
dashcloud |
if you're more interested in digging up information on items, and researching a thing, rather than describing a thing, you might be interested in working on something like the Business Case: https://archive.org/details/businesscase |
06:22
🔗
|
namespace |
https://archive.org/details/2014.02.ftp.wideweb.com |
06:22
🔗
|
namespace |
Any suggestions? |
06:22
🔗
|
* |
namespace has to go do stuff, will be back in a bit |
06:53
🔗
|
xmc |
looks good to me, namespace |
06:53
🔗
|
xmc |
thanks for grabbing that |
07:02
🔗
|
namespace |
xmc: I think I'm gonna add a short list of some of the notable items featured within. (3.8 gigs is a lot of material after all) |
07:02
🔗
|
xmc |
ure |
07:02
🔗
|
xmc |
sure |
07:08
🔗
|
namespace |
Also, I messed up on what category to put it in (should go into "computers and technology") |
07:08
🔗
|
namespace |
(Put it into community audio by accident.) |
07:08
🔗
|
namespace |
But the editor won't let me change it. |
08:11
🔗
|
DFJustin |
as a new user you can only put stuff in the community areas |
08:25
🔗
|
namespace |
DFJustin: How long does that take to wear off? Can I get an upgrade? |
08:27
🔗
|
DFJustin |
oh I phrased that poorly, it's as a non-admin user |
08:28
🔗
|
DFJustin |
an admin can move items on your behalf or specifically grant you access to collections on a case by case basis |
08:28
🔗
|
namespace |
DFJustin: Got it. |
08:29
🔗
|
DFJustin |
but it's moreso geared towards small specific collections of stuff you uploaded, computers and tech is a big general area that you wouldn't get access to |
08:30
🔗
|
DFJustin |
but if you upload 100 episodes of a computer podcast or something then you can get a collection made for that under computers and tech |
08:30
🔗
|
namespace |
Okay. |
15:03
🔗
|
unbeholde |
guys I have quite a problem, after taking the advice of you guys I got the UT3 mods. From the full list of files I drew up, several have come up corrupted (after fully downloading). I tried to download them twice (once with Chrome and once with Download Accelerator Plus). I've placed all the archive site links that show up as corrupted here in a text file: http://depositfiles.com/files/31bq5xkpn |
15:04
🔗
|
Schbirid |
please post text on sites like https://pastee.org/ or pastebin.com or else |
15:06
🔗
|
unbeholde |
...ok.. https://pastee.org/n5ckj |
15:07
🔗
|
Schbirid |
hm, getting a 502 error on that |
15:09
🔗
|
unbeholde |
ugh fine I'll try the other site: http://pastebin.com/kGiwAPqH |
15:09
🔗
|
Schbirid |
ah, you came to the right guy :) |
15:13
🔗
|
Schbirid |
i highly suggest NOT using tools like Download Accelerator Plus, they stress servers and can suck |
15:13
🔗
|
Schbirid |
race.zip is fine for me |
15:14
🔗
|
* |
Schbirid downloads some more |
15:18
🔗
|
Schbirid |
prometheus_v3.zip also fine |
15:20
🔗
|
Schbirid |
talus.zip too |
15:20
🔗
|
Schbirid |
i will host them elsewhere for you if you promise not to use DAP anymore ;) |
15:21
🔗
|
Schbirid |
try http://www.freedownloadmanager.org/ but without that stupid multiple connection stuff |
15:26
🔗
|
unbeholde |
I see. Thank you kind sir. |
15:27
🔗
|
Schbirid |
https://www.quaddicted.com/files/temp/fp/ they are done when there is a file called "done" |
15:27
🔗
|
Schbirid |
downloading now |
15:42
🔗
|
Schbirid |
are those all huge files? |
15:45
🔗
|
Schbirid |
btw, did you use https://www.quaddicted.com/stuff/temp/fileplanet-postgres.php?filename=test to find them? i havent advertised that as it is work in progress but it is handy |
15:46
🔗
|
Asparagir |
Need ops in ArchiveBot plz. |
15:46
🔗
|
Asparagir |
More Ukrainian sites to add... |
15:48
🔗
|
unbeholde |
yeah that was the recommended way to get em. I shall visit the temp fp once you have made some more progress. |
15:49
🔗
|
Schbirid |
nice |
15:49
🔗
|
Schbirid |
if you stay in this channel, i will poke you |
16:23
🔗
|
xmc |
Schbirid: what does download accelerator plus do? |
16:23
🔗
|
xmc |
does it open a bunch of connections in parallel? |
16:24
🔗
|
xmc |
there's a debian package for that, 'axel' |
16:24
🔗
|
Schbirid |
probably |
16:24
🔗
|
Schbirid |
yeah, don't use those unless you must |
16:24
🔗
|
Schbirid |
aria2c is also nice for it |
16:45
🔗
|
joepie91 |
oh man |
16:45
🔗
|
joepie91 |
download accelerator plus |
16:45
🔗
|
joepie91 |
that's still around? |
16:45
🔗
|
* |
joepie91 usually recommends Orbit for windows |
16:52
🔗
|
Nemo_bis |
right, I used Orbit too |
16:52
🔗
|
Nemo_bis |
though that was ages ago |
17:08
🔗
|
joepie91 |
I remember NetAnt |
17:14
🔗
|
SadDM |
Anybody know if there's a way to change an item's media type on IA? |
17:15
🔗
|
Nemo_bis |
metadata api, mostly |
17:15
🔗
|
dashcloud |
if you're an admin of a collection, yes, otherwise you're pretty much stuck with whatever choices the interface gives you |
17:17
🔗
|
SadDM |
daaang... not the end of the world I suppose. Thanks. |
17:21
🔗
|
Schbirid |
* unbeholde has quit (Quit: Page closed) |
17:21
🔗
|
Schbirid |
oh great |
17:21
🔗
|
dashcloud |
I guess I should clarify- if IA has marked your book as an image, that you can change; if IA has put your item in a collection and you want to change that collection, you're stuck |
17:35
🔗
|
SadDM |
dashcloud: that's exactly my case... a book marked as an image. Can I change it through the web somewhere, or do I need to poke around at a lower level (and if so, where should I start)? |
17:35
🔗
|
dashcloud |
go to your item, and choose edit item |
17:36
🔗
|
SadDM |
yup... go on |
17:36
🔗
|
dashcloud |
you want to edit stuff about your item (should be the left option) |
17:37
🔗
|
SadDM |
uh huh, and the "mediatype" is displayed (as "image") but not editable. |
17:38
🔗
|
dashcloud |
what's the link to the item on IA? |
17:39
🔗
|
SadDM |
https://archive.org/details/romeos_quest |
17:42
🔗
|
dashcloud |
you could try changing the mediatype for the PDF to texts |
17:44
🔗
|
SadDM |
oh... like on a per-file basis. I'll give that a try in a bit. Either way it doesn't *really* matter, but thanks for the ideas. |
18:25
🔗
|
joepie91 |
dashcloud: IA thinks my software is all books :( |
19:46
🔗
|
DFJustin |
you can't change it using the web interface but you should be able to using other methods like https://pypi.python.org/pypi/internetarchive |
20:06
🔗
|
arkiver |
is it possible to set a number of retries for https://pypi.python.org/pypi/internetarchive? |
20:12
🔗
|
arkiver |
the number of retries for an upload |
20:22
🔗
|
joepie91 |
:P |
20:22
🔗
|
joepie91 |
while true |
20:22
🔗
|
ersi |
Well, I think he means on the command line. The answer is most likely "No" |
20:23
🔗
|
ersi |
there's no option for retries if I do just `ia upload` |
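The "while true" idea above, with a bound on the number of attempts, might look like the sketch below. It is a generic wrapper and does not use any retry option of the `ia` tool or the internetarchive library; `retry`, `attempts`, `delay`, and `sleep` are hypothetical names chosen for this illustration.

```python
import time


def retry(action, attempts=5, delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Re-raises the last error if every attempt fails. The wait grows
    linearly (delay, 2*delay, ...) between attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay * attempt)
```

Here `action` could be any function that performs one upload and raises on failure, for example one that shells out to `ia upload` and checks the exit status.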
20:24
🔗
|
arkiver |
hmm |
20:24
🔗
|
arkiver |
but for the other s3 uploade there is |
20:25
🔗
|
arkiver |
but I found out the internetarchive uploader is far more faster then the ias3upload uploader |
20:25
🔗
|
arkiver |
or is there a way to increase the upload spead of te ias3upload? |
20:26
🔗
|
ersi |
My brain hurts while reading what you wrote |
20:27
🔗
|
arkiver |
haha |
20:27
🔗
|
ersi |
Well, you could, if inclined, take a look at how they differ and see what makes one of 'em go faster than the other. I dunno. |
22:28
🔗
|
yipdw |
arkiver: one of these days you're going to have to learn that there is much more to the Internet than what software you personally are using |
22:28
🔗
|
yipdw |
in this particular case, IA's S3 endpoint and/or something upstream is getting really fucked right now |
22:40
🔗
|
namespace |
:P |
22:40
🔗
|
namespace |
yipdw: My first recommendation for faster speeds would have been to find an IA server close to you and specify that IP. |
22:40
🔗
|
namespace |
(If the software will let you do that.) |
22:41
🔗
|
namespace |
(And assuming IA has regional servers.) |
22:41
🔗
|
yipdw |
because I have never seen one |
22:41
🔗
|
yipdw |
namespace: if you know of any other IA S3 endpoint that is not s3-lb0.us.archive.org, please tell me what it is |
22:41
🔗
|
namespace |
yipdw: Ah, suspected as much. |
22:41
🔗
|
yipdw |
suggesting geographical optimization is a fine idea, but is only useful if such a thing even exists |
22:41
🔗
|
yipdw |
and it's not the path length, anyway |
22:42
🔗
|
yipdw |
I mean |
22:42
🔗
|
DFJustin |
hmm I always just use s3.us.archive.org |
22:42
🔗
|
yipdw |
https://gist.github.com/yipdw/70b1464deb1a9fd5a093 |
22:42
🔗
|
DFJustin |
which may be a load balancer to more than one endpoint |
22:42
🔗
|
yipdw |
you really can't get much shorter than that |
22:42
🔗
|
DFJustin |
but they're all gonna be in the bay area |
22:42
🔗
|
yipdw |
at least not transatlantic |
22:43
🔗
|
yipdw |
but it's much slower than what I usually see and I have no idea what it is |
22:44
🔗
|
yipdw |
it's possible that other paths into IA are fine |
22:44
🔗
|
yipdw |
actually, they all look okay: https://monitor.archive.org/weathermap/weathermap.png |
22:44
🔗
|
yipdw |
must be on my end |
22:44
🔗
|
yipdw |
interesting |
22:48
🔗
|
namespace |
I'm lucky in that I live right next to Cali. |
22:52
🔗
|
yipdw |
actually, I wonder if the recent outbreak of NTP DDoS attacks is related |
22:52
🔗
|
yipdw |
IA got hit, but they're not the only victim |
23:01
🔗
|
namespace |
yipdw: I think freenode is dealing with that right now. |
23:22
🔗
|
namespace |
I have my entire high school career's worth of schoolwork sitting in this massive pile, but I'd be stepping on like 1000 people's copyrights if I scanned it and put it up, what should I do with it? |
23:22
🔗
|
namespace |
(Obviously this wouldn't go on archive, this would be hosted somewhere else.) |
23:22
🔗
|
namespace |
(As a snapshot of schoolwork in 2010-2014.) |
23:25
🔗
|
dashcloud |
if you've got any good code bits, put them up |
23:25
🔗
|
namespace |
dashcloud: Good code bits? No. |
23:26
🔗
|
namespace |
I have code bits, but I didn't understand how data structures work. (I still don't.) |
23:26
🔗
|
namespace |
So nothing substantial. |
23:26
🔗
|
namespace |
I don't think I ever got something to work. |
23:26
🔗
|
namespace |
That was more trivial than like a function. |
23:26
🔗
|
namespace |
*less trivial |
23:28
🔗
|
namespace |
Mainly the value is in having something to look at in terms of what grade school education looks like at the time I went to school at my particular school/district. |
23:29
🔗
|
namespace |
It probably falls under the category of "weird data" that won't be routinely saved until we have like 256TB hard drives. |
23:31
🔗
|
namespace |
(I collect tons of "weird data" like this, I recently threw out a little bowl of fortunes from fortune cookies.) |
23:31
🔗
|
namespace |
(If I'd gone through and counted those out I'd have a lower bound on how many times I've had teriyaki in my lifetime.) |