Time |
Nickname |
Message |
00:00
🔗
|
dashcloud |
there's another good wget command here: http://www.archiveteam.org/index.php?title=User:Djsmiley2k |
00:02
🔗
|
arrith1 |
dashcloud: yeah, need to make sure people are downloading different things. might require the Warrior |
00:05
🔗
|
dashcloud |
balrog: I saw your request earlier about an autoloader for dumping CDs- Google shopping turns up this: http://www.bizchair.com/rx100pc-rex.html $495 |
00:07
🔗
|
dashcloud |
DoubleJ: my download just finished- much smaller site than I thought it would be |
00:08
🔗
|
DoubleJ |
dashcloud: Mine's done, too. 18 MB or so? |
00:10
🔗
|
SketchCow |
Is swizzle here? No, guess not. |
00:11
🔗
|
winr4r |
hey jason |
00:14
🔗
|
dashcloud |
yeap |
00:15
🔗
|
SketchCow |
Hey there. |
00:15
🔗
|
SketchCow |
(You're not swizzle) |
00:16
🔗
|
dashcloud |
so, just upload gont.com.ar.warc.gz to the community texts collection, and you'll take it from there? |
00:17
🔗
|
DoubleJ |
dashcloud: I'm checking the subpages to make sure we're not missing anything. He has a blog site that I'm downloading now. |
00:19
🔗
|
SketchCow |
dashcloud: Yes |
00:22
🔗
|
dashcloud |
here it is: https://archive.org/details/Gont.com.ar.warc |
00:29
🔗
|
godane |
uploaded: https://archive.org/details/ftpsites_arcade.demon.co.uk_2013.06.17 |
00:32
🔗
|
godane |
uploaded: https://archive.org/details/arcade.demon.co.uk-20130617 |
00:35
🔗
|
dashcloud |
ivan`: did you start a download of http://misc.yero.org/modulez/ ? |
00:36
🔗
|
ivan` |
dashcloud: I WARCed the site; Smiley went to download some of the music linked within |
00:36
🔗
|
arrith1 |
jfranusic asked earlier, and i'm wondering myself, to upload to archive.org do you just go to http://archive.org/upload/ ? is there a specialized way for AT people? |
00:36
🔗
|
ivan` |
dashcloud: we ran into a wget bug that causes segfault |
00:37
🔗
|
dashcloud |
I didn't know, so I started a download of the site |
00:38
🔗
|
dashcloud |
you can use the upload page, or there's a bulk upload script floating around somewhere for larger collections |
00:39
🔗
|
arrith1 |
dashcloud: thanks |
00:39
🔗
|
arrith1 |
jfranusic: i'll hunt around for that script for you |
00:40
🔗
|
dashcloud |
ivan`: I'm downloading the ftp://ftp.scene.org/pub/mirrors/scenesp.org/ bits- did you already get all of these? |
00:41
🔗
|
jfranusic |
arrith1: cool, thanks |
00:43
🔗
|
arrith1 |
jfranusic: found these: https://github.com/kngenie/ias3upload and http://askubuntu.com/questions/32763/script-to-upload-to-internet-archive-archive-org |
00:43
🔗
|
arrith1 |
jfranusic: kind of lower level: http://archive.org/help/abouts3.txt |
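For anyone curious what the lower-level route in abouts3.txt looks like, here is a minimal Python sketch that builds (but does not send) one upload request. The header names (`x-amz-auto-make-bucket`, `x-archive-meta-*`, the `LOW key:secret` authorization) are recalled from IA's S3 docs and should be checked there; `build_ia_s3_put`, the item identifier, the placeholder keys, and the choice of collection `opensource` (Community Texts) with mediatype `texts` are assumptions for illustration.

```python
import urllib.request


def build_ia_s3_put(identifier: str, filename: str, body: bytes,
                    access_key: str, secret: str,
                    collection: str = "opensource",
                    mediatype: str = "texts") -> urllib.request.Request:
    """Build, but do not send, a PUT request for archive.org's S3-like API.

    Header names follow IA's abouts3.txt as recalled here; check them there.
    """
    url = f"https://s3.us.archive.org/{identifier}/{filename}"
    headers = {
        # Create the item ("bucket") on the first upload if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
        # Item metadata travels as x-archive-meta-* headers.
        "x-archive-meta01-collection": collection,
        "x-archive-meta-mediatype": mediatype,
        # IA's "LOW" scheme: access key and secret joined by a colon.
        "authorization": f"LOW {access_key}:{secret}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Hypothetical item and placeholder credentials; real keys come from your archive.org account.
req = build_ia_s3_put("example-grab-20130619", "example.warc.gz",
                      b"placeholder bytes", "ACCESS", "SECRET")
```

Actually sending it is one `urllib.request.urlopen(req)` call; leaving that out keeps the sketch free of side effects.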
00:44
🔗
|
arrith1 |
i would think there would be a python script |
00:44
🔗
|
jfranusic |
"The intended users of this script are Internet Archive users interested in uploading batches of content alongside per-item metadata in an automated fashion." |
00:45
🔗
|
jfranusic |
the actually uploading part isn't what I'm worried about |
00:45
🔗
|
arrith1 |
jfranusic: this might be overkill for your purposes, but maybe a starting point: http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem#Archive_Team_megawarc_factory |
00:46
🔗
|
jfranusic |
I'm just wondering if I can just start uploading files willy nilly |
00:46
🔗
|
jfranusic |
or if I need to follow some sort of procedure |
00:47
🔗
|
arrith1 |
jfranusic: hm good question. ping dashcloud ? |
00:49
🔗
|
ivan` |
dashcloud: no |
00:49
🔗
|
dashcloud |
I've never used it, but the ias3upload script defines the metadata you need to provide |
00:50
🔗
|
DFJustin |
jfranusic: we're pretty low on formality |
00:50
🔗
|
DFJustin |
regular users can only upload to a couple specific areas on the site so someone higher up has to sort it later anyway |
00:50
🔗
|
jfranusic |
ah! okay |
00:50
🔗
|
jfranusic |
and, what are those areas |
00:51
🔗
|
DFJustin |
they're listed in the drop-down on the upload form, the main ones are Community Texts, Community Videos, and Community Audio |
00:52
🔗
|
DFJustin |
eventually most of it goes into the Archive Team collection https://archive.org/details/archiveteam |
00:52
🔗
|
DFJustin |
warcs usually get dumped in community texts to start because there's no ready-made community web collection |
00:54
🔗
|
DFJustin |
we have lots of people uploading random stuff beyond just website grabs though, e.g. historically interesting videos which will just stay in community videos |
01:56
🔗
|
jfranusic |
ah, cool, thanks DFJustin |
02:08
🔗
|
SketchCow |
I'm cleaning the software collection |
02:11
🔗
|
omf_ |
arrith, I recommend the fork https://github.com/kimmel/ias3upload it has more documentation and it has a few bug fixes |
02:12
🔗
|
omf_ |
Also the readme has a nice section on the metadata fields |
02:14
🔗
|
BlueMax |
SketchCow, what do you mean by clean |
02:14
🔗
|
SketchCow |
The front page's a mess. |
02:14
🔗
|
SketchCow |
Not informative, a lot of mess. |
02:14
🔗
|
SketchCow |
Dead projects, poor ideas |
03:01
🔗
|
godane |
so i found out one thing about the glennbeck highlights |
03:01
🔗
|
godane |
and maybe the mlb highlights too |
03:01
🔗
|
godane |
the last digit is always an odd number |
03:02
🔗
|
arrith1 |
omf_: nice, thanks |
03:02
🔗
|
godane |
0, 2, 4, 6 and 8 are never used |
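If godane's pattern holds (IDs never end in 0, 2, 4, 6 or 8), a grab script only needs to request half the numeric range. A minimal Python sketch; the log gives neither the URL pattern nor the ID bounds, so `candidate_ids` and its arguments are hypothetical.

```python
from typing import Iterator


def candidate_ids(start: int, stop: int) -> Iterator[int]:
    """Yield IDs in [start, stop) whose last decimal digit is odd.

    Assumes godane's observation: IDs ending in 0, 2, 4, 6 or 8 never occur,
    so there is no point requesting them.
    """
    first = start if start % 2 == 1 else start + 1  # first odd number >= start
    return iter(range(first, stop, 2))
```

Because 10 is even, "last decimal digit is odd" is the same test as "the number is odd", so stepping by 2 from the first odd ID covers every candidate.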
03:25
🔗
|
dashcloud |
hi, my grab of misc.yero.org/modulez has finished without issues: I used wget -e robots=off -r -l 0 -m -p --wait 1 --warc-header "operator: Archive Team" --warc-cdx --warc-file misc-yero-org http://misc.yero.org/modulez/ |
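For repeat grabs, the same command can be kept as an argument list and started from Python, which sidesteps shell quoting of the `--warc-header` value. A sketch, assuming wget 1.14 or newer (the first release with WARC output); `build_grab_command` and `run_grab` are hypothetical helpers that reproduce dashcloud's flags one for one.

```python
import shutil
import subprocess


def build_grab_command(url: str, warc_name: str) -> list[str]:
    """Return dashcloud's wget invocation as an argv list (no shell quoting needed)."""
    return [
        "wget",
        "-e", "robots=off",                # ignore robots.txt
        "-r", "-l", "0", "-m",             # recursive, unlimited depth, mirror mode
        "-p",                              # also fetch page requisites (images, CSS)
        "--wait", "1",                     # pause 1 second between requests
        "--warc-header", "operator: Archive Team",
        "--warc-cdx",                      # write a CDX index alongside the WARC
        "--warc-file", warc_name,          # wget appends .warc.gz itself
        url,
    ]


def run_grab(cmd: list[str]) -> None:
    """Start the grab; raises if wget is missing or exits with an error."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    subprocess.run(cmd, check=True)


cmd = build_grab_command("http://misc.yero.org/modulez/", "misc-yero-org")
# run_grab(cmd)  # uncomment to actually start the grab
```

Per the wget manual, `-m` already turns on recursion with infinite depth, so the explicit `-r -l 0` is redundant rather than harmful.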
03:25
🔗
|
dashcloud |
did I miss something or do something different than other people? |
03:29
🔗
|
dashcloud |
I'm expecting my download of ftp://ftp.scene.org/pub/mirrors/scenesp.org/ to finish up over night |
03:31
🔗
|
SketchCow |
Hi, I just spent a few hours completely redoing the Software collection of archive.org. |
03:31
🔗
|
SketchCow |
http://archive.org/details/software |
03:32
🔗
|
SketchCow |
It will probably take overnight for the scripts to completely redirect everything, but now we have things described, and I'm going to begin the process of putting vintage software in a proper place (vintagesoftware) instead of scattered to the four winds, etc. |
03:32
🔗
|
godane |
computerbooks has bad url |
03:33
🔗
|
SketchCow |
fixed! |
03:34
🔗
|
godane |
also someone should mirror this site: http://www.dream17.info/ |
03:34
🔗
|
godane |
since it has a lot of Amiga PD ware |
03:36
🔗
|
godane |
also you guys should know the 37 maximum pc cds are still up: http://www.ebay.com/itm/Lot-of-37-Maximum-PC-CDs-First-Issue-Old-Shareware-Software-1998-2003/231001552845 |
03:38
🔗
|
godane |
i'm getting this one: http://www.ebay.com/itm/18-Maximum-PC-and-Linux-Magazine-Demo-CD-Discs-/231002003264?pt=US_Wholesale_Software&hash=item35c8cad740 |
03:38
🔗
|
godane |
comes with 18 demo discs and is only $10 |
03:39
🔗
|
SketchCow |
I can't quite afford buying stuff right now. |
03:40
🔗
|
godane |
i just hope this one goes like my cnn bid went |
03:40
🔗
|
godane |
*cnn cd bid |
04:07
🔗
|
winr4r |
mistym: oh hi |
04:08
🔗
|
mistym |
winr4r: Hi! |
04:23
🔗
|
DFJustin |
SketchCow: the link to open source software is broken, should be open_source_software |
04:23
🔗
|
DFJustin |
good job cleaning all that up :D |
04:24
🔗
|
SketchCow |
I'm thinking of adding a "datasets" collection |
04:24
🔗
|
SketchCow |
For things like the Internet Census and Twitter downloads and all that crap. |
04:25
🔗
|
DFJustin |
ideally that collection should be titled something like "Community Software" though because most of it is not actually OSS |
04:26
🔗
|
SketchCow |
These changes will happen in waves. |
04:26
🔗
|
DFJustin |
much like "opensource" has been titled "Community Texts" |
04:26
🔗
|
SketchCow |
I agree, but that one I have to get clearance for |
04:27
🔗
|
DFJustin |
natch |
04:28
🔗
|
DFJustin |
yeah would be nice to get stuff like 301works off the "this just in" list |
04:28
🔗
|
SketchCow |
Well, that won't happen. |
04:29
🔗
|
SketchCow |
This Just In is a mess, the way it's done. |
04:29
🔗
|
SketchCow |
It basically adds anything with a software mediatype |
04:29
🔗
|
DFJustin |
there is https://archive.org/details/data but it's rather neglected |
04:29
🔗
|
SketchCow |
Wow, that's a mess |
04:29
🔗
|
SketchCow |
:) |
04:30
🔗
|
DFJustin |
that's another mediatype wildcard thing, it's all s3 uploads with no mediatype set and stuff |
04:31
🔗
|
DFJustin |
with the occasional proper thing like https://archive.org/details/BrownCorpus |
04:32
🔗
|
SketchCow |
Made datasets |
04:35
🔗
|
SketchCow |
Put some things into it |
04:49
🔗
|
underscor |
SketchCow: Damn, /software looks good! |
05:43
🔗
|
underscor |
Any archiveteamers that live in the bay area who'd be interested in touring IA and/or hanging out sometime? |
05:43
🔗
|
underscor |
It's kinda boring here with nobody to do things with after hours x3 |
05:44
🔗
|
SketchCow |
Ha |
05:44
🔗
|
SketchCow |
We'll work together to find you things |
05:45
🔗
|
BlueMax |
I'm sure underscor could find SOMETHING to do |
05:45
🔗
|
underscor |
I mean, there's the internet to chat with people |
05:45
🔗
|
underscor |
and always more work things |
05:45
🔗
|
underscor |
BlueMax: :P |
05:46
🔗
|
underscor |
I mean, I can wander to things, but again, kinda sucks alone |
05:46
🔗
|
BlueMax |
why don't you get people on Skype or something? |
05:46
🔗
|
BlueMax |
I'd be up for tha |
05:46
🔗
|
BlueMax |
that |
05:46
🔗
|
underscor |
:o cool |
05:46
🔗
|
underscor |
add me! |
05:46
🔗
|
underscor |
alex.buie.kwd |
05:46
🔗
|
SketchCow |
You should clean your space |
05:46
🔗
|
underscor |
I'm on the phone with my mom right now though |
05:46
🔗
|
underscor |
SketchCow: I did. It's much better. |
05:47
🔗
|
underscor |
That was embarrassing. |
05:47
🔗
|
underscor |
(but entirely my fault) |
05:47
🔗
|
BlueMax |
oh underscor you're so messy |
05:49
🔗
|
* |
underscor giggles innocently |
05:49
🔗
|
winr4r |
morning folks |
05:51
🔗
|
winr4r |
best hour and a half of sleep evar |
05:52
🔗
|
BlueMax |
morning lame windows program :P |
05:56
🔗
|
underscor |
how does it feel for nobody to pay for you |
05:59
🔗
|
winr4r |
hah |
06:03
🔗
|
winr4r |
by the way, if anyone likes backing up FTP sites and has a very fast connection, try ftp.hp.com (godane) |
06:04
🔗
|
winr4r |
i was looking for some tru64 patch or other and it literally took me 12+ hours to generate a list of the files |
06:07
🔗
|
godane |
i think that one will be for someone else |
06:07
🔗
|
godane |
only cause i think it's too big for me |
06:24
🔗
|
godane |
i forgot that i have braingames collection: http://archive.org/search.php?query=braingames%20AND%20collection%3Aopensource_movies |
06:25
🔗
|
arrith1 |
underscor: heck yeah, east bay here |
06:25
🔗
|
godane |
and nerds 2.0.1: http://archive.org/search.php?query=subject%3A%22Nerds+2.0.1+-+A+Brief+History+of+the+Internet%22 |
06:25
🔗
|
arrith1 |
arrith: didn't know i could wander in, i'm so down for that |
06:26
🔗
|
godane |
and triumph of the nerds: http://archive.org/search.php?query=subject%3A%22Triumph+of+the+Nerds%22 |
06:33
🔗
|
underscor |
arrith1: yeah, totally |
06:33
🔗
|
underscor |
we have open catered lunch and tours on fridays at noon |
06:33
🔗
|
underscor |
you should come by on a friday you're free |
06:34
🔗
|
arrith1 |
underscor: then like a big box store i'll hide in a corner until they turn off the lights :P |
06:34
🔗
|
underscor |
I'll show you around and stuff, and we could hang after or something |
06:34
🔗
|
underscor |
arrith1: I'm here 24/7! Nice try! |
06:34
🔗
|
arrith1 |
underscor: more time to hang out i mean haha :) |
06:34
🔗
|
underscor |
Lights are all off right now, though, actually. Kinda weird to be at my desk in the dark |
06:34
🔗
|
underscor |
(I could turn them on, I just like it dim) |
06:35
🔗
|
arrith1 |
underscor: and yeah that sounds awesome. would be super fun to hang |
06:35
🔗
|
arrith1 |
ever since i got a dimmer for my office lights it's basically dim always |
06:36
🔗
|
underscor |
the archive work area is just a big open room with a bunch of table clusters |
06:36
🔗
|
underscor |
so it's either "dark" or "bright as fuck with 300 watt overheads" |
06:36
🔗
|
arrith1 |
hm nice. sounds like a hackerspace almost |
06:36
🔗
|
arrith1 |
haha |
06:36
🔗
|
underscor |
It's *very* much like a hackerspace |
06:36
🔗
|
arrith1 |
iirc hacker dojo is like that with the lighting |
06:36
🔗
|
underscor |
Lots of tables, ethernet drops, couches for people to just chill at, talk, hack on stuff |
06:37
🔗
|
arrith1 |
their presentation room is super dark, otherwise you can barely see the projected image |
06:37
🔗
|
underscor |
haha |
06:37
🔗
|
arrith1 |
wow nice. just needs some soylent feeding tubes and it'd be all you need |
06:40
🔗
|
underscor |
hahaha |
06:40
🔗
|
underscor |
we have a coffee robot |
06:40
🔗
|
underscor |
if that counts |
06:41
🔗
|
arrith1 |
delivers to your table? |
06:41
🔗
|
arrith1 |
one nice thing about sf is good food really isn't ever too far away |
06:43
🔗
|
underscor |
arrith1: unfortunately not, although that would probably be embraced by everyone here |
06:43
🔗
|
underscor |
It's just one of those really fancy coffee/espresso/cappuccino/everything machines |
06:43
🔗
|
underscor |
you put a cup on one arm, and science happens |
06:43
🔗
|
Aranje |
coffee... robot? |
06:43
🔗
|
underscor |
and you end up with a cup of caffeinated sludge |
06:43
🔗
|
Aranje |
oh! |
06:44
🔗
|
arrith1 |
underscor: ahh, one of those super basic "robots" |
06:44
🔗
|
arrith1 |
underscor: personal delivery of pizza and/or beer is one of the primary goals of the hercules robot project at hacker dojo, so depending on how successful they are, that work could be re-purposed |
06:45
🔗
|
arrith1 |
though with hot coffee it better have some safety features.. |
06:45
🔗
|
underscor |
hahahaha |
06:45
🔗
|
underscor |
that would be awesome |
06:45
🔗
|
underscor |
well, a pizza to the face would probably hurt too |
06:45
🔗
|
underscor |
also, should move to -bs |
06:45
🔗
|
arrith1 |
oops |
06:46
🔗
|
underscor |
no big, I started it XD |
06:59
🔗
|
ivan` |
common crawl says they have 5 billion URLs, but downloading everything with common_crawl_index gets me 2,412,755,840 URLs |
06:59
🔗
|
ivan` |
the last two are |
06:59
🔗
|
ivan` |
zw.org.zwrcn.www/women-voice-blog/view-topiclist/forum-1-women-discussions.html:http |
06:59
🔗
|
ivan` |
zz_seay662TT/indexb.php:0223744@gothicundine.com:http |
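The keys in that dump appear to store the host with its labels reversed, then the path, then `:` and the scheme. A small Python sketch that turns a key back into a normal URL; the layout is inferred from the two samples above, not from any Common Crawl spec, so treat `index_key_to_url` as a guess.

```python
def index_key_to_url(key: str) -> str:
    """Turn a Common Crawl index key back into an ordinary URL.

    Assumed key layout (inferred from sample keys, not a spec):
    reversed host labels + path + ":" + scheme.
    """
    rest, _, scheme = key.rpartition(":")        # scheme follows the last colon
    host_rev, slash, path = rest.partition("/")  # host ends at the first slash
    host = ".".join(reversed(host_rev.split(".")))
    return f"{scheme}://{host}{slash}{path}"
```

Run on the second sample it yields the host `zz_seay662TT`, which suggests the tail end of the index contains junk entries.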
07:00
🔗
|
ivan` |
I have a 22GB bz2 in case anyone wants it |
07:05
🔗
|
ivan` |
http://204.12.192.194:32047/common_crawl_index_urls.bz2 wrong mimetype, don't open it in your browser, sha1sum c296782cf01fa4f4e111f58a3b02200d3a475d24 |
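Before using the 22 GB file it is worth checking the download against the posted SHA-1, and counting lines is a quick way to confirm the 2,412,755,840 figure. A Python sketch that streams both, assuming one URL per line in the decompressed file; the helper names are made up.

```python
import bz2
import hashlib

EXPECTED_SHA1 = "c296782cf01fa4f4e111f58a3b02200d3a475d24"  # as posted by ivan`


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so the whole download never sits in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def count_urls(path: str) -> int:
    """Count lines (assumed one URL each) by streaming through the bz2 file."""
    with bz2.open(path, "rb") as f:
        return sum(1 for _ in f)
```

Usage: `sha1_of_file("common_crawl_index_urls.bz2") == EXPECTED_SHA1` confirms the download; `count_urls` on the same file takes a long time at this size but needs almost no memory.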
07:11
🔗
|
ivan` |
heh https://github.com/trivio/common_crawl_index/issues/12 |
08:44
🔗
|
ivan` |
could a few people run https://github.com/ArchiveTeam/greader-directory-grab please? |
08:44
🔗
|
ivan` |
getting 14 item/min, need about 30 |
08:44
🔗
|
ivan` |
--concurrent 2-4 should be fine |
08:45
🔗
|
ivan` |
ping GLaDOS underscor |
10:12
🔗
|
Baljem |
winr4r: did you get anybody biting on ftp.hp.com? that could be useful - especially with VMS getting EoLed end of 2016... |
10:12
🔗
|
Baljem |
... I've found useful VMS patches on the FTP site before that otherwise would have required paying $$$ for |
10:13
🔗
|
Baljem |
... and I have bandwidth on various machines if we could divvy up the work somehow. |
10:14
🔗
|
js_ |
i'd be happy to help out somehow, don't think i can do the entire site but i can definitely do a portion :) |
10:19
🔗
|
Baljem |
hopefully winr4r still has the file list that took 12+ hours to generate, so we can get some idea of the scale and how best to carve it up. |
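Once a per-directory size listing exists, one simple way to carve the site up is the greedy "largest first" heuristic: sort directories by size and keep handing the next one to whoever has the least so far. A Python sketch; directory names and sizes are whatever the listing produces, and `split_work` is a hypothetical helper, not part of any Archive Team tool.

```python
import heapq


def split_work(dir_sizes: dict[str, int], volunteers: int) -> list[list[str]]:
    """Greedy largest-first split of directories across volunteers.

    Each directory goes to whoever currently has the fewest bytes assigned.
    This is a heuristic, not an optimal partition, but it is usually close.
    """
    heap = [(0, i) for i in range(volunteers)]  # (bytes assigned, volunteer index)
    buckets: list[list[str]] = [[] for _ in range(volunteers)]
    for name, size in sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)  # least-loaded volunteer so far
        buckets[i].append(name)
        heapq.heappush(heap, (load + size, i))
    return buckets
```

For example, `split_work({"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}, 2)` returns `[["a", "d"], ["b", "c", "e"]]`, 8 units each.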
10:20
🔗
|
Baljem |
I'd take a look myself but I have customers screaming at me (thankfully not about VMS today) so I need to go and do some work :-/ |
10:31
🔗
|
Nemo_bis |
Baljem: how big is it? |
10:32
🔗
|
Nemo_bis |
I thought I/we had already downloaded it |
10:32
🔗
|
Baljem |
absolutely no idea, I'm afraid, but winr4r made it sound pretty big. which makes sense, given the size of HP |
10:32
🔗
|
Baljem |
oh, in that case... job done ;) |
10:33
🔗
|
Nemo_bis |
but I can't find it |
10:33
🔗
|
Nemo_bis |
maybe https://archive.org/details/ftp-ftp.rta.nato.int is the only one I did |
10:34
🔗
|
Nemo_bis |
I remember playing with KIOslaves and html files in HP FTP though |
10:35
🔗
|
Baljem |
just playing with a tool for doing a recursive ls, let's see what happens |
10:35
🔗
|
winr4r |
Baljem: i don't even have the list anymore, i didn't think it'd be useful and it just occurred to me that it might not be archived, so, lesson learned |
10:37
🔗
|
winr4r |
i'd save it myself, but 1) 8mbit connection 2) 20gb monthly bandwidth cap |
10:38
🔗
|
Nemo_bis |
aww |
10:40
🔗
|
winr4r |
(and $3 for each gigabyte over that, so you know, not sure i like it enough to spend like $1000 on it) |
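The arithmetic behind winr4r's reluctance, as a small Python sketch. The 20 GB cap, $3/GB overage and 8 Mbit/s link come from the messages above; the function names are hypothetical, and the transfer time ignores protocol overhead.

```python
GIB = 1024 ** 3  # bytes in a gibibyte


def overage_cost(used_gb: float, cap_gb: float = 20, per_gb: float = 3.0) -> float:
    """Dollars owed above the monthly cap (winr4r's plan: 20 GB cap, $3 per GB over)."""
    return max(0.0, used_gb - cap_gb) * per_gb


def transfer_days(size_bytes: float, mbit_per_s: float = 8) -> float:
    """Best-case days to move size_bytes over an mbit_per_s link, ignoring overhead."""
    seconds = size_bytes * 8 / (mbit_per_s * 1_000_000)
    return seconds / 86_400
```

With Nemo_bis's figures for just `ftp1/` and `ftp2/pub` (about 630 GiB combined), `overage_cost(630)` gives 1830.0 if GiB and GB are treated as equal, and `transfer_days(630 * GIB)` comes to a bit under 8 days.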
10:49
🔗
|
Nemo_bis |
ftp1/ is 241,8 GiB |
10:50
🔗
|
winr4r |
Nemo_bis: did you seriously find that out in like 20 minutes? |
10:50
🔗
|
Nemo_bis |
sure |
10:50
🔗
|
winr4r |
because i wasn't kidding about it taking me 12+ hours |
10:51
🔗
|
Nemo_bis |
I told you, KIOslaves |
10:51
🔗
|
winr4r |
oh! |
10:51
🔗
|
winr4r |
dude, ftp.hp.com is going to be walking funny for *weeks* after that |
10:52
🔗
|
Nemo_bis |
nah, I doubt they care |
10:54
🔗
|
winr4r |
:) |
10:54
🔗
|
Nemo_bis |
ftp2/ 6000 subdirs and counting |
10:55
🔗
|
ivan` |
Smiley: https://ludios.org/tmp/0001-fix-segfault-in-ftp.c-ftp_loop_internal.patch from the bug-wget list |
10:55
🔗
|
Baljem |
I find 8857453594 files |
10:56
🔗
|
Nemo_bis |
... |
10:56
🔗
|
Baljem |
that seems... hmm. let me double-check my perl! |
10:56
🔗
|
Baljem |
yeah, something's fucked there, there are only 1.1 million lines in the listing I generated |
10:56
🔗
|
Nemo_bis |
^^ |
10:57
🔗
|
winr4r |
haha, 8 billion |
10:58
🔗
|
Baljem |
oh, duh, that's the size line I was adding up, not the file count |
10:58
🔗
|
Baljem |
so what is that smaller than 214GB? obviously this recursive FTP tool is not great. |
10:58
🔗
|
Baljem |
why* |
11:00
🔗
|
Baljem |
good grief, ftp1/pub/all_in_one - they kept all the ancient DEC stuff. nice. |
11:07
🔗
|
Nemo_bis |
it was only the ftp1/ dir |
11:07
🔗
|
Nemo_bis |
ftp2/pub/ at 10k subdirs and counting |
11:13
🔗
|
Baljem |
doing the same recursive listing a second time seems to be taking a lot longer. weird. will see what this comes out with in the end |
11:16
🔗
|
Nemo_bis |
I sometimes get no permission to access a directory that later works, I don't know why :) |
11:17
🔗
|
Nemo_bis |
Baljem: how much disk space do you have? |
11:19
🔗
|
Baljem |
I'm thinking nowhere near enough, unfortunately |
11:19
🔗
|
Baljem |
I need to shift some VMs around on the work cluster, but even then I think I can only free up a hundred gig or so |
11:20
🔗
|
Baljem |
and I've just realised that 8.8-billion number I quoted was in traditional 512-byte blocks. so that's 4TB... gonna need a bigger boat |
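The unit slip is easy to check: 8,857,453,594 blocks × 512 bytes. A Python sketch of the conversion; `blocks_to_bytes` is a hypothetical helper, and the total still includes whatever the listing picked up by following symlinks.

```python
def blocks_to_bytes(blocks: int, block_size: int = 512) -> int:
    """Convert a count of traditional 512-byte blocks into bytes."""
    return blocks * block_size


total = blocks_to_bytes(8857453594)
print(f"{total:,} bytes = {total / 1e12:.2f} TB = {total / 2**40:.2f} TiB")
# 4,535,016,240,128 bytes = 4.54 TB = 4.12 TiB
```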
11:22
🔗
|
Nemo_bis |
you sure? |
11:22
🔗
|
Baljem |
not in the slightest. I'm presuming this tool is doing the right thing, which may not be wise seeing as the second listing is taking so much longer |
11:23
🔗
|
Nemo_bis |
ftp2/pub is only 10 809 subdirs, 29 421 files and 388,7 GiB here |
11:24
🔗
|
Baljem |
ah, it followed symlinks, blast |
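When tallying a recursive listing by hand, symlink entries can at least be kept out of the sum. A Python sketch, assuming Unix `ls -l` style lines (permissions, link count, owner, group, size, three date fields, name), which not every FTP server produces; `sum_listing` is hypothetical.

```python
def sum_listing(listing: str) -> tuple[int, int]:
    """Sum regular-file sizes in Unix `ls -lR`-style text, skipping symlinks.

    Returns (regular_file_count, total_bytes). Directory headers ("./pub:"),
    "total N" lines, directories ('d') and symlinks ('l') are all ignored.
    """
    files = total = 0
    for line in listing.splitlines():
        fields = line.split(None, 8)  # maxsplit keeps file names with spaces intact
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue  # not a regular-file entry
        files += 1
        total += int(fields[4])  # 5th column of `ls -l` is the size in bytes
    return files, total
```

This only ignores the link entries themselves. If the client already descended into linked directories, their contents appear as ordinary files under a new path and would need de-duplicating some other way.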
11:27
🔗
|
Nemo_bis |
how does one tell wget not to? |
11:36
🔗
|
Baljem |
no idea, I'm afraid, using an FTP client called lftp instead |
11:36
🔗
|
Baljem |
but the results it's giving me are wildly different from yours so I suspect I'm doing something wrong |
11:40
🔗
|
Deewiant |
Baljem: Using lftp and 'rels -lR bin' I'm getting slightly different time stamps every time, and once the file sizes were wrong too |
11:40
🔗
|
Deewiant |
Seems like the server's lying :-P |
11:46
🔗
|
winr4r |
you guys are awesome |
11:57
🔗
|
dashcloud |
it's entirely possible HP's actually adding new stuff on an on-going basis |
11:59
🔗
|
Deewiant |
The timestamp of the root directory ('..' in bin/) had varying values all in 2011-2012 and in random order |
11:59
🔗
|
Deewiant |
If that's due to them adding stuff then that's pretty weird |
12:00
🔗
|
dashcloud |
SketchCow: https://archive.org/details/MiscYeroOrg.warc is up- it's just a wget-warc of the website. The music is hosted on the artist's site |
12:41
🔗
|
Smiley |
i'm trying to grab all non-ftp music files |
13:54
🔗
|
ivan` |
https://twitter.com/KimDotcom/status/347342896174866433 |
13:54
🔗
|
ivan` |
"#Leaseweb has wiped ALL #Megaupload servers." |
14:28
🔗
|
ersi |
ReLeaseWeb rather |
15:14
🔗
|
underscor |
ivan`: Running --concurrent 4 |
15:16
🔗
|
ivan` |
thanks |
15:17
🔗
|
ivan` |
going to add some non-english words there soon |
16:28
🔗
|
Smiley |
ivan`: you any good with bash |
16:28
🔗
|
Smiley |
? |
16:30
🔗
|
ivan` |
probably, what do you need? |
17:00
🔗
|
Smiley |
https://github.com/djsmiley2k/smileys-random-tools/blob/master/get_xanga_users fix this? XD |
17:09
🔗
|
* |
ivan` looks |
17:35
🔗
|
SketchCow |
dashcloud: Thanks. |
17:44
🔗
|
ivan` |
Smiley: did you see the wget patch |
17:44
🔗
|
ivan` |
Smiley: it doesn't crash on ftp with it |
17:45
🔗
|
Smiley |
sweet |
17:45
🔗
|
Smiley |
i fixed my bash issue |
17:45
🔗
|
Smiley |
SketchCow: I've got a script running now which is collecting as many usernames as possible |
17:45
🔗
|
Smiley |
Already done a few hundred to test. works nicely |
17:45
🔗
|
Smiley |
and i've basically learnt awk... to the point of usability. |
17:53
🔗
|
* |
winr4r salutes Smiley |
17:53
🔗
|
winr4r |
takin' one for the team |
17:55
🔗
|
Smiley |
:D |
17:55
🔗
|
Smiley |
i feel EPIC |
17:58
🔗
|
winr4r |
pretty sure you can actually shoot lightning from your fingertips now man |
18:49
🔗
|
underscor |
^ |
18:51
🔗
|
godane |
found something: http://www.youtube.com/user/EverySteveJobsVideo |
18:56
🔗
|
xmc |
hahaha |
19:00
🔗
|
SketchCow |
Thanks, Smiley |
21:10
🔗
|
Smiley |
no worries SketchCow |
21:27
🔗
|
namespace |
Cool, a new burning building. |
21:35
🔗
|
Smiley |
XANGA project guys - DO IT :) |
21:35
🔗
|
namespace |
Already on it. |
21:35
🔗
|
namespace |
"Xanga is getting old. Archive Team investigates." That's a pretty strange description of the project. |
21:35
🔗
|
namespace |
I think it's well beyond investigation now that you're grabbing stuff. |
21:38
🔗
|
godane |
i found out that a friend of glenn beck died: http://www.glennbeck.com/2013/06/19/glenn-remembers-his-good-friend-author-vince-flynn/ |
21:38
🔗
|
godane |
i'm mirroring vinceflynn.com |
21:39
🔗
|
Smiley |
I can't change the description of the project, namespace |
21:39
🔗
|
namespace |
Smiley: *shrug* Later then. |
21:40
🔗
|
namespace |
godane: Yeah, I actually wanted to talk about small sites. |
21:40
🔗
|
namespace |
I'm really impressed with the work that's been done saving these enormous spires of burning documents. |
21:41
🔗
|
namespace |
But if I wanted to go archive some smaller sites on my own, where would I start? |
21:41
🔗
|
namespace |
(Tools, etc.) |
21:41
🔗
|
godane |
i use wget |
21:41
🔗
|
godane |
my code: wget $website --mirror --warc-file=$website-$(date +%Y%m%d) --warc-cdx -o wget.log |
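The same naming scheme in Python, as a sketch. `warc_basename` and `mirror_command` are hypothetical helpers; the point is that `$website` should be a bare hostname, since a full `http://...` URL would put slashes into the WARC file name.

```python
from datetime import date


def warc_basename(site: str, day: date) -> str:
    """Mimic godane's --warc-file=$website-$(date +%Y%m%d) naming."""
    return f"{site}-{day.strftime('%Y%m%d')}"


def mirror_command(site: str, day: date) -> list[str]:
    """godane's wget line as an argv list; a list avoids shell word-splitting."""
    return ["wget", site, "--mirror",
            f"--warc-file={warc_basename(site, day)}",
            "--warc-cdx", "-o", "wget.log"]  # -o: write wget's log to wget.log
```

For `www.vinceflynn.com` on 2013-06-19 this yields `www.vinceflynn.com-20130619`, the same name as the item godane uploaded that evening.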
21:41
🔗
|
Smiley |
namespace: you go to my wiki user page |
21:41
🔗
|
Smiley |
it has a default "wtf save this site!" code |
21:42
🔗
|
Smiley |
Yes, I'm attempting to be as awesome as Jason. |
21:42
🔗
|
Smiley |
http://archiveteam.org/index.php?title=User:Djsmiley2k |
21:42
🔗
|
Smiley |
also thanks to godane as being the source of that original command :) |
21:44
🔗
|
namespace |
Smiley: I'd prefer a nice document to let me know how to use wget. |
21:45
🔗
|
namespace |
I mean, there are man pages, but that's like saying that my desktop came with a manual. |
21:45
🔗
|
namespace |
(Actually, to be fair man pages are probably more useful than the 'manual' that comes with most computers.) |
21:46
🔗
|
Smiley |
namespace: I'll think about it, but tbh I know nothing about wget |
21:46
🔗
|
namespace |
Smiley: Be smart, somebody else probably already wrote one. |
21:46
🔗
|
namespace |
Instead of writing a mediocre one, try finding somebody else's great one. |
21:47
🔗
|
namespace |
(I'm working on it right now BTW.) |
21:48
🔗
|
namespace |
Oh, it's GNU, never mind then; the documentation is probably excellent. |
21:48
🔗
|
namespace |
https://www.gnu.org/software/wget/manual/wget.html |
22:02
🔗
|
arrith1 |
the wget manpage is pretty good. though there might be some undocumented features, for example -e robots=off for handling robots.txt isn't in the manpage, i'm not sure of others |
22:03
🔗
|
arrith1 |
would be nice to have a submission process to the ArchiveTeam Warrior for smaller sites. could run a basic wget warc command on the clients. maybe have manual review before the grab happens, but the submission process could be automated |
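One possible shape for the review step arrith1 describes, as a Python sketch: submissions are validated and de-duplicated, then wait in `pending` until a person approves them. Nothing like `SubmissionQueue` exists in the Warrior as far as this log says; it is purely illustrative.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class SubmissionQueue:
    """Hypothetical review queue for small-site grab requests (not a Warrior API)."""
    pending: list[str] = field(default_factory=list)
    approved: list[str] = field(default_factory=list)
    seen: set[str] = field(default_factory=set)

    def submit(self, url: str) -> bool:
        """Accept each http(s) URL once; return False for duplicates or bad input."""
        url = url.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            return False
        key = parts.netloc.lower() + (parts.path or "/")  # ignore scheme and host case
        if key in self.seen:
            return False
        self.seen.add(key)
        self.pending.append(url)
        return True

    def approve(self, url: str) -> None:
        """A human reviewer moves a pending URL to the approved list."""
        self.pending.remove(url)
        self.approved.append(url)
```

Approved URLs could then be handed to clients along with a fixed wget WARC command, keeping the manual check before any grab starts.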
22:03
🔗
|
arrith1 |
and i really should learn awk |
22:04
🔗
|
godane |
uploaded: https://archive.org/details/www.vinceflynn.com-20130619 |
22:21
🔗
|
arrith1 |
nice |
22:25
🔗
|
godane |
i'm doing another mirror of torrentfreak.com |
22:36
🔗
|
dashcloud |
balrog: still looking for a CD autoloader? |
22:36
🔗
|
balrog |
dashcloud: yeah |
22:36
🔗
|
dashcloud |
did you see this one I posted yesterday? http://www.bizchair.com/rx100pc-rex.html |
22:39
🔗
|
balrog |
yes |
22:39
🔗
|
balrog |
they don't make it anymore |
22:54
🔗
|
dashcloud |
so, they don't let you check out with it, but you can put it in your cart? |
23:45
🔗
|
arrith1 |
to test going up to checkout in a site, but not placing the order, fakenamegenerator.com is good. it provides fake credit card numbers to enter to get past the payment step |