#archiveteam 2013-06-19,Wed

Time Nickname Message
00:00 🔗 dashcloud there's another good wget command here: http://www.archiveteam.org/index.php?title=User:Djsmiley2k
00:02 🔗 arrith1 dashcloud: yeah, need to make sure people are downloading different things. might require the Warrior
00:05 🔗 dashcloud balrog: I saw your request earlier about an autoloader for dumping CDs- Google shopping turns up this: http://www.bizchair.com/rx100pc-rex.html $495
00:07 🔗 dashcloud DoubleJ: my download just finished- much smaller site than I thought it would be
00:08 🔗 DoubleJ dashcloud: Mine's done, too. 18 MB or so?
00:10 🔗 SketchCow Is swizzle here? No, guess not.
00:11 🔗 winr4r hey jason
00:14 🔗 dashcloud yeap
00:15 🔗 SketchCow Hey there.
00:15 🔗 SketchCow (You're not swizzle)
00:16 🔗 dashcloud so, just upload gont.com.ar.warc.gz to the community texts collection, and you'll take it from there?
00:17 🔗 DoubleJ dashcloud: I'm checking the subpages to make sure we're not missing anything. He has a blog site that I'm downloading now.
00:19 🔗 SketchCow dashcloud: Yes
00:22 🔗 dashcloud here it is: https://archive.org/details/Gont.com.ar.warc
00:29 🔗 godane uploaded: https://archive.org/details/ftpsites_arcade.demon.co.uk_2013.06.17
00:32 🔗 godane uploaded: https://archive.org/details/arcade.demon.co.uk-20130617
00:35 🔗 dashcloud ivan`: did you start a download of http://misc.yero.org/modulez/ ?
00:36 🔗 ivan` dashcloud: I WARCed the site; Smiley went to download some of the music linked within
00:36 🔗 arrith1 jfranusic asked earlier, and i'm wondering myself, to upload to archive.org do you just go to http://archive.org/upload/ ? is there a specialized way for AT people?
00:36 🔗 ivan` dashcloud: we ran into a wget bug that causes segfault
00:37 🔗 dashcloud I didn't know, so I started a download of the site
00:38 🔗 dashcloud you can use the upload page, or there's a bulk upload script floating around somewhere for larger collections
00:39 🔗 arrith1 dashcloud: thanks
00:39 🔗 arrith1 jfranusic: i'll hunt around for that script for you
00:40 🔗 dashcloud ivan`: I'm downloading the ftp://ftp.scene.org/pub/mirrors/scenesp.org/ bits- did you already get all of these?
00:41 🔗 jfranusic arrith1: cool, thanks
00:43 🔗 arrith1 jfranusic: found these: https://github.com/kngenie/ias3upload and http://askubuntu.com/questions/32763/script-to-upload-to-internet-archive-archive-org
00:43 🔗 arrith1 jfranusic: kind of lower level: http://archive.org/help/abouts3.txt
00:44 🔗 arrith1 i would think there would be a python script
00:44 🔗 jfranusic "The intended users of this script are Internet Archive users interested in uploading batches of content alongside per-item metadata in an automated fashion."
00:45 🔗 jfranusic the actual uploading part isn't what I'm worried about
00:45 🔗 arrith1 jfranusic: this might be overkill for your purposes, but maybe a starting point: http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem#Archive_Team_megawarc_factory
00:46 🔗 jfranusic I'm just wondering if I can just start uploading files willy nilly
00:46 🔗 jfranusic or if I need to follow some sort of procedure
00:47 🔗 arrith1 jfranusic: hm good question. ping dashcloud ?
00:49 🔗 ivan` dashcloud: no
00:49 🔗 dashcloud I've never used it, but the ias3upload script defines the metadata you need to provide
00:50 🔗 DFJustin jfranusic: we're pretty low on formality
00:50 🔗 DFJustin regular users can only upload to a couple specific areas on the site so someone higher up has to sort it later anyway
00:50 🔗 jfranusic ah! okay
00:50 🔗 jfranusic and, what are those areas
00:51 🔗 DFJustin they're listed in the drop-down on the upload form, the main ones are Community Texts, Community Videos, and Community Audio
00:52 🔗 DFJustin eventually most of it goes into the Archive Team collection https://archive.org/details/archiveteam
00:52 🔗 DFJustin warcs usually get dumped in community texts to start because there's no ready-made community web collection
00:54 🔗 DFJustin we have lots of people uploading random stuff beyond just website grabs though, e.g. historically interesting videos which will just stay in community videos
01:56 🔗 jfranusic ah, cool, thanks DFJustin
02:08 🔗 SketchCow I'm cleaning the software collection
02:11 🔗 omf_ arrith, I recommend the fork https://github.com/kimmel/ias3upload it has more documentation and it has a few bug fixes
02:12 🔗 omf_ Also the readme has a nice section on the metadata fields
02:14 🔗 BlueMax SketchCow, what do you mean by clean
02:14 🔗 SketchCow The front page's a mess.
02:14 🔗 SketchCow Not informative, a lot of mess.
02:14 🔗 SketchCow Dead projects, poor ideas
03:01 🔗 godane so i found out one thing about the glennbeck highlights
03:01 🔗 godane and maybe the mlb highlights too
03:01 🔗 godane the last number is always an odd number
03:02 🔗 arrith1 omf_: nice, thanks
03:02 🔗 godane 0, 2, 4, 6 and 8 are never used
03:25 🔗 dashcloud hi, my grab of misc.yero.org/modulez has finished without issues: I used wget -e robots=off -r -l 0 -m -p --wait 1 --warc-header "operator: Archive Team" --warc-cdx --warc-file misc-yero-org http://misc.yero.org/modulez/
03:25 🔗 dashcloud did I miss something or do something different than other people?
03:29 🔗 dashcloud I'm expecting my download of ftp://ftp.scene.org/pub/mirrors/scenesp.org/ to finish up over night
03:31 🔗 SketchCow Hi, I just spent a few hours completely redoing the Software collection of archive.org.
03:31 🔗 SketchCow http://archive.org/details/software
03:32 🔗 SketchCow It will probably take overnight for the scripts to completely redirect everything, but now we have things described, and I'm going to begin the process of putting vintage software in a proper place (vintagesoftware) instead of scattered to the four winds, etc.
03:32 🔗 godane computerbooks has bad url
03:33 🔗 SketchCow fixed!
03:34 🔗 godane also someone should mirror this site: http://www.dream17.info/
03:34 🔗 godane since it has a lot of amiga pd ware
03:36 🔗 godane also you guys should know the 37 maximum pc cds are still up: http://www.ebay.com/itm/Lot-of-37-Maximum-PC-CDs-First-Issue-Old-Shareware-Software-1998-2003/231001552845
03:38 🔗 godane i'm getting this one: http://www.ebay.com/itm/18-Maximum-PC-and-Linux-Magazine-Demo-CD-Discs-/231002003264?pt=US_Wholesale_Software&hash=item35c8cad740
03:38 🔗 godane comes with 18 demo discs and is only $10
03:39 🔗 SketchCow I can't quite afford buying stuff right now.
03:40 🔗 godane i just hope this one goes like my cnn bid went
03:40 🔗 godane *cnn cd bid
04:07 🔗 winr4r mistym: oh hi
04:08 🔗 mistym winr4r: Hi!
04:23 🔗 DFJustin SketchCow: the link to open source software is broken, should be open_source_software
04:23 🔗 DFJustin good job cleaning all that up :D
04:24 🔗 SketchCow I'm thinking of adding a "datasets" collection
04:24 🔗 SketchCow For things like the Internet Census and Twitter downloads and all that crap.
04:25 🔗 DFJustin ideally that collection should be titled something like "Community Software" though because most of it is not actually OSS
04:26 🔗 SketchCow These changes will happen in waves.
04:26 🔗 DFJustin much like "opensource" has been titled "Community Texts"
04:26 🔗 SketchCow I agree, but that one I have to get clearance for
04:27 🔗 DFJustin natch
04:28 🔗 DFJustin yeah would be nice to get stuff like 301works off the "this just in" list
04:28 🔗 SketchCow Well, that won't happen.
04:29 🔗 SketchCow This Just In is a mess, the way it's done.
04:29 🔗 SketchCow It basically adds anything with a software mediatype
04:29 🔗 DFJustin there is https://archive.org/details/data but it's rather neglected
04:29 🔗 SketchCow Wow, that's a mess
04:29 🔗 SketchCow :)
04:30 🔗 DFJustin that's another mediatype wildcard thing, it's all s3 uploads with no mediatype set and stuff
04:31 🔗 DFJustin with the occasional proper thing like https://archive.org/details/BrownCorpus
04:32 🔗 SketchCow Made datasets
04:35 🔗 SketchCow Put some things into it
04:49 🔗 underscor SketchCow: Damn, /software looks good!
05:43 🔗 underscor Any archiveteamers that live in the bay area who'd be interested in touring IA and/or hanging out sometime?
05:43 🔗 underscor It's kinda boring here with nobody to do things with after hours x3
05:44 🔗 SketchCow Ha
05:44 🔗 SketchCow We'll work together to find you things
05:45 🔗 BlueMax I'm sure underscor could find SOMETHING to do
05:45 🔗 underscor I mean, there's the internet to chat with people
05:45 🔗 underscor and always more work things
05:45 🔗 underscor BlueMax: :P
05:46 🔗 underscor I mean, I can wander to things, but again, kinda sucks alone
05:46 🔗 BlueMax why don't you get people on Skype or something?
05:46 🔗 BlueMax I'd be up for tha
05:46 🔗 BlueMax that
05:46 🔗 underscor :o cool
05:46 🔗 underscor add me!
05:46 🔗 underscor alex.buie.kwd
05:46 🔗 SketchCow You should clean your space
05:46 🔗 underscor I'm on the phone with my mom right now though
05:46 🔗 underscor SketchCow: I did. It's much better.
05:47 🔗 underscor That was embarrassing.
05:47 🔗 underscor (but entirely my fault)
05:47 🔗 BlueMax oh underscor you're so messy
05:49 🔗 * underscor giggles innocently
05:49 🔗 winr4r morning folks
05:51 🔗 winr4r best hour and a half of sleep evar
05:52 🔗 BlueMax morning lame windows program :P
05:56 🔗 underscor how does it feel for nobody to pay for you
05:59 🔗 winr4r hah
06:03 🔗 winr4r by the way, if anyone likes backing up FTP sites and has a very fast connection, ftp.hp.com (godane)
06:04 🔗 winr4r i was looking for some tru64 patch or other and it literally took me 12+ hours to generate a list of the files
06:07 🔗 godane i think that one will be for someone else
06:07 🔗 godane only cause i think its too big for me
06:24 🔗 godane i forgot that i have braingames collection: http://archive.org/search.php?query=braingames%20AND%20collection%3Aopensource_movies
06:25 🔗 arrith1 underscor: heck yeah, east bay here
06:25 🔗 godane and nerds 2.0.1: http://archive.org/search.php?query=subject%3A%22Nerds+2.0.1+-+A+Brief+History+of+the+Internet%22
06:25 🔗 arrith1 arrith: didn't know i could wander in, i'm so down for that
06:26 🔗 godane and triumph of the nerds: http://archive.org/search.php?query=subject%3A%22Triumph+of+the+Nerds%22
06:33 🔗 underscor arrith1: yeah, totally
06:33 🔗 underscor we have open catered lunch and tours on fridays at noon
06:33 🔗 underscor you should come by on a friday you're free
06:34 🔗 arrith1 underscor: then like a big box store i'll hide in a corner until they turn off the lights :P
06:34 🔗 underscor I'll show you around and stuff, and we could hang after or something
06:34 🔗 underscor arrith1: I'm here 24/7! Nice try!
06:34 🔗 arrith1 underscor: more time to hang out i mean haha :)
06:34 🔗 underscor Lights are all off right now, though, actually. Kinda weird to be at my desk in the dark
06:34 🔗 underscor (I could turn them on, I just like it dim)
06:35 🔗 arrith1 underscor: and yeah that sounds awesome. would be super fun to hang
06:35 🔗 arrith1 ever since i got a dimmer for my office lights it's basically dim always
06:36 🔗 underscor the archive work area is just a big open room with a bunch of table clusters
06:36 🔗 underscor so it's either "dark" or "bright as fuck with 300 watt overheads"
06:36 🔗 arrith1 hm nice. sounds like a hackerspace almost
06:36 🔗 arrith1 haha
06:36 🔗 underscor It's *very* much like a hackerspace
06:36 🔗 arrith1 iirc hacker dojo is like that with the lighting
06:36 🔗 underscor Lots of tables, ethernet drops, couches for people to just chill at, talk, hack on stuff
06:37 🔗 arrith1 their presentation room is super dark, or can barely see the projected image
06:37 🔗 underscor haha
06:37 🔗 arrith1 wow nice. just needs some soylent feeding tubes and it'd be all you need
06:40 🔗 underscor hahaha
06:40 🔗 underscor we have a coffee robot
06:40 🔗 underscor if that counts
06:41 🔗 arrith1 delivers to your table?
06:41 🔗 arrith1 one nice thing about sf is good food really isn't ever too far away
06:43 🔗 underscor arrith1: unfortunately not, although that would probably be embraced by everyone here
06:43 🔗 underscor It's just one of those really fancy coffee/espresso/cappuccino/everything machines
06:43 🔗 underscor you put a cup on one arm, and science happens
06:43 🔗 Aranje coffee... robot?
06:43 🔗 underscor and you end up with a cup of caffeinated sludge
06:43 🔗 Aranje oh!
06:44 🔗 arrith1 underscor: ahh, one of those super basic "robots"
06:44 🔗 arrith1 underscor: personal delivery of pizza and/or beer is one of the primary goals of the hercules robot project at hacker dojo, so depending on how successful they are, that work could be re-purposed
06:45 🔗 arrith1 though with hot coffee it better have some safety features..
06:45 🔗 underscor hahahaha
06:45 🔗 underscor that would be awesome
06:45 🔗 underscor well, a pizza to the face would probably hurt too
06:45 🔗 underscor also, should move to -bs
06:45 🔗 arrith1 oops
06:46 🔗 underscor no big, I started it XD
06:59 🔗 ivan` common crawl says they have 5 billion URLs, but downloading everything with common_crawl_index gets me 2,412,755,840 URLs
06:59 🔗 ivan` the last two are
06:59 🔗 ivan` zw.org.zwrcn.www/women-voice-blog/view-topiclist/forum-1-women-discussions.html:http
06:59 🔗 ivan` zz_seay662TT/indexb.php:0223744@gothicundine.com:http
07:00 🔗 ivan` I have a 22GB bz2 in case anyone wants it
07:05 🔗 ivan` http://204.12.192.194:32047/common_crawl_index_urls.bz2 wrong mimetype, don't open it in your browser, sha1sum c296782cf01fa4f4e111f58a3b02200d3a475d24
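A download this large is worth checking against the SHA-1 that ivan` posted before anyone relies on it. A minimal Python sketch, assuming the file has already been saved locally; the helper names `sha1_of_file` and `matches_expected` and the local path are illustrative and do not come from the log:

```python
import hashlib

# SHA-1 posted in the channel for common_crawl_index_urls.bz2.
EXPECTED_SHA1 = "c296782cf01fa4f4e111f58a3b02200d3a475d24"


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file, reading it in chunks so a
    multi-gigabyte file never has to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Compare the file's SHA-1 with the expected hex digest (case-insensitive)."""
    return sha1_of_file(path) == expected.lower()
```

Calling `matches_expected("common_crawl_index_urls.bz2")` on the saved file should return `True` only if the download is intact.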
07:11 🔗 ivan` heh https://github.com/trivio/common_crawl_index/issues/12
08:44 🔗 ivan` could a few people run https://github.com/ArchiveTeam/greader-directory-grab please?
08:44 🔗 ivan` getting 14 item/min, need about 30
08:44 🔗 ivan` --concurrent 2-4 should be fine
08:45 🔗 ivan` ping GLaDOS underscor
10:12 🔗 Baljem winr4r: did you get anybody biting on ftp.hp.com? that could be useful - especially with VMS getting EoLed end of 2016...
10:12 🔗 Baljem ... I've found useful VMS patches on the FTP site before that otherwise would have required paying $$$ for
10:13 🔗 Baljem ... and I have bandwidth on various machines if we could divvy up the work somehow.
10:14 🔗 js_ i'd be happy to help out somehow, don't think i can do the entire site but i can definitely do a portion :)
10:19 🔗 Baljem hopefully winr4r still has the file list that took 12+ hours to generate, so we can get some idea of the scale and how best to carve it up.
10:20 🔗 Baljem I'd take a look myself but I have customers screaming at me (thankfully not about VMS today) so I need to go and do some work :-/
10:31 🔗 Nemo_bis Baljem: how big is it?
10:32 🔗 Nemo_bis I thought I/we had already downloaded it
10:32 🔗 Baljem absolutely no idea, I'm afraid, but winr4r made it sound pretty big. which makes sense, given the size of HP
10:32 🔗 Baljem oh, in that case... job done ;)
10:33 🔗 Nemo_bis but I can't find it
10:33 🔗 Nemo_bis maybe https://archive.org/details/ftp-ftp.rta.nato.int is the only one I did
10:34 🔗 Nemo_bis I remember playing with KIOslaves and html files in HP FTP though
10:35 🔗 Baljem just playing with a tool for doing a recursive ls, let's see what happens
10:35 🔗 winr4r Baljem: i don't even have the list anymore, i didn't think it'd be useful and it just occurred to me that it might not be archived, so, lesson learned
10:37 🔗 winr4r i'd save it myself, but 1) 8mbit connection 2) 20gb monthly bandwidth cap
10:38 🔗 Nemo_bis aww
10:40 🔗 winr4r (and $3 for each gigabyte over that, so you know, not sure i like it enough to spend like $1000 on it)
10:49 🔗 Nemo_bis ftp1/ is 241,8 GiB
10:50 🔗 winr4r Nemo_bis: did you seriously find that out in like 20 minutes?
10:50 🔗 Nemo_bis sure
10:50 🔗 winr4r because i wasn't kidding about it taking me 12+ hours
10:51 🔗 Nemo_bis I told you, KIOslaves
10:51 🔗 winr4r oh!
10:51 🔗 winr4r dude, ftp.hp.com is going to be walking funny for *weeks* after that
10:52 🔗 Nemo_bis nah, I doubt they care
10:54 🔗 winr4r :)
10:54 🔗 Nemo_bis ftp2/ 6000 subdirs and counting
10:55 🔗 ivan` Smiley: https://ludios.org/tmp/0001-fix-segfault-in-ftp.c-ftp_loop_internal.patch from the bug-wget list
10:55 🔗 Baljem I find 8857453594 files
10:56 🔗 Nemo_bis ...
10:56 🔗 Baljem that seems... hmm. let me double-check my perl!
10:56 🔗 Baljem yeah, something's fucked there, there are only 1.1 million lines in the listing I generated
10:56 🔗 Nemo_bis ^^
10:57 🔗 winr4r haha, 8 billion
10:58 🔗 Baljem oh, duh, that's the size line I was adding up, not the file count
10:58 🔗 Baljem so what is that smaller that 214GB? obviously this recursive FTP tool is not great.
10:58 🔗 Baljem why*
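The mix-up Baljem describes (adding up the size column instead of counting files) is easy to avoid by computing both numbers in one pass. A rough Python sketch, assuming the recursive listing is in Unix `ls -l` style; the log does not show the actual format Baljem's tool produced or the Perl used, so `summarize_listing` and the sample lines are hypothetical:

```python
def summarize_listing(lines):
    """Return (file_count, total_size) for Unix-style 'ls -l' lines."""
    file_count = 0
    total_size = 0
    for line in lines:
        # Regular files start with '-'. This skips directories ('d'),
        # symlinks ('l'), 'total N' headers and blank lines.
        if not line.startswith("-"):
            continue
        # Split into at most 9 fields so file names with spaces stay whole.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue
        try:
            size = int(fields[4])  # fifth column is the size
        except ValueError:
            continue
        file_count += 1
        total_size += size
    return file_count, total_size


sample = [
    "drwxr-xr-x 2 ftp ftp 4096 Jun 19 2013 pub",
    "-rw-r--r-- 1 ftp ftp 1024 Jun 19 2013 patch one.txt",
    "-rw-r--r-- 1 ftp ftp 2048 Jun 19 2013 readme",
    "lrwxrwxrwx 1 ftp ftp 7 Jun 19 2013 latest -> readme",
]
count, size = summarize_listing(sample)
print(f"{count} files, size column adds up to {size}")
```

Skipping symlink entries also keeps linked files from being counted twice. Whether the size column is in bytes or blocks depends on the tool that produced the listing.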
11:00 🔗 Baljem good grief, ftp1/pub/all_in_one - they kept all the ancient DEC stuff. nice.
11:07 🔗 Nemo_bis it was only the ftp1/ dir
11:07 🔗 Nemo_bis ftp2/pub/ at 10k subdirs and counting
11:13 🔗 Baljem doing the same recursive listing a second time seems to be taking a lot longer. weird. will see what this comes out with in the end
11:16 🔗 Nemo_bis I sometimes get no permission to access a directory that later works, I don't know why :)
11:17 🔗 Nemo_bis Baljem: how much disk space do you have?
11:19 🔗 Baljem I'm thinking nowhere near enough, unfortunately
11:19 🔗 Baljem I need to shift some VMs around on the work cluster, but even then I think I can only free up a hundred gig or so
11:20 🔗 Baljem and I've just realised that 8.8-billion number I quoted was in traditional 512-byte blocks. so that's 4TB... gonna need a bigger boat
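The block-to-byte conversion behind that estimate can be checked in a few lines. A small Python sketch, taking Baljem's figure and the 512-byte block size at face value; the function name `blocks_to_bytes` is illustrative:

```python
BLOCK_SIZE = 512  # bytes per block, as Baljem assumes


def blocks_to_bytes(blocks, block_size=BLOCK_SIZE):
    """Convert a count of fixed-size blocks to bytes."""
    return blocks * block_size


total_bytes = blocks_to_bytes(8857453594)
terabytes = total_bytes / 10**12  # decimal terabytes
tebibytes = total_bytes / 2**40   # binary tebibytes
print(f"{total_bytes} bytes = {terabytes:.2f} TB = {tebibytes:.2f} TiB")
```

This works out to about 4.5 TB (about 4.1 TiB), in line with the rough "4TB" figure, though it only holds if the listing really is in 512-byte blocks.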
11:22 🔗 Nemo_bis you sure?
11:22 🔗 Baljem not in the slightest. I'm presuming this tool is doing the right thing, which may not be wise seeing as the second listing is taking so much longer
11:23 🔗 Nemo_bis ftp2/pub is only 10 809 subdirs, 29 421 files and 388,7 GiB here
11:24 🔗 Baljem ah, it followed symlinks, blast
11:27 🔗 Nemo_bis how does one tell wget not to?
11:36 🔗 Baljem no idea, I'm afraid, using an FTP client called lftp instead
11:36 🔗 Baljem but the results it's giving me are wildly different from yours so I suspect I'm doing something wrong
11:40 🔗 Deewiant Baljem: Using lftp and 'rels -lR bin' I'm getting slightly different time stamps every time, and once the file sizes were wrong too
11:40 🔗 Deewiant Seems like the server's lying :-P
11:46 🔗 winr4r you guys are awesome
11:57 🔗 dashcloud it's entirely possible HP's actually adding new stuff on an on-going basis
11:59 🔗 Deewiant The timestamp of the root directory ('..' in bin/) had varying values all in 2011-2012 and in random order
11:59 🔗 Deewiant If that's due to them adding stuff then that's pretty weird
12:00 🔗 dashcloud SketchCow: https://archive.org/details/MiscYeroOrg.warc is up- it's just a wget-warc of the website. The music is hosted on the artist's site
12:41 🔗 Smiley i'm trying to grab all non-ftp music files
13:54 🔗 ivan` https://twitter.com/KimDotcom/status/347342896174866433
13:54 🔗 ivan` "#Leaseweb has wiped ALL #Megaupload servers."
14:28 🔗 ersi ReLeaseWeb rather
15:14 🔗 underscor ivan`: Running --concurrent 4
15:16 🔗 ivan` thanks
15:17 🔗 ivan` going to add some non-english words there soon
16:28 🔗 Smiley ivan`: you any good with bash
16:28 🔗 Smiley ?
16:30 🔗 ivan` probably, what do you need?
17:00 🔗 Smiley https://github.com/djsmiley2k/smileys-random-tools/blob/master/get_xanga_users fix this? XD
17:09 🔗 * ivan` looks
17:35 🔗 SketchCow dashcloud: Thanks.
17:44 🔗 ivan` Smiley: did you see the wget patch
17:44 🔗 ivan` Smiley: it doesn't crash on ftp with it
17:45 🔗 Smiley sweet
17:45 🔗 Smiley i fixed my bash issue
17:45 🔗 Smiley SketchCow: I've got script now running which is collecting as many usernames as possible
17:45 🔗 Smiley Already done a few hundred to test. works nicely
17:45 🔗 Smiley and i've basically learnt awk... to the point of usability doing it.
17:53 🔗 * winr4r salutes Smiley
17:53 🔗 winr4r takin' one for the team
17:55 🔗 Smiley :D
17:55 🔗 Smiley i feel EPIC
17:58 🔗 winr4r pretty sure you can actually shoot lightning from your fingertips now man
18:49 🔗 underscor ^
18:51 🔗 godane found something: http://www.youtube.com/user/EverySteveJobsVideo
18:56 🔗 xmc hahaha
19:00 🔗 SketchCow Thanks, Smiley
21:10 🔗 Smiley no worries SketchCow
21:27 🔗 namespace Cool, a new burning building.
21:35 🔗 Smiley XANGA project guys - DO IT :)
21:35 🔗 namespace Already on it.
21:35 🔗 namespace "Xanga is getting old. Archive Team investigates." That's a pretty strange description of the project.
21:35 🔗 namespace I think it's well beyond investigation now that you're grabbing stuff.
21:38 🔗 godane i found out that a friend of glenn beck died: http://www.glennbeck.com/2013/06/19/glenn-remembers-his-good-friend-author-vince-flynn/
21:38 🔗 godane i'm mirroring vinceflynn.com
21:39 🔗 Smiley I can't change the description of the project namespace
21:39 🔗 namespace Smiley: *shrug* Later then.
21:40 🔗 namespace godane: Yeah, I actually wanted to talk about small sites.
21:40 🔗 namespace I'm really impressed with the work that's been done saving these enormous spires of burning documents.
21:41 🔗 namespace But if I wanted to go archive some smaller sites on my own, where would I start?
21:41 🔗 namespace (Tools, etc.)
21:41 🔗 godane i use wget
21:41 🔗 godane my code: wget $website --mirror --warc-file=$website-$(date +%Y%m%d) --warc-cdx -o wget.log
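godane's one-liner can be wrapped so the date-stamped WARC name is built the same way each time. A hedged Python sketch that only assembles the argument list and does not run anything; it assumes `website` is a bare hostname (a full URL would put slashes into the WARC file name) and a wget build with WARC support (1.14 or later). The function name `build_mirror_command` is illustrative:

```python
from datetime import date


def build_mirror_command(website, today=None):
    """Build the argument list for godane's wget mirror command.

    The WARC name gets a YYYYMMDD suffix, mirroring $(date +%Y%m%d).
    """
    stamp = (today or date.today()).strftime("%Y%m%d")
    return [
        "wget", website,
        "--mirror",                        # recursive mirror with timestamping
        f"--warc-file={website}-{stamp}",  # WARC output name, without extension
        "--warc-cdx",                      # also write a CDX index
        "-o", "wget.log",                  # send wget's log output to a file
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` to start the grab; building it as a list avoids shell quoting problems.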
21:41 🔗 Smiley namespace: you go to my wiki user page
21:41 🔗 Smiley it has a default "wtf save this site!" code
21:42 🔗 Smiley Yes, I'm attempting to be as awesome as Jason.
21:42 🔗 Smiley http://archiveteam.org/index.php?title=User:Djsmiley2k
21:42 🔗 Smiley also thanks to godane as being the source of that original command :)
21:44 🔗 namespace Smiley: I'd prefer a nice document to let me know how to use wget.
21:45 🔗 namespace I mean, there are man pages, but that's like saying that my desktop came with a manual.
21:45 🔗 namespace (Actually, to be fair man pages are probably more useful than the 'manual' that comes with most computers.)
21:46 🔗 Smiley namespace: I'll think about it, but tbh I know nothing about wget
21:46 🔗 namespace Smiley: Be smart, somebody else probably already wrote one.
21:46 🔗 namespace Instead of writing a mediocre one, try finding somebody else's great one.
21:47 🔗 namespace (I'm working on it right now BTW.)
21:48 🔗 namespace Oh, it's GNU, nevermind then the documentation is probably excellent.
21:48 🔗 namespace https://www.gnu.org/software/wget/manual/wget.html
22:02 🔗 arrith1 the wget manpage is pretty good. though there might be some undocumented features, for example -e robots=off for handling robots.txt isn't in the manpage, i'm not sure of others
22:03 🔗 arrith1 would be nice to have a submission process to the ArchiveTeam Warrior for smaller sites. could run a basic wget warc command on the clients. maybe have manual review before the grab happens, but the submission process could be automated
22:03 🔗 arrith1 and i really should learn awk
22:04 🔗 godane uploaded: https://archive.org/details/www.vinceflynn.com-20130619
22:21 🔗 arrith1 nice
22:25 🔗 godane i'm doing another mirror of torrentfreak.com
22:36 🔗 dashcloud balrog: still looking for a CD autoloader?
22:36 🔗 balrog dashcloud: yeah
22:36 🔗 dashcloud did you see this one I posted yesterday? http://www.bizchair.com/rx100pc-rex.html
22:39 🔗 balrog yes
22:39 🔗 balrog they don't make it anymore
22:54 🔗 dashcloud so, they don't let you check out with it, but you can put it in your cart?
23:45 🔗 arrith1 to test going up to checkout in a site, but not placing the order, fakenamegenerator.com is good. it provides fake credit card numbers to enter to get past the payment step
