[00:04] *** godane has quit IRC (Read error: Operation timed out) [00:17] *** bithippo has joined #archiveteam [00:17] Can someone toss http://www.clearwater.edu/ into ArchiveBot? They're closing their doors June 30th (http://www.clearwater.edu/news/campusnews.asp?ObjectID=2131) [00:17] Thanks! [00:20] *** godane has joined #archiveteam [00:20] Thanks, it's in! [00:20] Thank you! [00:24] *** bithippo has quit IRC (Quit: Page closed) [00:37] *** JesseW has joined #archiveteam [00:41] *** dinomite_ has joined #archiveteam [00:43] *** joepie91_ has joined #archiveteam [00:44] *** edsu_ has joined #archiveteam [00:44] *** dinomite has quit IRC (Read error: Operation timed out) [00:44] *** joepie91 has quit IRC (Read error: Operation timed out) [00:44] *** SketchCow has quit IRC (Read error: Connection reset by peer) [00:44] *** torvik has quit IRC (Ping timeout: 255 seconds) [00:44] *** Stiletto has quit IRC (Ping timeout: 255 seconds) [00:45] *** edsu has quit IRC (Ping timeout: 255 seconds) [00:45] *** torvik_ has joined #archiveteam [00:45] *** swebb has quit IRC (Ping timeout: 255 seconds) [00:45] *** swebb_ has joined #archiveteam [00:45] *** torvik_ is now known as torvik [00:45] *** Stiletto has joined #archiveteam [00:45] *** swebb_ is now known as swebb [00:48] *** SketchCow has joined #archiveteam [00:50] *** Stiletto has quit IRC (Ping timeout: 240 seconds) [00:54] *** godane has quit IRC (Quit: Leaving.) [00:54] *** godane has joined #archiveteam [01:04] *** JesseW has quit IRC (Quit: Leaving.) [01:10] *** Stiletto has joined #archiveteam [01:12] *** JesseW has joined #archiveteam [01:20] *** Start has joined #archiveteam [01:27] *** n00b674 has joined #archiveteam [01:30] *** cva_ has joined #archiveteam [01:31] *** godane has quit IRC (Quit: Leaving.) [01:31] *** n00b674 has quit IRC (Client Quit) [01:31] *** godane has joined #archiveteam [01:38] *** primus104 has quit IRC (Leaving.) 
[01:45] *** username1 has joined #archiveteam [01:47] *** schbirid2 has quit IRC (Read error: Operation timed out) [01:56] *** bzc6p_ has joined #archiveteam [01:58] *** Start_ has joined #archiveteam [02:03] *** bzc6p has quit IRC (Ping timeout: 600 seconds) [02:04] *** mistym has joined #archiveteam [02:08] *** Start has quit IRC (Ping timeout: 740 seconds) [02:15] *** Start_ has quit IRC (Ping timeout: 740 seconds) [02:24] *** cva_ is now known as cva [03:42] *** Ymgve__ has quit IRC () [04:28] *** aaaaaaaaa has quit IRC (Leaving) [04:29] *** bithippo has joined #archiveteam [04:41] *** VADemon has quit IRC (Read error: Connection reset by peer) [05:11] *** pikhq has quit IRC (Ping timeout: 370 seconds) [05:23] *** bithippo has quit IRC (Quit: Page closed) [05:52] *** mistym has quit IRC (Remote host closed the connection) [06:21] *** mistym has joined #archiveteam [06:41] *** pikhq has joined #archiveteam [06:51] *** zenguy_pc has quit IRC (hub.efnet.us irc.Prison.NET) [06:51] *** Burninate has quit IRC (hub.efnet.us irc.Prison.NET) [06:51] *** kisspunch has quit IRC (hub.efnet.us irc.Prison.NET) [06:51] *** midas has quit IRC (hub.efnet.us irc.Prison.NET) [06:51] *** wyatt8740 has quit IRC (hub.efnet.us irc.Prison.NET) [06:51] *** d6e has quit IRC (hub.efnet.us irc.Prison.NET) [06:51] *** patrickod has quit IRC (hub.efnet.us irc.Prison.NET) [06:51] *** yuvadm has quit IRC (hub.efnet.us irc.Prison.NET) [06:51] *** db48x has quit IRC (hub.efnet.us irc.Prison.NET) [06:55] *** yuvadm_ has joined #archiveteam [06:58] *** Burnin8 has joined #archiveteam [07:02] *** zenguy_pc has joined #archiveteam [07:02] *** kisspunch has joined #archiveteam [07:02] *** midas has joined #archiveteam [07:02] *** d6e has joined #archiveteam [07:02] *** wyatt8740 has joined #archiveteam [07:02] *** patrickod has joined #archiveteam [07:06] *** SN4T14_ has quit IRC (Read error: Connection reset by peer) [07:11] *** SN4T14 has joined #archiveteam [07:18] *** SN4T14 has quit IRC (Read 
error: Connection reset by peer) [07:19] *** SN4T14 has joined #archiveteam [07:25] *** SN4T14 has quit IRC (Read error: Connection reset by peer) [07:25] *** SN4T14 has joined #archiveteam [07:36] *** JesseW has quit IRC (Quit: Leaving.) [07:48] *** primus104 has joined #archiveteam [08:00] *** McGEE has quit IRC (Quit: Connection closed for inactivity) [08:09] *** caber has quit IRC (Read error: Operation timed out) [08:11] *** caber has joined #archiveteam [08:14] *** primus104 has quit IRC (Leaving.) [08:18] *** signius has quit IRC (Ping timeout: 265 seconds) [08:24] *** SN4T14_ has joined #archiveteam [08:26] *** SN4T14 has quit IRC (Ping timeout: 369 seconds) [08:31] *** signius has joined #archiveteam [08:34] *** habi has joined #archiveteam [08:35] *** habi has left [08:40] *** mistym has quit IRC (Remote host closed the connection) [08:45] *** brayden_ has joined #archiveteam [08:45] *** brayden has quit IRC (Read error: Connection reset by peer) [08:48] *** primus104 has joined #archiveteam [09:05] *** bzc6p_ is now known as bzc6p [09:17] *** username1 has quit IRC (Quit: Leaving) [09:22] *** schbirid has joined #archiveteam [09:41] *** mistym has joined #archiveteam [09:43] *** schbirid has quit IRC (Quit: Leaving) [09:47] *** mistym has quit IRC (Ping timeout: 252 seconds) [10:18] *** nox has quit IRC () [11:01] *** nox has joined #archiveteam [11:15] *** primus104 has quit IRC (Leaving.) [11:16] *** random353 has joined #archiveteam [11:17] Hello. Is it okay to upload 500GB of website files to archive.org? It's compressed as 7z. [11:19] random353: .warc.gz is preferred for web content, but if that's not available, other formats are also okay. [11:20] Size of 500 GB is maybe too big. [11:21] We should wait for some others' opinion about that. [11:21] random353: what is that content, by the way? [11:21] what website? 
[11:22] 4chan and websites like 4chan [11:22] text of threads [11:22] no images [11:28] *** random353 has quit IRC (Quit: http://www.mibbit.com ajax IRC Client) [11:55] *** mariusz has joined #archiveteam [11:59] *** Ymgve has joined #archiveteam [12:15] *** BlueMaxim has quit IRC (Quit: Leaving) [12:16] *** SN4T14_ has quit IRC (Ping timeout: 606 seconds) [12:16] did anybody catch that "Google Moderator is shutting down on June 30, 2015" [12:17] guess not [12:20] Sanqui: long ago [12:20] #moderhater [12:21] there's no page for it lol [12:21] I was about to make one, guess I still should [12:21] sure [12:21] the channel is empty too [12:22] maybe we should just throw it in archivebot and call it a day [12:22] Well, we're not late at all [12:23] god I hate mediawiki [12:24] Moderator: "site is pure javascript but there are csv zip download links" (chfoo, Apr 10) [12:28] So archivebot can't do here too much I guess. [12:28] How much I hate Javascript. [12:29] *** sirdancea has quit IRC (Quit: Leaving) [12:32] "For the month of July, Google Moderator will be “read-only.”" [12:32] So we shouldn't even start saving it before July. [12:55] yeah [13:16] *** sirdancea has joined #archiveteam [13:23] *** primus104 has joined #archiveteam [14:05] *** primus104 has quit IRC (Leaving.) 
[14:09] *** bzc6p_ has joined #archiveteam [14:16] *** SN4T14 has joined #archiveteam [14:16] *** bzc6p has quit IRC (Ping timeout: 600 seconds) [14:41] *** toad1 has joined #archiveteam [14:41] *** McGEE has joined #archiveteam [14:44] *** JesseW has joined #archiveteam [14:50] *** toad2 has quit IRC (Read error: Operation timed out) [15:02] *** mariusz has quit IRC (WeeChat 1.1) [15:10] *** schbirid has joined #archiveteam [15:53] *** VADemon has joined #archiveteam [16:04] *** cva has quit IRC (Remote host closed the connection) [16:16] *** cva has joined #archiveteam [16:37] *** RichardG_ has joined #archiveteam [16:40] *** RichardG has quit IRC (Read error: Operation timed out) [16:48] *** mistym has joined #archiveteam [16:53] *** bzc6p_ is now known as bzc6p [17:37] *** JesseW has quit IRC (Ping timeout: 512 seconds) [17:37] *** Swizzle has joined #archiveteam [18:00] *** deathy has quit IRC (Remote host closed the connection) [18:06] *** habi has joined #archiveteam [18:12] *** Zebranky has quit IRC (Ping timeout: 240 seconds) [18:14] *** deathy has joined #archiveteam [18:17] *** habi has quit IRC (Quit: Leaving.) [18:21] *** Zebranky has joined #archiveteam [18:45] *** JesseW has joined #archiveteam [18:47] any reasonable way to get age blocked pages like http://community.quakecon.org/2015/06/03/quakecon-interview-tim-willits/ into WM? [18:49] what's WM? [18:50] wayback machne [18:50] ah, makes sense. [18:51] AFAIK, the WM doesn't support any pages that aren't available without any registration... [18:52] *** philpem has joined #archiveteam [18:54] schbirid: an obvious but long way is 1. enter date 2. save cookie 3. wget --load-cookies 4. upload warc 5. ask it to be fed by WM. Only question is if Wayback should have agelimited pages without agelimit. (JesseW: if it is done that way, WM "supports" it.) [18:55] I don't know FurAffinity's case, but if there *are* 18+ pages saved, that is a precedent case. 
Or, the same is being planned with Blogspot [18:56] as they will also pop up in WM [18:57] * JesseW nods -- good to know [18:58] *** habi has joined #archiveteam [18:59] JesseW: basically, WM shows anything that is loaded into it. It just depends on the human loading it. The Internet Archive crawler is another cup of tea, that is, of course, not that intelligent to do such tricks, and IA doesn't want it either, I guess. [18:59] Only ArchiveTeam is so rude: 18+, no robots.txt etc. [19:01] :-) [19:10] *** habi has left [19:11] Moto is: Fucking grab it regardless [19:11] :) [19:11] Saving Your Shit [19:29] *** aaaaaaaaa has joined #archiveteam [19:33] *** primus104 has joined #archiveteam [20:01] *** bzc6p has quit IRC (Read error: Operation timed out) [20:17] *** cva has quit IRC (Ping timeout: 186 seconds) [20:27] *** signius has quit IRC (Ping timeout: 240 seconds) [20:40] *** signius has joined #archiveteam [20:44] Hi, folks. [20:44] I'm up and down due to illness. Something needed? [20:44] I see someone wanted to upload 500gb of "website files" [20:45] Ostensibly they were worried abusing that much "drive storage" and "bandwidth speed" [20:45] in #coldstorage, we're making good progress downloading over 2 GB of sf.net project metadata... [20:46] Good [20:46] Suck that place dry. We should have done it a year ago. [20:47] SketchCow: Get well soon! [21:22] you know what grinds my gears? sites that discriminate solely against the wget user agent. [21:22] wget --user-agent="Eat Delicious Poop" [21:22] by changing a single letter or making the agent string "" I can download the file, but if it's wget's agent, NOOOO [21:22] wyatt8740: and it's so easy to avoid... [21:23] yeah [21:23] I suppose it's a bit of speedbump, though [21:23] just annoying... why do they even bother? [21:23] I mean, if you're scripting a ton of downloads, its only one change [21:23] and if you're downloading a single file, why does it matter if you use wget? 
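A side note on the user-agent complaint above: the block being described keys on a single request header, which is why changing one letter gets past it. The Python sketch below builds a request with a replacement `User-Agent` and inspects it without sending anything. The URL, the agent string, and the helper name `build_request` are placeholders for illustration and do not come from the log.

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a request that carries the given User-Agent header.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL and agent string (assumptions, not taken from the log).
req = build_request("http://example.com/file.zip", "Wgot/1.0")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))

# An empty agent string, the other workaround mentioned, is also accepted.
empty = build_request("http://example.com/file.zip", "")
print(repr(empty.get_header("User-agent")))
```

The equivalent in wget is the `--user-agent` option quoted in the conversation.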
[21:23] relevant, http://ascii.textfiles.com/archives/1311 [21:24] the closest I've ever done to that is requiring HTTPS access to not give a 403 [21:24] but there's at least a good reason for that [21:26] and if I link to it then, I give a https:// link. [21:27] lol '--user-agent=EatDeliciousPoop' [21:57] so i'm grabbing koreanet-1/daegu/2003/special videos [21:58] there over 50 minutes each [22:04] *** bzc6p has joined #archiveteam [22:11] *** habi has joined #archiveteam [22:15] * JesseW was just reading over SketchCow's last few blog posts ( http://ascii.textfiles.com/ ) -- you folks are completely MAD. And it's WONDERFUL. [22:17] Is it madness in a sane world or sanity in a mad one? And could you ever really know? [22:17] both, clearly [22:20] There's an "off-topic channel" at #archiveteam-bs. If it's not about archiving something right now, please join that channel :) [22:24] jesus [22:24] gitorious [22:24] this is more a shitshow than a bloodbath [22:24] gitorijesus [22:24] what, running it? [22:24] xmc: what have they done now? [22:24] rsync is up to 337G of address space and 11G of actual memory pages touched [22:25] oh heh you thought the tracker was bad, try running the gitorious app [22:25] there's like ten grillion hardlinks [22:25] rsync -H is in dire need of ... something [22:25] lmdb support for tracking hard links? ;) [22:25] it tracks all the hardklunks in some kind of hashtable [22:25] lmdb? 
[22:26] just a good kv store [22:26] oh [22:26] i mean i could also patch it to use sqlite, but that's not happening [22:27] i think the right course of action now is to mount the filesystem readonly on their end and just rsync the fs as a file directly [22:27] then i can deal with it as a loop filesystem and call it enough [22:30] *** habi has left [22:30] I keep reading lmdb as imdb and getting confused [23:31] *** BlueMaxim has joined #archiveteam [23:40] *** mistym has quit IRC (Remote host closed the connection) [23:46] *** primus has quit IRC (Ping timeout: 306 seconds)
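For context on the rsync -H discussion: any copy tool that preserves hard links has to remember every multiply-linked inode it has seen, so the table grows with the number of such files. The Python sketch below shows that bookkeeping with an in-memory dict keyed by (device, inode). It is an illustration of the general idea only, not rsync's actual implementation, and the function name `find_hardlink_groups` is made up for the example.

```python
import os
import tempfile
from collections import defaultdict


def find_hardlink_groups(root: str) -> list[list[str]]:
    """Group paths under root that share the same (device, inode) pair."""
    seen: dict[tuple[int, int], list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Files with a single link can never be part of a group,
            # so only multiply-linked inodes are remembered.
            if st.st_nlink > 1:
                seen[(st.st_dev, st.st_ino)].append(path)
    return [sorted(paths) for paths in seen.values() if len(paths) > 1]


# Demo on a throwaway directory: a.txt and b.txt are hard links to the
# same inode, c.txt is an unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "a.txt")
    with open(original, "w") as fh:
        fh.write("data")
    os.link(original, os.path.join(tmp, "b.txt"))
    with open(os.path.join(tmp, "c.txt"), "w") as fh:
        fh.write("unrelated")
    groups = find_hardlink_groups(tmp)
    names = [[os.path.basename(p) for p in group] for group in groups]

print(names)
```

With a very large number of hard links the dict itself becomes the memory problem, which is the motivation for the on-disk key-value store suggested in the conversation.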