#archiveteam 2015-06-06,Sat

Time Nickname Message
00:04 godane has quit IRC (Read error: Operation timed out)
00:17 bithippo has joined #archiveteam
00:17 bithippo Can someone toss http://www.clearwater.edu/ into ArchiveBot? They're closing their doors June 30th (http://www.clearwater.edu/news/campusnews.asp?ObjectID=2131)
00:17 bithippo Thanks!
00:20 godane has joined #archiveteam
00:20 garyrh Thanks, it's in!
00:20 bithippo Thank you!
00:24 bithippo has quit IRC (Quit: Page closed)
00:37 JesseW has joined #archiveteam
00:41 dinomite_ has joined #archiveteam
00:43 joepie91_ has joined #archiveteam
00:44 edsu_ has joined #archiveteam
00:44 dinomite has quit IRC (Read error: Operation timed out)
00:44 joepie91 has quit IRC (Read error: Operation timed out)
00:44 SketchCow has quit IRC (Read error: Connection reset by peer)
00:44 torvik has quit IRC (Ping timeout: 255 seconds)
00:44 Stiletto has quit IRC (Ping timeout: 255 seconds)
00:45 edsu has quit IRC (Ping timeout: 255 seconds)
00:45 torvik_ has joined #archiveteam
00:45 swebb has quit IRC (Ping timeout: 255 seconds)
00:45 swebb_ has joined #archiveteam
00:45 torvik_ is now known as torvik
00:45 Stiletto has joined #archiveteam
00:45 swebb_ is now known as swebb
00:48 SketchCow has joined #archiveteam
00:50 Stiletto has quit IRC (Ping timeout: 240 seconds)
00:54 godane has quit IRC (Quit: Leaving.)
00:54 godane has joined #archiveteam
01:04 JesseW has quit IRC (Quit: Leaving.)
01:10 Stiletto has joined #archiveteam
01:12 JesseW has joined #archiveteam
01:20 Start has joined #archiveteam
01:27 n00b674 has joined #archiveteam
01:30 cva_ has joined #archiveteam
01:31 godane has quit IRC (Quit: Leaving.)
01:31 n00b674 has quit IRC (Client Quit)
01:31 godane has joined #archiveteam
01:38 primus104 has quit IRC (Leaving.)
01:45 username1 has joined #archiveteam
01:47 schbirid2 has quit IRC (Read error: Operation timed out)
01:56 bzc6p_ has joined #archiveteam
01:58 Start_ has joined #archiveteam
02:03 bzc6p has quit IRC (Ping timeout: 600 seconds)
02:04 mistym has joined #archiveteam
02:08 Start has quit IRC (Ping timeout: 740 seconds)
02:15 Start_ has quit IRC (Ping timeout: 740 seconds)
02:24 cva_ is now known as cva
03:42 Ymgve__ has quit IRC ()
04:28 aaaaaaaaa has quit IRC (Leaving)
04:29 bithippo has joined #archiveteam
04:41 VADemon has quit IRC (Read error: Connection reset by peer)
05:11 pikhq has quit IRC (Ping timeout: 370 seconds)
05:23 bithippo has quit IRC (Quit: Page closed)
05:52 mistym has quit IRC (Remote host closed the connection)
06:21 mistym has joined #archiveteam
06:41 pikhq has joined #archiveteam
06:51 zenguy_pc has quit IRC (hub.efnet.us irc.Prison.NET)
06:51 Burninate has quit IRC (hub.efnet.us irc.Prison.NET)
06:51 kisspunch has quit IRC (hub.efnet.us irc.Prison.NET)
06:51 midas has quit IRC (hub.efnet.us irc.Prison.NET)
06:51 wyatt8740 has quit IRC (hub.efnet.us irc.Prison.NET)
06:51 d6e has quit IRC (hub.efnet.us irc.Prison.NET)
06:51 patrickod has quit IRC (hub.efnet.us irc.Prison.NET)
06:51 yuvadm has quit IRC (hub.efnet.us irc.Prison.NET)
06:51 db48x has quit IRC (hub.efnet.us irc.Prison.NET)
06:55 yuvadm_ has joined #archiveteam
06:58 Burnin8 has joined #archiveteam
07:02 zenguy_pc has joined #archiveteam
07:02 kisspunch has joined #archiveteam
07:02 midas has joined #archiveteam
07:02 d6e has joined #archiveteam
07:02 wyatt8740 has joined #archiveteam
07:02 patrickod has joined #archiveteam
07:06 SN4T14_ has quit IRC (Read error: Connection reset by peer)
07:11 SN4T14 has joined #archiveteam
07:18 SN4T14 has quit IRC (Read error: Connection reset by peer)
07:19 SN4T14 has joined #archiveteam
07:25 SN4T14 has quit IRC (Read error: Connection reset by peer)
07:25 SN4T14 has joined #archiveteam
07:36 JesseW has quit IRC (Quit: Leaving.)
07:48 primus104 has joined #archiveteam
08:00 McGEE has quit IRC (Quit: Connection closed for inactivity)
08:09 caber has quit IRC (Read error: Operation timed out)
08:11 caber has joined #archiveteam
08:14 primus104 has quit IRC (Leaving.)
08:18 signius has quit IRC (Ping timeout: 265 seconds)
08:24 SN4T14_ has joined #archiveteam
08:26 SN4T14 has quit IRC (Ping timeout: 369 seconds)
08:31 signius has joined #archiveteam
08:34 habi has joined #archiveteam
08:35 habi has left
08:40 mistym has quit IRC (Remote host closed the connection)
08:45 brayden_ has joined #archiveteam
08:45 brayden has quit IRC (Read error: Connection reset by peer)
08:48 primus104 has joined #archiveteam
09:05 bzc6p_ is now known as bzc6p
09:17 username1 has quit IRC (Quit: Leaving)
09:22 schbirid has joined #archiveteam
09:41 mistym has joined #archiveteam
09:43 schbirid has quit IRC (Quit: Leaving)
09:47 mistym has quit IRC (Ping timeout: 252 seconds)
10:18 nox has quit IRC ()
11:01 nox has joined #archiveteam
11:15 primus104 has quit IRC (Leaving.)
11:16 random353 has joined #archiveteam
11:17 random353 Hello. Is it okay to upload 500GB of website files to archive.org? It's compressed as 7z.
11:19 bzc6p random353: .warc.gz is preferred for web content, but if that's not available, other formats are also okay.
11:20 bzc6p A size of 500 GB is maybe too big.
11:21 bzc6p We should wait for some others' opinion about that.
11:21 bzc6p random353: what is that content, by the way?
11:21 bzc6p what website?
11:22 random353 4chan and websites like 4chan
11:22 random353 text of threads
11:22 random353 no images
11:28 random353 has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
11:55 mariusz has joined #archiveteam
11:59 Ymgve has joined #archiveteam
12:15 BlueMaxim has quit IRC (Quit: Leaving)
12:16 SN4T14_ has quit IRC (Ping timeout: 606 seconds)
12:16 Sanqui did anybody catch that "Google Moderator is shutting down on June 30, 2015"
12:17 Sanqui guess not
12:20 bzc6p Sanqui: long ago
12:20 bzc6p #moderhater
12:21 Sanqui there's no page for it lol
12:21 Sanqui I was about to make one, guess I still should
12:21 bzc6p sure
12:21 bzc6p the channel is empty too
12:22 Sanqui maybe we should just throw it in archivebot and call it a day
12:22 bzc6p Well, we're not late at all
12:23 Sanqui god I hate mediawiki
12:24 bzc6p Moderator: "site is pure javascript but there are csv zip download links" (chfoo, Apr 10)
12:28 bzc6p So archivebot can't do too much here, I guess.
12:28 bzc6p How much I hate Javascript.
12:29 sirdancea has quit IRC (Quit: Leaving)
12:32 bzc6p "For the month of July, Google Moderator will be “read-only.”"
12:32 bzc6p So we shouldn't even start saving it before July.
12:55 Sanqui yeah
13:16 sirdancea has joined #archiveteam
13:23 primus104 has joined #archiveteam
14:05 primus104 has quit IRC (Leaving.)
14:09 bzc6p_ has joined #archiveteam
14:16 SN4T14 has joined #archiveteam
14:16 bzc6p has quit IRC (Ping timeout: 600 seconds)
14:41 toad1 has joined #archiveteam
14:41 McGEE has joined #archiveteam
14:44 JesseW has joined #archiveteam
14:50 toad2 has quit IRC (Read error: Operation timed out)
15:02 mariusz has quit IRC (WeeChat 1.1)
15:10 schbirid has joined #archiveteam
15:53 VADemon has joined #archiveteam
16:04 cva has quit IRC (Remote host closed the connection)
16:16 cva has joined #archiveteam
16:37 RichardG_ has joined #archiveteam
16:40 RichardG has quit IRC (Read error: Operation timed out)
16:48 mistym has joined #archiveteam
16:53 bzc6p_ is now known as bzc6p
17:37 JesseW has quit IRC (Ping timeout: 512 seconds)
17:37 Swizzle has joined #archiveteam
18:00 deathy has quit IRC (Remote host closed the connection)
18:06 habi has joined #archiveteam
18:12 Zebranky has quit IRC (Ping timeout: 240 seconds)
18:14 deathy has joined #archiveteam
18:17 habi has quit IRC (Quit: Leaving.)
18:21 Zebranky has joined #archiveteam
18:45 JesseW has joined #archiveteam
18:47 schbirid any reasonable way to get age blocked pages like http://community.quakecon.org/2015/06/03/quakecon-interview-tim-willits/ into WM?
18:49 JesseW what's WM?
18:50 schbirid wayback machine
18:50 JesseW ah, makes sense.
18:51 JesseW AFAIK, the WM doesn't support any pages that aren't available without any registration...
18:52 philpem has joined #archiveteam
18:54 bzc6p schbirid: an obvious but long way is 1. enter date 2. save cookie 3. wget --load-cookies 4. upload warc 5. ask for it to be fed into WM. The only question is whether Wayback should have age-limited pages without the age limit. (JesseW: if it is done that way, WM "supports" it.)
18:55 bzc6p I don't know FurAffinity's case, but if there *are* 18+ pages saved, that is a precedent. Or, the same is being planned with Blogspot
18:56 bzc6p as they will also pop up in WM
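Steps 2 to 4 of bzc6p's list can be sketched roughly as below. This is only an illustration, not a command anyone in the channel ran: the cookie file name `cookies.txt` and the WARC base name `quakecon-interview` are made-up placeholders, and the cookie file is assumed to have already been exported (in the Netscape cookies.txt format that wget reads) after entering a birth date on the site. The sketch only builds and prints the wget command; it does not fetch anything.

```python
import shlex


def build_wget_warc_command(url, cookie_file, warc_name):
    """Return a wget argument list that replays saved cookies and writes a WARC."""
    return [
        "wget",
        "--load-cookies", cookie_file,  # cookies saved after passing the age gate
        "--warc-file", warc_name,       # wget adds the .warc.gz suffix by default
        "--page-requisites",            # also fetch images, CSS and scripts
        url,
    ]


# Placeholder file names; the URL is the page schbirid asked about.
cmd = build_wget_warc_command(
    "http://community.quakecon.org/2015/06/03/quakecon-interview-tim-willits/",
    "cookies.txt",
    "quakecon-interview",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell; the resulting WARC would then be uploaded by hand as in steps 4 and 5.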
18:57 * JesseW nods -- good to know
18:58 habi has joined #archiveteam
18:59 bzc6p JesseW: basically, WM shows anything that is loaded into it. It just depends on the human loading it. The Internet Archive crawler is another cup of tea; it is, of course, not intelligent enough to do such tricks, and IA doesn't want it to either, I guess.
18:59 bzc6p Only ArchiveTeam is so rude: 18+, no robots.txt etc.
19:01 JesseW :-)
19:10 habi has left
19:11 ersi Motto is: Fucking grab it regardless
19:11 ersi :)
19:11 JesseW Saving Your Shit
19:29 aaaaaaaaa has joined #archiveteam
19:33 primus104 has joined #archiveteam
20:01 bzc6p has quit IRC (Read error: Operation timed out)
20:17 cva has quit IRC (Ping timeout: 186 seconds)
20:27 signius has quit IRC (Ping timeout: 240 seconds)
20:40 signius has joined #archiveteam
20:44 SketchCow Hi, folks.
20:44 SketchCow I'm up and down due to illness. Something needed?
20:44 SketchCow I see someone wanted to upload 500gb of "website files"
20:45 SketchCow Ostensibly they were worried about abusing that much "drive storage" and "bandwidth speed"
20:45 JesseW in #coldstorage, we're making good progress downloading over 2 GB of sf.net project metadata...
20:46 SketchCow Good
20:46 SketchCow Suck that place dry. We should have done it a year ago.
20:47 Sanqui SketchCow: Get well soon!
21:22 wyatt8740 you know what grinds my gears? sites that discriminate solely against the wget user agent.
21:22 xmc wget --user-agent="Eat Delicious Poop"
21:22 wyatt8740 by changing a single letter or making the agent string "" I can download the file, but if it's wget's agent, NOOOO
21:22 JesseW wyatt8740: and it's so easy to avoid...
21:23 wyatt8740 yeah
21:23 JesseW I suppose it's a bit of a speedbump, though
21:23 wyatt8740 just annoying... why do they even bother?
21:23 wyatt8740 I mean, if you're scripting a ton of downloads, it's only one change
21:23 wyatt8740 and if you're downloading a single file, why does it matter if you use wget?
21:23 xmc relevant, http://ascii.textfiles.com/archives/1311
21:24 wyatt8740 the closest I've ever come to that is requiring HTTPS access to not give a 403
21:24 wyatt8740 but there's at least a good reason for that
21:26 wyatt8740 and if I link to it then, I give a https:// link.
21:27 wyatt8740 lol '--user-agent=EatDeliciousPoop'
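The "only one change" wyatt8740 mentions looks much the same in a script as it does on the wget command line. A minimal sketch using Python's standard `urllib.request`, assuming a placeholder URL (`http://example.com/file.zip` is not a site from the log); the request is only constructed here, never sent:

```python
import urllib.request


def request_with_user_agent(url, user_agent):
    """Build (but do not send) a request that carries a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL; the agent string is the one quoted in the channel.
req = request_with_user_agent("http://example.com/file.zip", "EatDeliciousPoop")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))  # prints "EatDeliciousPoop"
```

Passing `req` to `urllib.request.urlopen` would perform the download with that header in place of the library's default one.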
21:57 godane so i'm grabbing koreanet-1/daegu/2003/special videos
21:58 godane they're over 50 minutes each
22:04 bzc6p has joined #archiveteam
22:11 habi has joined #archiveteam
22:15 * JesseW was just reading over SketchCow's last few blog posts ( http://ascii.textfiles.com/ ) -- you folks are completely MAD. And it's WONDERFUL.
22:17 aaaaaaaaa Is it madness in a sane world or sanity in a mad one? And could you ever really know?
22:17 JesseW both, clearly
22:20 ersi There's an "off-topic channel" at #archiveteam-bs. If it's not about archiving something right now, please join that channel :)
22:24 xmc jesus
22:24 xmc gitorious
22:24 xmc this is more a shitshow than a bloodbath
22:24 yipdw gitorijesus
22:24 yipdw what, running it?
22:24 JesseW xmc: what have they done now?
22:24 xmc rsync is up to 337G of address space and 11G of actual memory pages touched
22:25 yipdw oh heh you thought the tracker was bad, try running the gitorious app
22:25 xmc there's like ten grillion hardlinks
22:25 xmc rsync -H is in dire need of ... something
22:25 ivan` lmdb support for tracking hard links? ;)
22:25 xmc it tracks all the hardlinks in some kind of hashtable
22:25 xmc lmdb?
22:26 ivan` just a good kv store
22:26 xmc oh
22:26 xmc i mean i could also patch it to use sqlite, but that's not happening
22:27 xmc i think the right course of action now is to mount the filesystem readonly on their end and just rsync the fs as a file directly
22:27 xmc then i can deal with it as a loop filesystem and call it enough
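To make the memory problem xmc describes a little more concrete, here is a rough sketch of hard-link tracking with an in-memory table keyed by device and inode number. It is not rsync's actual implementation, only an illustration of the general idea; the function name `group_hard_links` and the demo files are invented for the example. Every multiply-linked file adds an entry, so a tree with an enormous number of hard links makes the table correspondingly large.

```python
import os
import tempfile
from collections import defaultdict


def group_hard_links(root):
    """Map (device, inode) to every path under root that shares that inode."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:  # only multiply-linked files need tracking
                groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)


# Tiny demo tree: a.txt and b.txt are hard links to one inode, c.txt is separate.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "a.txt")
    with open(original, "w") as fh:
        fh.write("hello")
    os.link(original, os.path.join(tmp, "b.txt"))
    with open(os.path.join(tmp, "c.txt"), "w") as fh:
        fh.write("unrelated")
    demo_groups = group_hard_links(tmp)
    link_counts = sorted(len(paths) for paths in demo_groups.values())

print(link_counts)
```

Copying the filesystem image as a single file, as xmc proposes, sidesteps this bookkeeping entirely because the hard links are preserved inside the image.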
22:30 habi has left
22:30 yipdw I keep reading lmdb as imdb and getting confused
23:31 BlueMaxim has joined #archiveteam
23:40 mistym has quit IRC (Remote host closed the connection)
23:46 primus has quit IRC (Ping timeout: 306 seconds)
