[00:01] welp, 200gb away from hittin 10tb of github data
[00:03] thanks for downloading all that again, it will surely be handy
[00:03] people are pretty delete-happy on github
[00:06] WiK: where are you storing this all?
[00:08] WiK: are you updating repos with git pull --rebase? there are special considerations if you are updating, as people can force-push commits that will cause commits in your local mirror to eventually disappear
[00:08] s/git pull --rebase/git fetch/ or whatever
[00:14] im just doing git clones
[00:14] balrog: 4 or 5 different external (usb3) harddrives
[00:14] and my database keeps track of which harddrive ive stored the project on
[00:15] ivan`: im just cloning them, i have not gone back to update anything yet (and may not)
[00:16] WiK: if you do update them, you have to disable gc completely, or tag the commits you already have
[00:17] ya, for my project i dont really need to go back and update them
[00:24] anyone want to wget-lua this domain? https://www.rijksmuseum.nl/en/explore-the-collection/overview
[00:24] WiK, check out this lame attempt http://datasyndrome.com/post/51657080886/downloading-and-processing-the-github-data
[00:24] claims to have a lot of art; images are split up into tiles and probably need some code
[00:28] omf_: i dont know if i would call it a 'lame' attempt
[00:29] but no clue what they are trying to do
[00:32] also cant tell if they are only downloading data from one project or not
[00:38] I have tried things with the githubarchive
[00:38] it is very limited data
[00:38] I would go so far as to say it is not even a big enough sample to be statistically significant. Thanks for getting all the data
[00:47] omf_: How'd the WARC gallery go?
[01:14] you guys have old wired mags or some really old computer magazines?
[01:15] i need to come up with a good contest question
[01:15] OKAY NERDS
[01:15] This actually has interest and relevance to the team.
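The [00:16] advice (disable gc, or tag the commits you already have) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not something anyone in the channel ran: the `refs/archive/<stamp>/` namespace, the `stamp` label and all function names are hypothetical, while `gc.auto`, `for-each-ref` and `update-ref` are standard git. Setting `gc.auto` to 0 only stops automatic gc; the extra refs are what keep old tips reachable if someone runs gc by hand.

```python
import subprocess
from typing import Dict, List


def protection_commands(heads: Dict[str, str], stamp: str) -> List[List[str]]:
    """Build git commands that keep already-fetched commits reachable.

    heads maps a full ref name (e.g. "refs/heads/master") to the commit it
    points at right now; stamp is a hypothetical label for this snapshot.
    """
    # Stop automatic garbage collection from kicking in after fetches.
    commands = [["git", "config", "gc.auto", "0"]]
    for ref, sha in sorted(heads.items()):
        # Pin the current tip under a private (assumed) namespace so a later
        # upstream force-push cannot leave it unreferenced.
        name = ref.split("/", 1)[-1]  # "refs/heads/master" -> "heads/master"
        commands.append(["git", "update-ref", f"refs/archive/{stamp}/{name}", sha])
    return commands


def current_heads(repo: str) -> Dict[str, str]:
    """Read local and remote-tracking branch tips (requires git on PATH)."""
    out = subprocess.run(
        ["git", "-C", repo, "for-each-ref",
         "--format=%(objectname) %(refname)", "refs/heads", "refs/remotes"],
        check=True, capture_output=True, text=True,
    ).stdout
    heads = {}
    for line in out.splitlines():
        sha, ref = line.split(" ", 1)
        heads[ref] = sha
    return heads


def protect(repo: str, stamp: str) -> None:
    """Run the protection commands inside repo before fetching updates."""
    for cmd in protection_commands(current_heads(repo), stamp):
        subprocess.run(["git", "-C", repo] + cmd[1:], check=True)
```

Calling `protect("/path/to/clone", "snapshot1")` before each `git fetch` would be one way to use it; the path and label here are placeholders.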
[01:16] http://www-jake.archive.org/donate/
[01:16] Looking for mistakes, bugs, stupid
[01:18] crappy resize job on brewster
[01:19] my version of "WARC gallery" is HTTrack + Directory Opus, flat view enabled, reverse sort by file size, thumbnail view
[01:19] many hours can be killed hitting pgdn or the mousewheel
[01:20] SketchCow: can i suggest expanding 'Programs' or maybe a link to what the programs are from the 'where your money goes'?
[01:22] also: there are page errors on: http://www-jake.archive.org/about/volunteerpositions.php
[01:22] at the bottom the *'s are outside of the box under physical/special requirements
[01:27] That's a different thing.
[01:33] Any other notes?
[01:34] It is not responsive for mobile devices
[01:34] not really, i just looked at the site and asked 'why would i donate?'
[01:34] I can fix that
[01:35] As for the gallery it keeps crashing on the 50gb warc and I have no idea why
[01:36] Which mobile device, omf_ ?
[01:36] Because I'm on my ipad, it's fine.
[01:36] I tested with the Android sdk and the opera mobile with multiple user agent strings
[01:37] I just used it successfully on my Galaxy S4
[01:38] Also the bitcoin button does not appear
[01:40] It won't appear if you select subscription
[01:41] okay
[03:08] 12631766 tumblogs.txt
[03:08] that's a lot of tumblogs
[03:09] that's the number of unique tumblr subdomains/blogs we (IA) know about
[03:30] http://tracker.archiveteam.org/greader/ :-)
[03:36] :D
[03:37] https://github.com/ArchiveTeam/greader-grab :-)
[03:55] a lot of words from http://www.archiveteam.org/index.php?title=Posterous should be on http://www.archiveteam.org/index.php?title=Google_Reader
[03:55] in case somebody really likes writing words
[03:56] Failed WgetDownload for Item 0000010776
[03:56] Process WgetDownload returned exit code 5 for Item 0000010776
[03:56] hmm
[03:57] i must be missing seesaw
[03:57] that's exit code 5: SSL verification failure.
[03:57] I pinned the download to EquifaxSecureCA
[03:57] maybe you're in another country and getting a different CA
[03:57] or your wget is out of whack
[03:57] i'm in the us
[03:57] same
[03:58] hmm
[03:59] can you load https://www.google.com/ in Firefox and tell me the cert chain?
[03:59] this is a colo'd box so that's a little tricky
[03:59] are you using run-pipeline the normal way?
[04:00] i think so
[04:00] run-pipeline --disable-web-server --concurrent 2 pipeline.py
[04:00] might i need to update my seesaw?
[04:01] let me check
[04:01] also can you paste me the output of: openssl s_client -connect www.google.com:443
[04:02] seesaw 0.0.12 does support env=
[04:02] http://www.skeleboner.com/openssl.txt
[04:04] that looks fine, you're not being MITMed or anything
[04:04] that's a good thing
[04:05] the cert-pinning is done by env=dict(SSL_CERT_DIR=SSL_CERT_DIR), in the pipeline
[04:05] I have no idea why it's not working for you
[04:06] weirdness
[04:06] maybe your wget wants more certs
[04:07] did you ./get-wget-lua.sh?
[04:07] i did
[04:07] is your wget linked to these or something else
[04:07] libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f4e5ec9e000)
[04:07] libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f4e5f079000)
[04:08] GNU Wget 1.14.lua.20130523-9a5c built on linux-gnu.
[04:08] libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007fda7b7b1000)
[04:08] libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007fda7bb52000)
[04:08] hmm
[04:09] let me check if I have a working wget linked to that
[04:10] I have one working wget (doing the greader job) linked to libgnutls.so.26 => /usr/lib/x86_64-linux-gnu/libgnutls.so.26 (0x00007fa87b238000)
[04:10] and another on CentOS linked to libssl.so.10 => /usr/lib/libssl.so.10 (0xb7f41000)
[04:10] libcrypto.so.10 => /usr/lib/libcrypto.so.10 (0xb7db4000)
[04:11] but nothing linked to 0.9.8, so that could be the problem
[04:11] hmm ok
[04:11] I'll have to fix it since a lot of people probably have that
[04:11] yeah, i think i'm debian stable
[04:12] urg and i'm afk
[04:13] since you have no men in the middle, you are welcome to remove that env= line if you want to get it started
[04:13] ok
[04:14] thanks :)
[04:14] thanks for grabbing
[04:15] of course!
[04:15] gotta get in early so i can pretend that i can compete with underscor briefly
[04:16] :D
[04:17] :p
[04:19] underscor cheats, you know that right
[04:20] how does underscor cheat?
[04:20] i would like to also cheat in a similar fashion ;)
[04:20] I work for IA
[04:20] so I have a lot of spare pipes
[04:20] so yeah
[04:20] would like to cheat in a similar fashion ;)
[04:21] haha
[04:21] kennethre has a better deal
[04:21] he can scale much bigger than I
[04:21] (works for heroku)
[04:21] nice
[04:24] meanwhile I'm a tiny australian with bad internet :(
[04:27] Isn't that all australians?
[04:34] * BlueMax slaps underscor.
[04:34] Stop insulting my country!
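The linkage comparison done by hand between [04:07] and [04:11] (which SSL/TLS library each wget build is linked against) could be automated along these lines. This is only a sketch: the function names are hypothetical, it assumes the binary's linkage is printed in the usual `name => path (address)` form seen in the lines pasted above, and it only looks for libssl, libcrypto and libgnutls because those are the ones that came up.

```python
import re
import subprocess
from typing import List

# Matches lines such as "libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x...)".
_TLS_LIB = re.compile(r"^\s*(lib(?:ssl|crypto|gnutls)\.so[.\d]*)\s+=>")


def tls_libraries(ldd_output: str) -> List[str]:
    """Return the SSL/TLS shared libraries named in ldd-style output."""
    found = []
    for line in ldd_output.splitlines():
        match = _TLS_LIB.match(line)
        if match:
            found.append(match.group(1))
    return found


def linked_tls_libraries(binary: str) -> List[str]:
    """Run ldd on a binary (e.g. a wget-lua build) and report its TLS libraries."""
    result = subprocess.run(["ldd", binary], check=True,
                            capture_output=True, text=True)
    return tls_libraries(result.stdout)
```

Running `linked_tls_libraries("./wget-lua")` on each machine would give lists that can be compared directly, which is roughly what was done by pasting the output into the channel.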
[04:52] I cheated by taking credit for 91 items that were rm'ed and re-done ;)
[05:18] bluemax: I'm in australia, with 100mbps (admittedly it's work's connection)
[05:18] I've never been anywhere near something like that
[05:38] pft: I installed an amd64 Debian 6 and my wget-lua is linked to
[05:38] libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f59352f9000)
[05:38] libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f5934f58000)
[05:38] no problems with the SSL
[05:41] pft: oh never mind, it is FUBAR :-) thanks for helping narrow this down
[06:22] * SmileyG looks in
[06:22] how we doing guys?
[06:29] pft: fixed in latest greader-grab
[09:09] tumblr has a ton of blogs that start with a hyphen like http://-sheselectric-.tumblr.com/
[09:09] most browsers/dns servers seem to refuse such madness
[09:09] google was okay with them, though :)
[09:20] I cannot even click this in Quassel IRC...
[09:28] Firefox 21.0 hates that link.
[09:29] I managed to load it on Chrome 27 on Windows 7 using level3's dns servers
[09:30] Sounds like some really good way to hide a website... how much blocking software or browsers used by government agencies will fail here? ;)
[10:40] tales from a pre-wget-lua world https://github.com/ArchiveTeam/archive-wars/blob/master/archivewars.sh
[10:48] indeed
[10:48] Pre-WARC world as well :)
[11:56] i'm grabbing the support forums of theblaze
[12:26] I have a question about the Warrior - when I up the number of simultaneous sessions (in settings) - does it not happen until current projects are finished?
[12:30] I think it'll spin up more when an item is completed
[12:40] Good, good. We'll see when they complete then.
I'm new at this, only fired it up yesterday :)
[12:41] I find it usually takes effect straight away, unless the warrior is shutting down for some reason
[12:41] Turning the number DOWN won't have an effect until a job completes - as it will finish what it started - but it is usually able to start new jobs straight away if the number goes up.
[12:42] Hmm, weird then. I upped the number of downloads from 2 to 4, but it's still churning away at the original 2.
[12:43] I'd say, wait to see it start the next item.. it'll probably do it sooner or later
[12:43] It's been at these two for a good long while, so i'd rather not restart and lose the work.
[12:43] Yep. :)
[12:43] Thanks for the answers.
[12:47] Yup it starts more when an item finishes.
[12:47] and welcome menacespb :)
[13:00] Smiley: Thanks :) It's important work, and hey - I had a laptop on my desk that wasn't doing anything much besides irc anyway, so.. :)
[13:01] hehe
[13:10] i found another project: http://www.fuzzymemories.tv
[13:16] there is also a youtube channel: http://www.youtube.com/user/FuzzyMemoriesTV
[16:11] Never noticed the warrior didn't do that until today. Doh!
:)
[16:13] [if you want to force the new jobs to arrive without waiting for the existing ones to end, just click 'shut down' - don't worry, it won't - then click 'keep running')
[16:16] SyntaxError: Expected ]
[16:18] pick up square bracket
[16:18] > YOU NOW HAVE THE SQUARE BRACKET
[16:18] take sentence
[16:18] ye cannot get ye bracket
[16:18] > YOU CAN'T TAKE A SENTENCE
[16:18] grasp sentence
[16:18] > YOU HAVE THE SENTENCE
[16:18] apply sentence
[16:19] remove end bracket
[16:19] > YOU HAVE BEEN SENTENCED TO DEATH
[16:19] F*@!#
[16:19] :)
[19:22] https://twitter.com/vincentchu/status/339825371912495104
[19:24] Cloud services
[19:53] hoi, the fileplanet archiving has upped all the files to IA now (1 year + couple of weeks later) :)
[19:53] next before publicity is making a nice interface
[19:53] and a readme etc
[19:54] ~120 000 public files
[19:54] ~200-300 000 not public as those are mixed with private files and i highly value privacy
[19:56] 9.5TB public
[19:57] close to 2TB non-public i think
[19:57] please do not shoot the publicity gun, right now it is not for end users at all
[19:57] anyways, yay, finally :)
[23:00] Anyone remember who was discussing cinnemageddon a while back? Perhaps here or in -bs?
[23:09] Thanks, schbiridi
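On the [09:09] hyphen-prefixed tumblr blogs: the classic hostname rules (RFC 952 as relaxed by RFC 1123) require each label to start and end with a letter or digit, which may be why some clients refuse such names; the log itself does not say what each browser checks. A minimal sketch of that rule, with a hypothetical function name, for flagging such entries in a list like tumblogs.txt:

```python
import re
from typing import List

# One hostname label: 1-63 characters, letters/digits/hyphens,
# and no hyphen in the first or last position.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def invalid_labels(hostname: str) -> List[str]:
    """Return the labels of hostname that break the letter/digit/hyphen rule."""
    labels = hostname.rstrip(".").split(".")
    return [label for label in labels if not _LABEL.fullmatch(label)]
```

A grabber could use this to route flagged hostnames to a resolver or client known to tolerate them, rather than dropping them.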