#archiveteam 2013-05-30,Thu

Time Nickname Message
00:01 🔗 WiK welp, 200gb away from hittin 10tb of github data
00:03 🔗 ivan` thanks for downloading all that again, it will surely be handy
00:03 🔗 ivan` people are pretty delete-happy on github
00:06 🔗 balrog WiK: where are you storing this all?
00:08 🔗 ivan` WiK: are you updating repos with git pull --rebase? there are special considerations if you are updating, as people can force-push commits that will cause commits in your local mirror to eventually disappear
00:08 🔗 ivan` s/git pull --rebase/git fetch/ or whatever
00:14 🔗 WiK im just doing git clones
00:14 🔗 WiK balrog: 4 or 5 different external (usb3) harddrives
00:14 🔗 WiK and my database keeps track of which hard drive i've stored the project on
00:15 🔗 WiK ivan`: im just cloning them, i have not gone back to update anything yet (and may not)
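The drive-tracking database WiK mentions could look roughly like the sketch below. This is only an illustration: the log does not say what database or schema is actually used, so SQLite, the table `repo_location`, and its columns are all assumptions.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index. Table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repo_location ("
        " repo TEXT PRIMARY KEY,"   # e.g. "owner/project"
        " drive TEXT NOT NULL)"     # label of the external drive holding the clone
    )
    return conn


def record_clone(conn, repo, drive):
    """Remember which drive a cloned repository was written to."""
    conn.execute(
        "INSERT OR REPLACE INTO repo_location (repo, drive) VALUES (?, ?)",
        (repo, drive),
    )
    conn.commit()


def find_drive(conn, repo):
    """Return the drive label for a repository, or None if it is not indexed."""
    row = conn.execute(
        "SELECT drive FROM repo_location WHERE repo = ?", (repo,)
    ).fetchone()
    return row[0] if row else None
```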
00:16 🔗 ivan` WiK: if you do update them, you have to disable gc completely, or tag the commits you already have
00:17 🔗 WiK ya, for my project i dont really need to go back and update them
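ivan`'s advice (turn off gc, or tag the commits you already have, before updating) can be sketched as follows. This is a minimal illustration and not anyone's actual tooling: the `keep/<stamp>/...` tag naming and both function names are made up here, and `gc.auto 0` only disables automatic gc, so a manual `git gc` would still need to be avoided.

```python
import subprocess


def keep_tag_commands(repo_dir, ref_listing, stamp):
    """Build `git tag` commands that pin every commit currently referenced.

    ref_listing is the text printed by:
      git for-each-ref --format='%(objectname) %(refname:short)'
    The keep/<stamp>/<ref> naming scheme is hypothetical.
    """
    commands = []
    for line in ref_listing.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        tag = "keep/%s/%s" % (stamp, ref.strip())
        commands.append(["git", "-C", repo_dir, "tag", "-f", tag, sha])
    return commands


def safe_update(repo_dir, stamp):
    """Disable automatic gc, tag current commits, then fetch. Needs git on PATH."""
    def git(*args):
        out = subprocess.check_output(["git", "-C", repo_dir] + list(args))
        return out.decode()

    git("config", "gc.auto", "0")  # stop automatic garbage collection
    listing = git(
        "for-each-ref",
        "--format=%(objectname) %(refname:short)",
        "refs/heads",
        "refs/remotes",
    )
    for cmd in keep_tag_commands(repo_dir, listing, stamp):
        subprocess.check_call(cmd)
    git("fetch", "origin")  # a later force-push can no longer orphan tagged commits
```

With the tags in place, commits replaced by an upstream force-push stay reachable locally even if gc is later run by hand.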
00:24 🔗 ivan` anyone want to wget-lua this domain? https://www.rijksmuseum.nl/en/explore-the-collection/overview
00:24 🔗 omf_ WiK, check out this lame attempt http://datasyndrome.com/post/51657080886/downloading-and-processing-the-github-data
00:24 🔗 ivan` claims to have a lot of art; images are split up into tiles and probably need some code
00:28 🔗 WiK omf_: i dont know if i would call it a 'lame' attempt
00:29 🔗 WiK but no clue what they are trying to do
00:32 🔗 WiK also cant tell if they are only downloading data from one project or not
00:38 🔗 omf_ I have tried things with the githubarchive
00:38 🔗 omf_ it is very limited data
00:38 🔗 omf_ I would go so far as to say it is not even a big enough sample to be statistically significant. Thanks for getting all the data
00:47 🔗 SketchCow omf_: How'd the WARC gallery go?
01:14 🔗 WiK you guys have old wired mags or some really old computer magazines?
01:15 🔗 WiK i need to come up with a good contest question
01:15 🔗 SketchCow OKAY NERDS
01:15 🔗 SketchCow This actually has interest and relevance to the team.
01:16 🔗 SketchCow http://www-jake.archive.org/donate/
01:16 🔗 SketchCow Looking for mistakes, bugs, stupid
01:18 🔗 DFJustin crappy resize job on brewster
01:19 🔗 ivan` my version of "WARC gallery" is HTTrack + Directory Opus, flat view enabled, reverse sort by file size, thumbnail view
01:19 🔗 ivan` many hours can be killed hitting pgdn or the mousewheel
01:20 🔗 WiK SketchCow: can i suggest expanding 'Programs' or maybe a link to what the programs are from the 'where your money goes'?
01:22 🔗 WiK also: there are page errors on : http://www-jake.archive.org/about/volunteerpositions.php
01:22 🔗 WiK at the bottom the *'s are outside of the box under physical/special requirements
01:27 🔗 SketchCow That's a different thing.
01:33 🔗 SketchCow Any other notes?
01:34 🔗 omf_ It is not responsive for mobile devices
01:34 🔗 WiK not really, i just looked at the site and asked 'why would i donate?'
01:34 🔗 omf_ I can fix that
01:35 🔗 omf_ As for the gallery it keeps crashing on the 50gb warc and I have no idea why
01:36 🔗 SketchCow Which mobile device, omf_ ?
01:36 🔗 SketchCow Because I'm on my ipad, it's fine.
01:36 🔗 omf_ I tested with the android sdk and the opera mobile with multiple user agent strings
01:37 🔗 SketchCow I just used it successfully on my Galaxy S4
01:38 🔗 omf_ Also the bitcoin button does not appear
01:40 🔗 SketchCow It won't appear if you select subscription
01:41 🔗 omf_ okay
03:08 🔗 underscor 12631766 tumblogs.txt
03:08 🔗 underscor that's a lot of tumblogs
03:09 🔗 underscor that's the number of unique tumblr subdomains/blogs we (IA) know about
03:30 🔗 ivan` http://tracker.archiveteam.org/greader/ :-)
03:36 🔗 BlueMax :D
03:37 🔗 ivan` https://github.com/ArchiveTeam/greader-grab :-)
03:55 🔗 ivan` a lot of words from http://www.archiveteam.org/index.php?title=Posterous should be on http://www.archiveteam.org/index.php?title=Google_Reader
03:55 🔗 ivan` in case somebody really likes writing words
03:56 🔗 pft Failed WgetDownload for Item 0000010776
03:56 🔗 pft Process WgetDownload returned exit code 5 for Item 0000010776
03:56 🔗 pft hmm
03:57 🔗 pft i must be missing seesaw
03:57 🔗 ivan` that's 5 SSL verification failure.
03:57 🔗 ivan` I pinned the download to EquifaxSecureCA
03:57 🔗 ivan` maybe you're in another country and getting a different CA
03:57 🔗 ivan` or your wget is out of whack
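For reference, the exit code pft hit can be decoded with a small lookup like the one below. The code meanings follow the exit statuses documented in the GNU Wget manual; the helper name `explain_wget_exit` is just an illustrative choice.

```python
# Exit statuses as documented in the GNU Wget manual.
WGET_EXIT_CODES = {
    0: "no problems occurred",
    1: "generic error",
    2: "parse error (command line or .wgetrc)",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "username/password authentication failure",
    7: "protocol error",
    8: "server issued an error response",
}


def explain_wget_exit(code):
    """Translate a wget exit status into a human-readable reason."""
    return WGET_EXIT_CODES.get(code, "unknown exit code %d" % code)
```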
03:57 🔗 pft i'm in the us
03:57 🔗 ivan` same
03:58 🔗 pft hmm
03:59 🔗 ivan` can you load https://www.google.com/ in Firefox and tell me the cert chain?
03:59 🔗 pft this is a colo'd box so that's a little tricky
03:59 🔗 ivan` are you using run-pipeline the normal way?
04:00 🔗 pft i think so
04:00 🔗 pft run-pipeline --disable-web-server --concurrent 2 pipeline.py
04:00 🔗 pft might i need to update my seesaw?
04:01 🔗 ivan` let me check
04:01 🔗 ivan` also can you paste me the output of: openssl s_client -connect www.google.com:443
04:02 🔗 ivan` seesaw 0.0.12 does support env=
04:02 🔗 pft http://www.skeleboner.com/openssl.txt
04:04 🔗 ivan` that looks fine, you're not being MITMed or anything
04:04 🔗 pft that's a good thing
04:05 🔗 ivan` the cert-pinning is done by env=dict(SSL_CERT_DIR=SSL_CERT_DIR), in the pipeline
04:05 🔗 ivan` I have no idea why it's not working for you
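The `env=dict(SSL_CERT_DIR=SSL_CERT_DIR)` idea can be shown outside seesaw with plain `subprocess`. This is a sketch under assumptions, not the greader-grab pipeline itself: it assumes a wget linked against OpenSSL that loads OpenSSL's default verify paths (which honor the `SSL_CERT_DIR` environment variable), and the function names are invented for illustration.

```python
import os
import subprocess


def pinned_env(cert_dir, base_env=None):
    """Copy an environment and point OpenSSL at a single certificate directory."""
    env = dict(os.environ if base_env is None else base_env)
    env["SSL_CERT_DIR"] = cert_dir  # only CAs found here will be trusted
    return env


def fetch_with_pinned_certs(url, cert_dir, wget="wget"):
    """Run wget with the restricted CA directory; returns wget's exit status.

    An exit status of 5 means certificate verification failed.
    """
    return subprocess.call(
        [wget, "--output-document=/dev/null", url],
        env=pinned_env(cert_dir),
    )
```

A wget built against GnuTLS, or an OpenSSL version that looks up hashed certificate file names differently, may ignore or misread the directory, which is one way the same pipeline can work on one machine and fail on another.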
04:06 🔗 pft weirdness
04:06 🔗 ivan` maybe your wget wants more certs
04:07 🔗 ivan` did you ./get-wget-lua.sh?
04:07 🔗 pft i did
04:07 🔗 ivan` is your wget linked to these or something else
04:07 🔗 ivan` libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f4e5ec9e000)
04:07 🔗 ivan` libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f4e5f079000)
04:08 🔗 pft GNU Wget 1.14.lua.20130523-9a5c built on linux-gnu.
04:08 🔗 pft libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007fda7b7b1000)
04:08 🔗 pft libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007fda7bb52000)
04:08 🔗 pft hmm
04:09 🔗 ivan` let me check if I have a working wget linked to that
04:10 🔗 ivan` I have one working wget (doing the greader job) linked to libgnutls.so.26 => /usr/lib/x86_64-linux-gnu/libgnutls.so.26 (0x00007fa87b238000)
04:10 🔗 ivan` and another on CentOS linked to libssl.so.10 => /usr/lib/libssl.so.10 (0xb7f41000)
04:10 🔗 ivan` libcrypto.so.10 => /usr/lib/libcrypto.so.10 (0xb7db4000)
04:11 🔗 ivan` but nothing linked to 0.9.8, so that could be the problem
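The library comparison above was done by reading `ldd` output by eye. A small parser along these lines would pull out just the TLS-related libraries; the function name and the set of library prefixes are assumptions chosen for this example.

```python
import re

# Matches lines such as:
#   libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007fda7bb52000)
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")

_TLS_PREFIXES = ("libssl", "libcrypto", "libgnutls")


def linked_tls_libs(ldd_output):
    """Return {soname: path} for the TLS libraries listed in `ldd` output."""
    libs = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match and match.group(1).startswith(_TLS_PREFIXES):
            libs[match.group(1)] = match.group(2)
    return libs
```

Feeding it the output of `ldd ./wget-lua` on two machines makes a 0.9.8 versus 1.0.0 mismatch easy to spot.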
04:11 🔗 pft hmm ok
04:11 🔗 ivan` I'll have to fix it since a lot of people probably have that
04:11 🔗 pft yeah, i think i'm debian stable
04:12 🔗 pft urg and i'm afk
04:13 🔗 ivan` since you have no men in the middle, you are welcome to remove that env= line if you want to get it started
04:13 🔗 pft ok
04:14 🔗 pft thanks :)
04:14 🔗 ivan` thanks for grabbing
04:15 🔗 pft of course!
04:15 🔗 pft gotta get in early so i can pretend that i can compete with underscor briefly
04:16 🔗 underscor :D
04:17 🔗 pft :p
04:19 🔗 BlueMax underscor cheats, you know that right
04:20 🔗 pft how does underscor cheat?
04:20 🔗 pft i would like to also cheat in a similar fashion ;)
04:20 🔗 underscor I work for IA
04:20 🔗 underscor so I have a lot of spare pipes
04:20 🔗 pft so yeah
04:20 🔗 pft would like to cheat in a similar fashion ;)
04:21 🔗 underscor haha
04:21 🔗 underscor kennethre has a better deal
04:21 🔗 underscor he can scale much bigger than I
04:21 🔗 underscor (works for heroku)
04:21 🔗 pft nice
04:24 🔗 BlueMax meanwhile I'm a tiny australian with bad internet :(
04:27 🔗 underscor Isn't that all australians?
04:34 🔗 * BlueMax slaps underscor.
04:34 🔗 BlueMax Stop insulting my country!
04:52 🔗 ivan` I cheated by taking credit for 91 items that were rm'ed and re-done ;)
05:18 🔗 trs80 bluemax: I'm in australia, with 100mbps (admittedly it's work's connection)
05:18 🔗 BlueMax I've never been anywhere near something like that
05:38 🔗 ivan` pft: I installed an amd64 Debian 6 and my wget-lua is linked to
05:38 🔗 ivan` libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f59352f9000)
05:38 🔗 ivan` libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f5934f58000)
05:38 🔗 ivan` no problems with the SSL
05:41 🔗 ivan` pft: oh never mind, it is FUBAR :-) thanks for helping narrow this down
06:22 🔗 * SmileyG looks in
06:22 🔗 SmileyG how we doing guys?
06:29 🔗 ivan` pft: fixed in latest greader-grab
09:09 🔗 ivan` tumblr has a ton of blogs that start with a hyphen like http://-sheselectric-.tumblr.com/
09:09 🔗 ivan` most browsers/dns servers seem to refuse such madness
09:09 🔗 ivan` google was okay with them, though :)
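The reason many clients refuse such names is the hostname syntax in RFC 952 and RFC 1123: each label may contain only letters, digits, and hyphens, and must not begin or end with a hyphen. A minimal checker for that rule, with illustrative function names, might look like this:

```python
import re

# One label: 1 to 63 characters, letters/digits/hyphens, no leading or
# trailing hyphen (RFC 952 as relaxed by RFC 1123).
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_host_label(label):
    """True if a single dot-separated label follows the hostname rule."""
    return _LABEL.fullmatch(label) is not None


def is_valid_hostname(name):
    """True if every label of the name is valid and the name is not too long."""
    labels = name.rstrip(".").split(".")
    return len(name) <= 253 and all(is_valid_host_label(l) for l in labels)
```

Here `is_valid_hostname("-sheselectric-.tumblr.com")` is false because the first label starts and ends with a hyphen. The DNS protocol itself can carry such labels, which is why some resolvers and browsers still load them.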
09:20 🔗 Tomcat_ I cannot even click this in Quassel IRC...
09:28 🔗 ersi Firefox 21.0 hates that link.
09:29 🔗 ivan` I managed to load it on Chrome 27 on Windows 7 using level3's dns servers
09:30 🔗 Tomcat_ Sounds like some really good way to hide a website... how much blocking software or browsers used by government agencies will fail here? ;)
10:40 🔗 ivan` tales from a pre-wget-lua world https://github.com/ArchiveTeam/archive-wars/blob/master/archivewars.sh
10:48 🔗 ersi indeed
10:48 🔗 ersi Pre-WARC world as well :)
11:56 🔗 godane i'm grabbing the support forums of theblaze
12:26 🔗 menacespb I have a question about the Warrior - when I up the number of simultaneous sessions (in settings) - does it not happen until current projects are finished?
12:30 🔗 ersi I think it'll spin up more when an item is completed
12:40 🔗 menacespb Good, good. We'll see when they complete then. I'm new at this, only fired it up yesterday :)
12:41 🔗 antomatic I find it usually takes effect straight away, unless the warrior is shutting down for some reason
12:41 🔗 antomatic Turning the number DOWN won't have an effect until a job completes - as it will finish what it started - but it is usually able to start new jobs straight away if the number goes up.
12:42 🔗 menacespb Hmm, weird then. I upped the number of downloads from 2 to 4, but it's still churning away at the original 2.
12:43 🔗 ersi I'd say, wait to see it start the next item.. it'll probably do it sooner or later
12:43 🔗 menacespb It's been at these two for a good long while, so i'd rather not restart and lose the work.
12:43 🔗 menacespb Yep. :)
12:43 🔗 menacespb Thanks for the answers.
12:47 🔗 Smiley Yup it starts more when an item finishes.
12:47 🔗 Smiley and welcome menacespb :)
13:00 🔗 menacespb Smiley: Thanks :) It's important work, and hey - I had a laptop on my desk that wasn't doing anything much besides irc anyway, so.. :)
13:01 🔗 Smiley hehe
13:10 🔗 godane i found another project: http://www.fuzzymemories.tv
13:16 🔗 godane there is also a youtube channel: http://www.youtube.com/user/FuzzyMemoriesTV
16:11 🔗 antomatic Never noticed the warrior didn't do that until today. Doh! :)
16:13 🔗 antomatic [if you want to force the new jobs to arrive without waiting for the existing ones to end, just click 'shut down' - don't worry, it won't - then click 'keep running')
16:16 🔗 InitHello SyntaxError: Expected ]
16:18 🔗 antomatic pick up square bracket
16:18 🔗 antomatic > YOU NOW HAVE THE SQUARE BRACKET
16:18 🔗 antomatic take sentence
16:18 🔗 InitHello ye cannot get ye bracket
16:18 🔗 antomatic > YOU CAN'T TAKE A SENTENCE
16:18 🔗 antomatic grasp sentence
16:18 🔗 antomatic > YOU HAVE THE SENTENCE
16:18 🔗 InitHello apply sentence
16:19 🔗 antomatic remove end bracket
16:19 🔗 InitHello > YOU HAVE BEEN SENTENCED TO DEATH
16:19 🔗 antomatic F*@!#
16:19 🔗 antomatic :)
19:22 🔗 SketchCow https://twitter.com/vincentchu/status/339825371912495104
19:24 🔗 Marcelo Cloud services
19:53 🔗 Schbirid hoi, the fileplanet archiving has upped all the files to IA now (1 year + couple of weeks later) :)
19:53 🔗 Schbirid next before publicity is making a nice interface
19:53 🔗 Schbirid and a readme etc
19:54 🔗 Schbirid ~120 000 public files
19:54 🔗 Schbirid ~200-300 000 not public as those are mixed with private files and i highly value privacy
19:56 🔗 Schbirid 9.5TB public
19:57 🔗 Schbirid close to 2TB non-public i think
19:57 🔗 Schbirid please do not shoot the publicity gun, right now it is not for end users at all
19:57 🔗 Schbirid anyways, yay, finally :)
23:00 🔗 underscor Anyone remember who was discussing Cinemageddon a while back? Perhaps here or in -bs?
23:09 🔗 SketchCow Thanks, Schbirid
