Time |
Nickname |
Message |
00:01
🔗
|
WiK |
welp, 200gb away from hittin 10tb of github data |
00:03
🔗
|
ivan` |
thanks for downloading all that again, it will surely be handy |
00:03
🔗
|
ivan` |
people are pretty delete-happy on github |
00:06
🔗
|
balrog |
WiK: where are you storing this all? |
00:08
🔗
|
ivan` |
WiK: are you updating repos with git pull --rebase? there are special considerations if you are updating, as people can force-push commits that will cause commits in your local mirror to eventually disappear |
00:08
🔗
|
ivan` |
s/git pull --rebase/git fetch/ or whatever |
00:14
🔗
|
WiK |
im just doing git clones |
00:14
🔗
|
WiK |
balrog: 4 or 5 different external (usb3) hard drives |
00:14
🔗
|
WiK |
and my database keeps track of which hard drive i've stored the project on |
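A minimal sketch of the kind of catalog WiK describes, assuming a single SQLite table. The table name, column names, and drive labels are hypothetical; the log does not say what database or schema is actually in use.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Create (if needed) a small catalog mapping each repo to the drive holding it."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS clones ("
        " repo TEXT PRIMARY KEY,"  # e.g. 'owner/name' (assumed naming)
        " drive TEXT NOT NULL)"    # hypothetical drive label
    )
    return db


def record_clone(db, repo, drive):
    """Remember (or update) which drive a cloned repo was written to."""
    db.execute("INSERT OR REPLACE INTO clones (repo, drive) VALUES (?, ?)", (repo, drive))
    db.commit()


def drive_for(db, repo):
    """Return the drive label for a repo, or None if it was never recorded."""
    row = db.execute("SELECT drive FROM clones WHERE repo = ?", (repo,)).fetchone()
    return row[0] if row else None
```

With a file path instead of `:memory:`, the catalog survives between runs, so a lookup tells you which drive to plug in before touching a given clone.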
00:15
🔗
|
WiK |
ivan`: im just cloning them, i have not gone back to update anything yet (and may not) |
00:16
🔗
|
ivan` |
WiK: if you do update them, you have to disable gc completely, or tag the commits you already have |
00:17
🔗
|
WiK |
ya, for my project i dont really need to go back and update them |
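ivan`'s advice (disable gc, or tag the commits you already have) could be sketched roughly as follows. This version builds the git commands for both options at once; the `keep/` tag namespace and the function name are assumptions made for illustration, not something from the log.

```python
def protect_commands(refs):
    """Return git argv lists that keep already-fetched commits reachable.

    refs: mapping of ref name -> commit SHA currently in the local mirror.
    """
    # gc.auto=0 turns off automatic garbage collection in this repository.
    commands = [["git", "config", "gc.auto", "0"]]
    for ref, sha in sorted(refs.items()):
        # 'keep/' is a hypothetical tag namespace chosen for this sketch.
        branch = ref.replace("refs/heads/", "")
        tag = "keep/{}/{}".format(branch, sha[:12])
        # A tagged commit stays reachable even if upstream force-pushes the branch.
        commands.append(["git", "tag", tag, sha])
    return commands
```

Each command would be run inside the mirror (for example with `subprocess.run(cmd, cwd=mirror_path, check=True)`) before the next `git fetch`. Note that `git tag` fails if the tag already exists, so a real script would need to skip tags it has created before, and that `gc.auto` only affects automatic gc, not a manual `git gc`.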
00:24
🔗
|
ivan` |
anyone want to wget-lua this domain? https://www.rijksmuseum.nl/en/explore-the-collection/overview |
00:24
🔗
|
omf_ |
WiK, check out this lame attempt http://datasyndrome.com/post/51657080886/downloading-and-processing-the-github-data |
00:24
🔗
|
ivan` |
claims to have a lot of art; images are split up into tiles and probably need some code |
00:28
🔗
|
WiK |
omf_: i dont know if i would call it a 'lame' attempt |
00:29
🔗
|
WiK |
but no clue what they are trying to do |
00:32
🔗
|
WiK |
also cant tell if they are only downloading data from one project or not |
00:38
🔗
|
omf_ |
I have tried things with the githubarchive |
00:38
🔗
|
omf_ |
it is very limited data |
00:38
🔗
|
omf_ |
I would go so far as to say it is not even a big enough sample to be statistically significant. Thanks for getting all the data |
00:47
🔗
|
SketchCow |
omf_: How'd the WARC gallery go? |
01:14
🔗
|
WiK |
you guys have old wired mags or some really old computer magazines? |
01:15
🔗
|
WiK |
i need to come up with a good contest question |
01:15
🔗
|
SketchCow |
OKAY NERDS |
01:15
🔗
|
SketchCow |
This actually has interest and relevance to the team. |
01:16
🔗
|
SketchCow |
http://www-jake.archive.org/donate/ |
01:16
🔗
|
SketchCow |
Looking for mistakes, bugs, stupid |
01:18
🔗
|
DFJustin |
crappy resize job on brewster |
01:19
🔗
|
ivan` |
my version of "WARC gallery" is HTTrack + Directory Opus, flat view enabled, reverse sort by file size, thumbnail view |
01:19
🔗
|
ivan` |
many hours can be killed hitting pgdn or the mousewheel |
01:20
🔗
|
WiK |
SketchCow: can i suggest expanding 'Programs' or maybe a link to what the programs are from the 'where your money goes'? |
01:22
🔗
|
WiK |
also: there are page errors on : http://www-jake.archive.org/about/volunteerpositions.php |
01:22
🔗
|
WiK |
at the bottom the *'s are outside of the box under physical/special requirements |
01:27
🔗
|
SketchCow |
That's a different thing. |
01:33
🔗
|
SketchCow |
Any other notes? |
01:34
🔗
|
omf_ |
It is not responsive for mobile devices |
01:34
🔗
|
WiK |
not really, i just looked at the site and asked 'why would i donate?' |
01:34
🔗
|
omf_ |
I can fix that |
01:35
🔗
|
omf_ |
As for the gallery it keeps crashing on the 50gb warc and I have no idea why |
01:36
🔗
|
SketchCow |
Which mobile device, omf_ ? |
01:36
🔗
|
SketchCow |
Because I'm on my ipad, it's fine. |
01:36
🔗
|
omf_ |
I tested with the Android SDK and Opera Mobile with multiple user agent strings |
01:37
🔗
|
SketchCow |
I just used it successfully on my Galaxy S4 |
01:38
🔗
|
omf_ |
Also the bitcoin button does not appear |
01:40
🔗
|
SketchCow |
It won't appear if you select subscription |
01:41
🔗
|
omf_ |
okay |
03:08
🔗
|
underscor |
12631766 tumblogs.txt |
03:08
🔗
|
underscor |
that's a lot of tumblogs |
03:09
🔗
|
underscor |
that's the number of unique tumblr subdomains/blogs we (IA) know about |
03:30
🔗
|
ivan` |
http://tracker.archiveteam.org/greader/ :-) |
03:36
🔗
|
BlueMax |
:D |
03:37
🔗
|
ivan` |
https://github.com/ArchiveTeam/greader-grab :-) |
03:55
🔗
|
ivan` |
a lot of words from http://www.archiveteam.org/index.php?title=Posterous should be on http://www.archiveteam.org/index.php?title=Google_Reader |
03:55
🔗
|
ivan` |
in case somebody really likes writing words |
03:56
🔗
|
pft |
Failed WgetDownload for Item 0000010776 |
03:56
🔗
|
pft |
Process WgetDownload returned exit code 5 for Item 0000010776 |
03:56
🔗
|
pft |
hmm |
03:57
🔗
|
pft |
i must be missing seesaw |
03:57
🔗
|
ivan` |
that's exit code 5, SSL verification failure. |
03:57
🔗
|
ivan` |
I pinned the download to EquifaxSecureCA |
03:57
🔗
|
ivan` |
maybe you're in another country and getting a different CA |
03:57
🔗
|
ivan` |
or your wget is out of whack |
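For reference, the exit code in pft's log line can be decoded with a small lookup table. The descriptions follow the exit statuses documented in the GNU Wget manual; the dictionary and function names are just illustrative.

```python
# Exit statuses as documented in the GNU Wget manual.
WGET_EXIT_CODES = {
    0: "no problems occurred",
    1: "generic error code",
    2: "parse error",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "username/password authentication failure",
    7: "protocol errors",
    8: "server issued an error response",
}


def describe_wget_exit(code):
    """Translate a wget exit status into a short human-readable reason."""
    return WGET_EXIT_CODES.get(code, "unknown exit code")
```

So "returned exit code 5 for Item 0000010776" means wget rejected the server's certificate, which is why the conversation turns to CAs and SSL libraries.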
03:57
🔗
|
pft |
i'm in the us |
03:57
🔗
|
ivan` |
same |
03:58
🔗
|
pft |
hmm |
03:59
🔗
|
ivan` |
can you load https://www.google.com/ in Firefox and tell me the cert chain? |
03:59
🔗
|
pft |
this is a colo'd box so that's a little tricky |
03:59
🔗
|
ivan` |
are you using run-pipeline the normal way? |
04:00
🔗
|
pft |
i think so |
04:00
🔗
|
pft |
run-pipeline --disable-web-server --concurrent 2 pipeline.py |
04:00
🔗
|
pft |
might i need to update my seesaw? |
04:01
🔗
|
ivan` |
let me check |
04:01
🔗
|
ivan` |
also can you paste me the output of: openssl s_client -connect www.google.com:443 |
04:02
🔗
|
ivan` |
seesaw 0.0.12 does support env= |
04:02
🔗
|
pft |
http://www.skeleboner.com/openssl.txt |
04:04
🔗
|
ivan` |
that looks fine, you're not being MITMed or anything |
04:04
🔗
|
pft |
that's a good thing |
04:05
🔗
|
ivan` |
the cert-pinning is done by env=dict(SSL_CERT_DIR=SSL_CERT_DIR), in the pipeline |
04:05
🔗
|
ivan` |
I have no idea why it's not working for you |
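The `env=dict(SSL_CERT_DIR=SSL_CERT_DIR)` idea can be illustrated outside seesaw. `SSL_CERT_DIR` is an environment variable OpenSSL consults for its default CA directory, so pointing it at a directory holding only the expected CA restricts which certificates verify. The helper below is a sketch with a hypothetical name; how seesaw itself combines `env=` with the parent environment is not shown in the log.

```python
import os


def pinned_env(cert_dir, base_env=None):
    """Return a copy of the environment with SSL_CERT_DIR pointing at cert_dir.

    base_env defaults to the current process environment; it is copied,
    never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SSL_CERT_DIR"] = cert_dir
    return env
```

A child process would then be started with something like `subprocess.call(["wget", url], env=pinned_env("certs"))`, where `certs` is an assumed directory containing the pinned CA in the layout the linked SSL library expects.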
04:06
🔗
|
pft |
weirdness |
04:06
🔗
|
ivan` |
maybe your wget wants more certs |
04:07
🔗
|
ivan` |
did you ./get-wget-lua.sh? |
04:07
🔗
|
pft |
i did |
04:07
🔗
|
ivan` |
is your wget linked to these or something else |
04:07
🔗
|
ivan` |
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f4e5ec9e000) |
04:07
🔗
|
ivan` |
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f4e5f079000) |
04:08
🔗
|
pft |
GNU Wget 1.14.lua.20130523-9a5c built on linux-gnu. |
04:08
🔗
|
pft |
libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007fda7b7b1000) |
04:08
🔗
|
pft |
libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007fda7bb52000) |
04:08
🔗
|
pft |
hmm |
04:09
🔗
|
ivan` |
let me check if I have a working wget linked to that |
04:10
🔗
|
ivan` |
I have one working wget (doing the greader job) linked to libgnutls.so.26 => /usr/lib/x86_64-linux-gnu/libgnutls.so.26 (0x00007fa87b238000) |
04:10
🔗
|
ivan` |
and another on CentOS linked to libssl.so.10 => /usr/lib/libssl.so.10 (0xb7f41000) |
04:10
🔗
|
ivan` |
libcrypto.so.10 => /usr/lib/libcrypto.so.10 (0xb7db4000) |
04:11
🔗
|
ivan` |
but nothing linked to 0.9.8, so that could be the problem |
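The comparison ivan` and pft are doing by eye (which TLS libraries each wget binary is linked against) can be sketched as a small parser over `ldd` output. The function name and the set of library prefixes are choices made for this example.

```python
import re

# Matches lines such as:
#   libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007fda7bb52000)
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")

_TLS_PREFIXES = ("libssl", "libcrypto", "libgnutls")


def tls_libraries(ldd_output):
    """Return {soname: path} for the TLS-related libraries in ldd output."""
    libs = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match and match.group(1).startswith(_TLS_PREFIXES):
            libs[match.group(1)] = match.group(2)
    return libs
```

Fed the two lines pft pasted, this yields `libssl.so.0.9.8` and `libcrypto.so.0.9.8`, which makes the version difference from ivan`'s 1.0.0 build easy to spot.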
04:11
🔗
|
pft |
hmm ok |
04:11
🔗
|
ivan` |
I'll have to fix it since a lot of people probably have that |
04:11
🔗
|
pft |
yeah, i think i'm debian stable |
04:12
🔗
|
pft |
urg and i'm afk |
04:13
🔗
|
ivan` |
since you have no men in the middle, you are welcome to remove that env= line if you want to get it started |
04:13
🔗
|
pft |
ok |
04:14
🔗
|
pft |
thanks :) |
04:14
🔗
|
ivan` |
thanks for grabbing |
04:15
🔗
|
pft |
of course! |
04:15
🔗
|
pft |
gotta get in early so i can pretend that i can compete with underscor briefly |
04:16
🔗
|
underscor |
:D |
04:17
🔗
|
pft |
:p |
04:19
🔗
|
BlueMax |
underscor cheats, you know that right |
04:20
🔗
|
pft |
how does underscor cheat? |
04:20
🔗
|
pft |
i would like to also cheat in a similar fashion ;) |
04:20
🔗
|
underscor |
I work for IA |
04:20
🔗
|
underscor |
so I have a lot of spare pipes |
04:20
🔗
|
pft |
so yeah |
04:20
🔗
|
pft |
would like to cheat in a similar fashion ;) |
04:21
🔗
|
underscor |
haha |
04:21
🔗
|
underscor |
kennethre has a better deal |
04:21
🔗
|
underscor |
he can scale much bigger than I |
04:21
🔗
|
underscor |
(works for heroku) |
04:21
🔗
|
pft |
nice |
04:24
🔗
|
BlueMax |
meanwhile I'm a tiny australian with bad internet :( |
04:27
🔗
|
underscor |
Isn't that all australians? |
04:34
🔗
|
* |
BlueMax slaps underscor. |
04:34
🔗
|
BlueMax |
Stop insulting my country! |
04:52
🔗
|
ivan` |
I cheated by taking credit for 91 items that were rm'ed and re-done ;) |
05:18
🔗
|
trs80 |
bluemax: I'm in australia, with 100mbps (admittedly it's work's connection) |
05:18
🔗
|
BlueMax |
I've never been anywhere near something like that |
05:38
🔗
|
ivan` |
pft: I installed an amd64 Debian 6 and my wget-lua is linked to |
05:38
🔗
|
ivan` |
libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f59352f9000) |
05:38
🔗
|
ivan` |
libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f5934f58000) |
05:38
🔗
|
ivan` |
no problems with the SSL |
05:41
🔗
|
ivan` |
pft: oh never mind, it is FUBAR :-) thanks for helping narrow this down |
06:22
🔗
|
* |
SmileyG looks in |
06:22
🔗
|
SmileyG |
how we doing guys? |
06:29
🔗
|
ivan` |
pft: fixed in latest greader-grab |
09:09
🔗
|
ivan` |
tumblr has a ton of blogs that start with a hyphen like http://-sheselectric-.tumblr.com/ |
09:09
🔗
|
ivan` |
most browsers/dns servers seem to refuse such madness |
09:09
🔗
|
ivan` |
google was okay with them, though :) |
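One likely reason clients refuse these names: under the usual hostname rules (RFC 952 as relaxed by RFC 1123), each label must start and end with a letter or digit, with hyphens allowed only in between. A minimal checker under that assumption, with an illustrative function name:

```python
import re

# One hostname label: 1-63 chars, alphanumeric at both ends, hyphens inside only.
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def is_valid_hostname(name):
    """Check a hostname against the RFC 952/1123 label rules."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.split("."))
```

Under this check `-sheselectric-.tumblr.com` is rejected because its first label begins and ends with a hyphen, even though Tumblr evidently serves such names.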
09:20
🔗
|
Tomcat_ |
I cannot even click this in Quassel IRC... |
09:28
🔗
|
ersi |
Firefox 21.0 hates that link. |
09:29
🔗
|
ivan` |
I managed to load it on Chrome 27 on Windows 7 using level3's dns servers |
09:30
🔗
|
Tomcat_ |
Sounds like some really good way to hide a website... how much blocking software or browsers used by government agencies will fail here? ;) |
10:40
🔗
|
ivan` |
tales from a pre-wget-lua world https://github.com/ArchiveTeam/archive-wars/blob/master/archivewars.sh |
10:48
🔗
|
ersi |
indeed |
10:48
🔗
|
ersi |
Pre-WARC world as well :) |
11:56
🔗
|
godane |
i'm grabbing the support forums of theblaze |
12:26
🔗
|
menacespb |
I have a question about the Warrior - when I up the number of simultaneous sessions (in settings) - does it not happen until current projects are finished? |
12:30
🔗
|
ersi |
I think it'll spin up more when an item is completed |
12:40
🔗
|
menacespb |
Good, good. We'll see when they complete then. I'm new at this, only fired it up yesterday :) |
12:41
🔗
|
antomatic |
I find it usually takes effect straight away, unless the warrior is shutting down for some reason |
12:41
🔗
|
antomatic |
Turning the number DOWN won't have an effect until a job completes - as it will finish what it started - but it is usually able to start new jobs straight away if the number goes up. |
12:42
🔗
|
menacespb |
Hmm, weird then. I upped the number of downloads from 2 to 4, but it's still churning away at the original 2. |
12:43
🔗
|
ersi |
I'd say, wait to see it start the next item.. it'll probably do it sooner or later |
12:43
🔗
|
menacespb |
It's been at these two for a good long while, so i'd rather not restart and lose the work. |
12:43
🔗
|
menacespb |
Yep. :) |
12:43
🔗
|
menacespb |
Thanks for the answers. |
12:47
🔗
|
Smiley |
Yup it starts more when an item finishes. |
12:47
🔗
|
Smiley |
and welcome menacespb :) |
13:00
🔗
|
menacespb |
Smiley: Thanks :) It's important work, and hey - I had a laptop on my desk that wasn't doing anything much besides irc anyway, so.. :) |
13:01
🔗
|
Smiley |
hehe |
13:10
🔗
|
godane |
i found another project: http://www.fuzzymemories.tv |
13:16
🔗
|
godane |
there is also a youtube channel: http://www.youtube.com/user/FuzzyMemoriesTV |
16:11
🔗
|
antomatic |
Never noticed the warrior didn't do that until today. Doh! :) |
16:13
🔗
|
antomatic |
[if you want to force the new jobs to arrive without waiting for the existing ones to end, just click 'shut down' - don't worry, it won't - then click 'keep running') |
16:16
🔗
|
InitHello |
SyntaxError: Expected ] |
16:18
🔗
|
antomatic |
pick up square bracket |
16:18
🔗
|
antomatic |
> YOU NOW HAVE THE SQUARE BRACKET |
16:18
🔗
|
antomatic |
take sentence |
16:18
🔗
|
InitHello |
ye cannot get ye bracket |
16:18
🔗
|
antomatic |
> YOU CAN'T TAKE A SENTENCE |
16:18
🔗
|
antomatic |
grasp sentence |
16:18
🔗
|
antomatic |
> YOU HAVE THE SENTENCE |
16:18
🔗
|
InitHello |
apply sentence |
16:19
🔗
|
antomatic |
remove end bracket |
16:19
🔗
|
InitHello |
> YOU HAVE BEEN SENTENCED TO DEATH |
16:19
🔗
|
antomatic |
F*@!# |
16:19
🔗
|
antomatic |
:) |
19:22
🔗
|
SketchCow |
https://twitter.com/vincentchu/status/339825371912495104 |
19:24
🔗
|
Marcelo |
Cloud services |
19:53
🔗
|
Schbirid |
hoi, the fileplanet archiving has upped all the files to IA now (1 year + couple of weeks later) :) |
19:53
🔗
|
Schbirid |
next before publicity is making a nice interface |
19:53
🔗
|
Schbirid |
and a readme etc |
19:54
🔗
|
Schbirid |
~120 000 public files |
19:54
🔗
|
Schbirid |
~200-300 000 not public as those are mixed with private files and i highly value privacy |
19:56
🔗
|
Schbirid |
9.5TB public |
19:57
🔗
|
Schbirid |
close to 2TB non-public i think |
19:57
🔗
|
Schbirid |
please do not shoot the publicity gun, right now it is not for end users at all |
19:57
🔗
|
Schbirid |
anyways, yay, finally :) |
23:00
🔗
|
underscor |
Anyone remember who was discussing Cinemageddon a while back? Perhaps here or in -bs? |
23:09
🔗
|
SketchCow |
Thanks, Schbirid |