[00:04] Adding gzip support to wget I think is a huge waste of time. I believe this to be true because wget is a mismatch of terrible code and partial tests. For what Archive Team needs wget is missing far more than just gzip compression on file transfers
[00:05] How about not checking head on all files when resuming a grab. How about having a continue option that works more like httrack
[00:06] Or eating up all your ram the longer and larger a process is
[00:08] There is also the duplicate ID problem wget currently has
[00:12] There is also problems with wget's url encoding that make page deduplication harder
[00:21] Also wget has no cleanup code for being interrupted which usually means the last warc record gets truncated
[00:22] wget is a mess, yes
[00:22] I looked at the code and wanted to vomit :|
[00:22] This is not to minimize the work alard did on wget-lua. It is a great hack to keep us going
[00:22] omf_: How did WARC gallery making go?
[00:23] SketchCow, is there any limit on the collage image size? It looks like these can get really, really big
[00:24] based on how many images are in the warc
[00:47] Write it so it's settable.
[00:47] Setting in the script.
[03:50] citruspi: Welcome aboard. We got a lot of Python code. Feel free to take a little peak in some of our projects at https://github.com/ArchiveTeam :)
[03:51] Thanks ersi. It's great to be hear :)
[03:51] here*
[03:51] I don't know what's wrong with me today :/
[03:53] It's a soup of different languages, but there's plenty o' Py
[03:53] Hehe, no worries
[03:53] Yeah, I'm seeing a bunch of Lua
[03:54] Would you recommend learning it?
[03:55] The coolest part is the "project parallelisation platform" which is the seesaw-kit ('seesaw' on pypi) - which gets used in all projects that can run in "our" Warrior OVA/VM for running splitted/broken up/ items distributedly
[03:56] (in my opinion at least). Lua is used as a glue for downloading. A project is usually using seesaw for managing steps, one of the steps is to download targets with wget. To help out in modifying wget a little bit, there's a lua layer which controls wget a tad more :) might be fun to poke at
[03:57] Sweet, I'll take a look at seesaw when I have some time
[03:57] Cool :)
[03:58] urlteam ("tinyback" / "tinyarchive" repos) are mostly in Python as well btw
[06:29] Hiya. I need to report a possible security vulnerability with Internet Archive's web interface to their crawlers, but wasn't sure where or how to best do that. Thought somebody here might know who I should tell, and how.
[06:30] SketchCow underscor this sounds like something for you
[06:49] Asparagir: If you're unable to get someone on IRC in a reasonable time, I'd advise you to e-mail them - if you havn't. info@archive.org would be a good starting point. SketchCow is a staffer of Internet archive - you might mail him as well (jscott@archive.org)
[06:50] Running a new grab of activistmagazine.com by the way - this time with a lot more memory/RAM on hand. The last grab got OOM Killed :-P
[06:51] Okay, thanks, will e-mail them. It's nothing huge, but better safe than sorry.
[06:52] Indeed - and we all love the Internet Archive :)
[06:56] Asparagir, stick around and see if they show up, might be faster than getting an email reply
[06:56] I will, but have to go to bed soon -- getting late out here...
[06:57] And yes, I <3 the Internet Archive! :-)
[06:57] :D
[07:29] morning
[07:31] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[07:31] :)
[07:32] anyone know how to register on the wiki?
[07:32] Mornin'. 'yahoosucks'
[07:32] ha! it most certainly does!
[07:32] thanks
[07:33] welcome
[16:56] mind joining me? http://ubuntuone.com/3sKL73PlfEKye2MS2y5wzh
[16:56] Hey..do you think im ugly? http://ubuntuone.com/3sKL73PlfEKye2MS2y5wzh please respond...
[16:56] Someone op me so I can kick this spammer fool
[17:30] Asparagirl sent it along to me, and I forwarded it to the groups who take care.
[18:18] ty balrog
[20:39] SketchCow: Great
[20:46] http://archive.org/details/davidwnivenjazz
[20:49] Biiiitches
[20:50] duuuuude
[20:50] 1k hours o_o
[20:50] gajizzle
[20:53] fidelity is better than I expected too
[22:25] Yeah, i's nice.
[22:25] So, I've got it uploading and going pretty well.