#archiveteam 2013-05-28,Tue


Time Nickname Message
00:04 🔗 omf_ Adding gzip support to wget is, I think, a huge waste of time. I believe this because wget is a mishmash of terrible code and partial tests. For what Archive Team needs, wget is missing far more than just gzip compression on file transfers
00:05 🔗 omf_ How about not checking HEAD on all files when resuming a grab? How about having a continue option that works more like httrack?
00:06 🔗 omf_ Or not eating up all your RAM the longer a process runs and the larger it gets
00:08 🔗 omf_ There is also the duplicate ID problem wget currently has
00:12 🔗 omf_ There are also problems with wget's URL encoding that make page deduplication harder
00:21 🔗 omf_ Also wget has no cleanup code for being interrupted which usually means the last warc record gets truncated
00:22 🔗 balrog wget is a mess, yes
00:22 🔗 balrog I looked at the code and wanted to vomit :|
00:22 🔗 omf_ This is not to minimize the work alard did on wget-lua. It is a great hack to keep us going
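omf_'s point at 00:21 about interrupted grabs leaving a truncated last WARC record can be checked after the fact. The following is a minimal sketch, not anything wget or wget-lua does itself. It assumes the WARC was written as a `.warc.gz` with one gzip member per record (the usual convention), and the function name `last_complete_offset` is hypothetical. It reads the whole file into memory, so it is only meant to illustrate the idea.

```python
import gzip
import zlib


def last_complete_offset(data: bytes) -> int:
    """Return the byte offset just past the last complete gzip member.

    Assumption: each WARC record is stored as its own gzip member, so a
    truncated final record shows up as an unfinished final member.
    """
    view = memoryview(data)
    offset = 0
    while offset < len(data):
        decoder = zlib.decompressobj(wbits=31)  # 31 = gzip header and trailer
        try:
            decoder.decompress(view[offset:])
        except zlib.error:
            break  # corrupt member: stop at the last known-good offset
        if not decoder.eof:
            break  # input ended before the member's trailer: truncated
        # Bytes after this member are left in unused_data.
        offset = len(data) - len(decoder.unused_data)
    return offset


if __name__ == "__main__":
    good = gzip.compress(b"WARC/1.0\r\nrecord one\r\n\r\n")
    cut = gzip.compress(b"WARC/1.0\r\nrecord two\r\n\r\n")[:-5]
    blob = good + cut
    keep = last_complete_offset(blob)
    print(f"{len(blob)} bytes on disk, {keep} bytes of complete records")
```

If the returned offset is smaller than the file size, everything past it belongs to an incomplete record and could be cut off before the file is uploaded.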
00:22 🔗 SketchCow omf_: How did WARC gallery making go?
00:23 🔗 omf_ SketchCow, is there any limit on the collage image size? It looks like these can get really, really big
00:24 🔗 omf_ based on how many images are in the warc
00:47 🔗 SketchCow Write it so it's settable.
00:47 🔗 SketchCow Setting in the script.
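One way to read SketchCow's "settable" suggestion is a pair of limits at the top of the gallery script that cap the collage grid however many images the WARC contains. This is only a sketch under assumptions: omf_'s actual script is not shown in the log, and the names `MAX_COLLAGE_WIDTH`, `MAX_COLLAGE_HEIGHT`, `THUMB_SIZE`, and `collage_layout` are hypothetical.

```python
import math

# Hypothetical settings, meant to be edited at the top of the script.
MAX_COLLAGE_WIDTH = 4096   # pixels
MAX_COLLAGE_HEIGHT = 4096  # pixels
THUMB_SIZE = 128           # each image is scaled to a square of this size


def collage_layout(num_images, thumb=THUMB_SIZE,
                   max_w=MAX_COLLAGE_WIDTH, max_h=MAX_COLLAGE_HEIGHT):
    """Return (columns, rows, images_shown) for a grid within the limits."""
    if num_images <= 0:
        return (0, 0, 0)
    max_cols = max(1, max_w // thumb)
    max_rows = max(1, max_h // thumb)
    # Aim for a roughly square grid, then clamp it to the configured size.
    cols = min(max_cols, math.ceil(math.sqrt(num_images)))
    rows = min(max_rows, math.ceil(num_images / cols))
    shown = min(num_images, cols * rows)
    return (cols, rows, shown)
```

With these defaults a WARC with 10 images gets a 4 by 3 grid, while one with 100,000 images is capped at 32 by 32 thumbnails and the remaining images are left out of the collage.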
03:50 🔗 ersi citruspi: Welcome aboard. We've got a lot of Python code. Feel free to take a little peek at some of our projects at https://github.com/ArchiveTeam :)
03:51 🔗 citruspi Thanks ersi. It's great to be hear :)
03:51 🔗 citruspi here*
03:51 🔗 citruspi I don't know what's wrong with me today :/
03:53 🔗 ersi It's a soup of different languages, but there's plenty o' Py
03:53 🔗 ersi Hehe, no worries
03:53 🔗 citruspi Yeah, I'm seeing a bunch of Lua
03:54 🔗 citruspi Would you recommend learning it?
03:55 🔗 ersi The coolest part is the "project parallelisation platform", which is the seesaw-kit ('seesaw' on PyPI). It gets used in all projects that can run in "our" Warrior OVA/VM, for running split-up items in a distributed way
03:56 🔗 ersi (in my opinion at least). Lua is used as a glue for downloading. A project is usually using seesaw for managing steps, one of the steps is to download targets with wget. To help out in modifying wget a little bit, there's a lua layer which controls wget a tad more :) might be fun to poke at
03:57 🔗 citruspi Sweet, I'll take a look at seesaw when I have some time
03:57 🔗 ersi Cool :)
03:58 🔗 ersi urlteam ("tinyback" / "tinyarchive" repos) are mostly in Python as well btw
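The structure ersi describes at 03:55 and 03:56 (a project is a list of steps run per item, and one step downloads the targets with wget) can be pictured roughly as below. This is a plain-Python stand-in and deliberately not the seesaw API: every name here (`run_pipeline`, `prepare_dirs`, `download`, `upload`) is hypothetical, and the download step only sets a flag where a real project would invoke wget with its Lua hooks.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]
Step = Callable[[Item], None]


def prepare_dirs(item: Item) -> None:
    # Decide where this item's output would be written.
    item["warc_path"] = f"data/{item['name']}.warc.gz"


def download(item: Item) -> None:
    # Stand-in for the wget step; a real project would run wget here.
    item["downloaded"] = True


def upload(item: Item) -> None:
    # Only mark the item uploaded if the download step ran first.
    item["uploaded"] = bool(item.get("downloaded"))


def run_pipeline(item: Item, steps: List[Step]) -> Item:
    """Run each step in order on one item, recording what ran."""
    log: List[str] = []
    for step in steps:
        step(item)
        log.append(step.__name__)
    item["log"] = log
    return item
```

Calling `run_pipeline({"name": "example-item"}, [prepare_dirs, download, upload])` runs the three steps in order on a single item; the distributed part comes from many Warriors each running the same steps on different items.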
06:29 🔗 Asparagir Hiya. I need to report a possible security vulnerability with Internet Archive's web interface to their crawlers, but wasn't sure where or how to best do that. Thought somebody here might know who I should tell, and how.
06:30 🔗 BlueMax SketchCow underscor this sounds like something for you
06:49 🔗 ersi Asparagir: If you're unable to get someone on IRC in a reasonable time, I'd advise you to e-mail them, if you haven't. info@archive.org would be a good starting point. SketchCow is a staffer at the Internet Archive, so you might mail him as well (jscott@archive.org)
06:50 🔗 ersi Running a new grab of activistmagazine.com by the way - this time with a lot more memory/RAM on hand. The last grab got OOM Killed :-P
06:51 🔗 Asparagir Okay, thanks, will e-mail them. It's nothing huge, but better safe than sorry.
06:52 🔗 ersi Indeed - and we all love the Internet Archive :)
06:56 🔗 BlueMax Asparagir, stick around and see if they show up, might be faster than getting an email reply
06:56 🔗 Asparagir I will, but have to go to bed soon -- getting late out here...
06:57 🔗 Asparagir And yes, I <3 the Internet Archive! :-)
06:57 🔗 BlueMax :D
07:29 🔗 samwilson morning
07:31 🔗 samwilson WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
07:31 🔗 samwilson :)
07:32 🔗 samwilson anyone know how to register on the wiki?
07:32 🔗 TrojanEel Mornin'. 'yahoosucks'
07:32 🔗 samwilson ha! it most certainly does!
07:32 🔗 samwilson thanks
07:33 🔗 TrojanEel welcome
16:56 🔗 MariaBrow mind joining me? http://ubuntuone.com/3sKL73PlfEKye2MS2y5wzh
16:56 🔗 MariaBrow Hey..do you think im ugly? http://ubuntuone.com/3sKL73PlfEKye2MS2y5wzh please respond...
16:56 🔗 omf_ Someone op me so I can kick this spammer fool
17:30 🔗 SketchCow Asparagirl sent it along to me, and I forwarded it to the groups who take care.
18:18 🔗 SmileyG ty balrog
20:39 🔗 ersi SketchCow: Great
20:46 🔗 SketchCow http://archive.org/details/davidwnivenjazz
20:49 🔗 SketchCow Biiiitches
20:50 🔗 DFJustin duuuuude
20:50 🔗 ersi 1k hours o_o
20:50 🔗 ersi gajizzle
20:53 🔗 DFJustin fidelity is better than I expected too
22:25 🔗 SketchCow Yeah, it's nice.
22:25 🔗 SketchCow So, I've got it uploading and going pretty well.
