Time |
Nickname |
Message |
00:04
🔗
|
omf_ |
Adding gzip support to wget I think is a huge waste of time. I believe this to be true because wget is a mishmash of terrible code and partial tests. For what Archive Team needs, wget is missing far more than just gzip compression on file transfers |
00:05
🔗
|
omf_ |
How about not checking head on all files when resuming a grab. How about having a continue option that works more like httrack |
00:06
🔗
|
omf_ |
Or eating up all your ram the longer and larger a process is |
00:08
🔗
|
omf_ |
There is also the duplicate ID problem wget currently has |
00:12
🔗
|
omf_ |
There are also problems with wget's URL encoding that make page deduplication harder |
00:21
🔗
|
omf_ |
Also wget has no cleanup code for being interrupted which usually means the last warc record gets truncated |
00:22
🔗
|
balrog |
wget is a mess, yes |
00:22
🔗
|
balrog |
I looked at the code and wanted to vomit :| |
00:22
🔗
|
omf_ |
This is not to minimize the work alard did on wget-lua. It is a great hack to keep us going |
00:22
🔗
|
SketchCow |
omf_: How did WARC gallery making go? |
00:23
🔗
|
omf_ |
SketchCow, is there any limit on the collage image size? It looks like these can get really, really big |
00:24
🔗
|
omf_ |
based on how many images are in the warc |
00:47
🔗
|
SketchCow |
Write it so it's settable. |
00:47
🔗
|
SketchCow |
Setting in the script. |
03:50
🔗
|
ersi |
citruspi: Welcome aboard. We got a lot of Python code. Feel free to take a little peek in some of our projects at https://github.com/ArchiveTeam :) |
03:51
🔗
|
citruspi |
Thanks ersi. It's great to be hear :) |
03:51
🔗
|
citruspi |
here* |
03:51
🔗
|
citruspi |
I don't know what's wrong with me today :/ |
03:53
🔗
|
ersi |
It's a soup of different languages, but there's plenty o' Py |
03:53
🔗
|
ersi |
Hehe, no worries |
03:53
🔗
|
citruspi |
Yeah, I'm seeing a bunch of Lua |
03:54
🔗
|
citruspi |
Would you recommend learning it? |
03:55
🔗
|
ersi |
The coolest part is the "project parallelisation platform" which is the seesaw-kit ('seesaw' on pypi) - which gets used in all projects that can run in "our" Warrior OVA/VM for running split/broken-up items in a distributed fashion |
03:56
🔗
|
ersi |
(in my opinion at least). Lua is used as a glue for downloading. A project is usually using seesaw for managing steps, one of the steps is to download targets with wget. To help out in modifying wget a little bit, there's a lua layer which controls wget a tad more :) might be fun to poke at |
03:57
🔗
|
citruspi |
Sweet, I'll take a look at seesaw when I have some time |
03:57
🔗
|
ersi |
Cool :) |
03:58
🔗
|
ersi |
urlteam ("tinyback" / "tinyarchive" repos) are mostly in Python as well btw |
06:29
🔗
|
Asparagir |
Hiya. I need to report a possible security vulnerability with Internet Archive's web interface to their crawlers, but wasn't sure where or how to best do that. Thought somebody here might know who I should tell, and how. |
06:30
🔗
|
BlueMax |
SketchCow underscor this sounds like something for you |
06:49
🔗
|
ersi |
Asparagir: If you're unable to get someone on IRC in a reasonable time, I'd advise you to e-mail them - if you haven't. info@archive.org would be a good starting point. SketchCow is a staffer of Internet Archive - you might mail him as well (jscott@archive.org) |
06:50
🔗
|
ersi |
Running a new grab of activistmagazine.com by the way - this time with a lot more memory/RAM on hand. The last grab got OOM Killed :-P |
06:51
🔗
|
Asparagir |
Okay, thanks, will e-mail them. It's nothing huge, but better safe than sorry. |
06:52
🔗
|
ersi |
Indeed - and we all love the Internet Archive :) |
06:56
🔗
|
BlueMax |
Asparagir, stick around and see if they show up, might be faster than getting an email reply |
06:56
🔗
|
Asparagir |
I will, but have to go to bed soon -- getting late out here... |
06:57
🔗
|
Asparagir |
And yes, I <3 the Internet Archive! :-) |
06:57
🔗
|
BlueMax |
:D |
07:29
🔗
|
samwilson |
morning |
07:31
🔗
|
samwilson |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
07:31
🔗
|
samwilson |
:) |
07:32
🔗
|
samwilson |
anyone know how to register on the wiki? |
07:32
🔗
|
TrojanEel |
Mornin'. 'yahoosucks' |
07:32
🔗
|
samwilson |
ha! it most certainly does! |
07:32
🔗
|
samwilson |
thanks |
07:33
🔗
|
TrojanEel |
welcome |
16:56
🔗
|
MariaBrow |
mind joining me? http://ubuntuone.com/3sKL73PlfEKye2MS2y5wzh |
16:56
🔗
|
MariaBrow |
Hey..do you think im ugly? http://ubuntuone.com/3sKL73PlfEKye2MS2y5wzh please respond... |
16:56
🔗
|
omf_ |
Someone op me so I can kick this spammer fool |
17:30
🔗
|
SketchCow |
Asparagirl sent it along to me, and I forwarded it to the groups who take care. |
18:18
🔗
|
SmileyG |
ty balrog |
20:39
🔗
|
ersi |
SketchCow: Great |
20:46
🔗
|
SketchCow |
http://archive.org/details/davidwnivenjazz |
20:49
🔗
|
SketchCow |
Biiiitches |
20:50
🔗
|
DFJustin |
duuuuude |
20:50
🔗
|
ersi |
1k hours o_o |
20:50
🔗
|
ersi |
gajizzle |
20:53
🔗
|
DFJustin |
fidelity is better than I expected too |
22:25
🔗
|
SketchCow |
Yeah, it's nice. |
22:25
🔗
|
SketchCow |
So, I've got it uploading and going pretty well. |