Time |
Nickname |
Message |
00:13
🔗
|
|
Start has joined #archiveteam |
00:13
🔗
|
|
cvb has quit IRC (Ping timeout: 255 seconds) |
00:17
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
00:32
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
00:39
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
01:08
🔗
|
jleclanch |
https://web.archive.org/web/20101204061054/http://www.worldofwarcraft.com/info/burningcrusade/index.xml yay websites that used xsl... |
01:09
🔗
|
joepie91 |
heh |
01:20
🔗
|
|
Ymgve has quit IRC (Read error: Connection reset by peer) |
01:22
🔗
|
|
zenguy_pc has joined #archiveteam |
01:23
🔗
|
|
Ymgve has joined #archiveteam |
01:37
🔗
|
|
philpem has quit IRC (Ping timeout: 252 seconds) |
01:53
🔗
|
|
remsen has joined #archiveteam |
02:06
🔗
|
|
vitzli has joined #archiveteam |
02:18
🔗
|
|
primus104 has quit IRC (Leaving.) |
02:27
🔗
|
|
vitzli has quit IRC (Ping timeout: 255 seconds) |
02:41
🔗
|
|
vitzli has joined #archiveteam |
02:47
🔗
|
dashcloud |
no ongoing projects, but there's a very complete archive of HP's stuff (pre-split HP), and piles of driver CDs on archive.org; I think there's also a pretty complete Dell driver FTP set as well |
03:02
🔗
|
SketchCow |
Needs more. |
03:02
🔗
|
SketchCow |
Site needs tons of curation, too, obviously. |
03:05
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
03:13
🔗
|
|
Stiletto has joined #archiveteam |
03:33
🔗
|
|
vtyl has joined #archiveteam |
03:33
🔗
|
phuzion |
SketchCow: sorry about the ping earlier without a message to follow up with, I was gonna ask about the google code rsync target on FOS. |
03:34
🔗
|
|
remsen has joined #archiveteam |
03:37
🔗
|
|
lytv has quit IRC (Read error: Operation timed out) |
03:39
🔗
|
godane |
SketchCow: looks like some mp3s are incomplete from kpfa |
03:40
🔗
|
godane |
i put like 2 of the mp3s into archivebot so we have full proof that their server is hosting the incomplete file |
03:41
🔗
|
godane |
look at the Wed0700 hour here: https://archive.org/details/kpfa-archives-radio-podcast-2005-08-10 |
03:41
🔗
|
godane |
it should be closer to 2 hours |
03:41
🔗
|
godane |
cause thats the morning show |
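godane's point can be checked with simple arithmetic. A constant-bitrate MP3 is roughly duration × bitrate / 8 bytes, so a file much smaller than a two-hour show implies is probably truncated. This is a minimal Python sketch. The 128 kbit/s bitrate and the 40 MB file size are hypothetical values, not measurements of the KPFA files. The formula also ignores ID3 tags and variable-bitrate encoding.

```python
def expected_mp3_bytes(duration_s: float, bitrate_kbps: float) -> int:
    """Approximate size of a constant-bitrate MP3 (ignores tags and VBR)."""
    return round(duration_s * bitrate_kbps * 1000 / 8)

def implied_duration_s(size_bytes: int, bitrate_kbps: float) -> float:
    """Inverse of the above: how long a CBR file of this size would play."""
    return size_bytes * 8 / (bitrate_kbps * 1000)

if __name__ == "__main__":
    # Hypothetical 128 kbit/s stream for a 2-hour morning show.
    print(expected_mp3_bytes(2 * 3600, 128))                    # 115200000
    # A hypothetical 40 MB file at the same bitrate plays for minutes:
    print(f"{implied_duration_s(40_000_000, 128) / 60:.1f}")    # 41.7
```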
03:42
🔗
|
SketchCow |
Got it |
04:20
🔗
|
|
vOYtEC has quit IRC (Read error: Connection reset by peer) |
04:24
🔗
|
|
vOYtEC has joined #archiveteam |
04:35
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
04:37
🔗
|
|
icedice has joined #archiveteam |
04:45
🔗
|
|
icedice has quit IRC (Ping timeout: 360 seconds) |
04:47
🔗
|
|
chfoo has quit IRC (Quit: quit) |
04:52
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
05:04
🔗
|
jleclanch |
SketchCow: hey there, you wanted some stuff |
05:07
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
05:07
🔗
|
SketchCow |
I want so much fucking stuff. |
05:07
🔗
|
jleclanch |
SketchCow: you like xml? |
05:08
🔗
|
SketchCow |
Like, as friends, or dating? |
05:08
🔗
|
jleclanch |
well, you know, it's flexible and extensible |
05:08
🔗
|
jleclanch |
SketchCow: https://leclan.ch/public/armory-dump-warning-2gb-uncompressed.tar.xz (18mb compressed, 1.5gb uncompressed) |
05:09
🔗
|
jleclanch |
lotsa dumps i found on my hd |
05:09
🔗
|
jleclanch |
SketchCow: 3 full scans from the wowarmory website back when it was fully xml-based. structured wow data basically. it's stuff! |
05:12
🔗
|
jleclanch |
SketchCow: and with that out of the way, i have 40 gigabytes of virtual ticket videos and id like to put them up somewhere, but my upload is really poor =( |
05:13
🔗
|
|
WinterFox has joined #archiveteam |
05:14
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
05:19
🔗
|
|
RichardG has joined #archiveteam |
05:20
🔗
|
SketchCow |
What are virtual ticket videos. |
05:22
🔗
|
jleclanch |
SketchCow: sorry, shouldve said. paywalled videos. http://sprunge.us/gRIf |
05:26
🔗
|
SketchCow |
Can you upload this xml |
05:26
🔗
|
SketchCow |
Do the description and context. Otherwise, I'd just be making stuff up. |
05:28
🔗
|
jleclanch |
SketchCow: xml's in there -> https://leclan.ch/public/armory-dump-warning-2gb-uncompressed.tar.xz |
05:28
🔗
|
jleclanch |
it's like 300k files |
05:37
🔗
|
|
nightpool has joined #archiveteam |
05:45
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
05:46
🔗
|
DFJustin |
http://archive.org/upload/ |
05:47
🔗
|
jleclanch |
DFJustin: im aware and it's not ideal at all, I can't maintain a good upload over time, ideally id like to torrent it out to someone who can upload it |
05:53
🔗
|
SketchCow |
I came across your website and was really intrigued behind the whole concept of it. |
05:53
🔗
|
SketchCow |
I was wondering, are there any current bypasses for Yahoo 2-step verification, I would be willing to pay (if necessary). |
05:53
🔗
|
SketchCow |
Fine, I'll do it. |
05:54
🔗
|
phuzion |
jleclanch: If you torrent it, that data would be uploaded from your connection anyways. IA supports ingesting data with bittorrent, but it's less than ideal. If that's the route you'd like to go, then check this out: https://archive.org/about/faqs.php#321 |
05:55
🔗
|
jleclanch |
phuzion: yeah uploading is not a problem in and of itself. it's maintaining a connection for 40gb over the web page |
05:55
🔗
|
jleclanch |
phuzion: i didnt know it supported bittorrent ill check it out |
05:55
🔗
|
phuzion |
jleclanch: I don't know whether IA times out those bittorrent sessions after a while or anything, but it's worth a shot. |
05:57
🔗
|
phuzion |
Either way, you're still talking about pushing 40GB of data up on what I'm assuming is a rather small residential internet connection. |
05:57
🔗
|
phuzion |
It's going to take a long time no matter what protocol you use. |
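phuzion's point that the protocol doesn't change the total time is plain arithmetic: time = data size × 8 / upload rate. This sketch assumes decimal gigabytes and a sustained rate with no protocol overhead. The 1 and 10 Mbit/s rates are hypothetical examples, not jleclanch's actual connection.

```python
def upload_time_hours(size_gb: float, upload_mbit_s: float) -> float:
    """Hours needed to push size_gb decimal gigabytes at a sustained upload_mbit_s."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (upload_mbit_s * 1e6)
    return seconds / 3600

if __name__ == "__main__":
    for rate in (1, 10):  # hypothetical residential upload speeds, Mbit/s
        print(f"40 GB at {rate} Mbit/s: {upload_time_hours(40, rate):.1f} h")
    # prints 88.9 h (about 3.7 days) and 8.9 h
```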
05:57
🔗
|
jleclanch |
yeah but torrent is easier :) |
05:57
🔗
|
jleclanch |
i mean |
05:58
🔗
|
jleclanch |
this is already technically split in 1-4gb videos |
06:10
🔗
|
xmc |
if they are a bunch of 1-4gb files, you can upload them individually to the same item |
06:11
🔗
|
xmc |
i think the website uploader works with that |
06:12
🔗
|
jleclanch |
ill figure sth out, there's no rush |
06:12
🔗
|
jleclanch |
if someone wants the dump though let me know |
06:18
🔗
|
|
icedice has joined #archiveteam |
06:27
🔗
|
phuzion |
jleclanch: are you sending the data to IA with a torrent? |
06:28
🔗
|
phuzion |
Or are you uploading using the web interface? |
06:28
🔗
|
jleclanch |
phuzion: im not sending anything yet, got other stuff to upload first |
06:28
🔗
|
phuzion |
Do you have the torrent created yet? |
06:30
🔗
|
jleclanch |
phuzion: no, I won't be doing this today. I'll call my isp to bump my upload speed first |
06:30
🔗
|
phuzion |
Ok |
06:30
🔗
|
jleclanch |
phuzion: why, you interested? |
06:32
🔗
|
phuzion |
My thoughts were that I could download the torrent in the background and hang onto it. |
06:38
🔗
|
|
nightpool has joined #archiveteam |
06:49
🔗
|
|
nightpool has quit IRC (Ping timeout: 606 seconds) |
06:57
🔗
|
SketchCow |
Length: 18956704 (18M) [application/octet-stream] |
06:57
🔗
|
SketchCow |
Saving to: 'armory-dump-warning-2gb-uncompressed.tar.xz' |
06:57
🔗
|
SketchCow |
100%[=====================================================================================================>] 18,956,704 83.4KB/s in 3m 47s |
06:57
🔗
|
SketchCow |
2015-11-26 05:57:12 (81.6 KB/s) - 'armory-dump-warning-2gb-uncompressed.tar.xz' saved [18956704/18956704] |
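The figures in that wget output are consistent with each other. 18,956,704 bytes in 3 minutes 47 seconds averages about 81.6 KB/s when a KB is taken as 1,024 bytes. The following is a quick Python check. wget rounds the 3m 47s elapsed time, so a small mismatch in the last digit would not be surprising.

```python
def average_rate_kib_s(size_bytes: int, minutes: int, seconds: int) -> float:
    """Average transfer rate in KiB/s (1 KiB = 1024 bytes)."""
    elapsed = minutes * 60 + seconds
    return size_bytes / elapsed / 1024

if __name__ == "__main__":
    print(f"{average_rate_kib_s(18_956_704, 3, 47):.1f} KiB/s")  # 81.6 KiB/s
```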
07:02
🔗
|
|
BlueMaxim has quit IRC (Leaving) |
07:03
🔗
|
|
bwn_ has joined #archiveteam |
07:18
🔗
|
|
cvb has joined #archiveteam |
07:18
🔗
|
|
cvb has quit IRC (Connection closed) |
07:22
🔗
|
|
cvb has joined #archiveteam |
07:25
🔗
|
|
primus104 has joined #archiveteam |
07:52
🔗
|
|
vitzli has joined #archiveteam |
07:54
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
07:54
🔗
|
|
GLaDOS has quit IRC (Read error: Operation timed out) |
07:54
🔗
|
|
BlueMaxim has joined #archiveteam |
07:59
🔗
|
|
GLaDOS has joined #archiveteam |
08:01
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
08:03
🔗
|
|
godane has left |
08:08
🔗
|
|
godane has joined #archiveteam |
08:10
🔗
|
SketchCow |
https://archive.org/details/negativland coming nicely. |
08:14
🔗
|
godane |
you may be getting this as a collection of videos: https://www.youtube.com/channel/UCFlf_u19WYW0ftOuxuWfLKQ/videos |
08:15
🔗
|
godane |
a lot of Macy's Thanksgiving Day Parades |
08:17
🔗
|
|
cvb has quit IRC (Quit: Leaving) |
08:27
🔗
|
|
rolfb has joined #archiveteam |
08:33
🔗
|
|
atomotic has joined #archiveteam |
08:33
🔗
|
|
nightpool has joined #archiveteam |
08:35
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
08:36
🔗
|
|
rolfb has quit IRC (Linkinus - http://linkinus.com) |
08:39
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
08:42
🔗
|
|
primus104 has quit IRC (Leaving.) |
09:03
🔗
|
|
xk_id has joined #archiveteam |
09:43
🔗
|
|
schbirid has joined #archiveteam |
09:54
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:20
🔗
|
|
primus104 has joined #archiveteam |
10:26
🔗
|
|
bwn has joined #archiveteam |
10:46
🔗
|
|
vOYtEC has quit IRC (rm -r *) |
10:54
🔗
|
|
Sk1d has joined #archiveteam |
10:54
🔗
|
|
vOYtEC has joined #archiveteam |
11:44
🔗
|
|
bwn has quit IRC (Read error: Connection reset by peer) |
11:57
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
11:58
🔗
|
|
remsen has joined #archiveteam |
12:03
🔗
|
|
remsen2 has joined #archiveteam |
12:04
🔗
|
antomatic |
The docstoc grab seems to be occasionally getting caught up in loops of URLs like /images/images/images/images/images/images/ or /docs/../images/../content/../docs/../images/../content/ and such... any word on a fix yet? |
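Paths like /docs/../images/../content/ contain dot segments that RFC 3986 says a client should resolve. A crawler that compares the raw strings sees every hop as a new URL. This is a minimal Python sketch, not the actual docstoc.lua logic, that normalizes the path with `posixpath.normpath` before deduplication. It assumes plain '/' separators and re-adds a trailing slash that `normpath` would otherwise drop. It does not help with loops that simply repeat a real segment, such as /images/images/images/.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def normalize_path(url: str) -> str:
    """Resolve '.' and '..' segments in the path of an absolute URL."""
    parts = urlsplit(url)
    path = posixpath.normpath(parts.path) if parts.path else "/"
    # normpath strips a trailing slash; restore it so directory URLs keep their form.
    if parts.path.endswith("/") and not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, `normalize_path("http://www.docstoc.com/docs/../images/../content/")` gives `http://www.docstoc.com/content/`.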
12:08
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
12:29
🔗
|
|
PrincessK has joined #archiveteam |
12:30
🔗
|
|
PrincessK is now known as Knoeki |
12:30
🔗
|
Knoeki |
\o |
12:35
🔗
|
|
nightpool has joined #archiveteam |
12:43
🔗
|
|
nightpool has quit IRC (Ping timeout: 483 seconds) |
12:44
🔗
|
antomatic |
another popular one is http://embed.docstoc.com/handlers/downloadfilefromflash.ashx?docid=10971830&ref_url=http://www.docstoc.com/docs/chrome://skype_ff_toolbar_win/content/flags/chrome://skype_ff_toolbar_win/content/chrome://skype_ff_toolbar_win/content/chrome://skype_ff_toolbar_win/content/chrome://skype_ff_toolbar_win/content/flags/chrome://skype_ff_toolbar_win/content/chrome://skype_ff_toolbar_ |
12:44
🔗
|
antomatic |
win/content/flags/chrome://skype_ff_toolbar_win/content/chrome://skype_ff_toolbar_win/content/chrome://skype_ff_toolbar_win/content/flags/chrome://skype_ff_toolbar_win/content/chrome://skype_ff_toolbar_win/content/arrow.gif |
12:45
🔗
|
Knoeki |
I can't even click that anymore :') |
12:46
🔗
|
antomatic |
I doubt if it does anything. :) |
12:48
🔗
|
|
luckcolor has joined #archiveteam |
12:51
🔗
|
|
atomotic has joined #archiveteam |
12:51
🔗
|
Knoeki |
antomatic: haha, well, I was referring to the fact that it's split up over 2 lines :P |
12:51
🔗
|
Knoeki |
https://twitter.com/knoeki/status/669860020931141632 |
12:54
🔗
|
antomatic |
ah, always had a soft spot for those commodore monitors. :) |
12:56
🔗
|
luckcolor |
hello guys |
12:57
🔗
|
antomatic |
hlloo! |
12:57
🔗
|
Knoeki |
antomatic: I've got 2 here, they were both dead before I moved a couple weeks ago |
12:58
🔗
|
Knoeki |
now one of them magically works perfectly again :') |
12:58
🔗
|
Knoeki |
must've bumped it the right way |
13:10
🔗
|
|
WinterFox has quit IRC (Remote host closed the connection) |
13:22
🔗
|
|
Knoeki is now known as PrincessK |
13:26
🔗
|
arkiver |
antomatic: fix is in |
13:27
🔗
|
antomatic |
[[applause]] |
13:29
🔗
|
antomatic |
nice one arkiver |
13:35
🔗
|
arkiver |
chfoo: sorry to ping you again. Can you please recreate the googlecode rsync target on FOS? |
13:39
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
13:43
🔗
|
|
vitzli_ has joined #archiveteam |
13:45
🔗
|
|
vitzli has quit IRC (Read error: Operation timed out) |
13:47
🔗
|
|
primus104 has quit IRC (Leaving.) |
13:52
🔗
|
Atluxity |
whats up with docstoc? |
13:53
🔗
|
arkiver |
hmm looks like items/min went down dramatically |
13:54
🔗
|
Atluxity |
yeah |
13:54
🔗
|
arkiver |
looks like our useragent isn't blocked |
13:54
🔗
|
arkiver |
Atluxity: can you check your IPs? |
13:54
🔗
|
arkiver |
or just some of them? |
13:54
🔗
|
arkiver |
might be IP bans |
13:54
🔗
|
Atluxity |
besides some of them trying to get some ridiculous loop urls, I only saw tracker limit error |
13:55
🔗
|
arkiver |
yeah I just paused the grab |
13:55
🔗
|
arkiver |
restarted now, do you see anything? |
13:56
🔗
|
Atluxity |
I do not seem to be ip banned |
13:58
🔗
|
arkiver |
Atluxity: I recently set the new version of the scripts in the tracker |
13:59
🔗
|
arkiver |
Though, that version was released some time ago already. I wanted to give some time to update |
13:59
🔗
|
|
Ghost_of_ has joined #archiveteam |
14:01
🔗
|
arkiver |
Hi Ghost_of_ |
14:01
🔗
|
Ghost_of_ |
hi, arkiver ... 'sup? |
14:01
🔗
|
arkiver |
We had someone from yuku here yesterday |
14:01
🔗
|
arkiver |
our useragent was blocked due to the high traffic, but unblocked now |
14:02
🔗
|
arkiver |
Basically they have some problems with advertising income due to our non-human traffic |
14:02
🔗
|
arkiver |
But that'll be fixed |
14:03
🔗
|
Ghost_of_ |
so, they're basically OK with the archiving? |
14:03
🔗
|
arkiver |
yes |
14:03
🔗
|
Ghost_of_ |
cool |
14:03
🔗
|
arkiver |
Log starting from here http://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-11-25,Wed&sel=435#l431 |
14:03
🔗
|
arkiver |
person is yukundali |
14:03
🔗
|
Ghost_of_ |
"a bad script" :) |
14:03
🔗
|
arkiver |
let's talk further in #archiveteam-bs though |
14:19
🔗
|
|
atomotic has joined #archiveteam |
14:19
🔗
|
|
icedice has joined #archiveteam |
14:33
🔗
|
antomatic |
aw, man.. got a docstoc grab running through recursive URLs including /../../../../Local%20Settings/Temporary%20%Internet%20Files/varun_/varun?AMFI%20-%20Association%20of%20Mutual%20Fund%20in%20India1_files/mutualfundind_files//../../../../Local%20Settings/Temporary%20%Internet%20Files/varun_/varun?AMFI%20-%20Association%20of%20Mutual%20Fund%20in%20India1_files/mutualfundind_files//../../../../Loca |
14:33
🔗
|
antomatic |
l%20Settings/Temporary%20%Internet%20Files/varun_/varun?AMFI%20-%20Association%20of%20Mutual%20Fund%20in%20India1_files/mutualfundind_files//../../../../Local%20Settings/Temporary%20%Internet%20Files/varun_/varun?AMFI%20-%20Association%20of%20Mutual%20Fund%20in%20India1_files/mutualfundind_files/ |
14:36
🔗
|
|
vitzli_ has quit IRC (Quit: Leaving) |
14:36
🔗
|
arkiver |
is that with updated scripts? |
14:36
🔗
|
antomatic |
yes |
14:39
🔗
|
arkiver |
do you have a full log for me? |
14:39
🔗
|
arkiver |
scripts do mark the url as a loop and skip the URL, so something else keeps queueing more URLs |
14:39
🔗
|
arkiver |
does* |
14:39
🔗
|
arkiver |
and which item is it? |
14:39
🔗
|
antomatic |
not sure, I'm trying to stop the scripts at the moment |
14:39
🔗
|
antomatic |
I only see this stuff going past on the screen |
14:40
🔗
|
Atluxity |
arkiver: when did you update script? |
14:40
🔗
|
arkiver |
yesterday (for me) |
14:42
🔗
|
arkiver |
I returned the version to the previous now and the tracker has a lot more requests coming in. |
14:42
🔗
|
arkiver |
so the problem is not updated scripts |
14:42
🔗
|
|
vitzli has joined #archiveteam |
14:52
🔗
|
Atluxity |
to make debugging easier I thought it was a good time for me to restart my hose, kill some of the long looping urls |
14:53
🔗
|
Atluxity |
I had some urls counting 60 000 chars in length |
14:54
🔗
|
Atluxity |
it seemed to affect my memory |
14:55
🔗
|
Atluxity |
unfortunately I did not consider you would have use for those logs |
14:55
🔗
|
|
zerkalo has quit IRC (Remote host closed the connection) |
15:05
🔗
|
antomatic |
arkiver: here's one looper - http://pastebin.com/VNA4CsUH |
15:06
🔗
|
antomatic |
the other one is a URL so long that it's bigger than my scrollback buffer - http://pastebin.com/zZk9Yh2q |
15:08
🔗
|
antomatic |
how can I run-pipeline and log everything too? |
15:08
🔗
|
antomatic |
[I realise that's a silly question but I just haven't done it before] |
15:11
🔗
|
antomatic |
Once things start to loop it looks like they go on for a while though: http://pastebin.com/wuWPP8Yt |
15:12
🔗
|
luckcolor |
if you want to do a pipeline |
15:13
🔗
|
luckcolor |
cat hello.txt > txt.log |
15:13
🔗
|
luckcolor |
using tee |
15:13
🔗
|
luckcolor |
command should both print out and pipeline into a file |
15:14
🔗
|
luckcolor |
https://en.wikipedia.org/wiki/Tee_%28command%29 |
15:15
🔗
|
antomatic |
ah, interesting - thanks |
15:15
🔗
|
luckcolor |
np |
15:15
🔗
|
luckcolor |
so yeah it will probably be ./run-pipeline | tee log.txt |
15:16
🔗
|
luckcolor |
no wait |
15:16
🔗
|
luckcolor |
the example i made is probably wrong |
15:16
🔗
|
luckcolor |
:P |
15:16
🔗
|
luckcolor |
no it's fine |
15:16
🔗
|
luckcolor |
lint program.c | tee program.lint |
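What `tee` does can be sketched in a few lines of Python. It copies each line of input to the terminal and to a log file. This is an illustration only, not a replacement for the real command. A general shell caveat also applies here. A pipe carries only stdout, so if a program prints its progress to stderr you would need `2>&1` before the `| tee log.txt`. We haven't checked which stream run-pipeline writes to.

```python
import sys
from typing import IO, Iterable

def tee(lines: Iterable[str], *outputs: IO[str]) -> int:
    """Write every line to each output stream; return how many lines were copied."""
    count = 0
    for line in lines:
        for out in outputs:
            out.write(line)
            out.flush()  # so the terminal and the log both update as lines arrive
        count += 1
    return count

if __name__ == "__main__":
    # Usage: some-command 2>&1 | python tee.py log.txt
    with open(sys.argv[1], "w", encoding="utf-8") as log:
        tee(sys.stdin, sys.stdout, log)
```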
15:30
🔗
|
DFJustin |
https://twitter.com/wikileaks/status/669900131777576960 |
15:36
🔗
|
luckcolor |
well i'm not downloading that :P |
15:36
🔗
|
luckcolor |
too shitty internet speed |
15:41
🔗
|
SketchCow |
HAPPY THANKSGIVING FROM THE USANIANS TO ALL |
15:42
🔗
|
arkiver |
SketchCow: unfortunately chfoo has not yet responded to the googlecode rsync target problem |
15:43
🔗
|
arkiver |
can you please create an rsync target in /chfoo/ for googlecode? |
15:43
🔗
|
|
nightpool has joined #archiveteam |
15:44
🔗
|
|
scyther has joined #archiveteam |
15:49
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
15:49
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
15:51
🔗
|
SketchCow |
Done? |
15:51
🔗
|
SketchCow |
googlecode? |
15:52
🔗
|
icedice |
Can !yahoo be used at Blogspot and WordPress or will Google and Automattic detect that and block it? |
15:53
🔗
|
arkiver |
SketchCow: we are about to start the project, but we need an rsync target. |
15:53
🔗
|
arkiver |
We had a target on FOS, but it's not available anymore |
15:55
🔗
|
SketchCow |
I think I just added one |
15:55
🔗
|
arkiver |
Yes! looks like it's working! |
15:55
🔗
|
arkiver |
I'll start the google code project |
15:57
🔗
|
SketchCow |
Anything else before I go take care of my thanksgiving lady |
16:01
🔗
|
arkiver |
Nothing very important at the moment |
16:05
🔗
|
|
primus104 has joined #archiveteam |
16:10
🔗
|
SketchCow |
I'll be on and off today. |
16:12
🔗
|
|
nertzy has quit IRC (Read error: Connection reset by peer) |
16:17
🔗
|
SketchCow |
Heading down |
16:18
🔗
|
SketchCow |
One last thing: Of course all my automatic pushers had run out, so those are running, already took FOS's drive from 49% to 41% and dropping |
16:18
🔗
|
SketchCow |
All gamefront, of course |
16:18
🔗
|
SketchCow |
And Yuku continues uploading, and adrive went in |
16:19
🔗
|
|
Start has joined #archiveteam |
16:25
🔗
|
antomatic |
arkiver: can't tell you what caused it, but just seen "Lua runtime error: docstoc.lua:122: invalid capture index." scroll by |
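Some context on that error. In a Lua pattern, `%` followed by a digit refers to a capture, and the same applies in a `string.gsub` replacement string. So a URL containing something like `%20` that is passed unescaped as a pattern can fail with "invalid capture index". Nobody in the channel looked at line 122 of docstoc.lua, so this is only one plausible cause. The usual fix is to escape Lua's magic characters `^$()%.[]*+-?` before using a literal string as a pattern. Another option is to call `string.find(s, literal, 1, true)`, whose fourth argument turns off pattern matching. The Python sketch below shows only the escaping rule, and the function name is hypothetical.

```python
# Characters with special meaning in Lua patterns.
LUA_MAGIC = frozenset("^$()%.[]*+-?")

def lua_pattern_escape(literal: str) -> str:
    """Return a Lua pattern that matches `literal` exactly (hypothetical helper)."""
    return "".join("%" + ch if ch in LUA_MAGIC else ch for ch in literal)
```

For example, `lua_pattern_escape("Temporary%20Internet")` yields `Temporary%%20Internet`, which Lua reads as a literal `%` followed by `20`.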
16:26
🔗
|
arkiver |
something with a loop? |
16:26
🔗
|
antomatic |
i don't know - i don't think so |
16:27
🔗
|
|
zenguy_pc has quit IRC (Ping timeout: 252 seconds) |
16:30
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
16:31
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
16:34
🔗
|
|
dashcloud has joined #archiveteam |
16:39
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
16:39
🔗
|
|
nightpool has joined #archiveteam |
16:44
🔗
|
|
Jonimus has quit IRC (Read error: Operation timed out) |
16:46
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
16:47
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
16:59
🔗
|
luckcolor |
arkiver is still the googlecode project missing the target? |
17:03
🔗
|
Atluxity |
no |
17:03
🔗
|
Atluxity |
it has a target now |
17:03
🔗
|
Atluxity |
now we are waiting for a git merge |
17:03
🔗
|
arkiver |
luckcolor: yeah |
17:04
🔗
|
arkiver |
I mean no, it's not missing the target |
17:04
🔗
|
luckcolor |
ok |
17:04
🔗
|
luckcolor |
i'll do some items of docstoc in the meantime |
17:04
🔗
|
luckcolor |
also i had a random bug |
17:04
🔗
|
luckcolor |
when i loaded the warrior page it froze |
17:04
🔗
|
luckcolor |
and a script error occurred on the browser |
17:05
🔗
|
luckcolor |
i rebooted the warrior and it's fixed |
17:06
🔗
|
HCross |
luckcolor, did it have items with lots of URLS in it? |
17:08
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
17:08
🔗
|
luckcolor |
maybe |
17:08
🔗
|
luckcolor |
i couldn't check |
17:09
🔗
|
luckcolor |
the webpage didn't render |
17:09
🔗
|
HCross |
if it has lots it will do that |
17:09
🔗
|
luckcolor |
definitely the web interface has some bugs |
17:09
🔗
|
luckcolor |
like not applying the web login password |
17:10
🔗
|
|
remsen has joined #archiveteam |
17:13
🔗
|
|
remsen2 has quit IRC (Read error: Operation timed out) |
17:29
🔗
|
jleclanch |
SketchCow: hey idk if this is useful to you, or anyone in the chan, but i found some scraping scripts for gog/steam/metacritic reviews i made a year ago for a thing. https://github.com/jleclanche/scrape-scripts |
17:36
🔗
|
arkiver |
So I'm now indexing some FTP servers |
17:36
🔗
|
arkiver |
Basically we'll also check every now and then if the FTPs have new files and grab those too |
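Checking the indexed FTP servers for new files comes down to comparing two listings. This is a minimal sketch that assumes each index maps a path to a file size. That format is hypothetical, and a real index might also record modification times. A file whose size changed is treated as worth grabbing again, and files that disappeared are ignored.

```python
from typing import Dict, List

def files_to_grab(old_index: Dict[str, int], new_index: Dict[str, int]) -> List[str]:
    """Paths that are new since the last crawl, or whose size has changed."""
    return sorted(
        path for path, size in new_index.items()
        if old_index.get(path) != size
    )
```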
17:51
🔗
|
icedice |
Do aborted ArchiveBot jobs still get uploaded to Archive.org? |
17:52
🔗
|
icedice |
I have an !a archivation running that I think got everything I need and now it has started archiving the blogs of the people that commented on the page. |
17:53
🔗
|
icedice |
So I think it's time to pull the plug on that once |
17:53
🔗
|
icedice |
*one |
17:54
🔗
|
DFJustin |
yes |
17:54
🔗
|
DFJustin |
you can just use aggressive ignores though which is usually better |
17:54
🔗
|
icedice |
Do you have any guide for that? |
17:55
🔗
|
|
xk_id has joined #archiveteam |
17:55
🔗
|
icedice |
Because 20 000 pages and counting is a bit much for one blog post archival |
17:55
🔗
|
DFJustin |
oh yeah just abort that |
17:55
🔗
|
icedice |
clarification: ca 1200 / 20 000+ archived |
17:58
🔗
|
|
Start has joined #archiveteam |
18:12
🔗
|
|
DopefishJ has joined #archiveteam |
18:12
🔗
|
|
swebb sets mode: +o DopefishJ |
18:13
🔗
|
|
DFJustin has quit IRC (Ping timeout: 310 seconds) |
18:29
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
18:30
🔗
|
|
xk_id has joined #archiveteam |
18:34
🔗
|
|
DFJustin has joined #archiveteam |
18:34
🔗
|
|
swebb sets mode: +o DFJustin |
18:35
🔗
|
|
DopefishJ has quit IRC (Read error: Operation timed out) |
18:37
🔗
|
|
philpem has joined #archiveteam |
18:40
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
18:46
🔗
|
|
xk_id_ has joined #archiveteam |
18:46
🔗
|
|
xk_id has quit IRC (Read error: Connection reset by peer) |
18:48
🔗
|
|
Start has joined #archiveteam |
18:58
🔗
|
|
Ghost_of_ has quit IRC (Quit: Leaving) |
19:06
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
19:06
🔗
|
|
Start has joined #archiveteam |
19:14
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
19:16
🔗
|
|
bwn has joined #archiveteam |
19:16
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
19:25
🔗
|
|
xk_id_ has quit IRC (Remote host closed the connection) |
19:32
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
19:56
🔗
|
|
bwn has joined #archiveteam |
19:57
🔗
|
|
luckcolor has quit IRC (Quit: Leaving) |
19:58
🔗
|
icedice |
How do I add the no-offsite-links ignore pattern to a running archivation job? |
19:59
🔗
|
Atluxity |
define running archivation job |
19:59
🔗
|
icedice |
I am archiving a Photobucket album and currently it's archiving Pinterest images |
20:00
🔗
|
Atluxity |
how is this archivation done? |
20:01
🔗
|
icedice |
I used !a http://smg.photobucket.com/user/BlackjackGabbiani/library/Snatcher%20Leo/ on ArchiveBot |
20:01
🔗
|
Atluxity |
ah |
20:02
🔗
|
icedice |
I figured that I wouldn't be archiving 3300 items just for seven images |
20:03
🔗
|
Atluxity |
I have never seen such an igset |
20:03
🔗
|
Atluxity |
are you looking for !ao ? |
20:04
🔗
|
icedice |
I've been told that !ao wouldn't get the fullsize images |
20:05
🔗
|
icedice |
and that it'd just get the thumbnails |
20:05
🔗
|
icedice |
https://archivebot.readthedocs.org/en/latest/commands.html#ignore |
20:07
🔗
|
joepie91 |
I don't believe it is possible to ignore offsite links later on |
20:08
🔗
|
joepie91 |
(currently) |
20:08
🔗
|
joepie91 |
icedice: you can hack something together with a regex with a negative lookahead, if you're feeling adventurous, but that will also prevent static assets from being downloaded |
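The negative-lookahead idea can be sketched with Python's `re`. ArchiveBot ignore patterns are regular expressions, but check its documentation for the exact flavour before relying on this. The pattern below matches, and so would ignore, any URL whose host is not photobucket.com or one of its subdomains. As joepie91 warns, it would also ignore images and other page requisites served from other hosts, such as a CDN domain. The host name comes from icedice's job, and the rest is an assumption.

```python
import re

# Matches URLs whose host is NOT photobucket.com or a subdomain of it.
OFFSITE = re.compile(
    r"^https?://(?!(?:[^/:?#]+\.)?photobucket\.com(?::\d+)?(?:[/?#]|$))"
)

def is_ignored(url: str) -> bool:
    """True if the offsite pattern would ignore this URL."""
    return OFFSITE.search(url) is not None
```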
20:09
🔗
|
icedice |
I think I'll just abort it |
20:09
🔗
|
icedice |
I have the pages archived since long ago by now |
20:09
🔗
|
Atluxity |
icedice: --no-offsite-links needs to be a parameter to !a |
20:09
🔗
|
Atluxity |
it can not be added with igset, it seems |
20:09
🔗
|
icedice |
and I don't see any appeal in archiving random Pinterest images |
20:09
🔗
|
icedice |
ok |
20:10
🔗
|
joepie91 |
you'd be surprised :P |
20:10
🔗
|
joepie91 |
heh |
20:10
🔗
|
icedice |
If I was doing a complete siterip, sure |
20:11
🔗
|
icedice |
but just archiving a few thousand Pinterest images seems more like a bump in the road than a good archivation effort |
20:11
🔗
|
icedice |
I mean, if it was the target that would be another thing |
20:16
🔗
|
icedice |
Can the no-offsite-links pattern be added to the list of commands that can be used during archivation processes in a future update of ArchiveBot? |
20:19
🔗
|
icedice |
130 items instead of 3300 when using no-offsite-links on a small Photobucket album |
20:19
🔗
|
icedice |
Not bad |
20:20
🔗
|
icedice |
*edit: 255 (it hadn't counted all the items at that time, it seems) |
20:24
🔗
|
joepie91 |
icedice: best file a bug on the repo :) |
20:25
🔗
|
joepie91 |
icedice: https://github.com/ArchiveTeam/ArchiveBot |
20:25
🔗
|
icedice |
Ok, I'll do that |
20:25
🔗
|
|
xk_id has joined #archiveteam |
20:28
🔗
|
|
icedice has left Leaving |
20:32
🔗
|
|
xk_id has quit IRC (Read error: Operation timed out) |
20:50
🔗
|
|
Start has joined #archiveteam |
21:10
🔗
|
antomatic |
arkiver: docstoc doc number 18063834 goes into an /images/images/images/images/... loop, if that helps any? |
21:11
🔗
|
antomatic |
arkiver: and docis 6588886 has recursive &ref_url= s like http://embed.docstoc.com/handlers/downloadfilefromflash.ashx?docid=6588886&ref_url=http://www.docstoc.com/docs/6588886/../../../images/../../../skins-1.5/common/images/../../../images/../../../images/../../../skins-1.5/common/images/../../../images/../../../skins-1.5/common/images/../../../skins-1.5/common/images/../../../images/wikimedia-bu |
21:11
🔗
|
antomatic |
tton.png |
21:11
🔗
|
antomatic |
*docid |
21:11
🔗
|
arkiver |
it's always with the downloadfilefromflash urls |
21:12
🔗
|
antomatic |
mm, these last two certainly were |
21:15
🔗
|
antomatic |
then again, what creates those recursive URLs to serve as the referrer in the first place, I wonder. |
21:17
🔗
|
antomatic |
the bare referrer URL does seem to redirect to something valid-looking (although it's not there, so then redirects again to a 404) |
21:18
🔗
|
|
xk_id has joined #archiveteam |
21:18
🔗
|
antomatic |
but I wonder if there's something on the underlying pages giving out bad links which then get innocently followed and which then trigger the recursion |
21:19
🔗
|
|
atomotic has joined #archiveteam |
21:24
🔗
|
antomatic |
hm, wonder if the &ref_url element is even needed at all, come to think of it |
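If `ref_url` turns out not to be needed, a grab script could canonicalize the downloadfilefromflash.ashx URLs by dropping that parameter, which would also cut off the recursion. Nobody has verified this, and docstoc may require the parameter. This Python sketch uses `urllib.parse`, and the parameter name is the only docstoc-specific assumption. One caveat is that `urlencode` re-encodes the parameters it keeps, so unusual encodings in them may change.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def drop_query_param(url: str, name: str) -> str:
    """Return `url` with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))
```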
21:28
🔗
|
arkiver |
I'll block urls with a number of / i them |
21:28
🔗
|
arkiver |
in* |
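A stopgap that rejects URLs with too many '/' characters can be a one-line check. This sketch adds a length cap as well, since Atluxity saw URLs around 60,000 characters long. Both thresholds are hypothetical and would need tuning so that legitimate deep paths survive. The count covers the whole URL, including the query, which matters because the docstoc recursion lives inside `ref_url`.

```python
MAX_SLASHES = 20   # hypothetical threshold
MAX_LENGTH = 2000  # hypothetical threshold, in characters

def looks_runaway(url: str) -> bool:
    """Crude loop guard: reject URLs that are very long or very deeply nested."""
    return len(url) > MAX_LENGTH or url.count("/") > MAX_SLASHES
```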
21:29
🔗
|
|
cvb has joined #archiveteam |
21:38
🔗
|
|
Ungstein1 has quit IRC (Quit: Leaving.) |
21:42
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
21:49
🔗
|
|
Start has quit IRC (Read error: Operation timed out) |
21:49
🔗
|
|
nertzy has joined #archiveteam |
21:50
🔗
|
|
Start has joined #archiveteam |
21:50
🔗
|
|
nertzy has quit IRC (Client Quit) |
21:54
🔗
|
|
nertzy has joined #archiveteam |
21:56
🔗
|
arkiver |
SketchCow: I sent you a mail |
21:59
🔗
|
ersi |
I'll call you to make you aware of my notification on IRC where I notify you about the e-mail I sent you |
22:01
🔗
|
|
aaaaaaaaa has joined #archiveteam |
22:01
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
22:02
🔗
|
joepie91 |
ersi: I'll make sure to pass it along |
22:02
🔗
|
joepie91 |
;) |
22:07
🔗
|
|
BlueMaxim has joined #archiveteam |
22:13
🔗
|
HCross |
What is the docstock channel? |
22:17
🔗
|
|
scyther has quit IRC (Read error: Connection reset by peer) |
22:17
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
22:18
🔗
|
arkiver |
#docstop |
22:21
🔗
|
|
HCross has quit IRC (Read error: Operation timed out) |
22:26
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
22:28
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
22:29
🔗
|
|
schbirid has joined #archiveteam |
22:48
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
22:49
🔗
|
|
BlueMaxim has joined #archiveteam |
22:51
🔗
|
|
JSharp___ has quit IRC (Remote host closed the connection) |
22:51
🔗
|
|
zyphlar__ has quit IRC (Remote host closed the connection) |
23:03
🔗
|
|
HarryCros has joined #archiveteam |
23:05
🔗
|
arkiver |
antomatic: sorry, I was busy with google code |
23:05
🔗
|
arkiver |
I'll add an ignore pattern for the loops tomorrow morning |
23:05
🔗
|
* |
arkiver is afk for the night |
23:06
🔗
|
arkiver |
loops always suck in these kinds of projects. it's hard to find out in scripts if an url is a loop |
23:07
🔗
|
arkiver |
basically you can never be 100% sure an url is a loop without a human looking at it |
23:07
🔗
|
arkiver |
but we can be 90% sure with certain ignore patterns, so let's do that |
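One way to get that "90% sure" is to look for a '/'-separated segment, or a run of segments, that repeats back to back, as in /images/images/images/ or ../../../. This sketch of that heuristic checks the query as well as the path, because the docstoc `ref_url` loops sit in the query. The threshold of three repeats is an assumption. Like any such pattern, it can misfire on a site that legitimately repeats directory names.

```python
from urllib.parse import urlsplit

def has_repeating_segments(url: str, min_repeats: int = 3) -> bool:
    """True if some run of '/'-separated segments occurs min_repeats times in a row."""
    parts = urlsplit(url)
    # Include the query: docstoc's ref_url loops live there, not in the path.
    text = parts.path + ("?" + parts.query if parts.query else "")
    segs = [s for s in text.split("/") if s]
    n = len(segs)
    for size in range(1, n // min_repeats + 1):  # length of the repeated block
        for start in range(n - size * min_repeats + 1):
            block = segs[start:start + size]
            if all(segs[start + r * size:start + (r + 1) * size] == block
                   for r in range(1, min_repeats)):
                return True
    return False
```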
23:11
🔗
|
|
JSharp___ has joined #archiveteam |
23:19
🔗
|
|
zerkalo has joined #archiveteam |
23:25
🔗
|
|
zyphlar__ has joined #archiveteam |
23:27
🔗
|
|
maseck has quit IRC (Read error: Operation timed out) |
23:28
🔗
|
|
maseck has joined #archiveteam |
23:50
🔗
|
|
Start has joined #archiveteam |
23:58
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |