Time |
Nickname |
Message |
00:00
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
00:03
🔗
|
|
BlueMaxim has joined #archiveteam |
00:08
🔗
|
|
Kitaru has joined #archiveteam |
00:18
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
00:25
🔗
|
hook54321 |
What useragent does archivebot use? |
00:38
🔗
|
FalconK |
erm? |
00:38
🔗
|
FalconK |
why do you ask? |
00:38
🔗
|
FalconK |
each of the bots can use their own |
00:59
🔗
|
|
yipdw has quit IRC (Quit: No Ping reply in 180 seconds.) |
01:02
🔗
|
|
yipdw has joined #archiveteam |
01:03
🔗
|
|
JesseW has joined #archiveteam |
01:10
🔗
|
|
j08nY has quit IRC (Quit: Leaving) |
01:17
🔗
|
|
Aranje has quit IRC (Ping timeout: 260 seconds) |
01:17
🔗
|
hook54321 |
FalconK: I want to check what a site looks like with the UA before archiving it. |
01:18
🔗
|
FalconK |
:) |
01:18
🔗
|
FalconK |
the default is in the source code somewhere, over on github |
01:32
🔗
|
hook54321 |
FalconK: Default/1 ? |
01:33
🔗
|
|
fie has joined #archiveteam |
01:35
🔗
|
|
Aranje has joined #archiveteam |
01:37
🔗
|
|
Kitaru has joined #archiveteam |
01:50
🔗
|
|
Aranje has quit IRC (Ping timeout: 260 seconds) |
01:56
🔗
|
|
Aranje has joined #archiveteam |
02:29
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
02:32
🔗
|
SketchCow |
Downloaded: 41395 files, 748G in 4d 5h 8m 25s (2.10 MB/s) |
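The reported rate can be sanity-checked from the other figures in that summary line. A small sketch, assuming the `G` and `MB/s` values use binary (1024-based) units; that unit interpretation is an assumption and is not stated in the log.

```python
# Figures copied from the summary line above.
# Assumption: "748G" means GiB and "MB/s" means MiB/s (1024-based units).
files = 41395
size_gib = 748
elapsed_s = 4 * 86400 + 5 * 3600 + 8 * 60 + 25  # 4d 5h 8m 25s

rate_mib_s = size_gib * 1024 / elapsed_s   # average throughput over the whole run
avg_file_mib = size_gib * 1024 / files     # average size of one downloaded file

print(f"{elapsed_s} s elapsed, {rate_mib_s:.2f} MiB/s, {avg_file_mib:.1f} MiB per file")
```

Under that assumption the computed average agrees with the 2.10 MB/s shown in the summary.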
02:32
🔗
|
SketchCow |
WHEEEEEEEEEEEEEEEEEEEEEEEEEE |
02:34
🔗
|
Frogging |
such speed |
02:34
🔗
|
|
Frogging sets mode: +o yipdw |
02:48
🔗
|
|
philpem has quit IRC (Ping timeout: 260 seconds) |
02:49
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
03:12
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
03:15
🔗
|
|
BartoCH has joined #archiveteam |
03:23
🔗
|
|
RichardG has quit IRC (Ping timeout: 260 seconds) |
03:24
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
03:27
🔗
|
|
dashcloud has joined #archiveteam |
03:52
🔗
|
|
RichardG has joined #archiveteam |
04:05
🔗
|
|
JesseW has joined #archiveteam |
04:05
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:09
🔗
|
yipdw |
hook54321: https://github.com/ArchiveTeam/ArchiveBot/blob/master/pipeline/pipeline.py#L111-L114 |
04:10
🔗
|
yipdw |
the "and not Mozilla ..." bit is there to satisfy checks for /Mozilla/ etc |
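One way to preview a site with a given user agent before archiving it is to request a page with that header set. A minimal sketch using only the Python standard library; the `USER_AGENT` value is a placeholder and not ArchiveBot's real string (copy the actual one from the pipeline.py lines linked above), and `build_request` and `fetch_preview` are hypothetical helper names.

```python
import urllib.request

# Placeholder only: substitute the real string from the linked pipeline.py lines.
# It keeps an "and not Mozilla" part so sites that test for /Mozilla/ still match.
USER_AGENT = "ExampleBot/0.0 (placeholder) and not Mozilla/5.0 (placeholder)"


def build_request(url, user_agent=USER_AGENT):
    """Return a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_preview(url, user_agent=USER_AGENT, limit=2000):
    """Fetch the page; return the HTTP status and the first `limit` bytes of the body."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=30) as resp:
        return resp.status, resp.read(limit)
```

Calling `fetch_preview("http://example.com/")` with and without the custom string, then comparing the two bodies, should show whether the site serves different content to that user agent.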
04:13
🔗
|
|
Sk1d has joined #archiveteam |
04:28
🔗
|
|
VADemon has joined #archiveteam |
04:33
🔗
|
|
Stiletto has quit IRC () |
04:48
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
04:49
🔗
|
|
DiscantX has joined #archiveteam |
05:06
🔗
|
|
Kitaru has joined #archiveteam |
05:10
🔗
|
|
Kitaru has quit IRC (Client Quit) |
05:17
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
05:18
🔗
|
|
DiscantX has quit IRC (Read error: Operation timed out) |
05:20
🔗
|
|
dashcloud has joined #archiveteam |
05:32
🔗
|
|
Stiletto has joined #archiveteam |
05:33
🔗
|
|
robink has joined #archiveteam |
05:50
🔗
|
|
Deewiant_ has joined #archiveteam |
05:51
🔗
|
|
aschmitz_ has joined #archiveteam |
05:51
🔗
|
|
LordNigh2 has joined #archiveteam |
05:52
🔗
|
|
aschmitz has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
d_rebel has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
Lord_Nigh has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
wp494 has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
Gfy has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
Deewiant has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
thefinn93 has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
espes__ has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
xhdr has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
Fletcher_ has quit IRC (hub.se efnet.portlane.se) |
05:52
🔗
|
|
xhdr- has joined #archiveteam |
05:55
🔗
|
|
d_rebel has joined #archiveteam |
05:55
🔗
|
|
wp494 has joined #archiveteam |
05:55
🔗
|
|
thefinn93 has joined #archiveteam |
05:55
🔗
|
|
Fletcher_ has joined #archiveteam |
06:03
🔗
|
|
Gfy_ has joined #archiveteam |
06:07
🔗
|
|
LordNigh2 is now known as Lord_Nigh |
06:09
🔗
|
|
mutoso has quit IRC (Read error: Operation timed out) |
06:09
🔗
|
|
mutoso has joined #archiveteam |
06:15
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
06:16
🔗
|
|
Deewiant_ is now known as Deewiant |
06:17
🔗
|
|
espes__ has joined #archiveteam |
06:34
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
06:37
🔗
|
|
Start has joined #archiveteam |
07:03
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
07:13
🔗
|
|
bwn has quit IRC (Ping timeout: 244 seconds) |
07:19
🔗
|
|
bwn has joined #archiveteam |
07:27
🔗
|
|
Lord_Nigh has quit IRC (Ping timeout: 244 seconds) |
07:30
🔗
|
|
Lord_Nigh has joined #archiveteam |
07:55
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
07:55
🔗
|
|
RichardG has joined #archiveteam |
08:14
🔗
|
|
DiscantX has joined #archiveteam |
08:28
🔗
|
|
redlob has quit IRC (Quit: ZNC - http://znc.in) |
08:33
🔗
|
|
redlob has joined #archiveteam |
08:33
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
08:33
🔗
|
|
dashcloud has joined #archiveteam |
09:01
🔗
|
|
Wuked has joined #archiveteam |
09:02
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
09:02
🔗
|
|
RichardG has joined #archiveteam |
09:14
🔗
|
|
WinterFox has joined #archiveteam |
09:18
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
09:23
🔗
|
|
dashcloud has joined #archiveteam |
09:26
🔗
|
|
DiscantX has quit IRC (Read error: Operation timed out) |
09:43
🔗
|
|
Wuked has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…) |
09:58
🔗
|
|
Wuked has joined #archiveteam |
10:11
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
10:11
🔗
|
|
RichardG has joined #archiveteam |
10:31
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
10:33
🔗
|
Medowar |
since thomas is finished, can anyone set the default back to urlteam? |
10:57
🔗
|
|
zhongfu_ has joined #archiveteam |
10:58
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
10:58
🔗
|
|
zhongfu has quit IRC (Ping timeout: 260 seconds) |
10:58
🔗
|
|
RichardG has joined #archiveteam |
11:23
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
11:23
🔗
|
|
RichardG has joined #archiveteam |
11:26
🔗
|
|
zhongfu_ is now known as zhongfu |
11:32
🔗
|
Igloo |
Medowar: there are still items for Thomas? |
11:53
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
11:53
🔗
|
|
RichardG has joined #archiveteam |
12:13
🔗
|
Frogging |
Thomas is gone though |
12:31
🔗
|
|
j08nY has joined #archiveteam |
12:44
🔗
|
Igloo |
Ah has it finally shut? |
12:48
🔗
|
Frogging |
Yes. I forget the date but it was in the past week |
12:49
🔗
|
Igloo |
I was still getting results from it a day or so back |
12:49
🔗
|
Igloo |
Never mind, I think we got a good chunk |
13:08
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
13:15
🔗
|
|
dashcloud has joined #archiveteam |
13:16
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
13:16
🔗
|
|
RichardG has joined #archiveteam |
13:44
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
13:44
🔗
|
|
RichardG has joined #archiveteam |
13:47
🔗
|
|
hellow has joined #archiveteam |
13:47
🔗
|
|
hellow is now known as bayesianp |
13:49
🔗
|
bayesianp |
http://www.examiner.com/ is closing on July 10 |
13:49
🔗
|
bayesianp |
see: http://wikipediocracy.com/forum/viewtopic.php?f=21&t=7869 |
13:50
🔗
|
bayesianp |
Has anyone archived it yet? |
13:51
🔗
|
VADemon |
^ robots.txt is quite restrictive but it has a /sitemap.xml |
13:51
🔗
|
Igloo |
It's being crawled right now bayesianp |
13:55
🔗
|
bayesianp |
VADemon: I only see an HTML sitemap... where's the xml? |
13:58
🔗
|
VADemon |
Sorry, I should've copy-pasted the URL: http://www.examiner.com/sitemapindex.xml which leads to other pages containing the actual links |
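A sitemap index of this kind lists child sitemaps, and each child lists the page URLs. A rough sketch of walking that structure with the Python standard library, assuming the files follow the sitemaps.org schema; `child_sitemaps` and `page_urls` are hypothetical helper names, and the sample XML is made up rather than taken from examiner.com.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml):
    """Return the <loc> URLs listed in a <sitemapindex> document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]


def page_urls(sitemap_xml):
    """Return the <loc> URLs listed in a <urlset> document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


# Made-up sample data for illustration only.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemap1.xml</loc></sitemap>
  <sitemap><loc>http://www.example.com/sitemap2.xml</loc></sitemap>
</sitemapindex>"""

SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/article/one</loc></url>
</urlset>"""

print(child_sitemaps(SAMPLE_INDEX))
print(page_urls(SAMPLE_URLSET))
```

Each URL returned by `child_sitemaps` would then be fetched and passed to `page_urls` to build the seed list.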
14:01
🔗
|
bayesianp |
Igloo: are you using the sitemap.xml for the crawl? |
14:01
🔗
|
Igloo |
It's in our archivebot added by SketchCow |
14:02
🔗
|
Igloo |
I don't know what it was seeded with but I don't think so |
14:03
🔗
|
godane |
code to make web archive based on day: curl -L -s http://www.examiner.com/html_sitemap/content/2010/01/01 | grep '^<li><a href="' | sed 's|<li><a href="|http://www.examiner.com|g' | sed 's|".*||g' |
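The curl | grep | sed pipeline above keeps the lines that start with `<li><a href="`, prefixes the site root, and cuts everything from the closing quote onward. A Python sketch of the same extraction, assuming (as the one-liner does) that the hrefs are site-relative; `article_urls` and `day_url` are hypothetical names, and the sample HTML is invented for illustration.

```python
import re

BASE = "http://www.examiner.com"

# Same selection as grep '^<li><a href="': the line must start with the list-item link.
# The capture group stops at the next double quote, like sed 's|".*||'.
LINK_RE = re.compile(r'^<li><a href="([^"]*)"', re.MULTILINE)


def day_url(year, month, day, base=BASE):
    """Build the per-day HTML sitemap URL in the pattern used by the one-liner."""
    return f"{base}/html_sitemap/content/{year:04d}/{month:02d}/{day:02d}"


def article_urls(html, base=BASE):
    """Return absolute URLs for the list-item links on one day's sitemap page."""
    return [base + href for href in LINK_RE.findall(html)]


# Invented sample markup, not copied from the real site.
SAMPLE = """<ul>
<li><a href="/article/first-story">First story</a></li>
<li><a href="/article/second-story">Second story</a></li>
</ul>"""

print(day_url(2010, 1, 1))
print(article_urls(SAMPLE))
```

Looping `day_url` over a date range and feeding each downloaded page to `article_urls` would produce a URL list for the whole archive.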
14:06
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
14:19
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
14:28
🔗
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
14:33
🔗
|
|
dashcloud has joined #archiveteam |
14:44
🔗
|
|
bayesianp has quit IRC (Quit: -) |
14:55
🔗
|
|
Start has joined #archiveteam |
14:55
🔗
|
|
atomotic has joined #archiveteam |
15:02
🔗
|
HCross |
One thing: If anyone wants to run a newsbuddy grabber, we really really could do with a few more now |
15:02
🔗
|
HCross |
just let me know if you want to help, please |
15:02
🔗
|
Atluxity |
HCross: I'll get back to you when I've got some time |
15:03
🔗
|
HCross |
ok. thanks |
15:03
🔗
|
Atluxity |
maybe some time during the summer |
15:03
🔗
|
Igloo |
I can run one HCross |
15:03
🔗
|
Igloo |
Tell me what you need |
15:03
🔗
|
HCross |
Igloo, #newsgrabber |
15:04
🔗
|
|
BlueMaxim has joined #archiveteam |
15:07
🔗
|
|
Kitaru has joined #archiveteam |
15:10
🔗
|
|
zxtx has left Leaving |
15:29
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
15:29
🔗
|
|
RichardG has joined #archiveteam |
15:30
🔗
|
|
JesseW has joined #archiveteam |
15:35
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
15:38
🔗
|
|
Aranje has joined #archiveteam |
15:41
🔗
|
SketchCow |
============== |
15:41
🔗
|
SketchCow |
1. A bug in the archive uploader script meant our uploads weren't being derived/integrated. A fix is coming very shortly. |
15:42
🔗
|
|
metalcamp has joined #archiveteam |
15:42
🔗
|
SketchCow |
2. A bug with the ROBOTS.TXT being misread is fixed and stuff will be seen again |
15:42
🔗
|
SketchCow |
============== |
15:43
🔗
|
|
Kitaru_ has joined #archiveteam |
15:43
🔗
|
|
Kitaru has quit IRC (Ping timeout: 258 seconds) |
15:47
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
16:00
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
16:09
🔗
|
|
SilSte has quit IRC (Ping timeout: 194 seconds) |
16:16
🔗
|
|
Kitaru_ has quit IRC (Quit: This computer has gone to sleep) |
16:24
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
16:46
🔗
|
|
morbus_ has joined #archiveteam |
16:48
🔗
|
|
Morbus has quit IRC (Read error: Operation timed out) |
16:53
🔗
|
|
Kitaru has joined #archiveteam |
17:11
🔗
|
|
j08nY has quit IRC (Ping timeout: 633 seconds) |
17:19
🔗
|
swebb |
I'm having problems uploading files to internet archive using the ia tool - my uploads are being disconnected after about 300MB or so of an upload. |
17:20
🔗
|
swebb |
requests.exceptions.ConnectionError: ('Connection aborted.', error(32, 'Broken pipe')) |
17:20
🔗
|
swebb |
uploading WEB-20160629153930591-00003-32710~db~8443.warc.gz: [################################] 411/494 - 00:00:30 |
17:21
🔗
|
xmc |
is it just a RST or is there other stuff happening to the connection also? |
17:22
🔗
|
swebb |
It is reproducible - I've not had any files > 300MB succeed in the last 20-30 mins |
17:22
🔗
|
xmc |
hm |
17:22
🔗
|
xmc |
maybe you're hitting an angry s3 node |
17:23
🔗
|
swebb |
possibly |
17:23
🔗
|
swebb |
Turning on --debug doesn't really provide any interesting information |
17:23
🔗
|
xmc |
tcpdump? :) |
17:29
🔗
|
SketchCow |
Angry S3 Node is my spirit animal |
17:29
🔗
|
|
bauruine has quit IRC (Ping timeout: 260 seconds) |
17:31
🔗
|
swebb |
tcpdump just shows tons of RSTs from IA. |
17:31
🔗
|
swebb |
35.197587 207.241.224.50 -> 10.0.0.13 TCP 54 http > 57043 [RST] Seq=959 Win=0 Len=0 |
17:31
🔗
|
swebb |
35.197707 207.241.224.50 -> 10.0.0.13 TCP 54 http > 57043 [RST] Seq=959 Win=0 Len=0 |
17:31
🔗
|
swebb |
35.197767 207.241.224.50 -> 10.0.0.13 TCP 54 http > 57043 [RST] Seq=959 Win=0 Len=0 |
17:31
🔗
|
xmc |
huuuh |
17:32
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
17:32
🔗
|
swebb |
I've got a pretty fast connection - could I be overloading the S3 endpoint? |
17:32
🔗
|
xmc |
maybe its disk is full? |
17:33
🔗
|
xmc |
what's the dns name of the s3 endpoint again? |
17:33
🔗
|
swebb |
yea, perhaps. Something along those lines maybe. |
17:33
🔗
|
swebb |
s3.us.archive.org |
17:34
🔗
|
xmc |
s3.us.archive.org is an alias for s3-lb1.us.archive.org. |
17:34
🔗
|
xmc |
hm |
17:35
🔗
|
swebb |
I'll just wait a while and try again later. If it's a full disk or something that's temporary, it'll resolve itself. |
17:35
🔗
|
HCross |
swebb, watching newsbuddy and I'll let you know if he has any issues |
17:35
🔗
|
swebb |
ok |
17:35
🔗
|
swebb |
I'm only trying to upload like 3.5 GB, so I can wait. :) |
17:35
🔗
|
HCross |
he's pushed out over 1.2TB today fine |
17:38
🔗
|
|
philpem has joined #archiveteam |
17:39
🔗
|
|
Stilett0 has joined #archiveteam |
17:39
🔗
|
|
dashcloud has joined #archiveteam |
17:39
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
17:40
🔗
|
Igloo |
HCross: Newsbuddy is hardcore. Does IA have the capacity for that?! |
17:43
🔗
|
swebb |
Weird, but I used tc to throttle my connection to 1mbps and the transfer was killed after approx 30s. I then throttled to 2mbps and again, dropped after 30s. |
17:44
🔗
|
swebb |
it also seems to slow down pretty dramatically just before the disconnect. |
17:46
🔗
|
HCross |
Igloo, yes. Sometimes. It depends what part of the IA is on fire |
17:47
🔗
|
swebb |
My crawl of gawker media is 122G compressed so far. |
17:50
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
17:52
🔗
|
|
bauruine has joined #archiveteam |
17:54
🔗
|
|
Kitaru_ has joined #archiveteam |
17:56
🔗
|
|
Kitaru has quit IRC (Ping timeout: 258 seconds) |
18:19
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
18:19
🔗
|
|
RichardG has joined #archiveteam |
18:26
🔗
|
|
Chorca has joined #archiveteam |
18:43
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
18:43
🔗
|
|
RichardG has joined #archiveteam |
19:13
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
19:13
🔗
|
|
RichardG has joined #archiveteam |
19:14
🔗
|
|
DiscantX has joined #archiveteam |
19:19
🔗
|
joepie91 |
https://techcrunch.com/2016/07/07/vyclone-hits-the-deadpool/ |
19:40
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
19:40
🔗
|
|
RichardG has joined #archiveteam |
19:40
🔗
|
|
Gfy_ is now known as Gfy |
19:45
🔗
|
swebb |
uploads are working again. |
19:45
🔗
|
swebb |
(for me) |
19:46
🔗
|
Igloo |
Working for me, currently pumping ~100mbit at IA |
19:47
🔗
|
swebb |
Yea, me too. I'm not sure of the data rate though. I've got a gig fibre connection, so probably somewhere around 100mbit |
20:15
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
20:15
🔗
|
|
RichardG has joined #archiveteam |
20:49
🔗
|
|
DiscantX has quit IRC (Read error: Operation timed out) |
20:50
🔗
|
|
j08nY has joined #archiveteam |
20:57
🔗
|
|
Wuked has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…) |
21:09
🔗
|
|
Aranje has joined #archiveteam |
21:17
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
21:33
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
21:36
🔗
|
|
dashcloud has joined #archiveteam |
21:57
🔗
|
|
Swizzle has joined #archiveteam |
21:59
🔗
|
|
DoomTay has joined #archiveteam |
22:01
🔗
|
|
maseck has quit IRC (Read error: Operation timed out) |
22:21
🔗
|
|
Start has joined #archiveteam |
22:43
🔗
|
|
JordanJ2 has quit IRC (ZNC - http://znc.in) |
23:00
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
23:09
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
23:09
🔗
|
|
RichardG has joined #archiveteam |
23:36
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
23:36
🔗
|
|
RichardG has joined #archiveteam |
23:46
🔗
|
|
j08nY has quit IRC (Remote host closed the connection) |
23:52
🔗
|
|
BlueMaxim has joined #archiveteam |
23:54
🔗
|
|
maseck has joined #archiveteam |
23:55
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
23:59
🔗
|
|
Kitaru_ has quit IRC (Quit: This computer has gone to sleep) |