Time |
Nickname |
Message |
00:04
|
JesseW |
luckcolor: seems sensible to me |
00:07
|
|
dashcloud has joined #archiveteam |
00:08
|
SketchCow |
http://fos.textfiles.com/pipeline.html |
00:08
|
SketchCow |
As per Pipeline, we've begun moving nunij and area51 in (it says no mover script, that's not true) |
00:08
|
SketchCow |
It's a function that I'm not using |
00:19
|
JesseW |
hm, on http://fos.textfiles.com/ARCHIVETEAM/ I see various items just labeled as "archiveteam", like: archiveteam_20160902230555 which (according to the idx file) seems to consist of a bunch of torrents. Those are area51, I presume? |
00:19
|
|
hictooth has quit IRC (Ping timeout: 268 seconds) |
00:23
|
|
godane1 has quit IRC (Quit: Leaving.) |
00:23
|
JesseW |
ah archiveteam_20160902230555 has a title that identifies it as part of the "Torrent Time Capsule" |
00:24
|
JesseW |
others of the unlabeled (by identifier) ones are part of BayImg. One is the grab of thomas.congress.gov: archiveteam_201607050000 |
00:25
|
JesseW |
and another Friends Reunited: archiveteam_2016062210391011 |
00:25
|
JesseW |
which I think has a page on the wiki; I should add a link |
00:25
|
SketchCow |
Thanks, detective |
00:26
|
JesseW |
you knew I would |
00:26
|
JesseW |
(and there already is a link) |
00:27
|
SketchCow |
There's what you can do and what you should do |
00:27
|
JesseW |
eh, this one *seemed* to fit both categories. If not, I'm happy to know otherwise |
00:30
|
|
WinterFox has joined #archiveteam |
00:35
|
|
pfallenop has quit IRC (Read error: Operation timed out) |
00:43
|
arkiver |
Update on tumblr and flickr projects. I have now uploaded an original and a deduplicated WARC here https://archive.org/download/flickrestdeduphmsdfofjdsd |
00:43
|
arkiver |
I asked in #warrior for help to see if these are correct. |
00:43
|
arkiver |
I have also asked the wayback team at Internet Archive if they can have a look at these two WARCs. |
00:44
|
arkiver |
If they are confirmed to be good, this deduplication script will be used in the tumblr and flickr projects. |
00:44
|
arkiver |
xmc, PurpleSym ^ |
00:49
|
|
kristian_ has joined #archiveteam |
01:08
|
arkiver |
for anyone running googlecode |
01:08
|
arkiver |
do your items also only get 503 and then abort? |
01:16
|
|
maelstrom has joined #archiveteam |
01:19
|
arkiver |
Please let me know as soon as possible why your items are aborting with googlecode |
01:23
|
JesseW |
Mine are just ratelimited. |
01:26
|
arkiver |
hmm |
01:26
|
arkiver |
well let me know if you do get any please |
01:26
|
* |
arkiver is afk for the night |
01:26
|
JesseW |
will do |
01:26
|
arkiver |
thanks! |
01:27
|
arkiver |
if someone can confirm the 503's, I'll send a mail |
01:33
|
JesseW |
arkiver: Ah, I got a 503! |
01:34
|
|
Brah has joined #archiveteam |
01:34
|
JesseW |
but I lost the logs :-( |
01:34
|
|
Brah has quit IRC (Client Quit) |
01:43
|
JesseW |
arkiver: got the 503's: http://paste.nerds.io/sokikubejo.js |
01:43
|
JesseW |
please send the email |
01:53
|
|
VADemon has quit IRC (Quit: left4dead) |
02:08
|
|
ndiddy has quit IRC (Ping timeout: 632 seconds) |
02:19
|
|
kristian_ has quit IRC (Quit: Leaving) |
02:19
|
joepie91 |
arkiver: JesseW: is that not the bot block page? |
02:24
|
JesseW |
i think so, yes |
02:24
|
JesseW |
that's why arkiver is going to write an email |
02:28
|
|
kristian_ has joined #archiveteam |
02:48
|
|
kristian_ has quit IRC (Quit: Leaving) |
03:40
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
03:43
|
|
pfallenop has joined #archiveteam |
04:04
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:08
|
|
signius has quit IRC (Read error: Operation timed out) |
04:10
|
|
Sk1d has joined #archiveteam |
04:24
|
|
signius has joined #archiveteam |
04:28
|
|
Aranje has quit IRC (Ping timeout: 260 seconds) |
04:48
|
|
maelstrom has quit IRC (Quit: Leaving) |
05:08
|
|
godane has joined #archiveteam |
06:08
|
xmc |
arkiver: neat. is it better to shovel around 10x more data than we are going to wind up with, or to figure out how to not fetch so much in the first place? |
06:29
|
|
Honno has joined #archiveteam |
06:57
|
|
BlueMaxim has joined #archiveteam |
07:12
|
|
DFJustin has quit IRC (Remote host closed the connection) |
07:12
|
|
DFJustin has joined #archiveteam |
07:12
|
|
swebb sets mode: +o DFJustin |
07:40
|
PurpleSym |
arkiver: warcat says “Bad payload digest.” for revisit and warcinfo records in the deduplicated WARC. |
07:42
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
07:48
|
|
Simpbrain has quit IRC (Read error: Operation timed out) |
07:53
|
|
ravetcofx has quit IRC (Ping timeout: 501 seconds) |
08:03
|
|
metal_cam has joined #archiveteam |
08:30
|
|
vOYtEC has joined #archiveteam |
08:36
|
|
schbirid has joined #archiveteam |
08:39
|
|
Simpbrain has joined #archiveteam |
09:47
|
|
tuankiet has joined #archiveteam |
10:05
|
|
atomotic has joined #archiveteam |
10:17
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
10:20
|
|
kristian_ has joined #archiveteam |
10:59
|
|
REiN^ has joined #archiveteam |
11:03
|
|
bRick5772 has joined #archiveteam |
11:20
|
|
VADemon has joined #archiveteam |
11:30
|
|
bRick5772 has quit IRC (Quit: Leaving.) |
11:51
|
|
Morbus has quit IRC (http://www.disobey.com/) |
11:56
|
|
Morbus has joined #archiveteam |
11:56
|
|
kristian_ has quit IRC (Quit: Leaving) |
12:13
|
|
signius has quit IRC (Read error: Operation timed out) |
12:24
|
arkiver |
joepie91: yeah, but we're not crawling google code too fast, so I don't think it is caused by that |
12:24
|
arkiver |
JesseW: mail sent/ |
12:24
|
arkiver |
.* |
12:27
|
arkiver |
xmc: the HTTP headers returned by flickr on the images are not the for different URLs with the same pauload (same image on c1 and c2 for example). |
12:29
|
arkiver |
So if we generated the records for these other image URLs instead of crawling them, we'd have to fake or lose some of the headers |
12:29
|
arkiver |
What do you think? |
12:32
|
arkiver |
PurpleSym: I think this is caused by warcat not keeping revisit records in mind when checking the payload digest https://github.com/chfoo/warcat/blob/master/warcat/tool.py#L295-L300 and https://github.com/chfoo/warcat/blob/master/warcat/verify.py#L56-L67 |
12:32
|
arkiver |
I'm not 100% sure though |
12:32
|
arkiver |
The payload digest of a revisit record is the same as the payload digest of the record with the original data, which the revisit record points to |
12:33
|
PurpleSym |
You’re right, that’s a bug. |
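[Editor's sketch, not part of the log.] The point arkiver makes above can be illustrated with a small, self-contained example. It assumes a simplified, hypothetical `Record` class rather than warcat's real data model, and it assumes digests are written in the `sha1:<base32>` form. The idea is only that a revisit record carries no payload of its own, so a verifier that hashes the revisit's empty body will always report a mismatch; the declared digest has to be compared against the payload of the record it refers to.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


def payload_digest(payload: bytes) -> str:
    """Return a digest in the 'sha1:<base32>' form (assumed format)."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


@dataclass
class Record:
    # Hypothetical, simplified stand-in for a parsed WARC record.
    record_id: str
    warc_type: str                   # e.g. "response", "revisit", "warcinfo"
    payload: bytes                   # body bytes; empty for a revisit
    declared_digest: Optional[str]   # WARC-Payload-Digest value, if present
    refers_to: Optional[str] = None  # WARC-Refers-To value, if present


def check_payload_digest(record: Record, by_id: Dict[str, Record]) -> Optional[bool]:
    """Return True/False as a verdict, or None when nothing can be checked."""
    if record.declared_digest is None:
        return None  # no digest declared, so nothing to verify
    if record.warc_type == "revisit":
        # A revisit has no payload of its own: compare against the original.
        original = by_id.get(record.refers_to) if record.refers_to else None
        if original is None:
            return None  # original record not available in this file
        return payload_digest(original.payload) == record.declared_digest
    return payload_digest(record.payload) == record.declared_digest


# Usage with made-up data: one response and one revisit pointing at it.
body = b"fake image bytes"
original = Record("urn:uuid:1", "response", body, payload_digest(body))
revisit = Record("urn:uuid:2", "revisit", b"", payload_digest(body),
                 refers_to="urn:uuid:1")
info = Record("urn:uuid:3", "warcinfo", b"software: example", None)
records = {r.record_id: r for r in (original, revisit, info)}

for r in (original, revisit, info):
    print(r.warc_type, check_payload_digest(r, records))
```

Hashing the revisit's own (empty) payload, which is roughly what a naive checker would do, gives a digest that differs from the declared one. Returning `None` for records without a declared digest is a choice made for this sketch, not a description of how warcat treats warcinfo records.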
12:38
|
arkiver |
so |
12:38
|
arkiver |
anything on the bioware forums? |
12:41
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
12:45
|
arkiver |
I see it's in archivebot, nvm |
12:49
|
Medowar |
pipeline status page borked? http://fos.textfiles.com/pipeline.html |
12:53
|
|
Simpbrain has quit IRC (Read error: Connection reset by peer) |
13:15
|
arkiver |
hmm, let me retype <arkiver>xmc: the HTTP headers returned by flickr on the images are not the for different URLs with the same pauload (same image on c1 and c2 for example). |
13:16
|
arkiver |
xmc: the HTTP headers returned by flickr for the images are not the same for different URLs with the same payload (same image on c1 and c2 for example). |
13:37
|
|
phuzion has quit IRC (Read error: Operation timed out) |
13:40
|
|
phuzion has joined #archiveteam |
14:22
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
14:25
|
|
VADemon has quit IRC (Quit: left4dead) |
14:41
|
|
polm has quit IRC (Quit: leaving) |
14:48
|
|
signius has joined #archiveteam |
15:34
|
|
_hyperion has joined #archiveteam |
15:38
|
|
_hyperion is now known as arkiver2 |
15:39
|
|
arkiver2 has quit IRC (Quit: BitchX: try our Windows Me and Windows XP flavors too!) |
16:02
|
|
MMovie2 has joined #archiveteam |
16:03
|
|
MMovie has quit IRC (Read error: Operation timed out) |
16:21
|
|
ravetcofx has joined #archiveteam |
16:41
|
|
JesseW has joined #archiveteam |
16:44
|
|
VADemon has joined #archiveteam |
17:06
|
|
metalcamp has joined #archiveteam |
17:10
|
|
metal_cam has quit IRC (Read error: Operation timed out) |
17:11
|
|
metal_cam has joined #archiveteam |
17:16
|
|
metalcamp has quit IRC (Read error: Operation timed out) |
17:25
|
|
Infreq has quit IRC (Read error: Operation timed out) |
17:29
|
|
Infreq has joined #archiveteam |
18:37
|
|
metalcamp has joined #archiveteam |
18:42
|
|
metal_cam has quit IRC (Read error: Operation timed out) |
19:23
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
19:24
|
|
RichardG has joined #archiveteam |
19:25
|
|
metalcamp has quit IRC (Read error: Operation timed out) |
19:43
|
|
ndiddy has joined #archiveteam |
19:54
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
20:09
|
|
RichardG has quit IRC (Read error: Operation timed out) |
20:10
|
|
RichardG has joined #archiveteam |
20:21
|
|
daank has joined #archiveteam |
20:21
|
|
daank has quit IRC (Client Quit) |
20:34
|
|
metalcamp has joined #archiveteam |
20:48
|
|
Aranje has joined #archiveteam |
20:52
|
|
schbirid has quit IRC (Quit: Leaving) |
20:58
|
|
ravetcofx has quit IRC (Read error: Operation timed out) |
21:04
|
|
metalcamp has quit IRC (Read error: Operation timed out) |
21:21
|
|
vOYtEC has quit IRC (Ping timeout: 633 seconds) |
21:46
|
|
all_ has joined #archiveteam |
21:47
|
|
all_ has quit IRC (Client Quit) |
22:15
|
|
RichardG has quit IRC (Read error: Operation timed out) |
22:16
|
|
RichardG has joined #archiveteam |
22:19
|
|
WinterFox has joined #archiveteam |
22:24
|
|
VADemon has quit IRC (Quit: left4dead) |
22:45
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
23:12
|
|
JesseW has joined #archiveteam |
23:14
|
|
Honno has quit IRC (Read error: Operation timed out) |
23:33
|
|
arkiver2_ has joined #archiveteam |
23:33
|
|
arkiver2_ has quit IRC (Client Quit) |
23:51
|
xmc |
ah, huh |
23:51
|
xmc |
ok |
23:54
|
|
verizon has joined #archiveteam |
23:56
|
verizon |
hello =) |
23:56
|
xmc |
sorry, i don't like verizon |
23:56
|
verizon |
me neither |
23:57
|
verizon |
but who is the less worse |
23:57
|
verizon |
that is the question |
23:57
|
verizon |
so... any ideas |