Time |
Nickname |
Message |
00:02
π
|
|
SN4T14 has quit IRC (Quit: ZNC 1.6.3 - http://znc.in) |
00:03
π
|
|
SN4T14 has joined #archiveteam-bs |
00:11
π
|
|
nightpool has quit IRC (Read error: Operation timed out) |
00:13
π
|
|
nightpool has joined #archiveteam-bs |
01:01
π
|
|
TheLovina has quit IRC (Read error: Operation timed out) |
01:03
π
|
|
TheLovina has joined #archiveteam-bs |
01:16
π
|
hook54321 |
!a http://82.221.129.208/ --useragent firefox |
01:28
π
|
|
username1 has joined #archiveteam-bs |
01:31
π
|
|
schbirid2 has quit IRC (Read error: Operation timed out) |
01:51
π
|
|
pizzaiolo has quit IRC (Quit: pizzaiolo) |
02:41
π
|
|
Odd0002 has quit IRC (Remote host closed the connection) |
03:00
π
|
hook54321 |
So I just saw this: https://github.com/chfoo/wpull/issues/356 |
03:00
π
|
hook54321 |
Would it be possible to incentivize sites to not disallow ia_archiver in their robots.txt file by respecting delay specified in robots.txt? |
03:01
π
|
SketchCow |
We don't negotiate with terrorists |
03:01
π
|
hook54321 |
lol. |
03:02
π
|
Frogging |
:p |
03:06
π
|
hook54321 |
but like if we were going to do that as the issue suggests, i don't see why we would want to cooperate with people that disallow the wayback machine. |
03:07
π
|
hook54321 |
i think that it's stupid that some sites try to tell people to use a crawl delay of 10 seconds though |
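For reference, the crawl-delay idea discussed above can be sketched with Python's standard-library `urllib.robotparser`. This is only an illustration, not wpull's code: the robots.txt content, the `examplebot` user agent, and the `crawl_policy` helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks ia_archiver, asks everyone else to wait 10 s.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Crawl-delay: 10
"""


def crawl_policy(robots_txt, user_agent, url):
    """Return (allowed, crawl_delay) for this user agent and URL.

    crawl_delay is None when robots.txt gives no delay for the agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for agent in ("ia_archiver", "examplebot"):
        print(agent, crawl_policy(ROBOTS_TXT, agent, "http://example.com/page"))
```

A crawler that wanted to honour the delay would sleep for the returned number of seconds between requests to the same host.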
03:27
π
|
hook54321 |
Brendan Eich appears to be supporting this: https://github.com/EdOverflow/security-txt |
03:32
π
|
|
qw3rty119 has joined #archiveteam-bs |
03:38
π
|
|
qw3rty118 has quit IRC (Read error: Operation timed out) |
03:51
π
|
|
Stilett0 has joined #archiveteam-bs |
04:09
π
|
hook54321 |
Wiki is acting kinda funny |
04:24
π
|
hook54321 |
JAA: Daily Stormer is moving to the TOR Network |
04:26
π
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:33
π
|
|
Sk1d has joined #archiveteam-bs |
04:37
π
|
hook54321 |
Apparently Google froze their domain, so they can't move it now. |
04:46
π
|
|
robink has quit IRC (Read error: Connection reset by peer) |
04:51
π
|
|
robink has joined #archiveteam-bs |
04:55
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
05:01
π
|
|
dashcloud has joined #archiveteam-bs |
05:05
π
|
|
kimmer22 has joined #archiveteam-bs |
05:14
π
|
|
kimmer2 has quit IRC (Ping timeout: 633 seconds) |
05:20
π
|
|
Stilett0 is now known as Stiletto |
05:25
π
|
|
kimmer2 has joined #archiveteam-bs |
05:33
π
|
|
kimmer22 has quit IRC (Ping timeout: 633 seconds) |
05:53
π
|
zino |
hook54321: Something that might be more fruitful is checking what the support for HTTP error 429 is in wpull. I've seen logs where we get a lot of 429s followed by a 200 followed by a lot of 429s again. RFC6585. Either: |
05:53
π
|
zino |
1) wpull does not handle the Retry-After header |
05:53
π
|
zino |
2) The site is still not prepared to answer requests after timeout |
05:53
π
|
zino |
3) The site does not send a Retry-After header |
05:53
π
|
zino |
If it's 2 or 3, then there's not much we can do; if it's 1, we would probably save all sides trouble by implementing it, and minimize the chances of getting IP-banned. Then add a pipeline override in case there is reason to ignore requests from the server to back off. |
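A rough sketch of what option 1 could look like, using only the Python standard library. It is not wpull's implementation; the function names and the 30-second fallback are assumptions for illustration. Retry-After may be either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None):
    """Parse a Retry-After value into seconds to wait, or None if unusable."""
    if header_value is None:
        return None
    value = header_value.strip()
    if value.isdigit():
        # delta-seconds form, e.g. "120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(status, headers, fallback=30.0):
    """Return how long to pause after a response; 0 unless the status is 429.

    `headers` is assumed to be a dict with lowercase header names.
    `fallback` (hypothetical default) covers case 3: no usable Retry-After.
    """
    if status != 429:
        return 0.0
    delay = retry_after_seconds(headers.get("retry-after"))
    return fallback if delay is None else delay
```

A pipeline override, as suggested above, could simply skip the call to `backoff_delay` when the operator decides to ignore the server's request.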
05:53
π
|
|
HCross has quit IRC (Read error: Connection reset by peer) |
05:54
π
|
|
HCross has joined #archiveteam-bs |
05:55
π
|
|
robogoat has quit IRC (Read error: Operation timed out) |
05:56
π
|
|
robogoat has joined #archiveteam-bs |
06:19
π
|
|
kimmer22 has joined #archiveteam-bs |
06:19
π
|
|
godane has quit IRC (Quit: Leaving.) |
06:26
π
|
|
kimmer2 has quit IRC (Ping timeout: 633 seconds) |
06:49
π
|
|
j08nY has joined #archiveteam-bs |
06:59
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
07:14
π
|
|
BlueMaxim has joined #archiveteam-bs |
07:15
π
|
|
kimmer2 has joined #archiveteam-bs |
07:15
π
|
|
TheLovina has quit IRC (Ping timeout: 370 seconds) |
07:15
π
|
|
TheLovina has joined #archiveteam-bs |
07:20
π
|
|
kimmer22 has quit IRC (Ping timeout: 633 seconds) |
07:20
π
|
|
dashcloud has joined #archiveteam-bs |
07:28
π
|
|
Boppen has quit IRC (Ping timeout: 194 seconds) |
07:41
π
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
07:41
π
|
|
BlueMaxim has joined #archiveteam-bs |
07:48
π
|
|
Honno has joined #archiveteam-bs |
07:49
π
|
|
HCross has quit IRC (Remote host closed the connection) |
07:49
π
|
|
HCross has joined #archiveteam-bs |
08:32
π
|
|
j08nY has quit IRC (Read error: Operation timed out) |
08:34
π
|
|
kimmer22 has joined #archiveteam-bs |
08:38
π
|
|
kimmer2 has quit IRC (Ping timeout: 633 seconds) |
08:40
π
|
|
kimmer2 has joined #archiveteam-bs |
08:40
π
|
|
Boppen has joined #archiveteam-bs |
08:45
π
|
|
kimmer22 has quit IRC (Ping timeout: 633 seconds) |
08:45
π
|
|
kimmer22 has joined #archiveteam-bs |
08:50
π
|
|
kimmer2 has quit IRC (Ping timeout: 632 seconds) |
08:51
π
|
hook54321 |
JAA: Onion address for Daily Stormer: http://dstormer6em3i4km.onion/ |
08:51
π
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
09:25
π
|
|
kimmer2 has joined #archiveteam-bs |
09:30
π
|
|
kimmer22 has quit IRC (Ping timeout: 633 seconds) |
09:32
π
|
|
kimmer1 has joined #archiveteam-bs |
09:36
π
|
|
godane has joined #archiveteam-bs |
09:37
π
|
godane |
looks like IA is down again |
09:48
π
|
hook54321 |
yup |
09:49
π
|
hook54321 |
nothing on their twitter yet. |
10:19
π
|
|
Honno has quit IRC (Read error: Operation timed out) |
10:30
π
|
|
Mateon1 has quit IRC (Ping timeout: 268 seconds) |
10:30
π
|
|
Mateon1 has joined #archiveteam-bs |
10:48
π
|
|
j08nY has joined #archiveteam-bs |
10:56
π
|
|
ivan has quit IRC (Leaving) |
11:18
π
|
|
marvinw has joined #archiveteam-bs |
11:21
π
|
JAA |
Very interesting court decision: https://www.reuters.com/article/us-microsoft-linkedin-ruling-idUSKCN1AU2BV |
11:44
π
|
|
atluxity1 has joined #archiveteam-bs |
11:46
π
|
|
atluxity has quit IRC (Ping timeout: 506 seconds) |
11:50
π
|
JAA |
We should start archiving whois information. |
11:50
π
|
JAA |
And DNS records |
12:43
π
|
joepie91 |
holy shit |
12:43
π
|
joepie91 |
that is actually a Very Big Deal |
13:17
π
|
|
s2e has joined #archiveteam-bs |
13:27
π
|
s2e |
Is there guidance on how to best submit dozens of websites to the internet archive in a way that is respectful of their infrastructure? I work in the internet freedom sector focusing on educational content and many of the resources that get created disappear in months or a few years. I currently use a simple script to spider and submit new ones to the archive. I'd like to do this in a more automated fashion. |
13:27
π
|
s2e |
But, I want to make sure I am doing it as respectfully as possible. |
13:29
π
|
Sanqui |
to IA's infrastructure? |
13:29
π
|
|
j08nY has quit IRC (Read error: Operation timed out) |
13:29
π
|
Sanqui |
I mean, respectful of IA's infrastructure? |
13:29
π
|
Sanqui |
you probably want archivebot |
13:29
π
|
s2e |
Yeah, if possible. I've seen other efforts try to archive separately, but they are largely unavailable to others |
13:30
π
|
Sanqui |
join #archivebot, check out how it works, submit a website with !a, watch the dashboard, it'll get absorbed into wayback |
13:30
π
|
s2e |
awesome |
13:32
π
|
Frogging |
joepie91: eli5? |
13:33
π
|
s2e |
Since archivebot is a volunteer service, is the method it uses the best method for doing this without a drain on others' resources? Is it something I could run on my own to do the archiving and supply the WARC files in the same way? |
13:33
π
|
Sanqui |
Frogging: my understanding is - it is legal to scrape public personal information on websites for commercial purposes |
13:34
π
|
Sanqui |
s2e: you could provide a pipeline, but I'm not sure if we're accepting right now; or you can run something like grab-site yourself, but you'd have to find some avenue to get the warcs into wayback. |
13:35
π
|
Sanqui |
Frogging: not only is it legal, you cannot put measures in place against it |
13:35
π
|
Frogging |
I see. |
13:35
π
|
Sanqui |
IANAL |
13:35
π
|
s2e |
Sanqui: Thanks! I'll start with archivebot and bother IA about WARC inclusion. |
13:35
π
|
Frogging |
the applications they mentioned on the page don't instill confidence |
13:36
π
|
Frogging |
using "publicly available data and artificial intelligence to help companies identify potential customers" |
13:36
π
|
Frogging |
building "algorithms capable of predicting employee behaviors, such as when they might quit" |
13:37
π
|
omglolbah |
"If LinkedIn is going to allow profiles to be indexed by search engines to benefit their platform then why shouldn't the rest of the internet benefit from that as well?" |
13:40
π
|
|
Mateon1 has quit IRC (Remote host closed the connection) |
13:40
π
|
|
kimmer22 has joined #archiveteam-bs |
13:41
π
|
|
Mateon1 has joined #archiveteam-bs |
13:43
π
|
|
s2e has left WeeChat 1.6 |
13:47
π
|
|
kimmer2 has quit IRC (Ping timeout: 633 seconds) |
14:15
π
|
|
j08nY has joined #archiveteam-bs |
15:01
π
|
|
pizzaiolo has joined #archiveteam-bs |
16:04
π
|
|
wabu has quit IRC (Read error: Operation timed out) |
16:09
π
|
|
kimmer2 has joined #archiveteam-bs |
16:13
π
|
|
username1 is now known as schbirid |
16:14
π
|
|
wabu has joined #archiveteam-bs |
16:17
π
|
|
kimmer22 has quit IRC (Ping timeout: 633 seconds) |
17:07
π
|
|
pizzaiolo has quit IRC (pizzaiolo) |
17:08
π
|
xmc |
JAA, joepie91: i was talking with FalconK the other day, and he mentioned the idea of running a recursive resolver that archives all results, and having archivebot and the warrior use it as their default resolvers |
17:08
π
|
xmc |
i really like this idea |
17:09
π
|
xmc |
i'm not sure what the proper archival format for DNS would be |
17:09
π
|
xmc |
I suppose you could cram it into a warc |
17:12
π
|
schbirid |
i thought warc is http |
17:12
π
|
schbirid |
*think |
17:12
π
|
PurpleSym |
It is not limited to HTTP, there's a generic "resource" record. |
17:13
π
|
schbirid |
oh nice |
17:16
π
|
godane |
this looks like a torrent of the IA 911 videos: http://torrentproject.se/2d64409b6f179bc999159284156b3534711447a1/ |
17:16
π
|
PurpleSym |
Also, DNS perfectly fits into the request/response scheme WARC is using for HTTP. |
17:19
π
|
JAA |
That's a nice idea, apart from the fact that it introduces a single point of failure. If the resolver is down, *everything* crashes and burns. |
17:22
π
|
xmc |
yes, also that |
17:33
π
|
joepie91 |
xmc: schbirid: heritrix stores DNS records in WARCs |
17:33
π
|
joepie91 |
or well, DNS requests and responses |
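To make the "cram it into a warc" idea concrete, here is a minimal sketch that wraps one DNS answer in a WARC/1.0 `resource` record using only the standard library. It is not Heritrix's or archivebot's code; the `dns:` target URI, the `text/dns` content type, and the sample answer (a documentation-range address) are choices made for this illustration.

```python
import uuid
from datetime import datetime, timezone


def build_dns_warc_record(hostname, dns_text, when=None):
    """Serialize a DNS answer (zone-file style text) as one WARC record.

    `when` must be a timezone-aware datetime; it defaults to the current time.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    block = dns_text.encode("utf-8")
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", f"dns:{hostname}"),
        ("Content-Type", "text/dns"),
        ("Content-Length", str(len(block))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record is: version line, headers, blank line, block, two CRLFs.
    return head.encode("utf-8") + block + b"\r\n\r\n"


if __name__ == "__main__":
    # Made-up answer for illustration only.
    sample = "example.com.\t300\tIN\tA\t192.0.2.1\n"
    print(build_dns_warc_record("example.com", sample).decode("utf-8"))
```

An archiving resolver along the lines xmc describes would append one such record per lookup to a WARC file alongside the HTTP records.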
17:33
π
|
xmc |
hmmmmm |
17:36
π
|
|
kristian_ has joined #archiveteam-bs |
18:14
π
|
|
kristian_ has quit IRC (Quit: Leaving) |
18:23
π
|
godane |
so my birthday is tomorrow |
18:36
π
|
Aoede |
happy birthday godane (I would forget to say this tomorrow :p) |
18:53
π
|
|
fie_ has quit IRC (Ping timeout: 246 seconds) |
19:11
π
|
|
fie has joined #archiveteam-bs |
19:26
π
|
hook54321 |
godane: happy birthday |
19:44
π
|
|
kimmer2 has quit IRC (Ping timeout: 633 seconds) |
20:16
π
|
|
kimmer1 has quit IRC (Quit: Going offline, see ya! (www.adiirc.com)) |
20:56
π
|
hook54321 |
Anyone know if there's something like this for Firefox? https://github.com/kissarat/never-lose |
21:08
π
|
|
bwn has quit IRC (Ping timeout: 268 seconds) |
21:13
π
|
|
bwn has joined #archiveteam-bs |
21:56
π
|
|
Honno has joined #archiveteam-bs |
22:03
π
|
arkiver |
it's 00:03 here now, happy birthday godane :D |
22:16
π
|
|
DFJustin has quit IRC (Read error: Connection reset by peer) |
22:17
π
|
|
DFJustin has joined #archiveteam-bs |
22:18
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
22:18
π
|
|
dashcloud has joined #archiveteam-bs |
22:23
π
|
|
pikhq has quit IRC (Read error: Operation timed out) |
22:23
π
|
Frogging |
that repo's list of porn sites seems to have a disproportionate amount of gay porn |
22:24
π
|
Frogging |
and random tumblrs. interesting. I wonder where they got it from |
22:38
π
|
|
Igloo has quit IRC (Read error: Operation timed out) |
22:38
π
|
|
j08nY has quit IRC (Read error: Operation timed out) |
22:42
π
|
|
pikhq has joined #archiveteam-bs |
22:43
π
|
|
godane has quit IRC (Ping timeout: 250 seconds) |
22:43
π
|
|
Jonimus has quit IRC (Ping timeout: 268 seconds) |
22:45
π
|
|
j08nY has joined #archiveteam-bs |
22:47
π
|
|
godane has joined #archiveteam-bs |
22:47
π
|
|
Igloo has joined #archiveteam-bs |
22:56
π
|
* |
hook54321 shrugs |
23:08
π
|
|
qw3rty111 has joined #archiveteam-bs |
23:10
π
|
|
Jonimus has joined #archiveteam-bs |
23:10
π
|
|
swebb sets mode: +o Jonimus |
23:11
π
|
|
qw3rty112 has joined #archiveteam-bs |
23:11
π
|
|
qw3rty119 has quit IRC (Ping timeout: 600 seconds) |
23:18
π
|
|
qw3rty111 has quit IRC (Read error: Operation timed out) |
23:30
π
|
|
j08nY has quit IRC (Quit: Leaving) |