Time | Nickname | Message
00:17 | | yawkat has quit IRC (Ping timeout: 496 seconds)
00:29 | | BlueMax has joined #archiveteam-ot
00:36 | | yawkat has joined #archiveteam-ot
00:50 | | DigiDigi has joined #archiveteam-ot
01:13 | | jrwr has quit IRC (Ping timeout: 264 seconds)
01:13 | | slyphic has quit IRC (Read error: Operation timed out)
01:13 | | erin has quit IRC (Read error: Operation timed out)
01:13 | | Frogging has quit IRC (Read error: Operation timed out)
01:13 | | nyany has quit IRC (Read error: Operation timed out)
01:13 | | astrid has quit IRC (Read error: Operation timed out)
01:13 | | Igloo has quit IRC (Read error: Operation timed out)
01:13 | | superkuh has quit IRC (Read error: Operation timed out)
01:13 | | nightpool has quit IRC (Write error: Broken pipe)
01:13 | | Frogging has joined #archiveteam-ot
01:13 | | nightpoo- has joined #archiveteam-ot
01:14 | | chfoo has quit IRC (Read error: Operation timed out)
01:15 | | jognsmith has quit IRC (Ping timeout: 264 seconds)
01:15 | | VADemon has joined #archiveteam-ot
01:15 | | arkiver has quit IRC (Read error: Operation timed out)
01:15 | | chfoo has joined #archiveteam-ot
01:15 | | terry1 has quit IRC (Read error: Operation timed out)
01:15 | | Fusl__ sets mode: +o chfoo
01:15 | | Fusl sets mode: +o chfoo
01:15 | | Fusl_ sets mode: +o chfoo
01:16 | | VADemon_ has quit IRC (Read error: Operation timed out)
01:17 | | arkiver has joined #archiveteam-ot
01:17 | | Fusl__ sets mode: +o arkiver
01:17 | | Fusl sets mode: +o arkiver
01:17 | | Fusl_ sets mode: +o arkiver
01:17 | | phirephly has quit IRC (Ping timeout: 360 seconds)
01:18 | | schbirid has quit IRC (Read error: Operation timed out)
01:18 | | MrRadar_ has quit IRC (Read error: Operation timed out)
01:18 | | chirlu has quit IRC (Read error: Operation timed out)
01:21 | | chirlu has joined #archiveteam-ot
01:23 | | phirephly has joined #archiveteam-ot
01:24 | | DigiDigi has quit IRC (Read error: Operation timed out)
01:28 | | superkuh has joined #archiveteam-ot
01:28 | | schbirid has joined #archiveteam-ot
01:53 | | terry1 has joined #archiveteam-ot
02:04 | | nyany has joined #archiveteam-ot
02:04 | | Igloo has joined #archiveteam-ot
02:04 | | jrwr has joined #archiveteam-ot
02:04 | | Fusl sets mode: +o jrwr
02:04 | | Fusl__ sets mode: +o jrwr
02:04 | | Fusl_ sets mode: +o jrwr
02:04 | | DigiDigi has joined #archiveteam-ot
02:04 | | slyphic has joined #archiveteam-ot
02:05 | | svchfoo3 sets mode: +o Igloo
02:05 | | svchfoo1 sets mode: +o Igloo
02:08 | | MrRadar has joined #archiveteam-ot
02:13 | | erin has joined #archiveteam-ot
02:21 | | astrid has joined #archiveteam-ot
02:21 | | Fusl sets mode: +o astrid
02:21 | | Fusl__ sets mode: +o astrid
02:21 | | Fusl_ sets mode: +o astrid
03:27 | | qw3rty has joined #archiveteam-ot
03:34 | | qw3rty2 has quit IRC (Ping timeout: 745 seconds)
03:44 | | odemg has quit IRC (Read error: Operation timed out)
03:58 | | odemg has joined #archiveteam-ot
05:00 | | katocala has quit IRC (Read error: Connection reset by peer)
05:01 | | katocala has joined #archiveteam-ot
05:09 | | Raccoon has quit IRC (Remote host closed the connection)
05:22 | | DigiDigi has quit IRC (Remote host closed the connection)
05:34 | | katocala has quit IRC (Read error: Operation timed out)
06:14 | | kiska has quit IRC (Quit: Ping timeout (120 seconds))
06:14 | | kiska has joined #archiveteam-ot
06:15 | | Fusl__ sets mode: +o kiska
06:15 | | Fusl sets mode: +o kiska
06:15 | | Fusl_ sets mode: +o kiska
06:31 | | Raccoon has joined #archiveteam-ot
08:55 | | Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
09:12 | | yawkat has quit IRC (Ping timeout: 604 seconds)
09:35 | | yawkat has joined #archiveteam-ot
09:48 | | ShellyRol has quit IRC (Read error: Connection reset by peer)
09:48 | | ShellyRol has joined #archiveteam-ot
10:03 | | Raccoon has quit IRC (Ping timeout: 258 seconds)
10:05 | | Raccoon has joined #archiveteam-ot
11:27 | | katocala has joined #archiveteam-ot
12:10 | | Dragnog2 has joined #archiveteam-ot
12:16 | | ShellyRol has quit IRC (Ping timeout: 745 seconds)
12:16 | | ShellyRol has joined #archiveteam-ot
12:55 | | BlueMax has quit IRC (Quit: Leaving)
13:03 | | DigiDigi has joined #archiveteam-ot
14:20 | | Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
15:01 | | DogsRNice has joined #archiveteam-ot
16:23 | | Hani111 has joined #archiveteam-ot
16:28 | | Laverne has quit IRC (Quit: ZNC 1.7.1+deb1+bionic1 - https://znc.in)
16:32 | | Laverne has joined #archiveteam-ot
16:34 | | Hani has quit IRC (Ping timeout: 745 seconds)
16:34 | | Hani111 is now known as Hani
17:24 | | Dragnog2 has joined #archiveteam-ot
18:11 | | godane has quit IRC (Read error: Operation timed out)
18:18 | | kiska1 has quit IRC (Read error: Operation timed out)
18:22 | | kiska1 has joined #archiveteam-ot
18:22 | | Fusl__ sets mode: +o kiska1
18:22 | | Fusl sets mode: +o kiska1
18:22 | | Fusl_ sets mode: +o kiska1
18:24 | | kiska1 has quit IRC (Read error: Operation timed out)
18:39 | | kiska1 has joined #archiveteam-ot
18:39 | | Fusl__ sets mode: +o kiska1
18:39 | | Fusl sets mode: +o kiska1
18:39 | | Fusl_ sets mode: +o kiska1
18:40 | | m007a83 has quit IRC (Read error: Connection reset by peer)
18:52 | | kiska has quit IRC (Remote host closed the connection)
18:52 | | kiska has joined #archiveteam-ot
18:52 | | Fusl__ sets mode: +o kiska
18:52 | | Flashfire has joined #archiveteam-ot
18:52 | | Fusl sets mode: +o kiska
18:52 | | Fusl_ sets mode: +o kiska
19:03 | | paul2520 has quit IRC (Read error: Operation timed out)
19:06 | | paul2520 has joined #archiveteam-ot
19:22 | | ShellyRol has quit IRC (Read error: Operation timed out)
19:35 | | ShellyRol has joined #archiveteam-ot
19:37 | | bluefoo has quit IRC (Read error: Operation timed out)
19:48 | | sep332 has joined #archiveteam-ot
20:47 | | BlueMax has joined #archiveteam-ot
20:52 | tuluu | What are the requirements to request a crawl using #archivebot?
20:57 | Igloo | Just go into the channel and read the docs :)
21:02 | tuluu | Igloo: thanks
21:04 | tuluu | I already read the docs, but they don't say anything about requirements, e.g. that the site is in danger.
21:05 | Igloo | Ah, basically:
21:05 | Igloo | In danger - yes
21:05 | Igloo | Nothing in the Wayback Machine - yes
21:05 | Igloo | Going to be changed - yes
21:06 | Igloo | A picture of a cat, for the 10,000th time?
21:06 | Igloo | Not so much.
21:06 | kpcyrd | what if the cat is really cute?
21:06 | Igloo | Still no.
21:06 | Igloo | If we get to 500 copies, then sure :p
21:07 | tuluu | ok, I see
21:08 | | bluefoo has joined #archiveteam-ot
21:12 | tuluu | My main plan is to archive https://bandliste.de/, because for many older and small local bands it's the only resource on the web, and many of its pages are not in the Wayback Machine yet. I thought I could do it on my own with grab-site, but I was told that I can't get it included on my own.
21:14 | tuluu | Not sure if it is too large for #archivebot. 18,000 bands are listed.
21:14 | markedL | how long did it take to run when you did it?
21:15 | tuluu | I haven't done it yet. So far I've only tested grab-site with smaller sites.
21:16 | JAA | That size is fine for ArchiveBot.
21:16 | JAA | Depending on how the site works, it might need some ignores, but otherwise, it won't be an issue.
21:16 | JAA | If you check the AB dashboard, you'll notice that we have many jobs that have retrieved millions of URLs.
21:18 | tuluu | ok, then I will request it :)
21:20 | tuluu | JAA: thanks a lot
21:23 | markedL | you can watch your job on the dashboard
21:25 | tuluu | markedL: yes, I'm already doing that :)
21:41 | | Mateon1 has quit IRC (Ping timeout: 612 seconds)
21:45 | | Mateon1 has joined #archiveteam-ot
21:50 | tuluu | JAA: Are buttons on the website clicked during a crawl? It seems that you can open a textbox to report dead content on bandliste.de. Should these specific URLs be added to the ignore list? E.g.: https://bandliste.de/DeadContent/band/18707
21:52 | JAA | tuluu: It depends strongly on how those "buttons" work. Are they links styled to look like buttons? Are they actual buttons, and if so, is the full URL in the form target or not? Is JavaScript involved? Etc.
21:52 | JAA | So there's no general reply to that.
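To make JAA's distinction concrete: a crawler that does not execute JavaScript can only follow URLs that actually appear in the HTML markup. The sketch below is a simplified model, not ArchiveBot's real (wpull-based) link extraction. It uses Python's standard `html.parser` to separate URLs visible in `<a href>` and `<form action>` attributes from a `<button>` whose only behavior is an `onclick` handler. The sample HTML, the `/report` action, and `openReportBox` are invented for illustration; whether a given crawler actually submits a form action (and with which method) depends on that crawler.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect URLs that are visible in static HTML (simplified model).

    JavaScript handlers such as onclick are recorded but never executed,
    mimicking a crawler that does not run scripts.
    """

    def __init__(self):
        super().__init__()
        self.links = []    # URLs from <a href="...">
        self.forms = []    # URLs from <form action="...">
        self.js_only = []  # onclick code a non-JS crawler cannot follow

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # list of (name, value) pairs -> dict
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form" and attrs.get("action"):
            self.forms.append(attrs["action"])
        elif "onclick" in attrs:
            self.js_only.append(attrs["onclick"])


# Hypothetical page: a link styled as a button, a form, and a JS-only button.
SAMPLE = """
<a class="btn" href="/DeadContent/band/18707">&#9760;</a>
<form action="/report" method="post"><input name="text"></form>
<button onclick="openReportBox(18707)">Report</button>
"""

collector = LinkCollector()
collector.feed(SAMPLE)
print("links:", collector.links)
print("forms:", collector.forms)
print("JS-only:", collector.js_only)
```

Only the first two kinds of "button" expose a URL at all; the third is invisible to a crawler that does not run the page's scripts.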
21:53 | JAA | Where does that button/textbox appear?
21:54 | markedL | yeah, this site has some normal links to action pages
21:55 | markedL | https://bandliste.de/Bands/Astronuts/18702/edit.html
21:56 | markedL | maybe that's not quite a submit, but it's close enough to make me nervous
21:56 | JAA | Also, that job will blow up due to the /extern/ redirects. That means we'll grab a bunch of stuff from the bands' websites, which I guess is actually a good thing in this case. But the job will take longer.
21:56 | kiska | It's got the "I am not a robot" check anyway
21:57 | JAA | Yeah, I saw the edit.html page, and that won't be problematic.
21:57 | tuluu | This one has no check: https://bandliste.de/DeadContent/band/18707
21:57 | JAA | Actually, it might be easier to parse than the HTML page, so I'm inclined to leave it in in case someone wants to rebuild the database at some point after the site dies.
21:58 | JAA | Yeah, that won't be a problem, but it's not necessary to grab those pages since they have no meaningful content.
21:58 | JAA | Where does the DeadContent link appear?
22:00 | tuluu | For example, if you go to a band page like https://bandliste.de/Bands/Skatacombo/18706/, there is a small skull symbol at the top right.
22:00 | JAA | Ah, sneaky.
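Because the DeadContent report pages carry no meaningful content, they are natural candidates for an ignore. ArchiveBot ignores are regular expressions matched against URLs; the pattern below is a hypothetical example, not necessarily the one used on this job, checked here with Python's `re` module. It skips URLs under `/DeadContent/` on bandliste.de while leaving the band pages and the `edit.html` pages (which JAA preferred to keep) alone.

```python
import re

# Hypothetical ignore pattern for bandliste.de "report dead content" pages,
# e.g. https://bandliste.de/DeadContent/band/18707 (URL taken from the log).
# Matches any http(s) URL on bandliste.de whose path starts with /DeadContent/.
DEAD_CONTENT = re.compile(r"^https?://(?:www\.)?bandliste\.de/DeadContent/")


def is_ignored(url: str) -> bool:
    """Return True if a crawl using this pattern would skip the URL."""
    return DEAD_CONTENT.search(url) is not None


urls = [
    "https://bandliste.de/DeadContent/band/18707",
    "https://bandliste.de/Bands/Skatacombo/18706/",
    "https://bandliste.de/Bands/Astronuts/18702/edit.html",
]
for url in urls:
    print(("skip " if is_ignored(url) else "fetch") + "  " + url)
```

Anchoring the pattern to the host keeps it from accidentally matching a `/DeadContent/` path on some band's own website reached through the /extern/ redirects.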
22:24 | | BlueMax has quit IRC (Read error: Connection reset by peer)
22:39 | | Sanky has joined #archiveteam-ot
22:39 | | Sanqui has quit IRC (Read error: Connection reset by peer)
22:43 | | godane has joined #archiveteam-ot