#archiveteam-ot 2019-09-12,Thu


Time Nickname Message
00:17 🔗 yawkat has quit IRC (Ping timeout: 496 seconds)
00:29 🔗 BlueMax has joined #archiveteam-ot
00:36 🔗 yawkat has joined #archiveteam-ot
00:50 🔗 DigiDigi has joined #archiveteam-ot
01:13 🔗 jrwr has quit IRC (Ping timeout: 264 seconds)
01:13 🔗 slyphic has quit IRC (Read error: Operation timed out)
01:13 🔗 erin has quit IRC (Read error: Operation timed out)
01:13 🔗 Frogging has quit IRC (Read error: Operation timed out)
01:13 🔗 nyany has quit IRC (Read error: Operation timed out)
01:13 🔗 astrid has quit IRC (Read error: Operation timed out)
01:13 🔗 Igloo has quit IRC (Read error: Operation timed out)
01:13 🔗 superkuh has quit IRC (Read error: Operation timed out)
01:13 🔗 nightpool has quit IRC (Write error: Broken pipe)
01:13 🔗 Frogging has joined #archiveteam-ot
01:13 🔗 nightpoo- has joined #archiveteam-ot
01:14 🔗 chfoo has quit IRC (Read error: Operation timed out)
01:15 🔗 jognsmith has quit IRC (Ping timeout: 264 seconds)
01:15 🔗 VADemon has joined #archiveteam-ot
01:15 🔗 arkiver has quit IRC (Read error: Operation timed out)
01:15 🔗 chfoo has joined #archiveteam-ot
01:15 🔗 terry1 has quit IRC (Read error: Operation timed out)
01:15 🔗 Fusl__ sets mode: +o chfoo
01:15 🔗 Fusl sets mode: +o chfoo
01:15 🔗 Fusl_ sets mode: +o chfoo
01:16 🔗 VADemon_ has quit IRC (Read error: Operation timed out)
01:17 🔗 arkiver has joined #archiveteam-ot
01:17 🔗 Fusl__ sets mode: +o arkiver
01:17 🔗 Fusl sets mode: +o arkiver
01:17 🔗 Fusl_ sets mode: +o arkiver
01:17 🔗 phirephly has quit IRC (Ping timeout: 360 seconds)
01:18 🔗 schbirid has quit IRC (Read error: Operation timed out)
01:18 🔗 MrRadar_ has quit IRC (Read error: Operation timed out)
01:18 🔗 chirlu has quit IRC (Read error: Operation timed out)
01:21 🔗 chirlu has joined #archiveteam-ot
01:23 🔗 phirephly has joined #archiveteam-ot
01:24 🔗 DigiDigi has quit IRC (Read error: Operation timed out)
01:28 🔗 superkuh has joined #archiveteam-ot
01:28 🔗 schbirid has joined #archiveteam-ot
01:53 🔗 terry1 has joined #archiveteam-ot
02:04 🔗 nyany has joined #archiveteam-ot
02:04 🔗 Igloo has joined #archiveteam-ot
02:04 🔗 jrwr has joined #archiveteam-ot
02:04 🔗 Fusl sets mode: +o jrwr
02:04 🔗 Fusl__ sets mode: +o jrwr
02:04 🔗 Fusl_ sets mode: +o jrwr
02:04 🔗 DigiDigi has joined #archiveteam-ot
02:04 🔗 slyphic has joined #archiveteam-ot
02:05 🔗 svchfoo3 sets mode: +o Igloo
02:05 🔗 svchfoo1 sets mode: +o Igloo
02:08 🔗 MrRadar has joined #archiveteam-ot
02:13 🔗 erin has joined #archiveteam-ot
02:21 🔗 astrid has joined #archiveteam-ot
02:21 🔗 Fusl sets mode: +o astrid
02:21 🔗 Fusl__ sets mode: +o astrid
02:21 🔗 Fusl_ sets mode: +o astrid
03:27 🔗 qw3rty has joined #archiveteam-ot
03:34 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
03:44 🔗 odemg has quit IRC (Read error: Operation timed out)
03:58 🔗 odemg has joined #archiveteam-ot
05:00 🔗 katocala has quit IRC (Read error: Connection reset by peer)
05:01 🔗 katocala has joined #archiveteam-ot
05:09 🔗 Raccoon has quit IRC (Remote host closed the connection)
05:22 🔗 DigiDigi has quit IRC (Remote host closed the connection)
05:34 🔗 katocala has quit IRC (Read error: Operation timed out)
06:14 🔗 kiska has quit IRC (Quit: Ping timeout (120 seconds))
06:14 🔗 kiska has joined #archiveteam-ot
06:15 🔗 Fusl__ sets mode: +o kiska
06:15 🔗 Fusl sets mode: +o kiska
06:15 🔗 Fusl_ sets mode: +o kiska
06:31 🔗 Raccoon has joined #archiveteam-ot
08:55 🔗 Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
09:12 🔗 yawkat has quit IRC (Ping timeout: 604 seconds)
09:35 🔗 yawkat has joined #archiveteam-ot
09:48 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
09:48 🔗 ShellyRol has joined #archiveteam-ot
10:03 🔗 Raccoon has quit IRC (Ping timeout: 258 seconds)
10:05 🔗 Raccoon has joined #archiveteam-ot
11:27 🔗 katocala has joined #archiveteam-ot
12:10 🔗 Dragnog2 has joined #archiveteam-ot
12:16 🔗 ShellyRol has quit IRC (Ping timeout: 745 seconds)
12:16 🔗 ShellyRol has joined #archiveteam-ot
12:55 🔗 BlueMax has quit IRC (Quit: Leaving)
13:03 🔗 DigiDigi has joined #archiveteam-ot
14:20 🔗 Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
15:01 🔗 DogsRNice has joined #archiveteam-ot
16:23 🔗 Hani111 has joined #archiveteam-ot
16:28 🔗 Laverne has quit IRC (Quit: ZNC 1.7.1+deb1+bionic1 - https://znc.in)
16:32 🔗 Laverne has joined #archiveteam-ot
16:34 🔗 Hani has quit IRC (Ping timeout: 745 seconds)
16:34 🔗 Hani111 is now known as Hani
17:24 🔗 Dragnog2 has joined #archiveteam-ot
18:11 🔗 godane has quit IRC (Read error: Operation timed out)
18:18 🔗 kiska1 has quit IRC (Read error: Operation timed out)
18:22 🔗 kiska1 has joined #archiveteam-ot
18:22 🔗 Fusl__ sets mode: +o kiska1
18:22 🔗 Fusl sets mode: +o kiska1
18:22 🔗 Fusl_ sets mode: +o kiska1
18:24 🔗 kiska1 has quit IRC (Read error: Operation timed out)
18:39 🔗 kiska1 has joined #archiveteam-ot
18:39 🔗 Fusl__ sets mode: +o kiska1
18:39 🔗 Fusl sets mode: +o kiska1
18:39 🔗 Fusl_ sets mode: +o kiska1
18:40 🔗 m007a83 has quit IRC (Read error: Connection reset by peer)
18:52 🔗 kiska has quit IRC (Remote host closed the connection)
18:52 🔗 kiska has joined #archiveteam-ot
18:52 🔗 Fusl__ sets mode: +o kiska
18:52 🔗 Flashfire has joined #archiveteam-ot
18:52 🔗 Fusl sets mode: +o kiska
18:52 🔗 Fusl_ sets mode: +o kiska
19:03 🔗 paul2520 has quit IRC (Read error: Operation timed out)
19:06 🔗 paul2520 has joined #archiveteam-ot
19:22 🔗 ShellyRol has quit IRC (Read error: Operation timed out)
19:35 🔗 ShellyRol has joined #archiveteam-ot
19:37 🔗 bluefoo has quit IRC (Read error: Operation timed out)
19:48 🔗 sep332 has joined #archiveteam-ot
20:47 🔗 BlueMax has joined #archiveteam-ot
20:52 🔗 tuluu What are the requirements to request a crawl using #archivebot?
20:57 🔗 Igloo Just go into the channel, and read the docs :)
21:02 🔗 tuluu Igloo: thanks
21:04 🔗 tuluu I already read the docs, but it doesn't say anything about requirements, e.g.: the site is in danger
21:05 🔗 Igloo Ah, Basically
21:05 🔗 Igloo In danger - Yes
21:05 🔗 Igloo Nothing in the Wayback Machine - yes
21:05 🔗 Igloo Going to be changed - yes
21:06 🔗 Igloo A picture of a cat, for the 10,000th time?
21:06 🔗 Igloo Not so much.
21:06 🔗 kpcyrd what if the cat is really cute
21:06 🔗 Igloo Still no.
21:06 🔗 Igloo If we get to 500 copies, then sure :p
21:07 🔗 tuluu ok, I see
21:08 🔗 bluefoo has joined #archiveteam-ot
21:12 🔗 tuluu My main plan is to archive https://bandliste.de/, because for many older and small local bands it's the only resource on the web. And many pages are not present in the Wayback Machine yet. I thought I could do it on my own with grab-site. But I was told that I can't include it on my own.
21:14 🔗 tuluu Not sure if it is too large for #archivebot. 18000 bands are listed.
21:14 🔗 markedL how long did it take to run when you did it
21:15 🔗 tuluu I haven't done it yet. I tested grab-site only with smaller sites till now.
21:16 🔗 JAA That size is fine for ArchiveBot.
21:16 🔗 JAA Depending on how the site works, it might need some ignores, but otherwise, it won't be an issue.
21:16 🔗 JAA If you check the AB dashboard, you'll notice that we have many jobs that have retrieved millions of URLs.
21:18 🔗 tuluu ok, then I will request it :)
21:20 🔗 tuluu JAA: thanks a lot
21:23 🔗 markedL you can watch your job on the dashboard
21:25 🔗 tuluu markedL: yes, I'm doing it already :)
21:41 🔗 Mateon1 has quit IRC (Ping timeout: 612 seconds)
21:45 🔗 Mateon1 has joined #archiveteam-ot
21:50 🔗 tuluu JAA: Are buttons being clicked on the website during a crawl? Because it seems that you can open a textbox to report dead content on bandliste.de. Should these specific URLs be added to the ignore list? E.g.: https://bandliste.de/DeadContent/band/18707
21:52 🔗 JAA tuluu: It depends strongly on how those "buttons" work. Are they links styled to look like buttons? Are they actual buttons, and if so, is the full URL in the form target or not? Is JavaScript involved? Etc.
21:52 🔗 JAA So there's no general reply to that.
21:53 🔗 JAA Where does that button/textbox appear?
21:54 🔗 markedL yeah, this site has some normal links to action pages
21:55 🔗 markedL https://bandliste.de/Bands/Astronuts/18702/edit.html
21:56 🔗 markedL maybe that's not quite a submit, but it makes me nervously close
21:56 🔗 JAA Also, that job will blow up due to the /extern/ redirects. That means we'll grab a bunch of stuff from the bands' websites, which I guess is actually a good thing in this case. But the job will take longer.
21:56 🔗 kiska It's got the "I am not a robot" check anyway
21:57 🔗 JAA Yeah, I saw the edit.html page, and that won't be problematic.
21:57 🔗 tuluu This one has no check: https://bandliste.de/DeadContent/band/18707
21:57 🔗 JAA Actually, it might be easier to parse than the HTML page, so I'm inclined to leave it in in case someone wants to rebuild the database at some point after the site dies.
21:58 🔗 JAA Yeah, that won't be a problem, but it's not necessary to grab those pages since they have no meaningful content.
21:58 🔗 JAA Where does the DeadContent link appear?
22:00 🔗 tuluu For example, if you go to a band like https://bandliste.de/Bands/Skatacombo/18706/ . Then there is a small skull symbol on the top right.
22:00 🔗 JAA Ah, sneaky.
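
The ignore idea discussed above can be sketched as a URL filter. This is only a minimal illustration in Python, assuming ignores are expressed as regular expressions matched against each URL; the pattern and the `should_skip` helper are hypothetical and are not the ignore actually applied to the job.

```python
import re

# Hypothetical ignore pattern (an assumption for illustration): matches the
# dead-content report pages mentioned in the log, e.g.
# https://bandliste.de/DeadContent/band/18707
DEAD_CONTENT_IGNORE = re.compile(r"^https?://bandliste\.de/DeadContent/")


def should_skip(url: str) -> bool:
    """Return True if the URL matches the ignore pattern."""
    return DEAD_CONTENT_IGNORE.search(url) is not None


# URLs taken from the conversation above.
urls = [
    "https://bandliste.de/DeadContent/band/18707",
    "https://bandliste.de/Bands/Skatacombo/18706/",
    "https://bandliste.de/Bands/Astronuts/18702/edit.html",
]

# Only the report page is dropped; the band page and edit.html are kept.
kept = [u for u in urls if not should_skip(u)]
```

Anchoring the pattern at the start of the URL keeps it from matching external band sites that happen to contain the same path segment.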
22:24 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
22:39 🔗 Sanky has joined #archiveteam-ot
22:39 🔗 Sanqui has quit IRC (Read error: Connection reset by peer)
22:43 🔗 godane has joined #archiveteam-ot
