[00:37] *** BlueMax has joined #archiveteam-ot
[00:59] *** Mateon1 has quit IRC (Ping timeout: 265 seconds)
[00:59] *** Mateon1 has joined #archiveteam-ot
[02:34] *** exoire has quit IRC (Read error: Operation timed out)
[02:34] *** Hani111 has joined #archiveteam-ot
[02:37] *** Hani has quit IRC (Ping timeout: 252 seconds)
[02:37] *** Hani111 is now known as Hani
[02:42] *** Despatche has joined #archiveteam-ot
[03:08] *** wp494 has quit IRC (Ping timeout: 268 seconds)
[03:08] *** wp494 has joined #archiveteam-ot
[03:41] *** Despatche has quit IRC (Read error: Operation timed out)
[03:45] *** Despatche has joined #archiveteam-ot
[03:46] *** Despatche has quit IRC (Remote host closed the connection)
[03:47] *** Despatche has joined #archiveteam-ot
[04:21] *** Despatche has quit IRC (Read error: Operation timed out)
[04:36] *** Despatche has joined #archiveteam-ot
[04:38] *** Despatche has quit IRC (Client Quit)
[04:38] *** Despatche has joined #archiveteam-ot
[04:55] *** odemg has quit IRC (Ping timeout: 265 seconds)
[05:07] *** odemg has joined #archiveteam-ot
[05:58] *** bithippo has quit IRC (Textual IRC Client: www.textualapp.com)
[07:32] *** Despatche has quit IRC (Quit: Connection reset by deer)
[07:47] *** Stiletto has joined #archiveteam-ot
[07:47] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[07:49] *** Stiletto has joined #archiveteam-ot
[09:30] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[10:49] *** odemg has quit IRC (Ping timeout: 265 seconds)
[11:01] *** odemg has joined #archiveteam-ot
[11:25] ouch http://xor.meo.ws/83f02d67/d1b8/4c3b/84d8/cd0202d66cbb.png
[12:08] *** wp494 has quit IRC (Ping timeout: 252 seconds)
[12:09] *** wp494 has joined #archiveteam-ot
[12:54] *** VerfiedJ has joined #archiveteam-ot
[13:40] *** twoTBHetz has joined #archiveteam-ot
[13:41] Hey, I may write my own blog/CMS/forum engine. What makes a website great/fun to archive?
[13:51] *** benjinsmi has joined #archiveteam-ot
[13:55] *** benjins has quit IRC (Read error: Operation timed out)
[14:03] twoTBHetz: no JavaScript, cool & permanent URLs with a minimum of query parameters, no redundant pages
[14:04] Are cookies or in-URL auth preferred?
[14:19] cookies
[14:40] don't include JavaScript that breaks when browsed on another domain, e.g. web.archive.org
[14:41] we generally archive without JavaScript execution, but then Wayback goes ahead and serves the JavaScript, which can do bad things sometimes
[14:43] don't have per-comment/per-thread-post URLs, use #anchors for those
[15:15] *** vitzli has joined #archiveteam-ot
[15:15] id-based post redirects :)
[15:30] *** vitzli has quit IRC (Quit: Leaving)
[15:35] psi, what do you mean by that?
[15:36] are up-to-date sitemaps useful?
[15:36] If you have "normal" URLs like http://example.com/post/five-ways-to-eat-a-carrot also have a redirect with the post ID like http://example.com/post/1
[15:36] ah, so easy enumeration?
[15:37] Because that's super easy to enumerate - just assign a list of numbers to a warrior and have at it
[15:40] I thought I might make the website "self-archiving" by storing versions of changed content
[15:42] this would make archiving it a pain though
[16:24] *** benjins has joined #archiveteam-ot
[16:29] *** benjinsmi has quit IRC (Read error: Operation timed out)
[17:16] *** twoTBHetz has quit IRC (Quit: Leaving)
[17:34] *** systwi has quit IRC (Ping timeout: 600 seconds)
[18:23] *** schbirid has joined #archiveteam-ot
[18:48] *** exoire has joined #archiveteam-ot
[18:58] any idea what's wrong with rtwn?
[19:20] twoTBHetz (if you read logs), in addition to the others: Proper status codes; for example, all errors should be 4xx or 5xx and use the most explicit status code available, e.g. 429 if the crawler's exceeding some rate limit rather than generic 403. Everything that modifies server state should be a POST request (e.g. "like this post"). Sitemaps. An unrestrictive robots.txt (probably not really in scope
[19:20] of developing such an engine). Use canonical URLs everywhere; for example, if there are "next post"-type links, that should be a clean link to the next post, not some "?action=nextpost" crap.
[19:21] ivan_: The main argument against that, I think, is that links using only anchors break when posts are deleted or moved. Could be handled with JS probably, but ugh.
[20:24] *** m007a83_ has joined #archiveteam-ot
[20:28] *** m007a83 has quit IRC (Read error: Operation timed out)
[20:32] *** Despatche has joined #archiveteam-ot
[20:59] *** exoire has quit IRC (Read error: Operation timed out)
[21:12] *** wp494 has quit IRC (Read error: Operation timed out)
[21:13] *** wp494 has joined #archiveteam-ot
[21:17] *** BlueMax has joined #archiveteam-ot
[21:39] *** kode54 has quit IRC (Quit: ZNC 1.7.1 - https://znc.in)
[21:47] *** VerfiedJ has quit IRC (Ping timeout: 252 seconds)
[21:48] *** kode54 has joined #archiveteam-ot
[21:50] JAA: leave them in the count for the pagination :-)
[21:55] ivan_: "Help, this thread page doesn't show 25 posts!!!1!!"
[21:56] :-)
[21:56] I guess one solution would be to leave the posts in the thread and just replace the contents with a "this post has been deleted" message. I've seen that somewhere before; might've been UOL.
[21:57] However, some forum software also lets you move posts between threads, e.g. merge threads or move off-topic stuff to a separate thread. That might make it a bit trickier.
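Several of the suggestions above (ID-based redirects to a canonical URL, explicit status codes such as 429, and POST for anything that changes server state) can be sketched together. This is only an illustration under stated assumptions: `POSTS`, `route`, the `/post/<id>/like` path, and the `over_rate_limit` flag are hypothetical names, not part of any real engine or framework.

```python
from http import HTTPStatus

# Hypothetical in-memory store: numeric post ID -> canonical slug.
POSTS = {1: "five-ways-to-eat-a-carrot"}


def route(method, path, over_rate_limit=False):
    """Toy router returning (status, headers); not a real framework."""
    if over_rate_limit:
        # Most explicit code for throttling, rather than a generic 403.
        return HTTPStatus.TOO_MANY_REQUESTS, {"Retry-After": "60"}

    parts = path.strip("/").split("/")

    # /post/<id>/like modifies server state, so only POST is accepted.
    if len(parts) == 3 and parts[0] == "post" and parts[2] == "like":
        if not parts[1].isdecimal() or int(parts[1]) not in POSTS:
            return HTTPStatus.NOT_FOUND, {}
        if method != "POST":
            return HTTPStatus.METHOD_NOT_ALLOWED, {"Allow": "POST"}
        return HTTPStatus.NO_CONTENT, {}

    if len(parts) == 2 and parts[0] == "post":
        key = parts[1]
        if key.isdecimal():
            # /post/<id> redirects permanently to the canonical slug URL,
            # which makes the whole site enumerable by counting IDs.
            slug = POSTS.get(int(key))
            if slug is None:
                return HTTPStatus.NOT_FOUND, {}
            return HTTPStatus.MOVED_PERMANENTLY, {"Location": f"/post/{slug}"}
        if key in POSTS.values():
            return HTTPStatus.OK, {}

    # Unknown paths are a real 404, never a 200 "not found" page.
    return HTTPStatus.NOT_FOUND, {}
```

With this layout a crawler can request `/post/1`, `/post/2`, and so on, follow each redirect, and treat a 404 as "no such post" without parsing any HTML.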
[22:02] *** VerfiedJ has joined #archiveteam-ot
[22:09] *** Despatche has quit IRC (Remote host closed the connection)
[22:09] *** Despatche has joined #archiveteam-ot
[22:21] *** exoire has joined #archiveteam-ot
[22:22] *** schbirid has quit IRC (Remote host closed the connection)
[22:23] relevant: https://www.hezmatt.org/~mpalmer/blog/2018/12/12/falsehoods-programmers-believe-about-pagination.html
[22:25] :-)
[22:32] anarcat: There are some things missing there
[22:33] * A single item will not appear on multiple pages
[22:33] * A single item will not appear several times on the same page
[23:04] *** Mateon1 has quit IRC (Ping timeout: 600 seconds)
[23:08] *** Mateon1 has joined #archiveteam-ot
[23:09] *** VerfiedJ has quit IRC (Read error: Connection reset by peer)
[23:09] *** VerfiedJ has joined #archiveteam-ot
[23:17] *** Despatche has quit IRC (Read error: Connection reset by peer)
[23:20] *** Despatche has joined #archiveteam-ot
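On the crawler side, the two pagination falsehoods added at 22:33 suggest never assuming that pages are disjoint. A minimal sketch of merging paginated listings by item ID, assuming each item is a dict with a stable `id` field (the `collect_items` name and the data shape are hypothetical):

```python
def collect_items(pages):
    """Merge paginated listings, keeping the first copy of each item ID."""
    seen = set()
    merged = []
    for page in pages:
        for item in page:
            # An item may repeat on the same page or on a later page.
            if item["id"] in seen:
                continue
            seen.add(item["id"])
            merged.append(item)
    return merged


# Example input: item 2 appears twice on page one and again on page two.
pages = [
    [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 2, "text": "b"}],
    [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}],
]
unique = collect_items(pages)
```

Keying on the ID rather than on page position also tolerates items shifting between pages while the crawl is in progress.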