#archiveteam-bs 2020-07-20,Mon


Time Nickname Message
00:17 BlueMax has joined #archiveteam-bs
00:31 Docti has joined #archiveteam-bs
00:33 Docti Hello, here I am for further discussion
00:33 JAA Docti: Is there a list of the forums that will be deleted? They only listed the ones that will live on, if I understand that correctly.
00:35 Docti You are right, they only listed what will not be deleted
00:36 Docti Here is the list of all the categories. If they are not in the list of what will be kept, they will be deleted: https://forum.doctissimo.fr/doctissimo/liste_categorie.htm
00:37 Docti But since they might have changed the names of what will be kept (they might rename them or put them in new categories), perhaps it would be easier to save everything?
00:37 JAA Wow, 320 million messages in total (across all forums, not just the sexuality ones).
00:38 JAA I can try, but chances are their servers won't be able to handle that much in 4 days.
00:40 Docti Thank you
00:40 Docti So far I believe only the Sexuality forums are in danger, because they are not "socially acceptable"
00:40 Docti I believe the other categories about family, daily life, medicine, etc. are safe
00:40 JAA They're really going out of their way to make it hard to access the content. Some threads are not linked directly; their URLs are hidden in "cryptlinks" instead (which are just base64-encoded).
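Since the "cryptlinks" are just base64-encoded, a crawler can recover the hidden thread URLs by decoding them. A minimal Python sketch, assuming each cryptlink token is standard base64 of a UTF-8 URL, possibly with its `=` padding stripped; the log does not describe the exact token format or where it sits in the page markup, so `decode_cryptlink` is a hypothetical helper and the token below is made up for illustration:

```python
import base64


def decode_cryptlink(token: str) -> str:
    """Decode a base64 "cryptlink" token back into the URL it hides.

    Assumption: standard base64 alphabet, UTF-8 payload, padding optional.
    """
    token = token.strip()
    # Restore any stripped '=' padding so the length is a multiple of 4.
    token += "=" * (-len(token) % 4)
    return base64.b64decode(token).decode("utf-8")


# Made-up token: encode a known URL and drop its padding, as a site might.
url = "https://forum.doctissimo.fr/doctissimo/liste_categorie.htm"
token = base64.b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")
print(decode_cryptlink(token))
# prints: https://forum.doctissimo.fr/doctissimo/liste_categorie.htm
```

If the real tokens turned out to use the URL-safe alphabet (`-` and `_`), `base64.urlsafe_b64decode` would be the drop-in replacement for `base64.b64decode`.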
00:41 JAA Yeah, I will target only the sexuality forums.
00:42 Docti Perfect :) I have to admit I know very little about that area of computer science. I am sure you will do your best
00:43 JAA There are still 54 million posts just in these forums. This will be a challenge.
00:43 JAA (For their servers)
00:48 Docti This is much more than I thought! I hope the servers will be able to handle it
01:05 JAA Docti: I can't find "Troubles de l’érection" and "Ejaculation prématurée" on https://forum.doctissimo.fr/doctissimo/liste_categorie.htm. They're listed as remaining in the announcement.
01:12 Docti Indeed, I cannot find them either. Perhaps they have already been moved to their new categories?
01:13 Docti By the way, skip "Exhibition et voyeurs" or give it the lowest priority: lots of posts and topics of very low quality
01:14 JAA Well, it seems that the ones that were already moved are still linked. E.g. "Contraception" is linked to the Santé part of the forums, not in Sexualité anymore.
01:15 JAA Only some of them, apparently. http://forum.doctissimo.fr/doctissimo/erection/liste_sujet-1.htm exists and redirects to https://forum.doctissimo.fr/sante/Troubles-de-l-erection/liste_sujet-1.htm
01:16 JAA Yeah, the other one was moved as well: https://forum.doctissimo.fr/sante/Ejaculation-prematuree-ou-precoce/liste_sujet-1.htm
01:20 Docti Indeed, nice find, they have already been moved, but not the others. Just so you know: IST = Infections sexuellement transmissibles (sexually transmitted infections)
01:23 JAA Yes, some have been moved and are linking to the new location, some have been moved and are not linked anymore, and some have not been moved yet.
01:24 JAA Yep, figured that one out, thanks! :-)
01:26 mtntmnky has quit IRC (Remote host closed the connection)
01:27 mtntmnky has joined #archiveteam-bs
01:27 JAA There's also one which is in a different section but not listed in the announcement ("Techniques de séduction").
01:28 JAA It appears that it wasn't moved recently but was already in the psychology section before.
01:29 JAA Some of the ones that will stay were also already elsewhere.
01:31 Docti Great, they will not be deleted, so this is something you will not have to save
01:33 JAA Oh fantastic, they didn't even set up redirects for the former locations of the moved categories.
01:45 Docti That's nice :) If you don't mind, I have to leave; it is a bit too late for me. I believe you can contact me on my page on the archiveteam site in the meantime. Thanks a lot for your work!
01:46 JAA Yeah, I should leave as well, but I'll see if I can get it running before then.
01:46 JAA Good night!
01:47 Docti Good night
01:53 Docti has quit IRC (Ping timeout: 252 seconds)
01:59 Nikchemny has joined #archiveteam-bs
02:00 Nikchemny JAA: Hello, is there any news about archive.st?
02:00 Nikchemny Btw, does AT use the Alexa toolbar?
02:01 JAA Nikchemny: No, I've been busy archiving things that disappear in the next few days.
02:02 Nikchemny Which ones, for example?
02:04 JAA Bitcoin Forum, Kongregate forums, now Doctissimo forums.
02:04 Nikchemny Hm, that's interesting. Are they on AB?
02:05 JAA No
02:05 JAA That's why I've been busy and still am.
02:21 PovAddict bitcoin forum, really?
02:29 JAA Forum goes down, forum gets saved.
02:32 Nikchemny I think everyone should download https://www.alexa.com/toolbar to thank Brewster Kahle.
02:33 JAA No thanks, especially since it hasn't been owned by Brewster in ... 20 years?
02:34 Nikchemny Yes, I know, but he created it...
02:34 JAA So? I'd rather thank him for creating IA by donating there.
02:35 JAA Than install adware/spyware/whatever the hell that toolbar probably is.
02:36 Nikchemny Do you think that Amazon is watching Alexa toolbar users?
02:36 Nikchemny Btw, I noticed that fart.website is ranked 17,000th in Russia https://www.alexa.com/siteinfo/fart.website
02:36 JAA You bet they're using it to determine your interests and serve the corresponding Amazon ads.
02:37 JAA In any case, this is very off-topic. -> #archiveteam-ot
02:37 Nikchemny Ok
02:37 JAA I think my Doctissimo script is ready, just running some tests.
02:41 JAA Yep, seems fine. Let's set some servers on fire.
02:46 JAA There are 13857 pages listing 50 topics each across all forums I intend to grab now (everything in the Sexualité section). So that's 679k topics. Average post count per topic is about 80, so that's "only" around 1.4 million requests in total. Not as bad as expected.
02:47 JAA Er, typo, 693k topics
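JAA's estimate in the two messages above can be reproduced with a few lines of Python. The listing-page count, the 50 topics per listing page and the ~80-post average come from the log; the 40 posts per topic page is an assumption, picked because it reproduces the ~1.4 million figure. The average-based count also ignores that every topic needs at least one page request, so it slightly undercounts:

```python
# Figures from the log (Sexualité section of the Doctissimo forums).
LIST_PAGES = 13_857          # category listing pages
TOPICS_PER_LIST_PAGE = 50    # topics shown on each listing page
AVG_POSTS_PER_TOPIC = 80     # JAA's rough average

# Assumption, not stated in the log: posts displayed per topic page.
POSTS_PER_TOPIC_PAGE = 40

topics = LIST_PAGES * TOPICS_PER_LIST_PAGE         # 692,850, i.e. the corrected "693k"
posts = topics * AVG_POSTS_PER_TOPIC               # ~55.4 million, close to "54 million posts"
topic_page_requests = posts / POSTS_PER_TOPIC_PAGE
total_requests = LIST_PAGES + topic_page_requests  # listing pages plus topic pages

print(f"{topics:,} topics, {posts:,} posts, ~{total_requests / 1e6:.2f} million requests")
# prints: 692,850 topics, 55,428,000 posts, ~1.40 million requests
```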
02:47 Nikchemny JAA: Btw, what about Telegram? Does it get saved properly, or is some content missing (hmm, videos and audio can't be saved)?
02:49 JAA Nikchemny: Never looked into it in detail. Can you check some of the accounts we've archived with ArchiveBot in the WBM?
02:52 Nikchemny Ahem. You saved it as t.me/something? Hm, that can't look good. If the start link were t.me/s/something, then it would show the last 20 posts and I could tell the number of the last post. Hm, JAA, I'll look at the COVID channels, I mean at their pics
02:55 Nikchemny JAA: Does it save posts as t.me/s/something/number or t.me/something/number?
02:56 JAA Nikchemny: /s/channel/postid are the ones of interest.
02:56 JAA The other one is only an "open in Telegram" page that doesn't show anything.
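Per JAA, only the `/s/` form of a Telegram post link (`t.me/s/<channel>/<post_id>`) renders the post in HTML, while `t.me/<channel>/<post_id>` is just an "open in Telegram" stub. A small Python sketch of normalizing links before queuing them for archiving; `to_preview_url` is a hypothetical helper, it only accepts `t.me` links, and it drops any query string or fragment:

```python
from urllib.parse import urlsplit


def to_preview_url(url: str) -> str:
    """Rewrite a t.me link to its /s/ web-preview form (hypothetical helper).

    t.me/<channel>/<post_id>   -> https://t.me/s/<channel>/<post_id>
    t.me/s/<channel>/<post_id> -> unchanged (apart from forcing https)
    """
    parts = urlsplit(url)
    if parts.netloc != "t.me":
        raise ValueError(f"not a t.me link: {url}")
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] == "s":
        segments = segments[1:]  # already a preview link; "s" is re-added below
    return "https://t.me/s/" + "/".join(segments)


print(to_preview_url("http://t.me/MINSAPCuba/73"))
# prints: https://t.me/s/MINSAPCuba/73
```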
02:56 Nikchemny JAA: It looks like that, and it looks like crap: http://web.archive.org/web/20200630152730/https://t.me/MINSAPCuba/73
02:56 JAA That's not /s/.
02:56 Nikchemny In the AB collection, btw
02:56 JAA https://web.archive.org/web/20200630154215/https://t.me/s/MINSAPCuba/73
02:57 Nikchemny Yes, sorry
02:58 Nikchemny JAA: Btw, it shows the 10 previous and 10 next posts. So AB saves the same post 20 times
02:58 JAA Yep
02:59 JAA No way to avoid that.
02:59 JAA Looks like we're not saving these attachments(?): https://web.archive.org/web/20200630160216/https://t.me/s/MINSAPCuba/44
03:00 Nikchemny Yes, of course. Only Telegram users can do that
03:00 JAA Doctissimo archival started. Retrieving 1 GB per minute, so I might run out of disk space quickly.
03:01 JAA But it handles ~40 req/s just fine so far.
03:02 Nikchemny Looks like Telegram is not very friendly to unregistered people. Only pics and text. No files, no audio, no videos
03:03 Nikchemny JAA: attachments can only be saved in the app (or in the web version, but I've never used it)
03:04 JAA Nikchemny: Maybe there is a way to access them, but it's just hidden very well.
03:05 Nikchemny ¯\_(ツ)_/¯
03:06 JAA So disk space shouldn't be too much of an issue. Retrieving 1 GB per minute, but it compresses down to 120 MiB or so. :-)
03:09 JAA Just to be clear: I'm only fetching the category and topic pages, nothing else.
03:10 Nikchemny JAA: So, what does AT think about Telegram chats? Maybe just make a bot that could join important chats and then scroll them as far back as possible? I know it can't go into the WBM, and (as I saw once) there is a # in the chat's link, so the WBM can't show it
03:10 Nikchemny *join in the web version
03:12 Nikchemny There is https://t.me/lurkisdead , the chat of lurkmore.to
03:13 PovAddict could use the telegram API as an actual registered user, but I'm not sure whether Telegram servers would get angry and rate-limit (or ban), so I'd be reluctant to try with my regular account / phone number
03:14 JAA I bet there's something in the terms about it, and they'd ban quickly when discovered or reported.
03:14 PovAddict I don't know if bot accounts can read past group chat history
03:15 Nikchemny Hm, why? Do you think they can tell when a person is just scrolling a chat and when a bot scrolls it and saves it?
03:16 Nikchemny Nope, I don't mean Telegram bots. Just take a clean number, register with it and save as much of the chat as possible?
03:19 JAA Doctissimo performance improved over the past 10 minutes. Maybe they're using autoscaling. I'm now at 64 req/s, and response times have dropped by a third.
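The request rates JAA reports translate into a rough duration for the ~1.4 million requests estimated earlier, and "1 GB per minute" fetched versus "120 MiB or so" stored implies a compression ratio. A back-of-the-envelope Python sketch; it assumes a constant rate, which the log shows was not the case:

```python
TOTAL_REQUESTS = 1_400_000  # JAA's estimate for the Sexualité section


def eta_hours(requests: int, rate_per_s: float) -> float:
    """Hours needed to issue `requests` at a constant `rate_per_s`."""
    return requests / rate_per_s / 3600


for rate in (40, 64):  # req/s figures quoted in the log
    print(f"{rate} req/s -> {eta_hours(TOTAL_REQUESTS, rate):.1f} h")

# ~1 GB (decimal) fetched per minute vs. ~120 MiB (binary) written after compression.
compression_ratio = 1e9 / (120 * 2**20)
print(f"compression ratio ~{compression_ratio:.1f}x")
# prints:
# 40 req/s -> 9.7 h
# 64 req/s -> 6.1 h
# compression ratio ~7.9x
```

The two bounds bracket the roughly eight hours between the 03:00 start and JAA's "nearly done" message at 10:54.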
03:19 Nikchemny JAA PovAddict: Is this a crazy idea?
03:20 JAA No, but it will probably break quickly.
03:22 Nikchemny Well, for a first try, I think https://t.me/lurkisdead is good.
03:25 Nikchemny JAA: Btw, anyone can download all of their chats if they use the desktop app
03:26 Nikchemny Maybe with attachments
03:47 Raccoon has joined #archiveteam-bs
03:58 qw3rty_ has quit IRC (Ping timeout: 622 seconds)
04:13 Nikchemny .
04:21 Nikchemny has quit IRC (https://mibbit.com Online IRC Client)
05:10 wyatt8740 has quit IRC (Ping timeout: 260 seconds)
05:12 wyatt8740 has joined #archiveteam-bs
06:15 britmob has quit IRC (Read error: Connection reset by peer)
06:30 qw3rty has joined #archiveteam-bs
06:41 britmob has joined #archiveteam-bs
07:32 jshoard has joined #archiveteam-bs
07:33 PovAddict has quit IRC (Read error: Operation timed out)
09:04 HP_Archiv has joined #archiveteam-bs
10:11 lennier1 has quit IRC (Read error: Connection reset by peer)
10:13 lennier1 has joined #archiveteam-bs
10:54 JAA Doctissimo is nearly done, just a few huge topics remaining.
10:54 JAA The Sexualité section of Doctissimo*
10:56 VoynichCr https://github.blog/2020-07-16-github-archive-program-the-journey-of-the-worlds-open-source-code-to-the-arctic/
10:57 VoynichCr "First, their well-known Wayback Machine is accessing and archiving raw GitHub data as WARCs, or Web ARChive files. As of this writing they have archived some 55TB of data. Second, they have the goal of making entire archived GitHub repositories available via “git clone,” while also keeping repo comments, issues, and other metadata easily accessible on the web. This second initiative is well underway and initial archiving is expected
11:16 JAA *record scratch* Yep, that's us, and you probably wonder how we got into this situation.
11:16 JAA #gitgud on hackint
15:04 BlueMax has quit IRC (Read error: Connection reset by peer)
15:10 fredgido_ has quit IRC (Read error: Connection reset by peer)
15:19 Arcorann has quit IRC (Read error: Connection reset by peer)
15:31 systwi_ has joined #archiveteam-bs
15:36 systwi has quit IRC (Read error: Operation timed out)
16:00 kiska has quit IRC (Remote host closed the connection)
16:01 kiska has joined #archiveteam-bs
16:05 britmob has quit IRC (Read error: Operation timed out)
16:43 schbirid has joined #archiveteam-bs
16:45 britmob has joined #archiveteam-bs
17:04 VerifiedJ has joined #archiveteam-bs
17:17 Nikchemny has joined #archiveteam-bs
17:18 Nikchemny JAA: There is https://docplayer.ru/ and here is an example document: https://docplayer.ru/34254329-A-a-pleshakov-s-a-pleshakov-enciklopediya-puteshestviy-strany-mira-kniga-dlya-uchashchihsya-nachalnyh-klassov.html
17:18 Nikchemny Can AB save PDFs or not?
17:21 Nikchemny Ah, here is the link: https://docplayer.ru/storage/54/34254329/1595269256/uooJxGU_y6Zjzfubsxfreg/34254329.pdf . It can only be reached through a Google captcha
17:32 Nikchemny JAA: Btw, I made https://archive.st/archive/2020/7/www.mk.ru/wae8/ , a copy of https://www.mk.ru/politics/2020/07/20/kadyrov-otvetil-ssha-na-sankcii-fotografiey-s-avtomatami.html . The image ( https://archive.st/archive/2020/7/www.mk.ru/wae8/July202020508pm-58i1956pyt8zflwu0haadkhn4r7a9ecm.jpg ) looks similar, but the archived copy does not
17:32 Nikchemny https://archive.st/archive/2020/7/www.mk.ru/wae8/www.mk.ru/politics/2020/07/20/kadyrov-otvetil-ssha-na-sankcii-fotografiey-s-avtomatami.html
17:32 PovAddict has joined #archiveteam-bs
17:33 DogsRNice has joined #archiveteam-bs
17:33 Nikchemny PovAddict: Which Telegram chat are you going to save?
17:48 PovAddict I have no time for any of this
18:05 prq has joined #archiveteam-bs
18:23 Nikchemny has quit IRC (Quit: https://mibbit.com Online IRC Client)
18:30 icedice has joined #archiveteam-bs
18:39 SketchCow -- archive team slogan
19:54 lennier2 has joined #archiveteam-bs
19:55 lennier1 has quit IRC (Ping timeout: 260 seconds)
19:55 lennier2 is now known as lennier1
20:00 Nikchemny has joined #archiveteam-bs
20:02 Nikchemny katocala: Please add the links from the articles in this category https://ru.wikipedia.org/wiki/Категория:Организации_в_России,_самостоятельно_присуждающие_учёные_степени (Organizations in Russia that award academic degrees independently) to https://www.archiveteam.org/index.php?title=ArchiveBot/Educational_institutions/list#Russia
20:03 Nikchemny And this one: https://ru.wikipedia.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%8F:%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D0%B5_%D0%B8%D0%BD%D1%81%D1%82%D0%B8%D1%82%D1%83%D1%82%D1%8B_%D0%B8%D1%81%D0%BA%D1%83%D1%81%D1%81%D1%82%D0%B2%D0%B0_%D0%B8_%D0%BA%D1%83%D0%BB%D1%8C%D1%82%D1%83%D1%80%D1%8B (Russian institutes of art and culture)
20:11 schbirid has quit IRC (Remote host closed the connection)
20:15 Docti has joined #archiveteam-bs
20:20 Docti has quit IRC (Ping timeout: 252 seconds)
20:51 HP_Archiv has quit IRC (Quit: Leaving)
20:52 paul2520 I saw this petition being circulated, with someone who has inside connections suggesting there have been layoffs and MNopedia.org is at risk. It's a fabulous, mostly Creative Commons-licensed resource used on Wikipedia (some articles are heavily based on MNopedia articles).
20:52 paul2520 I know at least some of the content is in the Wayback Machine... but can we get a crawl?
20:53 paul2520 I'd be happy to run one myself (my bandwidth can support it), but it doesn't seem to be MediaWiki
20:53 paul2520 oh sorry, petition link: https://docs.google.com/forms/d/e/1FAIpQLSdeqbOJ9-tmdrwXXfvbgR5BZASPh5jqZVkuyj5S-ox0VADQCQ/viewform
20:54 paul2520 ...in the meantime, I'll run a wget with mirror options today or overnight.
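For a site that is not MediaWiki, a plain recursive Wget crawl is one option. The sketch below only builds the command line in Python; the flag selection is an assumption, not the options paul2520 actually used, though each flag is a standard GNU Wget option (`--warc-file` needs Wget 1.14 or later). `wget_mirror_command` is a hypothetical helper:

```python
import shlex


def wget_mirror_command(url: str, warc_name: str, wait_seconds: int = 1) -> list[str]:
    """Build a GNU Wget command that mirrors a site and also writes a WARC."""
    return [
        "wget",
        "--mirror",                   # recursion, infinite depth, timestamping
        "--page-requisites",          # also fetch images, CSS and JS the pages need
        "--adjust-extension",         # save HTML/CSS files with matching extensions
        "--no-parent",                # never ascend above the starting directory
        f"--wait={wait_seconds}",     # pause between requests to go easy on the server
        f"--warc-file={warc_name}",   # record raw responses to <warc_name>.warc.gz
        url,
    ]


cmd = wget_mirror_command("https://www.mnopedia.org/", "mnopedia")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Writing a WARC alongside the local mirror keeps the raw HTTP responses, not just a browsable copy of the files.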
21:14 Dallas has joined #archiveteam-bs
21:16 Nikchemny paul2520: Why not just ask for https://www.mnopedia.org/ to be saved in the AB channel?
21:17 Nikchemny Or is it too big?
21:18 paul2520 Nikchemny: ah, I always ask in the wrong #archive* channel
21:18 Nikchemny #archivebot ?
21:18 paul2520 thanks.
21:18 paul2520 done
21:18 paul2520 it's probably relatively small
21:20 Nikchemny Btw, it looks like people in the AB channel are very busy. I think (if they don't save it) you can ask for it again.
21:20 Nikchemny paul2520: Btw, is it on MediaWiki or not?
21:21 paul2520 I don't think so
21:21 paul2520 any ETA, like tomorrow? or more like give it a week and see if things are quieter/slower?
21:21 paul2520 (or do you think asking at a weird time might work?)
21:23 Nikchemny I think your question could get buried in all that text. Sorry, I'm not in the AB channel, so I can't see what's happening there.
21:26 Nikchemny paul2520
21:37 jshoard has quit IRC (Leaving)
21:44 VerifiedJ has quit IRC (Quit: Leaving)
21:50 Nikchemny has quit IRC (Quit: Page closed)
22:16 Dallas has quit IRC (Quit: Dallas)
22:17 Dallas has joined #archiveteam-bs
22:17 HP_Archiv has joined #archiveteam-bs
22:30 Jake has quit IRC (Read error: Connection reset by peer)
22:30 Jake5 has joined #archiveteam-bs
22:30 Jake5 is now known as Jake
22:48 Dallas has quit IRC (Quit: Dallas)
22:52 Dallas has joined #archiveteam-bs
23:08 Dallas has quit IRC (Quit: Dallas)
23:09 Dallas has joined #archiveteam-bs
23:10 Arcorann has joined #archiveteam-bs
23:53 BlueMax has joined #archiveteam-bs
23:55 JAA The Doctissimo Sexualité grab finished this afternoon. I'm considering grabbing all the other forums as well. It's fast and not too big, so why not?
23:56 JAA Not too big in terms of WARC size, that is. The 54 million posts from Sexualité were only 60-odd GiB.
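The figures JAA gives allow a rough per-post size and a projection for grabbing every forum. A Python sketch, treating "60-odd GiB" as exactly 60 GiB (so a lower bound) and assuming the other forums compress about as well as Sexualité; the 320 million figure is the site-wide message count JAA quoted at 00:37:

```python
GIB = 2**30

sexualite_bytes = 60 * GIB     # "60-odd GiB", taken as 60
sexualite_posts = 54_000_000
site_posts = 320_000_000       # all Doctissimo forums, per the 00:37 message

bytes_per_post = sexualite_bytes / sexualite_posts
projected_gib = site_posts * bytes_per_post / GIB

print(f"~{bytes_per_post:,.0f} bytes per post, ~{projected_gib:.0f} GiB for the whole site")
# prints: ~1,193 bytes per post, ~356 GiB for the whole site
```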
23:57 paul2520 nice
