#archiveteam-bs 2020-03-17,Tue

Time Nickname Message
00:03 🔗 atphoenix has quit IRC (Read error: Connection reset by peer)
00:06 🔗 atphoenix has joined #archiveteam-bs
00:44 🔗 atphoenix has quit IRC (Read error: Connection reset by peer)
00:45 🔗 atphoenix has joined #archiveteam-bs
01:19 🔗 d5f4a3622 has quit IRC (Read error: Connection reset by peer)
01:21 🔗 d5f4a3622 has joined #archiveteam-bs
02:09 🔗 RichardG_ is now known as RichardG
02:15 🔗 HP_Archiv has joined #archiveteam-bs
02:22 🔗 OrIdow6 has quit IRC (Ping timeout: 276 seconds)
02:28 🔗 OrIdow6 has joined #archiveteam-bs
03:05 🔗 chr1sm has quit IRC (Quit: Connection closed for inactivity)
04:12 🔗 qw3rty__ has joined #archiveteam-bs
04:16 🔗 kahuna has quit IRC (Read error: Connection reset by peer)
04:20 🔗 qw3rty_ has quit IRC (Read error: Operation timed out)
04:34 🔗 d5f4a3622 has quit IRC (https://i.imgur.com/xacQ09F.mp4)
05:25 🔗 Ryz has quit IRC (Remote host closed the connection)
05:26 🔗 kiska18 has joined #archiveteam-bs
05:26 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
05:27 🔗 Ryz has joined #archiveteam-bs
05:31 🔗 fuzzy8021 has joined #archiveteam-bs
06:50 🔗 HP_Archiv has quit IRC (Ping timeout: 610 seconds)
07:01 🔗 HP_Archiv has joined #archiveteam-bs
07:23 🔗 fuzzy8021 has quit IRC (Read error: Connection reset by peer)
07:23 🔗 fuzzy8021 has joined #archiveteam-bs
09:02 🔗 icedice2 has joined #archiveteam-bs
09:08 🔗 icedice has quit IRC (Read error: Operation timed out)
10:25 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:18 🔗 icedice2 has quit IRC (Quit: Leaving)
12:01 🔗 HP_Archiv has quit IRC (Quit: Leaving)
13:54 🔗 kahuna has joined #archiveteam-bs
14:21 🔗 alex73 Is there any way to archive a Facebook page/user feed? I don't see anything helpful on https://www.archiveteam.org/index.php?title=Facebook
14:21 🔗 JAA alex73: Yes, snscrape and ArchiveBot/chromebot, most easily used through socialbot in #archivebot.
14:59 🔗 alex73 JAA: And where is the result? I see the command "snscrape facebook-user ellenmilliongraphics" in #archivebot, but I don't see anything on https://web.archive.org/web/20200222233121/https://www.facebook.com/ellenmilliongraphics/
15:08 🔗 JAA alex73: Hmm, yeah, looks like the pipeline that ran on was banned from Facebook. The posts are there though, e.g. https://web.archive.org/web/20200222000546/https://www.facebook.com/ellenmilliongraphics/photos/1003312429705331/
15:10 🔗 JAA Actually no, the profile page was also archived: https://web.archive.org/web/20200221234402/https://www.facebook.com/ellenmilliongraphics/
15:10 🔗 JAA The snapshot you linked was not done by ArchiveBot.
15:21 🔗 alex73 JAA: Do you mean snscrape archived only some posts, not the full posts feed? There is no https://web.archive.org/web/*/https://www.facebook.com/ellenmilliongraphics/posts/ page.
15:23 🔗 JAA alex73: First, terminology: snscrape doesn't archive anything, it just scrapes the feed; the archival happens by running snscrape to discover the URLs for each post/photo/etc. and then feeding that into an actual archiving tool.
15:24 🔗 JAA And what was archived is the profile page plus the page for each entry on the feed. Scrolling doesn't work on the WBM, comments won't load, and the page you mentioned was never retrieved either.
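The two-step workflow JAA describes (snscrape discovers URLs, a separate tool archives them) could be sketched roughly as follows. This is only an illustration: it assumes snscrape prints one URL per line on stdout, and the helper names `clean_url_list` and `discover_facebook_urls` are made up for this sketch. The resulting list would still have to be handed to an actual archiving tool such as ArchiveBot.

```python
import subprocess


def clean_url_list(lines):
    """Strip whitespace, drop blanks and non-URLs, and de-duplicate while keeping order."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if not url.startswith(("http://", "https://")):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def discover_facebook_urls(user):
    """Run the snscrape command quoted in the log and return the discovered URLs.

    Assumption: snscrape is installed and writes one URL per line to stdout.
    """
    result = subprocess.run(
        ["snscrape", "facebook-user", user],
        capture_output=True,
        text=True,
        check=True,
    )
    return clean_url_list(result.stdout.splitlines())
```

Calling `discover_facebook_urls("ellenmilliongraphics")` would only produce the URL list; nothing is archived until that list is fed to an archiver.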
15:26 🔗 JAA Try https://web.archive.org/web/*/https://www.facebook.com/ellenmilliongraphics/* to discover what was archived.
15:28 🔗 mtntmnky has quit IRC (Remote host closed the connection)
15:28 🔗 mtntmnky has joined #archiveteam-bs
15:37 🔗 alex73 Hm. Understood. In this case, I can't see a list of entries like on Facebook, only by parsing https://web.archive.org/web/*/https://www.facebook.com/ellenmilliongraphics/* via some tool. Do you plan to fix/implement it somehow?
15:37 🔗 JAA Due to how Facebook works and sucks, it's virtually impossible to implement that.
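As a rough illustration of "parsing via some tool": the wildcard listing JAA linked can also be fetched in machine-readable form from the Wayback Machine's CDX server. The sketch below assumes the public CDX endpoint and its `url`, `matchType`, `output`, `fl`, `collapse` and `limit` query parameters; the function names are invented for this example, and it only lists captures rather than reconstructing a Facebook-like feed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(prefix, limit=None):
    """Build a CDX query for every captured URL starting with `prefix`."""
    params = {
        "url": prefix,
        "matchType": "prefix",       # same idea as the trailing * in the WBM URL
        "output": "json",
        "fl": "timestamp,original",  # only the fields used below
        "collapse": "urlkey",        # one row per distinct URL
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Turn the JSON output (header row, then data rows) into a list of dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(record):
    """Build the Wayback Machine playback URL for one capture."""
    return "https://web.archive.org/web/{timestamp}/{original}".format(**record)


def list_captures(prefix, limit=50):
    """Fetch and parse the capture list (performs a network request)."""
    with urlopen(build_cdx_url(prefix, limit), timeout=30) as response:
        body = response.read().decode("utf-8")
    return parse_cdx_rows(json.loads(body) if body.strip() else [])
```

For example, `list_captures("facebook.com/ellenmilliongraphics/")` would return one record per archived URL under that profile, and `snapshot_url` turns each record into a link.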
15:43 🔗 alex73 Does snscrape use Chrome to load Facebook pages? In that case, it could write the final DOM as HTML.
15:45 🔗 JAA No, it's a Python tool that emulates the scrolling.
15:45 🔗 JAA I.e. constructs the XHR that are needed to load more content etc.
15:46 🔗 JAA I never made any attempt to actually reverse-engineer the scrolling code entirely. Facebook for example includes a variety of extra parameters in the scrolling URL, including a hash of the list of all JS "modules" that are active. That's essentially impossible to do without actually executing the JS.
15:47 🔗 JAA While it would be possible to archive the pagination as snscrape retrieves it, that would be useless for the scrolling in the WBM.
15:50 🔗 alex73 Yes, I understand. My idea was to archive the DOM after a real Chrome has retrieved it. It could be something like Selenium+Chrome, or some other way where Chrome acts as a real browser but is controlled by our script.
15:52 🔗 alex73 Or some converter to an RSS-like feed with its own rendering.
15:54 🔗 alex73 Okay, I understand that posts can be archived now, but if we want some Facebook-like list of entries, it would have to be implemented. In case somebody besides me needs it.
16:09 🔗 JAA chromebot does that.
16:10 🔗 JAA But it's not in the Wayback Machine because only original data goes in there, not something derived like this.
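alex73's Selenium+Chrome idea might look something like the sketch below. It is not how chromebot works (the log does not say); it is a generic scroll-then-dump loop that relies on the standard Selenium WebDriver calls `get`, `execute_script` and `page_source`. The function names, scroll limit and pause length are assumptions, and login walls or rate limiting on Facebook are not handled. As JAA notes, the output is derived data, so it would not belong in the Wayback Machine.

```python
import time


def dump_scrolled_dom(driver, url, max_scrolls=10, pause=2.0):
    """Load `url`, scroll until the page stops growing, and return the final DOM as HTML.

    `driver` is any Selenium-style WebDriver object.
    """
    driver.get(url)
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        # Scrolling to the bottom lets the page's own JS request more content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return driver.page_source


def make_headless_chrome():
    """Create a headless Chrome driver (assumes Selenium and Chrome are installed)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```

A caller would create the driver with `make_headless_chrome()`, pass it to `dump_scrolled_dom` together with the profile URL, write the returned HTML to a file, and call `driver.quit()` afterwards.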
16:15 🔗 alex73 Does anybody need such functionality? If yes, it could be some 'pseudo-page', like https://web.archive.org/web/*/https://www.facebook.com/ellenmilliongraphics/posts-list/, or, if the WM should contain only original data, it could be some site like http://facebook-mirror.archiveteam.org/ellenmilliongraphics/posts/, which can also be archived.
16:32 🔗 JAA If someone wants to do it, feel free to set up the latter.
17:24 🔗 godane Vogue Italia Archive is free : https://www.instagram.com/p/B91DUriK9qS/
17:52 🔗 asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
17:53 🔗 asdf0101 has joined #archiveteam-bs
17:59 🔗 anarchat has quit IRC (Read error: Connection reset by peer)
18:00 🔗 anarcat has joined #archiveteam-bs
18:04 🔗 alex73 Could anybody please run archiving of some sites for me? https://lit-bel.org/, http://dziejaslou.by/, http://www.baradulin.by/.
18:21 🔗 Ryz Hello alex73, what are the reasons for archiving those 3 websites?
18:55 🔗 alex73 Ryz: They are some important Belarusian literature sites.
19:02 🔗 Ryz I'll see what I can do alex73, problem is we're in the middle of coronavirus stuff to archive~
19:03 🔗 alex73 thank you
19:05 🔗 Ryz When archiving http://dziejaslou.by/ - AB won't be able to grab the Issuu content; I'm not sure if it'll grab the embeds properly
19:08 🔗 Ryz alex73 ^
19:31 🔗 Dallas has joined #archiveteam-bs
19:51 🔗 alex73 Ryz: it's okay. The Issuu content is not an important part.
20:44 🔗 d5f4a3622 has joined #archiveteam-bs
21:36 🔗 schbirid has joined #archiveteam-bs
21:51 🔗 dxrt has quit IRC (Ping timeout: 276 seconds)
21:59 🔗 Datechnom has quit IRC (Read error: Operation timed out)
22:01 🔗 dxrt has joined #archiveteam-bs
22:02 🔗 Iglooop1 sets mode: +o dxrt
22:38 🔗 dxrt has quit IRC (Ping timeout: 276 seconds)
22:41 🔗 dxrt has joined #archiveteam-bs
22:41 🔗 svchfoo1 sets mode: +o dxrt
22:52 🔗 Datechnom has joined #archiveteam-bs
23:00 🔗 BlueMax has joined #archiveteam-bs
